Published on 8/23/2026

Open Source Traffic Simulation: Learn & Compare

You shipped a feature on Friday. Staging looked clean. Synthetic load tests passed. Dashboards stayed green for the short window you watched them.

Then production traffic hit.

Not a neat stream of isolated requests, but overlapping sessions, retries from mobile clients, stale auth tokens, slow downstream calls, bursty cache misses, and user paths nobody modeled in the test plan. The system didn’t fail because your team skipped testing. It failed because you tested a simplified world.

That gap is where open source traffic simulation becomes useful. Not as an academic exercise, and not only for transportation engineers. For DevOps and QA teams, simulation is a practical way to bring more reality into pre-production testing. Sometimes that means modeling movement through a network with tools like SUMO or CityFlow. Sometimes it means replaying real HTTP behavior into staging so an application sees something much closer to live traffic.

Those are different jobs. Teams often blur them together and end up with the wrong tool, the wrong expectations, or both. If you’re testing a dispatch system, routing service, mobility platform, logistics backend, or any API exposed to messy user behavior, that distinction matters.

Why Your Tests Pass but Your Application Fails

A common failure pattern looks like this.

A team validates a release with unit tests, integration tests, and a standard load generator. The test users hit a handful of endpoints with predictable timing. Response times look acceptable. The deployment goes out. In production, latency rises on a workflow no one considered critical because users don’t follow the ideal path. They refresh pages, open multiple tabs, abandon carts halfway through, retry from mobile networks, and trigger side effects in combinations that staging never saw.

The issue usually isn’t a single slow endpoint. It’s interaction. Session state, cache churn, queue buildup, and uneven request distribution create a shape of traffic that synthetic tests often flatten away.

Production behavior is rarely “more traffic.” It’s usually “different traffic.”

That’s why open source traffic simulation matters. It helps teams test conditions that resemble reality instead of only testing capacity in the abstract. For some systems, reality starts with network movement. If you build software tied to roads, routing, ride sharing, fleet operations, or signal timing, network-level simulation can model vehicles, intersections, and congestion behavior before your app even receives a request. For other systems, reality starts at the edge of the application. Real headers, timing, payload structure, and session flow matter more than an idealized throughput target.

Both approaches solve a real problem. Neither replaces the other.

The practical goal is simple. Stop asking whether a service survives generic load. Start asking whether it survives the kinds of traffic your users and upstream systems generate.

Understanding Core Simulation Concepts

Most confusion around open source traffic simulation comes from treating very different methods as if they do the same thing. They don’t.

Network-level simulation and application-level replay

Network-level simulation models movement through a physical network: roads, intersections, transit routes, and pedestrian flows. As in city planning, the aim is to understand what a change does, such as a lane alteration, a new signal plan, or a shift in demand across a corridor. The simulation engine creates agents and updates them as they move through the network.
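
To make "creates and updates agents" concrete, here is a deliberately tiny microscopic model: the classic Nagel-Schreckenberg cellular automaton on a single-lane ring road. It is a textbook illustration chosen for this article, not the model SUMO or CityFlow uses internally (SUMO, for example, relies on richer car-following models), and the function and parameter names are our own.

```python
import random


def nasch_step(positions, velocities, road_length, v_max=5, p_slow=0.0, rng=None):
    """Advance every vehicle on a single-lane ring road by one time step.

    positions:  distinct cell indices in [0, road_length)
    velocities: current speed of each vehicle in cells per step
    Returns (new_positions, new_velocities) in the same vehicle order.
    """
    rng = rng or random.Random(0)
    n = len(positions)
    # Sort vehicles by position so each one can find the vehicle directly ahead.
    order = sorted(range(n), key=lambda i: positions[i])
    new_v = [0] * n
    for k, i in enumerate(order):
        ahead = order[(k + 1) % n]
        # Free cells between this vehicle and the next, wrapping around the ring.
        gap = (positions[ahead] - positions[i] - 1) % road_length
        v = min(velocities[i] + 1, v_max)  # 1. accelerate toward the speed limit
        v = min(v, gap)                    # 2. brake so it never reaches the car ahead
        if v > 0 and p_slow > 0 and rng.random() < p_slow:
            v -= 1                         # 3. random slowdown (driver imperfection)
        new_v[i] = v
    # 4. Move every vehicle at once using the speeds decided above.
    new_x = [(positions[i] + new_v[i]) % road_length for i in range(n)]
    return new_x, new_v


# Usage: three cars on a 30-cell ring with random slowdowns switched on.
rng = random.Random(42)
x, v = [0, 2, 5], [0, 0, 0]
for _ in range(20):
    x, v = nasch_step(x, v, road_length=30, p_slow=0.3, rng=rng)
```

With `p_slow` above zero, stop-and-go waves appear on their own, which is the kind of emergent congestion behavior network simulators exist to study.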

Application-level replay models what your software receives. This approach resembles observing customers inside a store instead of only modeling traffic outside the parking lot. You capture real requests, then send them to another environment to see how the application behaves under realistic patterns.

These methods can overlap in a modern stack. A mobility platform might need both. The backend may depend on realistic HTTP request flows, while its dispatch or routing logic depends on conditions generated from a road network model.

Research and operational practice around transportation tools show how mature the network side has become. SUMO has been continuously available since 2001 and supports microscopic multi-modal simulation with vehicles, pedestrians, and public transport, plus traffic light scheduling, intersection modeling, and large-scale networks through direct OpenStreetMap import, according to this SUMO overview on arXiv.

Synthetic load and replayed traffic

Teams also confuse synthetic load generation with traffic replay.

Synthetic load is invented traffic. You define routes, concurrency, pacing, headers, and maybe a few parameter variations. This is useful when you need controlled experiments. It’s fast to set up, easy to scale, and good for isolating a bottleneck. It also tends to produce cleaner behavior than production.

Replay is observed traffic. You capture requests that real users or real systems generated, then send them to a test target. That gives you more realistic sequencing, payload variety, session behavior, and edge cases. It’s especially useful when the hard part isn’t raw volume but stateful interaction.

A simple way to picture it:

Approach	What it starts with	Best for	Weak spot
Synthetic load	Test scripts and assumptions	Controlled benchmarking	Misses real behavior patterns
HTTP replay	Real captured requests	Release validation and regression hunting	Needs careful masking and environment control
Network simulation	Road and agent models	Mobility, logistics, routing, signal scenarios	Doesn’t replace app-layer realism

Where open data fits

On the network side, open data is a big reason open source traffic simulation is practical. OpenStreetMap (OSM) has been used for realistic traffic scenarios for years. This OpenStreetMap for traffic simulation paper from the TU Berlin VSP group describes OSM as a critical data source for simulators like MATSim and SUMO, with applications dating back to at least 2011, and notes that OSM data can be converted directly for SUMO-based studies.

That matters for engineering teams because open data removes a common blocker. You can build geographically grounded scenarios without depending on proprietary base maps.

A practical decision rule

Use this rule when choosing an approach:

Pick network simulation when the question is about movement through space. Congestion, routes, demand shifts, signals, intersections, fleet flow.
Pick HTTP replay when the question is about how your application behaves under real request patterns. Auth, caching, database behavior, retries, queueing, and downstream service pressure.
Combine them when software behavior depends on both. That’s common in transport, logistics, delivery, and smart-city systems.

Practical rule: If your failure starts with “users behaved differently than expected,” replay is usually the first tool. If it starts with “the network conditions changed,” simulation usually comes first.

Comparing Prominent Open Source Simulation Tools

If you strip away the marketing language, the tool choice comes down to one question. Are you simulating a transport network or replaying application traffic?

Open source traffic simulation tool comparison

Tool	Simulation Level	Primary Use Case	Data Source	Key Strength
SUMO	Microscopic network simulation	Urban traffic studies, signal logic, policy testing	OpenStreetMap and generated scenarios	Mature modeling breadth and large-scale network support
CityFlow	City-scale traffic simulation	Reinforcement learning and high-speed city scenarios	Synthetic and real-world road definitions, including OSM-based inputs	Fast execution for many simulation episodes
GoReplay	Application-level HTTP replay	Staging validation, shadow traffic, release testing	Captured live HTTP traffic	Real request behavior against non-production targets

For teams comparing broader testing options around application traffic, this roundup of open source load testing tools is useful because it separates script-driven generators from replay-based approaches.

SUMO when realism at the road level matters

SUMO is the old workhorse in this category. It has the depth people expect from a tool that has been around since 2001. It supports vehicles, pedestrians, public transport, traffic lights, intersections, and large imported road networks. It also gives you outputs engineers can use, such as travel times, speeds, emissions, fuel consumption, and noise levels, as described in the earlier linked SUMO reference.

What works well with SUMO:

Geographic grounding: You can import real road layouts from OpenStreetMap.
Scenario generation: It supports trip generation, assignment, and large network studies.
Policy and infrastructure testing: It’s well suited to route changes, signal experiments, and urban planning models.

What doesn’t work as well:

Fast onboarding for app teams: SUMO’s command-line orientation is powerful, but it asks for scripting comfort.
Direct app realism: It won’t tell you how your API gateway, auth middleware, or database pool behaves under real user request sequences.

SUMO is a strong choice when the behavior of a physical or transport network is the system you need to understand.

CityFlow when speed matters more than breadth

CityFlow is built for city-scale simulation and reinforcement learning workflows. According to this CityFlow project page, it runs over 20x faster than SUMO using a discrete-event engine and can handle scenarios with hundreds of intersections and tens of thousands of vehicles.

That changes the trade-off.

If you’re training control policies or iterating through many episodes, throughput matters more than a broader traditional simulator feature set. CityFlow is attractive when you need repeated runs, fast experimentation, and a Python-friendly workflow. It’s less compelling if your work depends on the ecosystem and long operational history that make SUMO familiar in research and planning environments.

CityFlow is for teams asking, “How many simulation runs can we do today?” SUMO is for teams asking, “How detailed and transferable does this model need to be?”

ETFOMM when FHWA lineage matters

ETFOMM sits in a different place. According to the FHWA ETFOMM documentation, it grew out of a U.S. Federal Highway Administration effort that ran from 2008 to 2017, inheriting 40 years of CORSIM traffic simulation algorithms while adding native 64-bit operation, explicit parallel computing, and support for Windows, Linux, and macOS.

Its strongest value is historical and methodological. If you work in U.S. transportation contexts and care about that lineage, ETFOMM deserves attention. A benchmark in the FHWA material reports approximately 50% better optimization on major streets than TRANSYT-7F and at least 25% reduction in control delay with 10% connected vehicle penetration.

That doesn’t make ETFOMM a replacement for application replay. It makes it useful when roadway behavior itself is the subject of the test.

Where HTTP replay fits

Application replay belongs in the same conversation because many failures happen above the transport network. A route planning API, booking backend, telematics collector, or dispatch service can be fed realistic network scenarios and still fail because of session handling, malformed edge traffic, or interactions between caching and retries.

That’s where GoReplay fits. It captures and replays live HTTP traffic into test environments. It’s not a road simulator. It’s an application-traffic simulator. That distinction is the whole point.

Use it when your question sounds like this:

Will the new release behave correctly under real request patterns?
Does the new cache layer handle production-like bursts?
Can the staging stack survive the same workflows users generate in production?

Don’t use it when your question is about corridor-level congestion or adaptive signal timing. Use the right abstraction level for the problem.

Your First Test with HTTP Replay Using GoReplay

The fastest useful exercise is simple. Capture HTTP traffic from one environment, replay it to another, and inspect what changes under realistic request flow.

Start with shadow traffic, not destructive tests

The safest first run is shadowing. You copy live traffic to a non-production target and let the target process it without affecting users. That gives you realism without putting customer-facing state at risk.
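
The shadowing contract is easy to state in code: the user gets the production answer, and staging gets an independent copy whose failures stay invisible to the user. The sketch below models that contract in-process with a queue and a worker thread. GoReplay itself works differently, capturing packets at the network layer so the production service never calls staging at all; `serve` and `shadow_worker` are hypothetical names.

```python
import copy
import queue
import threading


def serve(request, production, mirror):
    """Answer from production and hand an independent copy to the mirror queue."""
    mirror.put(copy.deepcopy(request))  # staging gets its own copy of the request
    return production(request)          # the user only ever sees this result


def shadow_worker(mirror, shadow, results):
    """Drain mirrored requests into the shadow target until a None sentinel arrives."""
    while True:
        request = mirror.get()
        if request is None:
            break
        try:
            results.append(("ok", shadow(request)))
        except Exception as exc:  # staging failures are recorded, never surfaced to users
            results.append(("error", type(exc).__name__))
```

The queue is the important design choice: production never waits on staging, so a slow or broken shadow target cannot add latency to real users.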

If you need environment setup details first, the official guide for setting up GoReplay in testing environments is a good companion.

A minimal pattern looks like this:

gor --input-raw :8080 --output-http "http://staging.internal"

At a high level:

--input-raw :8080 passively captures HTTP traffic arriving on port 8080. Raw capture typically requires root or equivalent packet-capture privileges.
--output-http sends each captured request to your staging target.

That’s enough to prove the flow works. It’s not enough for a trustworthy test.

Add control before adding scale

A better first session narrows scope so you can verify behavior.

gor \
  --input-raw :8080 \
  --output-http "http://staging.internal" \
  --http-allow-url "/api/" \
  --http-set-header "X-Shadow-Test: 1"

This does three useful things:

It keeps the replay focused on a path pattern you actually want to inspect.
It marks replayed requests with a header so downstream logging can separate them from normal traffic.
It gives your team a clean way to check whether the staging environment is handling replay as expected.
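
Conceptually, the two extra flags act as a filter and a rewrite on the captured stream: keep requests whose URL matches a regular expression, then stamp each survivor with a marker header. The plain-Python model below only illustrates that behavior and is not GoReplay code; the request dictionary layout and both helper names are assumptions. `is_replayed` shows the receiving side, where logging or metrics code can split replayed traffic from normal traffic.

```python
import re


def select_and_tag(requests, allow_url=r"/api/", marker=("X-Shadow-Test", "1")):
    """Yield copies of requests whose path matches allow_url, each stamped with a marker header."""
    pattern = re.compile(allow_url)
    name, value = marker
    for request in requests:
        if not pattern.search(request["path"]):
            continue  # outside the scope of this replay session
        headers = dict(request["headers"])  # copy so the captured record stays untouched
        headers[name] = value
        yield {**request, "headers": headers}


def is_replayed(headers, marker="X-Shadow-Test"):
    """Receiver-side check; HTTP header names are case-insensitive."""
    return any(key.lower() == marker.lower() for key in headers)
```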

A lot of teams jump straight to “high load.” That’s backwards. First prove routing, filtering, auth handling, and observability. Then increase intensity.

Replayed traffic is valuable because it preserves request shape and sequence. If you can’t trace those requests cleanly, you lose most of the benefit.

Watch for realistic bottlenecks

Traffic models can expose where systems slow down under changing conditions. In transportation, the FHWA benchmark cited earlier reports at least a 25% reduction in control delay with 10% connected vehicle penetration when using ETFOMM. The software lesson is similar: realistic traffic patterns reveal delays that idealized tests often hide.

During your first replay, look for:

Authentication drift: Tokens, cookies, or headers may not make sense in staging.
Environment coupling: A replayed request may depend on a service, queue, or object store your test setup forgot.
Cache artifacts: A warm production cache and a cold staging cache create very different latency profiles.
Write side effects: Even shadow traffic can be dangerous if the target writes to shared systems.
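
One way to act on the last item is a small policy at the staging edge: replayed reads pass through, replayed writes go to an isolated sandbox if one exists and are dropped otherwise. This is a hypothetical policy function, assuming replayed requests carry a marker header such as X-Shadow-Test. It also assumes your API keeps GET, HEAD, and OPTIONS free of side effects, which is worth verifying rather than trusting.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def route_in_staging(request, marker="X-Shadow-Test", sandbox_available=False):
    """Decide how a staging gateway handles a request: 'forward', 'sandbox' or 'drop'.

    Hypothetical policy: replayed writes never reach shared systems.
    """
    replayed = any(k.lower() == marker.lower() for k in request["headers"])
    if not replayed or request["method"].upper() in SAFE_METHODS:
        return "forward"
    return "sandbox" if sandbox_available else "drop"
```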

Once the basics work, introduce pacing controls and run longer windows. Short, bursty validation catches different defects than sustained replay.
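
Pacing is easier to reason about once you see what a speed-up does to a captured timeline: every gap shrinks by the same factor, so request order and the relative shape of bursts survive. The sketch below is a simplified model of that idea, not GoReplay's implementation (GoReplay has its own rate and speed options; check its documentation for the exact syntax). `rescale_schedule` and `replay` are hypothetical helpers, and `sleep` is injectable so the schedule can be tested without waiting.

```python
import time


def rescale_schedule(timestamps, speed=1.0):
    """Convert sorted capture timestamps (seconds) into replay offsets from the start.

    speed=2.0 halves every gap; order and the relative shape of bursts are kept.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not timestamps:
        return []
    start = timestamps[0]
    return [(t - start) / speed for t in timestamps]


def replay(entries, send, speed=1.0, sleep=time.sleep):
    """Send (timestamp, request) pairs, sleeping between them to follow the rescaled schedule.

    Simplification: time spent inside send() is not subtracted, so long runs drift late.
    """
    offsets = rescale_schedule([ts for ts, _ in entries], speed)
    elapsed = 0.0
    for offset, (_, request) in zip(offsets, entries):
        if offset > elapsed:
            sleep(offset - elapsed)
            elapsed = offset
        send(request)
```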

Here’s a pattern for replaying from a stored capture, for example one saved earlier with GoReplay’s --output-file option:

gor \
  --input-file requests.gor \
  --output-http "http://staging.internal" \
  --http-set-header "X-Replay: stored"

That’s useful for regression checks. You preserve a known request set and replay it after each infrastructure or application change.
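
To turn repeated replays into a regression signal, compare the outcome of each request across runs. The sketch below assumes every replayed request carries a stable id that both runs log alongside its endpoint and status code; `diff_runs`, `changes_by_endpoint`, and that log shape are hypothetical.

```python
from collections import Counter


def diff_runs(baseline, candidate):
    """Compare two replays of the same stored capture.

    Each run maps request_id -> (endpoint, status). Returns sorted
    (request_id, endpoint, baseline_status, candidate_status) tuples for every change;
    a request missing from the candidate run shows up with status None.
    """
    changes = []
    for request_id, (endpoint, before) in baseline.items():
        after = candidate.get(request_id, (endpoint, None))[1]
        if after != before:
            changes.append((request_id, endpoint, before, after))
    return sorted(changes)


def changes_by_endpoint(changes):
    """Count changed requests per endpoint to see where a release moved behavior."""
    return Counter(endpoint for _, endpoint, _, _ in changes)
```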

What a successful first run looks like

A good first replay doesn’t need dramatic output. It should answer a few plain questions:

Did the target accept the requests?
Did logs clearly identify replayed traffic?
Did the environment behave predictably?
Did any endpoint fail because of state, sequencing, or dependencies?

If the answer to the last question is yes, that’s progress. You found a production-shaped failure before users did.

Best Practices for Actionable Simulation Results

A replay or simulation run only matters if the result changes an engineering decision. Raw activity isn’t evidence. Clean, interpretable signals are.

Protect data before you protect performance

If you replay real traffic, treat privacy and sensitive data handling as part of the test design, not cleanup afterward. Production requests often include identifiers, tokens, and payload fields that have no business landing in a lower environment untouched.

Mask what you can. Strip what you don’t need. Keep the request shape useful while removing values that would create security or compliance problems.
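
A minimal masking sketch, assuming captured requests are available as dictionaries with JSON-like bodies. It replaces sensitive values with a keyed HMAC pseudonym, so every key and list survives and the same email still maps to the same token across a session. The field lists, the `masked-` prefix, and the key handling are assumptions to adapt; a real deployment would load the key from a secret store.

```python
import copy
import hashlib
import hmac

SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}
SENSITIVE_FIELDS = {"email", "phone", "card_number", "password"}


def pseudonym(value, key):
    """Deterministic, keyed replacement: same input -> same token, hard to reverse without the key."""
    digest = hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked-" + digest[:12]


def mask_body(value, key):
    """Walk nested JSON-like data, replacing sensitive fields but keeping every key and list."""
    if isinstance(value, dict):
        return {
            k: pseudonym(v, key) if k.lower() in SENSITIVE_FIELDS else mask_body(v, key)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [mask_body(item, key) for item in value]
    return value


def mask_request(request, key):
    masked = copy.deepcopy(request)  # never modify the captured original
    masked["headers"] = {
        name: pseudonym(v, key) if name.lower() in SENSITIVE_HEADERS else v
        for name, v in masked["headers"].items()
    }
    if "body" in masked:
        masked["body"] = mask_body(masked["body"], key)
    return masked
```

Deterministic pseudonyms are a deliberate trade-off: they keep joins and session continuity intact, which plain redaction would break.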

A replay that exposes user data is a broken test, even if the latency graph looks perfect.

Define pass and fail up front

Many teams run replay jobs, stare at dashboards, and call the result “fine” because nothing caught fire. That’s not a test standard. Decide before the run what counts as success.

Use criteria such as:

Latency behavior: Which endpoints can slow down slightly, and which cannot?
Error patterns: Which status changes are acceptable in staging, and which indicate a real defect?
State integrity: Did replay create invalid writes, duplicate operations, or ordering issues?
Dependency health: Did queues, caches, and backing services remain within expected behavior?
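
Criteria like these are easiest to enforce when they are written down as code before the run. A minimal sketch, assuming you can export one (endpoint, latency in ms, status code) record per replayed request; the p95 budgets, the 1% server-error threshold, and the function names are illustrative choices, not recommendations.

```python
import math
from collections import defaultdict


def nearest_rank(values, pct):
    """Nearest-rank percentile, pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def evaluate(samples, p95_budget_ms, max_5xx_rate=0.01):
    """Check replay results against criteria agreed before the run.

    samples: iterable of (endpoint, latency_ms, status_code).
    Returns a list of human-readable failures; an empty list means the run passed.
    """
    latencies = defaultdict(list)
    total = server_errors = 0
    for endpoint, latency_ms, status in samples:
        latencies[endpoint].append(latency_ms)
        total += 1
        if status >= 500:
            server_errors += 1
    failures = []
    for endpoint, budget in p95_budget_ms.items():
        if not latencies[endpoint]:
            failures.append(f"{endpoint}: no samples")
            continue
        p95 = nearest_rank(latencies[endpoint], 95)
        if p95 > budget:
            failures.append(f"{endpoint}: p95 {p95} ms > {budget} ms")
    if total and server_errors / total > max_5xx_rate:
        failures.append(f"5xx rate {server_errors / total:.1%} > {max_5xx_rate:.1%}")
    return failures
```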

Preserve sessions and ordering

Stateless replay is better than synthetic traffic for many problems, but stateful replay is where the serious defects show up. User flows often depend on sequencing. Login, browse, mutate, confirm. Break the order and you test fragments instead of journeys.

That’s especially important in distributed systems where one request seeds data for the next. Session-aware replay lets you validate the shape of behavior users create.
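
Session-aware replay can be sketched as two steps: group captured requests by session in capture order, then replay sessions in parallel while keeping each one strictly sequential. This assumes each captured record has a timestamp (`ts`) and a session identifier (`session_id`); both field names and the helpers are hypothetical. Cross-session timing is not preserved here, which is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor


def group_sessions(captured, session_key="session_id"):
    """Group captured requests by session, ordered by capture time within each session.

    sorted() is stable and dicts keep insertion order, so ties keep their capture order
    and sessions appear in the order they first started.
    """
    sessions = {}
    for entry in sorted(captured, key=lambda e: e["ts"]):
        sessions.setdefault(entry[session_key], []).append(entry)
    return sessions


def replay_sessions(sessions, send, max_workers=8):
    """Replay sessions concurrently, but each session strictly in its captured order."""
    def run(entries):
        for entry in entries:
            send(entry)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run, sessions.values()))  # list() surfaces any exception from send
```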

Field note: The closer your test preserves timing, ordering, and dependency context, the more likely it is to catch the bugs that only appear after deployment.

Don’t ignore integration latency

Complex simulation stacks often fail at the joins between tools. A recent arXiv paper on SUMO and CARLA integration notes frequent synchronization issues and reports co-simulation loops that can exceed 500 ms, as this SUMO-CARLA co-simulation discussion describes. That is a practical warning for anyone building real-time or near-real-time test setups.

The lesson applies beyond autonomous vehicle tooling. If your replay rig depends on proxies, collectors, message buses, synthetic data services, and observability agents, the harness can distort the result.

Watch for three common mistakes:

Testing cold systems only: First-run latency can be useful, but it’s not the whole story.
Forgetting downstream dependencies: Replaying against app servers while mocking away everything else often hides the bottleneck.
Measuring only averages: Tail behavior and failure clusters usually matter more than a comfortable mean.
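
The first and last mistakes interact: a few cold-start requests can dominate the tail, while a comfortable mean hides a slow outlier. A minimal sketch that reports mean, median, and p99 with and without a warm-up window, assuming you have per-request latencies in arrival order; the `warmup` count is a judgment call, and reporting both views is safer than silently discarding cold samples.

```python
import math
import statistics


def summarize(latencies_ms, warmup=0):
    """Summarize latencies after dropping the first `warmup` samples (cold caches, pools, JIT)."""
    steady = latencies_ms[warmup:]
    if not steady:
        raise ValueError("no samples left after warm-up")
    ordered = sorted(steady)

    def nearest_rank(pct):
        return ordered[max(math.ceil(pct * len(ordered) / 100), 1) - 1]

    return {
        "mean": statistics.fmean(steady),
        "p50": nearest_rank(50),
        "p99": nearest_rank(99),
        "max": ordered[-1],
    }
```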

Good open source traffic simulation isn’t just about generating pressure. It’s about producing evidence a team can act on with confidence.

Real-World Use Cases and Troubleshooting

A few patterns show up repeatedly in real engineering work.

One team wants to migrate a database without surprising users. They mirror production-shaped traffic into a staging stack wired to the new database version. The goal isn’t peak throughput. It’s to catch odd query paths, serialization mismatches, and request sequences that only occur under normal user behavior.

Another team operates a mobility or logistics platform. They use OpenStreetMap-based network models to build realistic regional scenarios, because open-source tools like MATSim and SUMO have long relied on OSM for accurate network topology, a practice documented in research going back to at least 2011 in the TU Berlin paper cited earlier. That helps them validate route logic and congestion-sensitive workflows before those conditions ever hit the application layer.

A third team is preparing for a seasonal event. They replay historical HTTP traffic against a new recommendation service while keeping the rest of the platform stable. Their biggest finding isn’t CPU saturation. It’s that one enrichment call creates queue contention when users hit certain paths in quick succession.

That’s the broader discipline of testing systems under realistic stress. If you want a deeper engineering framework around that work, this guide to performance engineering is worth reading because it treats performance as a lifecycle practice, not a one-off benchmark.

Troubleshooting when replay results look wrong

If staging doesn’t resemble production behavior, start with the environment before blaming the replay.

Headers don’t match: Many failures come from missing auth context, host routing differences, or security middleware reacting to unfamiliar request metadata.
State isn’t portable: Replayed requests may reference objects, sessions, or records that don’t exist in staging.
Dependencies are incomplete: If downstream queues, caches, or third-party integrations are mocked too aggressively, latency and error patterns won’t resemble reality.
Traffic shape is distorted: Captured traffic may have been filtered too narrowly, removing the requests that create session continuity or hotspot pressure.

When replayed requests get blocked, add explicit markers and inspect how gateways, WAF rules, and auth layers treat them. When performance diverges, compare cache state, feature flags, and data volume before comparing app code. In most investigations, the mismatch lives in test conditions, not in the replay mechanism itself.
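
Comparing test conditions is easier with a mechanical diff than by eye. A minimal sketch, assuming you can export a flat snapshot of settings (feature flags, cache state, dataset sizes) from each environment as a dictionary; the setting names you would use are your own, and `diff_conditions` is a hypothetical helper.

```python
MISSING = "<missing>"


def diff_conditions(production, staging):
    """Return {setting: (production_value, staging_value)} for every setting that differs."""
    differences = {}
    for key in sorted(production.keys() | staging.keys()):
        prod_value = production.get(key, MISSING)
        staging_value = staging.get(key, MISSING)
        if prod_value != staging_value:
            differences[key] = (prod_value, staging_value)
    return differences
```

A setting present on only one side shows up with `<missing>` on the other, which is often the most telling difference.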

Conclusion: Building Resilient Systems with Realistic Tests

The core distinction is simple. Network simulation answers questions about movement through roads, intersections, fleets, and city systems. HTTP replay answers questions about how software behaves when real request patterns hit it. Teams get better results when they stop forcing one tool to do both jobs.

That matters even more in distributed platforms. As systems spread across services, queues, APIs, and event pipelines, realistic testing becomes less optional. A useful mental model comes from modern microservices architecture, where failures often emerge from interactions between components rather than from a single broken endpoint.

Open source traffic simulation gives engineering teams practical options at both layers. SUMO and CityFlow help when the network itself is part of the problem. HTTP replay helps when production behavior is the problem. In many real systems, especially in logistics, mobility, and smart infrastructure, you’ll need both perspectives.

The teams that avoid fragile releases usually don’t test more in the abstract. They test closer to reality.

Frequently Asked Questions

Is replaying production traffic safe?

It can be, but only if you treat safety as part of the design. Mask sensitive fields, avoid shared side effects, and route replayed requests to isolated targets. For first runs, shadow traffic into non-production systems and mark replayed requests clearly in logs and traces.

Can open source traffic simulation help with mobile backends?

Yes. Mobile systems often benefit from replay because real devices generate retries, bursty reconnects, and inconsistent request timing that synthetic scripts miss. If the backend also depends on routing, location, or fleet conditions, pair replay with a network simulator instead of choosing only one.

Is HTTP replay the same as load testing?

No. Replay can generate load, but its bigger value is realism. It preserves request shape, sequence, and variability. Traditional load testing is still useful when you need tightly controlled benchmarking or a clean stress profile for one component.

What about WebSockets or non-HTTP traffic?

HTTP replay tools are strongest when the important application behavior is HTTP-based. For systems with significant non-HTTP traffic, you may need protocol-specific capture and replay tooling, or a broader test harness that combines replay with synthetic generation. Don’t assume an HTTP-focused method covers every interaction in your system.

Why doesn’t staging match production even with replay?

Usually because the environment differs in meaningful ways. Common causes include different caches, smaller datasets, missing dependencies, altered feature flags, and authentication behavior that doesn’t mirror production. Before judging the replay method, compare the surrounding conditions.

When should I use a network simulator instead of replay?

Use a network simulator when your question is spatial or operational. Traffic flow, signals, route changes, congestion, and multimodal movement are simulation problems. Use replay when your question is about application behavior under real request patterns. If your platform bridges both worlds, test both layers.

If you want to move beyond synthetic load and test with production-shaped HTTP behavior, GoReplay is a practical place to start. Capture real traffic, replay it safely into staging, and find the failures that scripted tests usually miss.