From Gridlock to Graphs: How CDOT is Revolutionizing Crash Analysis to Save Lives and Time

The Challenge: Untangling the Danger of Secondary Crashes

Every driver has felt the frustration of unexpected delays, and all too often the cause is a traffic incident. On urban freeways, these incidents are responsible for about 60% of all delays. Beyond the inconvenience, there's a serious safety risk: the presence of a primary crash can make the risk of another collision up to six times higher. These "secondary crashes," which occur within the traffic queue of an initial incident, become increasingly probable the longer the first crash remains uncleared. For years, the Colorado Department of Transportation (CDOT) sought a better way to identify and analyze these events, but traditional methods were slow and manual. An early project to analyze just 34 miles of I-25 took 170 hours of computer processing and 40 hours of manual review, a process that simply could not scale to the entire state.


The Solution: A New Approach with Graph Theory

To overcome this challenge, CDOT embraced a cutting-edge solution rooted in a surprising field: 18th-century mathematics. The project team decided to treat the state's road network and traffic data as a massive graph, the same concept pioneered by Leonhard Euler to solve the famous "Bridges of Königsberg" problem. In this model, each road segment at a specific time is a "node" in the graph. When a traffic slowdown is detected, "edges" are created to connect the affected nodes. Using this graph, an algorithm can quickly determine whether two separate crashes are linked by a continuous slowdown. By leveraging the parallel processing power of the Google Cloud Platform, this fully automated approach can process billions of data points with remarkable speed.
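
To make the model concrete, here is a minimal sketch of the node-and-edge construction, assuming speed records of the form (segment_id, time_bin, speed_mph), consecutive integer time bins, and a lookup of physically adjacent segments. The names and the 35 mph slowdown threshold are illustrative, not CDOT's actual schema:

```python
# Minimal sketch of the graph model described above. Assumes speed records
# shaped as (segment_id, time_bin, speed_mph), with time bins as consecutive
# integers; the 35 mph threshold is a placeholder, not CDOT's actual value.
import networkx as nx

SLOWDOWN_MPH = 35  # hypothetical threshold for a "slow" segment

def build_slowdown_graph(speed_records, adjacent_segments):
    """Nodes are (segment_id, time_bin) pairs observed below the threshold;
    edges link slow nodes that touch in space or time."""
    slow = {(seg, t) for seg, t, mph in speed_records if mph < SLOWDOWN_MPH}
    g = nx.Graph()
    g.add_nodes_from(slow)
    for seg, t in slow:
        # Same segment, next time bin: the slowdown persists.
        if (seg, t + 1) in slow:
            g.add_edge((seg, t), (seg, t + 1))
        # Neighboring segment, same time bin: the queue extends along the road.
        for nbr in adjacent_segments.get(seg, ()):
            if (nbr, t) in slow:
                g.add_edge((seg, t), (nbr, t))
    return g
```

With a graph like this in hand, two crashes are related exactly when their (segment, time) nodes fall in the same connected component, the check described in more detail under the hood below.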


The Benefits: Statewide Insight in a Fraction of the Time

The results of this new graph-based approach are transformative: from 430 days to 1 hour and 21 minutes. The analysis of the entire state of Colorado, using seven years of data and 1.3 billion speed records, was completed in just 1 hour and 21 minutes. Extrapolating the old method to this scale would have taken an estimated 430 days of combined computing and manual analysis. This represents a performance improvement of roughly 7,600 times and allows, for the first time, a comprehensive statewide understanding of secondary crash dynamics. These insights, visualized in a Power BI dashboard, help CDOT identify hotspot locations, understand the time and distance between related crashes, and ultimately develop strategies to clear primary incidents faster and prevent secondary crashes from ever happening.
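
Two of those dashboard metrics, the time and distance between a primary crash and its secondaries, reduce to simple arithmetic once crashes are linked. A minimal sketch, assuming each linked crash record carries a timestamp and a milepost (field names are hypothetical):

```python
# Illustrative only: compute the time and distance gaps between a primary
# crash and the secondary crashes linked to it. Field names ("time" as a
# datetime, "milepost" in miles) are assumptions, not CDOT's actual schema.
def crash_gaps(primary, secondaries):
    """Yield (crash_id, minutes_after_primary, miles_from_primary)."""
    for s in secondaries:
        minutes = (s["time"] - primary["time"]).total_seconds() / 60.0
        miles = abs(s["milepost"] - primary["milepost"])
        yield s["id"], minutes, miles
```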


Some of the Eye-Catching Outputs

  • A playback showing how slowdowns on I-25 South ebb and flow over a 12-hour period, leading to a cascade of 13 different crashes.
  • A playback visualizing a series of five crashes, all occurring in the same area on the notorious Vail Pass, demonstrating a high-risk location.
  • An analysis showing a massive slowdown that begins in Denver on I-25 and stretches all the way into Wyoming, connecting five separate crash events over a huge distance.

All of these outputs are visualized as part of the Business Intelligence dashboard solution.

A Look Under the Hood

The dramatic leap in performance and capability came from a fundamental shift in technical strategy. The team moved away from a linear, semi-manual process and embraced a fully automated, cloud-native, and parallelized approach.

The Old Way (Manual)

  • Technology: The original analysis relied on traditional tools like SQL Server and custom Python scripts.
  • Process: The workflow required analysts to manually review heat maps of traffic slowdowns to visually identify a relationship between crashes.
  • Performance: Analyzing four years of data for just 34 miles of I-25 took 170 hours of compute time and an additional 36-40 hours of manual effort. This approach was not scalable statewide.

The New Way (ADAP Graph-Based Approach)

  • Technology: The new solution uses modern, cloud-native data processing: a graph algorithm executed on the Google Cloud Platform.
  • Process: The system is fully automated. It models road segments and time slots as nodes, connects them with edges during slowdowns, and uses a "connected components" algorithm to find relationships between crashes (a minimal sketch of this step follows the list). The cloud's parallel processing capabilities allow it to analyze the entire state at once.
  • Performance: The initial proof of concept on the same 34-mile stretch of I-25 took just 31 minutes, a 379x improvement. When scaled statewide to cover 7 years of data (a 30-40x data increase), the entire process took only 1 hour and 21 minutes for a compute cost of just over $23.
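
The sketch below shows the "connected components" step in its simplest form, using a plain union-find rather than any particular cloud service. It assumes the slowdown edges from the graph-construction sketch above and a mapping of each crash to the (segment_id, time_bin) node where it occurred:

```python
# Minimal sketch of the "connected components" step. Assumes slowdown edges
# as (node, node) pairs and crash_nodes as {crash_id: (segment_id, time_bin)}.
# Illustrative union-find implementation, not CDOT's production code.
from collections import defaultdict

def find(parent, x):
    # Find the root of x, halving the path as we walk up.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def related_crash_groups(edges, crash_nodes):
    """Group crashes whose nodes share a connected slowdown component."""
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    groups = defaultdict(list)
    for crash_id, node in crash_nodes.items():
        parent.setdefault(node, node)
        groups[find(parent, node)].append(crash_id)
    # Components containing two or more crashes indicate a likely
    # primary/secondary relationship.
    return [ids for ids in groups.values() if len(ids) > 1]
```

In the cloud version, this same computation is distributed across many workers in parallel, which is what makes a statewide, seven-year pass feasible in under an hour and a half.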

Lessons Learned

  • Data Wrangling is a Challenge: Joining imperfect, real-world data from different sources is a difficult but critical first step.
  • Official Datasets are Key: Having "official" and complete road network datasets makes solving complex problems like this significantly easier.
  • Aim for "Good Enough": In data science, striving for perfection can be a roadblock. The key is to determine what level of data quality is "good enough" to produce reliable and actionable results.