What Is The CDO Data Analytics Challenge?

2018 marked the first year of the Chief Data Office’s (CDO) Colorado Data Analytics Challenge, an event that asked tech and data science folks in the community to rally around three major issues in Colorado and show how available data can inspire solutions and answer real questions. With a nod to the short timeline, each team was expected to pick one element of the three issues that could be clearly scoped and analyzed within the month of the competition. The challenge exists to serve citizens and generate tangible outcomes.

Tangible outcomes of the event included a working Data Inventory, a GitHub repository of resources, the finalist teams’ competition entries, and metadata improvements to datasets on the Colorado Information Marketplace. Building community is also a benefit. Special thanks go to the judges, who were awesome, including Jim at the City of Denver and Ashley at DRCOG. Also, check out the free resources for competition judging that the CDO has available.

The intangible outcomes give context on the value of this event to the “Data Community” and “Data Climate” of Colorado. The value of public data lies in the potential of combining datasets to create new insights. The Data Science community is composed of two parties: Data Stewards and Data Users. Data Stewards use data primarily to run government functions, sell ads, provide information, and so on. It takes a lot of work to keep a data collection shop running, and all too often there is little time to sit down with the data and see what it can say in a secondary use case. Data Users, by contrast, focus primarily on finding value in the secondary use of data.

The Goals Of The CDO Data Analytics Challenge

The CDO Data Analytics Challenge connects Data Stewards and Data Users in a fun and engaging way. The goals are for Data Stewards to get good feedback on their data, and for Data Users to enjoy answering questions with available data while feeling heard and appreciated by their peers in the Data Steward world.

The overall flow of the competition was as follows:

  • Kickoff event – Subject matter experts (SMEs) and participants meet and familiarize themselves with available data and the scope of the competition
  • Participants get two weeks to work on their analyses
  • Virtual checkpoint – Participants give feedback on their progress and the barriers they found
  • Participants get two more weeks to complete their analyses
  • Final event –
    • Judges get face time with participants
    • Teams give quick presentations on their submissions
    • Winners are decided and announced for each category

Kickoff And Shotgun Start

The kickoff engaged SMEs in three topic areas to share the details of their knowledge with a room of budding competitors. Following the introductory presentations, the participants broke out into teams and joined the SMEs for discussion. Teams used a worksheet designed to maximize their short time with the SMEs. X# people showed at the kickoff as participants, and x# of SMEs were there along with a select number of State Agency Data Stewards. The excitement generated at the kickoff shows in how many teams formed and how many of them selected multiple topics.

Challenge Category | Count of Teams Interested in Category
Opioid Crisis | 8
Opioid Crisis, Smart Cities | 2
Opioid Crisis, Water Supply | 1
Opioid Crisis, Water Supply, Smart Cities | 2
Smart Cities | 8
Water Supply | 6
Water Supply, Smart Cities | 2
Grand Total | 29

Virtual Checkpoint

Following the kickoff was a virtual checkpoint. At this checkpoint, teams wrote in to the CDO with their data requests and data challenges, and the CDO responded.

Challenge Category | Count of Teams Interested in Category
Opioid Crisis | 6
Opioid Crisis, Smart Cities | 2
Opioid Crisis, Water Supply | 1
Opioid Crisis, Water Supply, Smart Cities | 1
Smart Cities | 5
Water Supply | 6
Grand Total | 21

As seen in the above tables, fewer teams participated in the virtual checkpoint than registered at the kickoff event (21 versus 29). The best part about this checkpoint was the valuable feedback the competitors provided to the data stewards and managers. Of the 21 teams that submitted checkpoint responses, over half had useful insights and stories about their data discovery process, covering everything from finding datasets to specific issues noticed while cleaning and working with the data.

Final Team Submissions

The final event was hosted at the Denver Aquarium, where each team was given a table from which to present its submission. Of the 8 final submissions, 5 were in the Water Supply category, 2 in Smart Cities, and only 1 in Opioid Crisis. This event was clearly not for the faint of heart: completing an analysis in any of the three categories was challenging.

Challenge Category | Count of Teams with a Final Submission
Opioid Crisis | 1
Smart Cities | 2
Water Supply | 5
Grand Total | 8

The prize for competing is having the team’s work in the repository as a contribution to the wealth of data available for Colorado. That way, others can visit the GitHub Organization and see all the great work from the teams. That is a meaningful reward for the many folks who competed in their free time. The best takeaway is the high level of interest in helping the state.

The Winners Of The Challenge

All submissions were of high quality and sparked interesting conversations with data stewards from various state offices. The winners of each category were:

Opioid Crisis – “Opioid SOS”, for their interactive map of Denver that displays non-marijuana drug crimes, along with drug abuse clinics and Naloxone distribution centers.

Water Supply – “Regis Waterlytics”, for their exploration of the HB 1051 data with a dual focus on (1) water leakage and (2) water production by population size.

Smart Cities – “Data Tigers”, for their analysis of the past 5 years of crime data to understand trends across different aspects of crime, such as year-over-year growth and crime rate by neighborhood.

Challenge Category | Count of Teams at Kickoff | Count of Teams at Virtual Checkpoint | Count of Teams at Final Event
Opioid Crisis | 8 | 6 | 1
Opioid Crisis, Smart Cities | 2 | 2 | 0
Opioid Crisis, Water Supply | 1 | 1 | 0
Opioid Crisis, Water Supply, Smart Cities | 2 | 1 | 0
Smart Cities | 8 | 5 | 2
Water Supply | 6 | 6 | 5
Water Supply, Smart Cities | 2 | 0 | 0
Grand Total | 29 | 21 | 8
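
The drop-off across the three stages can be tallied directly from the table above. The following is a minimal Python sketch and not part of the original challenge materials. The names TEAM_COUNTS, stage_totals, topic_interest, and retention are hypothetical, and the counts are simply copied from the table.

```python
# Team counts per category selection, copied from the summary table:
# (kickoff, virtual checkpoint, final event)
TEAM_COUNTS = {
    ("Opioid Crisis",): (8, 6, 1),
    ("Opioid Crisis", "Smart Cities"): (2, 2, 0),
    ("Opioid Crisis", "Water Supply"): (1, 1, 0),
    ("Opioid Crisis", "Water Supply", "Smart Cities"): (2, 1, 0),
    ("Smart Cities",): (8, 5, 2),
    ("Water Supply",): (6, 6, 5),
    ("Water Supply", "Smart Cities"): (2, 0, 0),
}
STAGES = ("kickoff", "checkpoint", "final")


def stage_totals(counts):
    """Sum team counts across all category selections for each stage."""
    return tuple(sum(row[i] for row in counts.values()) for i in range(len(STAGES)))


def topic_interest(counts, stage_index=0):
    """Count teams per topic at one stage; a multi-topic team counts once per topic."""
    interest = {}
    for topics, row in counts.items():
        for topic in topics:
            interest[topic] = interest.get(topic, 0) + row[stage_index]
    return interest


def retention(totals):
    """Share of kickoff teams still present at each stage."""
    return tuple(total / totals[0] for total in totals)


if __name__ == "__main__":
    totals = stage_totals(TEAM_COUNTS)
    for stage, total, share in zip(STAGES, totals, retention(totals)):
        print(f"{stage:<10} {total:>3} teams ({share:.0%} of kickoff)")
    print(topic_interest(TEAM_COUNTS))
```

Counting a multi-topic team once for each topic it selected, kickoff interest comes to 13 teams for Opioid Crisis, 14 for Smart Cities, and 11 for Water Supply, which is one way to compare categories beyond the single-category rows.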

The Data Inventory And Discoverable Datasets

It is interesting to note how category choices shifted over the course of the competition. For example, Opioid Crisis began as one of the most popular categories, with 8 teams choosing it alone at kickoff, and ended with only one submission. What explains this shift? We received reports from competitors that some data showcased by Opioid Crisis subject matter experts at kickoff was later found to be inaccessible or not machine readable. In contrast, there is an abundance of water-related datasets on the Colorado Information Marketplace.

The Opioid Crisis category attracted some of the strongest interest at the kickoff event, yet the Water Supply category had the largest number of final entries. It is possible this reflects more personally identifiable information (PII) being associated with opioid data than with water data. On the other hand, perhaps the people in Water Supply and Smart Cities just ended up having more free time? All good fuel for planning the next iteration of the challenge!

Participants were provided an inventory of relevant data and a knowledge base on GitHub. The bulk of the datasets listed in the inventory are located on the Colorado Information Marketplace. The linked inventory is not a full list of the datasets that exist for each category. Instead, it lists the datasets that are discoverable and readily available for each category.

Competition Theme | Count of Datasets in Inventory
Smart Cities | 133
Water Supply | 58
Grand Total | 236

Tangible Outcomes

Tangible outcomes will only get better as future iterations of the competition arrive in subsequent years. The GitHub knowledge base holds several documents that explore ‘suites’ of data, most often from data providers on the Colorado Information Marketplace. Example topics outlined in the knowledge base include NREL (National Renewable Energy Laboratory) data, how to create quick data visualizations, and GIS resources. Perhaps one of the best outcomes is the interaction with the users, as shown in this sample of responses teams gave at the “Virtual Checkpoint”.

Types of Feedback from Participants
Count | Feedback Type | Description
16 | Data Use Feedback | People telling us what they learned from working with the data
4 | Data Portal Feedback | People telling us how they want to see the data portal and data discoverability improved
21 | Data Exploration Questions | People asking for data updates or questions related to understanding the data
3 | Data Publishing Requests | People making specific data requests
13 | Competition Responses | People with general comments about the competition

Overall, this competition was quite a success. Socializing data makes it faster to clean data and improve data collection. The analyses and applications created by the competitors sparked meaningful conversations in all categories. Data community engagement events and competitions are an awesome way to get feedback on data, and this one identified important gaps in public data for some of the subject areas. Challenges are also great for inspiring innovation in communities. All parties involved in this competition were pleased with the level of engagement and the quality of submissions. Most importantly, many people hope that this competition will continue in upcoming years.