Real-time Big Data Analysis of Madrid's Traffic


We are working with traffic data from the city of Madrid. We will deploy the project in three stages: (1) a simple tool to analyze Madrid's traffic data, (2) the inclusion of more datasets and the relations between them, and (3) a real-time analysis of the data.

Full Description

We are working with traffic data from the city of Madrid. Due to the complexity of the project, we have divided it into stages.

In the first stage, our project will collect the open data published on the open data portal of the Ayuntamiento de Madrid (Madrid City Council).

This dataset is divided into monthly files, with data also aggregated by year. Its size is around 6 GB per year, and we need two or more years of data to look for seasonal patterns. Our team is therefore going to establish workflows to build, first of all, a simple tool to visualize and study the data. With this tool we are especially interested in events such as music festivals, parades and sports matches, and in the way they affect the flow of cars. We are also looking at daily events, both on a city scale, like rush hour, and local ones, like parents dropping off and picking up children at schools. Used in an unsupervised mode, this tool can also help us find unusual data and try to relate it to events.
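
A minimal sketch of how the unsupervised mode could flag unusual readings, assuming the data has already been reduced to a list of counts for one sensor (for example, the same hour on consecutive days). It uses the modified z-score, built on the median and the median absolute deviation (MAD). The function name and the 3.5 threshold (a common rule of thumb) are our assumptions, not part of the project.

```python
from statistics import median


def robust_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    The median and the MAD are used instead of the mean and the standard
    deviation because they are far less affected by the outliers we want to find.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all: nothing can be flagged reliably
    # 0.6745 rescales the MAD so the score is comparable to a z-score for normal data
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]
```

For example, `robust_anomalies([820, 790, 805, 812, 2400, 798, 801])` returns `[4]`, the index of the spike. Using the median keeps a single large event from inflating the baseline and hiding itself.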

In this phase, we are taking seasonal patterns into account and removing their effect from the data. Since we are working with big data and combining large datasets for the analysis, we will load all the pre-processed information and use dashDB to show the data.
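
One simple way to remove a weekly seasonal pattern is to average the counts for each (weekday, hour) slot and subtract that profile. The sketch below assumes the traffic data has been turned into hourly (timestamp, count) pairs for a single sensor; the function name and record layout are hypothetical, and removing yearly seasonality would additionally need the two or more years of data the project plans to collect.

```python
from collections import defaultdict
from datetime import datetime


def remove_weekly_profile(records: list[tuple[datetime, float]]) -> list[tuple[datetime, float]]:
    """Subtract the average (weekday, hour) profile from hourly counts.

    records: (timestamp, count) pairs for a single sensor (hypothetical layout).
    Returns (timestamp, residual) pairs in the same order as the input.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, value in records:
        key = (ts.weekday(), ts.hour)  # e.g. (0, 8) = Monday, 08:00-08:59
        sums[key] += value
        counts[key] += 1
    profile = {key: sums[key] / counts[key] for key in sums}
    # Residual = observed count minus the typical count for that weekly slot
    return [(ts, value - profile[(ts.weekday(), ts.hour)]) for ts, value in records]
```

The residuals no longer carry the regular rush hour and weekend rhythm, so what remains is closer to the event-driven variation the project is interested in.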

The second stage of the project is to include other datasets that could be related to traffic and to look for relations between them. The idea is to merge data from different sources. This way we enrich the original data, and we also hope to gain insights into the influence of other forms of transport in the city (pedestrians, bicycles, buses, metro). At this stage we are planning to use IBM Analytics for Hadoop and IBM Insights for Twitter, combining Hive and a MapReduce approach to obtain better and faster results. All the data will be merged and analyzed, including sentiment analysis of Twitter data.
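
To illustrate the MapReduce approach, here is a local, single-process sketch of the map, shuffle and reduce phases that averages traffic intensity per sensor and hour. The CSV layout `sensor_id,YYYY-MM-DD HH:MM,intensity`, the sensor IDs and the function names are hypothetical. On Hadoop the mapper and reducer would run distributed across the cluster, and in Hive the same aggregation would be a `GROUP BY` query.

```python
from collections import defaultdict


def map_phase(line):
    """Emit ((sensor_id, hour), intensity) from a hypothetical CSV line."""
    sensor_id, timestamp, intensity = line.strip().split(",")
    hour = timestamp[:13]  # keep 'YYYY-MM-DD HH', dropping the minutes
    yield (sensor_id, hour), float(intensity)


def reduce_phase(key, values):
    """Average all readings that share the same (sensor_id, hour) key."""
    values = list(values)
    return key, sum(values) / len(values)


def run_mapreduce(lines):
    # Shuffle step: group mapper output by key, as Hadoop does between phases
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_phase(line):
            groups[key].append(value)
    return dict(reduce_phase(k, v) for k, v in sorted(groups.items()))


lines = [
    "PM1001,2015-03-02 08:00,120",
    "PM1001,2015-03-02 08:15,180",
    "PM1002,2015-03-02 08:00,60",
]
print(run_mapreduce(lines))
```

Here the two 08:00 and 08:15 readings from `PM1001` are averaged into one hourly value of 150.0, while `PM1002` keeps its single reading of 60.0.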

Some of our target datasets are:

EMT (Madrid's municipal bus company), weather, bicycles, traffic cameras, security cameras, Twitter, traffic lights, taxis, metro, Waze, school locations.
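
Merging data of different origins often comes down to joining on a shared key such as the hour. A minimal sketch of a left join between traffic records and weather observations, assuming both sources have already been normalized to an `hour` string; the field names `intensity` and `rain_mm` are hypothetical.

```python
def join_by_hour(traffic, weather):
    """Attach the hourly weather observation to each traffic record.

    traffic: dicts with 'hour' and 'intensity' keys (hypothetical schema).
    weather: dicts with 'hour' and 'rain_mm' keys (hypothetical schema).
    Traffic records with no matching weather hour get rain_mm=None (a left join).
    """
    # Build a hash index on the smaller table, then probe it once per traffic record
    rain_by_hour = {w["hour"]: w["rain_mm"] for w in weather}
    return [{**t, "rain_mm": rain_by_hour.get(t["hour"])} for t in traffic]


traffic = [
    {"hour": "2015-03-02 08", "intensity": 150.0},
    {"hour": "2015-03-02 09", "intensity": 90.0},
]
weather = [{"hour": "2015-03-02 08", "rain_mm": 2.5}]
print(join_by_hour(traffic, weather))
```

A left join keeps every traffic record even when a secondary source has gaps, which is likely with sensors of many types.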

The final result of phase two will be a public API available to the residents of Madrid, with which they can generate their own insights.

The final stage will deploy a real-time analysis of traffic based on many datasets. This will be difficult because of the relatively large amount of data that must be processed in real time, the use of different kinds of data, and the differing values produced by the many types of sensors.
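
As one possible building block for the real-time stage, not the project's chosen design, the sketch below flags readings that deviate from an exponentially weighted moving average (EWMA) by more than `k` standard deviations, updating its state in constant time per reading. The class name and the default values of `alpha`, `k` and `warmup` are assumptions.

```python
import math


class StreamingDetector:
    """Flag readings far from an exponentially weighted moving average.

    alpha: smoothing factor; higher values react faster to change.
    k: number of (exponentially weighted) standard deviations counted as unusual.
    warmup: number of readings absorbed before any flagging starts.
    """

    def __init__(self, alpha=0.1, k=3.0, warmup=5):
        self.alpha = alpha
        self.k = k
        self.warmup = warmup
        self.n = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, x):
        """Consume one reading and return True if it looks anomalous."""
        self.n += 1
        if self.n == 1:
            self.mean = x  # initialize the average with the first reading
            return False
        diff = x - self.mean
        is_anomaly = (self.n > self.warmup
                      and abs(diff) > self.k * math.sqrt(self.var))
        # Incremental update of the exponentially weighted mean and variance
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


detector = StreamingDetector()
flags = [detector.update(x) for x in [100, 104] * 10 + [400]]
print(flags[-1])
```

In this example a sensor alternating between 100 and 104 vehicles is never flagged, while the sudden reading of 400 is. Because the detector keeps only a running mean and variance, one instance per sensor could run on a stream without storing history.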



Team

Karla Vizcarra

Tristan Guigue

Stathis Fotiadis

Santiago Mota
