Throughout the 2012 election cycle, we’ve been fascinated with the idea of visualizing realtime election results. On election day, starting when voting concludes on the East Coast, newsrooms race to process and visualize vote totals in each of the 50 states, 435 congressional districts, and 3,200 counties across the country. The Associated Press provides a feed of results data, aggregated from staff deployed across the country, updated at eight-minute intervals. Since nearly all news outlets subscribe to this data, the race to report results first is really about having an incredibly short time to publish while maintaining a steadfast focus on reliability during what’s often the highest-traffic night for news websites. The excitement of the night and the availability of a reliable source of fast data make this a really exciting problem to solve.
In the thick of a massive redesign and brand overhaul across all printed and electronic media, USA TODAY brought on our team to help them build a new realtime election mapping platform. Our work culminated in a responsively designed web application that powered a full screen dashboard, a smartphone and tablet view, and eight different embeddable views used throughout election night on the USATODAY.com homepage and several topic and article pages. Here’s how we built it.

The maps
Our first decision was how to render the maps in a way that would maximize reliability while reducing time to update. We considered a few options.
We chose D3.js to handle SVG map generation. D3 provides a simple interface for building SVGs, as well as native support for map projections and transitions. Since it doesn’t get bogged down supporting legacy browsers without SVG support, we fell back to R2D3 for versions of Internet Explorer before IE9.
D3’s projection support includes Mercator and Albers, as well as an option to set your own. For national views we used an Albers projection, but for state-wide views we instead used the spherical Mercator projection in MapBox.js so we could overlay the maps on the MapBox terrain layer. This was relatively easy to do — depending on the map’s zoom level, we changed the projection function. We also used D3 to animate the reprojection process for a smooth transition effect.

Geodata processing
Our maps required two inputs: geographic data for the shape of each political division, and election results for each of those divisions. We used Census shapefiles for states and counties, and newly redistricted congressional districts from Azavea. The source shapefiles were several megabytes each, so we needed to simplify their geographic detail considerably to get them small enough to transfer over the internet and load in the browser. Our target was no more than 300 KB per file, gzip-compressed.
Nate processed the data with a whole suite of open source tools including QGIS, PostGIS, and GDAL/OGR. The first challenge was to reduce geographic complexity while maintaining adjacent boundaries — more technically known as preserving topology. Second, we needed to make sure the boundaries of features on different layers of data matched — for example, when we layer state borders over congressional districts — which we solved by clipping congressional districts and counties to the boundary of each parent state.
Pretty quickly we realized that we could not depend on a single level of simplification to give us both the small file sizes and the high resolution detail we wanted. So we produced a set of heavily simplified congressional districts to use along with D3’s included simplified counties and states. This data would be used for small maps, like the thumbnails for each race, the national House race map, and the national county-level maps. For zoomed views, like when you click on a state to see county-level detail, we produced higher resolution files for each state of all of that state’s counties and congressional districts. The application then dynamically loads data as you navigate the map. When you load the initial view, you receive a few simplified files to draw low resolution thumbnails and a high resolution main map of states. When you click on a state, a high resolution file of counties for that state loads, providing better detail without needing to preload a lot of extra geographic data.
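As a rough illustration of that loading strategy, here is a minimal sketch in plain JavaScript. The URL layout is invented for illustration; the real application used its own naming scheme for its pre-rendered files.

```javascript
// Map the current view to the static GeoJSON file it needs.
// The file layout here is hypothetical.
function geodataUrl(view) {
  if (view.level === 'national') {
    // Heavily simplified national files: small enough for race
    // thumbnails and the nationwide House and county maps.
    return '/geo/simplified/' + view.layer + '.json';
  }
  // Zoomed state view: higher resolution counties or districts,
  // clipped to the parent state and loaded on demand.
  return '/geo/state/' + view.state + '/' + view.layer + '.json';
}

// A tiny cache so each file is requested at most once per session.
var loaded = {};
function needsFetch(url) {
  if (loaded[url]) return false;
  loaded[url] = true;
  return true;
}
```

Because every URL points at a flat, pre-rendered file, a scheme like this works unchanged behind a CDN.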
In the end, we essentially built an API for requesting geographic data at two simplification levels. To make it scale, we did not do the processing on the fly. Requesting geographic data from a live database is a completely unnecessary risk when the geographic data is not going to change. We rendered all of the data to flat GeoJSON text files that we hosted with all of the application’s other static files.

The web application
As with the maps, we achieved high scalability and reliability by offloading as much of the web application as possible to the browser. There was no backend server application behind the web application. A single Backbone.js web application powered all of the embeds, the full screen dashboard, and the smartphone and tablet views.
Viewing the application initially loads only the most basic HTML layout and starts the Backbone router. Depending on the URL hash of the request, the router dynamically loads the rest of the HTML layout. Going to /#embed loads a particular embed view, whereas requesting /#map or /#table loads a main application map or table view. Additional values following the view parameter determine configuration settings like embed size, race, and state. For embeds, the main application with the appropriate URL hash gets embedded in an iframe. For the main application, as you click through, you trigger new routes by updating the URL hash with new parameters, which updates the page without the need for a full refresh.
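The routing idea can be sketched with a small hash parser. This is not the production Backbone router; the segment order and names are assumptions for illustration.

```javascript
// Parse a location hash like "#map/president/OH" or "#embed/small/senate"
// into a view configuration. Parameter order is hypothetical; the real
// application defined its own route patterns on the Backbone router.
function parseRoute(hash) {
  var parts = hash.replace(/^#/, '').split('/').filter(Boolean);
  return {
    view: parts[0] || 'map',   // map, table, or embed
    options: parts.slice(1)    // e.g. embed size, race, state
  };
}
```

The router then hands the parsed options to whichever view is responsible, so the same page can serve a full dashboard or a tiny iframe embed.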
For a complex application like this, we spent a lot of time in partnership with USA TODAY’s staff designing and developing the web application. But the principle is simple - lazily request only the data you actually need for a given view, including the HTML template, geodata, and results data. This allows us to have a large client-side application hosted on a single HTML page with no server application requirement.

Results data API
With the USA TODAY team, we drew up a spec for a simple JSON API to transfer the live election results. They built a process to ingest the Associated Press’s XML data and expose it according to the API’s schema. We predetermined and precalculated all possible requests so they could be cached in a CDN with a very high TTL, leaving little chance that requests would ever need to go back to the data API’s server. Only one API endpoint updated frequently - a simple number indicating the latest version of data available on the server. The web application polled this endpoint every 30 seconds to make sure it had the latest data. If the API response had a higher number, the application knew new data was available and would issue requests against the endpoints it needed to rebuild its current view. This method struck a balance between detecting data changes quickly and keeping requests highly cacheable to protect the API server. Since this would be the only part of the entire application that would require a server-side application, it was crucial to have a reliable caching process.

Version-controlled data
We initially set up a version control process for each of the eight-minute data updates as a reliability failsafe. If anything went wrong in a new version of data, we could always roll back to a previous one. But we soon realized that these versions provided an interesting by-product - we could analyze differences between the versions to produce a feed of updates drawing viewers’ attention to the latest news coming out of the results data.
The updates feed listed three main events - new states reporting results, races called for a candidate, and states that swung to a different party in the presidential election compared to 2008. Simple comparisons between the versions made it possible for us to show the narrative of the night through changes in the data. With viewers spending an average of about ten minutes on the site, we were very excited by this feature.
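A version diff along these lines can be sketched in a few lines of JavaScript. The record shape (a reporting flag, a calledFor party) is invented for illustration; the real feed compared records in the actual API schema.

```javascript
// Compare two versions of per-state results and emit update events:
// a state newly reporting, a race called, and a party swing versus
// the 2008 winner. Field names are hypothetical.
function diffVersions(prev, next, winners2008) {
  var events = [];
  Object.keys(next).forEach(function (state) {
    var was = prev[state] || {};
    var now = next[state];
    if (!was.reporting && now.reporting) {
      events.push({type: 'reporting', state: state});
    }
    if (!was.calledFor && now.calledFor) {
      events.push({type: 'called', state: state, party: now.calledFor});
      if (winners2008[state] && winners2008[state] !== now.calledFor) {
        events.push({type: 'swing', state: state,
                     from: winners2008[state], to: now.calledFor});
      }
    }
  });
  return events;
}
```

Running this over each pair of consecutive eight-minute versions yields a chronological stream of events to display in the feed.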
Bonus: now we can use the data versions to replay the night later and see the results as they unfolded.
Let us know on Twitter if you have any questions or are thinking through your own realtime data mapping project.
http://www.openstreetmap.org/browse/way/89902961/history Note that wambag changed the maxspeed (presumably correctly; the city recently took over maintenance from the state), but the source:maxspeed was not touched (and is thus incorrect now). I haven't gotten a response, but it's very likely that he was using PL2's simple mode, which allows you to choose the maxspeed from a dropdown, but has no mention of source:maxspeed.
Possible solution: put a list of all tags at the bottom (possibly ignoring certain ones like TIGER), with some sort of hint showing the correlation between the simple tags up top and the ones at the bottom. This has the beneficial side effect of easing the learning curve.
I've been contributing to OpenStreetMap for a while now, but I actually put OSM data to the test today.
A family member bought a new vehicle recently which has a built-in manufacturer-supplied GPS with proprietary mapping data on it. We decided to have a drive today to test it out. I brought along my Garmin eTrex 20 loaded up with OpenStreetMap-derived TalkyToaster maps. TalkyToaster's maps are routable and have a large number of points of interest.
We did a 35 mile drive down to Dungeness in Kent. While there, I did some site surveying in the freezing cold and have made a few updates to OSM.
The routing mostly worked, and the data was good. On the outward journey, I checked a lot of the side streets we passed, and all those I checked had the correct name. There was only one problem I found with the data: a tiny little cul-de-sac I spotted that wasn't on the map. I added it as a waypoint on my Garmin and will add it to the map when I next transfer data off.
The routing had only one problem, and that was on the return journey, while driving along Lower High Street, Wadhurst. Rather than continue on to the High Street, the Garmin device said that one should turn into Church Street. Church Street is a tiny single-width street that is only used for access and is tagged highway=residential. You wouldn't drive up it, especially in a larger vehicle like a van; it actually has a warning sign forbidding HGVs from entering. But the Garmin was quite insistent that one should drive up there. The manufacturer-installed GPS correctly said to carry on along the High Street.
I shouldn't be surprised at how good OpenStreetMap data is (I've added plenty of it), but this slight routing mishap aside, certainly here in the south east of England, it is now pretty damn close to good enough that it can be used for in-car navigation. Having seen how much data upgrades are for some car GPS units (as much as £150 in some cases), the future really ought to belong to OpenStreetMap.
We've got open data, we've now got open source operating systems (Linux, Android), cheapish hardware, Kickstarter for funding: someone could build a completely open source satellite navigation system...
As the Night of the living maps was (at least in my eyes) a success, I felt motivated to start another global (but somewhat virtual) mapping party at the end of the year.
Here it is: Operation Cowboy. As you might expect, this time we'll try to help our US/American community by bringing more detail to this huge nation and assisting in improving the imported TIGER data.
But hey, our community is social, so what about organizing a local party for your community?
Guess this will be fun again!
October 22nd, 2012 – November 5th, 2012
A summary of all the things happening in the OpenStreetMap (OSM) world.
Did we miss something? You can contact us via firstname.lastname@example.org
I have decided to get all the 138 English and Scottish football league stadiums on OSM. It will take me months but I'll chip away.
Some local knowledge but mostly Bing tracing.
Done the A's on my list so far.
Adams Park, Wycombe Wanderers (http://osm.org/go/eus8~C2DW--)
Almondvale Stadium, Livingston (http://osm.org/go/evc~7xUzo--)
Amex/Falmer Stadium, Brighton and Hove Albion (http://osm.org/go/euq7FKbpg--)
Anfield, Liverpool (http://osm.org/go/euf9c2Gqv-)
Ashton Gate, Bristol City (http://osm.org/go/eukMSlLBl--)
One of these wasn't on the map, and there is already mass confusion over the names of two of them.
I'm linking to Wikipedia pages, adding operators, websites, addresses, and stands (sometimes, for the moment, as a separate building entity inside the stadium; still thinking of the best way to do this), among other info.
I'll update the OSM wiki stub for stadium at some point to when I have seen the pitfalls/best way to map all of this.
I recently found myself confronted with the sentiment that, as far as OSM or the OSMF are concerned, I had an “anti business” attitude. That’s a funny allegation to make about someone who was among the first people on this planet to run a business based on making OSM data available commercially, or on training commercial entities how to work with OSM.
I’m not anti business. I confess that my output on mailing lists and other forms of OSM project communication may be large, but anyone with a pair of eyes will find that, for example in the license discussion, I have often argued for the business side. For as long as I can remember, I have vehemently fought the idea that “nobody should make money from our work”, an idea that was and still is occasionally voiced by community members, notwithstanding the fact that even the old CC-BY-SA license allowed commercial use.
The one thing that I do regularly say, and where this “anti business” idea might come from, is this: OSM is not a business (and neither is the OSMF). We are a movement, or a mass membership organisation. In my eyes, the main difference between us and Google Map Maker is not that they have a proprietary license and ours is open. The main difference lies one level deeper: they are ultimately driven by the stock market and we’re not.
Making this distinction is not anti-business; it is just about saying things as they are. Organisations driven by the stock market have other kinds of goals, are optimizing for different time frames, have other forms of management, a different type of competition, a different constituency, a completely different set of rules and values. There’s also more at stake – if Google goes bankrupt, lots of people lose their jobs, but if OSM breaks down then a different group of people will just carry on where we left off.
I’m all in favour of working with businesses who can help us make OpenStreetMap better known or more widely used, or give us access to data or help us write code. Any such cooperation will only profit from getting the basic facts right: you are a business, we are not; your goal is to make money, our goal is to make a map – and now let’s see how we can do something together that helps us both! It doesn’t help anyone if OSM tries to act like a business. Dealing with OSM will always be totally different from dealing with a commercial map data provider. Our best way to be business friendly is to explain to businesses how we work – to make them understand, and ultimately embrace, the ways in which OSM is special.
Our maps are critical to our users and to the wider public. They were especially crucial for predicting and tracking Sandy’s progression, communicating evacuation plans, and tracking surges.
MapBox powers maps on several storm-related services, including Weather Decision Technologies maps for hundreds of subscribers, USA Today’s main weather map, NYC Government’s evacuation map, and WNYC’s storm surge map. Sunday traffic was up 50% and spiked up 100% when Sandy hit land on Monday. During the storm, our traffic was 450% higher compared to the same time period last month.

In Sandy’s path
Predictions put Sandy on a direct course toward our Virginia data center. Our preparations focused on ensuring full availability knowing that one of our data centers was about to get clobbered. We assumed the worst in terms of physical data center impact - that the power would go out for days, generators would fail, physical equipment would be damaged, and the data center would be shut off and temporarily abandoned. But regardless of what was going to happen, our maps needed to keep working and handle the incredible traffic increase during and after the storm.

Our plan
MapBox runs hot in two data centers, one in Virginia and the other in Ireland, and our Dyn DNS fails traffic over should one go down. We assumed Virginia would go down and were not comfortable running only in Ireland for what could be several days. It was unclear what network traffic would look like with the only data center in Ireland and a badly damaged East coast. Based on the assumption that Virginia would fail, we drew up a plan on Friday.
When Sandy made landfall, MapBox was running in three data centers, handling 450% of normal traffic, and all traffic going to Virginia was ready to fail over to the remaining data centers.

Highly available architecture
Each instance of MapBox within a data center is also designed to be highly available based on the recommendations of AWS. Within each data center, a full MapBox instance runs in at least two availability zones, which are basically like separate warehouses of servers, each with separate private networks, power, and internet uplinks. In other words, there are at least two instances of MapBox running in each data center where MapBox is deployed. The use of multiple availability zones does not guarantee an infrastructure to be 100% fail proof. Even though each availability zone has a separate power source, network, and physical location, past issues indicate how one availability zone can affect another based on how the private network across availability zones is used.
Issues like these, and like the power outage this summer that could easily have knocked out multiple power supplies, are examples of why MapBox runs in multiple AWS regions. However, there are sometimes smaller outages, like last week’s event, where running in multiple availability zones can prevent a regional outage of your service. Our approach is to design knowing that core parts of our system will fail at some point. This level of persistent paranoia helps us avoid failure at as many levels as possible, and it is how we are prepared for AWS single availability zone and region-wide failures.

Track our status
Whether an event is somewhat predictable like Sandy, or unpredictable like the earthquake that affected the East coast last summer, we will be open with how MapBox is designed to cope with such unfortunate occurrences and aim to serve your maps, and to serve them as fast as possible, no matter the circumstances. Find MapBox’s current status on the MapBox status page.
The "Surging Seas" map blends OpenStreetMap and aerial imagery to interactively simulate sea level rise due to climate change. Here we see New York after a 10ft rise, the maximum setting. The storm surge of Hurricane Sandy brought a rise of up to 13ft.
The article introduces the LCC journey planner that we created for them, and talks about how it uses OpenStreetMap, a project that cyclists can contribute to.
The article also includes a box about the England Cycling Data project.
Thanks to LCC for this great publicity for OSM, and thanks to Shaun, Andy and Harry who had a look over the drafts for us!
Since I was at it, I again checked the “Average tracks” approach mentioned yesterday.
In the valley of Trenta there is a nice highway with a lot of GPS logs uploaded, but it isn't drawn really nicely so far. I downloaded all the GPS data using JOSM, converted it to a data layer, and cleaned it so that only good traces remained. Saving the data as GPX again, the resulting file contained nearly fifty track segments. I split that file into single GPX files, each containing one of the segments, and converted these into CSV format, since the script wants this format.
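The splitting and conversion step can be sketched like this. It's a regex-based JavaScript sketch (not whatever tool was actually used), and it assumes well-formed GPX with the lat attribute written before lon; it is not a general XML parser.

```javascript
// Split a GPX document into its track segments and convert each one
// to "lat,lon" CSV lines. Assumes well-formed GPX with lat before
// lon in each trkpt element.
function gpxToCsvSegments(gpx) {
  var segments = gpx.match(/<trkseg>[\s\S]*?<\/trkseg>/g) || [];
  return segments.map(function (seg) {
    var rows = [];
    var point = /<trkpt[^>]*lat="([^"]+)"[^>]*lon="([^"]+)"/g;
    var m;
    while ((m = point.exec(seg)) !== null) {
      rows.push(m[1] + ',' + m[2]);
    }
    return rows.join('\n');
  });
}
```

Each element of the returned array can then be written out as its own CSV file, one per track segment.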
I cancelled a first run on all segments after a short time, remembering the runtime of osm-makeroads.
I randomly selected some segments, but since some covered only a part of that highway, the result was unusable.
So I selected nine segments which all covered the same 19-kilometre stretch of the road. At just a few seconds, the duration of the calculation was satisfyingly short. The result itself is not to my liking: though it is in most parts quite usable, I will draw a better way by hand.
Conclusion: I most likely won't use this script in the future. There is not much advantage in checking and fixing a calculated way over its complete length before uploading, compared with simply drawing a few kilometres of way by hand.
(Sorry for the big screenshots, but all for the best visualisation :) The yellow line is the averaged track.)
"faulty" length: about 500m
"faulty" length: about 1200m
We usually talk about data on this blog, but OpenStreetMap wouldn't happen without code, too. Running the world's biggest user-editable map (and, increasingly, one of the world's biggest maps, full stop) requires thousands upon thousands of lines of code... from the low-level stuff that keeps the servers flying along, via the API and editing software that enables you to contribute, to the programs and stylesheets that turn all this raw data into pretty maps.
Much of this work happens in isolation, co-ordinated by IRC conversations or mailing lists. But we also have 'hack weekends', where developers - experienced and newcomers alike - come together to share knowledge and bounce ideas off each other.
Last weekend saw a major hack weekend in Toronto, attracting developers from the US, Britain, and the Netherlands as well as Canada. This weekend saw a rather smaller gathering in Charlbury, a tiny town in the Cotswolds, England, which coincidentally is home to the lead developers of both Potlatch (our online editing software) and Mapnik (the 'renderer' which turns OSM data into map images).
The focus for this weekend was Potlatch, with a vast list of improvements undertaken over the two days. The theme was "little things that mean a lot", so when the new version goes live soon, you'll notice quicker loading, neater appearance, more reliable operation, and so on. (Those of a technical bent can see the long list of code changes.)