Quality control on french OSM administrative boundaries

Tracing France administrative boundaries is a tedious work that took several yers as France has 36000+ municipalities which is around 40% of all Europe municipalities !
 
In 2011, then in 2013, IGN (french geographic national authority) published 2 datasets in opendata, GEOFLA and Route500, but these data are simplified with a much lower level of details compared to OSM goals.
 
So, the only authorized and good enough source for that job is the french cadastre.
 
As we're getting close to 100% completeness of french administrative boundaries, it is time to study their quality. Without freely available reference data, an exhaustive study is not possible, but let's try with what we have.
 
GEOFLA and Route500 contain simplified administrative boundaries, but it seems that the crossing nodes have been more or less maintained very close to their original location.
 
So I've tried to do some comparison on these crossing nodes between OSM and Route500 (less simplified that GEOFLA).
 
(if PostGIS SQL queries are not your favorite reading, you can skip to the end of this post)
 
Import Route500 in PostGIS...
 
export SHAPE_ENCODING="ISO-8859-1"
ogr2ogr -t_srs EPSG:900913 -f PostgreSQL PG:dbname=osm LIMITE_ADMINISTRATIVE.SHP -overwrite -nlt GEOMETRY -lco SCHEMA=public
ogr2ogr -t_srs EPSG:900913 -f PostgreSQL PG:dbname=osm NOEUD_COMMUNE.SHP -overwrite -nlt GEOMETRY  -lco SCHEMA=public
ogr2ogr -t_srs EPSG:900913 -f PostgreSQL PG:dbname=osm COMMUNE.SHP -overwrite -nlt GEOMETRY  -lco SCHEMA=public
 
Geometries a reprojected in mercator as for the OSM data we will use an osm2pgsql schema usually used to generate tiles.
 
We then to some cleaning and add some usefull indexes...
 
ALTER TABLE NOEUD_COMMUNE RENAME TO r500_communes_noeuds;
ALTER TABLE r500_communes_noeuds drop ogc_fid;
ALTER TABLE LIMITE_ADMINISTRATIVE RENAME TO r500_limite_admin;
ALTER TABLE r500_limite_admin drop ogc_fid;
ALTER TABLE LIMITE_ADMINISTRATIVE RENAME TO r500_communes;
ALTER TABLE r500_communes drop ogc_fid;
CREATE TABLE r500_noeuds (pt geometry, nb integer, insee text);
CREATE INDEX on r500_noeuds  using gist(pt);
CREATE INDEX on r500_noeuds (insee);
 
Then we extract the crossing nodes...
INSERT INTO r500_noeuds select ST_Line_Interpolate_Point(l.wkb_geometry,0) as pt, null, null from r500_limite_admin l;
INSERT INTO r500_noeuds select ST_Line_Interpolate_Point(l.wkb_geometry,1) as pt, null, null from r500_limite_admin l;
 
and determine which municipalities are sharing them...
INSERT INTO r500_noeuds select pt, count(distinct(insee_comm)), string_agg(distinct(insee_comm),' ' order by insee_comm) as insee from r500_noeuds join r500_communes c on (st_touches(pt,wkb_geometry)) join r500_communes_noeuds n on (n.id_rte500=c.id_rte500) group by pt;
 
and a final cleanup:
DELETE FROM r500_noeuds where nb is null;
 
The resulting table contains 72598 nodes (Route500 2012) with:
  • crossing node
  • number of municipalities sharing that node: 1 for islands, enclaves and exclaves, 2 for municipalities on the seaside or at international border, 3 or more in most cases
  • INSEE unique codes identifying the municipalities sharing the node (in ascending order)
 
OSM crossing nodes
 
Same structure for the table:
 
CREATE TABLE osm_noeuds (pt geometry, nb integer, insee text);
CREATE INDEX on osm_noeuds  using gist(pt);
CREATE INDEX on osm_noeuds (insee);
 
To speedup processing, a temporary table only containing municipalities polygons is created:
 
CREATE TEMP TABLE osm_communes as (select "ref:INSEE" as insee, way FROM planet_osm_polygon WHERE boundary is not null and boundary='administrative' and admin_level='8' and "ref:INSEE" is not null);
CREATE INDEX on osm_communes using gist(way);
 
Then crossing nodes are extracted (with this poorly optimized query)...
 
INSERT INTO osm_noeuds select pt, null, null from (select  st_line_interpolate_point((st_dump(st_linemerge(st_intersection(c.way,l.way)))).geom,0) as pt  from osm_communes l join osm_communes c on (st_touches(c.way,l.way) and c.insee > l.insee)) as lim group by pt;
INSERT INTO osm_noeuds select pt, null, null from (select st_line_interpolate_point((st_dump(st_linemerge(st_intersection(c.way,l.way)))).geom,1) as pt  from osm_communes l join osm_communes c on (st_touches(c.way,l.way) and c.insee > l.insee)) as lim group by pt;
 
Remove the doubles...
 
INSERT INTO osm_noeuds select pt, 0,null from osm_noeuds group by pt;
DELETE FROM osm_noeuds where nb is null;
 
and determine which municipalities are sharing them:
 
INSERT INTO osm_noeuds select pt, count(*), string_agg(c.insee,' ' order by c.insee) as insee from osm_noeuds join osm_communes c on (st_touches(pt,way)) group by pt;
DELETE FROM osm_noeuds where nb=0;
 
Tables r500_noeuds and osm_noeuds are similar, we can start comparing their content... and get the 100 largest differences :
 
SELECT min(st_distance(st_transform(r.pt,2154),st_transform(o.pt,2154))) as d , r.insee from r500_noeuds r join osm_noeuds o on (r.insee=o.insee) group by r.insee order by d desc limit 100;
 
Résults
 
A CSV file containing the above query result is updated every night at: http://osm13.openstreetmap.fr/~cquest/openfla/ecarts-osm-route500.csv
 
It contains :
  • the distance between OSM and Route500 crossing nodes, which median value is 22m (half of differences are below 22m)
  • INSEE codes of the municipalities (space separated)
These raw data are available for further analysis, and also to check/fix OSM data where needed.
 
Here is a data visualisation of these differences, provided thru a new overlay on tile.openstreetmap.fr  
 
 
More reading (in french) about this overlay is available on OSM wiki : http://wiki.openstreetmap.org/wiki/FR:Servers/tile.openstreetmap.fr#Couc...