Experience with harvesting metadata to an Open Data Portal
Jiří Poláček, Czech Office for Surveying, Mapping and Cadastre (ČÚZK)
Open Spatial Data in ČÚZK
Dataset series | Data format | Number of datasets
Cadastral Parcels | INSPIRE harmonized (GML 3.2.1) | 13091
Addresses | | 6259
Buildings | |
Administrative Units | | 1
Geographical grid systems | | 2
Cadastral map | VFK (national standard) | 13185
 | VKM (national standard) | 632
 | SHP |
 | DGN |
RÚIAN – complete dataset | VFR (national standard) | 12520
RÚIAN – change records | | 31
RÚIAN – selected records (addresses) | CSV | 12652
RÚIAN/ISÚI – complete dataset with history | | 6266
RÚIAN – special records (electoral district) | |
Approved survey sketches | |
Total | | 116526
Open Spatial Data in ČÚZK
Number of datasets by data format: INSPIRE harmonized 25512; national standard 45821; proprietary format 45093
Terminological remark
Dataset series: collection of spatial data sets sharing the same product specification
Dataset: identifiable collection of spatial data
"Datasets" (as counted in this presentation): sections of a dataset
Terminological remark: Cadastral Map
Dataset series → dataset → "datasets" (sections of a dataset): 13091 files (1 for each cadastral unit)
Branch Geoportal (ČÚZK) → CSW → National INSPIRE Geoportal (CENIA) → CSW → European INSPIRE Geoportal (JRC)
National Open Data Portal (MVČR): CKAN
Missing information on datasets (file addresses)
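The geoportals in this chain exchange metadata over CSW. Below is a minimal sketch of how a harvester might page through a CSW 2.0.2 catalogue with KVP `GetRecords` requests. The endpoint URL and the page size of 100 are hypothetical assumptions. The request parameters are standard CSW 2.0.2 parameters, but the exact set a given catalogue accepts should be checked against that catalogue.

```python
from urllib.parse import urlencode

# Hypothetical CSW endpoint; not the address of any real geoportal.
CSW_ENDPOINT = "https://geoportal.example.org/csw"


def getrecords_urls(total_records: int, page_size: int = 100) -> list[str]:
    """Build CSW 2.0.2 GetRecords (KVP) URLs that page through all records.

    CSW startPosition is 1-based; each request asks for at most page_size records.
    """
    urls = []
    for start in range(1, total_records + 1, page_size):
        params = {
            "service": "CSW",
            "version": "2.0.2",
            "request": "GetRecords",
            "typeNames": "csw:Record",
            "resultType": "results",
            "elementSetName": "full",
            # Ask for ISO 19139 records instead of Dublin Core.
            "outputSchema": "http://www.isotc211.org/2005/gmd",
            "startPosition": start,
            "maxRecords": page_size,
        }
        urls.append(f"{CSW_ENDPOINT}?{urlencode(params)}")
    return urls
```

For 44399 records this produces 444 requests. Note that each record returned this way describes a dataset, so the file addresses of its sections still have to come from somewhere else.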
Branch Geoportal (ČÚZK) → CSW → National INSPIRE Geoportal (CENIA) → CSW → European INSPIRE Geoportal (JRC)
National Open Data Portal (MVČR): CKAN
Too complex task, lack of funding
Publication Database (ČÚZK) → CKAN API → National Open Data Portal (MVČR), CKAN
Branch Geoportal (ČÚZK) → CSW → National INSPIRE Geoportal (CENIA) → CSW → European INSPIRE Geoportal (JRC)
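The Publication Database pushes metadata into the portal through the CKAN API. Below is a minimal sketch of preparing a `package_create` call. The `name`, `title`, `license_id` and `resources` fields and the `/api/3/action/package_create` path belong to the CKAN action API. The portal URL, API key, identifiers and file URLs are hypothetical placeholders. The request is only built here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical portal URL and API key; replace with real values.
CKAN_URL = "https://opendata.example.org"
API_KEY = "hypothetical-api-key"


def build_package(package_name: str, title: str, licence_id: str,
                  file_urls: list[str]) -> dict:
    """Describe one dataset for CKAN, with one resource per file (section)."""
    return {
        "name": package_name,  # CKAN names: lowercase letters, digits, '-' and '_'
        "title": title,
        "license_id": licence_id,
        "resources": [
            {"url": url, "name": url.rsplit("/", 1)[-1]} for url in file_urls
        ],
    }


def package_create_request(package: dict) -> Request:
    """Prepare (but do not send) a POST to the CKAN action API."""
    return Request(
        f"{CKAN_URL}/api/3/action/package_create",
        data=json.dumps(package).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": API_KEY},
        method="POST",
    )
```

The sketch makes each file a resource of one package. Publishing each file as its own package is an equally possible modelling choice, and it changes how many packages the portal has to hold.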
Specification of open data in metadata: a URI linking to the detailed description of the licence
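A licence URI in ISO 19139 metadata is commonly carried as a `gmx:Anchor` inside `gmd:otherConstraints`. This placement is an assumption about how the records are encoded. Below is a minimal sketch of extracting such URIs with the standard library. The XML fragment and the licence URL are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmx": "http://www.isotc211.org/2005/gmx",
}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical ISO 19139 fragment; the licence URL is a placeholder.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gmx="http://www.isotc211.org/2005/gmx"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:resourceConstraints><gmd:MD_LegalConstraints>
      <gmd:otherConstraints>
        <gmx:Anchor xlink:href="https://example.org/licence/open-data">Open data licence</gmx:Anchor>
      </gmd:otherConstraints>
    </gmd:MD_LegalConstraints></gmd:resourceConstraints>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""


def licence_uris(xml_text: str) -> list[str]:
    """Return the xlink:href of every gmx:Anchor inside gmd:otherConstraints."""
    root = ET.fromstring(xml_text)
    return [a.get(XLINK_HREF)
            for a in root.iterfind(".//gmd:otherConstraints/gmx:Anchor", NS)]
```

A harvester could then treat a record as open data when one of these URIs matches an agreed list of open licences. Such a rule would only work once the specification is standardized.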
Implementation obstacles
Publication Database (ČÚZK) → CKAN API → National Open Data Portal (MVČR), CKAN
Other components: ATOM, map server, Branch Geoportal (ČÚZK), Viewing Cadastre Application, INSPIRE network services
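The ATOM component listed above is assumed here to be an INSPIRE ATOM download service. If so, the file addresses missing from the CSW metadata could in principle be read from its feed entries. Below is a minimal sketch using the Atom namespace and the RFC 4287 rule that a link without `rel` counts as `alternate`. The feed content, entry titles and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical dataset feed; real INSPIRE ATOM feeds carry more elements.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Cadastral Parcels</title>
  <entry>
    <title>Cadastral unit 600001</title>
    <link rel="alternate" type="application/zip" href="https://files.example.org/cp/600001.zip"/>
  </entry>
  <entry>
    <title>Cadastral unit 600010</title>
    <link href="https://files.example.org/cp/600010.zip"/>
  </entry>
</feed>"""


def file_addresses(feed_xml: str) -> dict[str, str]:
    """Map each entry title to the href of its alternate link."""
    root = ET.fromstring(feed_xml)
    result = {}
    for entry in root.iter(f"{ATOM}entry"):
        title = entry.findtext(f"{ATOM}title")
        for link in entry.iter(f"{ATOM}link"):
            # RFC 4287: a link without rel is treated as rel="alternate".
            if link.get("rel", "alternate") == "alternate":
                result[title] = link.get("href")
    return result
```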
Implementation obstacles: insufficient hardware
Implementation obstacles: database overload
Implementation obstacles: metadata file cache
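The idea behind a metadata file cache is to stop regenerating metadata from the database on every harvest request. Below is a minimal sketch of such a cache. The class name, the version-in-file-name scheme and the `render` callback are hypothetical, and the callback stands in for whatever database query produces a record. A file is regenerated only when the dataset's version changes.

```python
from pathlib import Path
from typing import Callable


class MetadataFileCache:
    """Serve pre-rendered metadata files instead of querying the database each time.

    Cached files are named <dataset_id>.<version>.xml; a new version triggers
    one re-render, and older files for that dataset are removed.
    """

    def __init__(self, directory: Path, render: Callable[[str], str]):
        self.directory = directory
        self.render = render  # hypothetical: builds metadata XML from the database

    def get(self, dataset_id: str, version: str) -> str:
        path = self.directory / f"{dataset_id}.{version}.xml"
        if path.exists():
            return path.read_text(encoding="utf-8")
        # Drop outdated versions of this dataset before writing the new one.
        for old in self.directory.glob(f"{dataset_id}.*.xml"):
            old.unlink()
        text = self.render(dataset_id)
        path.write_text(text, encoding="utf-8")
        return text
```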
Open Spatial Data in ČÚZK
Update frequencies of the dataset series: next day after any change, once a month, once a week, once a day
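In ISO 19115/19139 metadata, update frequency is normally recorded with the `MD_MaintenanceFrequencyCode` codelist. A harvester that reads it could in principle decide how often a series needs re-harvesting. Below is a minimal sketch of encoding the frequencies listed above. The mapping is an assumption: "next day after any change" has no exact code, so `asNeeded` is used for it.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
CODELIST = ("http://standards.iso.org/iso/19139/resources/gmxCodelists.xml"
            "#MD_MaintenanceFrequencyCode")

# Assumed mapping from the frequencies above to MD_MaintenanceFrequencyCode values.
FREQUENCY_CODE = {
    "Next day after any change": "asNeeded",
    "Once a month": "monthly",
    "Once a week": "weekly",
    "Once a day": "daily",
}


def frequency_element(label: str) -> ET.Element:
    """Build a gmd:MD_MaintenanceFrequencyCode element for a frequency label."""
    code = FREQUENCY_CODE[label]
    element = ET.Element(f"{{{GMD}}}MD_MaintenanceFrequencyCode",
                         codeList=CODELIST, codeListValue=code)
    element.text = code
    return element
```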
Conclusion
All the metadata has been harvested; it took about 5 hours.
Current status: metadata of 44399 datasets are harvested to the National Open Data Portal once a week.
Operational run: incremental harvesting for CKAN (and some optimization) will be necessary in the future.
Future solution: teach CKAN to use standard CSW and implement incremental harvesting; standardize the specification of open data in metadata; clear up the terminology.
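Incremental harvesting would avoid re-sending all 44399 records every week. Instead, the harvester would send only the records that changed since the previous run. Below is a minimal sketch of the planning step. It assumes that each record carries a modification timestamp, for example the ISO 19139 `gmd:dateStamp`, and that the ids and stamps from the last run are stored. The function name is hypothetical.

```python
from datetime import datetime


def plan_incremental_harvest(previous: dict[str, datetime],
                             current: dict[str, datetime]):
    """Compare record modification stamps between two harvests.

    previous: record id -> stamp seen at the last harvest
    current:  record id -> stamp offered by the source now
    Returns sorted lists of ids to create, update and delete on the CKAN side.
    """
    to_create = sorted(current.keys() - previous.keys())
    to_delete = sorted(previous.keys() - current.keys())
    to_update = sorted(
        rid for rid in current.keys() & previous.keys()
        if current[rid] > previous[rid]
    )
    return to_create, to_update, to_delete
```

Records whose stamp has not changed are skipped entirely. This is what would shorten the weekly run compared with a full harvest.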
Thank you for your attention
jiri.polacek@cuzk.cz
ELF workshop, 19.3.2014