Reportnet standards and next steps
Søren Roug, Information and Data Services (IDS)
Use of Standards
- Historically, Reportnet has been targeted at the web browser, using the architectural style called REST
- You can upload any file to CDR
- Communication between sites is done with XML-RPC
- Transfer of metadata uses RDF (Semantic Web)
- Reportnet does not use Service Oriented Architecture (SOA):
  - SOAP
  - INSPIRE
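XML-RPC encodes a remote procedure call as a small XML document sent by HTTP POST. As a rough illustration of what passes between two sites, this Python sketch uses the standard library's `xmlrpc.client.dumps` and `xmlrpc.client.loads` to build and decode a call. The method name `getDatasets` and its country-code argument are hypothetical and are not taken from any actual Reportnet interface.

```python
import xmlrpc.client

def build_request(method: str, *params) -> str:
    """Serialise an XML-RPC method call to the XML body sent over HTTP POST."""
    return xmlrpc.client.dumps(params, methodname=method)

# Hypothetical call: ask a remote site for the datasets of one country.
body = build_request("getDatasets", "AT")
print(body)

# The receiving site decodes the same body back into (params, method name).
params, method = xmlrpc.client.loads(body)
print(method, params)
```

Against a live site, `xmlrpc.client.ServerProxy(url)` sends the same kind of body; the endpoint URL would be whatever the remote site publishes.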
Introduction of XML (a standard for file formats)
- In 2004 Reportnet started to give preferential treatment to XML
- One single requirement: the XML file must have a schema identifier
- From this we can:
  - Run QA scripts using the XQuery language
  - Convert to other formats using XSLT
  - Edit the XML content in web forms using XForms
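The schema identifier is what lets the QA, conversion and XForms services decide which scripts apply to a file. A minimal sketch of reading it with Python's `xml.etree.ElementTree`, assuming the identifier is declared on the root element through the standard W3C attributes `xsi:noNamespaceSchemaLocation` or `xsi:schemaLocation`. The schema URL in the example is a placeholder, not a real Reportnet schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def schema_identifier(xml_text: str) -> Optional[str]:
    """Return the schema location declared on the root element, or None."""
    root = ET.fromstring(xml_text)
    # Documents without a target namespace use xsi:noNamespaceSchemaLocation.
    loc = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
    if loc:
        return loc.strip()
    # Namespaced documents use xsi:schemaLocation, a list of "namespace URL" pairs;
    # this sketch returns the URL of the first pair.
    pairs = root.get(f"{{{XSI}}}schemaLocation")
    if pairs:
        tokens = pairs.split()
        if len(tokens) >= 2:
            return tokens[1]
    return None

doc = """<stations xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="http://example.org/schemas/stations.xsd">
  <station/>
</stations>"""
print(schema_identifier(doc))  # http://example.org/schemas/stations.xsd
```

In this design the returned string would serve as a lookup key for the XQuery scripts, XSLT conversions and XForms registered for that schema; how Reportnet stores that mapping is not shown here.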
2008 focus
- Integration of national repositories into Reportnet
- Guidelines on how to implement a Reportnet/SEIS node
- Use of the QA service from a national node
- Use of the conversion service from a national node
- Registration of datasets
  - Via a manifest file
  - Via manual registration on the website
2009 focus (next steps)
- How to register the datasets
- How to search for the datasets
- How to track updates to the datasets
- How to bookmark found datasets
- How to merge datasets
- How to trust the dataset
- How to trust the trust
Registering a SEIS dataset
- Datasets are discovered via manifest files and manual registration
Adding metadata
Bookmarking and searching the dataset
Working with files vs. records
- Now we know where the files are in the SEIS universe
- But we can do more: we can read the content of the XML files
- Example of an XML snippet:
  <stations xmlns:xsi= xsi:noNamespaceSchemaLocation=" St. Pölten Industrial urban...
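Reading the XML content turns a file into records. A minimal sketch with Python's `xml.etree.ElementTree`, assuming a hypothetical layout in which each `<station>` element has `<local_code>` and `<name>` children; the element names in the actual station schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; not the real Reportnet station schema.
SAMPLE = """<stations>
  <station>
    <local_code>32301</local_code>
    <name>St. Pölten</name>
  </station>
  <station>
    <local_code>32302</local_code>
    <name>Linz</name>
  </station>
</stations>"""

def read_records(xml_text: str) -> list[dict[str, str]]:
    """Turn each <station> element into a dict mapping child tag to text."""
    root = ET.fromstring(xml_text)
    records = []
    for station in root.iter("station"):
        records.append({child.tag: (child.text or "").strip() for child in station})
    return records

for rec in read_records(SAMPLE):
    print(rec)
```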
Merging principles
Station structure as a table (austria.xml):

  Identifier  local_code  name        ...
  #32301      32301       St. Pölten  ...
  #32302      32302       Linz        ...

Quadruple structure:

  Subject  Predicate   Object         Source
  #32301   type        River Station  austria.xml
  #32301   local_code  32301          austria.xml
  #32301   name        St. Pölten     austria.xml
  #32302   type        River Station  austria.xml
  #32302   local_code  32302          austria.xml
  #32302   name        Linz           austria.xml
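The step from the table to the quadruple structure can be sketched as follows: each cell becomes one (subject, predicate, object, source) row, plus a `type` row per subject. This Python version stands in for the XSL transformation and assumes, as in austria.xml, that the subject identifier is `#` followed by the local code.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str

def to_quads(records, source, rdf_type="River Station"):
    """Flatten table rows into quadruples that remember where each value came from.

    Assumes the subject identifier is '#' + local_code.
    """
    quads = []
    for rec in records:
        subject = "#" + rec["local_code"]
        quads.append(Quad(subject, "type", rdf_type, source))
        for column, value in rec.items():
            quads.append(Quad(subject, column, value, source))
    return quads

rows = [
    {"local_code": "32301", "name": "St. Pölten"},
    {"local_code": "32302", "name": "Linz"},
]
quads = to_quads(rows, "austria.xml")
for q in quads:
    print(*q, sep="\t")
```

Because the source travels with every value, rows from different files can later be mixed in one store without losing traceability.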
Merging the datasets
Austria Stations.xml, Belgium Stations.xml, Germany Stations.xml → XSL Transformation to quadruples → Aggregation Database

  Subject  Predicate  Object      Source
  #32301   name       St. Pölten  Au..xml
  #30299   name       Gent        Be..xml
  #42882   name       Köln        Ge..xml
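A minimal sketch of the aggregation database as a single quadruple table, using Python's built-in `sqlite3`. SQLite is a stand-in chosen for illustration, not necessarily what Reportnet uses, and the quadruples are loaded directly rather than produced by an XSL transformation.

```python
import sqlite3

def open_store() -> sqlite3.Connection:
    """Create an in-memory aggregation database with one quadruple table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE quads (subject TEXT, predicate TEXT, object TEXT, source TEXT)"
    )
    return con

def load(con: sqlite3.Connection, quads) -> None:
    """Append quadruples; nothing is overwritten, so every source is kept."""
    con.executemany("INSERT INTO quads VALUES (?, ?, ?, ?)", quads)

con = open_store()
load(con, [("#32301", "name", "St. Pölten", "Austria Stations.xml")])
load(con, [("#30299", "name", "Gent", "Belgium Stations.xml")])
load(con, [("#42882", "name", "Köln", "Germany Stations.xml")])

for row in con.execute("SELECT * FROM quads ORDER BY source"):
    print(row)
```

Keeping the store append-only is what allows a later delivery from the same country to sit next to the original instead of replacing it.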
Merging the datasets (with later updates)
Austria Stations.xml, Austria update1.xml → XSL Transformation → Aggregation Database

  Subject  Predicate  Object      Source
  #32301   name       St. Pölten  Au..xml
  #32301   date                   Au..xml
  #32301   name       Spratzern   Au..update1.xml
  #32301   date                   Au..update1.xml
Searching
- To find all river stations in Europe, you search for subjects with type="River Station"
- The query will format the result as a table for you
- Obviously you get duplicates, because #32301 has been updated

  Identifier  Local_code  Name        Date  Longitude
  #32301                  St. Pölten
  #32301                  Spratzern
  #30299                  Gent
  #42882                  Köln
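The search can be sketched as two steps: find the subjects whose `type` is "River Station", then pivot all quadruples for those subjects into rows. This Python sketch groups by (subject, source), which is one plausible reading of why the updated station appears twice; the sample quadruples are illustrative.

```python
from collections import defaultdict

# Illustrative quadruples as they might sit in the aggregation database.
QUADS = [
    ("#32301", "type", "River Station", "Austria Stations.xml"),
    ("#32301", "name", "St. Pölten", "Austria Stations.xml"),
    ("#32301", "name", "Spratzern", "Austria update1.xml"),
    ("#30299", "type", "River Station", "Belgium Stations.xml"),
    ("#30299", "name", "Gent", "Belgium Stations.xml"),
    ("#42882", "type", "River Station", "Germany Stations.xml"),
    ("#42882", "name", "Köln", "Germany Stations.xml"),
]

def search(quads, type_value):
    """Return one table row per (subject, source) for subjects of the given type."""
    subjects = {s for s, p, o, _ in quads if p == "type" and o == type_value}
    rows = defaultdict(dict)  # (subject, source) -> {predicate: object}
    for s, p, o, src in quads:
        if s in subjects and p != "type":
            rows[(s, src)][p] = o
    return [(s, src, cols) for (s, src), cols in rows.items()]

for subject, source, cols in search(QUADS, "River Station"):
    print(subject, cols.get("name"), source)
```

Note that the update file never states the type itself; its row is found because another source already typed #32301 as a river station.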
QA work
Let's first colour the cells by their source:

  Identifier  Local_code  Name        Date  Longitude
  #32301                  St. Pölten
  #32301                  Spratzern
  #30299                  Gent
  #42882                  Köln
QA work
Then we merge by letting the newer sources overwrite the older ones:

  Identifier  Local_code  Name        Date  Longitude
  #32301                  Spratzern
  #30299                  Gent
  #42882                  Köln
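A minimal Python sketch of "newer overwrites older": apply the quadruples source by source, oldest first, so a later value for the same subject and predicate replaces an earlier one. The source order below is an assumption standing in for whatever delivery dates the real system would use.

```python
QUADS = [
    ("#32301", "name", "St. Pölten", "Austria Stations.xml"),
    ("#32301", "name", "Spratzern", "Austria update1.xml"),
    ("#30299", "name", "Gent", "Belgium Stations.xml"),
    ("#42882", "name", "Köln", "Germany Stations.xml"),
]

# Oldest first; assumed to reflect each source's delivery date.
SOURCE_ORDER = [
    "Austria Stations.xml",
    "Belgium Stations.xml",
    "Germany Stations.xml",
    "Austria update1.xml",
]

def merge(quads, source_order):
    """One row per subject; a newer source overwrites values from older ones."""
    rank = {src: i for i, src in enumerate(source_order)}
    table = {}
    # sorted() is stable, so quads from the same source keep their order.
    for s, p, o, _ in sorted(quads, key=lambda q: rank[q[3]]):
        table.setdefault(s, {})[p] = o
    return table

for subject, cols in merge(QUADS, SOURCE_ORDER).items():
    print(subject, cols["name"])
```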
QA work
Don't trust one source? Turn it off before you merge:

  Identifier  Local_code  Name        Date  Longitude
  #32301                  St. Pölten
  #32301                  Spratzern
  #30299                  Gent
  #42882                  Köln
QA work
Then we merge:

  Identifier  Local_code  Name        Date  Longitude
  #32301                  St. Pölten
  #30299                  Gent
  #42882                  Köln
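Turning a source off can be sketched as filtering its quadruples out before the usual oldest-to-newest merge. In this Python sketch the `disabled` parameter and the source order are illustrative assumptions, not part of any documented QAW interface.

```python
QUADS = [
    ("#32301", "name", "St. Pölten", "Austria Stations.xml"),
    ("#32301", "name", "Spratzern", "Austria update1.xml"),
    ("#30299", "name", "Gent", "Belgium Stations.xml"),
    ("#42882", "name", "Köln", "Germany Stations.xml"),
]

# Oldest first (assumed).
SOURCE_ORDER = [
    "Austria Stations.xml",
    "Belgium Stations.xml",
    "Germany Stations.xml",
    "Austria update1.xml",
]

def merge(quads, source_order, disabled=frozenset()):
    """Merge into one row per subject, skipping every source that is switched off.

    Remaining quads are applied oldest source first, so newer values win.
    """
    rank = {src: i for i, src in enumerate(source_order)}
    kept = [q for q in quads if q[3] not in disabled]
    table = {}
    for s, p, o, _ in sorted(kept, key=lambda q: rank[q[3]]):
        table.setdefault(s, {})[p] = o
    return table

# Turn off the update we do not trust, then merge.
merged = merge(QUADS, SOURCE_ORDER, disabled={"Austria update1.xml"})
for subject, cols in merged.items():
    print(subject, cols["name"])
```

Since the quadruples themselves are untouched, switching the source back on and merging again restores the newer value.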
QA work
Gap-filling? Add your own source as a layer. The layer is stored on QAW.

  Identifier  Local_code  Name        Date  Longitude
  #32301                  St. Pölten
  #32301                  Spratzern
  #30299                  Gent
  #42882                  Köln

Hermann's gapfilling layer created
QA work
Then we merge:

  Identifier  Local_code  Name        Date  Longitude
  #32301                  Spratzern
  #30299                  Gent
  #42882                  Köln

And we export to our working database for production...
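A gap-filling layer can be treated as just another source in the merge. Where it sits in the order is a design choice the slides leave open; this Python sketch assumes it gets the lowest priority, so it only supplies cells that no delivery provides (placing it last would instead let it override delivered values). The layer's value is a placeholder, not real data.

```python
QUADS = [
    ("#32301", "name", "St. Pölten", "Austria Stations.xml"),
    ("#32301", "name", "Spratzern", "Austria update1.xml"),
    ("#30299", "name", "Gent", "Belgium Stations.xml"),
    ("#42882", "name", "Köln", "Germany Stations.xml"),
]

# Hypothetical user layer filling a cell that no delivery provides.
GAPFILL = [("#30299", "date", "<value supplied by Hermann>", "Hermann's gapfilling layer")]

SOURCE_ORDER = [
    "Hermann's gapfilling layer",  # lowest priority: only fills gaps
    "Austria Stations.xml",
    "Belgium Stations.xml",
    "Germany Stations.xml",
    "Austria update1.xml",
]

def merge(quads, source_order):
    """One row per subject; sources later in source_order overwrite earlier ones."""
    rank = {src: i for i, src in enumerate(source_order)}
    table = {}
    for s, p, o, _ in sorted(quads, key=lambda q: rank[q[3]]):
        table.setdefault(s, {})[p] = o
    return table

merged = merge(QUADS + GAPFILL, SOURCE_ORDER)
for subject, cols in merged.items():
    print(subject, cols)
```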
Trusting the dataset and trusting trust
- Datasets and values can be evaluated by looking at the source:
  - Is the source URL from a reliable organisation/person?
  - Is the methodology described?
  - Are there reviews on QAW? Who wrote the reviews?
  - Are there others who have used the data? Who are they?
Summary
- These new tools are intended to address the use of Reportnet deliveries:
  - Aggregation/merging
  - Manual QA and gap-filling
  - Traceability to the sources
  - Noticing when a source has been updated/deleted
  - Review of the source for inclusion
- This was not a problem before, because only authorised parties could upload to CDR
- With SEIS, anyone can now participate