Introduction to the Aggregation Database Søren Roug, IT Project manager.

Slides:



Advertisements
Similar presentations
European Topic Centre on Biological Diversity Reporting Tool workshop March 2012.
Advertisements

1 Reportnet – an introduction Reportnet – an introduction Presentation for the Meeting on Noise Copenhagen, 12 November 2008 Søren Roug, EEA
Linked Environment Data and how we are implementing SEIS Søren Roug.
Effective management Accurate tracking Easier automation.
19 th Advanced Summer School in Regional Science An introduction to GIS using ArcGIS.
Federated Searching Pre-Conference Workshop - The federated searching cookbook Qin Zhu HP Labs Research Library February 18, 2007.
1 Introducing Reportnet Miruna Badescu. 2 A linear view of Reportnet process.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
New “Collaborate” Button Integrate UI directly into the browser. Preferred target: Firefox Easiest browser to extend in terms of UI.
1 User Manual C Manufacturer & Association Registrations See user manual A for free and skydiver registrations and user manual B for business and rigger.
An Overview of Social Bookmarking By: Aimee Martin Technology Resource Teacher
Free e-Sources for English Language Teachers by Wallace Barboza Carolina TESOL December 6th, 2008 Charleston, SC.
Status of upgrading CDI service (user interface, harvesting via GeoNetwork, CDI interoperability options following SeaDataNet D8.7) By Dick M.A. Schaap.
Reportnet standards and next steps Søren Roug, Information and Data Services (IDS)
Chapter 7 Web Content Mining Xxxxxx. Introduction Web-content mining techniques are used to discover useful information from content on the web – textual.
Event-Based Model for Reconciling Digital Entries Thesis Proposal Ahmet Fatih Mustacoglu 10/3/20151Ahmet.
ITIS 1210 Introduction to Web-Based Information Systems Chapter 27 How Internet Searching Works.
Julie Hannaford Director, Information Resources & Services OISE, University of Toronto Image credit to:
1 Reportnet for beginners Hermann Peifer, EEA. 2 What is Reportnet? The EEA’s data collection machinery.
© 2008 The McGraw-Hill Companies, Inc. All rights reserved. ACCESS 2007 M I C R O S O F T ® THE PROFESSIONAL APPROACH S E R I E S Lesson 13 – Advanced.
© 2005 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice The China Digital Museum Project.
ETMS Documentation Roger Milego Stockholm, Project meeting 8-9 October 2013.
CLARIN for Linguists Portal & Searching for Resources Jan Odijk LOT Summerschool Nijmegen,
Marshall Breeding Director for Innovative Technology and Research Vanderbilt University
Introduction to Web Services Eric Lease Morgan University Libraries of Notre Dame June 24, 2005.
Search Engines. Search Strategies Define the search topic(s) and break it down into its component parts What terms, words or phrases do you use to describe.
1 Looking for The Next Great Band An inside look at Yahoo! Audio Search March 5, 2006 Michael Spiegelman.
Introduction to Morpho BEAM Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
The Metadata Tool Custom Metadata Tool Who this tool is for: This tool designed to be used a data management system. This tool is geared more for the.
11 Version Control Systems Mauro Jaskelioff (originally by Gail Hopkins)
GBIF Data Access and Database Interoperability 2003 Work Programme Overview Donald Hobern, GBIF Programme Officer for Data Access and Database Interoperability.
Introduction to Archon for CARLI Members Jen Masciadrelli, Library Systems Coordinator, CARLI Office Sarah Horowitz, Special Collections Librarian, Augustana.
Meta-Server System Software Lab. Overview In the Music Virtual Channel system, clients can’t query for a song initiatively Through the metadata server,
EGEE User Forum Data Management session Development of gLite Web Service Based Security Components for the ATLAS Metadata Interface Thomas Doherty GridPP.
Database Management Systems (DBMS)
Introduction to Morpho RCN Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
Comparison of different output options from Stata
Digital Literacy Concepts and basic vocabulary. Digital Literacy Knowledge, skills, and behaviors used in digital devices (computers, tablets, smartphones)
Serving society Stimulating innovation Supporting legislation Workshop on the INSPIRE registry and registers Søren Roug European Environment.
Metadata Training for SEFSC Science Staff Part Two.
ESRI Education User Conference – July 6-8, 2001 ESRI Education User Conference – July 6-8, 2001 Introducing ArcCatalog: Tools for Metadata and Data Management.
1 Reportnet – an introduction Reportnet – an introduction Presentation for the Meeting on IPPC/WI Brussels, 3 March 2009 Søren Roug, EEA
The ELAR Metadata Set David Evans, ELAR 3 November 2006.
X-RAY. A java project can be scanned for instances of design patterns The results are represented in a table – design pat- tern participants are associated.
Reportnet – progress and next steps Søren Roug European Environment Agency.
OpenGrey – a new environment for OpenSIGLE and European Grey Literature Christiane Stock (INIST-CNRS)
Networked Information Resources Federated search, link server, e-books.
Social Bookmarking Services : Delicious, Connotea, Citeulike, work through Blackboard Scholar.
Developing our Metadata: Technical Considerations & Approach Ray Plante NIST 4/14/16 NMI Registry Workshop BIPM, Paris 1 …don’t worry ;-) or How we concentrate.
WISE and the future of WFD reporting
Choosing the Discovery Model Martin Forsberg
Delivering electronic data via Reportnet Central Data Repository
Introduction into Knowledge and information
‘Basic approach: Reporting and data handling’
New to Yammer? ! Use a professional adress
3a. Status of the 2012 interim report on the implementation of POM
This presentation document has been prepared by Vault Intelligence Limited (“Vault") and is intended for off line demonstration, presentation and educational.
WISER Humanities: Keeping up to date
Module P4 Identify Data Products and Views So Their Requirements and Attributes Can Be Controlled Learning Objectives: Understand the value of data. Understand.
Reportnet for beginners
Trustworthy Distributed Search and Retrieval over the Internet
Reporting and WISE — future development and other directives
Reportnet An Introduction
Reportnet 3.0 Database Feasibility Study – Approach
Best Digital Marketing Tips For Quick Web Pages Indexing Presented By:- Abhinav Shashtri.
Reportnet for absolute beginners
WIS Project Office WMO Managing WIS WIS Project Office WMO
Citation databases and social networks for researchers: measuring research impact and disseminating results - exercise Elisavet Koutzamani
State-of-play of current Water Directive integration into WISE
This presentation document has been prepared by Vault Intelligence Limited (“Vault") and is intended for off line demonstration, presentation and educational.
Presentation transcript:

Introduction to the Aggregation Database Søren Roug, IT Project manager

What is the problem? How to discover the datasets How to search for the datasets How to track updates to the datasets How to bookmark found datasets How to merge datasets How to trust the dataset How to trust the trust The Aggregation Database is being developed to solve some new problems occuring when data is distributed in SEIS: The system is going to be an extension to EEA’s Reportnet system

Discovering the data 1.A collaborating node provides a manifest of the relevant files existing on its own website 2.Users register the files at a new website called QAW along the principles of del.icio.us and digg.com Datasets tend to be only numbers. Unlike Google, that mainly harvests webpages, a SEIS search engine can’t get information from the content of the files. The aggregation database will use two mechanisms to discover relevant datasets.

Discovering with the manifest file The manifest lists location and any metadata the provider has on each file The format can handle any type of metadata property Bathing water report, T16:17:27Z 2006

Registering a SEIS dataset Discovered via manifest files and manual registration

Adding metadata

Bookmarking and searching the dataset

Working with files vs. records Now we know where the files are in the SEIS universe But we can do more: We can read the content of XML files Example of an XML snippet: <stations xmlns:xsi= xsi:noNamespaceSchemaLocation=" St. Pölten Industrial urban...

Merging principles Station structure as a table (austria.xml) Identifierlocal_codename... # St. Pölten... # Linz... Quadruple structure SubjectPredicateObjectSource #32301typeRiver Stationaustria.xml #32301local_code32301austria.xml #32301nameSt. Pöltenaustria.xml #32302typeRiver Stationaustria.xml #32302local_code32302austria.xml #32302nameLinzaustria.xml

Merging the datasets Austria Stations.xml Belgium Stations.xml Germany Stations.xml Aggregation Database XSL Transformation to quadruples SubjectPredicateObjectSource #32301nameSt. PöltenAu..xml #30299nameGentBe..xml #42882nameKölnGe..xml

Merging the datasets (with later updates) Austria Stations.xml Austria update1.xml Aggregation Database XSL Transformation SubjectPredicateObjectSource #32301nameSt. PöltenAu..xml #32301date Au..xml #32301nameSpratzernAu..update1.xml #32301date Au..update1.xml

Searching To find all river stations in Europe you search for subjects with the type=”River Station” The query will format it as a table for you Obviously you get duplicates because has been updated IdentifierLocal_codeNameDateLongitude # St. Pölten #32301Spratzern # Gent # Köln

QA work Let’s first colour the cells by their source IdentifierLocal_codeNameDateLongitude # St. Pölten #32301Spratzern # Gent # Köln

QA work Then we merge by letting the newer sources overwrite the older: IdentifierLocal_codeNameDateLongitude # Spratzern # Gent # Köln

QA work Don’t trust one source? Turn it off before you merge IdentifierLocal_codeNameDateLongitude # St. Pölten #32301Spratzern # Gent # Köln

QA work Then we merge IdentifierLocal_codeNameDateLongitude # St. Pölten # Gent # Köln

QA work Gapfilling? Add your own source as a layer The layer is stored on QAW IdentifierLocal_codeNameDateLongitude # St. Pölten #32301Spratzern # Gent # Köln # Hermann’s gapfilling layer created

QA work Then we merge IdentifierLocal_codeNameDateLongitude # Spratzern # Gent # Köln And we export to our working database for production...

Trusting the dataset and trusting trust Datasets and values can be evaluated by looking at the source Is the source URL from a reliable organisation/person? Is the methodology described? Are there reviews on QAW? Who wrote the reviews? Are there others who have used the data? Who are they?

Summary These new tools intend to solve the use of the Reportnet deliveries: Aggregation/Merging Manual QA and gap-filling Traceability to the sources Noticing when the source has been updated/deleted Review of the source for inclusion That was no problem before because only authorised parties could upload to CDR With SEIS now anyone can participate