METADATA from observation to its use
Dr Esa Falkenroth, Information Architect, SMHI
1st Data Provider Workshop, St Petersburg, November 2016
Three perspectives: producer, infrastructure, users
- Create data sets
- Format data sets
- Provide data sets online
- Write abstract
- Geo-relate data sets
- Classify data sets
- Enter metadata
- Maintain metadata
- Develop metadata standard
- Develop classification system
- Locate datasets matching keyword
- Locate datasets matching geo
- List matching data sets
- Show data sample
- Search using classification
- Search based on geolocation
- Search based on keyword
- Pick data sets
- Provide download service
- Download dataset
- Open dataset
- Understand dataset
- Use data set
Producer view
All very busy: "research is done", "can't update all portals."
The remaining tasks: SOMEBODY ELSE'S RESPONSIBILITY
Infrastructure view
Digital infrastructure, all very busy: "I don't know the data…", "…OGC, XML, WFS, HDF!"
The remaining tasks: SOMEBODY ELSE
User view
All very busy: "I just want to search, download and use the data"
The remaining tasks: SOMEBODY ELSE
Helicopter view
Producer, digital infrastructure, user: MIND THE GAP
Helicopter view: MIND THE GAP
- Who writes metadata for old inactive projects?
- Who should make the classification: producer, user or "mediators"?
Not a new problem…
"Where is that book?" … before the librarians
We can do better with metadata for open data
- "Somebody else's problem" does not help.
- Collaborate with providers.
- NOT a software issue, just hard work.
SWITCH-ON METADATA LIBRARY
Producer -> UPLOAD TOOL -> SWITCH-ON METADATA LIBRARY -> SEARCH TOOL -> user
DOWNLOAD; DOWNLOAD (SVN)
MIND THE GAP
SWITCH-ON METADATA LIBRARY: FILLING METADATA
UPLOAD TOOL, SWITCH-ON METADATA LIBRARY, SEARCH TOOL, "BYOD"
- Maintain ontologies
- Catalogue resources
- Create metadata
MIND THE GAP
Hydrologists work during the summer period to improve abstracts, geographical information and licence information for data sets by contacting data providers.
8300 resources, usable metadata
- varying formats
- many licences
- ways to access
- being added to GEOSS

LICENSE (chart labels): free, copyright, acknowledgement, science, special, non-commercial; one share marked 48%
ACCESS: direct download 78%, request 14%, viewing 8%
FORMAT: other 10%, hdf netcdf 24%, asc txt 7%, dat 9%, excel 24%, shape 9%, html 10%

SWITCH-ON datasets
- Downloadable datasets (direct)
- Request datasets (require registration)
- View services (no download)
- Download services (e.g. FTP servers)
- Other websites (with open data)
SWITCH-ON innovation in usable metadata search
- Innovative (usable) classification
- Innovative (usable) geospatial data
- Innovative (usable) interface
- Innovative budget (0.2% for "librarian work")

- Extend/correct incomplete or missing abstracts
- More detailed spatial coverage for point sources
- Reclassification for water science (user perspective)
Classification problems
(1) Generic themes give many "hits" (not specific); a generic portal has generic keywords…
- GEOSS Water (155035 hits)
- GEOSS Climate (24436 hits)
- GEOSS Agriculture (11866 hits)
(2) Producers and users use different sets of keywords.
- Producer: WFS, realtime portal, operational data store
- User: mass fraction pm2p5 nitrate dry aerosol, runoff
(3) Neither the producer, the user nor the mediators necessarily have the "whole picture" needed to make a good classification. Here, user communities can help develop usable classifications (that work for search).
Usability-driven classification
Resources are catalogued based on how users will search, instead of using the producers' terms.
Balancing the "specialisation degree":
- Too specific (zero hits)
- Too generic (too many hits)
- Good enough (7-100 hits)
SWITCH-ON extended the well-known CUAHSI ontology for the hydrosphere with additional keywords to cover land-use and population data.
Good fit with GEOSS DAB thesaurus
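The hit-count balance behind usability-driven classification can be sketched in a few lines. This is only an illustration, not the SWITCH-ON implementation: the function name `rate_keyword`, the "few hits" label for 1 to 6 hits (a range the slide does not cover) and the counts marked hypothetical are assumptions. Only the 7-100 band, the zero-hit case and the GEOSS theme counts come from the slides.

```python
def rate_keyword(hits: int, low: int = 7, high: int = 100) -> str:
    """Label a keyword by the number of datasets it returns in a search."""
    if hits < 0:
        raise ValueError("hits must be non-negative")
    if hits == 0:
        return "too specific"
    if hits > high:
        return "too generic"
    if hits >= low:
        return "good enough"
    # 1 .. low-1 hits: not covered by the slide, labelled here by assumption.
    return "few hits"


# GEOSS theme counts are the ones quoted in the talk;
# the last two keywords and their counts are hypothetical.
examples = {
    "GEOSS Water": 155035,
    "GEOSS Climate": 24436,
    "runoff": 42,           # hypothetical count
    "obscure keyword": 0,   # hypothetical count
}

ratings = {keyword: rate_keyword(n) for keyword, n in examples.items()}
for keyword, label in ratings.items():
    print(f"{keyword}: {label}")
```

A check like this could be run over a whole keyword list to spot terms worth splitting (too generic) or merging (too specific).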
Usable spatial search and the world box problem
Bounding boxes are great for describing the coverage of maps and gridded data. For in-situ data, however, bounding boxes give false positives. More detailed spatial resolution, with individual in-situ positions for datasets, facilitates search on a local or regional scale. Technically, bounding boxes/polygons are replaced with multipoint coverages.
Usable spatial search and the world box problem
What happens if a user searches for their area of interest? The search box matches the bounding box of the data … but the dataset found contains no relevant data.
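The false positive can be reproduced with a small sketch. The station coordinates, the search box and all function names below are hypothetical and chosen only for illustration; the code also assumes plain longitude/latitude boxes that do not cross the antimeridian.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (lon, lat)
Box = Tuple[float, float, float, float]   # (min_lon, min_lat, max_lon, max_lat)


def bounding_box(points: List[Point]) -> Box:
    """Smallest lon/lat box that contains all points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share any area (or touch)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def any_point_in_box(points: List[Point], box: Box) -> bool:
    """True if at least one point lies inside the box."""
    return any(
        box[0] <= lon <= box[2] and box[1] <= lat <= box[3]
        for lon, lat in points
    )


# Hypothetical in-situ dataset with three stations
# (roughly Stockholm, Lisbon and Athens).
stations = [(18.07, 59.33), (-9.14, 38.72), (23.73, 37.98)]
dataset_box = bounding_box(stations)

# Hypothetical area of interest: a box over central Germany.
search_box = (9.0, 49.0, 12.0, 52.0)

bbox_match = boxes_overlap(dataset_box, search_box)         # metadata as a bounding box
multipoint_match = any_point_in_box(stations, search_box)   # metadata as multipoint

print("bounding-box search matches:", bbox_match)
print("multipoint search matches:", multipoint_match)
```

With these assumed coordinates the bounding-box test reports a match although no station lies in the search area, while the multipoint test does not.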
A balanced and pragmatic approach to metadata
- Not all data sets are equally popular. More popular datasets need more refined/detailed metadata.
- Not all ISO metadata attributes are necessary for search. Water scientists mainly want to search by classification and/or spatial reference (co-location). The rest is a simple matter of automatic filtering (as implemented by geoportal) or simply sifting through the often limited results.
This means the search metadata can be simplified while still maintaining compatibility with GEOSS and ISO standards.
Agile development of user interfaces
The agile approach is a proven, established method of finding user requirements and developing software.
- Easier interfaces
- Less coding
- Easier testing
- Faster time to market
- Happier users
Very basic search tool
Welcome to the SWITCH-ON Portal
SWITCH-ON is developing a large number of commercial water-information products and services.
Open Virtual Water-Science Laboratory: research infrastructure to facilitate collaboration, transparency and repeatable computational experiments.
Tailored data, research results and marketing.
SWITCH-ON will give free access:
- tools for data search and knowledge brokering for development and marketing of commercial information products and services
- one-stop shop with water information and tools to water scientists, consultancies and managers
Increasing the use of GEOSS: summary from three perspectives
Producers can do better:
- Share data to enable innovation and better research in e.g. climate
- Use clear permissive licences! Preferably Creative Commons.
- Provide complete and correct metadata in standard machine-readable formats
Mediators (portals, brokers, data hubs) can do better:
- Monitor availability, completeness and usability of the data sets
- Encourage open data and the adoption of Creative Commons
- Active pragmatic collaboration with data providers:
  - increase precision of spatial information (e.g. multipoint coverages)
  - update broken links in collaboration (some 4% yearly loss of data in our collection)
  - fix missing descriptions of all data sets
- Librarian effort in SWITCH-ON is less than 0.2% of the total project cost.
User communities, research organisations and product developers:
- Better communicate their primary requirements for data search
- Contribute to metadata (especially for data sets from smaller local projects)
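To get a feel for what a yearly loss of some 4% means if links are not maintained, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that the loss rate is constant, compounds year by year, and applies to the roughly 8300 catalogued resources mentioned earlier; the function name `remaining_resources` is hypothetical.

```python
def remaining_resources(initial: int, yearly_loss: float, years: int) -> float:
    """Resources still reachable after `years`, assuming a constant compounding loss rate."""
    if not 0.0 <= yearly_loss <= 1.0:
        raise ValueError("yearly_loss must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return initial * (1.0 - yearly_loss) ** years


# 8300 resources and a 4% yearly loss are the figures from the talk;
# constant compounding with no repair work is an assumption.
for years in (1, 5, 10):
    left = remaining_resources(8300, 0.04, years)
    print(f"after {years:2d} years: about {left:.0f} of 8300 resources still resolve")
```

Under these assumptions the collection erodes noticeably within a few years, which is the case for the ongoing "librarian work" argued for above.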
Thank you!