Improving Data Discovery Through Semantic Search

Presentation transcript:

Improving Data Discovery Through Semantic Search
Collaborators: Chad Berkley, Shawn Bowers, Matt Jones, Mark Schildhauer, Josh Madin

Motivation
- The number of datasets in online repositories, including the KNB, keeps increasing
- Precision and recall of current search technology are not satisfactory (definitions on next slide)
- Ecological metadata does not lend itself to traditional text-based searching
- Ecological metadata is susceptible to “semantic drift”

Definitions
- Precision: the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search
- Recall: the number of relevant documents retrieved by a search divided by the total number of existing relevant documents (which should have been retrieved)
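The two definitions can be written out as a minimal Python sketch. The function names, the document IDs and the choice to return 0.0 when a denominator is empty are assumptions made for illustration; they are not part of the original slides.

```python
def precision(retrieved, relevant):
    """Relevant documents retrieved / all documents retrieved."""
    if not retrieved:
        return 0.0  # assumed convention: precision of an empty result set is 0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Relevant documents retrieved / all relevant documents that exist."""
    if not relevant:
        return 0.0  # assumed convention: nothing relevant exists to be found
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection of 20 documents, of which doc0..doc9 are relevant.
relevant = {f"doc{i}" for i in range(10)}

# A search that returns 12 documents: 8 relevant ones plus 4 false positives.
retrieved = {f"doc{i}" for i in range(8)} | {f"doc{i}" for i in range(10, 14)}

print(precision(retrieved, relevant))  # 8 of the 12 retrieved are relevant
print(recall(retrieved, relevant))     # 8 of the 10 relevant were retrieved
```

Both measures share the same numerator (relevant documents retrieved); only the denominator differs.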

Precision
- Document set of 20 files
- 10 files are relevant to your search
- If only 8 files are retrieved and all 8 are relevant, the precision is 8/8 or 1.0, even though 2 relevant files were missed (recall is 8/10 or 0.8)
- If 10 documents are returned and all 10 are relevant, the precision is 1.0
- Precision says nothing about whether all relevant documents are actually returned

Recall
- Same document set of 20 with 10 documents relevant to your search
- If 12 documents are returned including all 10 of the relevant documents, recall is 1.0
- If 12 documents are returned with only 8 of the 10 relevant documents, recall is 0.8
- Recall shows how many relevant documents are returned but says nothing about false positives also returned

Precision and Recall
- In practice they tend to trade off: you can usually increase precision by decreasing recall, and vice versa
- Effective search engines must find a balance between the two
- Better precision and recall together generally mean a better search engine, i.e. if you increase both precision and recall, you should get more relevant results

Our Semantic Approach
- Data, EML (metadata), annotations and ontologies
- Ontology: specification of a conceptualization
- Hierarchical structure of concepts
- Concepts lower in the tree are defined with respect to higher-level concepts
- Annotations link EML attributes to concepts defined in an ontology

Document Relationships

XML Links

Concepts of Semantic Search
- Annotations give metadata attributes semantic meaning with respect to an ontology
- Enable structured search against annotations to increase precision
- Enable ontological term expansion to increase recall
- Precisely define a measured characteristic and the standard used to measure it via OBOE

OBOE Quick Overview
- Extensible Observation Ontology (OBOE)
- OBOE provides a high-level abstraction of scientific observations and measurements
- Enables data (or metadata) structures to be linked to domain-specific ontology concepts
- For more OBOE information, talk to Shawn B., Matt J., Mark S. or Josh M.

Types of Implemented Searches
- Simple keyword (baseline)
- Keyword-based (ontological) term expansion
- Annotation enhanced term expansion
- Observation based structured query

Simple Keyword Search
- High false positive rate
- Metadata structure is often ignored
- Project-level metadata often conflicts with attribute-level metadata
- Example: a search for “soil” will return frog data because the description of the lake the frogs were studied in contained the word “soil”
- Synonyms for search terms are ignored
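The “soil” false positive can be reproduced with a small sketch of a baseline keyword search. The two metadata records, their field names and the `keyword_search` function are hypothetical; this is not the Metacat implementation, only an illustration of what happens when metadata structure is ignored.

```python
# Hypothetical metadata records: a project-level description plus
# attribute-level names, loosely modeled on the frog example above.
DOCS = {
    "frog-survey": {
        "description": "Frog counts in a lake surrounded by sandy soil",
        "attributes": ["frog count", "water temperature"],
    },
    "soil-cores": {
        "description": "Nutrient analysis of core samples",
        "attributes": ["soil nitrogen", "soil moisture"],
    },
}


def keyword_search(term, docs):
    """Match the term anywhere in the flattened text of each record."""
    term = term.lower()
    hits = []
    for doc_id, doc in docs.items():
        # Structure is thrown away: description and attributes become one string.
        text = " ".join([doc["description"], *doc["attributes"]]).lower()
        if term in text:
            hits.append(doc_id)
    return sorted(hits)


print(keyword_search("soil", DOCS))  # the frog dataset is returned as well
```

Because the description and the attributes are searched as one blob of text, the frog dataset matches even though none of its measured attributes concern soil.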

Keyword-based Term Expansion
- Synonyms and subclasses of the search term are discovered via the ontology
- The additional terms are added to the query against the metadata documents
- Example: a search for “Grasshopper” also searches for “Orchelimum,” “Romaleidae,” etc.
- Increases recall, probably decreases precision
- Helps fight “semantic drift”
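A rough sketch of ontological term expansion follows. The toy ontology (the `SUBCLASSES` and `SYNONYMS` tables), the sample documents and both function names are invented for illustration; the real search would read these relationships from the project's ontologies.

```python
# Toy ontology, assumed for illustration only.
SUBCLASSES = {
    "grasshopper": ["romaleidae", "acrididae"],
    "romaleidae": ["romalea"],
}
SYNONYMS = {"grasshopper": ["caelifera"]}

# Hypothetical metadata documents, reduced to a single text field.
DOCS = {
    "prairie-insects": "Counts of Romaleidae along prairie transects",
    "lake-frogs": "Frog abundance in alpine lakes",
    "pasture-survey": "Grasshopper density in grazed pasture",
}


def expand_term(term):
    """Collect the term plus all synonyms and (transitive) subclasses."""
    seen = set()
    stack = [term.lower()]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SYNONYMS.get(current, []))
        stack.extend(SUBCLASSES.get(current, []))
    return seen


def expanded_search(term, docs):
    """Keyword search that matches any of the expanded terms."""
    terms = expand_term(term)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(t in text.lower() for t in terms)
    )


print(expanded_search("Grasshopper", DOCS))
```

A plain keyword search for “Grasshopper” would find only the pasture dataset; the expanded query also finds the dataset that mentions only a subclass, which is where the gain in recall comes from.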

Annotation Enhanced Term Expansion
- Terms are first expanded as in the keyword-based term expansion
- The search is performed against the annotations, not the metadata itself
- Returns the metadata documents that are linked to the matching annotations
- Increases precision; the effect on recall is uncertain and could go up or down depending on the document base
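The idea of searching annotations instead of metadata text can be sketched as below. The `Annotation` record, the sample annotations and the one-entry ontology are assumptions for illustration and do not reflect the actual annotation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    doc_id: str     # metadata document the annotation belongs to
    attribute: str  # attribute in that document
    concept: str    # ontology concept the attribute is linked to


# Toy ontology and annotations, assumed for illustration only.
SUBCLASSES = {"soil": ["topsoil"]}

ANNOTATIONS = [
    Annotation("soil-cores", "n_pct", "soil"),
    Annotation("garden-plots", "depth_cm", "topsoil"),
    Annotation("frog-survey", "count", "frog"),
]


def expand(concept):
    """Return the concept together with all of its transitive subclasses."""
    found = {concept}
    for child in SUBCLASSES.get(concept, []):
        found |= expand(child)
    return found


def annotation_search(term):
    """Match expanded concepts against annotations, then return linked documents."""
    concepts = expand(term.lower())
    return sorted({a.doc_id for a in ANNOTATIONS if a.concept in concepts})


print(annotation_search("soil"))
```

The frog dataset is no longer returned for “soil”, because no attribute in it is annotated with a soil concept, regardless of what its free-text description says. A dataset that has not been annotated at all would also be missed, which is one reason recall may drop.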

Observation Based Structured Query
- Takes advantage of observation and measurement structures and relationships
- Search is based on an observed entity (e.g. a grasshopper) and the measurement standards and characteristics used to measure it
- The observed entity is a “template” on which the measurement characteristic and standard are applied

Observation Based Structured Query
- Both datasets contain “tree lengths”
- An annotation search for “tree length” would return both datasets
- Structured search allows the search to be limited by the observed entity (e.g. a tree or a tree branch)
- This would seem to increase both precision and recall
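A structured query over observations might look like the following sketch. The `Measurement` record, the two sample datasets, the units and the `structured_search` function are hypothetical; OBOE itself models these relationships in an ontology rather than in Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    doc_id: str          # dataset the measurement comes from
    entity: str          # what was observed
    characteristic: str  # what was measured about it
    standard: str        # unit or standard used for the measurement


# Two hypothetical datasets that both record a "length".
MEASUREMENTS = [
    Measurement("forest-inventory", "tree", "length", "meter"),
    Measurement("branch-growth", "tree branch", "length", "centimeter"),
]


def structured_search(entity=None, characteristic=None, standard=None):
    """Return datasets whose measurements match every criterion that is given."""
    hits = set()
    for m in MEASUREMENTS:
        if entity is not None and m.entity != entity:
            continue
        if characteristic is not None and m.characteristic != characteristic:
            continue
        if standard is not None and m.standard != standard:
            continue
        hits.add(m.doc_id)
    return sorted(hits)


print(structured_search(characteristic="length"))                 # both datasets
print(structured_search(entity="tree", characteristic="length"))  # trees only
```

Constraining the observed entity separates whole-tree lengths from branch lengths, which a search on the characteristic alone cannot do.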

Metacat Implementation

Keyword-based Term Expansion

Annotation Enhanced Term Expansion

Structured Search

Structured Search

Thanks
- Play with it: http://linus.nceas.ucsb.edu/sms
- Future: new grant to explore this more
- Future: do better experiments to find out whether our intuitions about precision and recall are correct
- Paper: https://svn.ecoinformatics.org/semtools/docs/pubs/iSEEK09/iSEEK09.doc
- Thanks to Shawn, Matt, Mark and Josh