Seed-based Generation of Personalized Bio-Ontologies for Information Extraction. Cui Tao & David W. Embley

Presentation transcript:

Seed-based Generation of Personalized Bio-Ontologies for Information Extraction
Cui Tao & David W. Embley
Data Extraction Research Group
Department of Computer Science
Brigham Young University
Supported by NSF

Personalized Information Harvesting
Biology domain: huge (other domains too)
Data collection
– Many (web) sources
– Only a tiny subpart wanted
– Personalized view
Personalized extraction ontology
– Creation: Form specification
– Application: Seed-based harvesting

Example
Harvest information about large proteins in humans and the functions of these proteins
– Find proteins in humans that are >20 kDa
– Find all the proteins in humans that serve as receptors
– ...
Information sources: various online repositories
– NCBI
– Gene Cards
– The Gene Ontology
– GPM Proteomics Database
– …

Extraction Ontology
Instance: ^\d{1,5}(\.\d{1,2})?
Context: weight|wght|wt\.
Unit: kilodaltons?|kdas?|kds?|das?|daltons?
…
T-complex protein 1 subunit theta
TCP-1-theta
CCT-theta
Renal carcinoma antigen NY-REN-15
…
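The instance, context, and unit expressions on this slide can be read as parts of one recognizer for molecular weight. The following is a minimal Python sketch under assumptions that the slide does not state: a context keyword must appear shortly before the number, the unit follows the number, and the `^` anchor is dropped so the pattern can be searched inside running text. The helper name `find_weight_kda` and the sample sentence are hypothetical.

```python
import re

# The three expressions are copied from the slide (the leading ^ is dropped).
INSTANCE = r"\d{1,5}(\.\d{1,2})?"
CONTEXT = r"weight|wght|wt\."
UNIT = r"kilodaltons?|kdas?|kds?|das?|daltons?"

# Assumed combination: context keyword, up to 20 non-digit characters,
# then the number, optional whitespace, and the unit.
WEIGHT = re.compile(
    rf"(?:{CONTEXT})\D{{0,20}}?(?P<value>{INSTANCE})\s*(?P<unit>{UNIT})\b",
    re.IGNORECASE,
)


def find_weight_kda(text):
    """Return the first recognized weight in kilodaltons, or None."""
    match = WEIGHT.search(text)
    if match is None:
        return None
    value = float(match.group("value"))
    # Units starting with "k" are already kilodaltons; the rest are daltons.
    if match.group("unit").lower().startswith("k"):
        return value
    return value / 1000.0


# Invented sample text, used only to exercise the recognizer.
weight = find_weight_kda("Molecular weight: 59.6 kDa")
print(weight is not None and weight > 20)  # the ">20 kDa" test from the example slide
```

Normalizing to one unit is what makes a query such as ">20 kDa" answerable across sources that report daltons and kilodaltons.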

Can We Make Construction Easier?
Forms
– General familiarity
– Reasonable conceptual framework
– Appropriate correspondence
   Transformable to ontological descriptions
   Capable of accepting source data
Instance recognizers
– Some pre-existing instance recognizers
– Lexicons
Need for a full extraction ontology?

Form Creation User Interface
Basic form-construction facilities:
– single-entry field
– multiple-entry field
– nested form
– …

Created Sample Form

Generated Ontology View
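One way to picture how a form could yield an ontology view is to treat each form field as a relationship from the form's main concept, with the field type setting the maximum cardinality. The sketch below is only an illustration of that idea: the `FormField` class, the `to_relationships` function, the `"1"`/`"*"` encoding, and the field names `Molecular Weight`, `Function`, and `Role` are assumptions, not the tool's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FormField:
    name: str
    kind: str  # "single", "multiple", or "nested"
    children: List["FormField"] = field(default_factory=list)


def to_relationships(parent: str, fields: List[FormField]) -> List[Tuple[str, str, str]]:
    """Map form fields to (parent, child, max-cardinality) triples."""
    rels = []
    for f in fields:
        # Assumption: a single-entry field allows one value, anything else many.
        max_card = "1" if f.kind == "single" else "*"
        rels.append((parent, f.name, max_card))
        if f.kind == "nested":
            # A nested form contributes relationships of its own.
            rels.extend(to_relationships(f.name, f.children))
    return rels


# Hypothetical form for the protein example.
form = [
    FormField("Name", "multiple"),
    FormField("Molecular Weight", "single"),
    FormField("Function", "nested", [FormField("Role", "multiple")]),
]
relationships = to_relationships("Protein", form)
for rel in relationships:
    print(rel)
```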

Source-to-Form Mapping Establishing a Seed

Almost Ready to Harvest …
Need reading path: DOM-tree structure
Need to resolve mapping problems
– Split/Merge
– Union/Selection
Name
– Voltage-dependent anion-selective channel protein 3
– VDAC-3
– hVDAC3
– Outer mitochondrial membrane protein porin 3
Name
– T-complex protein 1 subunit theta
– TCP-1-theta
– CCT-theta
– Renal carcinoma antigen NY-REN-15
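The reading path and the split problem can be sketched together: a path through the DOM tree locates the cell that the seed mapped to the Name field, and a split turns the cell's single string into separate names. The sketch below uses Python's standard `xml.etree.ElementTree`, so it assumes well-formed markup (real pages would usually need an HTML parser). The page template, the `;` separator, the `Organism` row, and the `harvest` function are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for two pages that share one layout.
TEMPLATE = """<html><body><table>
<tr><td>Organism</td><td>Homo sapiens</td></tr>
<tr><td>Name</td><td>{names}</td></tr>
</table></body></html>"""

seed_page = TEMPLATE.format(
    names="T-complex protein 1 subunit theta; TCP-1-theta; CCT-theta")
sibling_page = TEMPLATE.format(
    names="Voltage-dependent anion-selective channel protein 3; VDAC-3; hVDAC3")

# Reading path learned from the seed: second row, second cell.
READING_PATH = "body/table/tr[2]/td[2]"


def harvest(page, path, separator=";"):
    """Follow the reading path, then split the cell text into values."""
    root = ET.fromstring(page)
    cell = root.find(path)
    if cell is None or cell.text is None:
        return []
    return [part.strip() for part in cell.text.split(separator) if part.strip()]


print(harvest(seed_page, READING_PATH))
print(harvest(sibling_page, READING_PATH))
```

Because the path is structural rather than tied to the seed's text, the same path can be applied to sibling pages generated from the same template.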

Can Now Harvest
Name
– protein epsilon
– Mitochondrial import stimulation factor L subunit
– Protein kinase C inhibitor protein-1
– KCIP E
Name
– Voltage-dependent anion-selective channel protein 3
– VDAC-3
– hVDAC3
– Outer mitochondrial membrane protein porin 3
Name
– Tryptophanyl-tRNA synthetase, mitochondrial precursor
– EC
– Tryptophan--tRNA ligase
– TrpRS
– (Mt)TrpRS

Harvesting Populates Ontology

Also helps adjust ontology constraints
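The slides do not say how constraints are adjusted. One plausible reading is that harvested data can contradict a constraint declared when the form was built, for example a field declared single-valued that turns out to hold several values. The sketch below illustrates only that reading; `adjust_constraints`, the `"1"`/`"*"` encoding, and the placeholder weight value are hypothetical.

```python
def adjust_constraints(declared, records):
    """Widen a declared maximum cardinality of "1" when the data exceed it."""
    adjusted = dict(declared)
    for record in records:
        for field_name, values in record.items():
            if adjusted.get(field_name) == "1" and len(values) > 1:
                adjusted[field_name] = "*"
    return adjusted


# Constraints as first declared (assumed), and one harvested record.
declared = {"Name": "1", "Molecular Weight": "1"}
records = [
    {"Name": ["VDAC-3", "hVDAC3"], "Molecular Weight": ["w1"]},  # "w1" is a placeholder
]

adjusted = adjust_constraints(declared, records)
print(adjusted)
```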

Can Harvest from Additional Sites
Name
– T-complex protein 1 subunit theta
– TCP-1-theta
– CCT-theta
– Renal carcinoma antigen NY-REN-15

Larger Picture
Information Harvesting
– Not only for biology, but for any application
– Not only from one site, but from many sites
Opportunities
– Extraction ontology creation
– Automating site-to-site information harvesting
– Automatic semantic annotation
– Data/Ontology transformations

Extraction Ontology Creation
Lexicons
Name
– protein epsilon
– Mitochondrial import stimulation factor L subunit
– Protein kinase C inhibitor protein-1
– KCIP E
Name
– T-complex protein 1 subunit theta
– TCP-1-theta
– CCT-theta
– Renal carcinoma antigen NY-REN-15
Name
– Tryptophanyl-tRNA synthetase, mitochondrial precursor
– EC
– Tryptophan--tRNA ligase
– TrpRS
– (Mt)TrpRS
…
protein epsilon
Mitochondrial import stimulation factor L subunit
Protein kinase C inhibitor protein-1
KCIP E
…
T-complex protein 1 subunit theta
TCP-1-theta
CCT-theta
Renal carcinoma antigen NY-REN-15
…
Tryptophanyl-tRNA synthetase, mitochondrial precursor
EC
Tryptophan--tRNA ligase
TrpRS
(Mt)TrpRS
…
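Harvested Name values can be pooled into a lexicon, which then serves as a recognizer on pages that were never mapped by hand. A minimal sketch, assuming case-insensitive exact matching; the function names and the test sentence are invented for illustration.

```python
import re


def build_lexicon(harvested_names):
    """Pool the names harvested from every page into one lowercase set."""
    return {n.strip().lower() for names in harvested_names for n in names if n.strip()}


def lexicon_pattern(lexicon):
    """Compile the lexicon into one alternation, longest entries first,
    so a longer name wins over any shorter name that is its prefix."""
    alternatives = sorted(lexicon, key=lambda s: (-len(s), s))
    return re.compile("|".join(re.escape(a) for a in alternatives), re.IGNORECASE)


harvested = [
    ["T-complex protein 1 subunit theta", "TCP-1-theta", "CCT-theta"],
    ["Voltage-dependent anion-selective channel protein 3", "VDAC-3", "hVDAC3"],
]
lexicon = build_lexicon(harvested)
pattern = lexicon_pattern(lexicon)

# Invented sentence standing in for text from an unseen source.
found = [m.group(0) for m in pattern.finditer("Both hVDAC3 and cct-theta were detected.")]
print(found)
```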

Automatic Source-to-Form Mapping

Automatic Semantic Annotation

Extraction Ontology Creation
Instance Recognizers
– Number Patterns
– Context Keywords and Phrases

Automatic Source-to-Form Mapping

Automatic Semantic Annotation
Recognize and annotate with respect to an ontology
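Annotation with respect to an ontology can be sketched as wrapping each recognized instance in markup that names the ontology concept it belongs to. The example below is an assumption-laden illustration: the `data-concept` attribute, the `annotate` function, the two recognizer patterns, and the sample sentence (including its weight value) are all invented.

```python
import re
from html import escape


def annotate(text, recognizers):
    """Wrap each recognized instance in a span that names its concept."""
    spans = []
    for concept, pattern in recognizers.items():
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), concept))
    spans.sort()

    out, pos = [], 0
    for start, end, concept in spans:
        if start < pos:
            continue  # skip a match that overlaps one already annotated
        out.append(escape(text[pos:start]))
        out.append(f'<span data-concept="{concept}">{escape(text[start:end])}</span>')
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


# Hypothetical recognizers keyed by ontology concept.
recognizers = {
    "Name": r"VDAC-3|hVDAC3",
    "MolecularWeight": r"\d+(?:\.\d+)? kDa",
}
annotated = annotate("VDAC-3 weighs 30.7 kDa", recognizers)
print(annotated)
```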

Ontology Transformation
OWL & RDF: standard ontology languages
XML & XMLS: data exchange
Forms: form filling to populate an ontology

Ontology Transformation
Transformations to and from all
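As a small illustration of one direction of such transformations, a harvested record can be serialized both as XML for data exchange and as RDF triples in N-Triples syntax. This is a sketch under assumptions, not the project's converter: the namespace `http://example.org/protein#`, the record identifier `p1`, and both function names are hypothetical, and field names are assumed to be valid XML element names.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/protein#"  # hypothetical namespace


def to_xml(record_id, record):
    """Serialize one harvested record as an XML element."""
    root = ET.Element("Protein", {"id": record_id})
    for field_name, values in record.items():
        for value in values:
            ET.SubElement(root, field_name).text = value
    return ET.tostring(root, encoding="unicode")


def to_ntriples(record_id, record):
    """Serialize the same record as RDF triples in N-Triples syntax."""
    lines = []
    for field_name, values in record.items():
        for value in values:
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{NS}{record_id}> <{NS}{field_name}> "{literal}" .')
    return "\n".join(lines)


record = {"Name": ["VDAC-3", "hVDAC3"]}
print(to_xml("p1", record))
print(to_ntriples("p1", record))
```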

Contributions
Personalized ontology creation
Mapping from sources
Information harvesting
Opportunities for further work
– Extraction ontology creation
– Semantic Annotation
– Data/Ontology transformations