Ontology Engineering approaches based on semi-automated curation of the primary literature
Gully APC Burns, Tommy Ingulfsen, Donghui Feng and Ed Hovy

Presentation transcript:

Ontology Engineering approaches based on semi-automated curation of the primary literature
Gully APC Burns, Tommy Ingulfsen, Donghui Feng and Ed Hovy
Biomedical Knowledge Engineering Group, Information Sciences Institute, University of Southern California

Where’s all the knowledge?
Image taken from U.S. Geological Survey Energy Resource Surveys Program
The primary research literature...
… is the end-product of all scientific research
… forms the basis for human understanding of the subject
… is written in natural language
… is structured
… is interpretable
… is expensive
… is terse

Precision and imprecision in biological representation
- Conceptual model (high-level concepts): ‘stress’, ‘energy balance’, ‘homeostasis’, ‘glucoprivation’
- Assay: define model system (independent variables): 2-deoxyglucose (2DG) administered intravenously to rats; look for activation in ‘stress-responsive’ neurons
- Experiment: perform measurements (dependent variables): MAP-K and pERK activate in neurons in PVH, BST and CEAl
Scale: imprecise (high-level concepts) to precise (dependent variables)

Partitioning the literature

The problem with knowledge: an over-abundance of data

Corpus Preparation for Natural Language Processing
- The Journal of Comparative Neurology is the foremost international journal for neuroanatomy.
- We downloaded ~12,000 PDFs in total from […].
- We preprocessed papers with consistent formatting from vol. […] ([…]), providing a corpus of 9,474 PDF files.
- This corpus contains 99,094,318 words.

Active Learning / Information Extraction Methodology

The logical structure of a tract-tracing experiment
- Tracer Chemical [1]
- Injection Site [1]
  - Location: brain structure, topography, side
- Labeled region [1...*]
  - Location: brain structure, topography, ipsi-contra relative to injection site?
  - Label type: ‘anterograde’, ‘retrograde’
  - Label density
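The schema on this slide can be sketched as a small set of record types. This is only an illustrative reading of the slide, not the group's actual data model: the class and field names (`TractTracingExperiment`, `LabeledRegion`, `Location`, `LabelType`) are hypothetical, and treating ‘anterograde’ and ‘retrograde’ as the values of the label type is an assumption drawn from the slide layout.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class LabelType(Enum):
    # Assumed to be the two allowed values of "Label type" on the slide.
    ANTEROGRADE = "anterograde"
    RETROGRADE = "retrograde"


@dataclass
class Location:
    brain_structure: str
    topography: Optional[str] = None
    side: Optional[str] = None         # used for the injection site
    ipsi_contra: Optional[str] = None  # used for labeled regions, relative to the injection site


@dataclass
class LabeledRegion:
    location: Location
    label_type: LabelType
    label_density: str


@dataclass
class TractTracingExperiment:
    tracer_chemical: str        # cardinality [1]
    injection_site: Location    # cardinality [1]
    labeled_regions: List[LabeledRegion] = field(default_factory=list)  # cardinality [1...*]

    def __post_init__(self) -> None:
        # Enforce the [1...*] cardinality from the slide.
        if not self.labeled_regions:
            raise ValueError("at least one labeled region is required")


# Placeholder values only; they are not taken from any paper in the corpus.
experiment = TractTracingExperiment(
    tracer_chemical="tracer X",
    injection_site=Location("structure A", side="left"),
    labeled_regions=[
        LabeledRegion(
            location=Location("structure B", ipsi_contra="ipsilateral"),
            label_type=LabelType.ANTEROGRADE,
            label_density="dense",
        )
    ],
)
```

Writing the cardinalities into the types makes it explicit that one experiment record carries exactly one tracer and one injection site but any number of labeled regions, which is the unit an extraction system would need to fill.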

Annotated XML Example
From Albanese & Minciacchi, 1983, JCN 216:
Legend: expt. label delineation injection labeling description

Recall, Precision and F-Score
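For reference, the standard definitions of these three measures can be sketched as follows. The transcript does not say which F-score variant was used, so the balanced F1 (`beta=1.0`) shown here is an assumption; the function name and the example counts are illustrative only.

```python
def precision_recall_f(tp: int, fp: int, fn: int, beta: float = 1.0):
    """Return (precision, recall, F-score) from raw counts.

    tp: items labeled correctly by the system
    fp: items labeled by the system that should not have been
    fn: items the system should have labeled but missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Made-up counts: 8 correct labels, 2 spurious labels, 8 missed labels.
p, r, f = precision_recall_f(tp=8, fp=2, fn=8)
# p is 0.8, r is 0.5, f is about 0.615
```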

Field Labeling Results – overall label level
Columns: System Features | Precision | Recall | F-Score
Systems compared:
- Baseline
- Lexicon
- Lexicon + Surface Words
- Lexicon + Surface Words + Window Words
- Lexicon + Surface + Window Words + Dependency features
Preliminary data from a training set of 14 documents, with testing on 16 documents

Field Labeling Results – Confusion Matrices

Generalizing the methodology: ‘Histology’ [from Gonzalo-Ruiz et al 1992, JCN 321: ]

Time and effort
- Current performance achieved by annotating 40 documents
- Each document contains 97 sentences (in results section) on average
- Annotation rate
  - ~40 sentences/hr (no support)
  - ~115 sentences/hr (after 20 documents)
- Time taken to annotate documents to train the system to perform at this standard
  - ~65 hours with no support
  - Estimate ~2 months for a 50% RA (20 hours / week)
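The figures on this slide can be checked with a little arithmetic. The sketch below uses only the numbers given above; the assumption that the first 20 documents are annotated at the unsupported rate and the remaining 20 at the supported rate is one reading that reproduces a total near 65 hours, and it is not stated on the slide.

```python
DOCS = 40                 # documents annotated
SENTENCES_PER_DOC = 97    # average sentences per results section
RATE_UNSUPPORTED = 40     # sentences per hour, no system support
RATE_SUPPORTED = 115      # sentences per hour, after 20 documents
SWITCH_AFTER_DOCS = 20    # assumed point at which support becomes available


def annotation_hours(docs: int, sentences_per_doc: int, rate: float) -> float:
    """Hours needed to annotate `docs` documents at `rate` sentences per hour."""
    return docs * sentences_per_doc / rate


total_sentences = DOCS * SENTENCES_PER_DOC  # 3880 sentences

# Every document annotated without support.
all_unsupported = annotation_hours(DOCS, SENTENCES_PER_DOC, RATE_UNSUPPORTED)  # 97.0 hours

# Assumed mixed schedule: unsupported for the first 20 documents, supported afterwards.
mixed = (
    annotation_hours(SWITCH_AFTER_DOCS, SENTENCES_PER_DOC, RATE_UNSUPPORTED)
    + annotation_hours(DOCS - SWITCH_AFTER_DOCS, SENTENCES_PER_DOC, RATE_SUPPORTED)
)  # about 65.4 hours
```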

Can we discover the schema from the text?
- Given a large review or a grant proposal specific to a single laboratory
- Annotate independent and dependent variables in papers.
- Can we learn and extract these patterns?

An example from the current set of annotations
- 10 independent variables: age species sex weight agonist/antagonist combinations (9) primary antibody preparation protocol brain region
- 1 dependent variable: signal density

Acknowledgements
Funding
- Information Sciences Institute, seed funding *
- National Library of Medicine (RO1-LM07061) *
- NSF (LONI MAP project)
- HBP (USCBP)
Neuroscience consultants
- Alan Watts *
- Larry Swanson *
- Arshad Khan *
- Rick Thompson *
- Joel Hahn *
- Lori Gorton *
- Kim Rapp *
Computer Scientists
- Eduard Hovy *
- Donghui Feng *
- Patrick Pantel *
Developers
- Tommy Ingulfsen *
- Wei-Cheng Cheng