Cis-Regulatory/Text Mining Interface Discussion

Questions
(1) What does ORegAnno want from text mining?
  – Curation queue
  – Document mark-up
  – Mapping to database IDs
(2) What does text mining need from ORegAnno?
(3) What can text mining provide?
  – What level of performance is needed?
(4) What is the right way to proceed?
  – Data sets for BioCreAtIvE?
  – Custom tools for individual “early adopters”?

Answers: (1) What Does ORegAnno Want from Text Mining?
Management of the curation queue
  – Ideally user-customized, so that each user annotates the documents of immediate interest to her or him
Document mark-up to highlight relevant passages
  – A workflow pipeline that makes either the HTML or the PDF version of the document available, with the (potentially) relevant terms highlighted
  – Support for “cut and paste” transfer of relevant regions into the database comment fields
Mapping to IDs and ontology codes
  – Gene, transcription factor (protein), organism, cell and tissue type, evidence types
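
A minimal sketch of the highlighting step in the mark-up workflow, assuming the document has already been converted to plain text and that the list of relevant terms comes from some upstream gene/transcription-factor tagger (not shown). The function name `highlight_terms` and the use of HTML `<mark>` tags are illustrative choices, not part of any ORegAnno tool; PDF output is out of scope here.

```python
import html
import re


def highlight_terms(text: str, terms: list[str]) -> str:
    """Return `text` as escaped HTML with each occurrence of a term wrapped in <mark>.

    Matching is case-insensitive and whole-word; longer terms are tried first,
    so a multi-word term such as "Sp1 site" wins over its prefix "Sp1".
    """
    if not terms:
        return html.escape(text)
    # Longest-first alternation so overlapping terms prefer the longer match.
    ordered = sorted(set(terms), key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text before the match, then the highlighted match itself.
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


print(highlight_terms("Sp1 binds upstream of TP53.", ["Sp1", "TP53"]))
# <mark>Sp1</mark> binds upstream of <mark>TP53</mark>.
```

Whole-word matching keeps "Sp1" from lighting up inside "SP10", but it also assumes terms begin and end with word characters; names ending in punctuation would need a different boundary rule.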

Answers: (2) What Does Text Mining Need from ORegAnno?
A significant quantity of reliably annotated data to train text mining systems
  – Annotated at a level useful for natural language processing (e.g., evidence marked at the phrase, sentence or passage level, depending on the task)
This requires that ORegAnno have:
  – A clear statement of the scope of the ORegAnno database and a stable set of annotation guidelines
  – Annotations with high inter-annotator agreement
  – Tracking of entries by annotator, including depth of annotation (different annotators will annotate to different levels of detail, depending on their interests)
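
One way to make the tracking requirement concrete is a per-entry record that keeps the annotator's name, the granularity of the evidence, and the fields filled in. This is a sketch only: the class and field names are hypothetical rather than ORegAnno's schema, and "depth" is approximated here, as an assumption, by the number of non-empty fields.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Evidence granularities named on the slide; treating them as a closed set is an assumption.
EVIDENCE_LEVELS = {"phrase", "sentence", "passage"}


@dataclass
class Annotation:
    """One curated entry (hypothetical fields, not ORegAnno's actual schema)."""
    record_id: str
    annotator: str
    evidence_level: str
    evidence_text: str
    fields: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.evidence_level not in EVIDENCE_LEVELS:
            raise ValueError(f"unknown evidence level: {self.evidence_level!r}")

    @property
    def depth(self) -> int:
        """Crude depth proxy: how many fields the annotator actually filled in."""
        return sum(1 for value in self.fields.values() if value)


def mean_depth_by_annotator(annotations: list[Annotation]) -> dict[str, float]:
    """Average depth of annotation per annotator."""
    depths: dict[str, list[int]] = defaultdict(list)
    for ann in annotations:
        depths[ann.annotator].append(ann.depth)
    return {who: sum(ds) / len(ds) for who, ds in depths.items()}


entries = [
    Annotation("r1", "curator_a", "sentence", "(evidence sentence)",
               {"gene": "TP53", "tf": "Sp1", "cell_type": "HeLa"}),
    Annotation("r2", "curator_a", "phrase", "(evidence phrase)",
               {"gene": "MYC", "tf": ""}),
    Annotation("r3", "curator_b", "passage", "(evidence passage)",
               {"gene": "EGR1", "tf": "SRF"}),
]
print(mean_depth_by_annotator(entries))  # {'curator_a': 2.0, 'curator_b': 2.0}
```

Keeping the evidence level on every entry is what would later let a text mining system select training examples at the granularity its task needs.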

Answers: (3) What Can Text Mining Provide?
Curation queue management:
  – Document classification approaches (e.g., from TREC Genomics or BioCreAtIvE) can be applied and evaluated, using new training data from the pre-jamboree and jamboree annotation
  – We can experiment with “user-defined” criteria, based on restrictions by gene, transcription factor, organism, tissue, etc.
Document mark-up:
  – Users could be given a list of the genes/transcription factors in a paper, with hot links into the paper to find the relevant passages
  – This would let the annotator drive the annotation process, selecting only those annotations that are correct and relevant. This in turn provides feedback: the ORegAnno annotations can be used to validate and train the text mining tools
  – Such a tool should make it easy for the annotator to supply the underlying text passages as evidence for the annotation, which provides more training data
Mapping to unique identifiers/controlled vocabulary/ontology:
  – For each entity type (gene, transcription factor, organism, tissue type...), a tool can provide a mapping to the correct identifier; where there is possible ambiguity, the tool could provide a ranked list for the annotator to choose from
  – A tool can also flag different evidence types, with suggested code(s)
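
The ranked-list idea for ambiguous names can be sketched as a lookup in a synonym lexicon followed by a sort. Everything here is an assumption for illustration: the lexicon, the placeholder identifiers (not real database accessions), the per-candidate `prior` (for example, how often curators previously chose that ID for the name), and the ranking rule that candidates from organisms mentioned in the paper come first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    identifier: str  # placeholder ID, not a real database accession
    organism: str
    prior: float     # hypothetical score, e.g. past curator choices for this name


def rank_candidates(
    mention: str,
    lexicon: dict[str, list[Candidate]],
    doc_organisms: set[str],
) -> list[Candidate]:
    """Return the candidate IDs for `mention`, best first.

    Candidates from an organism mentioned in the document outrank the rest;
    ties are broken by higher prior, then by identifier for a stable order.
    An empty list means the name is not in the lexicon.
    """
    candidates = lexicon.get(mention.casefold(), [])
    return sorted(
        candidates,
        key=lambda c: (c.organism not in doc_organisms, -c.prior, c.identifier),
    )


# Toy lexicon keyed by case-folded synonym.
LEXICON = {
    "p53": [
        Candidate("GENE:HS-0001", "human", 0.7),
        Candidate("GENE:MM-0001", "mouse", 0.3),
    ],
}

for cand in rank_candidates("P53", LEXICON, {"mouse"}):
    print(cand.identifier, cand.organism)
```

For a mouse paper the mouse candidate is listed first even though the human one has the higher prior; the annotator still makes the final choice, and that choice could feed back into the priors.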

Answers: (4) How to Proceed?
Stabilize the guidelines and redo the inter-annotator agreement experiment (and write it up)
Prepare a gold-standard data set of expert-annotated data for training new annotators
Collect a sufficient amount of training data for the various tasks (queue management, document mark-up, automated mapping)
Develop an end-to-end pipeline (in the style of the FlySlip project) to capture whole documents in machine-readable form for mark-up
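
For re-running the inter-annotator agreement experiment, one common summary statistic for two annotators making categorical judgements on the same items is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slide does not say which measure was used. Kappa is shown here as one option (span-level tasks are often scored with precision, recall and F-measure between annotators instead), and the example labels are invented.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items in the same order.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e the agreement expected if each annotator labelled at random with
    their own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item; kappa is
        # undefined (0/0), so we report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Invented example: is each sentence evidence for a regulatory element?
annotator_1 = ["evidence", "evidence", "none", "none"]
annotator_2 = ["evidence", "none", "none", "none"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

Here the annotators agree on 3 of 4 sentences (p_o = 0.75), but chance alone would give p_e = 0.5, so kappa is 0.5 rather than 0.75.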

Recommendations: Training Materials & Tools
Case studies and gold-standard annotated articles
On-line training
  – Perhaps with a way for new annotators to test themselves against a set of gold-standard annotations
  – This will require automated comparison of annotations for certain fields
Links to the best tools
Tools:
  – Copy mechanism for largely duplicated records
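
The self-test idea needs an automated field-by-field comparison against the gold standard. A minimal sketch, assuming each set of annotations is a dict keyed by record ID and that only fields with controlled values (identifiers, evidence codes) are compared, since free-text comment fields cannot be scored this simply. The function, field names, record IDs and values are hypothetical, and the case-insensitive match is an assumption that would be wrong for case-sensitive identifiers.

```python
def _normalize(value):
    """Trim and case-fold strings; leave other values (including None) unchanged."""
    return value.strip().casefold() if isinstance(value, str) else value


def compare_to_gold(trainee, gold, fields):
    """Score a trainee's annotations against gold-standard ones, field by field.

    `trainee` and `gold` map record IDs to dicts of field values. A field is
    scored only on gold records where it has a value; a record or field the
    trainee left out counts as a miss. Returns {field: (matches, scored)}.
    """
    report = {}
    for name in fields:
        scored = [rid for rid, rec in gold.items() if rec.get(name) not in (None, "")]
        matches = sum(
            1
            for rid in scored
            if _normalize(trainee.get(rid, {}).get(name)) == _normalize(gold[rid][name])
        )
        report[name] = (matches, len(scored))
    return report


gold = {
    "rec1": {"gene_id": "GENE:0001", "evidence": "EMSA"},
    "rec2": {"gene_id": "GENE:0002", "evidence": "reporter assay"},
}
trainee = {
    "rec1": {"gene_id": "gene:0001", "evidence": "EMSA"},
    "rec2": {"gene_id": "GENE:0003"},
}
print(compare_to_gold(trainee, gold, ["gene_id", "evidence"]))
# {'gene_id': (1, 2), 'evidence': (1, 2)}
```

Returning raw (matches, scored) counts rather than a percentage avoids dividing by zero when a field is never filled in the gold set, and lets the training page show "1 of 2" directly.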