National Centre for Text Mining

Presentation transcript:

1 National Centre for Text Mining
Mission: To provide TM tools for users, in particular scientists and researchers; to coordinate activities in the TM community.
Core Partners: University of Manchester (NLP and DM); Salford University (Terminology); Liverpool University (IR and Digital Archive).
External Partners: San Diego SC, UC Berkeley, University of Geneva, University of Tokyo.

2 National Centre for Text Mining
Mission: To provide TM tools for users, in particular scientists and researchers; to coordinate activities in the TM community.
Core Partners: University of Manchester (NLP and DM); Salford University (Terminology); Liverpool University (IR and Digital Archive).
External Partners: San Diego SC, UC Berkeley, University of Geneva, University of Tokyo.
Biomedical domain

3 Strategy and Roadmap for TM in Biomedicine
Vast number of Google/Yahoo users, satisfied.
Small number of users, unsatisfied.
Huge demand for specialized tools for TM in bio-medical domains.
The current TM tools, though successful in some business applications, do not meet the requirements of users in bio-medical domains.
What are the requirements for TM for users in bio-medical domains?
What technologies should be integrated in future TM for science?
More demand-oriented approach.
Is the nature of TM in scientific fields different from that of business applications?
More publicity and marketing.

4 From technological seeds

5 Science: Knowledge
Raw Data
Unstructured Information (Text)
Semi-structured Information (XML+Text)
Structured Information (Databases)
Ontology-based KMS
Natural Language Processing
Intelligent Text Management System
Effective management of text and knowledge is the key.

6 Intelligent TM systems
Retrieval: Intelligent Information Retrieval and Question Answering
Integration: Integration of Text with Data and Knowledge
Discovery: Text Mining and Knowledge Discovery

7 From Text to Knowledge
Language Domain
Knowledge Domain (motivated independently of language)
Non-Trivial Mappings: Terminology, NLP, Paraphrasing
Ontology: Relationships among concepts
Metabolic Pathways
Signal Pathways
Association between Diseases and Genes
……

8 Examples of Technical Seeds
Term Variants
– Terms (names of proteins, genes, diseases, symptoms, etc.) denote basic conceptual units in the knowledge domain.
Syntactic Variants
– Relationships and complex conceptual units are mapped to sentences.
Term Acquisition from Text
– New terms (basic conceptual units) are constantly introduced.
Resource building for specialized domains is crucial.

9 Examples of Technical Seeds
Term Variants
– Terms (names of proteins, genes, diseases, symptoms, etc.) denote basic conceptual units in the knowledge domain.
Syntactic Variants
– Relationships and complex conceptual units are mapped to sentences.
Term Acquisition from Text
– New terms (basic conceptual units) are constantly introduced.
Resource building for specialized domains is crucial.

10 Term variants of NF-kappa B
Acronym: NF-kappa B, NF kappa B, NFKB factor, NF-KB, NF kB
Expanded form: nuclear factor-kappa B, nuclear-factor kappa B, nuclear factor kappa B, nuclear factor κB, Nuclear Factor kappa B, ………..
Types of variation: Spelling variation, Synonym, Hypernym

11 Automatically Generated Term Variants (1)
NF kappa B Transcription Factor NF kappa B NF-kappa B NF kB Immunoglobulin Enhancer-Binding Protein Immunoglobulin Enhancer Binding Protein Transcription Factor NF-kB Transcription Factor NF kB Factor NF-kB, Transcription nuclear factor kappa beta NF kappaB NF kappa B chain NF kappa B subunit Transcription Factor NF-kappa B NF-kB, Transcription Factor NF-kB Neurofibromatosis Type kappa B 0

12 Automatically Generated Term Variants (2)
tumor necrosis factor A TNF A tumor necrosis factor TNF alpha TNFA TNF Tumour necrosis factor alpha Tumor Necrosis Factor alpha Tumor Necrosis Factor-Alpha TUMOR NECROSIS FACTOR.ALPHA Tumor necrosis factor alpha Tumor Necrosis Factor-alpha TNF-Alpha TNF-alpha 6899
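
The spelling variation listed on the slides above (hyphen versus space, "kappa" versus "k" or "κ") can be illustrated with a small rule-based sketch. This is not the generator used by the presenters; the substitution table, the separator list, and the function name `spelling_variants` are assumptions made purely for illustration.

```python
import itertools

# Hypothetical substitution table: a token (lower-cased) maps to the
# spellings that may stand in for it. Only "kappa" is covered here.
TOKEN_VARIANTS = {
    "kappa": ["kappa", "k", "κ"],
}

# Hypothetical separators allowed between adjacent tokens.
SEPARATORS = [" ", "-", ""]


def spelling_variants(term):
    """Return a set of surface spellings for a multi-token term."""
    tokens = term.replace("-", " ").split()
    if not tokens:
        return set()
    # For each token, the list of spellings it may take.
    options = [TOKEN_VARIANTS.get(tok.lower(), [tok]) for tok in tokens]
    variants = set()
    for choice in itertools.product(*options):
        # Try every combination of separators between the chosen tokens.
        for seps in itertools.product(SEPARATORS, repeat=len(choice) - 1):
            parts = [choice[0]]
            for sep, tok in zip(seps, choice[1:]):
                parts.append(sep + tok)
            variants.add("".join(parts))
    return variants


if __name__ == "__main__":
    for variant in sorted(spelling_variants("NF-kappa B")):
        print(variant)
```

A table like this only covers spelling variation; synonyms and hypernyms such as "Immunoglobulin Enhancer-Binding Protein" would need a terminology resource rather than string rules.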

13 Examples of Technical Seeds
Term Variants
– Terms (names of proteins, genes, diseases, symptoms, etc.) denote basic conceptual units in the knowledge domain.
Syntactic Variants
– Relationships and complex conceptual units in the knowledge domain are mapped to sentences in the language domain.
Term Acquisition from Text
– New terms (basic conceptual units) are constantly introduced.
Resource building for specialized domains is crucial.

14 Non-trivial Mapping
Language Domain
Knowledge Domain (motivated independently of language)
Term variants: Spelling Variants, Synonyms, Acronyms
Syntactic Variants: Same relations with different structures
Pattern: [A] protein activates [B] (Pathway extraction)
Full-strength Staufen protein lacking this insertion is able to associate with oskar mRNA and activate its translation, but fails to …..
Since ……., we postulate that only phosphorylated PHO2 protein could activate the transcription of PHO5 gene.
Transcription initiation by the sigma(54)-RNA polymerase holoenzyme requires an enhancer-binding protein that is thought to contact sigma(54) to activate transcription.

15 Predicate-argument structure
Parser based on Probabilistic HPSG (Enju)
Example sentence with part-of-speech tags: The/DT protein/NN is/VBZ activated/VBN by/IN it/PRP
[Figure: parse tree with phrase nodes dt, np, vp, pp, s and dependency labels arg1, arg2, mod]
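
The point of a predicate-argument structure is that active and passive sentences map to the same relation. The sketch below shows that idea with hand-built structures; it does not call Enju or any parser, and the names `PredArg`, `from_active`, `from_passive`, and `find_relations` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredArg:
    predicate: str  # lemma of the verb, e.g. "activate"
    arg1: str       # logical subject (the activator)
    arg2: str       # logical object (the thing activated)


def from_active(subject, verb_lemma, obj):
    """'A activates B': surface subject is arg1, surface object is arg2."""
    return PredArg(verb_lemma, subject, obj)


def from_passive(subject, verb_lemma, by_agent):
    """'B is activated by A': surface subject is arg2, the by-phrase is arg1."""
    return PredArg(verb_lemma, by_agent, subject)


def find_relations(structures, predicate):
    """Return (arg1, arg2) pairs for every structure with the given predicate."""
    return [(s.arg1, s.arg2) for s in structures if s.predicate == predicate]


if __name__ == "__main__":
    passive = from_passive("the protein", "activate", "it")
    active = from_active("it", "activate", "the protein")
    print(passive == active)
    print(find_relations([passive], "activate"))
```

With this normalization, a single pattern such as "[A] activates [B]" can be matched against the structure rather than against each surface form.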

16 Text Archive with Feature Objects
Managing texts, data representation and their semantics
Components: Text DB; DB of Feature Objects; Data Base Module
Feature object fields: Text ID; Start position of the region; End position of the region; Annotator; Content
Operations: Copy and Unification; Specialization by unification
Data representation; Text; Semantics
Text fragment in figure: Ubiquitin E is bound with
Fine grained units of information
Context dependency
Persistent nature of knowledge and information
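
A feature object as listed on this slide (text ID, start and end position of a region, annotator, content) can be read as a stand-off annotation over the text archive, with unification used to specialize its content. The following is a minimal sketch under that reading, not the system's actual data model; the class `FeatureObject`, the helpers `unify`, `specialize`, and `region_text`, and the sample text are all assumptions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FeatureObject:
    text_id: str    # which text in the archive the region belongs to
    start: int      # start position of the region (character offset)
    end: int        # end position of the region (exclusive)
    annotator: str  # tool or person that produced the annotation
    content: dict   # feature structure describing the region


def unify(a, b):
    """Unify two feature dicts; return None if any feature conflicts."""
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            sub = unify(result[key], value)
            if sub is None:
                return None
            result[key] = sub
        elif result[key] != value:
            return None
    return result


def specialize(obj, extra):
    """Copy obj with its content unified with extra, or None on conflict."""
    merged = unify(obj.content, extra)
    return None if merged is None else replace(obj, content=merged)


def region_text(text, obj):
    """Return the stretch of text that the feature object points at."""
    return text[obj.start:obj.end]


if __name__ == "__main__":
    text = "Ubiquitin binds the substrate."  # hypothetical sample text
    obj = FeatureObject("doc1", 0, 9, "tagger", {"type": "protein"})
    print(region_text(text, obj))
    print(specialize(obj, {"name": "Ubiquitin"}))
    print(specialize(obj, {"type": "gene"}))
```

Keeping annotations separate from the text lets several annotators describe the same region without modifying the archived document.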

17 Demo (The website demo is not available now.)