Applications of Text Mining

Applications of Text Mining
Ewan Klein
School of Informatics & NeSC

Text Mining
Goals:
  Extract useful information from large bodies of unstructured or semi-structured documents
  Look for patterns in natural language text
  Driven by application needs
Three Areas:
  Adding Metadata
    E.g., identify Dublin Core elements from document headers
  Information Extraction
    Identify nuggets of text data and marshal them into a fixed format
  Assisting Curation
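As a rough illustration of the "Adding Metadata" area, the sketch below reads "Label: value" header lines and marshals them into a fixed record keyed by Dublin Core element names. The header labels, the mapping table, and the sample document are assumptions made up for this example; real document headers are rarely this regular.

```python
import re

# Hypothetical mapping from header labels to Dublin Core element names.
HEADER_TO_DC = {
    "title": "dc:title",
    "author": "dc:creator",
    "subject": "dc:subject",
}

# Matches simple "Label: value" header lines.
HEADER_LINE = re.compile(r"^(?P<label>[A-Za-z]+):\s*(?P<value>.+)$")


def extract_dublin_core(text):
    """Collect Dublin Core elements from the header block of a document.

    The header block is assumed to end at the first blank line.
    """
    record = {}
    for line in text.splitlines():
        if not line.strip():
            break  # first blank line ends the header block
        match = HEADER_LINE.match(line.strip())
        if match is None:
            continue  # not a "Label: value" line
        element = HEADER_TO_DC.get(match.group("label").lower())
        if element is not None:
            record[element] = match.group("value").strip()
    return record


# Toy input, invented for this sketch.
sample = (
    "Title: Applications of Text Mining\n"
    "Author: Ewan Klein\n"
    "Subject: text mining\n"
    "\n"
    "Body text starts here."
)

if __name__ == "__main__":
    print(extract_dublin_core(sample))
```

The output is a flat dictionary, which is one simple way to read "a fixed format"; a database row or an XML record would serve equally well.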

Text Mining and Curation
Example workflow:
  Make an observation
  Search the research literature for knowledge
  Incorporate relevant information into database
Challenges:
  Current Information Retrieval (IR) techniques are often too imprecise
    Which enzymes act as catalysts in the glycolysis pathway?
    We want to identify a relation between two entities
  Move to augmenting IR with more knowledge of text structure
    Mostly supervised machine learning techniques
    Still need training data for each domain
  Need to integrate text mining into Grid applications
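The gap between keyword retrieval and relation identification can be shown with a toy sketch. It is a hand-written pattern matcher, not the supervised machine learning approach the slide mentions, and the enzyme lexicon, the trigger pattern, and the example sentences are all assumptions chosen for illustration.

```python
import re

# Toy lexicon; a real system would draw on a curated resource.
ENZYMES = {"hexokinase", "phosphofructokinase", "pyruvate kinase"}

# Trigger verb signalling a catalysis relation (British and American spellings).
TRIGGER = re.compile(r"\bcataly[sz]e[sd]?\b")


def keyword_search(sentences, keyword):
    """Plain IR-style lookup: every sentence that mentions the keyword."""
    return [s for s in sentences if keyword in s.lower()]


def extract_relations(sentences, pathway="glycolysis"):
    """Return (enzyme, 'catalyst_in', pathway) triples.

    A triple is emitted only when a sentence mentions the pathway,
    a trigger verb, and at least one enzyme from the lexicon.
    """
    relations = []
    for sentence in sentences:
        lowered = sentence.lower()
        if pathway not in lowered or not TRIGGER.search(lowered):
            continue
        for enzyme in sorted(ENZYMES):
            if enzyme in lowered:
                relations.append((enzyme, "catalyst_in", pathway))
    return relations


# Invented example sentences.
sentences = [
    "Hexokinase catalyses the first step of the glycolysis pathway.",
    "Glycolysis takes place in the cytosol.",
    "Pyruvate kinase catalyzes the final step of glycolysis.",
    "Hexokinase is expressed in many tissues.",
]

if __name__ == "__main__":
    print(len(keyword_search(sentences, "glycolysis")))  # keyword hits
    print(extract_relations(sentences))                  # entity relations
```

Keyword search returns every sentence mentioning glycolysis, including one that names no enzyme, while the relation extractor returns only enzyme and pathway pairs. The hand-written pattern is brittle, which is part of why trained models, and therefore training data for each domain, are needed.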

BlueDwarf for Text Mining
BioCreative Competition
  Joint entry with Stanford
  Recognition of drug names, chemical names, and protein names in MEDLINE abstracts
Java maximum entropy tagger
  Used roughly 700,000 features in the early stages
  Java memory size of 1950 MB
  Died on available Informatics and Stanford machines
BlueDwarf
  Arrived at 1,247,77 features, memory: 2560 MB
  Several experiments running in parallel
  Provisional results: we obtained top-scoring results
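To give a feel for why the feature count of such a tagger grows so quickly, here is a minimal sketch of binary indicator feature extraction for token tagging. The feature templates (word identity, word shape, prefix, suffix, neighbouring words) are common choices assumed for illustration; they are not the feature set of the BioCreative entry, and the sketch is in Python rather than the Java used there. The classifier itself is omitted.

```python
def word_shape(token):
    """Collapse a token to a coarse shape, e.g. 'IL-2' -> 'X-d'."""
    shape = []
    for ch in token:
        if ch.isupper():
            code = "X"
        elif ch.islower():
            code = "x"
        elif ch.isdigit():
            code = "d"
        else:
            code = ch  # keep punctuation as-is
        # Merge runs of the same character class.
        if not shape or shape[-1] != code:
            shape.append(code)
    return "".join(shape)


def token_features(tokens, i):
    """Return the set of indicator features active for tokens[i]."""
    token = tokens[i]
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    return {
        "word=" + token.lower(),
        "shape=" + word_shape(token),
        "prefix3=" + token[:3].lower(),
        "suffix3=" + token[-3:].lower(),
        "prev=" + prev_tok,
        "next=" + next_tok,
    }


def count_distinct_features(sentences):
    """Count distinct indicator features over tokenised sentences."""
    seen = set()
    for tokens in sentences:
        for i in range(len(tokens)):
            seen |= token_features(tokens, i)
    return len(seen)


if __name__ == "__main__":
    tokens = ["The", "p53", "protein", "binds", "IL-2"]
    print(sorted(token_features(tokens, 1)))
    print(count_distinct_features([tokens]))
```

Each distinct string becomes one model parameter per tag, so every new word, prefix, suffix, or context seen in the training abstracts adds features, and the weight table grows with them.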