Supporting the Research Process: The NaCTeM Text Mining Service
William Black, Informatics, Manchester

Contents
- What is Text Mining? What is NaCTeM?
- Approaches/Methods
- Text Mining Tasks: IE, Argumentative Zoning, Terminology Discovery
- End-user services for researchers
- NaCTeM activities with social scientists

What is Text Mining?
- Knowledge discovery from textual sources
  - Primary sources: documents, news, Web
  - Scientific literature
- Using NLP, ontologies and IR on a large scale

What is the Text Mining Centre?
- Established in 2004 in response to a JISC/EPSRC/BBSRC initiative
- A Manchester and Liverpool collaboration
  - Formerly also UMIST and Salford
  - Accommodated in the Manchester Interdisciplinary Biocentre (MIB)
- Develops a variety of national services based on the application of text mining to the biological sciences, with deployment from Autumn 2006
- Initially in the biological sciences, with a second focus on social science during

Text Mining - Approaches
- Distinguished from IR by semantic analysis leading to the extraction of entities, facts and events, not merely documents.
- Distinguished from the Semantic Web by its use of automated analysis based on robust natural language processing.
- A wide variety of methods and analyses, ranging from domain-independent to domain-specific.

Methods of Text Mining
- Pipelined processes performing increasing levels of analysis, common to all approaches:
  - Document structure analysis, tokenization, tagging, phrasal chunking, named entity recognition/classification, fact and event extraction
  - The results are indexed to provide conceptual IR services
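The pipelined architecture described above can be sketched as a chain of functions, each adding one layer of annotation to a shared document record. This is a minimal illustration under stated assumptions: the regex tokenizer, the tiny tag lexicon (`LEXICON`) and the dictionary (`GAZETTEER`) are hypothetical stand-ins for the trained components a real service would use, and fact/event extraction and indexing are left out.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Shared record that each pipeline stage enriches with one more layer."""
    text: str
    tokens: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    chunks: list = field(default_factory=list)
    entities: list = field(default_factory=list)

# Hypothetical toy resources standing in for trained tagger and NER models.
LEXICON = {"the": "DT", "protein": "NN", "binds": "VBZ", "inhibits": "VBZ", "to": "TO"}
GAZETTEER = {"p53": "PROTEIN", "MDM2": "PROTEIN"}

def tokenize(doc):
    # Split into word tokens and single punctuation characters.
    doc.tokens = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc

def tag(doc):
    # Lexicon lookup; unknown words default to NNP, punctuation is tagged as itself.
    doc.tags = [LEXICON.get(t.lower(), "NNP" if t[0].isalnum() else t) for t in doc.tokens]
    return doc

def chunk(doc):
    # Group maximal runs of determiners and nouns into noun-phrase chunks.
    chunks, current = [], []
    for tok, tg in zip(doc.tokens, doc.tags):
        if tg in ("DT", "NN", "NNP"):
            current.append(tok)
        elif current:
            chunks.append(" ".join(current))
            current = []
    if current:
        chunks.append(" ".join(current))
    doc.chunks = chunks
    return doc

def recognise_entities(doc):
    # Dictionary-based NER: any token listed in the gazetteer is an entity.
    doc.entities = [(t, GAZETTEER[t]) for t in doc.tokens if t in GAZETTEER]
    return doc

# Each stage consumes the output of the previous one.
PIPELINE = [tokenize, tag, chunk, recognise_entities]

def analyse(text):
    doc = Doc(text)
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

doc = analyse("MDM2 binds to the protein p53.")
print(doc.chunks)    # ['MDM2', 'the protein p53']
print(doc.entities)  # [('MDM2', 'PROTEIN'), ('p53', 'PROTEIN')]
```

Keeping every stage behind the same `Doc -> Doc` signature is what makes the pipeline extensible: a fact-extraction or indexing stage could be appended to `PIPELINE` without touching the earlier ones.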

Sample text mining sub-tasks
- Named entity recognition and classification
- Terminology discovery and ontology maintenance
- Information extraction (IE) in limited domains, for intelligence analysts and scientists
- Summarization: informative, tailored, multilingual, multi-document
- Open-domain IE and question answering (QA)
- Association mining over databases of extracted facts
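The last sub-task, association mining over databases of extracted facts, can be illustrated by counting how often pairs of entities co-occur across fact records and scoring each pair with support and lift, two standard association-rule measures. This is a sketch under assumptions: facts are taken to be stored as sets of entity names, the records are invented, and nothing here describes a specific NaCTeM tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical extracted facts: each record lists the entities mentioned in one fact.
facts = [
    {"p53", "MDM2"},
    {"p53", "MDM2", "apoptosis"},
    {"p53", "apoptosis"},
    {"BRCA1", "DNA repair"},
    {"p53", "MDM2"},
]

def mine_associations(facts, min_support=0.2):
    """Return {(a, b): (support, lift)} for entity pairs meeting min_support."""
    n = len(facts)
    single = Counter(e for f in facts for e in f)
    # Sort each record so every pair is counted under one canonical order.
    pairs = Counter(p for f in facts for p in combinations(sorted(f), 2))
    results = {}
    for (a, b), count in pairs.items():
        support = count / n
        if support >= min_support:
            # Lift > 1: the pair co-occurs more often than independence would predict.
            lift = support / ((single[a] / n) * (single[b] / n))
            results[(a, b)] = (support, lift)
    return results

for pair, (support, lift) in sorted(mine_associations(facts).items()):
    print(pair, round(support, 2), round(lift, 2))
```

Raising `min_support` prunes rare pairs; lift then separates genuinely associated entities from pairs that co-occur only because both are frequent.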

Illustrations of IE (on successive full-page screenshots)
- Named entity phrase bracketing
- Named entity extraction
- Fact extraction and slot filling
- An application to the research literature
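Fact extraction and slot filling can be sketched as matching a pattern over text in which named entities have already been bracketed, then copying the matched entities into the slots of a template. The `[TYPE text]` bracket notation, the single hypothetical `BindingFact` template and its regular expression are assumptions made for illustration. Real IE systems use far richer grammars or learned extractors.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class BindingFact:
    """Hypothetical template with two required slots and one optional slot."""
    agent: str
    target: str
    location: Optional[str] = None

# Entities are assumed to be pre-bracketed as [TYPE text] by an earlier NER stage.
BINDING = re.compile(
    r"\[PROTEIN (?P<agent>[^\]]+)\] binds(?: to)? \[PROTEIN (?P<target>[^\]]+)\]"
    r"(?: in \[CELL (?P<location>[^\]]+)\])?"
)

def fill_slots(bracketed_text):
    """Return one BindingFact per match; an unmatched optional slot stays None."""
    return [
        BindingFact(m.group("agent"), m.group("target"), m.group("location"))
        for m in BINDING.finditer(bracketed_text)
    ]

text = ("[PROTEIN MDM2] binds to [PROTEIN p53] in [CELL fibroblasts]. "
        "[PROTEIN Grb2] binds [PROTEIN Sos1].")
for fact in fill_slots(text):
    print(fact)
```

The optional `location` slot shows why templates are useful: partially filled facts are still returned, and downstream tools can tell which slots the text did not supply.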

Terminology Discovery (Ananiadou, NaCTeM)
- A form of unsupervised learning whose only required resource is a general-purpose PoS tagger
- Can be applied to text in any language, domain or genre to reveal terminology on the basis of phrasehood and distribution
- TerMine will be among the first NaCTeM tools to be deployed
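Scoring candidate terms "on the basis of phrasehood and distribution" can be illustrated with the C-value measure (Frantzi, Ananiadou and Mima). C-value scores a multi-word candidate by its length and frequency, discounted by how often it occurs nested inside longer candidates. We assume C-value here as a representative method. This sketch is not TerMine's implementation, and it skips the PoS-pattern filtering that would normally produce the candidate list. The candidates and counts are invented.

```python
import math

def is_nested(short, longer):
    """True if word tuple `short` occurs contiguously inside the longer tuple `longer`."""
    n = len(short)
    return len(longer) > n and any(
        longer[i:i + n] == short for i in range(len(longer) - n + 1)
    )

def c_value(candidates):
    """Compute C-value for multi-word candidates given as {word_tuple: frequency}."""
    scores = {}
    for a, freq in candidates.items():
        containers = [b for b in candidates if is_nested(a, b)]
        if containers:
            # Subtract the mean frequency of the longer candidates containing `a`.
            nested_mean = sum(candidates[b] for b in containers) / len(containers)
            scores[a] = math.log2(len(a)) * (freq - nested_mean)
        else:
            scores[a] = math.log2(len(a)) * freq
    return scores

# Hypothetical candidate phrases (word tuples) with corpus frequencies.
candidates = {
    ("basal", "cell"): 12,
    ("basal", "cell", "carcinoma"): 8,
    ("adenoid", "cystic", "basal", "cell", "carcinoma"): 2,
    ("cell", "carcinoma"): 10,
}

for term, score in sorted(c_value(candidates).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```

The `log2(len(a))` factor is zero for single words, which is why the measure is applied to multi-word candidates. The nested-frequency discount is what lets "basal cell carcinoma" outrank its own substring "cell carcinoma".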

Argumentative Zoning (Simone Teufel, Cambridge Computer Laboratory)
- BKG: General scientific background (yellow)
- OTH: Neutral descriptions of others' work (orange)
- OWN: Neutral descriptions of own, new work (blue)
- AIM: Statements of the particular aim of the current paper (pink)
- TXT: Statements of the textual organization of the current paper (red)
- CTR: Contrastive or comparative statements, including explicit mention of weaknesses of other work (green)
- BAS: Statements that own work is based on other work (purple)
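To make the zone scheme concrete, the sketch below assigns each sentence one of the seven zone labels by checking a few cue phrases in a fixed order and falling back to OWN. Teufel's Argumentative Zoning used supervised statistical classification over a range of sentence features. This rule-based version, with hypothetical cue lists, only illustrates the task's input and output, not her method.

```python
import re

# Hypothetical cue patterns per zone, checked in order; the first match wins.
ZONE_CUES = [
    ("AIM", r"\b(in this paper we|our goal|we aim to|the aim of this paper)\b"),
    ("TXT", r"\b(section \d+|the rest of this paper|is organi[sz]ed as follows)\b"),
    ("CTR", r"\b(however|in contrast|unlike|fails? to|suffers? from)\b"),
    ("BAS", r"\b(based on|following|we adopt|we use the method of)\b"),
    ("OTH", r"\b(\w+ et al\.|\w+ \(\d{4}\) (propose|present|describe)s?)"),
    ("BKG", r"\b(it is well known|traditionally|has been widely|in recent years)\b"),
]

def zone(sentence):
    """Assign one Argumentative Zoning label; default to OWN."""
    s = sentence.lower()
    for label, pattern in ZONE_CUES:
        if re.search(pattern, s):
            return label
    return "OWN"

examples = [
    "In this paper we present a new tagger.",
    "The rest of this paper is organized as follows.",
    "However, their approach fails to scale.",
    "Our parser is based on Collins (1999).",
    "Smith et al. describe a rule-based system.",
    "In recent years, text mining has grown rapidly.",
    "We evaluated the system on 500 abstracts.",
]
for s in examples:
    print(zone(s), "|", s)
```

Rule order matters: a sentence with both a contrastive and a citation cue is labelled CTR, which is one reason statistical models that weigh many features work better in practice.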

Argumentative Zoning Example

End-user services based on full NLP and conceptual indexing
- Two conceptual IR services based on prior full-scale NLP analysis of Medline at the Tsujii Lab, University of Tokyo:
  - InfoPubMed: a complex tool supporting a research workflow for literature review and knowledge discovery/hypothesis generation
  - Medie: a simple IR interface, as intuitive as Google, but returning fact-bearing sentences rather than mere document surrogates
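The difference between returning documents and returning fact-bearing sentences can be sketched with an index keyed on predicate-argument triples rather than on words. Here the (subject, verb, object) triples are assumed to come from a prior deep parse, as the slide describes for Medline. The triples, sentence IDs and the `search` interface are hypothetical, and Medie's actual query language and index are not reproduced.

```python
from collections import defaultdict

# Hypothetical output of prior full-scale NLP: each sentence is linked to the
# normalised (subject, verb, object) triple it expresses.
parsed = [
    ("PMID1:s3", "p53 activates the transcription of p21.", ("p53", "activate", "p21")),
    ("PMID2:s1", "p21 is activated by p53 after DNA damage.", ("p53", "activate", "p21")),
    ("PMID2:s4", "MDM2 inhibits p53.", ("MDM2", "inhibit", "p53")),
]

# Conceptual index: triple -> list of (sentence id, sentence text).
index = defaultdict(list)
for sent_id, sentence, (subj, verb, obj) in parsed:
    index[(subj, verb, obj)].append((sent_id, sentence))

def search(subj=None, verb=None, obj=None):
    """Return fact-bearing sentences whose triple matches every given (non-None) slot."""
    hits = []
    for (s, v, o), sentences in index.items():
        if all(q is None or q == x for q, x in ((subj, s), (verb, v), (obj, o))):
            hits.extend(sentences)
    return hits

for sent_id, sentence in search(verb="activate", obj="p21"):
    print(sent_id, sentence)
```

Because the passive sentence is normalised to the same triple as the active one, a single query retrieves both. A keyword index would need the user to anticipate every surface form.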

Gene/gene products you are interested in

Fields
By clicking this button, you can restrict the search fields.
By clicking this button, you can restrict the species.
GeneBoxes

Drag this GeneBox to the Interaction Viewer

Drag this InteractionBox to the ContentViewer

Sentence Box
- Property meaning that the co-occurrence in the sentence is direct evidence of interaction
- Property meaning that the co-occurrence in the sentence is a mere co-occurrence
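The Sentence Box distinguishes sentences in which two genes are linked by an asserted interaction from sentences in which they merely co-occur. InfoPubMed derives this distinction from full parsing. The heuristic below is only a crude stand-in that shows the distinction: it checks whether a verb from a small hypothetical list of interaction stems appears between the two gene mentions.

```python
import re

# Hypothetical interaction verb stems; a real system would use parse structure instead.
INTERACTION_VERBS = ("activat", "inhibit", "bind", "phosphorylat", "interact", "regulat")

def classify_pair(sentence, gene_a, gene_b):
    """Label a gene pair in a sentence as 'interaction' or 'co-occurrence'."""
    # Use the first occurrence of each gene name.
    pos_a, pos_b = sentence.find(gene_a), sentence.find(gene_b)
    if pos_a < 0 or pos_b < 0:
        raise ValueError("both genes must occur in the sentence")
    (_, first_end), (second_start, _) = sorted(
        [(pos_a, pos_a + len(gene_a)), (pos_b, pos_b + len(gene_b))]
    )
    # Only inspect the words lying strictly between the two mentions.
    words = re.findall(r"[a-z]+", sentence[first_end:second_start].lower())
    if any(w.startswith(INTERACTION_VERBS) for w in words):
        return "interaction"
    return "co-occurrence"

print(classify_pair("MDM2 binds p53 in vivo.", "MDM2", "p53"))
print(classify_pair("p53 is phosphorylated by ATM.", "ATM", "p53"))
print(classify_pair("MDM2 and p53 levels were measured.", "MDM2", "p53"))
```

Surface heuristics like this misfire on negation ("does not bind") and on long sentences with several verbs, which is exactly where the parse-based property shown in the Sentence Box earns its keep.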

Possible end-user service based on AZ: more than Google's PageRank, because the links are typed.
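One way to read "more than PageRank, because the links are typed" is a PageRank variant in which each citation passes on rank in proportion to a weight for its AZ type. Under this reading, being built upon (BAS) counts for more than being contrasted against (CTR). The type weights, damping factor and citation graph below are all hypothetical. The slide proposes a possible service, not this algorithm.

```python
# Hypothetical weights in [0, 1]: how much a citation of each AZ type endorses the cited paper.
TYPE_WEIGHTS = {"BAS": 1.0, "OTH": 0.5, "CTR": 0.2}

def typed_pagerank(citations, damping=0.85, iterations=100):
    """citations: list of (citing, cited, az_type). Returns {paper: score} summing to 1."""
    papers = sorted({p for citing, cited, _ in citations for p in (citing, cited)})
    n = len(papers)
    out_count = {p: 0 for p in papers}
    kept = {p: 0.0 for p in papers}  # total type weight of each paper's outgoing links
    for citing, _, az_type in citations:
        out_count[citing] += 1
        kept[citing] += TYPE_WEIGHTS[az_type]
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Rank not passed along typed links (papers citing nothing, or low-weight
        # links) is redistributed uniformly, so the scores keep summing to 1.
        leaked = sum(
            rank[p] if out_count[p] == 0 else rank[p] * (1 - kept[p] / out_count[p])
            for p in papers
        )
        new = {p: (1 - damping) / n + damping * leaked / n for p in papers}
        for citing, cited, az_type in citations:
            new[cited] += damping * rank[citing] * TYPE_WEIGHTS[az_type] / out_count[citing]
        rank = new
    return rank

# Y is cited three times and X only twice, but X is the one others build upon.
citations = [
    ("A", "X", "BAS"), ("B", "X", "BAS"),
    ("C", "Y", "CTR"), ("D", "Y", "CTR"), ("E", "Y", "OTH"),
]
ranks = typed_pagerank(citations)
for paper, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(paper, round(score, 3))
```

Plain PageRank on the same graph would favour Y, the more frequently cited paper. The typed weights reverse that ordering, which is the kind of distinction an AZ-based service could surface.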

NaCTeM and Social Science/Humanities
- In Year 3 (from Oct 2006), develop a pilot service aimed at social science
- Local links with NCESS
- Preparatory invited workshop held in May
- Text-mining and Digitised C19th Research Resources Workshop with the British Library

Workshop on Text Mining in Social Sciences
Presentations available at the NaCTeM Web page:
- Bridging qualitative and quantitative methods for social sciences using text mining techniques (Sophia Ananiadou)
- Text Mining Activities at the National Centre (Sophia Ananiadou, Jun'ichi Tsujii, Paul Watry)
- Smart Qualitative Data: Methods and Community Tools for Data Mark-Up (SQUAD) (Louise Corti)
- Author Identification (Katerina T. Frantzi)
- Sentiment Analysis and Financial Grids (Lee Gillam)
- Concordances and semi-automatic coding in qualitative analysis: possibilities and barriers (Graham R. Gibbs)
- Bridging quantitative and qualitative methods for social sciences using text mining techniques (Tetsuya Nasukawa)
- Computer-Assisted Content Analysis (Andrew Wilson)

NaCTeM status
- NaCTeM is almost at the end of its tool development phase
- Moving to deployment of services this Autumn
- Will include domain-independent terminology management from the outset
- Other applications of interest to social science researchers will appear approximately one year from now