Text Mining: Opportunities and Barriers
John McNaught, Deputy Director, National Centre for Text Mining

Topics
What is text mining? (briefly)
What can it offer? (selectively)
What are the obstacles? (mostly)

NaCTeM
First publicly funded (JISC) national text mining centre in the world
Remit: provide services to the research community
Initial focus on biology, then social sciences, medicine, chemistry, …
Processing on a large scale, e.g. for UKPMC (Wellcome Trust + 17 other funders)

What is text mining?
Goal: Discover new knowledge from old
How:
  – Process very large amounts of text
      Millions of documents, the more the better
  – Identify and extract information
  – (Link extracted information to already curated knowledge)
  – Mine to discover implicit significant associations
  – Flag (unknown) associations for researcher to investigate further
  – Spin-off on the way: render information explicit
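
The "mine, then flag" steps can be sketched in a few lines. This is a minimal illustration, not NaCTeM's method: it assumes extraction has already reduced each document to a set of terms, and it treats two terms as implicitly associated when they never share a document but both co-occur with a common bridging term. The corpus `DOCS` and the function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus (hypothetical): each document is the set of terms that an
# earlier extraction step is assumed to have identified in it.
DOCS = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"blood viscosity", "raynaud disease"},
    {"platelet aggregation", "raynaud disease"},
    {"fish oil", "blood viscosity"},
]


def cooccurrence(docs):
    """Map each term to the set of terms it shares a document with."""
    links = defaultdict(set)
    for terms in docs:
        for a, b in combinations(sorted(terms), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def implicit_associations(docs, start):
    """Find terms never seen with `start` but reachable via a shared term.

    Returns a dict mapping each candidate term to the sorted list of
    bridging terms that connect it to `start`.
    """
    links = cooccurrence(docs)
    direct = links.get(start, set())
    candidates = defaultdict(set)
    for bridge in direct:
        for other in links[bridge]:
            # Skip the start term itself and anything already explicit.
            if other != start and other not in direct:
                candidates[other].add(bridge)
    return {term: sorted(bridges) for term, bridges in candidates.items()}


if __name__ == "__main__":
    # Flag each implicit association for a researcher to investigate.
    for term, bridges in implicit_associations(DOCS, "fish oil").items():
        print(f"flag for review: fish oil ~ {term} (via {', '.join(bridges)})")
```

A real pipeline would score candidates statistically and link terms to curated resources before flagging them; the sketch only shows why more documents give the mining step more bridges to work with.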

From text to new knowledge

What does it offer?
Finds unsuspected knowledge
  – E.g. disease-gene associations
Enables discoveries human effort could not achieve (information overload/overlook)
Enables better search/navigation of literature
  – Semantic search via extracted semantic metadata
Reduces time spent searching
  – 15-48% of researcher time spent on classic search, 20-50% of classic searches unsatisfied
      E.g. systematic reviews: months to weeks

What does it offer?
Text mining boosts research
  – Makes research possible that would otherwise be impossible or unfeasible
Research drives growth and innovation
Research produces more information
More information is available for text mining
Text mining boosts research …

Barriers
Access to the literature
Format issues (tied to next point…)
  – “PDF is evil” (Lynch)
Main blocks: copyright and licensing issues
  – <8% of scientific claims found in full article appear in its abstract (Blake)
  – Abstracts deficient on argumentation, discussion, methods, background, …
  – Full texts needed to realise full benefits of TM

Barriers
Need to copy documents to analyse them
Licences typically not favourable to TM
Licences established on per institution basis
  – Prevents community-oriented services
      Results only for internal use by institutional users
  – Hinders mining over collections of content from different providers
Inconsistency: human can search and manually analyse, but cannot use machine to do same job on same data already subscribed to

Barriers
Problem even with liberal OA licences
  – Author attribution required
      Author attribution in a data mining environment is impossible/unfeasible
  – Association finding: cannot track positive, negative, neutral individual author contributions
Derived works in a TM environment
  – Every author of every text processed to produce new derived knowledge may have a claim…
  – Rights clearance thus an effective barrier

Barriers
Laudable effort 1: NESLi2 model licence (JISC Collections) allows TM
  – Publisher <> single institution
  – But how many publishers retain TM provisions?
  – But cannot display annotations produced by TM on document itself
Laudable effort 2: NPG licence for self-archived content allows TM
  – But “content must be destroyed when experiment complete” is vague. So services for community?

Conclusion
Copyright and licensing restrictions block full realisation of TM benefits
  – Economic savings and potential for growth are stifled
Japan has introduced an information analysis exception to copyright law
  – National Diet Library (= British Library) has recently changed its motto to: “Through knowledge we prosper”
  – Can we say the same in the UK?

Extras

Info = degree of surprise
Finding unknown associations: reproducing a discovery reported 5 days ago in Nature Medicine
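
One common way to make "degree of surprise" concrete is Shannon self-information, with pointwise mutual information (PMI) as the matching score for a pair of terms. The slide does not say which measure the system uses, so the sketch below is an assumption for illustration only; the function names and the counts in the usage lines are invented.

```python
import math


def surprisal(probability):
    """Self-information in bits: rarer events carry more information."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def pmi(pair_count, count_a, count_b, total_docs):
    """Pointwise mutual information (bits) of two terms from document counts.

    Positive values mean the terms co-occur more often than chance would
    predict; zero means they look independent.
    """
    if min(pair_count, count_a, count_b, total_docs) <= 0:
        raise ValueError("all counts must be positive")
    # p(a,b) / (p(a) * p(b)) simplifies to this ratio of raw counts.
    ratio = (pair_count * total_docs) / (count_a * count_b)
    return math.log2(ratio)


if __name__ == "__main__":
    # Hypothetical counts: terms in 10 documents each, out of 100 in total.
    print(pmi(8, 10, 10, 100))  # co-occur in 8 documents
    print(pmi(1, 10, 10, 100))  # co-occur in 1 document (chance level)
    print(surprisal(0.125))
```

Under this reading, an association is worth flagging when its score is high yet it is absent from curated knowledge.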

UKPMC EvidenceFinder by NaCTeM: Questions generated by deep analysis, with known answers

Click on a question to see relevant extracted evidence (from OA subset of the archive)