1 NFAIS Annual Conference 2004 Text Mining and the New Breed of Licensee: The Information Provider’s Perspective February 23, 2004 Jane L. Rosov National.


1 NFAIS Annual Conference 2004
Text Mining and the New Breed of Licensee: The Information Provider’s Perspective
February 23, 2004
Jane L. Rosov, National Library of Medicine
301-496-7706, janer@nlm.nih.gov

2 Where is NLM?
Department of Health and Human Services
Public Health Service
National Institutes of Health
National Library of Medicine
Library Operations
Bibliographic Services Division
MEDLARS Management Section

3 Agenda
Brief overview of NLM’s data distribution program
Who leases our data
Recent changes increased interest in leasing MEDLINE® and streamlined distribution processes
Examples of mining projects
How the Library has adapted to the increased demand for leasing its data

4 NLM Mission
To collect, organize, and disseminate the world’s health-related and biomedical information

5 Dissemination of Data
Web-based products and services
● MEDLINEplus®
● LOCATORplus
● TOXNET®
● ClinicalTrials.gov
● Unified Medical Language System®
● NLM Gateway
● Entrez retrieval system including PubMed®/MEDLINE
Data distribution (leasing) program

6 What is MEDLINE?
12 million+ biomed and life science journal citations
Worldwide coverage; currently 4,700 journals
Advisory committee recommends titles
Includes abstracts if present in the published journals
Controlled vocabulary: Medical Subject Headings
~600,000 new records in 2004
New and revised records daily; annual MeSH® changes
Does not include full text of article
Primary component of PubMed

7 Additional Records in PubMed
Slightly broader journal coverage in life sciences
Citations prior to date journal selected for MEDLINE
Citations to out-of-scope-for-MEDLINE articles
In process records
OLDMEDLINE
~98% of PubMed records are exported

8 Data Distribution Web Pages
Prospective licensees: http://www.nlm.nih.gov/databases/leased.html
Existing licensees: http://www.nlm.nih.gov/bsd/licensee.html
● Paperwork
● DTDs that define the XML format
● Files containing sample records
● Information about distribution media
● Announcements
● Documentation

9 Key Elements of NLM’s Licensing Program
Standard licenses - basic and non-US research-only; no customizing
No charges - funded by NLM Appropriations
No search software - data only
US licensees may redistribute - clauses in license to ensure accuracy and currency, etc.
Advised to consult with legal counsel on re-use of abstracts
● NLM does not claim copyright but others might
● Citations and MeSH Headings are in the public domain

10 Intended Use Worksheet
Indicate databases to lease
Categorize and briefly describe intended use of the data
Indicate organization type
Data used in summary form for reports to Congress

11 Alternatives To Leasing
Web links to PubMed
Downloading from PubMed using utilities
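The slide does not name the utilities. As a minimal sketch, assuming it refers to NCBI's Entrez Programming Utilities (E-utilities) and their present-day `esearch.fcgi` and `efetch.fcgi` endpoints, a small download could start from request URLs like the ones built below. The helper names and the PMIDs in the usage note are hypothetical, and the code only builds URLs; it does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: the "utilities" on the slide are NCBI's E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, retmax=20):
    """Build a search URL that returns up to `retmax` matching PubMed IDs."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids):
    """Build a fetch URL that returns the given records as XML."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "retmode": "xml",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

For example, `build_esearch_url("text mining", retmax=5)` gives a URL whose response lists matching PMIDs, and `build_efetch_url([111, 222])` (placeholder IDs) gives one that returns those records. This route suits small result sets; leasing remains the option for working with the whole database.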


13 Then and Now
International MEDLARS Centers → Redistributors → Researchers
FIRST: International MEDLARS Centers
● Public institutions - bilateral agreements with NLM to perform as biomedical information resource centers in their countries
● Encouraged to provide access to NLM’s data - particularly important when telecommunications for worldwide online access to data in US was less advanced
THEN: Redistributors
Including ARIES, BRS (now OVID), Cambridge Scientific Abstracts (now CSA), Dialog, SilverPlatter, EBSCO, and many others

14 Then and Now (cont.)
International MEDLARS Centers → Redistributors → Researchers
NOW: Researchers
● Academic institutions or biotechnology, pharmaceutical and software development companies
● Mine MEDLINE to discover new clinical, public health and health services information or develop better software to assist in the scientific research

15 Non-US Research-Only License
2001
Internal use
No commercial redistribution of records


17 Self-Described Intended Use - 2003
Total MEDLINE Licensees: 219
Research purposes: 152 = 69%
Data / Text mining: 157 = 72%
Both: 123 = 56%
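The percentages on this slide can be checked with a few lines of arithmetic. The sketch below recomputes them from the counts given and, because the two categories overlap, also derives how many licensees reported at least one of the two uses. That last figure is not on the slide; it follows from the counts by inclusion-exclusion.

```python
# Counts taken from the slide.
total = 219
research = 152
mining = 157
both = 123


def pct(count, whole):
    """Percentage of `whole`, rounded to the nearest integer."""
    return round(100 * count / whole)


# Inclusion-exclusion: licensees reporting research, mining, or both.
either = research + mining - both

print(pct(research, total), pct(mining, total), pct(both, total))
print(either, pct(either, total))
```

The first line of output reproduces the slide's 69, 72 and 56. The second shows that 186 of the 219 licensees, about 85%, described at least one of the two uses.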


19 Why Growth and Switch From Redistributors To Researchers?
Events outside NLM
● Recent developments in computer technology and informatics
● Boom of the biotechnology industry
● Mapping of the human genome
● Increasing volume of information
● More researchers are seeking cures to disease or are developing new tools to support this research
Reinvention at NLM

20 NLM’s Reinvention
Purpose:
Move from outmoded and expensive legacy mainframe-based systems to a more flexible, powerful, and maintainable system to support streamlined internal processing and innovative new services
Result:
New software environment for building and maintaining MEDLINE

21 New Data Creation and Maintenance System
Enabled changes:
Distribution media - transition from old tape technology to state-of-the-art tapes and FTP
Distribution format - transition from legacy data format to widely accepted XML format
Use of the new media and format is more time- and cost-efficient for NLM and licensees

22 Benefits of Reinvention: Distribution Media
Through 2000:
● Tapes from mainframe
● 150+ tapes for 10 million records
● 10 hours for one set
● Weekly or monthly updates
● Cost recovery
2001 After Reinvention:
● State-of-the-art DLT tapes and FTP
● Single tape for 12+ million records
● 4.5 hours per tape
● Updates 5 days per week via FTP
● No cost

23 Benefits of Reinvention: Distribution Media
In 2002 FTP alternative to DLT tape
FTP times
● 3 quickest times for download: ½ hour, 1 hour, 2 hours (all in the US)
● 3 longest times for download: 25 hours, 17.5 hours, 15.5 hours (all outside the US)
Strong preference for FTP - ~70% in 2004 instead of DLT tape

24 Getting Files From NLM’s Public FTP Server
Hidden directories
Restrict access to data files by IP address
Access available 23 hours every day
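The slide does not say how the IP restriction is implemented. As an illustration of the general idea only, the sketch below checks a client address against an allowlist of registered addresses using Python's standard `ipaddress` module. The networks are reserved documentation ranges, not real licensee addresses, and the names `ALLOWED_NETWORKS` and `is_allowed` are hypothetical.

```python
import ipaddress

# Hypothetical allowlist: one registered network and one single host.
# These are reserved documentation ranges, not real licensee addresses.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if `client_ip` falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

Under this scheme a licensee would register the address or address range its downloads come from, and requests from anywhere else would be refused even if the directory path were known.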

25 Benefits of Reinvention: Distribution Format
Transition from legacy homegrown ELHILL® Unit Record Format to widely accepted and documented XML format
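One practical benefit of an XML distribution format is that licensees can read records with a standard parser instead of writing a custom one. The sketch below parses a made-up record with Python's standard library. The element names are assumed to resemble those in NLM's MEDLINE DTD and the record content is invented for illustration; the DTDs and sample files on the licensee pages are the authoritative reference.

```python
import xml.etree.ElementTree as ET

# Invented sample record; element names are assumptions modeled on the MEDLINE DTD.
SAMPLE = """<MedlineCitationSet>
  <MedlineCitation>
    <PMID>0000001</PMID>
    <Article>
      <ArticleTitle>Hypothetical title for illustration.</ArticleTitle>
    </Article>
    <MeshHeadingList>
      <MeshHeading><DescriptorName>MEDLINE</DescriptorName></MeshHeading>
      <MeshHeading><DescriptorName>Information Storage and Retrieval</DescriptorName></MeshHeading>
    </MeshHeadingList>
  </MedlineCitation>
</MedlineCitationSet>"""


def parse_citations(xml_text):
    """Extract PMID, title, abstract (if any) and MeSH descriptors per record."""
    root = ET.fromstring(xml_text)
    records = []
    for citation in root.iter("MedlineCitation"):
        records.append({
            "pmid": citation.findtext("PMID"),
            "title": citation.findtext("Article/ArticleTitle"),
            # findtext returns None when the record has no abstract.
            "abstract": citation.findtext("Article/Abstract/AbstractText"),
            "mesh": [d.text for d in citation.iter("DescriptorName")],
        })
    return records
```

Calling `parse_citations(SAMPLE)` returns one dictionary whose `abstract` is `None`, since abstracts are included only when present in the published journal.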

26 Examples Of Mining Projects
Academic Researchers
Bio-acronyms and abbreviations, bio-relations, and proteins
Gene, drug, and disease relationships
Interfaces for performing efficient and effective searches
Identification, prevention, and treatment of emerging infectious disease or biothreat
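The slides do not describe how any of these projects work. To give a rough sense of what mining abbreviations from abstracts can involve, here is a deliberately simple heuristic, not any licensee's method: find an upper-case abbreviation in parentheses and accept it when the initials of the preceding words spell it out. The function name and sample sentence are invented for illustration.

```python
import re


def find_abbreviations(text):
    """Return (abbreviation, long form) pairs such as ('PCR', 'polymerase chain reaction')."""
    pairs = []
    for match in re.finditer(r"\(([A-Z]{2,10})\)", text):
        abbr = match.group(1)
        # Take as many words before the parenthesis as the abbreviation has letters.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        initials = "".join(word[0].upper() for word in candidate)
        if len(candidate) == len(abbr) and initials == abbr:
            pairs.append((abbr, " ".join(candidate)))
    return pairs


sample = (
    "Tumor necrosis factor (TNF) levels rose. "
    "The polymerase chain reaction (PCR) assay was used (NA)."
)
print(find_abbreviations(sample))
```

On the sample sentence this finds TNF and PCR and skips "(NA)", whose preceding words do not match. Real abstracts need far more robust handling (mixed case, digits, skipped function words), which is part of why such projects benefit from access to the full data set.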

27 Examples Of Mining Projects (cont.)
Biotechnology Companies
Gene-to-gene interactions and connection to diseases and/or existing drugs
Vaccine research and antibacterial drug discovery

28 Examples Of Mining Projects (cont.)
Pharmaceutical Companies
Support drug discovery and development efforts
Internal access to add unique value to their previously derived data; privacy concerns with external web access

29 Examples Of Mining Projects (cont.)
Software Developers
Data mining methods to help uncover gene/disease relationships leading to discovery of new drugs

30 Winning Combination
High quality content
Identifiable new areas of outside interest
Well-accepted data distribution format
Inexpensive, easy-to-use distribution media

