Towards Evidence-Based Discovery Catherine Blake School of Information and Library Science University of North Carolina at Chapel Hill

1 Towards Evidence-Based Discovery Catherine Blake School of Information and Library Science University of North Carolina at Chapel Hill http://www.ils.unc.edu/~cablake cablake@email.unc.edu

2 Motivation
Relentless increase in electronically available text
–Life Sciences: 17 millionth entry added in April 2007; 5,200 journals indexed; 12,000 new articles each week!
–Chemistry: more than 110,000 articles in 1 year alone
Consequences:
–Hundreds of thousands of relevant articles
–Implicit connections between literature go unnoticed
Shift from Retrieval to Synthesis

3 Information Overload
“One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world” - Barnaby Rich, 1613

4 Evidence-Based Discovery
“If I have seen further than others, it is by standing upon the shoulders of giants.” Sir Isaac Newton
“We can't solve problems using the same kind of thinking we used when we created them.” Albert Einstein
Goal: Facilitate Discovery from Text
–Facilitate: to make easy or easier (1)
–Discovery: a productive insight (1)
(1) American Heritage Dictionary

5 Education Discovery Science Evidence-based Practice Natural Language Processing Human Discovery and Synthesis Human-assisted Discovery and Synthesis Heterogeneous Literature Core Chemistry Breast Cancer Genomics Synthesis and Discovery Work Practices News DocSouth

6 Outline
Motivation
Case Studies
–METIS
  Human synthesis
  Natural language processing
–Claim Jumping through Scientific Literature
Next Steps
Summary

7 Systematic Review
Process
–Formulate the problem
–Locate and select studies
–Assess quality of studies
–Collect data
–Analyze and present results
–Interpret results
–Improve and update review
28 months from initial idea to publication
Increased demand due to evidence-based medicine

8 Manual Synthesis Select Extract Analyze Verify Guesswork guided by scientifically trained intuition Rescher (1978)

9 Context Information
Study Information –e.g. date, location, ...
Population Information –e.g. gender, age, ...
Risk Factor or Intervention –e.g. duration of exposure, confounders
Disease –e.g. stage, confounders
Axis labels: Loosely coupled to review focus; Tightly coupled to review focus

10 Collaborative Information Synthesis

11 Key: Estimate Missing Information
1. What are people with Breast Cancer exposed to?
2. What are people in a similar population exposed to?
3. Are these rates significantly different?
Studies with Breast Cancer patients: facts for each study (number of patients, age of patients, geographic location, risk-factor exposure, …)
Database of risk factors (BRFSS): codebook (question asked; age, gender; % responses)
T. Tengs & N. D. Osgood (2001) “The link between smoking and Impotence: Two Decades of Evidence”, Preventive Medicine, 32:447-52
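
The third question on this slide (are the two exposure rates significantly different?) can be illustrated with a pooled two-proportion z-test. This is only a sketch under stated assumptions: the slide does not say which statistical test METIS applies, the function name is hypothetical, and the counts in the usage lines are invented for illustration rather than taken from any study or from BRFSS.

```python
import math

def two_proportion_z_test(exposed_a, n_a, exposed_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Group A could be breast cancer patients pooled from the studies and
    group B a comparable reference population (assumed roles, not from the slide).
    """
    p_a = exposed_a / n_a
    p_b = exposed_b / n_b
    # Pooled exposure rate under the null hypothesis of equal rates.
    pooled = (exposed_a + exposed_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts only: 60 of 200 patients exposed vs 2000 of 10000 in the reference group.
z, p = two_proportion_z_test(60, 200, 2000, 10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value here would suggest the patient group's exposure rate differs from the reference rate; with real data the reference group would also need to be matched on age, gender, and location, as the slide's codebook fields imply.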

12 More than Automated Meta-Analysis
Diagram labels: Systematic Review External database Entire study Main topic Secondary Information Key Information Synthesis
Traditional analysis
–same study design
–medicine = RCT
–epidemiology = cohort
Information Synthesis
–any study that includes required information
–augment missing information

13 [Slide 5 content repeated; highlighted: Natural Language Processing]

14 METIS Information Extractor
Semantic Grammar
Features: words, numbers, and semantic types in the Unified Medical Language System (UMLS)
Information extracted:
–risk factor exposure (tobacco and alcohol)
–gender
–age (min, max, mean)
–start and end dates
–number of subjects with medical condition
–geographical location
Example rule: {term;’age’} {term:’of’} {number;10<n2<110}{term;’to’}{number;10<n2<110}
Example sentence: The age of breast cancer subjects ranged between 20 to 64 years old.
Semantic type annotation: {semantic type: neoplastic process, or disease}
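
The age-range rule on this slide can be approximated with a regular expression. The sketch below is an assumption-laden illustration, not the METIS implementation: it matches only surface words and numbers, ignores the UMLS semantic types that the real grammar uses, and the names `AGE_RANGE` and `extract_age_range` are hypothetical.

```python
import re

# Hypothetical regex rendering of the slide's rule: the term 'age', the term 'of',
# then (after any intervening words) a number, the term 'to', and a second number.
AGE_RANGE = re.compile(
    r"\bage\s+of\b.*?\b(\d{1,3})\s+to\s+(\d{1,3})\b",
    re.IGNORECASE,
)

def extract_age_range(sentence):
    """Return (min_age, max_age) if the sentence matches the rule, else None."""
    match = AGE_RANGE.search(sentence)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Numeric constraint from the slide: each number lies strictly between 10 and 110.
    if 10 < low < 110 and 10 < high < 110:
        return low, high
    return None

sentence = "The age of breast cancer subjects ranged between 20 to 64 years old."
print(extract_age_range(sentence))  # (20, 64)
```

The numeric bounds do the filtering that keeps values such as sample sizes or years from being read as ages; a fuller grammar would also check that the words between "of" and the numbers name a disease or neoplastic process.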

15 METIS Info Extractor – Evaluation
Diverse text corpus
–epidemiology, surgery, biology, ...
–cohort studies, case-control trials, ...
Evaluation
–Metrics (precision, recall)
–Annotators (developer, domain expert, expert annotator, novice)
–Primary topic (breast cancer, impotence)
–Secondary information (tobacco and alcohol consumption)
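
For readers unfamiliar with the two metrics named on this slide, here is a minimal sketch of how precision and recall are computed for an extractor. The fact tuples are invented for illustration and the function name is hypothetical; the slide does not describe how METIS matched extracted facts against annotator judgments.

```python
def precision_recall(extracted, gold):
    """Compare extracted facts with the facts an annotator marked as correct.

    precision = correct extractions / all extractions
    recall    = correct extractions / all facts the annotator found
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Illustrative facts only (field, value); not data from the METIS evaluation.
extracted = {("age_min", 20), ("age_max", 64), ("gender", "female"), ("location", "Ohio")}
gold = {("age_min", 20), ("age_max", 64), ("gender", "female"),
        ("subjects", 150), ("location", "Iowa")}

precision, recall = precision_recall(extracted, gold)
print(precision, recall)  # 0.75 0.6
```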

16 METIS Info Extractor – Recall

17 METIS Info Extractor – Precision

18 Verify information extracted Electronic version of article Converted Article METIS Verifier

19

20 METIS Analyzer
Meta-Analysis
–Developed for agricultural application
–Requires empirical studies with a quantitative outcome
–Unit of study is an article, not a person
–Result: a unitless metric called an effect size
Two common meta-analysis techniques
–Fixed-effects model
–Random-effects model
Evaluation: compared generated effect size with examples in textbooks and published articles
Result: same effect size
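
The two pooling techniques named on this slide can be sketched as follows. The slide does not say which estimators the METIS Analyzer uses, so this example assumes the common choices: inverse-variance weighting for the fixed-effects model and the DerSimonian-Laird estimate of between-study variance for the random-effects model. Function names and the input numbers are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance pooled effect size; returns (pooled_effect, pooled_variance)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, 1.0 / total

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling.

    Returns (pooled_effect, pooled_variance, tau_squared), where tau_squared
    is the estimated between-study variance.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled_fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - pooled_fixed) ** 2 for w, y in zip(weights, effects))
    c = total - sum(w * w for w in weights) / total
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Each study's variance is inflated by tau2 before re-pooling.
    adjusted = [v + tau2 for v in variances]
    pooled, variance = fixed_effect(effects, adjusted)
    return pooled, variance, tau2

# Illustrative per-article effect sizes and variances (not from the talk).
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.04, 0.04]
print(fixed_effect(effects, variances))
print(random_effects(effects, variances))
```

With these inputs both models give the same pooled effect, but the random-effects variance is larger because the studies disagree more than their within-study variances would predict.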

21 Synthetic Estimate Evaluation Tobacco Consumption Alcohol Consumption

22

23

24 Outline
Motivation
Case Studies
–METIS
–Claim Jumping
  Human discovery
  Natural language processing
  Human-assisted discovery and synthesis
Next Steps
Summary

25 [Slide 5 content repeated; highlighted: Human Discovery and Synthesis]

26 Human Discovery
Day-to-day activities of scientists reflect
–the complex socio-technical environments in which successful creativity tools will eventually be embedded
–the human cognitive processing surrounding creativity
Unit of analysis: a paper or grant proposal
How do chemists transform an idea into a publication?
How do chemists arrive at their research question?

27 Approach
Recruitment
–experienced scientists (7-45 yrs)
–local chemists and chemical engineers
–response rate 84% (21/25)
Semi-structured interviews
Critical incident technique
1. seminal paper in their field
2. recent paper authored by the participant
3. paper authored by the participant that they were particularly proud of

28 Interview Questions
Discovery questions
–What is your definition of discovery?
–What evidence convinced you that the paper addressed the initial research questions?
–What factors limited the adoption and deployment of the discovery?
–How did you arrive at the research question?
–What, if any, existing evidence prompted the study/experiment?
–Were there any alternative explanations?
Information usage questions
–Other than the scientific literature, what information resources do you draw from to aid in your research processes?
–How many articles did you read last month that related to each of those projects?
–Is that typical of how many articles you read in a month for research projects?
–Do you read articles for another purpose? If so, what?
–How many hours do you spend reading journal articles for research projects?
–Which journals do you typically read and draw from?
–How would you characterize the journals that you read: are they only within your domain, or do you read journals that would be considered non-traditional in your research?
–If you only have a few minutes to read an article, what parts would you read?
–What do you do with the article once you have read it?

29 Chemists and Chemical Engineers
Compared with other scientists, chemists and chemical engineers
–read more (Brown, 1999)
–have more personal subscriptions to journals (Noble & Coughlin, 1997)
–spend more time reading (Tenopir & King, 2003)
–visit the library more often (Brown, 1999)
Consequences
–information is disseminated quickly
–information has a relatively short lifespan

30 Human Discovery Findings
Discovery definition
–Novelty
–Balance theory and experimentation
–Build on existing ideas
–Practical application
–Simplicity
Hypothesis generation
–Discussion
–Previous experiments
–Combine expertise
–Read literature
Hypothesis validation
–Iterative
–Tightly coupled

31 [Slide 5 content repeated; highlighted: Natural Language Processing]

32 Causal Relationships
Newspaper genre
–Causal relationships (Khoo, Chan, & Niu, 1998)
Biomedical genre
–Causes and treats (Price & Delcambre, 2005)
–Causal knowledge (Khoo, Chan, & Niu, 2000)
Universal Grammar
–Causatives (Comrie, 1974, 1981)
–Action verbs (Thomson, 1987)

33 Claim Definition
“To assert in the face of possible contradiction”
Example sentence reporting a claim
–“This study showed that Tamoxifen reduces the breast cancer risk”
Example Claim Framework
–Agent: Tamoxifen
–Change: reduces
–Object: breast cancer risk
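
The example above suggests a simple record structure for a claim. The sketch below is a hypothetical representation, assuming only the three facets shown on this slide (agent, change, object); the class and method names are not from the talk, and the additional facets annotated later in the deck (direction, modifier, claim basis) are left out.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Claim:
    """One claim with the three facets from the slide's example."""
    agent: Optional[str]   # what brings about the change, e.g. "Tamoxifen"
    change: Optional[str]  # the asserted change, e.g. "reduces"
    obj: str               # what is changed, e.g. "breast cancer risk"

    def as_sentence(self) -> str:
        """Join the facets that are present into a short readable string."""
        parts = [self.agent, self.change, self.obj]
        return " ".join(p for p in parts if p)

claim = Claim(agent="Tamoxifen", change="reduces", obj="breast cancer risk")
print(claim.as_sentence())  # Tamoxifen reduces breast cancer risk
```

Agent and change are optional here because the annotation counts later in the deck show that not every claim has every facet.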

34 The Claim Framework
Goal
–go beyond genes and proteins
–differentiate between different levels of confidence in the claim
–consider claims made in the full text
Working hypothesis
–literature will report findings using constructs within the Claim Framework
–human annotators will agree on facets

35 Preliminary Results
29 articles from TREC Genomics
–Total number of sentences: 5535
–Sentences with >=1 claim: 1250 (22.6%)
–Total number of claims: 3228
–Average claims per sentence: 2.51
–Claims that did not fit in the Framework: 31
Per article
–Average number of sentences: 191
–Average number of sentences with >=1 claim: 43

36 Distribution of Claim Categories
Category      Total    (%)   Pilot    (%)    Main    (%)
Explicit       2489  77.11     332  83.42    2157  76.63
Implicit         87   2.70       3   0.75      84   2.98
Observation     298   9.23      24   6.03     274   9.73
Correlation     174   5.39      12   3.02     162   5.75
Comparison      165   5.11      27   6.85     138   4.9
Total          3228    100     398    100    2830    100

37 All Documents
Annotation          Total    (%)   Words  (Avg)
Agent                2894  89.65    5221   1.80
Agent Direction       285   8.83     291   1.02
Agent Modifier       1246  38.60    4448   3.57
Object               3197  99.04    6849   2.14
Object Direction      271   8.40     283   1.04
Object Modifier      1561  48.36    5383   3.44
Change               1897  58.77    1953   1.03
Change Direction     1337  41.42    1358   1.02
Change Modifier      1147  35.53    1618   1.41
Claim Basis           165   5.11     394   2.39
Claim Basis Dir.       42   1.30      43   1.02
Claim Basis Mod.       86   2.66     266   3.09
Total                3228          28107   8.70

38 Inter Annotator Agreement
Information Facet    Kappa  Agreement
Agent                 0.71  substantial
Object                0.77  substantial
Change                0.57  moderate
Change+ChangeDir      0.88  almost perfect
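
The agreement labels on this slide appear to follow the Landis and Koch (1977) scale for interpreting kappa. The sketch below shows how Cohen's kappa for two annotators can be computed and mapped to that scale. It is an illustration under assumptions: the slide does not state which kappa variant was used, and the annotator labels in the usage lines are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Agreement expected by chance, from each annotator's label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch (1977) agreement label."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Invented labels for four text spans, for illustration only.
annotator_1 = ["agent", "agent", "object", "object"]
annotator_2 = ["agent", "object", "object", "object"]
print(cohen_kappa(annotator_1, annotator_2))  # 0.5
print(landis_koch(0.71), landis_koch(0.57), landis_koch(0.88))
```

Applied to the kappa values in the table, `landis_koch` returns the same labels the slide reports.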

39 Location of Claims
Section       Sentences   Total      % of     % of
              with claim  sentences  section  claims
Abstract              98        309    31.72    7.84
Introduction         357        979    36.47   28.56
Method                 6       1121     0.54    0.48
Result               293       1829    16.02   23.44
Discussion           539       1406    38.34   43.12
Total               1250       5535    22.58  100.00

40 [Slide 5 content repeated; highlighted: Human-assisted Discovery and Synthesis]

41 User Study
–Timothy S. Carey, MD, MPH: Sarah Graham Kenan Professor of Medicine; Director, Cecil G Sheps Center for Health Services Research
–Ila Cote, PhD, DABT: Acting Division Director, US Environmental Protection Agency, National Center for Environmental Assessment
–Michael T Crimmins, PhD: Mary Ann Smith Distinguished Professor of Chemistry, UNC, and Department Chair, Department of Chemistry
–Paul Jones: Clinical Associate Professor, School of Information and Library Science; Director of ibiblio.org
–Rudy L Juliano, PhD: Boshamer Distinguished Professor of Pharmacology; Principal Investigator, Carolina Center of Cancer Nanotechnology Excellence
–Steven W. Matson, PhD: Professor and Chair, Department of Biology
–Robert C Millikan, DVM, PhD: Barbara Sorenson Hulka Distinguished Professor, Department of Epidemiology, School of Public Health
–Rosa Perelmuter, PhD: Director, Moore Undergraduate Research Apprentice Program; Professor of Spanish and Assistant Dean, Academic Advising Program
–Jan F. Prins, PhD: Professor of Computer Science and Chairman, Department of Computer Science
–Alexander Tropsha, PhD: Professor and Chair; Director, Laboratory for Molecular Modeling
–Suzanne West, PhD: Researcher, Health, Social and Economics Research, RTI International

42 [Slide 5 content repeated]

43 Closing Comments
Accelerate synthesis
–Breast cancer study without METIS would take >13 years
–Without synthetic estimate = systematic review
Accelerate discovery
–Connections between literature
–Speculative and orthogonal views
Human discovery and synthesis
–As important if not more so than automation
“Tap the vast reservoir of human knowledge” Louis Round Wilson, 1929

44 Acknowledgements
METIS
Funded in part by
–California Breast Cancer Research program
–University of California, Irvine
Thanks to user groups
–Particularly to Dr. Adams and Dr. Tengs
Academic mentoring
–Primary Advisor: Dr. Wanda Pratt
–Medical Mentor: Dr. Catherine Carpenter
–Co-Advisors: Dr. Dennis Kibler and Dr. Michael Pazzani
–Committee Member: Dr. Paul Dourish
Claim Jumping
Funded in part by
–Faculty fellowship from the Renaissance Computing Institute
–UNC Faculty Award
Thanks to collaborators
–Nassib Nassar and Mats Rynge (RENCI)
–Amol Bapat and Ryan Jones (SILS)
Chemists and Chemical Engineers Study
Funded in part by
–NSF Center for Environmentally Responsible Solvents and Processes

45 Questions and Comments Welcome Catherine Blake cablake@email.unc.edu School of Information and Library Science University of North Carolina at Chapel Hill http://www.ils.unc.edu/~cablake

46 Publication Bias
Studies that find a correlation between a risk factor and disease are more likely to be published (Easterbrook et al., 1991; Ingelfinger et al., 1994)
METIS provides a new way to explore this bias
Bias introduced by authors, editors, funding, ...

