Natural Language Processing in Bioinformatics: Uncovering Semantic Relations Barbara Rosario Joint work with Marti Hearst SIMS, UC Berkeley.



2 Outline of Talk Goal: Extract semantics from text Information and relation extraction Protein-protein interactions Noun compounds

3 Text Mining Text Mining is the discovery by computers of new, previously unknown information, via automatic extraction of information from text

4 Text Mining Text: Stress is associated with migraines Stress can lead to loss of magnesium Calcium channel blockers prevent some migraines Magnesium is a natural calcium channel blocker 1: Extract semantic entities from text

5 Text Mining Text: Stress is associated with migraines Stress can lead to loss of magnesium Calcium channel blockers prevent some migraines Magnesium is a natural calcium channel blocker Entities: Stress, Migraine, Magnesium, Calcium channel blockers 1: Extract semantic entities from text

6 Text Mining (cont.) Text: Stress is associated with migraines Stress can lead to loss of magnesium Calcium channel blockers prevent some migraines Magnesium is a natural calcium channel blocker Entities: Stress, Migraine, Magnesium, Calcium channel blockers 2: Classify relations between entities Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a)

7 Text Mining (cont.) Text: Stress is associated with migraines Stress can lead to loss of magnesium Calcium channel blockers prevent some migraines Magnesium is a natural calcium channel blocker Entities: Stress, Migraine, Magnesium, Calcium channel blockers 3: Do reasoning: find new correlations Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a)

8 Text Mining (cont.) Text: Stress is associated with migraines Stress can lead to loss of magnesium Calcium channel blockers prevent some migraines Magnesium is a natural calcium channel blocker Entities: Stress, Migraine, Magnesium, Calcium channel blockers 4: Do reasoning: infer causality Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a) No prevention: Deficiency of magnesium → migraine

9 My research Information Extraction Entities: Stress, Migraine, Magnesium, Calcium channel blockers Text: Stress is associated with migraines Stress can lead to loss of magnesium Calcium channel blockers prevent some migraines Magnesium is a natural calcium channel blocker

10 My research Relation extraction Entities: Stress, Migraine, Magnesium, Calcium channel blockers Relations: Associated with, Lead to loss, Prevent, Subtype-of (is a)

11 Information and relation extraction Problems: Given biomedical text: Find all the treatments and all the diseases Find the relations that hold between them Treatment / Disease: Cure? Prevent? Side Effect?

12 Hepatitis Examples Cure These results suggest that con A-induced hepatitis was ameliorated by pretreatment with TJ-135. Prevent A two-dose combined hepatitis A and B vaccine would facilitate immunization programs Vague Effect of interferon on hepatitis B

13 Two tasks Relationship extraction: Identify the several semantic relations that can occur between the entities disease and treatment in bioscience text Information extraction (IE): Related problem: identify such entities

14 Outline of IE Data and semantic relations Quick intro to graphical models Models and results Features Conclusions

15 Data and Relations MEDLINE, abstracts and titles 3662 sentences labeled Relevant: 1724 Irrelevant: 1771 e.g., “Patients were followed up for 6 months” 2 types of Entities treatment and disease 7 Relationships between these entities The labeled data are available at

16 Semantic Relationships 810: Cure Intravenous immune globulin for recurrent spontaneous abortion 616: Only Disease Social ties and susceptibility to the common cold 166: Only Treatment Flucticasone propionate is safe in recommended doses 63: Prevent Statins for prevention of stroke

17 Semantic Relationships 36: Vague Phenylbutazone and leukemia 29: Side Effect Malignant mesodermal mixed tumor of the uterus following irradiation 4: Does NOT cure Evidence for double resistance to permethrin and malathion in head lice

18 Outline of IE Data and semantic relations Quick intro to graphical models Models and results Features Conclusions

19 Graphical Models Unifying framework for developing Machine Learning algorithms Graph theory plus probability theory Widely used Error correcting codes Systems diagnosis Computer vision Filtering (Kalman filters) Bioinformatics

20 (Quick intro to) Graphical Models Nodes are random variables Edges are annotated with conditional probabilities Absence of an edge between nodes implies conditional independence “Probabilistic database” Example: nodes A, B, C, D with edges A → B, A → C, D → C

21 Graphical Models Define a joint probability distribution: P(X1, .., XN) = ∏i P(Xi | Par(Xi)) Here: P(A,B,C,D) = P(A) P(D) P(B|A) P(C|A,D) Learning: given data, estimate P(A), P(B|A), P(D), P(C|A,D)

22 Graphical Models Define a joint probability distribution: P(X1, .., XN) = ∏i P(Xi | Par(Xi)) Here: P(A,B,C,D) = P(A) P(D) P(B|A) P(C|A,D) Learning: given data, estimate P(A), P(B|A), P(D), P(C|A,D) Inference: compute conditional probabilities, e.g., P(A | B, D) Inference = probabilistic queries; general inference algorithms exist (Junction Tree)
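The factorization and the two operations on this slide can be sketched in a few lines. The network shape (A → B, A → C ← D) follows the example; all probability values below are toy assumptions, not numbers from the talk.

```python
from itertools import product

# Toy parameters for the network A -> B, A -> C <- D, where the joint
# factorizes as P(A,B,C,D) = P(A) P(D) P(B|A) P(C|A,D).
pA = {0: 0.6, 1: 0.4}
pD = {0: 0.7, 1: 0.3}
pB_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
pC_AD = {(a, d): {0: 0.5, 1: 0.5} for a in (0, 1) for d in (0, 1)}

def joint(a, b, c, d):
    """P(A=a, B=b, C=c, D=d) via the factorization above."""
    return pA[a] * pD[d] * pB_A[a][b] * pC_AD[(a, d)][c]

def posterior_A(b, d):
    """Inference by enumeration: P(A | B=b, D=d), summing out C."""
    scores = {a: sum(joint(a, b, c, d) for c in (0, 1)) for a in (0, 1)}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}
```

Enumeration stands in for a real algorithm like Junction Tree; it is exact but only feasible at toy sizes.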

23 Naïve Bayes models Simple graphical model: the features Xi depend on the class Y Naïve Bayes assumption: all Xi are independent given Y Currently used for text classification and spam detection
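A minimal Naïve Bayes text classifier along these lines, with add-one smoothing; the toy training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Collect class and per-class word counts from (label, tokens) pairs."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for label, tokens in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(tokens, class_counts, word_counts, vocab):
    """argmax_Y log P(Y) + sum_i log P(Xi | Y), with add-one smoothing."""
    n_docs = sum(class_counts.values())
    best_label, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / n_docs)  # log prior
        n_words = sum(word_counts[label].values())
        for w in tokens:  # features independent given the class
            lp += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if lp > best_lp:
            best_label, best_lp = label, lp
    return best_label

docs = [("cure", ["hepatitis", "ameliorated", "by", "pretreatment"]),
        ("prevent", ["vaccine", "would", "facilitate", "immunization"])]
model = train(docs)
```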

24 Dynamic Graphical Models Graphical model composed of repeated segments HMMs (Hidden Markov Models): POS tagging, speech recognition, IE Tags t1, .., tN; words w1, .., wN

25 HMMs Joint probability distribution: P(t1, .., tN, w1, .., wN) = P(t1) ∏i P(ti | ti-1) P(wi | ti) Estimate P(t1), P(ti | ti-1), P(wi | ti) from labeled data

26 HMMs Joint probability distribution: P(t1, .., tN, w1, .., wN) = P(t1) ∏i P(ti | ti-1) P(wi | ti) Estimate P(t1), P(ti | ti-1), P(wi | ti) from labeled data Inference: P(ti | w1, w2, .., wN)
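The HMM factorization above can be coded directly. The two-tag parameters below are toy assumptions, and tagging is done by brute-force enumeration rather than a real decoder such as Viterbi.

```python
import itertools

# Toy HMM: tag "T" vs "O", two-word vocabulary. All values are assumptions.
pi = {"T": 0.6, "O": 0.4}                                   # P(t1)
trans = {"T": {"T": 0.3, "O": 0.7}, "O": {"T": 0.8, "O": 0.2}}  # P(ti|ti-1)
emit = {"T": {"statins": 0.9, "stroke": 0.1},               # P(wi|ti)
        "O": {"statins": 0.2, "stroke": 0.8}}

def hmm_joint(tags, words):
    """P(t1..tN, w1..wN) = P(t1) P(w1|t1) * prod P(ti|ti-1) P(wi|ti)."""
    p = pi[tags[0]] * emit[tags[0]][words[0]]
    for prev, cur, w in zip(tags, tags[1:], words[1:]):
        p *= trans[prev][cur] * emit[cur][w]
    return p

def best_tags(words):
    """Brute-force argmax over tag sequences; fine at toy sizes."""
    candidates = itertools.product(pi, repeat=len(words))
    return max(candidates, key=lambda tags: hmm_joint(tags, words))
```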

27 Graphical Models for IE Different dependencies between the features and the relation nodes: dynamic models (D1, D2, D3) and static models (S1, S2)

28 Graphical Model Relation node: the semantic relation (cure, prevent, none, ..) expressed in the sentence The relation generates the state sequence and the observations

29 Graphical Model Markov sequence of states (roles): .., Role t-1, Role t, Role t+1, .. Role nodes: Role t ∈ {treatment, disease, none}

30 Graphical Model Roles generate multiple observations Feature nodes (observed): word, POS, MeSH, …

31 Graphical Model Inference: find the Relation and the Roles given the observed features

32 Features Word Part of speech Phrase constituent Orthographic features ‘is number’, ‘all letters are capitalized’, ‘first letter is capitalized’ … Semantic features (MeSH)
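A sketch of the orthographic features listed above, as a simple token-to-feature-dictionary function; the feature names are mine.

```python
def token_features(token):
    """Orthographic features of a single token, per the slide's examples."""
    return {
        "word": token.lower(),                              # the word itself
        "is_number": token.replace(".", "", 1).isdigit(),   # 'is number'
        "all_caps": token.isalpha() and token.isupper(),    # 'all letters are capitalized'
        "init_cap": token[:1].isupper(),                    # 'first letter is capitalized'
    }
```

POS, phrase-constituent, and MeSH features would come from a tagger, a parser, and the MeSH lexicon respectively, so they are omitted here.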

33 MeSH MeSH Tree Structures 1. Anatomy [A] 2. Organisms [B] 3. Diseases [C] 4. Chemicals and Drugs [D] 5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E] 6. Psychiatry and Psychology [F] 7. Biological Sciences [G] 8. Physical Sciences [H] 9. Anthropology, Education, Sociology and Social Phenomena [I] 10. Technology and Food and Beverages [J] 11. Humanities [K] 12. Information Science [L] 13. Persons [M] 14. Health Care [N] 15. Geographic Locations [Z]

34 MeSH (cont.) 1. Anatomy [A] Body Regions [A01] + Musculoskeletal System [A02] Digestive System [A03] + Respiratory System [A04] + Urogenital System [A05] + Endocrine System [A06] + Cardiovascular System [A07] + Nervous System [A08] + Sense Organs [A09] + Tissues [A10] + Cells [A11] + Fluids and Secretions [A12] + Animal Structures [A13] + Stomatognathic System [A14] (…..) Body Regions [A01] Abdomen [A01.047] Groin [A ] Inguinal Canal [A ] Peritoneum [A ] + Umbilicus [A ] Axilla [A01.133] Back [A01.176] + Breast [A01.236] + Buttocks [A01.258] Extremities [A01.378] + Head [A01.456] + Neck [A01.598] (….)

35 Use of lexical hierarchies in NLP Big problem in NLP: a few words occur a lot, and most words occur very rarely (Zipf’s law) This makes it difficult to gather reliable statistics One solution: use lexical hierarchies (another example: WordNet) Compute statistics on classes of words instead of individual words

36 Mapping Words to MeSH Concepts headache → C.., G.. [Neurologic Manifestations] [Nervous System Physiology]; pain → C23, G11 [Pathological Conditions, Signs and Symptoms] [Musculoskeletal, Neural, and Ocular Physiology]; headache recurrence → C.., C..; breast cancer cells → A.., C04, A11
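The generalization step, collecting statistics at a chosen level of the hierarchy by truncating tree codes, might look like the sketch below. The lexicon entries are illustrative placeholders, not real MeSH codes.

```python
# Hypothetical word -> MeSH-code lexicon; codes are placeholders.
LEXICON = {
    "headache": ["C23.888.592", "G11.561.796"],
    "cells": ["A11"],
}

def truncate(code, depth):
    """Keep the first `depth` dot-separated levels: C23.888.592 -> C23."""
    return ".".join(code.split(".")[:depth])

def concepts(word, depth):
    """Map a word to its MeSH concepts, generalized to the given depth."""
    return [truncate(c, depth) for c in LEXICON.get(word, [])]
```

Counting occurrences of `concepts(word, depth)` instead of the word itself is what lets rare words share statistics with their class.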

37 Graphical Model Joint probability distribution over the relation, role, and feature nodes Parameters estimated with maximum likelihood and absolute-discounting smoothing

38 Graphical Model Inference: find the Relation and the Roles given the observed features

39 Relation extraction Results in terms of classification accuracy (with and without irrelevant sentences) 2 cases: Roles given Roles hidden (only features)

40 Relation classification: Results Good results for a difficult task One of the few systems to tackle several DIFFERENT relations between the same types of entities; thus differs from the problem statement of other work on relations Accuracy (GM D2, only features, roles given): Only rel.: 76.6 Rel. + irrel.: 82.0

41 Role Extraction: Results Junction tree algorithm F-measure = (2 * Prec * Recall) / (Prec + Recall) F-measure: Only rel.: 0.73 Rel. + irrel.: 0.71 (Related work extracting “diseases” and “genes” reports an F-measure of 0.50)
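The F-measure formula on this slide, as a small helper computed from true-positive, false-positive, and false-negative counts:

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall, as on the slide."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```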

42 Features impact: Role extraction Most important features: 1) Word 2) MeSH Relative change from all features: No word: -9.6% No MeSH: -5.5%

43 Features impact: Relation classification (rel. + irrel.) Most important feature: the roles Accuracy with all features + roles: 82.0 Accuracy drops when removing the roles, the Word feature, or the MeSH feature

44 Features impact: Relation classification Most realistic case: roles not known Most important features: 1) Word 2) MeSH Accuracy (rel. + irrel.) with all features, roles hidden: 74.9 Removing Word or MeSH lowers it further

45 Conclusions Classification of subtle semantic relations in bioscience text Graphical models for the simultaneous extraction of entities and relationships Importance of MeSH, lexical hierarchy

46 Outline of Talk Goal: Extract semantics from text Information and relation extraction Protein-protein interactions; using an existing database to gather labeled data

47 Protein-Protein interactions One of the most important challenges in modern genomics, with many applications throughout biology There are several protein-protein interaction databases (BIND, MINT,..), all manually curated

48 Protein-Protein interactions Supervised systems require manually labeled data, while purely unsupervised methods are still to be proven effective for these tasks Some other approaches: semi-supervised, active learning, co-training We propose using resources developed in the biomedical domain to gather labeled data for the task of classifying interactions between proteins

49 HIV-1, Protein Interaction Database Documents interactions between HIV-1 proteins and host cell proteins, other HIV-1 proteins, and proteins from diseases associated with HIV/AIDS 2224 pairs of interacting proteins, 65 interaction types

50 HIV-1, Protein Interaction Database Protein 1 / Protein 2 / Paper ID / Interaction Type: Tat, p14 / AKT / … / activates AIP1 / Gag, Pr / … / binds Tat, p14 / CDK / … / induces Tat, p14 / CDK / … / enhances Tat, p14 / CDK / … / downregulates …

51 Most common interactions

52 Protein-Protein interactions Idea: use this to “label data” Database row: Protein 1: Tat, p14 / Protein 2: AKT3 / Interaction: activates Extract from the paper all the sentences with Protein 1 and Protein 2 …

53 Protein-Protein interactions Idea: use this to “label data” Database row: Protein 1: Tat, p14 / Protein 2: AKT3 / Interaction: activates Extract from the paper all the sentences with Protein 1 and Protein 2 … Label them with the interaction given in the database: activates

54 Protein-Protein interactions Use citations Find all the papers that cite the papers in the database Database row: Protein 1: Tat, p14 / Protein 2: AKT3 / Interaction: activates

55 Protein-Protein interactions From the papers, extract the citation sentences; from these, extract the sentences with Protein 1 and Protein 2 Label them with the interaction: activates
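The labeling idea on these slides, inheriting the database's interaction type for any sentence (from the paper or its citations) that mentions both proteins, can be sketched as:

```python
def weakly_label(sentences, protein1, protein2, interaction):
    """Label every sentence mentioning both proteins with the database's
    interaction type; other sentences are discarded."""
    labeled = []
    for sentence in sentences:
        text = sentence.lower()
        if protein1.lower() in text and protein2.lower() in text:
            labeled.append((sentence, interaction))
    return labeled
```

A real system would also handle protein-name variants and synonyms; plain substring matching is the simplest possible stand-in.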

56 Examples of sentences Papers: The interpretation of these results was slightly complicated by the fact that AIP-1/ALIX depletion by using siRNA likely had deleterious effects on cell viability, because a Western blot analysis showed slightly reduced Gag expression at later time points (fig. 5C ). Citations: They also demonstrate that the GAG protein from membrane - containing viruses, such as HIV, binds to Alix / AIP1, thereby recruiting the ESCRT machinery to allow budding of the virus from the cell surface (TARGET_CITATION; CITATION ).

57 10 Interaction types

58 Protein-Protein interactions Tasks: Given sentences from Paper ID, and/or citation sentences to ID Predict the interaction type given in the HIV database for Paper ID Extract the proteins involved 10-way classification problem

59 Protein-Protein interactions Models Dynamic graphical model Naïve Bayes

60 Graphical Models

61 Evaluation Evaluation at document level All (sentences from papers + citations) Papers (only sentences from papers) Citations (only citation sentences) “Trigger word” approach List of keywords (e.g., for inhibits: “inhibitor”, “inhibition”, “inhibit”, etc.) If a keyword is present: assign the corresponding interaction
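A sketch of the trigger-word baseline; only the inhibits keywords come from the slide, the other trigger entries are assumptions for illustration.

```python
# Keyword lists per interaction type; only "inhibits" follows the slide.
TRIGGERS = {
    "inhibits": ("inhibitor", "inhibition", "inhibit"),
    "activates": ("activation", "activates", "activate"),
    "binds": ("binds", "binding", "bind"),
}

def trigger_baseline(sentence, default="interacts with"):
    """Assign the interaction whose keyword appears in the sentence,
    else fall back to the generic default."""
    text = sentence.lower()
    for interaction, keywords in TRIGGERS.items():
        if any(k in text for k in keywords):
            return interaction
    return default
```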

62 Results Accuracies on interaction classification (roles hidden), for All / Papers / Citations Models: Markov Model, Naïve Bayes Baselines: most frequent interaction, TriggerW, TriggerW + BO

63 Results: confusion matrix For All. Overall accuracy: 60.5%

64 Hiding the protein names Replaced protein names with the token PROT_NAME Selective CXCR4 antagonism by Tat → Selective PROT_NAME antagonism by PROT_NAME
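The masking step can be done with a simple substitution, assuming a list of known protein names:

```python
import re

def mask_proteins(sentence, protein_names, token="PROT_NAME"):
    """Replace each known protein mention with the placeholder token,
    longest name first so multi-word names are masked before substrings."""
    for name in sorted(protein_names, key=len, reverse=True):
        sentence = re.sub(re.escape(name), token, sentence)
    return sentence
```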

65 Results with no protein names Model / Papers / Citations: Markov Model: 44.4 (-23.1%) / 52.3 (-2.0%) Naïve Bayes: 46.7 (-19.2%) / 53.4 (-4.1%)

66 Protein extraction (Protein name tagging, role extraction) The identification of all the proteins present in the sentence that are involved in the interaction These results suggest that Tat - induced phosphorylation of serine 5 by CDK9 might be important after transcription has reached the +36 position, at which time CDK7 has been released from the complex. Tat might regulate the phosphorylation of the RNA polymerase II carboxyl - terminal domain in pre - initiation complexes by activating CDK7

67 Protein extraction: results Recall, precision and F-measure reported for All, Papers and Citations No dictionary used

68 Conclusions of protein-protein interaction project Encouraging results for the automatic classification of protein-protein interactions Use of an existing database for gathering labeled data Use of citations

69 Noun compounds (NCs) Any sequence of nouns that itself functions as a noun asthma hospitalizations asthma hospitalization rates health care personnel hand wash Technical text is rich with NCs Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment.

70 NCs: 3 computational tasks Identification Syntactic analysis (attachments): [Baseline [headache frequency]] [[Tension headache] patient] Semantic analysis: Headache treatment → treatment for headache Corticosteroid treatment → treatment that uses corticosteroid

71 Two approaches Treat it as a classification problem (and use a machine learning algorithm) Linguistically motivated: consider the “semantics” of the nouns which will determine the relations between them

72 Second approach Linguistic Motivation Head noun has argument structure Meaning of the head noun determines what kinds of things can be done to it, what it is made of, what it is a part of…

73 Linguistic Motivation Material + Cutlery → Made of: steel knife, plastic fork, wooden spoon Food + Cutlery → Used on: meat knife, dessert spoon, salad fork Profession + Cutlery → Used by: chef's knife, butcher's knife

74 Linguistic Motivation Hypothesis: A particular semantic relation holds between all 2-word NCs that can be categorized by a MeSH pair. Use the classes of MeSH to identify semantic relations

75 Grouping the NCs A02 C04 (Musculoskeletal System, Neoplasms): skull tumors, bone cysts, bone metastases, skull osteosarcoma, … B06 B06 (Plants, Plants): eucalyptus trees, apple fruits, rice grains, potato plants A01 M01 (Body Region, Person): shoulder patient, eye physician, eye donor Too different: need to be more specific, go down the hierarchy A01 M (Body Regions, Patients): shoulder patient C04 M (Body Regions, Occupational Groups): eye physician, chest physicians

76 Classification Decisions + Relations A02 C04 → Location of Disease B06 B06 → Kind of Plants C04 M01: C04 M → Person afflicted by Disease; C04 M → Person who treats Disease A01 H01: A01 H, A01 H, A01 H, A01 H A01 M01: A01 M → Person afflicted by Disease; A01 M → Specialist of; A01 M → Donor of
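A sketch of lookup with descent: try the most specific MeSH code pair first, backing off to shorter prefixes until a rule matches. Only the two top-level rules shown on the slide are included; deeper codes would be hypothetical, so they are left out.

```python
# Rule table: (MeSH code prefix, MeSH code prefix) -> relation.
# The two entries mirror the slide's examples.
RULES = {
    ("A02", "C04"): "Location of Disease",
    ("B06", "B06"): "Kind of Plants",
}

def prefixes(code):
    """All ancestors of a code, most specific first: A02.835 -> [A02.835, A02]."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(len(parts), 0, -1)]

def classify_nc(code1, code2, rules=RULES):
    """Return the relation for the most specific matching code pair."""
    for c1 in prefixes(code1):
        for c2 in prefixes(code2):
            if (c1, c2) in rules:
                return rules[(c1, c2)]
    return None
```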

77 Evaluation Accuracy: Anatomy: 91% Natural Science: 79% Neoplasm: 100% Total accuracy: 90.8%

78 Conclusion of NCs Problem of assigning semantic relations to two-word technical NCs Important problem: many NCs in technical text Especially difficult given the lack of syntactic clues State-of-the-art results One of very few working systems to tackle this task for NCs

79 Conclusion Machine Learning methods for NLP tasks Three lines of research in this area, state-of-the art results Information and relation extraction for “treatments” and “diseases” Protein-protein interactions (Noun compounds)

Thank you! Barbara Rosario SIMS, UC Berkeley

81 Future work Unsupervised, semi-supervised methods Reasoning (knowledge representation and inference procedures) Huge amount of textual data (Web) Connection between several databases and/or text collections for linking different pieces of information System architecture to support multiple layers of annotation on text Development of effective interface

Additional slides on IE

83 Related work Several DIFFERENT Relations between the Same Types of Entities Thus differs from the problem statement of other work on relations Many find one relation which holds between two entities (many based on ACE)

84 Related work (cont.) Agichtein and Gravano (2000): lexical patterns for location of Zelenko et al. (2002): SVM for person-affiliation and organization-location Hasegawa et al. (ACL 2004): Person-Organization → President “relation” Craven (1999, 2001): HMM for subcellular-location and disorder-association; doesn’t identify the actual relation

85 Related work: Bioscience Many hand-built rules: Feldman et al. (2002), Friedman et al. (2001), Pustejovsky et al. (2002), Saric et al. (2004)

86 Our D1 Thompson et al.: frame classification and role labeling for FrameNet sentences Target word must be observed More relations and roles

87 Smoothing: absolute discounting Lower the probability of seen events by subtracting a constant from their maximum-likelihood count The remaining probability mass is evenly divided among the unseen events
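A sketch of the scheme described here, assuming the vocabulary size is known so the unseen events can be enumerated implicitly; the discount value is an arbitrary choice.

```python
def absolute_discounting(counts, vocab_size, delta=0.5):
    """Subtract delta from every seen count, then spread the freed
    probability mass evenly over the unseen events."""
    n = sum(counts.values())
    unseen = vocab_size - len(counts)
    freed = delta * len(counts) / n          # total mass taken from seen events
    def prob(event):
        if event in counts:
            return (counts[event] - delta) / n
        return freed / unseen if unseen else 0.0
    return prob

# Toy counts over a 4-event vocabulary.
p = absolute_discounting({"cure": 3, "prevent": 1}, vocab_size=4)
```

With these toy counts, the seen events lose 0.125 each and the two unseen events get 0.125 each, so the distribution still sums to one.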

88 F-measures for role extraction as a function of the smoothing factor

89 Relation accuracies as a function of the smoothing factor

90 Relation classification: Confusion Matrix Computed for “rel + irrel.”, “only features”

91 Proteins: sentence-level evaluation Total accuracy: 38.9% (49.4% without interact with)

92 Learning with the hand-labeled sentences Accuracies for All / Papers / Citations Models: Markov Model, Naïve Bayes Baselines: most frequent interaction, TriggerW, TriggerW + BO

93 Learning with the hand-labeled sentences

94 Q & A system Q: What are the treatments of cervical carcinoma? A: Stage Ib and IIa cervical carcinoma can be cured by radical surgery or radiotherapy

95 Q & A system Q: What are the methods of administration of headache treatment? A: intranasal migraine treatment

96 Evaluation Mj: For each triple, for each sentence of the triple, find the interaction that maximizes the posterior probability of the interaction given the features; then assign to all sentences of this triple the most frequent interaction among those predicted for the individual sentences Mj*: Same as Mj, except that if the predicted interaction is the generic interacts with, choose instead the next most frequent interaction (retain interacts with only if it is the only interaction predicted)

97 Evaluation Cf: Retain all the conditional probabilities (i.e., don't first choose an interaction per sentence), then for each triple choose the interaction that maximizes the sum over all the sentences of the triple. Cf*: Same as Cf, substituting interacts with with the next most confident interaction.
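The Mj and Cf aggregation strategies described on these two slides can be sketched as follows; `Counter` serves both for the majority vote and for summing posteriors across a triple's sentences.

```python
from collections import Counter

def mj(sentence_predictions):
    """Mj: classify each sentence, then take the most frequent
    prediction across all sentences of the triple."""
    return Counter(sentence_predictions).most_common(1)[0][0]

def cf(sentence_posteriors):
    """Cf: keep the full per-sentence posteriors, sum them over the
    triple's sentences, and take the argmax interaction."""
    totals = Counter()
    for posterior in sentence_posteriors:
        totals.update(posterior)  # adds the probabilities per interaction
    return totals.most_common(1)[0][0]
```

The starred variants (Mj*, Cf*) would add one extra step: if the winner is the generic interacts with and another interaction was also predicted, return the runner-up instead.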