Published by Leslie Melton. Modified over 9 years ago.
1
Domain Adaptation for Biomedical Information Extraction
Jing Jiang
BeeSpace Seminar, Oct 17, 2007
2
Outline
- Why do we need domain adaptation?
- Solutions:
  - Intelligent learning methods
  - Knowledge bases
  - Expert supervision
- Connections with BeeSpace V4
3
Why do we need domain adaptation?
- Many biomedical information extraction problems are solved by supervised machine learning methods such as support vector machines (SVMs):
  - Entity recognition
  - Relation extraction
  - Sentence categorization
- Supervised machine learning assumes that the training data and the test data have the same distribution.
4
Why do we need domain adaptation?
- Existing labeled training data is often limited to certain domains.
  - GENIA corpus: human, blood cells, transcription factors
  - PennBioIE: genetic variation in malignancy, cytochrome P450 inhibition
  - Training data for sentence categorization in gene summarizer: fly
- Even when the training data is diverse (containing multiple domains), it is still desirable to customize the classifier for the particular target domain we are working on.
5
Why do we need domain adaptation?
NER task                                      Train → Test     F1
Find PER, LOC, ORG in news text               NYT → NYT        0.855
                                              Reuters → NYT    0.641
Find gene/protein in biomedical literature    mouse → mouse    0.541
                                              fly → mouse      0.281
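To make the size of the cross-domain penalty concrete, the short sketch below computes the relative F1 drop for each task from the four scores on the slide. The task labels and the function name `relative_drops` are illustrative choices, not part of the original talk.

```python
# F1 scores as reported on the slide: (task, train domain, test domain, F1).
results = [
    ("news NER", "NYT", "NYT", 0.855),
    ("news NER", "Reuters", "NYT", 0.641),
    ("gene/protein NER", "mouse", "mouse", 0.541),
    ("gene/protein NER", "fly", "mouse", 0.281),
]


def relative_drops(rows):
    """Return {task: relative F1 drop} of cross-domain vs. in-domain training."""
    in_domain, cross_domain = {}, {}
    for task, train, test, f1 in rows:
        # A row is "in-domain" when the training and test domains match.
        (in_domain if train == test else cross_domain)[task] = f1
    return {t: (in_domain[t] - cross_domain[t]) / in_domain[t] for t in in_domain}


drops = relative_drops(results)
for task, drop in drops.items():
    print(f"{task}: {drop:.1%} relative F1 drop")
```

On these numbers the news task loses about a quarter of its F1 and the gene/protein task loses nearly half.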
6
Solutions to domain adaptation
- Intelligent learning methods
  - Instance weighting
  - Feature selection
- Knowledge bases
- Expert supervision
Slide labels: thesis research; future work discussion
7
Domain adaptive learning methods
- Two-stage approach
- Two frameworks
  - Instance weighting
  - Feature selection
- Use of unlabeled data
8
Intuition
[Figure: Source Domain and Target Domain]
9
Goal
[Figure: Target Domain and Source Domain]
10
Start from the source domain
[Figure: Source Domain and Target Domain]
11
Focus on the common part
[Figure: Source Domain and Target Domain]
12
Pick up some part from the target domain
[Figure: Source Domain and Target Domain]
13
Formal formulation?
[Figure: Source Domain and Target Domain]
- How to formally formulate these ideas?
14
Instance weighting
[Figure: Source Domain and Target Domain in instance space; each point represents an example]
- Assign different weights to different instances in the objective function.
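As a minimal sketch of what "weights in the objective function" can mean, the code below computes a weighted negative log-likelihood for a binary classifier. The sigmoid toy model, the example values, and the particular weights are assumptions made for illustration; the talk does not specify a loss or a weighting scheme at this point.

```python
import math


def weighted_log_loss(examples, weights, predict_proba):
    """Weighted negative log-likelihood for binary labels.

    examples:      list of (x, y) pairs with y in {0, 1}
    weights:       one non-negative weight per example
    predict_proba: function mapping x to P(y = 1 | x)
    """
    total = 0.0
    for (x, y), w in zip(examples, weights):
        # Clamp the probability to avoid log(0).
        p = min(max(predict_proba(x), 1e-12), 1 - 1e-12)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)


# Toy model and data (hypothetical): a fixed sigmoid over a 1-D input.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


examples = [(2.0, 1), (-1.0, 0), (0.5, 0)]
uniform = weighted_log_loss(examples, [1.0, 1.0, 1.0], sigmoid)
# Down-weight the third instance, e.g. because it looks unlike the target domain.
reweighted = weighted_log_loss(examples, [1.0, 1.0, 0.1], sigmoid)
print(uniform, reweighted)
```

With uniform weights this reduces to the ordinary average log loss, so standard supervised training is the special case where every instance counts equally.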
15
Instance weighting: observation
[Figure: source domain and target domain]
17
Instance weighting: analysis of domain difference
$p(x, y) = p(x)\,p(y \mid x)$
- $p_s(y \mid x) \neq p_t(y \mid x)$: labeling difference, addressed by labeling adaptation
- $p_s(x) \neq p_t(x)$: instance difference, addressed by instance adaptation
?
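One way to read this decomposition, offered as a sketch rather than the exact formulation used in the talk: the expected loss in the target domain can be rewritten as a sum over the source distribution, with each factor of $p(x, y)$ contributing one ratio. The loss $L$ and classifier $f$ are notation introduced here for illustration, and the rewrite assumes $p_s(x, y) > 0$ wherever $p_t(x, y) > 0$.

```latex
\begin{align*}
\mathbb{E}_{(x,y)\sim p_t}\bigl[L(f(x), y)\bigr]
  &= \sum_{x,y} p_t(x)\, p_t(y \mid x)\, L(f(x), y) \\
  &= \sum_{x,y} p_s(x)\, p_s(y \mid x)\,
     \frac{p_t(x)}{p_s(x)}\, \frac{p_t(y \mid x)}{p_s(y \mid x)}\, L(f(x), y)
\end{align*}
```

The first ratio corrects for the instance difference and the second for the labeling difference; when the two domains agree, both ratios equal 1 and the usual supervised objective is recovered.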
18
Instance weighting: three sets of instances
[Figure: instance space $X$ containing $D_s$, $D_{t,l}$, and $D_{t,u}$]
$D_s + D_{t,l} + D_{t,u}$ ?
19
Instance weighting: framework
- A flexible setup covering both standard methods and new domain adaptive methods
- Components: labeled source data, labeled target data, unlabeled target data
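A minimal sketch of how labeled source data, labeled target data, and unlabeled target data might be merged into one weighted training set. The function name, the `lambda_*` parameters, the per-set normalization, and the `pseudo_label` callback are all assumptions introduced for illustration; the slide does not give the actual objective.

```python
def build_weighted_training_set(source_labeled, target_labeled, target_unlabeled,
                                pseudo_label,
                                lambda_s=1.0, lambda_tl=1.0, lambda_tu=1.0):
    """Merge three instance sets into one list of (x, y, weight) triples.

    Each set's total weight equals its lambda, so a small labeled target set
    is not swamped by a large source set. Unlabeled target instances receive
    labels from `pseudo_label`, a stand-in for a model's prediction.
    """
    weighted = []
    for data, lam in ((source_labeled, lambda_s), (target_labeled, lambda_tl)):
        for x, y in data:
            weighted.append((x, y, lam / len(data)))
    for x in target_unlabeled:
        weighted.append((x, pseudo_label(x), lambda_tu / len(target_unlabeled)))
    return weighted


# Toy token-level data (hypothetical labels: 1 = gene/protein, 0 = other).
source = [("wingless", 1), ("and", 0), ("eyeless", 1), ("are", 0)]
target_l = [("CD38", 1), ("is", 0)]
target_u = ["PABPC5", "expressed"]


def guess_label(token):
    # Hypothetical pseudo-labeler: tokens containing a digit are called genes.
    return int(any(ch.isdigit() for ch in token))


train = build_weighted_training_set(source, target_l, target_u, guess_label)
for row in train:
    print(row)
```

Setting `lambda_tl` and `lambda_tu` to 0 leaves only the source data with uniform weights, which corresponds to standard supervised learning without adaptation.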
20
Feature selection
[Figure: Source Domain and Target Domain in feature space; each point represents a feature]
- Identify features that behave similarly across domains.
21
Feature selection: observation
Domain-specific features
- Fly genes: wingless, daughterless, eyeless, apexless, …
- The feature "suffix -less" is weighted high in the model trained from fly data.
- Useful for other organisms? In general, NO!
- May cause generalizable features to be downweighted.
22
Feature selection: observation
Generalizable features: generalize well in all domains
- fly: "…decapentaplegic and wingless are expressed in analogous patterns in each…"
- mouse: "…that CD38 is expressed by both neurons and glial cells…that PABPC5 is expressed in fetal brain and in a range of adult tissues."
23
Feature selection: observation
Generalizable features: generalize well in all domains
- fly: "…decapentaplegic and wingless are expressed in analogous patterns in each…"
- mouse: "…that CD38 is expressed by both neurons and glial cells…that PABPC5 is expressed in fetal brain and in a range of adult tissues."
- The feature "$w_{i+2}$ = expressed" is generalizable.
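The notation $w_{i+2}$ refers to the word two positions to the right of the token being classified. The sketch below shows one plausible way to generate such context-window features; the function name, the feature string format, and the window size are illustrative assumptions, not the feature set used in the talk.

```python
def context_features(tokens, i, window=2):
    """Return features of the form 'w[i+k]=token' for offsets -window..window."""
    feats = []
    for k in range(-window, window + 1):
        j = i + k
        if 0 <= j < len(tokens):
            offset = f"i{k:+d}" if k else "i"
            feats.append(f"w[{offset}]={tokens[j].lower()}")
    return feats


# Fragments taken from the fly and mouse sentences quoted on the slide.
fly = "decapentaplegic and wingless are expressed in".split()
mouse = "that CD38 is expressed by both neurons".split()

fly_feats = context_features(fly, fly.index("wingless"))
mouse_feats = context_features(mouse, mouse.index("CD38"))
print(fly_feats)
print(mouse_feats)
```

Both the fly gene "wingless" and the mouse gene "CD38" produce the same feature, `w[i+2]=expressed`, which is what makes it a candidate generalizable feature.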
24
Feature selection: intuition for identification of generalizable features
[Figure: ranked feature lists (ranks 1 to 8) for the source domains fly, mouse, $D_3$, …, $D_K$, showing where the features "-less" and "expressed" fall in each list]
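The ranking intuition can be sketched as follows: rank features separately in each source domain, then prefer features that rank well in every domain. The worst-rank criterion used here is a simple heuristic stand-in chosen for illustration, not necessarily the criterion from the talk, and the feature weights are made up.

```python
def generalizable_features(domain_weights, top_n):
    """Pick features that rank highly in every domain.

    domain_weights: {domain: {feature: weight}} from per-domain models.
    Features are ranked by |weight| within each domain; a feature's score is
    its worst (largest) rank across domains, and the top_n best are returned.
    """
    shared = set.intersection(*(set(w) for w in domain_weights.values()))
    worst_rank = {}
    for weights in domain_weights.values():
        ranked = sorted(weights, key=lambda f: -abs(weights[f]))
        for rank, feat in enumerate(ranked, start=1):
            if feat in shared:
                worst_rank[feat] = max(worst_rank.get(feat, 0), rank)
    return sorted(shared, key=lambda f: (worst_rank[f], f))[:top_n]


# Hypothetical per-domain feature weights, for illustration only.
domain_weights = {
    "fly": {"suffix=-less": 2.1, "w[i+2]=expressed": 1.8, "is_capitalized": 0.3},
    "mouse": {"suffix=-less": 0.1, "w[i+2]=expressed": 1.6, "is_capitalized": 0.9},
}
selected = generalizable_features(domain_weights, top_n=1)
print(selected)
```

In this toy setup "suffix=-less" ranks first for fly but last for mouse, so it is passed over in favor of "w[i+2]=expressed", which ranks near the top in both domains.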
25
Feature selection: framework
- Matrix A is for feature selection.
26
Feature selection results on gene/protein recognition
27
New directions to explore
- Knowledge bases
- Expert supervision
28
Knowledge bases – entity recognition
- Well-documented nomenclatures (Fly, Mouse, Rat)
  - Help filter out false positives?
  - Help select features?
- Dictionaries of entities
  - "Dictionary features"
- Automatic summarization of nomenclatures?
- Automatic identification of good features?
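A "dictionary feature" typically marks tokens that match an entry in a list of known entity names. The sketch below uses greedy longest-match lookup; the function name, the `max_len` parameter, and the tiny dictionary (including the multi-word entry) are assumptions made for illustration.

```python
def dictionary_features(tokens, dictionary, max_len=3):
    """Flag tokens covered by a dictionary entry.

    Matching is case-insensitive and greedy: at each position the longest
    entry (up to max_len tokens) is tried first.
    """
    lowered = [t.lower() for t in tokens]
    flags = [False] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if " ".join(lowered[i:i + n]) in dictionary:
                flags[i:i + n] = [True] * n
                i += n
                break
        else:
            # No entry starts at position i; move on to the next token.
            i += 1
    return flags


# Hypothetical dictionary of lowercase gene/protein names.
gene_dictionary = {"wingless", "decapentaplegic", "heat shock protein"}

flags_a = dictionary_features("decapentaplegic and wingless are expressed".split(),
                              gene_dictionary)
flags_b = dictionary_features("the heat shock protein gene".split(), gene_dictionary)
print(flags_a)
print(flags_b)
```

The resulting flags can be fed to a classifier as one more feature per token, or used after classification to filter predictions that match no known name.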
29
Knowledge bases – sentence categorization in gene summarizer
- For fly, the training sentences are automatically extracted from FlyBase.
- For other organisms, do we have similar resources?
30
Expert supervision – entity recognition
- Computer system selects ambiguous examples for human experts to judge.
- Computer system asks human experts other questions.
  - Similar organisms?
  - Typical surface features? (e.g., cis-regulatory elements, "-RE")
- Computer system summarizes possible features from pseudo-labeled data and asks human experts for confirmation.
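Selecting ambiguous examples for expert judgment can be sketched with uncertainty sampling, a common active learning heuristic: send the expert the examples whose predicted probability is closest to 0.5. This is one plausible reading of the first bullet, not a method stated in the talk, and the scores below are made up.

```python
def select_ambiguous(examples, predict_proba, k):
    """Return the k examples whose predicted probability is closest to 0.5."""
    return sorted(examples, key=lambda x: abs(predict_proba(x) - 0.5))[:k]


# Hypothetical classifier scores: P(token is a gene/protein name).
scores = {"wingless": 0.95, "CD38": 0.55, "expressed": 0.05, "apexless": 0.48}

to_review = select_ambiguous(list(scores), scores.get, k=2)
print(to_review)
```

Confident predictions such as "wingless" and "expressed" are skipped, so the expert's time goes to the cases where a label is most informative.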
31
Connections to BeeSpace V4
- A major challenge in BeeSpace V4 is extraction of new types of entities and relations.
- Exploiting knowledge bases and expert supervision is especially important.
- For new types, no labeled data is available even from other domains.
- Use of bootstrapping methods should be explored.
32
New entity types
- Recognition of many new types will be dictionary based: organism, anatomy, biological process, etc.
- Recognition of some new types will need some NER techniques: chemical, regulatory element.
33
New relation types
- Bootstrapping (?)
  - Seed patterns from knowledge bases or human experts
  - Human inspection of newly discovered patterns?
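A minimal sketch of pattern bootstrapping with a human-inspection hook: seed patterns find entity pairs, those pairs surface new candidate patterns, and a reviewer approves or rejects each candidate. The triple representation, the function names, and the toy data (placeholder gene names and patterns) are all hypothetical.

```python
def bootstrap_patterns(triples, seed_patterns, rounds=2, approve=lambda p: True):
    """Grow a pattern set from seeds.

    triples: (arg1, pattern, arg2) tuples extracted from text.
    approve: callback standing in for human inspection of a new pattern.
    Returns the final pattern set and the entity pairs it matches.
    """
    patterns = set(seed_patterns)
    for _ in range(rounds):
        # Entity pairs connected by a currently trusted pattern.
        pairs = {(a, b) for a, p, b in triples if p in patterns}
        # Other patterns connecting those same pairs become candidates.
        candidates = {p for a, p, b in triples if (a, b) in pairs} - patterns
        accepted = {p for p in candidates if approve(p)}
        if not accepted:
            break
        patterns |= accepted
    pairs = {(a, b) for a, p, b in triples if p in patterns}
    return patterns, pairs


# Toy extractions with placeholder entity names.
triples = [
    ("geneA", "regulates", "geneB"),
    ("geneA", "controls expression of", "geneB"),
    ("geneC", "controls expression of", "geneD"),
    ("geneC", "is located near", "geneD"),
    ("geneE", "is located near", "geneF"),
]

unchecked, _ = bootstrap_patterns(triples, {"regulates"})
checked, checked_pairs = bootstrap_patterns(
    triples, {"regulates"}, approve=lambda p: p != "is located near")
print(sorted(unchecked))
print(sorted(checked), sorted(checked_pairs))
```

Without inspection, the second round picks up "is located near", which does not express regulation; with the reviewer rejecting it, the pattern set stops at the two regulatory patterns. This drift is the usual motivation for keeping a human in the loop.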
34
The end