1
Active Learning Zhipeng (Patrick) Luo December 6th, 2016
2
Motivation
The labeling cost in supervised learning can be huge.
This typical supervised setting is called passive learning.
Active learning aims to find a good mapping without using too much labeled data.
3
Active Learning Paradigm
Iterative learning framework (a pool-based sketch is given below):
- Start with an initial labeled set;
- Repeat: fit a classifier on the current labeled data; actively sample the most important unlabeled point according to the current classifier; obtain its label from a human labeler.
A variety of querying strategies:
- Uncertainty
- Novelty (redundancy)
- Representativeness
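A minimal sketch of the pool-based version of this loop, assuming a scikit-learn LogisticRegression classifier, least-confidence uncertainty as the query strategy, and a hypothetical `oracle` callable standing in for the human labeler; it is an illustration, not a specific method from the deck.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uncertainty_sampling_loop(X_pool, oracle, n_init=10, budget=50, seed=0):
    """Pool-based active learning with least-confidence sampling.

    X_pool : (n, d) array of unlabeled points
    oracle : callable x -> label (stands in for the human labeler; hypothetical)
    Assumes the initial random sample contains at least two classes.
    """
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X_pool), size=n_init, replace=False)]
    labels = {i: oracle(X_pool[i]) for i in labeled}          # initial labeled data

    clf = LogisticRegression()
    for _ in range(budget):
        # Fit a classifier on the current labeled data.
        clf.fit(X_pool[labeled], [labels[i] for i in labeled])

        # Score every remaining unlabeled point by uncertainty (least confidence).
        unlabeled = [i for i in range(len(X_pool)) if i not in labels]
        if not unlabeled:
            break
        proba = clf.predict_proba(X_pool[unlabeled])
        query = unlabeled[int(np.argmin(proba.max(axis=1)))]  # least confident point

        # Obtain its label from the labeler and add it to the labeled set.
        labels[query] = oracle(X_pool[query])
        labeled.append(query)
    return clf
```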
4
Three Scenarios of Active Learning
Pool-based: pick one point from an unlabeled data pool. Uncertainty can be measured by distance to the decision boundary (closest first) or by the entropy of the predicted label (two such scores are sketched below).
Query synthesis: aggressively search the unlabeled data space and synthesize a query point, which can be unrealistic for a human to label.
Sequential labeling: unlabeled data arrive sequentially; a mellow learner queries only selectively as points stream in.
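Two of the pool-based uncertainty scores named above, sketched with NumPy; `proba` is assumed to be an (n_points, n_classes) array of predicted class probabilities.

```python
import numpy as np

def margin_uncertainty(proba):
    """Negative margin between the two most probable classes: larger = closer to the boundary."""
    part = np.sort(proba, axis=1)
    return -(part[:, -1] - part[:, -2])

def entropy_uncertainty(proba):
    """Entropy of the predicted label distribution: larger = more uncertain."""
    eps = 1e-12                                   # guard against log(0)
    return -np.sum(proba * np.log(proba + eps), axis=1)
```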
5
Notations
The unlabeled data space: $\mathcal{X}$. The label space: a finite set $\mathcal{Y}$.
A mapping $h: \mathcal{X} \to \mathcal{Y}$, $h \in \mathcal{H}$; $h_n$ is the hypothesis learned after seeing $n$ examples.
A pre-specified hypothesis space: $\mathcal{H}$.
An underlying distribution over $\mathcal{X} \times \mathcal{Y}$: $D$.
Error of a hypothesis $h$: $err(h) = P_D[h(X) \ne Y]$.
The optimal hypothesis: $h^* = \arg\min_{h \in \mathcal{H}} err(h)$.
Separable case: $err(h^*) = 0$; non-separable otherwise.
Active learning: hope that $h_n \to h^*$ as $n$ grows.
6
A Toy Example
Setting: $\mathcal{X}$ is one-dimensional; $\mathcal{Y}$ is binary; $\mathcal{H}$ is the set of linear separators (thresholds); the data are linearly separable; $\epsilon$ is a pre-specified error rate.
Passive learning: needs $O(1/\epsilon)$ randomly labeled points.
Binary search: needs $O(\log(1/\epsilon))$ labeled examples (sketched below).
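A sketch of the binary-search argument on a sorted unlabeled pool, assuming separability and a hypothetical `oracle` returning -1 left of the true threshold and +1 at or right of it.

```python
import numpy as np

def learn_threshold(x_pool, oracle):
    """Binary search for a consistent 1-D threshold using O(log n) label queries.

    Passive learning would need O(1/eps) random labels for target error eps.
    Assumes the pool straddles the true threshold (leftmost point -1, rightmost +1).
    """
    xs = np.sort(np.asarray(x_pool, dtype=float))
    lo, hi = 0, len(xs) - 1
    queries = 2
    assert oracle(xs[lo]) == -1 and oracle(xs[hi]) == +1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queries += 1
        if oracle(xs[mid]) == +1:
            hi = mid                        # threshold is at or left of xs[mid]
        else:
            lo = mid                        # threshold is right of xs[mid]
    return (xs[lo] + xs[hi]) / 2, queries   # any value in (xs[lo], xs[hi]] is consistent
```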
7
Separable Case
$\mathcal{H}_1 = \mathcal{H}$
For $t = 1, 2, \ldots$:
  Receive an unlabeled point $x_t$;
  If the hypotheses in $\mathcal{H}_t$ disagree about $x_t$'s label: query the label $y_t$ of $x_t$; $\mathcal{H}_{t+1} = \{h \in \mathcal{H}_t : h(x_t) = y_t\}$.
  Else: $\mathcal{H}_{t+1} = \mathcal{H}_t$.
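For the 1-D threshold class of the toy example, the version space $\mathcal{H}_t$ is always an interval, so the disagreement test is an interval-membership check; a sketch under that assumption (with a hypothetical `oracle` as above):

```python
def cal_thresholds(stream, oracle, lo=0.0, hi=1.0):
    """Disagreement-based selective sampling for 1-D thresholds.

    The version space is the interval (lo, hi] of thresholds consistent with all
    queried labels.  A point is queried only if it falls inside that interval,
    i.e. only if hypotheses in the current version space disagree on its label.
    """
    queries = 0
    for x in stream:                    # unlabeled points arrive one at a time
        if lo < x < hi:                 # disagreement region: ask the labeler
            queries += 1
            if oracle(x) == +1:
                hi = x                  # every consistent threshold is at or left of x
            else:
                lo = x                  # every consistent threshold is right of x
        # else: all hypotheses in the version space agree, so the label is inferred for free
    return (lo + hi) / 2, queries
```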
8
Separable Case
9
Separable Case: Label Complexity
How many labels are needed?
$L(\epsilon, \delta) = t_\epsilon$ s.t. for all $t \ge t_\epsilon$, $P(\exists h \in \mathcal{H}_t : err(h) > \epsilon) \le \delta$.
For the disagreement-based algorithm above: $L(\epsilon, \delta) \le \tilde{O}(\theta\, d \log(1/\epsilon))$, where
- $\tilde{O}$ suppresses terms logarithmic in $\theta$, $d$ and $\log(1/\epsilon)$;
- $d$ is the VC dimension of the hypothesis space;
- $\theta$ is a constant called the disagreement coefficient (discussed later).
In contrast, the label complexity of passive learning is $\Omega(d/\epsilon)$, so active learning yields an exponential improvement.
10
Separable Case vs. Non-separable Case
Both start with $\mathcal{H}_1 = \mathcal{H}$ and, for $t = 1, 2, \ldots$, receive an unlabeled point $x_t$.
Separable case:
  If the hypotheses in $\mathcal{H}_t$ disagree about $x_t$'s label: query the label $y_t$ of $x_t$; $\mathcal{H}_{t+1} = \{h \in \mathcal{H}_t : h(x_t) = y_t\}$.
  Else: $\mathcal{H}_{t+1} = \mathcal{H}_t$.
Non-separable case:
  If the hypotheses in $\mathcal{H}_t$ disagree about $x_t$'s label: query the label $y_t$ of $x_t$.
  Else: infer the label $y_t$ of $x_t$ ($y_t$ may be wrong).
  $\mathcal{H}_{t+1} = \{h \in \mathcal{H}_t : \widehat{err}(h) \le \widehat{err}(\hat{h}_t) + \Delta_t\}$, where $\hat{h}_t$ is the current empirical-error minimizer and $\Delta_t$ a slack term.
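A sketch of the non-separable update over a finite hypothesis set, assuming each hypothesis is a callable and the slack $\Delta_t$ is supplied by the caller (in the analysis it is a generalization-bound term); this illustrates the pruning rule only, not the full agnostic algorithm.

```python
import numpy as np

def disagree(hypotheses, x):
    """True if the surviving hypotheses disagree about x's label (query x in that case)."""
    return len({h(x) for h in hypotheses}) > 1

def prune_version_space(hypotheses, X_labeled, y_labeled, delta_t):
    """Keep hypotheses whose empirical error is within delta_t of the best one."""
    y_labeled = np.asarray(y_labeled)
    emp_err = np.array([np.mean(np.array([h(x) for x in X_labeled]) != y_labeled)
                        for h in hypotheses])
    best = emp_err.min()
    return [h for h, e in zip(hypotheses, emp_err) if e <= best + delta_t]
```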
11
Separable Case vs. Non-separable Case: Label Complexity
Separable case:
  $L(\epsilon, \delta) = t_\epsilon$ s.t. for all $t \ge t_\epsilon$, $P(\exists h \in \mathcal{H}_t : err(h) > \epsilon) \le \delta$.
  Active learning: $L(\epsilon, \delta) \le \tilde{O}(\theta\, d \log(1/\epsilon))$.  Passive learning: $\Omega(d/\epsilon)$.
Non-separable case:
  $L(\epsilon, \delta) = t_\epsilon$ s.t. for all $t \ge t_\epsilon$, $P(\exists h \in \mathcal{H}_t : err(h) > \epsilon + \nu) \le \delta$, where $\nu = err(h^*)$.
  Active learning: $L(\epsilon, \delta) \le \tilde{O}(\theta\, d\, (\log^2(1/\epsilon) + \nu^2/\epsilon^2))$.  Passive learning: $\Omega(d/\epsilon + d\nu^2/\epsilon^2)$.
12
The Disagreement Coefficient $\theta$
$\rho(h, h^*) = P_D[h(X) \ne h^*(X)]$
$B(h^*, r) = \{h \in \mathcal{H} : \rho(h, h^*) \le r\}$
$DIS(B(h^*, r)) = \{x \in \mathcal{X} : \exists h, h' \in B(h^*, r) \text{ s.t. } h(x) \ne h'(x)\}$
$\theta = \sup_{r > 0} \frac{P_D[DIS(B(h^*, r))]}{r}$
This coefficient measures how $P_D[DIS(B(h^*, r))]$ scales with $r$.
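A Monte Carlo sketch of $\theta$ for the 1-D threshold class under a uniform marginal on [0, 1] (an assumption made purely for illustration); for this class the disagreement region of $B(h^*, r)$ is the interval $(t^* - r, t^* + r)$, so the estimate comes out close to the known value 2.

```python
import numpy as np

def theta_thresholds(t_star=0.5, radii=(0.01, 0.05, 0.1, 0.2), n_samples=200_000, seed=0):
    """Estimate theta = sup_r P[DIS(B(h*, r))] / r for 1-D thresholds, uniform X on [0, 1].

    rho(h_t, h_{t*}) = P[h_t(X) != h_{t*}(X)] = |t - t*|, so B(h*, r) contains the
    thresholds within distance r of t*, and its disagreement region is (t* - r, t* + r).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_samples)
    ratios = [np.mean((x > t_star - r) & (x < t_star + r)) / r for r in radii]
    return max(ratios)                  # sup over the sampled radii (close to 2 here)
```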
13
Active Clinical Trials for Personalized Medicine
Stanislav Minsker, Ying-Qi Zhao, and Guang Cheng. Journal of the American Statistical Association, 2016.
14
Introduction
Individualized treatment rules (ITRs)
- Find the optimal treatment for each patient. A patient's characteristics include demographics, medical history, genetic or genomic information, etc.
Randomized clinical trials (RCTs)
- Focus only on the efficacy of treatments.
- Not efficient, as the required sample size (and hence cost) can be large.
Active clinical trials (ACTs)
- Exclude patients for whom the benefit of some treatment is already clear; select those whose optimal treatment is hard to determine.
- An instance of uncertainty sampling.
15
Problem Setting
Data $(X, A, R)$ have a joint distribution $P$.
- $X \in \mathbb{R}^p$, representing a patient's covariates;
- $A \in \{1, -1\}$, representing a treatment decision: standard or alternative;
- $R \in \mathbb{R}$, a treatment outcome; the larger, the better.
An ITR $D: \mathcal{X} \to \mathcal{A}$ is a binary mapping. The optimal ITR is $D^*(x) = \mathrm{sign}\{f^*(x)\}$, where
$f^*(x) = E[R \mid A = 1, X = x] - E[R \mid A = -1, X = x]$
is called the contrast function. It defines the optimal decision boundary and can be modeled with a regression model (a plug-in sketch follows).
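A minimal plug-in sketch of the regression modeling mentioned above: fit one regression of the outcome on the covariates within each treatment arm and take the difference. This is a generic illustration (a T-learner-style estimator with a scikit-learn random forest), not the paper's smoothed kernel estimator.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_contrast(X, A, R):
    """Plug-in estimate of f*(x) = E[R | A=1, X=x] - E[R | A=-1, X=x].

    X : (n, p) covariates; A : (n,) treatments in {+1, -1}; R : (n,) outcomes (larger is better).
    Returns the estimated contrast f_hat and the induced ITR D_hat(x) = sign(f_hat(x)).
    """
    X, A, R = np.asarray(X), np.asarray(A), np.asarray(R)
    m_plus = RandomForestRegressor(random_state=0).fit(X[A == 1], R[A == 1])
    m_minus = RandomForestRegressor(random_state=0).fit(X[A == -1], R[A == -1])

    def f_hat(x_new):
        x_new = np.atleast_2d(x_new)
        return m_plus.predict(x_new) - m_minus.predict(x_new)

    def itr(x_new):
        return np.sign(f_hat(x_new))

    return f_hat, itr
```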
16
Active Clinical Trials
Its key idea is to find, in each iteration, the active set: the set of patients whose optimal treatment is still uncertain.
17
Active Clinical Trials
Step 1: Initialization
- Randomly recruit patients and randomly treat them;
- Observe their outcomes (these become the labeled data);
- Train the initial estimators.
Step 2: Active learning (a schematic loop is sketched below)
- Find the active-set patients;
- Randomly treat them;
- Observe their outcomes;
- Update the estimators;
- Repeat until the budget runs out, then output the final estimator.
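A schematic of the two steps above, assuming hypothetical helpers `recruit`, `treat_and_observe`, `estimate_contrast`, and `find_active_set` (one possible `find_active_set` is sketched on the Active Set slide below); it mirrors the structure of the procedure, not the paper's exact implementation.

```python
import numpy as np

def active_clinical_trial(recruit, treat_and_observe, estimate_contrast, find_active_set,
                          n_init=50, batch_size=20, budget=500):
    """Schematic active clinical trial (uncertainty-sampling flavour).

    recruit(n)             -> covariates of n candidate patients (hypothetical helper)
    treat_and_observe(X)   -> randomly assigned treatments A and observed outcomes R
    estimate_contrast(...) -> estimated contrast function f_hat
    find_active_set(X, f)  -> rows of X whose optimal treatment is still uncertain under f
    """
    # Step 1: initialization -- recruit, randomize, observe, and fit the initial estimator.
    X = recruit(n_init)
    A, R = treat_and_observe(X)
    f_hat = estimate_contrast(X, A, R)
    spent = n_init

    # Step 2: active learning -- only enroll patients whose optimal treatment is uncertain.
    while spent < budget:
        candidates = recruit(5 * batch_size)           # screen more candidates than we enroll
        active = find_active_set(candidates, f_hat)    # exclude clear-cut cases
        if len(active) == 0:
            break
        batch = active[:batch_size]
        A_new, R_new = treat_and_observe(batch)        # randomize treatment within the active set
        X = np.vstack([X, batch])
        A, R = np.concatenate([A, A_new]), np.concatenate([R, R_new])
        f_hat = estimate_contrast(X, A, R)             # update the estimator
        spent += len(batch)

    return f_hat
```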
18
Active Set Define πΉ(π,πΏ) to be a set of hypotheses that are πΏ-close to π. πΉ π,πΏ ={π: πβπ β β€πΏ} For each iteration: Find πΉ( π π‘β1 ,πΏ); Active Set π΄π π‘ ={π₯:β π 1 , π 2 βπΉ( π π‘β1 ,πΏ), sign( π 2 )β π πππ( π 1 )} Approximate π΄π π‘ with a regular set πππ‘ π‘ Its purpose is to determine the active set based on intrinsic dimensions.
19
Smoothed Kernel Estimator
20
Kernel Bandwidth
21
Kernel Bandwidth
22
Theoretical Bound
With probability greater than , it holds:
- $C$ is a constant depending on the kernel and the distribution of $X$;
- $n$ is the total number of patients recruited;
- $d$ is the intrinsic dimension, and $\gamma \in [1, d]$ is a constant.
23
Real Data Analysis
Two data sets:
- Nefazodone-CBASP Clinical Trial
- Twelve-Step Intervention on Stimulant Drug Use
Methods:
- AL-BV: active learning with the smoothed kernel estimator (non-parametric)
- AL-GP: active learning with a Gaussian process (parametric)
- OWS: passive learning with outcome-weighted learning (hinge loss)
- OLS: passive learning with ordinary least squares loss
24
Nefazodone-CBASP Clinical Trials
Patient predictors: the baseline HRSD (Hamilton Rating Scale for Depression) score, alcohol dependence, and the HAMA somatic anxiety score.
Three treatments: Nefazodone, CBASP, and the combination of both.
Outcome: the HRSD score; the higher, the worse.
25
Nefazodone-CBASP Clinical Trials
26
Twelve-Step Intervention on Stimulant Drug Use
Patient predictors: age, average number of days per month of self-reported stimulant drug use in the 3 months prior to randomization, baseline alcohol use, and the drug use, employment status, medical status, and psychiatric status composite scores on the Addiction Severity Index (ASI).
Treatments (aimed at reducing stimulant drug use): treatment as usual (TAU), or TAU integrated with Stimulant Abuser Groups to Engage in 12-Step intervention.
Outcome: the number of days of self-reported stimulant drug use over the 3- to 6-month post-randomization period, where a smaller value is preferable.
27
Twelve-Step Intervention on Stimulant Drug Use
28
Thank You!