
Slide 1: Machine Lifelong Learning: Inductive Transfer with Context-Sensitive Neural Networks
Danny Silver, Ryan Poirier, Duane Currie, Liangliang Tu and Ben Fowler
Acadia University, Wolfville, NS, Canada. danny.silver@acadiau.ca

Slide 2: Thank You and an Invitation …

Slide 3: Outline
- Machine Lifelong Learning (ML3) and Inductive Transfer
- Multiple Task Learning (MTL) and its Limitations
- csMTL: Context-Sensitive MTL
- Empirical Studies of csMTL
- Recent Work
- Conclusions and Future Work
Reference: Machine Learning, Vol. 73 (2008), pp. 313-336

Slide 4: Machine Lifelong Learning (ML3)
Considers methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning [Thrun97].
We investigate systems that must learn:
- from impoverished training sets
- for diverse domains of related/unrelated tasks
- where practice of the same task is possible
Applications: IA, user modeling, robotics, DM

Slide 5: It Is Appropriate to Seriously Consider Such Systems
- There exists a body of related work: constructive induction, continual learning, sequential task learning, learning with deep architectures
- The computational and data storage power of modern computers makes such systems practical
- There are significant challenges and benefits in pursuing such programs of research in AI and the brain sciences

Slide 6: Knowledge-Based Inductive Learning: An ML3 Framework
[Diagram: an inductive learning system (short-term memory) maps training examples (x, f(x)) drawn from instance space X to a model of classifier h, with h(x) ~ f(x) assessed on testing examples; domain knowledge (long-term memory) interacts with it through retention and consolidation, inductive bias selection, and knowledge transfer.]

Slide 7: Inductive Bias and Knowledge Transfer
[Figure: a street map (Ash St, Elm St, Pine St, Oak St crossing First, Second, Third) illustrating how human learners use inductive bias.]
Inductive bias depends upon:
- knowledge of the task domain
- selection of the most related tasks
Human learners use inductive bias.

Slide 8: Knowledge-Based Inductive Learning: An ML3 Framework
[Diagram: the framework of Slide 6, with the long-term domain knowledge realized as a Multiple Task Learning (MTL) network with inputs x1 … xn and outputs f1(x), f2(x), …, fk(x).]

Slide 9: Single Task Learning (STL)
[Diagram: a network with inputs x1 … xn and a single output y = f(x).]
- Target concept: f: X→Y, with probability distribution P on X × Y
- Example: (x, f(x)), where x = (x1, …, xn)
- Training set: S = {(x, f(x))}
- Hypothesis / hypothesis space: h: X→Y, h ∈ H_STL
- Objective function to minimize: Σ_{x∈S} error[f(x), h(x)]
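The following is a minimal sketch of the STL setup above: a single back-propagation network h trained to minimize the summed error over S. The synthetic target concept, the network sizes, and the use of PyTorch are illustrative assumptions, not details from the original study.

```python
# STL sketch: one network, one task, minimizing the summed error over S.
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 10                                       # input dimension, x = (x1, ..., xn)
X = torch.rand(200, n)                       # training inputs drawn from P
y = (X[:, 0] > 0.5).float().unsqueeze(1)     # an assumed target concept f(x)

h = nn.Sequential(nn.Linear(n, 20), nn.Sigmoid(),
                  nn.Linear(20, 1), nn.Sigmoid())
opt = torch.optim.SGD(h.parameters(), lr=0.1)
loss_fn = nn.MSELoss(reduction="sum")        # sum over x in S of error[f(x), h(x)]

for epoch in range(500):
    opt.zero_grad()
    loss_fn(h(X), y).backward()              # the objective from the slide
    opt.step()
```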

Slide 10: Multiple Task Learning (MTL)
[Diagram: a back-propagation network with inputs x1 … xn, a common feature layer (common internal representation), and task-specific outputs f1(x), f2(x), …, fk(x) [Caruana, Baxter].]
- Multiple hypotheses develop in parallel within one back-propagation network [Caruana, Baxter 93-95]
- An inductive bias occurs through the shared use of the common internal representation
- Knowledge (inductive) transfer to the primary task f1(x) depends on the choice of secondary tasks

Slide 11: Multiple Task Learning (MTL)
- Target concept: {f_i} such that each f_i: X→Y with distribution P_i on X × Y, and probability distribution Q over all P_i
- Example: (x, {f_i(x)})
- Training set: S = {(x, {f_i(x)})}
- Hypothesis / hypothesis space: {h_i} such that each h_i: X→Y, h_i ∈ H_MTL
- Objective function to minimize: Σ_{x∈S} Σ_i error[f_i(x), h_i(x)]
[Diagram: the MTL network of Slide 10.]
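A companion sketch of MTL under the same assumptions as the STL sketch: one shared hidden layer (the common internal representation) feeds k task-specific outputs, and the loss sums the error over examples and tasks.

```python
# MTL sketch: shared hidden layer, k task-specific outputs, summed loss.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, k = 10, 3                                 # n inputs, k tasks (assumed)
X = torch.rand(200, n)
Y = torch.stack([(X[:, i] > 0.5).float() for i in range(k)], dim=1)  # f_i(x)

shared = nn.Sequential(nn.Linear(n, 20), nn.Sigmoid())   # common representation
heads = nn.Linear(20, k)                                 # task-specific outputs
model = nn.Sequential(shared, heads, nn.Sigmoid())

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss(reduction="sum")        # sum_x sum_i error[f_i(x), h_i(x)]

for epoch in range(500):
    opt.zero_grad()
    loss_fn(model(X), Y).backward()
    opt.step()
```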

Slide 12: Consolidation and Transfer via MTL and Task Rehearsal
[Diagram: a short-term learning network (inputs x1 … xn; outputs f1(x), y2, y3) receives virtual examples from related prior tasks for knowledge transfer, and virtual examples of f1(x) flow to a long-term consolidated domain knowledge network (outputs f1(x), y2 … y6) for consolidation. Rehearsal of virtual examples for y2–y6 ensures knowledge retention.]
Requirements:
1. Lots of internal representation
2. A rich set of virtual training examples
3. A small learning rate (slow learning)
4. A validation set to prevent the growth of high-magnitude weights [Poirier04]
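A sketch of how virtual examples for task rehearsal might be produced, assuming the networks are PyTorch modules; the uniform probe distribution and the example count are assumptions.

```python
# Task-rehearsal sketch: virtual (x, y) examples are generated by passing
# random probe inputs through an already-trained network, and can then be
# rehearsed while learning something new.
import torch

def make_virtual_examples(trained_net, n_inputs, n_examples=1000):
    """Query a trained network to produce virtual training examples."""
    with torch.no_grad():
        X_virtual = torch.rand(n_examples, n_inputs)   # probe inputs (assumed uniform)
        y_virtual = trained_net(X_virtual)             # the network's own outputs
    return X_virtual, y_virtual

# e.g. rehearse prior-task outputs y2..y6 alongside real examples of f1(x)
# by mixing (X_virtual, y_virtual) into the consolidation training set.
```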

Slide 13: Research Software: RASL3

Slide 14: Lifelong Learning with MTL
[Chart: mean percent misclassification on the Band domain, the Logic domain, and coronary artery disease tasks A-D.]

Slide 15: MTL: An Environmental Example
Stream flow rate prediction [Lisa Gaudette, 2006]: x = weather data; f(x) = flow rate.

Slide 16: Transfer Learning Competitions

Slide 17: Transfer Learning Competitions (continued)

Slide 18: Limitations of MTL for ML3
Problems with multiple outputs:
- Training examples must have matching target values
- Redundant representation across outputs
- Frustrates practice of a task
- Prevents a fluid development of domain knowledge
- No way to naturally associate examples with tasks
- Inductive transfer is limited to the sharing of hidden node weights
- Inductive transfer relies on selecting related secondary tasks
[Diagram: the MTL network of Slide 10.]

Slide 19: Context-Sensitive MTL (csMTL)
[Diagram: a network with primary inputs x1 … xn, context inputs c1 … ck, and one output for all tasks, y′ = f′(c, x).]
We have developed an alternative approach that is meant to overcome these limitations:
- It uses a single-output neural network structure
- Context inputs associate an example with a task
- All weights are shared; the focus shifts from learning separate tasks to learning a domain of tasks
- Conjecture: no measure of task relatedness is required

Slide 20: Context-Sensitive MTL (csMTL)
- Target concept: f′: C × X→Y, with probability distribution P′ on C × X × Y, where P′ = g(P, Q)
- Example: (c, x, f′(c, x)), where f′(c, x) = f_i(x) when c → i (i.e., c_i = 1)
- Training set: S′ = {(c, x, f′(c, x))}
- Hypothesis / hypothesis space: h′: C × X→Y, h′ ∈ H_csMTL
- Objective function to minimize: Σ_{(c,x)∈S′} error[f′(c, x), h′(c, x)]
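A sketch of the csMTL re-encoding and network described above: each task's examples gain a one-hot context vector c, all re-encoded rows are pooled into S′, and a single-output network is trained on them. The task definitions and sizes are assumed for illustration.

```python
# csMTL sketch: one-hot task context c concatenated onto primary inputs x,
# a single output shared by all tasks.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, k = 10, 3                                 # n primary inputs, k tasks (assumed)
X = torch.rand(200, n)
examples, targets = [], []
for task in range(k):                        # re-encode each task's examples
    c = torch.zeros(200, k)
    c[:, task] = 1.0                         # c -> i, i.e. c_i = 1
    examples.append(torch.cat([c, X], dim=1))
    targets.append((X[:, task] > 0.5).float().unsqueeze(1))  # assumed f_i(x)
S = torch.cat(examples)                      # training set S' of (c, x) rows
y = torch.cat(targets)

h = nn.Sequential(nn.Linear(k + n, 20), nn.Sigmoid(),
                  nn.Linear(20, 1), nn.Sigmoid())   # one output for all tasks
opt = torch.optim.SGD(h.parameters(), lr=0.1)
for epoch in range(500):
    opt.zero_grad()
    nn.functional.mse_loss(h(S), y, reduction="sum").backward()
    opt.step()
```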

Slide 21: Context-Sensitive MTL (csMTL)
- During training, c selects an inductive bias relative to the secondary tasks
- If x is held constant, then c indexes over the domain of tasks f′
- If c is real-valued, from the environment, it provides a "grounded" sense of task relatedness
- If c is a set of task identifiers, it differentiates between otherwise conflicting examples and selects internal representation for related tasks

Slide 22: Context-Sensitive MTL (csMTL)
- A common representation will develop for all tasks, following examples driven by P′
- c selects an inductive bias over H_csMTL relative to the secondary tasks learned
- If x is held constant, then c indexes over the domain of tasks f′
- If c is real-valued, from the environment, it provides a "grounded" sense of task relatedness
- If c is a set of task identifiers, it differentiates between otherwise conflicting examples and selects internal representation for related tasks

Slide 23: Context-Sensitive MTL (csMTL)
csMTL overcomes the limitations of standard MTL for the long-term consolidation of tasks:
- Eliminates redundant outputs for the same task
- Facilitates the accumulation of knowledge through practice
- Examples can be associated with tasks directly by the environment
- Develops a fluid domain of task knowledge indexed by the context inputs
… and it accommodates tasks that have multiple outputs.

Slide 24: csMTL Empirical Studies: Task Domains
[Figure: overview of the task domains and their secondary tasks; the detail is not recoverable from this transcript.]

Slide 25: csMTL Empirical Studies: Task Domains
- Band: 7 tasks, 2 primary inputs. [Figure: tasks T0 … T6, each a band of positive examples.]
- Logic: T0 = (x1 > 0.5 ∧ x2 > 0.5) ∨ (x3 > 0.5 ∧ x4 > 0.5); 6 tasks, 10 primary inputs
- fMRI: 2 tasks, 24 primary inputs

Slide 26: csMTL ML3 Empirical Study: The Logic Domain
- Objective: predict TRUE expressions (positive examples)
- 6 non-linear tasks, 10 input attributes
- MTL and ηMTL networks: 10 inputs, 20 hidden nodes, 6 outputs
- csMTL long-term network: 16 inputs, 20 hidden nodes, 1 output
- csMTL short-term network: an additional 5 hidden nodes, 1 output
- Consolidated domain knowledge for T1 through T5 developed from 200 training examples
- 30 training examples and 20 validation (tuning) examples for T0
- 1000 test examples

Task | Logical expression (adjacent tasks share features)
T0 | (A > 0.5 ∧ B > 0.5) ∨ (C > 0.5 ∧ D > 0.5)
T1 | (C > 0.5 ∧ D > 0.5) ∨ (E > 0.5 ∧ F > 0.5)
T2 | (C > 0.5 ∧ D > 0.5) ∨ (G > 0.5 ∧ H > 0.5)
T3 | (E > 0.5 ∧ F > 0.5) ∨ (G > 0.5 ∧ H > 0.5)
T4 | (E > 0.5 ∧ F > 0.5) ∨ (I > 0.5 ∧ J > 0.5)
T5 | (G > 0.5 ∧ H > 0.5) ∨ (I > 0.5 ∧ J > 0.5)
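A sketch of a generator for this domain, assuming the table's conjunctions and disjunctions were reconstructed correctly (the ∧/∨ symbols did not survive the transcript) and that the ten attributes A-J are uniform on [0, 1]:

```python
# Logic-domain data generator: 10 inputs A..J; each task is true when one of
# its two conjunctions from the table above holds.
import numpy as np

rng = np.random.default_rng(0)

PAIRS = {  # (left conjunction, right conjunction) per task, from the table
    "T0": ((0, 1), (2, 3)), "T1": ((2, 3), (4, 5)), "T2": ((2, 3), (6, 7)),
    "T3": ((4, 5), (6, 7)), "T4": ((4, 5), (8, 9)), "T5": ((6, 7), (8, 9)),
}

def make_examples(task, n):
    X = rng.random((n, 10))                  # attributes A..J in [0, 1]
    (a, b), (c, d) = PAIRS[task]
    y = ((X[:, a] > 0.5) & (X[:, b] > 0.5)) | ((X[:, c] > 0.5) & (X[:, d] > 0.5))
    return X, y.astype(float)

X_train, y_train = make_examples("T0", 30)   # impoverished primary training set
X_test, y_test = make_examples("T0", 1000)
```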

Slide 27: csMTL Empirical Studies: Task Domains
- Covertype: data from four wilderness areas in northern Colorado; 10 inputs (elevation, soil type, etc.); output: cover type, one of six tree species
- Dermatology: 33 skin input attributes per patient; six types of disease
- Glass: nine inputs; output: one of six types of glass
- Heart Disease: clinical data from three hospitals; output: probability of the patient having coronary artery disease; five input attributes: age, gender, type of chest pain, resting blood pressure, and resting electrocardiogram

Slide 28: csMTL Empirical Studies: Method
Objective: compare the generalization accuracy of hypotheses developed by csMTL with those developed by STL and MTL.
- Primary tasks have impoverished data sets; secondary tasks have larger data sets
- For csMTL, the primary examples are duplicated to match the number available for each secondary task (sketched below)
- Standard back-propagation three-layer networks are used for all methods, with sufficient representation in the hidden layer
- Tuning sets are used to prevent over-fitting
- Independent test sets are used to assess accuracy
- Studies are repeated; mean accuracy is the performance metric
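A sketch of the duplication step mentioned above, with assumed array shapes and counts; the original implementation (RASL3) is not shown here.

```python
# Example-balancing sketch: the primary task's small training set is tiled so
# it contributes as many csMTL-encoded rows as each larger secondary task set.
import numpy as np

def balance_primary(X_primary, y_primary, n_secondary):
    """Tile the primary examples up to the secondary task count."""
    reps = int(np.ceil(n_secondary / len(X_primary)))
    X_dup = np.tile(X_primary, (reps, 1))[:n_secondary]
    y_dup = np.tile(y_primary, reps)[:n_secondary]
    return X_dup, y_dup

# e.g. 30 primary examples tiled to match 200 examples per secondary task
# before attaching the one-hot context inputs and pooling all tasks' rows.
```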

Slide 29: Model accuracy as a function of the number of hidden nodes
[Chart]

Slide 30: Accuracy as a function of the number of primary task training examples (Logic domain)
[Chart]

Slide 31: csMTL Empirical Studies: Results from Repeated Studies
[Table of results; not recoverable from this transcript.]

Slide 32: csMTL Empirical Studies: Dermatology Domain [Poirier, recent]
Disease diagnosis: n = 360 patients.
- Primary task: psoriasis
- Secondary tasks: seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, pityriasis rubra pilaris

Slide 33: Why is csMTL doing so well?
- There is a constraint between the context-to-hidden weights and the hidden node bias weights (for hidden node j, training example n, task z)
- There is a constraint between the context-to-hidden weights and the output node weights (for hidden node j, output k, task z)
- These constraints reduce the number of free parameters in csMTL
[Diagram: the csMTL network; the equations expressing the constraints are shown on the slide but did not survive in this transcript.]
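A sketch of where the first constraint comes from, under the assumption of a one-hot task context (c_z = 1 for the current task z, 0 otherwise): the context-to-hidden weight enters a hidden node's net input exactly where its bias does, so per task only their sum matters.

```latex
% Net input of hidden node j on an example of task z, assuming a one-hot
% context: only the sum v_{jz} + b_j is identifiable for task z, which ties
% the context-to-hidden weights to the hidden node bias weights.
\[
  \mathrm{net}_j \;=\; \sum_{i} w_{ji}\, x_i \;+\; v_{jz} \;+\; b_j
\]
```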

Slide 34: Context-Sensitive MTL (csMTL)
We have recently shown that csMTL has two important constraints:
- between the context weights and the bias weights
- between the context weights and the output weights
Consequently, VC(csMTL) < VC(MTL).
[Diagram: the csMTL network, with hidden node j and output node k marked.]

Slide 35: Why is csMTL doing so well?
Consider two unrelated tasks:
- From a task relatedness perspective, the correlation or mutual information over all examples is 0
- From an example-by-example perspective, 50% of the examples have matching target values
csMTL transfers knowledge at the example level, so there is greater sharing of representation.

Ex # | Task A | Task B
1 | 0 | 0
2 | 0 | 0
3 | 0 | 1
4 | 0 | 1
5 | 1 | 1
6 | 1 | 1
7 | 1 | 0
8 | 1 | 0
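The two perspectives can be checked directly on the table above: correlation over all eight examples is 0, yet half of the target values match.

```python
# Verify both views of the example table: zero correlation overall, but a
# 50% example-level match, which is the level at which csMTL shares knowledge.
import numpy as np

task_a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
task_b = np.array([0, 0, 1, 1, 1, 1, 0, 0])

print(np.corrcoef(task_a, task_b)[0, 1])     # 0.0: "unrelated" tasks
print((task_a == task_b).mean())             # 0.5: half the examples agree
```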

Slide 36: Transfer from the same task …
[Diagram: an MTL network (inputs x1 … x10; outputs f1(x), f2(x), …, f6(x)) beside a csMTL network (inputs x1 … x10 and c1 … c6; output f′(c, x)).]
Consider a task domain of n tasks with 20 training examples per task, all examples drawn from the same function:
- Learn the primary task with no transfer
- Learn the primary task with transfer from the n-1 secondary tasks
- Learn the task using an AllData training set composed of all 20 × n examples
Which method will produce the best models?

Slide 37: Transfer from the same task …
[Results chart]

Slide 38: Transfer under increasing task diversity …
[Results chart]

Slide 39: A Measure of Task Relatedness?
- Early conjecture: the context-to-hidden weight vectors can be used to measure task relatedness
- Not true: two hypotheses for the same examples can develop that have equivalent function but use different representation
- Transfer is functional in nature
[Diagram: the csMTL network.]

Slide 40: csMTL and Other ML Methods
Will the csMTL encoding work with other machine learning methods?
- IDT
- kNN
- SVM?
- Bayesian nets?
- Deep architecture learning networks?

Slide 41: csMTL Using IDT (Logic Domain)
[Results chart]

Slide 42: csMTL Using kNN (Logic Domain, k = 5)
[Results chart]
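A sketch of how the csMTL encoding can be handed to an off-the-shelf kNN learner: the one-hot task context is prepended to each feature vector and the pooled rows are treated as ordinary examples. The use of scikit-learn and the toy tasks are assumptions; the original study's implementation is not shown.

```python
# kNN on csMTL-encoded examples (k = 5, as on this slide).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def encode(X, task, n_tasks):
    """Prepend a one-hot task context to each row of X."""
    C = np.zeros((len(X), n_tasks))
    C[:, task] = 1.0
    return np.hstack([C, X])

# Pooled csMTL-encoded training set over 2 toy tasks (assumed data).
X0, X1 = rng.random((30, 10)), rng.random((200, 10))
y0 = (X0[:, 0] > 0.5).astype(int)
y1 = (X1[:, 1] > 0.5).astype(int)
S = np.vstack([encode(X0, 0, 2), encode(X1, 1, 2)])
y = np.concatenate([y0, y1])

knn = KNeighborsClassifier(n_neighbors=5).fit(S, y)
X_test = rng.random((5, 10))
print(knn.predict(encode(X_test, 0, 2)))     # query with primary-task context
```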

Slide 43: Recent Work
- Develop a csMTL plug-in for WEKA
- Explore domains of tasks that have multiple outputs (image transformation)
- Machine Lifelong Learning (ML3) with csMTL: consolidation of task knowledge

Slide 44: csMTL for WEKA
- In 2009-10 we completed a new version of the WEKA MLP, called MLP_CS (work with B. Fowler and L. Tu)
- It accepts csMTL-encoded examples with context inputs
- It can be used for transfer learning by researchers and practitioners
- See http://ml3.acadiau.ca/

Slide 45: csMTL for WEKA
[Screenshot] See http://ml3.acadiau.ca/

Slide 46: csMTL and Tasks with Multiple Outputs
Liangliang Tu (2010), Image Morphing: Inductive Transfer between Tasks That Have Multiple Outputs. Transforms 30x30 grey-scale images using inductive transfer.

Slide 47: csMTL and Tasks with Multiple Outputs
[Figure]

Slide 48: csMTL and Tasks with Multiple Outputs
Liangliang Tu (2010), Image Morphing: transforms 30x30 grey-scale images; inductive transfer is used to develop more accurate transform functions.

Slide 49: csMTL and Tasks with Multiple Outputs
[Figure]

Slide 50: csMTL and Tasks with Multiple Outputs
[Demo]

Slide 51: Conclusions
csMTL is a method of inductive transfer using multiple tasks:
- A single task output with additional context inputs
- Shifts the focus to learning a continuous domain of tasks
- Eliminates redundant task representation (multiple outputs)
Empirical studies show that csMTL:
- performs transfer at or above the level of MTL
- may not require a measure of task relatedness
- is capable of transfer when tasks have multiple outputs

Slide 52: Future Work
- The conditions under which csMTL ANNs succeed or fail
- The general ML characteristics under which the csMTL encoding will work
- Explore deep learning networks (G. Hinton)
- Develop and test a more complete csMTL-based Machine Lifelong Learning (ML3) system
- Explore domains with real-valued context inputs grounded in their environment

Slide 53: Thank You!
danny.silver@acadiau.ca
http://plato.acadiau.ca/courses/comp/dsilver/
http://ml3.acadiau.ca

Slide 54: A ML3 Based on csMTL (work with Ben Fowler, 2010)
[Diagram: a long-term consolidated domain knowledge network f1(c, x) and a short-term learning network f′(c, x), each with task context inputs c1 … ck, standard inputs x1 … xn, and one output for all tasks. Representational transfer from the consolidated domain knowledge enables rapid learning; functional transfer (virtual examples) supports consolidation. This addresses the stability-plasticity problem.]

Slide 55: csMTL ML3 Algorithm: Short-term Learning via Inductive Transfer
1. Fix the representation of the long-term network
2. Initialize the hidden and output node connection weights unique to the short-term network to small random values
3. Train and test a model for the new task in the short-term network using the available data (a tuning set prevents over-fitting)
4. If generalization accuracy is sufficient, consolidate into the long-term network
A sketch of these steps follows.
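The sketch below assumes both networks are PyTorch modules, with sizes loosely following the Logic-domain architecture of Slide 26 (16 inputs, 20 long-term hidden nodes, 5 supplemental short-term hidden nodes); a fresh random network stands in for the previously trained long-term network.

```python
# Short-term learning sketch: freeze the long-term representation, train only
# the new small-random-valued short-term weights on the new task's data.
import torch
import torch.nn as nn

long_term = nn.Sequential(nn.Linear(16, 20), nn.Sigmoid())   # stand-in for the
for p in long_term.parameters():                             # trained long-term net
    p.requires_grad = False                                  # step 1: fix weights

short_term = nn.Sequential(nn.Linear(20, 5), nn.Sigmoid(),   # step 2: new weights
                           nn.Linear(5, 1), nn.Sigmoid())    # (small random init)

opt = torch.optim.SGD(short_term.parameters(), lr=0.05)
X, y = torch.rand(30, 16), torch.rand(30, 1).round()         # assumed new-task data
for epoch in range(200):                                     # step 3: train
    opt.zero_grad()
    nn.functional.mse_loss(short_term(long_term(X)), y).backward()
    opt.step()
# step 4: if tuning-set accuracy is sufficient, consolidate (next slide).
```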

Slide 56: csMTL ML3 Algorithm: Long-term Consolidation via Task Rehearsal
1. Generate many virtual examples for the new and prior tasks using the existing representations of the short-term and long-term networks
2. Unfix the representation of the long-term network
3. Train the long-term network on the virtual examples under cross-validation (a tuning set prevents over-fitting)
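A self-contained sketch of the consolidation steps, under the same PyTorch assumption; here the long-term network itself serves as the teacher for the prior tasks, and in the full algorithm the short-term network would supply the new task's virtual examples.

```python
# Consolidation sketch: generate virtual examples, unfreeze the long-term
# weights, retrain on the virtual examples. A fresh random network stands in
# for the previously trained long-term csMTL network.
import torch
import torch.nn as nn

long_term = nn.Sequential(nn.Linear(16, 20), nn.Sigmoid(),
                          nn.Linear(20, 1), nn.Sigmoid())

# Step 1: generate many virtual examples from the existing representation.
with torch.no_grad():
    probes = torch.rand(1000, 16)            # random (c, x) probe vectors
    y_virtual = long_term(probes)            # the net's current answers as targets

# Step 2: unfix the long-term representation.
for p in long_term.parameters():
    p.requires_grad = True

# Step 3: retrain on the virtual examples (tuning set omitted for brevity).
opt = torch.optim.SGD(long_term.parameters(), lr=0.05)
for epoch in range(100):
    opt.zero_grad()
    nn.functional.mse_loss(long_term(probes), y_virtual).backward()
    opt.step()
```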

Slide 57: Consolidation Based on csMTL: Variation in the Number of Virtual Examples (work with Ben Fowler, 2010)
[Results chart]

Slide 58: Consolidation Based on csMTL: Variation in the Transfer and Validation Sets (work with Ben Fowler, 2010)
[Results chart]

Slide 59: Benefits of csMTL ML3
Long-term consolidation:
- Effective retention (all tasks in the domain knowledge network improve)
- Efficient retention (redundancy is eliminated)
- Meta-knowledge collection (context cues)
Short-term learning:
- Effective learning (inductive transfer)
- Efficient learning (representation + function)
- Transfer and training examples are used appropriately

Slide 60: Limitations of csMTL ML3
Consolidation is space- and time-complex:
- Rehearsal of all tasks means that many virtual training examples are required
- The computational complexity of back-propagation of error is O(W^3), where W = the number of weights

Slide 61: Context-Sensitive MTL (csMTL)
We are currently investigating the theory of Hints [Abu-Mostafa 93-96] to formalize:
- how each task of the domain can be seen as a hint task for learning the domain of tasks
- how the VC dimension for learning a particular task, f_k(c, x), is reduced by learning the others
[Diagram: the csMTL network.]

Slide 62: Requirements for a ML3 System: Long-term Retention
- Effective retention: resist the introduction and accumulation of error; retention of new task knowledge should improve related prior task knowledge (practice should improve performance)
- Efficient retention: minimize redundant use of memory via consolidation
- Meta-knowledge collection: e.g., the example distribution over the input space
- Effective and efficient indexing: selection of related prior knowledge for inductive bias should be accurate and rapid

Slide 63: Requirements for a ML3 System: Short-term Learning
- Effective (transfer) learning: new learning should benefit from related prior task knowledge; ML3 hypotheses should meet or exceed the accuracy of hypotheses developed without the benefit of transfer
- Efficient (transfer) learning: transfer should reduce training time; any increase in space complexity should be minimized
- Transfer versus training examples: the relevance and accuracy of prior knowledge must be weighed against the number and accuracy of the available training examples

Slide 64: Benefits of csMTL ML3: Long-term Consolidation
- Effective retention: rehearsal overcomes the stability-plasticity problem [Robins95]; increases the accuracy of all related tasks [Silver04]; facilitates practice of the same task [O'Quinn05]
- Efficient retention: eliminates redundant, inaccurate, older hypotheses [Silver04]
- Meta-knowledge collection: the focus is on learning a continuous domain of tasks; changes in the context inputs select task domain knowledge
- Effective and efficient indexing: conjecture: prior knowledge selection is made implicitly by the training examples; indexing occurs as the connection weights between the long-term and short-term networks are learned

Slide 65: Benefits of csMTL ML3: Short-term Learning
- Effective (transfer) learning: accurate inductive bias via transfer from the long-term network; a measure of task relatedness is not required
- Efficient (transfer) learning: rapid inductive bias via transfer from the long-term network; short-term network weights are reusable
- Transfer versus training examples: if the new task was previously learned, the weights between the long-term and short-term networks are quickly learned; if the new task is different from but related to a prior task, the most appropriate features from the long-term network will be selected; if the new task is unrelated to prior tasks, the supplemental hidden nodes will develop the needed features

