
Intelligent Information Technology Research Lab, Acadia University, Canada 1 Machine Lifelong Learning: Inductive Transfer with Context-sensitive Neural Networks. Danny Silver, Ryan Poirier, Duane Currie, Liangliang Tu and Ben Fowler. Acadia University, Wolfville, NS, Canada

Intelligent Information Technology Research Lab, Acadia University, Canada 2 Thank You and an Invitation …

Intelligent Information Technology Research Lab, Acadia University, Canada 3 Outline: Machine Lifelong Learning (ML3) and Inductive Transfer; Multiple Task Learning (MTL) and its Limitations; csMTL – context-sensitive MTL; Empirical Studies of csMTL; Recent Work; Conclusions and Future Work. Machine Learning, Vol 73 (2008), p

Intelligent Information Technology Research Lab, Acadia University, Canada 4 Machine Lifelong Learning ( ML3 ) Considers methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning [Thrun97] We investigate systems that must learn: From impoverished training sets For diverse domains of related/unrelated tasks Where practice of the same task is possible Applications: IA, User Modeling, Robotics, DM

Intelligent Information Technology Research Lab, Acadia University, Canada 5 It is Appropriate to Seriously Consider such Systems: there exists a body of related work (constructive induction, continual learning, sequential task learning, learning with deep architectures); the computational and data storage power of modern computers; and significant challenges and benefits to pursuing programs of research in AI and brain sciences.

Intelligent Information Technology Research Lab, Acadia University, Canada 6 Knowledge-Based Inductive Learning: An ML3 Framework. [Diagram: an Inductive Learning System (short-term memory) induces a model of classifier h, with h(x) ~ f(x), from training examples (x, f(x)) drawn from instance space X and is evaluated on testing examples; Domain Knowledge (long-term memory) interacts with it through Retention & Consolidation, Inductive Bias Selection, and Knowledge Transfer.]

Intelligent Information Technology Research Lab, Acadia University, Canada 7 Inductive Bias and Knowledge Transfer. [Street-map illustration with street names such as Ash St, Elm St, Pine St and Oak St.] Inductive bias depends upon: knowledge of the task domain; selection of the most related tasks. Human learners use inductive bias.

Intelligent Information Technology Research Lab, Acadia University, Canada 8 Knowledge-Based Inductive Learning: An ML3 Framework. [Diagram: as on slide 6, but the short-term learner is a Multiple Task Learning (MTL) network with inputs x_1 … x_n and outputs f_1(x), f_2(x), …, f_k(x); Domain Knowledge (long-term memory) connects to it through Retention & Consolidation, Inductive Bias Selection, and Knowledge Transfer.]

Intelligent Information Technology Research Lab, Acadia University, Canada 9 Single Task Learning (STL). [Network diagram: inputs x_1 … x_n, single output y = f(x).] Target concept: f: X→Y, with probability distribution P on X × Y. Example: (x, f(x)), where x = (x_1, …, x_n). Training set: S = {(x, f(x))}. Hypothesis / hypothesis space: h: X→Y / H_STL. Objective function to minimize: ∑_{x∈S} error[f(x), h(x)].

Intelligent Information Technology Research Lab, Acadia University, Canada 10 Multiple Task Learning (MTL). [Network diagram: inputs x_1 … x_n feed a common feature layer (common internal representation [Caruana, Baxter]) and task-specific representation leading to outputs f_1(x), f_2(x), …, f_k(x).] Multiple hypotheses develop in parallel within one back-propagation network [Caruana, Baxter 93-95]. An inductive bias occurs through shared use of the common internal representation. Knowledge or inductive transfer to the primary task f_1(x) depends on the choice of secondary tasks.

Intelligent Information Technology Research Lab, Acadia University, Canada 11 Multiple Task Learning (MTL). Target concepts: {f_i} such that each f_i: X→Y with distribution P_i on X × Y, and probability distribution Q over all P_i. Example: (x, {f_i(x)}). Training set: S = {(x, {f_i(x)})}. Hypotheses / hypothesis space: {h_i} such that each h_i: X→Y / H_MTL. Objective function to minimize: ∑_{x∈S} ∑_i error[f_i(x), h_i(x)]. [Network diagram as on the previous slide: inputs x_1 … x_n, common feature layer, task-specific representation, outputs f_1(x), f_2(x), …, f_k(x).]
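As a concrete illustration of the MTL objective above, here is a minimal sketch in PyTorch (our own toy code, not the RASL3 software): one back-propagation network with a shared hidden feature layer and one sigmoid output per task, trained to minimize the summed error over all task outputs. Layer sizes, data and optimizer settings are illustrative assumptions.

    import torch
    import torch.nn as nn

    class MTLNet(nn.Module):
        """Shared hidden layer with one sigmoid output per task (f_1 .. f_k)."""
        def __init__(self, n_inputs, n_hidden, n_tasks):
            super().__init__()
            self.shared = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
            self.heads = nn.Linear(n_hidden, n_tasks)   # task-specific output weights

        def forward(self, x):
            return torch.sigmoid(self.heads(self.shared(x)))   # shape: (batch, n_tasks)

    def mtl_loss(pred, targets):
        # Objective: summed per-task error over the training set S.
        return nn.functional.binary_cross_entropy(pred, targets, reduction="sum")

    net = MTLNet(n_inputs=10, n_hidden=20, n_tasks=6)           # sizes are assumptions
    opt = torch.optim.SGD(net.parameters(), lr=0.01)

    x = torch.rand(32, 10)                                      # toy batch of 32 examples
    y = torch.randint(0, 2, (32, 6)).float()                    # matching targets for all 6 tasks

    for _ in range(100):                                        # a few training steps
        opt.zero_grad()
        loss = mtl_loss(net(x), y)
        loss.backward()
        opt.step()

Note that each training example must carry a target value for every task output, which is one of the MTL limitations discussed later.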

Intelligent Information Technology Research Lab, Acadia University, Canada 12 Consolidation and Transfer via MTL & Task Rehearsal. [Diagram: a Short-Term Learning Network (inputs x_1 … x_n, outputs f_1(x), y_2, y_3) receives virtual examples from related prior tasks for knowledge transfer; virtual examples of f_1(x) are passed to a Long-term Consolidated Domain Knowledge network (outputs f_1(x), y_2 … y_6) for long-term consolidation.] Rehearsal of virtual examples for y_2–y_6 ensures knowledge retention. 1. Lots of internal representation. 2. Rich set of virtual training examples. 3. Small learning rate = slow learning. 4. Validation set to prevent growth of high-magnitude weights. [Poirier04]
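The task-rehearsal idea can be sketched in a few lines, assuming the toy MTLNet defined in the previous sketch: virtual examples are created by sampling input vectors and labelling them with the network's current outputs, so that prior tasks can be rehearsed during consolidation. The input distribution and example counts are assumptions.

    import torch

    def make_virtual_examples(net, n_examples, n_inputs=10):
        """Label randomly drawn inputs with the network's current hypotheses.

        Rehearsing these (input, output) pairs while training on a new task
        helps retain previously consolidated task knowledge.
        """
        with torch.no_grad():
            x_virtual = torch.rand(n_examples, n_inputs)   # assumed input distribution
            y_virtual = net(x_virtual)                     # soft targets for y_2 .. y_k
        return x_virtual, y_virtual

    x_v, y_v = make_virtual_examples(net, n_examples=200)  # 'net' is the toy MTLNet above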

Intelligent Information Technology Research Lab, Acadia University, Canada 13 Research Software – RASL3

Intelligent Information Technology Research Lab, Acadia University, Canada 14 Lifelong Learning with MTL. [Results chart: mean percent misclassification for methods A, B, C and D on the Band domain, the Logic domain, and the Coronary Artery Disease domain.]

Intelligent Information Technology Research Lab, Acadia University, Canada 15 MTL – Environmental Example Stream flow rate prediction [Lisa Gaudette, 2006] x = weather data f(x) = flow rate

Intelligent Information Technology Research Lab, Acadia University, Canada Transfer Learning Competitions 16

Intelligent Information Technology Research Lab, Acadia University, Canada Transfer Learning Competitions 17

Intelligent Information Technology Research Lab, Acadia University, Canada 18 Limitations of MTL for ML3. Problems with multiple outputs: training examples must have matching target values; redundant representation; frustrates practice of a task; prevents a fluid development of domain knowledge; no way to naturally associate examples with tasks. Inductive transfer is limited to sharing of hidden node weights. Inductive transfer relies on selecting related secondary tasks. [Network diagram: inputs x_1 … x_n, common feature layer (common internal representation [Caruana, Baxter]), task-specific representation, outputs f_1(x), f_2(x), …, f_k(x).]

Intelligent Information Technology Research Lab, Acadia University, Canada 19 Context Sensitive MTL (csMTL). [Network diagram: primary inputs x = x_1 … x_n and context inputs c = c_1 … c_k feed a shared network with one output for all tasks, y' = f'(c, x).] We have developed an alternative approach that is meant to overcome these limitations: uses a single-output neural network structure; context inputs associate an example with a task; all weights are shared, so the focus shifts from learning separate tasks to learning a domain of tasks. Conjecture: no measure of task relatedness is required.

Intelligent Information Technology Research Lab, Acadia University, Canada 20 Context Sensitive MTL (csMTL). [Network diagram: context inputs c, primary inputs x, one output for all tasks y' = f'(c, x).] Target concept: f': C × X→Y, with probability distribution P' on C × X × Y, where P' = g(P, Q). Example: (c, x, f'(c, x)), where f'(c, x) = f_i(x) when c → i (i.e. c_i = 1). Training set: S' = {(c, x, f'(c, x))}. Hypothesis / hypothesis space: h': C × X→Y / H_csMTL. Objective function to minimize: ∑_{(c,x)∈S'} error[f'(c, x), h'(c, x)].
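For contrast with the MTL sketch earlier, here is a minimal PyTorch sketch of the csMTL structure just defined: the one-hot context vector c is concatenated with the primary inputs x, all weights are shared, and a single output y' = f'(c, x) serves every task. The layer sizes mirror the Logic-domain architecture mentioned later (16 inputs, 20 hidden, 1 output), but the data and hyperparameters are toy assumptions.

    import torch
    import torch.nn as nn

    class CsMTLNet(nn.Module):
        """Single-output network; a one-hot context input c identifies the task."""
        def __init__(self, n_primary, n_context, n_hidden):
            super().__init__()
            self.hidden = nn.Sequential(nn.Linear(n_primary + n_context, n_hidden),
                                        nn.Sigmoid())
            self.out = nn.Linear(n_hidden, 1)

        def forward(self, x, c):
            return torch.sigmoid(self.out(self.hidden(torch.cat([x, c], dim=1))))

    net = CsMTLNet(n_primary=10, n_context=6, n_hidden=20)   # Logic-domain-like sizes
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    loss_fn = nn.BCELoss()

    # Toy training set S' = {(c, x, f'(c, x))}: each example carries its task id.
    x = torch.rand(60, 10)
    task_id = torch.randint(0, 6, (60,))
    c = nn.functional.one_hot(task_id, num_classes=6).float()
    y = torch.randint(0, 2, (60, 1)).float()

    for _ in range(100):
        opt.zero_grad()
        loss = loss_fn(net(x, c), y)
        loss.backward()
        opt.step()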

Intelligent Information Technology Research Lab, Acadia University, Canada 21 Context Sensitive MTL (csMTL). [Network diagram: primary inputs x, context inputs c, one output for all tasks y' = f'(c, x).] During training, c selects an inductive bias relative to the secondary tasks. If x is held constant, then c indexes over the domain of tasks f'. If c is real-valued, from the environment, it provides a "grounded" sense of task relatedness. If c is a set of task identifiers, it differentiates between otherwise conflicting examples and selects internal representation for related tasks.

Intelligent Information Technology Research Lab, Acadia University, Canada 22 Context Sensitive MTL (csMTL). A common representation will develop for all tasks, following examples driven by P'. c selects an inductive bias over H_csMTL relative to the secondary tasks learned. If x is held constant, then c indexes over the domain of tasks f'. If c is real-valued, from the environment, it provides a "grounded" sense of task relatedness. If c is a set of task identifiers, it differentiates between otherwise conflicting examples and selects internal representation for related tasks. [Network diagram: primary inputs x, context inputs c, one output for all tasks y' = f'(c, x).]

Intelligent Information Technology Research Lab, Acadia University, Canada 23 Context Sensitive MTL (csMTL). Overcomes limitations of standard MTL for long-term consolidation of tasks: eliminates redundant outputs for the same task; facilitates accumulation of knowledge through practice; examples can be associated with tasks directly by the environment; develops a fluid domain of task knowledge indexed by the context inputs … AND … accommodates tasks that have multiple outputs. [Network diagram: context inputs, primary inputs, one output for all tasks y'.]

Intelligent Information Technology Research Lab, Acadia University, Canada 24 csMTL Empirical Studies: Task Domains. [Figure showing the primary and secondary (Sec.) tasks for each domain.]

Intelligent Information Technology Research Lab, Acadia University, Canada 25 csMTL Empirical Studies: Task Domains. Band: 7 tasks, 2 primary inputs. [Figure: tasks T_0 … T_6, each a band of positive examples.] Logic: T_0 = (x_1 > 0.5 ∧ x_2 > 0.5) ∨ (x_3 > 0.5 ∧ x_4 > 0.5); 6 tasks, 10 primary inputs. fMRI: 2 tasks, 24 primary inputs.

Intelligent Information Technology Research Lab, Acadia University, Canada 26 csMTL ML3 Empirical Study: Task Domain — the Logic Domain. Objective is to predict TRUE expressions (positive examples); 6 non-linear tasks, 10 input attributes. MTL and ηMTL networks: 10 input, 20 hidden, 6 output architecture. csMTL long-term network: 16 input, 20 hidden, 1 output. csMTL short-term network: additional 5 hidden, 1 output. CDK for T_1 through T_5 developed from 200 training examples. 30 training examples and 20 validation (tuning) examples for T_0, plus independent test examples. The tasks share features:
T_0: (A > 0.5 ∧ B > 0.5) ∨ (C > 0.5 ∧ D > 0.5)
T_1: (C > 0.5 ∧ D > 0.5) ∨ (E > 0.5 ∧ F > 0.5)
T_2: (C > 0.5 ∧ D > 0.5) ∨ (G > 0.5 ∧ H > 0.5)
T_3: (E > 0.5 ∧ F > 0.5) ∨ (G > 0.5 ∧ H > 0.5)
T_4: (E > 0.5 ∧ F > 0.5) ∨ (I > 0.5 ∧ J > 0.5)
T_5: (G > 0.5 ∧ H > 0.5) ∨ (I > 0.5 ∧ J > 0.5)
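A small generator for such a domain might look like the sketch below. This is an illustration only, assuming the OR-of-ANDs reading of the reconstructed expressions above; it is not the original study's data generator.

    import numpy as np

    # Each task is defined by two pairs of attribute indices (0-indexed into A..J).
    TASK_PAIRS = {
        0: ((0, 1), (2, 3)),   # T0: (A,B) and (C,D)
        1: ((2, 3), (4, 5)),   # T1: (C,D) and (E,F)
        2: ((2, 3), (6, 7)),   # T2: (C,D) and (G,H)
        3: ((4, 5), (6, 7)),   # T3: (E,F) and (G,H)
        4: ((4, 5), (8, 9)),   # T4: (E,F) and (I,J)
        5: ((6, 7), (8, 9)),   # T5: (G,H) and (I,J)
    }

    def logic_label(x, task):
        """Assumed reading: (a > 0.5 and b > 0.5) or (c > 0.5 and d > 0.5)."""
        (a, b), (c, d) = TASK_PAIRS[task]
        return ((x[a] > 0.5) & (x[b] > 0.5)) | ((x[c] > 0.5) & (x[d] > 0.5))

    def make_task_data(task, n_examples, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.random((n_examples, 10))                 # 10 input attributes A..J
        y = np.array([logic_label(x, task) for x in X], dtype=float)
        return X, y

    X1, y1 = make_task_data(task=1, n_examples=200)      # e.g. secondary task T1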

Intelligent Information Technology Research Lab, Acadia University, Canada 27 csMTL Empirical Studies Task Domains Covertype data from four wilderness areas in northern Colorado 10 Inputs: elevation, soil type, etc; Output: cover type = six species of tree Dermatology 33 skin input attributes per patient; six types of disease Glass Nine inputs, outputs one of six types of glass Heart Disease clinical data from three hospitals Output: probability of patient having a coronary artery disease Five input attributes: age, gender, type of chest pain, resting blood pressure, and resting electrocardiogram.

Intelligent Information Technology Research Lab, Acadia University, Canada 28 csMTL Empirical Studies Method Objective: To compare generalization accuracy of hypotheses developed by csMTL to STL, MTL Primary tasks have impoverished data sets Secondary tasks have larger data sets For csMTL the primary examples are duplicated to match the number for each secondary task Standard BP three layer networks used for all methods Sufficient representation in hidden layer Tuning sets used to prevent over-fitting Independent test sets used to assess accuracy Repeated studies - mean accuracy is performance metric
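The duplication of primary-task examples mentioned above can be sketched as simple oversampling. This assumes the hypothetical make_task_data generator from the earlier Logic-domain sketch; the counts (30 primary, 200 per secondary task) follow the earlier study-setup slide.

    import numpy as np

    def duplicate_primary(X_primary, y_primary, n_secondary):
        """Repeat the primary-task examples until they match the secondary-task count."""
        reps = int(np.ceil(n_secondary / len(X_primary)))
        idx = np.tile(np.arange(len(X_primary)), reps)[:n_secondary]
        return X_primary[idx], y_primary[idx]

    X0, y0 = make_task_data(task=0, n_examples=30)        # 30 primary-task examples (T0)
    X0_dup, y0_dup = duplicate_primary(X0, y0, n_secondary=200)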

Intelligent Information Technology Research Lab, Acadia University, Canada Model accuracy as a function of number of hidden nodes … 29

Intelligent Information Technology Research Lab, Acadia University, Canada Accuracy as a function of number of primary task training examples.. Logic domain 30

Intelligent Information Technology Research Lab, Acadia University, Canada 31 csMTL Empirical Studies Results from Repeated Studies

Intelligent Information Technology Research Lab, Acadia University, Canada 32 csMTL Empirical Studies: Dermatology Domain [Poirier, recent]. Disease diagnosis: n = 360 patients. Primary task: psoriasis. Secondary tasks: seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis, pityriasis rubra pilaris.

Intelligent Information Technology Research Lab, Acadia University, Canada 33 Why is csMTL doing so well? There is a constraint between the context-to-hidden weights and the hidden node bias weights (hidden node j, training example n, task z), and a constraint between the context-to-hidden weights and the output node weights (hidden node j, output k, task z). These reduce the number of free parameters in csMTL. [Network diagram: inputs x_1 … x_n, context inputs c_1 … c_k, output y'.]
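To make the first constraint concrete, here is a short illustrative derivation in generic notation (w for input-to-hidden weights, v for context-to-hidden weights, b for the hidden node bias; these symbols are ours, not the paper's). With a one-hot context vector identifying task z, the context weight into hidden node j behaves exactly like a task-specific addition to that node's bias:

    net_j = \sum_i w_{ji} x_i + \sum_{z'} v_{jz'} c_{z'} + b_j
          = \sum_i w_{ji} x_i + \left( v_{jz} + b_j \right)

Only the sum v_{jz} + b_j matters for examples of task z, which ties the context-to-hidden weights to the bias weights and helps explain the reduced number of effectively free parameters.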

Intelligent Information Technology Research Lab, Acadia University, Canada 34 Context Sensitive MTL (csMTL). Recently, we have shown that csMTL has two important constraints: context and bias weights; context and output weights. VC(csMTL) < VC(MTL). [Network diagram: primary inputs x, context inputs c, one output for all tasks y' = f'(c, x), with a hidden node j and an output k marked.]

Intelligent Information Technology Research Lab, Acadia University, Canada 35 Why is csMTL doing so well? Consider two unrelated tasks. From a task-relatedness perspective, the correlation or mutual information over all examples is 0. From an example-by-example perspective, 50% of examples have matching target values. csMTL transfers knowledge at the example level, giving greater sharing of representation. [Table: example number (Ex #) with target values for Task A and Task B. Network diagram: inputs x_1 … x_n, context inputs c_1 … c_k, output y'.]

Intelligent Information Technology Research Lab, Acadia University, Canada 36 Transfer from the same task … Consider a task domain of n tasks where there are 20 training examples per task and all examples are drawn from the same function. Learn the primary task with no transfer; learn the primary task with transfer from n-1 secondary tasks; or learn the task using an AllData training set composed of all 20 × n examples. Which method will produce the best models? [Network diagrams: an MTL network with inputs x_1 … x_10 and outputs f_1(x), f_2(x), …, f_6(x), and a csMTL network with inputs x_1 … x_10, context inputs c_1 … c_6, and one output f'(c, x).]

Intelligent Information Technology Research Lab, Acadia University, Canada Transfer from the same task … 37

Intelligent Information Technology Research Lab, Acadia University, Canada Transfer under increasing task diversity … 38

Intelligent Information Technology Research Lab, Acadia University, Canada 39 Measure of Task Relatedness? Early conjecture: context-to-hidden node weight vectors can be used to measure task relatedness. Not true: two hypotheses for the same examples can develop that have equivalent function but use different representation. Transfer is functional in nature. [Network diagram: context inputs, primary inputs, one output for all tasks y'.]

Intelligent Information Technology Research Lab, Acadia University, Canada csMTL and Other ML Methods Will the csMTL encoding work with other machine learning methods? IDT kNN SVM? Bayesian Nets? Deep Architecture Learning Networks ? 40
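As one way to read "csMTL encoding" for instance-based learners, the sketch below appends a one-hot task-context block to the primary inputs and trains a standard scikit-learn kNN (k = 5, as in the later slide) on the pooled examples from all tasks. It reuses the hypothetical Logic-domain generator from the earlier sketch; it is not the code behind the reported results.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def csmtl_encode(X, task, n_tasks=6):
        """Append a one-hot task-context block to the primary input attributes."""
        C = np.zeros((len(X), n_tasks))
        C[:, task] = 1.0
        return np.hstack([X, C])

    # Pool csMTL-encoded examples from all tasks into one training set.
    X_all, y_all = [], []
    for t in range(6):
        Xt, yt = make_task_data(task=t, n_examples=200)
        X_all.append(csmtl_encode(Xt, t))
        y_all.append(yt)

    knn = KNeighborsClassifier(n_neighbors=5)
    knn.fit(np.vstack(X_all), np.concatenate(y_all))

    # Query the primary task (T0) by setting its context bit on the test inputs.
    X_test, y_test = make_task_data(task=0, n_examples=100, seed=1)
    accuracy = knn.score(csmtl_encode(X_test, 0), y_test)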

Intelligent Information Technology Research Lab, Acadia University, Canada 41 csMTL Using IDT (Logic Domain)

Intelligent Information Technology Research Lab, Acadia University, Canada 42 csMTL Using kNN (Logic Domain, k=5)

Intelligent Information Technology Research Lab, Acadia University, Canada 43 Recent Work Develop a csMTL plug-in for WEKA Explore domains of tasks that have multiple outputs Image transformation Machine Lifelong Learning (ML3) with csMTL Consolidation of task knowledge

Intelligent Information Technology Research Lab, Acadia University, Canada 44 csMTL for WEKA. We have completed a new version of the WEKA MLP (work with B. Fowler and L. Tu), called MLP_CS. It will accept csMTL-encoded examples with context inputs and can be used for transfer learning by researchers and practitioners. See

Intelligent Information Technology Research Lab, Acadia University, Canada csMTL for WEKA 45 See

Intelligent Information Technology Research Lab, Acadia University, Canada csMTL and Tasks with Multiple Outputs Liangliang Tu (2010) Image Morphing: Inductive Transfer between Tasks That Have Multiple Outputs Transforms 30x30 grey scale images using inductive transfer 46

Intelligent Information Technology Research Lab, Acadia University, Canada 47 csMTL and Tasks with Multiple Outputs

Intelligent Information Technology Research Lab, Acadia University, Canada csMTL and Tasks with Multiple Outputs Liangliang Tu (2010) Image Morphing Transforms 30x30 grey scale images Inductive transfer used to develop more accurate transform functions 48
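A minimal sketch of how the csMTL structure can be given multiple outputs for tasks like these image transforms: the single output unit is replaced by a 900-unit output layer (one per pixel of a 30×30 grey-scale image), while the context inputs still select which transform is being learned. The architecture and sizes here are our assumptions, not Tu's implementation.

    import torch
    import torch.nn as nn

    class CsMTLImageNet(nn.Module):
        """csMTL-style network whose single 'output' is a whole 30x30 image."""
        def __init__(self, n_context, n_hidden=200):
            super().__init__()
            n_pixels = 30 * 30
            self.hidden = nn.Sequential(nn.Linear(n_pixels + n_context, n_hidden),
                                        nn.Sigmoid())
            self.out = nn.Linear(n_hidden, n_pixels)    # one unit per output pixel

        def forward(self, image, c):
            x = torch.cat([image.flatten(1), c], dim=1)
            return torch.sigmoid(self.out(self.hidden(x))).view(-1, 30, 30)

    net_img = CsMTLImageNet(n_context=4)                # e.g. 4 transform tasks (assumed)
    src = torch.rand(8, 30, 30)                         # toy batch of grey-scale images
    c = torch.eye(4)[torch.randint(0, 4, (8,))]         # one-hot transform selector
    out = net_img(src, c)                               # predicted transformed images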

Intelligent Information Technology Research Lab, Acadia University, Canada csMTL and Tasks with Multiple Outputs 49

Intelligent Information Technology Research Lab, Acadia University, Canada csMTL and Tasks with Multiple Outputs 50 Demo

Intelligent Information Technology Research Lab, Acadia University, Canada 51 Conclusions csMTL is a method of inductive transfer using multiple tasks: Single task output, additional context inputs Shifts focus to learning a continuous domain of tasks Eliminates redundant task representation (multiple outputs) Empirical studies: csMTL performs transfer at or above level of MTL May not require a measure of task relatedness Capable of transfer when tasks have multiple outputs

Intelligent Information Technology Research Lab, Acadia University, Canada 52 Future Work Conditions under which csMTL ANNs succeed / fail General ML characteristics under which csMTL encoding will work Explore deep learning networks (G.Hinton) Develop and test a more complete csMTL based Machine Lifelong Learning (ML3) system Explore domains with real-valued context inputs grounded in their environment

Intelligent Information Technology Research Lab, Acadia University, Canada 53 Thank You!

Intelligent Information Technology Research Lab, Acadia University, Canada 54 A ML3 based on csMTL (work with Ben Fowler, 2010) — the Stability-Plasticity Problem. [Diagram: task context inputs c_1 … c_k and standard inputs x_1 … x_n feed a Long-term Consolidated Domain Knowledge network (one output for all tasks, f'(c, x)) and a Short-term Learning Network with output f_1(c, x); representational transfer from the CDK provides rapid learning, and functional transfer (virtual examples) provides consolidation.]

Intelligent Information Technology Research Lab, Acadia University, Canada 55 csMTL ML3 Algorithm: Short-term Learning via Inductive Transfer. Fix the representation of the long-term network. Initialize the hidden and output node connection weights unique to the short-term network to small random values. Train and test a model for the new task in the short-term network using the available data (a tuning set prevents overfitting). If generalization accuracy is sufficient, consolidate into the long-term network.
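A sketch of this short-term learning step, assuming the toy CsMTLNet from the earlier sketch plays the role of the long-term network: its weights are frozen, a small bank of new hidden units (five, echoing the architecture slide) and a new output are initialized to small random values, and only those new weights are trained on the new task's data.

    import torch
    import torch.nn as nn

    class ShortTermNet(nn.Module):
        """New hidden/output weights on top of a frozen long-term csMTL network."""
        def __init__(self, long_term, n_primary=10, n_context=6, n_extra_hidden=5):
            super().__init__()
            self.long_term = long_term
            for p in self.long_term.parameters():
                p.requires_grad = False                   # fix the long-term representation
            self.extra = nn.Sequential(nn.Linear(n_primary + n_context, n_extra_hidden),
                                       nn.Sigmoid())
            self.out = nn.Linear(long_term.out.in_features + n_extra_hidden, 1)
            for p in list(self.extra.parameters()) + list(self.out.parameters()):
                nn.init.uniform_(p, -0.1, 0.1)            # small random values

        def forward(self, x, c):
            xc = torch.cat([x, c], dim=1)
            # Representational transfer: reuse long-term hidden features, add new ones.
            features = torch.cat([self.long_term.hidden(xc), self.extra(xc)], dim=1)
            return torch.sigmoid(self.out(features))

    short = ShortTermNet(net)                             # 'net' is the toy long-term CsMTLNet
    opt = torch.optim.SGD([p for p in short.parameters() if p.requires_grad], lr=0.01)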

Intelligent Information Technology Research Lab, Acadia University, Canada 56 csMTL ML3 Algorithm: Long-term Consolidation via Task Rehearsal. Generate lots of virtual examples for the new and prior tasks using the existing representations of the short-term and long-term networks. Unfix the representation of the long-term network. Train the long-term network using the virtual examples under cross-validation (a tuning set prevents overfitting).
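And a sketch of this consolidation step under the same assumptions (the toy CsMTLNet 'net' and ShortTermNet 'short' from the previous sketches, with the new task assigned context index 5): virtual examples for prior tasks come from the long-term network and for the new task from the short-term network, the long-term representation is unfixed, and the long-term network is retrained on the pooled virtual examples. The tuning-set early stopping is omitted for brevity.

    import torch

    def consolidate(long_term, short_term, new_task_context, n_virtual=200, steps=500):
        # Generate virtual examples: prior tasks (indices 0..4) from the long-term net,
        # the new task from the short-term net (task rehearsal).
        with torch.no_grad():
            x = torch.rand(n_virtual, 10)
            c_prior = torch.eye(6)[torch.randint(0, 5, (n_virtual,))]
            y_prior = long_term(x, c_prior)
            c_new = new_task_context.expand(n_virtual, -1)
            y_new = short_term(x, c_new)

        # Unfix the long-term representation and rehearse everything together.
        for p in long_term.parameters():
            p.requires_grad = True
        opt = torch.optim.SGD(long_term.parameters(), lr=0.01)
        X = torch.cat([x, x])
        C = torch.cat([c_prior, c_new])
        Y = torch.cat([y_prior, y_new])
        for _ in range(steps):
            opt.zero_grad()
            loss = torch.nn.functional.binary_cross_entropy(long_term(X, C), Y)
            loss.backward()
            opt.step()

    consolidate(net, short, new_task_context=torch.eye(6)[5:6])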

Intelligent Information Technology Research Lab, Acadia University, Canada Consolidation based on csMTL Variation in number of virtual examples 57 Work with Ben Fowler, 2010

Intelligent Information Technology Research Lab, Acadia University, Canada Consolidation based on csMTL Variation in transfer and validation set 58 Work with Ben Fowler, 2010

Intelligent Information Technology Research Lab, Acadia University, Canada 59 Benefits of csMTL ML3: Long-term Consolidation … Effective retention (all tasks in DK net improve) Efficient retention (redundancy eliminated) Meta-knowledge collection (context cues) Short-term Learning … Effective learning (inductive transfer) Efficient learning (representation + function) Transfer / training examples used appropriately

Intelligent Information Technology Research Lab, Acadia University, Canada 60 Limitations of csMTL ML3. Consolidation is space- and time-complex: rehearsal of all tasks means lots of virtual training examples are required; back-propagation of error computational complexity = O(W^3), where W = number of weights.

Intelligent Information Technology Research Lab, Acadia University, Canada 61 Context Sensitive MTL (csMTL). We are currently investigating the theory of Hints [Abu-Mostafa] for formalization of: how each task of the domain can be seen as a Hint task for learning the domain of tasks; how the VC dimension for learning a particular task, f_k(c, x), is reduced by learning others. [Network diagram: context inputs, primary inputs, one output for all tasks y'.]

Intelligent Information Technology Research Lab, Acadia University, Canada 62 Requirements for a ML3 System: Req. for Long-term Retention … Effective Retention Resist introduction and accumulation of error Retention of new task knowledge should improve related prior task knowledge (practice should improve performance) Efficient Retention Minimize redundant use of memory via consolidation Meta-knowledge Collection e.g. Example distribution over the input space Ensures Effective and Efficient Indexing Selection of related prior knowledge for inductive bias should be accurate and rapid

Intelligent Information Technology Research Lab, Acadia University, Canada 63 Requirements for a ML3 System: Req. for Short-term Learning … Effective (transfer) Learning New learning should benefit from related prior task knowledge ML3 hypotheses should meet or exceed accuracy of those hypotheses developed without benefit of transfer Efficient (transfer) Learning Transfer should reduce training time Increase in space complexity should be minimized Transfer versus Training Examples Must weigh relevance and accuracy of prior knowledge, against Number and accuracy of available training examples

Intelligent Information Technology Research Lab, Acadia University, Canada 64 Benefits of csMTL ML3: Long-term Consolidation … Effective Retention Rehearsal overcomes stability-plasticity problem [Robins95] Increases accuracy of all related tasks [Silver04] Facilitates practice of same task [O’Quinn05] Efficient Retention Eliminates redundant, inaccurate, older hypotheses [Silver04] Meta-knowledge Collection Focus is on learning a continuous domain of tasks Changes in context inputs selects task domain knowledge Ensures Effective and Efficient Indexing Conjecture: Prior knowledge selection is made implicitly by training examples. Indexing occurs as connection weights between long-term and short-term are learned.

Intelligent Information Technology Research Lab, Acadia University, Canada 65 Benefits of csMTL ML3: Short-term Learning … Effective (transfer) learning: accurate inductive bias via transfer from the long-term net; a measure of task relatedness is not required. Efficient (transfer) learning: rapid inductive bias via transfer from the long-term net; short-term network weights are reusable. Transfer versus training examples: if the new task was previously learned, the weights between the long-term and short-term networks are quickly learned; if the new task is different but related to a prior task, the most appropriate features from the long-term network will be selected; if the new task is unrelated to prior tasks, the supplemental hidden nodes will develop the needed features.