Lifelong Machine Learning and Reasoning

Lifelong Machine Learning and Reasoning
Daniel L. Silver, Acadia University, Wolfville, NS, Canada
CoCo Workshop @ NIPS 2015, Montreal, Canada, Dec 12, 2015

Significant contributions by Jane Gomes, Moh. Shameer Iqbal, Ti Wang, Xiang Jiang, Geoffrey Mason, and Hossein Parvar.

Talk Outline
- Overview
- Lifelong Machine Learning
- Role of Deep Learning
- Connection to Knowledge Representation and Reasoning
- Learning to Reason (L2R)
- Empirical Studies
- Conclusion and Future Work

Overview
It is now appropriate to seriously consider the nature of systems that learn and reason over a lifetime. We advocate a systems approach in the context of an agent that can:
- Acquire new knowledge through learning
- Retain and consolidate that knowledge
- Use it in future learning, reasoning, and other aspects of AI
[D. Silver, Q. Yang, L. Li 2013]

Overview
Machine learning has made great strides in Learning to Classify (L2C) in a probabilistic manner, in accord with the environment.
[Figure: a probability distribution P(x) over the input space x]

Overview
Propose: Learning to Reason, or L2R. As per L. Valiant, D. Roth, R. Khardon, and L. Bottou, reasoning has to be adequate in a PAC sense.
[Figure: a probability distribution P(x) over the input space x]

Overview
Motivation for Learning to Reason (L2R):
- LML → KR: new insights into how best to represent common background knowledge acquired over time and over the input space
- KR → LML: KR places additional constraints on internal representation, in the same way that LML does
- Generative deep learning: uses the wealth of unlabelled examples and provides greater plasticity

Lifelong Machine Learning (LML)
Considers systems that can learn many tasks over a lifetime:
- From impoverished training sets
- Across a diverse domain of tasks
- Where practice of tasks happens
Such systems are able to effectively and efficiently:
- Consolidate (retain and integrate) learned knowledge
- Transfer prior knowledge when learning a new task

Lifelong Machine Learning (LML)
[Figure: nested spaces; an example xi in the space of examples X, a hypothesis hj in the space of hypotheses H, and a hypothesis space h'k in the space of hypothesis spaces H']

Lifelong Machine Learning (LML)
[Figure: LML framework. An inductive learning system (short-term memory) receives training examples (x, f(x)) from instance space X and produces a model/classifier h, tested so that h(x) ~ f(x). Domain knowledge (long-term memory) supplies knowledge transfer as an inductive bias via knowledge selection, and is updated through retention and consolidation.]

Lifelong Machine Learning (LML)
[Figure: the LML framework realized with Multiple Task Learning (MTL) [R. Caruana 1997]. A single network with inputs x1 .. xn and one output per task f1(x), f2(x), ..., fk(x) serves as consolidated domain knowledge; new tasks are learned with transfer from, and consolidated back into, this shared representation.]
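
A minimal sketch of the MTL idea (assumption: PyTorch, with illustrative layer sizes and task count not taken from the slides): a shared hidden layer with one output head per task.

```python
import torch
import torch.nn as nn

class MTLNet(nn.Module):
    """Multiple Task Learning: one shared hidden layer, one output per task."""
    def __init__(self, n_inputs=10, n_hidden=20, n_tasks=4):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
        # One output unit per task; all tasks share the hidden representation.
        self.heads = nn.ModuleList([nn.Linear(n_hidden, 1) for _ in range(n_tasks)])

    def forward(self, x):
        h = self.shared(x)
        return torch.cat([head(h) for head in self.heads], dim=1)

# Training on all tasks jointly induces a shared inductive bias.
net = MTLNet()
x = torch.rand(32, 10)       # 32 examples, 10 primary inputs
targets = torch.rand(32, 4)  # one target per task
loss = nn.functional.mse_loss(net(x), targets)
loss.backward()
```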

csMTL and an Environmental Example
Stream flow rate prediction: x = weather data, f(x) = flow rate. [Gaudette, Silver, Spooner 2006]

Context-sensitive MTL (csMTL)
We have developed an alternative approach that is meant to overcome limitations of MTL networks. It:
- Uses a single output for all tasks, y = f(c, x)
- Uses context inputs c to associate an example with a task, or to indicate the absence of a primary input
- Develops a fluid domain of task knowledge indexed by the context inputs
- Supports consolidation of knowledge
- Facilitates practising a task
- More easily supports tasks with vector outputs
[Figure: a network with primary inputs x1 .. xn and context inputs c1 .. ck feeding one output y = f(c, x)]
[Silver, Poirier and Currie, 2008]
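
A minimal sketch of the csMTL idea (continuing the hypothetical PyTorch setup above): the task identity becomes a one-hot context input, and a single-output network is trained over the concatenated (c, x) vector.

```python
import torch
import torch.nn as nn

class CsMTLNet(nn.Module):
    """csMTL: context inputs c select the task; one output for all tasks."""
    def __init__(self, n_primary=10, n_context=4, n_hidden=20):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_primary + n_context, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, 1))   # single output y = f(c, x)

    def forward(self, x, c):
        return self.net(torch.cat([c, x], dim=1))

net = CsMTLNet()
x = torch.rand(32, 10)                                             # primary inputs
c = nn.functional.one_hot(torch.randint(0, 4, (32,)), 4).float()  # task context
y = net(x, c)                                                      # one prediction per example
```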

csMTL and Tasks with Multiple Outputs
Image morphing (Liangliang Tu, 2010): inductive transfer between tasks that have multiple outputs. Transforms 30x30 grey-scale images using inductive transfer across three mapping tasks (NA, NH, NS). [Tu and Silver, 2010]

Two More Morphed Images
[Figure: a passport photo morphed to angry (filtered) and to sad (filtered)]

LML via csMTL
[Figure: two networks over the same context inputs c1 .. ck and standard inputs x1 .. xn. A short-term learning network f1(c,x) receives representational transfer from the long-term Consolidated Domain Knowledge (CDK) network f'(c,x) for rapid learning; task rehearsal provides functional transfer (virtual examples) back to the CDK network for slow consolidation. One output for all tasks.]

LML via csMTL
Consolidation via task rehearsal can be achieved very efficiently:
- Need only train on a few virtual examples (as few as one) selected at random during each training iteration
- Maintains stable prior functionality while allowing representational plasticity for integration of the new task
[Silver, Mason and Eljabu 2015]
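
A minimal sketch of task rehearsal under the same assumptions (hypothetical helper; it reuses the CsMTLNet sketch above as the consolidated network): virtual examples for prior tasks are generated by querying the consolidated network itself, then mixed with the new task's real examples.

```python
import torch
import torch.nn.functional as F

def rehearsal_batch(cdk_net, n_virtual=8, n_primary=10, n_context=4):
    """Create virtual examples for prior tasks by querying the long-term
    Consolidated Domain Knowledge (CDK) network itself."""
    x = torch.rand(n_virtual, n_primary)               # random probe inputs
    tasks = torch.randint(0, n_context, (n_virtual,))  # prior task ids
    c = F.one_hot(tasks, n_context).float()
    with torch.no_grad():
        y = cdk_net(x, c)   # the CDK's current outputs become the targets
    return x, c, y

# During consolidation, each training iteration mixes the new task's real
# examples with a few virtual examples (as few as one), keeping prior
# functionality stable while the shared representation adapts.
```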

Deep Learning and LML
Stacked RBMs develop a rich feature space from unlabelled examples using unsupervised algorithms. [Image source: Caner Hazibas, SlideShare]

Deep Learning and LML
Transfer learning and consolidation work better with a deep-learning csMTL:
- Generative models are built using an RBM stack and unlabelled examples
- Inputs include both context and primary attributes
- The stack can produce a rich variety of features indexed by the context nodes
- Supervised learning is used to fine-tune all or a portion of the weights for multiple-task knowledge transfer or consolidation
[Figure: a deep csMTL network with primary inputs x1 .. xn and context inputs c1 .. ck feeding one output for all tasks, y = f(c, x)]
[Jiang and Silver, 2015]
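
A minimal sketch (NumPy; standard CD-1 training with illustrative layer sizes, not necessarily the authors' exact procedure) of unsupervised pre-training of one RBM layer over the concatenated context-plus-primary input vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_v, b_h, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM."""
    ph0 = sigmoid(v0 @ W + b_h)                       # hidden probabilities given data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)                     # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b_h)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)    # positive minus negative phase
    b_v += lr * (v0 - pv1).mean(axis=0)
    b_h += lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h

# Visible layer = context bits plus primary attributes (e.g., 4 + 10 = 14 units).
n_vis, n_hid = 14, 32
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b_v, b_h = np.zeros(n_vis), np.zeros(n_hid)
batch = (rng.random((32, n_vis)) < 0.5).astype(float)  # stand-in for unlabelled data
W, b_v, b_h = cd1_step(batch, W, b_v, b_h)
```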

Deep Learning and LML
Experiments using the MNIST dataset.
[Figure: the same deep csMTL architecture applied to MNIST]
[Jiang and Silver, 2015]

Deep Learning and LML http://ml3cpu.acadiau.ca [Wang and Silver, 2015] [Iqbal and Silver, in press]

Deep Learning and LML
Stimulates new ideas about:
- How knowledge of the world is learned, consolidated, and then used for future learning and reasoning
- How best to learn and represent common background knowledge
These questions are important to big AI problem solving ... such as reasoning.

Knowledge Representation and Reasoning
- Focuses on the representation of information that can be used for reasoning
- Enables an entity to determine consequences by thinking rather than acting
- Traditionally requires a reasoning/inference engine to answer queries about beliefs

Knowledge Representation and Reasoning
Reasoning could be considered "algebraic [systematic] manipulation of previously acquired knowledge in order to answer a new question" (L. Bottou 2011). This requires a method of acquiring and storing knowledge, and learning from the environment is the obvious choice …

Learning to Reason (L2R)
- Concerned with the process of learning a knowledge base and reasoning with it [Khardon and Roth 97]
- Reasoning is subject to errors that can be bounded in terms of the inverse of the effort invested in the learning process
- Requires knowledge representations that are learnable and facilitate reasoning
[Slide graphic: "This statement is false"]

Learning to Reason (L2R)
- Takes a probabilistic perspective on learning and reasoning [Khardon and Roth 97]
- The agent need not answer all possible knowledge queries, only those that are relevant to the environment in a (PAC) sense [Valiant 08, Juba 12 & 13]

Learning to Reason (L2R)
Valiant and Khardon show formally that:
- L2R allows efficient learning of Boolean logical assertions in the PAC sense
- Learned knowledge can be used to reason efficiently, to an expected level of accuracy and confidence
We wish to demonstrate that:
- A knowledge base of Boolean functions is PAC-learnable from examples using a csMTL network
- This holds even when the examples provide information about only a portion of the input space
- … and to explore an LML approach: consolidation over time and over the input space
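
For reference, a standard PAC sample-complexity bound for a finite hypothesis space H (a general textbook result, not specific to these slides): a consistent learner needs

\[ m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right) \]

examples to achieve error at most \(\epsilon\) with probability at least \(1 - \delta\).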

Learning to Reason (L2R)
Propositional logic functions are encoded as truth-table examples. Input: truth-table terms over literals A B C ...; output: True/False (e.g., input 0 1 0 ..., output 1).
- Simple terms and clauses: ~A, B, C, (~A v B), (~B v C)
- More complex functions: (~A v B) ∧ (~B v C) → (~A v C)
- Functions of functions: ~(~A v B) v ~(~B v C) v (~A v C)
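
A minimal sketch (plain Python; helper names are illustrative) of encoding a propositional function as truth-table examples:

```python
from itertools import product

def truth_table(fn, n_literals):
    """Label every assignment of the literals with the function's truth value."""
    return [(bits, int(fn(*bits))) for bits in product([0, 1], repeat=n_literals)]

# KB from Study 1 below: (A -> B) ^ (B -> C), with X -> Y written as (not X or Y)
kb = lambda a, b, c: (not a or b) and (not b or c)
examples = truth_table(kb, 3)   # 8 rows of ((A, B, C), truth value)
```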

L2R with LML – Study 1
Consider the Law of Syllogism. KB: (A → B) ∧ (B → C). Q: (A → C).

L2R with LML – Study 1
Learning the Law of Syllogism.
[Table: training set KB and query set Q, each example encoded over literal inputs A, B, C and context inputs cA, cB, cC]

L2R with LML – Study 1
Learning the Law of Syllogism with a 6-10-10-1 network (inputs: literals A, B, C plus context cA, cB, cC); see the sketch below.
[Table: training set KB and query set Q]
Results: average over 30 runs, 89% correct.
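
A minimal sketch (assumption: PyTorch; the study's exact input encoding and context patterns may differ) of the Study 1 setup: train a small feed-forward network on KB truth-table examples over literal-plus-context inputs, then reason by testing it on the truth table of Q.

```python
import torch
import torch.nn as nn
from itertools import product

# 6 inputs: literals A, B, C plus context bits cA, cB, cC (assumed encoding).
net = nn.Sequential(nn.Linear(6, 10), nn.Sigmoid(),
                    nn.Linear(10, 10), nn.Sigmoid(),
                    nn.Linear(10, 1), nn.Sigmoid())   # 6-10-10-1

kb = lambda a, b, c: float((not a or b) and (not b or c))  # (A->B)^(B->C)
q = lambda a, b, c: float(not a or c)                      # (A->C)

rows = list(product([0, 1], repeat=3))
ctx_kb, ctx_q = [1, 0, 1], [0, 1, 0]   # illustrative context patterns
X = torch.tensor([list(r) + ctx_kb for r in rows], dtype=torch.float32)
y = torch.tensor([[kb(*r)] for r in rows])

opt = torch.optim.SGD(net.parameters(), lr=0.5)
for _ in range(2000):   # fit the KB examples
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy(net(X), y)
    loss.backward()
    opt.step()

# Reasoning = testing the trained model on the truth table of Q.
Xq = torch.tensor([list(r) + ctx_q for r in rows], dtype=torch.float32)
yq = torch.tensor([[q(*r)] for r in rows])
accuracy = ((net(Xq) > 0.5).float() == yq).float().mean()
```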

L2R with LML – Study 2
Objective: learn the Law of Syllogism over 10 literals.
KB: ((A∧B∨C) → (D∨E∨~F)) ∧ ((D∨E∨~F) → (G∨(~H∧I)∨~J))
Q: (A∧B∨C) → (G∨(~H∧I)∨~J)
Training set: 100% of sub-KB examples:
- (A∧B∨C) → (D∨E∨~F)
- (D∨E∨~F) → (G∨(~H∧I)∨~J)
Network: 20-10-10-1. Results: average over 10 runs, 78% accuracy.

L2R with LML – Study 3
Objective: to learn a knowledge base [shown as an image in the original slides] in two different ways:
- From examples of the full KB (1024 in total)
- From examples of sub-clauses of the KB (sub-KB)
Network: 20-10-10-1. Training set: all possible sub-KB examples.

L2R with LML – Study 3
Results: test on all KB examples, averaged over 5 runs.
[Figure: mean accuracy vs. % of examples used for training]

Conclusion
Learning to Reason (L2R) using a csMTL neural network:
- Uses examples to learn a model of logical functions in a probabilistic manner
- Consolidates knowledge from examples that represent only a portion of the input space
- Reasoning = testing the model using the truth table of Q
- Relies on context nodes to select inputs that are relevant
- Results on a simple Boolean logic domain suggest promise

Future Work
- Create a scope for determining those tasks that a trained network finds TRUE
- Thoroughly examine the effect of a probability distribution over the input space (train and test sets)
- Combine csMTL with deep learning architectures to learn hierarchies of abstract features (which tend to be DNF)
- Consider other learning algorithms
- Consider more complex knowledge bases, beyond propositional logic

Thank You!
danny.silver@acadiau.ca http://tinyurl/dsilver

References:
- Valiant, L. G. Knowledge infusion: In pursuit of robustness in artificial intelligence. FSTTCS, pp. 415-422, 2008.
- Juba, B. Implicit learning of common sense for reasoning. IJCAI, pp. 939-946, 2013.
- Khardon, R. and Roth, D. Learning to reason. Journal of the ACM, 44(5):697-725, 1997.
- Silver, D., Poirier, R., and Currie, D. Inductive transfer with context-sensitive neural networks. Machine Learning - Special Issue on Inductive Transfer, Springer, 73(3):313-336, 2008.
- Silver, D., Mason, G., and Eljabu, L. Consolidation using Sweep Task Rehearsal: Overcoming the Stability-Plasticity Problem. Advances in Artificial Intelligence, 28th Conference of the Canadian Artificial Intelligence Association (AI 2015), Springer, LNAI 9091, pp. 307-324, 2015.
- Wang, T. and Silver, D. Learning Paired-associate Images with an Unsupervised Deep Learning Architecture. LNAI 9091, pp. 250-263, 2015.
- Gomes, J. and Silver, D. Learning to Reason in a Probably Approximately Correct Manner. Proceedings of CCECE 2014, Halifax, NS, IEEE Press, pp. 1475-1478, May 2015.
- Silver, D. The Consolidation of Task Knowledge for Lifelong Machine Learning. Proceedings of the AAAI Spring Symposium on Lifelong Machine Learning, Stanford University, CA, AAAI, March 2013, pp. 46-48.
- Silver, D., Yang, Q., and Li, L. Lifelong machine learning systems: Beyond learning algorithms. Proceedings of the AAAI Spring Symposium on Lifelong Machine Learning, Stanford University, CA, AAAI, March 2013, pp. 49-55.
- Silver, D. and Tu, L. Image Morphing: Transfer Learning between Tasks that have Multiple Outputs. Advances in Artificial Intelligence, 25th Conference of the Canadian Artificial Intelligence Association (AI 2012), Toronto, ON, Springer, LNAI 7310, pp. 194-205, May 2012.
- Silver, D., Spooner, I., and Gaudette, L. Inductive Transfer Applied to Modeling River Discharge in Nova Scotia. Atlantic Geology: Journal of the Atlantic Geoscience Society, 45:191-203, 2009.