Lifelong Machine Learning and Reasoning

Presentation transcript:

Lifelong Machine Learning and Reasoning Daniel L. Silver Acadia University, Wolfville, NS, Canada

Talk Outline: Position and Motivation; Lifelong Machine Learning (Team Silver); Deep Learning Architectures; Neural-Symbolic Integration; Learning to Reason; Summary and Recommendations

Position: It is now appropriate to seriously consider the nature of systems that learn and reason over a lifetime. We advocate a systems approach in the context of an agent that can: acquire new knowledge through learning; retain and consolidate that knowledge; and use it in future learning, reasoning and other aspects of AI.

Moving Beyond Learning Algorithms - Rationale 1. Strong foundation in prior work. 2. Inductive bias is essential to learning (Mitchell, Utgoff 1983; Wolpert 1996): learning systems should retain and use prior knowledge as a source for shifting inductive bias, and many real-world problems are non-stationary and exhibit drift. Last year (2013) at the AAAI Spring Symposium, Qiang Yang (HK U of Sci and Tech) and I proposed that the ML community seriously consider moving beyond learning algorithms to systems that learn continuously over a lifetime.

Moving Beyond Learning Algorithms - Rationale 3. Practical Agents/Robots Require LML Advances in autonomous robotics and intelligent agents that run on the web or in mobile devices present opportunities for employing LML systems. The ability to retain and use learned knowledge is very attractive to the researchers designing these systems.

Moving Beyond Learning Algorithms - Rationale 4. Increasing Capacity of Computers. Advances in modern computers provide the computational power for implementing and testing practical LML systems. IBM's Watson (2011): 90 IBM Power-7 servers, each with four 8-core processors; 15 TB (220M text pages) of RAM; tasks divided into thousands of stand-alone jobs distributed across the cluster, delivering about 80 teraflops (1 teraflop = 1 trillion ops/sec).

Moving Beyond Learning Algorithms - Rationale 5. Theoretical advances in AI: ML ∩ KR. "The acquisition, representation and transfer of domain knowledge are the key scientific concerns that arise in lifelong learning." (Thrun 1997) KR plays an important role in LML, through the interaction between knowledge retention and transfer. LML has the potential to make advances on the learning of common background knowledge, which leads to questions about learning to reason.

Lifelong Machine Learning: my first biological learning system (photos, 1994 and 2013)

Lifelong Machine Learning considers systems that can learn many tasks over a lifetime from one or more domains. It is concerned with methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning. We investigate systems that must learn from impoverished training sets, for diverse domains of tasks, and where practice of the same task happens. Applications: agents, robotics, data mining, user modeling.

Lifelong Machine Learning Framework [diagram]: an inductive learning system (short-term memory) receives training examples (xi, y = f(xi)) drawn from instance space X; knowledge selection and knowledge transfer supply an inductive bias BD from long-term domain knowledge (and universal knowledge); the system produces a model or classifier h that yields predictions/actions h(x) on testing examples; knowledge retention consolidates what was learned back into domain knowledge.
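
Read as a control loop, the framework can be sketched in a few lines of Python. This is a minimal illustration based only on the description above; the DomainKnowledge class, its select_bias and consolidate methods, and the learner interface are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch of the LML framework loop described above (an illustration,
# not the authors' implementation). DomainKnowledge and the learner interface
# are hypothetical placeholders.

class DomainKnowledge:
    """Long-term memory: retained, consolidated knowledge from prior tasks."""
    def __init__(self):
        self.retained = {}                        # task id -> retained model/knowledge

    def select_bias(self, task_id):
        """Knowledge selection: prior knowledge judged relevant, used as inductive bias BD."""
        return [k for t, k in self.retained.items() if t != task_id]

    def consolidate(self, task_id, model):
        """Knowledge retention: integrate newly learned task knowledge into long-term memory."""
        self.retained[task_id] = model


def lifelong_learning(task_stream, learner, dk=None):
    """For each task: transfer prior knowledge, learn in short-term memory, then retain."""
    dk = dk or DomainKnowledge()
    hypotheses = {}
    for task_id, training_examples in task_stream:      # examples (xi, y = f(xi))
        bias = dk.select_bias(task_id)                   # knowledge transfer
        h = learner.fit(training_examples, prior=bias)   # short-term inductive learning
        dk.consolidate(task_id, h)                       # knowledge retention
        hypotheses[task_id] = h                          # model h used for predictions h(x)
    return hypotheses, dk
```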

Essential Ingredients of LML: the retention (or consolidation) of learned task knowledge (a Knowledge Representation perspective). Effective and efficient retention resists the accumulation of erroneous knowledge, maintains or improves model performance, mitigates redundant representation, and allows the practice of tasks.

Essential Ingredients of LML: the selective transfer of prior knowledge when learning new tasks (a Machine Learning perspective). The goal is more effective and efficient learning: more rapidly produce models that perform better, through the selection of an appropriate inductive bias to guide search.

Essential Ingredients of LML: a systems approach, which ensures the effective and efficient interaction of the retention and transfer components. There is much to be learned from the writings of early cognitive scientists, AI researchers and neuroscientists such as Albus, Holland, Newell, Langley, Johnson-Laird and Minsky.

Overview of LML Work: Supervised Learning; Unsupervised Learning; Hybrids (semi-supervised, self-taught, co-training, etc.); Reinforcement Learning (Mark Ring, Rich Sutton, Tanaka and Yamamura).

Supervised LML. Michalski (1980s): constructive inductive learning. Principle: new knowledge is easier to induce if search is done using the correct representation; learning involves two interrelated searches, one for the best representational space for hypotheses and one for the best hypothesis in the current representational space. Utgoff and Mitchell (1983): importance of inductive bias to learning; systems should be able to search for an appropriate inductive bias using prior knowledge; proposed a system that shifted its bias by adjusting the operations of the modeling language. Solomonoff (1989): incremental learning; the system is primed on a small, incomplete set of primitive concepts and first learns to express the solutions to a set of simple problems, then is given more difficult problems and, if necessary, additional primitive concepts, and so on. Thrun and Mitchell (1990s): explanation-based neural networks (EBNN) and lifelong learning; transfers knowledge across multiple learning tasks, using domain knowledge of previous learning tasks (back-propagation gradients) to guide the development of a new task.

LML via context-sensitive MTL (csMTL). [Architecture diagram]: a long-term Consolidated Domain Knowledge (CDK) network and a short-term learning network take task context inputs c1..ck together with standard inputs x1..xn, and a single output f'(c,x) serves all tasks. Representational transfer from the CDK network supports rapid learning of a new task; task rehearsal (functional transfer via virtual examples) supports slow consolidation into the CDK network. Silver, Poirier, Currie (also Tu, Fowler), Inductive transfer with context-sensitive neural networks, Mach Learn (2008) 73: 313-336.
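
As a concrete illustration of the idea just described, the sketch below builds a csMTL-style network in PyTorch: task-context inputs c are concatenated with the standard inputs x, all tasks share the hidden layer, and a single output serves every task. The layer sizes, activations and training details are simplified assumptions for illustration, not the configuration from the paper.

```python
import torch
import torch.nn as nn

# Illustrative csMTL-style network: context inputs c plus standard inputs x,
# shared hidden layer, one output f'(c, x) for all tasks. Sizes are placeholders.

class CsMTLNet(nn.Module):
    def __init__(self, n_context, n_inputs, n_hidden=20):
        super().__init__()
        self.body = nn.Sequential(            # shared representation across tasks
            nn.Linear(n_context + n_inputs, n_hidden),
            nn.Sigmoid(),
            nn.Linear(n_hidden, 1),           # single output unit for every task
            nn.Sigmoid(),
        )

    def forward(self, c, x):
        return self.body(torch.cat([c, x], dim=-1))

# Example: a one-hot context vector selects which task the prediction is for.
net = CsMTLNet(n_context=3, n_inputs=10)
c = torch.tensor([[0., 1., 0.]])              # context selects the second task
x = torch.randn(1, 10)                        # standard inputs for that task
y_hat = net(c, x)                             # f'(c, x): prediction for task 2
```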


An Environmental Example: stream flow rate prediction, where x = weather data and f(x) = flow rate [Lisa Gaudette, 2006].

csMTL and Tasks with Multiple Outputs. Liangliang Tu (2010), image morphing: inductive transfer between tasks that have multiple outputs. Transforms 30x30 greyscale images using inductive transfer; three mapping tasks: NA, NH, NS.

csMTL and Tasks with Multiple Outputs

csMTL and Tasks with Multiple Outputs Demo

Two more Morphed Images: passport / angry / filtered; passport / sad / filtered.

Unsupervised LML - Deep Learning Architectures. Consider the problem of trying to classify hand-written digits (Hinton, G. E., Osindero, S. and Teh, Y. (2006), A fast learning algorithm for deep belief nets, Neural Computation 18, pp. 1527-1554). Layered networks of unsupervised auto-encoders efficiently develop hierarchies of features that capture regularities in their respective inputs. Some of the earliest work on unsupervised LML can be traced to Grossberg and Carpenter (1987), who addressed the stability-plasticity problem (how to integrate new knowledge with old) with ART, Adaptive Resonance Theory; and Strehl and Ghosh (2003), whose cluster ensemble framework reuses prior partitionings to cluster data for a new task, with three techniques for obtaining high-quality ensemble combiners.

Deep Learning Architectures [diagram of the digit network]: images of digits 0-9 (28 x 28 pixels) feed 500 neurons of low-level features, then 500 neurons of higher-level features, then 2000 top-level artificial neurons alongside the 10 digit labels. The DLA neural network is trained unsupervised, followed by back-fitting, on 40,000 examples; it learns to recognize digits using labels and to reconstruct digits given a label, and it is stochastic in nature.
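
To make the layered idea concrete, here is a minimal sketch of greedy layer-wise unsupervised pretraining in PyTorch. Note that Hinton's 2006 network actually stacks restricted Boltzmann machines trained with contrastive divergence; plain auto-encoders are used here only as a simpler stand-in. The layer sizes loosely follow the digit example (784 input pixels, then 500, 500 and 2000 units); the data batches, epochs and learning rate are placeholder assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of greedy layer-wise unsupervised pretraining with auto-encoders.
# (Hinton et al. 2006 stack RBMs; auto-encoders are a simpler stand-in here.)

def pretrain_layer(batches, n_in, n_out, epochs=5, lr=1e-3):
    """Train one auto-encoder to reconstruct its input; keep only the encoder."""
    encoder = nn.Sequential(nn.Linear(n_in, n_out), nn.Sigmoid())
    decoder = nn.Linear(n_out, n_in)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        for x in batches:                          # x: batch of flattened 28x28 images
            loss = F.mse_loss(decoder(encoder(x)), x)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder

def build_stack(batches, sizes=(784, 500, 500, 2000)):
    """Pretrain each layer on the codes produced by the layers below it."""
    encoders, codes = [], batches
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        enc = pretrain_layer(codes, n_in, n_out)
        codes = [enc(x).detach() for x in codes]   # features become the next layer's input
        encoders.append(enc)
    return nn.Sequential(*encoders)                # ready for supervised back-fitting
```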

Deep Learning Architectures Develop common features from unlabelled examples using unsupervised algorithms Courtesy of http://youqianhaozhe.com/research.htm

Deep Learning Architectures: Andrew Ng's work on Deep Learning Networks (ICML-2012). Problem: learn to recognize human faces, cats, etc. from unlabeled data. Dataset of 10 million images, each 200x200 pixels; a 9-layer locally connected neural network (1B connections); parallel algorithm run on 1,000 machines (16,000 cores) for three days. Building High-level Features Using Large Scale Unsupervised Learning, Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng, ICML 2012: 29th International Conference on Machine Learning, Edinburgh, Scotland, June 2012.

Deep Learning Architectures Results: a face detector that is 81.7% accurate and robust to translation, scaling, and rotation. Further results: 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a 70% relative improvement over the previous state of the art.

Deep Learning Architectures stimulate new ideas about how knowledge of the world is learned, consolidated, and then used for future learning and reasoning. The learning and representation of common background knowledge is important to Big AI problem solving.

LMLR: Learning to Reason. ML ∩ KR is a very interesting area: knowledge consolidation provides insights into how best to represent common knowledge for use in learning and reasoning. Beyond LML, a survey of learning/reasoning paradigms identified additional promising bodies of work: Neural-Symbolic Integration (NSI), which considers the benefits of integrating neural network learning with symbolic reasoning; Deep Learning Architectures (DLA), which share a common interest with LML in developing abstract knowledge of the world from training examples; and Learning to Reason (L2R), which takes a probabilistic perspective on learning and reasoning: an agent need not answer all possible knowledge queries, only those that are relevant to the environment of the learner in a probably approximately correct (PAC) sense; that is, assertions can be learned to a desired level of accuracy and confidence.

Neural-Symbolic Integration considers hybrid systems that integrate neural networks and symbolic logic, taking advantage of the learning capacity of connectionist networks and the transparency and reasoning capacity of logic [Garcez09, Lamb08]. Three major areas of inquiry: the use of connectionist systems for symbolic representation, reasoning and learning; the efficient and effective extraction of high-level concepts from complex networks; and the development of applications in areas such as vision, robotics, agents, and simulation.

Neural-Symbolic Integration An integrated framework for NSI and LML Adapted from [Bader and Hitzler, 2005]


Open Questions: Choice of Machine Learning to Use. Which choice of ML works best in the context of knowledge for reasoning? Unsupervised learning is taking a more central role, while others feel that reinforcement learning is the only true form of predictive modeling. Hybrid methods are a challenge for knowledge consolidation.

Open Questions: Training Examples versus Prior Knowledge. Both NSI and LML systems must weigh the accuracy and relevance of retained knowledge. Theories of how to selectively transfer common knowledge are needed, as are measures of relatedness. Small Nova Scotia trout!

Open Questions: Effective and Efficient Knowledge Retention. Refinement and consolidation are key to NSI/LML. Stability-plasticity: no loss of prior knowledge, with accuracy/resolution increased if possible. The approach should allow NSI/LML systems to efficiently select knowledge for use, and it has the potential to make serious advances on the learning of common background knowledge.

Open Questions: Effective and Efficient Knowledge Transfer. Transfer learning should quickly develop accurate models, and model accuracy should never degrade. Functional transfer → more accurate models (e.g. rehearsal of examples from prior tasks); representational transfer → more rapid learning (e.g. priming with weights of prior models).
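
The two transfer styles can be contrasted in code. The sketch below is illustrative only: prior_model and new_model are hypothetical PyTorch modules, and the rehearsal scheme is a simplified version of the virtual-examples idea, not a specific published implementation.

```python
import copy
import torch

# Illustrative contrast of the two transfer styles above (hypothetical models,
# simplified for the sketch).

def functional_transfer(real_examples, prior_model, probe_inputs):
    """Rehearsal: label probe inputs with a prior task's model to create virtual
    examples, then train on real plus virtual examples (targets accuracy)."""
    with torch.no_grad():
        virtual = [(x, prior_model(x)) for x in probe_inputs]
    return list(real_examples) + virtual            # combined training set

def representational_transfer(prior_model, new_model):
    """Priming: start the new model from the prior model's weights (targets speed).
    Assumes the two networks share the same architecture."""
    new_model.load_state_dict(copy.deepcopy(prior_model.state_dict()))
    return new_model
```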

Open Questions: Practice makes perfect! An LML system must be capable of learning from examples of tasks over a lifetime, and practice should increase model accuracy and overall domain knowledge. How can this be done? This research is important to AI, psychology, and education.

Open Questions: Scalability. For NSI, symbolic extraction is demanding; for LML, retention and transfer add complexity. Both must scale to large numbers of inputs and outputs, training examples, and tasks over a lifetime. Big Data means big scaling problems.

Learning to Reason (L2R) takes a probabilistic perspective on learning and reasoning [Khardon and Roth 97]. The agent need not answer all possible knowledge queries, only those that are relevant to the environment of the learner in a probably approximately correct (PAC) sense, with respect to some probability distribution [Valiant 08, Juba 12 & 13]. Assertions can be learned to a desired level of accuracy and confidence using training examples of the assertions.

Learning to Reason (L2R): We are working on an LMLR approach that uses multiple task learning primed by unsupervised deep learning. It PAC-learns multiple logical assertions expressed as binary examples of Boolean functions, and reasoning is done by querying the trained network with similar Boolean examples and looking for sufficient agreement on true/false. It uses a combination of DLA, to create hierarchies of abstract DNF-like features, and consolidation, to integrate new assertions with prior knowledge and to share abstract features across a domain knowledge model.

Learning to Reason (L2R) Example: to learn the assertions (A ∧ B) ∨ C = True and (A ∨ C) ∧ D = True, the L2R system would be provided with examples of the Boolean functions equivalent to the assertions, subject to a distribution D over the examples (* = don't care):

a b c d | T    a b c d | T    a b c d | T    a b c d | T
0 0 0 * | 0    1 0 0 * | 1    0 * 0 0 | 0    1 * 0 0 | 0
0 0 1 * | 1    1 0 1 * | 1    0 * 0 1 | 0    1 * 0 1 | 1
0 1 0 * | 1    1 1 0 * | 1    0 * 1 0 | 0    1 * 1 0 | 0
0 1 1 * | 1    1 1 1 * | 1    0 * 1 1 | 1    1 * 1 1 | 1

To query the L2R system with an assertion such as A ∨ ~C = True, examples of this function would be used to test the system to see if it agreed.
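
The query-by-agreement step can be sketched as follows. The trained L2R model is treated as a function from a Boolean input vector to a truth value; the model object, the agreement threshold, and the exhaustive enumeration of inputs (standing in for sampling examples from the distribution D) are assumptions made for the illustration.

```python
import itertools

# Sketch of querying an L2R system with a new assertion and checking agreement.
# 'model' is a hypothetical trained predictor mapping a 0/1 tuple to True/False;
# enumerating all inputs stands in for sampling examples from the distribution D.

def query_assertion(model, assertion, n_vars, threshold=0.95):
    """Accept the assertion if the model agrees with it on enough examples."""
    agree = total = 0
    for bits in itertools.product([0, 1], repeat=n_vars):
        if model(bits) == assertion(bits):
            agree += 1
        total += 1
    return agree / total >= threshold

# Query the assertion from the slide, A or (not C), over variables (a, b, c, d):
a_or_not_c = lambda bits: bool(bits[0] or not bits[2])
# accepted = query_assertion(trained_model, a_or_not_c, n_vars=4)
```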

Summary: We propose that the AI community move to systems that are capable of learning, retaining and using knowledge over a lifetime. Opportunities for advances in AI lie at the locus of machine learning and knowledge representation. Consider the acquisition of knowledge in a form that can be used for more general AI, such as Learning to Reason (L2R). Methods of knowledge consolidation will provide insights into how best to represent common knowledge, which is fundamental to intelligent systems.

Recommendations: Researchers should find low-hanging fruit, exploit common ground, explore differences, encourage the pursuit of AI systems that are able to learn the knowledge that they use for reasoning, and make new discoveries.

Thank You! QUESTIONS? danny.silver@acadiau.ca http://tinyurl/dsilver http://ml3.acadiau.ca