Lifelong Machine Learning and Reasoning Daniel L. Silver Acadia University, Wolfville, NS, Canada
Talk Outline: Position and Motivation; Lifelong Machine Learning; Team Silver; Deep Learning Architectures; Neural-Symbolic Integration; Learning to Reason; Summary and Recommendations
Position It is now appropriate to seriously consider the nature of systems that learn and reason over a lifetime Advocate a systems approach in the context of an agent that can: Acquire new knowledge through learning Retain and consolidate that knowledge Use it in future learning, reasoning and other aspects of AI
Moving Beyond Learning Algorithms - Rationale 1. Strong foundation in prior work 2. Inductive bias is essential to learning (Mitchell and Utgoff 1983; Wolpert 1996) Learning systems should retain and use prior knowledge as a source for shifting inductive bias. Many real-world problems are non-stationary and exhibit drift. Last year (2013) at the AAAI Spring Symposium, Qiang Yang (HK U of Sci and Tech) and I proposed that the ML community seriously consider moving beyond learning algorithms to systems that learn continuously over a lifetime.
Moving Beyond Learning Algorithms - Rationale 3. Practical Agents/Robots Require LML Advances in autonomous robotics and intelligent agents that run on the web or in mobile devices present opportunities for employing LML systems. The ability to retain and use learned knowledge is very attractive to the researchers designing these systems.
Moving Beyond Learning Algorithms - Rationale 4. Increasing Capacity of Computers Advances in modern computers provide the computational power for implementing and testing practical LML systems. IBM's Watson (2011): 90 IBM Power-7 servers, each with four 8-core processors; 15 TB of RAM (220M text pages); tasks divided into thousands of stand-alone jobs distributed across roughly 80 teraflops of compute (1 teraflop = 1 trillion ops/sec)
Moving Beyond Learning Algorithms - Rationale 5. Theoretical advances in AI: ML and KR "The acquisition, representation and transfer of domain knowledge are the key scientific concerns that arise in lifelong learning." (Thrun 1997) KR plays an important role in LML: the interaction between knowledge retention and transfer. LML has the potential to make advances on the learning of common background knowledge, which leads to questions about learning to reason.
Lifelong Machine Learning: my first biological learning system (photos from 1994 and 2013)
Lifelong Machine Learning Considers systems that can learn many tasks over a lifetime from one or more domains Concerned with methods of retaining and using learned knowledge to improve the effectiveness and efficiency of future learning We investigate systems that must learn: From impoverished training sets For diverse domains of tasks Where practice of the same task happens Applications: Agents, Robotics, Data Mining, User Modeling
Lifelong Machine Learning Framework (diagram): training examples S of the form (xi, y = f(xi)), drawn from instance space X, feed a short-term inductive learning system; knowledge retention consolidates what is learned into universal and domain knowledge, and knowledge selection and transfer supply an inductive bias BD back to the learner; the resulting model (classifier) h yields predictions/actions h(x) on testing examples.
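The sketch below is not from the talk; it shows one way the framework's components might be organized in code. The class and function names (ConsolidatedDomainKnowledge, select_bias, consolidate, short_term_learner) are hypothetical, chosen only to mirror the labels in the diagram.

```python
# Illustrative skeleton of the LML framework loop (hypothetical names).
# Long-term domain knowledge supplies an inductive bias for each new task;
# the resulting hypothesis is consolidated back into long-term memory.

class ConsolidatedDomainKnowledge:
    """Long-term memory: retains knowledge from previously learned tasks."""

    def __init__(self):
        self.task_knowledge = {}          # task_id -> retained representation

    def select_bias(self, task_id, examples):
        """Knowledge selection: choose prior knowledge relevant to the new task."""
        # A real system would measure task relatedness; here we return everything.
        return list(self.task_knowledge.values())

    def consolidate(self, task_id, hypothesis):
        """Knowledge retention: integrate the new hypothesis without degrading
        what was learned before (stability-plasticity)."""
        self.task_knowledge[task_id] = hypothesis


def short_term_learner(training_examples, inductive_bias):
    """Short-term memory: induce a hypothesis h from (x, f(x)) pairs,
    guided by the transferred inductive bias."""
    # Placeholder: any inductive learning algorithm could be plugged in here.
    def h(x):
        return 0
    return h


def lifelong_learning(domain_knowledge, task_stream):
    """Process a lifetime of tasks: transfer -> learn -> retain."""
    for task_id, training_examples in task_stream:
        bias = domain_knowledge.select_bias(task_id, training_examples)
        h = short_term_learner(training_examples, bias)
        domain_knowledge.consolidate(task_id, h)
        yield task_id, h
```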
Essential Ingredients of LML The retention (or consolidation) of learned task knowledge Knowledge Representation perspective Effective and Efficient Retention Resists the accumulation of erroneous knowledge Maintains or improves model performance Mitigates redundant representation Allows the practice of tasks
Essential Ingredients of LML The selective transfer of prior knowledge when learning new tasks Machine Learning perspective More effective and efficient learning: more rapidly produce models that perform better Selection of appropriate inductive bias to guide search
Essential Ingredients of LML A systems approach Ensures the effective and efficient interaction of the retention and transfer components Much to be learned from the writings of early cognitive scientists, AI researchers and neuroscientists such as Albus, Holland, Newell, Langley, Johnson-Laird and Minsky
Overview of LML Work Supervised Learning Unsupervised Learning Hybrids (semi-supervised, self-taught, co-training, etc) Reinforcement Learning Mark Ring, Rich Sutton, Tanaka and Yamamura
Supervised LML
Michalski (1980s) - Constructive inductive learning. Principle: new knowledge is easier to induce if search is done using the correct representation. Two interrelated searches during learning: search for the best representational space for hypotheses, and search for the best hypothesis in the current representational space.
Utgoff and Mitchell (1983) - Importance of inductive bias to learning: systems should be able to search for an appropriate inductive bias using prior knowledge. Proposed a system that shifted its bias by adjusting the operations of the modeling language.
Solomonoff (1989) - Incremental learning: the system is primed on a small, incomplete set of primitive concepts and first learns to express the solutions to a set of simple problems; it is then given more difficult problems and, if necessary, additional primitive concepts, etc.
Thrun and Mitchell (1990s) - Explanation-based neural networks (EBNN) and Lifelong Learning: transfers knowledge across multiple learning tasks; uses domain knowledge of previous learning tasks (back-propagated gradients) to guide the development of a new task.
LML via context-sensitive MTL (csMTL) (network diagram): task-context inputs c1..ck and standard inputs x1..xn feed a short-term learning network with one output f'(c,x) for all tasks. Representational transfer from the consolidated domain knowledge (CDK) supports rapid learning in the short-term network, while task rehearsal (functional transfer via virtual examples) slowly consolidates new task knowledge into the long-term Consolidated Domain Knowledge network f1(c,x). Silver, Poirier, Currie (also Tu, Fowler), Inductive transfer with context-sensitive neural networks, Mach Learn (2008) 73: 313–336
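Below is a hedged PyTorch sketch of the csMTL idea described above (a paraphrase, not the authors' implementation): a one-hot task-context vector c is concatenated with the standard inputs x and fed through a shared network with a single output, so the context selects which task function the network computes. Layer sizes and the use of PyTorch are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CsMTLNet(nn.Module):
    """Context-sensitive MTL sketch: one output for all tasks, with the task
    selected by a one-hot context vector c concatenated to the inputs x."""

    def __init__(self, n_context, n_inputs, n_hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_context + n_inputs, n_hidden),  # shared hidden layer
            nn.Sigmoid(),
            nn.Linear(n_hidden, 1),                      # single output f'(c, x)
            nn.Sigmoid(),
        )

    def forward(self, c, x):
        return self.net(torch.cat([c, x], dim=1))

# Usage: ask for task 2's prediction on a 10-dimensional input.
net = CsMTLNet(n_context=5, n_inputs=10)
c = torch.zeros(1, 5); c[0, 2] = 1.0   # task-context encoding (task 2 of 5)
x = torch.randn(1, 10)                 # standard inputs
y = net(c, x)                          # f'(c, x) for task 2
```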
An Environmental Example x = weather data f(x) = flow rate Stream flow rate prediction [Lisa Gaudette, 2006]
csMTL and Tasks with Multiple Outputs Liangliang Tu (2010) Image Morphing: inductive transfer between tasks that have multiple outputs; transforms 30x30 grey-scale images using inductive transfer; three mapping tasks (NA, NH, NS)
csMTL and Tasks with Multiple Outputs
csMTL and Tasks with Multiple Outputs Demo
Two More Morphed Images (image labels: Passport, Angry, Filtered; Passport, Sad, Filtered)
Unsupervised LML - Deep Learning Architectures
Consider the problem of trying to classify these hand-written digits.
Layered networks of unsupervised auto-encoders efficiently develop hierarchies of features that capture regularities in their respective inputs (Hinton, G. E., Osindero, S. and Teh, Y. (2006) A fast learning algorithm for deep belief nets. Neural Computation 18, pp 1527-1554).
Some of the earliest work on unsupervised LML can be traced to:
Grossberg and Carpenter (1987) - the stability-plasticity problem: how to integrate new knowledge with old? ART - Adaptive Resonance Theory.
Strehl and Ghosh (2003) - Cluster ensemble framework: reuses prior partitionings to cluster data for a new task; three techniques for obtaining high-quality ensemble combiners.
Deep Learning Architectures (diagram of the deep belief network): images of digits 0-9 (28 x 28 pixels) feed 500 neurons of low-level features, then 500 neurons of higher-level features, then 2000 top-level artificial neurons connected to the 10 digit labels. DLA neural network: unsupervised training, followed by back-fitting on 40,000 examples. Learns to recognize digits using labels and to reconstruct digits given a label; stochastic in nature.
Deep Learning Architectures Develop common features from unlabelled examples using unsupervised algorithms Courtesy of http://youqianhaozhe.com/research.htm
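As a rough illustration of how such common features can be developed from unlabelled examples, here is a minimal PyTorch sketch of greedy layer-wise feature learning with plain auto-encoders. Note that Hinton et al. (2006) used restricted Boltzmann machines and contrastive divergence, so this is an analogous sketch rather than their algorithm; the random tensor stands in for the digit images.

```python
import torch
import torch.nn as nn

def pretrain_layer(encoder, decoder, data, epochs=50, lr=1e-3):
    """Train one auto-encoder layer to reconstruct its input (unsupervised)."""
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        recon = decoder(encoder(data))
        loss = nn.functional.mse_loss(recon, data)
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder

# Greedy layer-wise pretraining: 784 inputs (28x28 pixels) -> 500 -> 500 features.
torch.manual_seed(0)
images = torch.rand(1000, 784)          # stand-in for unlabelled digit images
layer_sizes = [784, 500, 500]
features, encoders = images, []
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = nn.Sequential(nn.Linear(n_in, n_out), nn.Sigmoid())
    dec = nn.Sequential(nn.Linear(n_out, n_in), nn.Sigmoid())
    encoders.append(pretrain_layer(enc, dec, features))
    features = enc(features).detach()   # output features feed the next layer
# The stacked encoders can then be fine-tuned ("back-fitted") with labelled data.
```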
Deep Learning Architectures Andrew Ng’s work on Deep Learning Networks (ICML-2012) Problem: Learn to recognize human faces, cats, etc from unlabeled data Dataset of 10 million images; each image has 200x200 pixels 9-layered locally connected neural network (1B connections) Parallel algorithm; 1,000 machines (16,000 cores) for three days Building High-level Features Using Large Scale Unsupervised Learning Quoc V. Le, Marc’Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng ICML 2012: 29th International Conference on Machine Learning, Edinburgh, Scotland, June, 2012.
Deep Learning Architectures Results: A face detector that is 81.7% accurate Robust to translation, scaling, and rotation Further results: 15.8% accuracy in recognizing 20,000 object categories from ImageNet 70% relative improvement over the previous state-of-the-art.
Deep Learning Architectures Stimulates new ideas about how knowledge of the world is learned, consolidated, and then used for future learning and reasoning Learning and representation of common background knowledge Important to Big AI problem solving
LMLR: Learning to Reason - ML and KR ... a very interesting area
Knowledge consolidation provides insights into how best to represent common knowledge for use in learning and reasoning.
Beyond LML, a survey of learning and reasoning paradigms identified three promising bodies of work:
NSI - Neural-Symbolic Integration: considers the benefits of integrating neural network learning with symbolic reasoning.
DLA - Deep Learning Architectures: shares a common interest with LML in developing abstract knowledge of the world from training examples.
L2R - Learning to Reason: takes a probabilistic perspective on learning and reasoning; an agent need not answer all possible knowledge queries, only those that are relevant to the environment of the learner in a probably approximately correct (PAC) sense; that is, assertions can be learned to a desired level of accuracy and confidence.
Neural-Symbolic Integration Considers hybrid systems that integrate neural networks and symbolic logic Takes advantage of: Learning capacity of connectionist networks Transparency and reasoning capacity of logic [Garcez09,Lamb08] Three major areas of inquiry: Use of connectionist systems for symbolic representation, reasoning and learning Efficient and effective extraction of high-level concepts from complex networks Development of applications in areas such as vision, robotics, agents, and simulation
Neural-Symbolic Integration An integrated framework for NSI and LML Adapted from [Bader and Hitzler, 2005]
Open Questions Choice of Machine Learning to Use Which type of ML works best in the context of knowledge for reasoning? Unsupervised learning is taking a more central role; others feel that reinforcement learning is the only true form of predictive modeling; hybrid methods are a challenge for knowledge consolidation
Open Questions Training Examples versus Prior Knowledge Both NSI and LML systems must weigh the accuracy and relevance of retained knowledge Theories of how to selectively transfer common knowledge are needed Measures of relatedness needed Small Nova Scotia Trout !
Open Questions Effective and Efficient Knowledge Retention Refinement/Consolidation are key to NSI/LML Stability-plasticity: no loss of prior knowledge, increase accuracy/resolution if possible Approach should allow NSI/LML to efficiently select knowledge for use Has the potential to make serious advances on the learning of common background knowledge
Open Questions Effective and Efficient Knowledge Transfer Transfer learning should quickly develop accurate models; model accuracy should never degrade. Functional transfer yields more accurate models (e.g. rehearsal of examples from prior tasks). Representational transfer yields more rapid learning (e.g. priming with weights of prior models).
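A small Python sketch (illustrative only; the function names are hypothetical) of the two transfer styles just listed: representational transfer primes a new model with a prior model's weights, while functional transfer generates virtual examples from the prior model's function for rehearsal during new-task training.

```python
import copy
import torch
import torch.nn as nn

def representational_transfer(prior_model):
    """Prime the new model with the prior model's weights (more rapid learning)."""
    return copy.deepcopy(prior_model)        # search starts from the prior representation

def functional_transfer(prior_model, x_pool):
    """Create virtual (rehearsal) examples of the prior task's function; training
    on these alongside new-task examples guides a more accurate model."""
    with torch.no_grad():
        return x_pool, prior_model(x_pool)   # (inputs, prior-task targets)

# Hypothetical prior-task model and an unlabelled input pool for rehearsal.
prior = nn.Sequential(nn.Linear(10, 16), nn.Tanh(), nn.Linear(16, 1))
new_model = representational_transfer(prior)               # weight priming
virtual_x, virtual_y = functional_transfer(prior, torch.randn(64, 10))
```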
Open Questions Practice makes perfect ! An LML system must be capable of learning from examples of tasks over a lifetime Practice should increase model accuracy and overall domain knowledge How can this be done? Research important to AI, Psych, and Education
Open Questions Scalability For NSI, symbolic extraction is demanding; for LML, retention and transfer add complexity. Both must scale to large numbers of inputs and outputs, training examples, and tasks over a lifetime. Big Data means big scaling problems.
Learning to Reason (L2R) Takes a probabilistic perspective on learning and reasoning [Khardon and Roth 97] The agent need not answer all possible knowledge queries, only those that are relevant to the environment of the learner in a probably approximately correct (PAC) sense (w.r.t. some probability distribution) [Valiant 08, Juba 12 & 13] Assertions can be learned to a desired level of accuracy and confidence using training examples of the assertions
Learning to Reason (L2R) We are working on an LMLR approach that uses multiple task learning primed by unsupervised deep learning. It PAC-learns multiple logical assertions expressed as binary examples of Boolean functions. Reasoning is done by querying the trained network with similar Boolean examples and looking for sufficient agreement on true/false. The approach combines a DLA, used to create hierarchies of abstract DNF-like features, with consolidation, used to integrate new assertions with prior knowledge and to share abstract features across a domain knowledge model.
Learning to Reason (L2R) Example: To learn the assertions (A ∧ B) ∨ C = True and (A ∨ C) ∧ D = True, the L2R system would be provided with examples of the Boolean functions equivalent to each assertion, subject to a distribution D over the examples:

Examples for (A ∧ B) ∨ C = True (d is a don't care, marked *):
a b c d | T
0 0 0 * | 0
0 0 1 * | 1
0 1 0 * | 0
0 1 1 * | 1
1 0 0 * | 0
1 0 1 * | 1
1 1 0 * | 1
1 1 1 * | 1

Examples for (A ∨ C) ∧ D = True (b is a don't care, marked *):
a b c d | T
0 * 0 0 | 0
0 * 0 1 | 0
0 * 1 0 | 0
0 * 1 1 | 1
1 * 0 0 | 0
1 * 0 1 | 1
1 * 1 0 | 0
1 * 1 1 | 1

To query the L2R system with an assertion such as A ∨ ~C = True, examples of that function would be used to test the system to see whether it agrees.
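The following is a hedged Python/PyTorch sketch of this querying idea, not the authors' system: examples of each assertion, tagged with a one-hot task context, train a single network, and a query assertion is tested by measuring agreement between the network's outputs and the query's own truth values. The network size, training schedule, and the choice to reuse an existing task context for the query are assumptions made for illustration.

```python
import itertools
import torch
import torch.nn as nn

# Assertions expressed as Boolean functions over (a, b, c, d).
assertions = {
    0: lambda a, b, c, d: (a and b) or c,      # (A AND B) OR C
    1: lambda a, b, c, d: (a or c) and d,      # (A OR C) AND D
}

def examples_for(task_id, fn, n_tasks):
    """Binary examples of a Boolean function, prefixed with a one-hot task context."""
    xs, ys = [], []
    for bits in itertools.product([0, 1], repeat=4):
        ctx = [1.0 if i == task_id else 0.0 for i in range(n_tasks)]
        xs.append(ctx + [float(b) for b in bits])
        ys.append(float(fn(*bits)))
    return torch.tensor(xs), torch.tensor(ys).unsqueeze(1)

# One small network learns both assertions (2 context inputs + 4 Boolean inputs).
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(6, 16), nn.Tanh(), nn.Linear(16, 1), nn.Sigmoid())
opt = torch.optim.Adam(net.parameters(), lr=0.05)
data = [examples_for(t, f, len(assertions)) for t, f in assertions.items()]
for _ in range(500):
    for xs, ys in data:
        loss = nn.functional.binary_cross_entropy(net(xs), ys)
        opt.zero_grad(); loss.backward(); opt.step()

# Query with another assertion, e.g. A OR (NOT C), reusing task 0's context.
q_x, q_y = examples_for(0, lambda a, b, c, d: a or (not c), len(assertions))
agreement = ((net(q_x) > 0.5).float() == q_y).float().mean().item()
print(f"agreement with query assertion: {agreement:.2f}")  # sufficient agreement => accept
```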
Summary Propose that the AI community move to systems that are capable of learning, retaining and using knowledge over a lifetime Opportunities for advances in AI lie at the locus of machine learning and knowledge representation Consider the acquisition of knowledge in a form that can be used for more general AI, such as Learning to Reason (L2R) Methods of knowledge consolidation will provide insights into how to best represent common knowledge – fundamental to intelligent systems
Recommendations Researchers should: find low-hanging fruit; exploit common ground; explore differences; encourage pursuit of AI systems that are able to learn the knowledge that they use for reasoning; make new discoveries
Thank You! QUESTIONS? danny.silver@acadiau.ca http://tinyurl/dsilver http://ml3.acadiau.ca