Probabilistic Models in Human and Machine Intelligence.


A Very Brief History of Cog Sci and AI
1950’s-1980’s
- The mind is a von Neumann computer architecture
- Symbolic models of cognition
1980’s-1990’s
- The mind is a massively parallel, neuron-like network of simple processors
- Connectionist models of cognition
Late 1990’s-?
- The mind operates according to laws of probability and statistical inference
- Invades cog sci, AI (planning, natural language processing), ML
- Formalizes the best of connectionist ideas

Relation of Probabilistic Models to Connectionist and Symbolic Models
Connectionist models / Symbolic models / Probabilistic models
- strong bias vs. weak (unknown) bias
- principled, elegant incorporation of prior knowledge & assumptions vs. ad hoc, implicit incorporation of prior knowledge & assumptions
- rule learning (from small # examples) vs. statistical learning (large # examples)
- structured representations vs. feature-vector representations

Two Notions of Probability
Frequentist notion
- Relative frequency obtained if event were observed many times (e.g., coin flip)
Subjective notion
- Degree of belief in some hypothesis
- Analogous to connectionist activation
Long philosophical battle between these two views
- Subjective notion makes sense for cog sci and AI given that probabilities represent mental states

Is Human Reasoning Bayesian?
The probability of breast cancer is 1% for a woman at 40 who participates in routine screening. If a woman has breast cancer, the probability is 80% that she will have a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she will also have a positive mammography. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?
A. greater than 90%
B. between 70% and 90%
C. between 50% and 70%
D. between 30% and 50%
E. between 10% and 30%
F. less than 10%
95 / 100 doctors answer B (between 70% and 90%); the correct answer is F (less than 10%).
Is this typical or the exception?
Perhaps high-level reasoning isn’t Bayesian but underlying mechanisms of learning, inference, memory, language, and perception are.
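The answer to the mammography question follows directly from Bayes' rule. The sketch below plugs in the three numbers given on the slide; the function and variable names are illustrative and not part of the original deck.

```python
# Numbers taken from the slide.
p_cancer = 0.01               # prior: P(cancer)
p_pos_given_cancer = 0.80     # hit rate: P(positive | cancer)
p_pos_given_healthy = 0.096   # false-alarm rate: P(positive | no cancer)


def posterior(prior, hit_rate, false_alarm_rate):
    """Bayes' rule for a binary hypothesis after one positive test."""
    # P(positive) summed over both hypotheses.
    evidence = prior * hit_rate + (1 - prior) * false_alarm_rate
    return prior * hit_rate / evidence


p_cancer_given_pos = posterior(p_cancer, p_pos_given_cancer, p_pos_given_healthy)
print(f"P(cancer | positive) = {p_cancer_given_pos:.3f}")  # prints 0.078
```

The posterior stays below 10% because the low base rate means false positives among healthy women far outnumber true positives.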

Griffiths and Tenenbaum (2006) Optimal Predictions in Everyday Cognition
- If you were assessing an insurance case for an 18-year-old man, what would you predict for his lifespan?
- If you phoned a box office to book tickets and had been on hold for 3 minutes, what would you predict for the total time you would be on hold?
- If your friend read you her favorite line of poetry, and told you it was line 5 of a poem, what would you predict for the total length of the poem?
- If you opened a book about the history of ancient Egypt to a page listing the reigns of the pharaohs, and noticed that in 4000 BC a particular pharaoh had been ruling for 11 years, what would you predict for the total duration of his reign?

Griffiths and Tenenbaum Conclusion
- Average responses reveal a “close correspondence between people’s implicit probabilistic models and the statistics of the world.”
- People show a statistical sophistication and optimality of reasoning generally assumed to be absent in the domain of higher-order cognition.

Griffiths and Tenenbaum Bayesian Model
If an individual has lived for t_cur = 50 years, how many years t_total do you expect them to live?
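A minimal sketch of this kind of prediction, under stated assumptions: the observed duration t_cur is taken to be equally likely to fall anywhere in the total span, so the likelihood is 1/t_total for t_total >= t_cur, and the prediction is the posterior median of t_total. The Gaussian prior over lifespans (mean 75, standard deviation 16) and the grid are hypothetical choices for illustration, not values from the slides.

```python
import math


def gaussian_prior(t_total, mean=75.0, sd=16.0):
    """Unnormalized prior over total lifespan (hypothetical parameters)."""
    return math.exp(-0.5 * ((t_total - mean) / sd) ** 2)


def predict_total(t_cur, prior, grid):
    """Posterior median of t_total given the duration observed so far."""
    # Posterior weight = prior * likelihood, with likelihood 1/t_total
    # when t_total >= t_cur and 0 otherwise.
    weights = [prior(t) / t if t >= t_cur else 0.0 for t in grid]
    half = sum(weights) / 2
    cumulative = 0.0
    for t, w in zip(grid, weights):
        cumulative += w
        if cumulative >= half:
            return t
    return grid[-1]


# Candidate total lifespans from 0.5 to 200 years in half-year steps.
grid = [0.5 * k for k in range(1, 401)]
prediction = predict_total(50, gaussian_prior, grid)
print(f"Predicted t_total for t_cur = 50: {prediction} years")
```

Swapping in a different prior (for example a power law for quantities such as waiting times) changes the shape of the prediction function, which is why the domain prior matters so much in this model.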

What Does Optimality Entail?
- Individuals have complete, accurate knowledge about the domain priors.
- Fairly sophisticated computation involving a Bayesian integral.

From The Economist (1/5/2006)
“[Griffiths and Tenenbaum]…put the idea of a Bayesian brain to a quotidian test. They found that it passed with flying colors.”
“The key to successful Bayesian reasoning is … in having an appropriate prior… With the correct prior, even a single piece of data can be used to make meaningful Bayesian predictions.”

My Caution
Bayesian formalism is sufficiently broad that nearly any theory can be cast in Bayesian terms
- E.g., adding two numbers as Bayesian inference
Emphasis on how cognition conforms to Bayesian principles often directs attention away from important memory and processing limitations.

Value Of Probabilistic Models In Cognitive Science
Elegant theories
- Optimality assumption produces strong constraints on theories
- Key claims of theories are explicit
- Can minimize assumptions via Bayesian model averaging
Principled mathematical account
- Wasn’t true of symbolic or connectionist theories
- Currency of probability provides strong constraints (vs. neural net activation)

Rationality in Cognitive Science
Some theories in cognitive science are based on the premise that human performance is optimal
- Rational theories, ideal observer theories
- Ignores biological constraints
- Probably true in some areas of cognition (e.g., vision)
More interesting: bounded rationality
- Optimality is assumed to be subject to limitations on processing hardware and capacity, representation, experience with the world.

Latent Dirichlet Allocation (a.k.a. Topic Model)
Problem
- Given a set of text documents, can we infer the topics that are covered by the set, and can we assign topics to individual documents?
- Unsupervised learning problem
Technique
- Exploit statistical regularities in data
- E.g., documents that are on the topic of education will likely contain a set of words such as ‘teacher’, ‘student’, ‘lesson’, etc.

Generative Model of Text
Each document is a collection of topics (e.g., education, finance, the arts)
Each topic is characterized by a set of words that are likely to appear
The string of words in a document is generated by:
1) Draw a topic from the probability distribution associated with a document
2) Draw a word from the probability distribution associated with a topic
Bag of words approach
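The two-step generative process can be sketched in a few lines. The topics, vocabularies, and probabilities below are hypothetical toy values chosen only for illustration. The sketch also holds the document's topic distribution fixed, whereas full latent Dirichlet allocation draws it from a Dirichlet prior.

```python
import random

# Hypothetical toy parameters, not taken from the slides.
doc_topic_probs = {"education": 0.7, "finance": 0.3}
topic_word_probs = {
    "education": {"teacher": 0.4, "student": 0.4, "lesson": 0.2},
    "finance": {"bank": 0.5, "loan": 0.3, "market": 0.2},
}


def draw(distribution, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    outcomes = list(distribution)
    weights = [distribution[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_document(doc_topic_probs, topic_word_probs, n_words, rng):
    """Generate a bag of words by repeating the two draws for each word."""
    words = []
    for _ in range(n_words):
        topic = draw(doc_topic_probs, rng)        # step 1: draw a topic
        word = draw(topic_word_probs[topic], rng)  # step 2: draw a word
        words.append(word)
    return words


rng = random.Random(0)  # seeded so the sketch is repeatable
document = generate_document(doc_topic_probs, topic_word_probs, 10, rng)
print(document)
```

Because each word is drawn independently, word order carries no information, which is what the bag-of-words assumption amounts to.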

Inferring (Learning) Topics
Input: set of unlabeled documents
Learning task
- Infer distribution over topics for each document
- Infer distribution over words for each topic
Distribution over topics can be helpful for classifying or clustering documents

Dan Knights and Rob Lindsey’s work at JDPA

Rob’s Work: Phrase Discovery

Value Of Probabilistic Models In AI and ML
Provides language for re-casting many existing algorithms in a unified framework
- Allows you to see interrelationship among algorithms
- Allows you to develop new algorithms
AI and ML fundamentally have to deal with uncertainty in the world, and uncertainty is well described in the language of random events.
It’s the optimal thing to compute, in the sense that any other strategy will lead to lower expected returns
- e.g., “I bet you $1 that roll of die will produce number < 3. How much are you willing to wager?”
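The die wager can be worked out as an expected-return calculation. The sketch below assumes a fair six-sided die and one particular reading of the bet: the other party stakes $1 on the roll being less than 3, and you stake your wager on the complement. Both the reading and the names are assumptions made for illustration.

```python
from fractions import Fraction

# Fair six-sided die: only outcomes 1 and 2 satisfy roll < 3.
p_event = Fraction(2, 6)


def expected_return(wager, stake=1, p_lose=p_event):
    """Your expected gain: win the stake if roll >= 3, lose the wager otherwise."""
    return (1 - p_lose) * stake - p_lose * wager


# Largest wager with non-negative expected return (stake of $1).
break_even = (1 - p_event) / p_event

for wager in (1, 2, 3):
    print(f"wager ${wager}: expected return {expected_return(wager)}")
print(f"break-even wager: ${break_even}")
```

Under these assumptions any wager up to $2 has a non-negative expected return, and a wager above $2 loses money on average.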

Bayesian Analysis
Make inferences from data using probability models about quantities we want to predict
- E.g., expected age of death given 51 yr old
- E.g., latent topics in document
1. Set up full probability model that characterizes distribution over all quantities (observed and unobserved)
2. Condition model on observed data to compute posterior distribution
3. Evaluate fit of model to data
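The three steps can be illustrated on a deliberately small problem: estimating the bias of a coin on a grid of candidate values. The data (7 heads in 10 flips), the uniform prior, and the crude fit check in step 3 are all hypothetical choices for this sketch. A real analysis would typically use a fuller posterior predictive check.

```python
# Hypothetical data: 7 heads observed in 10 flips.
heads, flips = 7, 10

# Step 1: full probability model over the unobserved bias theta and the data.
grid = [i / 100 for i in range(101)]   # candidate values of theta
prior = [1.0 for _ in grid]            # uniform prior (an assumption)


def likelihood(theta):
    """Probability of the observed flips given theta, up to a constant."""
    return theta ** heads * (1 - theta) ** (flips - heads)


# Step 2: condition on the observed data to obtain the posterior.
unnormalized = [p * likelihood(t) for p, t in zip(prior, grid)]
normalizer = sum(unnormalized)
posterior = [u / normalizer for u in unnormalized]
posterior_mean = sum(t * p for t, p in zip(grid, posterior))

# Step 3: evaluate fit by comparing the data with what the fitted model expects.
expected_heads = flips * posterior_mean
print(f"posterior mean of theta: {posterior_mean:.3f}")
print(f"observed heads: {heads}, expected under posterior: {expected_heads:.2f}")
```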

Important Ideas in Bayesian Models
Generative models
- Likelihood function, prior distribution
Consideration of multiple models in parallel
- Potentially infinite model space
Inference
- prediction via model averaging
- diminishing role of priors with evidence
- explaining away
Learning
- Just another form of inference
- Bayesian Occam's razor: trade off between model simplicity and fit to data
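Explaining away is easy to see in a tiny network with two independent causes and one shared effect. The sketch below computes the relevant posteriors by brute-force enumeration. The priors (0.1 each) and the noisy-OR strength (0.9) are hypothetical values picked for illustration.

```python
from itertools import product

# Hypothetical parameters: two independent causes A and B of one effect E.
p_a, p_b = 0.1, 0.1
strength = 0.9  # chance that a present cause produces the effect


def p_effect(a, b):
    """Noisy-OR: the effect occurs unless every present cause fails."""
    return 1 - (1 - strength * a) * (1 - strength * b)


def joint(a, b, e):
    """Joint probability of one complete assignment to (A, B, E)."""
    pa = p_a if a else 1 - p_a
    pb = p_b if b else 1 - p_b
    pe = p_effect(a, b) if e else 1 - p_effect(a, b)
    return pa * pb * pe


def prob_a_given(**evidence):
    """P(A = 1 | evidence), summing the joint over all consistent worlds."""
    numerator = denominator = 0.0
    for a, b, e in product([0, 1], repeat=3):
        world = {"a": a, "b": b, "e": e}
        if all(world[name] == value for name, value in evidence.items()):
            p = joint(a, b, e)
            denominator += p
            if a:
                numerator += p
    return numerator / denominator


p_a_given_e = prob_a_given(e=1)
p_a_given_e_b = prob_a_given(e=1, b=1)
print(f"P(A | E)    = {p_a_given_e:.3f}")
print(f"P(A | E, B) = {p_a_given_e_b:.3f}")
```

Observing the effect raises belief in cause A well above its prior, and then learning that cause B is present pulls that belief back down: B explains the effect away.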

Important Technical Issues
representing structured data
- grammars
- relational schemas (e.g., paper authors, topics)
hierarchical models
- different levels of abstraction
nonparametric models
- flexible models that grow in complexity as the data justifies
approximate inference
- Markov chain Monte Carlo, particle filters, variational approximations