
Mini-course on Artificial Neural Networks and Bayesian Networks. Michal Rosen-Zvi. The Multidisciplinary Brain Research Center, Bar-Ilan University, May 2004

Section 1: Introduction

Networks (1). Networks serve as a visual way of displaying relationships. Social networks are examples of 'flat' networks, where the only information is the relation between entities. Example: collaboration networks.

Collaboration Network

Networks (2). Artificial Neural Networks represent rules – deterministic relations – between input and output.

Networks (3). Bayesian Networks represent probabilistic relations – conditional independencies and dependencies – between variables.

Outline
Introduction/Motivation
Artificial Neural Networks
– The Perceptron, multilayered feed-forward NNs and recurrent NNs
– On-line (supervised) learning
– Unsupervised learning and PCA
– Classification
– Capacity of networks
Bayesian Networks (BN)
– Bayes rules and the BN semantics
– Classification using generative models
Applications: Vision, Text

Motivation. Research on ANNs is inspired by neurons in the brain and is (partially) driven by the need for models of reasoning in the brain. Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (for example, driving a car, assigning scientific referees to papers, and many others).

History of (modern) ANNs and BNs: the McCulloch and Pitts model (1943), the Hebbian learning rule (1949), the Perceptron (1958), Minsky and Papert's book (1969), the Hopfield network (1982), Gardner's studies and Pearl's book (1988).

Section 2: On-line Learning. Based on slides from Michael Biehl's summer course

Section 2.1: The Perceptron

The Perceptron. Input: ξ ∈ R^N; adaptive weights: W; output: S = sign(W·ξ).

Perceptron: binary output. Implements a linearly separable classification of inputs. Milestones: the Perceptron convergence theorem, Rosenblatt (1958); capacity, Winder (1963) and Cover (1965); statistical physics of perceptron weights, Gardner (1988). How does this device learn?
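To make the question concrete, here is a minimal sketch of the classical perceptron learning rule on toy data labeled by a random teacher perceptron. The data, sizes, and stopping rule below are illustrative assumptions, not material from the course.

```python
import numpy as np

def train_perceptron(xis, labels, max_epochs=100):
    """Perceptron learning rule: correct the weights on each misclassified example."""
    P, N = xis.shape
    w = np.zeros(N)
    for _ in range(max_epochs):
        errors = 0
        for xi, s_t in zip(xis, labels):
            if np.sign(w @ xi) != s_t:   # student disagrees with the teacher label
                w += s_t * xi / N        # Hebbian-style correction toward the teacher
                errors += 1
        if errors == 0:                  # linearly separable data: the rule has converged
            break
    return w

# Toy data: a random teacher B (with B·B = 1) labels random inputs.
rng = np.random.default_rng(0)
N, P = 20, 100
B = rng.normal(size=N); B /= np.linalg.norm(B)
xis = rng.normal(size=(P, N))
labels = np.sign(xis @ B)
W = train_perceptron(xis, labels)
print("training accuracy:", np.mean(np.sign(xis @ W) == labels))
```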

Learning a linearly separable rule from reliable examples. Unknown rule: S_T(ξ) = sign(B·ξ) = ±1 defines the correct classification. It is parameterized through a teacher perceptron with weights B ∈ R^N (B·B = 1). The only available information is the example data D = {ξ^μ, S_T^μ = sign(B·ξ^μ)} for μ = 1…P.

Learning a linearly… (cont.) Training: finding the student weights W. W parameterizes a hypothesis S_S(ξ) = sign(W·ξ). Supervised learning is based on the student's performance with respect to the training data D. Binary error measure: ε_T(W) = ε[S_S(ξ), S_T(ξ)], with ε_T(W) = 0 if S_S(ξ) = S_T(ξ) and ε_T(W) = 1 if S_S(ξ) ≠ S_T(ξ).

Off-line learning. Guided by the minimization of a cost function H(W), e.g., the training error H(W) = Σ_μ ε_T^μ(W). Equilibrium statistical mechanics treatment:
– Energy H of N degrees of freedom
– An ensemble of systems in thermal equilibrium at a formal temperature
– Disorder average over random examples (replicas) assumes a distribution over the inputs
– Macroscopic description, order parameters
– Typical properties of large systems, P = αN

On-line training. Single presentation of an uncorrelated (new) example {ξ^μ, S_T(ξ^μ)}; update of the student weights; learning dynamics in discrete time.
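As an illustration of the on-line setting, the sketch below applies a Hebbian on-line update to a stream of fresh random examples and tracks the student-teacher overlap. The learning rate, system size, and Hebbian choice of update are illustrative assumptions.

```python
import numpy as np

# On-line learning: each example is seen once, drawn fresh at every time step.
rng = np.random.default_rng(1)
N, eta, steps = 500, 1.0, 20000
B = rng.normal(size=N); B /= np.linalg.norm(B)    # teacher weights, B·B = 1
w = np.zeros(N)                                   # student weights
for mu in range(1, steps + 1):
    xi = rng.normal(size=N)                       # new, uncorrelated input
    s_t = np.sign(B @ xi)                         # teacher label
    w += (eta / N) * s_t * xi                     # Hebbian on-line update
    if mu % 5000 == 0:
        rho = (w @ B) / np.linalg.norm(w)         # normalized student-teacher overlap
        eps_g = np.arccos(rho) / np.pi            # generalization error of a perceptron
        print(f"alpha = {mu / N:5.1f}   overlap = {rho:.3f}   eps_g = {eps_g:.3f}")
```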

On-line training – the statistical physics approach. Consider a sequence of independent, random examples; take the thermodynamic limit; perform the disorder average over the latest example (self-averaging properties); take the continuous time limit.

Section 3: Unsupervised Learning. Based on slides from Michael Biehl's summer course

Dynamics of unsupervised learning. Learning without a teacher? Real-world data is, in general, not isotropic and structureless in input space. Unsupervised learning = extraction of information from unlabelled inputs.

Potential aims
– Correlation analysis
– Clustering of data: grouping according to some similarity criterion
– Identification of prototypes: representing a large amount of data by a few examples
– Dimension reduction: representing high-dimensional data by a few relevant features

Dimensionality Reduction. The goal is to compress information with minimal loss. Methods:
– Unsupervised learning: Principal Component Analysis, Nonnegative Matrix Factorization
– Bayesian models (the matrices are probabilities)
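A minimal PCA sketch under illustrative assumptions (random toy data, two retained components): diagonalize the sample covariance and project onto the leading eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))  # toy data with correlated features
Xc = X - X.mean(axis=0)                      # center the data
C = Xc.T @ Xc / (len(Xc) - 1)                # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)         # eigenvalues in ascending order
top2 = eigvecs[:, -2:][:, ::-1]              # two leading principal components
Z = Xc @ top2                                # compressed 2-D representation
explained = eigvals[-2:].sum() / eigvals.sum()
print(f"compressed shape: {Z.shape}, variance retained: {explained:.1%}")
```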

Section 4: Bayesian Networks. Some slides are from Baldi's course on Neural Networks.

Bayesian Statistics. Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms). Axiom 0: transitivity of preferences. Theorem 1: preferences can be represented by a real number π(A). Axiom 1: there exists a function f such that π(non A) = f(π(A)). Axiom 2: there exists a function F such that π(A,B) = F(π(A), π(B|A)). Theorem 2: there is always a rescaling w such that p(A) = w(π(A)) is in [0,1] and satisfies the sum and product rules.

Probability as Degree of Belief. Sum rule: P(non A) = 1 − P(A). Product rule: P(A and B) = P(A) P(B|A). Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A). Induction form: P(M|D) = P(D|M) P(M) / P(D). Equivalently: log P(M|D) = log P(D|M) + log P(M) − log P(D).
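As a quick numerical illustration of the induction form (the numbers are made up for this example):

```python
# Posterior of a model M given data D via Bayes' theorem: P(M|D) = P(D|M) P(M) / P(D).
p_m = 0.01                 # prior P(M)
p_d_given_m = 0.95         # likelihood P(D|M)
p_d_given_not_m = 0.05     # likelihood under the alternative hypothesis
p_d = p_d_given_m * p_m + p_d_given_not_m * (1 - p_m)   # sum rule over hypotheses
p_m_given_d = p_d_given_m * p_m / p_d                   # Bayes' theorem
print(f"posterior P(M|D) = {p_m_given_d:.3f}")          # ≈ 0.161: a strong test, a weak prior
```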

The Asia problem. “Shortness-of-breath (dyspnoea) may be due to Tuberculosis, Lung cancer or Bronchitis, or none of them. A recent visit to Asia increases the chances of Tuberculosis, while Smoking is known to be a risk factor for both Lung cancer and Bronchitis. The results of a single chest X-ray do not discriminate between Lung cancer and Tuberculosis, as neither does the presence or absence of Dyspnoea.” (Lauritzen & Spiegelhalter, 1988)

Graphical models: a “successful marriage between Probability Theory and Graph Theory” (M. I. Jordan). For the undirected graph x1 – x3 – x2: P(x1,x2,x3) ≠ P(x1,x3) P(x2,x3), but P(x1,x2,x3) ∝ ψ(x1,x3) ψ(x2,x3). Applications: vision, speech recognition, error-correcting codes, bioinformatics.

Directed Acyclic Graphs. Involve conditional dependencies. For the graph x1 → x3 ← x2: P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2).
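The factorization can be checked mechanically. The sketch below encodes the three-node DAG with made-up conditional probability tables and verifies that the product of the factors is a valid joint distribution.

```python
import itertools

# Hypothetical CPTs for binary x1, x2, x3; the DAG is x1 -> x3 <- x2.
p_x1 = {0: 0.7, 1: 0.3}
p_x2 = {0: 0.4, 1: 0.6}
p_x3 = {(x1, x2): {0: 0.9 - 0.2 * (x1 + x2), 1: 0.1 + 0.2 * (x1 + x2)}
        for x1 in (0, 1) for x2 in (0, 1)}

def joint(x1, x2, x3):
    """P(x1,x2,x3) = P(x1) P(x2) P(x3|x1,x2), as the DAG dictates."""
    return p_x1[x1] * p_x2[x2] * p_x3[(x1, x2)][x3]

total = sum(joint(*xs) for xs in itertools.product((0, 1), repeat=3))
print(f"joint sums to {total:.3f}")   # a valid distribution sums to 1
```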

Directed Graphical Models (2). Each node is associated with a random variable. Each arrow is associated with a conditional dependency (parent → child). A shaded node indicates an observed variable. Plates stand for repetitions of i.i.d. drawings of the random variables.

Directed graph: a ‘real world’ example – the author-topic model. Statistical modeling for data mining: in a huge corpus, authors and words are observed, while topics and their relations are learned.


Topics Model for Semantic Representation. Based on Professor Mark Steyvers' slides; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford).

The DRM Paradigm. The Deese (1959), Roediger and McDermott (1995) paradigm: subjects hear a series of word lists during the study phase, each comprising semantically related items strongly related to another, non-presented word (the “false target”). Subjects later receive recognition tests for all words plus other distractor words, including the false target. DRM experiments routinely demonstrate that subjects claim to recognize false targets.

Example: a test of false-memory effects in the DRM paradigm. STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy. FALSE RECALL: “Sleep”, 61%.

A Rational Analysis of Semantic Memory. Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., “concepts” or “topics”). The topics model provides such a rational analysis.

A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997). [Figure: a word/document count matrix – rows for words such as LOVE, SOUL, RESEARCH, SCIENCE; columns for Doc1, Doc2, Doc3, … – is decomposed by SVD into a high-dimensional semantic space.] Each word is a single point in the semantic space.
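The LSA pipeline can be sketched in a few lines: take the SVD of a word/document count matrix and keep the leading dimensions as word coordinates. The tiny matrix below is made up for illustration; it is not the matrix from the slide.

```python
import numpy as np

words = ["LOVE", "SOUL", "RESEARCH", "SCIENCE"]
counts = np.array([[3, 0, 1],    # hypothetical counts of LOVE in Doc1..Doc3
                   [2, 0, 1],    # SOUL
                   [0, 6, 1],    # RESEARCH
                   [0, 1, 6]])   # SCIENCE
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k] * s[:k]     # each word becomes a point in a k-D semantic space
for word, vec in zip(words, word_vecs):
    print(f"{word:9s} {vec.round(2)}")
```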

Triangle inequality constraint on words with multiple meanings. Euclidean distance: AC ≤ AB + BC. Example: MAGNETIC is close to FIELD (AB) and FIELD is close to SOCCER (BC), so a spatial representation forces MAGNETIC and SOCCER (AC) to be close as well, even though they are unrelated.

A generative model for topics. Each document (i.e., context) is a mixture of topics. Each topic is a distribution over words. Each word is chosen from a single topic. [Plate diagram: T topics, D documents, N words per document.]
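The generative process can be written directly as a sampler. The vocabulary, topic distributions, and sizes below are made-up illustrations of the three statements above.

```python
import numpy as np

rng = np.random.default_rng(3)
vocab = ["magnet", "field", "iron", "ball", "soccer", "team"]
T, V, n_words = 2, len(vocab), 10
phi = np.array([[.35, .30, .25, .05, .03, .02],    # topic 0: a "magnetism" word distribution
                [.02, .20, .03, .25, .25, .25]])   # topic 1: a "sports" word distribution
theta = rng.dirichlet(alpha=[1.0] * T)             # this document's mixture of topics
doc = []
for _ in range(n_words):
    z = rng.choice(T, p=theta)                     # choose a single topic for the word
    w = rng.choice(V, p=phi[z])                    # choose the word from that topic
    doc.append(vocab[w])
print("theta =", theta.round(2), "->", " ".join(doc))
```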

Application to corpus data. TASA corpus: text from first grade to college – a representative sample of text. 26,000+ word types (stop words removed); 37,000+ documents; 6,000,000+ word tokens.

Fitting the model. Learning is unsupervised. Learning means inverting the generative model: we estimate P(z|w), assigning each word in the corpus to one of T topics. With T = 500 topics and 6×10^6 words, the size of the discrete state space is 500^6,000,000 – HELP! An efficient sampling approach: Markov chain Monte Carlo (MCMC). Time and memory requirements are linear in T and N.

Gibbs Sampling & MCMC (see Griffiths & Steyvers, 2003, for details). Assign every word in the corpus to one of T topics. Sampling distribution for z: P(z_i = j | z_-i, w) ∝ (n_-i,j^(w_i) + β) / (n_-i,j^(·) + Wβ) × (n_-i,j^(d_i) + α) / (n_-i,·^(d_i) + Tα), where n_-i,j^(w) is the number of times word w is assigned to topic j and n_-i,j^(d) is the number of times topic j is used in document d, both counts excluding the current assignment i.
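A compact sketch of this collapsed Gibbs sampler on a toy corpus; the documents, hyperparameters, and sweep count are illustrative assumptions, and the document-length denominator is dropped since it does not depend on the topic.

```python
import numpy as np

rng = np.random.default_rng(4)
docs = [[0, 1, 2, 1], [3, 4, 5, 4], [1, 2, 4, 5]]    # toy corpus: word ids per document
V, T, alpha, beta = 6, 2, 0.5, 0.1
nwt = np.zeros((V, T))                               # word-topic counts
ndt = np.zeros((len(docs), T))                       # document-topic counts
z = [[rng.integers(T) for _ in doc] for doc in docs] # random initial topic assignments
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        nwt[w, z[d][i]] += 1; ndt[d, z[d][i]] += 1
for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            nwt[w, t] -= 1; ndt[d, t] -= 1           # remove the current assignment
            p = ((nwt[w] + beta) / (nwt.sum(0) + V * beta)
                 * (ndt[d] + alpha))                 # sampling distribution over topics
            t = rng.choice(T, p=p / p.sum())         # resample the topic of word i
            z[d][i] = t
            nwt[w, t] += 1; ndt[d, t] += 1           # restore the counts
print((nwt + beta) / (nwt.sum(0) + V * beta))        # estimated P(w|z = j) per topic
```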

A selection from 500 topics [P(w|z = j)]:
Topic 1: THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED
Topic 2: SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT
Topic 3: ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS
Topic 4: BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY

Polysemy: words with multiple meanings are represented in different topics.
Topic A: FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM
Topic B: SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST
Topic C: BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS
Topic D: JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY

Word Association (norms from Nelson et al., 1998). CUE: PLANET. Associates produced by people: 1. EARTH, 2. STARS, 3. SPACE, 4. SUN, 5. MARS. Model: STARS, SUN, EARTH, SPACE, SKY.
