Published by Arabella McDaniel. Modified over 9 years ago.
1
Mini-course on Artificial Neural Networks and Bayesian Networks
Michal Rosen-Zvi
Mini-course on ANN and BN, The Multidisciplinary Brain Research center, Bar-Ilan University, May 2004
2
Section 1: Introduction
3
Networks (1)
Networks serve as a visual way of displaying relationships.
Social networks are examples of ‘flat’ networks, where the only information is the relation between entities.
Example: collaboration networks
4
Collaboration Network
5
Networks (2)
Artificial Neural Networks represent rules (deterministic relations) between input and output.
6
Networks (3)
Bayesian Networks represent probabilistic relations (conditional independencies and dependencies) between variables.
7
Outline
Introduction/Motivation
Artificial Neural Networks
– The Perceptron, multilayered FF NN and recurrent NN
– On-line (supervised) learning
– Unsupervised learning and PCA
– Classification
– Capacity of networks
Bayesian networks (BN)
– Bayes rules and the BN semantics
– Classification using generative models
Applications: Vision, Text
8
Motivation
Research on ANNs is inspired by neurons in the brain and (partially) driven by the need for models of reasoning in the brain.
Scientists are challenged to use machines more effectively for tasks traditionally solved by humans (for example, driving a car, inferring suitable scientific referees for papers, and many others).
9
History of (modern) ANNs and BNs
[Timeline figure, 1940 to 2000. Milestones shown: McCulloch and Pitts model; Hebbian learning rule; Minsky and Papert’s book; Perceptron; Hopfield network; Pearl’s book; Gardner’s studies]
10
Section 2: On-line Learning
Based on slides from Michael Biehl’s summer course
11
Section 2.1: The Perceptron
12
The Perceptron
Input: $\xi$
Adaptive weights: $W$
Output: $S$
13
Perceptron: binary output
Implements a linearly separable classification of inputs.
Milestones:
– Perceptron convergence theorem, Rosenblatt (1958)
– Capacity, Winder (1963), Cover (1965)
– Statistical physics of perceptron weights, Gardner (1988)
How does this device learn?
14
Learning a linearly separable rule from reliable examples
Unknown rule: $S_T(\xi) = \mathrm{sign}(B \cdot \xi) = \pm 1$ defines the correct classification.
Parameterized through a teacher perceptron with weights $B \in \mathbb{R}^N$ ($B \cdot B = 1$).
Only available information: example data $D = \{\xi^\mu, S_T(\xi^\mu) = \mathrm{sign}(B \cdot \xi^\mu) \text{ for } \mu = 1 \dots P\}$
15
Learning a linearly… (Cont.)
Training: finding the student weights $W$
– $W$ parameterizes a hypothesis $S_S(\xi) = \mathrm{sign}(W \cdot \xi)$
– Supervised learning is based on the student performance with respect to the training data $D$
– Binary error measure $\epsilon_T^\mu(W) = \epsilon[S_S(\xi^\mu), S_T(\xi^\mu)]$:
$\epsilon_T^\mu(W) = 1$ if $S_S(\xi^\mu) \neq S_T(\xi^\mu)$
$\epsilon_T^\mu(W) = 0$ if $S_S(\xi^\mu) = S_T(\xi^\mu)$
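The teacher-student setup and the binary error measure can be illustrated with a minimal sketch. It assumes Gaussian random inputs, the convention sign(0) = +1, and hypothetical sizes N and P; none of these choices come from the slides, and the error is counted as 1 when student and teacher disagree.

```python
import random

def sign(x):
    # Assumed convention: sign(0) = +1.
    return 1 if x >= 0 else -1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def binary_error(W, xi, label):
    # 1 if the student output disagrees with the teacher label, else 0.
    return 0 if sign(dot(W, xi)) == label else 1

def training_error(W, data):
    # Fraction of training examples the student misclassifies.
    return sum(binary_error(W, xi, s) for xi, s in data) / len(data)

rng = random.Random(0)            # fixed seed, arbitrary choice
N, P = 20, 200                    # hypothetical sizes
B = [rng.gauss(0, 1) for _ in range(N)]
norm = dot(B, B) ** 0.5
B = [b / norm for b in B]         # teacher normalised so that B.B = 1

# Training set D: inputs labelled by the teacher perceptron.
data = []
for _ in range(P):
    xi = [rng.gauss(0, 1) for _ in range(N)]
    data.append((xi, sign(dot(B, xi))))

print(training_error(B, data))                # a student equal to the teacher
print(training_error([-b for b in B], data))  # a student opposite to the teacher
```

A student identical to the teacher makes no training errors, while a student pointing in the opposite direction misclassifies essentially every example.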
16
Off-line learning
Guided by the minimization of a cost function $H(W)$, e.g., the training error $H(W) = \sum_{\mu=1}^{P} \epsilon_T^\mu(W)$
Equilibrium statistical mechanics treatment:
– Energy $H$ of $N$ degrees of freedom
– Ensemble of systems in thermal equilibrium at a formal temperature
– Disorder average over random examples (replicas); assumes a distribution over the inputs
– Macroscopic description, order parameters
– Typical properties of large systems, $P = \alpha N$
17
On-line training
Single presentation of an uncorrelated (new) example $\{\xi^\mu, S_T(\xi^\mu)\}$
Update of student weights: [equation omitted in transcript]
Learning dynamics in discrete time
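The update equation itself did not survive in this transcript, so the sketch below substitutes one common choice as an explicit assumption: the Rosenblatt perceptron rule, which changes the weights only when the student misclassifies the current example. The learning rate eta, the 1/N scaling, the Gaussian inputs and all sizes are hypothetical. The quantity arccos(R)/π is the standard generalization error of a perceptron student for isotropic inputs, where R is the normalised student-teacher overlap.

```python
import math
import random

def sign(x):
    # Assumed convention: sign(0) = +1.
    return 1 if x >= 0 else -1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def online_perceptron(B, steps, eta, rng):
    """On-line training: each fresh example is shown once, then discarded."""
    N = len(B)
    W = [0.0] * N
    for _ in range(steps):
        xi = [rng.gauss(0, 1) for _ in range(N)]   # new, uncorrelated input
        s_teacher = sign(dot(B, xi))               # label from the teacher
        if sign(dot(W, xi)) != s_teacher:          # assumed rule: update on errors only
            W = [w + (eta / N) * s_teacher * x for w, x in zip(W, xi)]
    return W

def overlap(W, B):
    # Normalised overlap R = W.B / (|W| |B|).
    return dot(W, B) / math.sqrt(dot(W, W) * dot(B, B))

rng = random.Random(1)
N = 50                                             # hypothetical dimension
B = [rng.gauss(0, 1) for _ in range(N)]            # teacher weights
W = online_perceptron(B, steps=20 * N, eta=1.0, rng=rng)
R = overlap(W, B)
print("overlap R:", R)
print("generalization error arccos(R)/pi:", math.acos(R) / math.pi)
```

Because every example is used exactly once, the number of steps divided by N plays the role of a time variable, which is how the discrete dynamics connects to a continuous-time description.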
18
On-line training: statistical physics approach
Consider a sequence of independent, random examples $\xi^\mu$
Thermodynamic limit ($N \to \infty$)
Disorder average over the latest example; self-averaging properties
Continuous time limit
20
Section 3: Unsupervised learning
Based on slides from Michael Biehl’s summer course
21
Dynamics of unsupervised learning
Learning without a teacher?
Real-world data is, in general, not isotropic and structureless in input space.
Unsupervised learning = extraction of information from unlabelled inputs
22
Potential aims
Correlation analysis
Clustering of data: grouping according to some similarity criterion
Identification of prototypes: represent a large amount of data by a few examples
Dimension reduction: represent high-dimensional data by a few relevant features
23
Dimensionality Reduction
The goal is to compress information with minimal loss.
Methods:
– Unsupervised learning: Principal Component Analysis
– Nonnegative Matrix Factorization
– Bayesian Models (matrices are probabilities)
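As an illustration of compression with small loss, the following sketch finds the first principal component of a toy data set by power iteration on the sample covariance and then replaces each two-dimensional point by a single number. The data, the starting vector and the iteration count are hypothetical. Power iteration is used here only to keep the example dependency-free; it is not taken from the slides.

```python
import random

def first_principal_component(X, iters=200):
    """Leading eigenvector of the sample covariance, found by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]   # centred data
    # Starting vector: must not be orthogonal to the leading direction.
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        # Apply the covariance C = Xc^T Xc / n to v without forming C.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        u = [sum(proj[i] * Xc[i][j] for i in range(n)) / n for j in range(d)]
        norm = sum(x * x for x in u) ** 0.5
        v = [x / norm for x in u]
    return mean, v

rng = random.Random(0)
# Hypothetical data: strongly elongated along the direction (1, 1).
X = []
for _ in range(300):
    t, noise = rng.gauss(0, 3), rng.gauss(0, 0.3)
    X.append([t + noise, t - noise])

mean, v = first_principal_component(X)
# One-dimensional code for every point: its coordinate along v.
codes = [sum((row[j] - mean[j]) * v[j] for j in range(2)) for row in X]
print("first principal direction:", v)
```

For this data the recovered direction lies close to (1, 1)/√2, so the one-dimensional codes retain most of the variance.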
24
Section 4: Bayesian Networks
Some slides are from Baldi’s course on Neural Networks
25
Bayesian Statistics
Bayesian framework for induction: we start with a hypothesis space and wish to express relative preferences in terms of background information (the Cox-Jaynes axioms).
Axiom 0: Transitivity of preferences.
Theorem 1: Preferences can be represented by a real number π(A).
Axiom 1: There exists a function f such that π(non A) = f(π(A)).
Axiom 2: There exists a function F such that π(A,B) = F(π(A), π(B|A)).
Theorem 2: There is always a rescaling w such that p(A) = w(π(A)) is in [0,1] and satisfies the sum and product rules.
26
Probability as Degree of Belief
Sum rule: P(non A) = 1 - P(A)
Product rule: P(A and B) = P(A) P(B|A)
Bayes' theorem: P(B|A) = P(A|B) P(B) / P(A)
Induction form: P(M|D) = P(D|M) P(M) / P(D)
Equivalently: log[P(M|D)] = log[P(D|M)] + log[P(M)] - log[P(D)]
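A small numerical check of the induction form may help. The two models M1 and M2, their priors and their likelihoods are hypothetical numbers chosen for illustration; the evidence P(D) is obtained from the sum and product rules.

```python
import math

def posterior(prior, likelihood):
    """Return P(M|D) for every model M, together with the evidence P(D)."""
    # P(D) = sum over M of P(D|M) P(M)
    evidence = sum(prior[m] * likelihood[m] for m in prior)
    post = {m: likelihood[m] * prior[m] / evidence for m in prior}
    return post, evidence

prior = {"M1": 0.5, "M2": 0.5}        # hypothetical P(M)
likelihood = {"M1": 0.8, "M2": 0.2}   # hypothetical P(D|M)

post, evidence = posterior(prior, likelihood)

# The same statement in log form.
lhs = math.log(post["M1"])
rhs = math.log(likelihood["M1"]) + math.log(prior["M1"]) - math.log(evidence)

print(post)
print(lhs, rhs)
```

With equal priors the posterior is simply the normalised likelihood, so M1 ends up with probability 0.8.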
27
The Asia problem
“Shortness-of-breath (dyspnoea) may be due to Tuberculosis, Lung cancer or bronchitis, or none of them. A recent visit to Asia increases the chances of tuberculosis, while Smoking is known to be a risk factor for both lung cancer and Bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of Dyspnoea.”
Lauritzen & Spiegelhalter 1988
28
Graphical models
“Successful marriage between Probabilistic Theory and Graph Theory” (M. I. Jordan)
[Figure: undirected graph on x_1, x_3, x_2; the joint P(x_1, x_2, x_3) is written in terms of pairwise factors on (x_1, x_3) and (x_2, x_3)]
Applications: Vision, Speech Recognition, Error correcting codes, Bioinformatics
29
Directed acyclic graphs
Involve conditional dependencies
[Figure: directed graph on x_1, x_2, x_3]
$P(x_1, x_2, x_3) = P(x_1)\,P(x_2)\,P(x_3 \mid x_1, x_2)$
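The factorization P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2) can be checked on binary variables. All probability values below are hypothetical. The sketch verifies that the factors define a normalised joint, that x1 and x2 are independent before x3 is observed, and that they become dependent once x3 is observed.

```python
from itertools import product

# Hypothetical tables for three binary variables.
p_x1 = {0: 0.7, 1: 0.3}
p_x2 = {0: 0.6, 1: 0.4}
# P(x3 = 1 | x1, x2); P(x3 = 0 | x1, x2) is the complement.
p_x3_is_1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}

def joint(x1, x2, x3):
    # P(x1, x2, x3) = P(x1) P(x2) P(x3 | x1, x2)
    p3 = p_x3_is_1[(x1, x2)]
    return p_x1[x1] * p_x2[x2] * (p3 if x3 == 1 else 1.0 - p3)

# The factorized joint sums to one.
total = sum(joint(*x) for x in product((0, 1), repeat=3))

# Marginalising out x3 gives P(x1=1, x2=1) = P(x1=1) P(x2=1).
p_x1_x2 = sum(joint(1, 1, x3) for x3 in (0, 1))

# Conditioned on x3 = 1, the value of x2 changes the belief in x1.
p_x3 = sum(joint(a, b, 1) for a in (0, 1) for b in (0, 1))
p_x1_given_x3 = sum(joint(1, b, 1) for b in (0, 1)) / p_x3
p_x1_given_x3_x2 = joint(1, 1, 1) / sum(joint(a, 1, 1) for a in (0, 1))

print(total, p_x1_x2)
print(p_x1_given_x3, p_x1_given_x3_x2)
```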
30
Directed Graphical Models (2)
Each node is associated with a random variable.
Each arrow is associated with a conditional dependency (parent to child).
A shaded node indicates an observed variable.
Plates stand for repetitions of i.i.d. draws of the random variables.
31
Directed graph: ‘real world’ example
The author-topic model
Statistical modeling for data mining: huge corpus; authors and words are observed, topics and relations are learned.
33
Topics Model for Semantic Representation
Based on Professor Mark Steyvers’ slides; joint work of Mark Steyvers (UCI) and Tom Griffiths (Stanford)
34
The DRM Paradigm
The Deese (1959) and Roediger & McDermott (1995) paradigm: subjects hear a series of word lists during the study phase, each comprising semantically related items strongly related to another, non-presented word (the “false target”).
Subjects later receive recognition tests for all words plus other distractor words, including the false target.
DRM experiments routinely demonstrate that subjects claim to recognize false targets.
35
Example: test of false memory effects in the DRM Paradigm
STUDY: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy
FALSE RECALL: “Sleep” 61%
36
A Rational Analysis of Semantic Memory
Our associative/semantic memory system might arise from the need to efficiently predict word usage with just a few basis functions (i.e., “concepts” or “topics”).
The topics model provides such a rational analysis.
37
A Spatial Representation: Latent Semantic Analysis (Landauer & Dumais, 1997)
[Figure: document/term count matrix (rows LOVE, SOUL, RESEARCH, SCIENCE, …; columns Doc1, Doc2, Doc3, …) mapped by SVD to a high-dimensional space containing the points SOUL, RESEARCH, LOVE, SCIENCE]
EACH WORD IS A SINGLE POINT IN A SEMANTIC SPACE
38
Triangle Inequality constraint on words with multiple meanings
Euclidean distance: $AC \le AB + BC$
[Figure: triangle with vertices FIELD, MAGNETIC, SOCCER and sides AB, BC, AC]
39
A generative model for topics
Each document (i.e. context) is a mixture of topics.
Each topic is a distribution over words.
Each word is chosen from a single topic.
[Figure: plate diagram with plates labelled N, D, T]
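The three statements of the generative model translate directly into a sampling procedure, sketched below. The two topics, their word probabilities and the document's topic mixture are hypothetical. The names `theta` and `phi` are illustrative and do not appear in the slides.

```python
import random

def sample_document(theta, phi, n_words, rng):
    """theta[j]: weight of topic j in this document.
    phi[j]: dict mapping each word to its probability under topic j."""
    topics = list(range(len(theta)))
    doc = []
    for _ in range(n_words):
        # Each word is chosen from a single topic, drawn from the document's mixture.
        z = rng.choices(topics, weights=theta)[0]
        vocab = list(phi[z])
        # The word itself is drawn from that topic's distribution over words.
        w = rng.choices(vocab, weights=[phi[z][v] for v in vocab])[0]
        doc.append((w, z))
    return doc

# Hypothetical topics; FIELD has probability mass in both.
phi = [
    {"FIELD": 0.3, "MAGNETIC": 0.4, "MAGNET": 0.3},
    {"FIELD": 0.2, "BALL": 0.4, "GAME": 0.4},
]
theta = [0.8, 0.2]   # hypothetical mixture for one document

rng = random.Random(0)
doc = sample_document(theta, phi, n_words=10, rng=rng)
print(doc)
```

Fitting the model runs this process in reverse: only the words are observed, and the topic attached to each word has to be inferred.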
41
Application to corpus data
TASA corpus: text from first grade to college, a representative sample of text
26,000+ word types (stop words removed)
37,000+ documents
6,000,000+ word tokens
42
Fitting the model
Learning is unsupervised
Learning means inverting the generative model
– We estimate P(z | w): assign each word in the corpus to one of T topics
– With T = 500 topics and $6 \times 10^6$ words, the size of the discrete state space is $500^{6{,}000{,}000}$. HELP!
– Efficient sampling approach: Markov chain Monte Carlo (MCMC)
– Time and memory requirements are linear in T and N
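To get a feeling for the size of that state space, its number of decimal digits can be computed with logarithms, without ever forming the number itself. The sketch uses only the figures quoted on the slide (T = 500 topics, N = 6,000,000 word tokens).

```python
import math

T = 500          # number of topics
N = 6_000_000    # number of word tokens to assign

# log10 of T**N, computed without building the huge integer.
log10_states = N * math.log10(T)
digits = math.floor(log10_states) + 1

print("log10 of the number of assignments:", log10_states)
print("decimal digits:", digits)
```

The count of possible assignments has roughly 16 million decimal digits, which is why exhaustive enumeration is hopeless and sampling is used instead.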
43
Gibbs Sampling & MCMC
See Griffiths & Steyvers, 2003 for details
Assign every word in the corpus to one of T topics
Sampling distribution for z: [equation omitted in transcript; its annotated terms are the number of times word w is assigned to topic j and the number of times topic j is used in document d]
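The sampling equation is not reproduced in this transcript. The sketch below assumes the standard collapsed Gibbs update for a topic model with symmetric Dirichlet priors: the probability of assigning word i to topic j is proportional to (n_wj + β) / (n_j + Vβ) × (n_dj + α), where all counts exclude the current word. The hyperparameters `alpha` and `beta`, the vocabulary size `V` and the toy corpus are hypothetical. The two count tables correspond to the quantities named on the slide.

```python
import random

def gibbs_topics(docs, T, V, alpha, beta, sweeps, rng):
    """docs: list of documents, each a list of word ids in range(V)."""
    n_wt = [[0] * T for _ in range(V)]   # times word w is assigned to topic j
    n_dt = [[0] * T for _ in docs]       # times topic j is used in document d
    n_t = [0] * T                        # total words assigned to topic j
    z = []
    # Random initial assignment of every word to one of T topics.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            j = rng.randrange(T)
            z[d].append(j)
            n_wt[w][j] += 1
            n_dt[d][j] += 1
            n_t[j] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current word from all counts.
                j = z[d][i]
                n_wt[w][j] -= 1
                n_dt[d][j] -= 1
                n_t[j] -= 1
                # Unnormalised conditional probability of each topic (assumed form).
                weights = [
                    (n_wt[w][k] + beta) / (n_t[k] + V * beta) * (n_dt[d][k] + alpha)
                    for k in range(T)
                ]
                # Resample the topic and put the word back into the counts.
                j = rng.choices(range(T), weights=weights)[0]
                z[d][i] = j
                n_wt[w][j] += 1
                n_dt[d][j] += 1
                n_t[j] += 1
    return z, n_wt, n_dt

rng = random.Random(0)
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 5, 3]]   # toy corpus
z, n_wt, n_dt = gibbs_topics(docs, T=2, V=6, alpha=0.5, beta=0.1, sweeps=50, rng=rng)
print(n_dt)
```

Each sweep touches every word once and evaluates T weights for it, so the cost per sweep grows linearly with the number of topics and the number of word tokens.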
44
A selection from 500 topics [ P(w|z = j) ]
THEORY, SCIENTISTS, EXPERIMENT, OBSERVATIONS, SCIENTIFIC, EXPERIMENTS, HYPOTHESIS, EXPLAIN, SCIENTIST, OBSERVED, EXPLANATION, BASED, OBSERVATION, IDEA, EVIDENCE, THEORIES, BELIEVED, DISCOVERED
SPACE, EARTH, MOON, PLANET, ROCKET, MARS, ORBIT, ASTRONAUTS, FIRST, SPACECRAFT, JUPITER, SATELLITE, SATELLITES, ATMOSPHERE, SPACESHIP, SURFACE, SCIENTISTS, ASTRONAUT
ART, PAINT, ARTIST, PAINTING, PAINTED, ARTISTS, MUSEUM, WORK, PAINTINGS, STYLE, PICTURES, WORKS, OWN, SCULPTURE, PAINTER, ARTS, BEAUTIFUL, DESIGNS
BRAIN, NERVE, SENSE, SENSES, ARE, NERVOUS, NERVES, BODY, SMELL, TASTE, TOUCH, MESSAGES, IMPULSES, CORD, ORGANS, SPINAL, FIBERS, SENSORY
45
Polysemy: words with multiple meanings represented in different topics
FIELD, MAGNETIC, MAGNET, WIRE, NEEDLE, CURRENT, COIL, POLES, IRON, COMPASS, LINES, CORE, ELECTRIC, DIRECTION, FORCE, MAGNETS, BE, MAGNETISM
SCIENCE, STUDY, SCIENTISTS, SCIENTIFIC, KNOWLEDGE, WORK, RESEARCH, CHEMISTRY, TECHNOLOGY, MANY, MATHEMATICS, BIOLOGY, FIELD, PHYSICS, LABORATORY, STUDIES, WORLD, SCIENTIST
BALL, GAME, TEAM, FOOTBALL, BASEBALL, PLAYERS, PLAY, FIELD, PLAYER, BASKETBALL, COACH, PLAYED, PLAYING, HIT, TENNIS, TEAMS, GAMES, SPORTS
JOB, WORK, JOBS, CAREER, EXPERIENCE, EMPLOYMENT, OPPORTUNITIES, WORKING, TRAINING, SKILLS, CAREERS, POSITIONS, FIND, POSITION, FIELD, OCCUPATIONS, REQUIRE, OPPORTUNITY
46
Word Association (norms from Nelson et al. 1998)
CUE: PLANET
Associate N.   People   Model
1              EARTH    STARS
2              STARS    SUN
3              SPACE    EARTH
4              SUN      SPACE
5              MARS     SKY