Introduction to Learning and Behavior: Conditioning and the Brain, Lecture 2

Classical Conditioning: Part A

Topics
Classical conditioning: the procedure and basic results
Key experiments: blocking and overshadowing
The Rescorla-Wagner model
Second-order conditioning
The temporal difference (TD) model
Dopamine: the 'prediction error theory'
The big picture: classical conditioning as the learning of predictions

Classical conditioning: Ivan Pavlov
Ivan Pavlov was a Russian physiologist who studied the digestive system of dogs. He noticed that the dogs began to salivate when he entered the room, and decided to investigate the phenomenon.
Biographical anecdotes: Pavlov received the Nobel Prize in 1904 for his work on the physiology of digestion, including the discovery of the nerves that innervate pancreatic secretion (he found this in 1888, five years after receiving his doctorate and before he became a professor). Most of the research for which he is famous today was carried out after the Nobel Prize, until his death in 1936.
His laboratory collected digestive fluids. They developed techniques for keeping dogs alive for years with a fistula that drained the fluids. They found that the dogs secreted large amounts of fluid at the sight of food, or even at the sight of the person who usually fed them (they called these "psychic secretions"). In this way they collected a great deal of fluid and sold the surplus to the public as a remedy for various digestive problems, which was an important source of income for the laboratory for years.
A typical experiment: a bell rings and then meat is given to the dog. The dog always salivates when the meat is in its mouth (a reflex). At first it salivates at various times between one meat delivery and the next, but with training it salivates in response to the sound.
Bell = conditional stimulus (CS)
Meat = unconditional stimulus (US)
Salivation = unconditional response (UR, a reflex) and conditional response (CR, a conditional reflex)

Reinforcement
Pavlov called the US a reinforcer.
Pavlov's definition: a reinforcer is anything that raises the probability that the CR appears after the CS is presented.
This is a purely operational definition: it assumes no subjective or affective value of the reinforcer.
Acquisition: repeated CS-US presentations lead to acquisition of the CR.

Common procedures
Eyeblink conditioning, usually in rabbits (nictitating membrane response)
Leg flexion conditioning in dogs
Approach conditioning in rats
Key-peck conditioning in pigeons (autoshaping)
Conditioned taste aversion
Conditioned emotional response (CER); conditioned suppression
(What are the control groups?)

Temporal relations between the CS and the US
1. Simultaneous conditioning
2. Delay conditioning
3. Trace conditioning
4. Backward conditioning
Key variable: the CS-US interval (ISI)
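The four arrangements listed above can be told apart from three event times alone. The sketch below is illustrative only: the function names and the exact boundary choices (for example, treating a US that starts exactly at CS offset as delay conditioning) are assumptions, not definitions taken from the slides.

```python
def classify_procedure(cs_on, cs_off, us_on):
    """Label a trial from CS onset/offset and US onset (same time units).

    Boundary conventions here are an assumption made for illustration.
    """
    if us_on < cs_on:
        return "backward"        # US comes before the CS
    if us_on == cs_on:
        return "simultaneous"    # CS and US start together
    if us_on <= cs_off:
        return "delay"           # CS is still on when the US arrives
    return "trace"               # gap between CS offset and US onset


def isi(cs_on, us_on):
    """CS-US interval: time from CS onset to US onset."""
    return us_on - cs_on


for name, trial in {
    "tone 0-5 s, food at 5 s": (0, 5, 5),
    "tone 0-2 s, food at 5 s": (0, 2, 5),
    "tone 0-5 s, food at 0 s": (0, 5, 0),
    "tone 3-5 s, food at 0 s": (3, 5, 0),
}.items():
    print(name, "->", classify_procedure(*trial), "ISI =", isi(trial[0], trial[2]))
```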

Extinction
Given the definition of learning: is extinction new learning or forgetting?

Extinction: spontaneous recovery

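Theories of extinction
Forgetting: no, because extinction depends not only on the passage of time but on unreinforced presentation of the stimulus.
Pavlov: the CR disappears because of inhibition, a process distinct from that of learning. This explains spontaneous recovery and disinhibition (in terms of external inhibition).
Guthrie: extinction results from conditioning a competing response (the same process as in learning, only a different response).
Support for Pavlov: spaced training is better than massed training for acquisition, while the opposite holds for extinction; the two processes can be affected differently by pharmacological manipulations.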

Is the conditioned response goal-directed?
The arguments in favor. How can this be tested?
Omission experiments (Hearst + Jenkins 1974)
This is the central feature (and the decisive test) of Pavlovian behavior!

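Correspondence between CS, US and CR
For each US there are stimuli that are easier to condition with it (Garcia & Koelling: the noisy water experiment in rats, conditioning with shock or with LiCl)
This depends on the species: pigeons associate color with nausea, rats associate taste
Adaptive advantage: evolutionary or learned constraints
The CR usually resembles the UR but is weaker. However:
not all components of the UR are included in the CR (salivation versus biting)
sometimes the CR is entirely different from the UR (freezing versus jumping or fleeing)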

What association is formed? Early theories of classical conditioning: S-S / S-R
Pavlov, stimulus substitution: learning of associations between the CS and the US (S-S)
Guthrie, stamping in: conditioning is between the CS and the response, with the US serving only to 'stamp in' the association
Tests:
preventing the response from being performed → conditioning still occurs
sensory preconditioning
changing the value of the reinforcer after conditioning (we will return to this later)
Current conclusion: both (Rescorla: two-process theories; Mackintosh: second-order conditioning; Holland: depends on the distance from the reinforcer)

When does learning occur? Three key experiments
Rescorla: background conditioning
Temporal contiguity is not enough; contingency is needed
Contiguity = closeness in time, occurring together
Contingency = dependence of one event on the other
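The difference between contiguity and contingency can be made concrete with a toy calculation. The sketch below uses the difference P(US | CS) - P(US | no CS) as a summary of contingency; this particular measure, the function name `delta_p` and the made-up trial counts are assumptions chosen for illustration, not figures from Rescorla's experiments.

```python
def delta_p(bins):
    """bins: list of (cs_present, us_present) pairs, one per time bin.

    Returns P(US | CS) - P(US | no CS), one common way to quantify contingency.
    """
    with_cs = [us for cs, us in bins if cs]
    without_cs = [us for cs, us in bins if not cs]
    return sum(with_cs) / len(with_cs) - sum(without_cs) / len(without_cs)


# Hypothetical data: both groups get the US on 4 of 10 CS bins (same contiguity).
contingent = [(True, True)] * 4 + [(True, False)] * 6 + [(False, False)] * 10
# The control group also gets the US equally often when the CS is absent.
non_contingent = ([(True, True)] * 4 + [(True, False)] * 6
                  + [(False, True)] * 4 + [(False, False)] * 6)

print("contingent group:    ", delta_p(contingent))
print("non-contingent group:", delta_p(non_contingent))
```

Both groups experience the same number of CS-US pairings, but only in the first does the CS carry information about the US.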

When does learning occur? Three key experiments
Kamin: blocking (and unblocking)
Reynolds: overshadowing
Contingency is also not enough!
Kamin: the US needs to be surprising
It seems that the stimuli compete for learning
Unblocking: achieved by changing the intensity of the US (upwards)

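Computational theories of learning
They try to explain how the CS acquires "value" during learning
They try to explain under which conditions it acquires value
Constraints (phenomena to account for):
a gradual learning curve (though perhaps not in an individual animal?)
extinction (also gradual; spontaneous recovery)
blocking, overshadowing
temporal relations
what determines the response: stimulus substitution, CS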

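Rescorla + Wagner 1972
Necessary condition for learning: violation of expectations
The learning rule: $\Delta V_i = \eta\,(\lambda - V)\,X_i$, with $V = \sum_j V_j X_j$
Combining several predictors: additive
Central claim: the difference between what is obtained and what is expected constitutes the reinforcement
Explained: acquisition, extinction, overshadowing, blocking...
Predicted: the overexpectation effect (counterintuitive)
A highly influential theory
The rule can be derived as gradient descent on the squared prediction error, i.e. maximizing the similarity between the value of the CS and that of the US (trying to reach a state in which expected = obtained)
$\Delta V_i$ = the change in the value of stimulus $i$
$\eta$ = learning rate
$\lambda$ = the value of the US (reinforcer)
$V$ = sum of values of all present stimuli
$X_i$ = indicator whether stimulus $i$ is present
Simulation on the board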
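A minimal sketch of the Rescorla-Wagner rule, assuming the form ΔV_i = η(λ - V)X_i with V the summed value of the stimuli present on the trial. The stimulus names, the learning rate of 0.2, the trial counts and the helper names `rw_trial` and `run` are arbitrary choices for illustration, not the simulation done on the board.

```python
def rw_trial(V, present, lam, eta=0.2):
    """One trial: update the value of every stimulus that is present."""
    prediction = sum(V[s] for s in present)   # V: summed value of present stimuli
    error = lam - prediction                  # obtained minus expected
    for s in present:
        V[s] += eta * error
    return error


def run(phases, eta=0.2):
    """phases: list of (present_stimuli, lam, n_trials), run in order."""
    V = {}
    for present, lam, n_trials in phases:
        for s in present:
            V.setdefault(s, 0.0)
        for _ in range(n_trials):
            rw_trial(V, present, lam, eta)
    return V


acquisition = run([(("A",), 1.0, 50)])
extinction = run([(("A",), 1.0, 50), (("A",), 0.0, 50)])
blocking = run([(("A",), 1.0, 50), (("A", "B"), 1.0, 50)])
compound_only = run([(("A", "B"), 1.0, 50)])           # control for blocking
overexpectation = run([(("A",), 1.0, 50), (("B",), 1.0, 50), (("A", "B"), 1.0, 50)])

print("acquisition      V(A) =", round(acquisition["A"], 3))
print("extinction       V(A) =", round(extinction["A"], 3))
print("blocking         V(B) =", round(blocking["B"], 3))
print("compound control V(B) =", round(compound_only["B"], 3))
print("overexpectation  V(A) =", round(overexpectation["A"], 3))
```

In the blocking run, A already predicts the US when B is introduced, so the error is near zero and B learns almost nothing. In the overexpectation run, the compound of two fully trained stimuli predicts more than the US delivers, so both values fall even though the US is still presented.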

But... second-order conditioning
Phase 1: pair CS1 with the US until learning is complete
Phase 2: pair CS2 with CS1 (no US)
Test: CS2 alone. What will happen?
Interleaved trials or blocks (Miller: complex temporal relations)
What does the R-W theory predict?
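One way to explore the question above is to run the two phases through the Rescorla-Wagner rule. The sketch assumes the rule ΔV_i = η(λ - V)X_i and, because R-W works at the level of whole trials, treats the CS2-CS1 pairing as a compound with no US (λ = 0). That treatment, the learning rate and the trial counts are assumptions made for illustration.

```python
def rw_trial(V, present, lam, eta=0.2):
    """One Rescorla-Wagner trial over the stimuli that are present."""
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += eta * error
    return error


V = {"CS1": 0.0, "CS2": 0.0}

# Phase 1: CS1 is paired with the US until learning is essentially complete.
for _ in range(50):
    rw_trial(V, ["CS1"], lam=1.0)

# Phase 2: CS2 and CS1 together, no US. Modelled as a compound trial (assumption).
for _ in range(10):
    rw_trial(V, ["CS1", "CS2"], lam=0.0)

print("V(CS1) =", round(V["CS1"], 3))
print("V(CS2) =", round(V["CS2"], 3))
```

Under these assumptions the error in phase 2 is negative, so CS2 ends with a negative value: the rule predicts conditioned inhibition rather than a second-order conditioned response.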

Which phenomena do the theories explain?
Theory: R-W
Phenomena: gradual learning curve; gradual extinction; spontaneous recovery; blocking; overshadowing; temporal relations; what determines the response; overexpectation; second-order conditioning

TD learning (Sutton+Barto ‘90s)
The general case: long-term prediction.
The true predictions should be self-consistent.
If the predictions are imperfect, there will be an error: the temporal difference error.
Updating V according to this error will result in correct (optimal) predictions.
Simulation of learning on the board, for simple CS-US pairing
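For reference, the statements above can be written out in the standard undiscounted form of TD learning. The notation is an assumption, since the slide's own equations are not in the transcript: $V_t$ is the prediction at time $t$, $r_{t+1}$ the reward arriving at the next time step, $\delta_t$ the temporal difference error, and $\eta$ a learning rate as in the Rescorla-Wagner rule.

```latex
\begin{align*}
V_t &= E\Big[\sum_{\tau > t} r_\tau\Big]
      && \text{long-term prediction} \\
V_t &= E[r_{t+1}] + V_{t+1}
      && \text{self-consistency of true predictions} \\
\delta_t &= r_{t+1} + V_{t+1} - V_t
      && \text{temporal difference error} \\
V_t &\leftarrow V_t + \eta\,\delta_t
      && \text{update}
\end{align*}
```

When only a single time step matters ($V_{t+1} = 0$), the error reduces to $r_{t+1} - V_t$, which has the same obtained-minus-expected form as the Rescorla-Wagner error.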

TD: a real-time theory
Real-time: it addresses what happens within a trial, both in behavior and in learning (like Hull's stimulus trace hypothesis)
What happens if we suddenly omit the reinforcer?
What happens in second-order conditioning?
It explains things that happen within a trial, not only across trials
Simulation on the board for a tapped delay line representation and a learning rate of 1
The top traces show the prediction error, and the bottom traces the value. We start with all values at zero. In the first trial there is a positive prediction error when the unexpected reward is received, so the value of the time step immediately before it is updated according to the learning rule, to predict the forthcoming reward before it is encountered. As the trials continue, the prediction error arrives earlier and earlier in time, and eventually the value converges to 1 reward unit from the time of the stimulus until the time of the delivered reward. Once learning has converged, the reward is no longer surprising and induces no prediction error signal. The only surprising and unpredictable event is the stimulus itself, which therefore induces a prediction error.
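The simulation described above can be approximated in a few lines. This is a sketch under stated assumptions, not the board simulation itself: one weight per time step since CS onset (the tapped delay line), no discounting, a value of zero before the CS and from the reward time onward, and arbitrary times `CS_T`, `US_T`, `N_STEPS`. The error is computed when each time step arrives and is credited to the preceding step.

```python
def value(w, t, cs_t):
    """Predicted future reward at time t; zero outside the CS-to-reward window."""
    k = t - cs_t                                  # time since CS onset
    return w[k] if 0 <= k < len(w) else 0.0


def td_trial(w, cs_t, us_t, n_steps, reward=1.0, eta=1.0):
    """Run one trial, updating w in place; return the prediction error at each time."""
    delta = [0.0] * n_steps
    for t in range(1, n_steps):
        r = reward if t == us_t else 0.0
        delta[t] = r + value(w, t, cs_t) - value(w, t - 1, cs_t)
        k = t - 1 - cs_t                          # tap active at the previous step
        if 0 <= k < len(w):
            w[k] += eta * delta[t]
    return delta


CS_T, US_T, N_STEPS = 2, 7, 10                    # hypothetical event times
w = [0.0] * (US_T - CS_T)                         # taps for times CS_T .. US_T - 1

errors = [td_trial(w, CS_T, US_T, N_STEPS) for _ in range(10)]
for trial, delta in enumerate(errors, start=1):
    print("trial", trial, "positive error at t =", delta.index(max(delta)))

# Probe trial after learning: reward omitted, learning switched off.
probe = td_trial(w, CS_T, US_T, N_STEPS, reward=0.0, eta=0.0)
print("probe trial errors:", probe)
```

With a learning rate of 1 the positive error steps back by one time step per trial until it sits at CS onset, and the omitted reward in the probe trial produces a negative error at the expected reward time.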

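Which phenomena do the theories explain?
Theories: R-W, TD
Phenomena: gradual learning curve; gradual extinction; spontaneous recovery; blocking; overshadowing; temporal relations; what determines the response; overexpectation; second-order conditioning
What happens in the XOR problem?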
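The XOR question can be tried out directly. The sketch below assumes the conditioning version of XOR is A alone → US, B alone → US, A and B together → no US, and that each stimulus has a single value that adds linearly as in the Rescorla-Wagner rule. The learning rate, trial order and number of cycles are arbitrary.

```python
def rw_trial(V, present, lam, eta=0.1):
    """One Rescorla-Wagner trial with additive predictions."""
    error = lam - sum(V[s] for s in present)
    for s in present:
        V[s] += eta * error
    return error


V = {"A": 0.0, "B": 0.0}
for _ in range(200):                 # interleaved training cycles
    rw_trial(V, ["A"], 1.0)          # A alone is reinforced
    rw_trial(V, ["B"], 1.0)          # B alone is reinforced
    rw_trial(V, ["A", "B"], 0.0)     # the compound is not reinforced

pred_single = V["A"]
pred_compound = V["A"] + V["B"]
print("prediction for A alone: ", round(pred_single, 3))
print("prediction for A and B: ", round(pred_compound, 3))
```

Because predictions simply add, the model ends up expecting more from the unreinforced compound than from either reinforced stimulus alone, the opposite of what the task requires.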

Dopamine
Parkinson's disease → Motor control + initiation?
Intracranial self-stimulation; drug addiction; natural rewards → Reward pathway? → Learning?
Also involved in: working memory, novel situations, ADHD, schizophrenia, …
→ Interest in DA began with the realization that PD is characterized by cell loss in a well-defined area from which DA originates, namely the VTA and SNc, shown here in blue. Almost all the DA in the brain originates from these cells, which project to wide areas of the brain, first and foremost the BG and the frontal cortex.
→ Because of the tremor and bradykinesia (slowed movement and difficulty initiating it) in PD, it was hypothesized that DA has a role in movement control and initiation.
→ Later work showed that DA-ergic pathways are preferred targets for ICSS, and that DA is strongly involved in drug addiction.
→ It is well established that brain dopamine is important for the rewarding effects of amphetamine, cocaine, opiates, and several other (but not all) drugs of abuse. Dopamine levels are elevated not only by most drugs of abuse but also by natural rewards such as food or sexual contact. If the dopamine system is blocked, animals do not learn to lever press for these normally habit-forming substances, and animals that have already been trained to lever press for food, amphetamine, or cocaine do not continue to do so.
→ This pointed to another function, that of mediating rewards,
→ and, on a somewhat orthogonal line of research, to a role in learning and reinforcement learning.
→ However, the implication of DA in working memory functions, novel situations, ADHD (attention-deficit/hyperactivity disorder) and schizophrenia showed that the function of DA is more complex than that.

What does dopamine represent? Schultz: recordings in monkeys
(Schultz et al. 1993)

What does dopamine represent? The interpretation of Montague+Dayan
Figure panels: unpredicted reward (unlearned / no stimulus); predicted reward (learned task); omitted reward (probe trial) (Montague et al. 1996)
Crucial light was shed on these hypotheses in the early 90s, when the lab of Wolfram Schultz recorded from DA neurons while a monkey was performing a classical or instrumental conditioning task.
=> These single-cell recordings showed a nontrivial pattern of response to the task stimuli. The plot shown here is a PSTH: every dot in the bottom part denotes a spike, with each row showing a different trial. In the upper part these have been summed to show the characteristic firing pattern averaged over trials. The trials are all aligned to a task event, in this case the presentation of the reward.
Response to the withdrawal of reward: depression of the baseline firing rate.
=> A temporally sophisticated reward signal

The TD hypothesis of DA (Montague+Dayan ‘96)
The idea: phasic dopamine encodes a reward prediction error
Precise (normative!) theory for the generation of DA firing patterns
Compelling account of the role of DA in classical conditioning: the prediction error acts as a signal driving learning in prediction areas
Corticostriatal synapses: three factor learning rule modulated by DA (Wickens+Kotter)

Corticostriatal synapses: 3 factor learning
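[Figure: the stimulus representation (X1, X2, X3, ..., XN) in the cortex projects through adjustable connections ("weights") to values V1, V2, V3, ..., VN in the striatum; the prediction error (dopamine) arises in the VTA/SNc; further labels: R, P, PPTN?]
Explain the 3 factor learning rule and contrast it with normal Hebbian learning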
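The contrast between the two rules can be sketched schematically. In the version below, a Hebbian update depends on presynaptic and postsynaptic activity only, while the three factor update is additionally multiplied by a dopamine term standing for the prediction error. The multiplicative form, the function names and all numbers are assumptions for illustration and are not a model of real corticostriatal plasticity.

```python
def hebbian_update(w, pre, post, eta=0.1):
    """Two factors: weights change whenever pre and post are co-active."""
    return [wi + eta * xi * post for wi, xi in zip(w, pre)]


def three_factor_update(w, pre, post, dopamine, eta=0.1):
    """Three factors: the same co-activity term, gated and signed by dopamine."""
    return [wi + eta * xi * post * dopamine for wi, xi in zip(w, pre)]


w = [0.5, 0.5, 0.5]        # cortex-to-striatum weights (hypothetical)
pre = [1.0, 0.0, 1.0]      # which cortical inputs X_i are active
post = 1.0                 # striatal activity

print("Hebbian:                 ", hebbian_update(w, pre, post))
print("3 factor, no error:      ", three_factor_update(w, pre, post, dopamine=0.0))
print("3 factor, positive error:", three_factor_update(w, pre, post, dopamine=1.0))
print("3 factor, negative error:", three_factor_update(w, pre, post, dopamine=-1.0))
```

In this sketch, weights of active inputs stay put when the prediction error is zero and can move in either direction otherwise, whereas the Hebbian rule strengthens them regardless of outcome.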

More dopamine responses
Partial reinforcement task (Fiorillo, Tobler & Schultz 2003) Accords with TD model

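The big picture: classical conditioning = learning to predict
Behavior: automatic in a certain sense, follows from the prediction (no control yet)
How are the predictions learned? By observing the environment and comparing the expected with the obtained, gradually
In the brain: dopamine-dependent learning (a link between a normative computational model and the activity of specific cells)
The TD and R-W theories are caching-type theories (they learn a value detached from the identity of the CS and the US)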

Homework: due March 30
Matlab tutorials: on the website (access to Matlab?)
Programming TD and R-W and comparing them under different conditions (also the basis for the second exercise)
Theoretical questions
Reading for next week: Tobler, Dickinson + Schultz
What happens in XOR? What is the difference between trace and delay?
