
Probabilistic Topic Models Mark Steyvers Department of Cognitive Sciences University of California, Irvine Joint work with: Tom Griffiths, UC Berkeley Padhraic Smyth, UC Irvine Dave Newman, UC Irvine Chaitanya Chemudugunta, UC Irvine

Problems of Interest Information retrieval –Finding relevant documents in databases –How can we go beyond keyword matches? Psychology –Retrieving information from episodic/semantic memory –How to access memory at multiple levels of abstraction?

Overview: I. Probabilistic Topic Models II. Topic models for association III. Topic models for recall IV. Topic models for information retrieval V. Conclusion

Probabilistic Topic Models Originated in the domain of statistical machine learning Performs unsupervised extraction of topics from large text collections Text documents: –scientific articles –web pages –e-mail –lists of words in memory experiments

Probabilistic Topic Models Each topic is a probability distribution over words Each document is modeled as a mixture of topics We do not observe these distributions but we can infer them statistically

The Generative Model TOPIC MIXTURE TOPIC WORD 1. For each document, choose a mixture of topics 2. For every word slot, sample a topic [1..T] from the mixture 3. Sample a word from the topic
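
As a concrete illustration, a minimal sketch of this generative process in Python (the sizes and hyperparameters below are toy assumptions, not values from the talk):

```python
# Minimal sketch of the topic-model generative process described above.
import numpy as np

rng = np.random.default_rng(0)
T, W, doc_len = 2, 10, 20          # topics, vocabulary size, words per document (assumed)
alpha, beta = 0.5, 0.5             # Dirichlet hyperparameters (assumed)

phi = rng.dirichlet([beta] * W, size=T)    # each topic: a distribution over words

def generate_document():
    theta = rng.dirichlet([alpha] * T)     # 1. choose a mixture of topics
    words = []
    for _ in range(doc_len):
        z = rng.choice(T, p=theta)         # 2. sample a topic for this word slot
        w = rng.choice(W, p=phi[z])        # 3. sample a word from that topic
        words.append(w)
    return words

print(generate_document())
```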

Toy Example Topic 1 Topic 2 Topic mixture Word probabilities for each topic Document: HEART, LOVE, LOVE, SOUL, HEART, ….

Toy Example Topic 1 Topic 2 Topic mixture Word probabilities for each topic Document: SCIENTIFIC, KNOWLEDGE, SCIENTIFIC, RESEARCH, ….

Toy Example Topic 1 Topic 2 Topic mixture Word probabilities for each topic Document: LOVE, SCIENTIFIC, SOUL, LOVE, KNOWLEDGE, ….

Applying the model to a large corpus of text Simultaneously infer the topics and topic mixtures that best “reconstruct” a corpus of text Bayesian methods (MCMC) Unsupervised!!
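
In practice, off-the-shelf libraries can fit such a model. A hedged sketch using the gensim library (note: gensim's LdaModel fits the model by variational inference rather than the MCMC mentioned on this slide, but the inputs and outputs are the same kinds of objects; the toy documents are assumptions):

```python
# Fit a small topic model to toy tokenized documents with gensim.
from gensim import corpora
from gensim.models import LdaModel

texts = [["heart", "love", "soul"],
         ["scientific", "knowledge", "research"],
         ["love", "scientific", "soul", "knowledge"]]  # toy documents (assumed)

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]        # bag-of-words counts

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=50)
for topic_id in range(2):
    print(lda.print_topic(topic_id))                   # top words per topic
```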

Example topics from TASA, an educational corpus: 37K docs, 26K word vocabulary, 300 topics. [Example topics shown in figure.]

Topics from Psych Review abstracts. All 1281 abstracts since [year]; topics – examples: – SIMILARITY CATEGORY CATEGORIES RELATIONS DIMENSIONS FEATURES STRUCTURE SIMILAR REPRESENTATION OBJECTS – STIMULUS CONDITIONING LEARNING RESPONSE STIMULI RESPONSES AVOIDANCE REINFORCEMENT CLASSICAL DISCRIMINATION – MEMORY RETRIEVAL RECALL ITEMS INFORMATION TERM RECOGNITION ITEMS LIST ASSOCIATIVE – GROUP INDIVIDUAL GROUPS OUTCOMES INDIVIDUALS DIFFERENCES INTERACTION – EMOTIONAL EMOTION BASIC EMOTIONS AFFECT STATES EXPERIENCES AFFECTIVE AFFECTS RESEARCH...

Application: Research Browser Automatically analyzes research papers by UC San Diego and UC Irvine researchers related to California Institute for Telecommunications and Information Technology (CalIT2) Finds topically related researchers

Notes Determining the number of topics Word order is ignored – the bag-of-words assumption Function words (the, a, etc.) are typically removed

Latent Semantic Analysis (Landauer & Dumais, 1997) Learned from large text collections Spatial representation: –Each word is a single point in semantic space –Nearby words are similar in meaning Problems: –Static representation for ambiguous words –Dimensions are uninterpretable –Little flexibility to expand [Figure: words such as JOY, MYSTERY, MATHEMATICS, RESEARCH, LOVE plotted in a semantic space.]
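
For contrast with the topic model, a minimal LSA sketch (toy counts; real LSA typically also applies a log-entropy weighting to the counts, omitted here):

```python
# Truncated SVD of a word-document count matrix places each word at a single
# point in a low-dimensional semantic space.
import numpy as np

C = np.array([[2, 0, 1],    # rows: words, columns: documents (toy counts)
              [1, 0, 2],
              [0, 3, 1],
              [0, 2, 2]], dtype=float)

U, D, Vt = np.linalg.svd(C, full_matrices=False)
k = 2                                  # keep k latent dimensions (assumed)
word_vectors = U[:, :k] * D[:k]        # each word: one point in semantic space

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# nearby words (high cosine similarity) co-occur with similar documents
print(cosine(word_vectors[0], word_vectors[1]))
```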

Overview: I. Probabilistic Topic Models II. Topic models for association III. Topic models for recall IV. Topic models for information retrieval V. Conclusion

Word Association CUE: PLAY (Nelson, McEvoy, & Schreiber USF word association norms) Responses: BASEBALL, BAT, BALL, GAME, STAGE, THEATER

Topic model for Word Association Topics are latent factors that group words Word association is modeled as probabilistic “spreading of activation”: –the cue activates topics –topics activate other words [Diagram: the cue PLAY activates a topic, which in turn activates BASEBALL, BAT, BALL, GAME, STAGE, THEATER.]
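
One way to make the “spreading of activation” concrete: given an estimated topics-by-words matrix phi (the toy values below are assumptions), the model's associate distribution is P(w2 | w1) = Σ_z P(w2 | z) P(z | w1), with P(z | w1) ∝ P(w1 | z) P(z):

```python
# Word association through topics: cue activates topics, topics activate words.
import numpy as np

phi = np.array([[0.5, 0.3, 0.2, 0.0, 0.0],    # topic 1 over 5 words (toy)
                [0.0, 0.1, 0.2, 0.4, 0.3]])   # topic 2 over the same words
p_z = np.array([0.5, 0.5])                     # topic prior (assumed uniform)

def associate(cue):
    p_z_given_cue = phi[:, cue] * p_z
    p_z_given_cue /= p_z_given_cue.sum()       # cue activates topics
    return phi.T @ p_z_given_cue               # topics activate other words

print(associate(cue=2))   # distribution over associates for word index 2
```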

Model predictions from the TASA corpus. [Figure: the model's predicted ranks of the observed first associates.]

Median rank of first associate. [Figure: results for the topic model.]

Median rank of first associate. [Figure: topic model vs. LSA.]

What about collocations? Why are these words related? –PLAY - GROUND –DOW - JONES –BUMBLE - BEE Suggests at least two routes for association: –semantic –collocation → Integrate collocations into the topic model

Collocation Topic Model. [Graphical model: topic mixture → topic → word, with a switch variable x at each word position.] If x=0, sample a word from the topic; if x=1, sample a word from a distribution conditioned on the previous word.
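
A sketch of the word-level choice this slide describes; theta, phi, bigram, and p_collocation are all illustrative assumptions, not the talk's estimated parameters:

```python
# Collocation topic model, word-level step: a switch x chooses between the
# topic route (x=0) and a bigram route conditioned on the previous word (x=1).
import numpy as np

rng = np.random.default_rng(1)

def next_word(theta, phi, bigram, prev_word, p_collocation):
    x = rng.random() < p_collocation           # x=1 with prob p_collocation
    if x and prev_word is not None:
        p = bigram[prev_word]                  # distribution given previous word
    else:
        z = rng.choice(len(theta), p=theta)    # sample a topic from the mixture
        p = phi[z]                             # then a word from that topic
    return rng.choice(len(p), p=p)

theta = np.array([0.7, 0.3])                   # toy topic mixture
phi = np.array([[0.4, 0.3, 0.3, 0.0],
                [0.0, 0.2, 0.3, 0.5]])
bigram = np.array([[0.0, 0.9, 0.05, 0.05],     # e.g. word 0 ("DOW") is usually
                   [0.25, 0.25, 0.25, 0.25],   # followed by word 1 ("JONES")
                   [0.25, 0.25, 0.25, 0.25],
                   [0.25, 0.25, 0.25, 0.25]])
print(next_word(theta, phi, bigram, prev_word=0, p_collocation=0.5))
```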

Collocation Topic Model – example: “DOW JONES RISES”. [Diagram: DOW is drawn from a topic; for JONES, x=1, so it is drawn from the distribution over words following DOW; for RISES, x=0, so it is drawn from a topic.] JONES is more likely explained as a word following DOW than as a word sampled from a topic. Result: DOW_JONES is recognized as a collocation.

Example topics from the New York Times: – Stock Market: WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN _STOCK_INDEX – Wall Street Firms: WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS – Terrorism: SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS – Bankruptcy: BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP

Overview: I. Probabilistic Topic Models II. Topic models for association III. Topic models for recall IV. Topic models for information retrieval V. Conclusion

Experiment Study the following words for a memory test: PEAS CARROTS BEANS SPINACH LETTUCE HAMMER TOMATOES CORN CABBAGE SQUASH Recall words..... Result: the outlier word “HAMMER” is better remembered

Hunt & Lamb (2001 exp. 1) OUTLIER LIST PEAS CARROTS BEANS SPINACH LETTUCE HAMMER TOMATOES CORN CABBAGE SQUASH CONTROL LIST SAW SCREW CHISEL DRILL SANDPAPER HAMMER NAILS BENCH RULER ANVIL

Semantic Isolation/ Von Restorff Effects Verbal explanations: –Attention, surprise, distinctiveness Our approach: –Multiple ways to encode each word Isolated item stored verbatim – direct access route Related items stored by topics – gist route

Dual route topic model. [Diagram: study list → encoding → reconstructed study list at retrieval. One route stores words by topics (a distributed representation); the other stores words verbatim in a word distribution (a localist representation).] Recall involves a reconstructive process from stored information. Encoding involves choosing routes to store words – each word is represented by one route.

[Figure: encoding and retrieval of the study words PEAS CARROTS BEANS SPINACH LETTUCE HAMMER TOMATOES CORN CABBAGE SQUASH under the dual-route model.]

Model Predictions

Interpretation of Von Restorff effects Contextually unique words are stored in the verbatim route Contextually expected items are stored by topics Reconstruction is better for words stored in the verbatim route

False memory effects (Robinson & Roediger, 1997). Study lists with an increasing number of associates of the unstudied lure ANGER: – 3 associates: MAD FEAR HATE SMOOTH NAVY HEAT SALAD TUNE COURTS CANDY PALACE PLUSH TOOTH BLIND WINTER – 6 associates: MAD FEAR HATE RAGE TEMPER FURY SALAD TUNE COURTS CANDY PALACE PLUSH TOOTH BLIND WINTER – 9 associates: MAD FEAR HATE RAGE TEMPER FURY WRATH HAPPY FIGHT CANDY PALACE PLUSH TOOTH BLIND WINTER (lure = ANGER)

Relation to dual retrieval models. Familiarity/recollection distinction (e.g., Atkinson & Juola, 1972; Mandler, 1980): not clear how these retrieval processes relate to encoding routes. Gist/verbatim distinction (e.g., Reyna & Brainerd, 1995): maps onto the topics and verbatim routes. → Differences: –our approach specifies both encoding and retrieval representations and processes –routes are not independent –the model explains performance for actual word lists

Overview: I. Probabilistic Topic Models II. Topic models for association III. Topic models for recall IV. Topic models for information retrieval V. Conclusion

Example: finding Dave Huber’s website. Works well when keywords are distinctive and match exactly. Query → Document: Dave Huber → Dave Huber; UCSD → UCSD

Problem: related but non-matching keywords. Challenge: how to match at both the word level and the conceptual level. Query → Document: Dave Huber → Dave Huber; Mathematical modeling → ??

Dual route model for information retrieval Encode documents with two routes: –contextually unique words → verbatim route –thematic words → topics route

Example encoding Kruschke, J. K. ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, Contextually unique words: ALCOVE, SCHAFFER, MEDIN, NOSOFSKY. Topic 1 (p=0.21): learning phenomena acquisition learn acquired... Topic 22 (p=0.17): similarity objects object space category dimensional categories spatial. Topic 61 (p=0.08): representations representation order alternative 1st higher 2nd descriptions problem form

Retrieval Experiments Corpora: –NIPS conference papers For each candidate document, calculate how likely the query was “generated” from the model’s encoding
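
A sketch of that query-likelihood calculation under a dual-route encoding; the mixing weight lam and all toy parameters are assumptions for illustration, not the talk's actual estimation procedure:

```python
# Score a document by how likely it "generates" the query: each query word is
# explained either by the document's verbatim word distribution or by its
# topic-level distribution.
import numpy as np

def query_log_likelihood(query, p_verbatim, theta, phi, lam=0.5):
    # p_verbatim: word distribution from the document's verbatim route
    # theta: document's topic mixture; phi: topics-by-words matrix
    p_topic = phi.T @ theta
    p_word = lam * p_verbatim + (1 - lam) * p_topic
    return np.sum(np.log(p_word[query]))

phi = np.array([[0.5, 0.3, 0.2, 0.0],          # toy topics
                [0.0, 0.2, 0.3, 0.5]])
theta = np.array([0.8, 0.2])                    # toy document topic mixture
p_verbatim = np.array([0.1, 0.1, 0.7, 0.1])     # toy verbatim route
print(query_log_likelihood(query=[2, 3], p_verbatim=p_verbatim,
                           theta=theta, phi=phi))
# Candidate documents would be ranked by this score.
```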

Retrieval Results with Short Queries. NIPS corpus; queries consist of 2-4 words. [Figure: retrieval performance for the dual route model, standard topic model, and LSA.]

Overview: I. Probabilistic Topic Models II. Topic models for association III. Topic models for recall IV. Topic models for information retrieval V. Conclusion

Conclusions (1) 1) Memory as a reconstructive process –Encoding –Retrieval 2) Information retrieval as a reconstructive process –What candidate document best “reconstructs” the query?

Conclusions (2) Modeling multiple levels of organization in human memory Information retrieval –Matching of query to documents should operate at multiple levels

Software Public-domain MATLAB toolbox for topic modeling on the Web:

Retrieval Results with Long Queries. Cranfield corpus; queries consist of sentences. [Figure: retrieval performance for the dual route model, standard topic model, and LSA.]

Precision / Recall. [Venn diagram: the set of retrieved documents and the set of relevant documents within all documents.]
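
For reference, the standard definitions this diagram depicts (a note added for clarity; the slide itself only shows the sets):

$$\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|}, \qquad \text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}$$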

Gibbs Sampler Stability

Comparing topics from two runs by KL distance. [Figure: topics from Run 1 matched against re-ordered topics from Run 2; best-matching pair KL = 0.46, worst KL = 9.40.]

Extensions of Topic Models Combining topics and syntax Topic hierarchies Topic segmentation –no need for document boundaries Modeling authors as well as documents –who wrote this part of the paper?

Words can have high probability in multiple topics (based on TASA corpus):
– PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PRINTER PRINTS PRINTERS COPY COPIES FORM OFFSET GRAPHIC SURFACE PRODUCED CHARACTERS
– PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR THEATRE PLAYWRIGHT PERFORMANCE DRAMATIC COSTUMES COMEDY TRAGEDY CHARACTERS SCENES OPERA PERFORMED
– TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED BALL TEAMS BASKET FOOTBALL SCORE COURT GAMES TRY COACH GYM SHOT
– JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE EVIDENCE WITNESSES CRIME LAWYER WITNESS ATTORNEY HEARING INNOCENT DEFENSE CHARGE CRIMINAL
– HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST METHOD HYPOTHESES TESTED EVIDENCE BASED OBSERVATION SCIENCE FACTS DATA RESULTS EXPLANATION
– STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER WRITE PLAN ARITHMETIC ASSIGNMENT PLACE STUDIED CAREFULLY DECIDE IMPORTANT NOTEBOOK REVIEW
Note how PLAY appears in both the theater topic and the sports topic, and COURT in both the sports and trial topics.

Problems with Spatial Representations

Violation of triangle inequality. A spatial representation implies the triangle inequality: AC ≤ AB + BC [diagram: points A, B, C]. Can find similarity judgments that violate this.

Violation of triangle inequality. Can find associations that violate AC ≤ AB + BC, e.g. A = SOCCER, B = FIELD, C = MAGNETIC: SOCCER and MAGNETIC are each strongly associated with FIELD, yet barely associated with each other.

No Triangle Inequality with Topics. [Diagram: FIELD has high probability in Topic 1 together with SOCCER, and in Topic 2 together with MAGNETIC.] Topic structure easily explains violations of the triangle inequality.

Small-World Structure of Associations (Steyvers & Tenenbaum, 2005) Properties: 1) short path lengths 2) clustering 3) power-law degree distributions Small-world graphs arise elsewhere: the internet, social relations, biology [Network diagram: PLAY linked to BASEBALL, BAT, BALL, GAME, STAGE, THEATER.]

Power-law degree distributions in semantic networks. A power-law degree distribution → some words are “hubs” in the semantic network.

Creating Association Networks. TOPICS MODEL: connect i to j when P( w=j | w=i ) > threshold. LSA: for each word, generate K associates by picking the K nearest neighbors in semantic space. [Figure: degree distributions; fitted power-law exponent = -2.05.]
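
A small sketch of the topic-model network construction (the conditional probabilities and the threshold value below are arbitrary assumptions for illustration):

```python
# Connect word i to word j when P(w=j | w=i) exceeds a threshold, then
# inspect node degrees (whose distribution was reported to follow a power law).
import numpy as np

rng = np.random.default_rng(2)
P = rng.dirichlet(np.full(50, 0.1), size=50)   # toy P(j | i); rows sum to 1

threshold = 0.05                               # assumed cutoff
adjacency = P > threshold                      # directed edges i -> j
degree = adjacency.sum(axis=0)                 # in-degree of each word
print(degree.max(), degree.mean())             # a few "hub" words dominate
```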

Associations in free recall STUDY THESE WORDS: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy RECALL WORDS..... FALSE RECALL: “Sleep” 61%

Recall as a reconstructive process. Reconstruct the study list based on the stored “gist”. The gist can be represented by a distribution over topics. Under a single-topic assumption: P( retrieved word w | study list ) = Σ_z P( w | z ) P( z | study list )

Predictions for the “Sleep” list. [Figure: predicted recall for the study-list words and the top 8 extra-list words.]

Hidden Markov Topic Model

Hidden Markov Topics Model. Syntactic dependencies → short-range dependencies; semantic dependencies → long-range dependencies. [Graphical model: a Markov chain of syntactic states s, each emitting a word w; the semantic state draws words via topic assignments z.] Semantic state: generate words from the topic model. Syntactic states: generate words from the HMM. (Griffiths, Steyvers, Blei, & Tenenbaum, 2004)

Transitions between the semantic state and syntactic states. Example distributions:
– Topic z = 1: HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2
– Topic z = 2: SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2
– One syntactic class: THE 0.6, A 0.3, MANY 0.1
– Another syntactic class: OF 0.6, FOR 0.3, BETWEEN 0.1
(One state is the semantic state, which emits a word from the current topic.)

Combining topics and syntax – generating a sentence one word at a time: THE … → THE LOVE … → THE LOVE OF … → THE LOVE OF RESEARCH … Function words (THE, OF) are emitted by the syntactic classes; content words are emitted by the topics (LOVE from z = 1, RESEARCH from z = 2).
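
A compact sketch of the composite model walked through above; the transition matrix, vocabularies, and probabilities are illustrative assumptions:

```python
# HMM over classes: state 0 emits from the topic model, states 1-2 emit from
# their own syntactic word distributions.
import numpy as np

rng = np.random.default_rng(3)

vocab = ["THE", "A", "MANY", "OF", "FOR", "BETWEEN",
         "HEART", "LOVE", "SOUL", "SCIENTIFIC", "KNOWLEDGE", "RESEARCH"]
phi = np.zeros((2, len(vocab)))
phi[0, 6:9] = 1 / 3            # topic 1: HEART LOVE SOUL
phi[1, 9:12] = 1 / 3           # topic 2: SCIENTIFIC KNOWLEDGE RESEARCH
cls = np.zeros((2, len(vocab)))
cls[0, 0:3] = [0.6, 0.3, 0.1]  # syntactic class: THE A MANY
cls[1, 3:6] = [0.6, 0.3, 0.1]  # syntactic class: OF FOR BETWEEN
trans = np.array([[0.1, 0.6, 0.3],   # transitions over states (assumed);
                  [0.7, 0.1, 0.2],   # state 0 = semantic, 1-2 = syntactic
                  [0.8, 0.1, 0.1]])

theta = rng.dirichlet([0.5, 0.5])    # document's topic mixture
x, words = 1, []
for _ in range(8):
    if x == 0:                                   # semantic state: use topics
        z = rng.choice(2, p=theta)
        w = rng.choice(len(vocab), p=phi[z])
    else:                                        # syntactic state: use HMM class
        w = rng.choice(len(vocab), p=cls[x - 1])
    words.append(vocab[w])
    x = rng.choice(3, p=trans[x])
print(" ".join(words))
```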

Semantic topics:
– FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS
– MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS
– DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE
– BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE
– GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN
– BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON
– CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS
– PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW

Syntactic classes:
– GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE * BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN
– THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL
– MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST
– ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE
– SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED
– ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT
– HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN
– BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME

NIPS Semantics:
– MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS
– EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE
– DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS
– STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL *
– MEMBRANE SYNAPTIC CELL * CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS
– IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS # PIXEL VISUAL
– KERNEL SUPPORT VECTOR SVM KERNELS # SPACE FUNCTION MACHINES SET
– NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS # OUTPUTS
NIPS Syntax:
– IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS
– SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST
– USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN
– IN WITH FOR ON FROM AT USING INTO OVER WITHIN
– HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY
– # * I X T N - C F P

Random sentence generation LANGUAGE: [S] RESEARCHERS GIVE THE SPEECH [S] THE SOUND FEEL NO LISTENERS [S] WHICH WAS TO BE MEANING [S] HER VOCABULARIES STOPPED WORDS [S] HE EXPRESSLY WANTED THAT BETTER VOWEL

Nested Chinese Restaurant Process

Topic Hierarchies. In the regular topic model there are no relations between topics [diagram: topic 1 … topic 7 as unrelated nodes]. Nested Chinese Restaurant Process –Blei, Griffiths, Jordan, Tenenbaum (2004) –learns the hierarchical structure as well as the topics within that structure

Example: Psych Review Abstracts. [Figure: a learned topic hierarchy; the root topic contains function words: THE OF AND TO IN A IS. Topics in the tree include:]
– A MODEL MEMORY FOR MODELS TASK INFORMATION RESULTS ACCOUNT
– SELF SOCIAL PSYCHOLOGY RESEARCH RISK STRATEGIES INTERPERSONAL PERSONALITY SAMPLING
– MOTION VISUAL SURFACE BINOCULAR RIVALRY CONTOUR DIRECTION CONTOURS SURFACES
– DRUG FOOD BRAIN AROUSAL ACTIVATION AFFECTIVE HUNGER EXTINCTION PAIN
– RESPONSE STIMULUS REINFORCEMENT RECOGNITION STIMULI RECALL CHOICE CONDITIONING
– SPEECH READING WORDS MOVEMENT MOTOR VISUAL WORD SEMANTIC
– ACTION SOCIAL SELF EXPERIENCE EMOTION GOALS EMOTIONAL THINKING
– GROUP IQ INTELLIGENCE SOCIAL RATIONAL INDIVIDUAL GROUPS MEMBERS
– SEX EMOTIONS GENDER EMOTION STRESS WOMEN HEALTH HANDEDNESS
– REASONING ATTITUDE CONSISTENCY SITUATIONAL INFERENCE JUDGMENT PROBABILITIES STATISTICAL
– IMAGE COLOR MONOCULAR LIGHTNESS GIBSON SUBMOVEMENT ORIENTATION HOLOGRAPHIC
– CONDITIONING STRESS EMOTIONAL BEHAVIORAL FEAR STIMULATION TOLERANCE RESPONSES

Generative Process. [Same hierarchy as the previous slide: to generate a document, choose a path from the root to a leaf and draw each word from one of the topics along that path.]

Other Slides

Markov chain Monte Carlo Sample from a Markov chain constructed to converge to the target distribution Allows sampling from an unnormalized posterior Can compute approximate statistics from intractable distributions Gibbs sampling is one such method: the Markov chain is constructed from conditional distributions

Enron data: two example topics (T=100). [Figure.]

Enron data: two work-unrelated topics. [Figure.]

Using Topic Models for Information Retrieval

Clusters v. Topics. Example abstract: “Hidden Markov Models in Molecular Biology: New Algorithms and Applications” – Pierre Baldi, Yves Chauvin, Tim Hunkapiller, Marcella A. McClure. Hidden Markov Models (HMMs) can be applied to several important problems in molecular biology. We introduce a new convergent learning algorithm for HMMs that, unlike the classical Baum-Welch algorithm, is smooth and can be applied on-line or in batch mode, with or without the usual Viterbi most likely path approximation. Left-right HMMs with insertion and deletion states are then trained to represent several protein families including immunoglobulins and kinases. In all cases, the models derived capture all the important statistical properties of the families and can be used efficiently in a number of important tasks such as multiple alignment, motif detection, and classification. One cluster: [cluster 88] model data models time neural figure state learning set parameters network probability number networks training function system algorithm hidden markov. Multiple topics: [topic 10] state hmm markov sequence models hidden states probabilities sequences parameters transition probability training hmms hybrid model likelihood modeling; [topic 37] genetic structure chain protein population region algorithms human mouse selection fitness proteins search evolution generation function sequence sequences genes

A matrix view of the two models. Topic model: C = F Q, where C (words × documents) is the normalized co-occurrence matrix, F (words × topics) holds the mixture components, and Q (topics × documents) holds the mixture weights. LSA: C = U D V^T, factoring the (words × documents) matrix through a (words × dims) space.

Documents as Topic Mixtures: a Geometric Interpretation. [Figure: the simplex of word distributions, P(word1) + P(word2) + P(word3) = 1, with topic 1 and topic 2 as points and observed documents lying between them.]

Generating Artificial Data: a visual example. word → pixel; document → image; sample each pixel from a mixture of topics; 10 “topics” with 5 x 5 pixels

A subset of documents

Inferred Topics. [Figure: the topics recovered by the sampler across iterations.]

Interpretable decomposition. SVD gives a basis for the data, but not an interpretable one. The true basis is not orthogonal, so rotation does no good. [Figure: topic model solution vs. SVD solution.]

Similarity between 55 funding programs

Summary. INPUT: word-document counts (word order is irrelevant). OUTPUT: topic assignments to each word, P( z_i ); likely words in each topic, P( w | z ); likely topics in each document (“gist”), P( z | d )

Gibbs Sampling. The probability that word token i (word type w_i in document d_i) is assigned to topic t:
P( z_i = t | z_-i, w ) ∝ [ ( n^(w_i)_-i,t + β ) / ( Σ_w n^(w)_-i,t + Wβ ) ] × [ ( n^(d_i)_-i,t + α ) / ( Σ_t' n^(d_i)_-i,t' + Tα ) ]
where n^(w)_-i,t is the count of word w assigned to topic t, and n^(d)_-i,t is the count of topic t assigned to document d (both excluding the current token i).
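
A compact sketch of the collapsed Gibbs sampler implied by this formula, on a toy corpus (T, alpha, beta, and the documents are assumptions; the document-side denominator is dropped since it is constant in t and cancels under normalization):

```python
# Collapsed Gibbs sampling for the topic model on a toy corpus of word ids.
import numpy as np

rng = np.random.default_rng(4)
docs = [[0, 1, 1, 2], [3, 4, 3, 2], [0, 1, 4, 3]]   # toy documents (assumed)
W, T, alpha, beta = 5, 2, 0.5, 0.01

z = [[rng.integers(T) for _ in d] for d in docs]    # random initial topics
nwt = np.zeros((W, T)); ndt = np.zeros((len(docs), T))
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        nwt[w, z[d][i]] += 1; ndt[d, z[d][i]] += 1

for _ in range(200):                                # Gibbs sweeps
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            nwt[w, t] -= 1; ndt[d, t] -= 1          # remove current assignment
            p = (nwt[w] + beta) / (nwt.sum(axis=0) + W * beta) * (ndt[d] + alpha)
            t = rng.choice(T, p=p / p.sum())        # resample this token's topic
            nwt[w, t] += 1; ndt[d, t] += 1
            z[d][i] = t

print(nwt / nwt.sum(axis=0))                        # estimated P( w | z )
```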

Prior Distributions. Dirichlet priors encourage sparsity on topic mixtures and topics: θ ~ Dirichlet( α ), φ ~ Dirichlet( β ). [Figure: Dirichlet densities over the simplex of topics and words; darker colors indicate lower probability.]
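
To see the sparsity effect concretely, a tiny illustration (the hyperparameter values are arbitrary choices, not from the talk):

```python
# Small Dirichlet hyperparameters concentrate mass on a few components.
import numpy as np

rng = np.random.default_rng(5)
print(rng.dirichlet([10.0] * 5))   # alpha large: near-uniform mixture
print(rng.dirichlet([0.1] * 5))    # alpha small: mass piles onto few topics
```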

Example Topics extracted from NIH/NSF grants Important point: these distributions are learned in a completely automated “unsupervised” fashion from the data

Extracting Topics from 250,000 e-mails, 28,000 authors (Enron). Example topics: TEXANS WIN FOOTBALL FANTASY SPORTSLINE PLAY TEAM GAME SPORTS GAMES – GOD LIFE MAN PEOPLE CHRIST FAITH LORD JESUS SPIRITUAL – VISIT TRAVEL ROUNDTRIP SAVE DEALS HOTEL BOOK SALE FARES TRIP CITIES – FERC MARKET ISO COMMISSION ORDER FILING COMMENTS PRICE CALIFORNIA FILED – POWER CALIFORNIA ELECTRICITY UTILITIES PRICES MARKET PRICE UTILITY CUSTOMERS ELECTRIC – STATE PLAN CALIFORNIA DAVIS RATE BANKRUPTCY SOCAL POWER BONDS MOU

Topic trends from the New York Times (330,000 articles):
– Tour-de-France: TOUR RIDER LANCE_ARMSTRONG TEAM BIKE RACE FRANCE
– Quarterly Earnings: COMPANY QUARTER PERCENT ANALYST SHARE SALES EARNING
– Anthrax: ANTHRAX LETTER MAIL WORKER OFFICE SPORES POSTAL BUILDING

Topic trends in the NIPS conference:
– Neural networks on the decline: LAYER NET NEURAL LAYERS NETS ARCHITECTURE NUMBER FEEDFORWARD SINGLE
– ... while SVMs become more popular: KERNEL SUPPORT VECTOR MARGIN SVM KERNELS SPACE DATA MACHINES

Finding Funding Overlap Analyze 22,000+ grants active during [period] from NSF and NIH funding programs –focus on behavioral sciences Questions of interest: –What are the areas of overlap between funding programs? –What topics are funded?

2D visualization of funding programs (NIH, NSF–BIO, NSF–SBE) – nearby programs support similar topics. [Figure.]

Funding Amounts per Topic We have the $ funding per grant We have the distribution of topics for each grant Solve for the $ amount per topic → What are the expensive topics?
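
A minimal sketch of that calculation, with made-up grant amounts and topic mixtures (the talk's actual data is not reproduced here):

```python
# Distribute each grant's funding over topics in proportion to the grant's
# topic mixture, then sum over grants to get $ per topic.
import numpy as np

amounts = np.array([500_000, 250_000, 1_000_000])   # $ per grant (toy)
theta = np.array([[0.7, 0.2, 0.1],                  # topic mixture per grant
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])

dollars_per_topic = amounts @ theta                 # expected $ per topic
print(dollars_per_topic)
```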

[Figure: high $$$ topics vs. low $$$ topics.]

Basic Problems for Semantic Memory Prediction –what fact, concept, or word is next? Gist extraction –What is this set of words about? Disambiguation –What is the sense of this word?

Topics from an educational corpus (TASA): 37K docs, 26K word vocabulary, 1700 topics, e.g.:
– PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PRINTER PRINTS PRINTERS COPY COPIES FORM OFFSET GRAPHIC SURFACE PRODUCED CHARACTERS
– PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR THEATRE PLAYWRIGHT PERFORMANCE DRAMATIC COSTUMES COMEDY TRAGEDY CHARACTERS SCENES OPERA PERFORMED
– TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED BALL TEAMS BASKET FOOTBALL SCORE COURT GAMES TRY COACH GYM SHOT
– JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE EVIDENCE WITNESSES CRIME LAWYER WITNESS ATTORNEY HEARING INNOCENT DEFENSE CHARGE CRIMINAL
– HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST METHOD HYPOTHESES TESTED EVIDENCE BASED OBSERVATION SCIENCE FACTS DATA RESULTS EXPLANATION
– STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER WRITE PLAN ARITHMETIC ASSIGNMENT PLACE STUDIED CAREFULLY DECIDE IMPORTANT NOTEBOOK REVIEW

Same words in different topics represent different contextual usages:
– PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PRINTER PRINTS PRINTERS COPY COPIES FORM OFFSET GRAPHIC SURFACE PRODUCED CHARACTERS
– PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR THEATRE PLAYWRIGHT PERFORMANCE DRAMATIC COSTUMES COMEDY TRAGEDY CHARACTERS SCENES OPERA PERFORMED
– TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED BALL TEAMS BASKET FOOTBALL SCORE COURT GAMES TRY COACH GYM SHOT
– JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE EVIDENCE WITNESSES CRIME LAWYER WITNESS ATTORNEY HEARING INNOCENT DEFENSE CHARGE CRIMINAL
– HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST METHOD HYPOTHESES TESTED EVIDENCE BASED OBSERVATION SCIENCE FACTS DATA RESULTS EXPLANATION
– STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER WRITE PLAN ARITHMETIC ASSIGNMENT PLACE STUDIED CAREFULLY DECIDE IMPORTANT NOTEBOOK REVIEW

Three documents with the word “play” (numbers & colors → topic assignments)

Example of generating words. [Figure: topics, mixtures, and documents with per-word topic assignments shown as superscripts.]
– Document 1 (all topic 1): MONEY(1) BANK(1) BANK(1) LOAN(1) BANK(1) MONEY(1) BANK(1) MONEY(1) BANK(1) LOAN(1) LOAN(1) BANK(1) MONEY(1)
– Document 2 (mixture of topics 1 and 2): RIVER(2) MONEY(1) BANK(2) STREAM(2) BANK(2) BANK(1) MONEY(1) RIVER(2) MONEY(1) BANK(2) LOAN(1) MONEY(1)
– Document 3 (all topic 2): RIVER(2) BANK(2) STREAM(2) BANK(2) RIVER(2) BANK(2)

Statistical Inference. [Figure: now the topics, mixtures, and per-word topic assignments are all unknown (?) and must be inferred from the documents.]
– Document 1: MONEY(?) BANK(?) BANK(?) LOAN(?) BANK(?) MONEY(?) BANK(?) MONEY(?) BANK(?) LOAN(?) LOAN(?) BANK(?) MONEY(?) ...
– Document 2: RIVER(?) MONEY(?) BANK(?) STREAM(?) BANK(?) BANK(?) MONEY(?) RIVER(?) MONEY(?) BANK(?) LOAN(?) MONEY(?) ...
– Document 3: RIVER(?) BANK(?) STREAM(?) BANK(?) RIVER(?) BANK(?) ...

Two approaches to semantic representation: semantic networks and semantic spaces. [Figure: a semantic network over words such as PLAY, BALL, GAME, FUN, STAGE, THEATER, BAT and BANK, CASH, LOAN, MONEY, RIVER, STREAM.] How are these learned? Semantic spaces can be learned (LSA), but is this representation flexible enough?

Latent Semantic Analysis (Landauer & Dumais, 1997). Word-document counts → high-dimensional space via SVD. Each word is a single point in semantic space – nearby words co-occur often with other words. Words with multiple meanings have only one representation and must sit near all words related to their different senses → this leads to distortions in similarity. [Figure: BALL, PLAY, GAME plotted near STAGE, THEATER.]