Probabilistic Topic Models. Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine. Joint work with Tom Griffiths (UC Berkeley), Padhraic Smyth (UC Irvine), and Dave Newman (UC Irvine).

Overview: I. Problems of interest; II. Probabilistic Topic Models; III. Using topic models in machine learning and data mining; IV. Using topic models in psychology; V. Conclusion.

Problems of Interest. Machine learning / data mining: What topics does this text collection "span"? Which documents are about a particular topic? Who writes about a particular topic? How have topics changed over time? Psychology: How to represent the "gist" of a list of words? How to model associations between words?

Probabilistic Topic Models. Based on the pLSI and LDA models (e.g., Hofmann, 2001; Blei, Ng, & Jordan, 2003): a probability model linking documents, topics, and words. A topic is a distribution over words; a document is a mixture of topics. Topics and topic mixtures can be extracted from large collections of text.

Example: Topics extracted from NIH/NSF grants. Important point: these distributions are learned in a completely automated, "unsupervised" fashion from the data.

Overview: I. Problems of interest; II. Probabilistic Topic Models; III. Using topic models in machine learning and data mining; IV. Using topic models in psychology; V. Conclusion.

Parameters and real-world data (diagram): the probabilistic model specifies P(Data | Parameters); statistical inference goes the other way, recovering P(Parameters | Data) from the observed data.

Document generation as a probabilistic process: each topic is a distribution over words (parameters φ(j)); each document is a mixture of topics (parameters θ(d)); each word is chosen from a single topic.
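A minimal sketch of this generative process in Python/NumPy (not the authors' code; the number of topics, vocabulary size, document lengths, and hyperparameter values below are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
T, W = 3, 20            # number of topics and vocabulary size (illustrative)
alpha, beta = 0.5, 0.1  # Dirichlet hyperparameters (illustrative)

phi = rng.dirichlet(np.full(W, beta), size=T)     # each topic: a distribution over words
docs = []
for d in range(5):                                # generate a few short documents
    theta = rng.dirichlet(np.full(T, alpha))      # each document: a mixture of topics
    doc = []
    for _ in range(50):
        z = rng.choice(T, p=theta)                # each word: choose a single topic...
        doc.append(int(rng.choice(W, p=phi[z])))  # ...then draw the word from that topic
    docs.append(doc)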

Prior Distributions: Dirichlet priors encourage sparsity on topic mixtures and topics: θ ~ Dirichlet(α) and φ ~ Dirichlet(β). (Figure: probability simplexes with corners Topic 1, Topic 2, Topic 3 and Word 1, Word 2, Word 3.)

Example of generating words (figure): topics φ, mixtures θ, and documents with per-word topic assignments. Document 1 (drawn from topic 1): MONEY¹ BANK¹ BANK¹ LOAN¹ BANK¹ MONEY¹ BANK¹ MONEY¹ BANK¹ LOAN¹ LOAN¹ BANK¹ MONEY … Document 2 (a mixture of topics 1 and 2): RIVER² MONEY¹ BANK² STREAM² BANK² BANK¹ MONEY¹ RIVER² MONEY¹ BANK² LOAN¹ MONEY … Document 3 (drawn from topic 2): RIVER² BANK² STREAM² BANK² RIVER² BANK …

Inference (figure): the same documents are observed, but the topics, the mixtures θ, and the per-word topic assignments are all unknown: MONEY? BANK? BANK? LOAN? … RIVER? MONEY? BANK? STREAM? … RIVER? BANK? STREAM? BANK? … The goal is to recover them from the observed words alone.

Bayesian Inference. Three sets of latent variables: topic mixtures θ, word distributions φ, and topic assignments z. Integrate out θ and φ and estimate the topic assignments z directly; the required sum over all possible assignments is intractable, so use MCMC with Gibbs sampling for approximate inference.
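Written out, the target distribution can be expressed as follows (a reconstruction of the standard formulation, not text from the slide):

P(\mathbf{z} \mid \mathbf{w}) = \frac{P(\mathbf{w}, \mathbf{z})}{\sum_{\mathbf{z}'} P(\mathbf{w}, \mathbf{z}')}

where the denominator sums over all possible topic assignments (T^n terms for n word tokens), which is why exact inference is intractable and Gibbs sampling is used instead.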

Gibbs Sampling. Start with random assignments of words to topics. Then repeat for M iterations: for each word i, sample a new topic assignment for word i conditioned on all the other topic assignments.
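A simplified collapsed Gibbs sampler for the model above, sketched in Python/NumPy (this is not the released MATLAB toolbox; the function and variable names and the default hyperparameters are assumptions for illustration):

import numpy as np

def gibbs_lda(docs, T, W, alpha=0.5, beta=0.1, iters=200, seed=0):
    # docs: list of documents, each a list of word ids in 0..W-1
    rng = np.random.default_rng(seed)
    nwt = np.zeros((W, T))           # count of word w assigned to topic t
    ndt = np.zeros((len(docs), T))   # count of topic t assigned to document d
    nt = np.zeros(T)                 # total number of words assigned to topic t
    z = []                           # topic assignment for every word token
    for d, doc in enumerate(docs):   # random initialization
        zd = rng.integers(T, size=len(doc))
        z.append(zd)
        for w, t in zip(doc, zd):
            nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                nwt[w, t] -= 1; ndt[d, t] -= 1; nt[t] -= 1   # remove the current assignment
                # conditional distribution over topics given all other assignments
                p = (nwt[w] + beta) / (nt + W * beta) * (ndt[d] + alpha)
                t = rng.choice(T, p=p / p.sum())
                z[d][i] = t
                nwt[w, t] += 1; ndt[d, t] += 1; nt[t] += 1
    return z, nwt, ndt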

16 Artificial Documents (figure). Can we recover the original topics and topic mixtures from this data?

Starting the Gibbs sampling: assign word tokens randomly to topics (in the figure, the two colors mark topic 1 and topic 2).

After 1 iteration

After 4 iterations

After 32 iterations

Choosing the number of topics. Bayesian model selection: choose T to maximize the probability of the data given T. This requires integration over all parameters, which is approximated with Gibbs sampling. (Figure: results on artificial data generated from 10 topics.)
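In symbols, the quantity being maximized is the marginal likelihood (a sketch of the standard criterion; the slide shows it only as a formula image):

P(\mathbf{w} \mid T) = \int P(\mathbf{w} \mid \phi, \theta, T)\, p(\phi \mid \beta)\, p(\theta \mid \alpha)\, d\phi\, d\theta

with the integral taken over all topic distributions φ and topic mixtures θ, and approximated from the Gibbs sampler's output rather than computed exactly.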

Overview: I. Problems of interest; II. Probabilistic Topic Models; III. Using topic models in machine learning and data mining; IV. Using topic models in psychology; V. Conclusion.

Extracting Topics from 250,000 emails by 28,000 authors (Enron). Example topics: TEXANS WIN FOOTBALL FANTASY SPORTSLINE PLAY TEAM GAME SPORTS GAMES GOD LIFE MAN PEOPLE CHRIST FAITH LORD JESUS SPIRITUAL VISIT TRAVEL ROUNDTRIP SAVE DEALS HOTEL BOOK SALE FARES TRIP CITIES FERC MARKET ISO COMMISSION ORDER FILING COMMENTS PRICE CALIFORNIA FILED POWER CALIFORNIA ELECTRICITY UTILITIES PRICES MARKET PRICE UTILITY CUSTOMERS ELECTRIC STATE PLAN CALIFORNIA DAVIS RATE BANKRUPTCY SOCAL POWER BONDS MOU

Topic trends from the New York Times (330,000 articles). Tour de France: TOUR RIDER LANCE_ARMSTRONG TEAM BIKE RACE FRANCE. Quarterly earnings: COMPANY QUARTER PERCENT ANALYST SHARE SALES EARNING. Anthrax: ANTHRAX LETTER MAIL WORKER OFFICE SPORES POSTAL BUILDING.

Topic trends in the NIPS conference: neural networks on the decline (LAYER NET NEURAL LAYERS NETS ARCHITECTURE NUMBER FEEDFORWARD SINGLE) while SVMs become more popular (KERNEL SUPPORT VECTOR MARGIN SVM KERNELS SPACE DATA MACHINES).

Finding Funding Overlap. Analyze 22,000+ grants active in funding programs from NSF and NIH, with a focus on the behavioral sciences. Questions of interest: What are the areas of overlap between funding programs? What topics are funded?

2D visualization of funding programs (NIH, NSF-BIO, NSF-SBE): nearby programs support similar topics.

Funding Amounts per Topic. We have the $ funding per grant and the distribution of topics for each grant; solve for the $ amount per topic. What are the expensive topics?
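A minimal sketch of that allocation (variable names and dollar amounts below are made up for illustration): each grant's dollars are spread over topics in proportion to its topic distribution.

import numpy as np

amounts = np.array([1.2e6, 0.8e6, 2.0e6])                # $ per grant (made-up values)
theta = np.array([[0.7, 0.3], [0.1, 0.9], [0.5, 0.5]])   # topic distribution of each grant
dollars_per_topic = amounts @ theta                       # $ attributed to each topic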

High $$$ topics vs. low $$$ topics (figure).

Overview: I. Problems of interest; II. Probabilistic Topic Models; III. Using topic models in machine learning and data mining; IV. Using topic models in psychology; V. Conclusion.

Basic Problems for Semantic Memory. Prediction: what fact, concept, or word comes next? Gist extraction: what is this set of words about? Disambiguation: what is the sense of this word?

TASA Corpus. Applied the model to an educational text corpus (TASA): 33K documents, 6M words.

Three documents with the word "play" (numbers and colors indicate topic assignments).

Word Association. CUE: PLAY (Nelson, McEvoy, & Schreiber, USF word association norms). Responses form association networks (figure: PLAY linked to BASEBALL, BAT, BALL, GAME, STAGE, THEATER).

Topic model for Word Association. Association as a problem of prediction: given that a single word (the cue) is observed, predict what other words (responses) might occur in that context, under a single-topic assumption.
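Written out, the single-topic prediction rule is (a reconstruction consistent with the description above, with w_1 the cue and w_2 the response):

P(w_2 \mid w_1) = \sum_{z} P(w_2 \mid z)\, P(z \mid w_1)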

Model predictions from the TASA corpus (figure).

Median rank of the first associate: topic model (figure).

Latent Semantic Analysis (Landauer & Dumais, 1997): an SVD of the word-document count matrix maps words into a high-dimensional semantic space. Each word is a single point in that space, and similarity is measured by the cosine of the angle between word vectors (e.g., BALL, PLAY, GAME, STAGE, THEATER as points).
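For comparison, a minimal LSA sketch in Python/NumPy (the toy count matrix, the number of retained dimensions K, and the absence of the usual log-entropy weighting are all simplifying assumptions):

import numpy as np

rng = np.random.default_rng(0)
C = rng.poisson(1.0, size=(50, 30)).astype(float)  # toy word-document count matrix
K = 2                                              # number of retained dimensions (assumed)
U, S, Vt = np.linalg.svd(C, full_matrices=False)
word_vectors = U[:, :K] * S[:K]                    # each word: one point in semantic space

def cosine(i, j):
    # similarity of words i and j: cosine of the angle between their vectors
    a, b = word_vectors[i], word_vectors[j]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))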

Median rank of the first associate: topic model vs. LSA (figure).

Objections to spatial representations. Tversky (1977) on similarity: asymmetry; violation of the triangle inequality; rich neighborhood structure. Semantic association has the same properties: asymmetry; violation of the triangle inequality; "small world" neighborhood structure (Steyvers & Tenenbaum, 2005).

Topics as latent factors to group words: words such as BASEBALL, BAT, BALL, GAME and STAGE, THEATER are linked to PLAY only through topic nodes. Note: there are no direct connections between words.

What about collocations? Why are these words related: DOW-JONES, BUMBLE-BEE, WHITE-HOUSE? This suggests at least two routes for association, semantic and collocational, so we integrate collocations into the topic model. This is related to the paradigmatic/syntagmatic distinction.

Collocation Topic Model (graphical model: a topic mixture generates a topic and a word at each position, together with a switch variable x). If x = 0, sample the word from the topic; if x = 1, sample the word from a distribution conditioned on the previous word.

Collocation Topic Model example: "DOW JONES RISES" (DOW sampled from a topic; JONES with x = 1, following DOW; RISES with x = 0, from a topic). JONES is more likely explained as a word following DOW than as a word sampled from a topic. Result: DOW_JONES is recognized as a collocation.
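A toy sketch of this switch mechanism (not the authors' implementation: in the actual model the probability of x = 1 depends on the previous word, whereas a fixed p_colloc is assumed here, and all parameter values are made up):

import numpy as np

def generate(n_words, phi, theta, bigram, p_colloc, rng):
    words, prev = [], None
    for _ in range(n_words):
        # switch variable x; only allow x = 1 when there is a previous word
        x = 1 if prev is not None and rng.random() < p_colloc else 0
        if x == 0:
            z = rng.choice(len(theta), p=theta)           # sample the word from a topic
            w = int(rng.choice(phi.shape[1], p=phi[z]))
        else:
            w = int(rng.choice(len(bigram[prev]), p=bigram[prev]))  # word follows the previous word
        words.append(w)
        prev = w
    return words

rng = np.random.default_rng(0)
T, W = 2, 10
phi = rng.dirichlet(np.full(W, 0.1), size=T)     # topic-word distributions
theta = rng.dirichlet(np.full(T, 0.5))           # the document's topic mixture
bigram = rng.dirichlet(np.full(W, 0.1), size=W)  # bigram[w]: distribution over the next word
print(generate(20, phi, theta, bigram, p_colloc=0.3, rng=rng))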

Example topics from the New York Times. Stock market: WEEK DOW_JONES POINTS 10_YR_TREASURY_YIELD PERCENT CLOSE NASDAQ_COMPOSITE STANDARD_POOR CHANGE FRIDAY DOW_INDUSTRIALS GRAPH_TRACKS EXPECTED BILLION NASDAQ_COMPOSITE_INDEX EST_02 PHOTO_YESTERDAY YEN _STOCK_INDEX. Wall Street firms: WALL_STREET ANALYSTS INVESTORS FIRM GOLDMAN_SACHS FIRMS INVESTMENT MERRILL_LYNCH COMPANIES SECURITIES RESEARCH STOCK BUSINESS ANALYST WALL_STREET_FIRMS SALOMON_SMITH_BARNEY CLIENTS INVESTMENT_BANKING INVESTMENT_BANKERS INVESTMENT_BANKS. Terrorism: SEPT_11 WAR SECURITY IRAQ TERRORISM NATION KILLED AFGHANISTAN ATTACKS OSAMA_BIN_LADEN AMERICAN ATTACK NEW_YORK_REGION NEW MILITARY NEW_YORK WORLD NATIONAL QAEDA TERRORIST_ATTACKS. Bankruptcy: BANKRUPTCY CREDITORS BANKRUPTCY_PROTECTION ASSETS COMPANY FILED BANKRUPTCY_FILING ENRON BANKRUPTCY_COURT KMART CHAPTER_11 FILING COOPER BILLIONS COMPANIES BANKRUPTCY_PROCEEDINGS DEBTS RESTRUCTURING CASE GROUP.

Context-dependent collocations. Example: WHITE HOUSE. In the context of government, WHITE_HOUSE is a collocation; in the context of the colors of houses, WHITE and HOUSE are treated as separate words.

Overview: I. Problems of interest; II. Probabilistic Topic Models; III. Using topic models in machine learning and data mining; IV. Using topic models in psychology; V. Conclusion.

Psychology ↔ Computer Science. Models for information retrieval might be helpful in understanding semantic memory, and research on semantic memory can be useful for developing better information retrieval algorithms.

Software: a public-domain MATLAB toolbox for topic modeling is available on the Web.

Gibbs Sampler Stability

Comparing topics from two runs: KL distance between the topics from Run 1 and the re-ordered topics from Run 2 (best match KL = 0.46; worst match KL = 9.40).

Extensions of Topic Models: combining topics and syntax; topic hierarchies; topic segmentation (no need for document boundaries); modeling authors as well as documents (who wrote this part of the paper?).

Words can have high probability in multiple topics PRINTING PAPER PRINT PRINTED TYPE PROCESS INK PRESS IMAGE PRINTER PRINTS PRINTERS COPY COPIES FORM OFFSET GRAPHIC SURFACE PRODUCED CHARACTERS PLAY PLAYS STAGE AUDIENCE THEATER ACTORS DRAMA SHAKESPEARE ACTOR THEATRE PLAYWRIGHT PERFORMANCE DRAMATIC COSTUMES COMEDY TRAGEDY CHARACTERS SCENES OPERA PERFORMED TEAM GAME BASKETBALL PLAYERS PLAYER PLAY PLAYING SOCCER PLAYED BALL TEAMS BASKET FOOTBALL SCORE COURT GAMES TRY COACH GYM SHOT JUDGE TRIAL COURT CASE JURY ACCUSED GUILTY DEFENDANT JUSTICE EVIDENCE WITNESSES CRIME LAWYER WITNESS ATTORNEY HEARING INNOCENT DEFENSE CHARGE CRIMINAL HYPOTHESIS EXPERIMENT SCIENTIFIC OBSERVATIONS SCIENTISTS EXPERIMENTS SCIENTIST EXPERIMENTAL TEST METHOD HYPOTHESES TESTED EVIDENCE BASED OBSERVATION SCIENCE FACTS DATA RESULTS EXPLANATION STUDY TEST STUDYING HOMEWORK NEED CLASS MATH TRY TEACHER WRITE PLAN ARITHMETIC ASSIGNMENT PLACE STUDIED CAREFULLY DECIDE IMPORTANT NOTEBOOK REVIEW (Based on TASA corpus)

Problems with Spatial Representations

Violation of the triangle inequality. For points A, B, and C in a metric space, the triangle inequality requires AC ≤ AB + BC; we can find similarity judgments that violate this.

Violation of the triangle inequality. We can find associations that violate AC ≤ AB + BC: for example, A = SOCCER, B = FIELD, C = MAGNETIC (SOCCER-FIELD and FIELD-MAGNETIC are strong associations, but SOCCER-MAGNETIC is not).

No triangle inequality with topics: SOCCER and FIELD share Topic 1, while FIELD and MAGNETIC share Topic 2, so the topic structure easily explains violations of the triangle inequality.

Small-World Structure of Associations (Steyvers & Tenenbaum, 2005). Properties: 1) short path lengths, 2) clustering, 3) power-law degree distributions. Small-world graphs arise elsewhere: the internet, social relations, biology. (Figure: example association network around PLAY, with BASEBALL, BAT, BALL, GAME, STAGE, THEATER.)

Power-law degree distributions in semantic networks: a power-law degree distribution means that some words act as "hubs" in the semantic network.

Creating Association Networks. Topics model: connect i to j when P(w = j | w = i) > threshold. LSA: for each word, generate K associates by picking the K nearest neighbors in semantic space. (The fitted power-law exponent shown is -2.05.)
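A minimal sketch of the topics-model rule above (assumes a topic-word matrix phi and a uniform prior over topics; the threshold value is an illustrative choice):

import numpy as np

def association_matrix(phi):
    # P(w2 | w1) = sum_z P(w2 | z) P(z | w1); with a uniform prior over topics,
    # P(z | w1) is proportional to P(w1 | z), i.e. each column of phi normalized over z
    p_z_given_w = phi / phi.sum(axis=0, keepdims=True)
    return p_z_given_w.T @ phi                 # entry [i, j] = P(w = j | w = i)

def association_network(phi, threshold=0.05):
    A = association_matrix(phi)
    np.fill_diagonal(A, 0.0)
    return A > threshold                       # connect i to j when P(w = j | w = i) > threshold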

Associations in free recall STUDY THESE WORDS: Bed, Rest, Awake, Tired, Dream, Wake, Snooze, Blanket, Doze, Slumber, Snore, Nap, Peace, Yawn, Drowsy RECALL WORDS..... FALSE RECALL: “Sleep” 61%

Recall as a reconstructive process: reconstruct the study list from the stored "gist", where the gist is represented by a distribution over topics and each retrieved word is generated under a single-topic assumption.
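Under that single-topic assumption, the probability of retrieving a word given the study list can be written as (a reconstruction of the rule the slide shows as a formula):

P(w \mid \text{study list}) = \sum_{z} P(w \mid z)\, P(z \mid \text{study list})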

Predictions for the "Sleep" list (figure: retrieval probabilities for the study-list words and the top 8 extra-list words).

Hidden Markov Topic Model

Hidden Markov Topics Model (Griffiths, Steyvers, Blei, & Tenenbaum, 2004). Syntactic dependencies are short-range; semantic dependencies are long-range. (Graphical model: a chain of syntactic states s over the words w, with topic assignments z.) The semantic state generates words from the topic model; the syntactic states generate words from the HMM.

Transition between the semantic state and the syntactic states; combining topics and syntax (walk-through figures). Example distributions: two semantic topics (HEART 0.2, LOVE 0.2, SOUL 0.2, TEARS 0.2, JOY 0.2; and SCIENTIFIC 0.2, KNOWLEDGE 0.2, WORK 0.2, RESEARCH 0.2, MATHEMATICS 0.2) and two syntactic classes (THE 0.6, A 0.3, MANY 0.1; and OF 0.6, FOR 0.3, BETWEEN 0.1). The walk-through generates "THE LOVE OF RESEARCH" one word at a time: THE from the first syntactic class, LOVE from the first topic, OF from the second syntactic class, and RESEARCH from the second topic, alternating between syntactic and semantic states.
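A toy sketch of this composite model (one designated "semantic" HMM state emits from the document's topic mixture and all other states emit from their own word distributions; every parameter value and name below is a made-up illustration, not taken from the paper):

import numpy as np

rng = np.random.default_rng(0)
S, T, W = 3, 2, 12                                    # HMM states, topics, vocabulary size
trans = rng.dirichlet(np.full(S, 1.0), size=S)        # HMM transition matrix (rows sum to 1)
class_dists = rng.dirichlet(np.full(W, 0.1), size=S)  # word distribution of each syntactic state
phi = rng.dirichlet(np.full(W, 0.1), size=T)          # topic-word distributions
theta = rng.dirichlet(np.full(T, 0.5))                # this document's topic mixture
SEMANTIC = 0                                          # index of the semantic state (assumption)

state, words = 0, []
for _ in range(15):
    state = rng.choice(S, p=trans[state])             # short-range syntactic dependency (HMM)
    if state == SEMANTIC:
        z = rng.choice(T, p=theta)                    # long-range semantic dependency (topics)
        words.append(int(rng.choice(W, p=phi[z])))
    else:
        words.append(int(rng.choice(W, p=class_dists[state])))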

FOOD FOODS BODY NUTRIENTS DIET FAT SUGAR ENERGY MILK EATING FRUITS VEGETABLES WEIGHT FATS NEEDS CARBOHYDRATES VITAMINS CALORIES PROTEIN MINERALS MAP NORTH EARTH SOUTH POLE MAPS EQUATOR WEST LINES EAST AUSTRALIA GLOBE POLES HEMISPHERE LATITUDE PLACES LAND WORLD COMPASS CONTINENTS DOCTOR PATIENT HEALTH HOSPITAL MEDICAL CARE PATIENTS NURSE DOCTORS MEDICINE NURSING TREATMENT NURSES PHYSICIAN HOSPITALS DR SICK ASSISTANT EMERGENCY PRACTICE BOOK BOOKS READING INFORMATION LIBRARY REPORT PAGE TITLE SUBJECT PAGES GUIDE WORDS MATERIAL ARTICLE ARTICLES WORD FACTS AUTHOR REFERENCE NOTE GOLD IRON SILVER COPPER METAL METALS STEEL CLAY LEAD ADAM ORE ALUMINUM MINERAL MINE STONE MINERALS POT MINING MINERS TIN BEHAVIOR SELF INDIVIDUAL PERSONALITY RESPONSE SOCIAL EMOTIONAL LEARNING FEELINGS PSYCHOLOGISTS INDIVIDUALS PSYCHOLOGICAL EXPERIENCES ENVIRONMENT HUMAN RESPONSES BEHAVIORS ATTITUDES PSYCHOLOGY PERSON CELLS CELL ORGANISMS ALGAE BACTERIA MICROSCOPE MEMBRANE ORGANISM FOOD LIVING FUNGI MOLD MATERIALS NUCLEUS CELLED STRUCTURES MATERIAL STRUCTURE GREEN MOLDS Semantic topics PLANTS PLANT LEAVES SEEDS SOIL ROOTS FLOWERS WATER FOOD GREEN SEED STEMS FLOWER STEM LEAF ANIMALS ROOT POLLEN GROWING GROW

GOOD SMALL NEW IMPORTANT GREAT LITTLE LARGE * BIG LONG HIGH DIFFERENT SPECIAL OLD STRONG YOUNG COMMON WHITE SINGLE CERTAIN THE HIS THEIR YOUR HER ITS MY OUR THIS THESE A AN THAT NEW THOSE EACH MR ANY MRS ALL MORE SUCH LESS MUCH KNOWN JUST BETTER RATHER GREATER HIGHER LARGER LONGER FASTER EXACTLY SMALLER SOMETHING BIGGER FEWER LOWER ALMOST ON AT INTO FROM WITH THROUGH OVER AROUND AGAINST ACROSS UPON TOWARD UNDER ALONG NEAR BEHIND OFF ABOVE DOWN BEFORE SAID ASKED THOUGHT TOLD SAYS MEANS CALLED CRIED SHOWS ANSWERED TELLS REPLIED SHOUTED EXPLAINED LAUGHED MEANT WROTE SHOWED BELIEVED WHISPERED ONE SOME MANY TWO EACH ALL MOST ANY THREE THIS EVERY SEVERAL FOUR FIVE BOTH TEN SIX MUCH TWENTY EIGHT HE YOU THEY I SHE WE IT PEOPLE EVERYONE OTHERS SCIENTISTS SOMEONE WHO NOBODY ONE SOMETHING ANYONE EVERYBODY SOME THEN Syntactic classes BE MAKE GET HAVE GO TAKE DO FIND USE SEE HELP KEEP GIVE LOOK COME WORK MOVE LIVE EAT BECOME

MODEL ALGORITHM SYSTEM CASE PROBLEM NETWORK METHOD APPROACH PAPER PROCESS IS WAS HAS BECOMES DENOTES BEING REMAINS REPRESENTS EXISTS SEEMS SEE SHOW NOTE CONSIDER ASSUME PRESENT NEED PROPOSE DESCRIBE SUGGEST USED TRAINED OBTAINED DESCRIBED GIVEN FOUND PRESENTED DEFINED GENERATED SHOWN IN WITH FOR ON FROM AT USING INTO OVER WITHIN HOWEVER ALSO THEN THUS THEREFORE FIRST HERE NOW HENCE FINALLY #*IXTN-CFP#*IXTN-CFP EXPERTS EXPERT GATING HME ARCHITECTURE MIXTURE LEARNING MIXTURES FUNCTION GATE DATA GAUSSIAN MIXTURE LIKELIHOOD POSTERIOR PRIOR DISTRIBUTION EM BAYESIAN PARAMETERS STATE POLICY VALUE FUNCTION ACTION REINFORCEMENT LEARNING CLASSES OPTIMAL * MEMBRANE SYNAPTIC CELL * CURRENT DENDRITIC POTENTIAL NEURON CONDUCTANCE CHANNELS IMAGE IMAGES OBJECT OBJECTS FEATURE RECOGNITION VIEWS # PIXEL VISUAL KERNEL SUPPORT VECTOR SVM KERNELS # SPACE FUNCTION MACHINES SET NETWORK NEURAL NETWORKS OUTPUT INPUT TRAINING INPUTS WEIGHTS # OUTPUTS NIPS Semantics NIPS Syntax

Random sentence generation LANGUAGE: [S] RESEARCHERS GIVE THE SPEECH [S] THE SOUND FEEL NO LISTENERS [S] WHICH WAS TO BE MEANING [S] HER VOCABULARIES STOPPED WORDS [S] HE EXPRESSLY WANTED THAT BETTER VOWEL

Nested Chinese Restaurant Process

Topic Hierarchies. In the regular topic model there are no relations between topics. Alternative: a hierarchical topic organization (a tree of topics), learned with the Nested Chinese Restaurant Process (Blei, Griffiths, Jordan, & Tenenbaum, 2004); the hierarchical structure is learned together with the topics within that structure.

Example: Psych Review Abstracts RESPONSE STIMULUS REINFORCEMENT RECOGNITION STIMULI RECALL CHOICE CONDITIONING SPEECH READING WORDS MOVEMENT MOTOR VISUAL WORD SEMANTIC ACTION SOCIAL SELF EXPERIENCE EMOTION GOALS EMOTIONAL THINKING GROUP IQ INTELLIGENCE SOCIAL RATIONAL INDIVIDUAL GROUPS MEMBERS SEX EMOTIONS GENDER EMOTION STRESS WOMEN HEALTH HANDEDNESS REASONING ATTITUDE CONSISTENCY SITUATIONAL INFERENCE JUDGMENT PROBABILITIES STATISTICAL IMAGE COLOR MONOCULAR LIGHTNESS GIBSON SUBMOVEMENT ORIENTATION HOLOGRAPHIC CONDITIONING STRESS EMOTIONAL BEHAVIORAL FEAR STIMULATION TOLERANCE RESPONSES A MODEL MEMORY FOR MODELS TASK INFORMATION RESULTS ACCOUNT SELF SOCIAL PSYCHOLOGY RESEARCH RISK STRATEGIES INTERPERSONAL PERSONALITY SAMPLING MOTION VISUAL SURFACE BINOCULAR RIVALRY CONTOUR DIRECTION CONTOURS SURFACES DRUG FOOD BRAIN AROUSAL ACTIVATION AFFECTIVE HUNGER EXTINCTION PAIN THE OF AND TO IN A IS

Generative Process: a document chooses a path from the root of the topic hierarchy down to a leaf and draws its words from the topics along that path (figure: the same Psych Review hierarchy as on the previous slide).

Other Slides

Markov chain Monte Carlo: sample from a Markov chain constructed to converge to the target distribution. This allows sampling from an unnormalized posterior and computing approximate statistics from intractable distributions. Gibbs sampling is one such method, in which the Markov chain is constructed from the conditional distributions.

Enron: two example topics (T = 100).

Enron: two topics unrelated to work.

Using Topic Models for Information Retrieval

Clusters v. Topics. Hidden Markov Models in Molecular Biology: New Algorithms and Applications. Pierre Baldi, Yves Chauvin, Tim Hunkapiller, Marcella A. McClure. Hidden Markov Models (HMMs) can be applied to several important problems in molecular biology. We introduce a new convergent learning algorithm for HMMs that, unlike the classical Baum-Welch algorithm, is smooth and can be applied on-line or in batch mode, with or without the usual Viterbi most likely path approximation. Left-right HMMs with insertion and deletion states are then trained to represent several protein families including immunoglobulins and kinases. In all cases, the models derived capture all the important statistical properties of the families and can be used efficiently in a number of important tasks such as multiple alignment, motif detection, and classification. One cluster: [cluster 88] model data models time neural figure state learning set parameters network probability number networks training function system algorithm hidden markov. Multiple topics: [topic 10] state hmm markov sequence models hidden states probabilities sequences parameters transition probability training hmms hybrid model likelihood modeling; [topic 37] genetic structure chain protein population region algorithms human mouse selection fitness proteins search evolution generation function sequence sequences genes.

Matrix decompositions (figure). Topic model: C = F Q, where C is the normalized word-document co-occurrence matrix, F (words × topics) contains the mixture components, and Q (topics × documents) contains the mixture weights. LSA: C = U D V^T, with U (words × dims), D (dims × dims), and V^T (dims × documents).

Testing violation of the triangle inequality: look for triplets of associates w1, w2, w3 such that P(w2 | w1) > τ and P(w3 | w2) > τ, and measure P(w3 | w1). Vary the threshold τ.

Documents as Topic Mixtures: a Geometric Interpretation. With a three-word vocabulary, P(word1) + P(word2) + P(word3) = 1 defines a simplex; topic 1 and topic 2 are points on this simplex, and each observed document lies between them as a mixture.

Generating Artificial Data: a visual example. Word corresponds to pixel and document to image; each pixel is sampled from a mixture of topics. There are 10 "topics", each a 5 x 5 pixel image.

A subset of documents

Inferred topics across iterations (figure).

Interpretable decomposition: SVD gives a basis for the data, but not an interpretable one; the true basis is not orthogonal, so rotation does no good. (Figure: topic model vs. SVD solutions.)

Summary. INPUT: word-document counts (word order is irrelevant). OUTPUT: topic assignments to each word, P(z_i); likely words in each topic, P(w | z); likely topics in each document (the "gist"), P(z | d).

Similarity between 55 funding programs

Example Topics extracted from NIH/NSF grants

Gibbs Sampling: the probability that word i is assigned to topic t is computed from the count of word w assigned to topic t and the count of topic t assigned to document d (excluding the current assignment).
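The update itself, reconstructed from the count descriptions above (this is the standard collapsed Gibbs update of Griffiths & Steyvers, 2004, not text copied from the slide):

P(z_i = t \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \frac{n^{(w_i)}_{-i,t} + \beta}{n^{(\cdot)}_{-i,t} + W\beta} \cdot \frac{n^{(d_i)}_{-i,t} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}

where n^{(w_i)}_{-i,t} is the count of word w_i assigned to topic t, n^{(d_i)}_{-i,t} is the count of topic t in document d_i (both excluding token i), W is the vocabulary size, and T is the number of topics.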