CRFs and Joint Inference in NLP. Andrew McCallum, Computer Science Department, University of Massachusetts Amherst. Joint work with Charles Sutton, Aron Culotta, Xuerui Wang, Ben Wellner, Fuchun Peng, Michael Hay.


CRFs and Joint Inference in NLP Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Charles Sutton, Aron Culotta, Xuerui Wang, Ben Wellner, Fuchun Peng, Michael Hay.

From Text to Actionable Knowledge. [Pipeline diagram: Spider → Document collection → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links/relations, events) → Actionable knowledge (Filter, Prediction, Outlier detection, Decision support).]

[Same pipeline diagram, now annotated with feedback: IE passes Uncertainty Info forward to Data Mining, and Data Mining feeds Emerging Patterns back to IE, enabling Joint Inference across the two stages.]

An HLT Pipeline. [Stack, bottom to top: ASR, MT, Parsing, NER, Relations, Coreference, TDT/Summarization, SNA/KDD/Events.] Errors cascade & accumulate.

An HLT Pipeline. [Same stack: ASR, MT, Parsing, NER, Relations, Coreference, TDT/Summarization, SNA/KDD.] Unified, joint inference.

Unified Model. [Pipeline diagram as before, with the intermediate Database replaced by a single Probabilistic Model shared by IE and Data Mining.] Solution: Conditional Random Fields [Lafferty, McCallum, Pereira]; Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…]: discriminatively-trained undirected graphical models. Complex inference and learning: just what we researchers like to sink our teeth into!

(Linear Chain) Conditional Random Fields [Lafferty, McCallum, Pereira 2001]. An undirected graphical model, trained to maximize the conditional probability of an output sequence given an input sequence. [Figure: finite-state model and graphical model; FSM states y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3} over observations x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3}; e.g. output sequence "… OTHER PERSON OTHER ORG TITLE …" for input "… said Jones a Microsoft VP …".]

p(y|x) = (1/Z(x)) ∏_t exp( Σ_k λ_k f_k(y_{t-1}, y_t, x, t) ), where Z(x) sums the same product over all label sequences y.

Wide-spread interest, positive experimental results in many applications: noun phrase and named entity [HLT’03], [CoNLL’03]; Asian word segmentation [COLING’04], [ACL’04]; IE from research papers [HLT’04]; object classification in images [CVPR ’04]; protein structure prediction [ICML’04]; IE from bioinformatics text [Bioinformatics ’04], …
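As a concrete illustration (a toy sketch, not the original implementation): once per-position emission scores (each a weighted feature sum Σ_k λ_k f_k) and transition scores are computed, Viterbi dynamic programming recovers the highest-scoring label sequence. The labels, scores, and example below are invented.

```python
def viterbi(emit, trans, labels):
    """Highest-scoring label sequence for a linear-chain CRF.

    emit[t][y]  -- score of label y at position t (weighted feature sum)
    trans[a][b] -- score of the transition a -> b
    """
    T = len(emit)
    best = [{y: emit[0][y] for y in labels}]  # best score ending in y at t
    back = [{}]                               # backpointers
    for t in range(1, T):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(((a, best[t - 1][a] + trans[a][y]) for a in labels),
                              key=lambda p: p[1])
            best[t][y] = score + emit[t][y]
            back[t][y] = prev
    # Trace back from the best final label.
    y = max(labels, key=lambda a: best[T - 1][a])
    seq = [y]
    for t in range(T - 1, 0, -1):
        y = back[t][y]
        seq.append(y)
    return list(reversed(seq))
```

With emission scores favoring PERSON on a name token and a small transition table, this recovers the OTHER/PERSON/OTHER style labeling sketched in the figure above.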

Outline
Motivating joint inference for NLP
Brief introduction to Conditional Random Fields
Joint inference: motivation and examples
–Joint Labeling of Cascaded Sequences (Belief Propagation)
–Joint Labeling of Distant Entities (BP by Tree Reparameterization)
–Joint Co-reference Resolution (Graph Partitioning)
–Joint Segmentation and Co-ref (Sparse BP)
–Joint Extraction and Data Mining (Iterative)
Topical N-gram models

Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]. [Layered model: part-of-speech, noun-phrase boundaries, named-entity tags over English words.]

Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]. [Layers: part-of-speech, noun-phrase boundaries, named-entity tags over English words.] But errors cascade: you must be perfect at every stage to do well.

Jointly labeling cascaded sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]. Joint prediction of part-of-speech and noun-phrase boundaries in newswire, matching accuracy with only 50% of the training data. Inference: Loopy Belief Propagation.

2. Jointly labeling distant mentions: Skip-chain CRFs [Sutton, McCallum, SRL 2004]. [Example: “Senator Joe Green said today …. Green ran for …”] In a linear chain, the dependency among similar, distant mentions is ignored.

2. Jointly labeling distant mentions: Skip-chain CRFs [Sutton, McCallum, SRL 2004]. 14% reduction in error on the most-repeated field in seminar announcements. Inference: Tree reparameterization BP [Wainwright et al, 2002]. See also [Finkel, et al, 2005].
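The skip-chain idea can be sketched as follows (a simplification with invented details, not the cited model itself): add a skip edge between every pair of identical capitalized tokens, so that, for example, both occurrences of "Green" are labeled jointly.

```python
def skip_edges(tokens):
    """Positions to join with skip edges: identical capitalized words."""
    earlier = {}   # word -> positions seen so far
    edges = []
    for i, w in enumerate(tokens):
        if w[:1].isupper():
            for j in earlier.get(w, []):
                edges.append((j, i))
            earlier.setdefault(w, []).append(i)
    return edges
```

On the slide's example, `skip_edges("Senator Joe Green said today . Green ran for".split())` yields `[(2, 6)]`, joining the two mentions of "Green".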

3. Joint co-reference among all pairs: Affinity Matrix CRF [McCallum, Wellner, IJCAI WS 2003, NIPS 2004]. [Diagram: pairwise Y/N coreference decisions, weighted by affinities, among mentions “… Mr Powell …”, “… Powell …”, “… she …”.] ~25% reduction in error on co-reference of proper nouns in newswire. Inference: correlational clustering graph partitioning [Bansal, Blum, Chawla, 2002]. Also called “entity resolution” or “object correspondence”.
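The graph-partitioning view can be sketched with a greedy agglomerative partitioner (illustrative only: the cited work uses correlational clustering, and the mentions and affinity scores below are invented): keep merging clusters while the merge adds positive within-cluster affinity.

```python
def greedy_partition(mentions, affinity):
    """Greedily merge mention clusters while total within-cluster
    affinity improves; affinity[(a, b)] > 0 favors coreference."""
    clusters = [{m} for m in mentions]

    def gain(c1, c2):
        # Total affinity added by putting c1 and c2 in one cluster.
        return sum(affinity.get((a, b), affinity.get((b, a), 0.0))
                   for a in c1 for b in c2)

    while True:
        best_gain, best_pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                g = gain(clusters[i], clusters[j])
                if g > best_gain:
                    best_gain, best_pair = g, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair
        clusters[i] |= clusters.pop(j)
```

Note there is no fixed threshold per pair: a strong "Mr Powell"/"Powell" affinity can pull mentions together even when some individual pair scores are negative, which is exactly the benefit of deciding all pairs jointly.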

Transfer Learning with Factorial CRFs [Sutton, McCallum, 2005]. Layers: seminar announcement entities over English words. Too little labeled training data: 60k words of training. [Example announcement: From: Terri Stankus. To: . Date: 26 Feb 1992. GRAND CHALLENGES FOR MACHINE LEARNING. Jaime Carbonell, School of Computer Science, Carnegie Mellon University, 3:30 pm, 7500 Wean Hall. Machine learning has evolved from obscurity in the 1970s into a vibrant and popular discipline in artificial intelligence during the 1980s and 1990s. As a result of its success and growth, machine learning is evolving into a collection of related disciplines: inductive concept acquisition, analytic learning in problem solving (e.g. analogy, explanation-based learning), learning theory (e.g. PAC learning), genetic algorithms, connectionist learning, hybrid systems, and so on.]

Transfer Learning with Factorial CRFs [Sutton, McCallum, 2005]. Layers: newswire named entities over English words. Train on a “related” task with more data: 200k words of training. [Example newswire: CRICKET - MILLNS SIGNS FOR BOLAND. CAPE TOWN ( ) South African provincial side Boland said on Thursday they had signed Leicestershire fast bowler David Millns on a one year contract. Millns, who toured Australia with England A in 1992, replaces former England all-rounder Phillip DeFreitas as Boland's overseas professional.]

Transfer Learning with Factorial CRFs [Sutton, McCallum, 2005]. Layers: newswire named entities over English words. At test time, label with newswire NEs…

Transfer Learning with Factorial CRFs [Sutton, McCallum, 2005]. Layers: newswire named entities and seminar announcement entities over English words. …then use these labels as features for the final task.

Transfer Learning with Factorial CRFs [Sutton, McCallum, 2005]. Layers: newswire named entities and seminar announcement entities over English words. Use joint inference at test time: an alternative to hierarchical Bayes that needn’t know anything about the parameterization of the subtask. Accuracy: no transfer < cascaded transfer < joint-inference transfer (11% reduction in error).

4. Joint segmentation and co-reference [Wellner, McCallum, Peng, Hay, UAI 2004]. [Model diagram: observed citations, segmentations s, citation attributes c, co-reference decisions y, database field values p, plus world knowledge.] Example pair of citation mentions: “Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley,” and “Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, ,”. Inference: Sparse Generalized Belief Propagation [Pal, Sutton, McCallum, 2005]. Extraction from and matching of research paper citations: 35% reduction in co-reference error by using segmentation uncertainty; 6-14% reduction in segmentation error by using co-reference. See also [Marthi, Milch, Russell, 2003].

4. Joint segmentation and co-reference: Joint IE and Coreference from Research Paper Citations. Input: textual citation mentions (noisy, with duplicates). Output: a paper database with fields, clean, duplicates collapsed. [Example rows, AUTHORS / TITLE / VENUE: Cowell, Dawid… / Probab… / Springer; Montemerlo, Thrun… / FastSLAM… / AAAI…; Kjaerulff / Approxi… / Technic…]

Citation Segmentation and Coreference. [Two citation mentions: “Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, T. Smith (ed), Addison-Wesley,” and “Brenda Laurel. Interface Agents: Metaphors with Character, in Smith, The Art of Human-Computr Interface Design, ,”.]

Citation Segmentation and Coreference. 1) Segment citation fields. [Same two citation mentions as above.]

Citation Segmentation and Coreference. 1) Segment citation fields. 2) Resolve coreferent citations (Y? / N). [Same two citation mentions as above.]

Citation Segmentation and Coreference. 1) Segment citation fields. 2) Resolve coreferent citations (Y? / N). [Same two citation mentions as above.]

Segmentation Quality / Citation Co-reference (F1):
No Segmentation: 78%
CRF Segmentation: 91%
True Segmentation: 93%

Citation Segmentation and Coreference. 1) Segment citation fields. 2) Resolve coreferent citations (Y? / N). 3) Form canonical database record, resolving conflicts. [Same two citation mentions as above.]

AUTHOR = Brenda Laurel
TITLE = Interface Agents: Metaphors with Character
PAGES =
BOOKTITLE = The Art of Human-Computer Interface Design
EDITOR = T. Smith
PUBLISHER = Addison-Wesley
YEAR = 1990

Citation Segmentation and Coreference. 1) Segment citation fields. 2) Resolve coreferent citations (Y? / N). 3) Form canonical database record. Perform all three jointly. [Same two citation mentions and canonical record as above.]

IE + Coreference Model. [CRF segmentation: observed citation x “J Besag 1986 On the…” with states s labeled AUT AUT YR TITL TITL.]

IE + Coreference Model. [Citation mention attributes c derived from the segmentation: AUTHOR = “J Besag”, YEAR = “1986”, TITLE = “On the…”.]

IE + Coreference Model. [One such (x, s, c) structure for each citation mention: “J Besag 1986 On the…”, “Smyth Data Mining…”, “Smyth, P Data mining…”.]

IE + Coreference Model. [Binary coreference variables for each pair of mentions.]

IE + Coreference Model. [Binary coreference variables for each pair of mentions: y between the two Smyth mentions, n for the pairs involving Besag.]

IE + Coreference Model. [Research paper entity attribute nodes, e.g. AUTHOR = “P Smyth”, YEAR = “2001”, TITLE = “Data Mining…”.]

IE + Coreference Model. [A research paper entity attribute node shared by coreferent mentions (coreference variables all y).]

IE + Coreference Model. [The full model over the three mentions, with coreference decisions y, n, n.]

Inference by Sparse “Generalized BP” [Pal, Sutton, McCallum 2005]. Exact inference on the linear-chain (segmentation) regions; from each chain, pass an N-best list into coreference.

Inference by Sparse “Generalized BP” [Pal, Sutton, McCallum 2005]. Approximate inference by graph partitioning, integrating out uncertainty in samples of extraction. Made to scale to 1M citations with Canopies [McCallum, Nigam, Ungar 2000].

Inference: Sample = N-best List from CRF Segmentation. [Each citation mention carries an N-best list of candidate segmentations into Name / Title / Book Title / Year fields, e.g. Name hypotheses “Laurel, B”, “Laurel, B.”, “Laurel, B. Interface”, with coreference decisions y / ? / n among candidates.] When calculating similarity with another citation, we have more opportunity to find the correct, matching fields.
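The N-best idea can be sketched as follows (hypothetical field names and probabilities, not the actual scoring model): score a candidate match by the best probability-weighted field agreement over all pairs of candidate segmentations, rather than committing to a single 1-best segmentation.

```python
def nbest_match(nbest_a, nbest_b):
    """Best field agreement between two citation mentions, each given
    as an N-best list of (probability, {field: value}) segmentations."""
    best = 0.0
    for pa, fields_a in nbest_a:
        for pb, fields_b in nbest_b:
            # Count fields both hypotheses segment identically.
            agree = sum(fields_a[f] == fields_b[f]
                        for f in set(fields_a) & set(fields_b))
            best = max(best, pa * pb * agree)
    return best
```

With 1-best lists this reduces to ordinary field matching; extra hypotheses can only raise the chance that some pair of segmentations exposes the correct, matching field boundaries.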

Inference by Sparse “Generalized BP” [Pal, Sutton, McCallum 2005]. Exact (exhaustive) inference over entity attributes.

Inference by Sparse “Generalized BP” [Pal, Sutton, McCallum 2005]. Revisit exact inference on the IE linear chain, now conditioned on entity attributes.

Parameter Estimation: Piecewise Training [Sutton & McCallum 2005]. Divide-and-conquer parameter estimation: the IE linear chain is trained by exact MAP; coref graph edge weights by MAP on individual edges; entity attribute potentials by MAP pseudo-likelihood. In all cases: climb the MAP gradient with a quasi-Newton method.

4. Joint segmentation and co-reference [Wellner, McCallum, Peng, Hay, UAI 2004]. [Model diagram: observed citations, segmentations s, citation attributes c, co-reference decisions y, database field values p, plus world knowledge.] Inference: a variant of Iterated Conditional Modes [Besag, 1986]. Extraction from and matching of research paper citations: 35% reduction in co-reference error by using segmentation uncertainty; 6-14% reduction in segmentation error by using co-reference.

Outline
Motivating joint inference for NLP
Brief introduction to Conditional Random Fields
Joint inference: motivation and examples
–Joint Labeling of Cascaded Sequences (Belief Propagation)
–Joint Labeling of Distant Entities (BP by Tree Reparameterization)
–Joint Co-reference Resolution (Graph Partitioning)
–Joint Segmentation and Co-ref (Sparse BP)
–Joint Extraction and Data Mining (Iterative)
Topical N-gram models

“George W. Bush’s father is George H. W. Bush (son of Prescott Bush).”

?

Relation Extraction as Sequence Labeling. [Example: “George W. Bush … George H. W. Bush (son of Prescott Bush) …”, with “George H. W. Bush” labeled Father and “Prescott Bush” labeled Grandfather.]

Learning Relational Database Features. [Same labeled example.] DB table (Name / Son): Prescott Bush / George H. W. Bush; George H. W. Bush / George W. Bush. Search the DB for “relational paths” between subject and token, e.g. Subject_Is_SonOf_SonOf_Token = 1.0.
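A minimal sketch of the relational-path search (the triple layout and helper below are invented; the feature name matches the slide): breadth-first search over DB relation edges from the subject entity to the token entity, emitting the path as a feature string.

```python
from collections import deque

def relational_path(triples, subject, token, max_hops=3):
    """Find a relation path subject -> ... -> token in a DB of
    (entity, relation, entity) triples; return a feature name like
    'Subject_Is_SonOf_SonOf_Token', or None if no path exists."""
    edges = {}
    for a, rel, b in triples:
        edges.setdefault(a, []).append((rel, b))
    frontier = deque([(subject, [])])
    seen = {subject}
    while frontier:
        node, path = frontier.popleft()
        if node == token and path:
            return "Subject_Is_" + "_".join(path) + "_Token"
        if len(path) < max_hops:
            for rel, nxt in edges.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [rel]))
    return None
```

On the Bush example, two SonOf hops connect the subject to "Prescott Bush", producing the Subject_Is_SonOf_SonOf_Token feature.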

Highly weighted relational paths. Many family equivalences: Sibling = Parent_Offspring; Cousin = Parent_Sibling_Offspring. Also: College = Parent_College; Religion = Parent_Religion; Ally = Opponent_Opponent; Friend = Person_Same_School. Preliminary results: a nice performance boost from relational features (~8% absolute F1).

Testing on Unknown Entities. [Example: “John F. Kennedy … son of Joseph P. Kennedy, Sr. and Rose Fitzgerald”, with Father and Mother labels.] Fill the DB with a “first-pass” CRF, giving rows (Name / Son): Joseph P. Kennedy / John F. Kennedy; Rose Fitzgerald / John F. Kennedy. Then use relational features with a “second-pass” CRF.

Next Steps: feature induction to discover complex rules; measure relational features’ sensitivity to noise in the DB; collective inference among related relations.

Outline
Motivating joint inference for NLP
Brief introduction to Conditional Random Fields
Joint inference: motivation and examples
–Joint Labeling of Cascaded Sequences (Belief Propagation)
–Joint Labeling of Distant Entities (BP by Tree Reparameterization)
–Joint Co-reference Resolution (Graph Partitioning)
–Joint Segmentation and Co-ref (Sparse BP)
–Joint Extraction and Data Mining (Iterative)
Topical N-gram models

Topical N-gram Model - Our first attempt [Wang & McCallum]. [Plate diagram: topic assignments z_1…z_4, bigram indicators y_1…y_4, words w_1…w_4; document-topic, topic-word, and topic-bigram parameter plates over D, T, W; status values {0, 1, 1:2, 2:2, 1:3, 2:3, 3:3}.]

Beyond bag-of-words [Wallach]. [Plate diagram: topic assignments z_1…z_4 and words w_1…w_4; document-topic and topic-word parameter plates over D, T, W.]

LDA-COL (Collocation) Model [Griffiths & Steyvers]. [Plate diagram: topic assignments z_1…z_4, collocation indicators y_1…y_4, words w_1…w_4; parameter plates over D, T, W.]

Topical N-gram Model [Wang & McCallum]. [Plate diagram: topic assignments z_1…z_4, bigram indicators y_1…y_4, words w_1…w_4; document-topic, topic-word, and topic-bigram parameter plates over D, T, W.]
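The model's generative story can be sketched as follows (a simplified sampler with hand-built parameter tables standing in for the Dirichlet-distributed parameters; every name below is invented): for each word draw a topic, then a bigram indicator that decides whether the word comes from the topic's unigram distribution or from a distribution conditioned on the previous word.

```python
import random

def generate(theta, phi, sigma, pi, length, seed=0):
    """Sample words from a simplified topical n-gram story.

    theta[z]          -- topic probabilities for the document
    phi[z][w]         -- per-topic unigram word distribution
    sigma[z][prev][w] -- per-topic bigram distribution given prev word
    pi[z][prev]       -- probability the next word continues a bigram
    """
    rng = random.Random(seed)

    def draw(dist):
        # Inverse-CDF sampling from a {item: probability} dict.
        r, acc = rng.random(), 0.0
        for item, p in dist.items():
            acc += p
            if r < acc:
                return item
        return item  # guard against floating-point rounding

    words, prev = [], None
    for _ in range(length):
        z = draw(theta)
        bigram = prev is not None and rng.random() < pi[z].get(prev, 0.0)
        w = draw(sigma[z][prev]) if bigram else draw(phi[z])
        words.append(w)
        prev = w
    return words
```

With a topic whose bigram table makes "learning" always follow "reinforcement", the sampler emits the phrase as a unit, which is the behavior that separates the topical n-gram columns from the LDA column in the topic comparisons that follow.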

Topic Comparison (a reinforcement-learning topic).
LDA: learning optimal reinforcement state problems policy dynamic action programming actions function markov methods decision rl continuous spaces step policies planning
Topical N-grams (2+): reinforcement learning / optimal policy / dynamic programming / optimal control / function approximator / prioritized sweeping / finite-state controller / learning system / reinforcement learning RL / function approximators / markov decision problems / markov decision processes / local search / state-action pair / markov decision process / belief states / stochastic policy / action selection / upright position / reinforcement learning methods
Topical N-grams (1): policy action states actions function reward control agent q-learning optimal goal learning space step environment system problem steps sutton policies

Topic Comparison (a visual-motion topic).
LDA: motion visual field position figure direction fields eye location retina receptive velocity vision moving system flow edge center light local
Topical N-grams (2+): receptive field / spatial frequency / temporal frequency / visual motion / motion energy / tuning curves / horizontal cells / motion detection / preferred direction / visual processing / area mt / visual cortex / light intensity / directional selectivity / high contrast / motion detectors / spatial phase / moving stimuli / decision strategy / visual stimuli
Topical N-grams (1): motion response direction cells stimulus figure contrast velocity model responses stimuli moving cell intensity population image center tuning complex directions

Topic Comparison (a speech-recognition topic).
LDA: word system recognition hmm speech training performance phoneme words context systems frame trained speaker sequence speakers mlp frames segmentation models
Topical N-grams (2+): speech recognition / training data / neural network / error rates / neural net / hidden markov model / feature vectors / continuous speech / training procedure / continuous speech recognition / gamma filter / hidden control / speech production / neural nets / input representation / output layers / training algorithm / test set / speech frames / speaker dependent
Topical N-grams (1): speech word training system recognition hmm speaker performance phoneme acoustic words context systems frame trained sequence phonetic speakers mlp hybrid

Summary. Joint inference can avoid accumulating errors in a pipeline from extraction to data mining. Examples: factorial finite-state models; jointly labeling distant entities; coreference analysis; segmentation uncertainty aiding coreference and vice versa; joint extraction and data mining. Plus many examples of sequential topic models.