Research Introspection “ICML does ICML” Andrew McCallum Computer Science Department University of Massachusetts Amherst.

Presentation transcript:

Research Introspection “ICML does ICML” Andrew McCallum Computer Science Department University of Massachusetts Amherst

Relational Modeling of the Research Literature & other Entities Better understand structure of our own research area. Tools to help us learn a new sub-field. Aid collaboration. Map how ideas travel through social networks of researchers. Aids for hiring and finding reviewers! Many opportunities for rich relational learning... in a domain we understand well.

Previous Systems

Research Paper Cites Previous Systems

More Entities and Relations: Research Paper, Cites, Person, University, Venue, Grant, Groups, Expertise

Our Status So Far Over 1.6 million research papers, gathered as part of Rexa.info portal. Cross linked papers / people / grants / topics.

Rexa System Overview Pipeline: spider the Web (WWW) for PDFs [home-grown Java+MySQL, ~1m PDF/day]; convert to text, with layout & format [enhanced ps2text: better word stitching, plus layout in XML]; extract metadata (title, authors, abstract, venue, citations; 14 fields in total) [Conditional Random Fields, 99% word accuracy]; reference resolution of papers, authors & grants, with the NSF grant DB as a further input [discriminatively trained graph partitioning, competition-winning accuracy]; topic analysis & other data mining; browsable Web interface.

From Text to Actionable Knowledge Segment Classify Associate Cluster Filter Prediction Outlier detection Decision support IE Document collection Database Discover patterns - entity types - links / relations - events Data Mining Spider Actionable knowledge

Segment Classify Associate Cluster Filter Prediction Outlier detection Decision support IE Document collection Database Discover patterns - entity types - links / relations - events Data Mining Spider Actionable knowledge Uncertainty Info Emerging Patterns Joint Inference

Segment Classify Associate Cluster Filter Prediction Outlier detection Decision support IE Document collection Probabilistic Model Discover patterns - entity types - links / relations - events Data Mining Spider Actionable knowledge Conditional Random Fields [Lafferty, McCallum, Pereira] Conditional PRMs [Koller…], [Jensen…], [Getoor…], [Domingos…] Discriminatively-trained undirected graphical models Complex Inference and Learning Just what we researchers like to sink our teeth into! Unified Model

Information Extraction Markov dependencies...and long-range & KB dependencies?

IE from Research Papers [McCallum et al kaelbling96reinforcement, author = "Leslie Pack Kaelbling and Michael L. Littman and Andrew P. Moore", title = "Reinforcement Learning: A Survey", journal = "Journal of Artificial Intelligence Research", volume = "4", pages = " ", year = "1996",

(Linear Chain) Conditional Random Fields [Lafferty, McCallum, Pereira 2001] Undirected graphical model, trained to maximize conditional probability of output sequence given input sequence. Finite state model / graphical model: FSM states y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3} (output seq) over observations x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3} (input seq). Example: input "… said Jones a Microsoft VP …", output "… OTHER PERSON OTHER ORG TITLE …". Wide-spread interest, positive experimental results in many applications: Noun phrase, Named entity [HLT’03], [CoNLL’03]; Protein structure prediction [ICML’04]; IE from Bioinformatics text [Bioinformatics ‘04]; Asian word segmentation [COLING’04], [ACL’04]; IE from Research papers [HLT’04]; Object classification in images [CVPR ‘04]; …
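As a rough illustration of "conditional probability of output sequence given input sequence", the sketch below scores a label sequence and normalizes it with the forward algorithm. The score tables `emit` and `trans` are hypothetical stand-ins for the model's weighted features; the slide's own formula did not survive in this transcript, so this is one common way to write a linear-chain CRF, not necessarily the exact parameterization used in the talk.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(labels, emit, trans):
    """Unnormalized log-score of one label sequence.

    emit[t][y]  : assumed score of label y at position t (depends on the input x)
    trans[a][b] : assumed score of moving from label a to label b
    """
    score = emit[0][labels[0]]
    for t in range(1, len(labels)):
        score += trans[labels[t - 1]][labels[t]] + emit[t][labels[t]]
    return score

def log_partition(emit, trans):
    """log Z(x): log-sum of scores over all label sequences (forward algorithm)."""
    n_labels = len(trans)
    alpha = list(emit[0])
    for t in range(1, len(emit)):
        alpha = [
            emit[t][b] + logsumexp([alpha[a] + trans[a][b] for a in range(n_labels)])
            for b in range(n_labels)
        ]
    return logsumexp(alpha)

def log_conditional(labels, emit, trans):
    """log p(y | x) = score(y, x) - log Z(x)."""
    return sequence_score(labels, emit, trans) - log_partition(emit, trans)
```

Training would adjust the scores to maximize `log_conditional` on labeled sequences; the sketch covers only the probability computation.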

Entity Resolution Joint inference among all pairwise coref...models of entities, attributes, first-order...

Y/N Joint Co-reference Decisions, Discriminative Model Stuart Russell [Culotta & McCallum 2005] S. Russel People

Y/N Co-reference for Multiple Entity Types Stuart Russell University of California at Berkeley [Culotta & McCallum 2005] S. Russel Berkeley People Organizations

Y/N Joint Co-reference of Multiple Entity Types Stuart Russell University of California at Berkeley [Culotta & McCallum 2005] S. Russel Berkeley People Organizations Reduces error by 22%

Toward High-Order Representations: Identity Uncertainty Mentions: Dean Martin, Howard Dean, Howard Martin. SamePerson(Howard Dean, Howard Martin, Dean Martin)? First-Order Features: $\forall x_1, x_2\ \mathrm{StringMatch}(x_1, x_2)$; $\forall x_1, x_2\ \neg\mathrm{StringMatch}(x_1, x_2)$; $\forall x_1, x_2\ \mathrm{EditDistance}_{>.5}(x_1, x_2)$; $\mathrm{ThreeDistinctStrings}(x_1, x_2, x_3)$
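The features on this slide are evaluated over a whole candidate cluster of mentions rather than over one pair. A minimal sketch follows, assuming each quantified feature must hold for every pair in the cluster and that "EditDistance>.5" means a length-normalized Levenshtein distance above 0.5. Both readings, and the function names, are assumptions made for illustration.

```python
from itertools import combinations

def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def normalized_distance(a, b):
    """Edit distance divided by the longer string's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

def cluster_features(mentions):
    """Boolean features describing a candidate set of coreferent mentions."""
    pairs = list(combinations(mentions, 2))
    return {
        "all_pairs_string_match": all(a == b for a, b in pairs),
        "all_pairs_string_mismatch": all(a != b for a, b in pairs),
        "all_pairs_edit_distance_gt_0.5": all(
            normalized_distance(a, b) > 0.5 for a, b in pairs
        ),
        "three_distinct_strings": len(set(mentions)) >= 3,
    }
```

For the slide's example, `cluster_features(["Howard Dean", "Howard Martin", "Dean Martin"])` flags three distinct, mutually mismatching strings, which is evidence that no pairwise feature can express.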

Structured Topic Models Discovering latent structure in jointly modeling words, time, relations...

Topical N-gram Model [Wang, McCallum 2005] [Figure: graphical model with nodes z_1..z_4, w_1..w_4, y_1..y_4 and plates D, T, W]

Finding Topics with TNG Traditional unigram LDA run on 1.6 million titles / abstracts (200 topics)...select ~300k papers on ML, NLP, robotics, vision... Find 200 TNG topics among those papers.

Topical Transfer Citation counts from one topic to another. Map “producers and consumers”
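Counting citations from one topic to another can be sketched as below. The sketch assumes each paper has a single hard topic label (`paper_topic`), which simplifies the mixed-membership output of a topic model, and that the cited topic is the "producer" and the citing topic the "consumer". All names and the toy data are hypothetical.

```python
from collections import Counter

def topic_transfer(citations, paper_topic):
    """Count citations between topics.

    citations   : iterable of (citing_paper, cited_paper) pairs
    paper_topic : dict mapping paper id -> topic label (assumed hard assignment)
    Returns a Counter keyed by (producer_topic, consumer_topic).
    """
    counts = Counter()
    for citing, cited in citations:
        # Skip citations to or from papers with no topic label.
        if citing in paper_topic and cited in paper_topic:
            counts[(paper_topic[cited], paper_topic[citing])] += 1
    return counts

def producers_and_consumers(counts):
    """Total cross-topic citations each topic supplies and receives."""
    produced, consumed = Counter(), Counter()
    for (producer, consumer), n in counts.items():
        if producer != consumer:  # ignore within-topic citations
            produced[producer] += n
            consumed[consumer] += n
    return produced, consumed
```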

Trends in 17 years of NIPS proceedings

Topic Distributions Conditioned on Time [Plot: topic mass (in vertical height) vs. time]

Topical Transfer Through Time Can we predict which research topics will be “hot” at ICML next year? ...based on: the hot topics in “neighboring” venues last year; learned “neighborhood” distances for venue pairs.

How do Ideas Progress Through Social Networks? Hypothetical Example: “AdaBoost” moving among COLT, ICML, ACL (NLP), ICCV (Vision), SIGIR (Info. Retrieval).

How do Conferences Influence Each Other? Run an LDA on research papers. For each year, create an agglomerated topic distribution for a particular conference. Model the topic distribution of a conference by the topic distributions of related conferences.
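The per-year, per-conference aggregation step can be sketched as follows. The sketch assumes "agglomerated" means averaging the per-paper topic proportions of all papers at a venue in a given year. That reading, the input layout, and the function name are assumptions.

```python
from collections import defaultdict

def venue_year_topics(papers, n_topics):
    """Average per-paper topic proportions within each (venue, year).

    papers : iterable of (venue, year, theta), where theta is a list of
             n_topics proportions for one paper (e.g. from an LDA run)
    Returns a dict mapping (venue, year) -> averaged topic distribution.
    """
    sums = defaultdict(lambda: [0.0] * n_topics)
    counts = defaultdict(int)
    for venue, year, theta in papers:
        key = (venue, year)
        for k in range(n_topics):
            sums[key][k] += theta[k]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}
```

Each resulting vector still sums to one when the per-paper proportions do, so it can be read as the venue's topic distribution for that year.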

Topic Prediction Models: Static Model; Transfer Model. Linear Regression and Ridge Regression used for coefficient training.
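The model equations for this slide are not in the transcript, so the following is only a sketch of coefficient training under an assumed form: a topic's mass at the target venue next year is predicted as a weighted sum of its mass at related venues this year, with the weights fit by ridge regression. Setting `lam = 0` gives ordinary linear regression. The function names are hypothetical.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gauss-Jordan elimination.

    X   : list of rows, one per training example (topic mass at each related venue)
    y   : list of targets (topic mass at the target venue the following year)
    lam : ridge penalty; lam > 0 keeps the system well conditioned
    """
    d = len(X[0])
    # Augmented matrix [X^T X + lam * I | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
        + [sum(row[i] * t for row, t in zip(X, y))]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]

def predict(w, x):
    """Predicted topic mass: weighted sum of the related venues' topic mass."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Comparing the mean squared error of `predict` on held-out years against a baseline is the kind of evaluation the next slide reports.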

Preliminary Results: Transfer Model with Ridge Regression is a good predictor. [Plot: Mean Squared Prediction Error (smaller is better) vs. # venues used for prediction; curve shown: Transfer Model]

Estimated Neighborhood Distances, Transfer into NIPS: ML .079; Neural Computation .023; UAI; PAMI .0998; Theoretical CS .0955; AI .032; AAAI .082

Other Relational Opportunities Categorizing citations. Map transfer of ideas through science. Rank CS departments by various criteria. What 10 papers tell the story of ASR research? Predicting when a student will graduate. Help me find the right postdoc. Suggest best collaborative opportunities. Who should chair the next ICML?