UMass and Learning for CALO Andrew McCallum Information Extraction & Synthesis Laboratory Department of Computer Science University of Massachusetts.

Outline
CC-Prediction
–Learning in the wild from user email usage
DEX
–Learning in the wild from user correction... as well as KB records filled by other CALO components
Rexa
–Learning in the wild from user corrections to coreference... propagating constraints in a Markov-Logic-like system that scales to ~20 million objects
Several new topic models
–Discover interesting, useful structure without the need for supervision... learning from newly arrived data on the fly

CC Prediction Using Various Exponential Family Factor Graphs
Learning to keep an org. connected & avoid stove-piping.
First steps toward ad-hoc team creation.
Learning in the wild from the user’s CC behavior, and from other parts of the CALO ontology.

Graphical Models for Email
[Figure: factor graph for an email. Body words x_b (N_b replications), subject words x_s (N_s replications), and other recipients x_r (N_r - 1 replications) connect to y, the recipient of the email. Legend: function, random variable, N replications.]
Compute P(y|x) for CC prediction.
Email model: N_b words in the body, N_s words in the subject, N_r recipients.
The graph describes the joint distribution of random variables in terms of the product of local functions.
Local functions facilitate system engineering through modularity.
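
A minimal sketch of how P(y|x) could be computed as a normalized product of local functions, one per body word, subject word, and other recipient. The function name, the `log_phi` table of log-potentials, and the toy values are assumptions for illustration; the slides do not give the actual parameterization.

```python
import math

def cc_posterior(candidates, body, subject, others, log_phi, log_prior):
    """Return P(y | x) over candidate recipients y.

    Each observed token contributes one local log-potential
    log_phi[(field, token, y)]; missing entries count as 0 (hypothetical default).
    """
    scores = {}
    for y in candidates:
        s = log_prior.get(y, 0.0)
        for field, tokens in (("body", body), ("subject", subject), ("recipient", others)):
            for tok in tokens:
                s += log_phi.get((field, tok, y), 0.0)
        scores[y] = s
    # Normalize in log space for numerical stability.
    m = max(scores.values())
    z = sum(math.exp(v - m) for v in scores.values())
    return {y: math.exp(v - m) / z for y, v in scores.items()}

# Toy example with a single informative potential.
posterior = cc_posterior(
    candidates=["alice", "bob"],
    body=["budget"], subject=[], others=[],
    log_phi={("body", "budget", "alice"): 1.0},
    log_prior={},
)
```

Because every factor is looked up independently, a new kind of local function (for example a thread relation) can be added by extending the table rather than changing the inference code, which is the modularity point made on the slide.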

Document Models
[Figure: factor graph for a document. Title, abstract, body, co-author, and reference variables connect to y, the author of the document.]
Models may include relational attributes.
We can optimize P(y|x) for classification performance and P(x|y) for model interpretability and parameter transfer (to other models).

CC Prediction and Relational Attributes
[Figure: the email factor graph extended with thread-relation variables x_tr (N_tr replications) and recipient-relation variables x_r'. Labels: body words, subject words, other recipients, thread relation, recipient relation, target recipient y.]
Thread Relations – e.g. Was a given recipient ever included on this email thread?
Recipient Relationships – e.g. Does one of the other recipients report to the target recipient?

CC-Prediction Learning in the Wild
As documents are added to Rexa, models of expertise for authors grow.
As DEX obtains more contact information and keywords, organizational relations emerge.
Model parameters can be adapted on-line.
Priors on parameters can be used to transfer learned information between models.
New relations can be added on-line.
Modular model construction and intelligent model optimization enable these goals.

CC Prediction: Upcoming work on Multi-Conditional Learning
A discriminatively-trained topic model, discovering low-dimensional representations for transfer learning and improved regularization & generalization.

Objective Functions for Parameter Estimation
Traditional, joint training (e.g. naive Bayes, most topic models)
Traditional, conditional training (e.g. MaxEnt classifiers, CRFs)
Traditional mixture model (e.g. LDA)
Conditional mixtures (e.g. Jebara’s CEM, McCallum CRF string edit distance,...)
New, multi-conditional:
–Multi-conditional (mostly conditional, generative regularization)
–Multi-conditional (for semi-sup)
–Multi-conditional (for transfer learning, 2 tasks, shared hiddens)

“Multi-Conditional Learning” (Regularization) [McCallum, Pal, Wang, 2006]

Multi-Conditional Mixtures

Predictive Random Fields: mixture of Gaussians on synthetic data
[Figure panels: Data, classify by color; Generatively trained; Conditionally-trained [Jebara 1998]; Multi-Conditional]
[McCallum, Wang, Pal, 2005]

Multi-Conditional Mixtures vs. Harmonium on document retrieval task
[Figure panels: Harmonium, joint with words, no labels; Harmonium, joint, with class labels and words; Conditionally-trained, to predict class labels; Multi-Conditional, multi-way conditionally trained]
[McCallum, Wang, Pal, 2005]

DEX
Beginning with a review of previous work, then new work on record extraction, with the ability to leverage new KBs in the wild, and for transfer

System Overview
[Figure: DEX pipeline. Inputs: Email, WWW. Components: CRF, Person Name Extraction (names), Name Coreference, Homepage Retrieval, Contact Info and Person Name Extraction, Keyword Extraction, Social Network Analysis.]

An Example
To: “Andrew McCallum” mccallum@cs.umass.edu
Subject...
Search for new people
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413) 545-1323
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…

Summary of Results
Contact info and name extraction performance (25 fields)
       Token Acc   Field Prec   Field Recall   Field F1
CRF    94.50       85.73        76.33          80.76
Example keywords extracted
Person               Keywords
William Cohen        Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller        Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuinness   Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell         Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large org’s by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.

Importance of accurate DEX fields in IRIS
Information about
–people
–contact information
–email
–affiliation
–job title
–expertise
–...
are key to answering many CALO questions... both directly, and as supporting inputs to higher-level questions.

Learning Field Compatibilities in DEX
Source text:
Professor Jane Smith
University of California
209-555-5555
Professor Smith chairs the Computer Science Department. She hails from Boston, …her administrative assistant …
John Doe
Administrative Assistant
University of California
209-444-4444
Extracted Record:
Name: Jane Smith, John Doe
JobTitle: Professor, Administrative Assistant
Company: U of California
Department: Computer Science
Phone: 209-555-5555, 209-444-4444
City: Boston
[Figure: Compatibility Graph over the extracted field values (Jane Smith, Professor, University of California, 209-555-5555, Computer Science, Boston, John Doe, Administrative Assistant, University of California, 209-444-4444), with edge weights -.5, -.4, -.6, .4, .8, .4, -.5.]

Learning Field Compatibilities in DEX
~35% error reduction over transitive closure
Qualitatively better than heuristic approach
Mine Knowledge Bases from other parts of IRIS for learning compatibility rules among fields
–“Professor” job title co-occurs with “University” company
–Area code / city compatibility
–“Senator” job title co-occurs with “Washington, D.C.” location
In the wild
–As the user adds new fields & makes corrections, DEX learns from this KB data
Transfer learning
–between departments/industries

Rexa
A knowledge base of publications, grants, people, their expertise, topics, and inter-connections
Learning for information extraction and coreference.
Incrementally leveraging multiple sources of information for improved coreference
Gathering information about people’s expertise and co-author, citation relations
First a tour of Rexa, then slides about learning

Previous Systems

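Previous Systems
[Figure: entity-relation diagram. Entities and relations: Research Paper, Cites]

More Entities and Relations
[Figure: entity-relation diagram. Entities and relations: Research Paper, Cites, Person, University, Venue, Grant, Groups, Expertise]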

Learning in Rexa
Extraction, coreference
In the wild: Re-adjusting KB after corrections from a user
Also, learning research topics/expertise, and their interconnections

(Linear Chain) Conditional Random Fields
[Lafferty, McCallum, Pereira 2001] (500 citations)
Undirected graphical model, trained to maximize conditional probability of output sequence given input sequence...
[Figure: finite state model and graphical model. FSM states y_{t-1}, y_t, y_{t+1}, y_{t+2}, y_{t+3} (output seq) above observations x_{t-1}, x_t, x_{t+1}, x_{t+2}, x_{t+3} (input seq).]
input seq: … said Jones a Microsoft VP …
output seq: … OTHER PERSON OTHER ORG TITLE …
Wide-spread interest, positive experimental results in many applications.
–Noun phrase, Named entity [HLT’03], [CoNLL’03]
–Protein structure prediction [ICML’04]
–IE from Bioinformatics text [Bioinformatics ‘04],…
–Asian word segmentation [COLING’04], [ACL’04]
–IE from Research papers [HLT’04]
–Object classification in images [CVPR ‘04]

IE from Research Papers [McCallum et al ‘99]

IE from Research Papers
                                   Field-level F1
Hidden Markov Models (HMMs)        75.6   [Seymore, McCallum, Rosenfeld, 1999]
Support Vector Machines (SVMs)     89.7   [Han, Giles, et al, 2003]
Conditional Random Fields (CRFs)   93.9   [Peng, McCallum, 2004]
40% error reduction
(Word-level accuracy is >99%)
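
The 40% error figure on this slide is consistent with a relative error reduction computed from the SVM and CRF field-level F1 scores, treating 100 minus F1 as the error. That reading is an assumption, since the slide does not say which baseline the figure refers to; the helper name is hypothetical.

```python
def relative_error_reduction(f1_before, f1_after):
    """Relative reduction in (100 - F1) when moving from one system to another."""
    err_before = 100.0 - f1_before
    err_after = 100.0 - f1_after
    return (err_before - err_after) / err_before

# SVM (89.7) to CRF (93.9): error goes from 10.3 to 6.1 points of F1.
svm_to_crf = relative_error_reduction(89.7, 93.9)   # about 0.41
# HMM (75.6) to CRF (93.9): error goes from 24.4 to 6.1 points of F1.
hmm_to_crf = relative_error_reduction(75.6, 93.9)   # about 0.75
```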

Joint segmentation and co-reference
[Wellner, McCallum, Peng, Hay, UAI 2004]
Extraction from and matching of research paper citations.
[Figure: graphical model with node labels o, s, c, y, p and annotations: Segmentation, Citation attributes, Co-reference decisions, Database field values, World Knowledge.]
Example citations:
Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.
Inference: Variant of Iterated Conditional Modes [Besag, 1986]
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
see also [Marthi, Milch, Russell, 2003]

Rexa Learning in the Wild from User Feedback
Coreference will never be perfect.
Rexa allows users to enter corrections to coreference decisions.
Rexa then uses this feedback to
–re-consider other inter-related parts of the KB
–automatically make further error corrections by propagating constraints
(Our coreference system uses underlying ideas very much like Markov Logic, and scales to ~20 million mention objects.)
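
A minimal sketch of one small piece of this idea: when a user confirms that two mentions refer to the same entity, that decision can be propagated transitively to other mentions. This uses a plain union-find structure, which is an assumption for illustration and far simpler than the Markov-Logic-like system described above; the class and method names are hypothetical.

```python
class MentionClusters:
    """Union-find over mention strings; each cluster stands for one entity."""

    def __init__(self):
        self.parent = {}

    def find(self, mention):
        # Register unseen mentions as their own singleton cluster.
        self.parent.setdefault(mention, mention)
        while self.parent[mention] != mention:
            # Path halving keeps the trees shallow.
            self.parent[mention] = self.parent[self.parent[mention]]
            mention = self.parent[mention]
        return mention

    def merge(self, a, b):
        """Record a user correction stating that a and b are the same entity."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_entity(self, a, b):
        return self.find(a) == self.find(b)

clusters = MentionClusters()
clusters.merge("A. McCallum", "Andrew McCallum")
clusters.merge("Andrew McCallum", "A. K. McCallum")
```

After the two corrections, the first and third mentions are linked even though no user compared them directly. Handling "these two are different" corrections would need extra bookkeeping that this sketch leaves out.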

Finding Topics in 1 million CS papers
200 topics & keywords automatically discovered.

Topical Transfer
Citation counts from one topic to another.
Map “producers and consumers”

Topical Diversity
Find the topics that are cited by many other topics, measuring diversity of impact.
Entropy of the topic distribution among papers that cite this paper (this topic).
[Figure labels: Low Diversity, High Diversity]
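
A minimal sketch of the entropy measure described above, assuming each citing paper has been assigned a single topic label (the slides do not say whether hard or soft topic assignments are used). The function name and the toy labels are hypothetical.

```python
import math
from collections import Counter

def topical_diversity(citing_topics):
    """Shannon entropy (in bits) of the topic labels of the citing papers."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# All citations come from one topic: low diversity.
low = topical_diversity(["speech", "speech", "speech", "speech"])
# Citations spread evenly over four topics: high diversity.
high = topical_diversity(["speech", "vision", "databases", "theory"])
```

With four citing papers, the score ranges from 0 bits when all share a topic to 2 bits when each comes from a different topic.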

Some New Work on Topic Models
Robustly capturing topic correlations
–Pachinko Allocation Model
Capturing phrases in topic-specific ways
–Topical N-Gram Model

Pachinko Machine

Pachinko Allocation Model [Li, McCallum, 2005]
Model structure, not the graphical model
[Figure: directed acyclic graph with interior nodes indexed 11; 21, 22; 31, 32, 33; 41, 42, 43, 44, 45, above leaf nodes word 1 through word 8.]
Distributions over words (like “LDA topics”)
Distributions over topics; mixtures, representing topic correlations
Distributions over distributions over topics...
Some interior nodes could contain one multinomial, used for all documents. (i.e. a very peaked Dirichlet)

Topic Coherence Comparison
LDA 100, “estimation, some junk”: estimation likelihood maximum noisy estimates mixture scene surface normalization generated measurements surfaces estimating estimated iterative combined figure divisive sequence ideal
LDA 20, “models, estimation, stopwords”: models model parameters distribution bayesian probability estimation data gaussian methods likelihood em mixture show approach paper density framework approximation markov
PAM 100, “estimation”: estimation bayesian parameters data methods estimate maximum probabilistic distributions noise variable variables noisy inference variance entropy models framework statistical estimating
Example super-topic:
33 input hidden units function number
27 estimation bayesian parameters data methods
24 distribution gaussian markov likelihood mixture
11 exact kalman full conditional deterministic
1 smoothing predictive regularizers intermediate slope

Topic Correlations in PAM
5000 research paper abstracts, from across all CS
Numbers on edges are supertopics’ Dirichlet parameters

Likelihood Comparison Varying number of topics

Want to Model Trends over Time
Is prevalence of topic growing or waning?
Pattern appears only briefly
–Capture its statistics in focused way
–Don’t confuse it with patterns elsewhere in time
How do roles, groups, influence shift over time?

Topics over Time (TOT) [Wang, McCallum 2006]
[Figure: two plate diagrams of the TOT model. Labels: Dirichlet prior; multinomial over topics; topic index z; word w; time stamp t; multinomial over words; Beta over time (distribution on time stamps); Uniform prior; plates N_d, D, T.]
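
A minimal sketch of the "Beta over time" component: each topic has a Beta distribution over time stamps rescaled to the open interval (0, 1). The density follows the standard Beta formula; fitting the two parameters by the method of moments is an assumption made here for illustration, and the function names and toy time stamps are hypothetical.

```python
import math

def beta_log_pdf(t, a, b):
    """Log density of Beta(a, b) at a normalized time stamp t in (0, 1)."""
    return ((a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def fit_beta_moments(times):
    """Method-of-moments estimate of (a, b) from normalized time stamps.

    Assumes the sample variance is non-zero and smaller than mean * (1 - mean).
    """
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / n
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

# Time stamps of the tokens currently assigned to one topic (toy values).
a, b = fit_beta_moments([0.2, 0.4, 0.6, 0.8])
density_at_midpoint = math.exp(beta_log_pdf(0.5, a, b))
```

A topic that appears only briefly would get a sharply peaked Beta, so its time stamps contribute strongly near the peak and very little elsewhere.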

State of the Union Address
208 Addresses delivered between January 8, 1790 and January 29, 2002.
To increase the number of documents, we split the addresses into paragraphs and treated them as ‘documents’. One-line paragraphs were excluded. Stopping was applied.
17156 ‘documents’
21534 words
669,425 tokens
Example paragraph:
Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people.
1910

Comparing TOT against LDA

Topic Distributions Conditioned on Time
[Figure: axes are time and topic mass (in vertical height). Data: NIPS vol 1-14]
