Co-training Internal and External Extraction Models By Thomas Packer.


1 Co-training Internal and External Extraction Models By Thomas Packer

2 Bootstrapped Knowledge and Language Acquisition
Tom Mitchell’s Co-training Theory
– “Combining Labeled and Unlabeled Data with Co-training”, Avrim Blum and Tom Mitchell, 1998.
Tom Mitchell’s Coupled Bootstrap Learning
– “Coupling Semi-Supervised Learning of Categories and Relations”, Andrew Carlson, Justin Betteridge, Estevam Rafael Hruschka Jr. and Tom M. Mitchell, 2009.
David Yarowsky
– “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”, 1995.

3 Source Document Types
Semi-structured, noisy OCR’d historical documents:
– (This presentation.)
Semi-structured, clean(-ish) HTML web pages:
– Using multiple ontology constraints (Tom Mitchell, Andrew Carlson paper).
– Adding the learning and utilizing of cardinality constraints.

4 OCR Documents

5
380,641,672,686 WOMEN'S
670,641,893,686 HOME
891,641,1316,685 MISSIONARY
1314,639,1622,686 SOCIETY
909,886,1091,931 Officers
192,969,450,1029 Presidenlt
1032,972,1077,1011 M
1086,986,1135,1011 RS
1142,972,1391,1019 CHARLES
1388,974,1464,1013 A
1475,973,1692,1026 JEWELL
207,1037,309,1077 Vice
308,1038,597,1077 Presidenl
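Each entry in the dump above is four integers followed by one token. A minimal sketch for reading it, assuming the four numbers are left, top, right and bottom pixel coordinates (the slide does not name the fields, so that order is a guess). `OcrToken` and `parse_ocr` are hypothetical names used only for illustration.

```python
import re
from typing import List, NamedTuple


class OcrToken(NamedTuple):
    # Field order is an assumption; the slide only shows four integers.
    left: int
    top: int
    right: int
    bottom: int
    text: str


# Four comma-separated integers, whitespace, then one non-space token.
TOKEN_RE = re.compile(r"(\d+),(\d+),(\d+),(\d+)\s+(\S+)")


def parse_ocr(raw: str) -> List[OcrToken]:
    """Extract every 'x1,y1,x2,y2 TOKEN' entry from raw OCR output."""
    return [
        OcrToken(int(a), int(b), int(c), int(d), text)
        for a, b, c, d, text in TOKEN_RE.findall(raw)
    ]


sample = "380,641,672,686 WOMEN'S 670,641,893,686 HOME 192,969,450,1029 Presidenlt"
ocr_tokens = parse_ocr(sample)
for tok in ocr_tokens:
    print(tok.text, tok.left, tok.top)
```

Keeping the coordinates next to each token makes it possible to recover line membership later, which matters for position-based features such as "first token in line".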

6 OCR Documents WOMEN'S HOME MISSIONARY SOCIETY Officers PresidenltM RS CHARLES A JEWELL Vice Presidenl MRS FRANCIS B COOI EN MRS P W ELILSWVORT MRs HERBERT C ADSWVORTH MRS HENRY E TAINTOR MR DANIEl H WELLS MRS ARTHUR L GOODRICH Recording Secreta rvMiss JOSEPHINE WHITE Corresponding S retaryMss JULIA A GRAVES TreasurerMs H B LANGDON Chairman IWork ComtmitteeMiss MARY H ADAMS Chairman of 31emb 'ershipMiss ELIZA F Mix Chairman of Purch lasizng Cont'MRs MIARY C ST )NEC Chairman of Socia I ConnilLt'eMIRS AI I ERT H PITKIN Secretary's Report This Society is auxiliary to the Women's Home Missionary Union of Connecticut Its membership is 120 and its active season extends from November to April Meetings are held semi-monthly on Friday afternoons from 2 until 5 o'clock The time is occupied in sewing hearing letters from the home missionary field transacting business and in social intercourse often ending with tea

7 Co-trainable Extraction Models
Internal Model:
– Decision list.
– Maps a word to a label with a confidence score.
– “James” → ‘Given Name’ 0.9
External Model:
– Decision list.
– Maps collocation patterns to labels with a confidence score.
– Left token is ‘Given Name’, right token is ‘Surname’, current token has length=1 → ‘Initial’ 0.95
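The two example rules on this slide can be sketched as decision-list entries. This is an illustrative reading, not the presenter's implementation: `Rule` and `apply_decision_list` are hypothetical names, and taking the highest-confidence matching rule is the usual decision-list convention assumed here.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    # test(tokens, labels, i) says whether the rule fires on token i.
    test: Callable[[List[str], List[Optional[str]], int], bool]
    label: str
    confidence: float


def apply_decision_list(rules, tokens, labels, i) -> Optional[Tuple[str, float]]:
    """Return (label, confidence) of the most confident matching rule, or None."""
    for rule in sorted(rules, key=lambda r: r.confidence, reverse=True):
        if rule.test(tokens, labels, i):
            return rule.label, rule.confidence
    return None


# Internal model: looks only at the token itself.
internal = [Rule(lambda t, l, i: t[i].lower() == "james", "Given Name", 0.9)]


# External model: looks at the labels of the neighbouring tokens.
def initial_context(t, l, i):
    return (
        0 < i < len(t) - 1
        and l[i - 1] == "Given Name"
        and l[i + 1] == "Surname"
        and len(t[i]) == 1
    )


external = [Rule(initial_context, "Initial", 0.95)]

tokens = ["James", "A", "Wells"]
labels: List[Optional[str]] = [None, None, "Surname"]  # "Wells" assumed known
labels[0] = apply_decision_list(internal, tokens, labels, 0)[0]
result = apply_decision_list(external, tokens, labels, 1)
print(result)
```

The external rule can only fire after the internal model has labeled "James", which is the dependency that lets each model supply training evidence for the other.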

8 Bootstrapping Approach
1. Initialize empty models (internal and external).
2. Manually create seed ontology, e.g. list of first names, last names, etc.
3. Process documents, extracting instances and features.
4. Loop:
   1. Label words with top-precision labels based on current models.
   2. Propose new model elements based on newly labeled tokens.
   3. Update model parameters based on label statistics.
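The loop above might look roughly like the following. This is a simplified sketch under stated assumptions: the internal model is a plain word-to-label dictionary, the external model keys on the labels of the left and right neighbours, and the thresholds `min_precision` and `min_count` are illustrative values not taken from the talk. The first sample line joins the OCR-split "M RS" by hand.

```python
from collections import Counter, defaultdict


def context(labels, i):
    """External feature: neighbour labels ('^' and '$' mark the line edges)."""
    left = labels[i - 1] if i > 0 else "^"
    right = labels[i + 1] if i + 1 < len(labels) else "$"
    return left, right


def bootstrap(lines, seeds, max_rounds=5, min_precision=0.9, min_count=2):
    internal = dict(seeds)  # word -> label
    external = {}           # (left label, right label) -> label
    for _ in range(max_rounds):
        # 4.1 Label tokens with the current internal model.
        labeled = [[internal.get(tok.lower()) for tok in line] for line in lines]
        # 4.2 Propose context patterns around tokens that already have labels.
        stats = defaultdict(Counter)
        for labs in labeled:
            for i, lab in enumerate(labs):
                ctx = context(labs, i)
                if lab is not None and None not in ctx:
                    stats[ctx][lab] += 1
        # 4.3 Keep patterns whose best label is frequent and precise enough.
        external = {}
        for ctx, counts in stats.items():
            label, hits = counts.most_common(1)[0]
            if hits >= min_count and hits / sum(counts.values()) >= min_precision:
                external[ctx] = label
        # Let the external model teach the internal model new words.
        new_words = {}
        for line, labs in zip(lines, labeled):
            for i, tok in enumerate(line):
                ctx = context(labs, i)
                if labs[i] is None and ctx in external:
                    new_words[tok.lower()] = external[ctx]
        if not new_words:
            break
        internal.update(new_words)
    return internal, external


seeds = {w: "Prefix" for w in ("mrs", "miss", "mr")}
seeds.update({c: "Initial" for c in "abcdefghijklmnopqrstuvwxyz"})
seeds.update({w: "Given Name" for w in ("charles", "francis", "herbert")})
seeds.update({w: "Surname" for w in ("goodrich", "wells", "white")})

lines = [s.split() for s in (
    "MRS CHARLES A JEWELL",
    "MRs HERBERT C ADSWVORTH",
    "MRS HENRY E TAINTOR",
    "MR DANIEl H WELLS",
    "MRS ARTHUR L GOODRICH",
)]
internal, external = bootstrap(lines, seeds)
print(internal["henry"], "|", internal["taintor"])
```

On this small sample the seeds alone are enough to learn the pattern "between Prefix and Initial", which then labels HENRY, DANIEl and ARTHUR as given names.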

9 OCR Documents
Seed models:
– Prefix: “Mrs”, “Miss”, “Mr”
– Initials: “A”, “B”, “C”, …
– Given Name: “Charles”, “Francis”, “Herbert”
– Surname: “Goodrich”, “Wells”, “White”
– Stopword: “Jewell”, “Graves”
Updates:
– Prefix: first token in line
– Given Name: between ‘Prefix’ and ‘Initial’
– Surname: between initial and
M RS CHARLES A JEWELL
MRS FRANCIS B COOI EN
MRS P W ELILSWVORT
MRs HERBERT C ADSWVORTH
MRS HENRY E TAINTOR
MR DANIEl H WELLS
MRS ARTHUR L GOODRICH
Miss JOSEPHINE WHITE
Mss JULIA A GRAVES
Ms H B LANGDON
Miss MARY H ADAMS
Miss ELIZA F Mix
'MRs MIARY C ST )NEC
MIRS AI I ERT H PITKIN

10 Evaluation
Measure and compare (trade-off):
– Precision
– Recall
– Human time
Compare bootstrapping to baselines:
– Simple dictionary matching
– Dictionary + hand-coded patterns (regular expressions matching labels)
– Possibly combining evidence from multiple matching lines in the decision list (e.g. noisy-OR, naïve Bayes).
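Two of the items above lend themselves to a short sketch: the noisy-OR way of combining the confidences of several matching rules, and precision/recall over labeled tokens. The function names and the representation of predictions as sets of (token index, label) pairs are assumptions made for illustration; the sample values are made up and are not results from the talk.

```python
import math


def noisy_or(confidences):
    """Combine rule confidences as 1 - prod(1 - p_i).

    Treats each matching rule as an independent piece of evidence for the label.
    """
    return 1.0 - math.prod(1.0 - p for p in confidences)


def precision_recall(predicted, gold):
    """Precision and recall for sets of (token index, label) pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Two rules agree on a label with confidences 0.9 and 0.95.
combined = noisy_or([0.9, 0.95])
print(round(combined, 3))

# Toy labeling of "MRS HENRY E TAINTOR" where the initial was missed.
gold = {(0, "Prefix"), (1, "Given Name"), (2, "Initial"), (3, "Surname")}
predicted = {(0, "Prefix"), (1, "Given Name"), (3, "Surname")}
scores = precision_recall(predicted, gold)
print(scores)
```

Noisy-OR never lowers the combined confidence below the strongest single rule, so it rewards agreement between rules; a naïve Bayes combination would also let non-matching rules count as evidence against a label.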

11 Questions

