Natural Language Processing for Underground Communications
Dan Klein, MURI Kickoff, 11/20/2009


1 Natural Language Processing for Underground Communications (Dan Klein, MURI Kickoff, 11/20/2009)

2 Underground Communications Example Data

3 Underground Communications Example Data, Manual Extraction

4 Processing: Information Extraction

5 Observation Graphs
   http://www.spam-reklama.ru/contact.html
   http://www.rossmail.ru/offline.htm
   http://www.fax-reklama.ru/contact.html
   http://www.f-mail.ru/kontact/

6 Underlying Entities and Relations (Extraction Goal)
   Person 1211 (Alias: Steakcap; ICQ: 598199837; Location: France)
   Referral (From: Person 2133; To: Person 1211; Product: 3319)
   Person 2133 (Alias: Thunderelvi; ICQ: 787659871; Location: USA)
   Product 3319 (Type: FB Harvester; Contact: 709-324-0989)
   Person 9876 (Alias: Zakar; ICQ: 234150301; Email: zakar@e-...)
   Employee (Person: Person 9876; Product: 5621; Role: Developer)
   Product 5621 (Type: Spam Sender; Contact: 495-210-4423)
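The extraction target on slide 6 is essentially a small typed database of entities and relations. As a minimal sketch, assuming Python dataclasses as the representation (the class names `Person`, `Product`, `Referral`, `Employee`, the field names, and the helper `developers_of` are all hypothetical, not from the talk), the example could be stored and queried like this:

```python
from dataclasses import dataclass
from typing import List, Optional, Union

# Hypothetical record types mirroring the boxes on slide 6.
# Class and field names are illustrative, not taken from the talk.

@dataclass
class Person:
    pid: int
    alias: str
    icq: str
    location: Optional[str] = None
    email: Optional[str] = None  # truncated on the slide, so left unset below

@dataclass
class Product:
    pid: int
    kind: str  # "Type" on the slide
    contact: str

@dataclass
class Referral:
    """Relation: one person refers another person to a product."""
    from_person: int
    to_person: int
    product: int

@dataclass
class Employee:
    """Relation: a person works on a product in some role."""
    person: int
    product: int
    role: str

people = {
    1211: Person(1211, "Steakcap", "598199837", location="France"),
    2133: Person(2133, "Thunderelvi", "787659871", location="USA"),
    9876: Person(9876, "Zakar", "234150301"),
}
products = {
    3319: Product(3319, "FB Harvester", "709-324-0989"),
    5621: Product(5621, "Spam Sender", "495-210-4423"),
}
relations: List[Union[Referral, Employee]] = [
    Referral(from_person=2133, to_person=1211, product=3319),
    Employee(person=9876, product=5621, role="Developer"),
]

def developers_of(kind: str) -> List[str]:
    """Aliases of people holding a Developer role on a product of this kind."""
    return [
        people[r.person].alias
        for r in relations
        if isinstance(r, Employee)
        and r.role == "Developer"
        and products[r.product].kind == kind
    ]

print(developers_of("Spam Sender"))  # ['Zakar']
```

Keeping relations as separate records that point at entity IDs, rather than nesting them inside entities, means that merging two duplicate `Person` records later only requires rewriting IDs.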

7 Existing NLP Tasks

8 Discourse Structure: sign, deliver, vote

9 General Approach

10

11

12

13

14 An Entity Reference Model (Our Existing Approach)

15

16

17 Adding Semantic Knowledge (Our Current Work)
   Example: America Online, company

18 Does it Work? Evaluation: Reference
   [Chart: MUC F1 (cluster similarity) for an unsupervised baseline, Bengtson & Roth 08 (supervised), and preliminary current work (unsupervised)]
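The chart scores systems with MUC F1, but its values did not survive extraction. For reference, the standard MUC link-based metric (Vilain et al., 1995) can be sketched as follows. This is a minimal illustration of the textbook definition on toy mention IDs (the function names are hypothetical), not the scorer used to produce the slide's results.

```python
from typing import Hashable, List, Set

Cluster = Set[Hashable]

def muc_recall(key: List[Cluster], response: List[Cluster]) -> float:
    """MUC link-based recall (Vilain et al., 1995).

    Each key cluster K needs |K| - 1 links to connect its mentions; the
    response recovers |K| - |p(K)| of them, where p(K) is the partition of K
    induced by the response clusters. Mentions absent from the response
    count as singleton parts.
    """
    # Map each mention to the index of the response cluster containing it.
    owner = {m: i for i, cluster in enumerate(response) for m in cluster}
    found = needed = 0
    for k in key:
        # Distinct parts of K; a missing mention gets its own unique part.
        parts = {owner.get(m, ("missing", m)) for m in k}
        found += len(k) - len(parts)
        needed += len(k) - 1
    # Convention: score 0.0 when the key contains no links at all.
    return found / needed if needed else 0.0

def muc_f1(key: List[Cluster], response: List[Cluster]) -> float:
    """Harmonic mean of MUC precision and recall."""
    r = muc_recall(key, response)
    p = muc_recall(response, key)  # precision is recall with roles swapped
    return 2 * p * r / (p + r) if p + r else 0.0

key = [{1, 2, 3}, {4, 5}]       # gold entities
response = [{1, 2}, {3, 4, 5}]  # system clusters
print(round(muc_f1(key, response), 3))  # 0.667
```

Here recall and precision are both 2/3: the response recovers two of the three gold links, and two of its own three links are correct.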

19 Cross-Document Identity (What’s Coming Up)

20 Extracting Global Entities

21 Underlying Entities and Relations (Subsequent Goals)
   (Same entity and relation diagram as slide 6.)

22 Summary
   - Goal: systems that simultaneously extract and dedupe
   - Train in an unsupervised (discovery) manner
   - Requires: both new statistical machinery and good models of the underlying domain structure (transactions, etc.)
   - Requires: processing domain-specific language (domain adaptation, grammar induction)
   - Evaluation: are the entities and relations correct?
   - First steps: measure the general approach on newswire and similar data, where the right answers are known
   - Also: evaluate on underground network data
   - Near term: more accurate identity resolution, initial extraction of simple relations, better basic analysis
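The "extract and dedupe" goal includes merging records that refer to the same underlying actor. The talk calls for unsupervised statistical models for this; as a much simpler stand-in (an assumption for illustration, not the talk's method), the sketch below merges any records that share an exact identifier, such as an ICQ number or phone number, using union-find. The function `merge_by_identifiers`, the record IDs, and the identifier values are hypothetical placeholders.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set

class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)  # register unseen items as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def merge_by_identifiers(records: Dict[str, Set[str]]) -> List[List[str]]:
    """Group record IDs that share any identifier (transitively)."""
    uf = UnionFind()
    first_seen: Dict[str, str] = {}  # identifier -> first record using it
    for rid, idents in records.items():
        uf.find(rid)
        for ident in idents:
            if ident in first_seen:
                uf.union(first_seen[ident], rid)
            else:
                first_seen[ident] = rid
    groups: Dict[Hashable, Set[str]] = defaultdict(set)
    for rid in records:
        groups[uf.find(rid)].add(rid)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical posts with placeholder identifiers.
posts = {
    "post_a": {"icq:111111111"},
    "post_b": {"icq:111111111", "tel:555-0100"},
    "post_c": {"tel:555-0100"},
    "post_d": {"icq:222222222"},
}
print(merge_by_identifiers(posts))
# [['post_a', 'post_b', 'post_c'], ['post_d']]
```

`post_a` and `post_c` share no identifier directly but end up together through `post_b`. Exact-match merging is brittle: a shared phone number can wrongly merge distinct actors, and a changed alias or number can split one actor into two.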

23 Thanks!

