Published by Amice Hood. Modified over 9 years ago.
1
Natural Language Processing for Underground Communications
Dan Klein
MURI Kickoff, 11/20/2009
2
Underground Communications Example Data
3
Underground Communications Example Data, Manual Extraction
4
Processing: Information Extraction
5
Observation Graphs
http://www.spam-reklama.ru/contact.html
http://www.rossmail.ru/offline.htm
http://www.fax-reklama.ru/contact.html
http://www.f-mail.ru/kontact/
6
Underlying Entities and Relations (Extraction Goal)
Person 1211: Alias: Steakcap; ICQ: 598199837; Location: France
Referral: From: Person 2133; To: Person 1211; Product: 3319
Person 2133: Alias: Thunderelvi; ICQ: 787659871; Location: USA
Product 3319: Type: FB Harvester; Contact: 709-324-0989
Person 9876: Alias: Zakar; ICQ: 234150301; Email: zakar@e-...
Employee: Person: Person 9876; Product: 5621; Role: Developer
Product 5621: Type: Spam Sender; Contact: 495-210-4423
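The records on this slide can be read as a small typed schema: entities (Person, Product) plus relations (Referral, Employee) that point at entity ids. The sketch below is one possible encoding, not the project's actual data model; the class names, field names, and the helper `products_linked_to` are assumptions made for illustration, and the truncated email is kept exactly as it appears on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    person_id: int
    alias: str
    icq: str
    location: Optional[str] = None
    email: Optional[str] = None


@dataclass(frozen=True)
class Product:
    product_id: int
    type: str
    contact: str


@dataclass(frozen=True)
class Referral:
    from_person: int
    to_person: int
    product: int


@dataclass(frozen=True)
class Employee:
    person: int
    product: int
    role: str


# Values copied from the slide; the email is truncated in the source.
people = [
    Person(1211, "Steakcap", "598199837", location="France"),
    Person(2133, "Thunderelvi", "787659871", location="USA"),
    Person(9876, "Zakar", "234150301", email="zakar@e-..."),
]
products = [
    Product(3319, "FB Harvester", "709-324-0989"),
    Product(5621, "Spam Sender", "495-210-4423"),
]
referrals = [Referral(from_person=2133, to_person=1211, product=3319)]
employees = [Employee(person=9876, product=5621, role="Developer")]


def products_linked_to(person_id: int) -> list:
    """Hypothetical helper: product ids tied to a person by any relation."""
    linked = set()
    for r in referrals:
        if person_id in (r.from_person, r.to_person):
            linked.add(r.product)
    for e in employees:
        if e.person == person_id:
            linked.add(e.product)
    return sorted(linked)
```

Keeping relations as separate records that reference ids, rather than nesting them inside a person, mirrors how the slide lists Referral and Employee alongside the entities.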
7
Existing NLP Tasks
8
Discourse Structure: sign, deliver, vote
9
General Approach
14
An Entity Reference Model (Our Existing Approach)
17
Adding Semantic Knowledge (Our Current Work)
Example labels: America Online, company
18
Evaluation: Reference (Does it Work?)
[Chart: MUC F1 - Cluster Similarity. Systems compared: Baseline (unsupervised), Bengtson & Roth 08 (supervised), Preliminary Current Work (unsupervised).]
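The slide names MUC F1 as the reference-resolution metric without spelling it out. As a reminder of what it measures, here is a minimal sketch of the standard link-based MUC score: recall counts how many links within each gold cluster survive in the system output, precision is the same computation with the roles swapped. The function names and the toy mention ids are assumptions for illustration, and this is not the evaluation script used for the slide's chart.

```python
def muc_side(key, response):
    """Link-based MUC recall of `response` measured against `key`.

    Both arguments are lists of sets of mention ids.
    """
    mention_to_cluster = {m: i for i, c in enumerate(response) for m in c}
    numerator = 0
    denominator = 0
    for cluster in key:
        partitions = set()
        unaligned = 0  # mentions missing from the response count as singletons
        for mention in cluster:
            if mention in mention_to_cluster:
                partitions.add(mention_to_cluster[mention])
            else:
                unaligned += 1
        numerator += len(cluster) - (len(partitions) + unaligned)
        denominator += len(cluster) - 1
    return numerator / denominator if denominator else 0.0


def muc_f1(gold, predicted):
    """Harmonic mean of MUC precision and recall."""
    recall = muc_side(gold, predicted)
    precision = muc_side(predicted, gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical mention ids.
gold = [{"a", "b", "c"}, {"d", "e"}]
predicted = [{"a", "b"}, {"c", "d", "e"}]
score = muc_f1(gold, predicted)
```

In the toy example the system splits one gold cluster and wrongly attaches the stray mention to another, so both precision and recall lose one link out of three.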
19
Cross-Document Identity (What’s Coming Up)
20
Extracting Global Entities
21
Underlying Entities and Relations (Subsequent Goals)
Person 1211: Alias: Steakcap; ICQ: 598199837; Location: France
Referral: From: Person 2133; To: Person 1211; Product: 3319
Person 2133: Alias: Thunderelvi; ICQ: 787659871; Location: USA
Product 3319: Type: FB Harvester; Contact: 709-324-0989
Person 9876: Alias: Zakar; ICQ: 234150301; Email: zakar@e-...
Employee: Person: Person 9876; Product: 5621; Role: Developer
Product 5621: Type: Spam Sender; Contact: 495-210-4423
22
Summary
Goal: systems which simultaneously extract and dedupe
Train in an unsupervised / discovery manner
Requires: both new statistical machinery and good models of underlying domain structure (transactions, etc)
Requires: processing domain-specific language (domain adaptation, grammar induction)
Evaluation: are the entities and relations correct?
First steps: measure general approach on newswire, etc. where we know the right answers
Also: evaluate on underground network data
Near term: increased accuracy in identity resolution, begin to extract simple relations, better basic analysis
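To make the "dedupe" half of the goal concrete, the sketch below merges extracted mentions that share an exact alias or ICQ number, using a union-find structure. This is a deliberately simple deterministic stand-in, not the unsupervised statistical model the talk proposes; the function name `dedupe`, the dictionary-shaped mentions, and the choice of matching fields are all assumptions for illustration.

```python
from collections import defaultdict


def dedupe(mentions, fields=("alias", "icq")):
    """Group mention indices that share an exact value in any of `fields`."""
    parent = list(range(len(mentions)))

    def find(i):
        # Path halving keeps the trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # (field, value) -> index of first mention carrying it
    for i, mention in enumerate(mentions):
        for field in fields:
            value = mention.get(field)
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    clusters = defaultdict(list)
    for i in range(len(mentions)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Hypothetical extracted mentions reusing identifiers from the slides.
mentions = [
    {"alias": "Steakcap", "icq": "598199837"},
    {"alias": "Zakar", "icq": "234150301"},
    {"alias": "Steakcap"},
    {"icq": "234150301", "email": "zakar@e-..."},
]
clusters = dedupe(mentions)
```

Exact matching breaks down as soon as aliases vary in spelling or an identifier is shared by several people, which is one reason a learned model of identity is needed for the underground data.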
23
Thanks!