Information Extraction from Event Announcements
  Student: Jianwei Lu
  Supervisor: Robert Dale
Agenda
  Project Introduction
  Event Information Extractor
  Conclusion
Background
  What is Information Extraction (IE)?
    Automated extraction of key information
    Populate a database
  Why is it significant?
    Manage and search data efficiently
    Aim for other target applications
  For more information: [Cowie J and Wilks Y, n.d.]
The Outcomes
  Title
  URL
Sample Data
  Corpus 1 – 30 documents
  Corpus 2 – 100 documents
  Corpus 3 – 1,500 documents
Agenda
  Project Introduction
  Event Information Extractor
  Conclusion
My System Architecture
Text Zoning
URL Finding Rules
  Use a pattern to capture URLs
  Approaches for finding an event URL
    1. Search the Summary zone
    2. Search the whole document
  Results
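The URL rules above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the regular expression, the function names `find_urls` and `find_event_url`, and the choice to try the Summary zone first and fall back to the whole document are all assumptions made for the example.

```python
import re

# Hypothetical pattern: matches http(s):// URLs and bare www. URLs.
URL_PATTERN = re.compile(r"(?:https?://|www\.)[^\s<>()]+", re.IGNORECASE)


def find_urls(text):
    """Return URL-like strings found in text, trimming trailing punctuation."""
    return [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]


def find_event_url(summary_zone, document):
    """Assumed strategy: prefer the Summary zone, then the whole document."""
    for zone in (summary_zone, document):
        urls = find_urls(zone)
        if urls:
            return urls[0]
    return None
```

Taking the first match in each zone is a simplification; a fuller version might rank candidate URLs by how close they sit to the event title.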
Date Finding Rules
  Use a pattern to capture dates
  Use clues to find the corresponding date
    1. submission-date < start-date <= end-date
    2. no submission-date in a “Call for Participation” announcement
    3. etc.
  Results
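A rough sketch of the two date steps above: capture candidate dates with a pattern, then check a candidate assignment against rules 1 and 2. The date formats handled ("15 March 2010" and "March 15, 2010"), the names `find_dates` and `is_consistent`, and the `call_for_participation` flag are assumptions for illustration; the original system's patterns and clues are not shown in the slides.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Hypothetical pattern covering "15 March 2010" and "March 15, 2010".
DATE_PATTERN = re.compile(
    r"\b(?:(\d{1,2})\s+([A-Za-z]+)|([A-Za-z]+)\s+(\d{1,2}),?)\s+(\d{4})\b")


def find_dates(text):
    """Return the dates found in text, in order of appearance."""
    found = []
    for day1, month1, month2, day2, year in DATE_PATTERN.findall(text):
        day, month = (day1, month1) if day1 else (day2, month2)
        month_number = MONTHS.get(month.lower())
        if month_number is None:
            continue  # the word before/after the number was not a month name
        try:
            found.append(date(int(year), month_number, int(day)))
        except ValueError:
            continue  # impossible calendar date such as 31 February
    return found


def is_consistent(submission, start, end, call_for_participation=False):
    """Check rule 1 (submission < start <= end) and rule 2 (no submission
    date in a Call for Participation). submission may be None."""
    if start > end:
        return False
    if submission is None:
        return True
    if call_for_participation:
        return False
    return submission < start
```

A candidate assignment that fails `is_consistent` could be discarded in favour of another labelling of the same captured dates.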
Location Finding Rules
  Tokenise lines into words
  Use a gazetteer to capture locations
  Results
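The tokenise-then-look-up idea above might look like the sketch below. The tiny `GAZETTEER` set, the regex tokeniser (used here in place of NLTK so the example is self-contained), and the longest-match-first strategy for multi-word names are all assumptions for illustration.

```python
import re

# Tiny hypothetical gazetteer; a real one would hold many thousands of names.
GAZETTEER = {"sydney", "australia", "new york", "hong kong", "beijing"}
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def tokenise(line):
    """Split a line into alphabetic word tokens."""
    return re.findall(r"[A-Za-z]+", line)


def find_locations(line):
    """Return gazetteer names found in line, trying longer names first."""
    tokens = tokenise(line)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate.lower() in GAZETTEER:
                found.append(candidate)
                i += n  # skip past the matched name
                break
        else:
            i += 1  # no name starts at this token
    return found
```

Trying the longest candidate first keeps "New York" from being missed when only the two-word form is in the gazetteer.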
Title Finding Rules
Title Finding Rules (cont’d)
  Apply machine learning to classify title lines
  Refine the title after classification
  Results
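Classifying title lines requires turning each line into features first. The sketch below shows only that feature-extraction step; the features themselves (position, length, capitalisation, event keywords, year) and the names `EVENT_KEYWORDS` and `line_features` are hypothetical, since the slides do not list the features actually used. The resulting feature values would then be passed to a classifier such as an SVM.

```python
import re

# Hypothetical keyword list; not taken from the original system.
EVENT_KEYWORDS = {"conference", "workshop", "symposium", "seminar"}


def line_features(line, index, total_lines):
    """Describe one line of an announcement as a dictionary of features."""
    words = re.findall(r"[A-Za-z]+", line)
    word_count = len(words)
    capitalised = sum(1 for word in words if word[0].isupper())
    return {
        # Titles tend to appear early, so record where the line sits.
        "relative_position": index / max(total_lines, 1),
        "word_count": word_count,
        "capitalised_ratio": capitalised / word_count if word_count else 0.0,
        "has_event_keyword": any(w.lower() in EVENT_KEYWORDS for w in words),
        "has_year": bool(re.search(r"\b(?:19|20)\d{2}\b", line)),
    }
```

Each line of a document would be mapped this way, with lines labelled as title or non-title in the training corpus.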
Current Performance
Agenda
  Project Introduction
  Event Information Extractor
  Conclusion
What I Have Achieved
  Modules for Information Extraction
    URL
    Dates
    Locations
    Title
  Evaluation Framework
Limitations and Future Work
  Extension for refining titles
  Comparison for titles
  Comprehensive study on SVM tool and features used for machine learning
Implementation Details
  Python 2.6
  Gazetteer from
  Support Vector Machine
  Natural Language Toolkit (NLTK)
Questions?