Julius Information Extractor

June 14, 2006
Kyle Woodward, Lee-Ming Zen

The Problem
There is a lot of text and information out there, but not much of it is tagged. How can we extract the information a user is interested in without knowing anything beforehand?

Approach
Based upon the AT&T system:
  Build up “spelling” and “context” rules
  Iteratively learn new rules: label the data, examine the labels, and jump from one set of rules to the other
Additional features:
  A fixed-length prefix and suffix to augment the context
  POS tags substituted for a full grammar parse as context
  Window bounds selection to determine tag size
Web:
  Use information from web search snippets
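The slides give no pseudocode for the iterative step, so the following is only a minimal sketch of how jumping between the two rule sets could work. It assumes each example carries two feature sets ("spelling" and "context"), that a rule maps a single feature to a label, and that a rule is kept when its precision on the currently labeled data passes a threshold. The function names, the `min_precision` parameter, and feature strings such as `prev=works_at` are all hypothetical, not taken from the project.

```python
from collections import Counter, defaultdict

def learn_rules(examples, labels, view, min_precision=0.9):
    """Induce feature -> label rules for one view from the current labels."""
    counts = defaultdict(Counter)
    for ex_id, label in labels.items():
        for feat in examples[ex_id][view]:
            counts[feat][label] += 1
    rules = {}
    for feat, by_label in counts.items():
        label, hits = by_label.most_common(1)[0]
        # Keep the rule only if the feature predicts one label reliably.
        if hits / sum(by_label.values()) >= min_precision:
            rules[feat] = label
    return rules

def apply_rules(examples, rules, view):
    """Label every example that matches at least one rule (majority vote)."""
    labels = {}
    for ex_id, ex in enumerate(examples):
        votes = Counter(rules[f] for f in ex[view] if f in rules)
        if votes:
            labels[ex_id] = votes.most_common(1)[0][0]
    return labels

def cotrain(examples, seed_spelling_rules, rounds=5):
    """Alternate between the spelling view and the context view."""
    spelling_rules = dict(seed_spelling_rules)
    context_rules = {}
    for _ in range(rounds):
        # Spelling rules label the data; context rules are learned from it.
        labels = apply_rules(examples, spelling_rules, "spelling")
        context_rules = learn_rules(examples, labels, "context")
        # Jump to the other set: context rules label, spelling rules learn.
        labels = apply_rules(examples, context_rules, "context")
        for feat, label in learn_rules(examples, labels, "spelling").items():
            spelling_rules.setdefault(feat, label)  # seed rules are never overwritten
    return spelling_rules, context_rules

# Toy data with made-up features, for illustration only.
examples = [
    {"spelling": {"contains=Inc."}, "context": {"prev=works_at"}},
    {"spelling": {"full-string=Acme"}, "context": {"prev=works_at"}},
    {"spelling": {"contains=Mr."}, "context": {"next=said"}},
    {"spelling": {"full-string=Jones"}, "context": {"next=said"}},
]
seeds = {"contains=Inc.": "organization", "contains=Mr.": "person"}
spelling_rules, context_rules = cotrain(examples, seeds)
```

On this toy data, the two seed rules label two examples, the context view generalizes from them, and the spelling view then picks up rules for the strings the seeds did not cover.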

Rules
A rule is a set of features for a particular labeling, with a weight for each feature.
Example features: allcap, contains, full-string, etc.
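As a rough illustration of weighted features, the sketch below extracts the three features named on the slide from a candidate string and scores each label. The slides do not say how weights are combined, so summing them is an assumption here, and the function names, labels, and weight values are invented for the example.

```python
def spelling_features(candidate: str) -> set[str]:
    """Extract allcap, contains, and full-string features from a candidate."""
    features = {f"full-string={candidate}"}
    if candidate.isupper():
        features.add("allcap")
    for token in candidate.split():
        features.add(f"contains={token}")
    return features

# Hypothetical rule set: label -> {feature: weight}. Weights are made up.
rules = {
    "organization": {"allcap": 0.9, "contains=Inc.": 0.8},
    "person": {"contains=Mr.": 0.95},
}

def score(features, rules):
    """Sum the weights of the matching features for each label (assumed scheme)."""
    return {
        label: sum(weights.get(f, 0.0) for f in features)
        for label, weights in rules.items()
    }

def best_label(candidate, rules):
    """Return the highest-scoring label, or None if no feature matches."""
    totals = score(spelling_features(candidate), rules)
    label = max(totals, key=totals.get)
    return label if totals[label] > 0 else None
```

Calling `best_label("IBM", rules)` fires the `allcap` feature, while a lowercase string that matches no weighted feature yields `None`.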

What’s Cool
Generality:
  No restrictions on the type of data it runs against
  No preassumed notions about the domain
GUI tools:
  Labeler
  Statistics viewer
Works:
  Works well on small data sets

What’s Not
Fails on larger corpora
The generality tradeoff means not being able to exploit certain information
Web context does not necessarily help, due to noise
