Data Mining and Text Analytics
GATE, by Joel Bywater
Introduction
- Developed at the University of Sheffield in 1995
- Now used worldwide by scientists, teachers and companies for natural language processing tasks
- Currently handles 12 languages
- Written in Java
ture_for_Text_Engineering
What is it?
“General Architecture for Text Engineering (GATE) is a development environment for writing software that can process human-language text. In particular, GATE is used for computational language processing and text mining.”
al-Architecture-for-Text-Engineering-GATE
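To make "software that can process human-language text" concrete: GATE records processing results as standoff annotations, where each annotation has start and end offsets into the unchanged document text, a type, and a set of features, and an application is a pipeline of processing steps that each add annotations. The sketch below is a toy Python model of that idea, not GATE's Java API. `Annotation`, `Document`, `tokeniser`, `gazetteer` and `run_pipeline` are hypothetical names chosen for illustration, and the regex tokeniser is far simpler than anything GATE ships.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    # Standoff annotation: offsets point into the document text, which is never edited
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def covered(self, ann: Annotation) -> str:
        # Text spanned by an annotation
        return self.text[ann.start:ann.end]


def tokeniser(doc: Document) -> None:
    # Hypothetical step: add one "Token" annotation per word or punctuation mark
    for m in re.finditer(r"(\w+)|([^\w\s])", doc.text):
        kind = "word" if m.group(1) is not None else "punctuation"
        doc.annotations.append(Annotation(m.start(), m.end(), "Token", {"kind": kind}))


def gazetteer(doc: Document, entries: set, major_type: str) -> None:
    # Hypothetical step: add a "Lookup" annotation over each Token found in a word list
    tokens = [a for a in doc.annotations if a.type == "Token"]
    for tok in tokens:
        if doc.covered(tok) in entries:
            doc.annotations.append(
                Annotation(tok.start, tok.end, "Lookup", {"majorType": major_type})
            )


def run_pipeline(doc: Document, steps) -> Document:
    # Run each step in order; later steps can read annotations added by earlier ones
    for step in steps:
        step(doc)
    return doc


doc = run_pipeline(
    Document("GATE was developed at Sheffield."),
    [tokeniser, lambda d: gazetteer(d, {"Sheffield"}, "location")],
)
lookups = [doc.covered(a) for a in doc.annotations if a.type == "Lookup"]
print(lookups)  # ['Sheffield']
```

Keeping the text untouched and storing offsets means several tools can annotate the same document independently, and their layers of annotations can be compared or combined afterwards.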
Types
- GATE Teamware: a web-based management system for semi-automatic and manual annotation of text collections
- GATE Developer: a development environment providing tools for processing human language
- GATE Mimir: handles storage in the form of an index, which is used for search
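The idea behind an index "used for search" can be shown with a minimal sketch, assuming a plain in-memory inverted index keyed on (annotation type, covered text) that maps to the documents containing it. This is not GATE Mimir's actual index format or query language; `build_index`, `search` and the sample corpus are hypothetical and exist only to illustrate the principle.

```python
from collections import defaultdict


def build_index(corpus):
    """Toy inverted index over (doc_id, text, annotations) records.

    annotations is a list of (start, end, type) tuples with character offsets.
    """
    index = defaultdict(set)
    for doc_id, text, annotations in corpus:
        for start, end, ann_type in annotations:
            # Key on the annotation type plus the lower-cased text it covers
            index[(ann_type, text[start:end].lower())].add(doc_id)
    return index


def search(index, ann_type, value):
    # Sorted ids of documents with an ann_type annotation covering value
    return sorted(index.get((ann_type, value.lower()), set()))


corpus = [
    ("d1", "GATE was developed in Sheffield.", [(22, 31, "Location")]),
    ("d2", "Sheffield is in England.", [(0, 9, "Location"), (16, 23, "Location")]),
    ("d3", "Java is a programming language.", []),
]
index = build_index(corpus)
print(search(index, "Location", "sheffield"))  # ['d1', 'd2']
```

Building the index once means a query is a dictionary lookup rather than a scan of every document, which is what makes searching a large annotated collection practical.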
GATE Graphical User Interface
Uses
- Scalable: supports a wide range of potential tasks
- Currently able to annotate 12 languages
- Open-source software
- Includes plug-ins supporting applications such as WEKA
- All-in-one: a set of tools within a single package
Drawbacks?
- New users may find the visual interface complex
- Does not support annotation of all languages
- Frequent updates: users must become familiar with each new version, which may be tedious
Summary
So, what have we learned?
- An introduction to GATE
- The various types that are available
- An example of the interface
- Positives and negatives