MinorThird
University of Seoul, Artificial Intelligence Laboratory
Kwak Byeol-saem (곽별샘)
MinorThird
- A collection of Java classes for
  - Storing text
  - Annotating text
  - Learning to extract entities
  - Categorizing text
What's Different About MinorThird
- Differs from existing NLP and learning toolkits
  - Combines tools for annotating and visualizing text with state-of-the-art learning methods
  - Contains methods to visualize both training data and the performance of classifiers
    - Facilitates debugging
  - Integrated with text manipulation tools
    - Possible to track and visualize the transformation of text data into machine learning data
  - Architected to support active learning and on-line learning
    - Should facilitate integration of learning methods into agents
Components
- TextBase
  - A collection of documents
- TextLabels
  - Logical assertions about documents in a TextBase
  - A type of stand-off annotation
    - The annotations are completely independent of the text
  - Assert a category or property for a word, a document, or a subsequence of words (a span)
  - Assertions can be made by human labelers or by a learned program
  - Can encode
    - Syntactic properties, such as shallow-parser output or POS tags
    - Semantic properties, such as the functional role that entities play in a sentence
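The stand-off idea above can be sketched in a few lines of plain Java. This is an illustration only, not MinorThird's API: the classes `SimpleTextBase`, `SimpleTextLabels`, and `Span`, along with their methods and the sample sentence, are hypothetical names chosen for this sketch. The point it shows is that the documents are stored once and never modified, while every label is a separate assertion that refers to the text by document id and token offsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of stand-off annotation; these are NOT MinorThird classes.
class StandoffSketch {

    /** A contiguous run of tokens inside one document. */
    static final class Span {
        final String docId;
        final int start;
        final int length;

        Span(String docId, int start, int length) {
            this.docId = docId;
            this.start = start;
            this.length = length;
        }
    }

    /** A collection of documents, each stored as an array of tokens. */
    static final class SimpleTextBase {
        private final Map<String, String[]> docs = new HashMap<>();

        void loadDocument(String id, String text) {
            docs.put(id, text.trim().split("\\s+"));
        }

        /** Recover the text a span points at; the stored document is untouched. */
        String text(Span s) {
            String[] tokens = docs.get(s.docId);
            return String.join(" ", Arrays.copyOfRange(tokens, s.start, s.start + s.length));
        }
    }

    /** Assertions about spans, kept entirely apart from the text itself. */
    static final class SimpleTextLabels {
        private final Map<String, List<Span>> byType = new HashMap<>();

        void addToType(Span s, String type) {
            byType.computeIfAbsent(type, k -> new ArrayList<>()).add(s);
        }

        List<Span> spansOfType(String type) {
            return byType.getOrDefault(type, Collections.emptyList());
        }
    }

    public static void main(String[] args) {
        SimpleTextBase base = new SimpleTextBase();
        base.loadDocument("doc1", "Alice Smith visited Seoul in May");

        SimpleTextLabels labels = new SimpleTextLabels();
        labels.addToType(new Span("doc1", 0, 2), "person");     // a subsequence of words
        labels.addToType(new Span("doc1", 0, 6), "travelNote"); // the whole document

        for (Span s : labels.spansOfType("person")) {
            System.out.println(base.text(s)); // prints "Alice Smith"
        }
    }
}
```

Because the labels live outside the text, a human labeler and a learned program could each write to their own label set over the same documents, and the two sets could then be compared span by span.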
Components
- Repository
  - Annotated TextBases are accessed in a single uniform way, even though they may be stored under one of several schemes
  - A Repository can be configured to hold a set of TextLabels and their associated TextBases
- Mixup (Minorthird Information eXtraction and Understanding Program)
  - A special-purpose annotation language
  - Moderately complex hand-coded annotation programs can be implemented in Mixup
  - Based on the widely used notion of cascaded finite-state transducers
  - Includes some powerful features
    - A GUI debugging environment
    - Escape to Java
    - A kind of subroutine call mechanism
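To make the "cascaded" idea concrete, here is a minimal sketch in plain Java rather than in Mixup syntax. It is not how Mixup is implemented, and every name in it (`CascadeSketch`, `tagCapitalized`, `tagNames`, the title list) is an assumption made for illustration. What it shows is the general pattern: each stage makes a simple pass over the tokens, and a later stage builds on the labels produced by an earlier one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-stage annotation cascade; illustrative only, not Mixup.
class CascadeSketch {

    /** Stage 1: label each token as capitalized or not. */
    static boolean[] tagCapitalized(String[] tokens) {
        boolean[] cap = new boolean[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            cap[i] = !tokens[i].isEmpty() && Character.isUpperCase(tokens[i].charAt(0));
        }
        return cap;
    }

    /** Assumed title vocabulary for this sketch. */
    static boolean isTitle(String token) {
        return token.equals("Dr.") || token.equals("Prof.");
    }

    /**
     * Stage 2: using the stage 1 labels, return {start, end} token offsets
     * (end exclusive) for each run of capitalized tokens that follows a title.
     */
    static List<int[]> tagNames(String[] tokens, boolean[] cap) {
        List<int[]> names = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (isTitle(tokens[i])) {
                int start = i + 1;
                int end = start;
                while (end < tokens.length && cap[end]) {
                    end++;
                }
                if (end > start) {
                    names.add(new int[] {start, end});
                }
                i = Math.max(end, i + 1); // resume scanning after the name
            } else {
                i++;
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String[] tokens = "Yesterday Dr. Jane Doe met Prof. Kim".split(" ");
        boolean[] cap = tagCapitalized(tokens);
        for (int[] span : tagNames(tokens, cap)) {
            StringBuilder sb = new StringBuilder();
            for (int j = span[0]; j < span[1]; j++) {
                if (j > span[0]) sb.append(' ');
                sb.append(tokens[j]);
            }
            System.out.println(sb); // prints "Jane Doe", then "Kim"
        }
    }
}
```

Keeping each stage small makes the intermediate labels easy to inspect, which is the kind of step-by-step checking a debugging environment for an annotation language is meant to support.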