Ontology-Aware Information Extraction
http://gate.ac.uk/
Hamish Cunningham, Kalina Bontcheva
Department of Computer Science, University of Sheffield
OntoWeb 4, SIG 5, 2002

2(12) GATE, a General Architecture for Text Engineering
GATE is…
  – An architecture: a macro-level organisational picture for LE software systems.
  – A framework: for programmers, GATE is an object-oriented class library that implements the architecture.
  – A development environment: for language engineers, computational linguists et al., GATE is a graphical development environment bundled with a set of tools for doing e.g. Information Extraction.
Free software (LGPL).
Mature, robust software (in development since 1995).
Download at http://gate.ac.uk/download
Comes with…
  – Some free components...
  – ...and wrappers for other people's components
  – Tools for: evaluation; visualise/edit; persistence; IR; IE; dialogue; ontologies; etc.

3(12) Applications; languages
GATE has been used for a variety of applications, including:
  – MUMIS: automatic creation of semantic indexes for multimedia programme material
  – MUSE: a multi-genre IE system
  – EMILLE: a 70 million word corpus of Indic languages
  – Metadata for Medline (at Merck)
  – Creation of metadata for Semantic Web Services; documentation using NLG
  – HSE: summarisation of health and safety information from company reports
  – OldBaileyIE: NE recognition on 17th century Old Bailey Court reports
  – AKT: language technology in knowledge management
  – AMITIES: call centre automation
  – Digital libraries / e-philology for ancient languages researchers
  – Various Medical Informatics and database technology projects
  – IE in Romanian, Bulgarian, Greek, Bengali, Spanish, Swedish, German, Italian, and French (Arabic, Chinese and Russian next year)

4(12) Some users…
At the time of writing, a representative fraction of GATE users includes:
  – Longman Pearson publishing, UK
  – BT Exact Technologies, UK
  – Merck KGaA, Germany
  – Canon Europe, UK
  – Knight Ridder (the second biggest US news publisher)
  – BBN Technologies, US
  – Sirma AI Ltd., Bulgaria
  – Resco AB, Sweden/Finland/Germany
  – Glaxo Smith Kline Plc: drug-based navigation of Medline abstracts
  – Master Foods NV: extraction of commodities events from news
  – the American National Corpus project, US
  – Imperial College, London, the University of Manchester, Queen Mary College, UMIST, the University of Karlsruhe, Vassar College, ISI / the University of Southern California and a large number of other UK, US and EU universities
  – the Perseus Digital Library project, Tufts University, US

5(12) Scientific method and HLT
How do we really know that this stuff works?!
Open source systems make experimental repeatability easier and therefore cut down on site-specific skew effects.
GATE's IE tools have competed in MUC, TREC (QA), ACE, and DUC. TIDES Surprise Language exercise next year.
GATE includes markup and automated evaluation tools: easier quantitative evaluation.
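
As an illustration of what automated quantitative evaluation of IE output typically measures, here is a minimal sketch of strict-match precision, recall and F1 between a hand-annotated key and a system response. It is not GATE's evaluation tool: the `Annotation` tuple, the `evaluate` function and the sample offsets are all hypothetical, and only exact matches (same span, same type) count as correct.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    label: str   # annotation type, e.g. "Person"


def evaluate(key: set, response: set) -> dict:
    """Strict-match precision, recall and F1 of `response` against `key`."""
    correct = len(key & response)  # identical span and identical label
    precision = correct / len(response) if response else 0.0
    recall = correct / len(key) if key else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical key (human markup) and response (system output).
key = {
    Annotation(0, 7, "Person"),
    Annotation(20, 29, "Location"),
    Annotation(40, 44, "Date"),
}
response = {
    Annotation(0, 7, "Person"),
    Annotation(20, 29, "Organization"),  # right span, wrong type
}

scores = evaluate(key, response)
print(scores)
```

Here one of two response annotations is correct and one of three key annotations is found, so precision is 0.5 and recall is 1/3. A lenient variant would also give credit for overlapping spans.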

6(12) Collaboration opportunities
Interoperation, integration, not re-invention: collaboration not competition
Take the code, do what you like with it, perhaps contribute something back
Involve us in your 6th Framework projects
Join KITShare: a network of excellence in Knowledge and Interface Tool Sharing.

7(12) The Holy Grail
Problem: gap between many current IE tools and SemWeb needs

8(12) What is needed?
Content, not Information Extraction
  – Identify the ontological reference, not just the class
  – Maintain referential integrity (coreference)
Ontology-aware IE tools
  – Use instances already in the ontology
  – React to changes in the ontology
Support experienced users to change the IE tools

9(12) GATE and Content Extraction
ANNIE: open-source IE system in GATE, providing modules needed for content extraction
Pre-processing
Named entity recognition
Coreference resolution
  – ANNIE handles proper names, pronouns, and nominals
Easy-to-use pattern-action rule language to enable customisation and postprocessing of the IE results
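
To make the idea of a pattern-action rule concrete, here is a minimal sketch in Python. It is not the syntax of GATE's rule language: the `Rule` and `Annotation` classes, the `TitlePerson` rule and the regular expression are hypothetical stand-ins. The pattern side says what to match; the action side says which annotation to create from the match.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class Rule:
    name: str
    pattern: re.Pattern                        # left-hand side: what to match
    action: Callable[[re.Match], Annotation]   # right-hand side: what to create


def title_person(match: re.Match) -> Annotation:
    # Action: annotate the whole match as a Person and record the title.
    return Annotation(match.start(), match.end(), "Person",
                      {"rule": "TitlePerson", "title": match.group("title")})


# Hypothetical rule: a title followed by one or more capitalised words.
RULES: List[Rule] = [
    Rule("TitlePerson",
         re.compile(r"\b(?P<title>Mr|Mrs|Dr|Prof)\.?\s+"
                    r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*"),
         title_person),
]


def apply_rules(text: str, rules: List[Rule]) -> List[Annotation]:
    """Run every rule over the text and collect the annotations it fires."""
    annotations = []
    for rule in rules:
        for match in rule.pattern.finditer(text):
            annotations.append(rule.action(match))
    return sorted(annotations, key=lambda a: a.start)


text = "Dr. Kalina Bontcheva works in Sheffield."
for ann in apply_rules(text, RULES):
    print(ann.type, repr(text[ann.start:ann.end]), ann.features)
```

Customising the results then amounts to editing or adding rules rather than changing the engine, which is the property the slide emphasises.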
10(12) Populating Ontologies with ANNIE

11(12) Ontologies as explicit IE resources
Reuse, not reinvention:
  – Protégé for ontology maintenance
  – Sesame/KAON for storage and reasoning
Ontology-aware gazetteers
  – Provide the ontological class of each entry
  – Use instances from the ontology for IE
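
The two gazetteer points above can be sketched as follows. This is a toy illustration, not GATE's gazetteer implementation: the `ONTOLOGY` dictionary, the `ex:` instance identifiers and the function names are hypothetical. The gazetteer list is generated from the ontology's instances, and every lookup carries both the ontological class and the instance it refers to.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical toy ontology: instance id -> (class, known surface forms).
ONTOLOGY: Dict[str, Tuple[str, List[str]]] = {
    "ex:Sheffield": ("City", ["Sheffield"]),
    "ex:UniversityOfSheffield": (
        "University",
        ["University of Sheffield", "Sheffield University"],
    ),
}


@dataclass(frozen=True)
class Lookup:
    start: int
    end: int
    onto_class: str   # ontological class of the matched entry
    instance: str     # the ontology instance the text refers to


def build_gazetteer(ontology) -> Dict[str, Tuple[str, str]]:
    """Derive the lookup list from the ontology's instances."""
    return {form.lower(): (cls, uri)
            for uri, (cls, forms) in ontology.items()
            for form in forms}


def annotate(text: str, gazetteer) -> List[Lookup]:
    """Case-insensitive substring lookup; the longest match wins on overlap."""
    lowered = text.lower()
    found = []
    for form, (cls, uri) in gazetteer.items():
        pos = lowered.find(form)
        while pos != -1:
            found.append(Lookup(pos, pos + len(form), cls, uri))
            pos = lowered.find(form, pos + 1)
    # Sort by start offset, longer spans first, then drop overlapping lookups.
    found.sort(key=lambda l: (l.start, -(l.end - l.start)))
    result: List[Lookup] = []
    for lookup in found:
        if result and lookup.start < result[-1].end:
            continue
        result.append(lookup)
    return result


text = "She works at the University of Sheffield."
hits = annotate(text, build_gazetteer(ONTOLOGY))
for hit in hits:
    print(text[hit.start:hit.end], hit.onto_class, hit.instance)
```

Because the list is built from the ontology, rebuilding it after an ontology edit is one simple way for new instances to become visible to the lookup step. The sketch ignores word boundaries, which a real gazetteer would need to respect.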

12(12) Ontology-aware IE
The IE tools can use available formal knowledge and reasoning
Ontology-based anaphora resolution
  – G. Bush, G. Brown, the president
The correct ontological classes are assigned to the recognised entities
Changes in the ontology are available to the IE tools
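
The "G. Bush, G. Brown, the president" example can be sketched as follows. This is a toy illustration under stated assumptions, not ANNIE's coreference algorithm: the class hierarchy, the assignment of each instance to a class and the function names are hypothetical. The point is that a nominal such as "the president" is resolved to the most recent antecedent whose ontological class is compatible, rather than simply to the most recent mention.

```python
from typing import Dict, List, Optional

# Hypothetical toy ontology: subclass links and the class of each instance.
SUBCLASS_OF: Dict[str, str] = {
    "President": "Politician",
    "Chancellor": "Politician",
    "Politician": "Person",
}
INSTANCE_CLASS: Dict[str, str] = {
    "ex:GBush": "President",    # assumption made for this example
    "ex:GBrown": "Chancellor",  # assumption made for this example
}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if `cls` equals `ancestor` or is one of its subclasses."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def resolve_nominal(head_class: str, antecedents: List[str]) -> Optional[str]:
    """Pick the most recent antecedent whose class fits the nominal's head."""
    for uri in reversed(antecedents):
        if is_a(INSTANCE_CLASS[uri], head_class):
            return uri
    return None


# Mentions in text order: "G. Bush ... G. Brown ... the president".
mentions = ["ex:GBush", "ex:GBrown"]

print(resolve_nominal("President", mentions))   # class check skips G. Brown
print(resolve_nominal("Politician", mentions))  # both fit; most recent wins
```

A purely recency-based resolver would link "the president" to G. Brown. Consulting the ontology rules that candidate out, and an edit to `INSTANCE_CLASS` changes the result without touching the resolver.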