GATE: an AKT success story [GATE: open source language technology component architecture and many tools, with a number of AKT roles]

1 GATE: an AKT success story [GATE: open source language technology component architecture and many tools, with a number of AKT roles]
http://gate.ac.uk/ http://nlp.shef.ac.uk/
Hamish Cunningham, Kalina Bontcheva, Yorick Wilks
Southampton, January 2004
1. New GATE-related projects
2. Current state of the system
3. Future plans

2 New projects
– SEKT: €9m Integrated Project (IP) with BT, AIFB, JSI, Empolis, SAI, OntoPrise, ISOCO, UB, Kea-Pro
– PrestoSpace: €9m IP with BBC, RAI, ORF, INA, ...: preservation of audio-visual media
– KnowledgeWeb: Network of Excellence (NoE), successor to OntoWeb
– ETCSL: GATE for humanities scholars
– hTechSight: petrochemical technology oversight
– SWAN: large-scale semantic annotation

3 [Diagram: technologies linking Human Language and Formal Knowledge (ontologies and instance bases): (MI)IE, CLIE, (M)NLG, Controlled Language, OBIE. Context: Semantic Web; Semantic Grid; Semantic Web Services]
Key:
– MNLG: Multilingual Natural Language Generation
– OBIE: Ontology-Based Information Extraction
– (MI)IE: Mixed-Initiative IE
– CLIE: Controlled Language IE
SEKT: large-scale DM + robust HLT for NGKM

4 SEKT: evaluating semantic tagging
– New metrics are needed when evaluating hierarchy- or ontology-based named entity (NE) tagging
– They need to take into account distance in the hierarchy: tagging a company as a charity is less wrong than tagging it as a person
– Several SEKT-related initiatives (workshop at ECAI; Pascal network)
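
The slide says only that an ontology-aware metric should account for distance in the hierarchy. As a minimal sketch of that idea, assuming a toy class hierarchy (the class names and the 1/(1 + distance) credit function are illustrative choices, not the SEKT metric), the code counts the edges between the gold and predicted classes through their lowest common ancestor and gives less credit the further apart they are:

```python
# Hypothetical toy hierarchy (child -> parent); not the SEKT ontology.
PARENT = {
    "Organization": "Entity",
    "Person": "Entity",
    "Company": "Organization",
    "Charity": "Organization",
}


def ancestors(cls):
    """Path from cls up to the root, including cls itself."""
    path = [cls]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance(gold, predicted):
    """Number of edges between two classes via their lowest common ancestor."""
    up_gold = ancestors(gold)
    depth_in_pred = {c: i for i, c in enumerate(ancestors(predicted))}
    for i, c in enumerate(up_gold):
        if c in depth_in_pred:
            return i + depth_in_pred[c]
    raise ValueError("classes are not in the same hierarchy")


def partial_credit(gold, predicted):
    """1.0 for an exact match, decreasing as hierarchy distance grows (illustrative choice)."""
    return 1.0 / (1 + distance(gold, predicted))


def score(pairs):
    """Average partial credit over (gold, predicted) pairs."""
    return sum(partial_credit(g, p) for g, p in pairs) / len(pairs)


if __name__ == "__main__":
    print(partial_credit("Company", "Charity"), partial_credit("Company", "Person"))
```

In this toy hierarchy Company and Charity are two edges apart and Company and Person three, so the first error earns more partial credit, in line with the slide's example. A fuller metric might also normalise for hierarchy depth or branching.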

5 PrestoSpace
– Cultural Heritage / Digital Libraries IP: BBC, RAI, ORF, INA, B&G, USFD, and 23 others (!)
– 20th-century rot: rapid disappearance of audio-visual media
– Preservation and digitisation are expensive, so we need rich metadata and semantic access
– Little training data, open domain: finite-state transducers (FSTs) for users
– Follows MUMIS and other projects
– Evaluation: TRECVID, OBIE

6 GATE status (version 2½)
– Stable core since the end of 2002
– Increasing numbers of users (next slide)
– Increasing numbers of languages (most recently Chinese, Arabic, Russian, and a German system from DotKom)
– Increasing numbers of third-party components (e.g. Medline and UMLS work, OBIE/KIM, QA, summarisation, ...)
– Embedded in KM applications

7 A bit of a nuisance (GATE users)
GATE team projects.
Past:
– MUMIS: semantic index of sports video
– MUSE: cross-genre entity finder
– HSL: health-and-safety IE
– Old Bailey: collaboration with HRI on 17th-century court reports
– Multiflora: plant taxonomy text analysis for biodiversity research (e-science)
– EMILLE: South Asian languages corpus
– ACE / TIDES: Arabic, Chinese NE
Present:
– Advanced Knowledge Technologies
– SEKT: next-generation KM
– PrestoSpace: audiovisual preservation
– KnowledgeWeb: semantic web network
– h-TechSight: technology oversight
– ETCSL: Sumerian language corpus
– SWAN: Semantic Web Annotator
– MiAKT: medical informatics KM
Thousands of users at hundreds of sites (based on a survey of 4,700 downloaders). A representative sample:
– the American National Corpus project
– the Perseus Digital Library project, Tufts University, US
– Greenstone digital library, NZ
– Longman Pearson publishing, UK
– Merck KGaA, Germany
– Canon Europe, UK
– Knight Ridder, US
– BBN (leading HLT research lab), US
– SMEs, including Sirma AI Ltd., Bulgaria
– Imperial College, London, the University of Manchester, UMIST, Vassar College, the University of Southern California, and a large number of other UK, US and EU universities
– UK and EU projects, including MyGrid, CLEF, DotKom, AMITIES, Cub Reporter, EMILLE, Poesia, ...

8 Some new stuff
– Johns Hopkins workshop on Semantic Annotation: BNC-based corpus, ME experiments
– WEKA 2 release (JSI library integration soon)
– Papers: RANLP, ISWC, Journal of Digital Libraries, Journal of Data and Knowledge Engineering
– JWS editorial board; co-editor of a JNLE special issue
– RANLP IE tutorial; tutorial on HLT/SW at ESWS
– HLT/SW evaluation workshop at ECAI
– OBIE in Multiflora and hTechSight
– SW NLG in MiAKT (below)

9 MIAKT: NLG for the Semantic Web
[Screenshots: RDF input from the image annotation GUI, and the generated text]
MIAKT has important productivity and accuracy implications
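
The slide shows RDF going in and text coming out, without describing how the generator works. As a hedged illustration of the simplest approach, template-based realisation, the sketch maps each (subject, predicate, object) triple to a sentence template and pronominalises a subject that repeats. The predicate names, templates and lesion example are hypothetical, and the triples are plain tuples rather than parsed RDF:

```python
# Hypothetical predicate names and templates; MIAKT's real vocabulary is not shown on the slide.
TEMPLATES = {
    "hasLocation": "{s} is located in the {o}.",
    "hasSize": "{s} measures {o}.",
    "hasShape": "{s} is {o} in shape.",
}


def realise(triples):
    """Turn (subject, predicate, object) triples into a short paragraph.

    Consecutive triples about the same subject are pronominalised ("It ..."),
    a minimal form of referring-expression generation.
    """
    sentences = []
    previous_subject = None
    for subject, predicate, obj in triples:
        mention = "It" if subject == previous_subject else subject
        template = TEMPLATES.get(predicate)
        if template is None:
            # Fallback for predicates without a hand-written template.
            sentences.append(f"{mention} {predicate} {obj}.")
        else:
            sentences.append(template.format(s=mention, o=obj))
        previous_subject = subject
    return " ".join(sentences)


if __name__ == "__main__":
    report = [
        ("The lesion", "hasLocation", "upper outer quadrant"),
        ("The lesion", "hasSize", "12 mm"),
        ("The lesion", "hasShape", "irregular"),
    ]
    print(realise(report))
```

Templates keep the output predictable for a narrow medical domain; a real generator would also need aggregation, ordering and agreement handling beyond this sketch.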

10 hTechSight: technology oversight
– Ontology-Based IE (OBIE) for semantic tagging of job adverts, news and reports in the chemical engineering domain
– Aim: to track technological change over time
– Centred around a domain-specific ontology
– Terminological gazetteer lists are linked to classes in the ontology
– Rules classify the mentions in the text with respect to the domain ontology
– Annotations are output to a DB or as RDF
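
The pipeline on this slide (gazetteer lists linked to ontology classes, rules that classify mentions, RDF output) can be sketched in a few lines. This is a minimal illustration, not hTechSight's implementation: the namespace, gazetteer entries and the single context rule are all hypothetical, and the output is built as N-Triples strings rather than through a DB or an RDF library:

```python
import re

# Hypothetical namespace and classes: the slide does not show the real hTechSight ontology.
ONTO = "http://example.org/chem-onto#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Each terminological gazetteer list is linked to one ontology class.
GAZETTEER = {
    "Solvent": ["toluene", "acetone"],
    "Process": ["distillation", "catalytic cracking"],
    "Skill": ["process modelling"],
}

# Hypothetical context rule: (left-context pattern, class it applies to, class to assign).
CONTEXT_RULES = [
    (re.compile(r"experience (?:in|of)\s+$", re.IGNORECASE), "Process", "Skill"),
]


def build_lookup(gazetteer):
    """Invert class -> terms into term -> class, longest terms first."""
    pairs = [(t.lower(), cls) for cls, terms in gazetteer.items() for t in terms]
    return dict(sorted(pairs, key=lambda pair: -len(pair[0])))


def classify(text, start, cls):
    """Apply the first context rule whose pattern matches just before the mention."""
    left = text[max(0, start - 30):start]
    for context, from_cls, to_cls in CONTEXT_RULES:
        if cls == from_cls and context.search(left):
            return to_cls
    return cls


def annotate(text, lookup):
    """Return (start, end, surface, class) tuples for non-overlapping gazetteer matches."""
    alternatives = "|".join(re.escape(term) for term in lookup)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return [
        (m.start(), m.end(), m.group(0), classify(text, m.start(), lookup[m.group(0).lower()]))
        for m in pattern.finditer(text)
    ]


def to_ntriples(doc_id, annotations):
    """Serialise each mention as an RDF instance of its ontology class (N-Triples)."""
    lines = []
    for i, (_, _, surface, cls) in enumerate(annotations):
        mention = f"<http://example.org/doc/{doc_id}#m{i}>"
        literal = surface.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"{mention} <{RDF_TYPE}> <{ONTO}{cls}> .")
        lines.append(f'{mention} <{RDFS_LABEL}> "{literal}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    lookup = build_lookup(GAZETTEER)
    anns = annotate("Experience in distillation and toluene handling required.", lookup)
    print(to_ntriples("advert1", anns))
```

Ordering the gazetteer terms longest first makes multi-word entries such as "catalytic cracking" win over shorter entries that start at the same position; a production system would use far richer classification rules than this one left-context pattern.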

11 OBIE in MultiFlora 2
Combining Information Extraction and Knowledge Representation for Biodiversity Informatics: BBSRC project led by Mary McGee Wood, University of Manchester
[Diagram labels: varying plant taxa; merged RDF]

12 GATE 4: the Final Conflict (GATE 3 release happening soonish)
Continuity guaranteed for AKT phase 2 (€2 million of GATE-related work, 2004-2007)
Some future elements:
– more and better OBIE, including cross-document co-reference
– pluggable OWL repository support (currently only Sesame; soon 3Store, KAON)
– large- and huge-scale processing
– standardisation of the component integration model (ECLIPSE)
– service-based integration ("SDK" SW API)
This talk: http://gate.ac.uk/sale/talks/akt-jan04.ppt
What else? You tell us...

