Published by Osborn Stone. Modified over 9 years ago.
1
Some Thoughts on HPC in Natural Language Engineering
  Steven Bird
  University of Melbourne & University of Pennsylvania
2
Sponsorship
  Natural Language Engineering: Integrating Parallel and Parametric Processing
  Victorian Partnership for Advanced Computing Expertise Grant EPPNME092.2003
3
NLE Application Areas
  Information Extraction
  Information Retrieval
  Authoring Tools
  Language Analysis
  Language Understanding
  Knowledge Representation
  Knowledge Discovery
  Spoken Language Input
  Written Language Input
  Natural Language Generation
  Spoken Output
  Multilinguality
  Multimodality
  Discourse and Dialogue
  Spoken dialogue systems
  Cross-language information retrieval
  Word-sense disambiguation
  Multi-document summarisation
  Natural language database interfaces
4
Some NLE Applications in Detail
  Information extraction from broadcast news
    Tokenization, alignment, entity detection, coreference resolution, semantic mapping
  Spoken language dialogue systems (SLDS)
    Speech recognition, parsing, user modelling, discourse management, generation, synthesis
  Language analysis
    Interlinear text annotation, lexicon development, morphosyntactic grammar development
5
Meta Activities
  Discovery
    What tools work with data in format X?
    What lexical resources exist for language Y?
  Reuse
    Diverse implementation frameworks
    Component integration, wrapping, etc.
  Training and evaluation
    Parametric and parallel processing
    Comparing systems running on the same data
    Gold standard vs theory comparison
    Analyzing interaction logs
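The "parametric and parallel processing" and "gold standard" items above can be illustrated with a small sketch. It is not taken from the presentation: the scored candidates, the gold set, and the `evaluate` function are invented toy values and hypothetical names, and a thread pool stands in for whatever parallel resource would actually be used.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data, invented for illustration only.
GOLD = {"Melbourne", "Pennsylvania"}            # gold-standard entities
SCORED = {"Melbourne": 0.9, "Pennsylvania": 0.6,
          "Thoughts": 0.4, "Some": 0.2}         # candidate -> confidence

def evaluate(threshold):
    """Run one parameter setting and compare its output to the gold standard."""
    predicted = {w for w, s in SCORED.items() if s >= threshold}
    true_pos = len(predicted & GOLD)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(GOLD)
    return threshold, precision, recall

# Parametric sweep: the same component, run once per parameter value.
thresholds = [0.1, 0.5, 0.8]
with ThreadPoolExecutor() as pool:
    results = list(pool.map(evaluate, thresholds))

for threshold, precision, recall in results:
    print(f"threshold={threshold} precision={precision} recall={recall}")
```

Each parameter setting is independent of the others, which is why this kind of sweep spreads naturally over many processors.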
6
Learn about NLE
  This department hosts a mirror of the ACL digital anthology
  50k pages, 40 years
  http://www.cs.mu.oz.au/acl/
7
SLDS Architecture
8
SLDS Components
9
Another SLDS Architecture
10
Observations
  Common components, different arrangements
  Multiple components for doing the same task
  Most NLE components convert between information types
    Parser: from strings to trees
    ASR: from speech to text
    Summariser: from text to selected text
  But:
    Many processes benefit from other information sources (e.g. exploiting intonation in input)
    Input and output can be aligned
  Solution: multilayer annotations
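As a rough illustration of the multilayer idea, the sketch below has each component add a layer of spans over the same input instead of replacing the input with its output, so every layer stays aligned to the original text. The document layout and the functions `tokenize` and `tag_capitalised` are hypothetical, and the capitalisation rule is only a toy stand-in for real entity detection.

```python
def tokenize(doc):
    """Add a 'token' layer: (start, end, text) spans over doc['text']."""
    spans, pos = [], 0
    for tok in doc["text"].split():
        start = doc["text"].index(tok, pos)
        end = start + len(tok)
        spans.append((start, end, tok))
        pos = end
    doc["layers"]["token"] = spans
    return doc

def tag_capitalised(doc):
    """Add an 'entity' layer marking capitalised tokens (toy rule)."""
    doc["layers"]["entity"] = [
        (start, end, "CAP")
        for start, end, tok in doc["layers"]["token"]
        if tok[:1].isupper()
    ]
    return doc

# Components run in sequence; each one reads earlier layers and adds its own.
doc = {"text": "Steven works in Melbourne", "layers": {}}
for component in (tokenize, tag_capitalised):
    doc = component(doc)

# Every span indexes the original text, so layers remain aligned.
for start, end, label in doc["layers"]["entity"]:
    print(label, doc["text"][start:end])
```

Because the input is never discarded, a later component can consult any earlier layer, which is the point made under "But:" above.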
11
Multilayer annotations
12
Multilayer Annotations
13
Annotation Graphs
  Labelled digraphs with timestamped nodes
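A minimal sketch of a labelled digraph with timestamped nodes follows. It is an illustration only and not the AGTK API: the class and method names are hypothetical, and allowing a node to have no timestamp is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Node:
    id: str
    time: Optional[float] = None   # offset in seconds; None if unanchored (assumption)

@dataclass(frozen=True)
class Arc:
    src: str      # id of the start node
    dst: str      # id of the end node
    layer: str    # annotation type, e.g. "word" or "phrase"
    label: str    # the annotation itself

@dataclass
class AnnotationGraph:
    nodes: dict = field(default_factory=dict)
    arcs: list = field(default_factory=list)

    def add_node(self, node_id, time=None):
        self.nodes[node_id] = Node(node_id, time)

    def add_arc(self, src, dst, layer, label):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        self.arcs.append(Arc(src, dst, layer, label))

    def labels_between(self, start, end, layer):
        """Labels on `layer` whose arcs lie entirely within [start, end]."""
        found = []
        for arc in self.arcs:
            s = self.nodes[arc.src].time
            e = self.nodes[arc.dst].time
            if (arc.layer == layer and s is not None and e is not None
                    and start <= s and e <= end):
                found.append(arc.label)
        return found

g = AnnotationGraph()
for node_id, t in [("n0", 0.0), ("n1", 0.4), ("n2", 0.9)]:
    g.add_node(node_id, t)
g.add_arc("n0", "n1", "word", "hello")
g.add_arc("n1", "n2", "word", "world")
g.add_arc("n0", "n2", "phrase", "greeting")   # a second layer over the same nodes

words = g.labels_between(0.0, 0.9, "word")
print(words)
```

Arcs from different layers share nodes, so annotations of different kinds are aligned through the timestamps rather than through each other.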
14
Annotation Graphs: Complex Example
  AGTK: Annotation Graph Toolkit
  library, applications
  agtk.sourceforge.net
15
NLE and Grids
  NLE applications are typically
    constructed out of numerous components
    each component responsible for a specialised task
    executed against large data sets
  To use grids in NLE: subscribe to a model which allows
    automated discovery of data and components
    flexible design of applications, coordination of execution, storage of results
  Ideally: view grid as a commodity, hidden from application developers
16
Architectural Components
  Data
    Language resources for analysis
    E.g. Switchboard, 2400 annotated telephone conversations (26 CDs)
  Software Components
    minimal individual functional units
    e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping
    common interface specification
  Metadata Repositories
    Dublin Core Application Profile for NLE resources
  Application
    data + components + processing instructions
    declarative specification in XML
  Grid Service
    computational and storage resources for application execution
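The presentation does not show what the declarative XML specification looks like, so the sketch below invents one purely for illustration: the element names (`application`, `data`, `component`) and the `after` attribute are hypothetical. It shows how a scheduler might read such a specification and derive an execution order from it.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification format, invented for this sketch.
SPEC = """
<application name="demo">
  <data ref="switchboard"/>
  <component id="tokenize" type="TextAnnotation"/>
  <component id="align" type="Alignment" after="tokenize"/>
  <component id="map" type="SemanticMapping" after="align"/>
</application>
"""

def execution_order(spec_xml):
    """Return component ids in an order that respects their 'after' attributes."""
    root = ET.fromstring(spec_xml)
    deps = {c.get("id"): c.get("after") for c in root.iter("component")}
    order = []
    while len(order) < len(deps):
        # A component is ready once its prerequisite (if any) has been scheduled.
        ready = [cid for cid, dep in deps.items()
                 if cid not in order and (dep is None or dep in order)]
        if not ready:
            raise ValueError("cyclic or unresolved dependency")
        order.extend(ready)
    return order

order = execution_order(SPEC)
print(order)
```

Keeping the application as data rather than code is what would let a grid service decide where and when each component runs.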
17
Architecture
18
Conclusion
  Natural Language Engineering
    interesting test case for grid services
    many mature component technologies
    applications that are both data and processor intensive
    applications for building the multilingual information society of the future...