

A Web-Based Workbench for Interactive Semantic Text Analysis: Design and Prototypical Implementation
Tobias Waltl, TUM
Software Engineering for Business Information Systems (sebis), Department of Informatics, Technische Universität München, Germany
wwwmatthes.in.tum.de

Overview
1. Introduction
2. Research Questions
3. Requirements
4. Existing Architectures for NLP Applications
5. Architecture & Implementation
6. Live Demo
7. Conclusion & Outlook
© sebis Tobias Waltl - Master's Thesis

Introduction: Motivation
Problem
- Very large and rapidly growing body of legal literature [2,4]
  - 2005 – 2009: 616 passed laws
  - 2009 – 2013: 553 passed laws
  - Judgements
  - Commentaries
  - …
Progress in NLP technologies
- IBM Watson
- UIMA (Unstructured Information Management Architecture)
- GATE (General Architecture for Text Engineering)
- …
A system that
- deals with legal literature,
- semantically analyzes and annotates it,
- provides a workbench for exploring and filtering the literature and its annotations.

Introduction: Related work – GATE Developer [1]

Introduction: Related work – Argo [3]

Research Questions
1. What are the requirements for a software architecture that supports semantic analysis of legal literature?
2. Which common software architectures that support semantic analysis in web applications meet these requirements?
3. What does a prototypical integration of an architecture enabling semantic analysis of legal literature look like?

Requirements (excerpt)
Technical requirements from literature review
- The workbench should be a web application
- The system’s architecture should foster reuse of components
- It should be easy to integrate and to interchange foreign components
- The system’s text mining engine should support parallel processing of NLP tasks
Functional requirements from expert interviews
- The workbench shall annotate legal definitions
- The workbench shall annotate exceptions of legal norms
- The workbench shall provide linguistic information

Existing Architectures for NLP Applications
TIPSTER-based
- TIPSTER
- Ellogon
- LIMA
Other approaches
- Whiteboard architecture
- Talisman
- TalLab
- Heart of Gold
GATE
UIMA

UIMA
Modular architecture
- A combination of analysis engines (AEs) forms a pipeline
- AEs communicate with the CAS
- AEs specify their inputs and outputs
Annotations
- Strongly typed annotations
- Standoff annotations in a Common Analysis Structure (CAS)
Implementations
- Frameworks for Java and C++
- Rule engine for regex-like pattern matching over annotations
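The ideas on this slide can be illustrated with a small Python toy model. This is only a sketch under stated assumptions and not the Apache UIMA API: the names `Cas`, `Annotation`, `Token`, `Sentence`, `sentence_splitter`, `tokenizer`, and `run_pipeline` are hypothetical. It shows standoff annotations (offsets into unchanged text), typed annotations (one class per annotation type), and engines that communicate only through a shared analysis structure.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Type


@dataclass(frozen=True)
class Annotation:
    """Standoff annotation: points into the text by offsets, never alters it."""
    begin: int
    end: int


@dataclass(frozen=True)
class Sentence(Annotation):
    """Typed annotation for a sentence span."""


@dataclass(frozen=True)
class Token(Annotation):
    """Typed annotation for a word span."""


@dataclass
class Cas:
    """Toy stand-in for a common analysis structure: text plus annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, annotation: Annotation) -> None:
        self.annotations.append(annotation)

    def select(self, annotation_type: Type[Annotation]) -> List[Annotation]:
        return [a for a in self.annotations if isinstance(a, annotation_type)]

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]


def sentence_splitter(cas: Cas) -> None:
    """Engine 1. Input: raw text. Output: Sentence annotations."""
    for match in re.finditer(r"[^.!?]+[.!?]", cas.text):
        leading = len(match.group()) - len(match.group().lstrip())
        cas.add(Sentence(match.start() + leading, match.end()))


def tokenizer(cas: Cas) -> None:
    """Engine 2. Input: Sentence annotations. Output: Token annotations."""
    for sentence in cas.select(Sentence):
        for match in re.finditer(r"\w+", cas.covered_text(sentence)):
            cas.add(Token(sentence.begin + match.start(),
                          sentence.begin + match.end()))


def run_pipeline(cas: Cas, engines: List[Callable[[Cas], None]]) -> Cas:
    """Run the engines in order; they share state only through the Cas."""
    for engine in engines:
        engine(cas)
    return cas


cas = run_pipeline(Cas("A lease is a contract. It binds both parties."),
                   [sentence_splitter, tokenizer])
print([cas.covered_text(s) for s in cas.select(Sentence)])
```

Because the tokenizer reads only `Sentence` annotations from the shared structure, the sentence splitter could be replaced by any other engine that produces the same annotation type.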

UIMA - Ruta
- Rule-based language for pattern matching over annotations
- Powerful tool for the functional requirements
[Figure: Ruta example]
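The idea of pattern matching over annotations, as opposed to matching over raw characters, can be sketched in a few lines of Python. This is not Ruta syntax and not the Ruta engine: it is a deliberately simplified matcher for fixed sequences of annotation types (no quantifiers or conditions), and the annotation types `Term`, `DefinitionCue`, `NounPhrase`, and `LegalDefinition` as well as the function `apply_rule` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Annotation:
    """Standoff annotation with a type name and character offsets."""
    type: str
    begin: int
    end: int


def apply_rule(annotations: Sequence[Annotation],
               pattern: Sequence[str],
               result_type: str) -> List[Annotation]:
    """Find consecutive annotations whose types equal `pattern` and
    create one new annotation of `result_type` spanning each match."""
    ordered = sorted(annotations, key=lambda a: a.begin)
    created = []
    i = 0
    while i + len(pattern) <= len(ordered):
        window = ordered[i:i + len(pattern)]
        if [a.type for a in window] == list(pattern):
            created.append(
                Annotation(result_type, window[0].begin, window[-1].end))
            i += len(pattern)  # continue after the matched window
        else:
            i += 1
    return created


text = "Tenant means a person who rents."
# Assumed to have been produced by earlier pipeline steps.
annotations = [
    Annotation("Term", 0, 6),            # "Tenant"
    Annotation("DefinitionCue", 7, 12),  # "means"
    Annotation("NounPhrase", 13, 31),    # "a person who rents"
]

definitions = apply_rule(
    annotations, ["Term", "DefinitionCue", "NounPhrase"], "LegalDefinition")
for d in definitions:
    print(d.type, repr(text[d.begin:d.end]))
```

A rule of this kind operates on the types produced by earlier engines, which is why such rules are a natural fit for requirements like annotating legal definitions.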

Architecture & Implementation

Live Demo

Conclusion & Outlook
Conclusion
- The current implementation serves as a foundation for further features and can easily be extended
- All nonfunctional requirements fulfilled
- Some functional requirements also fulfilled
Outlook
- Development of further patterns
- Editable texts
- Different kinds of literature:
  - Judgements
  - Commentaries
  - Contracts
  - …

Thank you for your attention!
Tobias Waltl B.Sc.
Technische Universität München, Department of Informatics, Chair of Software Engineering for Business Information Systems
Boltzmannstraße Garching bei München
wwwmatthes.in.tum.de

Screenshot of the app used for interviews

Nonfunctional requirements
- The workbench should be a web application
- The system’s architecture should foster reuse of components
- The system’s text mining engine should incorporate a common type system for the created annotations
- The system’s text mining engine should comply with a standardized data format for data exchange between its components
- It should be easy to integrate and to interchange foreign components
- The system’s text mining engine should support parallel processing of NLP tasks
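The last requirement, parallel processing of NLP tasks, can be sketched with Python's standard `concurrent.futures` module. This is an illustrative assumption, not the mechanism used in the thesis: the functions `analyze` and `analyze_all` are hypothetical, and the "analysis" is a trivial token and sentence count standing in for a real NLP task.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def analyze(document: str) -> Dict[str, int]:
    """Placeholder NLP task: count word tokens and sentence terminators."""
    return {
        "tokens": len(re.findall(r"\w+", document)),
        "sentences": len(re.findall(r"[.!?]", document)),
    }


def analyze_all(documents: List[str], max_workers: int = 4) -> List[Dict[str, int]]:
    """Analyze documents concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze, documents))


results = analyze_all(["One law. Two laws.", "A judgement."])
print(results)
```

Each document is analyzed independently, so the work can be distributed without coordination between tasks. For CPU-bound analysis in CPython, a process pool would likely be the better choice than threads; threads are used here only to keep the sketch short.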

Functional requirements
- The workbench should support adding, removing, and editing of annotations
- The workbench should support the persistence of annotations
- It shall be possible to fold sections of the displayed text
- It shall be possible to leave own comments in the documents
- It shall be possible to set bookmarks in the documents
- It shall be possible to edit the texts
- It shall be possible that multiple users work on the same document and track their changes
- The workbench shall feature a comparison of documents and their different versions
- The workbench shall be able to import documents with different formats
- The workbench shall allow for exporting documents in different formats
- The workbench shall provide information about incoming and outgoing references
- The workbench shall annotate legal definitions
- The workbench shall annotate exceptions of legal norms
- The workbench shall annotate legal consequences
- The workbench shall provide linguistic information

Assessment of architectures against nonfunctional requirements
Architectures (columns): TIPSTER, Ellogon, LIMA, Whiteboard Architecture, TALISMAN, TalLab, Heart of Gold, GATE, UIMA
Requirements (rows): Web application; Reuse of components; Type system; Common data format; Integration and interchangeability; Parallel processing; Framework

Typed annotations
Why typed annotations?

Architecture & Implementation

Architecture & Implementation

Example pipeline

References
[1] Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE’s Full Lifecycle Open Source Text Analytics. PLoS Comput Biol, 9(2), e doi: /journal.pcbi
[2] Deutscher Bundestag (2014). Deutscher Bundestag – Neue Ausgabe des Datenhandbuchs zur Geschichte des Deutschen Bundestages. Retrieved from (last access on )
[3] Rak, R., Rowley, A., Black, W., & Ananiadou, S. (2012). Argo: an integrative, interactive, text mining-based workbench supporting curation. Database, doi: /database/bas010
[4] Walter, S. (2010). Definitionsextraktion aus Urteilstexten (Dissertation). Universität des Saarlandes, Saarbrücken, Germany. Retrieved from (last access on )