
1 A web-based Workbench for Interactive Semantic Text Analysis: Design and Prototypical Implementation
Tobias Waltl, 19.10.2015
Software Engineering for Business Information Systems (sebis), Department of Informatics, Technische Universität München (TUM), Germany
wwwmatthes.in.tum.de

2 Overview
1. Introduction
2. Research Questions
3. Requirements
4. Existing Architectures for NLP Applications
5. Architecture & Implementation
6. Live Demo
7. Conclusion & Outlook
© sebis 19.10.2015 Tobias Waltl - Master's Thesis

3 Introduction: Motivation
Problem
- Very large and fast-growing body of legal literature [2,4]
  - 2005 – 2009: 616 laws passed
  - 2009 – 2013: 553 laws passed
  - Judgements
  - Commentaries
  - …
Progress in NLP technologies
- IBM Watson
- UIMA (Unstructured Information Management Architecture)
- GATE (General Architecture for Text Engineering)
- …
A system that
- deals with legal literature,
- semantically analyzes and annotates it,
- provides a workbench for exploring and filtering the literature and its annotations

4 Introduction: Related work – GATE Developer [1]

5 Introduction: Related work – Argo [3]

6 Research Questions
- What are the requirements for a software architecture that supports semantic analysis of legal literature?
- Which common software architectures support semantic analysis in web applications, measured against these requirements?
- What does a prototypical integration of an architecture enabling semantic analysis of legal literature look like?

7 Requirements (excerpt)
Technical requirements from literature review
- The workbench should be a web application
- The system’s architecture should foster reuse of components
- It should be easy to integrate and to interchange foreign components
- The system’s text mining engine should support parallel processing of NLP tasks
Functional requirements from expert interviews
- The workbench shall annotate legal definitions
- The workbench shall annotate exceptions of legal norms
- The workbench shall provide linguistic information

8 Existing Architectures for NLP Applications
- Other approaches: Whiteboard architecture, Talisman, TalLab, Heart of Gold
- TIPSTER-based: TIPSTER, Ellogon, LIMA
- GATE
- UIMA

9 UIMA
Modular architecture
- A combination of analysis engines (AEs) forms a pipeline
- AEs communicate with the CAS
- AEs specify their inputs / outputs
Annotations
- Strongly typed annotations
- Standoff annotations in a Common Analysis Structure (CAS)
Implementations
- Frameworks for Java and C++
- Rule engine for regex-like pattern matching over annotations
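The pipeline idea on this slide can be illustrated with a small Python sketch. It is a toy model, not the UIMA API: the names `Annotation`, `Cas`, `tokenizer`, `capitalized_marker`, and `run_pipeline` are all hypothetical and chosen only for illustration. It shows standoff annotations (offsets into an unchanged text), typed annotations, and analysis engines that exchange results only through a shared CAS-like object.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Annotation:
    """Standoff annotation: points into the text by offsets, never edits it."""
    type: str   # annotation type name, e.g. "Token" (hypothetical type names)
    begin: int  # start offset (inclusive)
    end: int    # end offset (exclusive)


@dataclass
class Cas:
    """Toy stand-in for a Common Analysis Structure: text plus annotations."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def select(self, type_name: str) -> List[Annotation]:
        # Returns a new list, so engines may add annotations while iterating.
        return [a for a in self.annotations if a.type == type_name]

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]


def tokenizer(cas: Cas) -> None:
    """Engine 1: input is the raw text, output is 'Token' annotations."""
    for match in re.finditer(r"\w+", cas.text):
        cas.annotations.append(Annotation("Token", match.start(), match.end()))


def capitalized_marker(cas: Cas) -> None:
    """Engine 2: input is 'Token', output is 'Capitalized' annotations."""
    for token in cas.select("Token"):
        if cas.covered_text(token)[0].isupper():
            cas.annotations.append(
                Annotation("Capitalized", token.begin, token.end))


def run_pipeline(text: str, engines: List[Callable[[Cas], None]]) -> Cas:
    """Run the engines in order; they communicate only through the CAS."""
    cas = Cas(text)
    for engine in engines:
        engine(cas)
    return cas


cas = run_pipeline("Der Mieter zahlt die Miete.", [tokenizer, capitalized_marker])
capitalized = [cas.covered_text(a) for a in cas.select("Capitalized")]
print(capitalized)
```

Because the second engine reads only `Token` annotations from the CAS, the tokenizer could be swapped for another component that produces the same type, which is the kind of interchangeability the requirements ask for.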

10 UIMA - Ruta
- Rule-based language for pattern matching over annotations
- Powerful tool for functional requirements
- Example:
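The example shown on this slide is not part of the transcript. As a rough stand-in, the following Python sketch illustrates what pattern matching over annotations means; it is not Ruta syntax and does not use any UIMA library. The type names (`Capitalized`, `Word`, `Period`, `DefinitionCue`), the helper functions, and the rule itself are hypothetical. The sketch supports only fixed-length sequences of annotation types with an optional regular expression on the covered text, whereas a real rule language offers far more.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Annotation:
    type: str
    begin: int
    end: int


# One rule step: (annotation type, optional regex the covered text must match)
Step = Tuple[str, Optional[str]]


def annotate_tokens(text: str) -> List[Annotation]:
    """Create basic typed annotations: 'Capitalized', 'Word', or 'Period'."""
    annotations = []
    for match in re.finditer(r"\w+|\.", text):
        word = match.group()
        if word == ".":
            kind = "Period"
        elif word[0].isupper():
            kind = "Capitalized"
        else:
            kind = "Word"
        annotations.append(Annotation(kind, match.start(), match.end()))
    return annotations


def apply_rule(text: str, annotations: Sequence[Annotation],
               pattern: Sequence[Step], new_type: str) -> List[Annotation]:
    """Create a `new_type` annotation over every run of consecutive
    annotations that matches `pattern` step by step."""
    created = []
    n = len(pattern)
    for i in range(len(annotations) - n + 1):
        window = annotations[i:i + n]
        if all(a.type == wanted and
               (rx is None or re.fullmatch(rx, text[a.begin:a.end]))
               for a, (wanted, rx) in zip(window, pattern)):
            created.append(Annotation(new_type, window[0].begin, window[-1].end))
    return created


text = "Verbraucher ist jede natürliche Person."
tokens = annotate_tokens(text)
# Hypothetical rule: a capitalized word followed by "ist" hints at a definition.
rule = [("Capitalized", None), ("Word", r"ist")]
matches = [text[a.begin:a.end]
           for a in apply_rule(text, tokens, rule, "DefinitionCue")]
print(matches)
```

The rule operates on annotation types rather than raw characters, so it keeps working if the tokenizer is replaced, as long as the same types are produced.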

11 Architecture & Implementation

12 Live Demo

13 Conclusion & Outlook
Conclusion
- The current implementation serves as a foundation for further features and can easily be extended
- All nonfunctional requirements are fulfilled
- Some functional requirements are also fulfilled
Outlook
- Development of further patterns
- Editable texts
- Different kinds of literature: judgements, commentaries, contracts, …

14 Thank you for your attention!
Tobias Waltl, B.Sc.
waltlt@in.tum.de
Technische Universität München
Department of Informatics
Chair of Software Engineering for Business Information Systems
Boltzmannstraße 3
85748 Garching bei München
wwwmatthes.in.tum.de

15 Screenshot of the app used for interviews

16 Nonfunctional requirements
- The workbench should be a web application
- The system’s architecture should foster reuse of components
- The system’s text mining engine should incorporate a common type system for the created annotations
- The system’s text mining engine should comply with a standardized data format for data exchange between its components
- It should be easy to integrate and to interchange foreign components
- The system’s text mining engine should support parallel processing of NLP tasks

17 Functional requirements
- The workbench should support adding, removing, and editing of annotations
- The workbench should support the persistence of annotations
- It shall be possible to fold sections of the displayed text
- It shall be possible to leave own comments in the documents
- It shall be possible to set bookmarks in the documents
- It shall be possible to edit the texts
- It shall be possible that multiple users work on the same document and track their changes
- The workbench shall feature a comparison of documents and their different versions
- The workbench shall be able to import documents with different formats
- The workbench shall allow for exporting documents in different formats
- The workbench shall provide information about incoming and outgoing references
- The workbench shall annotate legal definitions
- The workbench shall annotate exceptions of legal norms
- The workbench shall annotate legal consequences
- The workbench shall provide linguistic information

18 Assessment of architectures against nonfunctional requirements
Requirement                         TIPSTER  Ellogon  LIMA  Whiteboard Architecture  TALISMAN  TalLab  Heart of Gold  GATE  UIMA
Web application                     -        -        -     -                        -         +       +              0     +
Reuse of components                 -        +        0     0                        -         -       0              0     +
Type system                         +        -        -     -                        -         -       0              -     +
Common data format                  +        +        +     -                        -         -       +              +     +
Integration and interchangeability  0        +        -     0                        -         -       0              +     +
Parallel processing                 -        -        -     +                        +         +       +              -     +
Framework                           -        +        +     -                        -         -       +              +     +
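The ratings on this slide can be tallied per architecture with a short Python sketch. The rating strings are copied from the slide; reading "+" as a fulfilled criterion is an assumption, since the slide does not define its symbols. The names `ARCHITECTURES`, `RATINGS`, `plus_counts`, and `fully_rated` are hypothetical.

```python
# Column order as on the slide.
ARCHITECTURES = [
    "TIPSTER", "Ellogon", "LIMA", "Whiteboard Architecture", "TALISMAN",
    "TalLab", "Heart of Gold", "GATE", "UIMA",
]

# One character per architecture, in the column order above.
RATINGS = {
    "Web application":                    "-----++0+",
    "Reuse of components":                "-+00--00+",
    "Type system":                        "+-----0-+",
    "Common data format":                 "+++---+++",
    "Integration and interchangeability": "0+-0--0++",
    "Parallel processing":                "---++++-+",
    "Framework":                          "-++---+++",
}

# Count the "+" ratings per architecture (assumed to mean "fulfilled").
plus_counts = {
    name: sum(row[i] == "+" for row in RATINGS.values())
    for i, name in enumerate(ARCHITECTURES)
}

# Architectures rated "+" on every criterion.
fully_rated = [name for name, count in plus_counts.items()
               if count == len(RATINGS)]

for name, count in sorted(plus_counts.items(), key=lambda item: -item[1]):
    print(f"{name}: {count} of {len(RATINGS)}")
```

Under this reading, UIMA is the only architecture rated "+" in every row, followed by Ellogon and Heart of Gold with four "+" ratings each.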

19 Typed annotations: Why typed annotations?

20 Architecture & Implementation

21 Architecture & Implementation

22 Example pipeline

23 References
[1] Cunningham, H., Tablan, V., Roberts, A., & Bontcheva, K. (2013). Getting More Out of Biomedical Documents with GATE’s Full Lifecycle Open Source Text Analytics. PLoS Comput Biol, 9(2), e1002854. doi:10.1371/journal.pcbi.1002854
[2] Deutscher Bundestag (2014). Deutscher Bundestag – Neue Ausgabe des Datenhandbuchs zur Geschichte des Deutschen Bundestages. Retrieved from https://www.bundestag.de/dokumente/datenhandbuch/10 (last access on 31.05.2015)
[3] Rak, R., Rowley, A., Black, W., & Ananiadou, S. (2012). Argo: an integrative, interactive, text mining-based workbench supporting curation. Database, 2012. doi:10.1093/database/bas010
[4] Walter, S. (2010). Definitionsextraktion aus Urteilstexten (Dissertation). Universität des Saarlandes, Saarbrücken, Germany. Retrieved from http://www.coli.uni-saarland.de/bib/files/DissertationStephanWalter.pdf (last access on 20.07.2015)

