Published by Laurel Long. Modified over 9 years ago.
1
Practical Project of the 2006 Joint International Master’s Degree
2
Agenda
- Introduction
- Technologies in use
- Architecture
- Demonstration
- Remaining Issues
- Work packages for Semester II
- Questions & Comments
3
Introduction
- Practical project during the course of studies
- Timeframe: two terms
- Topic: Prototype of a semantic search engine using UIMA
- Objectives of the first semester
  - Study the UIMA framework and the OpenNLP library
  - Search for players, teams, matches and dates
  - Semantic search for goal events
  - Implement an executable prototype
4
Technologies in Use
- UIMA framework
- OpenNLP
- Java / JavaServer Pages
- Tomcat server
- Python (web crawler)
5
Architecture Overview
6
Architecture Webcrawler
- Web crawler used for preselection of texts
- Implemented in Python
- Crawls approx. 2,500 pages in 20 minutes
- Currently keyword-based
- Transfer of results to Jimgle is still manual
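The keyword-based preselection could look roughly like the following sketch. It only covers the filtering step: page fetching is left out, and the keyword list, the threshold `min_score`, and the function names are assumptions made for illustration, not taken from the project.

```python
import re

# Hypothetical keyword list; the slides do not name the actual keywords.
KEYWORDS = ("goal", "match", "world cup", "striker")

def keyword_score(text, keywords=KEYWORDS):
    """Count how many distinct keywords occur in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(
        1 for kw in keywords
        if re.search(r"\b" + re.escape(kw) + r"\b", lowered)
    )

def preselect(pages, min_score=2):
    """Keep the URLs whose text mentions at least min_score distinct keywords."""
    return [url for url, text in pages.items() if keyword_score(text) >= min_score]

# Already-downloaded pages, keyed by URL (placeholder URLs and texts).
pages = {
    "http://example.org/a": "Klose scored the first goal of the match.",
    "http://example.org/b": "The weather in Berlin was sunny.",
}
selected = preselect(pages)
```

Here `selected` contains only the first URL, because its text mentions two of the keywords. A plain keyword count is cheap enough for a crawl of a few thousand pages, but it cannot tell relevant from incidental mentions.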
7
Architecture NLP-Annotator
- Uses the OpenNLP tools and API
- Rule-based approach
- Tagging of paragraphs, sentences and words
- Part-of-speech tagging
- Implemented in UIMA as a separate annotator
- Results are used by downstream annotators
- Internal use only, not displayed in the search index
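To illustrate the kind of structure this annotator hands to the later ones, here is a minimal segmentation sketch. It is a plain-regex stand-in, not OpenNLP or UIMA code; the function name `segment` and the splitting rules are assumptions, and part-of-speech tagging is not covered.

```python
import re

def segment(text):
    """Split text into paragraphs, each a list of sentences, each a list of tokens."""
    result = []
    for paragraph in re.split(r"\n\s*\n", text.strip()):
        # A sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
        # A token is either a run of word characters or a single punctuation mark.
        result.append([re.findall(r"\w+|[^\w\s]", s) for s in sentences if s])
    return result

doc = "Germany won. Klose scored twice!\n\nThe next match is in Dortmund."
paragraphs = segment(doc)
```

The nested lists mirror the paragraph, sentence and word levels named on the slide. Rules this simple break on abbreviations such as "e.g.", which a real sentence detector has to handle.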
8
Architecture Player-Annotator
- Identification of players of the 2006 World Cup (WM2006)
- Rule-based implementation
- Uses the OpenNLP word annotations
- Matching against the player database (XML file)
- Considers last names and nicknames
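Matching word tokens against an XML player list could be sketched as below. The XML layout (`<player>`, `<last>`, `<nick>`, the `team` attribute) and all function names are assumptions; the slides do not show the format of the project's database.

```python
import xml.etree.ElementTree as ET

# Illustrative player database; element and attribute names are hypothetical.
PLAYERS_XML = """
<players>
  <player team="Germany"><last>Schweinsteiger</last><nick>Schweini</nick></player>
  <player team="Germany"><last>Podolski</last><nick>Poldi</nick></player>
</players>
"""

def load_lookup(xml_text):
    """Map every lower-cased last name and nickname to (last name, team)."""
    lookup = {}
    for player in ET.fromstring(xml_text.strip()).iter("player"):
        last = player.findtext("last")
        entry = (last, player.get("team"))
        lookup[last.lower()] = entry
        for nick in player.findall("nick"):
            lookup[nick.text.lower()] = entry
    return lookup

def annotate_players(tokens, lookup):
    """Return (token index, last name, team) for every token that names a player."""
    hits = []
    for i, token in enumerate(tokens):
        entry = lookup.get(token.lower())
        if entry:
            hits.append((i, entry[0], entry[1]))
    return hits

lookup = load_lookup(PLAYERS_XML)
tokens = ["Poldi", "passed", "to", "Schweinsteiger", "."]
player_hits = annotate_players(tokens, lookup)
```

Both the nickname and the last name resolve to the same canonical entry, so a search for a player finds either spelling. Single-token lookup does not cover multi-word names; those would need a phrase match over adjacent tokens.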
9
Architecture Date & Time-Annotator
- Identification of time and date information
- Uses the OpenNLP word annotations
- Currently a custom, rule-based implementation
- Detects standard-conforming time and date information
- Detection of relative or colloquial time expressions is not implemented yet
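A rule-based detector for standard date and time notations might look like this sketch. The two date patterns (`DD.MM.YYYY` and ISO `YYYY-MM-DD`) and the `HH:MM` time pattern are assumptions about which formats count as "standard"; the project's actual rules are not given in the slides.

```python
import re
from datetime import date, time

# Assumed formats: DD.MM.YYYY or YYYY-MM-DD for dates, HH:MM for times.
DATE_RE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b|\b(\d{4})-(\d{2})-(\d{2})\b")
TIME_RE = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")

def find_dates(text):
    """Return (start, end, date) for every valid date expression in the text."""
    results = []
    for m in DATE_RE.finditer(text):
        if m.group(1):
            day, month, year = int(m.group(1)), int(m.group(2)), int(m.group(3))
        else:
            year, month, day = int(m.group(4)), int(m.group(5)), int(m.group(6))
        try:
            results.append((m.start(), m.end(), date(year, month, day)))
        except ValueError:
            pass  # Pattern matched, but it is not a real calendar date.
    return results

def find_times(text):
    """Return (start, end, time) for every HH:MM expression in the text."""
    return [
        (m.start(), m.end(), time(int(m.group(1)), int(m.group(2))))
        for m in TIME_RE.finditer(text)
    ]

text = "Kick-off on 09.06.2006 at 18:00 in Munich."
dates = find_dates(text)
times = find_times(text)
```

Returning character offsets together with the parsed value matches how an annotation marks a span of the source text. Relative expressions such as "yesterday" are out of reach for patterns like these, which fits the limitation noted on the slide.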
10
Architecture Match-Annotator
- Identification of matches
- Based on 3 components
  - Detection of locality
  - Detection of participating teams
  - Detection of the match result
- Usage of upstream annotators
  - OpenNLP word annotations
  - Player annotations
  - Date and time annotations
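The combination of the three components (locality, teams, result) can be illustrated with a small sketch. The team and venue lists, the score pattern, and the rule "two teams plus a score make a match" are all assumptions; the project's real annotator builds on the upstream annotations rather than on raw strings.

```python
import re

# Illustrative lookup lists; the project's real lists are not shown in the slides.
TEAMS = ("Germany", "Costa Rica", "Poland")
VENUES = ("Munich", "Berlin", "Dortmund")
SCORE_RE = re.compile(r"\b(\d{1,2})[:\-](\d{1,2})\b")

def annotate_match(sentence):
    """Return a match record if the sentence names two teams and a score, else None."""
    # Order the detected teams by their position in the sentence.
    teams = sorted((t for t in TEAMS if t in sentence), key=sentence.index)
    venue = next((v for v in VENUES if v in sentence), None)
    score = SCORE_RE.search(sentence)
    if len(teams) == 2 and score:
        return {
            "team_a": teams[0],
            "team_b": teams[1],
            "venue": venue,
            "result": (int(score.group(1)), int(score.group(2))),
        }
    return None

match = annotate_match("Germany beat Costa Rica 4:2 in Munich.")
no_match = annotate_match("Germany trained in Berlin.")
```

The score pattern would also fire on a kick-off time such as 18:00. This is one reason to consult the date and time annotations first and exclude spans already marked as times.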
11
Architecture Goal-Event Annotator
- Descriptions of goals are too complex for rule-based detection
- Therefore: machine learning
- Uses the OpenNLP library
- Based on statistical information about sentences
- Comprehensive training necessary
- Implemented as an OpenNLP component
- Integrated into UIMA via wrapper classes
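The idea of classifying sentences from word statistics can be shown with a tiny Naive Bayes classifier. This is a swapped-in technique chosen for brevity, not the model the project trained with OpenNLP; the class name, the labels, and the six made-up training sentences are assumptions.

```python
import math
from collections import Counter

def tokenize(sentence):
    """Lower-case the sentence and split it into words, dropping '.' and ','."""
    return sentence.lower().replace(".", " ").replace(",", " ").split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over word counts."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words
        self.doc_counts = Counter()  # label -> number of training sentences
        self.vocab = set()

    def train(self, examples):
        for sentence, label in examples:
            words = tokenize(sentence)
            self.word_counts.setdefault(label, Counter()).update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)

    def predict(self, sentence):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus smoothed log likelihood of every word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(sentence):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training sentences; a usable model needs far more data.
classifier = NaiveBayes()
classifier.train([
    ("Klose headed the ball into the net.", "goal"),
    ("Podolski scored from close range.", "goal"),
    ("Lahm fired the ball into the top corner.", "goal"),
    ("The referee showed a yellow card.", "other"),
    ("The team trained in Berlin on Monday.", "other"),
    ("Fans celebrated in the streets.", "other"),
])
prediction = classifier.predict("Ballack scored with a shot into the net.")
```

With six training sentences the classifier only reacts to a handful of words such as "scored" and "net", which illustrates why the slide calls comprehensive training necessary.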
12
Architecture Persistent Indexing
- Functionality
  - Import of all files in a specific directory
  - Annotation of all available texts
  - Generation of XML files with the CAS data of every source text
  - Subsequent creation of a search index
  - Provision of the index files for the web server
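The import, annotate and index steps can be sketched as a small inverted index over a directory of text files. The `annotate` function is a stand-in for the annotator pipeline, the `*.txt` file pattern is an assumption, and writing the per-document XML files with CAS data is left out.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

def annotate(text):
    """Stand-in for the annotator pipeline: lower-cased word tokens only."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w]

def build_index(directory):
    """Build an inverted index: term -> sorted list of file names containing it."""
    index = defaultdict(set)
    for path in sorted(Path(directory).glob("*.txt")):
        for term in annotate(path.read_text(encoding="utf-8")):
            index[term].add(path.name)
    return {term: sorted(files) for term, files in index.items()}

# Demonstration with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("Klose scored a goal.", encoding="utf-8")
    Path(tmp, "b.txt").write_text("A goal for Poland.", encoding="utf-8")
    index = build_index(tmp)
```

A query then reduces to a dictionary lookup, for example `index["goal"]`. In the prototype, the index entries would come from annotations (players, dates, matches) rather than from bare words.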
13
Architecture Graphical User Interface
- Linux server with Tomcat installation
- Simple operation via a web-based GUI
- Search queries are handled by JavaServer Pages
- Requests are processed by JavaBeans
14
Demonstration Search engine
15
Open Issues
- How to proceed?
- Search for attributes, e.g. Player AND Germany (currently only via OmniFind)
- Automate processing of search engine results
- Further training of the components
- Usability improvements at front end and back end
16
New scenarios for the second semester
- Automated analysis of e-mails
  - Search for phone numbers
  - Search for customer contacts of an employee
  - Find employees with specific skills
  - Find links and relations between employees
- Competitive analysis
  - Compare own products with those of competitors
  - Find out about customer opinions in internet portals
- Further ideas?
17
Ideas for the second semester
- Natural-language search queries
- Design templates for customizable annotators
- Machine learning for the web crawler
- Mark annotations in the search results
- Automated processing of search results
- Implement more annotators via OpenNLP
- Provide annotators as web services
- Further ideas?
18
JIMGLE JIM Master-Project Questions? Suggestions?
19
JIMGLE JIM Master-Project Thanks for your attention…