Email Processing and Recommendation Michal Laclavík, Ladislav Hluchý, Martin Šeleng (Email research, information extraction, information retrieval, contextual.


1 Email Processing and Recommendation
Michal Laclavík, Ladislav Hluchý, Martin Šeleng
(Email research, information extraction, information retrieval, contextual recommendation)

2 Abstract
In this presentation we give an overview of our research on text processing and recommendation. We focus on information and knowledge hidden in email communication in an organizational or enterprise context. We exploit simple information extraction techniques based on patterns and gazetteers to deliver a semantic or semi-formal understanding of text (email) content and context. The context is used for recommendation. We have developed proof-of-concept prototypes of email-based recommendation and search based on key-value pairs (named entities) extracted from text (emails) and on hierarchical trees built from recognized entities. In addition, we exploit social networks hidden in email archives.
Vienna, 14th October 2010, IRF-TUWIEN Doctoral Seminar

3 Primary Research Team & Capabilities
Dept. of Parallel and Distributed Computing
Research and Development Areas:
– Large-scale HPCN and Grid applications
– Intelligent and Knowledge oriented Technologies
Experience from European IST projects:
– 3 projects in FP5: ANFAS, CrossGrid, Pellucid
– 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
– 4 projects in FP7: Commius, Admire, EGEE III, Secricom
Several National Projects (SPVV, VEGA, APVT)
IKT Group Focus:
– Information Processing
– Semantic Web
– Knowledge oriented Technologies
– Parallel and Distributed Information Processing
Solutions:
– Ontea: Pattern-based Semantic Annotation
– ACoMA: KM tool in Email
– EMBET: Recommendation System
Director & leader of PDC: Dr. Dipl. Ing. Ladislav Hluchý
URL: http://ikt.ui.sav.sk

4 Ontea: Pattern-based information extraction and semantic annotation
Text processing

5 Ontea: Information Extraction (Features)
– Regex patterns
– Visual Annotation Tool
– Integration with external tools: GATE, Stemmers, Hadoop …
– Gazetteers
– IE System configuration
– Automatic loading of extractors
– Patterns
– Multilingual tests: Spanish, Slovak, English, Italian

6 Information Extraction Model
Address and product patterns
Diagram labels: Extraction, Processing
– Macros: 3 words macro, ZIP macro, Street number macro, Street name macro, City name macro, Country macro
– Address patterns
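The idea of building address patterns from smaller macros can be illustrated with a minimal Python sketch. The macro definitions, the address layout ("street number, ZIP city") and the sample sentence are assumptions made for illustration; they are not Ontea's actual patterns. The output keys reuse the address:* names listed later in the talk.

```python
import re

# Hypothetical macros: small named regex fragments (not Ontea's real ones).
MACROS = {
    "street_name": r"(?P<street>[A-Z][a-z]+(?: [A-Z][a-z]+)*)",
    "street_number": r"(?P<number>\d+[a-zA-Z]?)",
    "zip": r"(?P<zip>\d{4,5})",
    "city": r"(?P<city>[A-Z][a-z]+(?: [A-Z][a-z]+)*)",
}

# An address pattern assembled from the macros.
# Assumed layout: "<street name> <street number>, <ZIP> <city>".
ADDRESS_PATTERN = re.compile(
    "{street_name} {street_number}, {zip} {city}".format(**MACROS)
)

def extract_addresses(text):
    """Return one dict of key-value pairs per address found in the text."""
    results = []
    for m in ADDRESS_PATTERN.finditer(text):
        results.append({
            "address:Street": f"{m.group('street')} {m.group('number')}",
            "address:ZIP": m.group("zip"),
            "address:Settlement": m.group("city"),
        })
    return results

sample = "Please deliver to Favoritenstrasse 9, 1040 Vienna by Friday."
print(extract_addresses(sample))
```

Keeping each macro as a separate named fragment means a country-specific ZIP or street format can be swapped in without touching the overall address pattern.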

7 Segmentation
– Sentences
– Paragraphs
– Objects (Address, Product..)

8 Information Extraction: Gazetteers configuration
Gazetteer
– Can extract information that cannot be properly extracted by regular expression patterns (such as given names, product names, etc.)
– The gazetteer extraction approach is combined with regular-expression-based extraction. For example, personal full names can be extracted with higher precision.
– The gazetteer is easy to update because it is configured by simple text files.
Diagram labels:
– Gazetteer lists: simple text files with keywords
– Gazetteer configuration: simple text file with :
– Information extractor rules
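A minimal Python sketch of combining a gazetteer with a regular expression to extract full names. The one-keyword-per-line list, the function names and the "given name followed by a capitalized word" rule are assumptions for illustration, not the Ontea configuration format.

```python
import re

# Hypothetical gazetteer list: in practice this would be a simple text file
# with one keyword per line.
GIVEN_NAMES_TXT = """\
Michal
Ladislav
Martin
"""

def load_gazetteer(text):
    """Return the non-empty lines of a gazetteer list as a set of keywords."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def extract_full_names(text, given_names):
    """Find '<given name> <Surname>' where the given name is in the gazetteer."""
    alternatives = "|".join(re.escape(n) for n in sorted(given_names))
    pattern = re.compile(rf"\b({alternatives}) (\w+)")
    results = []
    for match in pattern.finditer(text):
        surname = match.group(2)
        # Keep only candidates whose second word is capitalized.
        if surname.istitle():
            results.append(("person:Name", f"{match.group(1)} {surname}"))
    return results

names = load_gazetteer(GIVEN_NAMES_TXT)
sample = "Slides by Michal Laclavík and Martin Šeleng; Martin will present."
print(extract_full_names(sample, names))
```

The gazetteer anchors the match on a known given name, while the regular expression supplies the surrounding structure; "Martin will" is rejected because the second word is not capitalized.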

9 Information Extraction: Rules configuration
IE System configuration
– IE dynamically loads and runs its components (XMLRegexExtractor, Gazetteer, RuleTransformer) according to the settings in the IE rules file
– IE components execute consecutively and operate on a set of information extraction results
Diagram labels: Information extractor rules file, IE result set, Modified IE result set, IE component, Regex based IE component, Gazetteer IE component, Result set transformer IE component
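The consecutive execution of components over a shared result set might look roughly like the following Python sketch. Only the component names come from the slide; the rules file format (one component name per line), the registry and the component bodies are hypothetical stand-ins.

```python
import re

# Hypothetical stand-ins for the components named on the slide.
def regex_extractor(text, results):
    """Regex-based component: append email addresses to the result set."""
    for m in re.finditer(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text):
        results.append(("contact:Email", m.group()))
    return results

def gazetteer(text, results):
    """Gazetteer component: append known settlement names found in the text."""
    for name in ("Vienna", "Bratislava"):
        if re.search(rf"\b{name}\b", text):
            results.append(("address:Settlement", name))
    return results

def rule_transformer(text, results):
    """Result set transformer: drop duplicate results, keeping first-seen order."""
    return list(dict.fromkeys(results))

REGISTRY = {
    "XMLRegexExtractor": regex_extractor,
    "Gazetteer": gazetteer,
    "RuleTransformer": rule_transformer,
}

# Assumed rules file format: one component name per line, run in this order.
RULES_FILE = """\
# information extractor rules (hypothetical format)
XMLRegexExtractor
Gazetteer
RuleTransformer
"""

def run_pipeline(rules_text, text):
    """Look up each configured component and run it on the current result set."""
    results = []
    for line in rules_text.splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue
        results = REGISTRY[name](text, results)
    return results

sample = "Write to info@example.org or info@example.org from Vienna."
print(run_pipeline(RULES_FILE, sample))
```

Because every component takes and returns the result set, extractors and transformers can be reordered or removed by editing the rules text alone.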

10 Semantic Annotation
The concept
– InformationExtractor (IE) produces a set of extraction results
– SemanticAnnotator (SA) consumes the IE result set and builds trees convertible to ontology instances or to objects according to an XML schema, e.g. Core Components
– SA first builds an intermediate tree of IE results on which it operates
– Upon its creation, the tree is not compliant with the Core Components specification and needs to be transformed
– Therefore tree transformers transform the IE result tree into object trees
Diagram labels: Text/Email, Set of IE results, Tree of IE results, Object trees

11 Semantic Annotation
Tree transformers
– Input is a tree of IE results and output is the modified tree of IE results
– Tree transformers execute consecutively and operate on a tree of information extraction results
– Tree transformers, which delete, create, rename, move, switch and order nodes, are configured in the SA rules file
Diagram labels: Tree of IE results, Tree transformer, Modified tree of IE results
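A minimal Python sketch of tree transformers applied one after another. The Node class, the three transformers shown (rename, delete, order) and the sample tree are illustrative assumptions; the SA rules file format is not reproduced here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in a (hypothetical) tree of IE results."""
    name: str
    value: str = ""
    children: list = field(default_factory=list)

def rename(old, new):
    """Transformer that renames every node called `old` to `new`."""
    def transform(node):
        if node.name == old:
            node.name = new
        for child in node.children:
            transform(child)
        return node
    return transform

def delete(name):
    """Transformer that removes every non-root node called `name`."""
    def transform(node):
        node.children = [transform(c) for c in node.children if c.name != name]
        return node
    return transform

def order():
    """Transformer that sorts the children of every node by name."""
    def transform(node):
        node.children = sorted((transform(c) for c in node.children),
                               key=lambda n: n.name)
        return node
    return transform

def apply_all(tree, transformers):
    """Run the transformers consecutively; each one sees the previous output."""
    for transformer in transformers:
        tree = transformer(tree)
    return tree

def to_tuple(node):
    """Plain nested-tuple view of a tree, convenient for printing."""
    return (node.name, node.value, [to_tuple(c) for c in node.children])

tree = Node("Address", children=[
    Node("ZIP", "1040"), Node("Town", "Vienna"), Node("Noise", "x"),
])
result = apply_all(tree, [rename("Town", "Settlement"), delete("Noise"), order()])
print(to_tuple(result))
```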

12 Social Networks
Social network reconstruction:
– probabilistic inference using spreading activation
– relies on the output of the information extractor (IE) in the form of complex objects
Preliminary results on a set of 50 Spanish emails (phone/name):
– Precision 60% (due to lower recall in IE)
– Precision 85% (achievable with better IE)
– self-healing (with new incoming emails)
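Spreading activation in general can be sketched in a few lines of Python. This is a simple textbook-style variant with an assumed decay factor and a made-up graph of a person, two emails and two phone numbers; the slide does not specify the authors' exact inference procedure, so this is not a reproduction of it.

```python
def spread_activation(graph, seed, decay=0.5, steps=2):
    """Spread activation from `seed` over `graph` (adjacency lists).

    In each step, every active node passes `decay` times its energy to its
    neighbours, split evenly among them.
    """
    activation = {seed: 1.0}
    frontier = {seed: 1.0}
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return activation

# Made-up graph: objects extracted from emails are linked to those emails.
graph = {
    "person:Ana": ["email-1", "email-2"],
    "email-1": ["person:Ana", "phone:111"],
    "email-2": ["person:Ana", "phone:111", "phone:222"],
    "phone:111": ["email-1", "email-2"],
    "phone:222": ["email-2"],
}

activation = spread_activation(graph, "person:Ana")
phones = sorted((n for n in activation if n.startswith("phone:")),
                key=lambda n: activation[n], reverse=True)
print(phones)
```

The phone number that co-occurs with the person in both emails collects activation along two paths, so it ranks above the one seen only once.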

13 Social Networks
Results as XML or HTML (via XSL Transformations)
Future:
– DataSource for Search for Partner module
– Improve the recall of the Information Extractor
– Exploit a multi-pass algorithm and named entity recognition: things learned in the first pass will be used in the next, e.g. possible names with initials, etc.
– Build an enhanced statistical reasoning procedure on top of the present Social Network Extractor/Correlator

14 Email Research
Acoma

15 Acoma Architecture
– Connected to email protocols on desktop or server
– No need to change working practices: emails are received and sent as before
– Received email is processed by Acoma and enriched with useful information
– Extensible with OSGi modules

16 System Connectors
Connection of Acoma to existing systems:
– Document Archives
– Internet or Intranet Systems
– Databases
Access or import of data
Key-value pair transformation
Diagram labels: Meta-Connector, Web Connector, SpreadSheet Connector, Database Connector, Key-value, Transformed Key-value
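As a rough Python illustration of a connector followed by a key-value transformation: a spreadsheet export is read into key-value pairs, and the source keys are then mapped onto common object keys. The CSV content, the column names and the key map are hypothetical; Acoma's connector interfaces are not shown on the slide.

```python
import csv
import io

# Hypothetical spreadsheet export with made-up data.
SPREADSHEET = (
    "Company,Tax number,Phone\n"
    "Example Ltd,SK0000000000,+421 2 0000 0000\n"
)

# Assumed mapping from source column names to common object keys.
KEY_MAP = {
    "Company": "org:Name",
    "Tax number": "org:TaxNo",
    "Phone": "contact:Phone",
}

def spreadsheet_connector(csv_text):
    """Read every cell as a (column name, value) key-value pair."""
    pairs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pairs.extend(row.items())
    return pairs

def transform(pairs, key_map):
    """Rename known keys; drop unmapped keys and empty values."""
    return [(key_map[key], value) for key, value in pairs
            if key in key_map and value]

pairs = spreadsheet_connector(SPREADSHEET)
print(transform(pairs, KEY_MAP))
```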

17 Acoma architecture: Message Post Processing
– Useful hints with links are included in the enriched email
– Links lead to internal or external systems (Internet, Intranet)

18 Business objects in Emails
A study of 6 organizations shows:
– Objects can be identified by patterns and gazetteers
– It is possible to define a set of common objects
Objects identified:
– Organization: org:Name, org:RegNo, org:TaxNo
– Person: person:Name, person:Function
– Contact: contact:Phone, contact:Email, contact:Webpage
– Address: address:ZIP, address:Street, address:Settlement
– Product: product:Name, product:Module, product:Component, product:BOID
– Document: doc:Invoice, doc:Order, doc:Contract, doc:ChangeRequest
– Inventory: inventory:ResID, inventory:ResType
– Other business object ID: BOID
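One way to work with such prefixed keys is to group extracted key-value pairs by their object prefix. The Python sketch below assumes the pairs arrive as (key, value) tuples and places keys without a prefix, such as BOID, under a hypothetical "other" group; the sample values are made up.

```python
from collections import defaultdict

def group_by_object(pairs):
    """Group (key, value) pairs such as ('contact:Email', ...) by object prefix."""
    objects = defaultdict(lambda: defaultdict(list))
    for key, value in pairs:
        prefix, separator, attribute = key.partition(":")
        if not separator:
            # Keys without a prefix (e.g. BOID) go into an assumed "other" group.
            prefix, attribute = "other", key
        objects[prefix][attribute].append(value)
    return {prefix: dict(attrs) for prefix, attrs in objects.items()}

# Made-up extraction results using key names from the list above.
pairs = [
    ("person:Name", "Jane Doe"),
    ("contact:Email", "jane@example.org"),
    ("contact:Phone", "+00 000 0000"),
    ("org:Name", "Example Ltd"),
    ("BOID", "X-001"),
]
print(group_by_object(pairs))
```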

19 Social Networks and Graph Data
– Relations among objects
– Support for search

20 Email Search Prototype
Use of Social Network from email
– Includes extracted objects
– Full text of extracted objects
– Related objects discovered and ordered by spreading activation on the social network graph
– Faceted search, navigation

21 Context-based Recommendation, Knowledge Sharing
EMBET, Acoma

22 Objective: Recommend and provide the user with information or knowledge in context
EMBET: proactive information and knowledge provision
– Collaboration among users
– Knowledge sharing
– Active knowledge provision
– Reuse of knowledge: notes and other resources
http://ups.savba.sk/kwfgrid/uaa/

23 EMBET: Achievements
Software with the following functionality:
– User Problem description
– Displaying Knowledge
– Adding Knowledge
– Knowledge Reuse
– Permanent Notes Storage
– Voting on Notes
EMBET architecture: Core, GUI
Context detection
Context Matching to display information & knowledge
Plain text analysis using Advanced Semantic Annotation Algorithms – OnTeA
Theory of different context matching algorithms

24 Acoma: Hint Recommendation

25 Information Retrieval and Information Extraction lectures

26 IR Lectures
– Introduction to Information Retrieval
– Text Operations, Text Analysis, stemming
– Crawling, link processing
– IR Models, Indexing techniques
– IR Software libraries and systems
– Ranking by Graph Algorithms (PageRank, HITS, …) and Searching
– Information Extraction
– Regular Expressions
– Large Scale Data Processing on MapReduce Architecture
– Multimedia Information Retrieval
– Evaluation Techniques, Precision, Recall
– Google
– Semantics and IR, Semantic Web Standards

27 Lecture conditions
Every student gets a project focused on:
– Crawling
– Indexing
– Ranking
– Information Extraction
– Large Scale information Processing
Students have to consult on their project 3 times during the semester
Availability of data from day one
Lectures are available at:
– http://vi.ikt.ui.sav.sk/Témy_prednášok

