
Primary Research Team & Capabilities: Dept. of Parallel and Distributed Computing (11 November 2011)


1 Primary Research Team & Capabilities (11 November 2011)

Dept. of Parallel and Distributed Computing

Research and Development Areas:
– Large-scale HPCN, Grid and MapReduce applications
– Intelligent and Knowledge oriented Technologies

Experience from IST:
– 3 projects in FP5: ANFAS, CrossGrid, Pellucid
– 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
– 4 projects in FP7: Commius, Admire, Secricom, EGEE III

Several National Projects (SPVV, VEGA, APVT)

IKT Group Focus:
– Information Processing (Large Scale)
– Graph Processing
– Information Extraction and Retrieval
– Semantic Web
– Knowledge oriented Technologies
– Parallel and Distributed Information Processing

Solutions:
– SGDB: Simple Graph Database
– gSemSearch: Graph based Semantic Search
– Ontea: Pattern-based Semantic Annotation
– ACoMA: KM tool in Email
– EMBET: Recommendation System
– Experts on MapReduce and IR (Nutch, Solr, Lucene)

Director & leader of PDC: Dr. Ladislav Hluchý
URL: http://ikt.ui.sav.sk

2 Approach and Solutions

3 Large scale Text and Graph data processing: Core Technology

Web crawling
– Nutch + plugins
Full text indexing and search
– Lucene, Solr
Information Extraction
– Ontea, GATE
All of the above at large scale
– Hadoop, S4
Graph processing and Querying
– Simple Graph Database (SGDB)
– gSemSearch
– Neo4j
– Blueprints

Underlined on the original slide are the technologies developed by IISAS.

4 Ontea: Information Extraction Tool

– Regex patterns
– Gazetteers
– Results: key-value pairs, structured into trees and graphs
– Transformers, Configuration
– Automatic loading of extractors
– Visual Annotation Tool
– Integration with external tools: GATE, Stemmers, Hadoop …
– Multilingual tests: English, Slovak, Spanish, Italian

http://ontea.sf.net
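The combination of regex patterns and gazetteers listed above can be illustrated with a minimal sketch. This is not Ontea's API or configuration format: the pattern names, the gazetteer contents, the `extract` function, and the sample text are all hypothetical and chosen only to show how both kinds of extractor can emit the same key-value pairs.

```python
import re

# Hypothetical patterns: each key maps to a regular expression.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

# Hypothetical gazetteer: each key maps to a fixed list of known terms.
GAZETTEER = {
    "city": {"Bratislava", "Madrid", "Rome"},
}


def extract(text):
    """Return a list of (key, value) pairs found in the text."""
    pairs = []
    # Pattern-based extraction: every regex match becomes one pair.
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            pairs.append((key, match.group(0)))
    # Gazetteer lookup: report each known term that occurs as a whole word.
    for key, terms in GAZETTEER.items():
        for term in sorted(terms):
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                pairs.append((key, term))
    return pairs


sample = "Contact jan.novak@example.org in Bratislava, tel. +421 2 5941 1111."
pairs = extract(sample)
for key, value in pairs:
    print(f"{key} = {value}")
```

Keeping the output as flat key-value pairs means a later step can attach them to the source document as a tree, and join trees that share a value into a graph.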

5 Email Search Prototype

– Use of the social network extracted from email
– Includes extracted objects
– Full text of extracted objects
– Related objects discovered and ordered by spreading activation on the social network graph
– Faceted search, navigation

6 gSemSearch: Graph based Semantic Search

– Graph/Network of interacting (interconnected) entities
– Discovering relations in the graph (network) using a spreading activation algorithm
– Showing relations of a concrete type, e.g. telephone numbers related to a person
– Navigation over related entities
– Full-text search of the entities
– User interface for search
– User interaction with data (merging, deleting entities) with immediate impact on discovered relations

Tested on the Enron email corpus
– Email Social Network Search
– http://ikt.ui.sav.sk/esns/
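Relation discovery by spreading activation can be sketched as follows. This is a generic textbook variant, not the gSemSearch implementation: the `decay`, `threshold` and `max_steps` parameters, the `type:value` node naming, and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """Propagate activation from seed nodes; return node -> total activation.

    graph maps each node to a list of its neighbours.
    """
    activation = defaultdict(float)
    frontier = {node: 1.0 for node in seeds}
    for node, value in frontier.items():
        activation[node] += value
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each node passes a decayed, equal share to every neighbour.
            share = value * decay / len(neighbours)
            if share < threshold:
                continue
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


def related(activation, prefix):
    """Nodes of one type (by name prefix), strongest activation first."""
    nodes = [n for n in activation if n.startswith(prefix)]
    return sorted(nodes, key=lambda n: activation[n], reverse=True)


# Toy undirected email graph (hypothetical data).
graph = {
    "person:alice": ["email:1", "email:2"],
    "email:1": ["person:alice", "phone:555-0100"],
    "email:2": ["person:alice", "person:bob"],
    "phone:555-0100": ["email:1"],
    "person:bob": ["email:2"],
}

activation = spread_activation(graph, ["person:alice"])
phones = related(activation, "phone:")
print(phones)
```

Filtering the activated nodes by type corresponds to the "telephone numbers related to a person" use case; merging or deleting nodes in `graph` changes the result of the next query directly.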

7 SGDB: Simple Graph Database

– Storage for graphs
– Optimized for graph traversal and spreading activation
– Faster than Neo4j for graph traversal operations
– Supports the Blueprints API
– https://simplegdb.svn.sourceforge.net/svnroot/simplegdb/Sgdb3

Graph Database Benchmarks
– Graph Traversal Benchmark for Graph Databases
– http://ups.savba.sk/~marek/gbench.html
– Blueprints API: makes it possible to test any compliant graph database
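To make concrete what a graph traversal benchmark measures, here is a minimal sketch that times a depth-limited breadth-first traversal over an in-memory adjacency list. It does not use SGDB, Neo4j, or the Blueprints API, and it is not the benchmark linked above; the synthetic graph generator `build_ring` and all parameters are assumptions.

```python
import time
from collections import deque


def traverse(adjacency, start, max_depth):
    """Return the set of nodes reachable from start within max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


def build_ring(n, skip):
    """Synthetic directed graph: node i links to i+1 and i+skip (mod n)."""
    return {i: [(i + 1) % n, (i + skip) % n] for i in range(n)}


adjacency = build_ring(10_000, 97)

started = time.perf_counter()
reached = traverse(adjacency, 0, 4)
elapsed = time.perf_counter() - started

print(f"visited {len(reached)} nodes in {elapsed * 1000:.2f} ms")
```

A real comparison would run the same traversal through each database's API on identical data and repeat the measurement many times; a single in-process timing like this one only shows the shape of the test.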

8 Future Direction: Relations Discovery in Large Graph Data

Motivation
– Graph/Network data are everywhere: social networks, web, Linked Data, transactions, communication (email, phone).
– Text can also be converted to a graph.
– Interconnecting graph data and searching for relations is crucial.

Approach
– Forming semantic trees and graphs from text, web, communication, databases and Linked Data
– User interaction with graph data in order to achieve integration and data cleansing
– Users will do this if their effort has an immediate impact on search results
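The approach above, forming trees from extracted data and letting users merge entities, can be sketched in a few lines. Everything here is illustrative: the `build_graph` and `merge` functions, the `(type, value)` node representation, and the sample messages are assumptions, not the project's data model.

```python
from collections import defaultdict


def build_graph(documents):
    """Build an undirected graph from extracted key-value pairs.

    documents maps a document id to a list of (key, value) pairs.
    Each document forms a small tree: a root node with its extracted
    objects as children. Trees sharing an object join into one graph.
    """
    graph = defaultdict(set)
    for doc_id, pairs in documents.items():
        doc_node = ("doc", doc_id)
        for pair in pairs:
            graph[doc_node].add(pair)
            graph[pair].add(doc_node)
    return graph


def merge(graph, keep, drop):
    """Merge node `drop` into node `keep`, rewiring all of its edges."""
    for neighbour in graph.pop(drop, set()):
        if neighbour == drop:
            continue  # ignore a self-loop on the dropped node
        graph[neighbour].discard(drop)
        if neighbour != keep:
            graph[neighbour].add(keep)
            graph[keep].add(neighbour)


# Hypothetical extraction results from two emails.
documents = {
    "mail-1": [("person", "J. Novak"), ("phone", "+421 2 5941 1111")],
    "mail-2": [("person", "Jan Novak"), ("org", "IISAS")],
}
graph = build_graph(documents)

# Before merging, the two spellings are separate nodes in separate trees.
print(sorted(graph[("person", "Jan Novak")]))

# A user decides both spellings denote the same person.
merge(graph, ("person", "Jan Novak"), ("person", "J. Novak"))
print(sorted(graph[("person", "Jan Novak")]))
```

After the merge, the phone number from the first message is two hops from the merged person node, so a relation search started at that node can reach it immediately. This is the kind of direct payoff for user effort that the slide argues for.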

