Primary Research Team & Capabilities
11 November 2011

Dept. of Parallel and Distributed Computing

Research and Development Areas:
–Large-scale HPCN, Grid and MapReduce applications
–Intelligent and Knowledge oriented Technologies

Experience from IST:
–3 projects in FP5: ANFAS, CrossGrid, Pellucid
–6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
–4 projects in FP7: Commius, Admire, Secricom, EGEE III

Several National Projects (SPVV, VEGA, APVT)

IKT Group Focus:
–Information Processing (Large Scale)
–Graph Processing
–Information Extraction and Retrieval
–Semantic Web
–Knowledge oriented Technologies
–Parallel and Distributed Information Processing

Solutions:
–SGDB: Simple Graph Database
–gSemSearch: Graph based Semantic Search
–Ontea: Pattern-based Semantic Annotation
–ACoMA: KM tool in email
–EMBET: Recommendation System
–Experts on MapReduce and IR (Nutch, Solr, Lucene)

Director & leader of PDC: Dr. Ladislav Hluchý
URL:

Approach and Solutions

Large scale Text and Graph data processing

Core Technology

Web crawling
–Nutch + plugins

Full text indexing and search
–Lucene, Solr

Information Extraction
–Ontea, GATE

All of the above at large scale
–Hadoop, S4

Graph processing and Querying
–Simple Graph Database (SGDB)
–gSemSearch
–Neo4j
–Blueprints

Of these, Ontea, SGDB and gSemSearch are the technologies developed by IISAS.

Ontea: Information Extraction Tool

–Regex patterns
–Gazetteers
–Results
  –Key-value pairs
  –Structured into trees and graphs
–Transformers, Configuration
–Automatic loading of extractors
–Visual Annotation Tool
–Integration with external tools
  –GATE, Stemmers, Hadoop …
–Multilingual tests
  –English, Slovak, Spanish, Italian
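The combination of regex patterns and gazetteers producing key-value pairs can be illustrated with a minimal sketch. This is not Ontea's code or API: the pattern set, the gazetteer entries and the `extract` function are all hypothetical names chosen for illustration.

```python
import re

# Hypothetical regex patterns: each key (entity type) maps to one pattern.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\+?\d[\d -]{7,}\d"),
}

# Hypothetical gazetteer: known surface forms mapped to an entity type.
GAZETTEER = {
    "Bratislava": "city",
    "Enron": "organization",
}


def extract(text):
    """Return a sorted list of (key, value) pairs found in text."""
    pairs = set()
    # Pattern-based extraction: every regex match becomes a key-value pair.
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            pairs.add((key, match.group(0)))
    # Gazetteer lookup: whole-word match of each known term.
    for term, key in GAZETTEER.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            pairs.add((key, term))
    return sorted(pairs)
```

Calling `extract` on a message body yields pairs such as `("organization", "Enron")`, which can then be attached to the message as nodes of a tree or graph.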

Search Prototype

–Use of Social Network from email
–Includes extracted objects
–Full text of extracted objects
–Related objects discovered and ordered by spreading activation on the social network graph
–Faceted search, navigation

gSemSearch: Graph based Semantic Search

–Graph/Network of interacting (interconnected) entities
–Discovering relations in the graph (network) using the spreading activation algorithm
–Showing relations of a concrete type, e.g. telephone numbers related to a person
–Navigation over related entities
–Full-text search of the entities
–User interface for search
–User interaction with data (merging, deleting entities) with immediate impact on discovered relations
–Tested on the Enron Corpus (Social Network Search)
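Spreading activation over an entity graph can be sketched as follows. This is a generic textbook-style variant, not the gSemSearch implementation: the `decay`, `threshold` and `max_steps` parameters, the adjacency-dict graph format and both function names are assumptions made for this example.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, threshold=0.01, max_steps=3):
    """graph: dict node -> list of neighbour nodes.
    seeds: dict node -> initial activation.
    Returns dict node -> accumulated activation."""
    activation = defaultdict(float)
    for node, value in seeds.items():
        activation[node] += value
    frontier = dict(seeds)
    for _ in range(max_steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each node passes a decayed, equal share to every neighbour.
            share = value * decay / len(neighbours)
            if share < threshold:
                continue
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        if not next_frontier:
            break
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


def related(graph, start, keep=None, **kwargs):
    """Rank nodes related to start; keep optionally filters by node type."""
    scores = spread_activation(graph, {start: 1.0}, **kwargs)
    scores.pop(start, None)
    nodes = [n for n in scores if keep is None or keep(n)]
    return sorted(nodes, key=lambda n: (-scores[n], n))
```

Passing a `keep` predicate restricts the ranking to one entity type, which mirrors the "telephone numbers related to a person" use case. Because scores are recomputed from the graph on every call, merging or deleting entities changes the ranking immediately.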

SGDB: Simple Graph Database

–Storage for graphs
–Optimized for graph traversal and spreading activation
–Faster than Neo4j for graph traversal operations
–Supports Blueprints API

Graph Database Benchmarks
–Graph Traversal Benchmark for Graph Databases
–Blueprints API: possibility to test compliant graph databases
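The kind of operation a graph traversal benchmark measures can be sketched with a timed k-hop breadth-first traversal. The sketch runs on a plain in-memory adjacency dict; it does not use SGDB, Neo4j or the Blueprints API, and the function names are hypothetical.

```python
import time
from collections import deque


def k_hop_neighborhood(adjacency, start, k):
    """Breadth-first traversal: return the set of nodes within k hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    seen.discard(start)
    return seen


def time_traversals(adjacency, starts, k):
    """Return (total nodes reached, elapsed seconds) for k-hop traversals from each start."""
    began = time.perf_counter()
    reached = sum(len(k_hop_neighborhood(adjacency, s, k)) for s in starts)
    return reached, time.perf_counter() - began
```

A benchmark against real graph databases would run the same traversal through each store's API on the same data, so that only the storage engine differs between runs.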

Future Direction: Relations Discovery in Large Graph Data

Motivation
–Graph/Network data are everywhere: social networks, web, LinkedData, transactions, communication (email, phone).
–Text can also be converted to a graph.
–Interconnecting graph data and searching for relations is crucial.

Approach
–Forming semantic trees and graphs from text, web, communication, databases and LinkedData
–User interaction with graph data in order to achieve integration and data cleansing
–Users will do it if their effort has an immediate impact on search results
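Forming a graph from extracted key-value pairs, and letting a user merge two entities as a data cleansing step, can be sketched as follows. The data layout (document nodes linked to entity nodes) and the function names are assumptions for illustration, not the design described on the slides.

```python
from collections import defaultdict


def build_graph(documents):
    """documents: dict doc_id -> list of (key, value) pairs extracted from it.
    Returns an undirected adjacency dict linking each document to its entities."""
    graph = defaultdict(set)
    for doc_id, pairs in documents.items():
        doc_node = ("doc", doc_id)
        for key, value in pairs:
            entity = (key, value)
            graph[doc_node].add(entity)
            graph[entity].add(doc_node)
    return graph


def merge_entities(graph, keep, drop):
    """Fold node drop into node keep, rewiring all of drop's edges."""
    for neighbor in graph.pop(drop, set()):
        graph[neighbor].discard(drop)
        if neighbor != keep:
            graph[neighbor].add(keep)
            graph[keep].add(neighbor)
    return graph
```

After a merge, any relation search over the graph sees the combined entity at once, which is the immediate feedback the approach relies on to motivate users.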