11 October 2013. Primary Research Team & Capabilities. Dept. of Parallel and Distributed Computing. Research and Development Areas: Large-scale HPCN, Grid.

Presentation transcript:

Primary Research Team & Capabilities (11 October 2013)
Dept. of Parallel and Distributed Computing
Research and Development Areas:
– Large-scale HPCN, Grid and MapReduce applications
– Intelligent and Knowledge oriented Technologies
Experience from IST:
– 3 projects in FP5: ANFAS, CrossGrid, Pellucid
– 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
– 4 projects in FP7: Commius, Admire, Secricom, EGEE III
Several National Projects (SPVV, VEGA, APVT)
IKT Group Focus:
– Information Processing (Large Scale)
– Graph Processing
– Information Extraction and Retrieval
– Semantic Web
– Knowledge oriented Technologies
– Parallel and Distributed Information Processing
Solutions:
– SGDB: Simple Graph Database
– gSemSearch: Graph based Semantic Search
– Ontea: Pattern-based Semantic Annotation
– ACoMA: KM tool in
– EMBET: Recommendation System
– Experts on MapReduce and IR (Nutch, Solr, Lucene)
Director & leader of PDC: Dr. Ladislav Hluchý
URL:

Towards Entity Search
Current approaches:
– Confirmed human knowledge
– Google Knowledge Graph
– Facebook Graph Search
Available data sets:
– Wikipedia
– DBPedia (111 languages)
– Freebase
– Linked Data cloud
Our approach:
– Quite unique mix of skills: IR, Semantic Web, Graphs and Networks
– Networks, Text, metadata
– Graph algorithms
– Information Retrieval techniques
– Anchor texts: aliases, properties, types

Entity Search Applications
Online Advertising:
– Query Categorization
– Keyword Extension
Business Intelligence:
– Enterprise Search
– Knowledge Management
– Text analytics
Multilingual short text categorization:
– Based on Wikipedia language versions, DBPedia, Freebase
– Query Categorization
– Social media (Twitter) categorization and analysis
Security Domain:
– Information leakage prevention
– Categorization

Large scale Text and Graph data processing: Core Technology
Web crawling:
– Nutch + plugins
Full text indexing and search:
– Lucene, Solr
Information Extraction:
– Ontea, GATE
All of the above at large scale:
– Hadoop, S4
Graph processing and querying:
– Simple Graph Database (SGDB)
– gSemSearch
– Neo4j
– Blueprints
Note: in the original slides, the technologies developed by IISAS are underlined.

Relation to Business Intelligence
Old BI approaches:
– Data integration from RDBMS
– Data warehouses
– OLAP
– …
New BI approaches:
– Data structures other than RDBMS: Networks, Semantics
  Networks/Graphs in Telecom, Social Networks, Transactions, Linked Data …
  NoSQL: key-value (Tokyo Cabinet), column stores (HBase), graph databases, RDF(S)
– In-memory computing
– Commodity PC solutions for large data: MapReduce style (Hadoop), Pregel style (Giraph, Hama)
– Big unstructured data processing (on Hadoop): sentiment analysis, topic detection, named entity detection
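The "MapReduce style" processing named on this slide can be illustrated with a minimal, single-process Python sketch of a word count. It only mimics the map, shuffle, and reduce phases in memory; it is not Hadoop code, and all function names here are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over an iterable of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data over the network; the sketch keeps only the data flow.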

Ontea: Information Extraction Tool
– Regex patterns
– Gazetteers
– Results
  – Key-value pairs
  – Structured into trees and graphs
– Transformers, Configuration
– Automatic loading of extractors
– Visual Annotation Tool
– Integration with external tools
  – GATE, Stemmers, Hadoop …
– Multilingual tests: English, Slovak, Spanish, Italian
Figures: text with annotations; tree of annotations; network/graph of annotations
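The combination of regex patterns and gazetteers that yields key-value pairs can be sketched in a few lines of Python. This is not Ontea's actual API: the pattern names, the gazetteer entries, and the `extract` function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns: the key names the annotation type and
# group 1 of each regex captures the annotation value.
PATTERNS = {
    "email": re.compile(r"\b([\w.+-]+@[\w-]+\.[\w.-]+)\b"),
    "phone": re.compile(r"(\+?\d[\d ]{7,}\d)"),
}

# Hypothetical gazetteer: known surface forms mapped to an annotation type.
GAZETTEER = {
    "Bratislava": "city",
    "Slovak Academy of Sciences": "organization",
}


def extract(text):
    """Return a list of (key, value) annotations found in the text."""
    results = []
    for key, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            results.append((key, match.group(1).strip()))
    for surface, key in GAZETTEER.items():
        if surface in text:
            results.append((key, surface))
    return results
```

The flat list of pairs is the simplest result form; grouping pairs that occur in the same sentence or document would be one way to arrive at the tree and graph structures the slide mentions.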

Named Entity Recognition (NER)
Combination of existing NER tools:
– ANNIE (GATE), Apache OpenNLP
– Illinois NER, Illinois Wikifier
– LingPipe, Open Calais
– Stanford NER, WikiMiner
– Miscinator
Machine Learning:
– Decision tree models
Received second place at MSM 2013 among 17 participating teams worldwide, missing first place by 1%
ikt.ui.sav.sk/index.php?n=Main.IEChallenge

gSemSearch: Graph based Semantic Search
Entity relation search in semantic networks/graphs
Search, Navigation, Data Interaction
Aiming at data integration of:
– Structured data (relational data, LinkedData)
– Unstructured data (text, documents, communication)
Applications:
– , Web, Text documents, LinkedData

SemSets: Semantic Search
Answering list-type questions, e.g. "astronauts who walked on the Moon"
Wikipedia as text and as a network/graph
Text: IR methods, Lucene based
Graph/network: spreading activation and SemSets
Winning solution at the Semantic Search Challenge
Example result list:
1. Eugene_Cernan
2. Alan_Bean
3. David_Scott
4. John_Young_(astronaut)
5. Neil_Armstrong
6. Pete_Conrad
7. Harrison_Schmitt
8. Alan_Shepard
9. Charles_Duke
10. Buzz_Aldrin
11. James_Irwin
12. Edgar_Mitchell
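Spreading activation, named on this slide as the graph-side technique, can be sketched generically as follows. This is a textbook-style illustration and not the SemSets implementation; the adjacency-list graph format, the `decay` and `steps` parameters, and both function names are assumptions.

```python
from collections import defaultdict


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes along edges for a few steps.

    graph: dict mapping node -> list of neighbour nodes (assumed format).
    seeds: dict mapping node -> initial activation value.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbours = graph.get(node, [])
            if not neighbours:
                continue
            # Each node passes a decayed, equal share to every neighbour.
            share = decay * value / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


def rank(activation, exclude=()):
    """Sort nodes by descending activation, then by name for determinism."""
    items = [(n, a) for n, a in activation.items() if n not in exclude]
    return [n for n, _ in sorted(items, key=lambda item: (-item[1], item[0]))]
```

Seeding the query entities and ranking the remaining nodes by accumulated activation gives a candidate list; in practice such a graph score would be combined with the text-based IR score.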

SGDB: Simple Graph Database
Storage for graphs
Optimized for graph traversal and spreading activation
Faster than Neo4j for graph traversal operations
Supports the Blueprints API
Graph Database Benchmarks:
– Graph Traversal Benchmark for Graph Databases
–
– Blueprints API: makes it possible to test compliant graph databases
Source:
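A typical traversal workload is collecting the k-hop neighbourhood of a node. The sketch below shows that operation as a breadth-first search over an in-memory adjacency list; it is an assumption about what a traversal operation looks like, not the benchmark's actual workload and not SGDB or Blueprints code.

```python
from collections import deque


def k_hop_neighbours(adjacency, start, k):
    """Return the set of nodes reachable from start in at most k hops.

    adjacency: dict mapping node -> iterable of neighbour nodes (assumed format).
    The start node itself is not included in the result.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond the hop limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, depth + 1))
    visited.discard(start)
    return visited
```

Timing this call for increasing k against the same graph in different stores is one simple way to compare traversal performance.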

Community Detection in Complex Networks
Task: identify densely connected subgraphs in complex networks
Community collapsing problem
SCCD:
– Near-linear time complexity
– Avoids the community collapsing problem (to a certain extent)
KDD paper:
– Re-weighting approach
– Better results on real networks
Marek Ciglan, Kjetil Nørvåg: Fast detection of size-constrained communities in large networks. Proceedings of WISE'10, LNCS Volume 6488/2010.
Marek Ciglan, Michal Laclavík and Kjetil Nørvåg: On Community Detection in Real-World Networks and the Importance of Degree Assortativity. 19th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2013.
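To make the collapsing problem concrete: in plain label propagation one label can spread until it absorbs most of the network. The sketch below is a generic label propagation loop with a hard cap on community size, written only to illustrate the idea of a size constraint. It is not the SCCD algorithm from the cited paper; the `max_size` parameter, the tie-breaking rule, and the function name are assumptions.

```python
from collections import Counter


def capped_label_propagation(graph, max_size, max_iters=20):
    """Assign community labels, refusing to grow any community past max_size.

    graph: dict mapping node -> list of neighbour nodes (undirected, assumed).
    Returns a dict mapping node -> community label.
    """
    labels = {node: node for node in graph}
    sizes = Counter(labels.values())
    for _ in range(max_iters):
        changed = False
        for node in sorted(graph):
            counts = Counter(labels[nb] for nb in graph[node])
            current = labels[node]
            best, best_count = current, counts.get(current, 0)
            for label in sorted(counts):
                if label == current or sizes[label] >= max_size:
                    continue  # full communities accept no new members
                if counts[label] > best_count:
                    best, best_count = label, counts[label]
            if best != current:
                sizes[current] -= 1
                sizes[best] += 1
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels
```

With the cap in place, a label that has filled its quota stops spreading, so neighbouring dense groups keep separate labels instead of merging into one large community.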

Future Direction: Entity Search in Large Graph Data
Motivation:
– Graph/network data are everywhere: social networks, web, LinkedData, transactions, communication ( , phone).
– Text can also be converted to a graph.
– Interconnecting graph data and searching for relations is crucial.
Approach:
– Forming semantic trees and graphs from text, web, communication, databases and LinkedData
– User interaction with graph data in order to achieve integration and data cleansing
– Users will do this if their effort has an immediate impact on search results