Graph-RAT Overview By Daniel McEnnis

2/32 What is Graph-RAT
- Relational Analysis Toolkit
- Database abstraction layer
- Evaluation platform
- Robustly evaluate all different ways of performing recommendation

3/32 Kinds of Analysis
- Recommendation Systems
- Relational Machine Learning
- Data Mining
- MIR document retrieval

4/32 Talk Outline
- Base Components
- Queries
- Algorithms
- Schedulers
- Graph-RAT Language
- Conclusion and Examples

5/32 Base Components
- Graphs
- Actors
- Links
- Properties
[Figure: example graphs over actors A to E; actor A shown with properties Name (John), Age (22), and Hobbies ([Vector] Hiking, Biking); label "Library"]

6/32 Properties
- Variables of Graph-RAT
- Can be arbitrary Java types
- Can be attached to anything
- Unique ID string for each object
- Accessed only as sets, not as objects
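
A minimal Java sketch of the property model described on this slide, assuming a property is an ID string plus a collection of values of one Java type. All class and method names here (PropertySketch, Property, PropertyHolder, Actor) are hypothetical and are not taken from Graph-RAT's API; the mode name "User" and the example values are read off the Base Components figure and are assumptions as well.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not Graph-RAT's real class hierarchy.
final class PropertySketch {

    /** A property has a unique ID string and holds zero or more values of one Java type. */
    static final class Property<T> {
        final String id;
        final Class<T> type;
        private final List<T> values = new ArrayList<>();

        Property(String id, Class<T> type) {
            this.id = id;
            this.type = type;
        }

        void add(T value) {
            values.add(type.cast(value)); // reject values of the wrong runtime type
        }

        /** Values are always handed back as a collection, never as a single object. */
        List<T> getValues() {
            return new ArrayList<>(values);
        }
    }

    /** Anything that can carry properties: a graph, an actor, or a link in this sketch. */
    static class PropertyHolder {
        private final Map<String, Property<?>> properties = new HashMap<>();

        void attach(Property<?> property) {
            properties.put(property.id, property);
        }

        Property<?> getProperty(String id) {
            return properties.get(id);
        }
    }

    static final class Actor extends PropertyHolder {
        final String mode;
        final String id;

        Actor(String mode, String id) {
            this.mode = mode;
            this.id = id;
        }
    }

    /** Builds an actor carrying Name, Age, and Hobbies properties (assumed example values). */
    static Actor exampleActor() {
        Actor a = new Actor("User", "A");
        Property<String> name = new Property<>("Name", String.class);
        name.add("John");
        Property<Integer> age = new Property<>("Age", Integer.class);
        age.add(22);
        Property<String> hobbies = new Property<>("Hobbies", String.class);
        hobbies.add("Hiking");
        hobbies.add("Biking");
        a.attach(name);
        a.attach(age);
        a.attach(hobbies);
        return a;
    }

    public static void main(String[] args) {
        Actor a = exampleActor();
        System.out.println(a.getProperty("Hobbies").getValues());
    }
}
```

Returning a copy of the value list keeps callers from mutating a property behind the holder's back, which is one plausible reading of "accessed only as sets".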

7/32 Data View
- Hyper-graph structure defined by the set of actors and links in a graph
- Accessible from the enclosing graph
- Can be cyclic
[Figure: example graphs over actors A to E]

8/32 Metadata View
- Not constructed by default
- Implicit graph described by modes and the relations between them
- Needed for relational machine learning
[Figure: metadata graph with labels User and Friend]

9/32 Query Language
- Constructs sets retrieved from a graph
- Functional structure
- Similar to SQL
- 4 types
  - Graph Queries
  - Actor Queries
  - Link Queries
  - Property Queries

10/32 Query Structure
- Cascading queries in a LISP-style syntax
- Each child query is of a different type
- Restrictions can be added at runtime

11/32 Query Examples

    LinkByActor(
        false,
        ActorByMode(false, "Target", ".*"),
        ActorByMode(false, "Source", ".*"),
        SetOperation.XOR)
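
The cascading query above can be imitated in plain Java as nested predicates. This is only a sketch under stated assumptions: the leading boolean flag from the slide is left out because its meaning is not given, the first child query is assumed to test a link's source and the second its destination, and XOR is assumed to keep links where exactly one of the two tests matches. QuerySketch, Actor, Link, actorByMode, and linkByActor are hypothetical names, not Graph-RAT's classes.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical re-creation of the query shape; not Graph-RAT's query classes.
final class QuerySketch {

    static final class Actor {
        final String mode;
        final String id;

        Actor(String mode, String id) {
            this.mode = mode;
            this.id = id;
        }
    }

    static final class Link {
        final Actor source;
        final Actor destination;

        Link(Actor source, Actor destination) {
            this.source = source;
            this.destination = destination;
        }
    }

    enum SetOperation { AND, OR, XOR }

    /** Actor query: matches actors whose mode and ID both match the given regular expressions. */
    static Predicate<Actor> actorByMode(String modePattern, String idPattern) {
        Pattern mode = Pattern.compile(modePattern);
        Pattern id = Pattern.compile(idPattern);
        return a -> mode.matcher(a.mode).matches() && id.matcher(a.id).matches();
    }

    /** Link query: combines a test on the source and a test on the destination. */
    static List<Link> linkByActor(List<Link> links,
                                  Predicate<Actor> sourceQuery,
                                  Predicate<Actor> destinationQuery,
                                  SetOperation op) {
        return links.stream().filter(l -> {
            boolean s = sourceQuery.test(l.source);
            boolean d = destinationQuery.test(l.destination);
            switch (op) {
                case AND: return s && d;
                case OR:  return s || d;
                default:  return s ^ d; // XOR: exactly one side matches
            }
        }).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Actor s = new Actor("Source", "s1");
        Actor t = new Actor("Target", "t1");
        List<Link> links = List.of(new Link(s, t), new Link(t, s), new Link(s, s));
        List<Link> result = linkByActor(links,
                actorByMode("Target", ".*"),
                actorByMode("Source", ".*"),
                SetOperation.XOR);
        System.out.println(result.size());
    }
}
```

Because each child query is just a predicate, restrictions can be tightened at runtime by composing further predicates with `and` or `negate` before passing them in.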

12/32 Query Comparisons
- Similar to the Jena interface
- Construction is similar to the JUNG system
- Implements all SQL queries that do not require temporary tables

13/ Query
- Uses graph primitives instead of Queries
- Algorithms use hard-coded GraphByID

14/32 Algorithms
- Functions that execute over a given graph
- Metadata is a part of the algorithm
- Properties utilized or created are declared up front
- Excepting output algorithms, no side effects are permitted

    execute(Graph graph)
    IODescriptor getInput()
    IODescriptor getOutput()
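
As an illustration of declaring inputs and outputs up front, here is a small Java sketch built around the three methods named on the slide. Everything else is an assumption made for the example: the Graph stand-in is reduced to a map of named properties, IODescriptor is reduced to a set of names, and LinkCount and run are hypothetical helpers rather than Graph-RAT classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-ins; only the three method names come from the slide.
final class AlgorithmSketch {

    /** A graph reduced to a bag of named properties, for illustration only. */
    static final class Graph {
        final Map<String, Object> properties = new HashMap<>();
    }

    /** Names of the properties an algorithm reads or writes. */
    static final class IODescriptor {
        final Set<String> names;

        IODescriptor(String... names) {
            this.names = new LinkedHashSet<>(Arrays.asList(names));
        }
    }

    interface Algorithm {
        void execute(Graph graph);
        IODescriptor getInput();
        IODescriptor getOutput();
    }

    /** Reads the "links" list and stores its size under "linkCount". */
    static final class LinkCount implements Algorithm {
        @Override
        public void execute(Graph graph) {
            List<?> links = (List<?>) graph.properties.get("links");
            graph.properties.put("linkCount", links.size());
        }

        @Override
        public IODescriptor getInput() {
            return new IODescriptor("links");
        }

        @Override
        public IODescriptor getOutput() {
            return new IODescriptor("linkCount");
        }
    }

    /** Runs an algorithm only if every declared input is present on the graph. */
    static void run(Algorithm algorithm, Graph graph) {
        for (String name : algorithm.getInput().names) {
            if (!graph.properties.containsKey(name)) {
                throw new IllegalStateException("missing input: " + name);
            }
        }
        algorithm.execute(graph);
    }

    public static void main(String[] args) {
        Graph g = new Graph();
        g.properties.put("links", Arrays.asList("a->b", "b->c", "c->a"));
        run(new LinkCount(), g);
        System.out.println(g.properties.get("linkCount"));
    }
}
```

Declaring descriptors separately from `execute` lets a scheduler check a whole chain of algorithms for missing inputs before any of them runs.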

15/32 Propositional Algorithms
- Utilizes aggregator function as a parameter
- Crosses all ways of shifting data
  - Aggregate By Link
  - Aggregate By Link Property
  - Aggregate On Graph
  - Graph To Actor
  - Link To Graph
  - Graph To Graph

16/32 Aggregator Functions
- Map one or more elements to an equal or smaller number of elements
- Examples
  - Statistical Moments
  - Arithmetic Operations
  - Null Aggregation
  - Concatenation
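
A short Java sketch of what such an aggregator could look like, assuming the elements are numeric vectors of equal length and the input is non-empty. The Aggregator interface and the three constants are hypothetical names chosen for the example: MEAN stands in for a first statistical moment, NULL_AGGREGATION passes elements through unchanged, and CONCATENATION joins all vectors into one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical aggregator shape; not Graph-RAT's aggregator interface.
final class AggregatorSketch {

    /** Maps one or more vectors to an equal or smaller number of vectors. */
    interface Aggregator {
        List<double[]> aggregate(List<double[]> input);
    }

    /** First statistical moment: the component-wise mean, many vectors to one. */
    static final Aggregator MEAN = input -> {
        int n = input.get(0).length;
        double[] mean = new double[n];
        for (double[] row : input) {
            for (int i = 0; i < n; i++) {
                mean[i] += row[i] / input.size();
            }
        }
        return Collections.singletonList(mean);
    };

    /** Null aggregation: the elements pass through unchanged. */
    static final Aggregator NULL_AGGREGATION = input -> input;

    /** Concatenation: all vectors joined end to end into a single vector. */
    static final Aggregator CONCATENATION = input ->
            Collections.singletonList(
                    input.stream().flatMapToDouble(Arrays::stream).toArray());

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(new double[] {1, 2}, new double[] {3, 4});
        System.out.println(Arrays.toString(MEAN.aggregate(data).get(0)));
        System.out.println(Arrays.toString(CONCATENATION.aggregate(data).get(0)));
    }
}
```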

17/32 Social Network Analysis Algorithms
- Prestige Algorithms
  - Degree
  - Betweenness
  - Closeness
  - PageRank
  - HITS
- Graph Triples
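
To make one of the prestige measures concrete, the following Java sketch computes PageRank by power iteration on a small directed graph. It is a textbook formulation written for this transcript, not Graph-RAT's implementation; the adjacency-map representation, the even redistribution of rank from actors without outgoing links, and the fixed iteration count are all assumptions of the example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Textbook PageRank by power iteration; a sketch, not Graph-RAT's code.
final class PageRankSketch {

    /**
     * @param out     outgoing links per actor; every link target must also be a key
     * @param damping probability of following a link (0.85 is a common choice)
     */
    static Map<String, Double> pageRank(Map<String, List<String>> out,
                                        double damping, int iterations) {
        int n = out.size();
        Map<String, Double> rank = new HashMap<>();
        for (String v : out.keySet()) {
            rank.put(v, 1.0 / n);
        }
        for (int it = 0; it < iterations; it++) {
            // Rank held by actors without outgoing links is spread evenly.
            double dangling = 0.0;
            for (Map.Entry<String, List<String>> e : out.entrySet()) {
                if (e.getValue().isEmpty()) {
                    dangling += rank.get(e.getKey());
                }
            }
            Map<String, Double> next = new HashMap<>();
            for (String v : out.keySet()) {
                next.put(v, (1.0 - damping) / n + damping * dangling / n);
            }
            for (Map.Entry<String, List<String>> e : out.entrySet()) {
                List<String> targets = e.getValue();
                for (String t : targets) {
                    double share = damping * rank.get(e.getKey()) / targets.size();
                    next.merge(t, share, Double::sum);
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = new HashMap<>();
        g.put("A", Collections.singletonList("C"));
        g.put("B", Collections.singletonList("C"));
        g.put("C", Arrays.asList("A"));
        System.out.println(pageRank(g, 0.85, 50));
    }
}
```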

18/32 Classification Algorithms
- Machine Learning Primitives
- Uses Weka
- Separate algorithms for training and classifying

19/32 Clustering Algorithms
- Several graph-based algorithms
  - Weak Component Clustering
  - Strong Component Clustering
  - Edge Betweenness Clustering
  - Newman-Girvan Edge Betweenness
- Also has primitives calling Weka on vector data
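
Weak component clustering is the simplest of the graph-based methods listed: two actors share a cluster whenever a path joins them once link direction is ignored. The Java sketch below labels components with a breadth-first search. The string-array link representation and the names WeakComponentSketch and cluster are assumptions for illustration, not Graph-RAT's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Breadth-first labelling of weakly connected components; a sketch only.
final class WeakComponentSketch {

    /**
     * @param actors all actor IDs
     * @param links  directed links as {source, destination}; both ends must be in actors
     * @return cluster label per actor, numbered from 0 in order of discovery
     */
    static Map<String, Integer> cluster(Collection<String> actors, List<String[]> links) {
        Map<String, List<String>> neighbours = new HashMap<>();
        for (String a : actors) {
            neighbours.put(a, new ArrayList<>());
        }
        for (String[] link : links) {
            // Direction is ignored, so record the link both ways.
            neighbours.get(link[0]).add(link[1]);
            neighbours.get(link[1]).add(link[0]);
        }
        Map<String, Integer> label = new LinkedHashMap<>();
        int next = 0;
        for (String start : actors) {
            if (label.containsKey(start)) {
                continue;
            }
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            label.put(start, next);
            while (!queue.isEmpty()) {
                String current = queue.remove();
                for (String n : neighbours.get(current)) {
                    if (!label.containsKey(n)) {
                        label.put(n, next);
                        queue.add(n);
                    }
                }
            }
            next++;
        }
        return label;
    }

    public static void main(String[] args) {
        List<String> actors = Arrays.asList("A", "B", "C", "D", "E");
        List<String[]> links = Arrays.asList(
                new String[] {"A", "B"}, new String[] {"C", "B"}, new String[] {"D", "E"});
        System.out.println(cluster(actors, links));
    }
}
```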

20/32 Similarity Algorithms
- Comparisons between modes
- Types of Similarity
  - Similarity By Link
  - Similarity By Property
  - Graph Similarity
- Distance Functions
  - All Weka distance functions
  - KLDistance
  - Exponential Distance
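
The slide does not define KLDistance, so the following Java sketch assumes it refers to the Kullback-Leibler divergence between two discrete probability distributions, D(p || q) = sum over i of p_i * ln(p_i / q_i). A symmetric variant is included because the plain divergence is not symmetric. SimilaritySketch, klDivergence, and symmetricKl are hypothetical names.

```java
// Kullback-Leibler divergence between discrete distributions; an assumed
// reading of "KLDistance", not Graph-RAT's implementation.
final class SimilaritySketch {

    /** D(p || q) in nats; p and q must be the same length and each sum to 1. */
    static double klDivergence(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("distributions differ in length");
        }
        double sum = 0.0;
        for (int i = 0; i < p.length; i++) {
            if (p[i] == 0.0) {
                continue; // the term 0 * ln(0 / q) is taken as 0
            }
            if (q[i] == 0.0) {
                return Double.POSITIVE_INFINITY; // p puts mass where q has none
            }
            sum += p[i] * Math.log(p[i] / q[i]);
        }
        return sum;
    }

    /** Average of the two directions, which makes the measure symmetric. */
    static double symmetricKl(double[] p, double[] q) {
        return 0.5 * (klDivergence(p, q) + klDivergence(q, p));
    }

    public static void main(String[] args) {
        double[] p = {0.5, 0.5};
        double[] q = {0.25, 0.75};
        System.out.println(klDivergence(p, q));
        System.out.println(symmetricKl(p, q));
    }
}
```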

21/32 Collaborative Filtering Algorithms
- Traditional recommendation algorithms
  - Item to Item
  - User to User
  - Associative Mining

22/32 Array-Based Algorithms
- Transform To Array
- Principal Component Analysis

23/32 Evaluation
- All forms of evaluating results
  - Set Based (precision and recall)
  - Weighted Set (Correlations)
  - Ordered Lists (Kendall Tau, Half Life)
- Cross-Validation algorithms
  - By Actor
  - By Link
  - By Graph
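
Two of the measures named on this slide are easy to state in code. The Java sketch below gives set-based precision and recall, and the tau-a form of Kendall's tau (concordant minus discordant pairs over all pairs, with no tie correction). These are standard definitions written for this transcript; the class and method names are hypothetical and the choice of tau-a is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Standard evaluation measures; a sketch, not Graph-RAT's evaluation classes.
final class EvaluationSketch {

    /** Fraction of recommended items that are relevant. */
    static double precision(Set<String> recommended, Set<String> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        Set<String> hits = new HashSet<>(recommended);
        hits.retainAll(relevant);
        return (double) hits.size() / recommended.size();
    }

    /** Fraction of relevant items that were recommended. */
    static double recall(Set<String> recommended, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        Set<String> hits = new HashSet<>(recommended);
        hits.retainAll(relevant);
        return (double) hits.size() / relevant.size();
    }

    /** Kendall tau-a over paired scores: (concordant - discordant) / (n choose 2). */
    static double kendallTau(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length lists of at least 2 items");
        }
        int n = x.length;
        int concordant = 0;
        int discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = Math.signum(x[i] - x[j]) * Math.signum(y[i] - y[j]);
                if (s > 0) {
                    concordant++;
                } else if (s < 0) {
                    discordant++;
                }
            }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }

    public static void main(String[] args) {
        Set<String> recommended = new HashSet<>(Arrays.asList("a", "b", "c", "d"));
        Set<String> relevant = new HashSet<>(Arrays.asList("a", "b", "e"));
        System.out.println(precision(recommended, relevant));
        System.out.println(recall(recommended, relevant));
        System.out.println(kendallTau(new double[] {1, 2, 3}, new double[] {1, 3, 2}));
    }
}
```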

24/32 Data Acquisition
- Components for acquiring source data
- File Reader Types
  - Reading different file formats
- Web Crawling Types
  - LiveJournal or LastFM
- Connection Types
  - Links different sets together

25/32 Web Crawler
- Custom multi-threaded web crawler
- Dynamic parsers
- Properties are passed between crawls and parser executions
- Stop and filter conditions are parameterized

26/32 Existing Parsers
- Base HTML parsing
- XML Parsing (SAX)
- LiveJournal FOAF
- LastFM REST services
- Graph-RAT documents
- Yahoo search queries

27/32 Comparisons
- SQL
- LINQ
- Matlab
- Other graph packages
- Prolog?

28/32 Embedded Use
- Dynamic Loading
- AbstractFactory abstract superclass
- Example: retrieving links to YouTube videos from GData

29/32 Graph-RAT Language
- Base Graph-RAT:
  - Data Acquisition components executed
  - For each algorithm entry:
    - Graph Query selects a set of graphs
    - Algorithm is executed over each graph
- Cross-Validation Graph-RAT
  - Mode, relation, or graph chosen in advance
  - Data Acquisition components run once
  - Algorithm entries rerun for each fold
- Statistical Graph-RAT
  - List of cross-validation schedulers
  - Statistical metrics of which performed better
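
The cross-validation scheduler described here can be pictured as a loop that acquires data once and then reruns the algorithm entries for each fold. The Java sketch below assumes round-robin fold assignment over a list of items (actors, links, or graphs) and models an algorithm entry as a callback receiving the training items and the held-out fold. CrossValidationSketch, folds, and run are hypothetical names, and the fold strategy is an assumption rather than Graph-RAT's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// A sketch of a k-fold scheduler loop; not Graph-RAT's scheduler classes.
final class CrossValidationSketch {

    /** Splits items into k folds by round-robin assignment. */
    static <T> List<List<T>> folds(List<T> items, int k) {
        List<List<T>> result = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            result.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            result.get(i % k).add(items.get(i));
        }
        return result;
    }

    /**
     * Reruns the algorithm once per fold with (training items, held-out items).
     * The items are assumed to have been acquired once, before this call.
     *
     * @return the number of algorithm runs, which equals k
     */
    static <T> int run(List<T> items, int k, BiConsumer<List<T>, List<T>> algorithm) {
        List<List<T>> folds = folds(items, k);
        int runs = 0;
        for (int held = 0; held < k; held++) {
            List<T> training = new ArrayList<>();
            for (int f = 0; f < k; f++) {
                if (f != held) {
                    training.addAll(folds.get(f));
                }
            }
            algorithm.accept(training, folds.get(held));
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<String> actors = Arrays.asList("a", "b", "c", "d", "e");
        run(actors, 2, (training, heldOut) ->
                System.out.println("train on " + training + ", evaluate on " + heldOut));
    }
}
```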

30/32 User To User Collaborative Filtering Example
- Aggregate By Link (Artist->User)
- Similarity By Link (User->User)
- Aggregate By Link (User->User)
- Property to Link (User->Artist)
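
The four steps above correspond roughly to a classic user-to-user recommender: build a per-user artist vector, compare users, pool the neighbours' vectors weighted by similarity, and turn the resulting scores into user-to-artist recommendations. The Java sketch below follows that outline with plain maps. Cosine similarity, the exclusion of artists the user already has, and all names are assumptions of this example; the slide does not specify which similarity or aggregator is used.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-map user-to-user collaborative filtering; a sketch, not Graph-RAT's pipeline.
final class UserToUserSketch {

    /** Cosine similarity between two sparse artist-weight vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /**
     * Scores artists the user has not listened to by summing each other user's
     * weight for the artist, scaled by that user's similarity to the target user.
     *
     * @param plays user -> (artist -> weight), i.e. the aggregated Artist->User links
     */
    static Map<String, Double> recommend(String user, Map<String, Map<String, Double>> plays) {
        Map<String, Double> own = plays.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : plays.entrySet()) {
            if (other.getKey().equals(user)) {
                continue;
            }
            double similarity = cosine(own, other.getValue());
            if (similarity <= 0.0) {
                continue; // dissimilar users contribute nothing
            }
            for (Map.Entry<String, Double> e : other.getValue().entrySet()) {
                if (!own.containsKey(e.getKey())) {
                    scores.merge(e.getKey(), similarity * e.getValue(), Double::sum);
                }
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> plays = new HashMap<>();
        plays.put("alice", Map.of("A", 1.0, "B", 1.0));
        plays.put("bob", Map.of("A", 1.0, "B", 1.0, "C", 1.0));
        plays.put("carol", Map.of("D", 1.0));
        System.out.println(recommend("alice", plays));
    }
}
```

Each entry of the returned map would become a User->Artist link whose strength is the score, which is the role the final "Property to Link" step plays in the slide's pipeline.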

31/32 Setup Example

33/32 DataAquisition Crawl LastFM Proxy proxy.waikato.ac.nz …

34/32 Query Entry.*

Algorithm Entry … GraphTriples Relation Friends Destination TriplesVector …

36/32 Future Work
- Stabilization to beta
- Statistical testing on result sets
- Upgrading the GUI
- Memory performance upgrades
- Octave integration

37/32 Questions?
- Stable (beta) release is 0.4.3