Graph-RAT Overview By Daniel McEnnis
2/32 What is Graph-RAT? Relational Analysis Toolkit; database abstraction layer; evaluation platform: robustly evaluate all the different ways of performing recommendation.
3/32 Kinds of Analysis: recommendation systems; relational machine learning; data mining; MIR document retrieval.
4/32 Talk Outline: Base Components; Queries; Algorithms; Schedulers; Graph-RAT Language; Conclusion and Examples.
5/32 Base Components: Graphs; Actors; Links; Properties. [Figure: example graphs over actors A to E; actor A carries the properties Name (John), Age (22), and Hobbies ([Vector] Hiking, Biking); figure label: Library.]
6/32 Properties: the variables of Graph-RAT; can be arbitrary Java types; can be attached to anything; unique ID string for each object; accessed only as sets, not as objects.
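The property model on this slide can be sketched in a few lines of Java. This is only an illustration under assumptions: `PropertySketch` and `PropertyHolderSketch` are hypothetical names, not Graph-RAT classes, and "accessed only as sets" is read here as "values come back as a read-only collection". The example values (John, 22, Hiking, Biking) are the ones shown on the Base Components slide.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a property: an ID string plus typed values.
class PropertySketch<T> {
    private final String id;
    private final Class<T> type;
    private final List<T> values = new ArrayList<>();

    PropertySketch(String id, Class<T> type) {
        this.id = id;
        this.type = type;
    }

    String getId() { return id; }

    // Runtime type check, because a holder mixes properties of many value types.
    void add(Object value) { values.add(type.cast(value)); }

    // Values are exposed only as a read-only collection, never one at a time.
    List<T> getValues() { return Collections.unmodifiableList(values); }
}

// Hypothetical stand-in for anything that carries properties (graph, actor, link).
class PropertyHolderSketch {
    private final Map<String, PropertySketch<?>> properties = new LinkedHashMap<>();

    void add(PropertySketch<?> property) { properties.put(property.getId(), property); }

    PropertySketch<?> get(String id) { return properties.get(id); }

    // Builds the example actor: Name John, Age 22, Hobbies Hiking and Biking.
    static PropertyHolderSketch john() {
        PropertySketch<String> name = new PropertySketch<>("Name", String.class);
        name.add("John");
        PropertySketch<Integer> age = new PropertySketch<>("Age", Integer.class);
        age.add(22);
        PropertySketch<String> hobbies = new PropertySketch<>("Hobbies", String.class);
        hobbies.add("Hiking");
        hobbies.add("Biking");

        PropertyHolderSketch actor = new PropertyHolderSketch();
        actor.add(name);
        actor.add(age);
        actor.add(hobbies);
        return actor;
    }
}
```

Keeping the value class alongside the ID lets a holder store properties of any Java type while still rejecting a value of the wrong type at insertion time.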
7/32 Data View: hyper-graph structure defined by the set of actors and links in a graph; accessible from the enclosing graph; can be cyclic. [Figure: example graphs over actors A to E.]
8/32 Metadata View: not constructed by default; implicit graph described by modes and the relations between them; needed for relational machine learning. [Figure labels: User, Friend.]
9/32 Query Language: constructs sets retrieved from a graph; functional structure; similar to SQL; 4 types: Graph Queries, Actor Queries, Link Queries, Property Queries.
10/32 Query Structure: cascading queries in a LISP-style syntax; each child query is of a different type; restrictions can be added at runtime.
11/32 Query Examples: LinkByActor(false, ActorByMode(false, "Target", ".*"), ActorByMode(false, "Source", ".*"), SetOperation.XOR)
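One way to read the cascading query on this slide is as nested predicates, where a link query takes two actor queries plus a set operation. The sketch below is an interpretation, not the Graph-RAT implementation: the class `QuerySketch` and its members are hypothetical, the leading boolean is assumed to be a "negate" flag, the first actor query is assumed to constrain the link source and the second the link destination, and the mode is matched exactly while the ID is matched as a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of cascading queries built from predicates.
class QuerySketch {
    static class Actor {
        final String mode;
        final String id;
        Actor(String mode, String id) { this.mode = mode; this.id = id; }
    }

    static class Link {
        final Actor source;
        final Actor destination;
        Link(Actor source, Actor destination) { this.source = source; this.destination = destination; }
    }

    enum SetOp { AND, OR, XOR }

    // Actor query: exact mode match, regex ID match; 'not' inverts the result (assumed meaning).
    static Predicate<Actor> actorByMode(boolean not, String mode, String idRegex) {
        Pattern pattern = Pattern.compile(idRegex);
        Predicate<Actor> query = a -> a.mode.equals(mode) && pattern.matcher(a.id).matches();
        return not ? query.negate() : query;
    }

    // Link query: combines a source-actor query and a destination-actor query.
    static Predicate<Link> linkByActor(boolean not, Predicate<Actor> source,
                                       Predicate<Actor> destination, SetOp op) {
        Predicate<Link> query = link -> {
            boolean s = source.test(link.source);
            boolean d = destination.test(link.destination);
            switch (op) {
                case AND: return s && d;
                case OR:  return s || d;
                default:  return s ^ d; // XOR: exactly one side matches
            }
        };
        return not ? query.negate() : query;
    }

    // The query from the slide, under the assumptions stated above.
    static Predicate<Link> slideQuery() {
        return linkByActor(false,
                actorByMode(false, "Target", ".*"),
                actorByMode(false, "Source", ".*"),
                SetOp.XOR);
    }

    // Evaluates a link query over a list of links, preserving order.
    static List<Link> run(Predicate<Link> query, List<Link> links) {
        List<Link> result = new ArrayList<>();
        for (Link link : links) {
            if (query.test(link)) result.add(link);
        }
        return result;
    }
}
```

Under this reading, the slide's query keeps links that start at a "Target" actor or end at a "Source" actor, but not both.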
12/32 Query Comparisons: similar to the Jena interface; construction is similar to the JUNG system; implements all SQL queries that do not require temporary tables.
13/32 Query Uses graph primitives instead of Queries Algorithms use hard-coded GraphByID
14/32 Algorithms: functions that execute over a given graph; metadata is a part of the algorithm; properties utilized or created are declared up front; excepting output algorithms, no side effects are permitted. Methods: execute(Graph graph); IODescriptor getInput(); IODescriptor getOutput().
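Declaring inputs and outputs up front means a chain of algorithms can be checked before anything runs. The sketch below illustrates that idea only. The three method names follow the slide (with the output accessor spelled `getOutput`), but `Graph` and `IODescriptor` here are simplified stand-ins, and `copy` and `isSchedulable` are hypothetical helpers invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the nested types are stand-ins, not Graph-RAT classes.
class AlgorithmSketch {
    // Stand-in graph: just a bag of named properties.
    static class Graph {
        final Map<String, Object> properties = new HashMap<>();
    }

    // Stand-in descriptor: the names of the properties read or written.
    static class IODescriptor {
        final Set<String> names;
        IODescriptor(String... names) { this.names = new LinkedHashSet<>(Arrays.asList(names)); }
    }

    interface Algorithm {
        void execute(Graph graph);
        IODescriptor getInput();
        IODescriptor getOutput();
    }

    // Toy algorithm: copies one property to another and declares both.
    static Algorithm copy(String from, String to) {
        return new Algorithm() {
            public void execute(Graph graph) { graph.properties.put(to, graph.properties.get(from)); }
            public IODescriptor getInput() { return new IODescriptor(from); }
            public IODescriptor getOutput() { return new IODescriptor(to); }
        };
    }

    // True if every declared input exists initially or is produced by an earlier step.
    static boolean isSchedulable(Set<String> initial, List<Algorithm> pipeline) {
        Set<String> available = new HashSet<>(initial);
        for (Algorithm algorithm : pipeline) {
            if (!available.containsAll(algorithm.getInput().names)) return false;
            available.addAll(algorithm.getOutput().names);
        }
        return true;
    }
}
```

With this shape, a pipeline that reads a property before any step has produced it is rejected without executing a single algorithm.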
15/32 Propositional Algorithms: utilize an aggregator function as a parameter; cross all ways of shifting data: Aggregate By Link, Aggregate By Link Property, Aggregate On Graph, Graph To Actor, Link To Graph, Graph To Graph.
16/32 Aggregator Functions: map 1 or more elements to an equal or smaller number of elements. Examples: statistical moments; arithmetic operations; null aggregation; concatenation.
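The "many elements in, equal or fewer out" contract can be illustrated with three tiny aggregators over numeric vectors. This is a sketch under assumptions: the `Aggregator` interface and the three constants are hypothetical, the mean stands in for "statistical moments", and "null aggregation" is interpreted as passing the input through unchanged. The mean assumes a non-empty input of equal-length vectors.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of aggregator functions over lists of vectors.
class AggregatorSketch {
    interface Aggregator {
        List<double[]> aggregate(List<double[]> input);
    }

    // First statistical moment: element-wise mean, many vectors to one.
    static final Aggregator MEAN = input -> {
        double[] sum = new double[input.get(0).length];
        for (double[] row : input) {
            for (int i = 0; i < sum.length; i++) sum[i] += row[i];
        }
        for (int i = 0; i < sum.length; i++) sum[i] /= input.size();
        return Collections.singletonList(sum);
    };

    // Concatenation: many vectors to one longer vector.
    static final Aggregator CONCATENATION = input -> {
        int total = 0;
        for (double[] row : input) total += row.length;
        double[] out = new double[total];
        int offset = 0;
        for (double[] row : input) {
            System.arraycopy(row, 0, out, offset, row.length);
            offset += row.length;
        }
        return Collections.singletonList(out);
    };

    // Null aggregation (assumed meaning): the same number of elements comes out.
    static final Aggregator NULL_AGGREGATION = input -> new ArrayList<double[]>(input);
}
```

Because every aggregator shares one signature, an algorithm such as Aggregate By Link can take any of them as a parameter.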
17/32 Social Network Analysis Algorithms: prestige algorithms: Degree, Betweenness, Closeness, PageRank, HITS; Graph Triples.
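Degree is the simplest measure in this list: for prestige, count the links arriving at each actor. The sketch below is a minimal illustration, assuming links are given as source and destination ID pairs; `DegreeSketch` is a hypothetical name and the code does not use the Graph-RAT classes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: in-degree per actor from a list of {source, destination} pairs.
class DegreeSketch {
    static Map<String, Integer> inDegree(List<String[]> links) {
        Map<String, Integer> degree = new TreeMap<>();
        for (String[] link : links) {
            degree.putIfAbsent(link[0], 0);         // sources appear even with no incoming links
            degree.merge(link[1], 1, Integer::sum); // one more link arrives at the destination
        }
        return degree;
    }
}
```

In an algorithm of the kind described earlier, the resulting counts would be written back to each actor as a declared output property.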
18/32 Classification Algorithms: machine learning primitives; uses Weka; separate algorithms for training and classifying.
19/32 Clustering Algorithms: several graph-based algorithms: Weak Component Clustering, Strong Component Clustering, Edge Betweenness Clustering, Girvan-Newman Edge Betweenness; also has primitives calling Weka on vector data.
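Weak component clustering groups actors that are connected when link direction is ignored. A common way to compute this is union-find, sketched below; this is an assumption about technique, not a description of how Graph-RAT implements it, and `WeakComponentSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: weakly connected components via union-find.
class WeakComponentSketch {
    // Returns a map from each actor ID to a representative ID of its component.
    static Map<String, String> cluster(List<String> actors, List<String[]> links) {
        Map<String, String> parent = new HashMap<>();
        for (String actor : actors) parent.put(actor, actor);
        for (String[] link : links) {
            parent.putIfAbsent(link[0], link[0]);
            parent.putIfAbsent(link[1], link[1]);
            String r0 = find(parent, link[0]);
            String r1 = find(parent, link[1]);
            if (!r0.equals(r1)) parent.put(r0, r1); // direction is ignored: just merge
        }
        Map<String, String> result = new HashMap<>();
        for (String actor : new ArrayList<>(parent.keySet())) {
            result.put(actor, find(parent, actor));
        }
        return result;
    }

    // Finds the root of x and compresses the path walked on the way.
    private static String find(Map<String, String> parent, String x) {
        String root = x;
        while (!parent.get(root).equals(root)) root = parent.get(root);
        while (!parent.get(x).equals(root)) {
            String next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }
}
```

Two actors belong to the same cluster exactly when they map to the same representative.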
20/32 Similarity Algorithms: comparisons between modes. Types of similarity: Similarity By Link, Similarity By Property, Graph Similarity. Distance functions: all Weka distance functions, KLDistance, Exponential Distance.
21/32 Collaborative Filtering Algorithms: traditional recommendation algorithms: Item to Item; User to User; Associative Mining.
22/32 Array-Based Algorithms: Transform To Array; Principal Component Analysis.
23/32 Evaluation: all forms of evaluating results: set based (precision and recall); weighted set (correlations); ordered lists (Kendall tau, half-life). Cross-validation algorithms: By Actor, By Link, By Graph.
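Two of the metrics named on this slide are short enough to sketch directly: set-based precision and recall, and Kendall tau for ordered lists. The class `EvaluationSketch` is hypothetical and unrelated to the Graph-RAT evaluation classes. The Kendall tau version assumes both lists contain the same items, at least two of them, with no ties.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of set-based and ordered-list evaluation metrics.
class EvaluationSketch {
    private static int hits(Set<String> recommended, Set<String> relevant) {
        Set<String> common = new HashSet<>(recommended);
        common.retainAll(relevant);
        return common.size();
    }

    // Fraction of recommended items that are relevant.
    static double precision(Set<String> recommended, Set<String> relevant) {
        return recommended.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / recommended.size();
    }

    // Fraction of relevant items that were recommended.
    static double recall(Set<String> recommended, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant) / relevant.size();
    }

    // Kendall tau: (concordant pairs - discordant pairs) / total pairs.
    static double kendallTau(List<String> first, List<String> second) {
        Map<String, Integer> rank = new HashMap<>();
        for (int i = 0; i < second.size(); i++) rank.put(second.get(i), i);
        int n = first.size();
        int concordant = 0;
        int discordant = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // first ranks item i above item j; check whether second agrees
                if (rank.get(first.get(i)) < rank.get(first.get(j))) concordant++;
                else discordant++;
            }
        }
        return (concordant - discordant) / (n * (n - 1) / 2.0);
    }
}
```

Identical orderings score 1, fully reversed orderings score -1, and each swapped pair moves the score toward the lower end.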
24/32 Data Acquisition: components for acquiring source data. File reader types: reading different file formats. Web crawling types: LiveJournal or LastFM. Connection types: link different sets together.
25/32 Web Crawler: custom multi-threaded web crawler; dynamic parsers; properties passed between both crawls and parser executions; stop and filter conditions are parameterized.
26/32 Existing Parsers: base HTML parsing; XML parsing (SAX); LiveJournal FOAF; LastFM REST services; Graph-RAT documents; Yahoo search queries.
27/32 Comparisons: SQL; LINQ; Matlab; other graph packages; Prolog?
28/32 Embedded Use: dynamic loading; AbstractFactory abstract superclass; example: retrieving links to YouTube videos from GData.
29/32 Graph-RAT Language. Base Graph-RAT: Data Acquisition components are executed; for each algorithm entry, a Graph Query selects a set of graphs and the algorithm is executed over each graph. Cross-Validation Graph-RAT: mode, relation, or graph chosen in advance; Data Acquisition components run once; algorithm entries rerun for each fold. Statistical Graph-RAT: list of cross-validation schedulers; statistical metrics of which performed better.
30/32 User To User Collaborative Filtering Example: Aggregate By Link (Artist->User); Similarity By Link (User->User); Aggregate By Link (User->User); Property To Link (User->Artist).
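The four steps on this slide can be mimicked on plain maps to show what each one contributes. This is a sketch under assumptions, not the Graph-RAT pipeline: `UserToUserSketch` and its methods are hypothetical, cosine similarity is a swapped-in choice for the similarity step, and every listened-to artist gets a weight of 1.0.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of user-to-user collaborative filtering on plain maps.
class UserToUserSketch {
    // Step 1 (Aggregate By Link, Artist->User): a user's artists become one vector.
    static Map<String, Double> profile(String... artists) {
        Map<String, Double> vector = new HashMap<>();
        for (String artist : artists) vector.put(artist, 1.0);
        return vector;
    }

    // Step 2 (Similarity By Link, User->User): cosine similarity, an assumed choice.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other;
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Steps 3 and 4: sum neighbours' vectors weighted by similarity (Aggregate By
    // Link, User->User), then turn the top k scores into recommendations
    // (Property To Link, User->Artist).
    static List<String> recommend(String user, Map<String, Map<String, Double>> profiles, int k) {
        Map<String, Double> mine = profiles.get(user);
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> other : profiles.entrySet()) {
            if (other.getKey().equals(user)) continue;
            double similarity = cosine(mine, other.getValue());
            if (similarity <= 0) continue;
            for (Map.Entry<String, Double> e : other.getValue().entrySet()) {
                if (!mine.containsKey(e.getKey())) {
                    scores.merge(e.getKey(), similarity * e.getValue(), Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken alphabetically for a stable order.
        ranked.sort((x, y) -> {
            int byScore = Double.compare(scores.get(y), scores.get(x));
            return byScore != 0 ? byScore : x.compareTo(y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }
}
```

Each stage only consumes what the previous one produced, which is what lets the toolkit express the recommender as a chain of generic algorithms.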
31/32 Setup Example
…
33/32 DataAquisition Crawl LastFM Proxy proxy.waikato.ac.nz …
34/32 Query Entry.*
Algorithm Entry … GraphTriples Relation Friends Destination TriplesVector …
36/32 Future Work: stabilization to beta; statistical testing on result sets; upgrading the GUI; memory performance upgrades; Octave integration.
37/32 Questions? Stable (beta) release is 0.4.3