Progress Report - Year 2
Extensions of the PhD Symposium Presentation
Daniel McEnnis

Overview
Accomplishments:
- Data set acquisition and cleaning
- Theoretical achievements
- Graph-RAT improvements

Current Data
- 1940s jazz recordings
  - 2000 annotated recordings from 80 CDs
  - Covers nearly all 1940s popular music
- LastFM by song
  - Retrieves tag and user information by song
  - Data cleaning on user playcounts needed

Planned Data Set Acquisition
- Explored DBTunes
  - XML version of MySpace
  - Linking with LastFM data designed but not yet written
  - Provides per-artist audio data for all recent artists

Theoretical Achievements
- Algorithm literature review
- Theoretical Computer Science journal submission
- NZCSRSC conference submission
- Recommendation tasks and evaluation metrics

Algorithm Literature
- Systematic exploration of theoretical computer science and discrete mathematics
- Discovered a 1973 SIAM paper describing a maximal clique algorithm
- This maximal clique algorithm is the most efficient one found so far
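
To make the maximal clique item concrete, here is a minimal sketch of maximal clique enumeration in Java. It uses the well-known Bron-Kerbosch recursion with pivoting as a stand-in; it is not necessarily the algorithm from the 1973 SIAM paper mentioned above. The class name MaximalCliques, the helper undirected(), and the adjacency-map representation are assumptions made for illustration, not Graph-RAT code.

```java
import java.util.*;

/** Hypothetical sketch: Bron-Kerbosch with pivoting (a stand-in, not the 1973 SIAM algorithm). */
class MaximalCliques {
    private final Map<Integer, Set<Integer>> adj;
    private final List<Set<Integer>> cliques = new ArrayList<>();

    MaximalCliques(Map<Integer, Set<Integer>> adj) { this.adj = adj; }

    /** Builds an undirected adjacency map from an edge list. */
    static Map<Integer, Set<Integer>> undirected(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    List<Set<Integer>> findAll() {
        cliques.clear();
        expand(new HashSet<>(), new HashSet<>(adj.keySet()), new HashSet<>());
        return cliques;
    }

    // r: current clique, p: candidates that can extend r, x: vertices already explored
    private void expand(Set<Integer> r, Set<Integer> p, Set<Integer> x) {
        if (p.isEmpty() && x.isEmpty()) {
            cliques.add(new HashSet<>(r)); // r cannot be extended, so it is maximal
            return;
        }
        // Pivot: the vertex of p ∪ x with the most neighbours in p, to prune branches.
        Integer pivot = null;
        int best = -1;
        for (Set<Integer> s : List.of(p, x)) {
            for (Integer u : s) {
                int count = 0;
                for (Integer v : neighbors(u)) if (p.contains(v)) count++;
                if (count > best) { best = count; pivot = u; }
            }
        }
        Set<Integer> pivotNbrs = neighbors(pivot);
        for (Integer v : new ArrayList<>(p)) {
            if (pivotNbrs.contains(v)) continue; // branch only on non-neighbours of the pivot
            Set<Integer> nv = neighbors(v);
            Set<Integer> r2 = new HashSet<>(r);
            r2.add(v);
            Set<Integer> p2 = new HashSet<>(p);
            p2.retainAll(nv);
            Set<Integer> x2 = new HashSet<>(x);
            x2.retainAll(nv);
            expand(r2, p2, x2);
            p.remove(v); // v has been fully explored
            x.add(v);
        }
    }

    private Set<Integer> neighbors(Integer u) { return adj.getOrDefault(u, Set.of()); }

    public static void main(String[] args) {
        // A triangle 0-1-2 plus a pendant link 2-3: maximal cliques are {0,1,2} and {2,3}.
        Map<Integer, Set<Integer>> g = undirected(new int[][] {{0, 1}, {1, 2}, {0, 2}, {2, 3}});
        System.out.println(new MaximalCliques(g).findAll());
    }
}
```

Pivoting keeps the recursion from re-deriving the same clique through every one of its members, which is why it is the usual choice for sparse graphs such as user/artist networks.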

Journal Submission
- Submitted the Graph Triples Census algorithm
  - Proof of correctness
  - Proof of time complexity
  - Proof of space complexity
- Rediscovery of a 2001 algorithm in Social Networks
- Most efficient implementation known
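
A triples census counts every unordered triple of actors by the pattern of links among them. The sketch below is only an illustration of what is being counted, not the submitted algorithm: it is a simplified undirected variant that classifies each triple by how many of its three possible links are present (0 to 3) using brute-force O(n^3) enumeration, which is exactly the cost an efficient census avoids. The class TripleCensus and its signature are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: brute-force undirected triple census (not the submitted algorithm). */
class TripleCensus {
    /** Returns counts[k] = number of unordered actor triples joined by exactly k links, k = 0..3. */
    static long[] census(int n, int[][] edges) {
        boolean[][] linked = new boolean[n][n];
        for (int[] e : edges) {
            linked[e[0]][e[1]] = true;
            linked[e[1]][e[0]] = true;
        }
        long[] counts = new long[4];
        for (int a = 0; a < n; a++)
            for (int b = a + 1; b < n; b++)
                for (int c = b + 1; c < n; c++) {
                    int k = (linked[a][b] ? 1 : 0) + (linked[a][c] ? 1 : 0) + (linked[b][c] ? 1 : 0);
                    counts[k]++;
                }
        return counts;
    }

    public static void main(String[] args) {
        // 4 actors: a triangle 0-1-2 plus a pendant link 2-3
        long[] c = census(4, new int[][] {{0, 1}, {1, 2}, {0, 2}, {2, 3}});
        System.out.println(Arrays.toString(c)); // [0, 1, 2, 1]
    }
}
```

A quick sanity check is that the four counts always sum to n choose 3 (here 4).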

NZCSRSC
- Poster at the conference
- Written as a short user's guide

Evaluation Exploration
- Incorporating cross-validation into relational data
- 9 types of music recommendation, distinguished by:
  - Personalized versus generic
  - Open query versus targeted query
  - Dynamic versus static data
  - New music versus all music

Personalized Radio
- Open query with personalized presentation
- Static data vs dynamic data
- Predicting new items vs predicting any item

Targeted Search
- Not personalized
- Similarity queries
- Automatically generating targeted lists for a browsing hierarchy
- New music vs all music
- Static vs dynamic data

Personalized Tag Radio
- Create a personalized playlist matching a given query
- New music vs all music
- Static vs dynamic data

Excluded Types
- 'Top 40' prediction
  - Rendered obsolete by the other types

Cross-Validation in Graphs
- Actor removal
  - The only form currently used
  - All links to a particular actor are removed
- Link removal
  - Selected links from the ground truth are removed
  - The algorithm is evaluated on reproducing the missing links
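
The two schemes on this slide can be sketched over a plain list of links. This is a hedged illustration, not Graph-RAT's cross-validation API: the record Link, the helper names, the round-robin k-fold split, and the use of recall on held-out links as the score are all assumptions (records need Java 16 or later).

```java
import java.util.*;
import java.util.stream.*;

/** Hypothetical sketch of actor-removal and link-removal cross-validation; not Graph-RAT's API. */
class GraphCrossValidation {
    record Link(String source, String dest) {}

    /** Actor removal: keep only links that do not touch the held-out actor. */
    static List<Link> removeActor(List<Link> links, String actor) {
        return links.stream()
                .filter(l -> !l.source().equals(actor) && !l.dest().equals(actor))
                .collect(Collectors.toList());
    }

    /** Link removal: shuffle ground-truth links (seeded) and deal them into k folds. */
    static List<List<Link>> linkFolds(List<Link> groundTruth, int k, long seed) {
        List<Link> shuffled = new ArrayList<>(groundTruth);
        Collections.shuffle(shuffled, new Random(seed));
        List<List<Link>> folds = new ArrayList<>();
        for (int i = 0; i < k; i++) folds.add(new ArrayList<>());
        for (int i = 0; i < shuffled.size(); i++) folds.get(i % k).add(shuffled.get(i));
        return folds;
    }

    /** Training graph for one fold: every link except those held out. */
    static List<Link> trainingGraph(List<Link> all, List<Link> heldOut) {
        Set<Link> removed = new HashSet<>(heldOut);
        return all.stream().filter(l -> !removed.contains(l)).collect(Collectors.toList());
    }

    /** Assumed score: fraction of held-out links the algorithm reproduced. */
    static double recall(Set<Link> predicted, List<Link> heldOut) {
        if (heldOut.isEmpty()) return 0.0;
        long hits = heldOut.stream().filter(predicted::contains).count();
        return (double) hits / heldOut.size();
    }

    public static void main(String[] args) {
        List<Link> links = List.of(new Link("u1", "a1"), new Link("u1", "a2"),
                new Link("u2", "a1"), new Link("u3", "a3"));
        System.out.println(removeActor(links, "u1")); // only the links of u2 and u3 remain
        List<List<Link>> folds = linkFolds(links, 2, 42L);
        List<Link> train = trainingGraph(links, folds.get(0));
        System.out.println(train.size() + " training links, " + folds.get(0).size() + " held out");
    }
}
```

Fixing the seed makes the folds reproducible between experiment runs, which matters when different algorithms must be compared on identical splits.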

Graph-RAT Improvements
- Release of the finalized Graph-RAT as a relational programming language
- Added propositional algorithms
- Release of the new query subsystem
- Usability enhancements
- Space complexity improvements

Aggregators
- 8 algorithms with 9 helper functions
- Cover each form of propositionalization
- Cover mappings between links and properties
- Core primitives for Graph-RAT as a programming language
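
Propositionalization flattens relational data into a fixed set of attributes per actor, so that ordinary (propositional) learners can use it. A minimal sketch of one such link-to-property mapping follows, assuming links are given as a map from each actor to its linked actors and the aggregated values as a numeric property; it is not one of Graph-RAT's eight aggregators, and the class name and {count, sum, mean} feature layout are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch of a propositionalizing aggregator (not one of Graph-RAT's eight). */
class NeighbourAggregator {
    /**
     * For each source actor, summarises a numeric property of the actors it links to
     * as a fixed-length feature vector {count, sum, mean}.
     */
    static Map<String, double[]> aggregate(Map<String, List<String>> links, Map<String, Double> property) {
        Map<String, double[]> features = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : links.entrySet()) {
            double count = 0, sum = 0;
            for (String target : e.getValue()) {
                Double value = property.get(target);
                if (value == null) continue; // targets without the property are skipped
                count++;
                sum += value;
            }
            features.put(e.getKey(), new double[] {count, sum, count == 0 ? 0.0 : sum / count});
        }
        return features;
    }

    public static void main(String[] args) {
        Map<String, List<String>> listens = Map.of("u1", List.of("songA", "songB"), "u2", List.of("songB"));
        Map<String, Double> tempo = Map.of("songA", 120.0, "songB", 90.0);
        aggregate(listens, tempo).forEach((user, f) -> System.out.println(user + " " + Arrays.toString(f)));
        // u1 [2.0, 210.0, 105.0]
        // u2 [1.0, 90.0, 90.0]
    }
}
```

Because every actor ends up with the same three columns regardless of how many links it has, the result can be fed directly to a table-based learner.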

Similarity
- 2 new similarity algorithms
- 1 new distance metric

Query Subsystem
- 28 primitives for searching in a graph
  - 10 graph primitives
  - 7 actor primitives
  - 7 link primitives
  - 4 property primitives
- Functional: queries are built by composing primitives
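
The functional style on this slide means small primitives are combined into larger queries. The sketch below shows that idea with two hypothetical actor primitives composed through java.util.function.Function; the names Actor, Query, mode(), property() and then() are illustrative assumptions, not the Graph-RAT query API, and only actor-level primitives are shown (records need Java 16 or later).

```java
import java.util.*;
import java.util.function.Function;
import java.util.stream.*;

/** Hypothetical sketch of query primitives composed functionally; names are illustrative only. */
class QueryComposition {
    record Actor(String mode, String id, Map<String, String> properties) {}

    /** A query maps a collection of actors to another collection of actors. */
    interface Query extends Function<Collection<Actor>, Collection<Actor>> {
        /** Composition: run this query, then feed its result to the next one. */
        default Query then(Query next) { return actors -> next.apply(this.apply(actors)); }
    }

    /** Primitive: keep actors of the given mode (e.g. "user", "artist"). */
    static Query mode(String mode) {
        return actors -> actors.stream().filter(a -> a.mode().equals(mode)).collect(Collectors.toList());
    }

    /** Primitive: keep actors whose property has the given value. */
    static Query property(String key, String value) {
        return actors -> actors.stream()
                .filter(a -> value.equals(a.properties().get(key)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Actor> graph = List.of(
                new Actor("artist", "a1", Map.of("genre", "jazz")),
                new Actor("artist", "a2", Map.of("genre", "rock")),
                new Actor("user", "u1", Map.of("genre", "jazz")));
        Query jazzArtists = mode("artist").then(property("genre", "jazz"));
        jazzArtists.apply(graph).forEach(a -> System.out.println(a.id())); // prints a1
    }
}
```

Since every primitive has the same input and output type, any chain of them is itself a query that can be stored, reused, or composed further.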

Performance Specs
- Queries can return collections or iterators
- Collections
  - Implemented as references into graphs
  - Linear in the number of references
- Iterators
  - Ordered sequences of objects
  - Constant space complexity (excluding Graph ID and AllGraphs)
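
The space trade-off between the two result forms can be illustrated with a generic filter. In this hedged sketch (class and method names are hypothetical, not Graph-RAT's), the collection form stores a reference to every match, so its memory grows linearly with the result, while the iterator form holds only the underlying iterator and a single look-ahead element, so its extra space is constant.

```java
import java.util.*;
import java.util.function.Predicate;

/** Hypothetical sketch contrasting a materialised result with a lazy, constant-space iterator. */
class LazyQueryResults {
    /** Collection result: one stored reference per match, so space is linear in the result size. */
    static <T> List<T> asCollection(Iterable<T> source, Predicate<T> test) {
        List<T> out = new ArrayList<>();
        for (T t : source) if (test.test(t)) out.add(t);
        return out;
    }

    /** Iterator result: keeps only the source iterator and one look-ahead element. */
    static <T> Iterator<T> asIterator(Iterable<T> source, Predicate<T> test) {
        Iterator<T> it = source.iterator();
        return new Iterator<T>() {
            private T next;
            private boolean ready;

            @Override
            public boolean hasNext() {
                while (!ready && it.hasNext()) {
                    T candidate = it.next();
                    if (test.test(candidate)) { next = candidate; ready = true; }
                }
                return ready;
            }

            @Override
            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = next;
                next = null; // drop the reference once handed out
                return result;
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> linkWeights = List.of(3, 8, 1, 9, 4);
        System.out.println(asCollection(linkWeights, w -> w > 3)); // [8, 9, 4]
        Iterator<Integer> heavy = asIterator(linkWeights, w -> w > 3);
        while (heavy.hasNext()) System.out.println(heavy.next()); // 8, 9, 4 on separate lines
    }
}
```

Both forms visit matches in the same order; the iterator simply defers the work until each element is requested.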

Usability Enhancements
- Properties and metadata
- Interface enhancements
- Dynamic loading of classes
- XML scripting support

Properties and Metadata
- Properties description
  - Encapsulates all parameter code
  - Utilizes Graph-RAT Property objects
  - Comparison to JavaBeans
- New metadata model
  - Parameter model update
  - Input/output descriptors update

Interface Updates
- Arrays replaced by Lists for graph, link, actor, and property objects
- Iterators
  - All graph operations support iterators

Dynamic Loading
- Classes loaded from file at runtime
- Loading controlled by a call to a loader object
- Automatic registration with the relevant factories
- All factories updated to support dynamic loading
  - Extends the Abstract Factory pattern
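
One way to realise "load at runtime, then register automatically with the relevant factories" is sketched below. It is an assumption-laden illustration, not Graph-RAT's loader: the Factory and DynamicLoading names are hypothetical, each factory accepts classes assignable to its component type, and the demo loads a JDK class through the system class loader so that it runs without an external plugin directory (loaderFor() shows how a directory or jar would be opened instead).

```java
import java.io.File;
import java.lang.reflect.Constructor;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.*;
import java.util.function.Supplier;

/** Hypothetical sketch: classes loaded at runtime register with every matching factory. */
class DynamicLoading {
    /** Abstract-factory-style registry for one component type T. */
    static class Factory<T> {
        private final Class<T> type;
        private final Map<String, Supplier<T>> registry = new TreeMap<>();

        Factory(Class<T> type) { this.type = type; }

        boolean accepts(Class<?> c) { return type.isAssignableFrom(c); }

        void register(String name, Supplier<T> supplier) { registry.put(name, supplier); }

        T create(String name) {
            Supplier<T> s = registry.get(name);
            if (s == null) throw new IllegalArgumentException("unknown component: " + name);
            return s.get();
        }

        Set<String> names() { return registry.keySet(); }
    }

    /** Loads a class by name and registers it with each factory whose type it implements. */
    static void load(String className, ClassLoader loader, List<Factory<?>> factories)
            throws ReflectiveOperationException {
        Class<?> c = Class.forName(className, true, loader);
        for (Factory<?> f : factories) {
            if (f.accepts(c)) registerWith(f, c);
        }
    }

    private static <T> void registerWith(Factory<T> f, Class<?> c) throws ReflectiveOperationException {
        Constructor<? extends T> ctor = c.asSubclass(f.type).getConstructor(); // public no-arg constructor
        f.register(c.getName(), () -> {
            try {
                return ctor.newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    /** Class loader over a directory or jar chosen at runtime. */
    static ClassLoader loaderFor(File dirOrJar) throws MalformedURLException {
        return new URLClassLoader(new URL[] {dirOrJar.toURI().toURL()}, DynamicLoading.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        Factory<CharSequence> sequences = new Factory<>(CharSequence.class);
        Factory<Runnable> tasks = new Factory<>(Runnable.class);
        List<Factory<?>> factories = List.of(sequences, tasks);
        // With plugins this would be loaderFor(new File("plugins/")); the path is hypothetical.
        load("java.lang.StringBuilder", ClassLoader.getSystemClassLoader(), factories);
        System.out.println(sequences.names()); // [java.lang.StringBuilder]
        System.out.println(tasks.names());     // [] because StringBuilder is not a Runnable
    }
}
```

Keying each factory on an interface type means newly loaded classes need no registration code of their own; the loader decides where they belong.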

XML Scripting Support
- SAX parser support for all components except crawling and parsing
- Implemented using the Builder pattern
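
The combination of SAX and the Builder pattern can be sketched as follows: SAX callbacks arrive one element at a time, and a builder accumulates those pieces until the closing tag, when the finished object is produced. The XML vocabulary (script, component, parameter), the class names, and the example class name example.CosineSimilarity are all hypothetical; only the JDK SAX API calls are real. Records need Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: a SAX handler drives a Builder that assembles component descriptions. */
class XmlScripting {
    /** Immutable description of a configured component. */
    record ComponentSpec(String className, Map<String, String> parameters) {}

    /** Builder: accumulates parts as they are parsed, then produces the finished product. */
    static class ComponentBuilder {
        private String className;
        private final Map<String, String> parameters = new LinkedHashMap<>();

        ComponentBuilder setClassName(String name) { this.className = name; return this; }

        ComponentBuilder addParameter(String name, String value) { parameters.put(name, value); return this; }

        ComponentSpec build() {
            if (className == null) throw new IllegalStateException("missing class attribute");
            return new ComponentSpec(className, Collections.unmodifiableMap(new LinkedHashMap<>(parameters)));
        }
    }

    /** SAX callbacks translate XML elements into builder calls (element names are assumptions). */
    static class ScriptHandler extends DefaultHandler {
        final List<ComponentSpec> components = new ArrayList<>();
        private ComponentBuilder current;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("component")) {
                current = new ComponentBuilder().setClassName(atts.getValue("class"));
            } else if (qName.equals("parameter") && current != null) {
                current.addParameter(atts.getValue("name"), atts.getValue("value"));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("component") && current != null) {
                components.add(current.build()); // closing tag: the product is complete
                current = null;
            }
        }
    }

    static List<ComponentSpec> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ScriptHandler handler = new ScriptHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.components;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<script><component class=\"example.CosineSimilarity\">"
                + "<parameter name=\"property\" value=\"tags\"/></component></script>";
        System.out.println(parse(xml));
    }
}
```

SAX never holds the whole document in memory, so the builder is the natural place to keep the partial state between callbacks.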

Core Improvements
- 2 cross-validation algorithms
- ~20 algorithms with space complexity improvements
- Iterators for all graph primitives
- Macros for separating graph data by cross-validation property

Additional Algorithms
- 2 new similarity algorithms
- 1 new distance metric added
- Obsolete algorithms removed

LastFM Crawler Updates
- LastFM upgraded its web services, removing the old version
- The new version will link to the semantic web
- ~20 parsers completed
- Still under construction

Planned Future Work
- Contingent on arrival of computer
- Testing of existing code
- Cross-validation scheduler
- Completion of LastFM parser
- DBTunes (from semantic web) parser
- Experiments!
- Write thesis!

Unplanned Future Work
- Full semantic web crawler
- Incorporating GData protocols
- Database backend
- Colt-Matrix-Over-Graph adapter
- Database-backed Weka instance

Beyond the Horizon
- Support for Prolog primitives
- Multi-database graph support
- Semantic Web graph utilizing the proxy pattern
- Support for dynamic updates and dynamic data