Searching Provenance Shankar Pasupathy, Network Appliance PASS Workshop, Harvard October 2005.

Slides:



Advertisements
Similar presentations
ROWLBAC – Representing Role Based Access Control in OWL
Advertisements

Dr. Leo Obrst MITRE Information Semantics Information Discovery & Understanding Command & Control Center February 6, 2014February 6, 2014February 6, 2014.
Provenance-Aware Storage Systems Margo Seltzer April 29, 2005.
Irwin/McGraw-Hill Copyright © 2000 The McGraw-Hill Companies. All Rights reserved Whitten Bentley DittmanSYSTEMS ANALYSIS AND DESIGN METHODS5th Edition.
The Case for Browser Provenance Daniel W. Margo and Margo Seltzer Harvard School of Engineering and Applied Sciences.
The eXtensible Markup Language (XML) An Applied Tutorial Kevin Thomas.
Introduction to Information Retrieval
Fast Algorithms For Hierarchical Range Histogram Constructions
Information Retrieval Visualization CPSC 533c Class Presentation Qixing Zheng March 22, 2004.
Project 1 Assignment Building a mini-database for CCI in UNCC which includes entity sets: departments (CS,SIS, bioinformatics), faculties, courses given.
Administrivia Final exam: Wed, May 12, 3:00-5:00, in this room Q&A on it today Playoffs: Fri, May 14, noon-2:00, FEC 141 Post-class survey (anonymous)
Novelty Detection and Profile Tracking from Massive Data Jaime Carbonell Eugene Fink Santosh Ananthraman.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
A Heterogeneous Network Access Service based on PERMIS and SAML Gabriel López Millán University of Murcia EuroPKI Workshop 2005.
BUSINESS DRIVEN TECHNOLOGY
Information Retrieval
Introduction to Genealogy By Al Barron Slidell Branch Library November 17, 2004.
m AUTHOR EXECUTIVE TRACK CONSULTANTMAUIDAN HOLME.
Chapter 11 Databases.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
Current Situation and CI Requirements OOI Cyberinfrastructure Integrated Observatory Management Workshop San Diego May 28-29, 2008.
After the Interviews. How do you know when you’re done interviewing? SATURATION POINT.
Aardvark Anatomy of a Large-Scale Social Search Engine.
Database System Concepts and Architecture
1 Distributed Agents for User-Friendly Access of Digital Libraries DAFFODIL Effective Support for Using Digital Libraries Norbert Fuhr University of Duisburg-Essen,
CRISP Requirements Discussion draft-ietf-crisp-requirements-02.txt Andrew Newton 55 th IETF, November 19, 2002 Atlanta, GA.
October 8, 2015 University of Tulsa - Center for Information Security Microsoft Windows 2000 DNS October 8, 2015.
Mini-Project on Web Data Analysis DANIEL DEUTCH. Data Management “Data management is the development, execution and supervision of plans, policies, programs.
Databases From A to Boyce Codd. What is a database? It depends on your point of view. For Manovich, a database is a means of structuring information in.
The protection of the DB against intentional or unintentional threats using computer-based or non- computer-based controls. Database Security – Part 2.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
IT Just Works ©2008 BigFix, Inc. Practical Guide to Relevance Ben Kus – 1/31/2008.
Music Information Retrieval Information Universe Seongmin Lim Dept. of Industrial Engineering Seoul National University.
Search Engines Reyhaneh Salkhi Outline What is a search engine? How do search engines work? Which search engines are most useful and efficient? How can.
ITGS Databases.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
Ranking CSCI 572: Information Retrieval and Search Engines Summer 2010.
Create Content Capture Content Review Content Edit Content Version Content Version Content Translate Content Translate Content Format Content Transform.
CSE 332: Design Patterns Review: Design Pattern Structure A design pattern has a name –So when someone says “Adapter” you know what they mean –So you can.
1 KMeD: A Knowledge-Based Multimedia Medical Database System Wesley W. Chu Computer Science Department University of California, Los Angeles
Lawrence Snyder University of Washington, Seattle © Lawrence Snyder 2004.
QUERY PROCESSING RELATIONAL DATABASE KUSUMA AYU LAKSITOWENING
Understand Audit Policies LESSON Security Fundamentals.
A Patent Document Retrieval System Addressing Both Semantic and Syntactic Properties Liang Chen*,Naoyuki Tokuda+, Hisahiro Adachi+ *University of Northern.
Configuring, Managing and Maintaining Windows Server® 2008 Servers Course 6419A.
Copyright (c) 2014 Pearson Education, Inc. Introduction to DBMS.
Introduction to Databases Dr. Osama AL Rababah. Objectives In this capture you will learn: Some common uses of database systems. The characteristics of.
1 Chapter 2 Database Environment Pearson Education © 2009.
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
Selected Semantic Web UMBC CoBrA – Context Broker Architecture  Using OWL to define ontologies for context modeling and reasoning  Taking.
1 DAFFODIL Effective Support for Using Digital Libraries Norbert Fuhr University of Duisburg-Essen, Germany.
Active Directory Domain Services (AD DS). Identity and Access (IDA) – An IDA infrastructure should: Store information about users, groups, computers and.
LECTURE TWO Introduction to Databases: Data models Relational database concepts Introduction to DDL & DML.
© ExplorNet’s Centers for Quality Teaching and Learning 1 Describe applications and services. Objective Course Weight 5%
Chapter 1 Overview of Databases and Transaction Processing.
- The most common types of data models.
Databases.
Introduction Multimedia initial focus
Associative Query Answering via Query Feature Similarity
Chapter 2 Database Environment.
Database.
The Economy of Distributed Metadata Authoring
Data Model.
Database Systems Instructor Name: Lecture-3.
Introduction to Information Retrieval
Discussion Class 9 Google.
Visual Grounding.
Presentation transcript:

Searching Provenance Shankar Pasupathy, Network Appliance PASS Workshop, Harvard October 2005

Outline Why does Netapp care ? Questions we’d like to ask Use Google to search provenance ? Novel uses of searching provenance Distributed search of provenance Metrics

Why does Netapp care ? Workflows (ILM) –Morgan Stanley wants to track how financial reports are generated – Set backup policies on workflows Generic search –Relevance of a document when ranking results What sources were used to create the document ? How authoritative are the sources ? Audit trails Don’t backup stuff that can be easily recreated –E.g. object files (.o) –Thumbnails of images

Questions we’d like to ask Forward and reverse queries –Given file “foo”, tell me the workflow that it’s part of –What are all the data sets I can recreate easily from “foo” (need the notion of how long it takes to create descendants of foo) How much space will I save ? –Given file “foo”, tell me exactly how to recreate it Fine-grain query –Which parts of other files (offset, length) were used to create “foo”

Use Google ? Use Google to index all provenance information out there –No security (we could fix that) –Is it too heavy weight ? Are queries well structured ? – Perhaps SQL and a relational database work well

Query mechanisms What the user sees –How would you modify NFS/CIFS to support provenance queries ? –Could you do something interesting via the filesystem E.g. lookup an extended attribute whose name is well known to get the provenance tree –Modify fstat ? Visual representation of provenance Notification of provenance changes

Query language… User access rights –Is uid/gid enough ? –Probably want to be specify roles and domains that the questioner has control over. –What if you don’t have permissions to view a region of the provenance tree ? What about the provenance system itself –Is SQL good enough ? –RDF ontology/SPARQL query language

Other uses of provenance Determine trust, authority of authors of documents –A provenance tree is similar to a paper citation tree –Who’s cited most often ? –Use that to improve search results Craig Soules’ work –Capture relationships among files over a period of time

Distributed querying of provenance Data may have been produced from sources on different computers Distributed querying => common format for recording provenance –What is that format ? How do you describe partial or incomplete answers ? –Because parts of the distributed provenance tree are not available

Metrics How do you compare PASS query systems ? –Performance –Relevance –Benchmarks