2005.11.22- SLIDE 1IS 257 – Fall 2005 Future of Database Systems 2: XML Databases and Grid-based Digital Libraries University of California, Berkeley School.

Presentation transcript:

SLIDE 1IS 257 – Fall 2005 Future of Database Systems 2: XML Databases and Grid-based Digital Libraries University of California, Berkeley School of Information Management and Systems SIMS 257: Database Management

SLIDE 2IS 257 – Fall 2005 Lecture Outline Review –Future of Database Systems XML and DBMS Grid-Based Digital Libraries –Data Grids –Grid-based IR DBMS and usability


SLIDE 4IS 257 – Fall 2005 Radio has no future. Heavier-than-air flying machines are impossible. X-rays will prove to be a hoax. –William Thomson (Lord Kelvin), 1899

SLIDE 5IS 257 – Fall 2005 This “Telephone” has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us. –Western Union, Internal Memo, 1876

SLIDE 6IS 257 – Fall 2005 I think there is a world market for maybe five computers –Thomas Watson, Chair of IBM, 1943

SLIDE 7IS 257 – Fall 2005 By the turn of this century, we will live in a paperless society. –Roger Smith, Chair of GM, 1986

SLIDE 8IS 257 – Fall 2005 I predict the internet… will go spectacularly supernova and in 1996 catastrophically collapse. –Bob Metcalfe (3Com founder and inventor of Ethernet), 1995

SLIDE 9IS 257 – Fall 2005 Accomplishments of DBMS Research DBMS are now used in almost every computing environment to create, organize and maintain large collections of information, and this is largely due to the results of the DBMS research community’s efforts, in particular: –Relational DBMS –Transaction management –Distributed DBMS

SLIDE 10IS 257 – Fall 2005 Next Generation Database Systems Where are we going from here? –Hardware is getting faster and cheaper –DBMS technology continues to improve and change OODBMS ORDBMS –Bigger challenges for DBMS technology Medicine, design, manufacturing, digital libraries, sciences, environment, planning, etc...

SLIDE 11IS 257 – Fall 2005 Examples NASA EOSDIS –Estimated 10^18 Bytes (Exabyte) Computer-Aided design The Human Genome Department Store tracking –Mining non-transactional data (e.g. Scientific data, text data?) Insurance Company –Multimedia DBMS support

SLIDE 12IS 257 – Fall 2005 New Features New Data types Rule Processing New concepts and data models Problems of Scale Parallelism/Grid-based DB Tertiary Storage vs Very Large-Scale Disk Storage Heterogeneous Databases Memory Only DBMS

SLIDE 13IS 257 – Fall 2005 Coming to a Database Near You… Browsability User-defined access methods Security Steering Long processes Federated Databases IR capabilities XML The Semantic Web(?)

SLIDE 14IS 257 – Fall 2005 Some things to consider Bandwidth will keep increasing and getting cheaper (and go wireless) Processing power will keep increasing –Moore’s law: Number of circuits on the most advanced semiconductors doubling every 18 months Memory and Storage will keep getting cheaper (and probably smaller) –“Storage law”: Worldwide digital data storage capacity has doubled every 9 months for the past decade Put it all together and what do you have? –“The ideal database machine would have a single infinitely fast processor with infinite memory with infinite bandwidth – and it would be infinitely cheap (free)” : David DeWitt and Jim Gray, 1992
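
The two doubling rates quoted on this slide can be compared with a few lines of arithmetic. This is only a sketch: the function name `growth_factor` and the ten-year horizon are choices made here for illustration, not part of the lecture.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` when a quantity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)


# Moore's law as stated on the slide: doubling every 18 months.
moore_decade = growth_factor(10, 18)

# "Storage law" as stated on the slide: doubling every 9 months.
storage_decade = growth_factor(10, 9)

print(f"Circuits after 10 years: about {moore_decade:,.0f}x")
print(f"Storage after 10 years:  about {storage_decade:,.0f}x")
```

Because the storage doubling period is half as long, its ten-year growth factor is the square of the Moore's-law factor, which is why storage is expected to outrun processing power.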

SLIDE 15IS 257 – Fall 2005 Lecture Outline Review –Future of Database Systems XML and DBMS Grid-Based Digital Libraries –Data Grids –Grid-based IR DBMS and usability

SLIDE 16IS 257 – Fall 2005 Standards: XML/SQL As part of SQL3, an extension called XML/SQL (standardized under the name SQL/XML) is being created to provide a mapping from XML to DBMS. The (draft) standard is very complex, but the ideas are actually pretty simple. Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY

SLIDE 17IS 257 – Fall 2005 Standards: XML/SQL That table can be mapped to: John Smith … etc. …
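
The idea of the mapping can be sketched in a few lines of Python. This is an illustration under assumptions, not the text of the standard: the `<row>` wrapper element, the helper name `table_to_xml`, and every value other than "John" and "Smith" are invented here.

```python
import xml.etree.ElementTree as ET

# Column names are taken from the EMPLOYEE table described in the lecture.
COLUMNS = ["EMPNO", "FIRSTNAME", "LASTNAME", "BIRTHDATE", "SALARY"]


def table_to_xml(table_name, columns, rows):
    """Map a table to XML: one element per table, one <row> per record,
    and one child element per column (an assumed, simplified layout)."""
    table = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return table


# Hypothetical sample record; only the first and last name come from the slide.
rows = [("000020", "John", "Smith", "1955-08-21", 65000)]

xml_text = ET.tostring(table_to_xml("EMPLOYEE", COLUMNS, rows), encoding="unicode")
print(xml_text)
```

Each column name becomes an element name, so the XML is self-describing in the same way the table header is.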

SLIDE 18IS 257 – Fall 2005 Standards: XML/SQL In addition the standard says that XMLSchemas must be generated for each table, and also allows relations to be managed by nesting records from tables in the XML. Don’t know whether this has actually been implemented by anyone –There is actually something very similar in the Cheshire II interface to RDBMS

SLIDE 19IS 257 – Fall 2005 Lecture Outline Review –Future of Database Systems XML and DBMS Grid-Based Digital Libraries –Data Grids –Grid-based IR DBMS and usability

SLIDE 20IS 257 – Fall 2005 Grid-based Digital Libraries So what’s this Grid thing anyhow? Data Grids and Distributed Storage Grid-Based IR Grid-Based Digital Libraries This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore and others from San Diego Supercomputer Center

SLIDE 21IS 257 – Fall 2005 The Grid: On-Demand Access to Electricity [Figure: quality and economies of scale plotted against time] Source: Ian Foster

SLIDE 22IS 257 – Fall 2005 By Analogy, A Computing Grid Decouples production and consumption –Enable on-demand access –Achieve economies of scale –Enhance consumer flexibility –Enable new devices On a variety of scales –Department –Campus –Enterprise –Internet Source: Ian Foster

SLIDE 23IS 257 – Fall 2005 Not Exactly a New Idea … “The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.” –Fernando Corbato and Robert Fano, 1966 “We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” –Len Kleinrock, 1967 Source: Ian Foster

SLIDE 24IS 257 – Fall 2005 But, Things are Different Now Networks are far faster (and cheaper) –Faster than computer backplanes “Computing” is very different than pre-Net –Our “computers” have already disintegrated –E-commerce increases size of demand peaks –Entirely new applications & social structures We’ve learned a few things about software Source: Ian Foster

SLIDE 25IS 257 – Fall 2005 Computing isn’t Really Like Electricity I import electricity but must export data “Computing” is not interchangeable but highly heterogeneous: data, sensors, services, … This complicates things; but also means that the sum can be greater than the parts –Real opportunity: Construct new capabilities dynamically from distributed services Raises three fundamental questions –Can I really achieve economies of scale? –Can I achieve QoS across distributed services? –Can I identify apps that exploit synergies? Source: Ian Foster

SLIDE 26IS 257 – Fall 2005 Why the Grid? (1) Revolution in Science Pre-Internet –Theorize &/or experiment, alone or in small teams; publish paper Post-Internet –Construct and mine large databases of observational or simulation data –Develop simulations & analyses –Access specialized devices remotely –Exchange information within distributed multidisciplinary teams Source: Ian Foster

SLIDE 27IS 257 – Fall 2005 Why the Grid? (2) Revolution in Business Pre-Internet –Central data processing facility Post-Internet –Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B) –Business processes increasingly computing- & data-rich –Outsourcing becomes feasible => service providers of various sorts Source: Ian Foster

SLIDE 28IS 257 – Fall 2005 New Opportunities Demand New Technology “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations” Source: Ian Foster

SLIDE 29IS 257 – Fall 2005 Building an Open Grid

SLIDE 30IS 257 – Fall 2005 Building an Open Grid Open Standards

SLIDE 31IS 257 – Fall 2005 Building an Open Grid Open Standards Open Source

SLIDE 32IS 257 – Fall 2005 Building an Open Grid Open Standards Open Source Open Infrastructure

SLIDE 33IS 257 – Fall 2005 Building an Open Grid Open Standards Open Source Open Infrastructure Open Grid


SLIDE 35IS 257 – Fall 2005 Grids and Open Standards [Figure: functionality and standardization increasing over time, in three stages. Custom solutions: app-specific services. Globus Toolkit: de facto standards (GGF: GridFTP, GSI) on top of X.509, LDAP, FTP, … Open Grid Services Architecture: GGF: OGSI, … (+ OASIS, W3C) on top of Web services, with multiple implementations, including Globus Toolkit.]

SLIDE 36IS 257 – Fall 2005 Open Grid Services Architecture Service-oriented architecture –Key to virtualization, discovery, composition, local-remote transparency Leverage industry standards –Internet, Web services Distributed service management –A “component model for Web services” A framework for the definition of composable, interoperable services “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002

SLIDE 37IS 257 – Fall 2005 Realizing a Service-Oriented Architecture: How Do I Create, name, manage, discover services? Render resources, data, sensors as services? Negotiate service level agreements? Express & negotiate policy? Organize & manage service collections? Establish identity, negotiate authentication? Manage VO membership & communication? Compose services efficiently? Achieve interoperability?

SLIDE 38IS 257 – Fall 2005 Web Services XML-based distributed computing technology Web service = a server process that exposes typed ports to the network Described by the Web Services Description Language (WSDL), an XML document that contains –Type of message(s) the service understands & types of responses & exceptions it returns –“Methods” bound together as “port types” –Port types bound to protocols as “ports” A WSDL document completely defines a service and how to access it
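
To make the "port type" idea concrete, the sketch below reads a tiny WSDL 1.1 fragment and lists each port type with its operations and message types. The service, message, and namespace names in `SAMPLE_WSDL` are hypothetical, and the fragment omits the binding and service sections that a complete WSDL document would need.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, deliberately incomplete WSDL fragment used only for illustration.
SAMPLE_WSDL = """
<definitions name="EmployeeLookup"
             targetNamespace="http://example.org/employee"
             xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="http://example.org/employee">
  <message name="GetEmployeeRequest"/>
  <message name="GetEmployeeResponse"/>
  <portType name="EmployeePortType">
    <operation name="GetEmployee">
      <input message="tns:GetEmployeeRequest"/>
      <output message="tns:GetEmployeeResponse"/>
    </operation>
  </portType>
</definitions>
""".strip()


def list_port_types(wsdl_text):
    """Return {port type name: {operation name: (input message, output message)}}."""
    root = ET.fromstring(wsdl_text)
    port_types = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        operations = {}
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            inp = op.find("wsdl:input", WSDL_NS)
            out = op.find("wsdl:output", WSDL_NS)
            operations[op.get("name")] = (
                inp.get("message") if inp is not None else None,
                out.get("message") if out is not None else None,
            )
        port_types[port_type.get("name")] = operations
    return port_types


print(list_port_types(SAMPLE_WSDL))
```

A client generator does essentially this walk, then follows the bindings to learn which protocol carries each operation.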

SLIDE 39IS 257 – Fall 2005 Open Grid Services Infrastructure [Figure: a Client turns a Grid Service Handle into a Grid Service Reference (handle resolution) and calls a service running in a hosting environment/runtime (“C”, J2EE, .NET, …). The service implementation exposes service data elements, the required GridService interface, and other standard interfaces: factory, notification, collections. Labeled capabilities: data access; lifetime management (explicit destruction, soft-state lifetime); introspection (What port types? What policy? What state?).]

SLIDE 40IS 257 – Fall 2005 The Grid as Enabler of 21st Century Science Entirely new approaches to enquiry based on –Deep analysis of huge quantities of data –Interdisciplinary collaboration –Large-scale simulation –Smart instrumentation Enabled by an infrastructure that enables access to, and integration of, resources & services without regard for location

SLIDE 41IS 257 – Fall 2005 Grid Infrastructure Broadly deployed services in support of fundamental collaborative activities –Formation & operation of virtual organizations –Authentication, authorization, discovery, … Services, software, and policies enabling on-demand access to critical resources –Computers, databases, networks, storage, software services,… Operational support for 24x7 availability Integration with campus and commercial infrastructures

SLIDE 42IS 257 – Fall 2005 The Foundations are Being Laid [Figure: network maps. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility, 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link. Sites labeled: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RAL, Hinxton.]

SLIDE 43IS 257 – Fall 2005 Data Grid Problem “Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data” Note that this problem: –Is common to many areas of science –Overlaps strongly with other Grid problems

SLIDE 44IS 257 – Fall 2005 Data Grids for High Energy Physics [Figure: tiered data flow from the Online System through the Offline Processor Farm (~20 TIPS) and the CERN Computer Centre to regional centres (FermiLab ~4 TIPS; France, Italy and Germany Regional Centres), Tier2 Centres (~1 TIPS each, including Caltech), institutes (~0.25 TIPS) with a physics data cache, and physicist workstations. Link labels: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec or Air Freight (deprecated), ~622 Mbits/sec, ~1 MBytes/sec. Tier labels: Tier 0, Tier 1, Tier 2, Tier 4.] There is a “bunch crossing” every 25 nsecs. There are 100 “triggers” per second. Each triggered event is ~1 MByte in size. Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server. 1 TIPS is approximately 25,000 SpecInt95 equivalents. Image courtesy Harvey Newman, Caltech
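
The rates quoted on this slide fit together with simple arithmetic, sketched below. The variable names are invented here, and treating 1 MByte as 8 Mbits when comparing against the 622 Mbits/sec link is a simplifying assumption.

```python
NS_PER_SECOND = 1_000_000_000

# One bunch crossing every 25 ns.
crossings_per_second = NS_PER_SECOND // 25

# 100 triggers per second, each producing an event of about 1 MByte.
triggers_per_second = 100
event_size_mbytes = 1
raw_rate_mbytes = triggers_per_second * event_size_mbytes

# Fraction of bunch crossings that is actually kept as a triggered event.
kept_fraction = triggers_per_second / crossings_per_second

# Capacity of a ~622 Mbits/sec link expressed in MBytes/sec.
link_mbytes = 622 / 8

print(f"Bunch crossings per second: {crossings_per_second:,}")
print(f"Triggered data rate: {raw_rate_mbytes} MBytes/sec")
print(f"Fraction of crossings kept: {kept_fraction:.1e}")
print(f"622 Mbits/sec link capacity: {link_mbytes} MBytes/sec")
```

The triggered stream works out to the ~100 MBytes/sec shown in the figure, which is more than a single ~622 Mbits/sec link can carry. That is one reason each lower tier receives and caches only a subset of the data.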

SLIDE 45IS 257 – Fall 2005 Data Intensive Issues Include … Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains Respect local and global policies governing what can be used for what Schedule resources efficiently, again subject to local and global constraints Achieve high performance, with respect to both speed and reliability Catalog software and virtual data

SLIDE 46IS 257 – Fall 2005 Data Intensive Computing and Grids The term “Data Grid” is often used –Implies a distinct infrastructure, which it isn’t; but easy to say Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, … –Security, resource mgt, info services, etc. Important to exploit commonalities as very unlikely that multiple infrastructures can be maintained Fortunately this seems easy to do!

SLIDE 47IS 257 – Fall 2005 Examples of Desired Data Grid Functionality High-speed, reliable access to remote data Automated discovery of “best” copy of data Manage replication to improve performance Co-schedule compute, storage, network “Transparency” wrt delivered performance Enforce access control on data Allow representation of “global” resource allocation policies

SLIDE 48IS 257 – Fall 2005 A Model Architecture for Data Grids [Figure: an Application sends an attribute specification to the Metadata Catalog and gets back a logical collection and logical file name. The Replica Catalog maps that name to multiple locations. Replica Selection picks the selected replica using performance information & predictions from NWS and MDS. Data then moves over the GridFTP control channel and GridFTP data channel from one of Replica Location 1 (tape library), Replica Location 2 (disk cache, disk array) or Replica Location 3 (disk cache).] Source: Arcot Rajasekar (SDSC)
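
The flow in this architecture (attributes to logical name, logical name to replicas, replicas to one selected copy) can be sketched with plain dictionaries. Everything named below is hypothetical: the catalog contents, the host names, and the bandwidth table that stands in for NWS/MDS predictions. A real data grid would query catalog and monitoring services instead.

```python
from urllib.parse import urlparse

# Stand-in for the metadata catalog: attributes describing each logical file.
METADATA_CATALOG = [
    {"logical_name": "lfn://climate/run42/temperature.nc",
     "attributes": {"experiment": "run42", "variable": "temperature"}},
    {"logical_name": "lfn://climate/run42/pressure.nc",
     "attributes": {"experiment": "run42", "variable": "pressure"}},
]

# Stand-in for the replica catalog: logical name -> physical copies.
REPLICA_CATALOG = {
    "lfn://climate/run42/temperature.nc": [
        "gsiftp://tape.site-a.example/archive/temperature.nc",
        "gsiftp://disk.site-b.example/cache/temperature.nc",
        "gsiftp://cache.site-c.example/data/temperature.nc",
    ],
}

# Stand-in for performance predictions: expected MBytes/sec per host.
PREDICTED_MBYTES_PER_SEC = {
    "tape.site-a.example": 5.0,
    "disk.site-b.example": 80.0,
    "cache.site-c.example": 40.0,
}


def find_logical_names(query):
    """Return logical names whose attributes match every key/value in `query`."""
    return [entry["logical_name"] for entry in METADATA_CATALOG
            if all(entry["attributes"].get(k) == v for k, v in query.items())]


def select_replica(logical_name):
    """Pick the replica on the host with the highest predicted throughput."""
    replicas = REPLICA_CATALOG[logical_name]
    return max(replicas,
               key=lambda url: PREDICTED_MBYTES_PER_SEC.get(urlparse(url).hostname, 0.0))


names = find_logical_names({"experiment": "run42", "variable": "temperature"})
best = select_replica(names[0])
print(names[0], "->", best)
```

The application never names a host: it asks by attribute, and the choice of copy is made at request time from current performance estimates.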

SLIDE 49IS 257 – Fall 2005 Data Grid Requirements Seamless access to data and information stored at local and remote sites Virtualization of data, collection and meta information Handle Dataset Scaling – size & number Integrate Data Collections & Associated Metadata Handle Multiplicity of Platforms, Resource & Data Types Handle Seamless Authentication Handle Access Control Provide Auditing Facilities Handle Legacy Data & Methods Source: Arcot Rajasekar (SDSC)

SLIDE 50IS 257 – Fall 2005 SRB as a Solution [Figure: an Application talks to the SRB Server, which uses MCAT and HRM and fronts Distributed Storage Resources (database systems, archival storage systems, file systems, ftp, http, …). Examples: DB2, Oracle, Illustra, ObjectStore; HPSS, ADSM, UniTree; UNIX, NTFS, HTTP, FTP.] The Storage Resource Broker is middleware. It virtualizes resource access. It mediates access to distributed heterogeneous resources. It uses a MetaCATalog to facilitate the brokering. It integrates data and metadata. Source: Arcot Rajasekar (SDSC)

SLIDE 51IS 257 – Fall 2005 SDSC Storage Resource Broker & Meta-data Catalog [Figure: SRB sits between client interfaces and storage. Client side: Application (C, C++, Linux I/O), Unix Shell, Java, NT Browsers, Web, Prolog, Python. Storage side: Archives (HPSS, ADSM, UniTree, DMF), Databases (DB2, Oracle, Sybase), File Systems (Unix, NT, Mac OSX). MCAT metadata: Dublin Core; Resource, User Defined; Application Meta-data. Also shown: HRM, Remote Proxies, DataCutter, Third-party copy.] Source: Arcot Rajasekar (SDSC)

SLIDE 52IS 257 – Fall 2005 SRB Single SignOn [Figure: an Application connects to the SRB Master at (Host, port); SRB agents and the MCAT are also shown, along with a CA. Numbered steps are labeled: Authentication (Secure Password, GSI or SEA); Server spawned; Identification & Initialization; Session Established.] Source: Arcot Rajasekar (SDSC)

SLIDE 53IS 257 – Fall 2005 Federated SRB Operation [Figure: an Application issues a Read by logical name or attribute condition to an SRB server, which spawns an SRB agent. The MCAT provides 1. Logical-to-Physical mapping, 2. Identification of Replicas, 3. Access & Audit Control. Further labeled steps: Peer-to-peer Brokering with a second SRB server, Server(s) Spawning, Data Access, and Parallel Data Access to replicas R1 and R2.] Source: Arcot Rajasekar (SDSC)
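
The three catalog steps named on this slide (logical-to-physical mapping, identification of replicas, access and audit control) can be sketched as one lookup method. `ToyCatalog` and all of its data are invented for illustration; this is not the SRB or MCAT API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Toy stand-in for a metadata catalog (hypothetical, not the MCAT schema)."""
    replicas: dict                       # logical name -> list of physical locations
    readers: dict                        # logical name -> set of users allowed to read
    audit_log: list = field(default_factory=list)

    def resolve(self, user, logical_name):
        # Steps 1 and 2: map the logical name to its physical replicas.
        physical = self.replicas.get(logical_name, [])
        # Step 3: check access and record the attempt, granted or not.
        allowed = user in self.readers.get(logical_name, set())
        self.audit_log.append((user, logical_name, "granted" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical_name}")
        return list(physical)


catalog = ToyCatalog(
    replicas={"/home/demo/survey.dat": ["hpss://site-a.example/survey.dat",
                                        "unix://site-b.example/data/survey.dat"]},
    readers={"/home/demo/survey.dat": {"alice"}},
)

print(catalog.resolve("alice", "/home/demo/survey.dat"))
try:
    catalog.resolve("bob", "/home/demo/survey.dat")
except PermissionError as exc:
    print("refused:", exc)
print(catalog.audit_log)
```

Because every request goes through the catalog, the audit trail and the access policy live in one place even though the data does not.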

SLIDE 54IS 257 – Fall 2005 SRB Concepts Abstraction of User Space –Single sign-on –Multiple authentication schemes certificates, (secure) passwords, tickets, group permissions, roles Virtualization of Resources –Resource Location, Type & Access transparency –Logical Resource Definitions - bundling Abstraction of Data and Collections –Virtual Collections: Persistent Identifier and Global Name Space –Replication & Segmentation Data Discovery – system & application metadata –User-defined Metadata – Structural & Descriptive –Attribute-based Access (path names become irrelevant) Uniform Access Methods –APIs, Command Line, GUI Browsers, Web-Access (Portal,WSDL, CGI) –Parallel Access with both Client and Server-driven strategies Source: Arcot Rajasekar (SDSC)

SLIDE 55IS 257 – Fall 2005 OceanStore: Everyone’s data, One big Utility “The data is just out there” Separate information from location –Locality is only an optimization (an important one!) –Wide-scale coding and replication for durability All information is globally identified –Unique identifiers are hashes over names & keys –Single uniform lookup interface replaces: DNS, server location, data location –No centralized namespace required (such as SDSI) Source: John Kubiatowicz (UCB)
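
"Hashes over names & keys" can be illustrated with a short sketch. The choice of SHA-1, the way the key and name are concatenated, and the function name `global_id` are assumptions made here; the slide does not specify them.

```python
import hashlib


def global_id(owner_public_key: bytes, name: str) -> str:
    """Derive a location-independent identifier from an owner's key and a name.

    Anyone holding the same key and name computes the same identifier, so no
    central naming authority has to hand identifiers out.
    """
    digest = hashlib.sha1()
    digest.update(owner_public_key)
    digest.update(name.encode("utf-8"))
    return digest.hexdigest()


# Hypothetical keys and name, for illustration only.
alice_key = b"-----ALICE PUBLIC KEY (placeholder)-----"
bob_key = b"-----BOB PUBLIC KEY (placeholder)-----"

print(global_id(alice_key, "thesis/chapter1.txt"))
print(global_id(bob_key, "thesis/chapter1.txt"))
```

Two owners can use the same file name without colliding, because the key is part of what gets hashed.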

SLIDE 56IS 257 – Fall 2005 Basic Structure: Irregular Mesh of “Pools” Source: John Kubiatowicz (UCB)

SLIDE 57IS 257 – Fall 2005 Amusing back of the envelope calculation How many files in the OceanStore? –Assume 10^10 people in world –Say 10,000 files/person (very conservative?) –So 10^14 files in OceanStore! –If 1 gig files (not likely), get 1 mole of bytes! Truly impressive number of elements… … but small relative to physical constants –(courtesy Bill Bolotsky, Microsoft) Source: John Kubiatowicz (UCB)
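
The back-of-the-envelope estimate can be reproduced directly. This sketch assumes 10^10 people as a round population figure and reads "1 gig files" as 10^9 bytes per file; both are assumptions made here.

```python
AVOGADRO = 6.02214076e23   # particles per mole

people = 10**10            # assumed round figure for world population
files_per_person = 10**4   # 10,000 files per person, as on the slide
bytes_per_file = 10**9     # "1 gig" per file

total_files = people * files_per_person
total_bytes = total_files * bytes_per_file
moles_of_bytes = total_bytes / AVOGADRO

print(f"Files: {total_files:.0e}")
print(f"Bytes: {total_bytes:.0e}")
print(f"Moles of bytes: {moles_of_bytes:.2f}")
```

Under these assumptions the byte count lands within an order of magnitude of Avogadro's number, which is the sense in which the total is "a mole".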

SLIDE 58IS 257 – Fall 2005 Utility-based Infrastructure Service provided by confederation of companies –Monthly fee paid to one service provider –Companies buy and sell capacity from each other [Figure: interconnected providers labeled Pac Bell, Sprint, IBM, AT&T, Canadian OceanStore] Source: John Kubiatowicz (UCB)

SLIDE 59IS 257 – Fall 2005 Lecture Outline Review –Future of Database Systems Grid-Based Digital Libraries –Data Grids –Grid-based IR DBMS and usability

SLIDE 60IS 257 – Fall 2005 DBMS and Usability What features would you like to see in DBMS?

SLIDE 61IS 257 – Fall 2005 DBMS and Usability What do you hate about Database Management Systems? –From your experience –In general What do you like about Database Management Systems? –From your experience –In general

SLIDE 62IS 257 – Fall 2005 Next Week Workshops to help you develop the final reports and presentations.