2005.11.22 - SLIDE 1
IS 257 – Fall 2005
Future of Database Systems 2: XML Databases and Grid-based Digital Libraries
University of California, Berkeley, School of Information Management and Systems
SIMS 257: Database Management
SLIDE 2: Lecture Outline
Review
– Future of Database Systems
XML and DBMS
Grid-Based Digital Libraries
– Data Grids
– Grid-based IR
DBMS and usability
SLIDE 4
Radio has no future. Heavier-than-air flying machines are impossible. X-rays will prove to be a hoax.
–William Thomson (Lord Kelvin), 1899
SLIDE 5
This “Telephone” has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us.
–Western Union, Internal Memo, 1876
SLIDE 6
I think there is a world market for maybe five computers.
–Thomas Watson, Chair of IBM, 1943
SLIDE 7
By the turn of this century, we will live in a paperless society.
–Roger Smith, Chair of GM, 1986
SLIDE 8
I predict the internet… will go spectacularly supernova and in 1996 catastrophically collapse.
–Bob Metcalfe (3Com founder and inventor of Ethernet), 1995
SLIDE 9: Accomplishments of DBMS Research
DBMS are now used in almost every computing environment to create, organize and maintain large collections of information. This is largely due to the results of the DBMS research community’s efforts, in particular:
– Relational DBMS
– Transaction management
– Distributed DBMS
SLIDE 10: Next Generation Database Systems
Where are we going from here?
– Hardware is getting faster and cheaper
– DBMS technology continues to improve and change: OODBMS, ORDBMS
– Bigger challenges for DBMS technology: medicine, design, manufacturing, digital libraries, sciences, environment, planning, etc.
SLIDE 11: Examples
NASA EOSDIS
– Estimated 10^16 bytes (about 10 petabytes)
Computer-aided design
The Human Genome
Department store tracking
– Mining non-transactional data (e.g. scientific data, text data?)
Insurance company
– Multimedia DBMS support
SLIDE 12: New Features
New data types
Rule processing
New concepts and data models
Problems of scale
Parallelism/Grid-based DB
Tertiary storage vs very large-scale disk storage
Heterogeneous databases
Memory-only DBMS
SLIDE 13: Coming to a Database Near You…
Browsability
User-defined access methods
Security
Steering long processes
Federated databases
IR capabilities
XML
The Semantic Web(?)
SLIDE 14: Some things to consider
Bandwidth will keep increasing and getting cheaper (and go wireless)
Processing power will keep increasing
– Moore’s law: number of circuits on the most advanced semiconductors doubling every 18 months
Memory and storage will keep getting cheaper (and probably smaller)
– “Storage law”: worldwide digital data storage capacity has doubled every 9 months for the past decade
Put it all together and what do you have?
– “The ideal database machine would have a single infinitely fast processor with infinite memory with infinite bandwidth – and it would be infinitely cheap (free)” (David DeWitt and Jim Gray, 1992)
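As a rough worked example of the two doubling rates quoted on this slide, the sketch below compares how much each quantity grows over ten years. The helper name `growth_factor` is made up for illustration, and the calculation simply takes the slide's doubling periods at face value.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months` for a quantity that doubles
    every `doubling_period_months`."""
    return 2.0 ** (months / doubling_period_months)

DECADE_MONTHS = 120

# Moore's law as stated on the slide: doubling every 18 months.
moore = growth_factor(DECADE_MONTHS, 18)
# "Storage law" as stated on the slide: doubling every 9 months.
storage = growth_factor(DECADE_MONTHS, 9)

print(f"Circuits per chip over a decade: ~{moore:,.0f}x")
print(f"Storage capacity over a decade:  ~{storage:,.0f}x")
```

Because the storage doubling period is half as long, its ten-year growth factor is the square of the Moore's-law factor, which is why storage capacity outruns processing power so quickly under these assumptions.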
SLIDE 16: Standards: XML/SQL
As part of SQL3, an extension called XML/SQL (named SQL/XML in the ISO standard) is being created to provide a mapping from XML to DBMS
The (draft) standard is very complex, but the ideas are actually pretty simple
Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY
SLIDE 17: Standards: XML/SQL
That table can be mapped to an XML document with one element per column. The first row carries the values EMPNO 000020, FIRSTNAME John, LASTNAME Smith, BIRTHDATE 1955-08-21, SALARY 52300.00 … etc. …
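The table-to-XML mapping described on this slide can be sketched as follows. This is a minimal illustration rather than the standard's exact output: the `row` element name and the helper `table_to_xml` are assumptions, and only the table name, column names and sample values come from the slides.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["EMPNO", "FIRSTNAME", "LASTNAME", "BIRTHDATE", "SALARY"]

def table_to_xml(table_name, columns, rows):
    """Map a table to XML: one <row> element per record (assumed name),
    with one child element per column."""
    root = ET.Element(table_name)
    for record in rows:
        row_el = ET.SubElement(root, "row")
        for col, value in zip(columns, record):
            ET.SubElement(row_el, col).text = str(value)
    return root

# Sample row taken from the slide.
rows = [("000020", "John", "Smith", "1955-08-21", "52300.00")]
employee_xml = table_to_xml("EMPLOYEE", COLUMNS, rows)
print(ET.tostring(employee_xml, encoding="unicode"))
```

Each column name becomes an element name and each cell value becomes element text, so the document can be read back into rows without any schema knowledge beyond the column list.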
SLIDE 18: Standards: XML/SQL
In addition, the standard says that XML Schemas must be generated for each table, and it also allows relations to be managed by nesting records from tables in the XML.
I don’t know whether this has actually been implemented by anyone
– There is actually something very similar in the Cheshire II interface to RDBMS
SLIDE 20: Grid-based Digital Libraries
So what’s this Grid thing anyhow?
Data Grids and Distributed Storage
Grid-Based IR
Grid-Based Digital Libraries
This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore and others from San Diego Supercomputer Center
SLIDE 21: The Grid: On-Demand Access to Electricity
[Figure: axes “Time” and “Quality, economies of scale”]
Source: Ian Foster
SLIDE 22: By Analogy, A Computing Grid
Decouples production and consumption
– Enable on-demand access
– Achieve economies of scale
– Enhance consumer flexibility
– Enable new devices
On a variety of scales
– Department
– Campus
– Enterprise
– Internet
Source: Ian Foster
SLIDE 23: Not Exactly a New Idea …
“The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.”
–Fernando Corbato and Robert Fano, 1966
“We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”
–Len Kleinrock, 1967
Source: Ian Foster
SLIDE 24: But, Things are Different Now
Networks are far faster (and cheaper)
– Faster than computer backplanes
“Computing” is very different than pre-Net
– Our “computers” have already disintegrated
– E-commerce increases size of demand peaks
– Entirely new applications & social structures
We’ve learned a few things about software
Source: Ian Foster
SLIDE 25: Computing isn’t Really Like Electricity
I import electricity but must export data
“Computing” is not interchangeable but highly heterogeneous: data, sensors, services, …
This complicates things, but also means that the sum can be greater than the parts
– Real opportunity: construct new capabilities dynamically from distributed services
Raises three fundamental questions
– Can I really achieve economies of scale?
– Can I achieve QoS across distributed services?
– Can I identify apps that exploit synergies?
Source: Ian Foster
SLIDE 26: Why the Grid? (1) Revolution in Science
Pre-Internet
– Theorize &/or experiment, alone or in small teams; publish paper
Post-Internet
– Construct and mine large databases of observational or simulation data
– Develop simulations & analyses
– Access specialized devices remotely
– Exchange information within distributed multidisciplinary teams
Source: Ian Foster
SLIDE 27: Why the Grid? (2) Revolution in Business
Pre-Internet
– Central data processing facility
Post-Internet
– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
– Business processes increasingly computing- & data-rich
– Outsourcing becomes feasible => service providers of various sorts
Source: Ian Foster
SLIDE 28: New Opportunities Demand New Technology
“Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
Source: Ian Foster
SLIDES 29–34: Building an Open Grid (built up one item at a time)
Open Standards
Open Source
Open Infrastructure
Open Grid
SLIDE 35: Grids and Open Standards
[Figure: axes “Time” and “Increased functionality, standardization”; stages in order of appearance:]
– Custom solutions
– Globus Toolkit: de facto standards; GGF: GridFTP, GSI; X.509, LDAP, FTP, …
– Open Grid Services Arch: GGF: OGSI, … (+ OASIS, W3C); multiple implementations, including Globus Toolkit; Web services
– App-specific services
SLIDE 36: Open Grid Services Architecture
Service-oriented architecture
– Key to virtualization, discovery, composition, local-remote transparency
Leverage industry standards
– Internet, Web services
Distributed service management
– A “component model for Web services”
A framework for the definition of composable, interoperable services
“The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002
SLIDE 37: Realizing a Service-Oriented Architecture: How Do I
Create, name, manage, discover services?
Render resources, data, sensors as services?
Negotiate service level agreements?
Express & negotiate policy?
Organize & manage service collections?
Establish identity, negotiate authentication?
Manage VO membership & communication?
Compose services efficiently?
Achieve interoperability?
SLIDE 38: Web Services
XML-based distributed computing technology
Web service = a server process that exposes typed ports to the network
Described by the Web Services Description Language (WSDL), an XML document that contains
– Type of message(s) the service understands & types of responses & exceptions it returns
– “Methods” bound together as “port types”
– Port types bound to protocols as “ports”
A WSDL document completely defines a service and how to access it
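To make the port type idea concrete, here is a minimal sketch that reads a heavily trimmed WSDL 1.1 fragment and lists the operations grouped under each port type. The service name, the `urn:example:employee` namespace and the message names are hypothetical, and the fragment omits the types, messages, bindings and ports a complete WSDL document would need.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 elements.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# Hypothetical, incomplete WSDL fragment used only for illustration.
WSDL_DOC = """<definitions name="EmployeeLookup"
    xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:tns="urn:example:employee"
    targetNamespace="urn:example:employee">
  <portType name="EmployeePortType">
    <operation name="getEmployee">
      <input message="tns:getEmployeeRequest"/>
      <output message="tns:getEmployeeResponse"/>
      <fault name="notFound" message="tns:employeeNotFound"/>
    </operation>
  </portType>
</definitions>"""

def list_operations(wsdl_text):
    """Return {port type name: [operation names]} for a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text)
    result = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        operations = port_type.findall("wsdl:operation", WSDL_NS)
        result[port_type.get("name")] = [op.get("name") for op in operations]
    return result

print(list_operations(WSDL_DOC))
```

The input, output and fault children of each operation correspond to the messages, responses and exceptions mentioned on the slide.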
SLIDE 39: Open Grid Services Infrastructure
[Figure: a Client uses handle resolution to turn a Grid Service Handle into a Grid Service Reference for a service implementation. Labels on the service:]
– GridService (required)
– Other standard interfaces: factory, notification, collections
– Service data elements (data access)
– Introspection: What port types? What policy? What state?
– Lifetime management: explicit destruction, soft-state lifetime
– Hosting environment/runtime (“C”, J2EE, .NET, …)
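The two lifetime-management styles named on this slide, explicit destruction and soft-state lifetime, can be sketched as below. This is only an illustration of the idea: the class `SoftStateService` and its method names are assumptions, not the OGSI interface definitions, and time is passed in explicitly so the example stays deterministic.

```python
class SoftStateService:
    """Hypothetical service instance that expires unless clients renew it."""

    def __init__(self, now, lifetime):
        self.termination_time = now + lifetime
        self.destroyed = False

    def is_alive(self, now):
        """A service is usable until it is destroyed or its lifetime lapses."""
        return not self.destroyed and now < self.termination_time

    def request_termination_after(self, now, extra):
        """Keepalive: extend the termination time (never shortens it)."""
        if self.is_alive(now):
            self.termination_time = max(self.termination_time, now + extra)
        return self.termination_time

    def destroy(self):
        """Explicit destruction: the client ends the service immediately."""
        self.destroyed = True


svc = SoftStateService(now=0, lifetime=60)
svc.request_termination_after(now=30, extra=60)  # lifetime renewed at t=30
print(svc.is_alive(75), svc.is_alive(100))
```

With soft state, a service whose clients disappear is reclaimed automatically once the last renewal lapses, so no separate cleanup protocol is needed for crashed clients.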
SLIDE 40: The Grid as Enabler of 21st Century Science
Entirely new approaches to enquiry based on
– Deep analysis of huge quantities of data
– Interdisciplinary collaboration
– Large-scale simulation
– Smart instrumentation
Enabled by an infrastructure that enables access to, and integration of, resources & services without regard for location
SLIDE 41: Grid Infrastructure
Broadly deployed services in support of fundamental collaborative activities
– Formation & operation of virtual organizations
– Authentication, authorization, discovery, …
Services, software, and policies enabling on-demand access to critical resources
– Computers, databases, networks, storage, software services, …
Operational support for 24x7 availability
Integration with campus and commercial infrastructures
SLIDE 42: The Foundations are Being Laid
[Figure: network map. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link. Sites labelled: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RAL, Hinxton]
SLIDE 43: Data Grid Problem
“Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data”
Note that this problem:
– Is common to many areas of science
– Overlaps strongly with other Grid problems
SLIDE 44: Data Grids for High Energy Physics
[Figure: tiered data-flow diagram, tiers labelled Tier 0, Tier 1, Tier 2 and Tier 4. Nodes: Online System; Offline Processor Farm (~20 TIPS); CERN Computer Centre; FermiLab (~4 TIPS); France, Italy and Germany Regional Centres; Tier2 Centres (~1 TIPS each), including Caltech; Institutes (~0.25 TIPS) with a physics data cache; physicist workstations. Link labels: ~PBytes/sec, ~100 MBytes/sec, ~622 Mbits/sec or Air Freight (deprecated), ~622 Mbits/sec, ~1 MBytes/sec]
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second.
Each triggered event is ~1 MByte in size.
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
1 TIPS is approximately 25,000 SpecInt95 equivalents.
Image courtesy Harvey Newman, Caltech
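The figures on this slide imply some useful back-of-the-envelope numbers, worked through below. The sketch assumes decimal units (1 MByte = 10^6 bytes) and, purely for illustration, a detector that triggers continuously all year; neither assumption comes from the slide.

```python
TRIGGERS_PER_SEC = 100             # from the slide
EVENT_SIZE_BYTES = 1e6             # ~1 MByte per triggered event (decimal units assumed)
SECONDS_PER_YEAR = 365 * 24 * 3600

# Sustained rate of triggered data: matches the ~100 MBytes/sec link label.
rate_bytes_per_sec = TRIGGERS_PER_SEC * EVENT_SIZE_BYTES

# Yearly volume, assuming continuous running (an illustrative upper bound).
bytes_per_year = rate_bytes_per_sec * SECONDS_PER_YEAR

# Capacity of one ~622 Mbits/sec wide-area link, in bytes per second.
link_bytes_per_sec = 622e6 / 8

print(f"Triggered data rate: {rate_bytes_per_sec / 1e6:.0f} MBytes/sec")
print(f"Volume per year:     {bytes_per_year / 1e15:.2f} PBytes")
print(f"622 Mbit/s link:     {link_bytes_per_sec / 1e6:.2f} MBytes/sec")
```

Under these assumptions a single 622 Mbits/sec link carries less than the full triggered stream, which is consistent with the slide's emphasis on regional centres and institute-level caching rather than shipping everything everywhere.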
SLIDE 45: Data Intensive Issues Include …
Harness [potentially large numbers of] data, storage, network resources located in distinct administrative domains
Respect local and global policies governing what can be used for what
Schedule resources efficiently, again subject to local and global constraints
Achieve high performance, with respect to both speed and reliability
Catalog software and virtual data
SLIDE 46: Data Intensive Computing and Grids
The term “Data Grid” is often used
– Implies a distinct infrastructure, which it isn’t; but easy to say
Data-intensive computing shares numerous requirements with collaboration, instrumentation, computation, …
– Security, resource mgt, info services, etc.
Important to exploit commonalities, as it is very unlikely that multiple infrastructures can be maintained
Fortunately this seems easy to do!
SLIDE 47: Examples of Desired Data Grid Functionality
High-speed, reliable access to remote data
Automated discovery of “best” copy of data
Manage replication to improve performance
Co-schedule compute, storage, network
“Transparency” wrt delivered performance
Enforce access control on data
Allow representation of “global” resource allocation policies
SLIDE 48: A Model Architecture for Data Grids
[Figure: an Application sends an Attribute Specification to the Metadata Catalog and gets back a Logical Collection and Logical File Name; the Replica Catalog maps these to Multiple Locations; Replica Selection uses Performance Information & Predictions (NWS, MDS) to pick the Selected Replica; data moves over GridFTP control and data channels from Replica Locations 1, 2 and 3 (Tape Library, Disk Cache, Disk Array, Disk Cache)]
Source: Arcot Rajasekar (SDSC)
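The replica-selection step in this architecture can be sketched as a lookup followed by a choice of the copy with the best predicted performance. Everything concrete here is hypothetical: the logical file name, the URLs, the throughput numbers and the function `select_replica` are illustrations, and a real deployment would obtain predictions from a monitoring service instead of a hard-coded table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Replica:
    url: str               # physical location of one copy
    predicted_mbps: float  # predicted transfer rate to this client

# Hypothetical replica catalog: logical file name -> physical copies.
REPLICA_CATALOG = {
    "lfn://physics/run42/events.dat": [
        Replica("gsiftp://tape.example.org/run42/events.dat", 5.0),
        Replica("gsiftp://cache.example.org/run42/events.dat", 80.0),
        Replica("gsiftp://array.example.org/run42/events.dat", 40.0),
    ],
}

def select_replica(logical_name, catalog):
    """Pick the replica with the highest predicted throughput."""
    replicas = catalog.get(logical_name, [])
    if not replicas:
        raise LookupError(f"no replica registered for {logical_name}")
    return max(replicas, key=lambda r: r.predicted_mbps)

best = select_replica("lfn://physics/run42/events.dat", REPLICA_CATALOG)
print(best.url)
```

Keeping the logical name separate from the physical URLs is what lets the catalog add, move or drop copies without applications noticing.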
SLIDE 49: Data Grid Requirements
Seamless access to data and information stored at local and remote sites
Virtualization of data, collection and meta information
Handle dataset scaling: size & number
Integrate data collections & associated metadata
Handle multiplicity of platforms, resource & data types
Handle seamless authentication
Handle access control
Provide auditing facilities
Handle legacy data & methods
Source: Arcot Rajasekar (SDSC)
SLIDE 50: SRB as a Solution
[Figure: Application, SRB Server with MCAT and HRM, and Distributed Storage Resources (database systems, archival storage systems, file systems, ftp, http, …). Examples shown: DB2, Oracle, Illustra, ObjectStore; HPSS, ADSM, UniTree; UNIX, NTFS, HTTP, FTP]
The Storage Resource Broker is middleware
It virtualizes resource access
It mediates access to distributed heterogeneous resources
It uses a MetaCATalog to facilitate the brokering
It integrates data and metadata
Source: Arcot Rajasekar (SDSC)
SLIDE 51: SDSC Storage Resource Broker & Meta-data Catalog
[Figure: SRB with MCAT and HRM at the centre. Clients and interfaces: Application (C, C++, Linux I/O), Unix Shell, Java, NT Browsers, Web, Prolog, Python. Storage: Archives (HPSS, ADSM, UniTree, DMF), Databases (DB2, Oracle, Sybase), File Systems (Unix, NT, Mac OSX). Other labels: Dublin Core; Resource, User Defined and Application Meta-data; Remote Proxies; DataCutter; Third-party copy]
Source: Arcot Rajasekar (SDSC)
SLIDE 52: SRB Single SignOn
[Figure: numbered sign-on sequence among Application, SRB Master, SRB agents, MCAT and CA. Step labels: Authentication (Secure Password, GSI or SEA); Server spawned; Identification & Initialization; Session Established (Host, port)]
Source: Arcot Rajasekar (SDSC)
SLIDE 53: Federated SRB Operation
[Figure: six numbered steps in which an Application issues a Read with a Logical Name Or Attribute Condition through SRB servers and SRB agents to resources R1 and R2. MCAT performs: 1. Logical-to-Physical mapping, 2. Identification of Replicas, 3. Access & Audit Control. Other labels: Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access]
Source: Arcot Rajasekar (SDSC)
SLIDE 54: SRB Concepts
Abstraction of User Space
– Single sign-on
– Multiple authentication schemes: certificates, (secure) passwords, tickets, group permissions, roles
Virtualization of Resources
– Resource Location, Type & Access transparency
– Logical Resource Definitions: bundling
Abstraction of Data and Collections
– Virtual Collections: Persistent Identifier and Global Name Space
– Replication & Segmentation
Data Discovery: system & application metadata
– User-defined Metadata: Structural & Descriptive
– Attribute-based Access (path names become irrelevant)
Uniform Access Methods
– APIs, Command Line, GUI Browsers, Web-Access (Portal, WSDL, CGI)
– Parallel Access with both Client and Server-driven strategies
Source: Arcot Rajasekar (SDSC)
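Attribute-based access and the logical name space can be illustrated with a toy metadata catalog. This is a sketch of the concept only and does not use or imitate the SRB API: the catalog layout, the paths, the attribute names and the functions `find_by_attributes` and `resolve` are all invented for the example.

```python
# Toy metadata catalog: each entry ties a logical name to descriptive
# attributes and to the physical replicas that hold the data.
CATALOG = [
    {
        "logical_name": "/home/demo/ocean/sst_1999.nc",
        "attributes": {"discipline": "oceanography", "year": 1999},
        "replicas": ["hpss://archive.example.org/a1/0001",
                     "unix://disk.example.org/d7/sst_1999.nc"],
    },
    {
        "logical_name": "/home/demo/ocean/sst_2000.nc",
        "attributes": {"discipline": "oceanography", "year": 2000},
        "replicas": ["hpss://archive.example.org/a1/0002"],
    },
]

def find_by_attributes(catalog, **conditions):
    """Attribute-based discovery: return logical names whose metadata
    matches every condition; no path knowledge is required."""
    return [
        entry["logical_name"]
        for entry in catalog
        if all(entry["attributes"].get(k) == v for k, v in conditions.items())
    ]

def resolve(catalog, logical_name):
    """Logical-to-physical mapping: list the replicas behind a logical name."""
    for entry in catalog:
        if entry["logical_name"] == logical_name:
            return list(entry["replicas"])
    raise KeyError(logical_name)

hits = find_by_attributes(CATALOG, discipline="oceanography", year=1999)
print(hits, resolve(CATALOG, hits[0]))
```

Users query by what the data is about, and only the broker needs to know which storage system and path actually hold each copy.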
SLIDE 55: OceanStore: Everyone’s data, One big Utility
“The data is just out there”
Separate information from location
– Locality is only an optimization (an important one!)
– Wide-scale coding and replication for durability
All information is globally identified
– Unique identifiers are hashes over names & keys
– Single uniform lookup interface replaces: DNS, server location, data location
– No centralized namespace required (such as SDSI)
Source: John Kubiatowicz (UCB)
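The idea of identifiers that are "hashes over names & keys" can be sketched in a few lines. The choice of SHA-1, the concatenation order and the function name `object_guid` are assumptions made for illustration; the slide does not specify them.

```python
import hashlib

def object_guid(owner_public_key: bytes, name: str) -> str:
    """Derive a location-independent identifier from the owner's key and
    a human-readable name (assumed scheme: SHA-1 over key || name)."""
    digest = hashlib.sha1(owner_public_key + name.encode("utf-8"))
    return digest.hexdigest()

# Placeholder bytes standing in for real public keys.
alice_key = b"alice-public-key-bytes"
bob_key = b"bob-public-key-bytes"

print(object_guid(alice_key, "thesis/chapter1.tex"))
print(object_guid(bob_key, "thesis/chapter1.tex"))
```

Because the key is part of the hash input, two owners can use the same name without colliding, and anyone holding the key and name can recompute the identifier without consulting a central namespace.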
SLIDE 56: Basic Structure: Irregular Mesh of “Pools”
Source: John Kubiatowicz (UCB)
SLIDE 57: Amusing back of the envelope calculation
How many files in the OceanStore?
– Assume 10^10 people in world
– Say 10,000 files/person (very conservative?)
– So 10^14 files in OceanStore!
– If 1 gig files (not likely), get about 1 mole of bytes (10^23)!
Truly impressive number of elements…
… but small relative to physical constants
– (courtesy Bill Bolosky, Microsoft)
Source: John Kubiatowicz (UCB)
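Working through the slide's numbers makes the comparison explicit. The sketch uses the slide's inputs and decimal units (1 gigabyte = 10^9 bytes); the variable names are ours.

```python
AVOGADRO = 6.022e23        # particles per mole

people = 10**10            # assumed world population from the slide
files_per_person = 10**4   # 10,000 files per person
bytes_per_file = 10**9     # 1 gigabyte per file (the slide's "not likely" case)

total_files = people * files_per_person
total_bytes = total_files * bytes_per_file
moles_of_bytes = total_bytes / AVOGADRO

print(f"Files: {total_files:.0e}")
print(f"Bytes: {total_bytes:.0e}")
print(f"Moles of bytes: {moles_of_bytes:.2f}")
```

The total comes to 10^23 bytes, roughly a sixth of Avogadro's number, so the "mole" comparison holds as an order-of-magnitude remark about bytes, not files.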
SLIDE 58: Utility-based Infrastructure
Service provided by confederation of companies
– Monthly fee paid to one service provider
– Companies buy and sell capacity from each other
[Figure: providers labelled Pac Bell, Sprint, IBM, AT&T, Canadian OceanStore, IBM]
Source: John Kubiatowicz (UCB)
SLIDE 59: Lecture Outline
Review
– Future of Database Systems
Grid-Based Digital Libraries
– Data Grids
– Grid-based IR
DBMS and usability
SLIDE 60: DBMS and Usability
What features would you like to see in DBMS?
SLIDE 61: DBMS and Usability
What do you hate about Database Management Systems?
– From your experience
– In general
What do you like about Database Management Systems?
– From your experience
– In general
SLIDE 62: Next Week
Workshops to help you develop the final reports and presentations.