IS 257 – Fall 2006 (2006.11.28): New Generation Database Systems: XML Databases and Grid-based Digital Libraries. University of California, Berkeley.


1 2006.11.28 - Slide 1 - IS 257 – Fall 2006
New Generation Database Systems: XML Databases and Grid-based Digital Libraries
University of California, Berkeley, School of Information
IS 257: Database Management

2 Lecture Outline
XML and DBMS
The Grid and DBMS
– The Grid
– Data Grids
– Grid-based DBMS

3 Lecture Outline
XML and DBMS
The Grid and DBMS
– The Grid
– Data Grids
– Grid-based DBMS

4 Standards: XML/SQL
As part of SQL3, an extension that provides a mapping from XML to DBMS is being created, called XML/SQL (standardized as SQL/XML, ISO/IEC 9075-14)
The (draft) standard is very complex, but the ideas are actually pretty simple
Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY

5 Standards: XML/SQL
That table can be mapped to:
<EMPLOYEE>
<row>
<EMPNO>000020</EMPNO>
<FIRSTNAME>John</FIRSTNAME>
<LASTNAME>Smith</LASTNAME>
<BIRTHDATE>1955-08-21</BIRTHDATE>
<SALARY>52300.00</SALARY>
</row>
… etc. …
</EMPLOYEE>
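
The table-to-XML mapping described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the standard's actual algorithm: it assumes the table name becomes the outer element, each record becomes a `row` element, and each column becomes a child element named after the column. The function name `table_to_xml` is hypothetical; the sample values are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET

# Column names taken from the EMPLOYEE table described in the slides.
COLUMNS = ["EMPNO", "FIRSTNAME", "LASTNAME", "BIRTHDATE", "SALARY"]


def table_to_xml(table_name, columns, rows):
    """Map a table to XML: one outer element for the table, one <row>
    per record, and one child element per column (assumed layout)."""
    table = ET.Element(table_name)
    for record in rows:
        row = ET.SubElement(table, "row")
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = str(value)
    return table


# The single sample record shown on the slide.
employee = table_to_xml(
    "EMPLOYEE",
    COLUMNS,
    [("000020", "John", "Smith", "1955-08-21", "52300.00")],
)
xml_text = ET.tostring(employee, encoding="unicode")
print(xml_text)
```

All values are kept as strings here; a real mapping would also have to handle SQL data types, NULLs, and column names that are not valid XML names.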

6 Standards: XML/SQL
In addition, the standard says that XML Schemas must be generated for each table, and it also allows relations to be managed by nesting records from tables in the XML
Variants of this are incorporated into the latest versions of ORACLE
(Slides from Oracle Web Site on ORACLE XML)
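
One way to picture "nesting records from tables in the XML" is the sketch below, which places the employees of a department inside that department's row. Everything beyond the EMPLOYEE column names is an assumption made for illustration: the DEPARTMENT table, the DEPTNO foreign key, the second employee, and the `row` wrapper element are hypothetical and are not taken from the slides or from the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the DEPARTMENT table, the DEPTNO column, and the
# second employee are invented for illustration and do not appear in the slides.
DEPARTMENTS = [("D01", "Research")]
EMPLOYEES = [
    ("000020", "John", "Smith", "D01"),
    ("000030", "Ann", "Lee", "D02"),
]


def nest_employees(departments, employees):
    """Build a DEPARTMENT document with each department's employees
    nested inside that department's row."""
    root = ET.Element("DEPARTMENT")
    for deptno, deptname in departments:
        dept_row = ET.SubElement(root, "row")
        ET.SubElement(dept_row, "DEPTNO").text = deptno
        ET.SubElement(dept_row, "DEPTNAME").text = deptname
        nested = ET.SubElement(dept_row, "EMPLOYEE")
        for empno, first, last, emp_deptno in employees:
            if emp_deptno != deptno:
                continue  # employee belongs to a different department
            emp_row = ET.SubElement(nested, "row")
            ET.SubElement(emp_row, "EMPNO").text = empno
            ET.SubElement(emp_row, "FIRSTNAME").text = first
            ET.SubElement(emp_row, "LASTNAME").text = last
    return root


nested_doc = nest_employees(DEPARTMENTS, EMPLOYEES)
print(ET.tostring(nested_doc, encoding="unicode"))
```

The nesting replaces the join a relational query would perform: the foreign key disappears from the output because containment expresses the same relationship.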

7 Lecture Outline
XML and DBMS
The Grid and DBMS
– The Grid
– Data Grids
– Grid-based DBMS

8 Grid-based Digital Libraries
So what’s this Grid thing anyhow?
Data Grids and Distributed Storage
Grid-Based IR
Grid-Based Digital Libraries
This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore, and others from the San Diego Supercomputer Center

9 The Grid: On-Demand Access to Electricity
[Figure: axes labeled "Time" and "Quality, economies of scale"]
Source: Ian Foster

10 By Analogy, A Computing Grid
Decouples production and consumption
– Enable on-demand access
– Achieve economies of scale
– Enhance consumer flexibility
– Enable new devices
On a variety of scales
– Department
– Campus
– Enterprise
– Internet
Source: Ian Foster

11 What is the Grid?
“The short answer is that, whereas the Web is a service for sharing information over the Internet, the Grid is a service for sharing computer power and data storage capacity over the Internet. The Grid goes well beyond simple communication between computers, and aims ultimately to turn the global network of computers into one vast computational resource.”
Source: The Global Grid Forum

12 Not Exactly a New Idea …
“The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.”
– Fernando Corbato and Robert Fano, 1966
“We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”
– Len Kleinrock, 1967
Source: Ian Foster

13 But, Things are Different Now
Networks are far faster (and cheaper)
– Faster than computer backplanes
“Computing” is very different than pre-Net
– Our “computers” have already disintegrated
– E-commerce increases size of demand peaks
– Entirely new applications & social structures
We’ve learned a few things about software
Source: Ian Foster

14 Computing isn’t Really Like Electricity
I import electricity but must export data
“Computing” is not interchangeable but highly heterogeneous: data, sensors, services, …
This complicates things, but also means that the sum can be greater than the parts
– Real opportunity: Construct new capabilities dynamically from distributed services
Raises three fundamental questions
– Can I really achieve economies of scale?
– Can I achieve QoS across distributed services?
– Can I identify apps that exploit synergies?
Source: Ian Foster

15 Why the Grid? (1) Revolution in Science
Pre-Internet
– Theorize &/or experiment, alone or in small teams; publish paper
Post-Internet
– Construct and mine large databases of observational or simulation data
– Develop simulations & analyses
– Access specialized devices remotely
– Exchange information within distributed multidisciplinary teams
Source: Ian Foster

16 Why the Grid? (2) Revolution in Business
Pre-Internet
– Central data processing facility
Post-Internet
– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
– Business processes increasingly computing- & data-rich
– Outsourcing becomes feasible => service providers of various sorts
Source: Ian Foster

17 The Information Grid
Imagine a web of data
Machine Readable
– Search, Aggregate, Transform, Report On, Mine Data, using more computers and fewer humans
Scalable
– Machines are cheap: can buy 50 machines with 100 GB of memory and 100 TB disk for under $100K, and dropping
– Network is now faster than disk
Flexible
– Move data around without breaking the apps
Source: S. Banerjee, O. Alonso, M. Drake - ORACLE

18 The Foundations are Being Laid
[Figure: network map. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility, 10 Gbps link, 2.5 Gbps link, 622 Mbps link, Other link]
[Figure: map with sites labeled Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RAL, Hinxton]

19 Data Grid Problem
“Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data”
Note that this problem:
– Is common to many areas of science
– Overlaps strongly with other Grid problems

20 Data Grids for High Energy Physics
[Figure: tiered data grid for high energy physics]
Facility labels: Online System; Offline Processor Farm (~20 TIPS); CERN Computer Centre; FermiLab (~4 TIPS); France, Italy, and Germany Regional Centres; Caltech (~1 TIPS); Tier2 Centres (~1 TIPS each); Institutes (~0.25 TIPS); physics data cache; physicist workstations
Tier labels: Tier 0, Tier 1, Tier 2, Tier 4
Link labels: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec; ~622 Mbits/sec or Air Freight (deprecated); ~1 MBytes/sec
There is a “bunch crossing” every 25 nsecs
There are 100 “triggers” per second
Each triggered event is ~1 MByte in size
Physicists work on analysis “channels”
Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
1 TIPS is approximately 25,000 SpecInt95 equivalents
Image courtesy Harvey Newman, Caltech
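
The figures quoted on this slide fit together with some simple arithmetic, sketched below. The inputs (25 ns bunch crossings, 100 triggers per second, ~1 MByte per event, ~622 Mbits/sec links) come from the slide; the variable names are illustrative, and the calculation assumes decimal units and ignores protocol overhead.

```python
# Inputs quoted on the slide.
BUNCH_CROSSING_INTERVAL_NS = 25   # one bunch crossing every 25 ns
TRIGGERS_PER_SECOND = 100         # events kept by the trigger each second
EVENT_SIZE_MBYTES = 1.0           # approximate size of one triggered event
LINK_MBITS_PER_SECOND = 622       # wide-area link speed shown in the figure

SECONDS_PER_DAY = 24 * 60 * 60

# How often the beams cross, in crossings per second.
crossings_per_second = 1e9 / BUNCH_CROSSING_INTERVAL_NS

# Fraction of crossings that survive the trigger.
selected_fraction = TRIGGERS_PER_SECOND / crossings_per_second

# Data rate after triggering; this matches the ~100 MBytes/sec link label.
triggered_rate_mbytes = TRIGGERS_PER_SECOND * EVENT_SIZE_MBYTES

# Volume accumulated per day at that rate (decimal terabytes).
tbytes_per_day = triggered_rate_mbytes * SECONDS_PER_DAY / 1e6

# A 622 Mbits/sec link expressed in MBytes/sec (8 bits per byte).
link_mbytes_per_second = LINK_MBITS_PER_SECOND / 8

print(f"crossings per second:    {crossings_per_second:,.0f}")
print(f"fraction kept by trigger: {selected_fraction:.1e}")
print(f"triggered data rate:      {triggered_rate_mbytes:.0f} MBytes/sec")
print(f"data per day:             {tbytes_per_day:.2f} TBytes")
print(f"622 Mbits/sec link:       {link_mbytes_per_second:.2f} MBytes/sec")
```

Under these assumptions, a single ~622 Mbits/sec link carries less than the full ~100 MBytes/sec triggered stream, which is consistent with the slide's emphasis on regional centres and institute-level caching of the data each group needs.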

21 Grids and Open Standards
[Figure: axes labeled "Time" and "Increased functionality, standardization". Labels: Custom solutions; App-specific Services; X.509, LDAP, FTP, …; Globus Toolkit; De facto standards; GGF: GridFTP, GSI; Web services; Open Grid Services Arch; GGF: OGSI, … (+ OASIS, W3C); Multiple implementations, including Globus Toolkit]

22 The Grid as Enabler of 21st Century Science
Entirely new approaches to enquiry based on
– Deep analysis of huge quantities of data
– Interdisciplinary collaboration
– Large-scale simulation
– Smart instrumentation
Enabled by an infrastructure that enables access to, and integration of, resources & services without regard for location

23 Not only Science…
The Database world is moving to the Grid for large-scale applications
Oracle 10g is specifically designed to exploit clustered/grid computing using RACs (Real Application Clusters)
An example from the Information/Publishing world…
– Presentation from Oracle about Thomson Legal’s use of Oracle 10g and RACs

