2006.11.28 - IS 257 – Fall 2006: New Generation Database Systems: XML Databases and Grid-based Digital Libraries. University of California, Berkeley.


SLIDE 1: New Generation Database Systems: XML Databases and Grid-based Digital Libraries
University of California, Berkeley, School of Information
IS 257: Database Management, Fall 2006

SLIDE 2: Lecture Outline
XML and DBMS
The Grid and DBMS
– The Grid
– Data Grids
– Grid-based DBMS

SLIDE 3: Lecture Outline
XML and DBMS
The Grid and DBMS
– The Grid
– Data Grids
– Grid-based DBMS

SLIDE 4: Standards: XML/SQL
As part of SQL3, an extension called XML/SQL is being created that provides a mapping from XML to DBMS (the ISO standard itself is named SQL/XML)
The (draft) standard is very complex, but the ideas are actually pretty simple
Suppose we have a table called EMPLOYEE that has columns EMPNO, FIRSTNAME, LASTNAME, BIRTHDATE, SALARY

SLIDE 5: Standards: XML/SQL
That table can be mapped to: John Smith … etc. …
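As a minimal sketch of the mapping idea, assuming one root element per table, one `row` element per record, and one child element per column, the EMPLOYEE table could be serialized as follows. The element layout, the helper name `table_to_xml`, and every sample value except the names John and Smith are illustrative assumptions, not text from the draft standard or the slide.

```python
import xml.etree.ElementTree as ET

# Column names come from the slide; the row values are hypothetical.
COLUMNS = ["EMPNO", "FIRSTNAME", "LASTNAME", "BIRTHDATE", "SALARY"]
ROWS = [
    ("000020", "John", "Smith", "1955-08-21", "40000.00"),
]


def table_to_xml(table_name, columns, rows):
    """Build one root element per table, one <row> per record,
    and one child element per column (assumed layout)."""
    root = ET.Element(table_name)
    for record in rows:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = str(value)
    return root


employee_xml = table_to_xml("EMPLOYEE", COLUMNS, ROWS)
xml_text = ET.tostring(employee_xml, encoding="unicode")
print(xml_text)
```

Using the column name as the element name keeps the XML self-describing, which is the simple idea behind the otherwise complex draft.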

SLIDE 6: Standards: XML/SQL
In addition, the standard says that XML Schemas must be generated for each table, and it also allows relations to be managed by nesting records from tables in the XML
Variants of this are incorporated into the latest versions of ORACLE
(Slides from Oracle Web Site on ORACLE XML)
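The nesting idea can be sketched in the same way. This is only an illustration under stated assumptions: the DEPARTMENT table, the DEPTNO foreign key, the sample rows, and the helper name `nest_employees` are all hypothetical, and the element layout is not taken from the standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: DEPARTMENT and the DEPTNO foreign key are assumptions.
departments = [{"DEPTNO": "D01", "DEPTNAME": "Research"}]
employees = [
    {"EMPNO": "000020", "LASTNAME": "Smith", "DEPTNO": "D01"},
    {"EMPNO": "000030", "LASTNAME": "Jones", "DEPTNO": "D01"},
]


def nest_employees(departments, employees):
    """Express the DEPARTMENT-to-EMPLOYEE relationship by nesting each
    employee row inside the row of the department it belongs to."""
    root = ET.Element("DEPARTMENT")
    for dept in departments:
        dept_row = ET.SubElement(root, "row")
        for column, value in dept.items():
            ET.SubElement(dept_row, column).text = value
        nested = ET.SubElement(dept_row, "EMPLOYEE")
        for emp in employees:
            if emp["DEPTNO"] != dept["DEPTNO"]:
                continue
            emp_row = ET.SubElement(nested, "row")
            for column, value in emp.items():
                # The foreign key is implied by the nesting, so it is omitted.
                if column != "DEPTNO":
                    ET.SubElement(emp_row, column).text = value
    return root


department_xml = nest_employees(departments, employees)
print(ET.tostring(department_xml, encoding="unicode"))
```

The join that a relational query would perform at read time is done once here, when the document is built, so a reader of the XML sees the relationship directly.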

SLIDE 7: Lecture Outline
XML and DBMS
The Grid and DBMS
– The Grid
– Data Grids
– Grid-based DBMS

SLIDE 8: Grid-based Digital Libraries
So what’s this Grid thing anyhow?
Data Grids and Distributed Storage
Grid-Based IR
Grid-Based Digital Libraries
This lecture borrows heavily from presentations by Ian Foster (Argonne National Laboratory & University of Chicago), Reagan Moore, and others from the San Diego Supercomputer Center

SLIDE 9: The Grid: On-Demand Access to Electricity
[Figure: chart with axes labeled "Time" and "Quality, economies of scale"]
Source: Ian Foster

SLIDE 10: By Analogy, A Computing Grid
Decouples production and consumption
– Enable on-demand access
– Achieve economies of scale
– Enhance consumer flexibility
– Enable new devices
On a variety of scales
– Department
– Campus
– Enterprise
– Internet
Source: Ian Foster

SLIDE 11: What is the Grid?
“The short answer is that, whereas the Web is a service for sharing information over the Internet, the Grid is a service for sharing computer power and data storage capacity over the Internet. The Grid goes well beyond simple communication between computers, and aims ultimately to turn the global network of computers into one vast computational resource.”
Source: The Global Grid Forum

SLIDE 12: Not Exactly a New Idea …
“The time-sharing computer system can unite a group of investigators …. one can conceive of such a facility as an … intellectual public utility.”
– Fernando Corbato and Robert Fano, 1966
“We will perhaps see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.”
– Len Kleinrock, 1967
Source: Ian Foster

SLIDE 13: But, Things are Different Now
Networks are far faster (and cheaper)
– Faster than computer backplanes
“Computing” is very different than pre-Net
– Our “computers” have already disintegrated
– E-commerce increases size of demand peaks
– Entirely new applications & social structures
We’ve learned a few things about software
Source: Ian Foster

SLIDE 14: Computing isn’t Really Like Electricity
I import electricity but must export data
“Computing” is not interchangeable but highly heterogeneous: data, sensors, services, …
This complicates things, but also means that the sum can be greater than the parts
– Real opportunity: Construct new capabilities dynamically from distributed services
Raises three fundamental questions
– Can I really achieve economies of scale?
– Can I achieve QoS across distributed services?
– Can I identify apps that exploit synergies?
Source: Ian Foster

SLIDE 15: Why the Grid? (1) Revolution in Science
Pre-Internet
– Theorize &/or experiment, alone or in small teams; publish paper
Post-Internet
– Construct and mine large databases of observational or simulation data
– Develop simulations & analyses
– Access specialized devices remotely
– Exchange information within distributed multidisciplinary teams
Source: Ian Foster

SLIDE 16: Why the Grid? (2) Revolution in Business
Pre-Internet
– Central data processing facility
Post-Internet
– Enterprise computing is highly distributed, heterogeneous, inter-enterprise (B2B)
– Business processes increasingly computing- & data-rich
– Outsourcing becomes feasible => service providers of various sorts
Source: Ian Foster

SLIDE 17: The Information Grid
Imagine a web of data
Machine Readable
– Search, Aggregate, Transform, Report On, Mine Data, using more computers and fewer humans
Scalable
– Machines are cheap: can buy 50 machines with 100 GB of memory and 100 TB of disk for under $100K, and dropping
– Network is now faster than disk
Flexible
– Move data around without breaking the apps
Source: S. Banerjee, O. Alonso, M. Drake - ORACLE

SLIDE 18: The Foundations are Being Laid
[Figure: network map. Legend: Tier0/1 facility, Tier2 facility, Tier3 facility, 10 Gbps link, 2.5 Gbps link, 622 Mbps link, Other link. Sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Soton, London, Belfast, DL, RAL, Hinxton]

SLIDE 19: Data Grid Problem
“Enable a geographically distributed community [of thousands] to pool their resources in order to perform sophisticated, computationally intensive analyses on Petabytes of data”
Note that this problem:
– Is common to many areas of science
– Overlaps strongly with other Grid problems

SLIDE 20: Data Grids for High Energy Physics
[Figure: tiered data grid diagram. Node labels: Online System; Offline Processor Farm (~20 TIPS); CERN Computer Centre; FermiLab (~4 TIPS); France Regional Centre; Italy Regional Centre; Germany Regional Centre; Caltech (~1 TIPS); Tier2 Centres (~1 TIPS each); Institutes (~0.25 TIPS); Physics data cache; Physicist workstations. Link labels: ~PBytes/sec; ~100 MBytes/sec; ~622 Mbits/sec; ~622 Mbits/sec or Air Freight (deprecated); ~1 MBytes/sec. Tier labels: Tier 0, Tier 1, Tier 2, Tier 4]
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second
Each triggered event is ~1 MByte in size
Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
1 TIPS is approximately 25,000 SpecInt95 equivalents
Image courtesy Harvey Newman, Caltech
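The figures on this slide can be checked with a little arithmetic. The sketch below uses only the numbers stated on the slide, plus two assumptions that are not in the source: decimal units (1 PByte = 10^9 MBytes) and a fully utilized link with no protocol overhead.

```python
# Numbers stated on the slide.
BUNCH_CROSSING_INTERVAL_S = 25e-9   # one bunch crossing every 25 ns
TRIGGERS_PER_S = 100                # triggered events per second
EVENT_SIZE_MBYTES = 1.0             # ~1 MByte per triggered event
LINK_MBITS_PER_S = 622              # wide-area link speed

# Assumption (not from the slide): decimal units, 1 PByte = 1e9 MBytes.
PETABYTE_IN_MBYTES = 1e9

# Bunch crossings per second (about 40 million).
crossing_rate_hz = 1 / BUNCH_CROSSING_INTERVAL_S

# Fraction of crossings that survive the trigger.
kept_fraction = TRIGGERS_PER_S / crossing_rate_hz

# Data rate after the trigger; this matches the ~100 MBytes/sec label.
triggered_mbytes_per_s = TRIGGERS_PER_S * EVENT_SIZE_MBYTES

# A 622 Mbits/sec link carries at most this many MBytes per second.
link_mbytes_per_s = LINK_MBITS_PER_S / 8

# Time to move one petabyte over that link, assuming full utilization.
days_per_petabyte = PETABYTE_IN_MBYTES / link_mbytes_per_s / 86400

print(f"crossing rate: {crossing_rate_hz:.3g} Hz")
print(f"kept fraction: {kept_fraction:.3g}")
print(f"triggered data rate: {triggered_mbytes_per_s:.0f} MBytes/sec")
print(f"622 Mbits/sec link: {link_mbytes_per_s:.2f} MBytes/sec")
print(f"1 PByte over that link: {days_per_petabyte:.0f} days")
```

Under these assumptions a single 622 Mbits/sec link cannot keep up with the post-trigger data rate, and moving a petabyte takes months, which is one way to read the "Air Freight" label in the diagram.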

SLIDE 21: Grids and Open Standards
[Figure: chart with axes labeled "Time" and "Increased functionality, standardization". Labels, in order of appearance: Custom solutions; Globus Toolkit; De facto standards; GGF: GridFTP, GSI; X.509, LDAP, FTP, …; Open Grid Services Arch; GGF: OGSI, … (+ OASIS, W3C); Multiple implementations, including Globus Toolkit; Web services; App-specific Services]

SLIDE 22: The Grid as Enabler of 21st Century Science
Entirely new approaches to enquiry based on
– Deep analysis of huge quantities of data
– Interdisciplinary collaboration
– Large-scale simulation
– Smart instrumentation
Enabled by an infrastructure that enables access to, and integration of, resources & services without regard for location

SLIDE 23: Not only Science…
The Database world is moving to the Grid for large-scale applications
Oracle 10g is specifically designed to exploit clustered/grid computing using RACs (Real Application Clusters)
An example from the Information/Publishing world…
– Presentation from Oracle about Thomson Legal’s use of Oracle 10g and RACs