Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure.


Basics Of Data Access
[Diagram: Data Store, Machine, Memory Buffer, Memory Cache, Data Store Buffer, Bus Structure]

Basics Of Data Access: Storage
Data on a single disk all share one controller. Striping data randomly across several disks reduces contention for controller time. Databases requiring 100% uptime use striping or mirroring to facilitate backup and maintenance. With mirroring, backups can be written from one copy while processing proceeds with the other. Striping combined with redundancy (mirroring or parity), as in a RAID environment, permits replacement of failed hardware without bringing down the database.
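
As a rough illustration of how striping spreads work across disks, the sketch below maps logical block numbers to disks. It assumes simple round-robin placement, which is a common scheme but not necessarily the placement the slide has in mind (the slide says "randomly"). The function names are hypothetical.

```python
def stripe_location(block_number: int, num_disks: int) -> tuple:
    """Map a logical block to (disk index, block offset on that disk).

    Assumption: round-robin striping with a stripe unit of one block.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return block_number % num_disks, block_number // num_disks


def blocks_per_disk(num_blocks: int, num_disks: int) -> list:
    """Count how many of the first num_blocks logical blocks land on each disk."""
    counts = [0] * num_disks
    for block in range(num_blocks):
        disk, _ = stripe_location(block, num_disks)
        counts[disk] += 1
    return counts


if __name__ == "__main__":
    # Ten consecutive blocks over four disks: no single disk serves them all.
    print(blocks_per_disk(10, 4))  # [3, 3, 2, 2]
```

Because consecutive blocks fall on different disks, a long sequential read is served by several controllers at once instead of queuing on one.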

Basics Of Data Access: Retrieval
The speed of processing a given retrieval is primarily governed by the number of disk accesses required to execute it. Data is transferred to and from the disk in buffer-sized units. On large systems the size of these buffers (blocks) can be set in software; on PCs the buffer sizes (sectors) are fixed. A block may contain several records. If all of the records in a block can be processed before another retrieval is needed, processing is faster.
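
The effect of block size on the number of disk accesses can be estimated with a little arithmetic. The sketch below is illustrative only: it assumes fixed-length records that do not span blocks, and the worst case for random access assumes every requested record sits in a different block. The function names and the sizes in the example are hypothetical.

```python
def records_per_block(block_bytes: int, record_bytes: int) -> int:
    """Number of whole fixed-length records that fit in one block."""
    if block_bytes < 1 or record_bytes < 1:
        raise ValueError("sizes must be positive")
    return block_bytes // record_bytes


def sequential_block_reads(num_records: int, block_bytes: int, record_bytes: int) -> int:
    """Disk accesses needed to read num_records stored contiguously."""
    per_block = records_per_block(block_bytes, record_bytes)
    if per_block == 0:
        raise ValueError("record does not fit in a block")
    return -(-num_records // per_block)  # ceiling division


def random_block_reads_worst_case(num_records: int) -> int:
    """Worst case for random access: one disk access per record."""
    return num_records


if __name__ == "__main__":
    # 1,000 records of 200 bytes with 8 KiB blocks (40 records per block).
    print(sequential_block_reads(1000, 8192, 200))  # 25
    print(random_block_reads_worst_case(1000))      # 1000
```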

Basics Of Data Access: Busses
A bus transfers data from device to device. In single systems the bus is internal. In distributed systems the network acts as the bus. Busses transfer data in units of a word. Normally a word is smaller than a buffer unit, so transferring one buffer takes several bus cycles. (On networks, packets play the same role as words on a backplane bus.) A bus can service only one attached unit at a time, so multiple units on the same bus can generate bus contention.
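
The relationship between word size and transfer cost can be written down directly. This is a simplified sketch that assumes one word moves per bus cycle and ignores arbitration and protocol overhead; the function name and the sizes are hypothetical.

```python
def bus_cycles(buffer_bytes: int, word_bytes: int) -> int:
    """Bus cycles needed to move one buffer, assuming one word per cycle."""
    if buffer_bytes < 0 or word_bytes < 1:
        raise ValueError("buffer_bytes must be >= 0 and word_bytes >= 1")
    return -(-buffer_bytes // word_bytes)  # ceiling division


if __name__ == "__main__":
    # Doubling the bus width halves the cycles needed for the same buffer.
    print(bus_cycles(8192, 4))  # 2048
    print(bus_cycles(8192, 8))  # 1024
```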

Basics Of Data Access: Cache
A cache is a high-speed data storage location that holds the most recently used data being transferred between units in a system. Cache speeds up processing by taking advantage of the data reuse (looping) typical of most programs, which reduces the number of physical accesses to disk (DASD, direct-access storage device) required. Memory cache (as opposed to CPU cache) is a region of main memory, and its size can be set by the system administrator.
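
To see why data reuse matters, the sketch below counts physical disk reads for two access patterns under a small cache. It assumes a least-recently-used (LRU) eviction policy, which is one common choice and not something the slides specify; the function name and block numbers are hypothetical.

```python
from collections import OrderedDict


def count_disk_reads(accesses, cache_blocks: int) -> int:
    """Count cache misses (physical reads) under an assumed LRU policy."""
    cache = OrderedDict()
    disk_reads = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            disk_reads += 1                # miss: fetch from disk
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return disk_reads


if __name__ == "__main__":
    looping = [0, 1, 2, 3] * 25   # 100 accesses that reuse four blocks
    scan = list(range(100))       # 100 accesses, each block touched once
    print(count_disk_reads(looping, cache_blocks=8))  # 4
    print(count_disk_reads(scan, cache_blocks=8))     # 100
```

The looping pattern is served almost entirely from cache, while the one-pass scan gains nothing from it, whatever the cache size.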

Program Characteristics
Transaction Systems:
- Access few records at a time.
- Require records from random locations.
- Update and modify data frequently.
Data Warehouse Systems:
- Access many records at a time.
- Require records in order.
- Update and modify data infrequently.

System Tuning
Transaction Systems:
- Small buffers
- Large cache
- Fast busses
Data Warehouse Systems:
- Large buffers
- Small cache
- Wide busses

Acxiom Overview
Acxiom creates and delivers Customer and Information Management Solutions that enable many of the largest, most respected companies in the world to build great relationships with their customers. Acxiom achieves this by blending data, technology and services to provide the most advanced customer information infrastructure available in the marketplace today.

Data Warehouses
The characteristics of an Acxiom data warehouse generally are:
- Large multi-terabyte databases
- Large periodic sequential data loads
- Denormalized database schema
- Sequential reads/full table scans
- Little or no indices
- Little or no transaction logging
- Robust periodic backup solutions
- Performance measured using megabytes/gigabytes per second (MBPS, GBPS)
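
Measuring performance in megabytes or gigabytes per second makes full-scan times easy to estimate. The sketch below is back-of-the-envelope arithmetic only: it assumes a sustained throughput and decimal units (1 GB = 10^9 bytes). The table size and throughput figures are hypothetical, not Acxiom numbers.

```python
GB = 10**9
TB = 10**12


def full_scan_seconds(table_bytes: int, throughput_bytes_per_second: int) -> float:
    """Estimated time to read a table end to end at a sustained throughput."""
    if throughput_bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return table_bytes / throughput_bytes_per_second


if __name__ == "__main__":
    # A 5 TB table scanned at a sustained 2 GB/s.
    seconds = full_scan_seconds(5 * TB, 2 * GB)
    print(seconds)  # 2500.0
```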

Data Warehouses
The processing platform is generally a large global class server or cluster of servers running UNIX. The storage subsystem is very fast, with wide bandwidth and high levels of redundancy, which makes it possible to move large amounts of sequential data in a very short time. The database is a large vertical database: denormalized, with few tables, but those tables are very long, hold sorted data, and sometimes run to several billion rows. The data is striped across the storage in a manner that prevents physical hot spots and takes advantage of the wide bandwidth.


Transactional Databases
The characteristics of an Acxiom transactional database generally are:
- Small, usually no larger than a few terabytes
- Random and simultaneous inserts, updates, deletes, and queries
- Random reads and writes
- Normalized database schema
- Transaction logging and archiving with incremental and periodic backup solutions
- Sub-second response generally required per transaction, taking concurrency into account
- Performance measured using transactions per second (TPS) and I/O latency

Transactional Databases
The processing platform is generally a medium/enterprise class server. The storage subsystem is very fast, with low latency, nominal bandwidth and high levels of redundancy, which makes it possible to move small amounts of selected data quickly. The database is a normalized database that uses lookup tables. The data is stored randomly within a table but striped across the storage to prevent physical hot spots.


Hybrid Databases
The characteristics of an Acxiom hybrid database generally are:
- Medium sized, usually three to ten terabytes
- Random and simultaneous inserts, updates, deletes, and queries
- Random and sequential reads and writes
- Loosely normalized database schema
- Indices used sparingly
- Usually a batch maintenance process
- Transaction logging and archiving with incremental and periodic backup solutions
- Sub-second response generally required per transaction, taking concurrency into account
- Performance measured using TPS, I/O latency, and MBPS

Hybrid Databases
The processing platform is generally a medium sized global class server. The storage subsystem is very fast, with wide bandwidth and high levels of redundancy, which makes it possible to move large amounts of random and sequential data in a very short time. The database is a large vertical database: loosely normalized, with few tables, but those tables are very long, hold sorted data, and sometimes run to more than a billion rows. The data is striped across the storage in a manner that prevents physical hot spots and takes advantage of the wide bandwidth.


What’s New / Future Innovations
Grid or scale-out environments:
- Utilize low cost commodity based servers
- Low cost/no cost operating systems
- Many servers can work on one problem, with aggregate processing power greater than that of one large server, for less money
- Not locked into a single vendor or supplier
- When adding a new node, able to use current technology at a lower price
- Need to understand and factor in peripheral costs such as network, administration, data center, etc.

[Diagram: Parallel Grid and Clustered Grid, each built from IBM pSeries servers with OS and DB layers]

Distributed Grid Database
- Shared nothing environment: each partition has its own resources, allowing the database to scale out (up to 999 partitions).
- Centralized management of the partitioned environment.
- Data is equally distributed across all partitions.
- Any partition can receive connections and distribute queries among the other nodes.
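
One common way to distribute rows evenly in a shared-nothing design is to hash a key column and take the result modulo the number of partitions. The slides do not say which distribution method is used, so the sketch below is an assumption for illustration; the function names and key format are hypothetical, and only the 999-partition limit comes from the slide.

```python
import zlib

MAX_PARTITIONS = 999  # upper limit quoted in the slide


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition for a row key using a deterministic hash (assumed scheme)."""
    if not 1 <= num_partitions <= MAX_PARTITIONS:
        raise ValueError("num_partitions must be between 1 and 999")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def rows_per_partition(keys, num_partitions: int) -> list:
    """Count how many keys are assigned to each partition."""
    counts = [0] * num_partitions
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10000)]
    print(rows_per_partition(keys, 8))
```

Because the hash depends only on the key, any node can compute which partition owns a row, which is what lets any partition accept a connection and route the query.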

Summary
- Understand the process in which the database is to be used, and fashion a solution that meets the requirements and customer expectations.
- Even though a DBA may only be responsible for the database, many factors such as operating system and hardware configuration affect the functionality of the database and thus are a concern to the DBA.
- A DBA must relate the database to its environment to achieve an optimized solution.
- A large multi-terabyte database is not a scary monster; it is the same as dealing with a smaller database, just add a few more zeros.