IISWC 2007 Panel: Analyzing Petabytes. Suchi Raman, Netezza Corp.

Presentation transcript:


2. Petabyte Database Workloads (Netezza Confidential)

Macro-analytic queries
> Identify trends and patterns
> Very large data volumes
> Query times dominated by disk scan times

Micro-analytic queries
> Short-running queries
> Query run once and stored
> Pre-computed summaries

Data management
> ETL load/unload
> Backup/restore
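The claim that macro-analytic query times are dominated by disk scan times can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the disk count and the 100 MB/s per-disk streaming rate are assumed figures, not numbers from the slides, and the helper name `scan_seconds` is hypothetical.

```python
def scan_seconds(table_bytes, disks, bytes_per_sec_per_disk):
    """Time to scan a table striped evenly over `disks`,
    each streaming at `bytes_per_sec_per_disk` (assumed figures)."""
    aggregate_rate = disks * bytes_per_sec_per_disk
    return table_bytes / aggregate_rate

PETABYTE = 10**15            # decimal petabyte, in bytes
DISK_RATE = 100 * 10**6      # assumed 100 MB/s sequential read per disk

one_disk = scan_seconds(PETABYTE, 1, DISK_RATE)
thousand_disks = scan_seconds(PETABYTE, 1000, DISK_RATE)

print(f"1 disk:     {one_disk / 86400:.1f} days")
print(f"1000 disks: {thousand_disks / 3600:.1f} hours")
```

Under these assumptions a full scan drops from months to hours only by spreading the data over many disks that scan in parallel, which is why aggregate disk bandwidth sets the floor on query time.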

3. Netezza NPS System

4. Software challenges

Effective disk bandwidth
> Optimal data layouts
> Data compression
> Increased effective disk bandwidth (and reliability!)
> Upgrades and evolution of on-disk formats
> Minimize disk reads (indexes, caches)

Query processing algorithms
> Skew avoidance algorithms
> Scheduling among queries, especially with mixed workloads combining large and small queries

System monitoring/profiling
> System monitoring during busy periods
> Accurate profiling techniques

Data management challenges
> High-speed data path in/out of NPS system
> Efficient/flexible data formats for load/unload
> Infrastructure challenge: fast external devices for sourcing/sinking data
> Custom functions (UDFs/UDAs) implemented within the system
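One generic way to minimize disk reads is to keep a small min/max summary per storage block and skip blocks whose range cannot match the query predicate. The sketch below illustrates that general technique only. The slides do not say how the NPS system does this, and `Block`, `build_blocks`, and `range_scan` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

@dataclass
class Block:
    lo: int      # smallest key stored in the block
    hi: int      # largest key stored in the block
    rows: list   # key values, standing in for the on-disk rows

def build_blocks(keys, rows_per_block):
    """Split keys into fixed-size blocks and record each block's min/max."""
    blocks = []
    for i in range(0, len(keys), rows_per_block):
        chunk = keys[i:i + rows_per_block]
        blocks.append(Block(min(chunk), max(chunk), chunk))
    return blocks

def range_scan(blocks, lo, hi):
    """Return the matching keys and the number of blocks actually read."""
    hits, blocks_read = [], 0
    for b in blocks:
        if b.hi < lo or b.lo > hi:
            continue  # summary proves no row can match, so skip the read
        blocks_read += 1
        hits.extend(k for k in b.rows if lo <= k <= hi)
    return hits, blocks_read

keys = list(range(1000))           # data laid out in key order
blocks = build_blocks(keys, 100)   # 10 blocks of 100 rows
hits, blocks_read = range_scan(blocks, 250, 320)
print(len(hits), "rows from", blocks_read, "of", len(blocks), "blocks")
```

The benefit depends on the data layout: when keys are stored in random order, nearly every block's range overlaps the predicate and few reads are skipped, which ties this technique back to the "optimal data layouts" bullet.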

5. Hardware challenges

> Increased effective disk bandwidth (and reliability!)
> Multi-core technology
> Balancing CPU-to-disk ratio
> Specialized engines (e.g., FPGA-based filtering)
> Faster internal and external connectivity
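Balancing the CPU-to-disk ratio amounts to comparing the rate at which disks deliver data with the rate at which a processor can consume it. The sketch below is a simplified model under assumed figures (none come from the slides). It treats compression as multiplying the logical data rate delivered per physical byte read, and `bottleneck` is a hypothetical helper.

```python
def bottleneck(disk_mb_s, compression_ratio, cpu_mb_s):
    """Say which resource limits a scan on one node.

    disk_mb_s         physical read rate of the node's disks (assumed)
    compression_ratio logical bytes per physical byte (1 = uncompressed)
    cpu_mb_s          logical data rate the CPU can process (assumed)
    """
    delivered = disk_mb_s * compression_ratio  # logical MB/s reaching the CPU
    if delivered > cpu_mb_s:
        return "cpu"
    if delivered < cpu_mb_s:
        return "disk"
    return "balanced"

# Hypothetical node: 60 MB/s of disk, CPU able to process 100 MB/s.
print(bottleneck(60, 1, 100))   # uncompressed data: the disk is the limit
print(bottleneck(60, 2, 100))   # 2x compression shifts the limit to the CPU
```

In this model, raising effective disk bandwidth through compression moves the bottleneck toward the processor, which is one reason the slide lists multi-core technology and specialized filtering engines next to disk bandwidth.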

6. How can University Researchers contribute?

Explore new applications and data types
> E.g., network traffic analysis
> Geospatial data
> Biological data types
> Skew avoidance/scheduling algorithms
> Applications built on UDFs/UDAs
> Verification methods for optimizer algorithms

Platform improvements
> Disk performance and reliability
> FPGA filtering algorithms
> Faster interconnect networks
> Power and cooling improvements
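The skew problem mentioned above can be illustrated with a small experiment: rows are hash-distributed across parallel processing units, and the busiest unit determines how long the whole query takes. This is a sketch under assumptions rather than a description of any Netezza algorithm. The unit count, the sample data, the use of CRC32 as the hash, and the names `distribute` and `skew` are all chosen for illustration.

```python
import zlib
from collections import Counter

def distribute(keys, units):
    """Count rows per processing unit under hash distribution."""
    counts = Counter(zlib.crc32(str(k).encode()) % units for k in keys)
    return [counts.get(u, 0) for u in range(units)]

def skew(counts):
    """Busiest unit's row count divided by the mean; 1.0 is perfectly even."""
    return max(counts) / (sum(counts) / len(counts))

UNITS = 8
unique_ids = range(10_000)                     # high-cardinality key
country = ["US"] * 9_000 + ["CA"] * 1_000      # low-cardinality, lopsided key

skew_unique = skew(distribute(unique_ids, UNITS))
skew_country = skew(distribute(country, UNITS))
print(f"distribute on id:      skew {skew_unique:.2f}")
print(f"distribute on country: skew {skew_country:.2f}")
```

Distributing on the lopsided key leaves most units idle while one holds at least 90% of the rows. Skew avoidance and scheduling algorithms aim to detect or prevent that imbalance.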