Modeling Big Data

Modeling Big Data
Execution speed limited by:
– Model complexity
– Software efficiency
– Spatial and temporal extent and resolution
– Data size & access speed
– Hardware performance

Data Access
(Diagram: the data-access hierarchy – ALU, CPU cache (static RAM), dynamic RAM, hard disk, network)

Data Hierarchy
– The Cloud: slow, cheap? infinite capacity?

Vector or Array Computing
– Supercomputers of the 1960s, '70s, and '80s
– Harvard architecture: separate program and data
– 2^n processors execute the same program on different data
– Vector arithmetic (sketch below)
– Limited flexibility
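The same idea, one operation applied to many data elements at once, survives in modern array libraries. A minimal NumPy sketch (the array sizes are arbitrary):

```python
import numpy as np

# One "instruction" (add) applied to a million data elements at once,
# instead of looping over the elements one at a time.
a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

c = a + b      # vector arithmetic: same operation, different data
print(c[:5])   # [0. 2. 4. 6. 8.]
```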

Von Neumann Architecture
– Instructions and data share memory
– More flexible
– Allows for one task to be divided into many processes and executed individually
(Diagram: CPU with ALU and cache; RAM holding both instructions and data; I/O)

Applications and Services

Multiprocessing
– Multiple processors (CPUs) working on the same task (sketch below)
– Processes
  – Applications: have a UI
  – Services: run in the background
  – Can be executed individually (see "Task Manager")
– Processes can have multiple "threads"
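A minimal sketch of one task divided across several worker processes with Python's standard multiprocessing module; the work function and the way the data is chunked are placeholders, not the course's actual workflow:

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Placeholder work: sum a list of numbers. A real task might analyze
    # one raster tile, one time slice, or one group of features.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n = 4                                    # one chunk per worker process
    chunks = [data[i::n] for i in range(n)]
    with Pool(processes=n) as pool:
        partials = pool.map(process_chunk, chunks)  # each chunk runs in its own process
    print(sum(partials))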

Processes

Threads
– A process can have many threads (sketch below)
– ArcGIS can now have two: one for the GUI and one for a geoprocessing task
– Each thread obtains a portion of the CPU cycles
– Must "sleep" or it can lock up
– Threads share access to memory, disk, and I/O
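A minimal sketch with Python's standard threading module; the thread names and sleep times are invented, but it shows threads sharing memory and yielding the CPU while they wait:

```python
import threading
import time

results = {}   # shared memory: every thread in the process sees the same dict

def worker(name, seconds):
    time.sleep(seconds)   # a thread that never sleeps or blocks can starve the others
    results[name] = "done after %.1f s" % seconds

gui = threading.Thread(target=worker, args=("gui", 0.1))
geoprocessing = threading.Thread(target=worker, args=("geoprocessing", 0.5))
gui.start()
geoprocessing.start()
gui.join()
geoprocessing.join()
print(results)
```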

Distributed Processing
– The task must be broken up into processes that can be run independently or sequentially
– Typically command line-driven
– Scripts or compiled programs: R, Python, C++, Java, PHP, etc. (sketch below)
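As an illustration (the file name, argument names, and the work itself are hypothetical), an independent worker can be a small command-line script that is told which piece of the job to handle:

```python
# worker.py -- hypothetical command-line worker; one copy is run per piece of the job
import argparse

def run(start, end):
    # Placeholder work: sum a range of integers. A real worker might process
    # one raster tile, one time slice, or one species.
    return sum(range(start, end))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Process one chunk of a larger job")
    parser.add_argument("--start", type=int, required=True)
    parser.add_argument("--end", type=int, required=True)
    args = parser.parse_args()
    print(run(args.start, args.end))
```

It can then be started many times, e.g. "python worker.py --start 0 --end 250000", by hand, by a script, or by a scheduler.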

Distributed Processing
– Grid: distributed computing
– Beowulf: many simple computer boards (motherboards)
– Condor: software to share free time on computers
– "The Cloud": web-based "services"; should allow submittal of processes (sketch below)
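Continuing the hypothetical worker.py above, a sketch of a launcher that submits several independent copies of it on one machine; on a grid, Beowulf cluster, Condor pool, or cloud service, a scheduler plays this role instead of subprocess:

```python
# launcher.py -- hypothetical: start four independent copies of worker.py
import subprocess
import sys

jobs = []
for i in range(4):
    start, end = i * 250_000, (i + 1) * 250_000
    jobs.append(subprocess.Popen(
        [sys.executable, "worker.py", "--start", str(start), "--end", str(end)],
        stdout=subprocess.PIPE, text=True))

for job in jobs:
    out, _ = job.communicate()   # wait for each worker and collect what it printed
    print(out.strip())
```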

Trends
– Processors are not getting faster
– The internet is not getting faster
– RAM continues to decrease in price
– Hard disks continue to increase in size
– Solid-state drives are available
– The number of "cores" continues to increase

Future Computers?
– 128k cores, lots of "cache"
– Multi-terabyte RAM
– Terabyte SSD drives
– Hundreds-of-terabyte hard disks?
Allows for:
– Large datasets in RAM (multi-terabyte)
– Even larger datasets on "hard disks"
– Lots of tasks to run simultaneously

Reality Check
Whether through local or distributed processing, we will need to "parallelize" spatial analysis in the future to manage:
– Larger datasets
– Larger modeling extents and finer resolution
– More complex models
Desire: break up processing into "chunks" that can each be executed somewhat independently of the others (sketch below)
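A sketch of the chunking idea for a raster extent; the tile size and raster dimensions are arbitrary:

```python
def make_chunks(n_rows, n_cols, tile):
    """Split an n_rows x n_cols raster into tiles of at most tile x tile cells,
    each of which can be processed somewhat independently."""
    chunks = []
    for r in range(0, n_rows, tile):
        for c in range(0, n_cols, tile):
            chunks.append((r, min(r + tile, n_rows), c, min(c + tile, n_cols)))
    return chunks

# A 1000 x 1500 raster broken into 512 x 512 chunks -> 6 (row, row_end, col, col_end) tuples
print(make_chunks(1000, 1500, 512))
```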

Challenge
Having all the software you need on the computer that executes the task:
– Virtual application: an entire computer disk image, with all required software installed, is sent to another computer
– Often easier to manage your own cluster:
  – Programs installed "once"
  – Shared hard-disk access
  – Communication between threads

Software
– ArcGIS: installation, licensing, and processing make it almost impossible to use
– Quantum, GRASS: installation makes them challenging
– FWTools, C++ applications
– Use standard language libraries and functions to avoid compatibility problems

Data Issues
Break data along natural lines (sketch below):
– Different species
– Different time slices
– Window spatial data (oversized windows; see the following slides)
Data size:
– Vector data: size typically not an issue
– Raster data: size is an issue
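A small sketch of grouping inputs along one natural line (species); the file names are invented for illustration, and each resulting group could be handed to an independent process:

```python
from collections import defaultdict

# Hypothetical inputs: one file per species per year (names invented for illustration).
files = ["elk_2010.csv", "elk_2011.csv", "wolf_2010.csv", "wolf_2011.csv"]

groups = defaultdict(list)
for name in files:
    species = name.split("_")[0]   # the "natural line" here is the species
    groups[species].append(name)

# Each group is an independent unit of work for a separate process or worker.
for species, members in groups.items():
    print(species, "->", members)
```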

Windowing Spatial Data
– Raster arithmetic is natural: each pixel of the result depends on only one pixel in each source raster (sketch below)
(Diagram: two small rasters added cell by cell)
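A minimal NumPy sketch of why per-pixel raster arithmetic parallelizes cleanly: the rasters can be split into tiles, each tile processed independently, and the results reassembled:

```python
import numpy as np

a = np.arange(16).reshape(4, 4)   # toy "source" rasters
b = np.ones((4, 4), dtype=int)

# Each output pixel depends only on the matching input pixels, so the
# rasters can be split into tiles and each tile added independently.
top = a[:2, :] + b[:2, :]
bottom = a[2:, :] + b[2:, :]

full = a + b
assert np.array_equal(np.vstack([top, bottom]), full)
print(full)
```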

Windowing Spatial Data
– N x N filters need to use oversized windows: each output pixel needs its neighbors from the source raster (sketch below)
(Diagram: a grid of rows and columns showing overlapping windows)
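A sketch of the oversized-window idea, assuming a hand-written 3x3 mean filter in NumPy: to filter one horizontal band of the raster independently, the band must be read with an extra "halo" row of source pixels:

```python
import numpy as np

def mean3x3(block):
    """3x3 mean filter over the interior of `block` (edge pixels excluded)."""
    rows, cols = block.shape
    out = np.zeros((rows - 2, cols - 2))
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            out += block[1 + dr : rows - 1 + dr, 1 + dc : cols - 1 + dc]
    return out / 9.0

rast = np.random.rand(8, 8)

full = mean3x3(rast)          # filter the whole raster: 6 x 6 interior result

# To compute the first three interior output rows independently, the worker
# must read an oversized window: source rows 0-4, one extra "halo" row below.
top_band = mean3x3(rast[:5, :])

assert np.allclose(full[:3, :], top_band)
print("oversized window reproduces the full result for its band")
```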

Windowing Spatial Data
Others are problematic:
– Viewsheds
– Stream networks
– Spatial simulations
(Image credit: ScienceDirect.com)