Modeling Big Data

Presentation transcript:

Modeling Big Data
Execution speed limited by:
–Model complexity
–Software efficiency
–Spatial and temporal extent and resolution
–Data size & access speed
–Hardware performance

Combinatorials
If it takes 1 hour to run processing on Humboldt County (10,495 km²):
–Entire US: 39 days
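The 39-day figure can be checked with a short calculation. This is a rough sketch that assumes run time grows linearly with area and uses a total US area of about 9.83 million km², a figure that is not on the slide. The function name `scaled_runtime_days` is made up for illustration.

```python
# Assumed figure, not from the slides: total US area of roughly 9.83 million km².
US_AREA_KM2 = 9_830_000
HUMBOLDT_AREA_KM2 = 10_495   # from the slide
HOURS_PER_COUNTY = 1.0       # from the slide


def scaled_runtime_days(target_area_km2, base_area_km2, base_hours):
    """Estimate run time in days, assuming time grows linearly with area."""
    hours = base_hours * target_area_km2 / base_area_km2
    return hours / 24


days = scaled_runtime_days(US_AREA_KM2, HUMBOLDT_AREA_KM2, HOURS_PER_COUNTY)
print(f"Estimated run time for the entire US: {days:.0f} days")
```

Models whose cost grows faster than linearly with area or resolution would scale worse than this estimate.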

Vector or Array Computing
Supercomputers of the 1960s, 1970s, and 1980s
Harvard Architecture:
–Separate program and data
–2^n processors execute the same program on different data
Vector arithmetic
Limited flexibility

Von Neumann Architecture
Instructions and data share memory
More flexible
Allows for one task to be divided into many processes and executed individually
[Diagram labels: CPU, ALU, Cache, RAM, I/O, Instructions, Data]

Applications and Services

Multiprocessing
Multiple processors (CPUs) working on the same task
Processes
–Applications: have a UI
–Services: run in background
–Can be executed individually
–See “Task Manager”
Processes can have multiple “threads”

Processes

Threads
A process can have lots of threads
–ArcGIS can now have 2: one for the GUI and one for a geoprocessing task
Obtain a portion of the CPU cycles
Must “sleep” or can lock up
Share access to memory, disk, I/O

Distributed Processing
Task must be broken up into processes that can be run independently or sequentially
Typically:
–Command line-driven
–Scripts or compiled programs
–R, Python, C++, Java, PHP, etc.

Distributed Processing
Grid – distributed computing
Beowulf – lots of simple computer boards (motherboards)
Condor – software to share free time on computers
“The Cloud?” – web-based “services”. Should allow submittal of processes in the future.

Trends
Processors are not getting faster
The internet is not getting faster
RAM continues to decrease in price
Hard disks continue to increase in size
–Solid State Drives available
Number of “cores” continues to increase

Future Computers?
128k cores, lots of “cache”
Multi-terabyte RAM
Terabyte SSDs
100s of terabyte hard disks?
Allows for:
–Large datasets in RAM (multi-terabyte)
–Even larger datasets on “hard disks”
–Lots of tasks to run simultaneously

Reality Check
Whether through local processing or distributed processing:
–We will need to “parallelize” spatial analysis in the future to manage:
  –Larger datasets
  –Larger modeling extents and finer resolution
  –More complex models
Desire:
–Break up processing into “chunks” that can each be executed somewhat independently of each other
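The "chunks" idea can be sketched in Python with the standard `concurrent.futures` module. This is only an illustration under stated assumptions: `process_chunk` is a hypothetical stand-in for a real model run, and the data is a plain list of numbers rather than a spatial dataset.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """Hypothetical stand-in for a model run: sum of squared values in a chunk."""
    return sum(value * value for value in chunk)


def run_in_parallel(items, chunk_size, max_workers=4):
    """Process each chunk independently, then combine the partial results."""
    chunks = split_into_chunks(items, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


cells = list(range(1000))
total = run_in_parallel(cells, chunk_size=100)
print(total == sum(v * v for v in cells))
```

A thread pool keeps the sketch simple. For CPU-bound work in CPython, `ProcessPoolExecutor` offers the same `map` interface and runs chunks in separate processes, which is closer to the multi-core and cluster setups described in these slides.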

Challenge
Having all the software you need on the computer you are executing the task on
–Virtual Application: entire computer disk image sent to another computer
–All required software installed
Often easier to manage your own cluster
–Programs installed “once”
–Shared hard disk access
–Communication between threads

Software
ArcGIS: installation, licensing, and processing make it almost impossible to use
Quantum, GRASS: installation makes them challenging
FWTools, C++ applications
Use standard language libraries and functions to avoid compatibility problems

Data Issues
Break data along natural lines:
–Different species
–Different time slices
Window spatial data
–Oversized
Vector data: size typically not an issue
Raster data: size is an issue

Windowing Spatial Data
Raster arithmetic is natural
–Each pixel result is only dependent on one pixel in the source raster
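Because each output pixel depends only on the matching input pixel, a raster can be cut into windows and processed piece by piece with no change in the result. The sketch below is an illustration only: rasters are plain lists of rows, the operation is a simple sum of two rasters, and the function names are made up for this example.

```python
def add_rasters(a, b):
    """Cell-by-cell sum of two rasters stored as lists of rows."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def add_rasters_by_window(a, b, rows_per_window):
    """Same sum, computed one horizontal window (band of rows) at a time."""
    result = []
    for start in range(0, len(a), rows_per_window):
        stop = start + rows_per_window
        # Each window needs only its own rows: no neighboring pixels are read.
        result.extend(add_rasters(a[start:stop], b[start:stop]))
    return result


raster_a = [[r * 10 + c for c in range(4)] for r in range(6)]
raster_b = [[1] * 4 for _ in range(6)]

whole = add_rasters(raster_a, raster_b)
windowed = add_rasters_by_window(raster_a, raster_b, rows_per_window=2)
print(whole == windowed)
```

Each window here could be handed to a different process or machine, since no window reads data owned by another.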

Windowing Spatial Data
N x N filters:
–Need to use oversized windows
[Figure labels: Columns, Rows]
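An N x N filter reads neighboring pixels, so each window must be read slightly larger than the area it writes. The sketch below shows this for a 3 x 3 mean filter, assuming rasters stored as lists of rows and windows that are bands of rows. The extra margin, called `halo` here (a name chosen for this example), is 1 row for a 3 x 3 filter and (N - 1) / 2 rows for an odd N in general.

```python
def mean_filter_3x3(raster):
    """3 x 3 mean filter; edge cells average only the neighbors that exist."""
    n_rows, n_cols = len(raster), len(raster[0])
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            values = [
                raster[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
            ]
            out_row.append(sum(values) / len(values))
        out.append(out_row)
    return out


def mean_filter_windowed(raster, rows_per_window, halo=1):
    """Filter in row bands, reading `halo` extra rows on each side of a band."""
    n_rows = len(raster)
    result = []
    for start in range(0, n_rows, rows_per_window):
        stop = min(start + rows_per_window, n_rows)
        # Oversized window: extend the band by the halo, clipped to the raster.
        read_start = max(start - halo, 0)
        read_stop = min(stop + halo, n_rows)
        filtered = mean_filter_3x3(raster[read_start:read_stop])
        # Keep only the rows that belong to this band; discard the halo rows.
        offset = start - read_start
        result.extend(filtered[offset:offset + (stop - start)])
    return result


raster = [[r * c for c in range(5)] for r in range(6)]
print(mean_filter_windowed(raster, rows_per_window=2) == mean_filter_3x3(raster))
```

Setting `halo=0` reproduces the problem the slide warns about: rows at the edge of each band are filtered as if the raster ended there, so the windowed result no longer matches the whole-raster result.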

Windowing Spatial Data
Others are problematic:
–Viewsheds
–Stream networks
–Spatial simulations
ScienceDirect.com