Lars Arge 1/12 Lars Arge. 2/12  Pervasive use of computers and sensors  Increased ability to acquire/store/process data → Massive data collected everywhere.

Slides:



Advertisements
Similar presentations
I/O-Algorithms Lars Arge Spring 2012 April 17, 2012.
Advertisements

University of Chicago Department of Energy The Parallel and Grid I/O Perspective MPI, MPI-IO, NetCDF, and HDF5 are in common use Multi TB datasets also.
Lars Arge 1/43 Big Terrain Data Analysis Algorithms in the Field Workshop SoCG June 19, 2012 Lars Arge.
Lars Arge 1/13 Efficient Handling of Massive (Terrain) Datasets Lars Arge A A R H U S U N I V E R S I T E T Department of Computer Science.
MADALGO ― Center for Massive Data Algorithmics MADALGO is a major new basic research center funded by The Danish National Research Foundation initially.
Big Data Management and Analytics Introduction Spring 2015 Dr. Latifur Khan 1.
Global economic trends are driving science education  Worldwide countries are working to develop modern, knowledge based economies in order to be competitive.
I/O-Algorithms Lars Arge January 31, Lars Arge I/O-algorithms 2 Random Access Machine Model Standard theoretical model of computation: –Infinite.
Structural Genomics – an example of transdisciplinary research at Stanford Goal of structural and functional genomics is to determine and analyze all possible.
I/O-Algorithms Lars Arge Spring 2009 February 2, 2009.
I/O-Algorithms Lars Arge Spring 2009 January 27, 2009.
I/O-Algorithms Lars Arge Spring 2007 January 30, 2007.
I/O-Algorithms Lars Arge Aarhus University February 7, 2005.
I/O-Algorithms Lars Arge Aarhus University February 6, 2007.
I/O-Algorithms Lars Arge Spring 2006 February 2, 2006.
Lars Arge 1/14 A A R H U S U N I V E R S I T E T Department of Computer Science Efficient Handling of Massive (Terrain) Datasets Professor Lars Arge University.
I/O-Algorithms Lars Arge Aarhus University April 16, 2008.
I/O-Algorithms Lars Arge Aarhus University February 14, 2008.
Massive Data Algorithmics Faglig Dag, January 17, 2008 Gerth Stølting Brodal University of Aarhus Department of Computer Science.
Chapter 1 and 2 Computer System and Operating System Overview
IR Paolo Ferragina Dipartimento di Informatica Università di Pisa.
Flow modeling on grid terrains. Why GIS?  How it all started.. Duke Environmental researchers: computing flow accumulation for Appalachian Mountains.
I/O-Algorithms Lars Arge Spring 2008 January 31, 2008.
Algorithms for Information Retrieval Prologue. References Managing gigabytes A. Moffat, T. Bell e I. Witten, Kaufmann Publisher A bunch of scientific.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
Advances in Networked Autonomous Vehicles: Technologies, Tools, and Case Studies João Borges de Sousa and Karl H. Johansson Tutorial Workshop at IEEE International.
I/O-Algorithms Lars Arge Fall 2014 August 28, 2014.
Heavily based on slides by Lars Arge I/O-Algorithms Thomas Mølhave Spring 2012 February 9, 2012.
Subproject 3 Large Scale Optimization Participants CTI, CUNI, MPII, RWTH, TU Wroclaw, UCY, UPB presented by Burkhard Monien (UPB)
Overview of Computing. Computer Science What is computer science? The systematic study of computing systems and computation. Contains theories for understanding.
Bin Yao Spring 2014 (Slides were made available by Feifei Li) Advanced Topics in Data Management.
X-Stream: Edge-Centric Graph Processing using Streaming Partitions
Future role of DMR in Cyber Infrastructure D. Ceperley NCSA, University of Illinois Urbana-Champaign N.B. All views expressed are my own.
External Sorting Sort n records/elements that reside on a disk. Space needed by the n records is very large.  n is very large, and each record may be.
To the Idea of Establishing the US-Russian Professional Network: International Education and Research Administrators Social Expertise Exchange Fellowship.
CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.
Lars Arge1 Permuting Lower Bound Permuting N elements according to a given permutation takes I/Os in “indivisibility” model Indivisibility model: Move.
EE & CSE Program Educational Objectives Review EECS Industrial Advisory Board Meeting May 1 st, 2009 by G. Serpen, PhD Sources ABET website: abet.org Gloria.
CSE332: Data Abstractions Lecture 8: Memory Hierarchy Tyler Robison Summer
Parallel dynamic batch loading in the M-tree Jakub Lokoč Department of Software Engineering Charles University in Prague, FMP.
MotivationFundamental ProblemsProblems on Graphs Parallel processors are becoming common place. Each core of a multi-core processor consists of a CPU and.
Big Data Analytics Large-Scale Data Management Big Data Analytics Data Science and Analytics How to manage very large amounts of data and extract value.
King Fahd University of Petroleum and Minerals King Fahd University of Petroleum and Minerals Computer Engineering Department Computer Engineering Department.
CCGrid 2014 Improving I/O Throughput of Scientific Applications using Transparent Parallel Compression Tekin Bicer, Jian Yin and Gagan Agrawal Ohio State.
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.
Materials Innovation Platforms (MIP): A New NSF Mid-scale Instrumentation and User Program to Accelerate The Discovery of New Materials MRSEC Director’s.
Report on CHEP ‘06 David Lawrence. Conference had many participants, but was clearly dominated by LHC LHC has 4 major experiments: ALICE, ATLAS, CMS,
Computer Organization. This module surveys the physical resources of a computer system.  Basic components  CPU  Memory  Bus  I/O devices  CPU structure.
Group Science J. Marc Overhage MD, PhD Regenstrief Institute Indiana University School of Medicine.
Overview of Operating Systems Introduction to Operating Systems: Module 0.
ICT-enabled Agricultural Science for Development Scenarios, Opportunities, Issues by ICTs transforming agricultural science, research & technology generation.
ProblemAssumption and PreliminariesAlgorithm  How does the water flow and depressions fill during non-uniform rain over a terrain?  Can we efficiently.
Von Neumann Computers Article Authors: Rudolf Eigenman & David Lilja
PRA is the world's most complete surface coatings adviser.
Lecture 1: Basic Operators in Large Data CS 6931 Database Seminar.
1 Cache-Oblivious Query Processing Bingsheng He, Qiong Luo {saven, Department of Computer Science & Engineering Hong Kong University of.
Internal Memory Pointer MachineRandom Access MachineStatic Setting Data resides in records (nodes) that can be accessed via pointers (links). The priority.
Katy Börner Teaching & Research Teaching & Research Katy Börner
COST: from Policy to Practice Antoine Mercier, Research Grant Team Leader, University of Bedfordshire, former French CNC.
Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming
Digital Terrain Analysis for Massive Grids
Three Goals to Accomplish by Writing Papers
Advanced Topics in Data Management
CSC Classes Required for TCC CS Degree
Enabling ML Based Research
Science of Team Science
Vrije Universiteit Amsterdam
CSE 332: Data Abstractions Memory Hierarchy
Computer System.
Presentation transcript:

Lars Arge 1/12 Lars Arge

2/12  Pervasive use of computers and sensors  Increased ability to acquire/store/process data → Massive data collected everywhere  Society increasingly “data driven” → Access/process data anywhere any time  Increased data availability: “nano-technology-like” opportunity Nature issue 2/06: “2020 – Future of computing”  Trend continues as e.g. sensors increasingly pervasive → Exponential growth of scientific data  Paradigm shift: Science will be about mining data → Computer science paramount in all sciences Massive Data

Lars Arge 3/12 Algorithm Inadequacy  Importance of scalability/efficiency → Algorithmics core computer science area  Traditional algorithmics: Transform input to output using simple machine model  Inadequate with e.g.  Massive data  Small/diverse devices  Continually arriving data → Software inadequacies!  Communities addressing inadequacies have emerged  But much work still needs to be done

Lars Arge 4/12  Established 2007  Initially funded for 5 years by the national research foundation  High level objectives:  Advance algorithmic knowledge in “massive data” processing area  Train researchers in world-leading international environment  Be catalyst for multidisciplinary/industry collaboration  Building on:  Strong international team  Establish vibrant international environment (focus on people)  Focus areas:  I/O-efficient, streaming, cache-oblivious algorithms  Algorithm engineering Center for Massive Data Algorithmics

Lars Arge 5/12 I/O-Efficient Algorithms  Problems involving massive data on disk  Disk access is 10 6 times slower than main memory access  Large access time amortized by transferring large blocks of data → Important to store/access data to take advantage of blocks  I/O-efficient algorithms:  Move as few disk blocks as possible to solve given problem “The difference in speed between modern CPU and disk technologies is analogous to the difference in speed in sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a sharpener on someone else’s desk.” (D. Comer)

Lars Arge 6/12 I/O-Efficient Algorithms Matters  Example: Traversing linked list (List ranking)  Array size N = 10 elements  Disk block size B = 2 elements  Main memory size M = 4 elements (2 blocks)  Difference between N and N/B large since block size is large  Example: N = 256 x 10 6, B = 8000, 1ms disk access time  N I/Os take 256 x 10 3 sec = 4266 min = 71 hr  N/B I/Os take 256/8 sec = 32 sec Algorithm 2: N/B=5 I/OsAlgorithm 1: N=10 I/Os

Lars Arge 7/12 Streaming Algorithms  Problems involving truly massive data  Sequential read of disk blocks much faster than random read  In many modern (sensor) applications data arrive continually → ( Massive) problems often have to be solved in one sequential scan  Streaming algorithms:  Use single scan, handle each element fast, using small space

Lars Arge 8/12 Cache-Oblivious Algorithms  Problems to be solved on unknown and/or changing devices  Block access important on all levels of memory hierarchy  But memory hierarchies are very diverse  Cache-oblivious algorithms:  Use blocks efficiently on all levels of any hierarchy

Lars Arge 9/12  Algorithm engineering:  Design/implementation of practical algorithms  Experimentation  Center motivated by theory inadequacy  Center wants to promote interdisciplinary/industry work → Natural to consider algorithm engineering  Algorithm engineering work can lead to practical breakthroughs  Example: Flow simulation on massive terrain models ~18 billion points at 1 meter (>>1TB) Implementation of I/O-efficient alg. → two weeks to three hours!! Algorithm Engineering

Lars Arge 10/12 Center Team  International core team of algorithms researchers  Including top ranked US and European groups  Leading expertise in focus areas  AU: I/O, cache and algorithm engineering  MPI: I/O (graph) and algorithm engineering  MIT: Cache and streaming AU MPI MIT Arge Brodal Mehlhorn Meyer Demaine Indyk

Lars Arge 11/12 Center Activities  Visits of core researchers  Exchange of AU, MPI and MIT Post Docs and PhD students  Visiting faculty, Post doc and students from other institutions  Frequent summer schools: Streaming algorithms this week!  Major international events:  25 th Annual Symposium on Computational Geometry in 2009  New conference: Symposium on Algorithms for Massive Datasets  Theme Workshops to foster multidisciplinary/industry collaboration

Lars Arge 12/12  Initially funded for 5 years by the National Research Foundation  Collaboration between three leading international groups  Addresses inadequacies of traditional theory  When processing massive data  When using diverse devices  Initial research focus on I/O-efficient, streaming, cache-oblivious  Also focus on algorithm engineering and multidisciplinary work  Focus on people: Establish vibrant international environment