Published by Prosper Chandler, modified over 9 years ago
1
RESOURCES, TRADE-OFFS, AND LIMITATIONS
Group 5
8/27/2014
2
INTRODUCTION
This chapter discusses the current state of the art and gaps in the fundamental understanding of computations over massive data sets.
Massive data computation uses three types of resources:
- Computational resources
- Statistical/information-theoretic resources
- Physical resources
3
RELEVANT ASPECTS OF THEORETICAL CS
Theoretical computer science aims to:
- Discover and analyze algorithms for key computational problems that are efficient in terms of the resources used
- Understand the inherent limitations of computation with bounded resources
Relevant aspects:
- Tractability and intractability
- Sublinear, sketching, and streaming algorithms
- Communication complexity
- External memory
- Parallel algorithms
- Computational learning theory
4
TRACTABILITY AND INTRACTABILITY
- Tractable: problems that have polynomial-time algorithms
- Intractable: problems for which such algorithms are conjectured not to exist
- NP: problems whose proposed solutions can be verified in polynomial time
- NP-complete problems: the hardest problems in NP; if any one of them can be solved in polynomial time, then all problems in NP can.
  E.g., the Traveling Salesman Problem (decision version): given a set of cities and the distances between them, is there a tour of at most a given length that visits all cities?
- P vs. NP: the open question of whether an NP-complete problem can be solved in polynomial time
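To make the NP idea concrete for the TSP decision problem: checking a proposed tour (the "certificate") takes polynomial time, while the naive way to find one tries all (n-1)! tours. The sketch below assumes distances are given as an n-by-n matrix and cities are numbered 0..n-1; the function names are hypothetical, and this is an illustration rather than a practical TSP solver.

```python
from itertools import permutations


def tour_length(dist, tour):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def verify_tour(dist, tour, bound):
    """Polynomial-time check of a certificate: is `tour` a valid tour of length <= bound?"""
    n = len(dist)
    # The certificate must visit every city exactly once.
    if sorted(tour) != list(range(n)):
        return False
    return tour_length(dist, tour) <= bound


def brute_force_tsp(dist, bound):
    """Exhaustive search over (n-1)! tours: exponential in the number of cities."""
    n = len(dist)
    for rest in permutations(range(1, n)):
        tour = (0,) + rest  # fix city 0 as the start to skip rotations of the same tour
        if tour_length(dist, tour) <= bound:
            return list(tour)
    return None  # no tour of length <= bound exists
```

Verification is a single linear pass over the tour, whereas the search loop grows factorially with n; no polynomial-time method for the search is known.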
5
SUBLINEAR, SKETCHING, AND STREAMING ALGORITHMS
- Sublinear algorithms: algorithms that use an amount of resources much smaller than the input size, often exponentially smaller.
- Stream model: the algorithm can make only a single pass over the data, and the storage it uses can be much smaller than the input size.
- Sketch: a summary of the input that is much shorter but nevertheless sufficient to approximate the desired quantity.
  E.g., counting the number of distinct elements in a stream
- Sublinear-time computation: the algorithm is restricted to an amount of time that scales sublinearly with the input size.
- Approximate property testing: one tests, using few data samples, whether the input satisfies a certain property or is far from satisfying it.
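One standard way to sketch the distinct-elements count in a single pass is the k-minimum-values (KMV) estimator; the slides do not name a specific algorithm, so this is a swapped-in example. It hashes each item to [0, 1), keeps only the k smallest distinct hash values, and estimates the count as (k - 1) / (k-th smallest value). Memory is O(k) no matter how long the stream is. The choice of SHA-1 as the hash and the class name `KMVSketch` are assumptions for illustration.

```python
import hashlib
import heapq


def _hash_to_unit(item):
    """Map an item to a pseudo-random float in [0, 1). SHA-1 is an assumption;
    any well-mixed hash function would do."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """k-minimum-values sketch: keeps only the k smallest distinct hash values."""

    def __init__(self, k=256):
        self.k = k
        self._heap = []         # max-heap of kept hash values (stored negated)
        self._members = set()   # the same hash values, for duplicate detection

    def add(self, item):
        h = _hash_to_unit(item)
        if h in self._members:
            return  # repeat of an item already summarized
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # h beats the largest kept value: swap it in, evict the old maximum.
            evicted = -heapq.heapreplace(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return len(self._heap)  # fewer than k distinct items seen: exact count
        return (self.k - 1) / (-self._heap[0])
```

Usage: feed every stream element to `add` once and call `estimate()` at the end. The relative error shrinks roughly like 1/sqrt(k), so k trades memory for accuracy.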
6
COMMUNICATION COMPLEXITY
- Definition: the amount of information that needs to be extracted from the input, or communicated between two or more parties sharing parts of the input, to accomplish a given task.
  E.g., the “sketching” approach to sublinear computation
- Some tasks cannot be accomplished using limited communication.
  E.g., the set disjointness problem: two parties want to determine whether two data sets of equal size, each held locally by one of the parties, contain any common items.
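A minimal sketch of the trivial protocol for set disjointness, assuming both sets are drawn from a universe {0, ..., n-1}: Alice sends her whole set as an n-bit characteristic vector and Bob replies with one bit. It costs n + 1 bits. A known result is that even randomized protocols for set disjointness need Θ(n) bits, so this naive protocol is asymptotically optimal; that is what "cannot be accomplished using limited communication" means here. The function names are hypothetical.

```python
def alice_message(a_set, universe_size):
    """Alice encodes her set as an n-bit characteristic vector (n = universe size)."""
    bits = [0] * universe_size
    for x in a_set:
        bits[x] = 1
    return bits


def bob_answer(message, b_set):
    """Bob replies with one bit: 1 if the sets are disjoint, 0 otherwise."""
    return 0 if any(message[y] for y in b_set) else 1


def disjointness_protocol(a_set, b_set, universe_size):
    """Run the protocol; return (are the sets disjoint?, total bits communicated)."""
    msg = alice_message(a_set, universe_size)
    answer = bob_answer(msg, b_set)
    return bool(answer), len(msg) + 1
```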
7
EXTERNAL MEMORY
- Concern: the cost of transferring data between fast local (main) memory and slow external memory
- The external memory model: data are exchanged between the main and external memories via a sequence of input/output (I/O) operations. The complexity of an algorithm is measured by the total number of I/O operations it performs.
- “Cache-aware” algorithms must be supplied with the amount of available main memory before they can proceed.
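To see how the model charges only for I/Os, here is a small simulation, assuming data move in blocks of B items into a main memory of M items (so M // B blocks) with least-recently-used eviction. The class name and the LRU policy are assumptions for illustration. A sequential scan of N items costs about N/B I/Os, while an access pattern that jumps between blocks can cost one I/O per item.

```python
from collections import OrderedDict


class ExternalArray:
    """Simulated array on slow external storage. Items are fetched in blocks of
    `block_size` into a main memory holding `memory_size` items (LRU eviction).
    Only block transfers (I/Os) are counted, as in the external memory model."""

    def __init__(self, data, block_size, memory_size):
        self.data = list(data)
        self.B = block_size
        self.capacity = memory_size // block_size  # number of blocks that fit in memory
        if self.capacity < 1:
            raise ValueError("main memory must hold at least one block")
        self.cache = OrderedDict()  # block id -> None, kept in LRU order
        self.io_count = 0

    def read(self, i):
        block = i // self.B
        if block in self.cache:
            self.cache.move_to_end(block)  # block already in memory: no I/O
        else:
            self.io_count += 1  # block not in memory: one I/O to fetch it
            self.cache[block] = None
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used block
        return self.data[i]
```

With N = 10,000, B = 100, and M = 1,000, reading indices 0, 1, 2, ... costs 100 I/Os, but reading them with stride B (0, 100, 200, ..., then 1, 101, ...) costs 10,000, even though both read every item exactly once.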
8
PARALLEL ALGORITHMS
- “For which problems can one obtain a speedup using parallelism?”
- For many problems that have polynomial-time sequential algorithms, one can obtain exponential speedups by using parallelism.
  E.g., finding a perfect matching in a graph: given a set of nodes and edges between them, find a subset of edges such that every vertex is incident to exactly one edge in the subset.
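Parallel perfect-matching algorithms are too involved for a short example, so the sketch below swaps in a simpler problem that shows the same kind of speedup: summing n numbers. Sequentially this takes n - 1 dependent additions; with a balanced tree of pairwise additions, all additions within a round are independent, so with enough processors the depth is only about log2(n) rounds, an exponential reduction in parallel time. The code simulates the rounds in one process and counts them; the function name is hypothetical.

```python
def tree_sum(values):
    """Return (total, rounds). Each round adds disjoint pairs, so every addition
    in a round could run simultaneously on a separate processor."""
    level = list(values)
    if not level:
        return 0, 0
    rounds = 0
    while len(level) > 1:
        # Pairwise sums in this round do not depend on one another.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired element carries over to the next round
        level = nxt
        rounds += 1
    return level[0], rounds
```

For 1,024 inputs the result needs 10 rounds instead of 1,023 sequential steps.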
9
COMPUTATIONAL LEARNING THEORY
- “How much data and computational resources are needed in order to ‘learn’ a concept of interest with a given accuracy and confidence?”
- For example, inferring a linear classifier that separates data into positive and negative classes, given a sequence of labeled examples and a bounded amount of computational resources.
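One classic way to infer a linear classifier from a sequence of labeled examples is the perceptron algorithm; the slide does not prescribe a method, so this is a swapped-in illustration. It assumes labels in {+1, -1} and features given as tuples of numbers. The `max_epochs` cap is a hypothetical parameter that bounds the computational budget.

```python
def perceptron(examples, max_epochs=100):
    """Perceptron training.

    examples: list of (x, y) pairs, x a tuple of numbers, y in {+1, -1}.
    Returns (w, b). If the data are linearly separable and training converges
    within max_epochs passes, sign(w . x + b) matches y on every example.
    """
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:
                # Misclassified (or on the boundary): move the hyperplane toward x's side.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break  # a full pass without errors: the training data are separated
    return w, b


def predict(w, b, x):
    """Label +1 if x lies on the positive side of the hyperplane, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

Usage: training on the four points of the Boolean AND function (positive only at (1, 1)) yields a separating hyperplane after a few passes. For separable data, the perceptron convergence theorem bounds the number of updates in terms of the data's margin, which is one way of quantifying the "how much computation" question.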
10
CHALLENGES FOR COMPUTER SCIENCE
- Computational hardness of massive data set problems
  - Polynomial time is typically not a sufficient condition for tractability when the input to a problem is very large.
  - Quadratic time is a natural boundary of intractability for problems over massive data.
  - Versatile tools that would help determine whether a given problem has a subquadratic-time algorithm are lacking.
  - Linear running time is the gold standard of algorithmic efficiency.
- The role of constants
  - Algorithm engineering rather than algorithm design
  - Time-dependent vs. platform-dependent
- New models for massive data computation
  - MapReduce, Hadoop and its variations, multicores, graphics processing units (GPUs), and parallel databases
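A back-of-the-envelope calculation shows why quadratic time acts as a boundary at this scale. It assumes, purely for illustration, an input of n = 10^9 items and a machine performing 10^9 simple operations per second, ignoring constant factors:

```latex
\begin{aligned}
\text{linear: } & n = 10^{9} \text{ ops} && \Rightarrow\ 1\ \text{s} \\
\text{subquadratic, e.g. } n \log_2 n: & \ 10^{9} \times 29.9 \approx 3 \times 10^{10} \text{ ops} && \Rightarrow\ \approx 30\ \text{s} \\
\text{quadratic: } & n^{2} = 10^{18} \text{ ops} && \Rightarrow\ 10^{9}\ \text{s} \approx 31.7\ \text{years}
\end{aligned}
```

Both the linear and the n log n algorithm are practical, while the quadratic one is not, even though all three run in polynomial time.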
11
NEW MODEL: MAPREDUCE
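The MapReduce programming model can be sketched in a single process with the classic word-count example: a map function emits (key, value) pairs from each input split, a shuffle groups the values by key, and a reduce function combines each group. This is an illustration of the model only, not the Hadoop API; the function names are hypothetical, and a real framework would run the map and reduce calls on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    # Map calls are independent, so each split could be processed on a different machine.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    groups = shuffle_phase(intermediate)
    # Reduce calls are independent per key, so keys could also be spread across machines.
    return dict(reduce_phase(k, vs) for k, vs in groups.items())
```

Usage: `word_count(["the cat sat", "The dog sat"])` returns `{"the": 2, "cat": 1, "sat": 2, "dog": 1}`.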
12
CHALLENGES FOR OTHER DISCIPLINES
- Statistics
  - For many statistical problems, the more data are available, the easier the problems become to solve.
  - Typical assumptions: the data are a sequence of samples from some distribution, or the data have sparsity or other structural properties.
  - Quantitative trade-offs between data and computation: the computational limitations are typically specified in terms of polynomial-time computability, so the limitations of that notion apply.
- Privacy: “How much information about the data must be revealed in order to perform some computation or answer some queries about the data?”
- Physical resources
  - Optimizing energy use; green computing
  - Reversible computing aims to understand the necessary conditions for computation to be energy efficient.
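One standard way to quantify "more data makes the problem easier" is Hoeffding's inequality, used here as a swapped-in illustration under the assumption of independent samples of a quantity bounded in [0, 1]: P(|sample mean - true mean| >= ε) <= 2 exp(-2 n ε²). Solving for n gives the number of samples needed for accuracy ε with confidence 1 - δ. The function name is hypothetical.

```python
import math


def hoeffding_sample_size(epsilon, delta):
    """Samples of an i.i.d. [0, 1]-valued variable needed so that the sample mean is
    within epsilon of the true mean with probability at least 1 - delta.
    Derived from Hoeffding's inequality: P(|mean - mu| >= eps) <= 2 exp(-2 n eps^2)."""
    if not (epsilon > 0 and 0 < delta < 1):
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))
```

For δ = 0.05, accuracy ε = 0.1 needs 185 samples and ε = 0.01 needs 18,445: the error bound shrinks like 1/sqrt(n), so every additional sample makes the estimation problem easier.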
13
QUESTIONS?