Algorithmic Techniques for Massive Data (COMS 6998-9), Alex Andoni

Algorithms
Happy when your algorithm is fast
Golden standard:
– “linear time”: O(input size) time and space
COMS E4231

Algorithms for massive data
Computer resources << data
Access data in a limited way:
– Limited space (main memory << hard drive)
– Limited time (time << time to read entire data)

Scenario: limited space
[Table: IP | Frequency]
Challenge: compute something on the table, using small space.
Example of “something”: # distinct IPs, max frequency, other statistics…
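To make the challenge concrete, here is a minimal sketch of the naive *exact* approach, assuming the data arrives as a stream of IP strings. This is an illustration, not code from the lecture, and the function name `exact_ip_stats` is hypothetical. It materializes the whole IP/Frequency table, so its space grows linearly with the number of distinct IPs, which is exactly what the limited-space scenario rules out.

```python
from collections import Counter


def exact_ip_stats(stream):
    """Naive exact baseline (hypothetical helper): store the full table.

    Space is proportional to the number of distinct IPs in the stream.
    """
    table = Counter(stream)                     # IP -> frequency
    num_distinct = len(table)                   # number of distinct IPs
    max_freq = max(table.values(), default=0)   # maximum frequency
    return num_distinct, max_freq


if __name__ == "__main__":
    packets = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]
    print(exact_ip_stats(packets))  # (3, 3)
```

Streaming algorithms aim to answer the same questions (exactly or approximately) while keeping far less state than this table.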

How?

Topics:
– Streaming algorithms
– Dimension reduction, sketching
– High-dimensional Nearest Neighbor Search
– Sampling, property testing
– Parallel algorithms

The class is not about BIG DATA (or Massive Data), MapReduce, or other systems. It is about algorithms for settings where the data volume is so large that classic algorithmic approaches don’t scale well.
– “theory class”: implementation-independent
– will mention application areas

Course Information
Instructor: Alex Andoni
TAs: Drishan Arora, Pedro Savarese, Kevin Shi
Grading:
– Scribing, 2-3 students per lecture (10%)
– 5 homeworks (55%)
  1st: 7% (due next Thursday, Sep 17th)
  2nd-5th: 12% each
  5 days of lateness total (120 hours). No other extensions.
  OK to collaborate (4 max). Each writes their own solutions.
– Project, research-based (35%)
  Solve/make progress on an open problem in the area
  Apply algorithms to your research area (e.g., implement an algorithm)
  Synthesis of a few related papers
  In teams, up to 4 people. Presentation at the end.
Scribing today?

Problem: counting
[Table: IP | Frequency]

Morris Algorithm [1978]
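The slide only names the algorithm. Below is a minimal sketch of the Morris approximate counter as it is commonly presented: keep a small register X, increment it with probability 2^(-X) on each event, and report 2^X - 1 as the estimate. The class name `MorrisCounter` and the averaging demo are our own illustration, not the lecture's code. Because X is roughly log2 of the count, storing it takes about O(log log n) bits instead of the O(log n) bits of an exact counter.

```python
import random


class MorrisCounter:
    """Approximate counter in the style of Morris [1978] (illustrative sketch)."""

    def __init__(self, rng=None):
        self.x = 0  # register holding roughly log2(count)
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Increase X with probability 2^(-X); the first call always succeeds.
        if self.rng.random() < 2.0 ** (-self.x):
            self.x += 1

    def estimate(self):
        # E[2^X] = n + 1 after n increments, so 2^X - 1 is unbiased.
        return 2 ** self.x - 1


if __name__ == "__main__":
    n, trials = 1000, 500
    rng = random.Random(0)
    total = 0
    for _ in range(trials):
        counter = MorrisCounter(rng)
        for _ in range(n):
            counter.increment()
        total += counter.estimate()
    print("true count:", n, "average estimate:", total / trials)
```

A single counter is unbiased but has high variance (about n^2/2), which is why the demo averages many independent copies; averaging is one standard way to trade space for accuracy.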