COMS-E6998 Information Theory in Computer Science


1 COMS-E6998 Information Theory in Computer Science
Columbia, Fall 2017

2 Administrivia
Instructor: Omri Weinstein. TA: Zhenrui Liao.
Lectures: Tuesday 4:10-6:20 pm (10-15 min break).
Office Hours: Omri: Tue, 2:30-3:30 pm (or by appointment). Zhenrui: Wed 2-4 pm.
Recitations: Optional, possibly a couple to bring everyone up to speed.
Original motivation: Netflix, Amazon Instant Video (+pics?) Abstraction of typical scenario: clients miss information during network transmissions. So this problem models a scenario where clients miss some information and the network (NW) wishes to minimize the re-transmission rate.
Course Level: 6K (graduate), proof heavy but not a technical bootcamp. No prerequisites beyond mathematical maturity + probability theory.
Books/HW/reading list: on the website.

3 Grading
HW: Bi-weekly (~5-6 in total): 40%.
Final Project: 40%.
Scribing 1 lecture: 20% (at most 13 lectures, up to 2 students per lecture).
HW policy:
- 5-6 exercises per HW; each totals 125 pts.
- Collaboration is allowed as long as you write up your own solution separately! (New concepts take time to sink in; it is hard to keep up without practice.)

4 Final Research Projects
Research/Reading based: Choose a course-related paper, summarize it, and prepare a 10-min talk (for final presentations day). You’re expected to say “something new” about the problem (connection/extension).
“Meta project”: related to storage, compression and search in modern file systems (more on this soon).
Project based on an unexplored problem in dynamic data structures, covering both theory and applied aspects (many directions to choose from, more on this soon).
Implementation based (storage): Focus on 1 sub-topic in the storage project, implement an algorithm on real data repositories (more soon). Small groups allowed.
- Consult + converge early on (e.g., during office hours or schedule a meeting).
- Project proposal deadline: October 10th.

5 Introduction

6 Motivating story for this course
Engineering perspective: The traditional “home” of Information Theory (IT). IT underlies most modern digital technology (e.g., compression, coding, distributed storage…). We will explore some of this theory, its applications, and extensions.
Complexity perspective: Computational Complexity asks “How many resources are required to solve a given computational problem?” (space, time, energy, neural layers, etc.)
Holy grail: Unconditional lower bounds on such resources. Unfortunately, the majority of Complexity is built on conditional hardness results.
One of the major achievements of TCS: understanding the “information bottleneck” of computational models (as opposed to the efficiency of processing data).

7 The Complexity Theoretic View
Course “birthright”: the increasing role of the information-theoretic mindset and methods in understanding computation (hopefully it will twist your minds as well).
Most topics we’ll encounter are active research fields (e.g., streaming, LPs, circuit complexity, static/dynamic DS...). Many open problems…
What are we about to see in this course? …

8 Data Compression and Interactive Compression
Data compression, a.k.a. Shannon’s “source coding” problem (today): What is the essential amount of communication required to transmit/encode a random dataset? (We will mention a few natural and practical compression schemes, e.g., Huffman, JPEG.)
Later in the course: Does Shannon’s compression theory extend to interactive scenarios, i.e., “can we compress conversations”?
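To make the source coding question concrete, here is a minimal sketch that compares the Shannon entropy of a small distribution with the average codeword length of a binary Huffman code. The distribution `probs` and the helper names are illustrative assumptions, not taken from the slides; the example is chosen with dyadic probabilities so the Huffman code meets the entropy exactly.

```python
import heapq
import math

def entropy(probs):
    """Shannon entropy in bits of a distribution given as {symbol: probability}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def huffman_code_lengths(probs):
    """Return {symbol: codeword length} for a binary Huffman code."""
    # Heap entries: (subtree probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

# Hypothetical toy source with dyadic probabilities.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
print(entropy(probs), avg_len)  # prints 1.75 1.75
```

For general distributions the average Huffman length L satisfies H ≤ L < H + 1, so the entropy is the right benchmark up to one bit per symbol.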

9 Streaming
Real-time systems (financial databases, monitoring networks) process huge, rapidly changing data, which is too large to store explicitly.
[Figure: a vector x with entries up to xn receiving a stream of updates: +3, +17, +3, -6, -15, +3]
Long sequence of updates in [M] (e.g., IP addresses of packets passing through a router, DNA sequence, transactions in sensor networks…)
Goal: Compute some function f(x) of the final stream (whp), with minimum space.
Trivial solution: ~ n lg M = O(n) bits of space, too expensive.
Relaxation: approximate f(x) (whp) using sublinear space. (ex: $F_p := \sum_i |x_i|^p$)
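As an illustration of the relaxation, the following is a minimal sketch of the classical AMS-style estimator for the second moment F_2, which keeps a fixed number of counters instead of the whole vector x. The class name `F2Sketch`, the counter count, and the random test stream are assumptions made for this example; the slides do not specify an algorithm.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used as the hash field size

class F2Sketch:
    """Estimates F_2 = sum_i x_i^2 from a stream of (index, delta) updates."""

    def __init__(self, num_counters=128, seed=0):
        rng = random.Random(seed)
        # One random degree-3 polynomial per counter gives 4-wise independent hashes.
        self.coeffs = [[rng.randrange(P) for _ in range(4)]
                       for _ in range(num_counters)]
        self.counters = [0] * num_counters

    @staticmethod
    def _sign(coeffs, i):
        # Horner evaluation of the polynomial at i; the low bit gives a
        # (nearly unbiased) random sign for coordinate i.
        h = 0
        for c in coeffs:
            h = (h * i + c) % P
        return 1 if h & 1 else -1

    def update(self, i, delta):
        # Each counter maintains z_j = sum_i sign_j(i) * x_i.
        for j, coeffs in enumerate(self.coeffs):
            self.counters[j] += self._sign(coeffs, i) * delta

    def estimate(self):
        # E[z_j^2] = F_2; averaging the counters reduces the variance.
        return sum(z * z for z in self.counters) / len(self.counters)

# Hypothetical stream: 1000 updates to a vector with 50 coordinates.
rng = random.Random(1)
x = [0] * 50
sketch = F2Sketch()
for _ in range(1000):
    i, delta = rng.randrange(50), rng.randint(-5, 5)
    x[i] += delta
    sketch.update(i, delta)

true_f2 = sum(v * v for v in x)
est = sketch.estimate()
print(true_f2, est)
```

The sketch's memory is independent of n (here 128 counters plus the hash coefficients), at the cost of returning only an approximation that is accurate with high probability.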

10 Data Structures: Time-Space tradeoffs
Nearest Neighbor problem (cornerstone in most ML apps, optimization, DBs…)
Preprocess n pts in $\mathbb{R}^d$ into the smallest possible memory, allowing fast answering of exp(d) >> n queries.
Trivial solutions: Either exp(d) space (store all answers), or ~ n time (scan entire database).
Are there better tradeoffs?
Some databases are dynamic: maintain them cheaply, online (like streaming)…
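The two trivial solutions can be sketched side by side. To keep the set of possible queries finite, this example assumes points in the Hamming cube {0,1}^d rather than R^d; the function names and the sample points are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of coordinates in which two bit-tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_by_scan(points, q):
    """~n time per query, no extra space: scan the whole database."""
    return min(points, key=lambda p: hamming(p, q))

def build_answer_table(points, d):
    """exp(d) space: precompute the answer for every possible query in {0,1}^d."""
    return {q: nearest_by_scan(points, q) for q in product((0, 1), repeat=d)}

d = 4
points = [(0, 0, 0, 0), (1, 1, 1, 0), (0, 1, 0, 1)]
table = build_answer_table(points, d)  # 2^d entries, one lookup per query

q = (1, 1, 0, 0)
print(nearest_by_scan(points, q), table[q])
```

The table answers each query with a single lookup but has 2^d entries, while the scan uses no extra space but touches all n points; the question on the slide is what lies between these extremes.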

11 Compression and Storage
“Dictionary” problem: A Dropbox server needs to store a file collection $X = (X_1, X_2, \ldots, X_n) \sim \mu$ of multiple users, s.t. individual files can be retrieved quickly (few memory accesses).
This field will bring us to a specific natural DS problem that arises in storage systems.
Realistic scenario: files are correlated ($\mathrm{Info}_\mu(X) \ll \sum_i \mathrm{Info}_\mu(X_i)$ is the space benchmark).
Tradeoffs b/w storage space and decoding time (# memory accesses): “locally decodable” compression schemes?
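A tiny worked example shows how large the gap between the joint information and the sum of per-file information can be. This sketch assumes that "Info" is measured by Shannon entropy and uses a hypothetical toy distribution in which three files are identical copies of one uniform 2-bit string.

```python
import math
from collections import Counter

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, i):
    """Marginal distribution of the i-th file under the joint distribution."""
    m = Counter()
    for files, p in joint.items():
        m[files[i]] += p
    return m

# Hypothetical joint distribution: (X1, X2, X3) are three copies of the same
# uniformly random 2-bit string, so the files are fully correlated.
joint = {(s, s, s): 0.25 for s in ("00", "01", "10", "11")}

joint_bits = entropy(joint)                                    # H(X)
sum_bits = sum(entropy(marginal(joint, i)) for i in range(3))  # sum_i H(X_i)
print(joint_bits, sum_bits)  # prints 2.0 6.0
```

Storing each file separately costs 6 bits here, while the whole collection carries only 2 bits of information; the difficulty is achieving space close to the joint benchmark while still decoding a single file with few memory accesses.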

12 We’ll see later in the course:
- The theoretical problem is mostly open (understanding it in general is notoriously hard).
- For some joint distributions, it is hopeless to have small space and fast (“local”) decoding…
But, wait a minute: real-life distributions aren’t like that.
2 natural and important data types (distributions): (1) Visual; (2) Genomic (correlations are huge and dictionaries are important).
Final project proposal: Theoretical and Applied.
Suggested projects: (a) Define a generative model for correlated file albums per data type, then try to design similarity search and compression algs (ideally dynamic). Work with real data repositories. (b) How to search? (c) How to compress? (d) The realistic problem is dynamic: can we handle updates?
Will host a special session (40-min) on this project in late September (doodle poll).

13 Interactive Compression
Data compression (today’s topic). Does Shannon’s classical theory extend to interactive scenarios? Can we compress conversations?
Linear-Program LBs
LPs = powerful algorithmic tool for solving (approximating) combinatorial optimization problems. How large (# vars/constraints) does an LP need to be in order to produce a good solution to resource allocation/packing problems?
Common denominator of all these applications? Communication and Information Theory (surprisingly, all implicitly embed some distributed problem over a channel…)

14 Rough plan (syllabus)
Information Theory 101, Data Compression.
Communication Complexity.
Information Complexity and interactive compression.
Applications to Streaming.
Applications to Data Structures (static & dynamic).

