ENGS 4 - Lecture 11
Technology of Cyberspace, Winter 2004
Thayer School of Engineering, Dartmouth College
Instructor: George Cybenko, x6-3843, gvc@dartmouth.edu
Assistant: Sharon Cooper ("Shay"), x6-3546
Course webpage: http://thayer.dartmouth.edu/~engs004/
Today's Class
Sharon's mini-lecture
Shannon theory and coding theory basics
Break
Leo's mini-lecture
Ryan's mini-lecture
Future mini-lectures
Feb 19 – Sarah (social inequality/digital divide), Ryan (internet dating), Noah R. (nanotechnology), Scott (digital watermarking)
Feb 24 – Dason (pornography), En Young (persistence), Rob (GPS), Simon (online games)
Topics – persistence, piracy, IP telephony, blogs, IM, global adoption and trends, GPS, online games
Constructing Prefix Codes
Huffman coding example: a fair die with six symbols 1–6, each with probability 1/6.
[Huffman tree: pairs of 1/6 leaves merge into 1/3 nodes, then 2/3, then 1.]
Resulting codewords: 000, 001, 010, 011, 10, 11.
Average length = 4 × (1/6) × 3 + 2 × (1/6) × 2 = 2 2/3 < 3 bits per symbol.
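For concreteness, here is a minimal Python sketch of the construction (the helper name huffman_code_lengths is ours): it repeatedly merges the two least probable subtrees and tracks how deep each symbol ends up.

```python
import heapq

def huffman_code_lengths(probs):
    """Huffman code lengths for {symbol: probability}: repeatedly merge the
    two least probable subtrees; every symbol inside a merged subtree moves
    one bit deeper in the tree."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, tie, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1                      # one level deeper
        heapq.heappush(heap, (p1 + p2, tie, syms1 + syms2))
    return lengths

die = {s: 1/6 for s in "123456"}                 # fair die: six equally likely symbols
lengths = huffman_code_lengths(die)
print(sorted(lengths.values()))                  # [2, 2, 3, 3, 3, 3]
print(sum(die[s] * lengths[s] for s in die))     # 2.666... < 3 bits per symbol
```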
Another example
What about two symbols a, b with P(a) = 0.9 and P(b) = 0.1?
The way to build efficient codes for such cases is to use "block codes".
That is, consider pairs of symbols (aa, ab, ba, bb) or triples (aaa, aab, aba, abb, baa, bab, bba, bbb), etc.
Let's do an example on the board.
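A rough sketch of the pair-code arithmetic (the particular code below is one Huffman solution for these pair probabilities, built by hand by merging the smallest probabilities):

```python
# Pair probabilities when P(a)=0.9, P(b)=0.1: aa=0.81, ab=0.09, ba=0.09, bb=0.01.
pair_probs = {"aa": 0.81, "ab": 0.09, "ba": 0.09, "bb": 0.01}
# One Huffman code for those pairs.
pair_code = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}

bits_per_pair = sum(pair_probs[s] * len(pair_code[s]) for s in pair_probs)
print(bits_per_pair / 2)   # ~0.645 bits per original symbol
# Coding single symbols (a -> 0, b -> 1) costs 1 bit per symbol, so pairs already help.
```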
How about the other example?
Code the two symbols directly: s1 → 0, s2 → 1.
Average number of bits per symbol = 1. Is this the best possible? No.
Entropy of a source
Alphabet = {s1, s2, s3, ..., sn}, with Prob(sk) = pk.
Entropy: H = − Σk pk log2(pk)
Shannon's Source Coding Theorem: H ≤ average number of bits per symbol used by any decodable code.
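A small Python sketch of the definition (the helper name entropy is ours), applied to the sources discussed so far:

```python
from math import log2

def entropy(probs):
    """H = -sum(p * log2(p)); terms with p == 0 are taken to contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))     # fair die: log2(6) ~ 2.585 bits per symbol
print(entropy([1.0, 0.0]))    # "the sun will rise tomorrow": 0 bits per symbol
print(entropy([0.9, 0.1]))    # the a/b source above: ~0.469 bits per symbol
```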
Examples
H(die example) = −6 × (1/6) × log2(1/6) = log2(6) ≈ 2.585 bits
H(sunrise example) = −(0 log 0 + 1 log 1) = 0 bits (taking 0 log 0 = 0)
How can we achieve the rate determined by the entropy bound?
Block codes can do better than pure Huffman coding.
"Universal codes"... e.g. Lempel-Ziv.
Break
Models of a source
We can easily measure the probability of each symbol in English. How?
How about the probabilities of each pair of symbols? Triples? Sentences?
True model of English = model of language = model of thought.
This is very hard.
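One way to "measure" these probabilities is simply to count. The sketch below is a toy example (the one-sentence sample is a stand-in for a real corpus); pairs and triples would be counted the same way.

```python
from collections import Counter

def symbol_probs(text):
    """Estimate per-symbol probabilities from a sample of English text."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    return {c: n / len(letters) for c, n in counts.most_common()}

sample = "the quick brown fox jumps over the lazy dog"   # stand-in for a large corpus
print(symbol_probs(sample))
```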
Lempel-Ziv Coding (zip codes, etc.)
We want to encode the string aaabcababcaaa into 0's and 1's.
Step 1: Convert to 0's and 1's by any prefix substitution, e.g. a = 00, b = 01, c = 11:
00000001110001000111000000
Lempel-Ziv Coding (zip codes, etc.)
Step 2: Parse the string into "never before seen" strings:
00000001110001000111000000 → 0, 00, 000, 01, 1, 10, 001, 0001, 11, 0000, 00
Lempel-Ziv Coding (zip codes, etc.)
Step 3: Assign binary numbers to each phrase:
0, 00, 000, 01, 1, 10, 001, 0001, 11, 0000, 00 → 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011
Step 4: Encode each phrase as the number of its prefix phrase (the phrase minus its last bit) followed by that last bit:
00000, 00010, 00100, 00011, 00001, 01010, 00101, ...
Concatenated: 00000000100010000011000010101000101...
Lempel-Ziv Coding (zip codes, etc.)
Step 5: Store this string plus the length of the labels in bits: 00000, 00010, 00100, ... and 1111000000001000100...
This may be inefficient for small examples, but for very long inputs it achieves the entropy of the best model.
LZW = Lempel-Ziv-Welch, used in the GIF image interchange standard.
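A minimal Python sketch of Steps 2–4 (the function names lz_parse and lz_encode are ours), run on the bit string from Step 1:

```python
def lz_parse(bits):
    """Step 2: split a 0/1 string into phrases that have never been seen before."""
    phrases, seen, start = [], set(), 0
    while start < len(bits):
        end = start + 1
        while bits[start:end] in seen and end < len(bits):
            end += 1                                 # grow the phrase until it is new
        phrases.append(bits[start:end])
        seen.add(bits[start:end])
        start = end
    return phrases

def lz_encode(phrases):
    """Steps 3-4: each phrase becomes (index of its prefix phrase) + (its last bit)."""
    width = len(format(len(phrases), "b"))           # bits needed for a phrase index
    index = {"": 0}                                  # the empty phrase has index 0
    out = []
    for i, phrase in enumerate(phrases, start=1):
        out.append(format(index[phrase[:-1]], f"0{width}b") + phrase[-1])
        index[phrase] = i
    return out

bits = "00000001110001000111000000"                  # "aaabcababcaaa" after a=00, b=01, c=11
phrases = lz_parse(bits)
print(phrases)              # ['0', '00', '000', '01', '1', '10', '001', '0001', '11', '0000', '00']
print(lz_encode(phrases))   # starts 00000, 00010, 00100, 00011, 00001, 01010, 00101, ...
```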
Example
What is the Lempel-Ziv encoding of 00000...0000 (N 0's)?
What is the entropy of the source?
How many bits per symbol will be used in the encoded data as N goes to infinity? Let's work out the details.
How about 010010001000010000010000001...?
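One way to check the all-zeros case numerically (a rough sketch; the cost per phrase is approximated as the index width plus one bit):

```python
from math import log2

def lz_phrase_count(bits):
    """Number of phrases in the 'never seen before' parse of a 0/1 string."""
    seen, start, count = set(), 0, 0
    while start < len(bits):
        end = start + 1
        while bits[start:end] in seen and end < len(bits):
            end += 1
        seen.add(bits[start:end])
        start, count = end, count + 1
    return count

for N in (100, 10_000, 100_000):
    c = lz_phrase_count("0" * N)                   # roughly sqrt(2 N) phrases
    print(N, c, round(c * (log2(c) + 1) / N, 4))   # bits per symbol shrinks toward 0
```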
Properties of Lempel-Ziv
For most sources (alphabets + probabilities), the Lempel-Ziv algorithm gives an average number of bits per symbol that approaches the entropy of the source (for any order of model), provided the string/data to be compressed is long enough.
How about compressing the compressed string? That is, applying Lempel-Ziv again and again?
Answer: The compressed bit string will look completely random: 0 or 1 with probability 1/2. Entropy = 1 means 1 bit per symbol on average, so no improvement is possible.
Analog vs Digital
Most "real world" phenomena are continuous: images, vision, sound, touch.
To transmit them, we must convert continuous signals into digital signals.
Important note: there is a fundamental shift from continuous to digital representation of the real world.
The Fundamental Shift
The telephone system is designed to carry analog voice signals using circuit switching; the whole infrastructure is based on that. When a modem connects your computer to the network over a telephone line, the modem must disguise the computer data as a speech/voice signal.
The infrastructure of the internet is totally digital. Sending voice over the internet requires disguising voice as digital data!
This is a fundamental turnaround... the same will hold for TV, cable TV, audio, etc.
Analog to Digital Conversion
[Figure: an analog waveform sampled at times d, 2d, 3d, ..., 12d and quantized to eight levels labelled 000 through 111; the dots on the curve are the samples.]
d is the "sampling interval"; 1/d is the sampling "rate".
Sampling and quantization
In this example we are using 8 quantization levels, which requires 3 bits per sample. Using 8 bits per sample would give 256 quantization levels, etc.
If the sampling interval is 1/1,000,000 second (a microsecond), the sampling rate is 1,000,000 samples per second, or 1 megahertz.
Hertz means "number per second", so 20,000 Hertz means 20,000 per second, and sampling at 20 kilohertz means 20,000 samples per second.
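A small sketch of sampling and 3-bit quantization (the 440 Hz tone and the helper name quantize are illustrative choices of ours):

```python
from math import sin, pi

def quantize(x, bits=3):
    """Map x in [-1, 1] to one of 2**bits integer levels (here 8 levels, 0..7)."""
    levels = 2 ** bits
    return min(int((x + 1) / 2 * levels), levels - 1)   # clamp the x = +1 edge

fs = 8000                 # sampling rate: 8000 samples per second
d = 1 / fs                # sampling interval: 1/8000 second
samples = [sin(2 * pi * 440 * n * d) for n in range(12)]   # a 440 Hz tone, 12 samples
print([quantize(x) for x in samples])                       # one 3-bit value per sample
```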
Analog frequencies
All real-world signals can be represented as a sum, or superposition, of sine waves with different frequencies (the Fourier representation theorem).
The frequency of a sine wave is the number of times it oscillates in a second. A sine wave with frequency 20 completes a cycle (one period) every 1/20th of a second, i.e. 20 times a second.
We say that a sine wave with frequency 20 is a 20 Hertz signal: it oscillates 20 times a second.
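A sketch of this idea: build a signal from two sine waves, then recover their frequencies by correlating the samples against sines and cosines (a naive Fourier analysis; the 3 Hz and 7 Hz components are arbitrary choices):

```python
from math import sin, cos, pi, hypot

fs, seconds = 64, 1                        # 64 samples per second, for one second
N = fs * seconds
# A signal made of a 3 Hz sine plus a half-amplitude 7 Hz sine.
signal = [sin(2 * pi * 3 * n / fs) + 0.5 * sin(2 * pi * 7 * n / fs) for n in range(N)]

for f in range(10):                        # correlate against each candidate frequency
    re = sum(x * cos(2 * pi * f * n / fs) for n, x in enumerate(signal))
    im = sum(x * sin(2 * pi * f * n / fs) for n, x in enumerate(signal))
    print(f, "Hz:", round(2 * hypot(re, im) / N, 3))   # ~1.0 at 3 Hz, ~0.5 at 7 Hz, ~0 elsewhere
```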
Fourier Java Applet
http://www.falstad.com/fourier/
Nyquist Sampling Theorem
If an analog signal is "bandlimited" (i.e. consists of frequencies in a finite range [0, F]), then sampling at or above twice the highest frequency (2F) allows the signal to be reconstructed perfectly.
This does not take quantization into account.
Sampling at lower than the Nyquist rate will lead to "aliasing".
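A small numerical illustration (our own choice of numbers): a 30 Hz sine sampled at only 40 Hz produces exactly the same samples as a 10 Hz sine, so the two cannot be told apart after sampling.

```python
from math import sin, pi

fs = 40            # 40 Hz sampling rate, so only frequencies below 20 Hz are safe
high  = [sin(2 * pi * 30 * n / fs) for n in range(8)]    # 30 Hz: above the Nyquist limit
alias = [-sin(2 * pi * 10 * n / fs) for n in range(8)]   # a 10 Hz sine (sign-flipped)

for h, a in zip(high, alias):
    print(round(h, 6), round(a, 6))   # the columns agree (up to floating-point noise)
```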
Sampling for Digital Voice
High-quality human voice contains frequencies up to about 4000 Hz.
The sampling rate is therefore 8000 Hz; 8-bit quantization gives 64,000 bits per second.
The phone system is built around this specification.
Computer communication over voice telephone lines is limited to about 56 kbps.
Implications for Digital Audio
The human ear can hear frequencies up to 20 kHz, so sampling at twice that rate means 40 kHz.
Quantization at 8 bits gives 256 levels.
40,000 samples/second × 8 bits/sample translates to 320,000 bits per second, or 40,000 bytes per second.
60 seconds of music: 2,400,000 bytes. 80 minutes: about 190 Mbytes.
Audio CD??
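The bit-rate arithmetic from the last two slides, spelled out:

```python
voice_rate = 8000 * 8                     # digital voice: 8000 samples/s x 8 bits/sample
print(voice_rate)                         # 64,000 bits per second

audio_rate = 40_000 * 8                   # digital audio: 40,000 samples/s x 8 bits/sample
print(audio_rate, audio_rate // 8)        # 320,000 bits/s = 40,000 bytes/s
print(40_000 * 60)                        # 60 seconds of music: 2,400,000 bytes
print(40_000 * 80 * 60 / 1e6, "Mbytes")   # 80 minutes: 192.0, i.e. about 190 Mbytes
```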
Some Digital Audio Links
http://www.musiq.com/recording/mp3/index.html
http://www.musiq.com/recording/digaudio/bitrates.html
Aliasing in Images
http://www.telacommunications.com/nutshell/pixelation.htm#enlargement
Other
http://www.physics.nyu.edu/faculty/sokal/#papers