ENGS 4 - Lecture 11
Technology of Cyberspace
Winter 2004
Thayer School of Engineering, Dartmouth College
Instructor: George Cybenko, x
Assistant: Sharon Cooper ("Shay"), x
Course webpage:

Today's Class
- Sharon's mini-lecture
- Shannon theory and coding theory basics
- Break
- Leo's mini-lecture
- Ryan's mini-lecture

Future mini-lectures
- Feb 19 - Sarah (social inequality/digital divide), Ryan (internet dating), Noah R. (nanotechnology), Scott (digital watermarking)
- Feb 24 - Dason (pornography), En Young (persistence), Rob (GPS), Simon (online games)
- Topics - persistence, piracy, IP telephony, blogs, IM, global adoption and trends, GPS, online games

Constructing Prefix Codes
Huffman coding
Example: six symbols, each with probability 1/6.
[Figure: Huffman tree built by repeatedly merging the two least probable nodes (1/6 + 1/6 = 2/6, then 2/6 + 2/6 = 4/6, etc.); four symbols end up with 3-bit codewords and two with 2-bit codewords.]
Average length = 4 x (1/6) x 3 + 2 x (1/6) x 2 = 2 2/3 < 3 bits per symbol.
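
A minimal sketch of the construction in Python, using a heap; the function name `huffman_code` and the six-symbol alphabet are illustrative, not from the lecture:

```python
import heapq

def huffman_code(probs):
    """Build a Huffman prefix code for {symbol: probability}; returns {symbol: bitstring}."""
    # Heap entries: (probability, unique tiebreaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two least probable subtrees...
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}        # ...get merged, one branch
        merged.update({s: "1" + w for s, w in c2.items()})  # labeled 0, the other 1
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {s: 1/6 for s in "ABCDEF"}   # the slide's six equally likely symbols
code = huffman_code(probs)
avg = sum(probs[s] * len(w) for s, w in code.items())
print(code)
print(avg)   # 2.666...: four symbols get 3-bit codewords, two get 2-bit codewords
```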

Another example
What about two symbols, a and b, with P(a) = 0.9 and P(b) = 0.1?
The way to build efficient codes for such cases is to use "block codes". That is, consider pairs of symbols (aa, ab, ba, bb), or triples (aaa, aab, aba, abb, baa, bab, bba, bbb), etc.
Let's do an example on the board (a sketch follows below).
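
A worked sketch of the board example; the pair code below is one valid Huffman code I built by hand, not necessarily the one used in class:

```python
# Pairs of symbols from the source with P(a) = 0.9, P(b) = 0.1.
pair_probs = {"aa": 0.81, "ab": 0.09, "ba": 0.09, "bb": 0.01}
pair_code = {"aa": "0", "ab": "10", "ba": "110", "bb": "111"}  # a prefix code
bits_per_pair = sum(pair_probs[s] * len(w) for s, w in pair_code.items())
print(bits_per_pair)      # 1.29 bits per pair
print(bits_per_pair / 2)  # 0.645 bits per original symbol, versus 1 bit unblocked
```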

How about the other example?
s1 -> 0
s2 -> 1
Average number of bits per symbol = 1. Is this the best possible? No.

Entropy of a source
SOURCE with alphabet = { s1, s2, s3, ..., sn }, Prob(sk) = pk
Entropy = H = - sum over k of pk log2(pk)
Shannon's Source Coding Theorem: H <= average number of bits per symbol used by any decodable code.

Examples
H(die example) = - 6 x (1/6) x log2(1/6) = log2(6), about 2.585 bits
H(sun rise) = - 1 x log2(1) = 0
How can we achieve the rate determined by the entropy bound? Block codes can do better than pure Huffman coding. "Universal codes"... e.g. Lempel-Ziv.
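
A small entropy calculator for these examples (the helper name `entropy` is mine):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), treating 0 * log(0) as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/6] * 6))   # fair die: log2(6), about 2.585 bits
print(entropy([1.0]))       # "the sun will rise": 0 bits, no uncertainty
print(entropy([0.9, 0.1]))  # the a/b source above: about 0.469 bits per symbol
```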

Break

Models of a source
We can easily measure the probability of each symbol in English. How? (One way is sketched below.)
How about the probabilities of each pair of symbols? Triples? Sentences?
True model of English = model of language = model of thought. This is very hard.
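
A sketch of measuring first-order symbol probabilities, using a toy stand-in corpus rather than real English text:

```python
from collections import Counter

text = "the quick brown fox jumps over the lazy dog"   # stand-in corpus
letters = [c for c in text.lower() if c.isalpha()]
counts = Counter(letters)
probs = {c: n / len(letters) for c, n in counts.items()}
print(sorted(probs.items(), key=lambda kv: -kv[1])[:5])  # five most common letters
# Pair (second-order) statistics work the same way, over zip(letters, letters[1:]).
```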

Lempel-Ziv Coding (zip codes, etc)
Want to encode a string, aaabcababcaaa, into 0's and 1's.
Step 1: Convert to 0's and 1's by any prefix substitution: a=00, b=01, c=11, giving 00000001110001000111000000.

Lempel-Ziv Coding (zip codes, etc)
Step 2: Parse the string into "never before seen" substrings:
0, 00, 000, 01, 1, 10, 001, 0001, 11, 0000, 00

Lempel-Ziv Coding (zip codes, etc)
Step 3: Assign binary numbers to each substring:
0=0001, 00=0010, 000=0011, 01=0100, 1=0101, 10=0110, 001=0111, 0001=1000, 11=1001, 0000=1010, 00=1011
Step 4: Encode each substring as the number of the substring it extends, plus its last bit:
00000, 00010, 00100, 00011, 00001, 01010, 00101, 00111, 01011, 00110, 00010

Lempel-Ziv Coding (zip codes, etc)
Step 5: Store this string plus the length of the labels in bits:
00000,00010,00100,00011,00001,01010,00101,00111,01011,00110,00010 (label length = 5)
This may be inefficient for small examples, but for very long inputs it achieves the entropy of the best model.
LZW = Lempel-Ziv-Welch, the GIF image interchange standard.
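
A sketch of Steps 2-4 in Python; the function names are mine, and the output is checked against the slides' example:

```python
def lz_parse(bits):
    """Step 2: split the bit string into 'never before seen' phrases."""
    phrases, seen, cur = [], set(), ""
    for b in bits:
        cur += b
        if cur not in seen:
            seen.add(cur)
            phrases.append(cur)
            cur = ""
    if cur:                  # the leftover at the end may repeat an earlier phrase
        phrases.append(cur)
    return phrases

def lz_encode(phrases):
    """Steps 3-4: write each phrase as (index of the phrase it extends) + (its last bit)."""
    width = len(phrases).bit_length()   # bits needed to write a phrase index
    index = {}
    for i, p in enumerate(phrases):
        index.setdefault(p, i + 1)      # first occurrence wins; 0 is the empty phrase
    return [format(index.get(p[:-1], 0), "0{}b".format(width)) + p[-1] for p in phrases]

bits = "00000001110001000111000000"    # aaabcababcaaa with a=00, b=01, c=11
phrases = lz_parse(bits)
print(phrases)             # ['0','00','000','01','1','10','001','0001','11','0000','00']
print(lz_encode(phrases))  # ['00000','00010','00100','00011','00001','01010', ...]
```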

Example
What is the Lempel-Ziv encoding of 000...0 (N 0's)? What is the entropy of the source? How many bits per symbol will be used in the encoded data as N goes to infinity? Let's work out the details (a sketch follows below).
How about ...?
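
A sketch of the all-zeros case, following the parsing rule above (my working, not the slides'):

```latex
\text{Phrases: } \underbrace{0,\ 00,\ 000,\ \dots}_{k \text{ phrases}},
\qquad 1 + 2 + \cdots + k = \frac{k(k+1)}{2} = N
\;\Rightarrow\; k \approx \sqrt{2N}.
\qquad
\text{Each phrase costs about } \log_2 k + 1 \text{ bits, so}
\quad
\frac{\text{bits}}{\text{symbol}} \approx \frac{k(\log_2 k + 1)}{N}
\approx \frac{\log_2(2N)}{\sqrt{2N}} \to 0 \text{ as } N \to \infty.
```

This matches the entropy H = 0 of a source that always emits the same symbol.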

Properties of Lempel-Ziv
For most sources (alphabets + probabilities), the Lempel-Ziv algorithm will bring the average number of bits per symbol down to the entropy of the source (for any order of model), provided the string/data to be compressed is long enough.
How about compressing the compressed string? That is, applying Lempel-Ziv again and again?
Answer: The compressed bit string will look completely random: 0 or 1 with probability 1/2. Entropy = 1 means 1 bit per symbol on average, so no improvement is possible.

Analog vs Digital
Most "real world" phenomena are continuous:
- images / vision
- sound
- touch
To transmit them, we must convert continuous signals into digital signals.
Important note: there is a fundamental shift from continuous to digital representation of the real world.

The Fundamental Shift
The telephone system is designed to carry analog voice signals using circuit switching; the whole infrastructure is based on that. When a modem connects your computer to the network over a telephone line, the modem must disguise the computer data as a speech/voice signal.
The infrastructure of the internet is totally digital. Sending voice over the internet requires disguising voice as digital data!!!
This is a fundamental turnaround... and the same will hold for TV, cable TV, audio, etc.

Analog to Digital Conversion
[Figure: an analog signal sampled at times d, 2d, 3d, ..., 12d; the dots are the samples.]
d is the "sampling interval"; 1/d is the sampling "rate".

Sampling and Quantization
In this example, we are using 8 quantization levels, which requires 3 bits per sample. Using 8 bits per sample would give 256 quantization levels, etc.
If the sampling interval is 1/1,000,000 second (a microsecond), the sampling rate is 1,000,000 samples per second, or 1 megahertz.
Hertz means "number per second," so 20,000 Hertz means 20,000 per second, and sampling at 20 kilohertz means "20,000 samples per second."
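
A sketch of sampling and quantization; the test signal, rates, and helper name are illustrative:

```python
import math

def sample_and_quantize(signal, interval, n_samples, bits):
    """Sample signal(t) every `interval` seconds; quantize to 2**bits levels."""
    levels = 2 ** bits
    out = []
    for k in range(n_samples):
        x = signal(k * interval)               # sample at time t = k*d
        x = max(-1.0, min(1.0, x))             # clip to the range [-1, 1]
        out.append(round((x + 1) / 2 * (levels - 1)))  # map to 0 .. levels-1
    return out

# A 1000 Hz sine sampled at 8000 Hz (d = 1/8000 s) with 3-bit quantization:
samples = sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t), 1/8000, 16, 3)
print(samples)   # integers in 0..7; each sample takes 3 bits to store
```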

Analog frequencies
All real-world signals can be represented as a sum, or superposition, of sine waves with different frequencies - the Fourier representation theorem.
The frequency of a sine wave is the number of times it oscillates in a second. A sine wave with frequency 20 completes a cycle or period once every 1/20th of a second, so 20 times a second, etc.
We say a sine wave with frequency 20 is a 20 Hertz signal... it oscillates 20 times a second.
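
A small illustration of superposing sine waves (the amplitudes and frequencies are chosen arbitrarily):

```python
import math

def fourier_sum(t, components):
    """Evaluate a superposition of sine waves; components = [(amplitude, freq_hz), ...]."""
    return sum(a * math.sin(2 * math.pi * f * t) for a, f in components)

components = [(1.0, 20), (0.5, 40), (0.25, 60)]   # 20, 40, and 60 Hz sine waves
print([round(fourier_sum(k / 1000, components), 3) for k in range(5)])
```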

Fourier Java Applet

Nyquist Sampling Theorem
If an analog signal is "bandlimited" (i.e., consists of frequencies in a finite range [0, F]), then sampling must be at or above twice the highest frequency (2F) to reconstruct the signal perfectly.
Does not take quantization into account.
Sampling at lower than the Nyquist rate will lead to "aliasing".
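
A quick numerical illustration of aliasing (my example): a 3 Hz sine sampled at only 4 Hz, below its 6 Hz Nyquist rate, produces exactly the samples of a 1 Hz sine with its sign flipped, so the two are indistinguishable after sampling:

```python
import math

rate = 4.0   # samples per second; the Nyquist rate for a 3 Hz sine would be 6 Hz
for k in range(8):
    t = k / rate
    s3 = math.sin(2 * math.pi * 3 * t)    # the 3 Hz signal we actually sampled
    s1 = -math.sin(2 * math.pi * 1 * t)   # a sign-flipped 1 Hz sine
    print(round(s3, 6), round(s1, 6))     # the two columns are identical
```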

Sampling for Digital Voice
High-quality human voice occupies frequencies up to about 4000 Hz.
The sampling rate is therefore 8000 Hz (the Nyquist rate).
8-bit quantization means 8000 x 8 = 64,000 bits per second.
The phone system is built around this specification; computer communications over voice telephone lines are limited to about 56 kbps.

Implications for Digital Audio
The human ear can hear up to 20 kHz, so sampling at twice that rate means 40 kHz.
With quantization at 8 bits (256 levels), 40,000 samples/second x 8 bits/sample translates to 320,000 bits per second, or 40,000 bytes per second.
60 seconds of music: 2,400,000 bytes. 80 minutes: about 190 MBytes.
Audio CD?? (A real audio CD stores 44,100 samples per second at 16 bits per sample in stereo: about 1.4 megabits per second.)

Some Digital Audio Links
- Aliasing in Images (links omitted)
- Other links