Carnegie Mellon
Multimedia
Michael Christel, Alex Hauptmann, Rong Jin (TA)
How to get in touch with us
Mike Christel (412) or x WeH5212
Alex Hauptmann (412) or x WeH5124
– Office hours by appointment
Teaching Assistant
Rong Jin
Office WeH5316
Office hours by appointment
(412) or x8-4050
Course Outline, Part 1 of 3
More details at
October 22: Intro to Multimedia
October 25: Multimedia Enabling Technologies, Macromedia Flash Intro and Demo
October 29: Sound Processing, Speech Recognition
November 1: Digital Video Creation and Transmission
November 5: Speech Synthesis
Course Outline, Part 2 of 3
More details at
November 8: Image Processing
November 12: Digital Music and Music Processing
November 15: Multimedia Internet Protocols, SMIL
November 19: Synthetic Interviews: A Multimedia Company (Experiences from the Field)
November 22: Programming for Interactive Multimedia (CGI Scripts/ASP)
Course Outline, Part 3 of 3
More details at
November 29: Content Analysis and Coding of Digital Audio and Video, Multimedia Storage and Retrieval Management
December 3: Video Retrieval Evaluation and Testing, Multimedia Interface Design, Digital Libraries
December 6: Visual Design, Multimedia Interface Design Guidelines, Multimedia Use in the Future (Experience on Demand)
December 10: Multimedia as Entertainment Technology, Virtual Reality
Homeworks
See
9 homeworks planned, 10 points each
One hard homework will be worth 20 points
No final, no midterm
Publish homeworks on your web page - us URL
Space?
Today: Intro to Multimedia
Apple Knowledge Navigator Vision 1988
Multimedia (diagram of contributing areas): Audio, Images, Video, Information Retrieval, Storage Systems, Networking, Psychology, HCI, Data Compression, Natural Language Processing, CPU Power
Definition of Multimedia
Multi (Latin multus: numerous)
Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
Multiple types of information captured, stored, manipulated, transmitted, and presented
Specifically: images, video, audio (+ speech), and text
Definition of Multimodal
Multi (Latin multus: numerous)
Modal (Latin modus: manner)
Traditionally refers to input/output formats:
Input: sounds, speech (mike), gestures (camera, tablet), eye-gaze (camera), mouse, keyboard
Output: sounds, speech, video, pictures, animations, text
Perceived Information, Physical Variables
Sound is a waveform
An image is a waveform
– light is electromagnetic radiation with different intensity in spatial coordinates
– color corresponds to wavelength
History of Multimedia I
Analog signals to sensors
– e.g. vinyl records
– fidelity is faithfulness to the original
Digital representation (‘60s)
– sampling
– quantizing
– coding
– codec, modem (A/D and D/A)
Hardware Advances
CPU; Bus; Network I/O; Keyboard, Mouse; Disk; Mike + A/D Board; Camera + A/D Board; Speakers (+ D/A Board); Display
History of Multimedia II
Analog controls only
Special hardware (displays, scanners, FFTs)
Integrated hardware components
Further integration
Other devices
History of Multimedia III
Limiting factors:
– storage limits
– CPU speeds
– I/O speeds
– network bandwidth
Why Digital?
Universal storage, transmission format
– CD, internet
Precision (range of values, number of bits, floating point)
Lossless transmission/storage
BUT:
– a sampling rate that is too low for the signal distorts information
– size requirements may be ‘large’ compared to analog
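The size caveat can be made concrete with a little arithmetic. The following is a minimal sketch, assuming uncompressed PCM audio; the audio CD parameters used in the example (44.1 kHz, 16 bits per sample, 2 channels) are not stated on the slide, and `pcm_bytes` is a hypothetical helper name.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed PCM audio (no header, no compression)."""
    return sample_rate_hz * bits_per_sample * channels * seconds // 8


# Assumed audio CD parameters: 44.1 kHz, 16-bit samples, stereo.
one_second = pcm_bytes(44_100, 16, 2, 1)
one_minute = pcm_bytes(44_100, 16, 2, 60)

print(one_second)  # 176400
print(one_minute)  # 10584000
```

Doubling the sampling rate, the bits per sample, or the number of channels doubles the storage requirement, which is why precision and size pull against each other.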
Digitization Process
Sampling from an analog signal
Sampling errors relate to signal frequencies
Quantization errors
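Both kinds of error can be illustrated in a few lines. This is a sketch under stated assumptions rather than a description of any particular A/D converter: the signal is a pure sine tone, the quantizer is a uniform one over [-1, 1], and the function names are hypothetical. The sampling error shown is aliasing: a 7 Hz tone sampled at 10 Hz (above half the sampling rate) yields exactly the samples of an inverted 3 Hz tone.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, n_samples):
    """Sample a unit-amplitude sine wave at evenly spaced instants."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]


def quantize(x, bits):
    """Uniformly quantize x in [-1, 1] to one of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)
    index = round((x + 1.0) / step)
    return -1.0 + index * step


def max_quantization_error(samples, bits):
    """Largest absolute difference between a sample and its quantized value."""
    return max(abs(s - quantize(s, bits)) for s in samples)


# Sampling error: 7 Hz is above half the 10 Hz sampling rate, so it aliases
# to 3 Hz (with inverted sign); the two sample sequences cancel exactly.
tone_7hz = sample_sine(7, 10, 20)
tone_3hz = sample_sine(3, 10, 20)
aliased = all(abs(a + b) < 1e-9 for a, b in zip(tone_7hz, tone_3hz))
print(aliased)  # True

# Quantization error: bounded by half a quantization step, so it shrinks
# as the number of bits grows.
signal = sample_sine(1, 100, 100)
print(max_quantization_error(signal, 4) > max_quantization_error(signal, 8))  # True
```

Once sampled, the 7 Hz and 3 Hz tones cannot be told apart, which is why sampling errors depend on the frequencies present in the signal, while quantization errors depend only on the number of levels.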
Text
ASCII, Unicode
Formatted text, rich text
Document formats:
– Structured: TeX, HTML
– Page descriptions: PostScript, PDF
Graphics
Objects
– circles, splines, rectangles, lines
Editable
– resize, reshape, move, colorize
Synthetic
Images (Pictures)
Fixed digitized representation
– bitmap, colors per pixel
Editable in limited ways
– retouch, cut and paste, remap colors, filter [Photoshop tools]
– no ‘model’ of the thing
Captured
– not just from real life: clip art, screen dump
Audio
Sounds
– hear 15 Hz to 20 kHz
– speech is 50 Hz to 10 kHz
Speech Recognition
– It is hard to wreck a nice beach
– Ice cream / I scream
Synthesis
– Speech
– Music
MIDI for 128 instruments, 47 percussion sounds
– notes, timing
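MIDI stores notes as numbers and leaves the actual sound to the synthesizer. A minimal sketch of the usual mapping from note number to pitch follows, assuming twelve-tone equal temperament with note 69 (A4) tuned to 440 Hz. That tuning is a common convention, not something the slide states, and `note_to_hz` is a hypothetical helper name.

```python
def note_to_hz(note_number, a4_hz=440.0):
    """Frequency of a MIDI note number under equal temperament (A4 = note 69)."""
    return a4_hz * 2 ** ((note_number - 69) / 12)


print(note_to_hz(69))            # 440.0 (A4)
print(note_to_hz(81))            # 880.0 (one octave up)
print(round(note_to_hz(60), 2))  # 261.63 (middle C)
```

All three pitches fall well inside the audible range quoted above.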
Speech Recognition Issues
Continuous vs discrete
Vocabulary size
Channel (microphone)
Environment (location of mike and speaker)
Speaker dependent/speaker independent
Context (language model)
Interactivity (dialog model)
Speech Recognition Knowledge Sources
Acoustic Modeling: describes the sounds that make up speech
Lexicon: describes which sequences of speech sounds make up valid words
Language Model: describes the likelihood of various sequences of words being spoken
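To show what "likelihood of various sequences of words" can mean in practice, here is a toy sketch of a bigram language model with add-one smoothing. The three-sentence corpus is invented purely for illustration, the function names are hypothetical, and real recognizers use far larger models. The point is only that a language model can prefer "recognize speech" over the acoustically similar "wreck a nice beach".

```python
import math
from collections import Counter

# Invented toy corpus, for illustration only.
CORPUS = [
    "it is hard to recognize speech",
    "it is easy to recognize speech",
    "waves can wreck a nice beach",
]


def train_bigrams(sentences):
    """Count bigrams and their left contexts; '<s>' marks the sentence start."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) /
                          (contexts[prev] + vocab_size))
    return total


contexts, bigrams, vocab_size = train_bigrams(CORPUS)
likely = log_prob("it is hard to recognize speech", contexts, bigrams, vocab_size)
unlikely = log_prob("it is hard to wreck a nice beach", contexts, bigrams, vocab_size)
print(likely > unlikely)  # True
```

The second sentence scores lower both because "to wreck" never occurs in the toy corpus and because it is longer; a real system would combine such scores with the acoustic model and lexicon.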
Speech Variations
Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
Voice Quality: breathy, creaky, whispery, tense, lax, modal
Context: sport, professional, interview, free conversation, man-machine dialogue
Speaking Rate: normal, slow, fast, very fast
Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
Video
Frames comprise the video
– frame rate = number of frames per second; the delay between successive frames is its reciprocal
– minimal change between frames
Sequencing creates the illusion of movement
– > 16 fps is “smooth”
– Standards: is NTSC, 25 is PAL, 60 is HDTV
Interlacing
Display scan rate is different
– monitor refresh rate – Hz (= 1/s)
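The relation between frame rate and inter-frame delay, and its effect on data volume, can be sketched as follows. The 640x480 frame size and 3 bytes per pixel are assumptions chosen for illustration, not figures from the slide, and both function names are hypothetical.

```python
def frame_interval_ms(fps):
    """Delay between successive frames, in milliseconds (reciprocal of the rate)."""
    return 1000.0 / fps


def raw_video_bytes_per_second(width, height, bytes_per_pixel, fps):
    """Data rate of uncompressed video at the given frame size and frame rate."""
    return width * height * bytes_per_pixel * fps


print(frame_interval_ms(25))  # 40.0 (PAL)
print(frame_interval_ms(16))  # 62.5 (the "smooth" threshold)

# Assumed frame: 640x480 pixels, 3 bytes per pixel, at 25 fps.
print(raw_video_bytes_per_second(640, 480, 3, 25))  # 23040000
```

At roughly 23 MB per second uncompressed, the "minimal change between frames" noted above is what makes video compression both necessary and effective.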
Captured vs. Synthetic
Animation vs video
Graphics vs pictures
Synthesizer vs recording
Storage? Manipulation? Processor requirements?
Fidelity to real world
Hybrids are possible
Why is Multimedia Important?
Our society
– captures its experience,
– records its accomplishments,
– portrays its past,
– informs its masses
... in pictures, audio and video
For many, CNN has become the “publication of record”
Multimedia learning leverages “multiple intelligences” (Gardner, 1993)
Multimedia digital libraries are an essential component of
– formal, informal, and professional learning
– distance education, telemedicine
Technology Push vs Market Pull
– Home Entertainment
– Catalog Ordering
– Multimedia Training, Education
– Videoconferencing
– Professional Video Services
– Videomail
– Speech Recognition
Hype vs. Reality
What is feasible, under what circumstances?
What is possible?
What is impossible?
What is unlikely?
Multimedia Visions
DARPA: Dominate the Battle Space
HP “1995”
LSI “Flash Point”
HP “Synergies”
Intro to Multimedia
That’s all for today