1
Carnegie Mellon
2
Carnegie Mellon Multimedia Michael Christel Alex Hauptmann Rong Jin (TA) http://www.cs.cmu.edu/~alex/mmCourse
3
Carnegie Mellon How to get in touch with us
Mike Christel
– christel@cs.cmu.edu
– http://www.cs.cmu.edu/~christel
– (412)268-7799 or x8-7799
– WeH5212
Alex Hauptmann
– alex@cs.cmu.edu
– http://www.cs.cmu.edu/~alex
– (412)268-1448 or x8-1448
– WeH5124
– Office Hours by Appointment
4
Carnegie Mellon Teaching Assistant
Rong Jin
– jin+@andrew.cmu.edu
– Office WeH5316
– Office hours by appointment
– (412)268-4050 or x8-4050
5
Carnegie Mellon Course Outline, Part 1 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
– October 22: Intro to Multimedia
– October 25: Multimedia Enabling Technologies, Macromedia Flash Intro and Demo
– October 29: Sound Processing, Speech Recognition
– November 1: Digital Video Creation and Transmission
– November 5: Speech Synthesis
6
Carnegie Mellon Course Outline, Part 2 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
– November 8: Image Processing
– November 12: Digital Music and Music Processing
– November 15: Multimedia Internet Protocols, SMIL
– November 19: Synthetic Interviews: A Multimedia Company (Experiences from the Field)
– November 22: Programming for Interactive Multimedia (CGI Scripts/ASP)
7
Carnegie Mellon Course Outline, Part 3 of 3
More details at www.cs.cmu.edu/~alex/mmCourse
– November 29: Content Analysis and Coding of Digital Audio and Video, Multimedia Storage and Retrieval Management
– December 3: Video Retrieval Evaluation and Testing, Multimedia Interface Design, Digital Libraries
– December 6: Visual Design, Multimedia Interface Design Guidelines, Multimedia use in the future (Experience on Demand)
– December 10: Multimedia as Entertainment Technology, Virtual Reality
9
Carnegie Mellon Homeworks
See http://www.cs.cmu.edu/~alex/mmCourse
– 9 homeworks planned, 10 points each
– One hard homework will be worth 20 points
– No final, no midterm
– Publish homeworks on your web page and email us the URL
– Space?
10
Carnegie Mellon Today: Intro to Multimedia Apple Knowledge Navigator Vision 1988
11
Multimedia: Audio, Images, Video, Information Retrieval, Storage Systems, Networking, Psychology, HCI, Data Compression, Natural Language Processing, CPU Power
12
Carnegie Mellon Definition of Multimedia
– Multi (Latin multus: numerous)
– Media, medium (Latin medius, medium: middle, center, intermediary; Latin mediat: intermediary, means)
Multiple types of information captured, stored, manipulated, transmitted, and presented.
Specifically: Images, Video, Audio (+ Speech) and Text
13
Carnegie Mellon Definition of Multimodal
– Multi (Latin multus: numerous)
– Modal (Latin modus: manner)
Traditionally refers to input/output formats:
– Input: sounds, speech (mike), gestures (camera, tablet), eye-gaze (camera), mouse, keyboard
– Output: sounds, speech, video, pictures, animations, text
14
Carnegie Mellon Perceived Information Physical Variables
Sound is a waveform
An image is a waveform
– light is electromagnetic radiation with different intensity in spatial coordinates
– color corresponds to wavelength
15
Carnegie Mellon History of Multimedia I
Analog signals to sensors
– E.g. vinyl records
– Fidelity is faithfulness to the original
Digital representation (‘60s)
– Sampling
– Quantizing
– Coding
– codec, modem (A/D and D/A)
16
Carnegie Mellon Hardware Advances CPU Bus Network I/O Keyboard, Mouse Disk Mike + A/D Board Camera + A/D Board Speakers (+ D/A Board) Display
17
Carnegie Mellon History of Multimedia II Analog controls only Special hardware (Displays, Scanners, FFTs) Integrated hardware components Further Integration Other devices
18
Carnegie Mellon History of Multimedia III Limiting Factors: Storage Limits CPU Speeds I/O Speeds Network Bandwidth
19
Carnegie Mellon Why Digital?
Universal storage, transmission format
– CD, internet
Precision (range of values, number of bits, floating point)
Lossless transmission/storage
BUT:
– sampling at a finite rate and precision loses information from the original analog signal
– size requirements may be ‘large’ compared to analog
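To make the "size requirements" point concrete, here is a minimal sketch of the arithmetic for uncompressed (PCM) audio. The helper name `pcm_bytes` is hypothetical, and the parameter values are assumptions not taken from the slide: CD audio at 44.1 kHz, 16 bits, stereo, and telephone-style audio at 8 kHz, 8 bits, mono.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size in bytes of uncompressed PCM audio (hypothetical helper)."""
    total_bits = sample_rate_hz * bits_per_sample * channels * seconds
    return total_bits // 8


# Assumed parameters: CD audio is 44.1 kHz, 16-bit, 2 channels.
cd_minute = pcm_bytes(44_100, 16, 2, 60)
# Assumed parameters: telephone-style audio at 8 kHz, 8-bit, mono.
phone_minute = pcm_bytes(8_000, 8, 1, 60)

print(cd_minute)      # 10584000 bytes, roughly 10 MB per minute
print(phone_minute)   # 480000 bytes, under 0.5 MB per minute
```

The same product of rate, precision, and channel count applies to any sampled signal, which is why the sampling choices drive the storage and bandwidth limits.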
20
Carnegie Mellon Digitization Process
– Sampling from an analog signal
– Sampling errors relate to signal frequencies
– Quantization errors
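The two error sources can be illustrated with a small sketch. It assumes a unit-amplitude sine as the "analog" signal and a uniform quantizer; the function names `sample` and `quantize` and all numeric values are illustrative choices, not part of the course material.

```python
import math


def sample(freq_hz, sample_rate_hz, n):
    """Take n samples of a unit-amplitude sine of freq_hz at sample_rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz) for k in range(n)]


def quantize(x, bits):
    """Uniformly quantize x in [-1, 1] to 2**bits levels, mapped back to [-1, 1]."""
    step = 2.0 / (2 ** bits - 1)
    return round((x + 1.0) / step) * step - 1.0


# Sampling error (aliasing): a 7 Hz sine sampled at 10 Hz yields the same
# samples as a sign-flipped 3 Hz sine, because 7 Hz is above half the rate.
high = sample(7, 10, 10)
low = sample(3, 10, 10)
aliased = all(abs(h + l) < 1e-9 for h, l in zip(high, low))

# Quantization error: bounded by half a quantization step, so it shrinks
# as the number of bits grows.
signal = sample(3, 1000, 1000)
max_err_8 = max(abs(s - quantize(s, 8)) for s in signal)
max_err_4 = max(abs(s - quantize(s, 4)) for s in signal)
```

Sampling error depends on how the signal's frequencies compare with the sampling rate, while quantization error depends only on the number of bits per sample.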
21
Carnegie Mellon Text
ASCII, Unicode
Formatted Text, Rich Text
Document Formats:
– Structured: TeX, HTML
– Page Descriptions: PostScript, PDF
22
Carnegie Mellon Graphics
Objects
– circles, splines, rectangles, lines
Editable
– resize, reshape, move, colorize
Synthetic
23
Carnegie Mellon Images (Pictures)
Fixed digitized representation
– bitmap, colors per pixel
Editable in limited ways
– retouch, cut and paste, remap colors, filter [Photoshop tools]
– no ‘model’ of the thing
Captured
– not just from real life: clip art, screen dump
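As a rough sketch of what "bitmap, colors per pixel" and "remap colors" mean in practice, the example below treats an image as a grid of palette indices. The helper names, the tiny 3x2 image, and the 640x480 at 24 bits per pixel figure are all illustrative assumptions.

```python
def bitmap_bytes(width, height, bits_per_pixel):
    """Storage needed for an uncompressed bitmap, in bytes (hypothetical helper)."""
    return width * height * bits_per_pixel // 8


def remap_colors(bitmap, color_map):
    """Return a new bitmap with each pixel value replaced via color_map.

    Pixels whose value is not in color_map are left unchanged.
    """
    return [[color_map.get(pixel, pixel) for pixel in row] for row in bitmap]


# A tiny 3x2 image whose pixels are palette indices (no 'model' of the scene).
image = [
    [0, 0, 1],
    [1, 2, 2],
]
swapped = remap_colors(image, {1: 2, 2: 1})
print(swapped)                      # [[0, 0, 2], [2, 1, 1]]
print(bitmap_bytes(640, 480, 24))   # 921600
```

Edits like this act on pixel values only. Unlike a graphics object, the bitmap has no notion of the shapes it depicts, so resizing or reshaping "the thing" is not directly possible.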
24
Carnegie Mellon Audio
Sounds
– hear 15 Hz to 20 kHz
– Speech is 50 Hz to 10 kHz
Speech Recognition
– “It is hard to wreck a nice beach” (sounds like “It is hard to recognize speech”)
– “Ice cream” vs. “I scream”
Synthesis
– Speech
– Music
  MIDI for 128 instruments, 47 percussion sounds (General MIDI)
  Notes, timing
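MIDI stores notes and timing, not waveforms. The sketch below shows one common way to turn that description into physical quantities. It assumes twelve-tone equal temperament with A4 (note 69) tuned to 440 Hz, and represents a melody as hypothetical `(start_beat, note, duration_beats)` tuples, which is a simplification and not the actual MIDI file format.

```python
def midi_to_hz(note):
    """Frequency of a MIDI note number, assuming A4 (note 69) = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def beats_to_seconds(beats, tempo_bpm):
    """Convert a length in beats to seconds at the given tempo."""
    return beats * 60.0 / tempo_bpm


# Hypothetical melody: (start_beat, note_number, duration_beats).
melody = [(0, 60, 1), (1, 64, 1), (2, 67, 2)]

# Render each event as (start_seconds, frequency_hz, duration_seconds) at 120 bpm.
rendered = [
    (beats_to_seconds(start, 120), midi_to_hz(note), beats_to_seconds(length, 120))
    for start, note, length in melody
]
```

Because only note numbers and timings are stored, the same data can be played back on any of the synthesizer's instruments.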
25
Carnegie Mellon Speech Recognition Issues
– Continuous vs Discrete
– Vocabulary Size
– Channel (Microphone)
– Environment (Location of mike and Speaker)
– Speaker Dependent/Speaker Independent
– Context (Language Model)
– Interactivity (Dialog Model)
26
Carnegie Mellon Speech Recognition Knowledge Sources
– Acoustic Modeling: describes the sounds that make up speech
– Lexicon: describes which sequences of speech sounds make up valid words
– Language Model: describes the likelihood of various sequences of words being spoken
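To illustrate the language-model knowledge source, here is a toy bigram model with add-one smoothing. It is only a sketch under stated assumptions: the three-sentence corpus is invented for the example, the function names are hypothetical, and real recognizers use far larger corpora and combine this score with the acoustic model and lexicon.

```python
import math
from collections import Counter

# Invented training text for the example.
corpus = [
    "it is easy to recognize speech",
    "we recognize speech with a computer",
    "it is hard to wreck a nice beach",
]


def train_bigrams(sentences):
    """Count word histories and word pairs, with <s> marking sentence start."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        vocab.update(words[1:])
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return histories, bigrams, vocab


def log_prob(sentence, histories, bigrams, vocab):
    """Log-likelihood of a word sequence under the add-one-smoothed bigram model."""
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (histories[prev] + len(vocab)))
    return total


histories, bigrams, vocab = train_bigrams(corpus)
likely = log_prob("recognize speech", histories, bigrams, vocab)
unlikely = log_prob("speech recognize", histories, bigrams, vocab)
```

With this corpus, `likely` is greater than `unlikely`: the model prefers the word order it has seen, which is how a language model helps choose among acoustically similar hypotheses.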
27
Carnegie Mellon Speech Variations
– Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
– Voice Quality: breathy, creaky, whispery, tense, lax, modal
– Context: sport, professional, interview, free conversation, man-machine dialogue
– Speaking Rate: normal, slow, fast, very fast
– Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
28
Carnegie Mellon Video
Frames comprise the video
– Frame rate = number of frames shown per second; the delay between successive frames is 1 / frame rate
– minimal change between frames
Sequencing creates the illusion of movement
– > 16 fps is “smooth”
– Standards: 29.97 fps is NTSC, 25 fps is PAL, 60 fps is HDTV
Interlacing
Display scan rate is different
– monitor refresh rate
– 60 - 70 Hz (Hz = 1/s)
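Frame rate and the delay between successive frames are reciprocals, and the frame rate also sets the raw data rate. The sketch below works through both. The helper names are hypothetical, and the 640x480 frame size at 24 bits per pixel is an assumed example, not a figure from the slide.

```python
def frame_interval_ms(fps):
    """Delay between successive frames in milliseconds (1 / frame rate)."""
    return 1000.0 / fps


def raw_video_bytes_per_second(width, height, bits_per_pixel, fps):
    """Data rate of uncompressed video in bytes per second."""
    return width * height * bits_per_pixel / 8 * fps


pal_interval = frame_interval_ms(25)       # 40.0 ms between PAL frames
ntsc_interval = frame_interval_ms(29.97)   # about 33.37 ms between NTSC frames

# Assumed frame size: 640x480 pixels, 24 bits per pixel, at the PAL rate.
pal_rate = raw_video_bytes_per_second(640, 480, 24, 25)
print(pal_rate)  # 23040000.0 bytes per second, about 23 MB/s uncompressed
```

Rates of this size are why the small change between successive frames matters: it is what makes video compressible.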
29
Carnegie Mellon Captured vs. Synthetic Animation vs Video Graphics vs Pictures Synthesizer vs Recording Storage? Manipulation? Processor Requirements? Fidelity to real world Hybrids are possible
30
Carnegie Mellon Why is Multimedia Important?
Our society
– captures its experience,
– records its accomplishments,
– portrays its past,
– informs its masses
… in pictures, audio and video
For many, CNN has become the “publication of record”
Multimedia learning leverages “multiple intelligences” (Gardner, 1993)
Multimedia digital libraries are an essential component of
– formal, informal, and professional learning
– distance education, telemedicine
31
Carnegie Mellon Technology Push vs Market Pull
– Home Entertainment
– Catalog Ordering
– Multimedia Training, Education
– Videoconferencing
– Professional Video Services
– Videomail
– Speech Recognition
32
Carnegie Mellon Hype vs. Reality What is feasible, under what circumstances? What is possible? What is impossible? What is unlikely?
33
Carnegie Mellon Multimedia Visions DARPA: Dominate the Battle Space HP “1995” LSI “Flash Point” HP “Synergies”
34
Carnegie Mellon Intro to Multimedia That’s all for today