Education and Research in the Center for Signal and Image Processing http://www.eedsp.gatech.edu/
CSIP Summary: Our Ph.D. graduates have impact worldwide in DSP education and research. Distinguished faculty: 17 faculty (7 IEEE Fellows, 2 National Academy members); co-authors of over 25 books on DSP and its applications. Over 80 current Ph.D. students. Located in the GCATT building with excellent, modern facilities. Support from the Georgia Research Alliance has provided outstanding, well-equipped labs.
Beowulf Cluster 26 dual processors 1 Gbyte memories
CSIP Faculty: Yucel Altunbasak, David V. Anderson, Thomas P. Barnwell, Mark A. Clements, Faramarz Fekri, Monson H. Hayes, Joel R. Jackson, Fred Juang, Aaron Lanterman, Chin Lee, Vijay K. Madisetti, Francois Malassenet, James H. McClellan, Russell M. Mersereau, Ronald W. Schafer, Douglas B. Williams, G. Tong Zhou
Past and Present Funding. Industry: Texas Instruments, Intel, BAE Systems, Hewlett-Packard, Mathworks, National Semiconductor, Analog Devices, Lucent, Harris, Hughes, Prentice-Hall. Federal: NSF, U.S. Army, DARPA, ONR, NASA, MPO. State: Georgia Research Alliance. Private Foundation: John and Mary Franklin Foundation. Total Funding: Current funding from government and industry totals about $6.5M.
Current Research Areas - I. Speech Processing: robust automatic speech recognition; new architectures for speech recognition; high-quality low-bit-rate speech coding for voice over IP; blind separation of speech signals. Audio Signal Processing: music analysis and synthesis; compressed-domain processing of audio. Acoustic Signal Processing: noise and reverberation removal; microphone array processing; spatialization.
Current Research Areas - II. Video Signal Processing: target tracking in video; video streaming with error concealment and MDC; graphics streaming for the Internet; automated analysis of video; video indexing for smart VCR; super-resolution of video; face recognition; video compression. Image Processing: image-based graphical rendering; image interpolation for digital color cameras.
Current Research Areas - III. Multimedia & Multi-modal Signal Processing: “Intelligent Environments”; automatic storage/retrieval of speech and audio; audio-visual speech recognition; speech-driven facial animation; application of multimedia processing in education. Communications Signal Processing: chaos in wireless communication systems; space-time coding and OFDM; compensation for selective fading effects; finite field wavelet transforms and applications to error control coding and cryptography; compensation of nonlinear power amps.
Current Research Areas - IV. Signal Modeling: multi-scale sinusoidal modeling. Biological Signal Processing: automated measurement and modeling of behavior in biological systems. Military Signal Processing: buried mine detection using GPR, seismic & EMI; target tracking in sensor networks; hyperspectral imaging and target classification; SAR imaging. Medical Signal Processing: segmentation of cardiac MRI images. DSP for hand-held communication devices.
Industrial Partnership Examples. Texas Instruments Leadership Univ. Program: members with MIT and Rice U.; seven projects (7 faculty and 7 Ph.D. students) covering wireless video, CFA interpolation, speech coding, speech recognition, chaotic systems, face recognition, and MIMO communication systems. Hewlett Packard Laboratories: four faculty and six students; focus on PDAs (low-power analog front-ends, structured audio, applications in education); also 3D video for video conferencing and an HP Labs researcher in residence.
Linearization of RF Power Amplifiers G. Tong Zhou, J. Stevenson Kenney Power amplifiers (PAs) are inherently nonlinear. Desire: high-efficiency PAs, leading to low cost. Downside of high efficiency: high nonlinearity. Nonlinearity causes (1) high bit error rate and (2) adjacent channel interference, which must satisfy FCC requirements. Approach: DSP-based predistortion linearization. Challenging issue: memory nonlinear effects in high power amplifiers (e.g., base station PAs). The Indirect Learning Architecture adapts to changing characteristics. [Figure: RF testbed]
Indirect Learning Architecture [Block diagram, including an A/D stage] Advantage: No need to model or identify the PA.
8-Tone Test Result 8-tone, 1.2 MHz signal; Siemens CGY0819 dual-band PA. [Spectrum plot] Purple: without PD; green: with memoryless PD (K=7); cyan: with memory polynomial PD (K=7, Q=10). 35 dB of spectral regrowth suppression with memory polynomial PD.
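The indirect learning idea can be sketched in a few lines: a memory polynomial post-inverse is fitted by least squares from the PA output back to the PA input, and its coefficients are then copied into the predistorter, so the PA itself is never modeled. This is only an illustrative sketch, not the authors' implementation. The toy amplifier `toy_pa`, the random test signal, and the small orders K=5, Q=2 are assumptions chosen to keep the example short (the slide's result uses K=7, Q=10 on a measured PA).

```python
import numpy as np

def delay(x, q):
    """Delay x by q samples, zero-padding the start."""
    return np.concatenate([np.zeros(q, dtype=x.dtype), x[:len(x) - q]])

def mp_basis(x, K, Q):
    """Memory polynomial basis: columns x(n-q) * |x(n-q)|^(k-1), k=1..K, q=0..Q."""
    cols = []
    for q in range(Q + 1):
        xq = delay(x, q)
        for k in range(1, K + 1):
            cols.append(xq * np.abs(xq) ** (k - 1))
    return np.column_stack(cols)

def toy_pa(x):
    """Hypothetical PA (assumption): mild cubic compression plus one-sample memory."""
    g = x - 0.1 * x * np.abs(x) ** 2
    return g + 0.1 * delay(g, 1)

def nmse(y, ref):
    """Normalized mean squared error of y with respect to ref."""
    return np.sum(np.abs(y - ref) ** 2) / np.sum(np.abs(ref) ** 2)

rng = np.random.default_rng(0)
n = 4000
# Complex baseband test signal with amplitude below 0.8 (assumption).
x = 0.8 * rng.random(n) * np.exp(2j * np.pi * rng.random(n))

K, Q = 5, 2
y = toy_pa(x)

# Indirect learning: fit the post-inverse (PA output -> PA input) by least squares.
coeffs, *_ = np.linalg.lstsq(mp_basis(y, K, Q), x, rcond=None)

# Copy the same coefficients into the predistorter and drive the PA with its output.
z = mp_basis(x, K, Q) @ coeffs
err_raw = nmse(toy_pa(x), x)   # PA alone
err_pd = nmse(toy_pa(z), x)    # predistorter followed by PA

print(f"NMSE without PD: {err_raw:.2e}, with memory polynomial PD: {err_pd:.2e}")
```

In an adaptive setting the least-squares fit would be repeated as new PA input/output records arrive, which is how the architecture tracks changing PA characteristics.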
Video Resolution Enhancement Y. Altunbasak and R. Mersereau Future broadcasting will be all digital. High definition displays will dominate the market. However, most programming is expected to be in SDTV format. There is a clear need and technical opportunity to design systems that enhance the quality of the SDTV signal so that it matches the quality and capabilities of high definition displays. [Diagram labels: SDTV, NTSC, Set, PC, Multi-frame Spatial Video Resolution Enhancement, PC Monitor, HDTV]
Applications - Digital Cameras [Figure: multiple successive pictures (JPEG format) and the reconstructed high-resolution picture] Also applicable to high-quality printing from video sources such as DVD players, set-top boxes, TV sets, software MPEG players, and camcorders. Requires a resolution-enhancing print driver.
Face Recognition Monson H. Hayes The major problems are lighting and pose variations.
Results and Next Step We have developed a new face recognition system based on a segmented linear subspace model. It is robust to varying illumination and tolerant of different poses, has recognition accuracy (>99%) equaling or exceeding other state-of-the-art systems, and has a fraction of the complexity. Next Step: Face Recognition from Video. Face detection (patent awarded). Pose detection (find the best frontal view). Face recognition (robust to varying illumination, poses, and facial expressions). The Intriguing Question: How can we incorporate the multitude of images that are extracted from video to enhance the recognition system?
Finite Field Wavelet Transforms F. Fekri and D. Williams Goal: Establish a new research field that brings together researchers from signal processing, error control coding, data security, and multicarrier signaling systems. I would like to conclude my talk with this slide, which summarizes my research plan: the finite field wavelet plays a central role in coding, data security, and multiuser access, and combines them under a unifying theory. Thus I intend to propose a program that will systematically explore these application areas, and I encourage and welcome collaboration with other faculty members. [Diagram labels: Finite Field Wavelets, Error Control Coding, OFDM Modulation, Security Coding]
New Research Directions in Data Security [Figure: row-wise and column-wise decomposition into LL, LH, HL, and HH subbands] New Research Directions in Error Control Coding
Passive Radar Systems Aaron Lanterman Exploit “illuminators of opportunity” such as commercial TV and FM radio broadcasts for covert operation. [Block diagram labels: Passive Radar System; Target Tracking (Positions, Velocities); Radar Imaging; Radar Cross Section; Target Classification; Signature Prediction via Computational EM; Target Library]
Imaging With 100.0 on Your FM Dial [Figure panels for the Falcon-100 and VFY-218 targets: Target Shape; Formatted Raw Data; Image Formed Via Processing]
Detection of Obscured Targets Jim McClellan & Waymond Scott Landmines: No single sensor has proven capable of reliable detection across many types of “targets.” Can multiple sensors be used cooperatively to produce a system with robust performance? A three-sensor experiment: Electromagnetic Induction (EMI) sensor, Ground Penetrating Radar (GPR) sensor, and seismic sensor. Multimodal processing: imaging & inversion; cooperative fusion of multiple sensors.
EMI Sensor and GPR Tx Rx EMI Sensor: 0.6 - 60 kHz GPR: 500 MHz – 8 GHz Physical Properties of Target Permittivity Contrast Low Conductivity (Dielectric) High Conductivity (Metal) Mechanical Contrast EMI No Weak Yes GPR Yes* Seismic 4.5”
Seismic Sensor: Surface Waves Man-made items often resonate
Comparison of EMI, GPR and Seismic Responses: VS-1.6, 6.5 cm deep [Figure: response plots; axes x, y, depth, t]
Comparison of EMI, GPR & Seismic Responses: Uncrushed Aluminum Can, 2 cm deep [Figure: response plots; axes x, y, depth, t]
Cooperative Analog/Digital Signal Processing D. Anderson and P. Hasler Target: Complex signal processing functionality with extremely low power. Approach: Perform substantial amounts of the processing in programmable analog VLSI. [Diagram comparing two signal chains. Blocks in the conventional chain: Real world (analog), A/D Converter, DSP Processor, Computer (digital). Blocks in the cooperative chain: Real world (analog), ASP IC, Specialized A/D, DSP Processor, Computer (digital)]
Cooperative Analog/Digital Signal Processing Advantages of CADSP: better problem “fit”; orders of magnitude improvement in power consumption / efficiency; simpler A/D converter requirements; smaller size. Current applications include: audio noise suppression; audio source localization / beam-steering; focal plane image / video processing; speech recognition; field programmable analog processor arrays.
Digital Media Asset Management Mark Clements Sam Nunn Archives: Cooperative effort between CSIP, IMTC, GT and Emory Libraries. Fast searching of audio based on phonetic content. Typical speed of search: 72,000x real time (20 hours of content searched in 1 elapsed second). Basis for the startup company Fast-Talk, which has received over $10M in venture funding. New results demonstrate rapid searching of music by lyrics and melodies using the same approach. The speech part can be the basis for voice-mail management, data mining, call center monitoring and alerting, market research, and distance learning tools. The music part can be used for indexing content by melody, detecting copyright infringement, and accessing music by “humming a tune.”
An Integrated Auditory-Cognitive Model [Block diagram. Input: speech. Blocks: Auditory Model; Neural Transduction Model; 3-D Cortical Representation; Cortical Scene Analysis (Phonemic Detection); Cortical Scene Analysis (Phonological Tracking); Multi-target tracking; Language Model; Sound Units; Segment Units; Semantics & Schema; Reinforcement; Syntactic & Semantic Analysis (error correcting); Re-generation. Outputs: Understanding results; Enhancement results; Recognition results]
Immersive Telecollaboration Presentation: capturing, transmission, and reconstruction of audio and visual information (conventional view); projection and rendering of the interaction in a 3-dimensional space (virtual view). Participation: coexistence of all participants in a shared virtual space (“shared reality”); control and manipulation of shared virtual objects (“virtual collaboration” for hands-on experience).
Perceptual Spatialization Sound spatialization makes talker-tracking easier in multi-party conferencing environments, resulting in more effective communication. Binaural Hearing & Cocktail Party Effect: Spatial separation plays a role (compare mono with stereo). Stream segregation also plays a role (compare one talker (m1+m2) with two (m1+f2)). Stereophonic Conferencing Demonstration [Demo clips: m1, m2, f2]
Multi-channel Source Separation [Diagram: mixing (room impulse responses) produces x1, x2 from the sources; un-mixing filters W11, W12, W21, W22 produce s’1, s’2] One possible approach (Ikram of Georgia Tech and Morgan of Bell Labs): $x = H s$, $R_x = x x^H$, $s' = W x$. Find the un-mixing filter matrix $W$ such that $R_{s'} = W R_x W^H$ is diagonalized, by minimizing the squared Frobenius norm of the off-diagonal part of $R_{s'}$.
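The diagonalization criterion can be illustrated for the simplified case of an instantaneous (single-tap) mixture. This is a minimal sketch rather than the Ikram/Morgan algorithm itself: the mixing matrix `H`, the source powers `D`, and the function name `offdiag_cost` are assumptions made for illustration, and only the cost evaluation is shown, not the optimization over the un-mixing matrix.

```python
import numpy as np

def offdiag_cost(W, R_x):
    """Squared Frobenius norm of the off-diagonal part of W R_x W^H."""
    R_s = W @ R_x @ W.conj().T
    off = R_s - np.diag(np.diag(R_s))
    return np.linalg.norm(off, "fro") ** 2

# Hypothetical 2x2 instantaneous mixing matrix and uncorrelated source powers.
H = np.array([[1.0, 0.6],
              [0.4, 1.0]])
D = np.diag([2.0, 0.5])

# Covariance of the mixtures x = H s when the sources are uncorrelated.
R_x = H @ D @ H.conj().T

cost_identity = offdiag_cost(np.eye(2), R_x)        # no un-mixing
cost_unmixed = offdiag_cost(np.linalg.inv(H), R_x)  # ideal un-mixing

print(f"cost with W = I: {cost_identity:.3f}")
print(f"cost with W = inv(H): {cost_unmixed:.3e}")
```

The cost is also zero for the trivial solution W = 0, so any practical minimization needs a constraint or normalization on W.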
Sound Source Localization 1. Time Delay Estimation 2. Source Location Estimation. Various methods: triangulation (solve a set of hyperbolic equations); spherical intersection (solve a set of linearized spherical equations); spherical interpolation (similar to spherical intersection, but with a reduced constraint); one-step least squares (transforms the problem into an estimation/minimization problem; works the best). Applications: conferencing with participant tracking; improved sound and sight pickup. Developed at Bell Labs & Georgia Tech. [Figure labels: talker; Further challenge]
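The first step, time delay estimation, can be sketched with a plain cross-correlation peak search. This is an illustrative assumption about how the delay might be estimated, not the method developed at Bell Labs and Georgia Tech; the signals `mic1` and `mic2`, the function name `estimate_delay`, and the noise-free setting are all hypothetical. The second step (solving for the source location from the delays) is not shown.

```python
import numpy as np

def estimate_delay(a, b):
    """Return the lag, in samples, by which signal a is delayed relative to b."""
    corr = np.correlate(a, b, mode="full")
    # Index 0 of the full correlation corresponds to lag -(len(b) - 1).
    return int(np.argmax(corr)) - (len(b) - 1)

rng = np.random.default_rng(1)
s = rng.standard_normal(500)   # stand-in for a talker's signal
true_delay = 7                 # samples

# The second microphone receives the same signal 7 samples later.
mic1 = np.concatenate([s, np.zeros(true_delay)])
mic2 = np.concatenate([np.zeros(true_delay), s])

lag = estimate_delay(mic2, mic1)
print("estimated delay in samples:", lag)
```

Dividing the lag by the sampling rate gives the time difference of arrival that the location-estimation methods take as input.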
Low Complexity Rate-Distortion Optimal Coding Mode Selection Hyungjoon Kim and Yucel Altunbasak Proposed Rate-Distortion Model: $D = \sigma^2 e^{-\alpha R}$, where $R$ is the rate for DCT coefficients, $D$ is the distortion, $\sigma$ is the standard deviation, and $\alpha$ is the model parameter. Based on the generalized Gaussian R-D model. Calculation of D has low computational complexity. Adaptive model parameter. Model-based Mode Selection: for the candidate modes (Mode 0, Mode 1, ..., Mode N), calculate the R-D cost and select the minimum-cost Mode k as the best mode. Provides 10-15% bit-rate savings. Patented, licensed, and commercialized. Experimental Results: 1. Distortion-based coding mode selection gives low compression efficiency, especially at low bit-rates. 2. We developed low-complexity, model-based R-D optimal coding mode selection. 3. The model-based approach improves coding efficiency and visual quality significantly, with a small increase in computation over TM5. [Figure comparison: R-D model-based approach (Proposed) versus distortion-based approach (TM5+Rho)]
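The mode-selection loop can be sketched as follows, assuming the distortion model has the exponential form D = σ² e^(−αR) and that the R-D cost is the usual Lagrangian J = D + λR. Both are assumptions here: the slide says only "R-D cost calculation", and the candidate values, `alpha`, and `lam` are made-up numbers for illustration, not the authors' parameters.

```python
import math

def model_distortion(sigma, rate, alpha):
    """Model-based distortion estimate D = sigma^2 * exp(-alpha * R)."""
    return sigma ** 2 * math.exp(-alpha * rate)

def select_mode(candidates, alpha, lam):
    """Return (best mode index, list of costs) using the assumed cost J = D + lam * R.

    candidates: list of (sigma, rate) pairs, one per coding mode.
    """
    costs = [model_distortion(sigma, rate, alpha) + lam * rate
             for sigma, rate in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs

# Hypothetical candidates: (residual standard deviation, rate for DCT coefficients).
candidates = [(10.0, 1.0), (6.0, 2.0), (4.0, 4.0)]
best, costs = select_mode(candidates, alpha=1.0, lam=5.0)

for k, cost in enumerate(costs):
    print(f"Mode {k}: cost = {cost:.2f}")
print("Best mode:", best)
```

Because D comes from a closed-form model, no candidate mode has to be fully encoded and decoded to measure its distortion, which is where the low complexity comes from.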
R-D Optimized Multi-Server Streaming Ali C. Begen and Yucel Altunbasak [Diagram labels: Client, Server] Goal: Develop media-aware and network-adaptive packet delivery and error recovery mechanisms for multipoint-to-point networks. Approach: Client-driven rate-distortion optimized streaming. Suitable for: multi-homed clients, wireless systems, CDNs.
Mobile Video Streaming Umut Demircin and Yucel Altunbasak [Figure labels: Video Rate; Bandwidth Available; Error Propagation and Frame Freeze] Challenges: fluctuating wireless channel error rate and bandwidth; video error propagation. Solution approaches: video- and channel-aware FEC code rate and link-layer ARQ adaptation; rate reduction and error-resiliency video transcoding; R-D optimized packet scheduling; diversity methods.
Video Resolution Enhancement Yucel Altunbasak [Figure: a sequence of limited dynamic range images and the composite image with higher dynamic range and resolution] Compressed-domain resolution enhancement. Bit-depth and contrast enhancement. Resolution enhancement for FACE video. Three patents, one licensed.
Hyper-Spectral Super-Resolution [Figure panels: Panchromatic, Multi-spectral, Hyper-spectral] Hyper-spectral images offer huge amounts of data. The spectrum is sampled at more than 200 wavelengths. Spatial resolution is the key parameter in many related applications. To improve spatial resolution we combine (1) a precise physical model of the imaging process and (2) the intrinsic low dimensionality of hyper-spectral data. The result is an efficient and noise-robust super-resolution (SR) method. [Figure comparison: Bilinear interpolation; Separate band SR; Our method]
Demosaicing Yucel Altunbasak [Diagram labels: Scene, Optical system, CFA, Sensors] Digital cameras use a single sensor array with a color filter array (CFA) to sample different spectral components. At each pixel location, only one color sample is taken, and the other colors must be interpolated. This color plane interpolation is known as demosaicing.
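As a concrete illustration of color plane interpolation, the sketch below fills in the missing green samples by averaging the available horizontal and vertical neighbors. It assumes a Bayer-type CFA in which green sits on the checkerboard positions where row + column is odd; the slide does not specify the CFA layout, and this plain bilinear rule is a textbook baseline, not the authors' demosaicing method.

```python
import numpy as np

def interpolate_green(mosaic):
    """Bilinear estimate of the full green plane from a single-sensor mosaic.

    Assumes green samples lie where (row + col) is odd (Bayer-type layout).
    """
    h, w = mosaic.shape
    rows, cols = np.indices((h, w))
    is_green = (rows + cols) % 2 == 1

    # Zero-pad so border pixels can be handled with the same neighbor sums.
    g = np.pad(np.where(is_green, mosaic, 0.0), 1)
    m = np.pad(is_green.astype(float), 1)

    # Sum and count of the up, down, left, and right green neighbors.
    neighbor_sum = g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
    neighbor_cnt = m[:-2, 1:-1] + m[2:, 1:-1] + m[1:-1, :-2] + m[1:-1, 2:]
    estimate = neighbor_sum / np.maximum(neighbor_cnt, 1.0)

    # Keep measured green samples; use the neighbor average elsewhere.
    return np.where(is_green, mosaic, estimate)

# Gray horizontal ramp: every color channel equals the column index.
ramp = np.tile(np.arange(6, dtype=float), (6, 1))
green = interpolate_green(ramp)
print(green[1:-1, 1:-1])
```

On smooth regions like this ramp the interior is recovered exactly; the interesting cases for more advanced demosaicing are edges and fine texture, where plain averaging produces color artifacts.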
Advanced Collaborative Systems Fred Juang and Ghassan AlRegib 3D Collaboration System with a Shared Virtual Space. Shared Reality: Allows the virtual world to coexist and interact seamlessly with the real world. Sensors capture users’ motions in the real world, and these motions control objects in the virtual world synchronously. Smart Objects: A new data structure creates smart objects and makes them efficient to use. Details Count: The multimodal micro-tool supports dexterous, real-time control of remote virtual objects. Speech-enabled commands facilitate micro-level manipulation and control. [Diagram of the current system built at GT: participant in one location; Display; 3D Networking; Realtime Registration; Kinematics Control]
Distributed Processing & Communications Protocols for Distributed Sensor Systems Ghassan AlRegib Topics: distributed detection; parameter estimation (which sensors send and what they send); communication protocols (how sensors communicate with each other). [Diagram labels: Analog Waveform; Digitized Observations/Decisions; Data Models; Application; Data Processing; Data Communication] Distributed Parameter Estimation, System Overview: Distributed sensors observe, quantize, and transmit their observations. The fusion center performs the final estimation based on the received messages. Goal: Minimize the estimation MSE under a constrained total bit rate.
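The observe, quantize, transmit, and fuse pipeline can be sketched with a toy simulation. Everything specific here is an assumption for illustration: a scalar parameter `theta`, Gaussian sensor noise, a uniform quantizer on [-2, 2], and a fusion center that simply averages the received values. The actual bit-allocation optimization (which sensors send and how many bits each uses under the total rate constraint) is not shown.

```python
import numpy as np

def quantize(x, bits, lo=-2.0, hi=2.0):
    """Uniform quantizer with 2**bits levels on [lo, hi]; returns cell midpoints."""
    levels = 2 ** bits
    step = (hi - lo) / levels
    idx = np.clip(np.floor((x - lo) / step), 0, levels - 1)
    return lo + (idx + 0.5) * step

def fuse(theta, n_sensors, bits, noise_std, rng):
    """Each sensor observes theta in noise and sends a quantized value;
    the fusion center averages the received messages."""
    obs = theta + noise_std * rng.standard_normal(n_sensors)
    return float(np.mean(quantize(obs, bits)))

rng = np.random.default_rng(2)
theta = 0.3   # hypothetical parameter to be estimated

est_coarse = fuse(theta, n_sensors=20, bits=1, noise_std=0.1, rng=rng)  # 20 bits total
est_fine = fuse(theta, n_sensors=20, bits=6, noise_std=0.1, rng=rng)    # 120 bits total

print(f"1 bit per sensor:  estimate = {est_coarse:.3f}")
print(f"6 bits per sensor: estimate = {est_fine:.3f}")
```

With the total bit rate fixed, the design question on the slide is how to trade the number of reporting sensors against the resolution of each report; this sketch only shows the two ends of that trade for a fixed number of sensors.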
Bit Allocation for Textured 3D Models Ghassan AlRegib Target: Best display quality of the 3D model during progressive streaming. Approach: Optimal bit allocation between geometry and texture in the bitstream. [Figure: Original model; Case I; Case II]
Streaming Meshes over Lossy Networks Ghassan AlRegib [Diagram: sender and receiver of 3D data over a best-effort service with packet losses and delay constraints] Interactive 3D applications: large amounts of data; real-time interactivity; high-resolution visualization. Technical problems: packet losses; stringent delay constraints; best-effort network (bandwidth bottleneck, congestion…). Server side: multi-resolution compression. Network side: FEC-based packet loss protection (RS: Reed-Solomon code); feedback-based retransmission; congestion control. From: 28.35 dB To: 37.76 dB
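The benefit of Reed-Solomon packet protection can be sketched with a short calculation. An RS(n, k) erasure code lets the receiver rebuild a block of k data packets as long as no more than n - k of the n transmitted packets are lost. The independent loss model, the loss rate, and the block sizes below are assumptions for illustration and are not taken from the reported experiment.

```python
from math import comb

def block_recovery_prob(n, k, p):
    """Probability that at most n - k of n packets are lost,
    assuming independent losses with probability p per packet."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n - k + 1))

loss_rate = 0.1   # hypothetical packet loss probability

no_fec = block_recovery_prob(10, 10, loss_rate)     # 10 data packets, no parity
with_fec = block_recovery_prob(12, 10, loss_rate)   # same data plus 2 parity packets

print(f"P(block recovered), no FEC:    {no_fec:.3f}")
print(f"P(block recovered), RS(12,10): {with_fec:.3f}")
```

Parity packets consume bandwidth that could otherwise carry mesh refinement data, so the amount of protection has to be balanced against the bottleneck bandwidth and the delay constraints listed on the slide.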
Summary The premier academic program in the country in the signal processing field is in the Georgia Tech School of Electrical and Computer Engineering. We have many outstanding graduate students (internships; long-term contributors). We have lots of outstanding technology waiting to be developed. We have a demonstrated capability to work with industry. Contact jim.mcclellan@ece.gatech.edu if you want to come for a visit.