Harmony in Motion. Zohar Barzelay, Yoav Y. Schechner. Dept. Elect. Eng., Technion – Israel Institute of Technology. Ack: Einav Namer, Yael Waissman, ISF.

Presentation transcript:

1 Harmony in Motion. Zohar Barzelay, Yoav Y. Schechner. Dept. Elect. Eng., Technion – Israel Institute of Technology. Ack: Einav Namer, Yael Waissman, ISF

2 Violin-guitar: raw

3 Violin: Detected and Recovered

4 Guitar: Detected and Recovered

5 Video features: track all, find the best

6 Finding an Audio-Visual Object (AVO)

7 Spatial matching: many “coincidences”
Corresponding images?
* Always: unmatched features
* Good image match: many “coincidences”
* Spatial edges

8 Spatial matching
* Feature-based
* Feature = significant change in space: edge, corner
* Maximize coincidences
* No need to match everything
Audio-Visual matching
* Feature-based
* Feature = significant change in time: temporal edge
* Maximize coincidences
* No need to match everything

9 Feature-based Cross-Modal Matching

10 Feature-based Cross-Modal Matching [Figure: acceleration vs. time in frames]
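The slide plots the acceleration of a tracked feature over time, with visual onsets read off as significant changes. A minimal sketch of that idea follows, assuming a feature track given as one (x, y) position per frame. The second finite difference, the `threshold` parameter, and the local-maximum rule are illustrative assumptions, not the authors' detector.

```python
import math

def acceleration_magnitude(track):
    """Second finite difference of a 2D trajectory; endpoints are left at 0."""
    acc = [0.0] * len(track)
    for t in range(1, len(track) - 1):
        ax = track[t + 1][0] - 2 * track[t][0] + track[t - 1][0]
        ay = track[t + 1][1] - 2 * track[t][1] + track[t - 1][1]
        acc[t] = math.hypot(ax, ay)
    return acc

def visual_onsets(track, threshold):
    """Frames where acceleration is a local maximum above `threshold`.

    Both the threshold and the local-maximum rule are assumed for illustration.
    """
    acc = acceleration_magnitude(track)
    return [t for t in range(1, len(acc) - 1)
            if acc[t] > threshold and acc[t] >= acc[t - 1] and acc[t] > acc[t + 1]]

# Hypothetical track: the feature rests for frames 0-4, then moves right
# by one pixel per frame.
track = [(0.0, 0.0)] * 5 + [(float(k), 0.0) for k in range(1, 6)]
onsets = visual_onsets(track, threshold=0.5)
```

In this toy track the only nonzero acceleration occurs at the frame where the motion starts, so that frame is the single visual onset.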

11 Feature-based Cross-Modal Matching [Figure: binary (0/1) ‘Visual Onsets’ and ‘Audio Onsets’ vs. t; audio amplitude vs. t]

12 Audio-Visual Coincidences

13 Audio Pre-processing [Figure: amplitude vs. t; Fourier transform F; energy vs. frequency; spectrogram, frequency vs. t]

14 Significant change in audio. Audio onsets: beginning of new sounds. [Figure: spectrogram, frequency vs. t; temporal derivative vs. t]
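Audio onsets are described here as the beginning of new sounds, found from the temporal derivative of the spectrogram. The sketch below shows one way that could look, assuming the spectrogram is already available as a list of frames. Keeping only positive changes (energy increases) and the `threshold` value are assumptions made for illustration; the slide does not specify them.

```python
def onset_strength(spectrogram):
    """Sum over frequency of the positive temporal derivative.

    spectrogram[t][f] is the energy at time frame t and frequency bin f.
    Only increases in energy are counted (an assumed choice).
    """
    strength = [0.0]
    for t in range(1, len(spectrogram)):
        rise = sum(max(cur - prev, 0.0)
                   for cur, prev in zip(spectrogram[t], spectrogram[t - 1]))
        strength.append(rise)
    return strength

def audio_onsets(spectrogram, threshold):
    """Frames whose onset strength exceeds `threshold` (hypothetical parameter)."""
    return [t for t, s in enumerate(onset_strength(spectrogram)) if s > threshold]

# Toy spectrogram, 6 frames x 3 bins: one sound starts in bin 0 at frame 2,
# another starts in bin 2 at frame 4 while the first one decays.
spec = [
    [0, 0, 0],
    [0, 0, 0],
    [5, 0, 0],
    [5, 0, 0],
    [4, 0, 6],
    [3, 0, 6],
]
onsets = audio_onsets(spec, threshold=1.0)
```

The decay of the first sound does not register as an onset because negative changes are discarded.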

15 Handling pitch-drift

16 Handling pitch-drift [Figure: spectrogram with directional derivative; spectrogram with non-directional derivative]

17 Visual Matching [Figure: binary (0/1) onset signals vs. t]

18 Visual Matching [Figure: amplitude vs. t]

19 Ranking Criterion: coincidences, inconsistencies [Figure: binary (0/1) onset signals vs. t]
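The ranking criterion weighs coincidences against inconsistencies between a visual feature's onsets and the audio onsets. A minimal sketch follows, assuming onsets are given as frame indices. The exact score is not stated on the slide, so "coincidences minus inconsistencies" and the `tolerance` parameter are assumptions, and the feature names are hypothetical.

```python
def count_matches(visual_onsets, audio_onsets, tolerance):
    """Return (coincidences, inconsistencies) for one visual feature.

    A visual onset is a coincidence if some audio onset lies within
    `tolerance` frames of it, and an inconsistency otherwise.
    """
    coincidences = 0
    for v in visual_onsets:
        if any(abs(v - a) <= tolerance for a in audio_onsets):
            coincidences += 1
    return coincidences, len(visual_onsets) - coincidences

def rank_features(features, audio_onsets, tolerance=1):
    """Sort feature names by an assumed score: coincidences minus inconsistencies."""
    def score(name):
        c, i = count_matches(features[name], audio_onsets, tolerance)
        return c - i
    return sorted(features, key=score, reverse=True)

audio = [10, 25, 40, 55]
features = {
    "string": [10, 26, 40, 54],   # every onset is within one frame of an audio onset
    "background": [3, 17, 40],    # one coincidence, two inconsistencies
}
ranking = rank_features(features, audio)
```

Penalizing inconsistencies keeps a feature that moves constantly from ranking high merely because some of its many onsets happen to fall near audio onsets.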

20 Residual Audio Onsets [Figure: coincidences and residual onsets vs. t]

21 Sequential Object Detection [Figure: amplitude vs. t; residual onsets vs. t]
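Sequential detection reuses the residual audio onsets, those not explained by the object found so far, to look for the next object. The sketch below is a greedy loop under stated assumptions: the feature with the most matches is picked each round, and the `min_matches` stopping rule, the `tolerance` value, and the feature names are all hypothetical.

```python
def matched_audio(visual_onsets, audio_onsets, tolerance):
    """Audio onsets that have a visual onset within `tolerance` frames."""
    return [a for a in audio_onsets
            if any(abs(a - v) <= tolerance for v in visual_onsets)]

def detect_objects(features, audio_onsets, tolerance=1, min_matches=2):
    """Greedy sketch: pick the feature that explains the most residual audio
    onsets, remove those onsets, and repeat on what is left."""
    residual = list(audio_onsets)
    remaining = dict(features)
    detected = []
    while residual and remaining:
        best = max(remaining,
                   key=lambda n: len(matched_audio(remaining[n], residual, tolerance)))
        explained = matched_audio(remaining[best], residual, tolerance)
        if len(explained) < min_matches:
            break  # assumed stopping rule: nothing explains enough onsets
        detected.append((best, explained))
        residual = [a for a in residual if a not in explained]
        del remaining[best]
    return detected, residual

audio = [5, 12, 20, 31, 44, 50, 60]
features = {
    "violin": [5, 20, 44],
    "guitar": [12, 31, 50],
    "noise": [8],
}
detected, residual = detect_objects(features, audio)
```

Here the first round attributes three onsets to one instrument, the second round attributes three of the remaining onsets to the other, and one onset stays unexplained.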

22 Speech: raw

23 Speech A-B-C: Detected & Recovered

24 Speech 1-2-3: Detected & Recovered

25 Audio Isolation

26 Audio Pre-processing [Figure: amplitude vs. t; Fourier transform F; energy vs. frequency; spectrogram, frequency vs. t]

27 Audio Isolation: corresponding onsets, harmonic sounds [Figure: spectrogram, frequency vs. t]

28 Fourier representation [Figure: amplitude vs. t; Fourier transform F; energy vs. frequency and phase vs. frequency; spectrogram, frequency vs. t]

29 Filtered audio [Figure: spectrogram, frequency vs. t; energy vs. frequency combined with the old phase vs. frequency; inverse transform F^-1; amplitude vs. t]
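The filtered-audio slide combines the retained spectral energy with the old (original) phase and applies the inverse Fourier transform. A minimal single-frame sketch of that step follows. It uses a plain DFT from the standard library rather than a full short-time transform with overlapping windows, and the `keep_bins` selection is a hypothetical stand-in for whatever time-frequency region is attributed to the detected object.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n))
            for f in range(n)]

def idft(spectrum):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[f] * cmath.exp(2j * math.pi * f * k / n)
                for f in range(n)).real / n
            for k in range(n)]

def isolate(frame, keep_bins):
    """Keep the magnitude only in `keep_bins` and their mirror bins, reuse the
    original phase, and transform back to the time domain."""
    n = len(frame)
    spectrum = dft(frame)
    keep = set(keep_bins) | {(n - b) % n for b in keep_bins}
    filtered = []
    for f, c in enumerate(spectrum):
        magnitude = abs(c) if f in keep else 0.0
        filtered.append(cmath.rect(magnitude, cmath.phase(c)))
    return idft(filtered)

# Toy mixture of two sinusoids that fall exactly on DFT bins 2 and 7.
n = 32
low = [math.sin(2 * math.pi * 2 * k / n) for k in range(n)]
high = [0.5 * math.sin(2 * math.pi * 7 * k / n) for k in range(n)]
mixture = [a + b for a, b in zip(low, high)]
recovered = isolate(mixture, keep_bins=[2])
```

Because the two components occupy separate bins in this toy case, keeping bin 2 recovers the low component up to rounding error. Real sounds that overlap in time and frequency would not separate this cleanly.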

30 Limitations: Temporal Tolerance. ¼ sec [Figure: binary (0/1) onset signals vs. t]

31 Limitations: Audio Sparsity
* Time-frequency overlap
* Overlapping audio onsets
* Sounds may overlap in time; onsets should not
[Figure: frequency vs. t]

32 Detection Parameters. Visual edges. Feature detection:
* Edge scale
* Significance level
* Pruning
[Figure: acceleration vs. time; binary (0/1) onsets vs. t]

33 Dual Violin

36 Feature-based Cross-Modal Association
* Features: temporal audio/visual edges
* Simultaneous objects + sounds
* A general concept