Invariance and context
Nothing in the real world is interpreted on its own.
– Low-level vision: completely different energy spectra are interpreted as the same colour under different illuminations
– Or the same energy spectra are interpreted as different colours, even in different parts of an image
– This is colour constancy.

What is context?
Sensory contexts / sensory fusion:
– Lifting a coffee cup: locating it, moving a hand to it, gripping it hard enough not to drop it, but not so hard that it breaks
– Visual, proprioceptive, and tactile sensing all need to be fused
– This is usually thought of in terms of motor programs that have been learned, but is this a good description?
Contextual modulation is clearly present at every level of neural description:
– E.g. Phillips and Kay at the synapse level
– The McGurk effect at the sensory level
– Difficulty recognising someone out of context

What is invariance?
Adjusting perception to reality:
– Interpreting as the same things which are the same, even when they appear different
– Invariant visual perception under varying illumination, varying distance (size), and differing orientation
– Invariant auditory perception under varying loudness and varying reverberation levels, and in the presence of background "noise"

Why are context & invariance so important?
For real-world interaction (for pure physics we would want the opposite):
– Objects produce characteristic reflectivity patterns, characteristic sounds, …
– We want to identify them correctly in varying visual and auditory environments
– Critically important for both animal and synthetic intelligent systems

How is invariance implemented?
Invariance is tied to specific modalities (invariance under …), and in animal perception it is often implemented at the sensor.
Visually: the retinal architecture
– A great deal of processing happens on the retina, before the signal is transmitted to the cortex
– Invariance under varying illumination, both overall and local changes

… and in audition
– Processing takes place in brainstem nuclei, before transmission to the auditory midbrain and thence to the cortex
– Very similar across a large range of animals
– Invariance under level and reverberation changes
– Also sound source separation
– …

An aside: on onsets
An interesting feature, "defined" as a sudden increase in energy in some part(s) of the spectrum. There are many different ways of detecting them (see the sketch below):
– Of interest in music recognition: note starts, drumbeats
– And in speech segmentation: voicing onset, sibilance onset
– (Reasonably) immune to degradation by reverberation, i.e. invariant under reverberation
– Very clearly enhanced in the auditory brainstem: multipolar and octopus cells in the cochlear nucleus, and many others as well
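To make the "sudden increase in energy" definition concrete, here is a toy spectral-flux onset detector in Python. This is only a sketch of the generic idea, not the spiking auditory model used below; the function name and the parameter values (frame, hop, k) are illustrative assumptions.

```python
import numpy as np

def detect_onsets(signal, sr, frame=512, hop=256, k=2.0):
    """Toy onset detector: flag frames where spectral energy rises
    sharply relative to a local average (spectral flux)."""
    n_frames = 1 + (len(signal) - frame) // hop
    win = np.hanning(frame)
    spec = np.array([np.abs(np.fft.rfft(win * signal[i * hop:i * hop + frame]))
                     for i in range(n_frames)])
    # Sum of positive spectral differences: energy increases only.
    flux = np.maximum(np.diff(spec, axis=0), 0.0).sum(axis=1)
    # Adaptive threshold: k times a short moving average of the flux.
    thresh = k * np.convolve(flux, np.ones(10) / 10.0, mode="same")
    return (np.where(flux > thresh)[0] + 1) * hop / sr  # onset times (s)
```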

Onsets and offsets
[Figure: onset/offset responses for the original signal and with room reverb]

The bio-inspired front end
[Figure]

Musical instrument class problem: detail
– Five classes: brass, reed, bowed string, plucked string, struck string
– 417 sound descriptors per class (2085 examples in total)
– Train the classifier with 1460 examples (292 per class)
– Test classifier success using 625 unseen examples (125 per class)
– Classifier model, two strategies: (1) spiking auditory model, (2) cepstral coefficients
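The split is simple to reproduce; a minimal sketch in Python, assuming the descriptors are already grouped by class (all names here are hypothetical, not from the original code):

```python
import numpy as np

def split_per_class(descriptors_by_class, n_train=292, seed=0):
    """Per class: 292 of the 417 examples for training,
    the remaining 125 for testing."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for label, examples in enumerate(descriptors_by_class):
        idx = rng.permutation(len(examples))
        train += [(examples[i], label) for i in idx[:n_train]]
        test += [(examples[i], label) for i in idx[n_train:]]
    return train, test
```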

Issues: coding the signal (biomorphic)
The onset spikes are in N_channels * N_sensitivity_levels trains:
– Each of which has few spikes, far fewer than the auditory nerve (AN) code
– Difficult to use directly in a classifier: too many spike trains, memory constraints
They are therefore recoded (see the sketch below) by:
– Binning in 2 ms bins
– Recoding the N_sensitivity_levels spikes using pulse height modulation
– This gives an (onset interval / 2 ms) * N_channels vector
– We call this the Initial Onset Fingerprint Time Series (IOFTS)
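A minimal sketch of this recoding in Python, under the assumptions stated above (2 ms bins, pulse height proportional to sensitivity level); the function name and data layout are illustrative, not the original implementation:

```python
import numpy as np

def iofts(spike_times, spike_levels, n_channels, n_levels,
          duration_s, bin_ms=2.0):
    """Build an Initial Onset Fingerprint Time Series.

    spike_times  : per-channel arrays of onset spike times (seconds)
    spike_levels : matching per-channel arrays of sensitivity-level
                   indices (0 .. n_levels - 1)
    Returns an (n_bins, n_channels) array: spikes are binned into
    2 ms bins, with sensitivity level recoded as pulse height.
    """
    bin_s = bin_ms / 1000.0
    n_bins = int(np.ceil(duration_s / bin_s))
    out = np.zeros((n_bins, n_channels))
    for ch in range(n_channels):
        for t, lvl in zip(spike_times[ch], spike_levels[ch]):
            b = min(int(t / bin_s), n_bins - 1)
            # Pulse height modulation: keep the largest level seen in
            # the bin, scaled to (0, 1].
            out[b, ch] = max(out[b, ch], (lvl + 1) / n_levels)
    return out
```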


Coding the signal (cepstral)
Comparison technique:
– Use standard techniques for coding audio signals, from the speech identification community
– Cepstral techniques (MFCCs)
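The slides do not give the exact cepstral pipeline; a minimal sketch of extracting a 15-coefficient MFCC descriptor over a whole note, using the librosa library (the choice of librosa and the time-averaging are assumptions; 15 coefficients matches the 15-input MLP described below):

```python
import librosa
import numpy as np

def mfcc_descriptor(path, n_mfcc=15):
    """One fixed-length cepstral descriptor per recorded note."""
    y, sr = librosa.load(path, sr=None)            # keep native sample rate
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.mean(m, axis=1)                      # average over the whole note
```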

Classification technique (recogniser)
For the IOFTS signal an echo state network (ESN) was used:
– Appropriate for processing/classifying time series
– Jaeger et al.'s implementation was used
For the cepstral system, a back-propagated neural network (MLP) was used:
– A 15:100:5 MLP (WEKA software)
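For illustration, a minimal echo state network classifier in Python: a fixed random reservoir driven by the IOFTS frames, with a linear readout trained by ridge regression. This is a sketch of the general ESN idea, not Jaeger et al.'s implementation, and all sizes and constants are assumptions.

```python
import numpy as np

class ESN:
    def __init__(self, n_in, n_res=200, n_out=5, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        W = rng.uniform(-0.5, 0.5, (n_res, n_res))
        # Scale the reservoir weights for the echo state property.
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W, self.W_out, self.n_out = W, None, n_out

    def _run(self, X):
        """Drive the reservoir with a (n_frames, n_in) time series
        and return the final reservoir state."""
        x = np.zeros(self.W.shape[0])
        for u in X:
            x = np.tanh(self.W_in @ u + self.W @ x)
        return x

    def fit(self, series, labels, ridge=1e-2):
        S = np.stack([self._run(X) for X in series])
        Y = np.eye(self.n_out)[labels]             # one-hot targets
        # Closed-form ridge-regression readout.
        self.W_out = np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]),
                                     S.T @ Y)

    def predict(self, series):
        S = np.stack([self._run(X) for X in series])
        return np.argmax(S @ self.W_out, axis=1)
```

Reading out only the final reservoir state keeps the sketch short; averaging reservoir states over time is a common alternative for classification.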

Results 1: on the McGill dataset
[Figure: confusion matrices for onset fingerprint coding vs. MFCC coding (whole note)]
Results are very comparable, but the confusions differ.

Results 2: train on McGill, test on the Iowa dataset
[Figure: confusion matrices for onset fingerprint vs. MFCC]
Note that the onset technique holds up much better on a differently recorded dataset.