ECE 417 Lecture 1: Multimedia Signal Processing


ECE 417 Lecture 1: Multimedia Signal Processing
Mark Hasegawa-Johnson, 8/29/2017

Today’s Lecture
- Syllabus
- Overview of what we’ll learn this semester
- Review of important linear algebra concepts, part one
- Sample problem

Syllabus
Go read the syllabus: http://courses.engr.Illinois.edu/ece417/

Overview of what we’ll learn this semester
- What is multimedia?
- What is processing?
- The basic architecture: encoder, embedder, aligner, decoder
- Parametric learning: categories of methods, types of parameters that you can learn from data

What is multimedia? Audio, Video, and Images
Defining feature #1: very big
- Speech audio: 16,000 samples/second
- Multimedia audio: 44,100 samples/second (often downsampled to 22,050!)
- Image: 6M pixels ~ 2000x3000 (often downsampled to 224x224)
- Video: 640x480 pixels/image, 30 frames/second = 9,216,000 real numbers per second
Why “very big” matters: you can’t train a neural net to observe a whole image.
- Rule of 5: there must be 5 training examples per trainable parameter.
- A two-layer neural net with 1024 hidden nodes and 6M inputs has 1024 * 6,000,000 ≈ 6 billion trainable parameters.
- With 6 billion trainable parameters, you need 30 billion training examples.
Result: we need some trick to reduce the number of parameters. This semester is all about finding tricks that work (much better). A rough parameter count is sketched below.
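To make the arithmetic concrete, here is a minimal Python sketch (not from the original slides) that counts the weights of a fully connected two-layer network for a full 6M-pixel image versus a 224x224 downsampled one; the helper name and the choice to ignore bias terms are my own assumptions.

```python
# Minimal sketch of the "Rule of 5" arithmetic above (illustrative only).

def two_layer_param_count(n_inputs: int, n_hidden: int, n_outputs: int = 1) -> int:
    """Weight count of a fully-connected two-layer net (biases ignored, as on the slide)."""
    return n_inputs * n_hidden + n_hidden * n_outputs

full_image = two_layer_param_count(n_inputs=6_000_000, n_hidden=1024)
downsampled = two_layer_param_count(n_inputs=224 * 224, n_hidden=1024)

print(f"6M-pixel input : {full_image:,} parameters, "
      f"~{5 * full_image:,} training images by the Rule of 5")
print(f"224x224 input  : {downsampled:,} parameters, "
      f"~{5 * downsampled:,} training images by the Rule of 5")
```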

What is Multimedia: Defining Feature #2
Defining feature #2: variable input dimension
- Image classification (e.g., ImageNet): we like to downsample every image to 224x224 first, then train a neural net with 224*224 = 50,176 inputs.
- Audio speech recognition: no can do. Why? The length of the waveform depends on how long the person talks.
- Solution: streaming methods. “Recognition” instead of “classification” --- a variable number of inputs, and a variable number of outputs (see the framing sketch below).
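As a small illustration (my own example, not from the slides), framing two waveforms of different durations yields different numbers of feature vectors, so a classifier with a fixed input size cannot consume them directly.

```python
import numpy as np

# Illustrative only: frame two waveforms of different lengths into 25 ms windows
# with a 10 ms hop, the usual first step of a streaming speech recognizer.
sample_rate = 16_000                     # speech audio, samples/second (from the slides)
frame_len, hop = 400, 160                # 25 ms window, 10 ms hop (my assumption)

def num_frames(waveform: np.ndarray) -> int:
    """Number of fixed-size feature vectors produced from a variable-length waveform."""
    return 1 + max(0, len(waveform) - frame_len) // hop

short_clip = np.zeros(1 * sample_rate)   # 1 second of audio
long_clip = np.zeros(7 * sample_rate)    # 7 seconds of audio

print(num_frames(short_clip), "frames vs", num_frames(long_clip), "frames")
# Same feature dimension per frame, but a different *number* of frames per utterance.
```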

Processing = convert from one type of signal to another
Signal types: Image, Video, Audio, Text (variable-length sequence of symbols), Metadata (tags drawn from a predefined set)
Example conversions (from an input-by-output grid on the slide): motion prediction, animation, spoken captioning, automatic captioning, object recognition, face recognition, heat map, frame capture, summarization, shot boundary detection, lipreading/speechreading, audiovisual speech recognition, spectrogram (STFT), avatar animation, automatic speech recognition, music genre, emotion/sentiment, speaker ID, audiovisual speech synthesis, speech synthesis, sentiment, metadata search.

Basic Multimedia Architecture: EEAD
A four-stage pipeline, from input to output (a minimal code sketch follows):
- Encoder (compute features) -- operates on the input timescale, t
- Embedder (compute state)
- Aligner (time alignment) -- maps the input timescale t to the output timescale n
- Decoder (compute outputs) -- operates on the output timescale, n
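Below is a minimal, illustrative skeleton (my own sketch, not course code) of how the four EEAD stages compose; the specific functions (identity encoder, running-mean embedder, uniform-sampling aligner, linear decoder) are placeholders chosen only to make the data flow concrete.

```python
import numpy as np

def encoder(x_t: np.ndarray) -> np.ndarray:
    """Compute features on the input timescale t (placeholder: identity)."""
    return x_t

def embedder(features: np.ndarray) -> np.ndarray:
    """Compute a state/embedding per input frame (placeholder: running mean)."""
    return np.cumsum(features, axis=0) / np.arange(1, len(features) + 1)[:, None]

def aligner(states: np.ndarray, n_outputs: int) -> np.ndarray:
    """Map the input timescale t to the output timescale n (placeholder: uniform sampling)."""
    idx = np.linspace(0, len(states) - 1, n_outputs).astype(int)
    return states[idx]

def decoder(aligned: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Compute outputs on the output timescale n (placeholder: linear layer)."""
    return aligned @ W

T, D, N, K = 100, 8, 10, 3          # input frames, feature dim, output frames, output dim
x = np.random.randn(T, D)
W = np.random.randn(D, K)
y = decoder(aligner(embedder(encoder(x)), N), W)
print(y.shape)                      # (10, 3): N outputs on the output timescale
```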

Parametric Learning
- What it means: define some general family of functions, $\vec{y} = f(\vec{x}, W)$, where $\vec{x}$ is an input feature vector and $W$ is a set of learnable parameters. For example, in MP1 it will be something like $\vec{y} = [x_1 w_1, x_2 w_2, \ldots]$.
- “Learn” the parameters: choose $W$ in order to maximize performance on some training dataset (a toy numpy sketch follows).
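Here is a toy numpy sketch of the idea (my own illustration, not MP1 code): the parametric family $\vec{y} = [x_1 w_1, x_2 w_2, \ldots]$ whose weights are fit to training data by gradient descent on a squared error; the loss, learning rate, and synthetic data are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parametric family from the slide: y = [x_1*w_1, x_2*w_2, ...]
def f(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    return x * w                      # elementwise weighting of the input features

# Toy training data: the "true" weights we hope to recover (illustrative only).
D, N = 4, 200
w_true = np.array([2.0, -1.0, 0.5, 3.0])
X = rng.normal(size=(N, D))
Y = X * w_true + 0.01 * rng.normal(size=(N, D))

# "Learn" the parameters: choose w to minimize squared error on the training set.
w = np.zeros(D)
lr = 0.1
for _ in range(200):
    err = f(X, w) - Y                 # (N, D) residuals
    grad = 2 * np.mean(err * X, axis=0)
    w -= lr * grad

print(np.round(w, 2))                 # close to [ 2.  -1.   0.5  3. ]
```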

Types of Parametric Learning
- Metric Learning/Feature Learning (MP1): learn to convert input features into some new feature set that better matches human perception.
- Classifier Learning (MP2, MP3): learn to classify the input.
- Bayesian Learning (MP4, MP5): learn to estimate how likely the input is, given some assumed class label.
- Deep Learning (MP6, MP7): combines feature learning and classifier learning; each layer computes features that are used by the next layer up.

Basics of Linear Algebra
- Vector space
- Banach space, norm
- Hilbert space, inner product
- Linear transform
- Affine transform

Vector Space
A vector space is a set, closed under vector addition and scalar multiplication, that satisfies:
- Commutativity of addition
- Associativity of addition
- Additive identity element
- Additive inverses
- Compatibility of scalar multiplication with field multiplication
- Scalar multiplication identity element
- Distributivity of scalar multiplication over vector addition
- Distributivity of scalar multiplication over field addition

Banach Space, Norm
A Banach space is a normed vector space (one that is also complete in the metric induced by the norm). A norm is:
- Non-negative
- Positive definite
- Absolutely homogeneous
- Satisfies the triangle inequality
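Written out in symbols (standard definitions, not reproduced from the slide), the norm axioms are:

```latex
% Norm axioms for \|\cdot\| on a vector space, scalars \alpha from the underlying field
\begin{align*}
  \|x\| &\ge 0 && \text{(non-negative)} \\
  \|x\| = 0 &\iff x = 0 && \text{(positive definite)} \\
  \|\alpha x\| &= |\alpha|\,\|x\| && \text{(absolutely homogeneous)} \\
  \|x + y\| &\le \|x\| + \|y\| && \text{(triangle inequality)}
\end{align*}
```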

Inner Product Space
An inner product space is a vector space with a dot product (a.k.a. inner product); a Hilbert space is a Banach space whose norm comes from an inner product. An inner product is a function of two vectors that is:
- Conjugate symmetric
- Linear in its first argument
- Positive definite
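In symbols (standard definitions, following the slide’s convention of linearity in the first argument):

```latex
% Inner product axioms for \langle \cdot, \cdot \rangle on a complex vector space
\begin{align*}
  \langle x, y \rangle &= \overline{\langle y, x \rangle} && \text{(conjugate symmetric)} \\
  \langle \alpha x + \beta z, y \rangle &= \alpha \langle x, y \rangle + \beta \langle z, y \rangle && \text{(linear in the first argument)} \\
  \langle x, x \rangle &\ge 0, \quad \langle x, x \rangle = 0 \iff x = 0 && \text{(positive definite)}
\end{align*}
% The induced norm is \|x\| = \sqrt{\langle x, x \rangle}.
```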

Linear Transform
A linear transform maps one vector space to another, with the following rules:
- Homogeneity: $f(\alpha x) = \alpha f(x)$
- Superposition: $f(x + z) = f(x) + f(z)$
In finite dimensions, a linear transform can be written as a matrix multiplication, $y = Wx$.

Affine Transform
An affine transform is a linear transform plus an offset. It’s usually written as $y = Wx + b$, where $Wx$ is the linear part and $b$ is the offset (see the numerical check below).
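A small numerical illustration (my own, with arbitrary values of $W$ and $b$): the linear map $y = Wx$ satisfies superposition exactly, while the affine map $y = Wx + b$ does not unless $b = 0$.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))          # arbitrary linear part
b = rng.normal(size=3)               # arbitrary offset
x1, x2 = rng.normal(size=4), rng.normal(size=4)

linear = lambda x: W @ x
affine = lambda x: W @ x + b

# Superposition holds for the linear transform...
print(np.allclose(linear(x1 + x2), linear(x1) + linear(x2)))   # True
# ...but not for the affine transform, because the offset b is added twice on the right.
print(np.allclose(affine(x1 + x2), affine(x1) + affine(x2)))   # False (unless b == 0)
```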

Example Problem
Suppose that $\vec{x}$ is a vector of 312 image features, including the color from each of 26 different sub-images (78 features total) and the 3x3 Fourier transform of each of the 26 sub-images (234 features), for a total of 312 features per image. Suppose that $y \in \{-1, +1\}$ is its class label (either +1 or -1). Suppose that $\vec{x}_i$ and $y_i$ are training examples, for $1 \le i \le N$. From these training data, we want to learn a parametric classifier that estimates $y$ from $\vec{x}$.

How many trainable parameters?
- Method 1: $\hat{y} = \mathrm{sign}(w^T \vec{x} + b)$
- Method 2: $i^* = \underset{1 \le i \le N}{\mathrm{argmax}} \left( \vec{x}^T W \vec{x}_i \right)$, then $\hat{y} = \mathrm{sign}(y_{i^*})$
- Method 3: $i^* = \underset{1 \le i \le N}{\mathrm{argmax}} \left( \prod_{d=1}^{D} \frac{1}{\sigma_{di}} e^{-\frac{1}{2} \left( \frac{x_d - \mu_{di}}{\sigma_{di}} \right)^2} \right)$
(One possible parameter count is sketched below.)
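One way to count the parameters (my own working, not the slide’s answer key; it assumes the stored training examples in Methods 2 and 3 count as parameters only where the slide writes them with learnable symbols such as $\mu$ and $\sigma$):

```python
# Parameter counts for the three methods above (illustrative working, not official).
D = 312          # feature dimension from the example problem
N = 1000         # number of training examples (placeholder; the slide leaves N symbolic)

method1 = D + 1                  # w (D weights) plus the scalar bias b
method2 = D * D                  # the D-by-D matrix W
method3 = 2 * D * N              # a mean mu_di and a scale sigma_di per feature per example

print(method1)                   # 313
print(method2)                   # 97,344
print(method3)                   # 2*D*N = 624*N  (624,000 for N = 1000)
```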