The Chinese University of Hong Kong Department of Computer Science and Engineering Lyu0202 Advanced Audio Information Retrieval System.

Network-based AdvAIR System
The system consists of a client side and a server side.
Client side, in two parts:
  - Advanced part: audio data mining; audio data retrieval and indexing
  - Basic part: audio streaming from the server side
Server side:
  - Audio streaming and searching on the server

Advanced Part of the AdvAIR System
Audio data mining:
  - Segmentation
  - Recognition engine
  - Segmentation with speaker recognition
Audio retrieval and indexing:
  - Query by humming
  - Pattern matching
  - Search on server

Audio Data Mining – Recognition Engine
The engine provides three functions:
  - Speaker recognition: open-set system with 10 models and 1 general model
  - Language recognition: closed-set system with 3 models (Cantonese, English, Mandarin)
  - Gender recognition: closed-set system with 2 models (male and female)
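The difference between the open-set and closed-set engines can be sketched as follows. The slides do not say how the general model is used, so this assumes it acts as a background model: a speaker is accepted only if the best speaker model beats the general model by some margin. The function names and the `margin` parameter are hypothetical.

```python
def decide_closed_set(scores):
    """Closed-set decision: the answer is always one of the known models."""
    return max(scores, key=scores.get)


def decide_open_set(speaker_scores, general_score, margin=0.0):
    """Open-set decision: return the best speaker, or None for "unknown".

    speaker_scores: dict mapping speaker name -> score (higher is better)
    general_score:  score of the general model on the same audio
    margin:         hypothetical threshold the best speaker must exceed
    """
    best = max(speaker_scores, key=speaker_scores.get)
    if speaker_scores[best] - general_score > margin:
        return best
    return None  # no known speaker fits better than the general model
```

With a closed-set engine such as language or gender recognition, an input that matches none of the models is still assigned to the nearest one; the open-set variant can reject it instead.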

Audio Data Mining - Segmentation
[Figure: segments labeled Group 1, Group 2, Group 3]

The Bayesian Information Criterion (BIC) is used to determine the acoustic change points of the input MPEG file:
  1. Input an MPEG file.
  2. Extract the features.
  3. Use the BIC to calculate the change points.
  4. Output a list of segments cut at the acoustic change points.
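A minimal sketch of a BIC change-point test over a window of feature frames is given below. It is an illustration, not the project's implementation: it assumes one Gaussian with diagonal covariance per segment (the slides do not state the covariance type), and the names `delta_bic`, `find_change_point`, `lam` and `min_size` are hypothetical. A positive delta-BIC at frame `i` means that two separate Gaussians explain the window better than one, even after the penalty for the extra parameters.

```python
import math


def log_det_diag(frames):
    """Log-determinant of the diagonal covariance of a list of frames."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        column = [f[d] for f in frames]
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / n
        total += math.log(max(var, 1e-10))  # floor avoids log(0)
    return total


def delta_bic(frames, i, lam=1.0):
    """BIC gain from splitting frames into frames[:i] and frames[i:]."""
    n, dim = len(frames), len(frames[0])
    # A diagonal Gaussian has 2 * dim parameters (mean and variance).
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return (0.5 * n * log_det_diag(frames)
            - 0.5 * i * log_det_diag(frames[:i])
            - 0.5 * (n - i) * log_det_diag(frames[i:])
            - penalty)


def find_change_point(frames, min_size=5, lam=1.0):
    """Return the split index with the largest positive delta-BIC, or None."""
    best_i, best_val = None, 0.0
    for i in range(min_size, len(frames) - min_size + 1):
        val = delta_bic(frames, i, lam)
        if val > best_val:
            best_i, best_val = i, val
    return best_i
```

Running `find_change_point` repeatedly over successive windows would yield the list of segment boundaries described above.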

Audio Data Mining – Recognition Engine
[Figure: input MPEG -> extract features -> calculate a score for each trained model -> select the most suitable model]

Audio Data Mining – Recognition Engine
The engine uses Gaussian Mixture Models (GMMs):
  - Text-independent, robust, computationally efficient
  - 256 mixtures per model
  - Requires pre-processing (training)
Steps:
  1. Input an MPEG file.
  2. Extract the features.
  3. Calculate a score for each model and select the model with the best score.
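The scoring step can be sketched as below, assuming the score is the average per-frame log-likelihood under each trained GMM and that the mixtures use diagonal covariances; neither detail is stated on the slides. The representation of a model as a list of `(weight, mean, variance)` tuples and the function names are hypothetical, and training is not shown.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one frame under a GMM [(weight, mean, var), ...]."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp keeps the sum numerically stable
    return top + math.log(sum(math.exp(t - top) for t in terms))


def score(frames, gmm):
    """Average log-likelihood of all frames under one model."""
    return sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)


def select_model(frames, models):
    """Score every model and return (best_name, all_scores)."""
    scores = {name: score(frames, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Toy usage with 2 mixtures per model instead of 256.
models = {
    "male": [(0.5, [0.0], [1.0]), (0.5, [1.0], [1.0])],
    "female": [(0.5, [5.0], [1.0]), (0.5, [6.0], [1.0])],
}
best, scores = select_model([[5.2], [5.9], [6.1]], models)
```

Because every frame is scored independently of its neighbours, the result does not depend on what is being said, which is what makes the approach text-independent.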

Audio Data Mining – Segmentation with Speaker Recognition
Automatic speaker recognition engine:
  1. Segment the input.
  2. Send each segment to the speaker recognition engine.
  3. Output a list of segments in which the speaker of each segment is known.
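The three steps above amount to cutting the frame sequence at the change points and passing each piece to the recognizer. A small sketch, in which `identify` is a hypothetical hook standing in for the speaker recognition engine and `label_segments` is an illustrative name:

```python
def label_segments(frames, change_points, identify):
    """Cut frames at the change points and label each segment.

    frames:        list of feature vectors for the whole file
    change_points: sorted frame indices where the acoustics change
    identify:      callable(segment_frames) -> speaker label (hypothetical)
    Returns a list of (start, end, speaker) tuples, end exclusive.
    """
    bounds = [0] + list(change_points) + [len(frames)]
    result = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:  # skip empty segments
            result.append((start, end, identify(frames[start:end])))
    return result
```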

Speaker Identification Process
[Figure: segment groups (Group 1, Group 2, Group 3) labeled with speakers; the labels shown are Speaker 1, Speaker 2, Speaker 3, Speaker 2, Speaker 1, Speaker 2]

Audio Retrieval and Indexing - Query by Humming
First step:
  - Track the pitch of the input audio clip using the time-domain autocorrelation function (ACF).
  - Track the trend of the input audio clip as “Up”, “Down” or “Same”.
  - Intermediate output: a file consisting of a list of “Up”, “Down” and “Same” symbols.
Second step:
  - Match the intermediate output of the input clip against the intermediate output of each audio clip in the database using largest substring matching, and calculate a score.
Last step:
  - List the audio clips in the database ordered by score.
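The second and last steps can be sketched as follows. This reads "largest substring matching" as the longest common substring of two contour strings, which is an assumption, as is the score used here (match length divided by query length). The names `longest_common_substring` and `rank_clips` and the clip names are hypothetical.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by sequences a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the run ending at i-1, j-1
                best = max(best, cur[j])
        prev = cur
    return best


def rank_clips(query, database):
    """Return [(clip_name, score), ...] sorted by score, best first.

    query:    contour string of the hummed input, e.g. "UUDS"
    database: dict mapping clip name -> contour string
    """
    scored = [
        (name, longest_common_substring(query, contour) / len(query))
        for name, contour in database.items()
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


ranking = rank_clips("UUDS", {"song_a": "SUUDSD", "song_b": "DDUUD", "song_c": "SSSS"})
# ranking == [("song_a", 1.0), ("song_b", 0.75), ("song_c", 0.25)]
```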

[Figure: hummed song -> pitch tracker -> intermediate representation -> largest substring matching against the intermediate database]

Pitch tracker
Tracks the pitch of the hummed voice and converts it into a representation of the relative change in pitch.
E.g. Do Me Fa So Fa Re Me becomes U U U D D U
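A rough sketch of the two halves of the pitch tracker: estimating the pitch of one frame from the peak of its time-domain autocorrelation, then turning a pitch sequence into U/D/S symbols. The search range (`fmin`, `fmax`), the `tolerance` parameter, the function names and the scale-degree table are assumptions for illustration; a real tracker would also need framing and silence handling, which are not shown.

```python
import math


def acf_pitch(frame, sample_rate, fmin=80.0, fmax=800.0):
    """Estimate the pitch in Hz of one frame from its autocorrelation peak."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_val = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag if best_lag else 0.0  # 0.0: no pitch found


def pitch_contour(pitches, tolerance=0.0):
    """Map consecutive pitch changes to "U" (up), "D" (down) or "S" (same)."""
    symbols = []
    for prev, cur in zip(pitches, pitches[1:]):
        if cur - prev > tolerance:
            symbols.append("U")
        elif prev - cur > tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return symbols


# The slide's example, using scale degrees in place of measured pitches.
DEGREE = {"Do": 1, "Re": 2, "Me": 3, "Fa": 4, "So": 5}
notes = ["Do", "Me", "Fa", "So", "Fa", "Re", "Me"]
contour = "".join(pitch_contour([DEGREE[n] for n in notes]))  # "UUUDDU"
```

Only the direction of change is kept, so the representation does not depend on the key in which the user hums.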

Audio Retrieval and Indexing – Direct Audio Search
First step:
  - A covariance matrix is calculated from the feature vectors of the cue-audio and of a clip in the database.
Second step:
  - The AHS (arithmetic harmonic sphericity) distance between the two covariance matrices gives a score.
Last step:
  - List the audio clips in the database ordered by score.
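The slides do not give the formula, so the sketch below uses the commonly cited form of the arithmetic-harmonic sphericity measure for two D x D covariance matrices X and Y: log(tr(Y X^-1) * tr(X Y^-1)) - 2 log D. It is zero when the matrices are equal and grows as they diverge. The helper names are hypothetical, and the small Gauss-Jordan inverse is only there to keep the example self-contained.

```python
import math


def covariance(frames):
    """Covariance matrix of a list of feature vectors."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
             for j in range(dim)] for i in range(dim)]


def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def trace_of_product(a, b):
    """Trace of the matrix product a @ b."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def ahs_distance(cov_x, cov_y):
    """Arithmetic-harmonic sphericity distance between two covariances."""
    dim = len(cov_x)
    t1 = trace_of_product(cov_y, inverse(cov_x))
    t2 = trace_of_product(cov_x, inverse(cov_y))
    return math.log(t1 * t2) - 2.0 * math.log(dim)
```

A lower distance means a closer match, so the database clips would be listed in ascending order of `ahs_distance` against the cue-audio.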

[Figure: AHS comparison between the source clip and target clips of the same size]

Audio Retrieval and Indexing – Search on Server
Direct audio search on the server:
  - The server side has a database.
  - The client connects to the server.
  - The client selects a cue-audio and uploads it to the server.
  - The server performs the direct audio search and sends back the result.
  - The client can use audio streaming to get the result file.

Basic Part - Audio Streaming
  - AdvAIR is an N-to-N system: it allows N servers and N clients.
  - Clients and servers can be added at any time.
  - It is fault tolerant.

Basic Part – Server Side
The server side has two parts:
  - Audio streaming
  - Searching on the server (direct search on server)
The parts are separated because searching on the server uses a lot of resources, and a server cannot process requests for too many users at the same time.
Only privileged users are allowed to use the search-on-server function.

Basic Part – Client Side
  - When a client requests a download, the audio clip is divided into many small parts.
  - Each server sends a small part to the client simultaneously to speed up the download.
  - The client combines all the small parts to form the whole file.
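The split-and-recombine download can be sketched as below. The slides do not describe the network protocol, so servers are simulated in memory by a hypothetical `FakeServer` class, and the round-robin assignment with fallback to the next server is one assumed way of getting fault tolerance. Only `ThreadPoolExecutor` from the Python standard library is real API here.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeServer:
    """In-memory stand-in for a streaming server (hypothetical)."""

    def __init__(self, data, alive=True):
        self.data = data
        self.alive = alive

    def read(self, start, end):
        if not self.alive:
            raise ConnectionError("server unavailable")
        return self.data[start:end]


def fetch_part(servers, first, start, end):
    """Fetch bytes [start, end), trying servers in order from index first."""
    for k in range(len(servers)):
        server = servers[(first + k) % len(servers)]
        try:
            return server.read(start, end)
        except ConnectionError:
            continue  # fall back to the next server
    raise ConnectionError("no server could deliver the part")


def download(servers, size, part_size):
    """Download a clip of the given size in parts from several servers."""
    ranges = [(s, min(s + part_size, size)) for s in range(0, size, part_size)]
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [
            pool.submit(fetch_part, servers, i % len(servers), s, e)
            for i, (s, e) in enumerate(ranges)
        ]
        parts = [f.result() for f in futures]  # kept in request order
    return b"".join(parts)


clip = bytes(range(256)) * 4
servers = [FakeServer(clip), FakeServer(clip, alive=False), FakeServer(clip)]
restored = download(servers, len(clip), part_size=100)
```

In this toy run the second server is down, and the parts assigned to it are served by the third instead.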

The End
