1
The Chinese University of Hong Kong
Department of Computer Science and Engineering
Lyu0202: Advanced Audio Information Retrieval System
2
Network-based AdvAIR System
- Consists of a client side and a server side
- Client side consists of two parts:
  - Advanced part: audio data mining; audio data retrieval and indexing
  - Basic part: audio streaming from the server side
- Server side: audio streaming and searching on the server
3
Advanced Part of the AdvAIR System
- Audio data mining
  - Segmentation
  - Recognition engine
  - Segmentation with speaker recognition
- Audio retrieval and indexing
  - Query by humming
  - Pattern matching
  - Search on server
4
Audio Data Mining – Recognition Engine
Consists of three functions: speaker recognition, language recognition and gender recognition.
- Speaker recognition engine: open-set system with 10 speaker models and 1 general model
- Language recognition engine: closed-set system with 3 models (Cantonese, English, Mandarin)
- Gender recognition engine: closed-set system with 2 models (male and female)
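The difference between the open-set and closed-set engines can be sketched as a decision over per-model scores. This is a minimal sketch under assumptions: scores are taken to be average log-likelihoods (higher is better), and the general model is assumed to act as an "unknown speaker" reference that the best speaker model must beat by a hypothetical `margin`. The slides do not specify how the general model is used.

```python
from typing import Dict

def closed_set_decision(scores: Dict[str, float]) -> str:
    """Closed set (language, gender): the input must belong to one of the
    models, so simply return the best-scoring model."""
    return max(scores, key=scores.get)

def open_set_decision(speaker_scores: Dict[str, float],
                      general_score: float,
                      margin: float = 0.0) -> str:
    """Open set (speaker): accept the best speaker model only if it beats the
    general model by more than `margin` (assumed rule); otherwise report
    that the speaker is not one of the enrolled speakers."""
    best = max(speaker_scores, key=speaker_scores.get)
    if speaker_scores[best] - general_score > margin:
        return best
    return "unknown"
```

For example, `closed_set_decision({"Cantonese": -41.2, "English": -43.0, "Mandarin": -42.5})` picks `"Cantonese"`, while an open-set query whose best speaker scores below the general model is rejected as `"unknown"`.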
5
Audio Data Mining – Segmentation
[Figure: the input audio is cut into segments labelled Group 1, Group 2 and Group 3]
6
The Bayesian Information Criterion (BIC) is used to determine the acoustic change points of the input MPEG file:
1. Input an MPEG file.
2. Extract the features.
3. Use the BIC to calculate the change points.
4. Output a list of segments cut at the acoustic change points.
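A minimal NumPy sketch of BIC change-point detection, assuming the common full-covariance Gaussian formulation: for a window of N frames with d-dimensional features split at frame i, ΔBIC(i) = N/2 log|Σ| − i/2 log|Σ₁| − (N−i)/2 log|Σ₂| − λ·½(d + d(d+1)/2) log N, and a change point is declared where ΔBIC is largest and positive. The penalty weight `lam`, the minimum segment length `min_frames`, and the recursive (binary) splitting used to produce the segment list are assumptions for illustration; the slides do not say how the analysis windows are chosen.

```python
import numpy as np

def _logdet_cov(A: np.ndarray) -> float:
    # Log-determinant of the maximum-likelihood covariance of the rows of A.
    cov = np.atleast_2d(np.cov(A, rowvar=False, bias=True))
    return float(np.linalg.slogdet(cov)[1])

def delta_bic(X: np.ndarray, i: int, lam: float = 1.0) -> float:
    """ΔBIC for splitting frames X (N x d) at frame i; > 0 favours a change."""
    N, d = X.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(N)
    return (0.5 * N * _logdet_cov(X)
            - 0.5 * i * _logdet_cov(X[:i])
            - 0.5 * (N - i) * _logdet_cov(X[i:])
            - lam * penalty)

def find_change_point(X: np.ndarray, lam: float = 1.0, min_frames: int = 20):
    """Return the split index with the largest positive ΔBIC, or None."""
    best_i, best = None, 0.0
    for i in range(min_frames, len(X) - min_frames + 1):
        score = delta_bic(X, i, lam)
        if score > best:
            best_i, best = i, score
    return best_i

def segment(X: np.ndarray, lam: float = 1.0, min_frames: int = 20, offset: int = 0):
    """Recursively split X at detected change points.
    Returns a list of (start_frame, end_frame) ranges, end exclusive."""
    cp = find_change_point(X, lam, min_frames)
    if cp is None:
        return [(offset, offset + len(X))]
    return (segment(X[:cp], lam, min_frames, offset)
            + segment(X[cp:], lam, min_frames, offset + cp))
```

`min_frames` keeps each side large enough for a stable covariance estimate; with very short sides the sample covariance is nearly singular and ΔBIC becomes unreliable.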
7
Audio Data Mining – Recognition Engine
[Figure: Input MPEG → Extract features → Calculate a score for each trained model → Select the most suitable model]
8
Audio Data Mining – Recognition Engine
- Uses Gaussian Mixture Models (GMMs): text-independent, robust and computationally efficient
- 256 mixtures for each model
- Requires pre-processing (training)
Steps:
1. Input an MPEG file.
2. Extract the features.
3. Calculate a score for each model and select the model with the best score.
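The scoring step can be sketched as follows, assuming each trained model is a GMM with diagonal covariances and that the score is the average per-frame log-likelihood of the extracted features. The class name `DiagGMM` and the function `identify` are hypothetical; training (typically by EM) is not shown, and the slides' models use 256 mixtures where the test below uses one or two.

```python
import numpy as np

class DiagGMM:
    """A trained GMM with diagonal covariances (training not shown)."""

    def __init__(self, weights, means, variances):
        self.w = np.asarray(weights, dtype=float)      # (M,) mixture weights
        self.mu = np.asarray(means, dtype=float)       # (M, d) component means
        self.var = np.asarray(variances, dtype=float)  # (M, d) diagonal variances

    def avg_log_likelihood(self, X) -> float:
        """Mean over frames of log p(x | model) for feature frames X (T x d)."""
        X = np.asarray(X, dtype=float)
        d = X.shape[1]
        # log N(x_t | mu_m, diag(var_m)) for every frame/component pair: (T, M)
        diff = X[:, None, :] - self.mu[None, :, :]
        log_gauss = -0.5 * (np.sum(diff ** 2 / self.var, axis=2)
                            + np.sum(np.log(self.var), axis=1)
                            + d * np.log(2 * np.pi))
        a = log_gauss + np.log(self.w)
        # Log-sum-exp over components, shifted by the max for numerical stability.
        amax = a.max(axis=1, keepdims=True)
        frame_ll = amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))
        return float(frame_ll.mean())

def identify(X, models):
    """Score X against every model; return (best_model_name, all_scores)."""
    scores = {name: m.avg_log_likelihood(X) for name, m in models.items()}
    return max(scores, key=scores.get), scores
```

Averaging per frame rather than summing keeps scores comparable across segments of different lengths.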
9
Audio Data Mining – Segmentation with Speaker Recognition
Automatic speaker recognition engine:
1. Perform segmentation.
2. Send each segment to the speaker recognition engine.
3. Output a list of segments in which the speaker of each segment is known.
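The two-stage pipeline can be sketched as a function that takes the segmenter and the recognizer as parameters, so that any change-point detector and any speaker recognition engine can be plugged in. The names `label_speakers`, `segment_fn` and `identify_fn` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start_frame, end_frame), end exclusive

def label_speakers(features: Sequence,
                   segment_fn: Callable[[Sequence], List[Segment]],
                   identify_fn: Callable[[Sequence], str]) -> List[Tuple[int, int, str]]:
    """Segment first, then send every segment to the speaker recognition
    engine; returns (start, end, speaker) for each segment."""
    labelled = []
    for start, end in segment_fn(features):
        labelled.append((start, end, identify_fn(features[start:end])))
    return labelled
```

Because each segment is identified independently, the same speaker can appear in several non-adjacent segments, as in the speaker identification figure.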
10
[Figure: Speaker identification process. The segments (Group 1, Group 2, Group 3) are passed to speaker identification, giving labels such as Speaker 1, Speaker 2, Speaker 3, Speaker 2, Speaker 1, Speaker 2.]
11
Audio Retrieval and Indexing – Query by Humming
1. First step: track the pitch of the input audio clip using the time-domain autocorrelation function (ACF), and record the trend of the pitch as “Up”, “Down” or “Same”. Intermediate output: a file consisting of a list of “Up”, “Down” and “Same” symbols.
2. Second step: perform longest common substring matching between the intermediate output of the input clip and that of each audio clip in the database, and calculate a score.
3. Last step: list the audio clips in the database ranked by score.
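The first step relies on time-domain ACF pitch tracking. A minimal sketch, assuming the pitch of a frame is taken from the highest ACF peak within a plausible lag range; the search range `fmin`/`fmax`, the `voicing` threshold and the frame/hop sizes are illustrative assumptions, not values from the slides.

```python
import numpy as np

def acf_pitch(frame, sr, fmin=80.0, fmax=800.0, voicing=0.3):
    """Estimate the pitch (Hz) of one frame from the highest time-domain ACF
    peak in the lag range [sr/fmax, sr/fmin]. Returns 0.0 if judged unvoiced."""
    x = np.asarray(frame, dtype=float)
    x = x - x.mean()
    n = len(x)
    # r[k] = sum_t x[t] * x[t + k], for lags k = 0 .. n-1
    r = np.correlate(x, x, mode="full")[n - 1:]
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), n - 1)
    if r[0] <= 0 or lo >= hi:
        return 0.0
    k = lo + int(np.argmax(r[lo:hi + 1]))
    # Assumed voicing test: the peak must reach a fraction of the zero-lag energy.
    if r[k] < voicing * r[0]:
        return 0.0
    return sr / k

def pitch_track(signal, sr, frame_len=400, hop=200):
    """Apply acf_pitch to overlapping frames of the whole clip."""
    return [acf_pitch(signal[s:s + frame_len], sr)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

For a 200 Hz tone sampled at 8000 Hz, the ACF peaks at a lag of 40 samples, giving 8000 / 40 = 200 Hz.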
12
[Figure: Hummed song → Pitch tracker → Intermediate representation → Longest common substring matching against the intermediate database]
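The matching stage can be sketched with the standard dynamic-programming algorithm for the longest common substring. Here each contour is assumed to be stored as a string of `U`/`D`/`S` symbols, and the score is assumed to be the raw length of the longest shared run; the slides do not say whether the score is normalized.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous run shared by a and b,
    using an O(len(a) * len(b)) dynamic program over two rows."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1] and b[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def rank_by_humming(query: str, database: dict) -> list:
    """Rank clips (name -> contour string) by descending match score."""
    scores = {name: longest_common_substring(query, contour)
              for name, contour in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A substring (contiguous) match rewards humming that reproduces a run of the melody's contour exactly, which suits queries that cover only part of a song.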
13
Pitch Tracker
Tracks the pitch of the hummed voice and converts it into a representation of relative pitch change.
E.g. Do Me Fa So Fa Re Me → U U U D D U
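The conversion into relative changes can be sketched as follows. It assumes pitches are first mapped to semitones (via `hz_to_semitones` for tracked frequencies, or the tonic sol-fa table below for the slide's example), and that steps smaller than a hypothetical tolerance `same_tol` count as "Same".

```python
import math

# Tonic sol-fa syllables as semitones above Do (major scale).
SOLFA = {"Do": 0, "Re": 2, "Me": 4, "Fa": 5, "So": 7, "La": 9, "Te": 11}

def hz_to_semitones(f_hz, ref_hz=440.0):
    """Map a pitch in Hz to a (fractional) semitone number relative to ref_hz."""
    return 12.0 * math.log2(f_hz / ref_hz)

def to_contour(semitones, same_tol=0.5):
    """Encode each step between successive pitches as U (up), D (down)
    or S (same); steps within +/- same_tol semitones count as S."""
    out = []
    for prev, cur in zip(semitones, semitones[1:]):
        step = cur - prev
        out.append("U" if step > same_tol else "D" if step < -same_tol else "S")
    return "".join(out)
```

Applied to the slide's example, `to_contour([SOLFA[s] for s in "Do Me Fa So Fa Re Me".split()])` yields `"UUUDDU"`. Frames with no detected pitch (0 Hz) would need to be dropped before calling `hz_to_semitones`, since the logarithm of zero is undefined.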
14
Audio Retrieval and Indexing – Direct Audio Search
1. First step: calculate a covariance matrix from the feature vectors of the cue audio and from those of each clip in the database.
2. Second step: use the AHS (arithmetic harmonic sphericity) distance measure to calculate a score.
3. Last step: list the audio clips in the database ranked by score.
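A sketch of the direct search, assuming the usual AHS definition for d×d covariance matrices C_X and C_Y: AHS(X, Y) = log[tr(C_X C_Y⁻¹) · tr(C_Y C_X⁻¹)] − 2 log d, which is 0 when the covariances are equal and grows as they differ. Ranking by ascending distance is an assumption about how "score" is ordered; the function names are hypothetical.

```python
import numpy as np

def ahs_distance(X, Y) -> float:
    """AHS distance between the covariance matrices of two feature sets
    (frames x d). 0 for identical covariances, larger when they differ."""
    cx = np.atleast_2d(np.cov(np.asarray(X, dtype=float), rowvar=False))
    cy = np.atleast_2d(np.cov(np.asarray(Y, dtype=float), rowvar=False))
    d = cx.shape[0]
    t1 = np.trace(cx @ np.linalg.inv(cy))
    t2 = np.trace(cy @ np.linalg.inv(cx))
    return float(np.log(t1 * t2) - 2.0 * np.log(d))

def direct_search(cue, database):
    """Rank database clips (name -> feature frames) by ascending AHS distance."""
    scores = {name: ahs_distance(cue, feats) for name, feats in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Because only covariance matrices are compared, the measure ignores the feature means: two clips whose features differ only by a constant offset are treated as identical.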
15
[Figure: AHS comparison between the source clip and target clips of the same size]
16
Audio Retrieval and Indexing – Search on Server
Direct audio search on the server:
- The server side holds a database
- The client connects to the server
- The client selects a cue audio clip and uploads it to the server
- The server performs the direct audio search and sends back the result
- The client can use audio streaming to retrieve the result file
17
Basic Part – Audio Streaming
- AdvAIR is an N-to-N system: it allows N servers and N clients
- Clients and servers can be added at any time
- It is fault tolerant
18
Basic Part – Server Side
Has two parts:
- Audio streaming
- Searching on the server (direct search on the server)
The two are separated because searching on the server uses a lot of resources:
- A server cannot process searches for too many users at the same time
- Only privileged users are allowed to use the search-on-server function
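The access rules above can be sketched as a small admission gate on the server: streaming is not gated, while search requests are accepted only from privileged users and only while a free slot remains. The class `SearchGate`, the set of privileged users and the `max_concurrent` cap are hypothetical; the slides state the restriction but not how it is enforced.

```python
import threading

class SearchGate:
    """Admission control for the resource-heavy search-on-server function."""

    def __init__(self, privileged_users, max_concurrent=2):
        self.privileged = set(privileged_users)
        # Caps how many searches may run at the same time.
        self.slots = threading.BoundedSemaphore(max_concurrent)

    def try_start_search(self, user) -> bool:
        """True if the search may start now; the caller must later call
        finish_search(). Non-privileged users are always refused."""
        if user not in self.privileged:
            return False
        return self.slots.acquire(blocking=False)

    def finish_search(self) -> None:
        self.slots.release()

    def run_search(self, user, job):
        """Run job() if admitted and return its result, else return None."""
        if not self.try_start_search(user):
            return None
        try:
            return job()
        finally:
            self.finish_search()
```

Refusing immediately (non-blocking acquire) rather than queueing keeps the streaming part of the server responsive when the search capacity is exhausted.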
19
Basic Part – Client Side
- When the client requests a download, the audio clip is divided into many small parts
- Each server sends small parts to the client simultaneously to speed up the download
- The client combines all the small parts to form the whole file
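The client-side download can be sketched with a thread pool: the file is split into byte ranges, the ranges are requested from the servers concurrently in round-robin order, and the results are joined in order. The transport call `fetch(server, offset, length)` is hypothetical, and retrying a failed part on the other servers is an assumed policy offered as one way to realize the fault tolerance mentioned for audio streaming.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# Hypothetical transport call: fetch(server, offset, length) -> bytes
Fetch = Callable[[str, int, int], bytes]

def parallel_download(size: int, servers: List[str], fetch: Fetch,
                      part_size: int = 64 * 1024) -> bytes:
    """Download `size` bytes as small parts spread across all servers."""
    parts = [(off, min(part_size, size - off)) for off in range(0, size, part_size)]

    def get_part(index: int) -> bytes:
        off, length = parts[index]
        errors = []
        # Start at this part's round-robin server; fall back to the others on failure.
        for k in range(len(servers)):
            server = servers[(index + k) % len(servers)]
            try:
                return fetch(server, off, length)
            except OSError as exc:
                errors.append(exc)
        raise RuntimeError(f"part at offset {off} failed on every server: {errors}")

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        # pool.map yields results in submission order, so parts stay in sequence.
        return b"".join(pool.map(get_part, range(len(parts))))
```

Reassembling by part index rather than by arrival time means the parts can arrive in any order without corrupting the file.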
20
The End