CS 445/656 Computer & New Media


1 Audio, Speech and Music
CS 445/656 Computer & New Media

2 Topics for Monday & Wednesday
General Audio
Speech
Music
Music management support

3 Music
Music processing can support a variety of activities:
Composition: from traditional to interactive
Selection: e.g. iTunes, Pandora; use for shared spaces
Playback: e.g. MobiLenin
Management & Summarization: e.g. MusicWiz, APOLLO
Games: Guitar Hero, Rock Band, etc.

4 MobiLenin
Enable interaction with music in a public space
Not karaoke
Voting like in many pub/bar games
Audience can affect which version of music and video is shown

5

6 Lessons
Gave a focal point for interaction between members of a group
Content variety is necessary for continued engagement
Lottery for free beer motivated participation

7 Music Summarization
Music thumbnailing
  Effective browsing
  Indexing, retrieval and management of stored tracks
Typical assumption: the most repeated pattern is the most representative part of the music
  Leitmotifs (key phrases)
  ABACAB pattern (A: verse, B: chorus, C: bridge)
Summarization methods
  Signal analysis
  Automatically detect repeated patterns of the musical signal using self-similarity analysis or semantics
    Keys, pitch, length of notes, tempo, etc.
  Clustering or HMM to find key phrases of songs
  Using a similarity matrix based on MFCC
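The self-similarity idea on this slide can be illustrated with a minimal sketch. It assumes each song section has already been reduced to one feature vector (in practice these would be derived from MFCC frames by an audio library; the toy vectors, the function names and the 0.95 threshold below are all hypothetical). The segment that closely matches the most other segments is taken as the candidate thumbnail.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def self_similarity(frames):
    """Square matrix S where S[i][j] compares segment i with segment j."""
    return [[cosine(a, b) for b in frames] for a in frames]

def most_repeated_segment(frames, threshold=0.95):
    """Index of the first segment that closely matches the most other segments."""
    matrix = self_similarity(frames)
    counts = [
        sum(1 for j, s in enumerate(row) if j != i and s >= threshold)
        for i, row in enumerate(matrix)
    ]
    return max(range(len(frames)), key=lambda i: counts[i])

# Toy ABACAB song: one hypothetical feature vector per section.
A, B, C = [1.0, 0.1, 0.0], [0.1, 1.0, 0.2], [0.0, 0.2, 1.0]
song = [A, B, A, C, A, B]
print(most_repeated_segment(song))  # index of an "A" section
```

In this toy song the A section repeats three times, so it wins over the chorus B (twice) and the bridge C (once), which mirrors the "most repeated pattern" assumption stated above.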

8 Music Summarization
Most summaries on commercial sites are either the first phrase or a single selected musical phrase
Study of whether 22-second multi-phrase music summaries would be better previews
Three algorithms vary the selection of components between phrases that are sonically distinct and phrases that are repeated more often
A comparative evaluation study showed that multi-phrase previews were selected in 87% of the cases over the preview representing the first 22 seconds of the song

9 Managing Personal Music Collections
Music management is mainly based on:
  explicit attributes (e.g. metadata values like the artist, the composer and the genre)
  explicit feedback (e.g. ratings of preference and relevance)
Benefits
  Easy to understand
  Formal: consistent updating and access
  Context-free
Question
  How can music be accessed based on the feelings or memories it triggers?
  e.g. music that sounds happy, makes us feel gloomy or reminds us of a person

10 Current Practices
Common metadata tags are usually not sufficient to describe mood, feelings, memories and complex concepts
  Effort/benefit trade-off issues
  Personal reactions to music change
Explicit feedback and usage statistics are helpful in retrieving music of preference
Questions
  How would people organize music if there were a low-effort way of expressing their personalized interpretation of music?
  Use of additional tags or customization of the existing ones can be tedious
  Use of additional ratings associated with specific music attributes can be overwhelming for the user

11 MusicWiz
12 participants were asked to organize songs & create playlists using spatial hypertext
In spatial hypertext, information has visual attributes & a spatial layout that can be changed to express associations
The majority found spatial hypertext helpful in organizing
Participants appreciated:
  expressive power and freedom of the workspace
  directly accessible metadata information of music
  music previews for remembering music
Participants missed:
  interactive hierarchical / tree views
  music previews for understanding music

12 Preliminary Study
Organization using categories & subcategories with labels
Figure shows part of the finished workspace for one participant. The songs are divided into those the participant knew and those he did not. The unknown songs were organized based on the participant’s opinion about the artist (“generally like the artist”, “neutral about the artist”). The songs he knew were grouped based on personal assessments of the music (“like but hard to listen to”, “cheesy”, “hate”, “fun songs”, and “too slow”) and associations the music had for the participant (“remind me of my wife”). Some of these categories had further subcategories such as the “I swear my wife has these songs on a mix-CD” under “remind me of my wife” and “classics” under “fun songs”. This participant’s workspace shows a greater degree of structure and interpretation than the workspaces created by most of the participants.

13 Music Access & Implicit Attributes
Considerable research goes into extracting and using implicit cues for associating music, in order to overcome:
  limitations of metadata & statistics to describe custom music concepts
  unwillingness of users to provide explicit feedback
  cost of employing human experts to find music similarity
Music management is extended by:
  signal features (e.g. intensity, timbre and rhythm)
  collaborative filtering
Examples: Last.fm, Genius, Music Gathering Application, Flytrap, Musicovery, MusicSim, Musicream

14 MusicWiz Architecture
[Figure: MusicWiz architecture. Components shown: Music Collection, MusicWiz Interface, Inference Engine, Artist Module, Metadata Module, Audio Signal Module, Lyrics Module, Workspace Expression Module, Relatedness Table, Internet. Labeled flows: songs & metadata, songs, lyrics, statistics of artist similarity, similarity values, relatedness assessment, workspace status, related song titles.]
Music management environment that combines:
  explicit information
  implicit information
  non-verbal expression of personal interpretation
Two basic components:
  interface for interacting with the music collection
  inference engine for assessing music relatedness

15 MusicWiz Interface
[Figure: The MusicWiz interface. Labeled regions: Hierarchical Folder Tree View, Workspace, Playlist Pane, Related Songs & Search Results View, Playback Controls.]
Folder Tree View: provides a location-based hierarchical view of the music collection.
Related Songs & Search Results View: displays songs that are similar to the songs currently selected in the system tree view, or the results of a search. Songs can then be dragged and dropped from the list into the workspace and the playlist pane to update collections and playlists respectively.

16 MusicWiz Inference Engine
Five modules for extracting, processing and comparing artists, metadata, audio content, lyrics, and workspace expression
\[
\begin{aligned}
\text{OverallSimilarity}(S_1, S_2) ={} & W_1 \cdot \text{OverallMetadataSimilarity}(S_1, S_2) \\
 & + W_2 \cdot \text{OverallAudioSignalSimilarity}(S_1, S_2) \\
 & + W_3 \cdot \text{OverallLyricsSimilarity}(S_1, S_2) \\
 & + W_4 \cdot \text{OverallWorkspaceExpressionSimilarity}(S_1, S_2)
\end{aligned}
\]
where S_1, S_2 are the songs under comparison and W_n, n = 1..4, are the user-adjusted weights of the specialized similarity assessments
Each module produces an assessment of relatedness: a normalized value ranging from 0 (songs very dissimilar) to 1 (songs almost identical)
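The weighted combination above can be sketched in a few lines. This is only an illustration, not the MusicWiz implementation: the function name, the equal default weights, and the division by the weight total (so the result stays in the 0 to 1 range) are assumptions that the slide does not state.

```python
def overall_similarity(metadata, audio, lyrics, workspace,
                       weights=(0.25, 0.25, 0.25, 0.25)):
    """Weighted sum of four module scores, each expected to lie in [0, 1].

    Dividing by the weight total is an assumption made here so that
    user-adjusted weights need not sum to 1.
    """
    scores = (metadata, audio, lyrics, workspace)
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("module scores must be normalized to [0, 1]")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * s for w, s in zip(weights, scores)) / total

# A user who only cares about lyrics and workspace expression:
print(overall_similarity(0.9, 0.1, 0.8, 0.6, weights=(0, 0, 1, 1)))
```

Setting a weight to zero removes that module's influence entirely, which is one way a user could tune the suggestions toward, say, lyrical themes rather than acoustic properties.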

17 MusicWiz Inference Engine – Artist Module
Assesses relatedness in music using online resources:
  human evaluations of artist similarity from:
    Similar Artists lists of the All Music Guide website
  co-occurrence of artists in playlists from:
    OpenNap file-sharing network
    Art of the Mix website
Its output is used directly by the metadata module when comparing the artist name

18 MusicWiz Inference Engine – Metadata Module
Evaluates the pairwise similarity of the metadata values of all songs
String comparison is applied to the title, genre, album name, and year of the songs, as well as the file-system path where they are stored
The comparison uses a distance metric that combines the Soundex and the Monge-Elkan algorithms
The Soundex phonetic algorithm is valuable for identifying similarity between transliterated or misspelled names. It uses the six phonetic classifications of human speech sounds to convert the input into a string that identifies the set of words that are phonetically alike (similar pronunciation).
The Monge-Elkan algorithm identifies similarity among expressions where the words are listed in a different order; it is a dynamic programming algorithm that calculates the distance of two strings based on the cost of the transformations required to convert the first expression into the second expression.
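A minimal sketch of how the two ideas could fit together is given below. The `soundex` function follows the standard American Soundex rules. The `monge_elkan` function uses the token-level averaging scheme commonly associated with that name, with an exact Soundex-code match as the inner comparison; how MusicWiz actually combines the two is not given on the slide, so that pairing (and both function names) is an assumption.

```python
# Consonant groups of American Soundex; vowels, y, h and w carry no digit.
_CODES = {}
for _group, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _group:
        _CODES[_ch] = _digit

def soundex(word):
    """Four-character Soundex code, e.g. 'Robert' and 'Rupert' both give 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            encoded += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (encoded + "000")[:4]

def monge_elkan(a, b):
    """Average, over tokens of a, of the best match among tokens of b.

    Inner similarity here is 1.0 when the Soundex codes agree, else 0.0.
    Note that this measure is not symmetric in a and b.
    """
    tokens_a, tokens_b = a.split(), b.split()
    if not tokens_a or not tokens_b:
        return 0.0
    codes_b = {soundex(t) for t in tokens_b}
    hits = sum(1 for t in tokens_a if soundex(t) in codes_b)
    return hits / len(tokens_a)

print(monge_elkan("Lennon John", "Jon Lenon"))  # word order and spelling differ
```

Because each token is matched independently, reordered and slightly misspelled names still score highly, which is the behavior the slide attributes to combining the two algorithms.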

19 MusicWiz Inference Engine – Audio Signal Module
Uses signal processing techniques to analyze music content
Extracts and compares information about the harmonic structure and acoustic attributes of music:
  beat, brightness, pitch, starting note and potential key (music scale) of the song
The greater the distance in the beat, brightness and pitch levels, the less likely songs are perceived as being of similar style or mood

20 MusicWiz Inference Engine – Lyrics Module
Textually analyzes the lyrics
Lyrics are scraped from a pool of popular websites for:
  display in music objects
  comparison
Lyrical comparison uses term vector cosine similarity:
  Overall Lyrics Similarity (S1, S2) = cos(θ)
The more words lyrics have in common, the greater the possibility that the songs are motivated by or describe related themes
Example:
  dict = { dog, cat, lion }
  Document 1 “cat cat” → (0,2,0)
  Document 2 “cat cat cat” → (0,3,0)
  Document 3 “lion cat” → (0,1,1)
  Document 4 “cat lion” → (0,1,1)
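The slide's dictionary example can be reproduced with a short sketch. The function names are illustrative, and whitespace tokenization is an assumption; a real lyrics pipeline would likely also strip punctuation and stop words.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_similarity(u, v):
    """cos(theta) between two term vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocabulary = ["dog", "cat", "lion"]
documents = ["cat cat", "cat cat cat", "lion cat", "cat lion"]
vectors = [term_vector(d, vocabulary) for d in documents]

for i in range(len(documents)):
    for j in range(i + 1, len(documents)):
        score = cosine_similarity(vectors[i], vectors[j])
        print(f"Document {i + 1} vs Document {j + 1}: {score:.3f}")
```

Documents 1 and 2 point in the same direction even though their lengths differ, and Documents 3 and 4 are identical as term vectors because word order is ignored; both pairs therefore score (approximately) 1.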

21 MusicWiz Inference Engine – Workspace Expression Module
Music objects can be related visually and spatially
Spatial parser identifies relations between the music objects
Recognizes three types of spatial structures: lists, stacks and composites
[Figure: examples of a list, a stack and a composite]

22 MusicWiz Functionality
Music collection can be explored by filtering:
  attribute values (i.e. ID3 tags, audio signal attributes and lyrics)
  similarity values (i.e. overall similarity)
Playlists can be created:
  manually: songs can be added from the left-side views & the workspace
  automatically:
    filter-based mode: selection based on the ID3 tags
    similarity-based mode: selection based on the relatedness of songs on the current playlist
ID3: a metadata container that allows information such as the title, artist, album, track number, and other information about the file to be stored in the file itself.
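One way the similarity-based mode could work is sketched below. The slide only says that selection is based on relatedness to songs on the current playlist; scoring each candidate by its mean similarity to the playlist is an assumption, and `suggest_next`, `toy_similarity` and the toy scores are hypothetical names and values.

```python
def suggest_next(playlist, candidates, similarity):
    """Return the candidate with the highest mean similarity to the playlist.

    `similarity(a, b)` is any function returning a value in [0, 1].
    Returns None when there is nothing to compare or nothing left to add.
    """
    pool = [c for c in candidates if c not in playlist]
    if not playlist or not pool:
        return None
    return max(
        pool,
        key=lambda c: sum(similarity(c, s) for s in playlist) / len(playlist),
    )

# Hypothetical pairwise scores standing in for an inference engine's output.
_TOY_SCORES = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.4,
}

def toy_similarity(x, y):
    """Look up a symmetric toy score; unknown pairs count as unrelated."""
    return 1.0 if x == y else _TOY_SCORES.get(frozenset((x, y)), 0.0)

print(suggest_next(["a"], ["a", "b", "c"], toy_similarity))
```

Calling the function repeatedly and appending each result would grow a playlist from a seed song, similar in spirit to the seed-based playlist task used in the evaluation.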

23 MusicWiz Evaluation
20 participants were asked to:
  Task 1: organize 50 rock songs into sub-collections according to their preference
  Task 2: form three twenty-minute-long playlists based on three different moods or occasions of their choice
  Task 3: form three six-song-long playlists, each of which had to be related to a provided “seed” song (not from the fifty of the original collection)

24 MusicWiz Evaluation
Importance of the workspace
Importance of the music previews
Four groups of system use:
  Group 1 (no workspace / no suggestions) had to complete the three tasks using MusicWiz’s browsing, searching, and playback functionality and Windows Explorer folders to form the collections and playlists
  Group 2 (no workspace / with suggestions) used the same features as group 1 but also received suggestions from the similarity inference engine
  Group 3 (with workspace / no suggestions) had to perform the tasks using the features available in group 1 but used the MusicWiz workspace to create the collections and playlists
  Group 4 (with workspace / with suggestions) had all MusicWiz features

Configuration    No Suggestions    Suggestions
No Workspace     Group 1           Group 2
Workspace        Group 3           Group 4

25 Task 1 - Organization of Music
Statement (1 – “I strongly disagree” to 7 – “I strongly agree”), columns: Group 1, Group 2, Group 3, Group 4
The system support in organizing effortlessly / quickly was enough: 4.4 5.4 5.6 6.2
Enjoyed doing task: 5.8 6.4 6
Organization will be easily understood by others: 4.2

26 Tasks 2 & 3 – Playlist Creation
Statement (1 – “I strongly disagree” to 7 – “I strongly agree”), columns: Task, Group 1, Group 2, Group 3, Group 4
System support for quick selection was enough
  Two: 4.8 6.2 5.8
  Three: 4.4 6.8 5.6
finding music: 6 5.4 4.6 6.4
Enjoyed doing task: 5.2 6.6

27 Apollo
A hierarchically structured set of freeform canvases for creating and manipulating musical ideas (text and audio recording)
Support for searching the hierarchy with either melodic or text-based queries
Less structured and prescriptive approach to recording and developing musical inspirations
Bainbridge, David, Brook J. Novak, and Sally Jo Cunningham. "A user-centered design of a personal digital library for music exploration." Proceedings of the 10th annual joint conference on Digital libraries. ACM, 2010.

28 Apollo

29 Apollo

30 Bag of Audio Words
Audio can be treated as a special kind of document composed of an unordered collection of audio words
Features from each audio frame map to a certain audio word
A piece of music = a bag of audio words
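The mapping from frames to audio words can be sketched as nearest-codeword assignment followed by counting. The codebook below is a hypothetical hand-written one; in practice it would typically be learned by clustering frame features, which the slide does not detail. The function names and the two-dimensional features are also illustrative.

```python
from collections import Counter

def nearest_word(frame, codebook):
    """Index of the codebook entry closest to the frame (squared Euclidean)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )

def bag_of_audio_words(frames, codebook):
    """Histogram of audio-word occurrences; frame order is discarded."""
    counts = Counter(nearest_word(frame, codebook) for frame in frames)
    return [counts[k] for k in range(len(codebook))]

# Hypothetical codebook of three audio words over 2-D frame features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1], [0.0, 0.8]]
print(bag_of_audio_words(frames, codebook))
```

Since the result is just a count vector, two pieces of music can then be compared with the same term-vector techniques used for text documents.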

31 Topics Summary
General Audio: audio cues, spatialized audio
Speech: segmentation, speaker ID, recognition
Music: interactive music, summarization, organization, bag of audio words

