Final Year Project 2003/2004
LYU0302
PVCAIS – Personal Video Conference Archives Indexing System
Supervisor: Prof. Michael Lyu
Presented by: Lewis Ng, Philip Chan
15 March 2004
Outline
- Introduction
- Motivation
- Architecture of PVCAIS
  - Media Acquisition Module
  - Archive Indexing Module
  - Videoconference Accessing Module
- Implementation
- Conclusions
- Future Work
Introduction
- PVCAIS stands for Personal Video Conference Archives Indexing System
- A system that lets videoconferencing users conveniently search and browse their past videoconference archives
Introduction
- What is a videoconference?
- A real-time communication technology that combines different media, which may include audio, video, text chat, file transfer, whiteboard and shared applications
- More precisely, it is a “multimedia conference”
Motivation
- Videoconferencing is becoming popular in education, business and personal communication
- Participants wish to keep videoconference archives for later reference
- Plain video and audio files are neither searchable nor helpful for recalling their contents
- Indexing of videoconference archives has not been investigated until now
Architecture of PVCAIS
- Consists of 3 modules:
  - Media Acquisition Module
  - Archive Indexing Module
  - Videoconference Accessing Module
Architecture of PVCAIS
[Figure: block diagram of the Media Acquisition, Archive Indexing and Videoconference Accessing modules]
Architecture: Media Acquisition Module
- Extracts channel data and forms media files
- A videoconference physically consists of 4 types of channels: Audio, Video, Data and Control
  - Audio and Video channels: transmit incoming and outgoing audio and video information
  - Data channel: carries information for user applications such as Text Chat, Whiteboard and File Transfer
  - Control channel: transmits system control information such as Member Information
Architecture: Archive Indexing Module
- Raw files are extracted by the Media Acquisition Module
- Multimedia indexing functions are needed to retrieve more information from them
- These include: Face Detection, Face Recognition, Speech Recognition, Time-based Text Merging, Keyword Selection, Title Generation
Architecture: Archive Indexing Module
Face Detection and Recognition
- Associates the human faces in the incoming video (Video-in) with names
- Requires a face base of known faces
- If there is no match in the face base, asks the remote user to enter the name
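The face base lookup can be sketched as a nearest-neighbour search. This is a minimal sketch, not the PVCAIS implementation: it assumes each detected face has already been reduced to a numeric feature vector, and the names `identify`, `label_face`, `ask_name` and the 0.5 distance threshold are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional

# Hypothetical face base: member name -> stored face feature vector.
FaceBase = Dict[str, List[float]]

def identify(face: List[float], face_base: FaceBase, threshold: float = 0.5) -> Optional[str]:
    """Return the name of the nearest stored face, or None if none is within threshold."""
    best_name: Optional[str] = None
    best_dist = math.inf
    for name, stored in face_base.items():
        d = math.dist(face, stored)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None

def label_face(face: List[float], face_base: FaceBase,
               ask_name: Callable[[], str], threshold: float = 0.5) -> str:
    """Name a detected face; on no match, ask the remote user and enroll the face."""
    name = identify(face, face_base, threshold)
    if name is None:
        name = ask_name()  # stands in for prompting the remote user
        face_base[name] = list(face)
    return name
```

Enrolling the unmatched face right away means the same person is recognized automatically in later conferences.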
Architecture: Archive Indexing Module
Speech Recognition
- Generates a speech transcript from the audio archive
- The speech in a videoconference carries most of its information
- Commercial libraries can be used: Microsoft SAPI, IBM ViaVoice
Architecture: Archive Indexing Module
Time-based Text Merging
- Merges the speech transcript, chat script, whiteboard script and slide text archive into the Text Source according to their timestamps
Keyword Selection
- Takes the Text Source as input
- Generates keywords for the videoconference
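Both steps can be sketched briefly. This sketch assumes each text source yields `(timestamp, source, text)` tuples already sorted by time; the tuple layout, the tiny stop-word list and the plain term-frequency scoring are illustrative assumptions, not the keyword method used in PVCAIS.

```python
import heapq
import re
from collections import Counter
from typing import List, Tuple

# Assumed item layout: (timestamp in seconds, source label, text).
Item = Tuple[float, str, str]

def merge_by_time(*streams: List[Item]) -> List[Item]:
    """Merge per-source streams (each already time-sorted) into one Text Source."""
    return list(heapq.merge(*streams, key=lambda item: item[0]))

# Tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "we", "on", "in", "for"}

def select_keywords(text_source: List[Item], k: int = 3) -> List[str]:
    """Pick the k most frequent non-stop-words (simple term-frequency heuristic)."""
    counts: Counter = Counter()
    for _, _, text in text_source:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]
```

`heapq.merge` streams the inputs in a single pass, so the sources never need to be concatenated and re-sorted.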
Architecture: Archive Indexing Module
Title Generation
- Takes the Text Source as input
- Automatically generates a title for the videoconference
Generate XML index file
- Integrates all the archives
- Stores all the related files of a videoconference in a single directory
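The slides do not give the index schema. The sketch below shows one possible layout built with Python's standard `xml.etree.ElementTree`. The element names (`conference`, `members`, `keywords`, `files`) and the `build_index` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

def build_index(title: str, held_on: str, members: List[str],
                keywords: List[str], files: Dict[str, str]) -> str:
    """Build a hypothetical XML index tying one conference's archives together."""
    root = ET.Element("conference")
    ET.SubElement(root, "title").text = title
    ET.SubElement(root, "date").text = held_on
    members_el = ET.SubElement(root, "members")
    for name in members:
        ET.SubElement(members_el, "member").text = name
    keywords_el = ET.SubElement(root, "keywords")
    for kw in keywords:
        ET.SubElement(keywords_el, "keyword").text = kw
    files_el = ET.SubElement(root, "files")
    for kind, path in files.items():
        # Paths are relative to the conference's own directory (assumption).
        ET.SubElement(files_el, "file", type=kind).text = path
    return ET.tostring(root, encoding="unicode")
```

Keeping file paths relative lets the whole conference directory be moved or copied without breaking the index.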
Architecture: Videoconference Accessing Module
- Provides an interface for the user to manage, search and review all indexed conference archives
- Allows the user to modify a conference, such as editing its title or keywords, or to delete it
- Allows the user to search for a conference by different criteria, such as meeting period, member name or keyword
- Allows the user to review a conference by playing back its different media in a synchronized way
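Searching by several criteria amounts to combining optional filters. This is a minimal sketch over an in-memory list. The `ConferenceRecord` fields and the `search` function are hypothetical. A real module would populate the records from each conference's index file.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class ConferenceRecord:
    """Hypothetical in-memory view of one indexed conference."""
    title: str
    held_on: date
    members: List[str]
    keywords: List[str] = field(default_factory=list)

def search(records: List[ConferenceRecord],
           start: Optional[date] = None,
           end: Optional[date] = None,
           member: Optional[str] = None,
           keyword: Optional[str] = None) -> List[ConferenceRecord]:
    """Return records matching every criterion that is given (None means 'any')."""
    results = []
    for r in records:
        if start is not None and r.held_on < start:
            continue
        if end is not None and r.held_on > end:
            continue
        # Member names and keywords are compared case-insensitively.
        if member is not None and member.lower() not in (m.lower() for m in r.members):
            continue
        if keyword is not None and keyword.lower() not in (k.lower() for k in r.keywords):
            continue
        results.append(r)
    return results
```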
Implementation: Face Verification Feature
- Each registered user is assigned a user ID, and his/her face is saved in the face base
- Before the user joins a videoconference, PVCAIS verifies his/her face against the face stored for that user ID
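Verification is a one-to-one check against the single face stored for the claimed ID, rather than a search of the whole face base. This is a minimal sketch. It assumes faces are stored as feature vectors keyed by user ID. The `verify` name and the 0.5 threshold are hypothetical.

```python
import math
from typing import Dict, List

def verify(user_id: str, face: List[float],
           face_base: Dict[str, List[float]], threshold: float = 0.5) -> bool:
    """Accept only if the captured face is close to the face stored for user_id."""
    stored = face_base.get(user_id)
    if stored is None:
        return False  # unregistered user ID
    return math.dist(face, stored) <= threshold  # Euclidean distance (Python 3.8+)
```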
Implementation: NetMeeting 3.0
- A Windows feature that provides Internet conferencing functions
- Supports video, audio and data conferencing, including application sharing, chat, whiteboard and file transfer
- Other features include remote desktop sharing
Implementation: NetMeeting 3.0 SDK
- An extension of NetMeeting that provides an interface for programmers and Web developers to integrate conferencing capabilities into their applications
- The API takes the form of COM interfaces and functions
Implementation
- A simple NetMeeting-compatible videoconferencing program built on top of the NetMeeting 3.0 SDK
- Supports: Video, Audio, Text Chat, File Transfer, Whiteboard
Implementation: Media Acquisition Module
- By calling the API functions directly, the following raw data can be obtained:
  - member information
  - file transfer records
  - text message records
- Video, audio and whiteboard data cannot be obtained directly
Implementation: Media Acquisition Module
Video
- A thread is created to monitor the display of the video windows
- When a scene change is detected, the video is captured and stored as a still image
- The stored images are the key frames of the conference
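The slides do not say how a scene change is detected. A common simple technique is frame differencing, sketched below. It assumes each captured frame is a flat sequence of grayscale pixel values (0 to 255) of equal size. The 30.0 threshold and the function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels, 0..255 (assumed representation)

def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-size frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def select_key_frames(frames: List[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that differ enough from the last key frame."""
    if not frames:
        return []
    key_frames = [0]  # the first frame always starts a scene
    last_key = frames[0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], last_key) > threshold:
            key_frames.append(i)
            last_key = frames[i]
    return key_frames
```

Each frame is compared with the last stored key frame rather than the previous frame. This way a slow pan or fade still triggers a capture once it has accumulated enough change.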
Implementation: Media Acquisition Module
Audio
- A thread is created to record the local audio from the microphone
- Members of the conference continuously exchange their audio data
- All the received audio files and the locally recorded audio files are combined into a single audio file
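Combining the recordings can be sketched as sample-wise mixing. This sketch assumes all tracks are mono, 16-bit PCM at the same sample rate, and that each track's start offset in samples has already been derived from its timestamp. `mix_tracks`, `save_wav` and the 8000 Hz default are hypothetical. The WAV output uses Python's standard `wave` module.

```python
import struct
import wave
from typing import List, Tuple

INT16_MIN, INT16_MAX = -32768, 32767

def mix_tracks(tracks: List[Tuple[int, List[int]]]) -> List[int]:
    """Mix (start_offset_in_samples, samples) tracks into one track, clipping to int16."""
    if not tracks:
        return []
    length = max(offset + len(samples) for offset, samples in tracks)
    acc = [0] * length
    for offset, samples in tracks:
        for i, s in enumerate(samples):
            acc[offset + i] += s
    # Clip sums that overflow the 16-bit range instead of letting them wrap around.
    return [max(INT16_MIN, min(INT16_MAX, v)) for v in acc]

def save_wav(path: str, samples: List[int], rate: int = 8000) -> None:
    """Write mono 16-bit little-endian PCM samples to a WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```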
Implementation: Media Acquisition Module
Whiteboard
- The NetMeeting whiteboard information cannot be captured because the API does not document its data format
- We therefore designed and created our own whiteboard and data format
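The slides do not describe the PVCAIS whiteboard format. The sketch below shows one hypothetical design: a compact binary record per stroke. Each record carries a timestamp, so whiteboard activity can later be aligned with the other media by time. The field layout (`<IHIH` header followed by `<hh` points) and the `Stroke` class are assumptions.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical header: timestamp_ms (uint32), member_id (uint16),
# color 0xRRGGBB (uint32), point count (uint16); little-endian, no padding.
HEADER = struct.Struct("<IHIH")
POINT = struct.Struct("<hh")  # x, y as signed 16-bit integers

@dataclass
class Stroke:
    timestamp_ms: int
    member_id: int
    color: int
    points: List[Tuple[int, int]]

    def encode(self) -> bytes:
        """Serialize the stroke as header followed by its points."""
        header = HEADER.pack(self.timestamp_ms, self.member_id, self.color, len(self.points))
        return header + b"".join(POINT.pack(x, y) for x, y in self.points)

    @classmethod
    def decode(cls, data: bytes) -> "Stroke":
        """Parse a stroke previously produced by encode()."""
        ts, member, color, n = HEADER.unpack_from(data, 0)
        points = [POINT.unpack_from(data, HEADER.size + POINT.size * i) for i in range(n)]
        return cls(ts, member, color, [tuple(p) for p in points])
```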
Implementation: Archive Indexing Module
- The stored key frames are used for face detection and recognition after the conference
- The final audio file is used for speech recognition; the speech engine used is Microsoft SAPI
Implementation: Videoconference Accessing Module
- An interface for conference management
- Search conferences by member name or chat content
- Review a conference by playing back its content, including audio, key frames, member information, file exchange records and chat content
Implementation: Videoconference Accessing Module
- SMIL stands for Synchronized Multimedia Integration Language
- An HTML-like language that can integrate streaming audio and video with images, text or any other media type into one presentation
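The sketch below builds a SMIL 1.0-style presentation that plays the conference audio while showing each key frame from its capture time until the next one. The `par`, `img`, `audio`, `layout` and `region` elements are standard SMIL. The `build_smil` function, the 320x240 region and the `(start_seconds, image_path)` key-frame list are assumptions.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

def build_smil(audio_src: str, key_frames: List[Tuple[float, str]], total_s: float) -> str:
    """Play audio in parallel with key frames, each shown until the next one starts."""
    smil = ET.Element("smil")
    layout = ET.SubElement(ET.SubElement(smil, "head"), "layout")
    ET.SubElement(layout, "root-layout", width="320", height="240")
    ET.SubElement(layout, "region", id="video", left="0", top="0", width="320", height="240")
    par = ET.SubElement(ET.SubElement(smil, "body"), "par")  # children play in parallel
    ET.SubElement(par, "audio", src=audio_src)
    for i, (start, src) in enumerate(key_frames):
        # Each image lasts until the next key frame, or until the conference ends.
        end = key_frames[i + 1][0] if i + 1 < len(key_frames) else total_s
        ET.SubElement(par, "img", src=src, region="video",
                      begin=f"{start:g}s", dur=f"{end - start:g}s")
    return ET.tostring(smil, encoding="unicode")
```

Because `begin` offsets inside a `par` are relative to its start, the key-frame timestamps recorded during the conference can be used directly.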
Conclusions
- We developed a videoconferencing client
- All the media can be extracted in the Media Acquisition Module
- Multimedia indexing functions are implemented
- A stand-alone Videoconference Accessing Module is being developed
Future Work
- XML
- Better searching method
- Better graphical user interface
Q & A Session