1 Final Year Project 2003/2004 LYU0302 PVCAIS – Personal Video Conference Archives Indexing System Supervisor: Prof Michael Lyu Presented by: Lewis Ng, Philip Chan 15 March 2004
2 Outline: Introduction; Motivation; Architecture of PVCAIS (Media Acquisition Module, Archive Indexing Module, Videoconference Accessing Module); Implementation; Conclusions; Future Work
3 Introduction: PVCAIS stands for Personal Video Conference Archives Indexing System. It is a system that lets videoconferencing users conveniently search and browse past videoconference archives.
4 Introduction: What is a videoconference? A real-time communication technology that combines different media, which may include audio, video, text chat, file transfer, whiteboard and shared applications. More precisely, it is a “multimedia conference”.
5 Motivation: Videoconferencing is becoming popular in education, business and personal communication. Participants wish to keep videoconference archives for later reference. Ordinary video and audio files are neither searchable nor helpful for recalling their contents. Indexing of videoconference archives has not been investigated until now.
6 Architecture of PVCAIS: Consists of 3 modules: Media Acquisition Module, Archive Indexing Module and Videoconference Accessing Module.
7 Architecture of PVCAIS
8 Architecture Media Acquisition Module: Extracts channel data and forms media files. A videoconference physically contains 4 types of channels: Audio, Video, Data and Control. Audio and Video channels: transmit incoming/outgoing audio and video information. Data channel: carries information for user applications such as Text Chat, Whiteboard and File Transfer. Control channel: transmits system control information such as Member Information.
9 Architecture Media Acquisition Module
10 Architecture Media Acquisition Module
11 Architecture Archive Indexing Module: Raw files are extracted in the Media Acquisition Module. Multimedia indexing functions are needed to retrieve more information from them. These include: Face Detection, Face Recognition, Speech Recognition, Time-based Text Merging, Keyword Selection and Title Generation.
12 Architecture Archive Indexing Module: Face Detection and Recognition. Associates human faces in Video-in with names. Needs to keep a face base. If there is no match in the face base, the remote user is asked to enter the name.
13 Architecture Archive Indexing Module: Speech Recognition. Generates a speech script from the audio archive. The speech of a videoconference carries the most information. Can use a commercial library: Microsoft SAPI or IBM ViaVoice.
14 Architecture Archive Indexing Module: Time-based Text Merging. Merges the speech transcript, chat script, whiteboard script and slide text archive into the Text Source according to their timestamps. Keyword Selection. Takes the Text Source as input and generates keywords for the videoconference.
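The two steps on this slide can be sketched as follows. This is a minimal sketch, not the PVCAIS implementation: the record layout `(timestamp, source, text)`, the sample data, the stop-word list and the frequency-based keyword ranking are all assumptions made for illustration.

```python
import heapq
import re
from collections import Counter

# Hypothetical (timestamp_seconds, source, text) records; each stream is
# assumed to be already sorted by timestamp.
speech = [(2.0, "speech", "welcome to the project meeting"),
          (9.5, "speech", "the indexing module needs testing")]
chat = [(5.0, "chat", "please share the indexing slides")]
whiteboard = [(7.0, "whiteboard", "indexing deadline friday")]

# Assumed stop-word list, far smaller than a realistic one.
STOP_WORDS = {"the", "to", "a", "of", "and", "please", "needs"}


def merge_text_sources(*streams):
    """Merge timestamp-sorted streams into one timestamp-sorted Text Source."""
    return list(heapq.merge(*streams, key=lambda record: record[0]))


def select_keywords(text_source, top_n=3):
    """Return the top_n most frequent words that are not stop words."""
    words = []
    for _, _, text in text_source:
        words.extend(word for word in re.findall(r"[a-z]+", text.lower())
                     if word not in STOP_WORDS)
    return [word for word, _ in Counter(words).most_common(top_n)]


text_source = merge_text_sources(speech, chat, whiteboard)
keywords = select_keywords(text_source)
```

Plain word frequency is only one possible ranking; the slides do not say which keyword selection method PVCAIS uses.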
15 Architecture Archive Indexing Module: Title Generation. Takes the Text Source as input and automatically generates a title for the videoconference. Generate XML index file. Integrates all the archives and stores all the related files of a videoconference in a single directory.
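An XML index file of this kind could be produced as sketched below. The slides do not give the actual schema, so every element and attribute name here (`conference`, `title`, `members`, `keywords`, `media`, `file`) and all sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_index(title, date, members, keywords, media_files):
    """Build an XML index string for one conference (hypothetical schema)."""
    conference = ET.Element("conference", {"date": date})
    ET.SubElement(conference, "title").text = title

    members_el = ET.SubElement(conference, "members")
    for name in members:
        ET.SubElement(members_el, "member").text = name

    keywords_el = ET.SubElement(conference, "keywords")
    for word in keywords:
        ET.SubElement(keywords_el, "keyword").text = word

    # media_files maps a media type to a file name inside the
    # conference directory.
    media_el = ET.SubElement(conference, "media")
    for media_type, file_name in media_files.items():
        ET.SubElement(media_el, "file", {"type": media_type, "src": file_name})

    return ET.tostring(conference, encoding="unicode")


index_xml = build_index(
    title="Indexing module review",
    date="2004-03-15",
    members=["Alice", "Bob"],
    keywords=["indexing", "deadline"],
    media_files={"audio": "audio.wav", "chat": "chat.txt"},
)
```

Keeping file names relative to the conference directory fits the slide's point that all related files live in a single directory.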
16 Architecture Videoconference Accessing Module: Provides an interface for the user to manage, search and review all indexed conference archives. Allows the user to modify the content of a conference, such as editing the title or keywords, or to delete a conference. Allows the user to search for a conference by different criteria, such as period of meeting, member name or keyword. Allows the user to review a conference by playing back different media in a synchronized way.
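Searching by the criteria listed above could look like the sketch below. The `ConferenceRecord` type, its fields and the sample archive are hypothetical; the slides do not describe how indexed conferences are stored or queried.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ConferenceRecord:
    """Hypothetical in-memory view of one indexed conference."""
    title: str
    held_on: date
    members: list
    keywords: list


def search(records, start=None, end=None, member=None, keyword=None):
    """Return the records matching every criterion that is not None."""
    results = []
    for record in records:
        if start is not None and record.held_on < start:
            continue
        if end is not None and record.held_on > end:
            continue
        if member is not None and member.lower() not in (
                name.lower() for name in record.members):
            continue
        if keyword is not None and keyword.lower() not in (
                word.lower() for word in record.keywords):
            continue
        results.append(record)
    return results


# Made-up sample archive.
archive = [
    ConferenceRecord("Project kickoff", date(2004, 1, 12),
                     ["Alice", "Bob"], ["schedule", "indexing"]),
    ConferenceRecord("Demo rehearsal", date(2004, 3, 10),
                     ["Bob"], ["demo", "whiteboard"]),
]
march = search(archive, start=date(2004, 3, 1), end=date(2004, 3, 31))
with_alice = search(archive, member="alice")
```

Criteria are combined with a logical AND, so passing both a period and a member name narrows the result.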
17 Implementation: Face Verification Feature. Each registered user is assigned a user ID, and his/her face is saved in the face base. Before a user joins a videoconference, PVCAIS needs to verify the user's face against his/her user ID.
18 Implementation: NetMeeting 3.0. A Windows feature that provides Internet conferencing functions. Supports video, audio and data conferencing, including application sharing, chat, whiteboard and file transfer. Other features include remote desktop sharing.
19 Implementation: NetMeeting 3.0 SDK. An extension of NetMeeting that provides an interface for programmers and Web developers to integrate conferencing capabilities into their applications. The API is in the form of COM interfaces and functions.
20 Implementation: A simple NetMeeting-compatible videoconference program built on top of the NetMeeting 3.0 SDK. Supports: Video, Audio, Text Chat, File Transfer and Whiteboard.
21 Implementation Media Acquisition Module: The following raw data can be obtained directly through the API functions: member information, the file transfer record and the text message record. Video, audio and whiteboard data cannot be obtained directly.
22 Implementation Media Acquisition Module: Video. A thread is created to check the display of the video windows. If a scene change is detected, the video is captured and stored as a still image. The stored images are the key frames of the conference.
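The slide does not say how a scene change is detected. One simple possibility, sketched below under that assumption, is to compare each frame with the last stored key frame using the mean absolute pixel difference. The frame representation (flat lists of grayscale values), the threshold and the sample frames are all hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_key_frames(frames, threshold=30.0):
    """Return the indices of frames that differ enough from the last key frame.

    The first frame is always stored as a key frame.
    """
    key_indices = []
    last_key = None
    for index, frame in enumerate(frames):
        if last_key is None or mean_abs_diff(last_key, frame) > threshold:
            key_indices.append(index)
            last_key = frame
    return key_indices


# Made-up 4x4 frames flattened to 16 grayscale values (0-255).
frames = [
    [10] * 16,   # first scene
    [12] * 16,   # small change, same scene
    [200] * 16,  # large change, new scene
    [198] * 16,  # small change, same scene
    [20] * 16,   # large change, new scene
]
key_frames = select_key_frames(frames)
```

Comparing against the last key frame rather than the previous frame keeps slow, gradual changes from being missed entirely.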
23 Implementation Media Acquisition Module: Audio. A thread is created to record the local audio from the microphone. Members of the conference continuously exchange audio data. All received audio files and locally recorded audio files are combined to generate a single audio file.
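How the audio files are combined is not described on the slide. The sketch below assumes the simplest case: all tracks are 16-bit mono PCM at the same sample rate and already time-aligned, and they are combined by summing samples with clipping. The sample values and the 8000 Hz rate are made up.

```python
import io
import struct
import wave


def mix_tracks(tracks):
    """Sum time-aligned 16-bit PCM tracks sample by sample, with clipping."""
    length = max(len(track) for track in tracks)
    mixed = []
    for i in range(length):
        total = sum(track[i] for track in tracks if i < len(track))
        mixed.append(max(-32768, min(32767, total)))
    return mixed


def write_wav(samples, target, sample_rate=8000):
    """Write mono 16-bit samples to a file path or a seekable binary stream."""
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))


local = [1000, 2000, 30000]        # locally recorded samples (made up)
remote = [500, -2500, 10000, 42]   # received samples (made up)
mixed = mix_tracks([local, remote])

buffer = io.BytesIO()              # stands in for the final audio file
write_wav(mixed, buffer)
```

If the tracks were recorded on different machines, they would first need to be aligned by timestamp, which this sketch does not attempt.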
24 Implementation Media Acquisition Module: Whiteboard. The NetMeeting whiteboard information cannot be captured because the format of the data is not stated in the API. We have designed and created our own whiteboard and data format.
25 Implementation Archive Indexing Module: The stored key frames are used for face detection and recognition after the conference. The final audio file is used for speech recognition; the speech engine used is Microsoft SAPI.
26 Implementation Videoconference Accessing Module: An interface for conference management. Search for a conference by member name or chat content. Review a conference by playing back its content, including audio, key frames, member information, the file exchange record and chat content.
27 Implementation Videoconference Accessing Module: SMIL stands for Synchronized Multimedia Integration Language. It is an HTML-like language that can integrate streaming audio and video with images, text or any other media type into one presentation.
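A SMIL presentation that plays the conference audio in parallel with a sequence of key frames could be generated as sketched below. The element and attribute names (`smil`, `body`, `par`, `seq`, `audio`, `img`, `src`, `dur`) are standard SMIL; the file names and durations are hypothetical, and the slides do not show the SMIL that PVCAIS actually produces.

```python
import xml.etree.ElementTree as ET


def build_smil(audio_src, key_frames):
    """Build a SMIL document string.

    key_frames is a list of (image_src, duration_seconds) pairs, shown
    one after another while the audio plays in parallel.
    """
    smil = ET.Element("smil")
    body = ET.SubElement(smil, "body")

    # <par> plays its children at the same time.
    par = ET.SubElement(body, "par")
    ET.SubElement(par, "audio", {"src": audio_src})

    # <seq> plays its children one after another.
    seq = ET.SubElement(par, "seq")
    for image_src, duration in key_frames:
        ET.SubElement(seq, "img", {"src": image_src, "dur": "%ds" % duration})

    return ET.tostring(smil, encoding="unicode")


# Hypothetical file names and durations.
smil_text = build_smil(
    "conference.wav",
    [("frame001.jpg", 12), ("frame002.jpg", 30)],
)
```

Each key frame's duration would come from the gap between its capture time and the next key frame's capture time.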
28 Conclusions: We developed a videoconferencing client. All the media can be extracted in the Media Acquisition Module. Multimedia indexing functions are implemented. A standalone Videoconference Accessing Module is being developed.
29 Future Work: XML; better searching method; better graphical user interface
30 Q & A Session