Techniques for Information Searching and Retrieval of Web-based Multimedia Digital Library Presented by:Vincent Cheung Supervisors: Prof. Michael Lyu Prof.


1 Techniques for Information Searching and Retrieval of Web-based Multimedia Digital Library
Presented by: Vincent Cheung
Supervisors: Prof. Michael Lyu, Prof. K.W. Ng
Markers: Prof. K. H. Lee, Prof. Y. S. Moon
3 May 2000

2 Abstract
- Digital libraries are becoming increasingly popular because of their strength in searching and retrieving information.
- A web-based environment provides a better medium for information sharing.
- There is a trend toward storing more multimedia information rather than pure text.
- This work researches techniques for multimedia information searching and retrieval in a web-based digital library.

3 Presentation Outline
- XML overview
- Data structures for multimedia news archives
  - for video clips
  - using graph structures of XML
  - giving annotation
- Architecture and agents of digital library
- Research plan and conclusion

4 Overview of XML
- XML: eXtensible Markup Language
- Proposed by the World Wide Web Consortium (W3C) in 1998
- Aims to define a complete, platform-independent and system-independent environment for authoring and delivering information resources across the web
- Semistructured

5 How XML differs from HTML
- Extensibility: new tags may be defined at will
- Structure: XML structures can be nested to arbitrary depth
- Validation: an XML document can contain an optional description of its grammar

6 XML Documents
- Use elements and attributes to describe your document.
Example news entry:
  Press warning appropriate, says Beijing
  Kong Lai-fan
  Greg Torode
  Beijing yesterday defended remarks made by senior SAR-based official Wang Fengchao that local media should avoid reporting separatist views....
[Diagram] Element tree: database > news > date, title, content, reporter
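As a rough illustration of the slide above, the news entry can be built and read back with Python's standard `xml.etree.ElementTree`. The element names (`database`, `news`, `date`, `title`, `content`, `reporter`) come from the slide's element tree. The exact nesting, the use of one `reporter` element per name, and the date format are assumptions, because the original markup is not preserved in this transcript.

```python
import xml.etree.ElementTree as ET


def build_news_database():
    """Build a tiny <database> holding one <news> entry (structure assumed)."""
    db = ET.Element("database")
    news = ET.SubElement(db, "news")
    # Assumption: the date is stored as plain text in ISO form.
    ET.SubElement(news, "date").text = "2000-04-15"
    ET.SubElement(news, "title").text = "Press warning appropriate, says Beijing"
    # Assumption: one <reporter> element per reporter.
    for name in ("Kong Lai-fan", "Greg Torode"):
        ET.SubElement(news, "reporter").text = name
    ET.SubElement(news, "content").text = (
        "Beijing yesterday defended remarks made by senior SAR-based official "
        "Wang Fengchao that local media should avoid reporting separatist views."
    )
    return db


db = build_news_database()
xml_text = ET.tostring(db, encoding="unicode")   # serialize to a string
parsed = ET.fromstring(xml_text)                 # parse it back
reporters = [r.text for r in parsed.iter("reporter")]
title = parsed.find("news/title").text
```

Because every piece of information sits in a named element, a program can pull out the title or the reporters without guessing from layout, which is the point the slide makes about describing a document with elements.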

7 Document Type Definition
- Provides the definition of a document type for member documents to follow
<!DOCTYPE database [
  <!ATTLIST date year  CDATA #REQUIRED
                 month CDATA #REQUIRED
                 day   CDATA #REQUIRED>
]>

8 Data Structure for News Videos
- Multimedia presentation
- Graph structure property
  - keyword directory
  - thesaurus / classification directory
  - person / place directory
  - Chinese-English dictionary
- Semistructure property
  - annotation

9 Indexing a Video
- Segment the video hierarchically into scenes. (A video is composed of one or more related scenes.)
- Describe the complete news video using bibliographic information (title, source, reporters, abstract, etc.) plus format, duration, etc.
- Describe each scene: id, start frame (time), end frame (time), keyframe, and scripts.
- An OCR tool was implemented last semester for indexing the videos.

10 Indexing a Video
For a news clip:
  id = 1234
  title = N. T. swamped after torrential downpour
  date = 1999-9-9
  source = Hong Kong ATV
  reporter = Chan Tai Man
  abstract = Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR.
  duration = 2:34:56
  has_scene = 1234.1, 1234.2, 1234.3
  format = MPEG
  language = Cantonese
  identifier = http://www.cse.cuhk.edu.hk/1.mpg

11 Indexing a Video
For a scene:
  id = 1234.1
  belong_to = 1234
  next_scene = 1234.2
  prev_scene = null
  start_time = 0:0:00
  end_time = 0:30:45
  keyframe = 1238
  transcript = ...
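The clip and scene records above map naturally onto linked records. The following is a minimal sketch, not the authors' implementation: the field names follow the slides, scene 1234.1 uses the slide's values, and the times and keyframes for scenes 1234.2 and 1234.3 are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scene:
    id: str
    belong_to: str
    start_time: str
    end_time: str
    keyframe: int
    next_scene: Optional[str] = None
    prev_scene: Optional[str] = None
    transcript: str = ""


@dataclass
class NewsClip:
    id: str
    title: str
    date: str
    source: str
    has_scene: List[str] = field(default_factory=list)


def scenes_in_order(clip: NewsClip, scenes: Dict[str, Scene]) -> List[str]:
    """Follow next_scene links, starting at the scene with no predecessor."""
    current = next(
        (scenes[s] for s in clip.has_scene if scenes[s].prev_scene is None), None
    )
    ordered = []
    while current is not None:
        ordered.append(current.id)
        current = scenes.get(current.next_scene)  # .get(None) yields None
    return ordered


clip = NewsClip("1234", "N. T. swamped after torrential downpour", "1999-9-9",
                "Hong Kong ATV", has_scene=["1234.1", "1234.2", "1234.3"])
scenes = {
    "1234.1": Scene("1234.1", "1234", "0:0:00", "0:30:45", 1238,
                    next_scene="1234.2"),
    # Hypothetical values for the remaining scenes (not given on the slides).
    "1234.2": Scene("1234.2", "1234", "0:30:45", "1:10:00", 2000,
                    next_scene="1234.3", prev_scene="1234.1"),
    "1234.3": Scene("1234.3", "1234", "1:10:00", "2:34:56", 3000,
                    prev_scene="1234.2"),
}
order = scenes_in_order(clip, scenes)
```

Keeping both `has_scene` on the clip and `prev_scene`/`next_scene` on each scene lets a player jump into any scene from a search hit and still step forward or backward.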

12 Sample News Entry
In NewsDatabase.XML:
  2000 4 15
  N.T. swamped after torrential downpour
  Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR....

13 Keyword Directory
- Each news entry has its own keyword elements.
- Build a keyword directory containing all keywords.
- Every keyword points to the news entries that contain that keyword.

14 Keyword Directory
[Diagram] The news database is a tree structure: database > news (ID = 0010) > title, date, keyword, reporter, ... The sample entry has title "N. T. swamped after torrential downpour", date 15 April, 2000, keyword "flood", and reporter Clifford Lo.
[Diagram] The keyword directory holds keywords such as France, fuel, gun, and flood, with news IDs such as 0010, 0137, and 0017. It is pointed to by news entries and also points to news entries (ID = 0043, ID = 0010, ID = 0017, ID = 0015).
Keywords point back to the news database to form a graph structure.

15 Keyword Directory
In NewsDatabase.XML:
  2000 4 15
  N.T. swamped after torrential downpour
  flood
  storm
  Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR....

16 Keyword Directory
In KeywordDirectory.XML:
  ... 0010 0017 0137 ......
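The keyword directory described on these slides behaves like an inverted index over the news database. Here is a small sketch under stated assumptions: the `id` attribute, the `keyword` element layout, and the two extra news entries are hypothetical, and mapping IDs 0010, 0017, and 0137 to the keyword "flood" is an interpretation of the slides, since the original markup is not preserved.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumed layout of NewsDatabase.XML; only entry 0010 comes from the slides.
NEWS_XML = """
<database>
  <news id="0010">
    <title>N.T. swamped after torrential downpour</title>
    <keyword>flood</keyword>
    <keyword>storm</keyword>
  </news>
  <news id="0017"><title>Hypothetical entry A</title><keyword>flood</keyword></news>
  <news id="0137"><title>Hypothetical entry B</title><keyword>flood</keyword></news>
</database>
"""


def build_keyword_directory(db):
    """Map each keyword to the IDs of the news entries that carry it."""
    directory = defaultdict(list)
    for news in db.iter("news"):
        for kw in news.iter("keyword"):
            directory[kw.text].append(news.get("id"))
    return dict(directory)


def titles_for_keyword(db, directory, keyword):
    """Follow the keyword's pointers back into the news database."""
    ids = set(directory.get(keyword, []))
    return [n.findtext("title") for n in db.iter("news") if n.get("id") in ids]


db = ET.fromstring(NEWS_XML)
directory = build_keyword_directory(db)
flood_titles = titles_for_keyword(db, directory, "flood")
```

News entries point to keywords and keywords point back to news entries, which is what turns the tree-shaped database into a graph.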

17 Thesaurus/Classification Directory
To search for terms with a similar meaning to the keyword:
  organization association World Trade Organisation WTO ...

18 Thesaurus/Classification Directory
To search for narrower (subset) terms of the given keyword:
  organization association flood earthquake fire storm disaster ...
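One way to read the two thesaurus slides is as query expansion: a keyword is widened with its synonyms and, optionally, its narrower terms before the keyword directory is searched. The sketch below assumes that "organization" pairs with "association", that "World Trade Organisation" pairs with "WTO", and that "disaster" is the broader term for flood, earthquake, fire, and storm. These groupings are inferred from the term lists on the slides and are not confirmed by the original markup.

```python
# Assumed synonym groups and narrower-term lists (inferred from the slides).
SYNONYMS = {
    "organization": ["association"],
    "World Trade Organisation": ["WTO"],
}
NARROWER = {
    "disaster": ["flood", "earthquake", "fire", "storm"],
}


def expand_query(keyword, include_narrower=True):
    """Return the keyword followed by its synonyms and narrower terms."""
    terms = [keyword]
    for head, alternatives in SYNONYMS.items():
        group = [head] + alternatives
        if keyword in group:
            # Synonymy is treated as symmetric: any member finds the others.
            terms += [t for t in group if t != keyword]
    if include_narrower:
        terms += NARROWER.get(keyword, [])
    return terms


wto_terms = expand_query("WTO")
disaster_terms = expand_query("disaster")
```

Each expanded term can then be looked up in the keyword directory, so a search for "disaster" also reaches news entries tagged only with "flood".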

19 Web Search Engine

20 Person / Place Directory
Person Directory (person ID, name, newsid, …):
  Fengchao Wang
  Chinese
  The central Government’s Liaison Office deputy director
  0123 0245 ......

21 Person / Place Directory
In news database:
  Press warning appropriate, says Beijing
  Kong Lai-fan
  Beijing yesterday defended remarks made by senior SAR-based official Wang Fengchao that local media should avoid reporting separatist views....

22 Person / Place Directory
[Diagram] News database tree: database > news (ID = 0123) > title, date, keyword, content, ... The sample entry has title "Press warning appropriate, says Beijing", date 15 April, 2000, keyword "media", and person Wang Fengchao.
[Diagram] The person directory holds persons such as John, Tom, Robert, and Wang Fengchao, with news IDs such as 0123, 0369, and 0246. It is pointed to by news entries and also points to news entries (ID = 0258, ID = 0123, ID = 0246, ID = 0155).
Person entries point back to the news database to form a graph structure.

23 Person / Place Directory
Place Directory: category structure
  China 5839 ... =“hongkong” class=“SAR”> Hong Kong New Territories ... 0010 ......

24 Person / Place Directory
In news database:
  N.T. swamped after torrential downpour
  Clifford Lo
  Large areas of the northwest New Territories were under water yesterday as torrential rain swept across the SAR....
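The category structure of the place directory suggests that a search for a region should also return news filed under the places inside it. The following sketch assumes a nesting of China > Hong Kong > New Territories and treats 5839 and 0010 as news IDs attached to China and New Territories respectively. Both the nesting and the ID placement are readings of the residue on the slides, not confirmed structure.

```python
# Assumed place hierarchy; each node lists its own news IDs and child places.
PLACES = {
    "China": {
        "news": ["5839"],
        "children": {
            "Hong Kong": {
                "news": [],
                "children": {
                    "New Territories": {"news": ["0010"], "children": {}},
                },
            },
        },
    },
}


def find_place(tree, name):
    """Depth-first search for a place node by name."""
    for place, node in tree.items():
        if place == name:
            return node
        found = find_place(node["children"], name)
        if found is not None:
            return found
    return None


def news_in_place(tree, name):
    """Collect news IDs for a place and for every place nested inside it."""
    node = find_place(tree, name)
    if node is None:
        return []
    ids = list(node["news"])
    for child in node["children"]:
        ids += news_in_place(node["children"], child)
    return ids


china_news = news_in_place(PLACES, "China")
hk_news = news_in_place(PLACES, "Hong Kong")
```

With this layout, a query for "Hong Kong" reaches the New Territories flood story even though that entry never mentions Hong Kong by name.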

25 Chinese-English Dictionary
- Translate the keywords for searching.
- We can have an English-to-Chinese dictionary:
<e2cdict> 氾濫 氾濫 水災 水災 洪水 洪水... </english></english> </e2cdict>

26 Chinese-English Dictionary
- We can have a Chinese-to-English dictionary:
<c2edict> WTO WTO World Trade Organization World Trade Organization</english></chinese>... </chinese> </c2edict>
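The dictionary slides can be read as a lookup step that translates a query keyword before searching, so that an English query also matches Chinese keywords and vice versa. This is a minimal sketch: the three Chinese terms come from the slide, but pairing them with the English headword "flood" is an assumption, because the headword is not preserved in this transcript. The reverse dictionary is derived from the forward one purely for illustration.

```python
# Assumed English-to-Chinese entries; "flood" as the headword is a guess.
E2C = {"flood": ["氾濫", "水災", "洪水"]}


def build_c2e(e2c):
    """Invert the English-to-Chinese dictionary into Chinese-to-English."""
    c2e = {}
    for english, chinese_terms in e2c.items():
        for chinese in chinese_terms:
            c2e.setdefault(chinese, []).append(english)
    return c2e


def translate_keyword(keyword, e2c, c2e):
    """Return translations of the keyword in the other language, if any."""
    return e2c.get(keyword) or c2e.get(keyword) or []


C2E = build_c2e(E2C)
to_chinese = translate_keyword("flood", E2C, C2E)
to_english = translate_keyword("水災", E2C, C2E)
```

The translated terms can be added to the original query, so one search covers news keywords in both languages.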

27 Annotation
- XML is semistructured!
- This gives more flexibility in adding tags to content.
- We add our own tags to annotate strings and give them “meaning”.
- Hence, more expressive queries can be supported.

28 Annotation: example
Radioactive coolant water leaked at a nuclear reactor in western Japan yesterday, but the accident had no impact on the environment, the plant director said. "Today when the plant was operating with its usual output, a worker found a small leak of primary coolant water from a pipe of the No 2 reactor," said Katsuhiko Takahashi. </content>
- We understand… but the system doesn’t…

29 Annotation: example
Radioactive coolant water leaked at a nuclear reactor in western Japan yesterday, but the accident had no impact on the environment, the plant director said. "Today when the plant was operating with its usual output, a worker found a small leak of primary coolant water from a pipe of the No 2 reactor," said Katsuhiko Takahashi.</content>

30 Usage of Annotation
- So, we can have queries like:
  - All speeches by Zhu Rongji in the last month
  - All storms that killed more than 200 people
- We can also add links that give more details about people, places, etc.
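To make the idea of annotation-driven queries concrete, here is a small sketch of a "speeches by a given person" query over annotated content. The tag names `speech` and `place` and the `speaker` attribute are hypothetical, since the annotation tags used on the original slides are not preserved in this transcript, and the news text is abridged from the example slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation tags wrapped around the example news text.
ANNOTATED = """<content>Radioactive coolant water leaked at a nuclear reactor in
western <place>Japan</place> yesterday, the plant director said.
<speech speaker="Katsuhiko Takahashi">Today when the plant was operating with
its usual output, a worker found a small leak of primary coolant water from a
pipe of the No 2 reactor</speech></content>"""


def speeches_by(content, speaker):
    """Return the text of every <speech> attributed to the given speaker."""
    return [
        " ".join((s.text or "").split())  # collapse line breaks and spaces
        for s in content.iter("speech")
        if s.get("speaker") == speaker
    ]


content = ET.fromstring(ANNOTATED)
speeches = speeches_by(content, "Katsuhiko Takahashi")
places = [p.text for p in content.iter("place")]
```

Without the tags the system sees only a string. With them, the quoted sentence is known to be a speech by a named person, which is what makes queries such as "all speeches by Zhu Rongji" answerable.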

31 Architecture of Digital Library
- Designing stores and query processors for semistructured data.
- Traditional database systems use a client/server architecture.
- The distributed environment has given rise to two new architectures: data warehouses and mediators.
- Video servers will also be integrated into our system to provide video streaming.

32 Data Warehouse
[Diagram: a client, a warehouse, and three data servers, each holding data; arrows labeled query, answer, data, and update.]

33 Mediator
[Diagram: a client, a mediator, and three data servers, each holding data; arrows labeled query and answer between client and mediator and between mediator and servers.]
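The difference between the two architectures can be sketched in a few lines. This is a toy model based on the general definitions of the two patterns, not on the authors' system: a warehouse copies data from the servers in advance and answers queries from its copy, while a mediator stores nothing and forwards each query to the servers. All class and method names are hypothetical.

```python
class DataServer:
    """A source that owns some records and can answer keyword queries."""

    def __init__(self, records):
        self.records = list(records)

    def query(self, keyword):
        return [r for r in self.records if keyword in r]


class Warehouse:
    """Answers queries from a local copy that must be refreshed by updates."""

    def __init__(self):
        self.copy = []

    def update(self, servers):
        self.copy = [r for s in servers for r in s.records]

    def query(self, keyword):
        return [r for r in self.copy if keyword in r]


class Mediator:
    """Holds no data; forwards each query and merges the servers' answers."""

    def __init__(self, servers):
        self.servers = servers

    def query(self, keyword):
        return [r for s in self.servers for r in s.query(keyword)]


servers = [DataServer(["flood in N.T."]), DataServer(["fuel price rises"])]
warehouse = Warehouse()
warehouse.update(servers)
mediator = Mediator(servers)

servers[1].records.append("flood warning issued")  # new data arrives
stale = warehouse.query("flood")   # warehouse has not been updated yet
fresh = mediator.query("flood")    # mediator asks the servers directly
warehouse.update(servers)
refreshed = warehouse.query("flood")
```

The trade-off shown here is the usual one: the warehouse answers without contacting the servers but can be out of date between updates, while the mediator is always current but depends on every server at query time.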

34 Agents Using Structured Data
- There is growing demand for data that is more structured than loosely structured HTML.
- Semistructured XML data can provide a very good environment for Web agents.
- Our main aim in implementing an agent is to illustrate that our semistructured XML data provides a better environment for an agent to work in.

35 Research Plan & Conclusion
- Design the structure in XML semistructured format
  - to support multimedia data, multilingual data, and various kinds of retrieval.
- Design a system architecture that allows multiple sources of data.
- Implement an agent to illustrate that our semistructured data can provide a better environment for an agent to work in.

36 Q & A Session

