HYP Progress Update By Zhao Jin
Outline
Background
Progress Update
Background
Query (Text-based)
–The set of keywords entered into the system to retrieve the desired information or resources
–Main categories
  Traditional IR
  Web (ex. Google)
  OPAC (ex. LINC)
  Video (ex. TRECVID)
Background
Query Analysis
–To analyze the patterns and hidden information in queries
–To classify and support such queries efficiently
Progress update
Mid-May to early June
–Background reading
–Around 30 to 40 papers on various topics
–Summarized the key points of each paper
Progress update
Mid-June to late June
–Log analysis
  BBC Video Query
  NUS OPAC Query
–Background reading on OPAC and TRECVID
Progress update
July to now
–Follow-up on two main topics
  Query classification and division into content-based and feature-based keywords (OPAC)
  Identifying ASR-oriented keywords in a video query (TRECVID)
–Background reading on MARC, WordNet and LOC subject headings
Progress update
Plan for the near future
–Refine and experiment with the current ideas
–Log analysis
–Background reading (textbooks & related papers)
–Preparation for implementation
Q&A?
End of progress update Thank you for your attention!
Two types of keywords
Content-Based Keyword (CBK)
–Keywords that concern what the item is about
–Ex. title, subject heading, etc.
Feature-Based Keyword (FBK)
–Keywords that concern the features of the item
–Ex. author, publisher, genre, medium
Benefits
–Faster retrieval
–More precise retrieval
–Help in relevance ranking
Possible implementation
–Term co-occurrence for concept division
–List of special words and machine learning for FBK and CBK division
–WordNet for classification among CBKs
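The "list of special words" idea for FBK and CBK division could be sketched roughly as follows. This is only an illustration: the cue words in `FBK_CUES` and the function name `split_keywords` are hypothetical, and a real list would presumably come out of the log analysis. The machine learning part is not covered here.

```python
# Hypothetical cue words per feature type (assumption, not taken from the logs).
FBK_CUES = {
    "author": {"by", "author", "written"},
    "publisher": {"publisher", "published", "press"},
    "genre": {"novel", "fiction", "biography", "textbook"},
    "medium": {"dvd", "cd", "video", "ebook", "microfilm"},
}


def split_keywords(query):
    """Split a query into content-based keywords (CBK) and feature-based
    keywords (FBK). Returns (cbk, fbk), where cbk is a list of words and
    fbk maps each feature-based word to its feature type."""
    cbk, fbk = [], {}
    for word in query.lower().split():
        for feature, cues in FBK_CUES.items():
            if word in cues:
                fbk[word] = feature
                break
        else:
            # No cue list matched, so treat the word as content-based.
            cbk.append(word)
    return cbk, fbk


cbk, fbk = split_keywords("Chinese history textbook DVD")
print(cbk)  # content-based keywords
print(fbk)  # feature-based keywords with their feature type
```

A lookup table like this only catches words that are on the list; anything ambiguous (for example a word that is both a topic and a medium) would need the learned classifier mentioned on the slide.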
Possible implementation
–CL and IL search algorithms for actual searching with CBKs
–List of special words and machine learning for classification among FBKs
–MARC record search algorithms for actual searching with FBKs
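As a rough sketch of searching MARC records with FBKs, each feature type can be mapped to the MARC 21 fields that usually carry it. The record layout below (a dict from tag to a list of strings) is a simplifying assumption, the sample records are invented for illustration, and `FEATURE_TAGS`, `match_fbk` and `search` are hypothetical names.

```python
# MARC 21 bibliographic tags for a few feature types.
FEATURE_TAGS = {
    "author": ["100", "700"],  # main / added entry, personal name
    "publisher": ["260"],      # publication, distribution, etc.
    "genre": ["655"],          # index term, genre/form
}


def match_fbk(record, feature, keyword):
    """True if keyword occurs in any field of the record tagged for feature."""
    keyword = keyword.lower()
    return any(
        keyword in value.lower()
        for tag in FEATURE_TAGS[feature]
        for value in record.get(tag, [])
    )


def search(records, fbks):
    """Keep the records that satisfy every (feature, keyword) pair."""
    return [r for r in records if all(match_fbk(r, f, k) for f, k in fbks)]


# Invented toy records: tag -> list of field strings (simplified layout).
records = [
    {"100": ["Tan, A."], "245": ["Example title one"], "260": ["Example Press"]},
    {"100": ["Lim, B."], "245": ["Example title two"], "260": ["Sample Books"]},
]

hits = search(records, [("author", "tan"), ("publisher", "press")])
print([r["245"][0] for r in hits])
```

Restricting each FBK to its own fields is what would make retrieval faster and more precise than matching every keyword against the whole record.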
Means to retrieve shots
Example:
–To find shots of “Bill Clinton”
  Face recognition
  Closed-caption
  Automatic Speech Recognition (ASR)
Metrics
Common VS Special (in reality)
–How common in reality the concept represented by the keyword is
Generic VS Specific
–How generic the concept represented by the keyword is
Metrics
Concrete VS Abstract
–Whether the concept represented by the keyword is concrete or abstract
Topic frequency (Low VS High)
–How often the keyword becomes (or is closely related to) a topic
Metrics
Formal VS Informal
–Whether the keyword is in formal or informal language
Written VS Spoken
–Whether the keyword is in written or spoken language
Metrics
Feature-level VS Content-level
–Whether the keyword is about a feature of the video (ex. camera motion) or the content of the video
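One possible way to hold the metrics for a keyword is a small record with one flag per metric, combined through a weighted sum. This is only a sketch: the slides do not say how the metrics are combined, so `KeywordMetrics`, `weighted_score`, the weights, and the flag values below are all hypothetical placeholders, not measured results.

```python
from dataclasses import dataclass, asdict


@dataclass
class KeywordMetrics:
    """One flag per metric; True selects the first-named side of each pair
    unless the field name says otherwise."""
    common: bool                # Common (True) vs Special
    generic: bool               # Generic (True) vs Specific
    concrete: bool              # Concrete (True) vs Abstract
    high_topic_frequency: bool  # High (True) vs Low topic frequency
    formal: bool                # Formal (True) vs Informal
    spoken: bool                # Spoken (True) vs Written
    content_level: bool         # Content-level (True) vs Feature-level


def weighted_score(metrics, weights):
    """Sum the weights of the metrics that are True for this keyword."""
    return sum(
        weights.get(name, 0.0)
        for name, value in asdict(metrics).items()
        if value
    )


# Placeholder weights (assumption); real values would need experiments.
weights = {"spoken": 1.0, "content_level": 1.0, "high_topic_frequency": 0.5}

# Hand-assigned flags for an imaginary keyword, for illustration only.
m = KeywordMetrics(
    common=True, generic=False, concrete=True,
    high_topic_frequency=True, formal=True,
    spoken=True, content_level=True,
)
score = weighted_score(m, weights)
print(score)
```

Keeping the weights outside the record means the same labelled keywords can be re-scored as the ideas are refined.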