Information Retrieval Hanyang Univ. Distributed Computing Systems Laboratory Query  Retrieval Techniques ( Query – Document )


1 Information Retrieval Hanyang Univ. Distributed Computing Systems Laboratory Query  Retrieval Techniques ( Query – Document )

2 Query Expansion
- “What if it is some island in Indonesia, but I cannot remember its name?”
- “What if it was a programming language similar to Java, but I cannot remember its name?”
- “If I just type ‘Java’, will I get what I want?”
- “Which OOP languages are similar to Java?”
Problem of ambiguous terms (Virus, Jaguar, Apple, Plane (airplane, plank), …) → Query Recommendation
Specific query / broad-meaning query (the meaning of the word can be determined) → ODP, WordNet

3 Motive System (Google’s Wonder Wheel) [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop]

4 Create Term Network. Source: ODP / WordNet / thesaurus. [Diagram nodes: Programming, Computers, Algorithms, Games, Language, Java, C++, Perl, Compilers, Ruby, Applet, History]

5 Create Term Network [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Computer]

6 ODP Problem
Using ODP Metadata to Personalize Search (SIGIR 2005):
- Web pages are annotated manually and this information is exported in RDF format
- This covers only about 0.1 percent of the Web pages indexed by Google
→ A term network covering every term is not possible.

7 ODP Problem
- It is hard to build a term network for queries made up of several words (e.g. Java map, Google File System, MS Bing …)
- It is hard to keep adding newly coined terms (e.g. Hadoop, Mono, Bing, …)

8 Folksonomy: Hierarchical Structure / Clustering. Folksonomy sources, e.g. social annotation (Flickr, Delicious, blogs, ...), user click stream, bookmarks, desktop data

9 Create Term Network. Source: Social Annotation (Tag). [Diagram nodes: Programming, Java, C++, Web Dev, JDBC, JSP, Indonesia, Coffee, Java map]

10 Create Term Network (Add) [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Computer, Indonesia, JDBC, Web Dev]

11 Term Network Expansion [Diagram nodes: Indonesia, Europe, China, Korea, Travel, Java map, Map, Asia, Java, Japan]

12 Retrieval Result: New techniques that make use of the information above are also possible.

13 Create Term Network: Query Analysis [Diagram nodes: Hadoop, Java, GFS, Big Table, C++, Google, Yahoo]

14 Create Term Network (Add) [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Computer, Coffee, JDBC, Web Dev, Hadoop]

15 Show Term Network (Ranking Algorithm) [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop, Web-Dev]

16 Compare with Google’s Wonder Wheel [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop, Web-Dev] Ctrl Button → Change Color. Query expansion is limited.

17 Personalization (Far from Optimal)
- For queries such as ‘Google’, applying no personalization at all is the better choice.
- Sports fan → ‘Office’ (MS Office): an irrelevant result could erroneously be moved to the front, and the user may become confused.
- For some queries and some users, personalization brings great improvement, but it can also be unnecessary and even harmful.
(A Large-scale Evaluation and Analysis of Personalized Search Strategies, WWW 2007)

18 Problem of Personalization (Identity & Privacy)

19 Problem of Personalization: Server Side
- When analyzing social annotations, user IDs do not match from one site to another
- User preferences for every term are stored redundantly
- The user can be identified only while logged in
- The user may not want their information stored on the server
- Performance is slow

20 Problem of Personalization: Client Side
- Analyzing social annotations requires working out the relationships between annotations from all the information on each site, which runs into PC performance, storage capacity, and other limits
- Privacy is preserved, but the user's data cannot be put to use (e.g. targeted advertising)

21 Problem of Personalization (Performance)

22 Google Cluster Architecture [Diagram labels: Keyword, docID, ranking, docID, summary, Personal Server, Cache, Personal Server]

23 Indexing: Server. Sources: ODP / WN, Folksonomy. [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop]

24 Indexing: Client. Folksonomy sources, e.g. social annotation (Flickr, Delicious, blogs, ...), user click stream, bookmarks, desktop data → User A’s Preference Indexing

25 Client / Server: User’s Term Network. Delicious Data Set (User, URL, Tag, Time), e.g. ( Del.icio.us, java.sun.com,, 2009-07-09 )
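
One way the (User, URL, Tag, Time) records could be turned into a per-user term network is tag co-occurrence: two tags are linked whenever the same user attaches both to the same URL. The slides do not say how the edges are computed, so the sketch below is an assumption, and the function name `build_user_term_network` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_user_term_network(bookmarks, user):
    """Build a weighted tag co-occurrence network for one user.

    bookmarks: iterable of (user, url, tag, time) tuples.
    Returns {tag: {other_tag: co_occurrence_count}}.
    Assumption: an edge means "both tags were put on the same URL".
    """
    # Group this user's tags by the URL they were attached to.
    tags_by_url = defaultdict(set)
    for owner, url, tag, _time in bookmarks:
        if owner == user and tag:  # skip other users and empty tags
            tags_by_url[url].add(tag.lower())

    # Every pair of tags on the same URL adds one to the edge weight.
    network = defaultdict(lambda: defaultdict(int))
    for tags in tags_by_url.values():
        for a, b in combinations(sorted(tags), 2):
            network[a][b] += 1
            network[b][a] += 1
    return {tag: dict(neighbors) for tag, neighbors in network.items()}


# Hypothetical sample records, not taken from the real data set.
sample = [
    ("userA", "java.sun.com", "java", "2009-07-09"),
    ("userA", "java.sun.com", "programming", "2009-07-09"),
    ("userA", "java.sun.com/jdbc", "java", "2009-07-10"),
    ("userA", "java.sun.com/jdbc", "jdbc", "2009-07-10"),
    ("userB", "indonesia.travel", "java", "2009-07-11"),
    ("userB", "indonesia.travel", "indonesia", "2009-07-11"),
]
network_a = build_user_term_network(sample, "userA")
print(network_a["java"])
```

Because only userA's records are used, "indonesia" never enters userA's network, which is the kind of per-user separation the slide's client-side term network suggests.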

26 Retrieval: Server / Client. If there is no user preference for the query, the general term network is sent. The client checks the user's preferences and displays the result after ranking (Sun). HTTP Header: the search result is the term network that suits a typical Java user.
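
The client-side step can be sketched as a re-ranking of the general term network received from the server. The scoring formula here (server score multiplied by one plus the local preference weight) is an assumption chosen for illustration; the slides do not specify the ranking algorithm. The name `rank_term_network` and all scores are hypothetical.

```python
def rank_term_network(general_terms, user_preference):
    """Re-rank the server's general term network on the client.

    general_terms: {term: server_score} as sent by the server.
    user_preference: {term: weight} kept locally on the client.
    With no stored preference for any term, the server order is kept,
    matching the fallback to the general term network.
    """
    def score(term):
        # Assumed formula: boost the server score by the local weight.
        return general_terms[term] * (1.0 + user_preference.get(term, 0.0))

    return sorted(general_terms, key=score, reverse=True)


# Hypothetical scores for the query "Java".
general = {"C++": 0.9, "Indonesia": 0.8, "JSP": 0.5, "Hadoop": 0.4}
preference = {"Hadoop": 2.0, "JSP": 1.0}

print(rank_term_network(general, preference))
print(rank_term_network(general, {}))
```

Keeping the preference weights on the client means the server only ever sees the plain query, which fits the privacy concern raised on the earlier slides.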

27 Indexing: Server. Source: Query Log. Through indexing, new terms are added at regular intervals. [Diagram nodes: Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop]
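
A periodic update from the query log might look like the sketch below: terms that appear often enough in recent queries, and are not yet in the network, are added and linked to known terms from the same query. The threshold `min_count`, the whitespace tokenization, and the function name `add_new_terms` are assumptions for illustration, not details from the slides.

```python
from collections import Counter


def add_new_terms(network, query_log, min_count=2):
    """Add frequent unseen query terms to the term network in place.

    network: {term: set_of_neighbor_terms}, modified in place.
    query_log: list of raw query strings collected since the last run.
    Returns the sorted list of terms that were added.
    """
    known = set(network)
    queries = [set(q.lower().split()) for q in query_log]

    # Count in how many queries each term appears.
    counts = Counter(term for terms in queries for term in terms)
    new_terms = {t for t, c in counts.items() if c >= min_count and t not in known}

    for term in new_terms:
        network.setdefault(term, set())

    # Link each new term to the known terms it was queried together with.
    for terms in queries:
        for new in terms & new_terms:
            for old in terms & known:
                network[new].add(old)
                network[old].add(new)
    return sorted(new_terms)


# Hypothetical network and log.
network = {"java": {"c++"}, "c++": {"java"}}
log = ["hadoop java", "Hadoop tutorial", "bing"]
print(add_new_terms(network, log))
print(network["hadoop"])
```

Running this on a schedule, for example once per indexing cycle, would let a term such as Hadoop enter the network once enough users have searched for it.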

28 Architecture
WWW, SIGIR, KDD, CIKM … 2010
- (Web Service-based architecture and applications, Web service engineering, Web usability and accessibility, Adaptive and personalized Web application)
- (IR Platforms and Scalability, e.g. IR architecture including distributed/P2P, efficiency, scalability, indexing, compression)
- (Useful system architecture)
- (IR Architectures, Scalability and Efficiency)
- (Check the experimental results of papers in this area)

