Published by Ashley Reynolds. Modified over 8 years ago.
1
Information Retrieval: Query Retrieval Techniques ( Query – Document )
Hanyang Univ. Distributed Computing Systems Laboratory
2
Query Expansion
- "It was some island in Indonesia, but I can't remember which one."
- "It was a programming language similar to Java, but I can't remember which one."
- "If I just type 'Java', will I get what I want?"
- "Which OOP languages are similar to Java?"
Problem of ambiguous terms (Virus, Jaguar, Apple, Plane (airplane, board), ...)
Query Recommendation: specific query / broad-meaning query (the meaning of the word can be identified)
Resources: ODP, WordNet
3
Motivating System ( Google's Wonder Wheel )
[Figure: wonder wheel with nodes Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop]
4
Create Term Network
Source: ODP / WordNet / thesaurus
[Figure: hierarchy with nodes Computers, Programming, Algorithms, Games, Language, Java, C++, Perl, Compilers, Ruby, Applet, History]
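A directory such as ODP already files terms under categories, so one plausible way to derive a term network from it is to link each term to its parent category and treat terms under the same parent as related. The following is a minimal sketch under that assumption. The category paths and the names `build_term_network` and `related_terms` are hypothetical and are not taken from ODP or from the slides.

```python
from collections import defaultdict

# Hypothetical category paths in the style of a web directory (not real ODP data).
CATEGORY_PATHS = [
    "Computers/Programming/Languages/Java",
    "Computers/Programming/Languages/C++",
    "Computers/Programming/Languages/Perl",
    "Computers/Programming/Languages/Ruby",
    "Computers/Programming/Compilers",
    "Computers/Algorithms",
]

def build_term_network(paths):
    """Record parent -> children and child -> parents links for every path."""
    children = defaultdict(set)
    parents = defaultdict(set)
    for path in paths:
        parts = path.split("/")
        for parent, child in zip(parts, parts[1:]):
            children[parent].add(child)
            parents[child].add(parent)
    return children, parents

def related_terms(children, parents, term):
    """Return the terms that share a parent category with `term`, sorted."""
    siblings = set()
    for parent in parents.get(term, ()):
        siblings |= children[parent]
    siblings.discard(term)
    return sorted(siblings)

children, parents = build_term_network(CATEGORY_PATHS)
print(related_terms(children, parents, "Java"))    # siblings under "Languages"
print(related_terms(children, parents, "Hadoop"))  # unknown term: empty list
```

A term that the directory does not contain, such as "Hadoop" here, simply has no neighbors.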
5
Create Term Network
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Computer]
6
ODP Problem
Using ODP Metadata to Personalize Search (SIGIR 2005)
- Web pages are annotated manually and the information is exported in RDF format
- ODP covers only about 0.1 percent of the Web pages indexed by Google
A term network cannot be built for every term.
7
ODP Problem
- It is hard to build a term network for queries made up of several words ( e.g. Java map, Google File System, MS Bing … )
- It is hard to keep adding newly coined terms ( e.g. Hadoop, Mono, Bing, … )
8
Folksonomy
Hierarchical Structure / Clustering
Folksonomy, e.g. Social Annotation (Flickr, Delicious, Blog, ...), User Click Stream, Bookmark, Desktop Data
9
Create Term Network
Source: Social Annotation (Tag)
[Figure: term network with nodes Programming, Java, C++, Web Dev, JDBC, JSP, Indonesia, Coffee, Java map]
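The slides do not say how tags are turned into edges. A common approach, assumed here, is to connect two tags whenever they are attached to the same URL and to use the number of shared URLs as the edge weight. The annotations below and the names `cooccurrence_network` and `neighbors` are invented for illustration. A multi-word tag such as "java map" is kept as a single node.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (user, url, tag) annotations; not real Delicious or Flickr data.
ANNOTATIONS = [
    ("alice", "http://example.com/java-tutorial", "java"),
    ("alice", "http://example.com/java-tutorial", "programming"),
    ("bob", "http://example.com/java-tutorial", "java"),
    ("bob", "http://example.com/java-tutorial", "jdbc"),
    ("carol", "http://example.com/java-island", "java"),
    ("carol", "http://example.com/java-island", "indonesia"),
    ("dave", "http://example.com/java-island", "java map"),
]

def cooccurrence_network(annotations):
    """Weight each tag pair by the number of URLs that carry both tags."""
    tags_by_url = defaultdict(set)
    for _user, url, tag in annotations:
        tags_by_url[url].add(tag)
    weights = Counter()
    for tags in tags_by_url.values():
        # Sorting gives every pair one canonical (a, b) order.
        for a, b in combinations(sorted(tags), 2):
            weights[(a, b)] += 1
    return weights

def neighbors(weights, term):
    """Return a Counter of the tags connected to `term` with their edge weights."""
    result = Counter()
    for (a, b), weight in weights.items():
        if a == term:
            result[b] += weight
        elif b == term:
            result[a] += weight
    return result

weights = cooccurrence_network(ANNOTATIONS)
print(neighbors(weights, "java"))
```

With this toy data "java" is linked both to programming-related tags and to "indonesia", which is the ambiguity the term network is meant to expose.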
10
Create Term Network (Add)
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Computer, Indonesia, JDBC, Web Dev]
11
Term Network Expansion
[Figure: term network with nodes Indonesia, Europe, China, Korea, Travel, Java map, Map, Asia, Java, Japan]
12
Retrieval Result
New techniques that use the information above are also possible.
13
Create Term Network
Source: Query Analysis
[Figure: term network with nodes Hadoop, Java, GFS, Big Table, C++, Google, Yahoo]
14
Create Term Network (Add)
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Computer, Coffee, JDBC, Web Dev, Hadoop]
15
Show Term Network (Ranking Algorithm)
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop, Web-Dev]
16
Compare with Google's Wonder Wheel
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop, Web-Dev; labels: Ctrl Button, Change Color]
Query expansion is limited.
17
Personalization ( Far from Optimal )
- For a query such as 'Google', applying no personalization is the better choice.
- A sports fan searching for 'Office' ( MS Office ): irrelevant results could erroneously be moved to the front, and the user may become confused.
- For some queries and some users personalization brings great improvement, but it can also be unnecessary and even harmful.
( A Large-scale Evaluation and Analysis of Personalized Search Strategies, WWW 2007 )
18
Problem of Personalization ( Identity & Privacy )
19
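Problem of Personalization
Server Side
- When social annotations are analyzed, a user's ID does not match from one site to another.
- Each user's preference for every term is stored redundantly.
- Preferences can be looked up only while the user is logged in.
- The user may not want personal information stored on the server.
- Performance is slow.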
20
Problem of Personalization
Client Side
- Analyzing social annotations means analyzing the relationships among annotations using all the information from each site, which runs into several problems such as the PC's performance and storage capacity.
- Privacy is preserved, but the user's data cannot be put to use ( e.g. targeted advertising ).
21
Problem of Personalization ( Performance )
22
Google Cluster Architecture
[Figure labels: Keyword, docID ranking, docID, summary, Personal Server, Cache]
23
Indexing
Server: ODP / WN, Folksonomy
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop]
24
Indexing
Client: Folksonomy, e.g. Social Annotation (Flickr, Delicious, Blog, ...), User Click Stream, Bookmark, Desktop Data
User A's Preference Indexing
25
[Figure labels: Client, Server, User's Term Network]
Delicious Data Set (User, URL, Tag, Time)
( Del.icio.us, java.sun.com,, 2009-07-09 )
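Records of the form (User, URL, Tag, Time) can be reduced to a per-user preference profile by counting how often each user applied each tag. The sketch below assumes that simple counting scheme, which the slides do not confirm. The tag field of the sample record above is blank, so every tag value here, the user names, and the names `Bookmark` and `user_preferences` are hypothetical.

```python
from collections import Counter, namedtuple
from datetime import date

# One row of a (User, URL, Tag, Time) data set.
Bookmark = namedtuple("Bookmark", ["user", "url", "tag", "day"])

# Hypothetical records; the tags are invented for illustration.
RECORDS = [
    Bookmark("user_a", "java.sun.com", "java", date(2009, 7, 9)),
    Bookmark("user_a", "java.sun.com", "sun", date(2009, 7, 9)),
    Bookmark("user_a", "example.org/jsp", "java", date(2009, 7, 10)),
    Bookmark("user_a", "example.org/untagged", "", date(2009, 7, 10)),
    Bookmark("user_b", "example.org/travel", "indonesia", date(2009, 7, 11)),
]

def user_preferences(records, user):
    """Count how often `user` applied each tag; records with a blank tag are skipped."""
    return Counter(r.tag for r in records if r.user == user and r.tag)

prefs = user_preferences(RECORDS, "user_a")
print(prefs.most_common())
```

Because the profile is built only from one user's own records, it can be computed and kept on the client.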
26
Retrieval
Server → Client
- If the query has no user preference attached, the server sends the general term network.
- The client checks the user's preferences and shows the terms ranked accordingly ( Sun ).
HTTP Header: the search result is the term network suited to a general Java user.
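The client-side step described above can be sketched as a re-ranking of the general term list received from the server. The sketch assumes that preferences are plain per-term counts held on the client. The function name `rerank`, the term list, and the counts are hypothetical.

```python
def rerank(general_terms, preferences):
    """Order the server's terms by the user's preference count, highest first.

    `sorted` is stable, so terms with equal counts (including terms the user
    has no preference for) keep the order the server sent them in.
    """
    return sorted(general_terms, key=lambda term: -preferences.get(term, 0))

# General term network for "Java" as sent by the server (illustrative order).
general_terms = ["C++", "Ruby", "Indonesia", "JSP", "Sun"]

# Hypothetical preference counts stored only on the client.
preferences = {"Sun": 3, "JSP": 1}

print(rerank(general_terms, preferences))
print(rerank(general_terms, {}))  # no preferences: server order is unchanged
```

Since only the general network crosses the wire, the user's preference data never has to leave the client.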
27
Indexing
Server: Query Log
Through indexing, new terms are added to the term network at regular intervals.
[Figure: term network with nodes Java, C++, Ruby, Perl, Applet, Programming, Indonesia, JSP, Hadoop]
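The slides do not specify how new terms are picked out of the query log. One simple possibility, assumed here, is a periodic batch job that counts whole queries and proposes those that are frequent enough and not yet in the network. The threshold `min_count`, the function name `new_terms_from_log`, and the sample log are hypothetical.

```python
from collections import Counter

def new_terms_from_log(query_log, known_terms, min_count=2):
    """Return queries seen at least `min_count` times that are not yet known.

    Queries are compared case-insensitively and kept whole, so a multi-word
    query becomes a single candidate term.
    """
    counts = Counter(q.strip().lower() for q in query_log if q.strip())
    known = {term.lower() for term in known_terms}
    return sorted(t for t, c in counts.items() if c >= min_count and t not in known)

KNOWN_TERMS = {"Java", "C++", "Ruby", "Perl"}
QUERY_LOG = ["hadoop", "Java", "Hadoop", "bing", "java", "mono", "Bing "]

print(new_terms_from_log(QUERY_LOG, KNOWN_TERMS))
```

Running such a job on a schedule is one way to realize the "at regular intervals" update from the slide.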
28
Architecture
WWW, SIGIR, KDD, CIKM … 2010
- ( Web Service-based architecture and applications, Web service engineering, Web usability and accessibility, Adaptive and personalized Web application )
- ( IR Platforms and Scalability, e.g. IR architecture including distributed/P2P, efficiency, scalability, indexing, compression )
- ( Useful system architecture )
- ( IR Architectures, Scalability and Efficiency )
- ( Check the experimental results of papers related to this field )