Word AdHoc Network: Using Google Core Distance to extract the most relevant information. Presenter: Wei-Hao Huang. Authors: Ping-I Chen, Shi-Jen Lin. KBS 2010
Outline: Motivation; Objectives; Methodology; Experiments; Conclusions; Comments
Motivation: Most previous research methods need predictive models, which are built from training data or from Web logs of users' browsing behavior. These models are complex, and the keyword extraction methods are limited to certain domains.
Objectives: To present a new algorithm called "Word AdHoc Network" (WANET). This method needs no pre-processing, and all execution is in real time. To extract keyword sequences from documents in various knowledge domains. Diagram: Document → WANET System → Relevant Documents
Methodology: Word AdHoc Network system architecture; 1-gram filtering method (part of speech, length of the words, number of Google search results); Google Core Distance; Hop-by-Hop Routing algorithm (PageRank algorithm, BB's graph-based clustering algorithm)
WANET System Architecture
1-gram filtering method: Part of speech: NN (common noun, singular), NP (proper noun), DT (determiner), or JJ (adjective). Length of the words: at least 3 characters. Number of Google search results.
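The three filters on this slide can be sketched as a single pass over candidate words. This is a minimal illustration, not the authors' implementation: the `Candidate` type, the `filter_1grams` function, and the `min_hits`/`max_hits` bounds are hypothetical, the slide gives no threshold for the number of search results, and the minimum length is assumed to be counted in characters. Tags and hit counts are assumed to be supplied by an external tagger and search service.

```python
from typing import Iterable, List, NamedTuple

ALLOWED_TAGS = {"NN", "NP", "DT", "JJ"}  # tags listed on the slide
MIN_LENGTH = 3  # assumption: length is measured in characters


class Candidate(NamedTuple):
    word: str
    tag: str   # part-of-speech tag from some external tagger (not shown)
    hits: int  # search result count supplied by the caller (not shown)


def filter_1grams(candidates: Iterable[Candidate],
                  min_hits: int = 1,
                  max_hits: int = 10**9) -> List[str]:
    """Keep words that pass all three filters.

    min_hits and max_hits are placeholder bounds; the slide does not
    state the actual thresholds on the number of search results.
    """
    kept = []
    for c in candidates:
        if c.tag not in ALLOWED_TAGS:
            continue  # wrong part of speech
        if len(c.word) < MIN_LENGTH:
            continue  # too short
        if not (min_hits <= c.hits <= max_hits):
            continue  # hit count outside the assumed bounds
        kept.append(c.word)
    return kept
```

For example, `filter_1grams([Candidate("network", "NN", 5000), Candidate("of", "IN", 10**8)])` keeps only `"network"`.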
Google Core Distance: The original algorithm: NGD (Normalized Google Distance). The new algorithm: GCD (Google Core Distance).
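The formulas on this slide did not survive extraction. For reference, the standard Normalized Google Distance of Cilibrasi and Vitányi can be sketched as below, where `fx` and `fy` are the result counts for each term, `fxy` is the count for both terms together, and `n` is the assumed total number of indexed pages (it must be larger than both counts). The function name and parameter names are illustrative. The GCD formula is not reproduced here because the slide text does not give it.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from raw result counts.

    NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y))
                / (log N - min(log f(x), log f(y)))
    """
    if min(fx, fy, fxy) <= 0:
        # No results for a term or for the pair: treat as unrelated.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

Two terms that always co-occur get a distance of 0; the distance grows as the joint count shrinks relative to the individual counts.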
Hop-by-Hop Routing Algorithm: PageRank algorithm
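As a reminder of what PageRank computes, here is a minimal power-iteration sketch on a small directed graph. How WANET builds its word graph is not shown on the slide, so the graph format (a dict mapping each node to its outgoing neighbors, with every node present as a key), the damping factor of 0.85, and the iteration count are all assumptions.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]],
             damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Plain power-iteration PageRank; ranks sum to 1."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node starts with the random-jump share.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    new_rank[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[v] / n
                for w in nodes:
                    new_rank[w] += share
        rank = new_rank
    return rank
```

In the hypothetical graph `{"a": ["c"], "b": ["c"], "c": ["a"]}`, node `c` receives links from both other nodes and ends up with the highest rank.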
Hop-by-Hop Routing Algorithm: BB's graph-based clustering algorithm. BB score = 1/6
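The slide does not define the BB score. One common similarity in graph-based clustering, used for example in Beeferman and Berger's agglomerative clustering of query logs, is the neighbor-overlap ratio: shared neighbors divided by all distinct neighbors of the two nodes. The sketch below assumes that definition; whether WANET's BB score is exactly this is an assumption, and the function name and graph format are illustrative.

```python
from typing import Dict, List


def neighbor_overlap(graph: Dict[str, List[str]], x: str, y: str) -> float:
    """Ratio of shared neighbors to all distinct neighbors of x and y.

    Assumed stand-in for the BB score; not confirmed by the slide.
    """
    nx, ny = set(graph[x]), set(graph[y])
    union = nx | ny
    if not union:
        return 0.0  # neither node has neighbors
    return len(nx & ny) / len(union)
```

With one shared neighbor out of six distinct neighbors, for instance `{"x": ["a", "b", "c"], "y": ["c", "d", "e", "f"]}`, the ratio is 1/6.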
Hop-by-Hop Routing Algorithm
Experiments: Time variance effect of the Google search results; Execution time; Precision and recall rate; Top-k search results analysis. Dataset: Four knowledge domains were selected from the Elsevier Web site, and the top 25 most-downloaded papers in each journal were chosen.
Time variance effect of the Google search results: Spearman's footrule is used to compare the keyword sequences extracted by the two algorithms.
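Spearman's footrule sums, over all items, the absolute difference between an item's position in one ranking and its position in the other. A minimal sketch follows, assuming both sequences contain the same keywords with no repeats; how the paper handles keywords that appear in only one sequence is not stated on the slide. The function name and the optional normalization by the maximum possible value, floor(n²/2), are illustrative choices.

```python
from typing import List, Sequence


def spearman_footrule(seq_a: Sequence[str],
                      seq_b: Sequence[str],
                      normalize: bool = True) -> float:
    """Footrule distance between two rankings of the same items."""
    if len(set(seq_a)) != len(seq_a) or len(set(seq_b)) != len(seq_b):
        raise ValueError("sequences must not contain repeated items")
    if set(seq_a) != set(seq_b):
        raise ValueError("sequences must rank the same items")
    pos_b = {item: i for i, item in enumerate(seq_b)}
    total = sum(abs(i - pos_b[item]) for i, item in enumerate(seq_a))
    if not normalize:
        return float(total)
    max_total = (len(seq_a) ** 2) // 2  # reached by a fully reversed ranking
    return total / max_total if max_total else 0.0
```

Identical sequences give 0, and a fully reversed sequence gives 1 after normalization.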
Execution time
Precision and recall rate
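For reference, precision and recall over a set of retrieved documents can be computed as below. This is the standard definition, not the paper's evaluation script; the function name and document identifiers are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str],
                     relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Precision: fraction of retrieved documents that are relevant.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    # Recall: fraction of relevant documents that were retrieved.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

If four documents are retrieved and two of them are among three relevant ones, precision is 2/4 and recall is 2/3.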
Top-k search results analysis
Conclusions: The paper proposes a new system that extracts the most important keyword sequence to represent a document and helps users automatically find relevant documents or Web pages. Future work: The authors hope it can be used on a mobile device or in an e-book.
Comments: Advantages: Extracts the most important keyword sequence. Applications: Information retrieval.