Efficient Semantic-Based Content Search in P2P Network
Heng Tao Shen, Yan Feng Shu, and Bei Yu
Semantic-based Search: Introduction
The problem of semantic-based content search is studied in the context of document retrieval.
Given a query, which may be a phrase, a statement, or a paragraph, we look for documents that are semantically close to the query.
A super-peer P2P network is the underlying architecture.
Super-peer P2P Architecture
A peer (client) submits its query to the super-peer of its group.
The super-peer then broadcasts the query to the other peers within the group and to its neighboring super-peers.
Each neighboring super-peer in turn broadcasts the query to its clients and to its own neighboring super-peers.
This continues until some stopping criterion is satisfied.
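The broadcast flow above can be sketched in a few lines of Python. This is only an illustration: the slides do not say what the stopping criterion is, so a hop limit (`ttl`) is assumed here, and the function name `flood_query` and the example topology are hypothetical.

```python
from collections import deque

def flood_query(start, neighbors, clients, ttl):
    """Return the set of client peers reached when `start` floods a query.

    neighbors: dict mapping a super-peer to its neighboring super-peers
    clients:   dict mapping a super-peer to the client peers in its group
    ttl:       ASSUMED stopping criterion: maximum number of super-peer hops
    """
    reached = set()
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        super_peer, hops = queue.popleft()
        # The super-peer broadcasts the query to every client in its group.
        reached.update(clients.get(super_peer, []))
        if hops == ttl:
            continue  # stopping criterion met: do not forward any further
        # Forward the query to neighboring super-peers not yet visited.
        for nxt in neighbors.get(super_peer, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, hops + 1))
    return reached

# Hypothetical topology: three super-peers in a line, each with a few clients.
neighbors = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2"]}
clients = {"S1": ["a", "b"], "S2": ["c"], "S3": ["d"]}
```

With this topology, every extra hop allowed by `ttl` pulls in another whole peer group, which is the broadcast overhead the summary indexes are meant to reduce.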
Semantic-based Search
To facilitate semantic-based content search, a novel indexing structure called the Hierarchical Summary Indexing Structure is proposed.
A three-tier hierarchical structure is used: unit level, peer level, and super-peer level.
Semantic-based Search (1)
The Vector Space Model (VSM) and Latent Semantic Indexing (LSI) are used.
Summaries are first represented as vectors, which are then optimized by LSI techniques and represented as high-dimensional points.
Summaries enable easy maintenance and updating of information, and also reduce both storage and communication costs.
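As a rough illustration of the vector space step, the sketch below turns text into raw term-frequency vectors and compares them with cosine similarity. It is a simplification under stated assumptions: the weighting scheme (plain term counts), the whitespace tokenizer, and the helper names `to_vector` and `cosine` are hypothetical, and the LSI dimensionality-reduction step mentioned above is not shown.

```python
import math
from collections import Counter

def to_vector(text):
    """Represent a text as a sparse term-frequency vector (assumed weighting)."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)

query = to_vector("peer search")
summary = to_vector("peer to peer search")
similarity = cosine(query, summary)
```

A query and a summary that share no terms score 0.0 here; LSI is applied precisely so that semantically related texts can still end up close together when their exact terms differ.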
Semantic-based Search (2)
The group index and the global index are used to quickly locate the most relevant peers and peer groups, respectively, while the local index speeds up the search for the most relevant documents.
A Novel Hierarchical Summary Indexing Framework
A super-peer P2P architecture is used.
A super-peer is a node that acts both as a server to a set of clients and as an equal in a network of super-peers.
A peer group is formed by a super-peer and its clients.
Hierarchical Summary Indexing Structure
The structure is used to minimize the overhead of super-peer broadcasts.
The framework has three levels of summarization:
Unit level: an information unit is summarized.
Peer level: all information owned by a peer is summarized.
Super level: all information owned by a peer group is summarized.
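The three levels of summarization can be sketched as follows. The slides do not state how lower-level summaries are combined into higher-level ones, so this example assumes the simplest option, summing term counts; the names `unit_summary` and `merge` and the sample documents are hypothetical.

```python
from collections import Counter

def unit_summary(text):
    """Unit level: summarize one information unit as term counts."""
    return Counter(text.lower().split())

def merge(summaries):
    """Combine summaries by summing term counts (ASSUMED aggregation rule)."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

# One hypothetical peer group with two peers and their documents.
peer_docs = {
    "p1": ["semantic search", "search index"],
    "p2": ["video codec"],
}

# Unit level: one summary per document.
unit_level = {p: [unit_summary(d) for d in docs] for p, docs in peer_docs.items()}
# Peer level: all information owned by a peer.
peer_level = {p: merge(units) for p, units in unit_level.items()}
# Super level: all information owned by the peer group.
super_level = merge(peer_level.values())
```

Each higher level is derived only from the summaries directly below it, so under this assumption a peer ships its peer-level summary to the super-peer rather than its documents.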
Hierarchical Summary Indexing Structure
With the summary information, queries only need to be forwarded to nodes that potentially contain the answers.
To improve efficiency, indexes on the summary information are used:
Unit level: local index.
Peer level: group index.
Super level: global index.
Hierarchical Summary Indexing Structure
By examining the super-level summaries, a super-peer can determine which peer groups are relevant.
By examining the peer-level summaries, a super-peer can determine which of its peers have the answers.
In this framework, information searching becomes more guided: a peer group is decided first, then a peer, and finally an information unit.
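The guided search order (peer group, then peer, then information unit) might look like the sketch below. Several things are assumptions made for illustration: the relevance score is a simple term-overlap count standing in for the similarity measure, only the single best candidate is kept at each level, and summaries are recomputed on the fly instead of being read from prebuilt indexes. The names `summarize`, `overlap`, and `guided_search` are hypothetical.

```python
from collections import Counter

def summarize(texts):
    """Summarize a collection of texts as combined term counts."""
    return Counter(word for text in texts for word in text.lower().split())

def overlap(query, summary):
    """Stand-in relevance score: number of query terms matched in the summary."""
    return sum(min(count, summary[term]) for term, count in query.items())

def guided_search(query_text, network):
    """Pick the best peer group, then the best peer, then the best document."""
    q = summarize([query_text])
    # Super level: choose the most relevant peer group.
    group = max(network, key=lambda g: overlap(
        q, summarize(d for docs in network[g].values() for d in docs)))
    # Peer level: choose the most relevant peer inside that group.
    peer = max(network[group], key=lambda p: overlap(q, summarize(network[group][p])))
    # Unit level: choose the most relevant document held by that peer.
    doc = max(network[group][peer], key=lambda d: overlap(q, summarize([d])))
    return group, peer, doc

# Hypothetical network: group -> peer -> documents.
network = {
    "G1": {"p1": ["semantic search in p2p", "latent semantic indexing"],
           "p2": ["super peer routing"]},
    "G2": {"p3": ["video codec tuning"],
           "p4": ["audio codec notes"]},
}
```

For example, `guided_search("semantic indexing", network)` visits only group `G1` and peer `p1`; group `G2` and its peers never see the query.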
Query Processing
There is NO broadcast activity during the whole of query processing. This is one of the main achievements of the summary indices.
Summary & Conclusions
A general and extensible hierarchical framework for summary building and indexing is proposed.
Hierarchical summary indexing structures can be easily adopted, and such a system can deliver remarkable results.