Fatemeh Lashkari, UNB University, May 7th, 2014

• Indexing
• Semantic Search
• Semantic Search Architecture
• Index Process
• Index Maintenance

• Inverted Index
    • Sort-based inversion
    • Single-pass in-memory inversion
• HYB Index
    • Prefix search
    • Autocompletion search
    • Query expansion and faceted search
    • Fast error-tolerant search
    • Support for database-style "select" and "join"
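As a rough illustration of sort-based inversion, the sketch below collects (term, document id) pairs, sorts them, and groups them into postings lists. The function name `sort_based_inversion`, the whitespace tokenization, and the toy documents are assumptions made for this example; they are not taken from the slides.

```python
from itertools import groupby
from typing import Dict, List


def sort_based_inversion(docs: List[str]) -> Dict[str, List[int]]:
    """Build term -> sorted list of document ids by sorting (term, doc_id) pairs."""
    # Step 1: emit one (term, doc_id) pair per distinct term in each document.
    pairs = []
    for doc_id, text in enumerate(docs):
        for term in set(text.lower().split()):
            pairs.append((term, doc_id))
    # Step 2: sort by term, then by document id.
    pairs.sort()
    # Step 3: group consecutive pairs with the same term into one postings list.
    index = {}
    for term, group in groupby(pairs, key=lambda pair: pair[0]):
        index[term] = [doc_id for _, doc_id in group]
    return index


# Hypothetical toy collection.
docs = ["astronauts walk on moon", "moon landing", "astronauts train"]
index = sort_based_inversion(docs)
```

A single-pass in-memory variant would instead append each document id to a per-term list in a dictionary while the documents are read, which avoids the global sort.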

• Query: “astronauts walk on moon”

[Figure: architecture diagram. Labels: Indexing; Query Process; Answers of the question; Ontology; Text Collection]

• Preprocessing
    • Stemming
    • Lowercasing, e.g. "General Motors" becomes "general motors"
    • Removing some stop words, e.g. is, do, a, of, ...
• Text annotation
    • Annotators
    • Machine learning approaches
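The preprocessing steps above could look roughly like the following sketch. The stop-word set only contains the examples from the slide, and `toy_stem` is a deliberately crude, hypothetical suffix stripper standing in for a real stemmer; neither is meant as the method actually used.

```python
from typing import List

# Assumed stop-word list: just the examples given on the slide.
STOP_WORDS = {"is", "do", "a", "of"}


def toy_stem(token: str) -> str:
    """Strip a few common English suffixes (a crude stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Lowercase, split on whitespace, drop stop words, then stem."""
    tokens = text.lower().split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [toy_stem(t) for t in tokens]


tokens = preprocess("Astronauts walk on moon")
```

A production pipeline would normally use an established stemmer and a fuller stop-word list; the point here is only the order of the steps.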

A fast and efficient index does not need:
    • the whole vocabulary of the indexed collection in main memory
    • to sort postings
    • to merge postings
It should also work cache-efficiently.

• How many indexes do we need?
    • Index for relations
    • Index for text
• What is the structure of the vocabulary?
• What is the structure of a posting?
• What statistical information does a posting contain? e.g. apple:
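One possible answer to the questions about posting structure is sketched below: each posting stores a document id and the term's positions, from which term frequency and document frequency can be derived. The `Posting` class, the field names, and the toy documents are hypothetical illustrations and do not reconstruct the slide's own example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Posting:
    doc_id: int
    positions: List[int] = field(default_factory=list)

    @property
    def term_frequency(self) -> int:
        # Number of occurrences of the term in this document.
        return len(self.positions)


def build_positional_index(docs: List[str]) -> Dict[str, List[Posting]]:
    """Map each term to a list of postings, one per document containing it."""
    index: Dict[str, List[Posting]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for position, term in enumerate(text.lower().split()):
            postings = index[term]
            # Start a new posting the first time the term appears in this document.
            if not postings or postings[-1].doc_id != doc_id:
                postings.append(Posting(doc_id))
            postings[-1].positions.append(position)
    return dict(index)


# Hypothetical toy collection.
docs = ["apple pie with apple sauce", "green apple"]
index = build_positional_index(docs)
document_frequency = len(index["apple"])  # number of documents containing the term
```

Statistics such as term frequency and document frequency are the usual inputs to ranking formulas, which is one reason to keep them close to the postings.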

• How should scores be computed to improve the final result?
• How should the index be stored?
    • Distribute the index
    • Process queries in parallel
• Which compression methods can be used?
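As one example of a compression method that could be applied to postings lists, the sketch below stores the gaps between sorted document ids and encodes them with a variable-byte code. This is a common general technique, not necessarily the one the presentation had in mind; the byte layout (7 data bits per byte, high bit marking the last byte of a number) and all function names are assumptions of this example.

```python
from itertools import accumulate
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Replace sorted, non-negative document ids by differences to the predecessor."""
    if not doc_ids:
        return []
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    return list(accumulate(gaps))


def vbyte_encode(numbers: List[int]) -> bytes:
    """Encode each number in 7-bit groups, lowest group first."""
    out = bytearray()
    for n in numbers:
        while n >= 128:
            out.append(n & 0x7F)  # continuation byte: high bit clear
            n >>= 7
        out.append(n | 0x80)      # final byte of this number: high bit set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:           # last byte of the current number
            numbers.append(n)
            n, shift = 0, 0
        else:
            shift += 7
    return numbers


doc_ids = [3, 7, 8, 300, 1000]
encoded = vbyte_encode(to_gaps(doc_ids))
decoded = from_gaps(vbyte_decode(encoded))
```

Small gaps fit in a single byte, so dense postings lists shrink considerably compared with fixed-width integers.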

• Strategies for maintaining the index:
    • Merge-based (remerge)
    • In-place
    • Hybrid index update operation
    • Geometric partitioning
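To make the last strategy more concrete, here is a simplified sketch in the spirit of geometric partitioning with ratio 2 (often described as logarithmic merging). It is not the exact scheme from the literature: the class `GeometricIndex`, the `merge` helper, and the representation of a partition as a dictionary of postings lists are all assumptions made for illustration.

```python
from typing import Dict, List, Optional

Index = Dict[str, List[int]]


def merge(older: Index, newer: Index) -> Index:
    """Merge two partitions; doc ids in `newer` are assumed larger than in `older`."""
    merged = {term: list(ids) for term, ids in older.items()}
    for term, ids in newer.items():
        merged.setdefault(term, []).extend(ids)
    return merged


class GeometricIndex:
    def __init__(self) -> None:
        # levels[k] is either None or a partition built from 2**k flushes.
        self.levels: List[Optional[Index]] = []

    def add_flush(self, partition: Index) -> None:
        """Insert a freshly flushed in-memory partition, merging on collisions."""
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            partition = merge(self.levels[level], partition)
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = partition

    def lookup(self, term: str) -> List[int]:
        result: List[int] = []
        # Higher levels hold older documents, so visit them first.
        for partition in reversed(self.levels):
            if partition is not None:
                result.extend(partition.get(term, []))
        return result


# Hypothetical usage: each document is flushed as its own small partition.
gindex = GeometricIndex()
for doc_id, text in enumerate(["astronauts walk on moon", "moon landing", "astronauts train"]):
    gindex.add_flush({term: [doc_id] for term in text.split()})
```

Compared with remerging the whole index on every update, this keeps the number of partitions logarithmic in the number of flushes, at the cost of consulting several partitions per query.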

Thank You


