Published by Julius Wood. Modified over 9 years ago.
1
Document Indexing
Document indexing is the process of associating or tagging documents with different “search” terms.
Content:
1. Index construction
2. Scaling index construction
3. Sort-based index construction
4. BSBI: Blocked sort-based indexing
2
Index construction
Steps:
1. Parse the documents and extract words.
2. Store each extracted word together with its document ID.
Doc 1: THERE are growing signs that Hurricane Andrew, unwelcome as it was for the devastated inhabitants of Florida and Louisiana.
Doc 2: HURRICANE Andrew, claimed to be the costliest natural disaster in US history, yesterday smashed its way through the state of Louisiana.
Fig: Sample indexing
Sec. 4.2
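The two steps can be sketched in a few lines of Python applied to the two sample documents. This is a minimal sketch, assuming a simple tokenizer that lowercases the text and keeps only runs of letters; real indexers use more careful tokenization and normalization. The names `tokenize` and `extract_pairs` are hypothetical.

```python
import re

# The two sample documents from the slide.
DOCS = {
    1: "THERE are growing signs that Hurricane Andrew, unwelcome as it was "
       "for the devastated inhabitants of Florida and Louisiana.",
    2: "HURRICANE Andrew, claimed to be the costliest natural disaster in US "
       "history, yesterday smashed its way through the state of Louisiana.",
}

def tokenize(text):
    """Step 1 (assumed tokenizer): lowercase and keep runs of letters a-z."""
    return re.findall(r"[a-z]+", text.lower())

def extract_pairs(docs):
    """Step 2: emit one (term, doc_id) pair per token, in document order."""
    pairs = []
    for doc_id, text in docs.items():
        for term in tokenize(text):
            pairs.append((term, doc_id))
    return pairs

pairs = extract_pairs(DOCS)
print(pairs[:4])  # [('there', 1), ('are', 1), ('growing', 1), ('signs', 1)]
```

Every occurrence produces a pair, so a repeated word such as "the" in Doc 2 appears twice. Duplicates are only collapsed later, when the pairs are grouped into postings.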
3
Term Document Indexing Sec. 4.2
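After parsing, the (term, document ID) pairs are sorted by term and then by document ID. Equal terms are then grouped so that each term points to a list of the documents containing it. The sketch below is an illustrative assumption: it stores only document IDs and omits frequencies. `build_postings` is a hypothetical name, and the input pairs are made up for the example.

```python
from itertools import groupby

def build_postings(pairs):
    """Sort (term, doc_id) pairs by term, then doc_id, and group them into
    a dict mapping each term to its sorted, de-duplicated postings list."""
    index = {}
    for term, group in groupby(sorted(pairs), key=lambda p: p[0]):
        # sorted() already orders doc_ids within a term; dict.fromkeys drops repeats
        index[term] = list(dict.fromkeys(doc_id for _, doc_id in group))
    return index

# Hypothetical pairs, e.g. as produced by the parsing step.
pairs = [("hurricane", 1), ("andrew", 1), ("louisiana", 1),
         ("hurricane", 2), ("andrew", 2), ("louisiana", 2), ("the", 2), ("the", 2)]
index = build_postings(pairs)
print(index["hurricane"])  # [1, 2]
print(index["the"])        # [2]
```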
4
Scaling index construction Sec. 4.2
5
Sort-based index construction: some issues Sec. 4.2
6
BSBI: Blocked sort-based Indexing (Sorting with fewer disk seeks) Sec. 4.2
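The core idea of blocked sort-based indexing is to split the pair stream into blocks small enough to sort in memory, then write each sorted block to disk as one sequential run instead of seeking for every pair. The sketch below shows only this blocking step. It assumes (termID, docID) pairs are already available as a Python list and uses JSON-lines files as a stand-in for the on-disk run format. `write_sorted_runs` and its parameters are hypothetical.

```python
import json
import os
import tempfile

def write_sorted_runs(pairs, block_size, run_dir):
    """Cut the (termID, docID) stream into blocks of block_size pairs, sort each
    block in memory, and write it sequentially to its own run file."""
    run_paths = []
    for start in range(0, len(pairs), block_size):
        block = sorted(pairs[start:start + block_size])  # in-memory sort of one block
        path = os.path.join(run_dir, f"run{len(run_paths)}.jsonl")
        with open(path, "w") as f:
            for term_id, doc_id in block:
                f.write(json.dumps([term_id, doc_id]) + "\n")
        run_paths.append(path)
    return run_paths

with tempfile.TemporaryDirectory() as run_dir:
    runs = write_sorted_runs([(3, 1), (1, 1), (2, 2), (1, 3), (3, 4)], 2, run_dir)
    with open(runs[0]) as f:
        first_run = [tuple(json.loads(line)) for line in f]
print(len(runs), first_run)  # 3 [(1, 1), (3, 1)]
```

Each run is sorted internally but not relative to the others, so a merge step is still needed to produce the final index.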
8
Pseudocode: BSBI Index Construction
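The slide's pseudocode did not survive extraction. The Python sketch below follows the overall BSBI structure under stated assumptions: repeatedly take the next block of pairs, invert it (sort and group by termID), keep the result as one run, and finally merge all runs. Runs are held in memory here rather than written to disk. The final merge uses `heapq.merge` as a single k-way merge, which is one possible choice; the merge-sort slide uses binary merges instead. The sketch also assumes pairs arrive in non-decreasing docID order, so postings stay sorted across blocks. All function names are hypothetical.

```python
import heapq
from itertools import groupby

def bsbi_invert(block):
    """Invert one block: sort its distinct (termID, docID) pairs and collect
    the docIDs for each termID."""
    return [(term_id, [doc for _, doc in grp])
            for term_id, grp in groupby(sorted(set(block)), key=lambda p: p[0])]

def merge_blocks(inverted_blocks):
    """Merge per-block postings into one index with a single k-way merge.
    heapq.merge breaks ties by input order, so earlier blocks come first."""
    merged = {}
    for term_id, postings in heapq.merge(*inverted_blocks, key=lambda e: e[0]):
        plist = merged.setdefault(term_id, [])
        for doc in postings:
            if not plist or plist[-1] != doc:  # skip a docID already recorded
                plist.append(doc)
    return merged

def bsbi_index_construction(pair_stream, block_size):
    """Driver loop: fill a block, invert it, keep it as a run, then merge all runs."""
    inverted_blocks = []  # stands in for the run files on disk
    block = []
    for pair in pair_stream:
        block.append(pair)
        if len(block) == block_size:
            inverted_blocks.append(bsbi_invert(block))
            block = []
    if block:  # last, partially filled block
        inverted_blocks.append(bsbi_invert(block))
    return merge_blocks(inverted_blocks)

pairs = [(1, 1), (2, 1), (1, 2), (3, 2), (2, 3), (1, 3)]
index = bsbi_index_construction(pairs, block_size=2)
print(index)  # {1: [1, 2, 3], 2: [1, 3], 3: [2]}
```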
9
Analysis of BSBI
10
Applying Merge Sort
Can do binary merges, with a merge tree of ⌈log₂ 10⌉ = 4 layers.
During each layer, read runs into memory in blocks of 10M, merge them, and write the result back.
Fig: runs on disk being merged into a merged run
Sec. 4.2
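With binary merges, each layer roughly halves the number of runs, so 10 runs shrink to 5, 3, 2 and then 1: four layers, which matches the ceiling of log₂ 10. The sketch below is a minimal in-memory illustration. It merges small sorted lists pairwise and counts the layers. It does not model the 10M-record read buffers or disk I/O, and `binary_merge_passes` is a hypothetical name.

```python
import heapq
import math

def binary_merge_passes(runs):
    """Merge sorted runs pairwise, layer by layer, until one run is left.
    Returns the merged run and the number of layers used."""
    layers = 0
    while len(runs) > 1:
        # Merge runs two at a time; an odd run out is carried up unchanged.
        runs = [list(heapq.merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        layers += 1
    return runs[0], layers

runs = [[i, i + 10] for i in range(10)]  # 10 small sorted runs standing in for 10 blocks
merged, layers = binary_merge_passes(runs)
print(layers, math.ceil(math.log2(10)))  # 4 4
```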
11
Some Issues with Merge Sort based Indexing Sec. 4.2
12
Reference: Information Retrieval, Cambridge University Press, 2008.