1
Project Description 2: Inverted List Database
2
Create an Inverted File
Tokenize a text document, and attach to each token a list of the locations at which it appears.
Sort the results and store them in an Oracle database.
3
Tokenizer
–Define the admissible symbols for a token; we will not use delimiters to capture tokens.
–Keep a record of the position of each token.
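A minimal sketch of such a tokenizer, under stated assumptions: the slides do not define the admissible symbols, so the pattern below (a run of word characters, or a single punctuation mark) is a guess, and the names `TOKEN_PATTERN` and `tokenize` are hypothetical. Tokens are matched directly rather than obtained by splitting on delimiters, and positions are 1-based token counts, which agrees with the positions in the example slides.

```python
import re

# Assumption: an admissible token is a run of word characters or a single
# punctuation symbol. Whitespace is skipped, never used to split the text.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Return (token, position) pairs, with positions counted from 1."""
    matches = TOKEN_PATTERN.finditer(text)
    return [(m.group(), i) for i, m in enumerate(matches, start=1)]


pairs = tokenize("He is a dumb teacher Dumb! Dumb! and Dumb!")
for token, position in pairs:
    print(token, position)
```

Matching tokens directly keeps punctuation such as "!" as a token with its own position, which a split on whitespace alone would not do.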
4
Tokenizer Example:
Document 1: He is a dumb teacher Dumb! Dumb! and Dumb!
Document 2: He is a great council. His advices are really great. He truly helps.
5
Tokenizer Inverted File for document 1 (continued):
dumb 4
Dumb 6
Dumb 8
Dumb 11
He 1
is 2
teacher 5
6
Tokenizer Example: Inverted File for document 1:
! 12
! 7
! 9
a 3
and 10
7
Tokenizer Inverted File for document 1 (locations merged per token):
! 7, 9, 12
a 3
and 10
Dumb 4, 6, 8, 11
He 1
is 2
teacher 5
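The merge from one (token, position) pair per line to one line per token can be sketched as follows. This is an illustration, not the project's prescribed method: the function name `build_inverted_file` is hypothetical, and folding case so that "dumb" and "Dumb" share one entry is an assumption inferred from the merged list for document 1. The keys come out in lowercase here, whereas the slide shows "Dumb" and "He".

```python
from collections import defaultdict


def build_inverted_file(pairs):
    """Group positions by token and return the entries sorted by token."""
    postings = defaultdict(list)
    for token, position in pairs:
        # Assumption: fold case so "dumb" and "Dumb" share one entry.
        postings[token.lower()].append(position)
    return sorted((token, sorted(locs)) for token, locs in postings.items())


# (token, position) pairs for document 1, as listed on the slides.
doc1_pairs = [
    ("He", 1), ("is", 2), ("a", 3), ("dumb", 4), ("teacher", 5), ("Dumb", 6),
    ("!", 7), ("Dumb", 8), ("!", 9), ("and", 10), ("Dumb", 11), ("!", 12),
]

inverted = build_inverted_file(doc1_pairs)
for token, locations in inverted:
    print(token, ", ".join(str(loc) for loc in locations))
```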
8
Tokenizer Inverted File for document 2:
. (period) 6, 12
a 3
advices 8
are 9
council 5
great 4, 11
He 1
His 7
is 2
really 10
9
Token database
Store the tokens in the database.
The first column holds the sorted tokens.
The second column holds the document name/number.
The rest of the tuple holds the locations of the token.
This is the so-called inverted list.
–(Optional) Compress the sequence of locations into a new data type.
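A rough sketch of this table layout follows. Several things here are assumptions and not part of the project description: Python's built-in `sqlite3` module stands in for the Oracle database so the example runs without a server, the table and column names (`inverted_list`, `token`, `doc_no`, `locations`) are hypothetical, and gap encoding (storing the difference between consecutive locations) is just one possible reading of the optional compression step.

```python
import sqlite3


def encode_gaps(locations):
    """Gap-encode a sorted location list: [4, 6, 8, 11] -> '4,2,2,3'."""
    gaps = [b - a for a, b in zip([0] + locations, locations)]
    return ",".join(str(g) for g in gaps)


def decode_gaps(text):
    """Invert encode_gaps by accumulating the gaps."""
    locations, total = [], 0
    for gap in text.split(","):
        total += int(gap)
        locations.append(total)
    return locations


conn = sqlite3.connect(":memory:")  # stand-in for the Oracle database
conn.execute(
    "CREATE TABLE inverted_list ("
    " token TEXT NOT NULL,"      # first column: the token
    " doc_no INTEGER NOT NULL,"  # second column: document number
    " locations TEXT NOT NULL,"  # rest of the tuple: encoded locations
    " PRIMARY KEY (token, doc_no))"
)

# A few entries taken from the example inverted files (tokens in lowercase).
entries = [
    ("!", 1, [7, 9, 12]),
    ("a", 1, [3]),
    ("dumb", 1, [4, 6, 8, 11]),
    ("great", 2, [4, 11]),
]
conn.executemany(
    "INSERT INTO inverted_list VALUES (?, ?, ?)",
    [(token, doc, encode_gaps(locs)) for token, doc, locs in entries],
)

# Read the rows back in token order and restore the original locations.
rows = conn.execute(
    "SELECT token, doc_no, locations FROM inverted_list ORDER BY token, doc_no"
).fetchall()
for token, doc_no, encoded in rows:
    print(token, doc_no, decode_gaps(encoded))
```

Storing gaps keeps the numbers small when a token occurs often, which is the usual motivation for compressing location lists; an Oracle implementation would need its own choice of column type for the encoded sequence.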