Published by Φοῖνιξ Μπουκουβαλαίοι. Modified over 6 years ago.
1
Finding Out About II
Lecture Notes Prepared by Jagdish S. Gangolly
Ph.D. Program in Information Science
State University of New York at Albany
11/21/2018 Inf703 Information Organisation (Fall, 2003) Gangolly
2
Interdocument Parsing I
Corpus (broken into documents)
Directory structure
Filtering to remove tags
Lexical analysis (tokenising)
The algorithm (p.52)
Stemming (morphological processing)
Removal of stopwords
Representation of frequencies in splay trees
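The parsing steps listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm on p.52: the stopword list is a tiny made-up sample, `crude_stem` is a placeholder suffix stripper rather than a real stemmer, and a `Counter` (hash table) stands in for the splay tree that the slide names for holding frequencies.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list used for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is"}

def strip_tags(text):
    """Filter out anything that looks like an SGML/HTML tag."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenise(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def term_frequencies(document):
    """Run the pipeline: remove tags, tokenise, drop stopwords, stem, count."""
    tokens = tokenise(strip_tags(document))
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    # A Counter replaces the splay tree here (assumption for brevity).
    return Counter(stems)

doc = "<p>The parser is parsing the parsed documents.</p>"
freqs = term_frequencies(doc)
print(freqs)
```

Here "parsing" and "parsed" are conflated to the same stem, which is the point of the stemming step; a production system would use a proper stemmer and a much longer stopword list.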
3
Interdocument Parsing I
Document length normalisation
Refined Postings data structures (p.54)
STAIRS Posting (p.56)
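As a rough illustration of postings and length normalisation, the sketch below builds an inverted index in which each posting records the document, the raw term count, and the count divided by document length. The layout is an assumption made for this example; it is not the refined structure on p.54 or the STAIRS posting on p.56, and dividing by token count is only one of several possible normalisations.

```python
from collections import Counter, defaultdict

def build_postings(docs):
    """Build term -> list of (doc_id, raw_count, normalised_count).

    docs maps a document id to its list of already parsed tokens.
    """
    postings = defaultdict(list)
    for doc_id, tokens in sorted(docs.items()):
        length = len(tokens)
        for term, count in sorted(Counter(tokens).items()):
            # Normalise by document length so long documents are not
            # favoured simply for containing more tokens.
            postings[term].append((doc_id, count, count / length))
    return dict(postings)

# Hypothetical two-document corpus.
docs = {
    "d1": ["zipf", "law", "zipf", "rank"],
    "d2": ["rank", "frequency"],
}
postings = build_postings(docs)
print(postings["rank"])
```

Note that "rank" occurs once in each document, yet its normalised weight is higher in the shorter document d2.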
4
Descriptive Statistics: An Example: The Graph
5
Weighting I
Zipfian distribution
Principle of least effort / vocabulary balance
Mandelbrot: 1/ a measure of richness of vocabulary
Simon: Introduction of new terms as a birth process
Genetic code sequences as linguistic objects
Huberman study of surfing behaviours and Zipfian distribution
Word occurrence as a Poisson process: Identification of stopwords
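Two of the ideas on this slide lend themselves to a small numerical sketch. Under a Zipfian distribution, frequency is roughly inversely proportional to rank, so rank times frequency stays roughly constant. Under a Poisson model with rate equal to total occurrences divided by the number of documents, the expected fraction of documents containing a word at least once is 1 - exp(-rate); one common reading is that evenly scattered function words match this prediction, while content-bearing words clump into fewer documents. The function names and toy data below are assumptions made for illustration.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, term, frequency, rank * frequency), most frequent first."""
    counts = Counter(tokens).most_common()
    return [(r, term, f, r * f) for r, (term, f) in enumerate(counts, start=1)]

def poisson_doc_fraction(total_occurrences, n_docs):
    """Expected fraction of documents containing the word at least once,
    if its occurrences were scattered by a Poisson process."""
    rate = total_occurrences / n_docs
    return 1.0 - math.exp(-rate)

# Toy token stream built so that rank * frequency is exactly constant.
tokens = ["the"] * 6 + ["of"] * 3 + ["zipf"] * 2
rows = rank_frequency(tokens)
for row in rows:
    print(row)

# A word occurring 10 times in a 10-document corpus.
expected = poisson_doc_fraction(total_occurrences=10, n_docs=10)
print(round(expected, 3))
```

Comparing `expected` with the observed fraction of documents that contain a word gives a simple test: a close match suggests stopword-like behaviour, while a large shortfall suggests a clumped, content-bearing term.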
6
Weighting II
Resolving Power and Luhn’s work
Specificity/exhaustivity trade-offs (p.78, Fig.3.4)
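Luhn's notion of resolving power holds that very frequent words and very rare words both discriminate poorly between documents, so the most useful index terms lie in the middle of the frequency range. A minimal sketch of that selection follows; the `lower` and `upper` cutoffs are hypothetical values chosen for the toy data, since appropriate thresholds depend on the corpus.

```python
from collections import Counter

def significant_terms(tokens, lower=2, upper=5):
    """Keep terms whose frequency lies between the two cutoffs (inclusive)."""
    counts = Counter(tokens)
    return sorted(t for t, f in counts.items() if lower <= f <= upper)

# Toy token stream: one very common word, two mid-frequency words, one rare word.
tokens = ["the"] * 9 + ["index"] * 4 + ["term"] * 3 + ["splay"]
chosen = significant_terms(tokens)
print(chosen)
```

Raising `upper` or lowering `lower` admits more terms and makes indexing more exhaustive, while tightening the band keeps only the more specific terms.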