Finding Out About II Lecture Notes Prepared by Jagdish S. Gangolly


1 Finding Out About II Lecture Notes Prepared by Jagdish S. Gangolly
Ph.D. Program in Information Science, State University of New York at Albany
11/21/2018 Inf703 Information Organisation (Fall, 2003) Gangolly

2 Interdocument Parsing I
Corpus (broken into documents)
Directory structure
Filtering to remove tags
Lexical analysis (tokenising)
The algorithm (p.52)
Stemming (morphological processing)
Removal of stopwords
Representation of frequencies in splay trees
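
The parsing steps listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the algorithm on p.52: the stopword list, the suffix stripper `crude_stem`, and all function names are hypothetical, stopwords are filtered before stemming as an assumption about ordering, and a plain `Counter` stands in for the splay tree mentioned above.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

TAG_RE = re.compile(r"<[^>]+>")    # matches markup tags such as <p> or </p>
TOKEN_RE = re.compile(r"[a-z]+")   # a token is a run of lowercase letters


def strip_tags(text):
    """Filtering step: replace each tag with a space."""
    return TAG_RE.sub(" ", text)


def tokenise(text):
    """Lexical analysis: lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def crude_stem(token):
    """Toy suffix stripper; NOT Porter's stemmer or the method in the text."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_frequencies(document):
    """Run the pipeline and count the surviving stems."""
    tokens = tokenise(strip_tags(document))
    stems = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    # A Counter (hash table) is swapped in here for the splay tree.
    return Counter(stems)


freqs = term_frequencies("<p>The parsing of documents and the parsed document</p>")
```

Here `freqs` maps each stem to its count within the one document; a per-corpus table would merge such counters across all documents.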

3 Interdocument Parsing I
Document length normalisation
Refined Postings data structures (p.54)
STAIRS Posting (p.56)
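
As a rough illustration of postings and length normalisation, the sketch below builds a simple inverted index and divides each term count by its document's length. It is an assumption-laden toy, not the refined structure on p.54 or the STAIRS posting on p.56; `build_postings` and `normalised_tf` are hypothetical names, and dividing by token count is only one of several possible normalisations.

```python
from collections import Counter, defaultdict


def build_postings(docs):
    """docs maps doc_id -> list of tokens.

    Returns (postings, lengths), where postings maps each term to a list of
    (doc_id, count) pairs in doc_id order and lengths maps doc_id -> token count.
    """
    postings = defaultdict(list)
    lengths = {}
    for doc_id in sorted(docs):
        tokens = docs[doc_id]
        lengths[doc_id] = len(tokens)
        for term, count in Counter(tokens).items():
            postings[term].append((doc_id, count))
    return dict(postings), lengths


def normalised_tf(postings, lengths, term):
    """Term frequency divided by document length, per document."""
    return {
        doc_id: count / lengths[doc_id]
        for doc_id, count in postings.get(term, [])
    }


docs = {1: ["cat", "sat", "cat", "mat"], 2: ["cat", "dog"]}
postings, lengths = build_postings(docs)
cat_weights = normalised_tf(postings, lengths, "cat")
```

In this example "cat" occurs twice in the longer document and once in the shorter one, yet both receive the same normalised weight, which is the effect length normalisation is meant to have.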

4 Descriptive Statistics: An Example: The Graph

5 Weighting I
Zipfian distribution
Principle of least effort / vocabulary balance
Mandelbrot: 1/ a measure of richness of vocabulary
Simon: Introduction of new terms as a birth process
Genetic code sequences as linguistic objects
Huberman study of surfing behaviours and Zipfian distribution
Word occurrence as a Poisson process: Identification of stopwords
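
A Zipfian distribution is usually checked by ranking terms by frequency and seeing whether rank times frequency stays roughly constant. The sketch below computes that table for a token list; the function name `rank_frequency` and the toy data are hypothetical, and real corpora only approximate the pattern.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, term, frequency, rank * frequency) rows, most frequent first.

    Ties in frequency are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        (rank, term, freq, rank * freq)
        for rank, (term, freq) in enumerate(ranked, start=1)
    ]


# Toy data chosen so that frequency is exactly proportional to 1 / rank.
tokens = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
table = rank_frequency(tokens)
```

For this contrived input the last column is 6 in every row; on real text the product drifts, especially at the very top and bottom ranks.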

6 Weighting II
Resolving Power and Luhn’s work
Specificity/exhaustivity trade-offs (p.78, Fig.3.4)
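
Luhn's observation is commonly summarised as: very frequent and very rare words both discriminate poorly between documents, so resolving power peaks among mid-frequency terms. A minimal sketch of that selection follows; the cutoff values and the name `mid_frequency_terms` are hypothetical, and in practice the cutoffs are chosen empirically for each corpus.

```python
def mid_frequency_terms(counts, lower, upper):
    """Keep terms whose frequency lies within [lower, upper], sorted by name.

    counts maps term -> corpus frequency; lower and upper are assumed cutoffs.
    """
    return sorted(term for term, freq in counts.items() if lower <= freq <= upper)


# Illustrative counts only, not taken from any real corpus.
counts = {"the": 50, "index": 7, "corpus": 5, "zyzzyva": 1}
candidates = mid_frequency_terms(counts, lower=2, upper=10)
```

Raising `lower` or lowering `upper` narrows the retained vocabulary, which trades exhaustivity for specificity in the resulting index terms.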

