INFORMATION RETRIEVAL VECTOR SPACE MODEL IN-DEPTH PART 2 Thomas Tiahrt, MA, PhD CSC492 – Advanced Text Analytics.

Inverse Document Frequency (IDF)
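
The formula on this slide did not survive extraction. For reference, here is the standard definition given by Manning, Raghavan, and Schütze (listed in the references). The slide itself may use a different log base or a smoothed variant. N is the number of documents in the collection and df_t is the number of documents that contain term t:

```latex
\mathrm{idf}_t = \log_{10} \frac{N}{\mathrm{df}_t}
\qquad\text{e.g. } N = 1{,}000{,}000,\ \mathrm{df}_t = 100
\;\Rightarrow\; \mathrm{idf}_t = \log_{10} 10^{4} = 4
```

A term that occurs in every document gets idf = log10(1) = 0 and so contributes nothing to distinguishing documents. Rare terms receive the largest idf values.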

Inverse Document Frequency
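
The following minimal Python sketch computes IDF over a toy collection. It assumes whitespace tokenization, lowercasing, and the common form idf_t = log10(N / df_t). The function name inverse_document_frequency and the sample documents are hypothetical and do not come from the slides.

```python
import math

def inverse_document_frequency(documents):
    """Map each term to idf_t = log10(N / df_t) over a list of raw-text documents."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # set() counts each term once per document: df counts documents, not occurrences
        for term in set(doc.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log10(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical toy collection (N = 4)
docs = ["the cat sat", "the dog sat", "the cat ran", "the bird flew"]
idf = inverse_document_frequency(docs)
# "the" appears in all 4 documents, so its idf is log10(4/4) = 0;
# "bird" appears in only 1, so it gets the largest idf, log10(4/1)
```

Counting document frequency with a per-document set is the key design choice. A term repeated many times within one document still raises df by only one.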

Document/Term Matrix
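
The matrix on this slide is not preserved in the transcript. The sketch below builds a document/term matrix with one row per document and one column per vocabulary term, following the order in the slide title. It assumes raw term counts as cell values and a sorted vocabulary for the column order. The helper document_term_matrix is hypothetical.

```python
from collections import Counter

def document_term_matrix(documents):
    """Rows = documents, columns = sorted vocabulary terms, cells = raw term counts."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)  # a missing term yields 0
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix

# Hypothetical two-document collection
docs = ["the cat sat", "the cat ate the fish"]
vocab, m = document_term_matrix(docs)
```

Real collections produce very sparse matrices, so production systems store inverted indexes or sparse arrays instead of dense lists. The dense form here only makes the layout visible.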

Weight Factor Computation
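
One common choice for the weight factor is the tf × idf product, w_{t,d} = tf_{t,d} × log10(N / df_t). The slide's exact weighting cannot be recovered from this transcript. It may, for instance, use log-scaled tf or length normalization, so treat this sketch as an assumption. The function tfidf_weights and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(documents):
    """Return (vocabulary, weights) with weights[d][j] = tf(t_j, d) * log10(N / df(t_j))."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({t for tokens in tokenized for t in tokens})
    n_docs = len(tokenized)
    # document frequency: number of documents containing each term
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log10(n_docs / df[t]) for t in vocabulary}
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term frequency within this document
        weights.append([tf[t] * idf[t] for t in vocabulary])
    return vocabulary, weights

# Hypothetical collection (N = 3); "the" occurs in every document
docs = ["the cat sat", "the cat ate the fish", "the dog barked"]
vocab, w = tfidf_weights(docs)
```

Because "the" appears in all three documents, its idf is 0 and its weight is 0 everywhere, however often it occurs. This is the discounting effect the IDF factor is meant to provide.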

VSM Pros and Cons
Benefits
• Documents can be ranked by their degree of similarity to the query
• Display thresholds (for example, show at most k documents) are easy to honor
• Highly ranked documents retrieved early can be used for relevance feedback to refine the query
Drawbacks
• The assumption that terms are orthogonal (independent of one another) does not hold for real text
• Some vector operations have no theoretical justification
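
The first two benefits are easiest to see in code. Once documents and the query are weight vectors, every document gets a score, and a display threshold is simply a cut on the sorted list. The sketch below uses cosine similarity, the usual VSM matching function in Manning et al. The transcript does not show which function the slides use, so this is an assumption. The names rank, top_k, and min_score are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs, top_k=None, min_score=0.0):
    """Return (doc_index, score) pairs in descending score order.

    Keeps only documents scoring strictly above min_score, then truncates
    to top_k entries if top_k is given (a simple display threshold)."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored = [(i, s) for i, s in scored if s > min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]

# Hypothetical weight vectors over the terms [cat, dog, fish]
doc_vecs = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
query_vec = [1, 0, 0]
results = rank(query_vec, doc_vecs)              # document 2 shares no terms and is dropped
first_page = rank(query_vec, doc_vecs, top_k=1)  # display limit of one document
```

Cosine normalizes away vector length, so a long document is not favored merely because it repeats terms. The same scoring also underlies the orthogonality drawback: "cat" and "kitten" would be separate axes with zero similarity.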

References
Sources:
• Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press.
• Gerard Salton. Automatic Text Processing. Addison-Wesley Publishing.

This concludes the second in-depth slide show on the vector space model. End of the Slides