APPLYING INFORMATION RETRIEVAL TO TEXT MINING
Data Mining Lab, 이아람
IR (Information Retrieval)
Returning relevant texts for a query
A measure of similarity is computed between the query and each document
The similarity scores
The vector space model
Counting Letters
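As a minimal sketch of letter counting, assuming that case is ignored and that only alphabetic characters count (the slide does not state either rule), the hypothetical helper `count_letters` below tallies the letters of a string in Python:

```python
from collections import Counter


def count_letters(text: str) -> Counter:
    """Count alphabetic characters in text, ignoring case (an assumed rule)."""
    return Counter(ch for ch in text.lower() if ch.isalpha())


# Made-up sample input, not taken from the slides.
letter_counts = count_letters("Text mining")
for letter, count in sorted(letter_counts.items()):
    print(letter, count)
```

Spaces, digits, and punctuation are skipped because `str.isalpha` returns False for them.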
Counting Words
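Word counting can be sketched the same way. The tokenization rule here is an assumption, not something the slide specifies: a word is a run of lowercase letters, optionally followed by an apostrophe and more letters. The function name `count_words` is hypothetical.

```python
import re
from collections import Counter

# Assumed word pattern: letters, with an optional internal apostrophe ("don't").
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(text: str) -> Counter:
    """Count word occurrences in text, ignoring case."""
    return Counter(WORD_PATTERN.findall(text.lower()))


# Made-up sample input, not taken from the slides.
word_counts = count_words("The cat saw the dog. The dog ran.")
print(word_counts.most_common(2))
```

A different pattern (for example one that keeps hyphenated words together) would give different counts, so the choice of tokenizer should match the question being asked of the text.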
Counting Pronoun Occurrences
he / she, him / her, his / her, his / hers, himself / herself
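One way to turn the pronoun pairs above into numbers is to count the masculine and feminine forms separately, giving a two-element count per text. This is only a sketch under that assumption; the grouping, the tokenizer, and the name `pronoun_vector` are illustrative and not taken from the slides.

```python
import re
from collections import Counter

# Pronoun forms from the list above, grouped by gender.
MASCULINE = ("he", "him", "his", "himself")
FEMININE = ("she", "her", "hers", "herself")


def pronoun_vector(text: str) -> tuple[int, int]:
    """Return (masculine count, feminine count) for the pronouns in text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    masculine = sum(counts[word] for word in MASCULINE)
    feminine = sum(counts[word] for word in FEMININE)
    return masculine, feminine


# Made-up sample sentence, not taken from the slides.
print(pronoun_vector("He gave her his book, and she thanked him herself."))
```

Whole-word matching matters here: counting substrings instead would wrongly find "he" inside "her" or "the".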
TEXT COUNT AND VECTOR
Vectors and Angles
The angle between two texts is used to compare them.
The angle is computed from the texts' vectors.
The closer the angle is to 0, the more similar the two texts are.
Vectors and Angles
Inner product (dot product): $x \cdot y = \sum_{i=1}^{n} x_i y_i$
Vectors and Angles
Vector length: $\|x\| = \sqrt{x \cdot x} = \sqrt{\sum_{i=1}^{n} x_i^2}$
Computing Angles
$\cos\theta = \dfrac{x \cdot y}{\|x\| \, \|y\|}$
The angle $\theta$ is obtained in radians; here it is about 26.5°.
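The angle computation can be sketched in Python with the standard dot product and vector length. The helper names `dot`, `length`, and `angle` are hypothetical, and the sample vectors are made up for illustration; they are not the count vectors behind the slide's result.

```python
import math


def dot(x, y):
    """Inner (dot) product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def length(x):
    """Euclidean length of a vector: the square root of its dot product with itself."""
    return math.sqrt(dot(x, x))


def angle(x, y):
    """Angle between x and y in radians, from cos(theta) = x.y / (|x| |y|)."""
    norm_product = length(x) * length(y)
    if norm_product == 0:
        raise ValueError("angle is undefined for a zero vector")
    cos_theta = dot(x, y) / norm_product
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


# Made-up count vectors for two texts.
theta = angle((1, 0), (1, 1))
print(theta, math.degrees(theta))
```

For these two vectors the angle is π/4 radians, that is 45 degrees. `math.acos` returns radians, so `math.degrees` is needed for a value in degrees.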