Download presentation
Presentation is loading. Please wait.
Published byEmory Dickerson Modified over 9 years ago
1
ID Identification in Online Communities Yufei Pan Rutgers University
2
ID Identification Online communities are a large part of lives of people Interact with each other via different IDs. Who is that ID? –Text Identification: given a piece of text, could we identify the ID for it from known IDs? –ID Identification: given an ID with all its text evidences, could we identify it with any other known ID.
3
Text Identification vs Text Categorization Text categorization –Well-known area –Categorize the type based on the content –Treats the text as a bag of words Text Identification –Identify the ID who produces the text –The similarity of content wouldn’t help –Find out the constant features, independent of text content
4
Approach for Text Identification Stylometric Features –The style features of an author with his/her known text. –Rudman(1997) Steps : –Firstly, we would extract some kind of stylometric features. –Secondly, we would choose some kind of machine learning algorithms. –Finally, we conduct experiments to get the good results
5
Text Identification VS ID Identification Same? –No. Depends on the consistency of the stylometric features over different IDs. – What if the entity controls the text styles for each ID intentionally ? –Or he/she unconsciously changes the text behavior to match the expected behavior of ID ?
6
Style Variation Pattern Observation –An entity would demonstrate a certain style variation over changed environment –The variation may contain invariant pattern for each ID of this entity It means: –Find the constant variation pattern for an entity, which is independent of the ID it uses. –Use this pattern to identify IDs.
7
Experiment Setup Input Data –2nd light forum (http://2ndlight.com/forum42ndlight/index.cfm). Stylometirc Features (56) (De vel, 2001 ) Machine learning algorithm –Support Vector Machine Average sentence length(number of words) Total number of function words/W Function word frequency distribution ……………….
8
Experiment Result Text Identification –TrainingTesting –Correctly Classified 63.6364%56.6038% –Incorrectly Classified36.3636%43.3962% –Kappa statistic 0.2542 0.1147 –Mean absolute error 0.4646 0.4731 –Root mean squared error 0.4742 0.4845 –Relative absolute error 93.1145 % 95.5107 % –Root relative squared error 94.9291 % 97.7028 % –Total Number of Instances 88 53
9
Experiment Result(cont’d) Variation Matrixes –VM[Floridave] 83.33 22.2218.18 2544.44 9.09 16.6611.1181.82 –VM[paddleout] 85.715040 42.86750 14.295060 –Eigenvalues: 109.2, 67.1, 33.3 1.4, 0.4 + 0.2i, 0.4 - 0.2i
10
The End Thank you !
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.