© 2006 Nielsen BuzzMetrics, A VNU business affiliate Natalie Glance Senior Research Scientist Nielsen BuzzMetrics.


1 © 2006 Nielsen BuzzMetrics, A VNU business affiliate
Natalie Glance, Senior Research Scientist, Nielsen BuzzMetrics

2 Background
- Nielsen BuzzMetrics aggregates consumer opinion expressed in message boards, weblogs, Usenet and other online discussions
- Parent company behind BlogPulse, a blog search and analytics website

3 What drives weblog spam?
- Same goal as any other website spam: SEO
- Weblog hosts provide:
  - Free hosting for link farms to promote affiliate sites
  - Free hosting for web pages with sponsored ads
- Types of weblog spam
  - spam blogs (pollute ping servers)
  - spam comments on legitimate blogs
  - spam trackback pings to legitimate blogs

4 Collateral damage: blog search result contamination
- Search results for ‘mortgage’:

5 Collateral damage: trend graphs
- Explain the peaks: are they real or artifacts of spam?

6 Collateral damage: real-time monitoring
- Spikes in keyword clusters
  - 2006/07/28 10:39 a.m. {deleted myspace account}
  - 2006/07/27 10:55 a.m. {landis tested yesterday}
  - 2006/08/07 3:22 a.m. {investing debt directory}
  - 2006/08/07 6:54 a.m. {adsense cents makers}
  - 2006/08/07 1:11 p.m. {wwdc keynote}
- Breaking news or spam attack?
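To make the idea of a keyword "spike" concrete, here is a minimal sketch of one common way to flag one: compare each hourly count against the mean and standard deviation of a trailing window. This is an illustration only, not the method used by BlogPulse; the function name `find_spikes`, the window size, the threshold, and the sample counts are all assumptions.

```python
from statistics import mean, stdev

def find_spikes(counts, window=6, threshold=3.0):
    """Return indices whose count exceeds the trailing-window mean
    by more than `threshold` standard deviations (hypothetical helper)."""
    spikes = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor sigma at 1.0 so a perfectly flat history cannot divide by zero.
        sigma = max(stdev(history), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes

# Made-up hourly mention counts for one keyword cluster.
hourly_mentions = [4, 5, 3, 6, 4, 5, 4, 48, 5, 4]
print(find_spikes(hourly_mentions))
```

A detector like this only says that something unusual happened. It cannot tell breaking news from a spam attack, which is the question the slide raises.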

7 Spam filtering challenges
- Different analytics, different trade-offs
  - weblog search requirements: high coverage, clean results, minimize false positives
  - trend search: high precision to eliminate spurious artifacts
  - real-time monitoring: high coverage with human oversight
- Different timeframes, different approaches
  - real-time search: highly efficient classification algorithms; automated identification of spam attacks
  - historic search: offline spam identification can use a combination of approaches; sandbox for new weblogs
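The trade-offs above can be illustrated with a small sketch that scores a spam filter at two different decision thresholds. It assumes a classifier that outputs a spam score between 0 and 1 per post; the function `precision_recall`, the labeled sample, and the thresholds are hypothetical and are not taken from the presentation.

```python
def precision_recall(scored, threshold):
    """scored: list of (spam_score, is_spam) pairs.
    Posts scoring at or above `threshold` are flagged as spam."""
    tp = sum(1 for s, y in scored if s >= threshold and y)
    fp = sum(1 for s, y in scored if s >= threshold and not y)
    fn = sum(1 for s, y in scored if s < threshold and y)
    # Convention: report 1.0 when the denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Made-up labeled sample: (classifier score, true label).
sample = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.40, False), (0.30, True), (0.10, False), (0.05, False),
]

for threshold in (0.85, 0.25):
    p, r = precision_recall(sample, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this sample, the strict threshold flags no legitimate weblogs but lets half the spam through, which fits a search index that must minimize false positives. The loose threshold catches all the spam but also drops legitimate posts, which may be acceptable for trend graphs where spam-driven peaks are the bigger problem.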

