Memeta: A Framework for Analytics on the Blogosphere
Pranam Kolari, Tim Finin
Partially supported by NSF awards ITR-IIS-0326460 and ITR-IDM-0219649 and by IBM

http://ebiquity.umbc.edu

What is memeta?
• Our framework for putting research into real-world use
• Features blog identification and splog detection modules
• Includes language identification modules for more than 10 languages (provided by James Mayfield)
• memeta has been used on an as-needed basis to analyze the blogosphere

1. Welcome to the Splogosphere: 75% of pings are spings (pings from splogs)
• Monitored a ping server, weblogs.com, over 3 weeks, from 20 Nov 2005 to 11 Dec 2005
• Total of 16 million update pings
• See (1) for the distribution of pinging URLs
• Pings were first classified by language
• Italian-language blogs followed a predictable pattern, pinging more during the day
• English-language blogs followed a similar but less pronounced pattern
• Splogs followed no daily pattern, and their ping volume was three times that of authentic English blogs (2, 3)
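The daily ping patterns described above can be checked with a simple hourly histogram. The Python sketch below is hypothetical: the poster does not describe memeta's analysis code, and the (timestamp, label) record format, the label names, and the functions hourly_profile, peak_to_trough, and label_share are illustrative assumptions. It also assumes timestamps are already expressed in the time zone relevant to each group of bloggers.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_profile(pings):
    """Count pings per hour of day (0-23) for each label.

    Hypothetical input: each ping is a (timestamp, label) pair, where the
    timestamp is an ISO-8601 string and the label (e.g. "italian",
    "english", "splog") was assigned upstream by language and splog
    classifiers.
    """
    profile = defaultdict(lambda: [0] * 24)
    for timestamp, label in pings:
        hour = datetime.fromisoformat(timestamp).hour
        profile[label][hour] += 1
    return dict(profile)


def peak_to_trough(counts):
    """Busiest hour divided by quietest hour; close to 1.0 means no daily cycle."""
    quietest = min(counts)
    if quietest == 0:
        return float("inf")
    return max(counts) / quietest


def label_share(pings, label):
    """Fraction of all pings that carry the given label."""
    totals = Counter(lbl for _, lbl in pings)
    return totals[label] / sum(totals.values())
```

Given pings labeled by a language identifier and a splog detector, a pronounced daily cycle like the one reported for Italian blogs would give a peak_to_trough ratio well above 1, a flat splog profile would give a ratio near 1, and label_share(pings, "splog") would correspond to the share of spings.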
2. Characterizing the Splogosphere
• Blogosphere dump covering 21 days of July 2005
• 1.3 million blogs in total
• Blogs run through the splog detector
• Link (in-degree and out-degree) distributions of blogs vs. splogs plotted on a log-log scale
• As expected, only authentic blogs follow a power law (4, 5)

Continuing Work
• Inducing new features for splog detection
• Language Independent and Adaptive Techniques for Splog Detection
• Splog Taxonomy and Evaluation Metrics
• Multi-Relational Local Models for Splog Detection
• Tuning memeta to harvest blogs regularly

[Figure: Blogosphere analytics architecture. Components: Blog Directories, Ping Servers, Search Engines, Blog Crawler, Language Identifier, Blog Identifier (98% accuracy), Splog Detector (87% accuracy).]
[Figure: Splog Detector. Labels: Blogs + Heuristics, Language Identifiers, Blog Identification, Spam Blog Detectors, IP Blacklists, Authentic Blogs, Spam Blogs.]
[Figure 1: Host distribution of pings at weblogs.com; nature of pinging URLs at weblogs.com.]
[Figures 2 and 3: Ping time-series of Italian blogs, authentic blogs, and spam blogs, each shown on a single day and over five days.]
[Figures 4 and 5: In-degree and out-degree distributions; only those of authentic blogs follow a power law.]
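The poster reports that only the degree distributions of authentic blogs follow a power law on a log-log plot. As a minimal, hypothetical sketch (not memeta's own method, which the poster does not describe), one quick check is to fit a straight line to the log-log degree histogram: an ideal power law count(k) ~ k^(-alpha) appears as a line with slope about -alpha. The function name loglog_fit and its input format (one degree per blog) are assumptions.

```python
import math
from collections import Counter


def loglog_fit(degrees):
    """Hypothetical helper: fit log(count) = a + slope * log(degree) by least squares.

    `degrees` holds one in-degree (or out-degree) per blog. Degree 0 is
    dropped because log(0) is undefined. Returns (slope, r_squared); for an
    ideal power law count(k) ~ k**(-alpha), slope is about -alpha.
    """
    histogram = Counter(d for d in degrees if d > 0)
    if len(histogram) < 2:
        raise ValueError("need at least two distinct positive degrees")
    points = [(math.log(k), math.log(c)) for k, c in histogram.items()]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    # r^2 is undefined when every degree has the same count (syy == 0).
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else float("nan")
    return slope, r_squared
```

Run separately on authentic-blog and splog degrees, a high r_squared for the former and a visibly poorer fit for the latter would match the pattern reported in (4, 5). A straight-line log-log fit is only a rough diagnostic; maximum-likelihood estimation of the exponent is generally more reliable for heavy-tailed data.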

