Blog Data Analysis
S. Muthukrishnan, CS Rutgers & DIMACS
Graham Cormode, DIMACS

Weblog data
- "Weblogs": blanket term for regularly updated on-line journals
- Usually informal, opinionated, candid: more like … than web
- Many millions of "blogs" created with free tools and websites
- Published as web pages

RSS
- XML structured document, a representation of the blog
- More structured than HTML: indicates title, timestamp, permalink, content etc. of posts
- Enables easy checking of updates to blogs
- But… may not contain whole content
- Not all blogs have an RSS feed available?
- Given a blog, how to find the accompanying RSS feed automatically?
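One common answer to the last question is feed autodiscovery: many blog pages advertise their feed in a `<link rel="alternate">` tag in the HTML head. The sketch below is an illustration and does not come from the slides. It assumes the page follows that convention and that the feed is plain RSS 2.0 without namespaces; the names `find_feeds` and `parse_rss` are hypothetical helpers made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# MIME types that conventionally mark a <link> tag as a feed (assumption:
# the blog follows the autodiscovery convention at all).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html_text):
    """Return the feed URLs advertised in a blog's HTML page."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.feeds


def parse_rss(xml_text):
    """Extract title, timestamp, permalink and content from RSS 2.0 items."""
    root = ET.fromstring(xml_text)
    posts = []
    for item in root.iter("item"):
        posts.append({
            "title": item.findtext("title", default=""),
            "timestamp": item.findtext("pubDate", default=""),
            "permalink": item.findtext("link", default=""),
            # May be only a summary: feeds do not always carry the whole post.
            "content": item.findtext("description", default=""),
        })
    return posts
```

Blogs that do not advertise a feed this way would need a fallback, such as probing host-specific feed paths, which is not shown here.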

Different blog systems
Hosted blogs
- Blogger / Blogspot (owned by Google)
- LiveJournal
- Myspace (owned by NewsCorp)
- Xanga?
- Others…
Blog management systems
- TypePad
- WordPress
- MovableType

Blogging Ecosystem
RSS readers
- Bloglines
- Google Reader / Yahoo blog reader?
Blog metadata
- Blogcensus.net
- Blogpulse
- Technorati
- Others…

Collection and Analysis
- Automatically collect blogs, strip formatting and tags, ads etc.
- Output "bag of words" into streaming algorithms for analysis, archival.
- So far: 900,000 blogs, 10GB compressed. Scale to 100s of GBs
- To Do: extract more meta-data (time of posting, title, links etc.), per-blog analysis, retroactive analysis...
Preetham Mysore, Claudio Tancioni

What I Want For WHAT
Joel Spolsky satisfyingly nails a bunch of ways to improve client-side web app development which the WHAT Working Group should work on. All his suggestions are excellent and well worth looking over, even if some seem to require the same "boiling the ocean" that he doesn't want to hear about. That said, most of his list could probably be done with a good set of Javascript libraries, along the lines of Dean Edwards's IE7, and his #2 (fast REST queries back to the server in JS) is pretty much - well, almost - with us already, looking at combinations of things like XMLHttpRequest and mod_pubsub. But anyway, he ended his piece with a call for more suggestions that he could link to. I've been doing a little bit of browser app development in the last few days, and these are the things that spring most readily to mind:
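The collection step on this slide (strip formatting and tags, then emit a "bag of words") can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the pipeline the authors built. The helper names `strip_tags` and `bag_of_words` are hypothetical, and ad removal is not attempted.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keeps visible text and drops tags plus script/style contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html_text):
    """Return the text content of an HTML page with markup removed."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Joining with a space keeps words in adjacent elements apart.
    return " ".join(parser.parts)


# Runs of letters only; Unicode-aware, so non-English posts tokenize too.
WORD = re.compile(r"[^\W\d_]+")


def bag_of_words(html_text):
    """Count lowercase word occurrences in the visible text of a page."""
    return Counter(w.lower() for w in WORD.findall(strip_tags(html_text)))
```

A call such as `bag_of_words(page_html)` returns a `Counter` whose words could then be fed one at a time into a streaming summary instead of being stored in full.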

Early Results
- Began building systems June
- Extracted most common terms on 1GB using streaming analysis
- New blog stopwords
- Multilingual
- Non-standard word distribution: "love" vs "war"
- Used only ~26KB. 3000:1 compaction
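The slide does not say which streaming algorithm produced the most common terms. As one illustration of how frequent terms can be tracked in a small, fixed amount of memory, here is a sketch of the Misra-Gries frequent-items summary. It is swapped in as an example and is an assumption, not the authors' stated method; `misra_gries` and the parameter `k` are names chosen for this sketch.

```python
def misra_gries(stream, k):
    """Summarize a stream using at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n is
    guaranteed to be among the returned keys. The stored counts are
    underestimates, each too low by at most n / k.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


words = "love war love a love b love war love c love war".split()
summary = misra_gries(words, k=3)
```

Memory use depends only on `k`, not on the length of the stream, which is the kind of property that makes a summary of a few kilobytes over a gigabyte of text plausible.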

Blog Statistics

Top Weblog Hosts
1. blogspot (418803)
2. livejournal (342265)
3. xanga (187021)
4. diaryland (71649)
5. persianblog (59645)

Blog Languages
1. English (70%)
2. Portuguese (4.5%)
3. Farsi (3.2%)
4. Polish (2.8%)
5. French (1.8%)
6. Spanish (1.1%)
7. German (1.0%)
8. Chinese (0.7%)
9. Italian (0.5%)
10. Dutch (0.4%)

Top Blog Nouns
Current rank (last month)
6. (8) love
29. (37) school
31. (45) friends
34. (46) music
41. (59) fun
58. (60) god
65. (89) happy
79. (47) news
146. (175) movie
156. (88) war
166. (167) money
168. (142) book
171. (183) family
173. (190) car
186. (211) mom
234. (118) bush
253. (172) iraq

Homework
A write-up of 1 page or more for each item:
- Survey different blogging sites, blog formats/templates and blog data collection mechanisms.
- Survey the RSS feed mechanism for blog data.
- List methods for "reverse" links for a given blog: how to find who links to a blog?
- How to estimate the number of blogs in the world, and the number of blogs not hosted by well-known blogging sites (LJ, blogger etc.)?