Fintan The Amazing Fish of Knowledge…


1 Fintan The Amazing Fish of Knowledge…
…filtering out the blogosphere so you don’t have to!

2 Overview
Description
Demo
Pipeline
Problems
Future work
Questions

3 What is Fintan?
Provides a news aggregating service, similar to Digg and Reddit, based on blog entries.
Presents topic-based clusters of entries.
Algorithmically ranks clusters based on the ranks of their entries and on votes.

4 1: Retrieving data
Spinn3r crawls >10M blogs on the web
Offers their data free for academic use
Use their API to collect blog entries
Marshal data into Hadoop formats
Contributed code back to Spinn3r
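
The marshaling step could look roughly like the sketch below. It is only an illustration: the slides do not say which Hadoop format was used, so this assumes one tab-separated key/value record per line (a layout commonly used with Hadoop Streaming). The field names (`permalink`, `title`, `body`, `published`) and both helper functions are hypothetical, and no Spinn3r API call is shown.

```python
import json


def marshal_entry(entry):
    """Serialize one blog entry as a single 'key<TAB>value' line.

    Assumption: the key is the entry's permalink and the value is JSON.
    json.dumps escapes tabs and newlines inside strings, so the record
    always stays on one line with exactly one tab separator.
    """
    key = entry["permalink"]
    value = json.dumps(entry, sort_keys=True)
    return f"{key}\t{value}"


def unmarshal_entry(line):
    """Inverse of marshal_entry: split on the first tab and parse the JSON."""
    key, value = line.rstrip("\n").split("\t", 1)
    return key, json.loads(value)
```

Keeping each record on one line means a mapper can read its input line by line without any custom record reader.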

5 2: Syntax Tree Clustering
O(n) nodes to suffixes
O(n^2) operations to corpus data
Several tactics used:
  Get rid of useless nodes
  Eliminate stop words from prefixes
  Break trees apart by prefix and distribute
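
The three tactics above can be sketched as follows. This is a simplified stand-in rather than the presenters' implementation: it uses a flat phrase index instead of a real tree, reads "useless nodes" as phrases shared by fewer than `min_docs` documents, and assumes suffixes are word-level. The stop-word list, `max_len`, `num_partitions`, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict

# Illustrative stop-word list; a real system would use a much larger one.
STOP_WORDS = frozenset({"the", "a", "an", "of", "to", "and", "in", "is"})


def word_suffixes(text, max_len=3):
    """Yield the leading phrase (1..max_len words) of every word suffix,
    skipping suffixes whose first word is a stop word."""
    words = text.lower().split()
    for i, first in enumerate(words):
        if first in STOP_WORDS:
            continue
        for n in range(1, max_len + 1):
            if i + n <= len(words):
                yield tuple(words[i:i + n])


def partition_by_prefix(docs, num_partitions=4):
    """Route (phrase, doc_id) pairs by their first word, so every phrase
    with the same prefix lands in the same partition."""
    partitions = defaultdict(list)
    for doc_id, text in docs.items():
        for phrase in word_suffixes(text):
            # crc32 is stable across processes, unlike the built-in hash().
            part = zlib.crc32(phrase[0].encode("utf-8")) % num_partitions
            partitions[part].append((phrase, doc_id))
    return partitions


def base_clusters(pairs, min_docs=2):
    """Keep only phrases shared by at least min_docs documents."""
    phrase_docs = defaultdict(set)
    for phrase, doc_id in pairs:
        phrase_docs[phrase].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}
```

Because partitioning is keyed on the first word, each partition can be clustered independently and the results merged without any cross-partition lookups.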

6 3: To ranked SQL
Bridges the clustering and user interface
Determines algorithmic ranking
Original idea: PageRank with voting
Clusters scored based on entries
Entries ranked by reputation and date
MapReduce job to convert to SQL statements
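
One way the scoring and SQL conversion could fit together is sketched below. The slides give no formula, so everything numeric here is an assumption: entries are scored as reputation with an exponential decay by age, a cluster's score is the sum of its entry scores plus weighted votes, and the table and column names (`clusters`, `id`, `score`) are hypothetical. This is a plain scoring function, not the PageRank-with-voting idea mentioned above.

```python
from datetime import datetime, timezone

HALF_LIFE_HOURS = 24.0  # assumed: an entry's score halves every 24 hours
VOTE_WEIGHT = 1.0       # assumed: each user vote adds one point


def entry_score(reputation, published, now):
    """Blog reputation discounted by the entry's age."""
    age_hours = max((now - published).total_seconds() / 3600.0, 0.0)
    return reputation * 0.5 ** (age_hours / HALF_LIFE_HOURS)


def cluster_score(entries, votes, now):
    """Sum of entry scores plus weighted votes.

    entries is an iterable of (reputation, published) pairs.
    """
    total = sum(entry_score(rep, pub, now) for rep, pub in entries)
    return total + VOTE_WEIGHT * votes


def to_sql(cluster_id, score):
    """Render one INSERT statement; only numeric values are interpolated."""
    return (
        "INSERT INTO clusters (id, score) "
        f"VALUES ({int(cluster_id)}, {score:.6f});"
    )
```

In a MapReduce setting, a reducer keyed on cluster id could call `cluster_score` over that cluster's entries and emit the `to_sql` line as its output.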

7 4: User Interface
Aim to keep it simple & intuitive
Written in Ruby on Rails (RoR)
Tracking user actions:
  User votes
  User comments
  Clickthroughs
  Cluster views
Future: Personalization

8 Problems
Quality of clusters
Runtime of clusters
Classification
Ranking

9 Future Work
Real time updates
Personalization
Faster clustering
Blog reputation system

10 Questions?

