Term Project #2 Data Management on a Cloud (Azure)
Input Dataset
Social graph
  –Format: USER \t FOLLOWER \n
   Both are numeric IDs (integers).
  –Example: Users 13, 14 and 15 are followers of user 12. User 17 is a follower of user 16.
  –Provided as text files
Restricted user profiles
  –About users who have > 10,000 followers
  –Schema
   twitter.profiles (
     numeric_id int primary key,
     name varchar(20),
     screen_name varchar(16),
     friends_count int,
     followers_count int,
     following varchar(5),
     statuses_count int,
     favourites_count int,
     location varchar(40),
     description varchar(165),
     profile_image_url varchar(235),
     url varchar(100),
     created_at varchar(30),
     time_zone varchar(30),
     gender varchar(1),
     verified varchar(5),
     protected varchar(5)
     … )
  –Stored in SQL Azure
   Server name: foqev3v3fp.database.windows.net
   Login: student
   Password: csed***$
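A minimal sketch of reading the social graph format described above, assuming one USER, tab, FOLLOWER pair per line. The function name `parse_follow_lines` and the sample string are illustrative and not part of the project skeleton; the sample rows are reconstructed from the example sentence on the slide.

```python
from typing import Iterable, Iterator, Tuple


def parse_follow_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (user_id, follower_id) pairs from USER<TAB>FOLLOWER lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        user, follower = line.split("\t")
        yield int(user), int(follower)


# Rows matching the slide's example: 13, 14, 15 follow 12; 17 follows 16.
sample = "12\t13\n12\t14\n12\t15\n16\t17\n"
edges = list(parse_follow_lines(sample.splitlines()))
```

Yielding pairs lazily keeps memory use flat, which matters if the same parser is reused on a large blob read line by line.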
Problem: Who has the largest number of mutual friends on Twitter?
1. Upload a local file (social graph) to Azure blob storage
2. Bulk-load Azure table
   1) Read a blob
   2) Parse following relationships
   3) Store the relationships into Azure table
3. Find mutual friends
   1) Read Azure table
   2) Self-join the table
4. Count mutual friends for each user
5. Get the name of the user who has the largest number of mutual friends from SQL Azure
Distribute and parallelize the workload!
Web Interface Screen shot
Upload to Azure blob storage
[Diagram: Web Role, Worker Role, …, Storage; operation: upload]
Upload to Azure blob storage
Web Role: _Default.UploadDataFileToBlobStorageButton_Click(…)
Bulk-load Azure table
[Diagram: Web Role, Worker Role, …, Storage; operation: bulk-load into a table with columns userid, followerid]
Bulk-load Azure table
Web Role: _Default.LoadFollowerTableFromBlobButton_Click(…)
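One way to prepare rows for the bulk-load step, sketched under assumptions: Azure Table entities need a string PartitionKey and RowKey, and a single entity group transaction accepts at most 100 entities that share one PartitionKey. Using userid as PartitionKey and followerid as RowKey is an illustrative choice, not something the slides prescribe, and the actual storage call is omitted. The names `to_entity` and `entity_batches` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

MAX_BATCH = 100  # limit on entities per Azure Table entity group transaction


def to_entity(user_id: int, follower_id: int) -> Dict[str, str]:
    # Assumption: userid is the partition key and followerid the row key.
    return {"PartitionKey": str(user_id), "RowKey": str(follower_id)}


def entity_batches(
    edges: Iterable[Tuple[int, int]]
) -> Iterator[List[Dict[str, str]]]:
    """Group entities by partition key and cut each group into batches."""
    by_user: Dict[int, List[Dict[str, str]]] = {}
    for user_id, follower_id in edges:
        by_user.setdefault(user_id, []).append(to_entity(user_id, follower_id))
    for user_id in sorted(by_user):
        entities = by_user[user_id]
        for start in range(0, len(entities), MAX_BATCH):
            yield entities[start:start + MAX_BATCH]
```

Each yielded batch could then be handed to whatever table client the project uses. Because batches never mix partition keys, different workers can load different users independently.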
Find mutual friends
[Diagram: Web Role, Worker Role, Storage; operation: find mutual friends by self-join]
Parallel Hash Join
(Database Management Systems, 2nd Edition. Raghu Ramakrishnan and Johannes Gehrke. Textbook Chapter 22)
In first phase, partitions get distributed to different sites:
  –A good hash function automatically distributes work evenly!
Do second phase at each site.
Almost always the winner for equi-join.
[Figure, Phase 1: the original relations (R then S) are read from disk through an INPUT buffer and split by hash function h, using B main memory buffers, into OUTPUT partitions 1, 2, …, B-1 on disk.]
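The two phases above can be sketched for the self-join in this project. This is an illustration under stated assumptions, not the reference solution: "mutual friends" is taken to mean pairs of users who follow each other, the "sites" are simulated as in-memory lists, and the function names are hypothetical. Edges are partitioned on the unordered pair of IDs so that (a, b) and (b, a) always land at the same site.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (user_id, follower_id)


def partition(edges: Iterable[Edge], n_sites: int) -> List[List[Edge]]:
    """Phase 1: hash each edge to a site using the unordered ID pair as key."""
    parts: List[List[Edge]] = [[] for _ in range(n_sites)]
    for user, follower in edges:
        key = (min(user, follower), max(user, follower))
        parts[hash(key) % n_sites].append((user, follower))
    return parts


def local_join(part: List[Edge]) -> Set[Edge]:
    """Phase 2: build a hash set at one site, then probe it for reversed edges."""
    seen = set(part)
    return {(u, f) for (u, f) in seen if u != f and (f, u) in seen}


def mutual_pairs(edges: Iterable[Edge], n_sites: int = 3) -> Set[Edge]:
    result: Set[Edge] = set()
    for part in partition(edges, n_sites):  # each part could run on its own worker
        result |= local_join(part)
    return result
```

Because both directions of a pair hash to the same partition, no site needs data from another site during the second phase, which is what lets the worker roles run independently.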
Dataflow Network for || Join
Good use of split/merge makes it easier to build parallel versions of sequential join code.
Find mutual friends
Web Role: _Default.FindMutualFriendsButton_Click(…)
Web Role: ToDo.FindMutualFriends(requestQueue, responseQueue)
Worker Role: WorkerRole.Run()
Worker Role: WorkerRole.Run()
Worker Role: WorkerRole.Run()
1:n (Web Role : Worker Roles)
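The 1:n request/response pattern between the web role and the worker roles can be imitated locally. The sketch below uses Python's in-process `queue.Queue` and threads as stand-ins for Azure queues and worker role instances; `worker_run`, `dispatch`, and the `None` stop sentinel are illustrative choices, not names from the project skeleton.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def worker_run(request_queue: queue.Queue, response_queue: queue.Queue,
               handle: Callable) -> None:
    """Loop like a worker role: take a request, process it, post the response."""
    while True:
        message = request_queue.get()
        if message is None:  # sentinel: no more work for this worker
            break
        response_queue.put((message, handle(message)))


def dispatch(tasks: Iterable, n_workers: int, handle: Callable) -> Dict:
    """Act like the web role: post one request per task, then collect replies."""
    tasks = list(tasks)
    request_queue: queue.Queue = queue.Queue()
    response_queue: queue.Queue = queue.Queue()
    workers = [
        threading.Thread(target=worker_run,
                         args=(request_queue, response_queue, handle))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for task in tasks:
        request_queue.put(task)
    for _ in workers:
        request_queue.put(None)  # one stop sentinel per worker
    results = dict(response_queue.get() for _ in tasks)
    for worker in workers:
        worker.join()
    return results
```

For example, `dispatch(range(5), 3, lambda x: x * x)` spreads five requests over three workers. A real deployment would also need to handle failed or lost messages, which this sketch ignores.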
Count mutual friends for each user
[Diagram: Web Role, Worker Role, Storage; operation: count]
Count mutual friends for each user
[Diagram: Web Role, Worker Role, Storage]
Partial counts from the workers (userid : #friends)
  –12 : 3, 17 : 5, …
  –17 : 2, 19 : 7, …
  –12 : 6, 25 : 3, …
Aggregate (summation)
  –12 : 9, 17 : 7, 19 : 7, …
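The aggregation step amounts to summing per-user partial counts. A minimal sketch using the numbers shown on the slide, with `collections.Counter` standing in for whatever structure the web role actually uses; the `aggregate` name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate(partials: Iterable[Dict[int, int]]) -> Counter:
    """Sum partial userid -> count maps reported by the workers."""
    totals: Counter = Counter()
    for partial in partials:
        totals.update(partial)  # adds counts key by key
    return totals


# Partial results taken from the slide (the elided "…" entries are left out).
partials = [{12: 3, 17: 5}, {17: 2, 19: 7}, {12: 6, 25: 3}]
totals = aggregate(partials)
top_user, top_count = totals.most_common(1)[0]
```

With these inputs user 12 ends up with 3 + 6 = 9, matching the aggregated row on the slide.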
Count mutual friends for each user
Web Role: _Default.CountMutualFriendsButton_Click(…)
Web Role: ToDo.CountMutualFriends(requestQueue, responseQueue);
Worker Role: WorkerRole.Run()
Worker Role: WorkerRole.Run()
Worker Role: WorkerRole.Run()
1:n (Web Role : Worker Roles)
Get the name of the user
[Diagram: Web Role, Worker Role, Storage (12 : 9, 17 : 7, 19 : 7, …), SQL Azure; operation: get name]
SELECT name FROM profiles WHERE numeric_id = 247;
Result: Hyunsouk
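The lookup above can be tried locally before connecting to SQL Azure. The sketch below uses Python's built-in `sqlite3` with an in-memory database as a stand-in; the real project would use a SQL Server driver, whose connection setup and parameter placeholder syntax may differ. Only two columns of the schema are created, the inserted row mirrors the slide's example, and `lookup_name` is a hypothetical helper.

```python
import sqlite3
from typing import Optional


def lookup_name(conn: sqlite3.Connection, numeric_id: int) -> Optional[str]:
    """Return the profile name for numeric_id, or None if there is no such row."""
    row = conn.execute(
        "SELECT name FROM profiles WHERE numeric_id = ?", (numeric_id,)
    ).fetchone()
    return row[0] if row else None


# Local stand-in for the twitter.profiles table (subset of columns only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (numeric_id int primary key, name varchar(20))"
)
conn.execute("INSERT INTO profiles VALUES (?, ?)", (247, "Hyunsouk"))
name = lookup_name(conn, 247)
```

Passing the ID as a query parameter rather than formatting it into the SQL string avoids quoting mistakes. The missing-row case matters because the profiles table only covers users with more than 10,000 followers.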
Get the name of the user
Web Role: _Default.GetNameOfPersonWhoHasTheLargestNumberOfFriendsButton_Click(…)
ServiceConfiguration.cscfg
References
Windows Azure Platform Training Course
  –Demos
    »Hello Windows Azure
    »Building and Deploying a Service
    »Windows Azure using Blobs Demo
    »Windows Azure Worker Role Demo - Using the Worker Role
    »Windows Azure Using Queues Demo
    »Windows Azure Using Table Storage Demo
    »Preparing your SQL Azure Account
    »Connecting to SQL Azure
Azure Academic Pilot
  –FREE 30-day pass (promo code: KKUMAR)
Q&A
Submission Instructions
Form a team of 3-4 people
Attachment
  –Compressed Windows Azure project file
  –Presentation file
    Implementation idea
    Experimental results
      –on Azure
        »Screen capture of the web page running
      –on your PC emulator
        »Performance with different numbers of worker roles
Bonus: Other interesting problems with Twitter data on Azure
Due
  –To be announced
Demo Hello Windows Azure