
INFO 344 Web Tools And Development CK Wang University of Washington Spring 2014.


1 INFO 344 Web Tools And Development CK Wang University of Washington Spring 2014

2 Announcements
PA4 due Monday!!!
Comments for Eric & Jeremy are on Canvas!
PA3 grading is done.
– Good job, Stanley!

3 PA4 stuff
Sports Illustrated XMLs => the URLs don't end in .html!
– +3 pts extra credit if you still crawl them, even when they end in http://*/
– +5 pts extra credit for a hybrid trie/list that fits everything into memory (does not double count with PA2: if you already fit everything in PA2, no extra credit for this)
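One possible reading of "hybrid trie/list", offered as a sketch rather than the assignment's spec: each node starts as a short sorted list of suffixes and only turns into real child nodes once that list grows past a threshold. Sparse branches then cost one list instead of a chain of single-child nodes. The class names and the `threshold` value are hypothetical.

```python
from bisect import insort
from typing import Dict, List


class HybridNode:
    """Starts as a sorted list of suffixes; becomes a trie node when it grows."""

    def __init__(self) -> None:
        self.suffixes: List[str] = []               # used while still a list node
        self.children: Dict[str, "HybridNode"] = {}  # used after splitting
        self.is_word = False                         # end-of-word flag after splitting
        self.split = False


class HybridTrie:
    def __init__(self, threshold: int = 4) -> None:
        # Hypothetical tuning knob: max suffixes a list node holds before splitting.
        self.threshold = threshold
        self.root = HybridNode()

    def insert(self, word: str) -> None:
        self._insert(self.root, word)

    def _insert(self, node: HybridNode, rest: str) -> None:
        if not node.split:
            if rest not in node.suffixes:
                insort(node.suffixes, rest)          # keep list sorted for ordered output
            if len(node.suffixes) > self.threshold:
                self._split(node)
            return
        if rest == "":
            node.is_word = True
            return
        child = node.children.setdefault(rest[0], HybridNode())
        self._insert(child, rest[1:])

    def _split(self, node: HybridNode) -> None:
        # Turn a crowded list node into a real trie node, pushing suffixes down.
        pending, node.suffixes, node.split = node.suffixes, [], True
        for suffix in pending:
            self._insert(node, suffix)

    def search_prefix(self, prefix: str, limit: int = 10) -> List[str]:
        node = self.root
        for i, ch in enumerate(prefix):
            if not node.split:
                # The rest of the prefix is matched by scanning the short list.
                rest = prefix[i:]
                return [prefix[:i] + s for s in node.suffixes if s.startswith(rest)][:limit]
            if ch not in node.children:
                return []
            node = node.children[ch]
        results: List[str] = []
        self._collect(node, prefix, results, limit)
        return results

    def _collect(self, node: HybridNode, path: str, out: List[str], limit: int) -> None:
        # Depth-first walk in lexicographic order until `limit` words are found.
        if len(out) >= limit:
            return
        if not node.split:
            out.extend(path + s for s in node.suffixes[: limit - len(out)])
            return
        if node.is_word:
            out.append(path)
        for ch in sorted(node.children):
            self._collect(node.children[ch], path + ch, out, limit)
```

Raising `threshold` trades lookup speed (longer list scans) for fewer nodes in memory.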

4 PA4 stuff
What should the partition key & row key be?
How does a cache work?
– int GetResults(input)
– A function has inputs & outputs. In between = processing.
– Cache = maps inputs to outputs and skips the processing! (dictionary!)
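The "cache = dictionary" idea can be sketched in a few lines. This is a Python stand-in for the slide's `int GetResults(input)`; `slow_get_results` and its word list are hypothetical placeholders for the real processing step.

```python
from typing import Callable, Dict, List


def slow_get_results(prefix: str) -> List[str]:
    """Stand-in for the expensive 'processing' step (hypothetical data)."""
    words = ["seattle", "seahawks", "search", "sql", "sharding"]
    return sorted(w for w in words if w.startswith(prefix))


class CachedLookup:
    """Maps inputs to outputs so a repeated input skips the processing."""

    def __init__(self, compute: Callable[[str], List[str]]) -> None:
        self.compute = compute
        self.cache: Dict[str, List[str]] = {}
        self.misses = 0

    def get_results(self, query: str) -> List[str]:
        if query in self.cache:          # hit: skip processing entirely
            return self.cache[query]
        self.misses += 1                 # miss: do the work once...
        result = self.compute(query)
        self.cache[query] = result       # ...and remember the answer
        return result
```

Calling `get_results("sea")` twice runs `slow_get_results` only once; the second call is a dictionary lookup.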

5 Scalability
A theme throughout the quarter

6 Add more hardware

7 AWS
– Spin up a new instance
– Set it up (script)
– Add it to the Load Balancer
Static & manual: fine for a gradual increase in traffic
Try it!

8 Add more hardware
AWS EC2 API
– Spin up a new instance
– Via PHP code
– With an AMI
Dynamic & automatic! Great for spiky traffic
http://blogs.aws.amazon.com/php/post/TxMLFLE50WUAMR/Provision-an-Amazon-EC2-Instance-with-PHP
AWS SDK = http://aws.amazon.com/sdkforphp/
Download the AWS SDK
Terminate the instance after you see it works!
Try it!

9 Add more hardware
Azure
– VS cloud config file
– Cloud/Local each have a different file. Make sure you change the right one!
– Try running 5 worker roles in PA3 locally!
– Take a look at the Compute Emulator
– Pretty cool!!!
– VS integration = nice
Don't have PA3? Join another group.
Try it!

10 Add more hardware
Azure
– Scale on the dashboard
– New feature
– Drag and increase!
– Super easy to scale!
Try it!

11 Data vs. Compute

12 Remember: 3-tier architecture
– Easier to find/read
– Easier to add/remove layers

13 Anatomy
[Diagram: system architecture. Red = Storage, Blue = Compute]
– Compute: Web Role (QuerySuggest, Search.aspx, Dashboard.aspx, Admin.asmx); Worker Role (Crawler)
– Storage: Azure Blob (QuerySuggest; User Logs), Azure Queue (URLs to Crawl), Azure Table (Web Index: word, URLs; Ranking), AWS RDS (Structured Data: NBA stats)
– Labeled flows: User query, query suggestions, URLs, Wiki dataset, query stats
This is basically how Google works!

14 Simplified Compute vs. Data
– Scaling compute = easy
– Scaling read-only data = easy
– Scaling read & write data = very difficult
Scaling data (read & write) with consistency is very, very hard.
[Diagram: Compute (Web Role, Worker Role); Storage, read & write (Table/SQL); Storage, read-only (S3/Blob)]

15 Scale read & write data
Sharding
– Horizontal partitioning
– Smaller index size per shard (faster)
Same as how NoSQL scales
– Azure Table: partition key + row key
[Diagram: rows split across Partition Key = 0 and Partition Key = 1]
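A minimal sketch of horizontal partitioning, assuming the shard is chosen by hashing the row key (one common choice; in Azure Table you pick the partition key yourself). Each in-memory dict stands in for a separate, smaller partition. `partition_for` and `ShardedTable` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Optional


def partition_for(key: str, num_partitions: int) -> int:
    """Stable hash -> partition number (Python's hash() is salted per process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class ShardedTable:
    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # One small dict per partition: each shard's index stays small.
        self.shards: List[Dict[str, str]] = [{} for _ in range(num_partitions)]

    def put(self, row_key: str, value: str) -> None:
        self.shards[partition_for(row_key, self.num_partitions)][row_key] = value

    def get(self, row_key: str) -> Optional[str]:
        # Only one shard is consulted per lookup.
        return self.shards[partition_for(row_key, self.num_partitions)].get(row_key)
```

Because the hash is deterministic, every web or worker role computes the same partition for a key without coordinating.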

16 SQL vs. NoSQL

17 A lot of buzz around NoSQL
If you use SQL with sharding and don't do joins
– Use MySQL in a “NoSQL way”
– Just as scalable as NoSQL!
– Facebook uses MySQL, so it is possible to scale!
But Azure Table / AWS DynamoDB works just fine (like in PA3)
I prefer MySQL + sharding + “NoSQL way”: it's easier to add new indexes
In Azure Table/DynamoDB => what if you have a Books product table and want to access it by category AND by book name?
In Azure Table => multiple tables where the value = a key, plus 1 more table for the actual books?
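The "multiple tables where value = key" pattern can be sketched with plain dicts standing in for key-value tables (an assumption for illustration; the class and field names are hypothetical). Each extra access path is its own index table that the application must keep in sync on every write, whereas in MySQL a single `CREATE INDEX` statement does that bookkeeping for you.

```python
from typing import Dict, List, Optional


class BookStore:
    """Key-value 'tables' plus hand-maintained index tables (NoSQL style)."""

    def __init__(self) -> None:
        self.books: Dict[str, dict] = {}             # book_id -> book row
        self.by_category: Dict[str, List[str]] = {}  # category -> [book_id]
        self.by_name: Dict[str, str] = {}            # name -> book_id

    def add(self, book_id: str, name: str, category: str) -> None:
        # Every write must update the main table AND each index table.
        self.books[book_id] = {"id": book_id, "name": name, "category": category}
        self.by_category.setdefault(category, []).append(book_id)
        self.by_name[name] = book_id

    def find_by_category(self, category: str) -> List[dict]:
        # Index lookup gives keys; a second lookup fetches the actual rows.
        return [self.books[i] for i in self.by_category.get(category, [])]

    def find_by_name(self, name: str) -> Optional[dict]:
        book_id = self.by_name.get(name)
        return self.books.get(book_id) if book_id is not None else None
```

Adding a third access path (say, by author) means another table and another write in `add`, which is the cost the slide points to.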

18 Exploiting data freshness & caching

19 Data Freshness/Caching
Depends on the application
– If the freshness requirement is low => cache @ front-end web roles!
– Great for search results
Lots of computation
If not fresh for 5 min = that's ok! (ex: body text in a search result with a 1 hr delay = that's ok!)
– Great for a lot of applications!
– NOT financial transactions…
– Ok for a newsfeed?
Kooapps Games Leaderboard
– Cache, but mix with the client's own user data to “hide” latency/freshness
Client-side caching too!
[Diagram: Compute (Web Role, Worker Role); Storage, read & write (Table/SQL); Storage, read-only (S3/Blob); plus a Cache]
Storage = cheap. Compute = expensive.
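"If not fresh for 5 min = that's ok" maps naturally onto a cache whose entries expire after a time-to-live. A minimal sketch, assuming a 300-second TTL and an injectable clock so expiry can be tested; `TTLCache` and `get_or_compute` are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serves cached values until they are older than ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[str, Tuple[float, object]] = {}  # key -> (stored_at, value)

    def get_or_compute(self, key: str, compute: Callable[[], object]) -> object:
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                  # fresh enough: skip the expensive compute
        value = compute()                  # missing or stale: recompute and store
        self.entries[key] = (now, value)
        return value
```

The TTL is the knob the slide describes: a financial app would set it to zero (no caching), a search-results page could tolerate minutes.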

20 How to make a Cache
Cache = hash table
– Key = input
– Value = output
Try this on PA2 (slow version)!
– Download http://uwinfo344.chunkaiw.com/files/cache-slow-PA2.zip
– Change data.txt to your wiki dataset (> 50 MB)
– Implement the cache layer!
Try it!
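A front-end cache lives in the web role's memory, so one common refinement (an assumption here, not part of the assignment) is to cap its size and evict the least recently used entry. A sketch using `collections.OrderedDict`; `LRUCache` and `suggest` are hypothetical names, and the linear scan stands in for the slow PA2 lookup.

```python
from collections import OrderedDict
from typing import List, Optional


class LRUCache:
    """Bounded input -> output cache; evicts the least recently used query."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, key: str) -> Optional[List[str]]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key: str, value: List[str]) -> None:
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # drop the least recently used entry


def suggest(prefix: str, words: List[str], cache: LRUCache) -> List[str]:
    """Cache layer in front of a slow linear scan over the dataset."""
    cached = cache.get(prefix)
    if cached is not None:
        return cached
    result = [w for w in words if w.startswith(prefix)][:10]
    cache.put(prefix, result)
    return result
```

Popular prefixes stay cached; rare ones age out instead of growing the dictionary without bound.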

21 Computing online vs. offline
Anything that takes longer than 5 s should be offline…

22 Online vs. Offline
Online = user-facing, user-blocking (WebRole)
Offline = async, non-blocking (WorkerRole)
PA3 = perfect example
– Scaling the PA3 WebRole = easy
– Scaling the PA3 WorkerRole = hard
[Diagram: Compute (Web Role, Worker Role); Storage, read & write (Table/SQL); Storage, read-only (S3/Blob)]
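The online/offline split can be sketched in-process, assuming Python's `queue.Queue` as a stand-in for an Azure Queue and a dict for table storage. The web-role function only enqueues and returns, while a background thread plays the worker role. All function names are hypothetical, and `len(url)` is a placeholder for real slow work such as crawling.

```python
import queue
import threading
from typing import Dict, Optional

jobs: "queue.Queue[Optional[str]]" = queue.Queue()  # stands in for an Azure Queue
results: Dict[str, int] = {}                         # stands in for table storage
results_lock = threading.Lock()


def web_role_handle_request(url: str) -> str:
    """Online path: enqueue and return immediately; the user never waits."""
    jobs.put(url)
    return "accepted"


def worker_role_loop() -> None:
    """Offline path: drain the queue at its own pace."""
    while True:
        url = jobs.get()
        if url is None:              # sentinel value tells the worker to stop
            jobs.task_done()
            break
        length = len(url)            # placeholder for the slow work
        with results_lock:
            results[url] = length
        jobs.task_done()
```

Start `worker_role_loop` in a `threading.Thread`, call `web_role_handle_request` as requests arrive, and put `None` on the queue to shut the worker down.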

23 Scaling PA3
URL Queue => easy: every worker grabs from and puts to the URL Queue
Index Table => easy: every worker writes URL/page_title to the Index Table
Dashboard => harder: each worker writes to its own row in the dashboard table; the web role reads all rows and combines them
[Diagram: PA3 Worker Roles with ID = 0, 1, 2 sharing the URL Queue, Index Table, and Dashboard Table]
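The dashboard pattern (one row per worker, combined on read) avoids workers overwriting each other's counters. A minimal sketch with a dict keyed by worker ID standing in for the dashboard table; the field names `urls_crawled` and `errors` are hypothetical.

```python
from typing import Dict

# Dashboard table stand-in: one row per worker, keyed by worker ID (row key).
dashboard_table: Dict[int, Dict[str, int]] = {}


def worker_report(worker_id: int, urls_crawled: int, errors: int) -> None:
    """Each worker overwrites only its own row, so workers never contend."""
    dashboard_table[worker_id] = {"urls_crawled": urls_crawled, "errors": errors}


def web_role_dashboard() -> Dict[str, int]:
    """The web role reads every row and combines them into one summary."""
    summary = {"workers": len(dashboard_table), "urls_crawled": 0, "errors": 0}
    for row in dashboard_table.values():
        summary["urls_crawled"] += row["urls_crawled"]
        summary["errors"] += row["errors"]
    return summary
```

Adding a fourth worker just adds a fourth row; the read side grows slightly, but writes never conflict.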

24 Client-side vs. Server-side
Push computation to the client side to make the system more scalable!

25 Client-side vs. Server-side
Computations on the client side
– Very scalable, and free for you!
– JavaScript, desktop apps, or mobile apps
Depends on the application. Examples:
– Game logic @ Kooapps (easier to cheat, but that's ok if it's not an MMO)
– Sorting leaderboard scores for friends
– Parsing config data
PPStream/PPTV/Funshion (the Netflix of China)
– P2P videos! They don't have to pay server bills like Netflix does. Genius!

26 Questions?

