Published by Juniper Sims, modified over 9 years ago
1
INFO 344 Web Tools and Development
CK Wang, University of Washington, Spring 2014
2
Announcements
Monday = no class
But I will host extra office hours:
– 12 noon to 3pm
– MGH commons
– Great opportunity to get PA4 help.
3
Programming Assignment #4
4
Connecting Everything
5
Anatomy
[Figure: system diagram (red = storage, blue = compute). A user query goes to the QuerySuggest Web Role (Search.aspx, Dashboard.aspx, Admin.asmx). Storage: Azure Blob (QuerySuggest, Wiki dataset, PA2), AWS RDS (structured data: NBA stats, PA1), Azure Queue (URLs to crawl), Azure Table (web index: word, URLs), Azure Table (ranking), Azure Blob (user logs). Compute: Web Role (query suggestions, query stats) and Worker Role (crawler, PA3).]
This is basically how Google works!
6
Final Product on Azure & AWS
[Figure: screenshot of Search.aspx for the query "LeBron James", showing query suggestions (web role, PA2), LeBron James stats (PA1), and a results page (web role) with results from cnn.com in table storage (PA3). Admin.asmx, Dashboard.aspx, and the worker role (crawler) run alongside.]
Full Google Experience!
7
Connecting PA1, PA2, PA3
PA1 => change to support JSONP
– Only 1 result, only exact matches
PA2 => This will be our core front end for PA #4
– Add code to query PA #1’s API to grab NBA players
– Add code to query Table Storage for indexed sites (from the crawler)
– Add code to rank results (LINQ)
– Show results to the end user
– Add query suggestion admin stats to the Dashboard
PA3 =>
– Change Table Storage to map words in the title to URLs, instead of the current mapping from page URL to title/date. For example, if the title is “Microsoft goes IPO”, then the keys should be “microsoft”, “goes”, and “ipo”, and the value is the pair. This is a simplified inverted index.
– 1 word will likely map to multiple URLs
– Case insensitive!
– Still only cnn.com & Sports Illustrated
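A minimal sketch of the simplified inverted index described for PA3, written in Python rather than the course's C#. It assumes each crawled page is a hypothetical (url, title) pair and that the stored value is that pair; the slide does not spell out the value's exact contents. Keys are lowercased so lookups are case insensitive, and one word maps to every page whose title contains it.

```python
import re
from collections import defaultdict


def tokenize(title):
    """Split a title into lowercase words so index keys are case insensitive."""
    return re.findall(r"[a-z0-9]+", title.lower())


def build_index(pages):
    """Map each title word to the list of (url, title) pairs whose title contains it.

    `pages` is a hypothetical iterable of (url, title) tuples produced by the crawler.
    """
    index = defaultdict(list)
    for url, title in pages:
        # set() so a word repeated in one title indexes that page only once
        for word in set(tokenize(title)):
            index[word].append((url, title))
    return index


pages = [
    ("http://www.cnn.com/a", "Microsoft goes IPO"),
    ("http://www.cnn.com/b", "Microsoft earnings rise"),
]
index = build_index(pages)
print(index["microsoft"])  # both pages: one word maps to multiple URLs
print(index["ipo"])        # only the first page
```

In Table Storage, one possible layout (an assumption, not from the slides) is the word as the PartitionKey and an encoded URL as the RowKey, since key values cannot contain characters such as `/`.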
8
Caching, Monetization, Ranking
Caching
– Web role has a cache, size = 100 rows
– Just use a dictionary
Monetization
– Google AdSense!
Ranking
– Sort by # of keyword matches
– Then by date
– Only use LINQ, 1 statement!
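A Python sketch of the two mechanics on this slide, under stated assumptions. The cache is a plain dictionary capped at 100 entries; the slide gives no eviction policy, so this sketch assumes the oldest inserted entry is dropped (Python dicts keep insertion order). The ranking assumes hypothetical (url, title, date) result tuples, counts distinct query words found in the title, and assumes "then by date" means newest first. The single `sorted(...)` call stands in for one LINQ statement chaining `OrderByDescending` and `ThenByDescending`.

```python
import re
from datetime import datetime


class BoundedCache:
    """Dictionary-backed cache holding at most `capacity` entries.

    Eviction policy (drop the oldest inserted key) is an assumption.
    """

    def __init__(self, capacity=100):
        self.capacity = capacity
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        # Only evict when adding a new key to a full cache
        if key not in self.data and len(self.data) >= self.capacity:
            oldest = next(iter(self.data))
            del self.data[oldest]
        self.data[key] = value


def words(text):
    """Lowercase word set, used for case-insensitive keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank(results, query):
    """Sort by number of query keywords in the title, then by date, both descending."""
    q = words(query)
    return sorted(results, key=lambda r: (len(q & words(r[1])), r[2]), reverse=True)


results = [
    ("u1", "LeBron James scores 40", datetime(2014, 5, 1)),
    ("u2", "James Harden injured", datetime(2014, 5, 20)),
    ("u3", "LeBron James and the Heat", datetime(2014, 5, 10)),
]
print([r[0] for r in rank(results, "LeBron James")])
```

Sorting on a (matches, date) tuple with `reverse=True` orders both keys descending in one pass, mirroring the "1 statement" requirement.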
9
Start Now!
Less than 2 weeks!
No late days
10
Deliverables
Due on Jun 2, 11pm PST – NO LATE DAYS
Submit on Canvas
Please submit the following as a single zip file:
– readme.txt with the URL of your Azure instance hosting this website and the URL of your GitHub repo (share your GitHub repo with me & the TA)
– Visual Studio 2013 project & source code
– Make sure crawling is complete (or has crawled a bunch of pages)
– writeup.txt (~500 words) explaining how you implemented everything; make sure to address each of the requirements
– extracredits.txt with a short paragraph for each extra credit: how to see/trigger/evaluate/run your extra credit feature and how you implemented it
11
Hint
Probably easier to start from PA3
Worker Role = same as PA3
– except maybe the part that writes to Azure Table
Web Role = PA2 + PA3
Re-launch AWS
– Make sure you choose Single-AZ + Micro instance!!!
12
Hint
Start early
Ask on the discussion forum early
13
Extra Credit [up to 10pts]
Beautiful search results page [10pts]
Show body snippet in results page with query words bolded [5pts]
Learn ranking from user clicks on URLs [5pts]
– Still 1 LINQ query for all ranking
Google Instant (AJAX, every keystroke in query box => update results page)*
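For the "body snippet with query words bolded" extra credit, a possible approach is sketched below in Python. The helper name `bold_snippet` and the 200-character snippet length are hypothetical. It matches whole query words case-insensitively and HTML-escapes the page text piece by piece, so crawled content cannot inject markup and the `<b>` tags are the only HTML in the output.

```python
import html
import re


def bold_snippet(body, query, length=200):
    """Return the first `length` chars of `body`, HTML-escaped, with query words in <b> tags."""
    snippet = body[:length]
    terms = [re.escape(w) for w in query.split()]
    if not terms:
        return html.escape(snippet)
    # Whole-word, case-insensitive match on any query term
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for m in pattern.finditer(snippet):
        parts.append(html.escape(snippet[last:m.start()]))  # text before the match
        parts.append("<b>" + html.escape(m.group(0)) + "</b>")  # keep original casing
        last = m.end()
    parts.append(html.escape(snippet[last:]))  # text after the last match
    return "".join(parts)


print(bold_snippet("LeBron James led Miami past Indiana.", "lebron james"))
```

Escaping each segment separately, rather than escaping first and then substituting, avoids bolding text inside entities such as `&amp;`.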
14
What if my other PAs aren’t working?
Start ASAP!
PA1 = Not too hard
PA2 = Instead of a trie, just use a Dictionary where key = first 3 characters of the input query and value = List of titles starting with those characters
PA3 = Focus on the URL queue: getting the sitemap into that queue, then getting page titles/words into table storage. Don’t worry about the other stuff.
PA4 will be much, much more important: it is basically our last assignment + final exam + final project
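The PA2 fallback (a Dictionary keyed on the first 3 characters instead of a trie) might look like the Python sketch below. Lowercasing, the 10-suggestion limit, filtering the bucket by the full query, and returning nothing for queries shorter than 3 characters are all assumptions; the slide only specifies the 3-character key and the list of titles.

```python
from collections import defaultdict


def build_prefix_map(titles, prefix_len=3):
    """Group titles by their first `prefix_len` lowercase characters."""
    groups = defaultdict(list)
    for title in titles:
        groups[title[:prefix_len].lower()].append(title)
    return dict(groups)


def suggest(prefix_map, query, limit=10, prefix_len=3):
    """Return up to `limit` titles that start with `query` (case insensitive).

    Looks up the bucket for the query's first `prefix_len` characters, then
    filters that bucket by the whole query.
    """
    q = query.lower()
    if len(q) < prefix_len:
        return []  # assumption: too short to pick a bucket
    bucket = prefix_map.get(q[:prefix_len], [])
    return [t for t in bucket if t.lower().startswith(q)][:limit]


titles = ["Seattle", "Seattle Seahawks", "Sea", "Search engine", "Microsoft"]
prefix_map = build_prefix_map(titles)
print(suggest(prefix_map, "seat"))
```

Compared with a trie, this trades memory efficiency for simplicity: each lookup is one dictionary access plus a linear scan of a single bucket.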
15
Secret
I’ll weigh PA4 slightly more if I see a huge improvement.
Don’t give up :)
16
Questions?