1 Web Search and Advanced Internet Services 290N Class Introduction Tao Yang, 2014
2 Introduction Web Search/Traffic User interests –Web content Importance of search engine traffic Online advertisement Class Topics
Internet Users
Sales of Mobile Devices/PCs
More Mobil Search (2012 Survey)
Users’ Interests
7 Content trend and ownership Content consumption is fragmenting – nobody owns more than 10% of WW PVs No single place will own all the content [Ramakrishnan and Tomkins 2007]
Web Search Engine Market in USA (Jan 2012) Google: 66.2% Bing: 15.2% Yahoo: 14.1% Ask: 3% AOL: 1.6%
Search Traffic is Important for Business
2012 Survey: Web Search Importance for Business
Online advertising market, Worldwide
12 Search query Ad
13 Questions Do you think an “average” user, knows the difference between sponsored search links and algorithmic search results?
14 Course Objectives Practice and experience for building search services and developing related mining applications Broad topics in web mining and search engines, advertisement Algorithms & System support Workload: 1 take-home exam Group project (2 persons). –paper reviewing and presentation –Implementation/evaluation. Report. 2 group HW exercises (Lucene/Solr search, Hadoop log analysis)
15 Course Topics Web Search Indexing, Compression, and Online Search Ranking methods with text/ link/click analysis. Machine learning. Text Mining Duplicate analysis. Text Categorization and Clustering Recommendation Advertisement Systems Support Online servers and offline computation. MapReduce. Caching. Crawling and document parsing. Open source systems
16 Expected Work Tentatively Project 50%. Take-home exam 40%. 10% HW exercise. Timeline Jan 29: 1-page project proposal (plain text). Jan 30-Feb 6: – Meet with me and select paper(s) for reviewing. – Demo for HW 1 Feb 15 week: –Project progress & related papers presentation Feb 27. HW2 –Then schedule second meeting with me on HW2 and proj March 15 week or earlier: –Project demo/interview –Final project slides/report. Take-home exam. Problems based on class presentation/references/HW.
17 References Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (MRS), Introduction to Information Retrieval, Cambridge University Press Christopher D. ManningPrabhakar Raghavan Hinrich Schütze Search Engines: Information Retrieval in Practice by Croft, Metzler, Strohman (CMS) Addison-Wesley, 2010 Selected papers
18 Class Computing Resource Triton supercomputer accounts: Week 2 (Jan 13).Get a class account in Triton by ing your name, UCSB , and ssh public key with subject "CS290N ssh key" to Instructions on generating ssh keys can be found in keypair.html CSIL sandbox disk space /cs/sandbox/class/cs290n /cs/sandbox/student/ 290N class discussion group at Google.com (we will send an invitation based on the class list).