Advertising on the web.

Slides:



Advertisements
Similar presentations
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University Note to other teachers and users of these.
Advertisements

Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
BTrees & Bitmap Indexes
CS 345 Data Mining Online algorithms Search advertising.
Google’s Map Reduce. Commodity Clusters Web data sets can be very large – Tens to hundreds of terabytes Cannot mine on a single server Standard architecture.
Parallel Algorithms for Relational Operations. Models of Parallelism There is a collection of processors. –Often the number of processors p is large,
CS 345 Data Mining Online algorithms Search advertising.
CLICK FRAUD Alexander Tuzhilin By Vinny Rey. Why was the study done? Google was getting sued by advertisers because of click fraud. Google agreed to have.
Improved results for a memory allocation problem Rob van Stee University of Karlsruhe Germany Leah Epstein University of Haifa Israel WADS 2007 WAOA 2007.
Cloud and Big Data Summer School, Stockholm, Aug Jeffrey D. Ullman.
Calculating Discrete Logarithms John Hawley Nicolette Nicolosi Ryan Rivard.
CMPE 421 Parallel Computer Architecture
Database Management 9. course. Execution of queries.
Data Structures & Algorithms and The Internet: A different way of thinking.
1 Online algorithms Typically when we solve problems and design algorithms we assume that we know all the data a priori. However in many practical situations.
Copyrighted material John Tullis 10/17/2015 page 1 04/15/00 XML Part 3 John Tullis DePaul Instructor
Date: 2012/3/5 Source: Marcus Fontouraet. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou 1 Efficiently encoding term co-occurrences in inverted.
March 23 & 28, Hashing. 2 What is Hashing? A Hash function is a function h(K) which transforms a key K into an address. Hashing is like indexing.
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 5: Paid Search Marketing
Analysis of Massive Data Sets Prof. dr. sc. Siniša Srbljić Doc. dr. sc. Dejan Škvorc Doc. dr. sc. Ante Đerek Faculty of Electrical Engineering and Computing.
PPC Tutorial For Beginners. Content  PPC  Paid & Organic Advertisement  What is Search Engine?  How to set up account in Google Adwords? Target Your.
BAHIR DAR UNIVERSITY Institute of technology Faculty of Computing Department of information technology Msc program Distributed Database Article Review.
How To Market Disaster Restoration Services in The Internet Era
Search Engine Optimization
Rob Gleasure IS3320 Developing and Using Management Information Systems Lecture 11: Flow-Charts 1 Rob Gleasure
بازاریابی دیجیتال در یک نگاه
Information Retrieval in Practice
What about streaming data?
Storage Access Paging Buffer Replacement Page Replacement
Module 11: File Structure
CS 540 Database Management Systems
Computational Advertising
Advanced Topics in Concurrency and Reactive Programming: Case Study – Google Cluster Majeed Kassis.
CHP - 9 File Structures.
Azita Keshmiri CS 157B Ch 12 indexing and hashing
Technology and Marketing The Awesomeness That Is Google Part II
Parallel Databases.
Information Retrieval in Practice
Greedy Method 6/22/2018 6:57 PM Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 2015.
AdWords and Generalized On-line Matching
Are they better or worse than a B+Tree?
Digital Marketing Overview
Subject Name: File Structures
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Review Graph Directed Graph Undirected Graph Sub-Graph
Digital Marketing Overview
MapReduce Computing Paradigm Basics Fall 2013 Elke A. Rundensteiner
Cloud Computing Group 7.
Multi - Way Number Partitioning
Handling Data Using Databases
Review of Bulk-Synchronous Communication Costs Problem of Semijoin
Information Retrieval
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
Online algorithms Search advertising
Operating Systems.
NoSQL Databases Antonino Virgillito.
One-Pass Algorithms for Database Operations (15.2)
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
Overview of Query Evaluation
Prof. Paolo Ferragina, Algoritmi per "Information Retrieval"
Computer Organization & Architecture 3416
DIGITAL MARKETING SERVICES FOR YOUR BUSINESS ftlmedia.com.
Review of Bulk-Synchronous Communication Costs Problem of Semijoin
Computational Advertising and
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Approximation Algorithms
Best Useful Guide For Digital Marketing Freelancing 2019
HOW BUSINESSES USE GOOGLE
Presentation transcript:

Advertising on the web

On-Line Advertising Main places to show ads to customer Some sites, such as eBay, Craigslist, or auto trading sites allow advertisers to post their ads directly (FREE, FEE or COMMISSION) Search ads are placed among the results of a search query Online Stores such as Amazon show ads in many contexts. Ads are not paid by the manufacturers of the product advertised. Display ads are placed on many Websites. Advertisers pay for the display at a fixed per impression

Issues for Display Ads Lack of focus on newspaper Ad blocking Website can use an inverted index of word Ranking ads Position of the ad in a list

On-line and Off-line Algorithms Can get parts or all of its input data while running. When all the information for the algorithm is not available before the algorithm makes a decision. Off-line All the data needed by the algorithm is presented initially. Algorithm can access data in any order. Once the algorithm is finished it produces an answer

Greedy Algorithm and the Competitive Ratio Many on-line algorithms are greedy algorithms. Greedy algorithms make their decision in response to each input element by maximizing some function of the input element and the past. Competitive Ratio When comparing on-line and off-line algorithms, we can expect that there will be some constant c less than 1 such that on any input, the result of an on-line algorithm is at least c times the result of the optimum off- line algorithm. On-line Result Off-line Result

The Matching Problem Definition: a large number of advertisers want to deliver multiple messages to a large number of consumers. Matching: subset of the edges such that no node is an end of two or more edges. Perfect Matching: when every node appears in the matching. Search queries were introduced by Overture in 2000 -> Google AdWords Advertisers ”bid” on search keywords -> When someone searches for that keyword, the highest bidder’s ad is shown -> Advertiser is charged only if the ad is clicked on. Ads: 1, 2, 3, 4,… Users / search queries: a, b, c, d,… Bi-partite graph

Maximal Matching Greedy algorithm: We consider the edges in whatever order they are given. When we consider (x, y), add this edge to the matching if neither x nor y are ends of any edge selected for the matching so far. Perfect Match: Ford-Fulkerson, Augmenting Path, and Hopcroft-Karp algorithms. {(1,a), (2,b), (3,d)} {(1,c), (2,b), (3,d), (4,a)}

Fundamental Problem of Search Advertising The Adwords Problem - Is how to pick which ads are displayed on a search page when a user types in a query. The ideal off-line algorithm uses all of the budgets money so its competitive ratio = 1 The order of the ad’s must be determined by an on-line algorithm that must consider: The limited number of advertisement slots for any given query. The budget on how much a company will pay for all the uses of their add. The total amount expected to be made by the ad not which company bid the highest. What are the things that are considered when solving the Adwords Problem and which one can be ignored in practice?

Solutions with example Assume two bidders A1 and A2, two possible queries X or Y, bids of 1 or 0, budgets B = 2, and a query pattern of X1X2Y1Y2. If A1 bids 1 on X. A2 bids 1 on X and Y. X1→ A2. Greedy Algorithm - considers price of bid. X2→A2, Y → nothing. Competitive Ratio = ½ Balance Algorithm - considers price of bid and remaining budget. X2→A1, Y1→A2, Y2→nothing. Competitive Ratio = ¾ Generalised form - advertiser → Ai, bid → xi, unspent fraction of Ai’s budget → fi then ψi=xi(1-e-fi). Query is assigned to Ai such that ψi is a maximum. Is the Greedy Algorithm or the Balance Algorithm better to salve The Adwords Problem?

Matching Bids and Search Queries If a search query occurs having exactly that set of words in some order, then the bid is said to match the query, and it becomes a candidate for selection. We can avoid having to deal with word order by storing all sets of words representing a bid in alphabetic order. The list of words in sorted order forms the hash-key for the bid, and these bids may be stored in a hash table used as an index. Search queries also have their words sorted prior to lookup. When we hash the sorted list, we find in the hash table all the bids for exactly that set of words. They can be retrieved quickly, since we have only to look at the contents of that bucket. If there are a million advertisers, each bidding on 100 queries, and the record of the bid requires 100 bytes, then we require ten gigabytes of main memory, which is well within the limits of what is feasible for a single machine.

More Complex Matching Problems We can still maintain a hash-table index for the bids, but the number of subsets of words in a hundred-word email is much too large to look up all the sets, or even all the small sets of about three words. They all involve standing queries that users post to a site, expecting the site to notify them whenever something matching the query becomes available at the site. EX: 1. Twitter allows one to follow all the “tweets” of a given person. However, it is feasible to allow users to specify a set of words, such as “ipod free music” 2. Online news sites often allow users to select from among certain keywords or phrases, e.g., “healthcare” and receive alerts whenever a new news article contains that word or consecutive sequence of words. It is simpler than the email/adwords problem for several reasons. Matching single words or consecutive sequences of words, even in a long article, is not as time-consuming as matching small sets of words.

A Matching Algorithm for Documents and Bids A bid is a (typically small) set of words. A document is a larger set of words, such as an email, tweet, or news article. We assume there are many bids, perhaps on the order of a hundred million or a billion. We shall, as before, represent a bid by its words listed in some order. There are two new elements in the representation. First, we shall include a status with each list of words. The status is an integer indicating how many of the first words on the list have been matched by the current document. When a bid is stored in the index, its status is always 0(zero). Second, while the order of words could be lexicographic, we can lower the amount of work by ordering words by the rarest-first. WHAT IS A BID? A SET OF WORDS HOW DO YOU LOWER THE AMOUNT OF WORK? Sorting by rarest-first

Bid Example For each word w, in the sorted order: 1. Using w as the hash-key for the table of partially matched bids, find those bids having w as key. 2. For each such bid b, if w is the last word of b, move b to the table of matched bids. 3. If w is not the last word of b, add 1 to b’s status, and rehash b using the word whose position is one more than the new status, as the hash-key. 4. Using w as the hash key for the table of all bids, find those bids for which w is their first word in the sorted order. 5. For each such bid b, if there is only one word on its list, copy it to the table of matched bids. 6. If b consists of more than one word, add it, with status 1, to the table of partially matched bids, using the second word of b as the hash-key.

MapReduce Solution to Graph Model The MapReduce solution to the graph model is used to decrease the communication and the movement of large data The solution utilized a modified join, semi-join before processing. A semijoin removed all R’s that do not have a corresponding b value in S for a dataset R(a,b) and S(b,c) If a is really large, this will reduce the computation of the MapReduce

The Bulk-Synchronous Solution to Graph Model Like the MapReduce solution, the Bulk-Synchronous Solution to the Graph Model reduces the amount of communications between nodes by using a semijoin but it also is done in parallel In the Bulk-Synchronous solution to the Graph Model for a set R(a, b) and S(b, Creates a graph node for each tuple Create a graph node for each value of b For each tuple, connect/send messages to the corresponding value of b Send messages if the b node has at least one S node and R node. After this, it will do a MapReduce like process for all non-dangling tuples.. Some examples of Bulk-Synchronous Systems are: Pregel - Google’s Giraph - opensource Pregel build on Hadoop GraphX - for sparks GraphLab - is built to handle higher degree nodes What is the benefits of using the MapReduce and Bulk-Synchronous Solutions to the Graph Model?

Video

Thank You!