Some Final Material
GOOGLE FLU TRENDS
Sore throat? Sniffles? Google it! Duh! During flu season, more people enter search queries concerning the flu. Each year, 90 million American adults search the web for information about specific illnesses. That is LOTS OF DATA. Importance: 250, ,000 deaths from respiratory illnesses worldwide.
Previous Attempts
A Swedish website counted queries in order to track flu activity. There was a strong correlation between the frequency of search terms containing “flu” and “influenza” and virologic surveillance data. These models look for a very limited number of queries.
Google’s Version
Took 50 million of the most common search queries between and did a weekly count for each state. Normalized the data by dividing each count by the total searches for the week (thereby getting a percentage).
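The normalization step can be sketched as follows. This is a minimal illustration, not Google's code: the function name `normalize_counts` and the toy counts are assumptions made up for the example.

```python
def normalize_counts(query_counts, total_searches):
    """Turn raw weekly counts for one state into shares of all searches.

    query_counts: dict mapping query text -> count for the week (toy data here)
    total_searches: total number of searches in that state and week
    """
    if total_searches <= 0:
        raise ValueError("total_searches must be positive")
    return {query: count / total_searches for query, count in query_counts.items()}


# Hypothetical counts for one state in one week.
week = {"flu symptoms": 120, "cough remedy": 30}
shares = normalize_counts(week, total_searches=1000)
print(shares)
```

Dividing by the weekly total keeps a busy week (more searches overall) from looking like a flu outbreak.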
Each of the 50 million queries was tested for correlation with CDC data. The queries were ranked from most to least correlated. We want to estimate flu activity based on more than just a few queries.
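A rough sketch of the ranking step, assuming plain Pearson correlation between each query's weekly share and the CDC series. The slides do not say which correlation measure was used, and the helper names and numbers below are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal
    return cov / sqrt(var_x * var_y)


def rank_queries(query_series, cdc_series):
    """Return (query, correlation) pairs, most correlated first."""
    scores = {q: pearson(series, cdc_series) for q, series in query_series.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy weekly data: CDC flu activity and normalized query shares (all made up).
cdc = [1.0, 2.0, 4.0, 3.0]
queries = {
    "flu symptoms": [0.010, 0.020, 0.040, 0.030],
    "pizza": [0.05, 0.05, 0.04, 0.06],
    "sunscreen": [0.04, 0.03, 0.01, 0.02],
}
ranked = rank_queries(queries, cdc)
for query, r in ranked:
    print(f"{query}: {r:.2f}")
```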
Google added the top-ranked queries together to see what number of queries would yield the most accurate results. The magic number is 45.
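One way to picture the search for the "magic number": add the ranked queries one at a time and keep the count whose summed series tracks the CDC data best. The sketch below scores each candidate with Pearson correlation on toy data. That scoring choice, the function names, and the numbers are assumptions, not the procedure Google actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_query_count(ranked_series, cdc_series):
    """Try the top 1, 2, ..., N queries and return (best_n, best_correlation).

    ranked_series: list of weekly series, already sorted most-correlated first.
    """
    best_n, best_r = 0, float("-inf")
    running_sum = [0.0] * len(cdc_series)
    for n, series in enumerate(ranked_series, start=1):
        running_sum = [a + b for a, b in zip(running_sum, series)]
        r = pearson(running_sum, cdc_series)
        if r > best_r:
            best_n, best_r = n, r
    return best_n, best_r


# Toy data (made up): two noisy flu-related queries and one unrelated query.
cdc = [1, 2, 4, 3, 5]
ranked_series = [
    [1, 2, 4, 4, 5],  # tracks the CDC series, with noise in week 4
    [1, 2, 4, 2, 5],  # tracks the CDC series, with opposite noise in week 4
    [5, 1, 0, 4, 0],  # unrelated query that only adds noise
]
best_n, best_r = best_query_count(ranked_series, cdc)
print(best_n, round(best_r, 3))
```

In this toy case the first two queries cancel each other's noise, and adding the third makes the estimate worse, which is the trade-off behind picking a specific cutoff.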
Previously unused data for flu season of as a test set. The mean correlation was 0.97 (ranged between 0.92 and 0.99).
Advantages
Generates accurate estimates faster than the CDC. The CDC takes one to two weeks to process data and generate a flu activity report. It takes Google one to two days to generate an estimate. Faster estimates mean that health officials can quickly direct resources to where the need is greatest.
Future
Expand Google Flu Trends to predict flu activity across the globe. Challenges: some countries do not have official historical data.
Self-Driving Cars
Google “commercial” video
Alternative future autonomous “vehicles” video
Sample Telecommunication Applications
Some Applications
Applications:
– Classify a phone line/customer as a business or residential customer. Will build a predictive model for the called customer, who may not be an AT&T customer.
– Classify inbound service by type of use (voice, fax, modem)
– Identify telemarketers
Uses: marketing, revenue prediction, impact of changes (e.g., do-not-call list for telemarketing)
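As a toy illustration of the business/residential task, the sketch below builds an hourly calling distribution for a line and applies a simple rule: call it "business" if most weekday calls fall in office hours. The 9-to-5 window, the 0.7 threshold, the function names, and the sample data are all assumptions for illustration. The slides do not describe the actual AT&T model.

```python
from collections import Counter

BUSINESS_HOURS = range(9, 17)  # assumed office hours: 9:00 through 16:59


def hourly_distribution(call_hours):
    """Fraction of a line's weekday calls in each hour of the day (0-23)."""
    counts = Counter(call_hours)
    total = len(call_hours)
    return [counts.get(hour, 0) / total for hour in range(24)]


def classify_line(call_hours, threshold=0.7):
    """Label a line 'business' if enough of its calls fall in business hours."""
    if not call_hours:
        raise ValueError("need at least one call")
    distribution = hourly_distribution(call_hours)
    business_share = sum(distribution[hour] for hour in BUSINESS_HOURS)
    return "business" if business_share >= threshold else "residential"


# Hypothetical start hours of weekday calls for two lines.
office_line = [9, 10, 10, 11, 13, 14, 15, 16, 16, 18]
home_line = [7, 8, 18, 19, 19, 20, 20, 21, 12, 22]
print(classify_line(office_line))
print(classify_line(home_line))
```

A real predictive model would learn from many features (call durations, weekend activity, and so on) rather than a single hand-set threshold.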
Distribution of Weekday Calls by Hour
Comparison of Weekday Calling Patterns
Call Durations
Market Segments
Enterprise Miner Workspace
Some Results
Segment 0, Segment 1
More Results
Segment 2, Segment 3
Application 2
Identify how inbound (toll-free) service is used
– Is an inbound line being used for voice, fax, or data/modem?
– Useful for identifying trends and prediction. Fax usage has dropped significantly since the last study, most likely due to increased use of the Internet.
– Useful for marketing, for example, for new fax services
Segmentation of Inbound Lines
Type of Usage by Segment
Distribution of Usage
Fax and modem lines show opposite trends. Fax lines become more common in the low-usage segments while modem lines become less common in these segments. Fax usage grows to 5% in segment 8, but this contributes very few minutes.
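The last point, that a usage type can be a visible share of the lines in a segment yet contribute almost nothing to total minutes, is easy to check with arithmetic. The sketch below uses invented numbers, not the AT&T data, and the record layout and function names are assumptions.

```python
def share_of_lines(lines, segment, usage_type):
    """Fraction of the lines in one segment that have the given usage type."""
    in_segment = [line for line in lines if line[0] == segment]
    matching = [line for line in in_segment if line[1] == usage_type]
    return len(matching) / len(in_segment)


def share_of_minutes(lines, segment, usage_type):
    """Fraction of ALL minutes contributed by that usage type in that segment."""
    total_minutes = sum(minutes for _, _, minutes in lines)
    matching_minutes = sum(
        minutes for seg, kind, minutes in lines if seg == segment and kind == usage_type
    )
    return matching_minutes / total_minutes


# Made-up records: (segment, usage type, minutes per month).
lines = (
    [(1, "voice", 10_000)] * 10  # a high-usage segment
    + [(8, "voice", 10)] * 19    # a low-usage segment...
    + [(8, "fax", 10)]           # ...with one fax line out of twenty
)
fax_line_share = share_of_lines(lines, segment=8, usage_type="fax")
fax_minute_share = share_of_minutes(lines, segment=8, usage_type="fax")
print(f"fax share of segment 8 lines: {fax_line_share:.1%}")
print(f"fax share of all minutes:     {fax_minute_share:.3%}")
```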
Summary Results for AT&T Toll-Free Lines
Chronological Comparison