RapStar’s Solution to Data Mining Hackathon on Best Buy Mobile Site
Kingsfield, Dragon
Beat Benchmark
Use Time information
Time is a good feature in data mining.
Divide the data into 12 time periods based on the click_time field. As the “prior”, use the frequency within the time period that click_time falls into, instead of the global frequency.
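The per-period prior could be sketched roughly as follows. This is a minimal illustration, not the team's actual code: it assumes the 12 periods are equal-width slices of the full time range (the slides do not say how the periods were cut), that "frequency" means how often each item was clicked, and the names `period_index`, `period_priors`, and the `(click_time, sku)` record layout are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def period_index(ts, start, end, n_periods=12):
    """Map a timestamp to one of n_periods equal-width bins over [start, end].

    Equal-width bins are an assumption made for this sketch.
    """
    span = (end - start).total_seconds()
    frac = (ts - start).total_seconds() / span
    # Clamp so that ts == end (or slightly out-of-range values) stay in a valid bin.
    return min(max(int(frac * n_periods), 0), n_periods - 1)


def period_priors(clicks, start, end, n_periods=12):
    """Return {period: {sku: relative click frequency within that period}}.

    clicks is an iterable of (click_time, sku) pairs (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for ts, sku in clicks:
        counts[period_index(ts, start, end, n_periods)][sku] += 1
    priors = {}
    for period, counter in counts.items():
        total = sum(counter.values())
        priors[period] = {sku: n / total for sku, n in counter.items()}
    return priors


# Toy data: a 12-day range, so each period covers exactly one day.
start = datetime(2012, 1, 1)
end = datetime(2012, 1, 13)
clicks = [
    (datetime(2012, 1, 1, 12), "A"),
    (datetime(2012, 1, 1, 13), "A"),
    (datetime(2012, 1, 1, 14), "B"),
    (datetime(2012, 1, 12, 10), "B"),
]
priors = period_priors(clicks, start, end)
```

At prediction time, the prior for a query would be looked up with `priors[period_index(click_time, start, end)]` rather than taken from one global frequency table.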
Smooth the data.
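The slide does not say which smoothing was applied. One common option, shown here purely as an assumption, is to pull each period's frequencies toward the global frequencies so that periods with few clicks do not produce extreme priors. The function name `smoothed_prior` and the strength parameter `alpha` are hypothetical.

```python
from collections import Counter


def smoothed_prior(period_counts, global_counts, alpha=10.0):
    """Blend one period's click counts with the global distribution.

    alpha acts like a number of pseudo-clicks distributed according to the
    global frequencies: a larger alpha means a stronger pull toward the
    global prior. Assumes every item in period_counts also appears in
    global_counts (true when the global counts cover all periods).
    """
    global_total = sum(global_counts.values())
    period_total = sum(period_counts.values())
    return {
        sku: (period_counts.get(sku, 0) + alpha * g / global_total)
        / (period_total + alpha)
        for sku, g in global_counts.items()
    }


global_counts = Counter({"A": 6, "B": 4})
period_counts = Counter({"A": 1})  # a sparse period with a single click
prior = smoothed_prior(period_counts, global_counts, alpha=10.0)
```

With a single observed click, the smoothed prior stays close to the global split instead of assigning all probability to item "A".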
Unigram to Bigram
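Going from unigrams to bigrams means that pairs of adjacent query words are used as features in addition to single words. The slide gives no implementation details, so the following is only a sketch that assumes whitespace tokenization and lowercasing; `ngram_features` is a hypothetical helper name.

```python
def ngram_features(query):
    """Return the unigrams followed by the bigrams of a query string."""
    words = query.lower().split()
    # Pair each word with its right-hand neighbour to form bigrams.
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


features = ngram_features("Call of Duty")
```

A one-word query yields no bigrams, so it falls back to its single unigram feature.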
Data Processing
The most important part: query correction
– Lemmatization
– Split words and numbers
– Query correction (in the small version)
A lot of things that can help to improve:
– “x box”, “x men”
– New algorithm for query correction
Rank predictions that the user already clicked lower.
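The three query-cleaning steps listed above (lemmatization, splitting words and numbers, query correction) could be chained roughly as below. This is a sketch under stated assumptions, not the team's pipeline: `naive_lemma` is a crude plural-stripping stand-in for a real lemmatizer, correction is done with fuzzy matching from the standard library's `difflib` against a known vocabulary, and all function names and the example vocabulary are hypothetical.

```python
import re
from difflib import get_close_matches


def split_words_and_numbers(query):
    """Insert a space at every letter/digit boundary, e.g. 'xbox360' -> 'xbox 360'."""
    spaced = re.sub(r"(?<=[a-z])(?=\d)|(?<=\d)(?=[a-z])", " ", query.lower())
    return spaced.split()


def naive_lemma(word):
    """Crude stand-in for lemmatization: strip a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def correct(word, vocabulary, cutoff=0.8):
    """Replace an unknown word with its closest vocabulary entry, if any."""
    if word in vocabulary or word.isdigit():
        return word
    matches = get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def normalize_query(query, vocabulary):
    tokens = split_words_and_numbers(query)
    tokens = [naive_lemma(t) for t in tokens]
    tokens = [correct(t, vocabulary) for t in tokens]
    return " ".join(tokens)


# Hypothetical vocabulary, e.g. built from product names.
vocabulary = ["battlefield", "game", "xbox"]
cleaned = normalize_query("Batlefield games xbox360", vocabulary)
```

This sketch does not handle split names such as “x box”, which the slide lists as a remaining improvement; joining such tokens would need an extra step.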
Conclusion
Data preprocessing and feature engineering are the most important things.