Published by Shyann Ashwell. Modified over 10 years ago.
1
Data Mining in Practice: Techniques and Practical Applications
Junling Hu May 14, 2013
2
What is data mining?
Mining patterns from data
Is it statistics?
  Functional form?
  Computation speed concern?
  Data size
  Variable size
Is it machine learning?
  Big data issue
  New methods: network mining
E.g. stroke prediction
3
Examples of data mining
Frequently bought together Movie recommendation
4
More examples of data mining
Keyword suggestions Genome & disease mining Heart monitoring
5
Overview of data mining
Frequent pattern mining
Machine learning: supervised, unsupervised
Stream mining
Recommender system
Graph mining
Unstructured data: text, audio, image and video
Big data technology
6
Frequent Pattern Mining
Examples: diaper and beer, product assortment, click behavior, machine breakdown
Uses: product display, assortment, re-stocking
7
The case of Amazon
User  Items
1     {Princess dress, crown, gloves, t-shirt}
2     {Princess dress, crown, gloves, pink dress, t-shirt}
3     {Princess dress, crown, gloves, pink dress, jeans}
4     {Princess dress, crown, gloves, pink dress}
5     {crown, gloves}
Count frequency of co-occurrence
Efficient algorithm
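Counting co-occurrence can be sketched in a few lines. The baskets below are the five from the slide; the function names and the `min_support` threshold are illustrative assumptions. This is a brute-force pair count, not the efficient algorithm the slide alludes to (Apriori and FP-growth are the usual choices at scale).

```python
from collections import Counter
from itertools import combinations

# Baskets copied from the slide: one set of items per user.
baskets = [
    {"Princess dress", "crown", "gloves", "t-shirt"},
    {"Princess dress", "crown", "gloves", "pink dress", "t-shirt"},
    {"Princess dress", "crown", "gloves", "pink dress", "jeans"},
    {"Princess dress", "crown", "gloves", "pink dress"},
    {"crown", "gloves"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (a, b) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def frequent_pairs(baskets, min_support):
    """Keep the pairs that co-occur in at least min_support baskets."""
    return {p: c for p, c in pair_counts(baskets).items() if c >= min_support}

counts = pair_counts(baskets)
print(counts[("crown", "gloves")])          # 5: the pair appears in every basket
print(counts[("Princess dress", "crown")])  # 4
```

Brute-force counting grows quadratically with basket size, which is why pruning-based algorithms matter on real catalogs.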
8
Machine Learning Process
9
Machine Learning: Supervised, Unsupervised (clustering)
Supervised examples: churn, click, yes/no
Unsupervised: discussion topics (Twitter), customer feedback, …
10
Binary classification
Input features: Checking, Duration (years), Savings ($k), Current Loans, Loan Purpose
Output class: Risky? (Yes/No)
[Table of example data points; extracted cell values: Yes 1 10 TV 2 4 No 5 75 Car 66 Repair 83 11 99]
Each row is a data point
Millions of data points, hundreds of thousands of rows
11
Classification (1) Decision tree
12
Classification (2): Neural network
Perceptron
Multi-layer neural network
13
Head pose detection
14
Support Vector Machine (SVM)
Search for a separating hyperplane Maximize margin
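To make "maximize margin" concrete, the sketch below computes the geometric margin of a candidate hyperplane and compares two candidates. It does not train an SVM. The toy points, labels and both candidate hyperplanes are hypothetical and not taken from the slides.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    The result is positive only if every point lies on the side that
    matches its label (+1 or -1), i.e. the hyperplane separates the data.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical 2-D toy data.
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical separating lines, each given as (w, b).
candidate_a = ((1.0, 1.0), -5.0)  # the line x + y = 5
candidate_b = ((1.0, 0.0), -3.0)  # the line x = 3

margin_a = geometric_margin(*candidate_a, points, labels)
margin_b = geometric_margin(*candidate_b, points, labels)
print(margin_a, margin_b)
```

Both lines separate the classes, but the first leaves a wider gap, so a maximum-margin criterion would prefer it.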
15
Perceived advantage of SVM
Transform data into higher dimension
16
Applications of SVM: Spam Filter
Input features:
Transmission: IP address, sender URL -- one-spam.com
Header: From, To ("undisclosed"), cc
Body: # of paragraphs, # of words, structure
# of attachments
# of links
17
Logistic regression
Advantages: simple functional form; can be parallelized; large scale
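The simple functional form is the sigmoid applied to a weighted sum of the features. The following is a minimal sketch under stated assumptions: the data, learning rate and epoch count are hypothetical, and training uses plain batch gradient descent. Each example's gradient contribution is computed independently of the others, which is one reason the method can be parallelized.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """P(y = 1 | x) under the logistic model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        # Per-example terms are independent, so this loop could be split
        # across workers and the partial sums added together.
        for xi, yi in zip(X, y):
            err = predict_proba(weights, bias, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical one-feature data: the label is 1 for the larger values.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = train(X, y)
print(predict_proba(weights, bias, [0.0]), predict_proba(weights, bias, [3.0]))
```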
18
Applications of logistic regression
Click prediction: search ranking (web pages, products), online advertising, recommendation
The model
Output: click/no click
Input features: page content, search keyword, user information
19
Regression: linear regression, non-linear regression
Applications: stock price prediction, credit scoring, employment forecast
Output: a numeric value
Non-linear regression is used by machine learning
20
History of Supervised learning
21
Semi-supervised learning
Application: Speech dialog system
22
Unsupervised learning: Clustering
No labeled data
Methods: K-means
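K-means can be sketched as Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sample points and the choice of initial centroids below are hypothetical; real implementations usually pick the starting centroids randomly or with k-means++ and stop on convergence rather than after a fixed number of iterations.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate the assignment and update steps."""
    centroids = [tuple(c) for c in centroids]
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assignment step: nearest centroid by squared Euclidean distance.
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D points forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```

No labels are used anywhere: the grouping comes only from distances between points.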
23
Categories of machine learning
24
Applications of Clustering
Malware detection
Document clustering: topic detection
25
Graphs in our life
Social network: friend recommendation
Molecular compound: drug discovery
26
Graph and its matrix representation
Adjacency matrix, with rows and columns indexed by nodes 1 to 6
[Figure: graph with nodes 1 to 6]
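As an illustration of the matrix representation, the sketch below builds an adjacency matrix from an edge list for a six-node graph. The edge list is hypothetical (a ring through nodes 1 to 6), since the edges of the slide's graph are not recoverable from the transcript.

```python
def adjacency_matrix(num_nodes, edges, directed=False):
    """Build a 0/1 adjacency matrix for nodes labeled 1..num_nodes."""
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        matrix[u - 1][v - 1] = 1
        if not directed:
            # An undirected edge is stored in both directions,
            # which makes the matrix symmetric.
            matrix[v - 1][u - 1] = 1
    return matrix

# Hypothetical edges, not the ones drawn on the slide.
edges = [(1, 2), (1, 6), (2, 3), (3, 4), (4, 5), (5, 6)]
A = adjacency_matrix(6, edges)
for row in A:
    print(" ".join(map(str, row)))
```

Entry `A[i][j]` is 1 exactly when nodes `i + 1` and `j + 1` are connected, so row sums give node degrees.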
27
The web graph
[Figure: Page 1, Page 2 and Page 3 connected by hyperlinks with anchor text]
As data become large, unsupervised learning becomes popular
28
PageRank as a steady state
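Transition matrix P: entry P(i, j) is the probability of moving from node i to node j
PageRank is a probability vector π such that πP = π (the steady state of the random walk)
[Figure: graph with nodes 1 to 6 and transition probabilities 0.33, 0.5, 0.25 on the links]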
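The steady state can be approximated by power iteration: start from a uniform vector and multiply by the transition matrix until the vector stops changing. The sketch below assumes a row-stochastic matrix and a row-vector update; the three-node matrix is hypothetical, not the six-node example from the slide. Production PageRank also adds a damping (teleportation) term, which is omitted here.

```python
def pagerank_steady_state(P, tol=1e-12, max_iter=1000):
    """Power iteration for a row-stochastic P: repeat pi <- pi * P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # Component j of pi * P sums the probability flowing into node j.
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 3-node chain; every row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.25, 0.5, 0.25],
    [0.0, 0.5, 0.5],
]
pi = pagerank_steady_state(P)
print(pi)
```

For this matrix the iteration settles near (0.25, 0.5, 0.25): the middle node receives probability from both neighbors and ends up with the highest rank.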
29
Discover influencers on Twitter
The Twitter graph: nodes connected by "following" links
A PageRank approach: TwitterRank
[Figure: graph with nodes 1 to 5 joined by "following" links]
30
Facebook graph search Entity graph Natural language search
“Restaurants liked by my friends”
31
Recommending a game
32
Recommendation in Travel site
33
Prediction Problems
Rating prediction: given how a user rated other items, predict the user's rating for a given item
Top-N recommendation: given the list of items liked by a user, recommend new items that the user might like
34
Explicit vs. Implicit Feedback Data
Explicit feedback: ratings and reviews
Implicit feedback (user behavior):
  Purchase behavior: recency, frequency, …
  Browsing behavior: # of visits, time of visit, time of staying, clicks
35
Collaborative Filtering
Hypotheses
User/item similarities:
  Similar users purchase similar items
  Similar items are purchased by similar users
Matching characteristics:
  A match exists between the user's and the item's characteristics
36
User-User similarity
User's movie ratings
Movies: Out of Africa, Star Wars, Air Force One, Liar, Liar
Users: John, Adam, Laura
[Ratings table; extracted values: John 4 5 1, Adam 2, Laura ?]
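One common way to act on user-user similarity is to score other users by cosine similarity over co-rated movies and predict a missing rating as a similarity-weighted average. The sketch below reuses the user and movie names from the slide, but every rating value is hypothetical because the original table cells are not recoverable. The function names are illustrative.

```python
import math

# Hypothetical ratings; a missing key means the user has not rated the movie.
ratings = {
    "John":  {"Out of Africa": 4, "Star Wars": 4, "Air Force One": 5, "Liar, Liar": 1},
    "Adam":  {"Out of Africa": 1, "Star Wars": 1, "Air Force One": 2, "Liar, Liar": 5},
    "Laura": {"Out of Africa": 5, "Star Wars": 4, "Air Force One": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(user, item, ratings):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

sim_john = cosine_similarity(ratings["Laura"], ratings["John"])
sim_adam = cosine_similarity(ratings["Laura"], ratings["Adam"])
prediction = predict_rating("Laura", "Liar, Liar", ratings)
print(sim_john, sim_adam, prediction)
```

Raw cosine similarity on positive ratings tends to be high for every pair, so practical systems often subtract each user's mean rating first to separate tastes more sharply.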
37
Item-item similarity
Movies: Out of Africa, Star Wars, Air Force One, Liar, Liar
Users: John, Adam, Laura
[Ratings table; extracted values: John 4 5 1, Adam 2, Laura ?]
38
Application of item-item similarity
Amazon
39
SVD (Singular Value Decomposition)
40
Latent factors
41
Application of Latent Factor Model
GetJar
42
Ranking-based recommendation
43
Application in LinkedIn
Ranking-based model
44
Thanks and Contact Co-author: Patricia Hoffman Contact: