Data Mining for Web Personalization

Name: Data Mining for Web Personalization
Uploaded: 2017-12-16T21:35:28+00:00
Duration: PTM7S59
Channel: Rafe Mills
Description: Data Mining for Web Personalization

Data Mining for Web Personalization
Patrick Dudas

Outline Personalization Data mining Web mining MapReduce
Examples Web mining MapReduce Data Preprocessing Knowledge Discovery Evaluation Information High

Personalization Goal of data mining approach is for “automatic personalization” Automatic Personalization: Content-based Collaborative Rule-based

Rule-based (Brief overview)
Create decision rules Implicitly/Explicitly Highly domain dependent Rules nontransferable Profiles are based on user input Biased Static Degrade over time

Content-based (Brief overview)
Profile based on users past experiences and their interest (ratings) Think Amazon, Pandora, eBay.. Vector similarities based on cosine similarity Bayesian classification Remember: Ratings = Profile = Recommendation

Collaborative (Brief overview)
Creating groups of users based on ratings Nearest neighbor approach Once grouped, recommendation based on the other neighbors are presented More users or items = more dimensions of data Dynamic or real-time not applicable

Data Mining Data rich descriptions Large volumes of data
reliable models Automated data collection Evaluate results/make decisions Integration with existing data sources

Examples of Large Datasets
Featured data sets: Illumina - Jay Flatley (CEO of Illumina) Human Genome Data Setcience 315(5814): 972. 350 GB YRI Trio Dataset 700GB Sloan Digital Sky Survey DR6 Subset 160 GB Genome, survey data, Google Books n-gram corpuses, traffic statistics, OpenStreetMap dataset, Wikipedia traffic…

Data Mining Web Personalization
Recommendations based on Web objects: Items Pages Documents Navigation by links Web mining Pros: Personalization (duh.), real-time, more enriched datasets Cons: Privacy issues, building complex systems that misrepresent the individual

Extend the Data Mining Paradigm
1) Data Preparation and Transformation 2) Pattern Discovery 3) Recommendation

Data Preparation and Transformation
Web logs Date/time usage Site information Resource requested (image, video, etc.) Site files/meta-data The power of the cookie Server-side cookies!

Data Preparation and Transformation (cont.)
Pageview: User actions (where they clicked and the path) User events (what they are trying to accomplish) Session: Sequence of page views

MapReduce Google design Hoodop implemented
C++, C#, Erlang, Java, Ocaml, Perl, Python, Ruby, F#, R..

Example

Usage Data Pre-Processing

Pattern Discovery We have data! Now what? Cluster Classification
Association Rule Discovery Sequential pattern Discovery Markov Models Latent Variable Model

Clustering Partitioning Hierarchical Model-based
Split your data into groups K-means Hierarchical Divisive (top-down) Start with everything, find groups Agglomerative (bottom-up) Start with a cluster and add additional information Model-based Building a model for the data (best fit)

K-means

User-Based Clustering
Start with the user profile Partition into k-groups of profiles Based on similarity

Association Discovery
Support min(support) Confidence min(confidence)

Evaluation (Personalization Model)
Challenges: Recommendation algorithms may require unique set of evaluation metrics Personalization actions may be different Domain Intended application Data gathered Check for overfitting data Training set ROC Curve

ROC Curve .95 .05 .20 .80 TPR = TP / (TP + FN) FPR = FP / (FP + TN)
"Yes" "No" .95 .05 .20 .80 SN N

Information High Information is addictive
Information can be misleading Information ethics Information is power, sometimes too powerful

Personal Suggestions Develop a hypothesis
Figure out what data is needed Make informed decisions Don’t trust just your judgment Experts are experts for a reason! Develop a way to validate based on experience Then get more data if needed

Sources Kohavi, R. and F. Provost (2001). "Applications of data mining to electronic commerce." Data Mining and Knowledge Discovery 5(1): 5-10. Dean, J. and S. Ghemawat (2008). "MapReduce: Simplified data processing on large clusters." Communications of the ACM 51(1): Mobasher, B. (2007). "Data mining for web personalization." The adaptive web: Zaharia, M., A. Konwinski, et al. (2008). Improving mapreduce performance in heterogeneous environments, USENIX Association. Witten, I. H. and E. Frank (2002). "Data mining: practical machine learning tools and techniques with Java implementations." ACM SIGMOD Record 31(1): Senthil kumar, A. (2011). Knowledge discovery practices and emerging applications of data mining : trends and new domains. Hershey, PA, Information Science Reference.

Thank you! Questions?

Data Mining for Web Personalization

Similar presentations

Presentation on theme: "Data Mining for Web Personalization"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Data Mining for Web Personalization

Similar presentations

Presentation on theme: "Data Mining for Web Personalization"— Presentation transcript:

Similar presentations

About project

Feedback