Data Mining for Web Personalization

Data Mining for Web Personalization Patrick Dudas

Outline
- Personalization
- Data mining
- Web mining
- MapReduce
- Examples: Web mining, MapReduce
- Data Preprocessing
- Knowledge Discovery
- Evaluation
- Information High

Personalization
- The goal of the data mining approach is “automatic personalization”
- Automatic personalization: content-based, collaborative, rule-based

Rule-based (Brief overview)
- Create decision rules, implicitly or explicitly
- Highly domain dependent; rules are nontransferable
- Profiles are based on user input: biased, static, and degrade over time

Content-based (Brief overview)
- Profile based on the user's past experiences and interests (ratings)
- Think Amazon, Pandora, eBay
- Vector similarities based on cosine similarity
- Bayesian classification
- Remember: Ratings = Profile = Recommendation
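As a rough illustration of the cosine-similarity step named on this slide, the sketch below ranks items against a user profile. The three-dimensional feature space, the item names, and the helper `rank_items` are hypothetical and only serve the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating/feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def rank_items(profile, items):
    """Return item names sorted from most to least similar to the profile."""
    return sorted(items,
                  key=lambda name: cosine_similarity(profile, items[name]),
                  reverse=True)

# Hypothetical feature space: (jazz, rock, classical)
profile = [5, 1, 0]
items = {
    "album_a": [1, 0, 0],
    "album_b": [0, 1, 0],
    "album_c": [0, 0, 1],
}
ranking = rank_items(profile, items)
print(ranking)
```

The profile here is just the user's ratings projected onto item features, which is one simple reading of "Ratings = Profile = Recommendation".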

Collaborative (Brief overview)
- Create groups of users based on ratings
- Nearest neighbor approach
- Once users are grouped, recommendations based on their neighbors are presented
- More users or items = more dimensions of data
- Dynamic or real-time not applicable
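A minimal sketch of the nearest-neighbor idea, assuming ratings are stored as one dictionary per user. The user names, items, the choice of cosine similarity over co-rated items, and the similarity-weighted scoring are all assumptions made for illustration; real systems differ in each of these choices.

```python
import math

def similarity(r1, r2):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def recommend(target, ratings, k=2):
    """Suggest items that the k nearest neighbors rated and the target has not."""
    others = [u for u in ratings if u != target]
    neighbors = sorted(others,
                       key=lambda u: similarity(ratings[target], ratings[u]),
                       reverse=True)[:k]
    scores = {}
    for u in neighbors:
        weight = similarity(ratings[target], ratings[u])
        for item, rating in ratings[u].items():
            if item not in ratings[target]:
                # Weight each neighbor's rating by how similar that neighbor is.
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann": {"a": 5, "b": 4},
    "bob": {"a": 5, "b": 4, "c": 5},
    "cat": {"a": 1, "b": 5, "d": 4},
    "dan": {"e": 3},
}
print(recommend("ann", ratings, k=2))
```

Every user and item adds a dimension to the comparison, which is why the pairwise neighbor search becomes expensive as the data grows.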

Data Mining
- Data-rich descriptions
- Large volumes of data, reliable models
- Automated data collection
- Evaluate results/make decisions
- Integration with existing data sources

Examples of Large Datasets
- http://aws.amazon.com/datasets
- Featured data sets:
  - Illumina - Jay Flatley (CEO of Illumina) Human Genome Data Set. Science 315(5814): 972. 350 GB
  - YRI Trio Dataset, 700 GB
  - Sloan Digital Sky Survey DR6 Subset, 160 GB
- Genome, survey data, Google Books n-gram corpora, traffic statistics, OpenStreetMap dataset, Wikipedia traffic…

Data Mining Web Personalization
- Recommendations based on Web objects: items, pages, documents
- Navigation by links
- Web mining
- Pros: personalization (duh.), real-time, more enriched datasets
- Cons: privacy issues, building complex systems that misrepresent the individual

Extend the Data Mining Paradigm
1) Data Preparation and Transformation
2) Pattern Discovery
3) Recommendation

Data Preparation and Transformation
- Web logs: date/time usage, site information, resource requested (image, video, etc.)
- Site files/meta-data
- The power of the cookie
- Server-side cookies!

Data Preparation and Transformation (cont.)
- Pageview: user actions (where they clicked and the path) and user events (what they are trying to accomplish)
- Session: sequence of page views
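One common way to turn logged page views into sessions is a time-gap heuristic. The sketch below assumes each log record has already been reduced to a `(user, timestamp, url)` tuple and that a gap of more than 30 minutes starts a new session; both the record layout and the timeout are assumptions, not something the slides specify.

```python
from datetime import datetime, timedelta

def sessionize(pageviews, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, url) page views into per-user sessions.

    A new session starts whenever the gap since the user's previous
    page view exceeds `timeout`.
    """
    sessions = {}   # user -> list of sessions, each a list of urls
    last_seen = {}  # user -> timestamp of that user's previous page view
    for user, ts, url in sorted(pageviews, key=lambda pv: (pv[0], pv[1])):
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions

# Hypothetical page views
t0 = datetime(2012, 1, 1, 10, 0)
log = [
    ("u1", t0, "/home"),
    ("u2", t0 + timedelta(minutes=5), "/cart"),
    ("u1", t0 + timedelta(minutes=10), "/products"),
    ("u1", t0 + timedelta(hours=2), "/home"),
]
sessions = sessionize(log)
print(sessions)
```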

MapReduce
- Designed by Google
- Implemented in Hadoop
- Libraries in C++, C#, Erlang, Java, OCaml, Perl, Python, Ruby, F#, R..
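To make the programming model concrete, here is a single-process imitation of the map, shuffle, and reduce phases that counts requests per URL. It does not use Hadoop or any real MapReduce API; the function names and the `<user> <url>` log format are hypothetical.

```python
from collections import defaultdict

def map_phase(log_line):
    """Emit one (url, 1) pair per request line; assumed format: '<user> <url>'."""
    parts = log_line.split()
    if len(parts) >= 2:
        yield parts[1], 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

hits = run_job(["u1 /home", "u2 /home", "u1 /cart"])
print(hits)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network; the structure of the user-written functions stays the same.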

Example

Usage Data Pre-Processing

Pattern Discovery
We have data! Now what?
- Clustering
- Classification
- Association Rule Discovery
- Sequential Pattern Discovery
- Markov Models
- Latent Variable Models
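As one small example from this list, a first-order Markov model of navigation can be estimated by counting page-to-page transitions within sessions. The sketch below assumes sessions are lists of URLs; the data and function names are illustrative only.

```python
from collections import Counter, defaultdict

def transition_probabilities(sessions):
    """Estimate P(next page | current page) from consecutive page views."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        page: {n: c / sum(following.values()) for n, c in following.items()}
        for page, following in counts.items()
    }

def predict_next(page, probs):
    """Most likely next page, or None if the page was never followed by another."""
    following = probs.get(page)
    return max(following, key=following.get) if following else None

# Hypothetical sessions
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products", "/home"],
    ["/home", "/about"],
]
probs = transition_probabilities(sessions)
print(predict_next("/home", probs))
```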

Clustering
- Three families: partitioning, hierarchical, model-based
- Partitioning: split your data into groups (K-means)
- Hierarchical:
  - Divisive (top-down): start with everything, find groups
  - Agglomerative (bottom-up): start with each item as its own cluster and merge the closest clusters
- Model-based: build a model for the data (best fit)

K-means

User-Based Clustering
- Start with the user profiles
- Partition into k groups of profiles
- Based on similarity
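The partitioning step can be sketched with a plain k-means loop over user profiles. This version assumes profiles are numeric tuples, uses squared Euclidean distance, and seeds the centroids with the first k profiles so the result is deterministic; these are simplifying assumptions, and production code would normally use a library implementation with better initialization.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on tuples of numbers; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each profile to the nearest centroid (squared Euclidean distance).
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Recompute each centroid as the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical profiles: (interest in sports pages, interest in news pages)
profiles = [(5, 0), (0, 5), (4, 1), (1, 4)]
centroids, clusters = kmeans(profiles, k=2)
print(centroids)
```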

Association Discovery
- Support ≥ min(support)
- Confidence ≥ min(confidence)
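The two thresholds can be made concrete with a small calculation. The sketch below assumes each session is a set of visited pages and uses the usual definitions: support is the fraction of sessions containing an itemset, and confidence of A → C is support(A ∪ C) / support(A). The sessions and the threshold values are hypothetical.

```python
def support(itemset, sessions):
    """Fraction of sessions that contain every page in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(s) for s in sessions) / len(sessions)

def confidence(antecedent, consequent, sessions):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    base = support(antecedent, sessions)
    both = set(antecedent) | set(consequent)
    return support(both, sessions) / base if base else 0.0

def keep_rule(antecedent, consequent, sessions, min_support=0.5, min_confidence=0.6):
    """A rule survives only if it clears both minimum thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, sessions) >= min_support
            and confidence(antecedent, consequent, sessions) >= min_confidence)

# Hypothetical sessions (non-empty list assumed)
sessions = [
    {"/home", "/cart"},
    {"/home", "/cart", "/pay"},
    {"/home"},
    {"/cart", "/pay"},
]
print(keep_rule(["/home"], ["/cart"], sessions))
```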

Evaluation (Personalization Model)
Challenges:
- Recommendation algorithms may require a unique set of evaluation metrics
- Personalization actions may differ by domain, intended application, and data gathered
- Check for overfitting the data: training set, ROC curve

ROC Curve
- TPR = TP / (TP + FN)
- FPR = FP / (FP + TN)
[Figure: ROC example with "Yes"/"No" outcomes and rates .95/.05 and .20/.80]
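The two rates that define an ROC curve can be computed directly from labels and classifier scores. The sketch below assumes binary labels (1 = positive), that a score at or above the threshold counts as a positive prediction, and made-up example data.

```python
def rates(labels, scores, threshold):
    """Return (TPR, FPR) when scores >= threshold are predicted positive."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def roc_points(labels, scores):
    """(FPR, TPR) points obtained by sweeping the threshold from high to low."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tpr, fpr = rates(labels, scores, threshold)
        points.append((fpr, tpr))
    return points

# Hypothetical classifier output
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
curve = roc_points(labels, scores)
print(curve)
```

Plotting these points with FPR on the x-axis and TPR on the y-axis gives the curve; a model that overfits tends to look much better on its training set than on held-out data.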

Information High
- Information is addictive
- Information can be misleading
- Information ethics
- Information is power, sometimes too powerful

Personal Suggestions
- Develop a hypothesis
- Figure out what data is needed
- Make informed decisions
- Don’t trust just your judgment
- Experts are experts for a reason!
- Develop a way to validate based on experience
- Then get more data if needed


Thank you! Questions?