CSC 478: Programming Data Mining Applications
Course Summary
Bamshad Mobasher
DePaul University
What we did
- Data Mining Overview
  - The KDD Process
- Data Preprocessing and Understanding
  - Using Python and NumPy
  - Using scikit-learn modules
  - Some emphasis on visualizing and understanding characteristics of the data
- Supervised Knowledge Discovery
  - Regression analysis and classification
    - Techniques such as KNN, Ridge Regression, Decision Trees, and Bayesian classification
  - Lots of emphasis on model evaluation
    - Evaluation metrics
    - Train-test methodologies such as cross-validation
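The following is a small sketch of the train-test methodology emphasized above. It uses scikit-learn's `cross_val_score` to compare a KNN classifier and a decision tree with 5-fold cross-validation. The iris dataset, `n_neighbors=5`, `max_depth=3` and the fold count are illustrative assumptions, not settings taken from the course.

```python
# Compare two classifiers with 5-fold cross-validation (illustrative settings).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    # KNN is distance-based, so features are standardized inside the pipeline;
    # the scaler is refit on each training fold only, avoiding test-fold leakage.
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    # A shallow tree; max_depth=3 is an assumed, not tuned, value.
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

results = {}
for name, model in models.items():
    # One accuracy score per held-out fold.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Reporting the mean and spread across folds gives a more stable estimate than a single train-test split. In practice the same loop can be extended with other metrics via the `scoring` parameter.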
What we did
- Unsupervised Knowledge Discovery
  - Cluster analysis
  - Using PCA and SVD for dimensionality reduction, data characterization, and noise reduction
  - Association rule discovery
  - Emphasis on using unsupervised approaches as components of larger knowledge discovery efforts
    - E.g., using PCA before clustering; using clustering as the basis for classification
- Real application domains
  - Text mining and document analysis/filtering
  - Recommender systems
  - Predictive modeling for marketing/business applications
  - Image analysis
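The following sketch shows "PCA before clustering" as a two-stage scikit-learn workflow. The data are first projected onto their leading principal components, and k-means is then run in the reduced space. The synthetic `make_blobs` data, the choice of 2 components and 3 clusters, and the use of the adjusted Rand index as a check are assumptions for illustration only.

```python
# Sketch: dimensionality reduction with PCA, then k-means clustering.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

# Synthetic data (assumed for illustration): 300 points, 10 features, 3 groups.
X, y_true = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)

# Project onto the 2 principal components that capture the most variance
# (scikit-learn computes PCA via an SVD of the centered data).
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
print("variance explained by 2 components:", round(pca.explained_variance_ratio_.sum(), 3))

# Cluster in the reduced 2-D space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)

# Compare clusters with the generating labels (possible only because the data are synthetic).
ari = adjusted_rand_score(y_true, labels)
print("adjusted Rand index:", round(ari, 3))
```

On real data there are no ground-truth labels. The number of components would instead be chosen from the explained-variance ratio, and the number of clusters from an internal criterion such as the within-cluster sum of squares.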
What we did not do (and you should learn later)
- Approaches for mining sequential/temporal data
  - Markov models; time series analysis; sequential pattern mining
- Ensemble and hybrid classifiers/predictors
  - Combining multiple classifiers
  - Random Forest classifiers
  - AdaBoost and meta-learners
- Support Vector Machines and kernel-based classifiers
- Topic modeling with latent factor models
  - LDA (Latent Dirichlet Allocation)
  - Non-Negative Matrix Factorization