Data Mining, Machine Learning, Data Analysis, etc. scikit-learn (http://scikit-learn.org/stable/). Data Science.
scikit-learn: Machine Learning in Python. Simple and efficient tools for data mining and data analysis. Built on NumPy, SciPy, and matplotlib. Open source, commercially usable (BSD license). Language: Python. http://scikit-learn.org/stable/index.html
Techniques: Classification (identifying which category an object belongs to), Regression, Clustering, Dimensionality reduction, Model selection, Preprocessing.
Examples: Face completion with multi-output estimators; Multilabel classification.
Multilabel classification
Face completion with multi-output estimators. Use of a multi-output estimator to complete images. Goal: predict the lower half of a face given its upper half.
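The same idea can be sketched on a smaller scale. The example below is an assumption-laden stand-in, not the scikit-learn face example itself: it uses the bundled 8x8 digits dataset instead of face images (so no download is needed) and a `Ridge` regressor as the multi-output estimator. The variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Assumption: 8x8 digit images stand in for the face images on the slide.
images = load_digits().images  # shape (n_samples, 8, 8)

# Upper half of each image is the input, lower half is the multi-output target.
upper = images[:, :4, :].reshape(len(images), -1)  # 32 input pixels
lower = images[:, 4:, :].reshape(len(images), -1)  # 32 target pixels

X_train, X_test, y_train, y_test = train_test_split(
    upper, lower, test_size=0.25, random_state=0
)

# Ridge accepts a 2-D target and predicts all 32 outputs at once.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
predicted_lower = model.predict(X_test)

# Stitch the known upper half and the predicted lower half back together.
completed = np.hstack([X_test, predicted_lower]).reshape(-1, 8, 8)
```

Other regressors that accept a 2-D target (for example nearest neighbors or tree ensembles) could be swapped in for `Ridge` to compare completions.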
Classification. Identifying which category an object belongs to. Applications: spam detection, image recognition. Algorithms: SVM, nearest neighbors, random forest, ... Example: multilabel classification.
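A minimal sketch of multilabel classification, assuming synthetic data from `make_multilabel_classification` and a one-vs-rest wrapper around a linear SVM. The dataset sizes and parameters are illustrative choices, not values from the slides.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic data: each sample may carry several of the 3 labels at once.
# Y is a binary indicator matrix of shape (n_samples, n_classes).
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, n_labels=2, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# One binary SVM is trained per label; predictions are combined per sample.
clf = OneVsRestClassifier(SVC(kernel="linear"))
clf.fit(X_train, Y_train)
Y_pred = clf.predict(X_test)  # indicator matrix, one column per label
```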
Classification. Examples based on real-world datasets. Visualizing the stock market structure: unsupervised learning techniques extract the stock market structure from variations in historical quotes.
Classification - Examples http://scikit-learn
Regression. Predicting a continuous-valued attribute associated with an object. Applications: drug response, stock prices. Algorithms: SVR, ridge regression, Lasso, ...
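As a rough illustration of the three algorithms named above, the sketch below fits `Ridge`, `Lasso`, and `SVR` on synthetic data from `make_regression` and records each model's R^2 score on held-out data. The data and hyperparameters are assumptions chosen for the example.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic regression problem: 10 features, 5 of which drive the target.
X, y = make_regression(
    n_samples=300, n_features=10, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "ridge": Ridge(alpha=1.0),             # L2-penalized linear regression
    "lasso": Lasso(alpha=0.1),             # L1-penalized, can zero out coefficients
    "svr": SVR(kernel="linear", C=10.0),   # support vector regression
}

# score() returns the coefficient of determination (R^2) on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
```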
Regression - examples
Clustering. Automatic grouping of similar objects into sets. Applications: customer segmentation, grouping experiment outcomes. Algorithms: k-Means, spectral clustering, mean-shift, ...
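A minimal k-Means sketch, assuming three well-separated synthetic blobs from `make_blobs`. In practice the number of clusters is usually not known in advance; it is fixed to 3 here only to match the generated data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D points drawn around 3 centers; true labels are discarded.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# n_init=10 restarts k-Means from 10 initializations and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)     # cluster index for each sample
centers = kmeans.cluster_centers_  # one 2-D centroid per cluster
```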
Clustering - Examples
Dimensionality reduction. Reducing the number of random variables to consider. Applications: visualization, increased efficiency. Algorithms: PCA, feature selection, non-negative matrix factorization.
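For the visualization use case, a short PCA sketch: the bundled iris dataset (an assumption, chosen because it ships with scikit-learn) is projected from 4 features down to 2 so it could be drawn as a scatter plot.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # shape (150, 4)

# Keep the 2 directions of greatest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # shape (150, 2)

# Fraction of the total variance retained by the 2 components.
explained = pca.explained_variance_ratio_.sum()
```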
Model selection. Comparing, validating and choosing parameters and models. Goal: improved accuracy via parameter tuning. Modules: grid search, cross validation, metrics.
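The three modules can be combined in a few lines. The sketch below assumes an SVM classifier on the iris dataset and a small, arbitrary parameter grid; `GridSearchCV` runs 5-fold cross validation for each combination, and `accuracy_score` evaluates the best model on held-out data.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Candidate hyperparameters (illustrative values, not a recommendation).
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each of the 6 combinations is scored with 5-fold cross validation.
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_  # combination with the best mean CV score
test_accuracy = accuracy_score(y_test, search.predict(X_test))
```

Keeping a separate test split means the reported accuracy is not measured on data that was used to pick the parameters.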
Preprocessing. Feature extraction and normalization. Application: transforming input data such as text for use with machine learning algorithms. Modules: preprocessing, feature extraction.
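A small sketch of both steps, using made-up inputs: `CountVectorizer` turns three toy sentences into a word-count matrix (feature extraction), and `StandardScaler` rescales numeric columns to zero mean and unit variance (normalization). The documents and measurements are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Feature extraction: text -> sparse matrix of word counts.
docs = ["free prize inside", "meeting at noon", "free meeting"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)     # one row per document
vocabulary = sorted(vectorizer.vocabulary_) # learned words, alphabetical

# Normalization: put columns with very different scales on a common footing.
measurements = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
scaled = StandardScaler().fit_transform(measurements)
```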
SAS® Enterprise Miner™ https://www. sas Descriptive and predictive modeling. Descriptive modeling: uncovers shared similarities or groupings in historical data, for example categorizing customers by product preferences or sentiment. Predictive modeling: classifies events in the future or estimates unknown outcomes. Helps uncover insights for things like customer churn, campaign response or credit defaults. Example: using credit scoring to determine an individual's likelihood of repaying a loan.
SAS - Descriptive Modeling. Clustering: grouping similar records together. Anomaly detection: identifying multidimensional outliers. Association rule learning: detecting relationships between records. Principal component analysis: detecting relationships between variables. Affinity grouping: grouping people with common interests or similar goals (e.g., people who buy X often buy Y and possibly Z).
SAS - Predictive Modeling. Classifies events in the future or estimates unknown outcomes. Helps uncover insights for things like customer churn, campaign response or credit defaults. Example: using credit scoring to determine an individual's likelihood of repaying a loan.
SAS - Predictive Modeling techniques. Regression: a measure of the strength of the relationship between one dependent variable and a series of independent variables. Neural networks: computer programs that detect patterns, make predictions and learn. Decision trees: tree-shaped diagrams in which each branch represents a probable occurrence. Support vector machines: supervised learning models with associated learning algorithms.