Naïve Bayes; Data import – Delimited, Fixed, SAS, SPSS, ODBC; Variable creation & transformation; Recode variables; Factor variables; Missing.

Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, Merge, Split
• Aggregate by category (means, sums)

Descriptive Statistics
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix for set variables)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data (standard tables & long form)
• Marginal Summaries of Cross Tabulations

Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test

Sampling
• Subsample (observations & variables)
• Random Sampling

Predictive Models
• Sum of Squares (cross product matrix for set variables)
• Multiple Linear Regression
• Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & Correlation Matrices
• Logistic Regression
• Classification & Regression Trees
• Predictions/scoring for models
• Residuals for all models

Cluster Analysis
• K-Means

Classification
• Decision Trees
• Decision Forests
• Gradient Boosted Decision Trees
• Naïve Bayes

Variable Selection
• Stepwise Regression

Simulation
• Simulation (e.g. Monte Carlo)
• Parallel Random Number Generation

Combination
• PEMA-R API
• rxDataStep
• rxExec

Slide legend: New in v7.3; Coming in v7.4
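As an illustration of the "Risk Ratio & Odds Ratio" entry, here is a minimal sketch of how both statistics are derived from a 2x2 cross-tabulation. It is plain Python, not the product's API; the `TwoByTwo` class, the function names, and the example counts are all hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 cross-tabulation (hypothetical helper for this sketch)."""
    exposed_cases: int
    exposed_noncases: int
    unexposed_cases: int
    unexposed_noncases: int


def risk_ratio(t: TwoByTwo) -> float:
    """Risk in the exposed group divided by risk in the unexposed group."""
    risk_exposed = t.exposed_cases / (t.exposed_cases + t.exposed_noncases)
    risk_unexposed = t.unexposed_cases / (t.unexposed_cases + t.unexposed_noncases)
    return risk_exposed / risk_unexposed


def odds_ratio(t: TwoByTwo) -> float:
    """Odds of being a case when exposed divided by odds when unexposed."""
    return (t.exposed_cases * t.unexposed_noncases) / (
        t.exposed_noncases * t.unexposed_cases
    )


# Made-up counts: 20 of 100 exposed and 10 of 100 unexposed are cases.
table = TwoByTwo(exposed_cases=20, exposed_noncases=80,
                 unexposed_cases=10, unexposed_noncases=90)
print("risk ratio:", risk_ratio(table))
print("odds ratio:", odds_ratio(table))
```

With these counts the risks are 0.2 and 0.1, so the risk ratio is 2, while the odds ratio is (20 × 90) / (80 × 10) = 2.25. The two measures agree closely only when the outcome is rare. The sketch does not guard against zero cells, which would need a continuity correction or an explicit error.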