STA 511 Statistical Computing

Changxing Ma
cxma@buffalo.edu
http://www.buffalo.edu/~cxma
Homepage: http://www.buffalo.edu/~cxma/STA511

Statistical Computing
What skills do we need for “biostatistical consulting and projects”?
- Biostatistics
- SAS (or Stata, SPSS, etc.)
- Microsoft Office (or LaTeX for mathematicians/statisticians)
- Google
- …

Statistical Computing
What skills do we need for research and dissertation work?
- Biostatistics
- One language: Fortran, C, C++ (optional)
- Matrix language: Matlab, R
- LaTeX
- (Calculus, Algebra) Maple/Matlab
- …

Contents
- LaTeX (Microsoft Word)
- SAS Basic
- Advanced SAS: SAS Macro, SAS SQL, SAS IML
- Matlab/R
- Matlab (Maple) – Symbolic calculation

Example 1 - SAS Stepwise regression
y = f(x1, x2, …, xs)
Stepwise regression is a technique for choosing the variables, i.e., terms, to include in a multiple regression model.
- Forward stepwise regression starts with no model terms. At each step it adds the most statistically significant term (the one with the highest F statistic or lowest p-value) until no remaining term is statistically significant.
- Backward stepwise regression starts with all the terms in the model and removes the least significant term at each step until all the remaining terms are statistically significant.
- It is also possible to start with a subset of all the terms and then add significant terms or remove insignificant terms.

Example 1 - SAS Stepwise regression
The stepwise method is a modification of the forward-selection technique and differs in that variables already in the model do not necessarily stay there. As in the forward-selection method, variables are added one by one to the model, and the F statistic for a variable to be added must be significant at the SLENTRY= level. After a variable is added, however, the stepwise method looks at all the variables already included in the model and deletes any variable that does not produce an F statistic significant at the SLSTAY= level. Only after this check is made and the necessary deletions accomplished can another variable be added to the model. The stepwise process ends when none of the variables outside the model has an F statistic significant at the SLENTRY= level and every variable in the model is significant at the SLSTAY= level, or when the variable to be added to the model is the one just deleted from it.
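The add/check/delete loop described above can be sketched in a few lines of Python. This is only an illustration of the logic, not what PROC REG does internally. As a simplifying assumption, fixed F cutoffs (the hypothetical parameters `f_enter` and `f_stay`) stand in for the SLENTRY= and SLSTAY= significance levels, so no F-distribution p-values are computed. All function names and the simulated data are made up for the example.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rss(X, y, cols):
    """Residual sum of squares of y regressed on an intercept plus the columns in cols."""
    rows = [[1.0] + [r[c] for c in cols] for r in X]
    p = len(cols) + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))


def partial_f(X, y, cols, c):
    """F statistic for variable c, given the other variables in cols (c must be in cols)."""
    full = rss(X, y, cols)
    reduced = rss(X, y, [k for k in cols if k != c])
    df = len(y) - len(cols) - 1
    return (reduced - full) / (full / df)


def stepwise(X, y, f_enter=4.0, f_stay=4.0):
    """Stepwise selection with F cutoffs (assumed stand-ins for SLENTRY= / SLSTAY=)."""
    selected, last_removed = [], None
    while True:
        candidates = [c for c in range(len(X[0])) if c not in selected]
        if not candidates:
            break
        # Forward step: the candidate with the largest F statistic
        best = max(candidates, key=lambda c: partial_f(X, y, selected + [c], c))
        if partial_f(X, y, selected + [best], best) < f_enter or best == last_removed:
            break  # nothing significant to add, or we would re-add the variable just deleted
        selected.append(best)
        last_removed = None
        # Check step: delete variables that no longer meet the stay criterion
        while selected:
            worst = min(selected, key=lambda c: partial_f(X, y, selected, c))
            if partial_f(X, y, selected, worst) >= f_stay:
                break
            selected.remove(worst)
            last_removed = worst
    return selected


# Simulated data: y depends on columns 0 and 2 only
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [3 * r[0] - 2 * r[2] + rng.gauss(0, 0.1) for r in X]
selected = stepwise(X, y)
print(sorted(selected))
```

With a signal this strong, columns 0 and 2 are always selected. A pure-noise column may occasionally pass the entry cutoff as well, which is the usual caveat of stepwise methods.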

Example 1 - SAS Stepwise regression
    proc reg data=yourdata;
      model y=x1-x10 / selection=stepwise SLENTRY=0.15 SLSTAY=0.15;
    run;

Example 1 - SAS Stepwise logistic regression
    proc logistic data=yourdata;
      model y=x1-x10 / selection=stepwise SLENTRY=0.15 SLSTAY=0.15;
    run;

    SELECTION=BACKWARD | B
             | FORWARD | F
             | NONE | N
             | STEPWISE | S

Example 1 - SAS Stepwise Generalized Linear Model
    proc GENMOD data=yourdata;
      model y=x1-x10 / LINK = LOG selection=stepwise SLENTRY=0.15 SLSTAY=0.15;
    run;
HOW? PROC GENMOD has no stepwise selection option, so the step above does not run as written. The options are:
- Wait for the next SAS release and hope that it adds stepwise selection
- Do the selection manually using SAS GENMOD (most people do this)
- Write your own SAS code for stepwise selection using SAS MACRO

Rule for using MACRO
- Basically, any repeated or partially repeated job should be automated with a MACRO.
- Most jobs are partially repeated.
- A job can be split into partially repeated parts.
- Using macros will save you a lot of TIME & $$$
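To make "partially repeated" concrete: in manual selection with GENMOD, every step is the same except for the predictor list. The sketch below shows that idea in Python rather than SAS MACRO. It only generates the text of the repeated steps from one template. The template, the function name `genmod_steps` and the variable names are hypothetical, chosen for illustration; a SAS macro with a parameter for the predictor list would play the same role.

```python
from string import Template

# Hypothetical template: one GENMOD step with the varying parts left as parameters
GENMOD_STEP = Template(
    "proc genmod data=$data;\n"
    "  model $response = $predictors / link=log;\n"
    "run;\n"
)


def genmod_steps(data, response, base, candidates):
    """Return one GENMOD step per candidate, each adding that candidate to the base predictors."""
    return [
        GENMOD_STEP.substitute(
            data=data, response=response, predictors=" ".join(base + [c])
        )
        for c in candidates
    ]


steps = genmod_steps("yourdata", "y", ["x1"], ["x2", "x3", "x4"])
print(steps[0])
```

The fixed part of the job is written once in the template, and only the part that changes is passed in as a parameter.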

STA 511 - Advanced SAS
- SAS Macro
- SAS IML
- SAS SQL

Example 2 - Math
A = [matrix]
- Determinant of A: Det(A)?
- Inverse(A)?
- Any solution?

Example 2 - Math
Any solution? Very simple:
- If you are good enough at algebra, you know the answer
- If you have just learned algebra, you should know it
- If you have a good memory, you should remember it from the algebra you learned years ago
- Check an algebra or matrix textbook
For me – too lazy to check a textbook. Then

Maple/Matlab
Define the matrix (Matlab MuPAD):
    x:=matrix([[1, r, r^2, r^3, r^4],
               [r, 1, r, r^2, r^3],
               [r^2, r, 1, r, r^2],
               [r^3, r^2, r, 1, r],
               [r^4, r^3, r^2, r, 1]])
Display it

Matlab
Inverse of x:
    1/x

Det(x)
    factor(det(x))
This is for k=5. What about any k?
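The MuPAD output is not preserved in this transcript. For the k × k matrix with entries r^|i-j|, the standard closed forms are det = (1 - r^2)^(k-1) and a tridiagonal inverse: 1/(1 - r^2) times the matrix with diagonal (1, 1 + r^2, …, 1 + r^2, 1) and -r on the two off-diagonals. These formulas are stated here as known results, not taken from the slides. The Python sketch below is a spot check with exact rational arithmetic at one value of r (an arbitrary choice) for several k. It is not a symbolic derivation, and the function names are illustrative.

```python
from fractions import Fraction


def corr_matrix(r, k):
    """k x k matrix with entries r^|i-j| (the matrix x of the slides when k = 5)."""
    return [[r ** abs(i - j) for j in range(k)] for i in range(k)]


def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [row[:] for row in m]
    n = len(m)
    d = Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if m[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for i in range(c + 1, n):
            f = m[i][c] / m[c][c]
            for j in range(c, n):
                m[i][j] -= f * m[c][j]
    return d


def tridiagonal_inverse(r, k):
    """Closed-form candidate for the inverse (k >= 2), to be checked below."""
    s = 1 - r * r
    inv = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        inv[i][i] = (1 if i in (0, k - 1) else 1 + r * r) / s
        if i + 1 < k:
            inv[i][i + 1] = inv[i + 1][i] = -r / s
    return inv


def matmul(a, b):
    """Plain matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


r = Fraction(1, 3)  # arbitrary test value with |r| < 1
det_ok = all(det(corr_matrix(r, k)) == (1 - r * r) ** (k - 1) for k in range(2, 8))
inv_ok = all(
    matmul(corr_matrix(r, k), tridiagonal_inverse(r, k))
    == [[Fraction(int(i == j)) for j in range(k)] for i in range(k)]
    for k in range(2, 8)
)
print(det_ok, inv_ok)
```

Exact fractions avoid rounding error, so each comparison is a true equality test at the chosen r.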

Symbolic calculation – Maple/Matlab
- Calculus
- Algebra
- Helps you to “produce” formulas
- More …
Another example

Formulas derived partly with Maple's help:
Ma CX, Fang KT, and Lin DKJ (2003). A note on uniformity and orthogonality. Journal of Statistical Planning and Inference 113 (1), 323-334.

STA 511: Maple/Matlab
- Introduce the language
- Learn it through real examples
- Practice & practice

STA511 – MATLAB or R
“MATLAB is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran.”

MATLAB or R
- Introduction and Key Features
- Developing Algorithms and Applications
- Analyzing and Accessing Data
- Visualizing Data
- Performing Numeric Computation
- Publishing Results and Deploying Applications

Example 3: Matlab
- The computations for 90% of my publications were done in MATLAB, 10% in Fortran or C. See http://www.buffalo.edu/~cxma/
- All graphs in my papers are produced by Matlab
- Based on my experience, a task written in MATLAB will cost you about one tenth of what it would in Fortran or C
- Matlab is a matrix-based language

Example 4: R
R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R
R provides a wide variety of statistical and graphical techniques, and is highly extensible. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. R is available as Free Software.

Why do we learn and use so many software packages if one package (like SAS) may provide all the functions?
- You should always use the “proper part” of the “proper software” for the “proper task”.
- Each package has its “best part”, although every package tries to provide the “other parts” as well.
- Sounds strange? For example, the best part of SAS is statistical analysis, although it also provides a full set of graphical functions. You should use its graphs only for drafts, and use Matlab, R, or Microsoft Office for publication-level plots.

Miscellaneous
- LaTeX: a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation
- Statisticians should use it to prepare dissertations and papers
- Other useful software?

COURSE DESCRIPTION
Statistical packages and computing are an essential part of modern statistical training, as they touch on almost every aspect of statistical theory and practice. This course covers advanced SAS, symbolic calculation (Matlab), and scientific calculation software (R or MATLAB).
My goals in teaching this class are:
- To help the students build the advanced SAS skills needed for statistical consulting and projects.
- To help the students build the programming skills needed for their thesis or dissertation work.
- To present some examples of computational problems in statistics.
- To build the ability to learn any new language.
The macros developed in STA511 can be reused in your future real projects.

Software
- SAS and Matlab are installed in our lab and are also available on the UB virtual machine
- UB students can get a free copy of Matlab
- R can be freely downloaded from http://www.r-project.org/

TEXT BOOK: No specific text is required. The course materials will be drawn from the following recommended resources:
- SAS: Michele M. Burlew, SAS Macro Programming Made Easy, ISBN: 1580253431.
- SAS Macro User Guide (available for download)
- SAS IML (available for download)
- SAS SQL (available for download)
- R, MATLAB

Grading
6 homework assignments (100%).

SAS basic
base_lrconcept_9196.pdf
The above title comprehensively documents essential concepts for SAS features, the DATA step, and SAS files. This reference is a companion volume to the SAS Language Reference: Dictionary, which provides complete reference information about fundamental SAS language element features and the DATA step debugger.