Hands-on Introduction to R

Why Learn Programming?
We live in oceans of data. Computers are essential to record it and to help analyse it.
Competent scientists speak C/C++, Java, MATLAB, Python, Perl, R and/or Mathematica.
Data collection and analysis have been very important in forensic science since the NAS 2009 report.
With the above languages, code can easily be made available for review and discovery.

Getting a computer to do anything useful
All machines understand is on/off!
High/low voltage
High/low current
High/low charge
1/0 binary digits (bits)
To make a computer do anything, you have to speak machine language to it.
[Figure: a machine-language instruction meaning “Add 1 and 2. Store the result.” Source: Wikipedia]

Getting a computer to do anything useful
Machine language is not intuitive and can vary a great deal between designs.
The basic operations, however, are the same, e.g.:
Move data here
Combine these values
Store this data
Etc.
The “human readable” language for basic machine operations is assembly language.

Getting a computer to do anything useful
Assembly is still cumbersome for (most) humans.
Assembly: MOV AL, 61h
Meaning: move the number 97 (61 in hexadecimal) over to “storage area” AL.
The slide shows this instruction alongside a machine encoding of it.
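
The link between 61h, 97 and the on/off bits can be checked from R itself. This is only a rough sketch: the variable names are illustrative, and it converts the operand value only, not the full machine instruction.

```r
# The operand 61h from the slide is hexadecimal; R reads hex literals with the 0x prefix.
operand <- 0x61
print(operand)   # 97

# strtoi() converts a string written in a given base to an integer.
from_hex <- strtoi("61", base = 16L)

# Show the same value as 8 binary digits, most significant bit first.
# intToBits() returns the bits least significant first, so reverse them.
bits <- rev(as.integer(intToBits(from_hex))[1:8])
bit_string <- paste(bits, collapse = "")
print(bit_string)   # "01100001"
```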

Getting a computer to do anything useful
Better yet is a more “Englishy”, “high-level” language.
Enter: C, C++, Fortran, Java, …
Higher-level languages like these are translated (“compiled”) to machine language.
This is not exactly true for Java, but it’s something analogous…

Getting a computer to do anything useful
Even more “Englishy” and “high-level” are interpreted languages.
Enter: R, MATLAB, Perl, Python, Mathematica, Maple, …
The “code” of these languages is “interpreted” as commands by a program that is already running.
They make many assumptions behind the scenes.
Much easier to program with.
Often much slower than compiled languages.

Why R?
R is not a black box! Code is available for review; totally transparent!
R is maintained by a professional group of statisticians and computational scientists.
Procedures from very simple to state-of-the-art are available.
Very good graphics for exhibits and papers.
R is extensible (it is a full scripting language).
Coding/syntax is similar to Python and MATLAB.
Easy to link to C/C++ routines.

Why R?
Where to get information on R:
R: you just need the base distribution.
RStudio: a great IDE for R. Works on all platforms. Sometimes slows down performance…
CRAN: the package repository for R. Click on Search on the left of the website to search for packages or information on packages.

Finding our way around R/RStudio
[Screenshot: the RStudio window with the Script Window and the Command Line labeled]

Handy Commands: Basic Input and Output
Variables store information. The assignment operator is <-
Numeric input: x <- 4
Text (character) input: x <- "text goes in quotes"
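
A minimal sketch of the two assignments above, typed at the command line or run from the script window. The class() and print() calls are additions for illustration and are not on the slide.

```r
# Numeric input: store the number 4 in the variable x.
x <- 4
print(x)
print(class(x))   # "numeric"

# Text (character) input goes in straight quotes; assigning again overwrites x.
x <- "text goes in quotes"
print(x)
print(class(x))   # "character"
```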

Handy Commands: Get help on an R command
If you know the name: ?command_name
?plot brings up the HTML help page for the plot command.
If you don’t know the name: use Google (my favorite) or ??keyword (put a multi-word search in quotes: ??"key word").
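
The help commands can be tried as in the sketch below. The search terms are arbitrary examples, and apropos() is an extra base-R helper that the slide does not mention. The help calls are guarded with interactive() so the snippet also runs as a script.

```r
# Help pages open in a viewer, so only ask for them in an interactive session.
if (interactive()) {
  ?plot               # help page for a command whose name you know
  help("plot")        # the same request, written as a function call
  ??"scatter plot"    # keyword search across installed help pages
}

# apropos() lists the objects whose names contain a pattern.
matches <- apropos("hist")
print(matches)
```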

Handy Commands: R is driven by functions
func(argument1, argument2)
x <- func(arg1, arg2)
func is the function name.
The input to the function goes in parentheses.
The function returns something, which gets stored in x.
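
A short illustration of the call pattern, using built-in functions chosen here as examples in place of the placeholder func:

```r
# seq is the function name; 1, 10 and by = 2 are its arguments.
x <- seq(1, 10, by = 2)   # the returned vector 1 3 5 7 9 is stored in x
print(x)

# The result of one function can be the input to another.
m <- mean(x)
print(m)                  # 5

# Arguments can be given by position or by name.
r <- round(3.14159, digits = 2)
print(r)                  # 3.14
```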

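Handy Commands: Input from Excel
Save the spreadsheet as a CSV file.
Use the read.csv function. It needs the path to the file.
Mac e.g.: "/Users/npetraco/latex/papers/data.csv"
Windows e.g.: "C:/Users/npetraco/latex/papers/data.csv" (inside an R string, write Windows paths with forward slashes or doubled backslashes, as in "C:\\Users\\npetraco\\latex\\papers\\data.csv")
*Exercise: basicIO.R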
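
The sketch below shows read.csv at work without needing a real spreadsheet. It first writes a tiny made-up CSV file to a temporary location, so the file name, column names and values are all hypothetical.

```r
# Hypothetical stand-in for a spreadsheet saved as CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "a,1.5", "b,2.5", "c,4.0"), csv_path)

# read.csv needs the path to the file and returns a data frame.
dat <- read.csv(csv_path)
print(dat)

# str() summarises the columns and their types.
str(dat)
```

With a real file, replace csv_path with the path to the CSV file in quotes.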

Handy Commands: Matrices and user-defined functions
Matrices: X
X[,1] returns column 1 of matrix X
X[3,] returns row 3 of matrix X
Handy functions for data frames and matrices: dim, nrow, ncol, rbind, cbind
User-defined function syntax:
func.name <- function(arguments) {
  do something
  return(output)
}
To use it: func.name(values)
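
The indexing rules and the function template can be tried on a small made-up matrix. The name col.range and what it computes are illustrative choices and do not come from the slides.

```r
# A 4 x 3 matrix filled column by column with 1..12.
X <- matrix(1:12, nrow = 4, ncol = 3)

first_col <- X[, 1]   # column 1: 1 2 3 4
third_row <- X[3, ]   # row 3: 3 7 11
print(dim(X))         # 4 3

# rbind adds a row, cbind adds a column.
X_more_rows <- rbind(X, c(100, 200, 300))
X_more_cols <- cbind(X, 13:16)

# A user-defined function following the template above:
# it returns the range (max minus min) of each column of a matrix.
col.range <- function(M) {
  output <- apply(M, 2, max) - apply(M, 2, min)
  return(output)
}

print(col.range(X))   # 3 3 3
```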

First Thing: Look at your Data
Explore the Glass dataset of the mlbench package.
Source (load) all_data_source.R
*visualize_with_plots.r
Scatter plots: plot any two variables against each other.
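
A scatter plot takes a single call to plot. The sketch below uses the built-in iris data as a stand-in so that it runs without extra packages; that substitution is an assumption and not part of the course material. The Glass version runs only if mlbench is installed.

```r
# Stand-in data: iris ships with R.
x <- iris$Sepal.Length
y <- iris$Petal.Length
plot(x, y, xlab = "Sepal.Length", ylab = "Petal.Length",
     main = "Scatter plot of two variables")

# The same idea with the Glass data, if the mlbench package is available.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(Glass, package = "mlbench")
  plot(Glass$RI, Glass$Ca, xlab = "RI", ylab = "Ca")
}
```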

First Thing: Look at your Data
Pairs plots: do many scatter plots at once.
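
A possible one-liner for a pairs plot, again with the built-in iris data standing in for Glass (an assumption made so the example is self-contained):

```r
# Keep only the numeric measurement columns.
measurements <- iris[, 1:4]

# One scatter plot for every pair of columns; points are coloured by species.
pairs(measurements, col = as.integer(iris$Species),
      main = "Pairs plot of the iris measurements")
```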

First Thing: Look at your Data
Histograms: “bin” a variable and plot frequencies.
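
A minimal histogram sketch, assuming the built-in iris data as a stand-in variable. The number of breaks is a suggestion that hist may adjust.

```r
x <- iris$Sepal.Length

# hist() bins the variable, draws the plot, and returns the bins and counts.
h <- hist(x, breaks = 10, xlab = "Sepal.Length", main = "Histogram")

print(h$breaks)   # bin edges
print(h$counts)   # number of observations in each bin
```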

First Thing: Look at your Data
Histograms conditioned on other variables: use the lattice package.
[Figure: histograms of RI conditioned on glass group membership]
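
With lattice, the conditioning variable goes after a vertical bar in the formula. This sketch conditions a stand-in variable (iris Sepal.Length) on a stand-in grouping (Species). For the Glass data the analogous formula would presumably be ~ RI | Type.

```r
library(lattice)

# One histogram panel per level of the conditioning variable.
p <- histogram(~ Sepal.Length | Species, data = iris, layout = c(1, 3))

# lattice plots are objects; printing one draws it.
print(p)
```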

First Thing: Look at your Data
Probability density plots: these also need the lattice package.
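
A density plot sketch along the same lines, with iris as an assumed stand-in and the groups overlaid in one panel:

```r
library(lattice)

# Smoothed density estimate of one variable, one curve per group.
p <- densityplot(~ Sepal.Length, groups = Species, data = iris,
                 plot.points = FALSE, auto.key = TRUE)
print(p)
```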

First Thing: Look at your Data
Empirical probability distribution plots: also called empirical cumulative distribution function (ECDF) plots.
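
Base R builds the empirical cumulative distribution with ecdf, which returns a function that can be plotted or evaluated. The data and the cut-off value 6 are arbitrary choices for illustration.

```r
x <- iris$Sepal.Length

# ecdf() returns a step function: Fn(v) is the fraction of observations <= v.
Fn <- ecdf(x)
plot(Fn, xlab = "Sepal.Length", ylab = "Fraction of data <= x",
     main = "Empirical cumulative distribution")

# Fraction of the sample at or below 6.
print(Fn(6))
```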

First Thing: Look at your Data
Box and whiskers plots:
[Figure: box and whiskers plot of RI, annotated with the 1st quartile (25th percentile), the median (50th percentile), the 3rd quartile (75th percentile), the range, and possible outliers]
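
The quantities annotated on the figure can be read back from what boxplot returns. This sketch uses a stand-in variable from iris. Note that R draws the box at the "hinges", which can differ slightly from the quartiles given by quantile().

```r
x <- iris$Sepal.Width

# boxplot() draws the plot and returns the numbers behind it.
b <- boxplot(x, ylab = "Sepal.Width", main = "Box and whiskers plot")

# Rows of stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(b$stats)

# Points beyond the whiskers, flagged as possible outliers.
print(b$out)
```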

Visualizing Data
Note the relationship between the plots shown on the slide.

First Thing: Look at your Data
Box and whiskers plots:
Box-whiskers plots for actual variable values
Box-whiskers plots for scaled variable values
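
One way to get the scaled version is scale(), which centres each column at 0 and divides by its standard deviation. The slides do not say which scaling was used, so treat this as an assumption. The iris measurements again stand in for the Glass variables.

```r
measurements <- iris[, 1:4]

# Centre each column to mean 0 and rescale to standard deviation 1.
scaled <- scale(measurements)

# Draw the two sets of box plots side by side, then restore the plot layout.
op <- par(mfrow = c(1, 2))
boxplot(measurements, main = "Actual values")
boxplot(scaled, main = "Scaled values")
par(op)
```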

Confidence Intervals
A confidence interval (CI) gives a range in which a true population parameter may be found.
Specifically, (1 – α)×100% CIs for a parameter, constructed from random samples (of a given sample size), will contain the true value of the parameter approximately (1 – α)×100% of the time.
This is different from tolerance and prediction intervals.

Confidence Intervals
Caution: IT IS NOT CORRECT to say that there is a (1 – α)×100% probability that the true value of a parameter is between the bounds of any given CI.
Graphical representation of 90% CIs for a parameter: take a sample, compute a CI, and repeat.
[Figure: many CIs drawn against the true value of the parameter; here 90% of the CIs contain the true value of the parameter]
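
The "take a sample, compute a CI" picture can be imitated by simulation. In this sketch the population (normal, mean 10, standard deviation 2), the sample size and the number of repeats are all arbitrary assumptions. The point is only that about 90% of the intervals should cover the true mean.

```r
set.seed(1)           # make the simulation reproducible
true_mean  <- 10      # known only because the data are simulated
n          <- 15      # size of each sample
conf_level <- 0.90
n_repeats  <- 2000

# For each repeat: draw a sample, compute a 90% CI for the mean,
# and record whether the interval contains the true mean.
covers <- replicate(n_repeats, {
  x  <- rnorm(n, mean = true_mean, sd = 2)
  ci <- t.test(x, conf.level = conf_level)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Fraction of intervals that contain the true mean; expected to be near 0.90.
coverage <- mean(covers)
print(coverage)
```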

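Confidence Intervals
Construction of a CI for a mean depends on:
Sample size n
Standard error for means, s/√n, where s is the sample standard deviation
Level of confidence 1 – α, where α is the significance level
Use α to compute the t_c-value (with n – 1 degrees of freedom)
The (1 – α)×100% CI for the population mean, using a sample average x̄ and the standard error, is:
[x̄ – t_c·s/√n, x̄ + t_c·s/√n]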
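
The ingredients listed above translate into a few lines of R. The function name mean_ci and the simulated sample are illustrative assumptions. The result is compared with the interval from R's built-in t.test, which should agree.

```r
# CI for a population mean from a sample x, built from n, the standard error
# and the t critical value with n - 1 degrees of freedom.
mean_ci <- function(x, conf_level = 0.95) {
  n       <- length(x)
  std_err <- sd(x) / sqrt(n)
  alpha   <- 1 - conf_level
  t_c     <- qt(1 - alpha / 2, df = n - 1)
  c(lower = mean(x) - t_c * std_err, upper = mean(x) + t_c * std_err)
}

# Simulated sample, for illustration only.
set.seed(42)
x <- rnorm(11, mean = 5, sd = 1)

ci <- mean_ci(x, conf_level = 0.99)
print(ci)

# Cross-check against the interval reported by t.test.
print(t.test(x, conf.level = 0.99)$conf.int)
```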

Confidence Intervals
Compute a 99% confidence interval for the mean using this sample set:
[Table: Fragment # and Fragment nD; the values are not preserved in the transcript]
α/2 = 0.005, t_c = 3.17
Putting this together: [average – (3.17)(standard error), average + (3.17)(standard error)]
99% CI for sample = [ , ]
*Try out confidence_intervals.R
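
The quoted critical value can be checked with qt. The sketch searches a range of degrees of freedom for the one that gives t_c ≈ 3.17 at α/2 = 0.005. The sample size it points to is an inference from that value and is not stated in the transcript.

```r
alpha <- 0.01                        # 99% confidence level

# Critical t values for a range of candidate degrees of freedom.
df_candidates <- 5:20
t_values <- qt(1 - alpha / 2, df = df_candidates)

# Which degrees of freedom reproduce the slide's t_c = 3.17 (to two decimals)?
matching_df <- df_candidates[abs(t_values - 3.17) < 0.005]
print(matching_df)                   # 10, i.e. a sample of n = 11

t_c <- qt(1 - alpha / 2, df = 10)
print(round(t_c, 2))
```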