Published by Joan Preston. Modified over 8 years ago.
1
Hands-on Introduction to R
2
Why Learn Programming?
We live in oceans of data. Computers are essential to record it and help analyse it.
Competent scientists speak C/C++, Java, MATLAB, Python, Perl, R and/or Mathematica.
Data collection and analysis have been very important in forensic science since the 2009 NAS report.
With the above languages, code can easily be made available for review/discovery.
3
Getting a computer to do anything useful
All machines understand is on/off!
High/low voltage
High/low current
High/low charge
1/0 binary digits (bits)
To make a computer do anything, you have to speak machine language to it:
000000 00001 00010 00110 00000 100000
Add the contents of registers 1 and 2. Store the result in register 6. (Example from Wikipedia)
4
Getting a computer to do anything useful
Machine language is not intuitive and can vary a great deal across designs.
The basic operations, however, are the same, e.g.:
Move data here
Combine these values
Store this data
Etc.
The “human readable” language for basic machine operations: assembly language
5
Getting a computer to do anything useful
Assembly is still cumbersome for (most) humans.
Assembly: MOV AL, 61h
A machine encoding: 10110000 01100001
Meaning: move the number 97 (61 in hexadecimal) over to “storage area” AL.
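The arithmetic behind this encoding can be checked from R itself. A small sketch, using base R's strtoi to convert the hexadecimal and binary strings from the slide into decimal numbers (the variable names are only illustrative):

```r
# 61h is hexadecimal notation for the decimal number 97.
value <- strtoi("61", base = 16L)

# The second byte of the machine encoding is the same number written in binary.
operand <- strtoi("01100001", base = 2L)

# The first byte encodes the operation (decimal 176, hexadecimal B0).
opcode <- strtoi("10110000", base = 2L)

print(c(value = value, operand = operand, opcode = opcode))
```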
6
Getting a computer to do anything useful
Better yet is a more “Englishy”, “high-level” language.
Enter: C, C++, Fortran, Java, …
Higher-level languages like these are translated (“compiled”) to machine language.
Not exactly true for Java (it is compiled to bytecode that a virtual machine runs), but it’s something analogous…
7
Getting a computer to do anything useful
Even more “Englishy” and “high-level” are interpreted languages.
Enter: R, MATLAB, Perl, Python, Mathematica, Maple, …
The “code” of these languages is “interpreted” as commands by a program that is already running.
They make many assumptions behind the scenes.
Much easier to program with.
Usually much slower than compiled languages.
8
Why R?
R is not a black box! Code is available for review; totally transparent!
R is maintained by a professional group of statisticians and computational scientists.
From very simple to state-of-the-art procedures available.
Very good graphics for exhibits and papers.
R is extensible (it is a full scripting language).
Coding/syntax similar to Python and MATLAB.
Easy to link to C/C++ routines.
9
Why R?
Where to get information on R:
R: http://www.r-project.org/
  Just need the base
RStudio: http://rstudio.org/
  A great IDE for R
  Works on all platforms
  Sometimes slows down performance…
CRAN: http://cran.r-project.org/
  Library repository for R
  Click on Search on the left of the website to search for packages/info on packages
10
Finding our way around R/RStudio
[Screenshot of RStudio with the Script Window and the Command Line labeled]
11
Handy Commands: Basic Input and Output
x <- 4                        Numeric input
x <- "text goes in quotes"    Text (character) input
x is a variable: variables store information
<- is the assignment operator
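A minimal sketch of assignment that can be pasted into the command line. The second variable name, y, is not on the slide; it is used here only so both values stay available for inspection.

```r
x <- 4                        # numeric input
y <- "text goes in quotes"    # text (character) input; note the straight quotes

print(x)
print(y)
print(class(x))   # the type of value stored in x
print(class(y))   # the type of value stored in y
```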
12
Handy Commands:
Get help on an R command:
If you know the name: ?name
  ?plot brings up the HTML help page for the plot command
If you don’t know the name:
  Use Google (my favorite)
  ??keyword
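The help commands can be tried as follows. This sketch uses the function forms help() and help.search(), which are what ? and ?? call, and wraps them in interactive() so a script run non-interactively does not try to open help pages. The apropos() call is an addition not mentioned on the slide: it lists objects whose names match a pattern.

```r
if (interactive()) {
  help("plot")                  # same as ?plot
  help.search("scatter plot")   # same as ??"scatter plot"
}

# apropos() searches object names, which helps when only part of a name is known.
hits <- apropos("read.csv")
print(hits)
```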
13
Handy Commands:
R is driven by functions:
func(argument1, argument2)
  func is the function name
  Input to the function goes in parentheses
x <- func(arg1, arg2)
  The function returns something, which gets dumped into x
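As a concrete illustration of the func(arg1, arg2) pattern, here is a short sketch using two base R functions, sqrt and round, chosen only as examples:

```r
x <- sqrt(16)                     # function name: sqrt; input: 16; result stored in x
y <- round(3.14159, digits = 2)   # arguments can also be given by name

print(x)
print(y)
```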
14
Handy Commands:
Input from Excel
Save the spreadsheet as a CSV file
Use the read.csv function
  Needs the path to the file
  Mac e.g.: "/Users/npetraco/latex/papers/data.csv"
  Windows e.g.: "C:\Users\npetraco\latex\papers\data.csv" (inside R code, double each backslash or use forward slashes)
*Exercise: basicIO.R
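A self-contained sketch of reading a CSV file. Since the files from the slides are not available here, it first writes a tiny file to a temporary location; the column names fragment and nD and the three rows are illustrative assumptions.

```r
# Write a small example CSV to a temporary file (stand-in for an exported spreadsheet).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("fragment,nD",
             "1,1.52005",
             "2,1.52003",
             "3,1.52001"), csv_path)

# read.csv needs the path to the file and returns a data frame.
dat <- read.csv(csv_path)
print(dat)
str(dat)   # shows the column names and types
```

For a real file, replace csv_path with the quoted path to the CSV file on disk.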
15
Handy Commands:
Matrices: X
  X[,1] returns column 1 of matrix X
  X[3,] returns row 3 of matrix X
Handy functions for data frames and matrices: dim, nrow, ncol, rbind, cbind
User-defined function syntax:
  func.name <- function(arguments) {
    do something
    return(output)
  }
To use it: func.name(values)
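The indexing and function syntax can be sketched on a small matrix. The matrix contents and the function name col.range are hypothetical, chosen only for illustration.

```r
X <- matrix(1:12, nrow = 4, ncol = 3)   # values fill the matrix column by column
print(X)

print(X[, 1])   # column 1 of X
print(X[3, ])   # row 3 of X
print(dim(X))   # number of rows, number of columns

# A user-defined function: range (max minus min) of each column.
col.range <- function(mat) {
  output <- apply(mat, 2, max) - apply(mat, 2, min)
  return(output)
}
print(col.range(X))

# rbind adds a row; cbind would add a column in the same way.
X2 <- rbind(X, c(100, 200, 300))
print(dim(X2))
```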
16
First Thing: Look at your Data
Explore the Glass dataset of the mlbench package
  Source (load) all_data_source.R
  *visualize_with_plots.r
Scatter plots: plot any two variables against each other
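A sketch of a scatter plot of two Glass variables, RI and Na. It loads Glass from mlbench when that package is installed; otherwise it falls back to simulated stand-in values (an assumption made only so the sketch runs anywhere). It does not reproduce the contents of all_data_source.R or visualize_with_plots.r.

```r
if (requireNamespace("mlbench", quietly = TRUE)) {
  data(Glass, package = "mlbench")
  dat <- Glass
} else {
  # Simulated stand-in, used only when mlbench is not installed; values are arbitrary.
  set.seed(1)
  dat <- data.frame(RI = rnorm(50, mean = 1.518, sd = 0.003),
                    Na = rnorm(50, mean = 13.4, sd = 0.8))
}

# Scatter plot: one variable on each axis.
plot(dat$RI, dat$Na, xlab = "RI", ylab = "Na")
```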
17
First Thing: Look at your Data
Pairs plots: do many scatter plots at once
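A pairs plot can be sketched with the base R pairs function. The data frame below is a simulated stand-in for three columns of Glass (the values are arbitrary); with mlbench installed, a call such as pairs(Glass[, 1:3]) would use the real data.

```r
set.seed(1)
# Simulated stand-in for three numeric variables.
dat <- data.frame(RI = rnorm(50, mean = 1.518, sd = 0.003),
                  Na = rnorm(50, mean = 13.4, sd = 0.8),
                  Mg = rnorm(50, mean = 2.7, sd = 1.4))

# One scatter plot for every pair of columns, arranged in a grid.
pairs(dat)
```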
18
First Thing: Look at your Data
Histograms: “bin” a variable and plot frequencies
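A histogram sketch with the base R hist function. The vector ri is simulated and only stands in for the RI column of Glass; the number of breaks is an arbitrary choice.

```r
set.seed(1)
ri <- rnorm(200, mean = 1.518, sd = 0.003)   # simulated stand-in for Glass$RI

# hist bins the variable, draws the bars, and returns the bin information.
h <- hist(ri, breaks = 20, xlab = "RI", main = "Histogram of RI")
print(h$breaks)   # bin edges
print(h$counts)   # frequency in each bin
```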
19
First Thing: Look at your Data
Histograms conditioned on other variables: use the lattice package
[Figure: RIs conditioned on glass group membership]
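A sketch of a conditioned histogram with lattice. The two groups and their RI values are simulated assumptions; with the real data, the grouping variable would be the glass type.

```r
library(lattice)

set.seed(1)
# Simulated stand-in: RI values for two hypothetical glass groups.
dat <- data.frame(
  RI    = c(rnorm(100, mean = 1.518, sd = 0.003),
            rnorm(100, mean = 1.522, sd = 0.002)),
  group = factor(rep(c("group 1", "group 2"), each = 100))
)

# The formula ~ RI | group draws one histogram panel per group.
p <- histogram(~ RI | group, data = dat, xlab = "RI")
print(p)   # lattice plots are drawn when printed
```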
20
First Thing: Look at your Data
Probability density plots: also need lattice
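A density plot sketch using lattice's densityplot, again on simulated stand-in values for two hypothetical groups. Overlaying the groups in one panel via the groups argument is a design choice here, not something stated on the slide.

```r
library(lattice)

set.seed(1)
dat <- data.frame(
  RI    = c(rnorm(100, mean = 1.518, sd = 0.003),
            rnorm(100, mean = 1.522, sd = 0.002)),
  group = factor(rep(c("group 1", "group 2"), each = 100))
)

# One smoothed density curve per group, drawn in the same panel with a legend.
p <- densityplot(~ RI, data = dat, groups = group,
                 plot.points = FALSE, auto.key = TRUE, xlab = "RI")
print(p)
```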
21
First Thing: Look at your Data
Empirical Probability Distribution plots: also called empirical cumulative distribution function (ECDF) plots
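An empirical distribution plot can be sketched with the base R ecdf function, which returns a step function giving the fraction of observations at or below any value. The data are simulated stand-ins for RI.

```r
set.seed(1)
ri <- rnorm(200, mean = 1.518, sd = 0.003)   # simulated stand-in for Glass$RI

Fn <- ecdf(ri)   # Fn(x) = fraction of observations <= x
plot(Fn, xlab = "RI", main = "Empirical distribution of RI")

print(Fn(min(ri)))   # one observation out of 200 is at or below the minimum
print(Fn(max(ri)))   # all observations are at or below the maximum
```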
22
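First Thing: Look at your Data
Box and Whiskers plots:
  Box edges: 1st quartile (25th percentile) and 3rd quartile (75th percentile)
  Line inside the box: median (50th percentile)
  Whiskers: range
  Points beyond the whiskers: possible outliers
[Figure: annotated box and whiskers plot of RI]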
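The pieces of a box and whiskers plot can be inspected in R, since the base boxplot function returns the numbers it draws. A sketch on simulated stand-in values for RI:

```r
set.seed(1)
ri <- rnorm(200, mean = 1.518, sd = 0.003)   # simulated stand-in for Glass$RI

b <- boxplot(ri, ylab = "RI")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker.
# The hinges are boxplot's version of the 1st and 3rd quartiles.
print(b$stats)
print(b$out)   # points drawn individually as possible outliers

# For comparison: the quartiles computed directly.
print(quantile(ri, c(0.25, 0.50, 0.75)))
```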
23
Visualizing Data
Note the relationship: [figure]
24
First Thing: Look at your Data
Box and Whiskers plots:
  Box-Whiskers plots for actual variable values
  Box-Whiskers plots for scaled variable values
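A sketch comparing box plots of actual and scaled values. It assumes that "scaled" means centering each variable on its mean and dividing by its standard deviation, which is what base R's scale function does by default; the slide does not say which scaling was used. The data are simulated stand-ins.

```r
set.seed(1)
dat <- data.frame(RI = rnorm(50, mean = 1.518, sd = 0.003),
                  Na = rnorm(50, mean = 13.4, sd = 0.8),
                  Mg = rnorm(50, mean = 2.7, sd = 1.4))

# Actual values: variables on very different scales are hard to compare.
boxplot(dat, main = "Actual values")

# Scaled values: each column now has mean 0 and standard deviation 1.
scaled <- scale(dat)
boxplot(as.data.frame(scaled), main = "Scaled values")
```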
25
Confidence Intervals
A confidence interval (CI) gives a range in which a true population parameter may be found.
Specifically, $(1-\alpha)\times 100\%$ CIs for a parameter, constructed from random samples (of a given sample size), will contain the true value of the parameter approximately $(1-\alpha)\times 100\%$ of the time.
Different from tolerance and prediction intervals.
26
Confidence Intervals
Caution: IT IS NOT CORRECT to say that there is a $(1-\alpha)\times 100\%$ probability that the true value of a parameter is between the bounds of any given CI.
Graphical representation of 90% CIs for a parameter: take a sample, compute a CI, and repeat.
[Figure: CIs from repeated samples plotted against the true value of the parameter]
Here 90% of the CIs contain the true value of the parameter.
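The repeated-sampling interpretation can be sketched as a small simulation. The true mean, standard deviation, sample size, and number of repetitions below are arbitrary assumptions; t.test is used only as a convenient way to get a CI for a mean.

```r
set.seed(1)
true_mean <- 10   # known here only because the data are simulated

# Take a sample, compute a 90% CI, record whether it contains the true mean. Repeat.
covers <- replicate(1000, {
  x  <- rnorm(20, mean = true_mean, sd = 2)
  ci <- t.test(x, conf.level = 0.90)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Fraction of the 1000 intervals that contain the true mean; expected to be near 0.90.
print(mean(covers))
```

Each individual interval either contains the true mean or it does not; the 90% describes the procedure over many samples.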
27
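Confidence Intervals
Construction of a CI for a mean depends on:
  Sample size $n$
  Standard error for means, $s_{\bar{x}} = s/\sqrt{n}$
  Level of confidence $1-\alpha$
    $\alpha$ is the significance level
    Use $\alpha$ to compute the $t_c$-value
The $(1-\alpha)\times 100\%$ CI for the population mean, using a sample average $\bar{x}$ and standard error $s_{\bar{x}}$, is:
$$\left[\bar{x} - t_c\, s_{\bar{x}},\ \bar{x} + t_c\, s_{\bar{x}}\right]$$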
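The $t_c$-value can be looked up in R with the quantile function of the t distribution, qt. A sketch assuming a two-sided interval with $n - 1$ degrees of freedom, using a 99% level and a sample of 11 as illustrative inputs:

```r
alpha <- 0.01   # significance level for a 99% CI
n     <- 11     # sample size

# Two-sided interval: put alpha/2 in each tail, with n - 1 degrees of freedom.
t_c <- qt(1 - alpha / 2, df = n - 1)
print(round(t_c, 2))
```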
28
Confidence Intervals
Compute a 99% confidence interval for the mean using this sample set:

Fragment #   Fragment nD
1            1.52005
2            1.52003
3            1.52001
4            1.52004
5            1.52000
6            1.52001
7            1.52008
8            1.52011
9            1.52008
10           1.52008
11           1.52008

($\alpha/2 = 0.005$) $t_c = 3.17$
Putting this together (sample mean 1.520052, standard error 0.000011):
[1.520052 - (3.17)(0.000011), 1.520052 + (3.17)(0.000011)]
99% CI for the mean = [1.52002, 1.52009]
*Try out confidence_intervals.R
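The calculation can be reproduced in a few lines of R. This is a sketch of the arithmetic with the eleven nD values entered by hand, not the contents of confidence_intervals.R; the built-in t.test is included as a cross-check.

```r
nD <- c(1.52005, 1.52003, 1.52001, 1.52004, 1.52000, 1.52001,
        1.52008, 1.52011, 1.52008, 1.52008, 1.52008)

n    <- length(nD)
xbar <- mean(nD)                       # sample average
se   <- sd(nD) / sqrt(n)               # standard error of the mean
t_c  <- qt(1 - 0.01 / 2, df = n - 1)   # critical t value for a 99% CI

ci <- c(xbar - t_c * se, xbar + t_c * se)
print(round(ci, 5))

# Cross-check with the built-in one-sample t procedure.
print(t.test(nD, conf.level = 0.99)$conf.int)
```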