
1 Hands-on Introduction to R

2 Why Learn Programming?
We live in oceans of data, and computers are essential to record and help analyse it. Competent scientists speak C/C++, Java, MATLAB, Python, Perl, R and/or Mathematica. Data collection and analysis have been very important in forensic science since the 2009 NAS report. With the above languages, code can easily be made available for review/discovery.

3 Getting a computer to do anything useful
All machines understand is on/off: high/low voltage, high/low current, high/low charge, 1/0 binary digits (bits). To make a computer do anything, you have to speak machine language to it.
Example (Wikipedia): 000000 00001 00010 00110 00000 100000
Meaning: Add 1 and 2. Store the result.

4 Getting a computer to do anything useful
Machine language is not intuitive and can vary a great deal between designs. The basic operations, however, are the same, e.g.: move data here, combine these values, store this data, etc. The “human readable” language for basic machine operations is assembly language.

5 Getting a computer to do anything useful
Assembly is still cumbersome for (most) humans.
Assembly: MOV AL, 61h
A machine encoding: 10110000 01100001
Meaning: move the number 97 (61 in hexadecimal) over to “storage area” AL.

6 Getting a computer to do anything useful
Better yet is a more “Englishy”, “high-level” language. Enter: C, C++, Fortran, Java, … Higher-level languages like these are translated (“compiled”) to machine language. This is not exactly true for Java, but something analogous happens.

7 Getting a computer to do anything useful
Even more “Englishy” and “high-level” are interpreted languages. Enter: R, MATLAB, Perl, Python, Mathematica, Maple, … The “code” of these languages is “interpreted” as commands by a program that is already running. They make many assumptions behind the scenes. They are much easier to program with, but typically much slower than compiled languages.

8 Why R?
R is not a black box: the code is available for review and is totally transparent. R is maintained by a professional group of statisticians and computational scientists. Procedures from the very simple to the state of the art are available. It produces very good graphics for exhibits and papers. R is extensible (it is a full scripting language). Its coding/syntax is similar to Python and MATLAB. It is easy to link to C/C++ routines.

9 Why R?
Where to get information on R:
R: http://www.r-project.org/ (just need the base)
RStudio: http://rstudio.org/ (a great IDE for R; works on all platforms; sometimes slows down performance…)
CRAN: http://cran.r-project.org/ (library repository for R; click on Search on the left of the website to search for packages and info on packages)

10 Finding our way around R/RStudio
[Figure: RStudio, with the Script Window and the Command Line labeled]

11 Handy Commands: Basic Input and Output
Variables store information. The assignment operator is <-
Numeric input: x <- 4
Text (character) input: x <- "text goes in quotes"
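As a minimal sketch of the assignment and output commands on this slide (the variable name y is illustrative and not from the slides):

```r
# Numeric input: store the number 4 in the variable x
x <- 4
print(x)

# Text (character) input: text goes in quotes
y <- "text goes in quotes"
print(y)

# class() reports what kind of information a variable stores
print(class(x))
print(class(y))
```

Typing a variable's name at the command line prints its value as well; print() is written out here so the sketch also works when run as a script.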

12 Handy Commands: Get help on an R command
If you know the name: ?command name (e.g. ?plot brings up the HTML help page for the plot command)
If you don’t know the name: use Google (my favorite) or ??key word
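The help commands can be sketched as below. The long forms help() and help.search() are what ? and ?? stand for; apropos() is an extra base-R helper that the slide does not mention, included here as one more way to find a half-remembered name.

```r
# Help pages open in a viewer, so only call them in an interactive session
if (interactive()) {
  help("plot")               # same as ?plot
  help.search("histogram")   # same as ??histogram
}

# apropos() lists objects whose names match a pattern
matches <- apropos("^read\\.csv")
print(matches)
```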

13 Handy Commands: R is driven by functions
func(argument1, argument2)
x <- func(arg1, arg2)
func is the function name; the input to the function goes in parentheses; the function returns something, which gets dumped into x.
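A short sketch of the function-call pattern, using two built-in functions (sqrt and round) chosen only for illustration:

```r
# sqrt is the function name, 16 is the input, and the returned value
# gets dumped into x
x <- sqrt(16)
print(x)

# Functions can take several arguments, optionally given by name
y <- round(3.14159, digits = 2)
print(y)
```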

14 Handy Commands: Input from Excel
Save the spreadsheet as a CSV file, then use the read.csv function, which needs the path to the file.
Mac e.g.: "/Users/npetraco/latex/papers/data.csv"
Windows e.g.: "C:\\Users\\npetraco\\latex\\papers\\data.csv" (inside an R string, backslashes must be doubled; forward slashes also work)
*Exercise: basicIO.R
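The read.csv step might look like the sketch below. To keep it self-contained, it first writes a small hypothetical CSV file to a temporary location (the column names and values are made up for illustration); in practice the path would point at the file exported from Excel.

```r
# Hypothetical stand-in for a spreadsheet saved as CSV
csv.path <- tempfile(fileext = ".csv")
writeLines(c("fragment,RI",
             "1,1.52005",
             "2,1.52003",
             "3,1.52001"), csv.path)

# read.csv needs the path to the file and returns a data frame
dat <- read.csv(csv.path)
print(dat)
str(dat)
```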

15 Handy Commands: Matrices and user-defined functions
Matrices: X[,1] returns column 1 of matrix X; X[3,] returns row 3 of matrix X.
Handy functions for data frames and matrices: dim, nrow, ncol, rbind, cbind
User-defined function syntax: func.name <- function(arguments) { do something; return(output) }
To use it: func.name(values)
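The indexing rules and the function template can be tried on a small made-up matrix. The function name col.range is hypothetical and only illustrates the syntax.

```r
# Build a 3 x 2 matrix by binding two columns together
X <- cbind(c(1, 2, 3), c(10, 20, 30))
print(dim(X))    # number of rows and columns
print(X[, 1])    # column 1 of X
print(X[3, ])    # row 3 of X

# Append a row with rbind
X2 <- rbind(X, c(4, 40))
print(nrow(X2))

# User-defined function: range (max minus min) of each column
col.range <- function(M) {
  output <- apply(M, 2, max) - apply(M, 2, min)
  return(output)
}
print(col.range(X2))
```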

16 First Thing: Look at your Data
Explore the Glass dataset of the mlbench package. Source (load) all_data_source.R.
Scatter plots: plot any two variables against each other.
*visualize_with_plots.r
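A scatter plot of two variables might be sketched as follows. The data frame is a simulated stand-in, not the real Glass data and not the contents of visualize_with_plots.r; with mlbench installed, data(Glass, package = "mlbench") loads the real dataset, which includes RI and Na columns.

```r
# Simulated stand-in for two Glass variables (not real measurements)
set.seed(1)
glass <- data.frame(RI = rnorm(60, mean = 1.518, sd = 0.003),
                    Na = rnorm(60, mean = 13.4, sd = 0.8))

# Scatter plot: one variable against another
plot(glass$Na, glass$RI, xlab = "Na", ylab = "RI",
     main = "RI vs. Na (simulated data)")
```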

17 First Thing: Look at your Data
Pairs plots: do many scatter plots at once.
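A pairs plot can be sketched with the base pairs function. The three columns are simulated stand-ins for Glass variables (an assumption made to keep the example self-contained).

```r
# Simulated stand-in for three Glass variables (not real measurements)
set.seed(2)
glass <- data.frame(RI = rnorm(60, mean = 1.518, sd = 0.003),
                    Na = rnorm(60, mean = 13.4, sd = 0.8),
                    Mg = rnorm(60, mean = 2.7, sd = 1.0))

# One scatter plot for every pair of columns
pairs(glass, main = "Pairs plot (simulated data)")
```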

18 First Thing: Look at your Data
Histograms: “bin” a variable and plot frequencies.
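Binning and plotting frequencies might look like this sketch, again on simulated refractive index values rather than the real data:

```r
# Simulated refractive index values (not real measurements)
set.seed(3)
RI <- rnorm(60, mean = 1.518, sd = 0.003)

# hist() bins the variable, draws the plot, and returns the bin details
h <- hist(RI, breaks = 10, xlab = "RI", main = "Histogram of RI (simulated)")
print(h$breaks)   # bin edges
print(h$counts)   # frequency in each bin
```

The counts add up to the number of observations, which is a quick check that no value fell outside the bins.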

19 First Thing: Look at your Data
Histograms conditioned on other variables: use the lattice package. Example: RIs conditioned on glass group membership.
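A conditioned histogram can be sketched with lattice's histogram function, assuming the lattice package is installed (it ships with standard R distributions). The two glass groups below are simulated stand-ins for the Type column of the Glass data.

```r
library(lattice)

# Simulated RI values for two hypothetical glass groups
set.seed(4)
glass <- data.frame(
  RI   = c(rnorm(30, mean = 1.518, sd = 0.002),
           rnorm(30, mean = 1.521, sd = 0.003)),
  Type = factor(rep(c("1", "2"), each = 30))
)

# One histogram panel of RI per level of Type
p <- histogram(~ RI | Type, data = glass, xlab = "RI")
print(p)   # lattice plots must be printed explicitly inside scripts
```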

20 First Thing: Look at your Data
Probability density plots: these also use the lattice package.
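A density plot with lattice might be sketched as below, assuming lattice is installed and using simulated stand-in data for two glass groups:

```r
library(lattice)

# Simulated RI values for two hypothetical glass groups
set.seed(5)
glass <- data.frame(
  RI   = c(rnorm(30, mean = 1.518, sd = 0.002),
           rnorm(30, mean = 1.521, sd = 0.003)),
  Type = factor(rep(c("1", "2"), each = 30))
)

# Smoothed density estimate of RI, one curve per glass group
p <- densityplot(~ RI, data = glass, groups = Type,
                 plot.points = FALSE, auto.key = TRUE)
print(p)
```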

21 First Thing: Look at your Data
Empirical probability distribution plots: also called empirical cumulative distribution function (ECDF) plots.
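The base function ecdf builds the empirical cumulative distribution, as in this sketch on simulated values:

```r
# Simulated refractive index values (not real measurements)
set.seed(6)
RI <- rnorm(60, mean = 1.518, sd = 0.003)

# ecdf() returns a function: Fn(v) is the fraction of observations <= v
Fn <- ecdf(RI)
plot(Fn, xlab = "RI", ylab = "Fraction of observations <= RI",
     main = "ECDF of RI (simulated)")

print(Fn(median(RI)))   # about half the data lie at or below the median
```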

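22 First Thing: Look at your Data
Box and whiskers plots (shown for RI): the box runs from the 1st quartile (25th percentile) to the 3rd quartile (75th percentile), with the median (50th percentile) marked inside. The whiskers show the range, and points beyond them are possible outliers.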
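The pieces of a box and whiskers plot can be inspected through the value returned by boxplot, sketched here on simulated RI values:

```r
# Simulated refractive index values (not real measurements)
set.seed(7)
RI <- rnorm(60, mean = 1.518, sd = 0.003)

# boxplot() draws the plot and returns the numbers behind it
b <- boxplot(RI, ylab = "RI", main = "Box and whiskers plot (simulated)")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge,
# upper whisker. The hinges are close to, but not always identical to,
# the quartiles reported by quantile().
print(b$stats)
print(quantile(RI, c(0.25, 0.50, 0.75)))
print(b$out)   # points flagged as possible outliers, if any
```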

23 Note the relationship: Visualizing Data

24 First Thing: Look at your Data
Box and whiskers plots: box-whiskers plots for actual variable values compared with box-whiskers plots for scaled variable values.
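The effect of scaling can be sketched with the base scale function, which centers each column to mean 0 and rescales it to standard deviation 1. The two columns are simulated stand-ins with very different magnitudes.

```r
# Simulated stand-in for two Glass variables (not real measurements)
set.seed(8)
glass <- data.frame(RI = rnorm(60, mean = 1.518, sd = 0.003),
                    Na = rnorm(60, mean = 13.4, sd = 0.8))

# Actual values: the RI box is squashed because Na is much larger
boxplot(glass, main = "Actual values (simulated)")

# Scaled values: both variables are now on a comparable footing
scaled <- scale(glass)
boxplot(as.data.frame(scaled), main = "Scaled values (simulated)")
```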

25 Confidence Intervals
A confidence interval (CI) gives a range in which a true population parameter may be found. Specifically, $(1-\alpha)\times 100\%$ CIs for a parameter, constructed from random samples (of a given sample size), will contain the true value of the parameter approximately $(1-\alpha)\times 100\%$ of the time. CIs are different from tolerance and prediction intervals.

26 Confidence Intervals
Caution: IT IS NOT CORRECT to say that there is a $(1-\alpha)\times 100\%$ probability that the true value of a parameter is between the bounds of any given CI.
Graphical representation of 90% CIs for a parameter: take a sample, compute a CI, and repeat. Here 90% of the CIs contain the true value of the parameter.
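The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: normally distributed data, and a true mean, standard deviation, sample size and number of repetitions that are all made up for illustration.

```r
set.seed(42)
true.mean <- 1.52      # hypothetical true parameter value
n         <- 11        # sample size
n.reps    <- 2000      # number of repeated samples
conf      <- 0.90      # confidence level, 1 - alpha

covered <- logical(n.reps)
for (i in seq_len(n.reps)) {
  # Take a sample
  x <- rnorm(n, mean = true.mean, sd = 0.00004)
  # Compute a CI for the mean
  ci <- t.test(x, conf.level = conf)$conf.int
  # Record whether this CI contains the true value
  covered[i] <- ci[1] <= true.mean && true.mean <= ci[2]
}

coverage <- mean(covered)
print(coverage)   # fraction of CIs containing the true mean
```

Each individual interval either contains the true mean or it does not; the 90% describes how often the procedure succeeds over many samples.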

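27 Confidence Intervals
Construction of a CI for a mean depends on:
Sample size $n$
Standard error for means, $s/\sqrt{n}$, where $s$ is the sample standard deviation
Level of confidence $1-\alpha$, where $\alpha$ is the significance level; use $\alpha$ to compute the $t_c$-value
The $(1-\alpha)\times 100\%$ CI for the population mean using a sample average $\bar{x}$ and standard error is:
$$\left[\bar{x} - t_c\,\frac{s}{\sqrt{n}},\ \ \bar{x} + t_c\,\frac{s}{\sqrt{n}}\right]$$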

28 Confidence Intervals
Compute a 99% confidence interval for the mean using this sample set:
Fragment #   nD
1            1.52005
2            1.52003
3            1.52001
4            1.52004
5            1.52000
6            1.52001
7            1.52008
8            1.52011
9            1.52008
10           1.52008
11           1.52008
With $\alpha/2 = 0.005$, $t_c = 3.17$. Putting this together:
[1.52005 - (3.17)(0.00001), 1.52005 + (3.17)(0.00001)]
99% CI for the population mean = [1.52002, 1.52009]
*Try out confidence_intervals.R
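The arithmetic on this slide can be reproduced with the sketch below. It is not the contents of confidence_intervals.R; it assumes the usual one-sample t interval with n - 1 degrees of freedom, and the variable names are illustrative.

```r
# Refractive index (nD) values for the 11 fragments listed on the slide
nD <- c(1.52005, 1.52003, 1.52001, 1.52004, 1.52000, 1.52001,
        1.52008, 1.52011, 1.52008, 1.52008, 1.52008)

n     <- length(nD)
x.bar <- mean(nD)                       # sample average
se    <- sd(nD) / sqrt(n)               # standard error of the mean
alpha <- 0.01                           # 99% confidence
t.c   <- qt(1 - alpha / 2, df = n - 1)  # critical t value

ci <- c(x.bar - t.c * se, x.bar + t.c * se)
print(round(t.c, 2))
print(round(ci, 5))

# Cross-check against the built-in one-sample t procedure
print(t.test(nD, conf.level = 0.99)$conf.int)
```

The unrounded standard error is about 0.000011; the slide shows it rounded to 0.00001, while the interval [1.52002, 1.52009] comes from the unrounded value.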

