Steven Gollmer Cedarville University Big Data Picture from Wikipedia - Dice
Working with Large Data Accessing data Collection and calibration assumptions Selecting appropriate parameters Formatting Calculation Testing hypothesis
Hipparcos Space Astrometry Main Page http://www.rssd.esa.int/index.php?project=HIPPARCOS Data Catalogues http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=Overview http://cdsweb.u-strasbg.fr/ Software Desktop - http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=Celestia2000 Search tool - http://www.rssd.esa.int/index.php?project=HIPPARCOS&page=multisearch2 Data Format Flexible Image Transport System (FITS) - http://fits.gsfc.nasa.gov/
Sloan Digital Sky Survey Main Page http://www.sdss.org/ Data 9th Data Release - http://www.sdss3.org/dr9/ Archive Server - http://dr9.sdss3.org/ Software IDL - http://www.sdss3.org/dr9/software/
Weather Data NOAA National Climatic Data Center http://www.ncdc.noaa.gov/ Popular Data - http://www.ncdc.noaa.gov/most-popular-data Environmental Modeling Center http://www.emc.ncep.noaa.gov/
TERRA/AQUA http://terra.nasa.gov http://aqua.nasa.gov Data Format LARC DAAC - http://eosweb.larc.nasa.gov/ LAADS Web - http://ladsweb.nascom.nasa.gov/index.html Format NetCDF - http://www.unidata.ucar.edu/software/netcdf/ HDF - http://www.hdfgroup.org/
Other Topics of Interest Extra-Solar Planets Asteroid Mapping and Near Earth Detection Earthquakes Agencies and Products NASA - http://www.nasa.gov/home/index.html ESA - http://www.esa.int/ESA USGS - http://www.usgs.gov/ GOES - http://www.goes.noaa.gov/ Paleoclimatology - http://www.ncdc.noaa.gov/paleo/pubs/pcn/pcn-proxy.html
Hypothesis Testing
P-value
  Probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true.
  Usually reject the null hypothesis if p < 0.05 or 0.01 (5% or 1%).
  May have more stringent criteria for rejection.
T-test
  Assume a normal distribution.
  One-sample test: $t = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$
  Two-sample test: $t = \dfrac{M_x - M_y}{\sqrt{S_x^2/n_x + S_y^2/n_y}}$
  Check significance using a t-distribution table: compare the t value against the critical value for the degrees of freedom (df).
  1 sample: df = $n - 1$
  2 sample: df = $n_x + n_y - 2$
  S: estimate of the standard deviation
  M: estimate of the mean ($\bar{x}$ in the one-sample formula)
  n: number of samples
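The two-sample statistic on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a full test: the function name `two_sample_t` and the two short samples are made up for the example, and the degrees of freedom follow the slide's rule (sum of the sample sizes minus 2).

```python
import math
import statistics


def two_sample_t(x, y):
    """Return (t, df) for two samples, using the slide's two-sample formula."""
    m_x, m_y = statistics.mean(x), statistics.mean(y)  # M_x, M_y
    # statistics.variance uses the n - 1 denominator, i.e. S_x^2 and S_y^2
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    n_x, n_y = len(x), len(y)
    t = (m_x - m_y) / math.sqrt(var_x / n_x + var_y / n_y)
    df = n_x + n_y - 2  # degrees of freedom as stated on the slide
    return t, df


# Hypothetical samples, invented purely for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 4.0, 5.0, 6.0]

t, df = two_sample_t(x, y)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t and df are then looked up in a t-distribution table, exactly as for the one-sample case.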
Example
Hypothesis
  Data is from a distribution with mean $\mu = 2.5$
Data
  2.3 4.2 3.6 3.1 2.8 3.9
Statistics
  $\bar{x} = 3.317$
  S = 0.7139
  df = 5
Result
  t = 2.80
  Two-tailed critical values for df = 5: 2.571 at p = 0.05, 3.365 at p = 0.02
  Since 2.571 < 2.80 < 3.365, the hypothesis is rejected at the 5% level but not at the 2% level.
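The numbers in this example can be checked with a short Python sketch. The data, the hypothesized mean, and the two critical values are taken from the slide; the variable names are illustrative only.

```python
import math
import statistics

data = [2.3, 4.2, 3.6, 3.1, 2.8, 3.9]  # data from the slide
mu_0 = 2.5                             # hypothesized mean

n = len(data)
x_bar = statistics.mean(data)          # sample mean
s = statistics.stdev(data)             # sample standard deviation (n - 1 denominator)
df = n - 1                             # one-sample degrees of freedom
t = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-tailed critical values for df = 5, as quoted on the slide
t_crit_05 = 2.571
t_crit_02 = 3.365

reject_05 = abs(t) > t_crit_05
reject_02 = abs(t) > t_crit_02

print(f"mean = {x_bar:.3f}, S = {s:.4f}, df = {df}, t = {t:.2f}")
print(f"reject at p = 0.05: {reject_05}; reject at p = 0.02: {reject_02}")
```

Comparing |t| against a table value is the approach the slide describes; a statistics library could return the p-value directly instead.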
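Z-Value
Assume a normal random variable
  $x \sim N(\mu, \sigma^2)$
  $\mu$: mean
  $\sigma$: standard deviation
  $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Z-value
  $z = \dfrac{x - \mu}{\sigma}$, $z \sim N(0, 1)$
  If the number of samples is large, a z-test will work for the one-sample test instead of a t-test.
  $\mathrm{erf}(x) = \dfrac{2}{\sqrt{\pi}} \int_0^x e^{-u^2}\,du$
  One Tail: $p = \tfrac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$ (probability of a value below z)
  Two Tail: $p = \mathrm{erf}(z/\sqrt{2})$ (probability of a value within ±z)
  The area left in the tail(s), which is compared against the significance level, is 1 - p.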
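The erf-based formulas on this slide map directly onto `math.erf` in Python's standard library. The sketch below is illustrative: the function names are invented for the example, and z = 1.96 is chosen only because it is a familiar reference value. Here the one-tail formula gives the probability of a standard normal value falling below z, and the two-tail formula gives the probability of falling within ±z.

```python
import math


def z_value(x, mu, sigma):
    """Standardize x drawn from a normal distribution with mean mu and std dev sigma."""
    return (x - mu) / sigma


def one_tail_p(z):
    """Slide's one-tail formula: probability of a standard normal value below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_tail_p(z):
    """Slide's two-tail formula: probability of a standard normal value within +/- z."""
    return math.erf(z / math.sqrt(2.0))


z = 1.96  # illustrative value, not from the slides
print(f"one tail: p = {one_tail_p(z):.4f}, area beyond z = {1 - one_tail_p(z):.4f}")
print(f"two tail: p = {two_tail_p(z):.4f}, area outside +/-z = {1 - two_tail_p(z):.4f}")
```

For z = 1.96 the two-tail probability is close to 0.95, which leaves about 5% in the tails.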