Collaboration and data sharing in the Distributed Graduate Seminars Role of NCEAS in the DGS Coordination Collaborative webspace Grad Research Associate.

Slides:



Advertisements
Similar presentations
Single Search By Rakphao Theppan, librarian Searching Online Resources.
Advertisements

Statistics flavors Descriptive Inferential – Non-parametric / Monte Carlo – Parametric Model estimation (sometimes used for inference) – Maximum likelihood.
Confidence Intervals Underlying model: Unknown parameter We know how to calculate point estimates E.g. regression analysis But different data would change.
Multiple Regression Predicting a response with multiple explanatory variables.
Zinc Data SPH 247 Statistical Analysis of Laboratory Data.
Ann Arbor ASA ‘Up and Running’ Series: SPSS Prepared by volunteers of the Ann Arbor Chapter of the American Statistical Association, in cooperation with.
x y z The data as seen in R [1,] population city manager compensation [2,] [3,] [4,]
Lecture 23: Tues., Dec. 2 Today: Thursday:
Examining Relationship of Variables  Response (dependent) variable - measures the outcome of a study.  Explanatory (Independent) variable - explains.
Nemours Biomedical Research Statistics April 2, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility.
Lecture 24: Thurs. Dec. 4 Extra sum of squares F-tests (10.3) R-squared statistic (10.4.1) Residual plots (11.2) Influential observations (11.3,
FISH 397C Winter 2009 Evan Girvetz Basic Statistical Analyses and Contributed Packages in R © R Foundation, from
7/2/ Lecture 51 STATS 330: Lecture 5. 7/2/ Lecture 52 Tutorials  These will cover computing details  Held in basement floor tutorial lab,
MATH 3359 Introduction to Mathematical Modeling Linear System, Simple Linear Regression.
Regression Transformations for Normality and to Simplify Relationships U.S. Coal Mine Production – 2011 Source:
Google Confidential and Proprietary 1 Intro to Docs Google Apps Apps.
Checking Regression Model Assumptions NBA 2013/14 Player Heights and Weights.
Mendeley What is it? How is it different from other “Bibliographic databases” like End Note and Reference.
How to plot x-y data and put statistics analysis on GLEON Fellowship Workshop January 14-18, 2013 Sunapee, NH Ari Santoso.
Tutorial Introduction Fidelity NTSConnect is an innovative Web-based software solution designed for use by customers of Fidelity National Title Insurance.
Docs, Spreadsheets, & Presentations. What Do YOU Know???
Inference for regression - Simple linear regression
Classroom User Training June 29, 2005 Presented by:
Review of Econ424 Fall –open book –understand the concepts –use them in real examples –Dec. 14, 8am-12pm, Plant Sciences 1129 –Vote Option 1(2)
PCA Example Air pollution in 41 cities in the USA.
ACL: Introduction & Tutorial
 Combines linear regression and ANOVA  Can be used to compare g treatments, after controlling for quantitative factor believed to be related to response.
SPSS Presented by Chabalala Chabalala Lebohang Kompi Balone Ndaba.
Ranjeet Department of Physics & Astrophysics University of Delhi Working with Origin.
Lecture 3: Inference in Simple Linear Regression BMTRY 701 Biostatistical Methods II.
 Word Processing  Spreadsheets  Presentations  Drawings  Forms.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
Collaboration and Data Sharing What have I been doing that’s so bad, and how could it be better? August 1 st, 2010.
By: Jennifer Huff & Courtney Stenzhorn. What Do YOU Know???
Review of Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Using R for Marketing Research Dan Toomey 2/23/2015
FACTORS AFFECTING HOUSING PRICES IN SYRACUSE Sample collected from Zillow in January, 2015 Urban Policy Class Exercise - Lecy.
Introduction to SPSS. Object of the class About the windows in SPSS The basics of managing data files The basic analysis in SPSS.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Exercise 1 The standard deviation of measurements at low level for a method for detecting benzene in blood is 52 ng/L. What is the Critical Level if we.
WRITING REPORTS Introduction Section 0 Lecture 1 Slide 1 Lecture 6 Slide 1 INTRODUCTION TO Modern Physics PHYX 2710 Fall 2004 Intermediate 3870 Fall 2015.
Introduction to Morpho BEAM Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
Tutorial 4 MBP 1010 Kevin Brown. Correlation Review Pearson’s correlation coefficient – Varies between – 1 (perfect negative linear correlation) and 1.
SPSS- Tutorial The following power-point slides show you how to use some of the features in SPSS. A survey of 20 randomly selected companies asked them.
Introduction to Morpho RCN Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
Chapter 22: Building Multiple Regression Models Generalization of univariate linear regression models. One unit of data with a value of dependent variable.
Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.
Registering your data with OBFS Ecoinformatics Workshop Samantha Romanello Long Term Ecological Research University of New Mexico.
The US Long Term Ecological Research (LTER) Network: Site and Network Level Information Management Kristin Vanderbilt Department of Biology University.
Lecture 6: Multiple Linear Regression Adjusted Variable Plots BMTRY 701 Biostatistical Methods II.
Introduction to R Las Vegas 2015 James McCaffrey Microsoft Research, Advanced Development Tuesday, October 27, :15 - 3:30 PM devintersection.com.
Linear Models Alan Lee Sample presentation for STATS 760.
John Porter Sheng Shan Lu M. Gastil Gastil-Buhl With special thanks to Chau-Chin Lin and Chi-Wen Hsaio.
HINDU STYLE PORTFOLIO TEMPLATE
Morpho – metadata management software SEEK Training January 2004.
Tutorial 5 Thursday February 14 MBP 1010 Kevin Brown.
The Effect of Race on Wage by Region. To what extent were black males paid less than nonblack males in the same region with the same levels of education.
1 Analysis of Variance (ANOVA) EPP 245/298 Statistical Analysis of Laboratory Data.
Digitalcommons.unl.edu Archiving Department Records.
TechKnowlogy Conference August 2, 2011 Using GoogleDocs for Collaboration.
Before the class starts: Login to a computer Read the Data analysis assignment 1 on MyCourses If you use Stata: Start Stata Start a new do file Open the.
WSUG M AY 2012 EViews, S-Plus and R Damian Staszek Bristol Water.
EXCEL: Multiple Regression
Peter Fox and Greg Hughes Data Analytics – ITWS-4600/ITWS-6600
Data Analytics – ITWS-4600/ITWS-6600
Résolution de l’ex 1 p40 t=c(2:12);N=c(55,90,135,245,403,665,1100,1810,3000,4450,7350) T=data.frame(t,N,y=log(N));T; > T t N y
Correlation and regression
Data Analytics – ITWS-4600/ITWS-6600/MATP-4450
Console Editeur : myProg.R 1
ITWS-4600/ITWS-6600/MATP-4450/CSCI-4960
Presentation transcript:

Collaboration and data sharing in the Distributed Graduate Seminars Role of NCEAS in the DGS Coordination Collaborative webspace Grad Research Associate Informatics support Outreach support & publicity Time & space for synthesis

Collaboration and data sharing in the Distributed Graduate Seminars Organization of this presentation: Basic problems and solutions in data sharing & collaboration Introduction to NCEAS collaborative sites Overview of NCEAS computing research and support

Collaboration & data stewardship Personal data management problems are magnified in collaboration Data organization Data documentation Data analysis Data & analysis preservation Collaboration and data sharing

“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident

A personal example... Collaboration and data sharing

2 tables Collaboration and data sharing

Random notes What is this? Collaboration and data sharing

Wash Cres Lake Dec 15 Dont_Use.xls Collaboration and data sharing

What if we want to merge files?

What is this? Collaboration and data sharing

Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize descriptions of data (metadata) Data analysis - document Data & analysis preservation - protect Collaboration and data sharing

Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)

Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)

Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)

Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Call: lm(formula = LNtrout ~ TEMPC, data = HohTrout) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) TEMPC e-14 *** --- Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: on 1501 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 60 on 1 and 1501 DF, p-value: 1.735e-14

### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Compare this method to: Copy and paste from Excel Log-transform in Excel Copy and paste new file Graph in SigmaPlot Analyze in Systat Graph in SigmaPlot

Collaboration & data stewardship Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect

Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect Collaboration and data sharing

“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident

Databases that protect original data structure (e.g., not “sorting” in Excel) Statistical software that documents transformations, sorts, analysis, etc. (e.g., R, Matlab, SAS, JMP, PRIMER) Free software for data management & analysis Morpho, free software that standardizes descriptions of data (metadata) Central collaborative space organizing communications Technological solutions

Communication for collaboration Emphasis is on accessibility Open to collaborators Archived for the future Highly organized Searchable

Communication for collaboration Emphasis is on accessibility Open to collaborators √ √ √ Archived for the future √ √ √ √ Highly organized √ Searchable √ √ √ √ Discussion Forum Blogs Chat

NCEAS Collaborative Websites What we have in place:  User-editable website  Discussion Forums  Realtime chat with archived transcripts What others have used:  Skype  Google Docs  Zoho  GoToMeeting

Easy file uploading PDF library Discussion boards Wiki-style collaborative page editing Real-time chat NCEAS Collaborative Websites

Easy file uploading PDF library Discussion boards Wiki-style collaborative page editing Real-time chat All is searchable Click to go to the DGS Website NCEAS Collaborative Websites

Making useful tools for the students and faculty is critical Technology helps, but many of the problems are more social than technical Find a mixture that helps you get your work done NCEAS Collaborative Websites

traits-dgs.nceas.ucsb.edu Please create an account: Explore a previous site: name: student password: stu00

General NCEAS Tech Support

NCEAS recycles!

NCEAS Tech Support

NCEAS collaboration support

NCEAS Tech Support

NCEAS Analytical Support

NCEAS Tech Support

Ecoinformatics at NCEAS

Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data

Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data But how to handle different styles of describing data? e.g., abundance vs. biomass

Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data But how to handle different styles of describing data? e.g., abundance vs. biomass Ecological Metadata Language (EML) Metadata standard Human-readable, yet machine-interpretable

Metadata - Describing your data What EML metadata looks like...

Metadata - Describing your data 1. Use our basic web interface 2. Install Morpho desktop software (free) Create & save metadata using wizards a. Manage data on your own computer b. Share metadata/data with the broader community c. Share with specific colleagues: set access privileges Search and locate data and metadata on the network

Metadata - Describing your data

Search knb.ecoinformatics.org NCEAS – LTER Network – OBFS – PISCO – ESA – UC Natural Reserve System – and more

NCEAS recycles!