Collaboration and data sharing in the Distributed Graduate Seminars Role of NCEAS in the DGS Coordination Collaborative webspace Grad Research Associate Informatics support Outreach support & publicity Time & space for synthesis
Collaboration and data sharing in the Distributed Graduate Seminars Organization of this presentation: Basic problems and solutions in data sharing & collaboration Introduction to NCEAS collaborative sites Overview of NCEAS computing research and support
Collaboration & data stewardship Personal data management problems are magnified in collaboration Data organization Data documentation Data analysis Data & analysis preservation Collaboration and data sharing
“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident
A personal example... Collaboration and data sharing
2 tables Collaboration and data sharing
Random notes What is this? Collaboration and data sharing
Wash Cres Lake Dec 15 Dont_Use.xls Collaboration and data sharing
What if we want to merge files?
What is this? Collaboration and data sharing
Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize descriptions of data (metadata) Data analysis - document Data & analysis preservation - protect Collaboration and data sharing
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Call: lm(formula = LNtrout ~ TEMPC, data = HohTrout) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) TEMPC e-14 *** --- Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: on 1501 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 60 on 1 and 1501 DF, p-value: 1.735e-14
### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Compare this method to: Copy and paste from Excel Log-transform in Excel Copy and paste new file Graph in SigmaPlot Analyze in Systat Graph in SigmaPlot
Collaboration & data stewardship Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect
Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect Collaboration and data sharing
“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident
Databases that protect original data structure (e.g., not “sorting” in Excel) Statistical software that documents transformations, sorts, analysis, etc. (e.g., R, Matlab, SAS, JMP, PRIMER) Free software for data management & analysis Morpho, free software that standardizes descriptions of data (metadata) Central collaborative space organizing communications Technological solutions
Communication for collaboration Emphasis is on accessibility Open to collaborators Archived for the future Highly organized Searchable
Communication for collaboration Emphasis is on accessibility Open to collaborators √ √ √ Archived for the future √ √ √ √ Highly organized √ Searchable √ √ √ √ Discussion Forum Blogs Chat
NCEAS Collaborative Websites What we have in place: User-editable website Discussion Forums Realtime chat with archived transcripts What others have used: Skype Google Docs Zoho GoToMeeting
Easy file uploading PDF library Discussion boards Wiki-style collaborative page editing Real-time chat NCEAS Collaborative Websites
Easy file uploading PDF library Discussion boards Wiki-style collaborative page editing Real-time chat All is searchable Click to go to the DGS Website NCEAS Collaborative Websites
Making useful tools for the students and faculty is critical Technology helps, but many of the problems are more social than technical Find a mixture that helps you get your work done NCEAS Collaborative Websites
traits-dgs.nceas.ucsb.edu Please create an account: Explore a previous site: name: student password: stu00
General NCEAS Tech Support
NCEAS recycles!
NCEAS Tech Support
NCEAS collaboration support
NCEAS Tech Support
NCEAS Analytical Support
NCEAS Tech Support
Ecoinformatics at NCEAS
Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data
Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data But how to handle different styles of describing data? e.g., abundance vs. biomass
Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data But how to handle different styles of describing data? e.g., abundance vs. biomass Ecological Metadata Language (EML) Metadata standard Human-readable, yet machine-interpretable
Metadata - Describing your data What EML metadata looks like...
Metadata - Describing your data 1. Use our basic web interface 2. Install Morpho desktop software (free) Create & save metadata using wizards a. Manage data on your own computer b. Share metadata/data with the broader community c. Share with specific colleagues: set access privileges Search and locate data and metadata on the network
Metadata - Describing your data
Search knb.ecoinformatics.org NCEAS – LTER Network – OBFS – PISCO – ESA – UC Natural Reserve System – and more
NCEAS recycles!