Download presentation
Presentation is loading. Please wait.
Published byAmy Pierce Modified over 8 years ago
1
Collaboration and data sharing in the Distributed Graduate Seminars Role of NCEAS in the DGS Coordination Collaborative webspace Grad Research Associate Informatics support Outreach support & publicity Time & space for synthesis
2
Collaboration and data sharing in the Distributed Graduate Seminars Organization of this presentation: Basic problems and solutions in data sharing & collaboration Introduction to NCEAS collaborative sites Overview of NCEAS computing research and support
3
Collaboration & data stewardship Personal data management problems are magnified in collaboration Data organization Data documentation Data analysis Data & analysis preservation Collaboration and data sharing
4
“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident
5
A personal example... Collaboration and data sharing
6
2 tables Collaboration and data sharing
7
Random notes What is this? Collaboration and data sharing
8
Wash Cres Lake Dec 15 Dont_Use.xls Collaboration and data sharing
9
What if we want to merge files?
10
What is this? Collaboration and data sharing
11
Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize descriptions of data (metadata) Data analysis - document Data & analysis preservation - protect Collaboration and data sharing
13
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### http://knb.ecoinformatics.org/knb/style/skins/nceas/ ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)
14
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### http://knb.ecoinformatics.org/knb/style/skins/nceas/ ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)
15
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### http://knb.ecoinformatics.org/knb/style/skins/nceas/ ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r)
16
Example – using R for data exploration, analysis and presentation ### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### http://knb.ecoinformatics.org/knb/style/skins/nceas/ ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Call: lm(formula = LNtrout ~ TEMPC, data = HohTrout) Residuals: Min 1Q Median 3Q Max -1.7534 -1.1924 -0.3294 0.9304 4.2231 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.07545 0.18220 -0.414 0.679 TEMPC 0.11220 0.01448 7.746 1.74e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.365 on 1501 degrees of freedom Multiple R-Squared: 0.03844, Adjusted R-squared: 0.0378 F-statistic: 60 on 1 and 1501 DF, p-value: 1.735e-14
17
### Simple Linear Regression - 0+ age Trout in Hoh River, WA against Temp Celsius ### Load data HohTrout<-read.csv("Hoh_Trout0_Temp.csv") ### See full metadata in Rosenberger, E.E., S.L. Katz, J. McMillan, G. Pess., and S.E. Hampton. In prep. Hoh River trout habitat associations. ### http://knb.ecoinformatics.org/knb/style/skins/nceas/ ### Look at the data HohTrout plot(TROUT ~ TEMPC, data=HohTrout) ### Log Transform the independent variable (x+1) - this method for transform creates a new column in the data frame HohTrout$LNtrout<-log(HohTrout$TROUT+1) ### Plot the log-transformed y against x ### First I'll ask R to open new windows for subsequent graphs with the windows command windows() plot(LNtrout ~ TEMPC, data=HohTrout) ### Regression of log trout abundance on log temperature mod.r <- lm(LNtrout ~ TEMPC, data=HohTrout) ### add a regression line to the plot. abline(mod.r) ### Check out the residuals in a new plot layout(matrix(1:4, nr=2)) windows() plot(mod.r, which=1) ### Check out statistics for the regression summary.lm(mod.r) Compare this method to: Copy and paste from Excel Log-transform in Excel Copy and paste new file Graph in SigmaPlot Analyze in Systat Graph in SigmaPlot
18
Collaboration & data stewardship Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect
19
Personal data management problems are magnified in collaboration Data organization – standardize Data documentation – standardize metadata Data analysis - document Data & analysis preservation - protect Collaboration and data sharing
20
“Data Entropy” Michener et al. 1997, Ecological Monographs Information content Time Time of publication Specific details General details Retirement or career change Death Accident
21
Databases that protect original data structure (e.g., not “sorting” in Excel) Statistical software that documents transformations, sorts, analysis, etc. (e.g., R, Matlab, SAS, JMP, PRIMER) Free software for data management & analysis Morpho, free software that standardizes descriptions of data (metadata) Central collaborative space organizing communications Technological solutions
22
Communication for collaboration Emphasis is on accessibility Open to collaborators Archived for the future Highly organized Searchable
23
Communication for collaboration Emphasis is on accessibility Open to collaborators √ √ √ Archived for the future √ √ √ √ Highly organized √ Searchable √ √ √ √ Email Discussion Forum Blogs Chat
24
NCEAS Collaborative Websites What we have in place: User-editable website Discussion Forums Realtime chat with archived transcripts What others have used: Skype Google Docs Zoho GoToMeeting
25
Easy file uploading PDF library Discussion boards Wiki-style collaborative page editing Real-time chat NCEAS Collaborative Websites
26
Easy file uploading PDF library Discussion boards Wiki-style collaborative page editing Real-time chat All is searchable Click to go to the DGS Website NCEAS Collaborative Websites
27
Making useful tools for the students and faculty is critical Technology helps, but many of the problems are more social than technical Find a mixture that helps you get your work done NCEAS Collaborative Websites
29
traits-dgs.nceas.ucsb.edu Please create an account: http://traits-dgs.nceas.ucsb.edu/ Explore a previous site: http://dgs.nceas.ucsb.edu http://dgs.nceas.ucsb.edu name: student password: stu00
30
General NCEAS Tech Support
31
NCEAS recycles!
32
NCEAS Tech Support
33
NCEAS collaboration support http://help.nceas.ucsb.edu
34
NCEAS Tech Support
35
NCEAS Analytical Support
38
NCEAS Tech Support
39
Ecoinformatics at NCEAS
41
Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data
42
Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data But how to handle different styles of describing data? e.g., abundance vs. biomass
43
Metadata - Describing your data Metadata – description of your data – makes data useful Needed to Assess & interpret data Provide context for data But how to handle different styles of describing data? e.g., abundance vs. biomass Ecological Metadata Language (EML) Metadata standard Human-readable, yet machine-interpretable
44
Metadata - Describing your data What EML metadata looks like...
45
Metadata - Describing your data 1. Use our basic web interface 2. Install Morpho desktop software (free) Create & save metadata using wizards a. Manage data on your own computer b. Share metadata/data with the broader community c. Share with specific colleagues: set access privileges Search and locate data and metadata on the network
46
Metadata - Describing your data
47
Search knb.ecoinformatics.org NCEAS – LTER Network – OBFS – PISCO – ESA – UC Natural Reserve System – and more
48
NCEAS recycles!
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.