CHAPTER 8 Managing and Curating Data. The Second Step Storing and Curating Data.

1 CHAPTER 8 Managing and Curating Data

2 The Second Step Storing and Curating Data

3 Storage: Temporary and Archival. Permanent archives: the only medium acceptable as truly archival is acid-free paper. Electronic storage: do not expect electronic media to last more than 5-10 years; they should be used primarily for working copies. If used, copy datasets onto newer electronic media on a regular basis.

4 Curating Data: Most ecological and environmental data are collected by researchers using funds obtained through grants and contracts. The data are technically owned by the granting agency, and they need to be made widely available (e.g., on the Internet). Unfortunately, when budgets are cut, data management and curation costs are often the first items to be dropped.

5 The Final Step Transforming the Data

6 Transformation: A mathematical function that is applied to all of the observations of a given variable, $Y^* = f(Y)$. Most transformations are fairly simple algebraic functions. As long as they are continuous monotonic functions, they DO NOT change the rank order of the data, but they DO change the relative spacing of the observations.
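The rank-order and spacing claims on this slide can be checked with a short sketch. The observations below are hypothetical values chosen only for illustration, and the helper `ranks` is not from the slides; the transformation used is $f = \log_{10}$.

```python
import math

def ranks(values):
    """Return the rank (0 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result

# Hypothetical positive observations, chosen only for illustration.
y = [100.0, 1.0, 1000.0, 10.0]
y_star = [math.log10(v) for v in y]  # Y* = f(Y) with f = log10

# Gaps between neighbouring observations once each list is sorted.
raw_sorted = sorted(y)
new_sorted = sorted(y_star)
raw_gaps = [b - a for a, b in zip(raw_sorted, raw_sorted[1:])]
new_gaps = [b - a for a, b in zip(new_sorted, new_sorted[1:])]

print(ranks(y) == ranks(y_star))  # rank order before and after
print(raw_gaps)                   # spacing on the raw scale
print(new_gaps)                   # spacing on the transformed scale
```

Any other continuous, strictly increasing function (for example `math.sqrt`) could be substituted for `math.log10` in the same sketch.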

7 Why Transform Data? (1) Patterns in the transformed data may be easier to understand and communicate than patterns in the raw data (e.g., converting curves into straight lines). (2) Transformation may be necessary for the analysis to be valid (“meeting the assumptions”).

8 The Species-Area Relationship: A classic example. If we plot the number of species against the area of the island, the data often follow a simple power function, $S = cA^{z}$, where $S$ is the number of species, $A$ is the island area, and $c$ and $z$ are constants fitted to the data.

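9 The Species-Area Relationship: A classic example

| Island | Area (km^2) | No. of species | Log10(Area) | Log10(Species) |
|---|---|---|---|---|
| Albermarle | 5824.9 | 325 | 3.765 | 2.512 |
| Charles | 165.8 | 319 | 2.220 | 2.504 |
| Chatham | 505.1 | 306 | 2.703 | 2.486 |
| James | 525.8 | 224 | 2.721 | 2.350 |
| Indefatigable | 1007.5 | 193 | 3.003 | 2.286 |
| Abingdon | 51.8 | 119 | 1.714 | 2.076 |
| Duncan | 18.4 | 103 | 1.265 | 2.013 |
| Narborough | 634.6 | 80 | 2.803 | 1.903 |
| Hood | 46.6 | 79 | 1.668 | 1.898 |
| Seymour | 2.6 | 52 | 0.415 | 1.716 |
| Barrington | 19.4 | 48 | 1.288 | 1.681 |
| Gardner | 0.5 | 48 | -0.301 | 1.681 |
| Bindloe | 116.6 | 47 | 2.067 | 1.672 |
| Jervis | 4.8 | 42 | 0.681 | 1.623 |
| Tower | 11.4 | 22 | 1.057 | 1.342 |
| Wenman | 47 | 14 | 1.672 | 1.146 |
| Culpepper | 2.3 | 7 | 0.362 | 0.845 |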

10 The Species-Area Relationship [Figure; axis label: Area (km^2)]

11 The Species-Area Relationship: If species richness and island area are related by a power function, we can transform the equation by taking logarithms of both sides: $\log(S) = \log(cA^{z})$, so $\log(S) = \log(c) + z\log(A)$, which is a straight line with intercept $\log(c)$ and slope $z$.
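As a minimal sketch of how the log-transformed equation can be used, the code below fits a straight line to the base-10 logarithms of the island data from slide 9. The data pairs are copied from that table; the helper `least_squares` and the choice of ordinary least squares are assumptions made for illustration, not necessarily the method used in the original lecture.

```python
import math

# (area in km^2, number of species) pairs copied from the slide 9 table.
islands = [
    (5824.9, 325), (165.8, 319), (505.1, 306), (525.8, 224),
    (1007.5, 193), (51.8, 119), (18.4, 103), (634.6, 80),
    (46.6, 79), (2.6, 52), (19.4, 48), (0.5, 48),
    (116.6, 47), (4.8, 42), (11.4, 22), (47.0, 14), (2.3, 7),
]

log_a = [math.log10(area) for area, _ in islands]
log_s = [math.log10(species) for _, species in islands]

def least_squares(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# log10(S) = log10(c) + z * log10(A): the slope estimates z,
# and the intercept estimates log10(c).
z, log_c = least_squares(log_a, log_s)
c = 10 ** log_c
print(f"z = {z:.3f}, log10(c) = {log_c:.3f}, c = {c:.1f}")
```

Fitting on the log scale turns the curved power function into a linear regression problem, which is the point of the transformation on this slide.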

12 The Species-Area Relationship [Figure; axis label: log10]

13 Other Transformations: Cube-root transformation: used for measures of mass or volume ($Y^3$) that are allometrically related to linear measures of body size or length ($Y$). Logarithmic transformation: used to examine relationships between two measures of mass or volume ($Y^3$); both X and Y are transformed.

14 Why Transform Data? Statistics Demands It. All statistical tests require data to fit certain mathematical assumptions. Examples: Analysis of Variance requires (1) homoscedastic data (equal variances among groups) and (2) residuals that are normal random variables. Regression requires (1) normally distributed residuals that are uncorrelated with the independent variable.

15 Five Common Transformations: (1) Logarithmic Transformation (2) Square-root Transformation (3) Angular (or arcsine) Transformation (4) Reciprocal Transformation (5) Box-Cox Transformation

16 Logarithmic Transformation: Replaces each observation with its logarithm, $Y^* = \log(Y)$. Often equalizes variances for data in which the mean and variance are positively correlated; such data also tend to have outliers and positively skewed residuals. The logarithm of 0 is not defined, so if the data contain zeros, add 1 to each observation before transforming: $Y^* = \log(Y + 1)$.
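The variance-equalizing effect can be illustrated with simulated data. This is a sketch under stated assumptions: the two groups are hypothetical log-normal samples (so spread grows with the mean by construction), the seed and sample sizes are arbitrary, and `log10_plus_one` is an illustrative helper for the add-1 rule rather than a function from any library.

```python
import math
import random
import statistics

def log10_plus_one(values):
    """Y* = log10(Y + 1), so that observations equal to 0 map to 0."""
    return [math.log10(v + 1) for v in values]

random.seed(1)  # fixed seed so the sketch is repeatable

# Two hypothetical groups whose spread grows with their mean.
low = [random.lognormvariate(1.0, 0.5) for _ in range(2000)]
high = [random.lognormvariate(3.0, 0.5) for _ in range(2000)]

# Ratio of group variances before and after taking natural logarithms.
raw_ratio = statistics.variance(high) / statistics.variance(low)
log_ratio = (statistics.variance([math.log(v) for v in high])
             / statistics.variance([math.log(v) for v in low]))

print(f"variance ratio, raw scale: {raw_ratio:.1f}")
print(f"variance ratio, log scale: {log_ratio:.2f}")

# Zeros are handled by adding 1 before taking the logarithm.
print(log10_plus_one([0, 9, 99]))
```

A ratio far from 1 on the raw scale and close to 1 on the log scale is the pattern the slide describes; real data will rarely be this clean.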

17 Square-root Transformation: Replaces each observation with its square root, $Y^* = \sqrt{Y}$. Used most frequently for count data, which often follow a Poisson distribution. Yields a variance that is independent of the mean. Does not change data values equal to 0 (the square root of 0 is 0), so add some small number to each observation if the data contain zeros.
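The claim that the square root yields a variance independent of the mean can be explored by simulation. The sketch below assumes Poisson-distributed counts with three arbitrary means; `poisson_draw` is an illustrative helper (Knuth's multiplication method) written here because the Python standard library has no Poisson sampler.

```python
import math
import random
import statistics

def poisson_draw(mean, rng):
    """One Poisson-distributed count via Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(42)  # fixed seed so the sketch is repeatable
raw_var = {}
root_var = {}

# Hypothetical means, chosen only to span a wide range.
for mean in (10, 40, 160):
    counts = [poisson_draw(mean, rng) for _ in range(5000)]
    raw_var[mean] = statistics.variance(counts)
    root_var[mean] = statistics.variance([math.sqrt(c) for c in counts])
    print(f"mean {mean:3d}: raw variance {raw_var[mean]:7.2f}, "
          f"sqrt variance {root_var[mean]:.3f}")
```

On the raw scale the variance tracks the mean, as expected for Poisson counts, while on the square-root scale it stays roughly constant across the three groups.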

18 Arcsine Transformation: Also called the arcsine-square root or angular transformation. Replaces each observation with the arcsine of the square root of the value, $Y^* = \arcsin(\sqrt{Y})$. Principally used for proportions. Removes the dependence of the variance on the mean. Gives transformed data in units of radians, not degrees.

19 Reciprocal Transformation: Replaces each value with its reciprocal, $Y^* = 1/Y$. Commonly used for data that record rates, which often appear hyperbolic.

20 Box-Cox Transformation: A family of transformations. $Y^* = (Y^{\lambda} - 1)/\lambda$ (for $\lambda \neq 0$); $Y^* = \log_e(Y)$ (for $\lambda = 0$); $L = -\frac{v}{2}\log_e(s_T^2) + (\lambda - 1)\frac{v}{n}\sum \log_e(Y)$, where $v$ = degrees of freedom, $n$ = sample size, and $s_T^2$ = variance of the transformed values of $Y$.

21 Box-Cox Transformation: $Y^* = (Y^{\lambda} - 1)/\lambda$ (for $\lambda \neq 0$); $Y^* = \log_e(Y)$ (for $\lambda = 0$); $L = -\frac{v}{2}\log_e(s_T^2) + (\lambda - 1)\frac{v}{n}\sum \log_e(Y)$. The value of $\lambda$ that maximizes the last equation is used in one of the first two equations to provide the closest fit of the transformed data to a normal distribution. The last equation must be solved iteratively (trying different $\lambda$ values until $L$ is maximized) using computer software.
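The iterative search described on this slide can be sketched as a simple grid search. Several things here are assumptions for illustration: the degrees of freedom are taken as $v = n - 1$, the search range and step for $\lambda$ are arbitrary, the data are simulated log-normal values, and the function names are hypothetical. Statistical packages typically use a numerical optimizer rather than a fixed grid.

```python
import math
import random
import statistics

def box_cox(values, lam):
    """Box-Cox transform: (Y**lam - 1) / lam, or ln(Y) when lam == 0."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def log_likelihood(values, lam):
    """L = -(v/2) ln(s2_T) + (lam - 1)(v/n) sum(ln Y)."""
    n = len(values)
    v = n - 1  # assumption: degrees of freedom taken as n - 1
    s2_t = statistics.variance(box_cox(values, lam))
    sum_logs = sum(math.log(y) for y in values)
    return -(v / 2) * math.log(s2_t) + (lam - 1) * (v / n) * sum_logs

def best_lambda(values, low=-2.0, high=2.0, step=0.01):
    """Try each lambda on a grid and keep the one that maximizes L."""
    steps = int(round((high - low) / step))
    grid = [round(low + i * step, 10) for i in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(values, lam))

# Hypothetical right-skewed data (all values must be positive).
rng = random.Random(7)
sample = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

lam_hat = best_lambda(sample)
print(f"lambda maximizing L on the grid: {lam_hat:.2f}")
```

Because the simulated data are log-normal, the search should land near $\lambda = 0$, the natural-log member of the family.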

22 Box-Cox Transformation: $Y^* = (Y^{\lambda} - 1)/\lambda$ (for $\lambda \neq 0$); $Y^* = \log_e(Y)$ (for $\lambda = 0$); $L = -\frac{v}{2}\log_e(s_T^2) + (\lambda - 1)\frac{v}{n}\sum \log_e(Y)$. When $\lambda = 1$, equation 1 results in a linear transformation. When $\lambda = 1/2$, a square-root transformation. When $\lambda = -1$, a reciprocal transformation. When $\lambda = 0$, equation 2 results in a natural logarithmic transformation. ALWAYS try using simple arithmetic transformations FIRST.

23 Box-Cox Transformation: $Y^* = (Y^{\lambda} - 1)/\lambda$ (for $\lambda \neq 0$); $Y^* = \log_e(Y)$ (for $\lambda = 0$); $L = -\frac{v}{2}\log_e(s_T^2) + (\lambda - 1)\frac{v}{n}\sum \log_e(Y)$. ALWAYS try using simple arithmetic transformations FIRST. If the data are right-skewed, try familiar transformations from the series $1/\sqrt{Y}$, $\sqrt{Y}$, $\ln(Y)$, $1/Y$. If left-skewed, try $Y^2$, $Y^3$, etc.

25 Reporting Results: You should report results in the original units, which means back-transforming the transformed values. The back-transformed mean will generally differ from the arithmetic mean of the original data. Also, back-transformations will normally result in asymmetrical confidence intervals.
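A short sketch shows both effects for a log transformation, using the species counts from the slide 9 table. One assumption is flagged in the code: a normal quantile is used for the interval as a rough approximation, whereas with only 17 values a t quantile would be more appropriate.

```python
import math
import statistics

# Species counts copied from the slide 9 table.
species = [325, 319, 306, 224, 193, 119, 103, 80, 79,
           52, 48, 48, 47, 42, 22, 14, 7]

logs = [math.log(s) for s in species]
mean_log = statistics.mean(logs)
se_log = statistics.stdev(logs) / math.sqrt(len(logs))

# Assumption: normal quantile for a rough 95% interval; a t quantile
# would be more appropriate for a sample this small.
q = statistics.NormalDist().inv_cdf(0.975)

# Back-transform the mean and the interval endpoints to original units.
back_mean = math.exp(mean_log)
lower = math.exp(mean_log - q * se_log)
upper = math.exp(mean_log + q * se_log)
arith_mean = statistics.mean(species)

print(f"arithmetic mean:       {arith_mean:.1f}")
print(f"back-transformed mean: {back_mean:.1f}")
print(f"interval: {lower:.1f} to {upper:.1f}")
print(f"distance below / above: {back_mean - lower:.1f} / {upper - back_mean:.1f}")
```

The back-transformed mean of log values is the geometric mean, which is why it falls below the arithmetic mean, and the interval is wider above the mean than below it because the exponential function stretches the upper end.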

26 Back-Transformations: Logarithmic: antilog($Y^*$), that is, $e^{Y^*}$ for natural logarithms or $10^{Y^*}$ for base-10 logarithms. Square root: $(Y^*)^2$. Arcsine: $\sin^2(Y^*)$. Reciprocal: $1/Y^*$.
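Each back-transformation can be checked by a round trip: transform a value, back-transform it, and confirm the original value is recovered. The sketch below uses a single hypothetical proportion so that every transform, including the arcsine, is defined; the names `TRANSFORMS` and `round_trip` are illustrative.

```python
import math

# Each entry: (name, forward transform, back-transformation).
TRANSFORMS = [
    ("natural log", math.log, math.exp),
    ("log10", math.log10, lambda y_star: 10 ** y_star),
    ("square root", math.sqrt, lambda y_star: y_star ** 2),
    ("arcsine", lambda y: math.asin(math.sqrt(y)),
     lambda y_star: math.sin(y_star) ** 2),
    ("reciprocal", lambda y: 1 / y, lambda y_star: 1 / y_star),
]

def round_trip(value):
    """Apply each transform and then its back-transformation to one value."""
    return {name: back(forward(value)) for name, forward, back in TRANSFORMS}

# A hypothetical proportion, valid for every transform above (0 < value <= 1).
original = 0.36
recovered = round_trip(original)
for name, value in recovered.items():
    print(f"{name:12s} -> {value:.6f}")
```

If a constant was added before transforming (for example 1 before taking logarithms), it has to be subtracted again after back-transforming.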

27 Reporting Results: Lastly, any transformation of the data should be added to your audit trail (documented in the metadata). Create a new spreadsheet for the transformed data and store it on permanent media.

