Presentation is loading. Please wait.

Presentation is loading. Please wait.

Slide Template for Module 2: Types, Formats, and Stages of Data

Similar presentations


Presentation on theme: "Slide Template for Module 2: Types, Formats, and Stages of Data"— Presentation transcript:

1 Slide Template for Module 2: Types, Formats, and Stages of Data

2 Module 2: Data Types, Stages & Formats
Learner Objectives 1. Explain what research data is and the range of data types 2. Identify stages of research data 3. Identify common potential storage formats for data 4. Identify relevant quality control techniques/technical standards 5. Identify methods of recording data that are specific to researchers’ disciplines and research interests Module 2: Data Types, Stages & Formats

3 Research Data Associated with Most Disciplines
Images Video Mapping/GIS data Numerical measurements Module 2: Data Types, Stages & Formats

4 Research Data Associated with Social Sciences
survey responses focus group and individual interviews economic indicators demographics opinion polling Module 2: Data Types, Stages & Formats

5 Research Data Associated with Hard Sciences
measurements generated by sensors/laboratory instruments computer modeling simulations observations and/or field studies specimen Of course, research data is not limited just to these two broad disciplinary areas. For example, digital humanities scholars may generate research data by text mining historical documents, analyzing the prevalence of particular words. While many types of data are generated in the course of research, existing data can also be used in research projects. Examples include social scientists using census data for their studies, or scientists comparing published gene sequences from various organisms. Module 2: Data Types, Stages & Formats

6 Stages of Data Related to Research Data Life Cycle
Raw Data Processed Data Analyzed Data Finalized/Published Data Existing Data across Different Sources Raw Data: What is being measured or observed? The data being generated during the research project. Processed Data: Making the raw data useful/manipulable Analyzed Data: Manipulated/interpreted data. What does the data tell us? Is it significant? How so? Finalized/Published Data: How do the data support your research question? Existing Data across Different Sources: e.g. GIS data Module 2: Data Types, Stages & Formats

7 Stages of Data Related to Research Data Life Cycle
Sample hypothesis: Water temperatures in Lake Superior are now significantly warmer than in previous years. The evidence lends support to global warming. That’s a bit abstract – let’s try illustrating those stages with a sample research hypothesis: Water temperatures in Lake Superior are now significantly warmer than in previous years. This evidence lends support to global warming. Module 2: Data Types, Stages & Formats

8 Using our sample hypothesis
Water temperatures in Lake Superior are now significantly warmer than in previous years. This evidence lends support to global warming. Raw Data = daily lake temperatures Processed Data = ‘cleaned’ temp. data in spreadsheet Analyzed Data = average temps., graphing changes Finalized Data = does data support the hypothesis? Raw data – What is being measured or observed? This is the data that is being generated during the research project. An example of raw data in our hypothetical research question might be daily measurements of temperature in Lake Superior. Raw data coincides with the ‘creating data’ section of the research data lifecycle. Processed data – How can the raw data be made useful/manipulable? To continue with the example above, lake temperature data may become processed once researchers ‘clean it’ by removing clearly erroneous temperature measurements from the data set, and enter the remaining temperatures into a spreadsheet for manipulation and analysis. Processed data coincides with the ‘processing data’ section of the research data lifecycle. Analyzed data – What does the data tell us? Is it significant? How so? For example, daily lake temperature data could be analyzed by finding average temperatures, looking at seasonal fluctuations, and generating graphs that demonstrate these changes. Analyzed data coincides with the ‘analysing data’ section of the research data lifecycle. Finalized/published data – How does the data support your research question? For example, a plot of our average lake temperatures for 2013 may show statistically significant differences when compared to the same data from 1913 and Finalized/published data coincides with a few sections of the research data lifecycle including preserving, giving access to, and re-using data. Note that our lake temperature scenario is a very simplified example Module 2: Data Types, Stages & Formats

9 Preferable Format Types for Long-Term Access to Data
Data formats that offer the best chance for long-term access are both: Non-proprietary (also known as open), and Unencrypted and uncompressed Non-proprietary or Open Formats are Readable by more than just the equipment and/or program that generated it. Sometimes proprietary file formats are unavoidable. However, proprietary formats can often be converted to open formats. Unencrypted / uncompressed Formats: Offer the best prospects for long-term access. If files are encrypted and/or compressed, the method used will need to be both discoverable and usable for file access in the future. Module 2: Data Types, Stages & Formats

10 Module 2: Data Types, Stages & Formats
Preferred Formats Examples of preferred formats for various data types include: Moving Images: MOV, MPEG Audio: WAVE, MP3 Numbers/statistics: ASCII, SAS Images: TIFF, JPEG 2000 Text: PDF/A, ASCII For these and more, see: Module 2: Data Types, Stages & Formats

11 Converting to Preferable Formats
Information can be lost when converting file formats. To mitigate the risk of lost information: Note conversion steps taken If possible, keep the original file as well as the converted one In file format conversion, information can be lost eg, converting from tiff to jpeg Carefully note the steps taken during the conversion/migration A best practice, whenever possible, is to keep the original file as well as the converted one Module 2: Data Types, Stages & Formats

12 Describing Data, Documenting Reliability & Collection Techniques
Data documentation explains the: Who What Where When And why of data. Who collected this data? Who/what were the subjects under study? What was collected, and for what purpose? What is the content/structure of the data? Where was this data collected? What were the experimental conditions? When was this data collected? Is it part of a series, or ongoing experiment? Why was this experiment performed? Module 2: Data Types, Stages & Formats

13 Describing Data, Documenting Reliability & Collection Techniques
Who: Who collected this data? Who or what were the subjects under study? Module 2: Data Types, Stages & Formats

14 Describing Data, Documenting Reliability & Collection Techniques
What: What data was collected, and for what purpose? What is the content and structure of the data? Module 2: Data Types, Stages & Formats

15 Describing Data, Documenting Reliability & Collection Techniques
Where: Where was this data collected? What were the experimental conditions that produced it? Why was this experiment performed? Module 2: Data Types, Stages & Formats

16 Describing Data, Documenting Reliability & Collection Techniques
When: When was the data collected? Is the data part of a series, or ongoing experiment? Module 2: Data Types, Stages & Formats

17 Describing Data, Documenting Reliability & Collection Techniques
Why: Why was this experiment performed? How does it relate to your research question? Module 2: Data Types, Stages & Formats

18 Cross Discipline Concerns
No matter what, you need to have: File naming conventions Version control Module 2: Data Types, Stages & Formats

19 Why Do I Need to Worry about That?
Consider this: If you unexpectedly have to leave your research project for a few months, could a colleague easily make sense of your data files? Module 2: Data Types, Stages & Formats

20 Why Use File Naming Conventions?
Naming conventions make life easier! Help you find your data Help others find your data Help track which version of a file is most current Module 2: Data Types, Stages & Formats

21 What File Naming Convention Should I Use?
Has your research group established a convention? If not, general guidelines include: Meaningful file names that aren’t too long Avoid certain characters Dates can help with sorting and version control Ultimately, the best file naming convention is one that works for you and your research project. Here are some points to consider: Generally speaking, a good protocol is to give files meaningful, descriptive names, while avoiding certain problematic characters (such as spaces, slashes, colons, periods, ampersands). They behave differently in different operating systems Overly long file names (>25 characters) should also be avoided, as these may not cooperate well with different operating systems. Consider, too, strategies you might use for version control; how will you keep track of which version of a file or document is the most current? Version control can be particularly challenging for collaborative projects in which several people may have access to, or even the ability to change, a file. Module 2: Data Types, Stages & Formats


Download ppt "Slide Template for Module 2: Types, Formats, and Stages of Data"

Similar presentations


Ads by Google