Basic Concept of Data Coding: Codes, Variables, and File Structures
Two Ways to Think About Coding
- Coding “ON” the data source
  - Use for unstructured narrative data in digital form
  - Search for themes, key terms and mark on the text
  - CAQDAS software helps manage the material
- Coding “FROM” the data source
  - Use any data source in any form and any language
  - Create a database to collect what you find
  - Code what you need from the source into database
  - Manage and analyze the data in the database
Steps in Coding FROM Data Source
- First think about:
  - How is your source organized in units?
  - What do you want to capture from the units?
- Then create a structure to hold the data that:
  - Represents the units in your source
  - Contains places to put what you want to capture
  - Uses basic rules to keep data organized
Key Ideas
- One ROW = One RECORD = One CASE
- One FIELD = One COLUMN = One VARIABLE
- A flat file holds
  - records (rows)
  - fields (columns)
Simple Flat File
       | Field 1 | Field 2 | Field 3
Record |         |         |
Record |         |         |
Record |         |         |
Flat File Rules
1. Each row or record of data needs a UNIQUE ID number
2. Each column or field holds ONE type of information. Do not try to put different things into one field.
   - Why?
3. Data in one field can be plain text, numbers, or can have a systematic code
   - What is the simplest possible code?
4. Quantitative analysis requires codes or numbers
   - Can be counted and compared: variables
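The first two rules can be checked mechanically. The following is a minimal sketch, assuming a hypothetical flat file held as a list of Python dictionaries; the field names (`id`, `source`, `gender`) and the helper `check_flat_file` are invented for illustration, and "one type of information" is approximated only roughly by "one Python type per field".

```python
# Hypothetical flat file: each dict is one record (row), each key is one field (column).
records = [
    {"id": 1, "source": "Letter A", "gender": 1},
    {"id": 2, "source": "Letter B", "gender": 2},
    {"id": 3, "source": "Letter C", "gender": 1},
]


def check_flat_file(rows):
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    # Rule 1: every record needs a unique ID.
    ids = [row["id"] for row in rows]
    if len(ids) != len(set(ids)):
        problems.append("duplicate ID")
    # Every record should have the same set of fields.
    fields = set(rows[0])
    for row in rows:
        if set(row) != fields:
            problems.append(f"record {row['id']} has different fields")
    # Rule 2 (rough proxy): each field holds values of a single Python type.
    for field in sorted(fields):
        types = {type(row[field]) for row in rows if field in row}
        if len(types) > 1:
            problems.append(f"field {field!r} mixes types")
    return problems


print(check_flat_file(records))
```

A check like this only catches structural slips; whether a field really holds one kind of information is still a judgment the coder has to make.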
Flat File Structure, Again
                   | Field 1 (Variable 1) | Field 2 (Variable 2) | Field 3 (Variable 3)
Record 1 (unit #1) |                      |                      |
Record 2 (unit #2) |                      |                      |
Record 3 (unit #3) |                      |                      |
Flat File Structure Aids Analysis
- Count # of cases of each category in one field
- Cross-classify categories in two different fields
- Plot one coded variable against another
- Standardize raw numbers with percentages
- Perform other forms of quantitative analysis
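As an illustration of the first, second, and fourth points, here is a small sketch using only the Python standard library. The records and the code values (gender 1/2, tone 1/2) are invented for the example and are not data from the slides.

```python
from collections import Counter

# Hypothetical coded records: gender (1=Male, 2=Female), tone (1=positive, 2=negative).
records = [
    {"id": 1, "gender": 1, "tone": 1},
    {"id": 2, "gender": 2, "tone": 1},
    {"id": 3, "gender": 1, "tone": 2},
    {"id": 4, "gender": 2, "tone": 1},
]

# Count the number of cases in each category of one field.
gender_counts = Counter(r["gender"] for r in records)

# Cross-classify the categories of two different fields.
crosstab = Counter((r["gender"], r["tone"]) for r in records)

# Standardize raw counts as percentages of all cases.
total = len(records)
gender_pct = {code: 100 * n / total for code, n in gender_counts.items()}

print(gender_counts)
print(crosstab)
print(gender_pct)
```

Each of these operations works only because every record has the same fields and every field holds one kind of code.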
Three Kinds of Flat Files
- Spreadsheet (Excel)
- Statistical Program (SPSS, SAS, Stata)
- Relational Database (Access)
THEY LOOK SIMILAR BUT DO DIFFERENT THINGS
What Can You Do in Excel?
- put data in rows and columns
- enter text, numbers, dates, and formulas
- add numbers in column or row (VALUES)
- enter foreign language text
- make charts from columns of data
- import and export data in flat file format
What Are Limitations of Excel?
- rows are not stable (oriented to CELLS, not ROWS)
- difficult to sort, count, manipulate RECORDS
- repeat all data entry for each row (but can fill)
- spelling errors in entry limit finding and sorting
- flat file format itself has limitations for some data
- what if there are multiple instances for one case?
What Can You Do in SPSS?
- put data in rows that are stable as records
- primarily useful for numbers and codes
- can separately define and label the codes
- can count frequencies, do crosstabs, %
- can collapse or combine codes
- can do statistical analyses
Limitations of SPSS Flat Files
- need to pre-code data into numeric codes
- need to repeat all code fields for each record
- problems handling multiple instances per case
- what if code cannot be developed yet?
- what if actual words need to be preserved?
- what if code needs to expand later?
What Can a Relational Database Do?
- create stable records as rows
- handles numbers, words, dates, notes
- handles foreign languages
- define data types to reduce errors, standardize
- LINK different files in one-to-many relations
  - simplifies data entry to avoid repeated entry
  - can preserve words and develop codes later
  - use lookup tables to standardize codes
- create forms to simplify data entry
- use queries and reports to extract data
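The one-to-many link and the lookup table can be sketched in a few lines. This example uses Python's built-in `sqlite3` module as a stand-in for Access, which is an assumption made so the sketch can run anywhere; the table names (`documents`, `mentions`, `theme_codes`) and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the links
conn.executescript("""
-- Lookup table: the standard list of codes.
CREATE TABLE theme_codes (code INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
-- Parent table: one row per case.
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Child table: many rows per case, each linked to one document and one code.
CREATE TABLE mentions (
    mention_id INTEGER PRIMARY KEY,
    doc_id INTEGER NOT NULL REFERENCES documents(doc_id),
    theme_code INTEGER NOT NULL REFERENCES theme_codes(code),
    quote TEXT  -- the actual words are preserved alongside the code
);
""")
conn.executemany("INSERT INTO theme_codes VALUES (?, ?)",
                 [(1, "work"), (2, "family")])
conn.executemany("INSERT INTO documents VALUES (?, ?)",
                 [(1, "Interview A"), (2, "Interview B")])
conn.executemany(
    "INSERT INTO mentions (doc_id, theme_code, quote) VALUES (?, ?, ?)",
    [(1, 1, "long hours"), (1, 2, "my sister"), (2, 1, "new job")],
)

# A query joins the linked tables back into flat rows for reading or export.
rows = conn.execute("""
    SELECT d.title, t.label, m.quote
    FROM mentions m
    JOIN documents d ON d.doc_id = m.doc_id
    JOIN theme_codes t ON t.code = m.theme_code
    ORDER BY m.mention_id
""").fetchall()
for row in rows:
    print(row)
```

The document title is typed once, however many mentions it has, and a mention can only carry a code that exists in the lookup table.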
Solving Limitations in Access
- create frequencies and crosstabs with %
  - use queries for quick and dirty counts
  - export flat file to SPSS
- make pretty charts to display data
  - export to Excel
  - export to SPSS
- do statistical analysis
  - export to SPSS
- EXPORT AND IMPORT TABLES OR QUERIES
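A quick count query and a flat-file export might look like the sketch below. It again assumes Python's `sqlite3` as a stand-in for Access and writes CSV text as the flat file; the `cases` table and its values are hypothetical, and the actual export steps in Access, Excel, or SPSS are menu-driven and not shown here.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (id INTEGER PRIMARY KEY, gender INTEGER, tone INTEGER)")
conn.executemany("INSERT INTO cases VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 2, 1)])

# Quick count query: cases per gender code, with percent of all cases.
counts = conn.execute("""
    SELECT gender,
           COUNT(*) AS n,
           100.0 * COUNT(*) / (SELECT COUNT(*) FROM cases) AS pct
    FROM cases
    GROUP BY gender
    ORDER BY gender
""").fetchall()

# Export the table as a flat file (CSV text) that other programs can import.
cursor = conn.execute("SELECT id, gender, tone FROM cases ORDER BY id")
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow([column[0] for column in cursor.description])  # header row of field names
writer.writerows(cursor)  # one line per record
flat_file = buffer.getvalue()

print(counts)
print(flat_file)
```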
Get Started with a Test Sample
- find out what is POSSIBLE in your data
  - what content does it contain?
  - what questions could you answer with it?
  - how can you extract relevant content?
  - how much effort does it take?
- start with a few cases of the text data
Developing Coding Scheme
- Think about data source as set of records
- Think about different pieces of information
- Think about appropriate way to code each
- Think about whether data are multilevel
- Work interactively with your data
- Mistakes are fixable at this stage
A Code is a List of Categories
- Divides up content in a systematic, meaningful way
  - Gender=Male vs. Female
  - Fruit=Apples, Oranges, Pears, Bananas, Other
- May assign numbers to the categories
  - Such numbers do not have NUMERIC meaning
  - They simply refer to the different categories
- Coding means assigning content to categories
  - A data field with coded categories is a “variable”
  - Provides a systematic basis for analysis
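The Fruit example can be written out as a small codebook. This is a sketch only: the specific numbers (1 to 4, and 9 for Other) and the helper `code_fruit` are assumptions chosen for illustration, since the slide does not say which numbers to use.

```python
# Hypothetical codebook: the numbers are labels for categories, not quantities.
FRUIT_CODES = {1: "Apples", 2: "Oranges", 3: "Pears", 4: "Bananas", 9: "Other"}
LABEL_TO_CODE = {label.lower(): code for code, label in FRUIT_CODES.items()}


def code_fruit(text):
    """Assign raw content to a category code; anything unlisted becomes Other."""
    return LABEL_TO_CODE.get(text.strip().lower(), LABEL_TO_CODE["other"])


raw = ["apples", "Pears ", "kiwi", "Oranges"]
coded = [code_fruit(item) for item in raw]  # this coded field is the variable

print(coded)
print([FRUIT_CODES[code] for code in coded])
```

Averaging these codes would be meaningless; they can only be counted and compared as categories.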
Three Ways to Code “Content”
1. Each item is a separate field and is coded present or absent in every record.
2. Various mutually exclusive options are coded in one field. Each record has one code category.
3. Use a sub-table to collect multiple instances that occur in one record; code in sub-table (requires a relational database)
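The three layouts can be compared side by side. The sketch below uses plain Python lists and dictionaries with invented field names (`mentions_work`, `main_topic`, `record_id`); in practice the third layout would be two linked tables in a relational database.

```python
# Way 1: each item is its own field, coded present (1) or absent (0) in every record.
way1 = [
    {"id": 1, "mentions_work": 1, "mentions_family": 0},
    {"id": 2, "mentions_work": 1, "mentions_family": 1},
]

# Way 2: one field holds a single, mutually exclusive category per record.
MAIN_TOPIC = {1: "work", 2: "family", 3: "other"}
way2 = [
    {"id": 1, "main_topic": 1},
    {"id": 2, "main_topic": 2},
]

# Way 3: a sub-table holds one row per instance, linked to its parent record by ID.
way3_records = [{"id": 1}, {"id": 2}]
way3_instances = [
    {"record_id": 1, "topic": "work"},
    {"record_id": 2, "topic": "work"},
    {"record_id": 2, "topic": "family"},
]


def instances_for(record_id):
    """Collect every instance coded for one parent record."""
    return [row["topic"] for row in way3_instances if row["record_id"] == record_id]


print(instances_for(2))
```

Way 1 needs a new field for every new item, Way 2 cannot record a second topic for the same record, and Way 3 handles both cases at the cost of a second table.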
Code What is There
- Some data will be missing: too bad
- Resist temptation to code only judgments
  - Code the evidence into database
  - Then code your judgment (positive, negative)
  - This provides evidence for the judgment
  - Allows for reliability checks of judgments
- Can start with some standard codes, add more later
- Can enter actual terms, recode later
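Entering actual terms first and recoding later can be sketched as follows. The terms, the judgment values, and the recode scheme (1=praise, 2=criticism, 0=not yet coded) are all hypothetical; the point is only that the evidence and the judgment live in separate fields and the code is added afterwards.

```python
# Evidence first: keep the words found in the source in one field
# and the coder's judgment in a separate field.
records = [
    {"id": 1, "term": "excellent", "judgment": "positive"},
    {"id": 2, "term": "awful", "judgment": "negative"},
    {"id": 3, "term": "superb", "judgment": "positive"},
    {"id": 4, "term": "meh", "judgment": "negative"},
]

# Recode later: a hypothetical scheme built after seeing which terms occur.
RECODE = {"excellent": 1, "superb": 1, "awful": 2}  # 1=praise, 2=criticism
UNCODED = 0  # term not yet covered by the scheme

for record in records:
    record["term_code"] = RECODE.get(record["term"], UNCODED)

# Terms the scheme does not cover yet show where it needs to expand.
still_uncoded = [r["term"] for r in records if r["term_code"] == UNCODED]

print([r["term_code"] for r in records])
print(still_uncoded)
```

Because the original term is never overwritten, a second coder can check each judgment against the evidence, and the scheme can be revised without going back to the source.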
Content Coding Questions
- How would you code Male and Female?
- How would you code a word or phrase?
  - What if you don’t know all the words now?
  - What if there can be more than one per record?
- How would you code a topic or theme?
  - What if you don’t know all the topics now?
  - What if there can be more than one per record?