Spreadsheet design an overview of further issues Research Methods Group Wim Buysse – ICRAF-ILRI Research Methods Group October 2004
How to get the perfect spread sheet ?
Data entry: When ? ASAP ! (As Soon As Possible) before you forget the details
Data entry: what ? RAW DATA Do calculations and conversions afterwards, in the spreadsheet using the computer –Formula in MS Excel –Also consistency check
9/11 or 11/9 ? Outside the USA, customize the regional settings !!
Do not leave ‘gaps’
Heading information Design factors, measurement variables => detailed information Column titles, codes Don’t leave blank Short Avoid using “strange” characters (%, …) Otherwise summaries / filtering importing in other software
Heading information
Calculations ?
Calculations - Use spreadsheet or statistical software - Use figures - Let computer do the calculations - Use Forms to easily write down information when in the field - Enter data on the computer in a way that they are easy to use in further calculations
Calculations Same data entered in our standard format
Calculations Now it is just a matter of presenting the data in the way that best fits the needs
Calculations Even a participants list can be used to discover trends
One observation/answer per cell - How many training activities took place per location ? - What was the total number of training days ?
One observation/answer per cell - Our standard format… - (still data missing)
One observation/answer per cell - How many training activities took place per location ? - What was the total number of training days ?
Confusing entries - What does 0 mean ? - What does a blank cell mean ? - What does an asterisk (*) mean ? - What does 99 mean ? AGAIN: - DO NOT mix figures and text => calculations - make sure the same code is written in the same way (N/A, n.a., not avail., ….) - do not forget or mix meaning of codes, ex. 1 = male, 2 = female or vice versa ?? => should not be problem when using standard format
Confusing entries - Solutions - documentation (description of design factors, description of measurement variables, experiment details) - insert comment - add column with comments - if lots of comments - easy to find (AutoFilter)
Include or exclude ?? - Include all cases ?? -The short answer is INCLUDE - Analysis later can be done on subsets - Insert comment if you think it is a strange observation
Include or exclude ?? - Include all variables ?? -The short answer is INCLUDE - Variables => depend on objectives - Variables => do not forget to include important design factors like location
Include or exclude : example Is there any significant difference between apple variety Anna and Golden Dorset in terms of flowering, fruiting and agronomy in the different orchards (= different altitudes) ?
Include or exclude : example Example = ‘nice try’ - Gives useful information - Easy to collect data (workers in orchards can do it) - So, research is carried out at almost no cost
Include or exclude : example - Problem: which trees ? - What happened with the apples (harvested, fallen, eaten by birds, … ?) - (Probably) flowers and apples were only counted on those trees that were flowering. - The problem is that this way we’re looking at groups of different size and with different variability. - We can only answer a question like: of all those trees that are flowering at a given moment, where do they flower the most and where do they have most apples hanging on the trees? - Solution = include ‘tree number’ as an extra factor + record what happened with the apples.
Unit or level of analysis e.g. farm/household, person, plot, community,… Is determined by objectives Multiple objectives may require data at different levels One row of data per case If data are measured at several levels, move data between levels:
Unit or level of analysis replicating data down levels Analysis at plot levelData at farm level
Unit or level of analysis Summarising up levels Data at plot levelAnalysis at farm level No unique way – think!