Download presentation
Presentation is loading. Please wait.
1
ASPIRE Workshop 5: Data Collection Tools
Lauren Heath, PharmD Research Fellow University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences & Kaiser Permanente Colorado
2
Outline Data management techniques using Microsoft Excel®
Note: we will be using Microsoft Excel 2013 for PC Tools for statistical analysis Data collection tools Develop your data collection tool
3
Data Management
4
Quick Excel® Tips ‘N Tricks
Excel does not treat blank cells as zero If there are blank cells and you are performing a function on these cells, use “” to indicate blank If referring to words in an Excel formula, such as counting the number of females, enter the words in quotes, i.e. “female” If intend to copy your formula and apply it down an entire column and do not want the cells references in a formula to change, use the $ sign: For example: A$1:B$88 vs. A1: B88
5
Basic Spreadsheet Manipulation
Using the data fill handle Task: enter “4” into HF clinic duration and use the fill handle to drag down 10 cells (then undo) Hold mouse over bottom right corner until a ‘+’ forms. This is the fill handle.
6
Basic Spreadsheet Manipulation
Using the data fill handle
7
Basic Spreadsheet Manipulation
Select the entire column/row/spreadsheet Task 1: select row 3 of the spreadsheet Select the number 3 on the left. A black arrow should form.
8
Basic Spreadsheet Manipulation
Select the entire column/row/spreadsheet Task 2: select column D of the spreadsheet Select column D at the top. A black arrow should form.
9
Basic Spreadsheet Manipulation
Select the entire column/row/spreadsheet Task 3: select the entire spreadsheet Select the top left corner where there is a triangle. A white plus sign should form.
10
Basic Spreadsheet Manipulation
Paste special: can retain formatting or formulas when copy/paste Task: copy the first 5 columns and first 10 rows, then use paste special to retain values and source formulas Recommend take a look at the various paste special options to know what options are available
11
Basic Spreadsheet Manipulation
Paste special: can retain formatting or formulas when copy/paste Task: copy the first 5 columns and first 10 rows, then use paste special to retain values and source formulas Recommend take a look at the various paste special options to know what options are available Select values & source formatting (E)
12
Freeze Panes(s) in a Dataset
Task: freeze top row of dataset This is very helpful to do if you are entering may rows of data and want to scroll down and still see the top row
13
Freeze Panes(s) in a Dataset
Task: freeze both top row and first column of dataset Select first cell next to the column and row of interest (in this case, cell B2) This is very helpful to do if you are entering may rows of data and want to scroll down and still see the top row
14
Freeze Panes(s) in a Dataset
Task: freeze both top row and first column of dataset While B2 is selected, select freeze panes This is very helpful to do if you are entering may rows of data and want to scroll down and still see the top row
15
Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Select all of column F
16
Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Right click on the column and select format cells
17
Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Select date and choose the desired formatting option. Click OK to accept.
18
Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Formatted dates
19
Formatting Cells Task: Custom format ID (column A) to include 3 digits
Select all of column A
20
Formatting Cells Task: Custom format ID (column A) to include 3 digits
Right click on the column and select format cells
21
Formatting Cells Task: Custom format ID (column A) to include 3 digits
Select custom format and option to include 3 figures without a decimal point. Replace # signs with zeros.
22
Formatting Cells Task: Custom format ID (column A) to include 3 digits
Formatted IDs to all include 3 digits
23
Formatting Cells Task: highlight the row for patient number 147 (row 9) in red
24
Formatting Cells Task: highlight the row for patient number 147 (row 9) in red
25
Formatting Cells Task: highlight the row for patient number 147 (row 9) in red
26
Formatting Cells Task: highlight the row for patient number 147 (row 9) in red Patient 147 highlighted in red
27
Formatting Cells Task: change columns P, Q, R (SBP, DBP, HR) to be vertical or horizontal
28
Formatting Cells Task: change columns P, Q, R (SBP, DBP, HR) to be vertical or horizontal
29
Formatting Cells Task: change columns P, Q, R (SBP, DBP, HR) to be vertical or horizontal Enter 90 degrees orientation for vertical text or 180 degrees orientation for horizontal
30
Format as a Table Task: change excel spreadsheet to a table
Make desired selection for table style or customize
31
Format as a Table Task: change excel spreadsheet to a table
If do not select this, Excel will add headers to your table
32
Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Conditional formatting is under the home tab
33
Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Select format
34
Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Select green and OK
35
Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Cells with values >100 are highlighted in green
36
Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never Highlight entire column next to HF clinic duration. Right click and select insert.
37
Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never
38
Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never
39
Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never
40
Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never Dropdown menu for selection with current, past, never
41
Data Validation Task: change columns “SBP” and “DBP” to only allow numbers
42
Data Validation Task: change columns “SBP” and “DBP” to only allow numbers
43
Data Validation Task: change columns “SBP” and “DBP” to only allow numbers Set criteria specific to expected values in that column
44
Data Validation Task: Exploration
Message will appear before user enters any information
45
Data Validation Task: Exploration
Stop will prevent user from entering incorrect information. Warning will warn the user (can include a message) and requires that the user okay entering invalidated information.
46
Basic Excel Functions Task: calculate the mean, maximum and minimum age Select cell where you would like to insert the function
47
Basic Excel Functions Task: calculate the mean, maximum and minimum age You can also search for functions
48
Basic Excel Functions Task: calculate the mean, maximum and minimum age Click on this box to select your data range (instead of typing the range) Enter or select your desired data range. Could also enter values individually in the number boxes
49
Basic Excel Functions Task: calculate the mean, maximum and minimum age Calculated mean, maximum, and minimum ages in dataset.
50
CountIF and CountIFS Task: count the number of females in the sample using Countif Select cell where you would like to insert the function CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
51
CountIF and CountIFS Task: count the number of females in the sample using Countif CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
52
CountIF and CountIFS Task: count the number of females in the sample using Countif CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria If counting something that is a word or letter, put in double quotes
53
CountIF and CountIFS Task: count the number of females in the sample using Countif Total number of female patients in the dataset. CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
54
CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs Select cell where you would like to insert the function CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
55
CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
56
CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs Use >= to denote “greater than or equal to” CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
57
CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs Number of female patients greater than or equal to 80 years old. CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria
58
IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available I typically enter the formula into one cell first, then drag down to the rest of the cells (if applicable) later
59
IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available
60
IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available Since there are missing EF values in column I, need to embed an IF statement using “” to denote blank
61
IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available Second IF statement indicating to show a 1 if I2<40, or otherwise to show a 0
62
IF Statements Final IF formula: =IF(I2="","",IF(I2<40, 1, 0))
Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available Final IF formula: =IF(I2="","",IF(I2<40, 1, 0)) This could be read as: IF I2 is blank, then insert blank. Otherwise, if I2 is less than 40 insert 1. Otherwise (if I2 is NOT less than 40), insert 0. When this formula is dragged down the column, I2 will change to refer to I3, then I4, etc.
63
Vlookup Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1
64
Vlookup Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1
65
Vlookup Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1 References the first “linking” cell (ID) that is present in both datasets All cells of the second spreadsheet (Sheet1) – this is where the formula will search for values In Sheet1, column B or the second column, contains the new values we want to insert into the cohort spreadsheet False indicates that we want to find an exact match.
66
Vlookup Final formula: =VLOOKUP(A2,Sheet1!A$2:B$51,2,FALSE)
Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1 Final formula: =VLOOKUP(A2,Sheet1!A$2:B$51,2,FALSE) Note that in Sheet1 there are only 51 patients with a follow-up EF whereas there are 223 patients in the cohort. Therefore, when this formula is applied to other cells, many will show “#N/A” which means not available.
67
Pivot Tables Task: Display gender by COPD status in a separate worksheet
68
Pivot Tables Task: Display gender by COPD status in a separate worksheet
69
Pivot Tables Task: Display gender by COPD status in a separate worksheet
70
Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status
71
Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status
72
Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status
73
Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status
74
Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol
75
Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol
76
Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol
77
Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol Select the dropdown arrow to view filter options
78
Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol
79
Statistical Tests Options for Statistical Tests (if you don’t have access to a biostatistician/analyst) Many basic statistical tests can be performed in Excel Learn to use some of the open source or free software “R” is very common and robust: “PSPP”, open source similar to SPSS: If you are affiliated with a University, you may have free access to some statistical software For example, SAS University Edition (SAS Studio), JMP Using Excel is probably best for quick calculations as the options for statistical tests are limited. However, if your analysis is relatively straightforward, it could be used. Keep in mind that if you are seeking to publish your work, use of an official statistical analysis software is probably preferred. For R, you may be able to change the user interface (such to use drop down menus instead of programming), can consider installing some packages to change your interface. Such as JGR and Deducer. SAS studio runs SAS through your web browser and you may be able to access it for free depending on contracts in place at your University.
80
Data Analysis Toolpak To activate: file options add-ins
Go to manage Excel add-ins and select the analysis toolpak to activate The data analysis toolpak offers more options for tests than are available – take a look to see what is available. We will not go through how to use all of the statistical tests here. If you aren’t sure how to use one – suggest using the help feature and/or searching for help online. One nice thing about the analysis toolpak is it can be a faster way to do multiple tests at once. For example, using the “descriptive statistics” option on a continuous variable can give you a broad overview of the data including mean, standard error, median, mode, standard deviation, variance, kurtosis, skewness, minimum, maximum, sum, count. If activated, you can select the analysis toolpak options under “data” by selecting data analysis
81
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients If using the basic t-test (not the analysis toolpak), suggest setting up desired data in two columns in a separate sheet. If keep the values in the same column of the original spreadsheet (‘cohort’), this can lead to inaccuracies in the calculation. (I believe this is because excel will calculate the variance of the sample using the whole column). T-test: tests for equality of the population means that underlie each sample Note that Excel can be used for some basic statistical tests successfully, but there are some challenges you could face: Missing values are handled inconsistently at times. The only way for Excel to recognize missing data is using a blank cell. You may need to reorganize your data into different arrangements depending on what analyses you are doing. Many analyses can only be done on one column at a time, which can make it inconvenient Variances Equal variances: assumes that the two data sets come from distributions with the same variances (homescedastic). Use to determine whether two samples are likely to have come from distributions with equal population means. Unequal variances: assumes unequal variances (heteroscedastic). Can use to determine whether the two samples are likely to have come from distributions with equal population means. better to assume this if not sure, as it is more strict If variances are not known, use the Z-test for two sample means. Ftest Returns the two-tailed probability that the variances in the two groups are not significantly different. Use to determine if there are different variances.
82
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Go to formulas insert function select “ttest”
83
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Array1: range of data for COPD patients Array2: range of data for Non-COPD patients Tails: 2 means two-tailed test; use 1 if want a one-tailed test Type: select 1 for paired data, 2 if equal variance, and 3 for unequal variance
84
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients P-value
85
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Other option: t-test using analysis toolpak Data data analysis select desired t-test
86
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Other option: t-test using analysis toolpak Select range of age data for COPD and non-COPD. Option to change alpha level. One nice thing about doing it this way is that you don’t have to enter the ages into new columns. You can calculate with the ages in the same column, but you need to make sure the data is sorted by group first.
87
Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Other option: t-test using analysis toolpak As you can see, it provides more data than using the t-test without the analysis toolpak. P-value for two-sided t-test
88
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 1: create a 2 x 2 contingency table with observed values Create using a pivot table or by hand. Need to count number of males and females in each group, and enter into the table. Also need to add up the total in each row, column, and overall total.
89
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 2: create a 2 x 2 contingency table with expected values Copy expected table, including totals. Then delete observed values.
90
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 2: create a 2 x 2 contingency table with expected values Fill in expected values: formula = ((row total x column total)/grand total) A = (155*86/223) C = (86*68/223) B = (155*137/223) D = (137*68/223)
91
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 2: create a 2 x 2 contingency table with expected values
92
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 3: execute chi-square test Go to formulas insert function select “chitest”
93
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 3: execute chi-square test Observed values from first table Expected values from second table
94
Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group
95
Data Collection Tools
96
Examples of Data Collection Tools
Centralized database Ex. Redcap: HIPAA compliant, with online and offline data capture capabilities Allows for data validation, branching logic, basic calculations, and can export entered data to most statistical packages Spreadsheet-based, e.g. Excel Redcap is a secure web application for building and managing online surveys and databases. It is HIPAA-compliant. It is geared toward support of online and offline data capture for research studies and operations. One nice thing about redcap is that multiple researchers working on the same project can access the database remotely without having to worry about sharing in a HIPAA compliant fashion (such as with Excel). Redcap provides: An easy-to-use data entry system, with data validation The ability to import data from external sources Automated exports to most statistical packages Audit trails for tracking data changes and exports Branching logics, calculations, and answer piping to increase functionality A sophisticated survey tool
97
Excel as a Data Collection Tool
Keep in mind what you will be doing with the spreadsheet after entering the data Consider how to enter data to make analysis easier Suggestions to consider: Better to enter more “raw” data, and perform calculations in Excel For example, if calculating length of stay: create a column for admission date and discharge date, then calculate difference in dates in another column Enter data into Excel cell in a usable form Can use a separate data dictionary to code entries Avoid entering more than one data point into the same cell Cannot perform calculations Employ data validation techniques to avoid errors If someone else will be doing the analysis, be sure to communicate with them about how to enter data to make it easier to analyze -for example, if using SAS, do not include spaces in variable names: metoprolol equivalent discharge metoprolol_equivalent_discharge
98
Importance of Creating a Good Data Collection Tool to Minimize Possible Errors
From: Examples of errors that occurred: -interpreting a gene “SEPT2” as September second -misindentifying numbers as in scientific notation format, making the numbers larger Accessed September 24, 2016.
100
Acknowledgements Thank you to Sarah Billups, PharmD, BCPS
101
Now…work on your data collection tool
If time allows during the session… Split into groups of 2-3 Revise your data collection tool employing principles discussed today Review and provide feedback to your partner(s) about their tool
102
ASPIRE Workshop 5: Data Collection Tools
Lauren Heath, PharmD Research Fellow University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences & Kaiser Permanente Colorado
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.