Presentation is loading. Please wait.

Presentation is loading. Please wait.

ASPIRE Workshop 5: Data Collection Tools

Similar presentations


Presentation on theme: "ASPIRE Workshop 5: Data Collection Tools"— Presentation transcript:

1 ASPIRE Workshop 5: Data Collection Tools
Lauren Heath, PharmD Research Fellow University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences & Kaiser Permanente Colorado

2 Outline Data management techniques using Microsoft Excel®
Note: we will be using Microsoft Excel 2013 for PC Tools for statistical analysis Data collection tools Develop your data collection tool

3 Data Management

4 Quick Excel® Tips ‘N Tricks
Excel does not treat blank cells as zero If there are blank cells and you are performing a function on these cells, use “” to indicate blank If referring to words in an Excel formula, such as counting the number of females, enter the words in quotes, i.e. “female” If intend to copy your formula and apply it down an entire column and do not want the cells references in a formula to change, use the $ sign: For example: A$1:B$88 vs. A1: B88

5 Basic Spreadsheet Manipulation
Using the data fill handle Task: enter “4” into HF clinic duration and use the fill handle to drag down 10 cells (then undo) Hold mouse over bottom right corner until a ‘+’ forms. This is the fill handle.

6 Basic Spreadsheet Manipulation
Using the data fill handle

7 Basic Spreadsheet Manipulation
Select the entire column/row/spreadsheet Task 1: select row 3 of the spreadsheet Select the number 3 on the left. A black arrow should form.

8 Basic Spreadsheet Manipulation
Select the entire column/row/spreadsheet Task 2: select column D of the spreadsheet Select column D at the top. A black arrow should form.

9 Basic Spreadsheet Manipulation
Select the entire column/row/spreadsheet Task 3: select the entire spreadsheet Select the top left corner where there is a triangle. A white plus sign should form.

10 Basic Spreadsheet Manipulation
Paste special: can retain formatting or formulas when copy/paste Task: copy the first 5 columns and first 10 rows, then use paste special to retain values and source formulas Recommend take a look at the various paste special options to know what options are available

11 Basic Spreadsheet Manipulation
Paste special: can retain formatting or formulas when copy/paste Task: copy the first 5 columns and first 10 rows, then use paste special to retain values and source formulas Recommend take a look at the various paste special options to know what options are available Select values & source formatting (E)

12 Freeze Panes(s) in a Dataset
Task: freeze top row of dataset This is very helpful to do if you are entering may rows of data and want to scroll down and still see the top row

13 Freeze Panes(s) in a Dataset
Task: freeze both top row and first column of dataset Select first cell next to the column and row of interest (in this case, cell B2) This is very helpful to do if you are entering may rows of data and want to scroll down and still see the top row

14 Freeze Panes(s) in a Dataset
Task: freeze both top row and first column of dataset While B2 is selected, select freeze panes This is very helpful to do if you are entering may rows of data and want to scroll down and still see the top row

15 Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Select all of column F

16 Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Right click on the column and select format cells

17 Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Select date and choose the desired formatting option. Click OK to accept.

18 Formatting Cells Task: format column F (discharge date) as a date (mm/dd/yy) Formatted dates

19 Formatting Cells Task: Custom format ID (column A) to include 3 digits
Select all of column A

20 Formatting Cells Task: Custom format ID (column A) to include 3 digits
Right click on the column and select format cells

21 Formatting Cells Task: Custom format ID (column A) to include 3 digits
Select custom format and option to include 3 figures without a decimal point. Replace # signs with zeros.

22 Formatting Cells Task: Custom format ID (column A) to include 3 digits
Formatted IDs to all include 3 digits

23 Formatting Cells Task: highlight the row for patient number 147 (row 9) in red

24 Formatting Cells Task: highlight the row for patient number 147 (row 9) in red

25 Formatting Cells Task: highlight the row for patient number 147 (row 9) in red

26 Formatting Cells Task: highlight the row for patient number 147 (row 9) in red Patient 147 highlighted in red

27 Formatting Cells Task: change columns P, Q, R (SBP, DBP, HR) to be vertical or horizontal

28 Formatting Cells Task: change columns P, Q, R (SBP, DBP, HR) to be vertical or horizontal

29 Formatting Cells Task: change columns P, Q, R (SBP, DBP, HR) to be vertical or horizontal Enter 90 degrees orientation for vertical text or 180 degrees orientation for horizontal

30 Format as a Table Task: change excel spreadsheet to a table
Make desired selection for table style or customize

31 Format as a Table Task: change excel spreadsheet to a table
If do not select this, Excel will add headers to your table

32 Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Conditional formatting is under the home tab

33 Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Select format

34 Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Select green and OK

35 Conditional Formatting
Task: use conditional formatting to shade cells in “metoprolol equivalent_discharge” (column N) with value >100 in green Cells with values >100 are highlighted in green

36 Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never Highlight entire column next to HF clinic duration. Right click and select insert.

37 Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never

38 Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never

39 Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never

40 Data Validation Task: create a column after “HF clinic duration” for “smoking status” with the following choices: current, past, never Dropdown menu for selection with current, past, never

41 Data Validation Task: change columns “SBP” and “DBP” to only allow numbers

42 Data Validation Task: change columns “SBP” and “DBP” to only allow numbers

43 Data Validation Task: change columns “SBP” and “DBP” to only allow numbers Set criteria specific to expected values in that column

44 Data Validation Task: Exploration
Message will appear before user enters any information

45 Data Validation Task: Exploration
Stop will prevent user from entering incorrect information. Warning will warn the user (can include a message) and requires that the user okay entering invalidated information.

46 Basic Excel Functions Task: calculate the mean, maximum and minimum age Select cell where you would like to insert the function

47 Basic Excel Functions Task: calculate the mean, maximum and minimum age You can also search for functions

48 Basic Excel Functions Task: calculate the mean, maximum and minimum age Click on this box to select your data range (instead of typing the range) Enter or select your desired data range. Could also enter values individually in the number boxes

49 Basic Excel Functions Task: calculate the mean, maximum and minimum age Calculated mean, maximum, and minimum ages in dataset.

50 CountIF and CountIFS Task: count the number of females in the sample using Countif Select cell where you would like to insert the function CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

51 CountIF and CountIFS Task: count the number of females in the sample using Countif CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

52 CountIF and CountIFS Task: count the number of females in the sample using Countif CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria If counting something that is a word or letter, put in double quotes

53 CountIF and CountIFS Task: count the number of females in the sample using Countif Total number of female patients in the dataset. CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

54 CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs Select cell where you would like to insert the function CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

55 CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

56 CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs Use >= to denote “greater than or equal to” CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

57 CountIF and CountIFS Task: count the number of females > 80 years old in the sample using Countifs Number of female patients greater than or equal to 80 years old. CountIFs allows you to count based on MULTIPLE set criteria whereas with CountIF you can only count based on one criteria

58 IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available I typically enter the formula into one cell first, then drag down to the rest of the cells (if applicable) later

59 IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available

60 IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available Since there are missing EF values in column I, need to embed an IF statement using “” to denote blank

61 IF Statements Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available Second IF statement indicating to show a 1 if I2<40, or otherwise to show a 0

62 IF Statements Final IF formula: =IF(I2="","",IF(I2<40, 1, 0))
Task: insert a column after “BL_EF_actual” called “LVSD_BL” that displays a value of 1 if the patient’s EF is <40% and 0 if EF is >40%, and blank if the EF is not available Final IF formula: =IF(I2="","",IF(I2<40, 1, 0)) This could be read as: IF I2 is blank, then insert blank. Otherwise, if I2 is less than 40 insert 1. Otherwise (if I2 is NOT less than 40), insert 0. When this formula is dragged down the column, I2 will change to refer to I3, then I4, etc.

63 Vlookup Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1

64 Vlookup Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1

65 Vlookup Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1 References the first “linking” cell (ID) that is present in both datasets All cells of the second spreadsheet (Sheet1) – this is where the formula will search for values In Sheet1, column B or the second column, contains the new values we want to insert into the cohort spreadsheet False indicates that we want to find an exact match.

66 Vlookup Final formula: =VLOOKUP(A2,Sheet1!A$2:B$51,2,FALSE)
Task: insert a column after the one you just created (LVSD_BL) that includes the discharge EF data available in Sheet 1 Final formula: =VLOOKUP(A2,Sheet1!A$2:B$51,2,FALSE) Note that in Sheet1 there are only 51 patients with a follow-up EF whereas there are 223 patients in the cohort. Therefore, when this formula is applied to other cells, many will show “#N/A” which means not available.

67 Pivot Tables Task: Display gender by COPD status in a separate worksheet

68 Pivot Tables Task: Display gender by COPD status in a separate worksheet

69 Pivot Tables Task: Display gender by COPD status in a separate worksheet

70 Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status

71 Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status

72 Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status

73 Pivot Tables Task: Display mean metoprolol equivalent dose at discharge by COPD status

74 Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol

75 Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol

76 Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol

77 Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol Select the dropdown arrow to view filter options

78 Pivot Tables Task: display mean SBP by beta blocker agent at discharge, then limit to just patients receiving metoprolol

79 Statistical Tests Options for Statistical Tests (if you don’t have access to a biostatistician/analyst) Many basic statistical tests can be performed in Excel Learn to use some of the open source or free software “R” is very common and robust: “PSPP”, open source similar to SPSS: If you are affiliated with a University, you may have free access to some statistical software For example, SAS University Edition (SAS Studio), JMP Using Excel is probably best for quick calculations as the options for statistical tests are limited. However, if your analysis is relatively straightforward, it could be used. Keep in mind that if you are seeking to publish your work, use of an official statistical analysis software is probably preferred. For R, you may be able to change the user interface (such to use drop down menus instead of programming), can consider installing some packages to change your interface. Such as JGR and Deducer. SAS studio runs SAS through your web browser and you may be able to access it for free depending on contracts in place at your University.

80 Data Analysis Toolpak To activate: file  options  add-ins
Go to manage Excel add-ins and select the analysis toolpak to activate The data analysis toolpak offers more options for tests than are available – take a look to see what is available. We will not go through how to use all of the statistical tests here. If you aren’t sure how to use one – suggest using the help feature and/or searching for help online. One nice thing about the analysis toolpak is it can be a faster way to do multiple tests at once. For example, using the “descriptive statistics” option on a continuous variable can give you a broad overview of the data including mean, standard error, median, mode, standard deviation, variance, kurtosis, skewness, minimum, maximum, sum, count. If activated, you can select the analysis toolpak options under “data” by selecting data analysis

81 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients If using the basic t-test (not the analysis toolpak), suggest setting up desired data in two columns in a separate sheet. If keep the values in the same column of the original spreadsheet (‘cohort’), this can lead to inaccuracies in the calculation. (I believe this is because excel will calculate the variance of the sample using the whole column). T-test: tests for equality of the population means that underlie each sample Note that Excel can be used for some basic statistical tests successfully, but there are some challenges you could face: Missing values are handled inconsistently at times. The only way for Excel to recognize missing data is using a blank cell. You may need to reorganize your data into different arrangements depending on what analyses you are doing. Many analyses can only be done on one column at a time, which can make it inconvenient Variances Equal variances: assumes that the two data sets come from distributions with the same variances (homescedastic). Use to determine whether two samples are likely to have come from distributions with equal population means. Unequal variances: assumes unequal variances (heteroscedastic). Can use to determine whether the two samples are likely to have come from distributions with equal population means. better to assume this if not sure, as it is more strict If variances are not known, use the Z-test for two sample means. Ftest Returns the two-tailed probability that the variances in the two groups are not significantly different. Use to determine if there are different variances.

82 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Go to formulas  insert function  select “ttest”

83 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Array1: range of data for COPD patients Array2: range of data for Non-COPD patients Tails: 2 means two-tailed test; use 1 if want a one-tailed test Type: select 1 for paired data, 2 if equal variance, and 3 for unequal variance

84 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients P-value

85 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Other option: t-test using analysis toolpak Data  data analysis  select desired t-test

86 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Other option: t-test using analysis toolpak Select range of age data for COPD and non-COPD. Option to change alpha level. One nice thing about doing it this way is that you don’t have to enter the ages into new columns. You can calculate with the ages in the same column, but you need to make sure the data is sorted by group first.

87 Statistical tests in Excel: T-test
Task: compare age in the COPD versus non-COPD patients Other option: t-test using analysis toolpak As you can see, it provides more data than using the t-test without the analysis toolpak. P-value for two-sided t-test

88 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 1: create a 2 x 2 contingency table with observed values Create using a pivot table or by hand. Need to count number of males and females in each group, and enter into the table. Also need to add up the total in each row, column, and overall total.

89 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 2: create a 2 x 2 contingency table with expected values Copy expected table, including totals. Then delete observed values.

90 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 2: create a 2 x 2 contingency table with expected values Fill in expected values: formula = ((row total x column total)/grand total) A = (155*86/223) C = (86*68/223) B = (155*137/223) D = (137*68/223)

91 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 2: create a 2 x 2 contingency table with expected values

92 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 3: execute chi-square test Go to formulas  insert function  select “chitest”

93 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group Step 3: execute chi-square test Observed values from first table Expected values from second table

94 Statistical tests in Excel: Chi-Square test
Task: compare the number of males and females in each group

95 Data Collection Tools

96 Examples of Data Collection Tools
Centralized database Ex. Redcap: HIPAA compliant, with online and offline data capture capabilities Allows for data validation, branching logic, basic calculations, and can export entered data to most statistical packages Spreadsheet-based, e.g. Excel Redcap is a secure web application for building and managing online surveys and databases. It is HIPAA-compliant. It is geared toward support of online and offline data capture for research studies and operations. One nice thing about redcap is that multiple researchers working on the same project can access the database remotely without having to worry about sharing in a HIPAA compliant fashion (such as with Excel). Redcap provides: An easy-to-use data entry system, with data validation The ability to import data from external sources Automated exports to most statistical packages Audit trails for tracking data changes and exports Branching logics, calculations, and answer piping to increase functionality A sophisticated survey tool

97 Excel as a Data Collection Tool
Keep in mind what you will be doing with the spreadsheet after entering the data Consider how to enter data to make analysis easier Suggestions to consider: Better to enter more “raw” data, and perform calculations in Excel For example, if calculating length of stay: create a column for admission date and discharge date, then calculate difference in dates in another column Enter data into Excel cell in a usable form Can use a separate data dictionary to code entries Avoid entering more than one data point into the same cell Cannot perform calculations Employ data validation techniques to avoid errors If someone else will be doing the analysis, be sure to communicate with them about how to enter data to make it easier to analyze -for example, if using SAS, do not include spaces in variable names: metoprolol equivalent discharge  metoprolol_equivalent_discharge

98 Importance of Creating a Good Data Collection Tool to Minimize Possible Errors
From: Examples of errors that occurred: -interpreting a gene “SEPT2” as September second -misindentifying numbers as in scientific notation format, making the numbers larger Accessed September 24, 2016.

99

100 Acknowledgements Thank you to Sarah Billups, PharmD, BCPS

101 Now…work on your data collection tool
If time allows during the session… Split into groups of 2-3 Revise your data collection tool employing principles discussed today Review and provide feedback to your partner(s) about their tool

102 ASPIRE Workshop 5: Data Collection Tools
Lauren Heath, PharmD Research Fellow University of Colorado Skaggs School of Pharmacy and Pharmaceutical Sciences & Kaiser Permanente Colorado


Download ppt "ASPIRE Workshop 5: Data Collection Tools"

Similar presentations


Ads by Google