FINCA Client Assessment Tool (FCAT): Advanced Data Entry Presented: September 2010 Eurasia Marketing Manager Meeting.


1 FINCA Client Assessment Tool (FCAT): Advanced Data Entry Presented: September 2010 Eurasia Marketing Manager Meeting

2 Data Entry: Electronic Transfer and Manual Entry
Raw data entry can be performed by direct transfer from Palm devices OR manually (if Palm devices were not used to collect the data). If data is entered manually, extra steps are necessary to prevent data entry errors.
Extra Steps:
(1) Use the Pre-Made Template (provided by HQ): column headings must be identical.
(2) Redundant Entry: each data set should be entered twice, in two separate Excel workbooks.
(3) Data Merger: once data entry is complete, the two sets should be merged to ensure consistency.

3 Data Merger and Cleaning Exercise
We’ll work with these two files for the exercise. Save each of them someplace they’ll be easy to find.
Note: the actual file names should include a unique ID indicating the individual who entered the data.

4 Manual Data Merge Exercise: Consolidating and Filtering Data
(1) Open the two spreadsheets provided.
(2) In the first spreadsheet, select all the data, omitting the column headers, and COPY it (Ctrl+C).

5 Manual Data Merge Continued
(3) PASTE the copied data into a blank cell on the second spreadsheet (below the existing data).
(4) Filter the combined data using Excel’s AutoFilter.
(5) Sort the data by interview number (A to Z).
a. This will allow you to see identical interviews side by side.

6 Consistency Check for Manually Entered Data
Checking data consistency:
(1) First, scroll down the list to ensure that interview numbers have been entered correctly (the format of interview numbers will be covered later).
(2) Once any necessary corrections have been made to interview numbers, filter the data by interview number again, using the drop-down menu on the interview number column header.
(3) Starting with the first interview on the spreadsheet, scroll from left to right, comparing the selected interview and its duplicate.
(4) Note any inconsistencies between interviews with the same interview number and correct them so that one of the two in the pair contains accurate information.
After all necessary corrections have been made, delete the duplicate interview.
Note: Only make changes to one of the two interviews in each pair. To prevent loss of data, NEVER delete RAW data without first backing it up in another file.
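The pairwise comparison described above can also be scripted. The following is only a minimal sketch, assuming each workbook has been exported to a list of rows keyed by column heading; the column names (`interview_no`, `health`, `food_spend`) and the sample values are hypothetical, not taken from the FCAT template.

```python
from collections import defaultdict

def find_inconsistencies(rows, key="interview_no"):
    """Group double-entered rows by interview number and list fields that disagree."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    report = {}
    for interview_no, entries in sorted(groups.items()):
        if len(entries) != 2:
            # Not exactly two entries: a mistyped interview number is a likely cause.
            report[interview_no] = f"expected 2 entries, found {len(entries)}"
            continue
        first, second = entries
        diffs = [col for col in first if col != key and first[col] != second.get(col)]
        if diffs:
            report[interview_no] = diffs
    return report

# Hypothetical sample: two people entered the same three interviews.
entry_a = [
    {"interview_no": "GA20101001", "health": "2", "food_spend": "40"},
    {"interview_no": "GA20101002", "health": "1", "food_spend": "55"},
    {"interview_no": "GA20101003", "health": "3", "food_spend": "30"},
]
entry_b = [
    {"interview_no": "GA20101001", "health": "2", "food_spend": "40"},
    {"interview_no": "GA20101002", "health": "1", "food_spend": "65"},
    {"interview_no": "GA20101030", "health": "3", "food_spend": "30"},  # mistyped ID
]

# Stack the two entry sets, as in the manual paste step, then compare pairs.
report = find_inconsistencies(entry_a + entry_b)
for interview_no, problem in report.items():
    print(interview_no, problem)
```

The script only reports differences. Deciding which of the two entries is correct still requires checking the original questionnaire, and the raw files are left untouched.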

7 On the subject of merging data…
Note: Manually merging data is NOT the same as using the “merge” or “consolidate” functions in Excel.
“Merge” is designed for the editing of a single shared spreadsheet.
“Consolidate” allows you to combine two datasets into one, but won’t check for inconsistencies.
It may be possible to write a macro that would make the manual data merge process more efficient! Do you have any suggestions on how to perform this task more efficiently?
Now that the data’s merged, let’s move on to general cleaning…

8 Data Cleaning Challenges in FCAT
Once your data is merged and checked for consistency, you can start cleaning… Here are some things to look out for:
Inconsistent values
Outliers
Missing values
Calculated values
Others

9 Inconsistent Values
1. Definition: When a second response is made invalid (either impossible or simply inaccurate) by an earlier given answer
2. Examples:
Continue w/ FINCA? (1=Yes, 2=No) | Who made the decision to leave? | Why did FINCA or Village Bank ask you to leave? | Do you plan to return in the future? (1=Yes, 2=No)
2 | Village Bank | Client defaulted | 1
2 | Client | N/A | 1
1 | (blank) | Client defaulted | N/A
3. Treatment:
a. Filter
b. Annotate (shaded cells show inconsistencies)
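A check like the one in the example table can be sketched in code. The two skip rules below are assumptions made for illustration (a client who continues with FINCA should have N/A for every exit question, and a client who chose to leave should have N/A for the reason FINCA or the Village Bank asked them to leave); the field names are hypothetical as well. The survey design sheet remains the authority on the actual rules.

```python
FOLLOW_UPS = ["who_decided", "why_asked_to_leave", "plan_to_return"]
SKIP_VALUES = ("N/A", "")  # blanks are treated as missing, not as inconsistent

def check_exit_questions(row):
    """Return the fields that contradict an earlier answer (assumed skip rules)."""
    problems = []
    if row["continue"] == "1":
        # Assumed rule: a continuing client should have skipped all exit questions.
        problems = [f for f in FOLLOW_UPS if row[f] not in SKIP_VALUES]
    elif row["continue"] == "2" and row["who_decided"] == "Client":
        # Assumed rule: if the client chose to leave, nobody asked them to leave.
        if row["why_asked_to_leave"] not in SKIP_VALUES:
            problems.append("why_asked_to_leave")
    return problems

# Hypothetical rows modelled on the example table.
rows = [
    {"continue": "2", "who_decided": "Client", "why_asked_to_leave": "N/A", "plan_to_return": "1"},
    {"continue": "1", "who_decided": "", "why_asked_to_leave": "Client defaulted", "plan_to_return": "N/A"},
]

flags = [check_exit_questions(row) for row in rows]
for number, problems in enumerate(flags, start=1):
    print(f"row {number}:", problems or "consistent")
```

The output is a list of fields to annotate, which matches the treatment on the slide: filter and annotate, without changing the values automatically.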

10 Outliers
1. Definition: Response outside the range of values

11 Outliers (continued)
2. Examples:
1) In general, how is your health at this time? 1. Excellent 2. Good 3. Poor 4. Very Poor. Answer: 7 (Outlier: response is out of answer range)
2) How much does your household spend per week for food? Answer in Ecuador: $10,000 (Outlier: response amount is very unlikely)
3. Treatment:
a. Filter
b. Annotate
c. Correct value, if possible (e.g. mean of positive values)
Special mention: Inliers. These occur when a question calls for integers and the recorded answer is a decimal, e.g. recording a child’s age as .5 if he or she has yet to complete a year.
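The three kinds of problem above (a code outside the answer range, an unlikely amount, and a decimal where an integer is expected) can be flagged with a short script. This is a sketch only: the field names are hypothetical, and `max_amount` is an assumed threshold that an analyst would have to choose for each country and currency.

```python
def flag_outliers(rows, valid_codes, max_amount):
    """Return (row index, field, reason) for each suspicious value."""
    flags = []
    for index, row in enumerate(rows):
        if row["health"] not in valid_codes:
            flags.append((index, "health", "code outside answer range"))
        if row["food_spend"] > max_amount:
            flags.append((index, "food_spend", "amount unlikely"))
        if not float(row["child_age"]).is_integer():
            flags.append((index, "child_age", "decimal where integer expected"))
    return flags

# Hypothetical sample rows; the second one reproduces the slide's examples.
rows = [
    {"health": 2, "food_spend": 40, "child_age": 3},
    {"health": 7, "food_spend": 10000, "child_age": 0.5},
]

flags = flag_outliers(rows, valid_codes={1, 2, 3, 4}, max_amount=500)
for flag in flags:
    print(flag)
```

Flagged values are only candidates for annotation or correction; an unusual amount may still be genuine and should be justified rather than silently changed.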

12 Missing Values
1. Definition:
a. Stated information not recorded, not legitimate skips
b. _____
2. Examples:
Continue w/ FINCA? (1=Yes, 2=No) | Who made the decision to leave? | Why did FINCA or Village Bank ask you to leave? | Do you plan to return in the future? (1=Yes, 2=No)
2 | Village Bank | Group dissolved | 1
2 | (blank) | Client defaulted | 2
1 | N/A (remaining cells blank)
3. Treatment:
a. Filter
b. Annotate
c. Correct value, if possible (in shaded cells)
Ex. If you can distinguish between missing values and legitimate skips, replace missing values with the mean over a defined sample (e.g. branch or region).
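The mean-replacement treatment can be sketched as follows. The sketch assumes that missing values are stored as `None` and legitimate skips as the string "N/A"; that encoding, the field names, and the added `note` column are all illustrative choices rather than part of the FCAT template.

```python
from collections import defaultdict
from statistics import mean

def impute_by_group(rows, field, group_field):
    """Return a cleaned copy of rows with missing values replaced by the group mean."""
    observed = defaultdict(list)
    for row in rows:
        if row[field] is not None and row[field] != "N/A":
            observed[row[group_field]].append(row[field])
    group_means = {group: mean(values) for group, values in observed.items()}

    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw data is never modified
        if row[field] is None and row[group_field] in group_means:
            new_row[field] = group_means[row[group_field]]
            new_row["note"] = f"{field} imputed with {group_field} mean"
        cleaned.append(new_row)
    return cleaned

# Hypothetical sample: weekly food spending by branch.
raw = [
    {"branch": "A", "food_spend": 40},
    {"branch": "A", "food_spend": 60},
    {"branch": "A", "food_spend": None},   # missing value
    {"branch": "B", "food_spend": "N/A"},  # legitimate skip, left as is
]

cleaned = impute_by_group(raw, "food_spend", "branch")
for row in cleaned:
    print(row)
```

Working on a copy keeps the raw and cleaned data separate, and the `note` entry records each change so it can be carried into a data issue log.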

13 Calculated Values and Other Challenges
Calculated Values
1. Definition: Data derived from sub-aggregated variables
2. Examples: DPCE, PPP converted from local currency unit
3. Treatment: Record units of measure; check formulas
Others
Text is text; numbers are numbers. Do not write in text responses for columns that accept only numbers. Please use the “Other” or “Notes” columns for this purpose.

14 Cleaning Data – Do’s
Frequent and periodic: at the end of the day. It is much easier to clean 20 interviews than 80 or 320!
Smaller samples are easier to manage: this avoids locality effects on false identification and avoids contamination of derived variables (e.g. DPCE).
Keep two files: raw data and cleaned data. Always keep a back-up as well.
Record and annotate all data issues in a log or tracking document.
Techniques: filtering, histograms, pivot tables.
In other words, do not let data problems snowball.

15 Client ID Collection
Please collect Client ID information from EACH client interviewed. It is not a violation of privacy. You can assure clients that providing their personal information will not harm them in any way, and that their responses will be used to help make decisions that improve loan products and services.

16 SurveyID
For entry into the Data Warehouse, we need to create a PRIMARY KEY for the Main Form to link to the cleaned Subform. The code appears like this when finished: DC20083101 (2-letter country code, the year collected, and an overall interview number from one fellow).
Give each data collector a number (1, 2, or 3), and then add a column in BOTH the Main Form AND the Household Subform.*
* A survey ID column is already included in the data entry template.

17 SurveyID (cont’d)
Collector 1 should take his/her overall individual interview number and add 1000 to it, collector 2 should add 2000, and collector 3 should add 3000.
Country codes: Armenia=AR, Azerbaijan=AZ, Georgia=GA, Jordan=JO, Kosovo=KO, Kyrgyzstan=KG, Russia=RU, Tajikistan=TJ
Therefore, the 14th interview performed by Georgia collector #2 would be GA20102014. It would read that way in the Main Form AND the HH Subform. Please maintain this convention throughout data collection.
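The SurveyID convention can be expressed as a small function, shown here as a sketch. The function name is hypothetical, and the limit of 999 interviews per collector is an assumption added so that the numeric part always stays four digits; the slides do not state such a limit.

```python
COUNTRY_CODES = {
    "Armenia": "AR", "Azerbaijan": "AZ", "Georgia": "GA", "Jordan": "JO",
    "Kosovo": "KO", "Kyrgyzstan": "KG", "Russia": "RU", "Tajikistan": "TJ",
}

def make_survey_id(country, year, collector, interview_no):
    """Build a SurveyID: country code + year + (collector * 1000 + interview number)."""
    if collector not in (1, 2, 3):
        raise ValueError("collector must be 1, 2, or 3")
    if not 1 <= interview_no <= 999:
        # Assumption: more than 999 interviews would spill into the next collector's range.
        raise ValueError("interview_no must be between 1 and 999")
    return f"{COUNTRY_CODES[country]}{year}{collector * 1000 + interview_no}"

# The slide's example: 14th interview by Georgia collector #2 in 2010.
survey_id = make_survey_id("Georgia", 2010, 2, 14)
print(survey_id)  # GA20102014
```

Generating the ID once and copying it into both the Main Form and the HH Subform avoids retyping it, which is where mismatched keys tend to appear.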

18 Clean Data
Data is “clean” if:
All categorical codes match those in the survey design sheet (Ex.: match drinking water sources with codes 1-15)
All ordinal data are represented as whole numbers (Ex.: do not have 3.4 years of education)
Outliers have been justified
Missing data have been correctly annotated

19 Practice Preparing HH Sub Form Data Transferred from Palm
When household sub form data is transferred from the Palms into Excel, it displays in a manner that is not useful for analysis… See how data on each family member is stacked in the same columns. We have to copy and paste each of these so that all the HH info gathered in one interview appears on the same line.

20 HH Sub Form Data Practice Continued
Step 1: Copy and paste the column headings across the top line 15 times!
Step 2: Starting at the time stamp, cut and paste each row with the same interview number onto a single line (moving from left to right).

21 HH Sub Form Data Practice Continued
After deleting the blank spaces left from cutting / pasting, you’ll end up with something that looks like this:
Note how each row contains one distinct interview number and has data on all the members of that household.
Open this file if you’d like some practice.
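The cut-and-paste reshaping of the HH sub form can also be sketched in code. This assumes the stacked data has been exported as one row per household member; the column names are hypothetical, and numbering the repeated headings with a suffix (`age_1`, `age_2`, …) is an illustrative choice, since the slides simply repeat the headings unchanged.

```python
from collections import defaultdict

def stack_to_wide(rows, key="interview_no"):
    """Place all household members from one interview on a single row."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[key]].append(row)

    wide_rows = []
    for interview_no, members in grouped.items():
        wide = {key: interview_no}
        for n, member in enumerate(members, start=1):
            for col, value in member.items():
                if col != key:
                    wide[f"{col}_{n}"] = value  # e.g. age_1, age_2, ...
        wide_rows.append(wide)
    return wide_rows

# Hypothetical stacked sub form data: one row per family member.
stacked = [
    {"interview_no": "GA20102014", "relation": "Head", "age": 41},
    {"interview_no": "GA20102014", "relation": "Child", "age": 9},
    {"interview_no": "GA20102015", "relation": "Head", "age": 35},
]

wide = stack_to_wide(stacked)
for row in wide:
    print(row)
```

Each output row carries one distinct interview number, so it can be linked to the Main Form through the SurveyID.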

22 Questions?

