1 Preparing Your Data for Analysis
September 13-14, 2005, St. Louis, Missouri
Sponsored by the U.S. Department of Housing and Urban Development
Steve Poulin, University of Pennsylvania
Larry Buron, Abt Associates Inc.

2 General Data Quality Issues
- Missing Data
  - Missing Cases
  - Missing Responses
- Inaccurate Data

3 Missing Data Effects
- Biases results if subjects with valid data are significantly different from those with missing data.
- Concern about the effects of bias increases with the number of missing values.
- Missing personal identifying data (Social Security Number, name, birth date, and gender) make it more difficult to unduplicate client records, thereby inflating counts of homeless persons.
- Missing Program Exit Dates make it appear that clients have never exited a shelter, thereby overstating their length of time in shelter and inflating the count of homeless persons in the time period.
- Don’t Know and Refused responses have the same effect as blanks, although they may be useful for reducing the number of missing responses.

4 Checking for Missing Data: Missing Cases
- A one-day census for any shelter can be calculated by selecting cases with a Program Entry Date less than or equal to a particular date AND either:
  - a Program Exit Date greater than or equal to that date (or possibly the next date, depending on how exit date is collected), or
  - a Program Exit Date that is null.
- The occupancy rate for a date can be calculated by dividing the one-day census by the shelter’s capacity.
- Lower than expected occupancy rates suggest that the shelter is not recording all persons served.
- Higher than expected occupancy rates suggest a failure to enter exit dates.
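
The census and occupancy calculation on this slide can be sketched as follows. This is a minimal illustration, not an HMIS implementation: the record layout (pairs of entry and exit dates, with `None` for a missing exit), the function names, and the sample data are all assumptions made for the example.

```python
from datetime import date

def one_day_census(stays, census_date):
    """Count stays active on census_date.

    A stay is active when entry <= census_date and the exit date is either
    missing (None) or on/after census_date.
    """
    return sum(
        1
        for entry, exit_ in stays
        if entry <= census_date and (exit_ is None or exit_ >= census_date)
    )

def occupancy_rate(stays, census_date, capacity):
    """One-day census divided by the shelter's bed capacity."""
    return one_day_census(stays, census_date) / capacity

# Hypothetical records: (Program Entry Date, Program Exit Date or None)
stays = [
    (date(2005, 9, 1), date(2005, 9, 10)),
    (date(2005, 9, 5), None),                # exit never recorded
    (date(2005, 9, 12), date(2005, 9, 13)),  # enters after the census date
    (date(2005, 8, 1), date(2005, 8, 15)),   # exited before the census date
]

census = one_day_census(stays, date(2005, 9, 9))
rate = occupancy_rate(stays, date(2005, 9, 9), capacity=4)
```

Because stays with a null exit date are always counted, a growing number of unrecorded exits pushes the rate upward over time, which is the pattern the slide warns about.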

5 Checking for Missing Data: Missing Responses
- The potential for bias and reasons for missing responses can be explored by comparing profiles of persons with missing responses to those with valid responses.
- Compare the percent of missing across providers to determine if it is a site-specific training issue.
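
A provider-by-provider comparison of missing responses might look like the sketch below. The field names (`provider`, `ssn`) and the sample records are hypothetical, and treating only `None` and the empty string as missing is an assumption; a real check would probably also count Don’t Know and Refused codes.

```python
from collections import defaultdict

def percent_missing_by_provider(records, field):
    """Return {provider: percent of that provider's records missing `field`}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        provider = rec["provider"]
        totals[provider] += 1
        # Assumption: None and "" are the only representations of "missing".
        if rec.get(field) in (None, ""):
            missing[provider] += 1
    return {p: 100.0 * missing[p] / totals[p] for p in totals}

# Hypothetical client records from two providers
records = [
    {"provider": "A", "ssn": "123456789"},
    {"provider": "A", "ssn": ""},
    {"provider": "B", "ssn": None},
    {"provider": "B", "ssn": None},
    {"provider": "B", "ssn": None},
    {"provider": "B", "ssn": "987654321"},
]

rates = percent_missing_by_provider(records, "ssn")
```

A provider whose rate stands well above the others is a candidate for targeted training.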

6 Inaccurate Data
- Misrepresents the description of homeless clients.
- Inaccurate personal identifying data compromises the unduplication process.

7 Checking for Inaccurate Data
- Out-of-range category codes.
- Program entry dates later than program exit dates result in negative length-of-stay values.
- Birth dates later than program entry dates result in negative ages.
- Look at the distribution of each variable by provider type to look for values that do not make sense (e.g., over-age clients at a youth shelter or a person over 100 years old at any shelter).
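
The date checks listed on this slide can be expressed as a small validation routine. This is a sketch under assumed field names (`entry`, `exit`, `birth`); the over-100 test approximates a year as 365.25 days.

```python
from datetime import date

def find_date_problems(record):
    """Return a list of date inconsistencies found in one client record."""
    problems = []
    entry, exit_, birth = record["entry"], record["exit"], record["birth"]
    if exit_ is not None and entry > exit_:
        problems.append("entry after exit (negative length of stay)")
    if birth is not None and birth > entry:
        problems.append("birth after entry (negative age)")
    # Approximate check: more than 100 years between birth and entry.
    if birth is not None and (entry - birth).days > 100 * 365.25:
        problems.append("age over 100 at entry")
    return problems

# Hypothetical records
bad = {"entry": date(2005, 9, 10), "exit": date(2005, 9, 1), "birth": date(2006, 1, 1)}
old = {"entry": date(2005, 9, 1), "exit": None, "birth": date(1900, 1, 1)}
ok = {"entry": date(2005, 9, 1), "exit": date(2005, 9, 3), "birth": date(1980, 5, 5)}

bad_problems = find_date_problems(bad)
old_problems = find_date_problems(old)
ok_problems = find_date_problems(ok)
```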

8 Special Data Standards Issues: Social Security Number
- If only a partial SSN is recorded, the database should fill in the missing numbers with blanks so that the provided numbers are saved in the correct positions of the Social Security Number.
- This maximizes matching ability.
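
One way to picture the blank-filling rule is sketched below. The input convention (unknown digits typed as `?` or a space, dashes used only as separators) and the position-by-position comparison are assumptions for illustration; the slide does not specify how an HMIS should accept or match partial numbers.

```python
def normalize_ssn(raw):
    """Return a 9-character string; unknown digits become blanks.

    Assumes each unknown digit is marked with '?' or a space and that
    dashes are only separators.
    """
    chars = [c for c in raw if c != "-"]
    if len(chars) != 9:
        raise ValueError("expected 9 digit positions")
    return "".join(c if c.isdigit() else " " for c in chars)

def ssn_positions_agree(a, b):
    """True when every position known in both SSNs holds the same digit."""
    return all(x == y for x, y in zip(a, b) if x != " " and y != " ")

partial = normalize_ssn("???-??-6789")   # only the last four digits are known
full = normalize_ssn("123-45-6789")
other = normalize_ssn("123-45-6780")
```

Keeping the known digits in their original positions is what lets the partial record be compared against a complete one at all.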

9 Special Data Standards Issues: Name, Date of Birth, and Gender
- A combination of these data elements can be used to create an alternative unique identifier if Social Security Number is missing.
- Missing elements of Name (first name, middle name, last name) or Date of Birth (month, day, year) make the alternative identifier less unique.
- Names must be recorded consistently to maximize matching; nicknames should be avoided.
- The format for birth dates (MM/DD/YYYY) must be consistent to create alternative identifiers that match.
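
An alternative identifier of the kind described here could be built as in the sketch below. The separator, the upper-casing, and the function name are assumptions; the point is only that the same normalization is applied to every record before matching.

```python
from datetime import date

def alternative_id(first, middle, last, birth_date, gender):
    """Build an identifier from name, date of birth, and gender.

    Names are stripped and upper-cased so that ' smith' and 'Smith' match;
    the birth date is always rendered as MM/DD/YYYY.
    """
    name_parts = [p.strip().upper() for p in (first, middle, last)]
    dob = birth_date.strftime("%m/%d/%Y")
    return "|".join(name_parts + [dob, gender.strip().upper()])

# The same (hypothetical) person entered with different spacing and case
id_a = alternative_id("Maria", "L", "Lopez", date(1970, 3, 5), "F")
id_b = alternative_id(" maria", "l", "LOPEZ ", date(1970, 3, 5), "f")
```

Normalization of this kind cannot reconcile a nickname with a legal name, which is why the slide asks that nicknames be avoided at data entry.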

10 Special Data Standards Issues: Race
- HMIS should allow more than one of the five race categories to be selected.
- This will require the creation of multiple fields in the HMIS (e.g., race1, race2, etc.).

11 Special Data Standards Issues: Residence Prior to Program Entry
- Residences can be recorded in greater detail than the 17 categories available in the Data Standards, but they must fit in one and only one of these categories.
- Residences that do not fit in any of the 16 specified categories must be recorded as Other.

12 Special Data Standards Issues: Program Entry Date & Program Exit Date
- Missing Program Exit Dates are a common problem, especially if exits are frequent or ambiguous.
- Frequent exits occur in shelters that do not reserve beds for more than one night; clients must be re-admitted every night. These shelters should think about automating the creation of an exit date or entering the exit date at the same time as the entry date.
- Ambiguous exits occur when beds are reserved for clients who leave shelters temporarily.
- When in doubt about whether an exit has occurred, record the date of exit.
- Shelter visits can be “reconstructed” by calculating the time between the last exit date and the next entry date.
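
The reconstruction mentioned in the last point can be sketched by merging stays whose gap is small. The one-day threshold (`max_gap_days`) is an assumption chosen for the example, not a rule from the presentation, and the sketch assumes every stay has a recorded exit date.

```python
from datetime import date

def reconstruct_visits(stays, max_gap_days=1):
    """Merge consecutive stays separated by at most max_gap_days.

    `stays` is a list of (entry, exit) date pairs for one client.
    """
    visits = []
    for entry, exit_ in sorted(stays):
        if visits and (entry - visits[-1][1]).days <= max_gap_days:
            # Small gap since the last exit: extend the current visit.
            visits[-1] = (visits[-1][0], max(visits[-1][1], exit_))
        else:
            visits.append((entry, exit_))
    return visits

# Hypothetical nightly re-admissions followed by a later, separate visit
stays = [
    (date(2005, 9, 1), date(2005, 9, 2)),
    (date(2005, 9, 2), date(2005, 9, 3)),
    (date(2005, 9, 3), date(2005, 9, 4)),
    (date(2005, 9, 20), date(2005, 9, 21)),
]

visits = reconstruct_visits(stays)
```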

13 Improving Data Quality
- Feedback
- Training
- Sanctions

14 Feedback
- Data quality problems should be regularly communicated to service providers.
- Reports of value to the service providers should be provided to encourage a vested interest in the quality of the data collected.

15 Training
- Commonly occurring data quality issues should be identified by HMIS administrators.
- Training sessions for service provider staff should be organized to address these issues.

16 Sanctions
- As the most extreme measure, shelter providers could be sanctioned for submitting poor quality data.
- Sanctions may range from withholding funds to denying the provision of HMIS reports.
- Gentle persuasion is always preferred.

17 Data Manipulation
- This phase of data preparation usually takes longer than the analysis!
- Major steps:
  - Recoding data values
  - Computing new data values
  - Merging datasets

18 Recoding Data Values
- Categories may be collapsed into fewer, more meaningful categories.
  - e.g., the 17 types of residences prior to program entry could be collapsed into the following categories: Street, Housed, Institutional, Other, and Unknown
- Discrete values can be collapsed into categories.
  - e.g., specific ages can be recoded into age categories, such as 0-17, 18-30, 31-50, etc.
- Recoding data can improve categorical statistical analysis techniques, such as chi-square analysis.
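
The age recoding example can be written as a small function. The first three bands come from the slide; the open-ended `51+` band and the `Unknown` label for missing or negative ages are assumptions added to make the sketch complete.

```python
def recode_age(age):
    """Collapse an age in whole years into an age category."""
    if age is None or age < 0:
        return "Unknown"   # assumption: missing or invalid ages get their own category
    if age <= 17:
        return "0-17"
    if age <= 30:
        return "18-30"
    if age <= 50:
        return "31-50"
    return "51+"           # assumption: the slide's "etc." is treated as one open band

ages = [4, 17, 18, 30, 31, 50, 72, None]
categories = [recode_age(a) for a in ages]
```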

19 Computing New Data Values
- Length of stay in shelter can be computed by subtracting Program Entry Date from Program Exit Date (and possibly adding “1” depending on how exit data are collected).
- Age at program entry is computed by subtracting Date of Birth from Program Entry Date.
- Computing new data values creates more data for analysis.
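
Both computed values can be sketched with standard date arithmetic. The `count_exit_day` flag is a hypothetical parameter standing in for the slide's "possibly adding 1" caveat, and age is reported in completed years, which is an assumption about the desired unit.

```python
from datetime import date

def length_of_stay(entry, exit_, count_exit_day=False):
    """Days between entry and exit; optionally count the exit day as a night stayed."""
    days = (exit_ - entry).days
    return days + 1 if count_exit_day else days

def age_at_entry(birth, entry):
    """Completed years between Date of Birth and Program Entry Date."""
    years = entry.year - birth.year
    # Subtract one year if the birthday has not yet occurred in the entry year.
    if (entry.month, entry.day) < (birth.month, birth.day):
        years -= 1
    return years

los = length_of_stay(date(2005, 9, 1), date(2005, 9, 10))
los_inclusive = length_of_stay(date(2005, 9, 1), date(2005, 9, 10), count_exit_day=True)
age_before_birthday = age_at_entry(date(1970, 9, 14), date(2005, 9, 13))
age_on_birthday = age_at_entry(date(1970, 9, 13), date(2005, 9, 13))
```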

20 Merging Datasets
- Two types of merging data from different datasets may be necessary:
  - Adding cases
  - Adding variables

21 Merging Datasets
- Adding cases may involve merging clients from different datasets into one dataset, creating a bigger dataset with more clients for analysis.
  - This process is facilitated by the use of the same field names in each of the merged datasets.
- Adding variables merges variables from different datasets for the same clients.
  - This requires a key variable that connects data for the same clients, such as a client ID.
  - The new dataset will contain more information per client than the original datasets, thereby expanding the opportunities for analysis.
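
The two kinds of merge can be illustrated with plain lists of dictionaries. The field names (`client_id`, `gender`, `income`) and the data are hypothetical; statistical packages provide their own append and join commands for the same operations.

```python
def add_cases(dataset_a, dataset_b):
    """Stack two lists of client records that share the same field names."""
    return dataset_a + dataset_b

def add_variables(left, right, key="client_id"):
    """Attach fields from `right` to the `left` records that share the same key."""
    right_by_key = {rec[key]: rec for rec in right}
    merged = []
    for rec in left:
        combined = dict(rec)
        # Clients with no match in `right` keep only their original fields.
        combined.update(right_by_key.get(rec[key], {}))
        merged.append(combined)
    return merged

# Hypothetical datasets
shelter_a = [{"client_id": 1, "gender": "F"}]
shelter_b = [{"client_id": 2, "gender": "M"}]
income = [{"client_id": 1, "income": 500}]

all_clients = add_cases(shelter_a, shelter_b)     # more clients
enriched = add_variables(all_clients, income)     # more information per client
```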

22 What Is Extrapolation?
- Extrapolation is a method for estimating the total number of people receiving homeless residential services when some, but not all, of the residential service providers participate in HMIS.

23 Why Do We Need To Extrapolate?
- Because otherwise we will undercount the number of people who use homeless residential services if some providers do not participate in HMIS.
- If all providers participate in HMIS, extrapolation is not needed.

24 Methods of Extrapolation
- Simple proportionate extrapolation works when providers who do not participate in HMIS can be considered “missing at random.”
- The regression-based extrapolation method can work if providers are not missing at random, as long as there is some overlap between the types of providers that do and do not participate.
- If the non-participating providers are really different from any of the participating providers (in terms of bed utilization or clientele characteristics), it may not be possible to extrapolate accurately.

25 Simple Proportionate Extrapolation
- Proportionate extrapolation estimates the number of homeless persons served by non-participating providers as a proportion of the number of homeless persons served by participating providers.
- The proportion is determined by comparing the size of non-participating providers to participating providers.
- For example, if the group of non-participating providers is the same size as the group of participating providers, then estimate that the non-participating providers serve the same number of homeless persons as participating providers.

26 What Measure Of Provider Size Should Be Used?
- Need a measure of size that is correlated with the number of homeless persons served and is measurable for non-participating providers.
- For shelter providers, bed capacity (the total number of beds a provider has to serve homeless clients) has the needed characteristics for the measure of size.
  - In general, the higher the bed capacity, the more clients a provider serves.
  - Bed capacity of a provider can be obtained whether or not the provider participates in HMIS.

27 How to Calculate Estimate for Non-Participants
- For HMIS participants, calculate the average number of clients served per bed as:
  clients per bed = (number of clients served by participating providers) / (number of beds at participating providers)
- The estimated number of clients served by non-participants is:
  estimated clients = (clients per bed) * (number of beds at non-participating providers)

28 Total Estimate of Homeless Persons
Total estimate = actual number of homeless persons using HMIS participating providers + estimated number of homeless persons using non-participating providers

29 Example

                              # of Providers   # of Beds   # of Homeless Persons
Participating Providers                   20         750                   2,250
Non-Participating Providers               10         250                       ?

a. For participating providers, the number of clients per available bed is 2,250 / 750 = 3.
b. The estimate of the number of clients for non-participating providers is 3 * 250 = 750.
c. The total estimate of homeless persons is 2,250 + 750 = 3,000.
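
The three steps of this example translate directly into code. The function and variable names below are illustrative only; the input figures are the ones from the example table.

```python
def proportionate_estimate(participating_clients, participating_beds, nonparticipating_beds):
    """Simple proportionate extrapolation based on bed capacity.

    Returns (clients per bed, estimated non-participant clients, total estimate).
    """
    clients_per_bed = participating_clients / participating_beds
    estimated_nonparticipating = clients_per_bed * nonparticipating_beds
    total = participating_clients + estimated_nonparticipating
    return clients_per_bed, estimated_nonparticipating, total

# Figures from the example: 2,250 clients in 750 participating beds, 250 other beds
per_bed, estimated, total = proportionate_estimate(2250, 750, 250)
```

Note that the number of providers (20 and 10) plays no role; only bed counts enter the calculation.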

30 Conduct Extrapolation Separately for Different Types of Providers
- Extrapolation will be more accurate if it is done separately for different types of providers that are likely to have a different number of clients served per unit of size (because of different utilization or turnover rates).
- For example, for homeless residential service providers, you may want to separate as follows:
  - Emergency shelter beds for individuals
  - Emergency shelter beds for families
  - Transitional housing beds for individuals
  - Transitional housing beds for families
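
Running the proportionate calculation once per provider type might look like the sketch below. The provider types, key names, and all numbers are invented for illustration. The final sum is a count of clients served within each type, so it can still include people who used more than one type of provider.

```python
def stratified_estimates(strata):
    """Apply proportionate extrapolation separately within each provider type.

    `strata` maps a provider type to its participating clients, participating
    beds, and non-participating beds. Returns {type: estimated total clients}.
    """
    estimates = {}
    for name, s in strata.items():
        clients_per_bed = s["clients"] / s["beds"]
        estimates[name] = s["clients"] + clients_per_bed * s["nonparticipating_beds"]
    return estimates

# Hypothetical figures: emergency shelter turns over faster than transitional housing
strata = {
    "emergency_individuals": {"clients": 1200, "beds": 300, "nonparticipating_beds": 100},
    "transitional_families": {"clients": 300, "beds": 200, "nonparticipating_beds": 50},
}

by_type = stratified_estimates(strata)
sum_of_types = sum(by_type.values())
```

Here the two types serve 4 and 1.5 clients per bed, so pooling them into one ratio would misestimate whichever type has more non-participating beds.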

31 Total Number of People Using Homeless Services
- Once you have an estimate of homeless clients for different types of service providers, you may want to aggregate these estimates to arrive at an estimate of the total number of homeless clients served in your entire jurisdiction.
- To get a unique count of people using homeless services, you need to eliminate double counting of people who used more than one type of provider.
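
For providers that participate in HMIS, double counting can be removed by taking the union of unduplicated client identifiers across provider types. The sketch below assumes client IDs are already unduplicated and shared across providers; the IDs and type names are hypothetical. It does not address how to adjust the extrapolated portion of the estimate, for which no client-level data exist.

```python
def unduplicated_count(clients_by_type):
    """Count distinct clients across provider types.

    `clients_by_type` maps a provider type to the set of client IDs it served.
    """
    unique = set()
    for ids in clients_by_type.values():
        unique.update(ids)
    return len(unique)

# Hypothetical client IDs; clients 3 and 4 used both types of provider
clients_by_type = {
    "emergency": {1, 2, 3, 4},
    "transitional": {3, 4, 5},
}

naive_sum = sum(len(ids) for ids in clients_by_type.values())
unique_total = unduplicated_count(clients_by_type)
```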

32 Double Counting Illustration
[Illustration not included in this transcript.]

33 Regression-Based Extrapolation
- With this technique, you estimate the number and characteristics of clients served based on the characteristics of the providers.
- For example, you regress the number of clients served on the size of the provider, the type of provider, and other provider characteristics (e.g., special populations served) for participating providers.
- Then you apply the model to non-participating providers to estimate the number and characteristics of people they served.
- The benefit of this method is that it bases the extrapolation on more than just the size of non-participating providers.
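
The fit-then-apply idea can be sketched with a deliberately simplified model. The example below uses ordinary least squares with a single predictor (bed capacity) and invented data; the approach described on the slide would also include provider type and other characteristics as predictors, which calls for a multiple-regression routine from a statistical package.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical participating providers: beds and clients served
beds = [10, 20, 30, 40]
clients = [35, 65, 95, 125]

intercept, slope = fit_line(beds, clients)

# Apply the fitted model to hypothetical non-participating providers
nonparticipating_beds = [15, 25]
predicted = [intercept + slope * b for b in nonparticipating_beds]
estimated_nonparticipating_total = sum(predicted)
```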

34 Non-Participating Providers are “Missing at Random”
- Compare the non-size characteristics of participating and non-participating providers.
- For counts of the number of homeless people served, compare whether participating and non-participating providers serve the same number of people per unit of size.
- For the characteristics of homeless people, compare the characteristics of people served by participating and non-participating providers.
- If you don’t have client-level data, you can compare the service populations: Men? Women? Veterans? DV victims? Special needs populations? Age? Race/ethnicity?

35 Concluding Thoughts on Extrapolation
- Achieving 100-percent participation in your HMIS will result in more accurate estimates and eliminate the need to extrapolate.
- Extrapolation techniques are more accurate the higher your participation rate. The rules of thumb are:
  - 75% participation generally results in very good estimates.
  - With a participation rate below 50%, estimates are less reliable (unless the participants are truly a random sample).

