Making Large Data Sets Work for You: Advantages and Challenges
Lesley H Curtis, Soko Setoguchi, Bradley G Hammill
Presenter disclosure information
Lesley H Curtis
Large Data Sets: An Overview
FINANCIAL DISCLOSURE: None
UNLABELED/UNAPPROVED USES DISCLOSURE: None
Agenda
- Large Data Sets: An Overview
- Prescription Drug Data: Advantages, Availability, and Access
- Linking Large Data Sets: Why, How, and What Not to Do
- Practical Examples
Which large data sets?
- Relevant for cardiovascular research
- Available to researchers
- Potential for linkage
- Claims data (federal and commercial)
- Inpatient registries
- Longitudinal cohort studies
Claims data
- Derived from payment of bills
- Payor-centric
- Examples
  - Medicare
  - Medicaid
  - Thomson-Reuters
  - United Health Care
Medicare claims data
- Inpatient services (Part A)
- Outpatient services (Part B)
- Physician services (Carrier, Part B)
- Durable medical equipment
- Home health care
- Skilled nursing facilities
- Hospice
Medicare claims data elements
- What data are available
  - Demographics
  - Service dates
  - Diagnoses
  - Procedures
  - Hospital / Physician
- What data are not available
  - Physiological measures
  - Test results
  - Times of admission, procedures, etc.
  - Medications
Medicare claims data coverage
- National scope
- What patients will be represented?
  - Patients enrolled in traditional (fee-for-service) Medicare
- What patients will not be represented?
  - Patients receiving care through the Veterans Health Administration
  - Patients enrolled in Medicare managed care plans
Medicare claims data quality
- Main point
  - Reliability of specific claims data elements depends on importance for reimbursement
- Good data on…
  - Major procedures
  - Hospitalizations
  - Mortality
- Inconsistent data on…
  - Comorbidities and illness severity
  - Procedures with low reimbursement rates
Acquiring CMS claims data
- All requests begin with ResDAC
- Cost
  - $15K per year of inpatient + denominator data
  - $20K per year of 5% data across all files
  - $30K+ per year of data for custom requests
- Detailed approval process
  - Prepare request packet for ResDAC review (4-6 weeks)
  - Review by CMS privacy board (4 weeks)
  - Request processed by contractor (6-8 weeks)
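For planning purposes, the cost and approval figures on this slide can be added up. The sketch below is only an illustration: the names are hypothetical, and it assumes the three approval steps run one after another with no overlap, which the slide does not state.

```python
# Approval steps quoted on the slide, as (min_weeks, max_weeks).
# Assumption: steps are strictly sequential.
APPROVAL_STEPS = {
    "ResDAC review of request packet": (4, 6),
    "CMS privacy board review": (4, 4),
    "Contractor processing": (6, 8),
}

# Quoted cost per year of data, in US dollars.
# The custom-request figure is a floor ("$30K+"), not a fixed price.
COST_PER_YEAR = {
    "inpatient_denominator": 15_000,
    "five_percent_all_files": 20_000,
    "custom_minimum": 30_000,
}


def approval_window_weeks(steps=APPROVAL_STEPS):
    """Return (shortest, longest) total wait in weeks across all steps."""
    shortest = sum(low for low, _ in steps.values())
    longest = sum(high for _, high in steps.values())
    return shortest, longest


def estimated_cost(request_type, years):
    """Return the quoted cost for a given number of data years."""
    return COST_PER_YEAR[request_type] * years


if __name__ == "__main__":
    low, high = approval_window_weeks()
    print(f"Approval: {low}-{high} weeks")
    print(f"3 years inpatient+denominator: ${estimated_cost('inpatient_denominator', 3):,}")
```

Under these assumptions the wait from request packet to delivered data is roughly 14 to 18 weeks, so a data request should be started well before the analysis is scheduled to begin.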
Preparing for CMS claims data
- Make space
  - 16 GB for 100% denominator and inpatient files
  - 57 GB for 5% denominator, inpatient, outpatient, and carrier* files
- Manage expectations
  - Time to process files
  - Transforming raw claims into usable information
    - Coding algorithms
    - Coding changes
  - Learning curve
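A "coding algorithm" here means a rule that turns raw diagnosis codes into an analysis variable. The sketch below is one hedged illustration, not the presenters' method: it flags heart failure hospitalizations using ICD-9-CM 428.x as the primary diagnosis. The record layout (`bene_id`, `dx_codes`) is hypothetical, and the code list is a parameter so it can be revised when coding changes.

```python
# ICD-9-CM 428.x is the heart failure code family.
# Assumption: codes may arrive with or without the decimal point.
HF_PREFIXES = ("428",)


def normalize(code):
    """Strip the decimal point and surrounding whitespace from a code."""
    return code.replace(".", "").strip().upper()


def is_hf_claim(claim, prefixes=HF_PREFIXES, primary_only=True):
    """Return True if the claim carries a heart failure diagnosis.

    `claim` is a hypothetical dict with a 'dx_codes' list whose first
    entry is treated as the primary diagnosis.
    """
    codes = [normalize(c) for c in claim.get("dx_codes", [])]
    if primary_only:
        codes = codes[:1]
    return any(code.startswith(prefixes) for code in codes)


def hf_claims(claims, prefixes=HF_PREFIXES, primary_only=True):
    """Filter a list of claims down to heart failure claims."""
    return [c for c in claims if is_hf_claim(c, prefixes, primary_only)]


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        {"bene_id": "A", "dx_codes": ["4280", "25000"]},
        {"bene_id": "B", "dx_codes": ["41071", "428.0"]},
        {"bene_id": "C", "dx_codes": ["486"]},
    ]
    print([c["bene_id"] for c in hf_claims(sample)])
    print([c["bene_id"] for c in hf_claims(sample, primary_only=False)])
```

The choice between primary-only and any-position matching changes the cohort, which is one reason analysis sample sizes can differ from what the raw file size suggests.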
The Learning Curve
Commercial claims data elements
- What data are typically available
  - Demographics
  - Service dates
  - Diagnoses
  - Procedures
  - Medications
  - Hospital / Physician
- What data may not be available
  - Physiological measures
  - Test results
Commercial claims data coverage
- National scope
- What patients will be represented?
  - Individuals who are commercially insured
- What patients will not be represented?
  - The uninsured
  - Medicare managed care?
Commercial claims data quality
- Similar to Medicare claims data
  - Reliability of specific claims data elements depends on importance for reimbursement
- Good data on…
  - Major procedures
  - Hospitalizations
- Inconsistent data on…
  - Mortality
  - Comorbidities and illness severity
  - Procedures with low reimbursement rates
Preparing for commercial claims data
- Cost
  - $25-70K depending on size, scope of data request
- Size
  - 100 GB per year of data
  - Analysis sample sizes will differ from advertised sample sizes
- Manage expectations!
Registry data
- Observational cohorts of patients undergoing specific treatments or having specific conditions
- Purpose may be to assess…
  - Quality of care
  - Provider performance
  - Treatment safety/effectiveness
- Of interest today are hospital-based registries
OPTIMIZE-HF registry
- Hospital-based quality improvement program and internet-based registry for heart failure
- : 50,000 patients; > 250 hospitals
- Transitioned to GWTG-HF in 2005
Registry data coverage
- Only patients treated at participating hospitals will be included
  + All patients at these hospitals included regardless of payor
  - Participating hospitals may not be representative of hospitals nationwide

% of group in selected states

State          US Elderly   Medicare FFS   OPTIMIZE-HF
California     10.1%        7.7%           13.8%
Florida        7.4%         7.0%           8.7%
Michigan       3.4%         4.0%           9.5%
New York       6.6%         6.1%           3.5%
Pennsylvania   5.2%         4.4%           6.7%
Texas          6.0%         6.5%           5.4%
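One way to read the state table is to divide each state's share of OPTIMIZE-HF patients by its share of the Medicare fee-for-service population. The sketch below does that with the percentages from the slide. The ratio itself is an illustrative device introduced here, not a measure the presenters report.

```python
# Percent of each group residing in the state, copied from the slide:
# (US elderly, Medicare FFS, OPTIMIZE-HF)
STATE_SHARES = {
    "California": (10.1, 7.7, 13.8),
    "Florida": (7.4, 7.0, 8.7),
    "Michigan": (3.4, 4.0, 9.5),
    "New York": (6.6, 6.1, 3.5),
    "Pennsylvania": (5.2, 4.4, 6.7),
    "Texas": (6.0, 6.5, 5.4),
}


def registry_to_ffs_ratio(shares=STATE_SHARES):
    """Ratio of registry share to Medicare FFS share, by state.

    Values above 1 mean the state is over-represented in the registry
    relative to Medicare FFS; values below 1 mean under-represented.
    """
    return {state: registry / ffs for state, (_, ffs, registry) in shares.items()}


if __name__ == "__main__":
    ratios = registry_to_ffs_ratio()
    for state, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        label = "over" if ratio > 1 else "under"
        print(f"{state:<13} {ratio:4.2f}  ({label}-represented)")
```

On these figures Michigan's share of the registry is more than twice its share of Medicare FFS, while New York and Texas fall below 1, which is the point of the slide: participating hospitals are not a geographic cross-section.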
Registry data quality
- Good data on…
  - Many of the things not included in Medicare data: labs, medications, treatment timing, process measures, contraindications (if collected)
- Inconsistent data on…
  - Post-hospitalization follow-up care
  - Outcomes, particularly long-term
Accessing registry data
- Networking and partnering
  - Many require that analyses be performed at selected analytical centers, which may have long queues
- Approval process via steering or executive committee
NHLBI longitudinal cohort studies
- Atherosclerosis Risk in Communities Study (ARIC)
- Cardiovascular Health Study (CHS)
- Framingham Heart Study
- Jackson Heart Study
- Multi-Ethnic Study of Atherosclerosis
- Women’s Health Initiative
Cardiovascular Health Study (CHS)
- Prospective, observational study of CV disease in the elderly (Washington Co., Maryland; Forsyth Co., NC; Sacramento Co., CA; and Pittsburgh, PA)
- Baseline exams occurred from
- Minority cohort added at Year 5
- Annual exams, with ‘major’ exams occurring at year 5 ( ) and year 9 ( ). Last exam was year 11 ( ).
- 5,201 participants at baseline; 687 additional minority participants (5,888 total)
Cardiovascular Health Study data elements
- What data are available
  - Demographics
  - Medical, personal history
  - Physiological measures, test results
  - QOL, depression
  - Cognitive function
- What data are not available
  - Service dates
  - Procedures
  - Hospital/physician
Cardiovascular Health Study data quality
- Main point
  - Data collected are of high quality
- Good data regarding…
  - Cardiovascular risk factors
  - Cardiovascular endpoints
  - General health
- Limited data on…
  - Non-cardiovascular risk factors
  - Non-cardiovascular endpoints
Accessing NHLBI cohort studies
- Via the NHLBI data repository
  - HIPAA identifiers, geography removed
- Via Coordinating Center for identifiable data
- Size
  - 20 MB per year of data
NHLBI-Medicare linked data sets
- CMS linked with…
  - CHS ( , pending)
  - Framingham ( pending)
  - Jackson Heart Study ( pending)
  - Multi-Ethnic Study of Atherosclerosis ( pending)
  - Atherosclerosis Risk in Communities
  - Women’s Health Initiative
Conclusion
- Large data sets abound
- Do yourself a favor… manage expectations!
Contact Information
Lesley Curtis