Simple and Efficient Matching Algorithms for Case-Control Matching. Berber Snoeijer, Edith Heintjes. PhUSE 2014, October 2014.

Presentation transcript:

Simple and Efficient Matching Algorithms for Case-Control Matching
Berber Snoeijer, Edith Heintjes
PhUSE 2014, October 2014

Contents
- Observational studies
- Basic technique
- Different matching options
- Conclusions

Observational studies
(Retrospective) cohort vs. case-control [figure labels: Case, Control]

Case-control studies
Limit possible confounding factors

Case-control studies
Exact and caliper matching

Case-control studies

Expected result

Matching

Efficient programming
Limit number of data steps

Code fragments shown on the slide:

    PROC SQL;
      CREATE TABLE Myagbs AS
      SELECT DISTINCT agb
      FROM data.fi_medicijnen_20145;
    QUIT;

    data fif3;
      input POSTCODE INWONERS PROVINCIE PLAATS FIF3 NAAMFIF3;
    run;

    PROC SQL;
      CREATE TABLE xar3 AS
      SELECT f.fif3, f.naamfif3, oapo_artcd,
             month(oapo_afldat) AS month,
             year(oapo_afldat) AS year,   /* rest of the query is cut off on the slide */
      ORDER BY fif3, oapo_artcd, year, month;
    QUIT;

    data Inkoop_fif3 (RENAME=(var1=agb var2=fif3));
      format var1-var2 repmon verpak 12. zindex $8.;
      input var1-var2 zindex periode verpak;
    run;

    PROC SQL;
      CREATE TABLE data.fi_medicijnen_fif3 AS
      SELECT a.agb, a.zindex, a.fif3,
             a.verpak AS aantalstuks,
             a.djm format=ddmmyy10.,      /* further columns are cut off on the slide */
      FROM inkoop_fif3 a
        LEFT JOIN data.fi_knmp AS b
        ON a.zindex = left(b.knmp_artcd);
    QUIT;

    PROC SQL;
      CREATE TABLE XXX AS
      SELECT zindex, djm, fif3, knmp_prcd, knmp_atccd, knmp_inkhoev,
             SUM(aantalstuks) AS aantalstuks
      FROM data.fi_medicijnen_fif3
      GROUP BY zindex, djm, fif3, knmp_prcd, knmp_atccd, knmp_inkhoev;
    QUIT;

    PROC SQL;
      CREATE TABLE Xar4 AS
      SELECT a.*,                         /* further columns are cut off on the slide */
      FROM xar3 AS a
        FULL OUTER JOIN TotXarelto AS b
        ON a.oapo_artcd = b.zindex;
    QUIT;

Efficient programming
Limit sorting

Efficient programming
Decrease size of datasets

Efficient programming
Limit number of iterations

Basic technique
1. Construct all possible pairs.
2. Add a random number to each combination.
3. Sort by control and random number.

    PROC SQL;
      CREATE TABLE _Input AS
      SELECT a.*, b.*, ranuni(&Seed) AS randomnum
      FROM Cases AS a
        INNER JOIN Controls AS b
        ON … (all exact and caliper criteria)
      ORDER BY Pt_control, randomnum;
    QUIT;

Basic technique
4. Pick the first case for each control.

    data _Result1;
      set _Input;
      by Pt_control;
      if first.pt_control then output;
    run;

5. Sort by case.

    proc sort data = _Result1;
      by Pt_case randomnum;
    run;

Basic technique
6. Pick the controls up to the maximum number of controls you desire.

    data _result2;
      set _result1;
      retain Matchno;
      by Pt_case;
      if first.pt_case then Matchno = 1;
      else Matchno = Matchno + 1;
      if Matchno <= &MaxMatch then output _result2;
    run;
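Steps 1 to 6 can be sketched outside SAS as well. The following is a minimal Python illustration of a single pass, not the presenters' code. The record keys `id`, `gender` and `age`, the function name `match_once` and the default caliper of 2.5 years are assumptions made for the example; the slides leave the actual join criteria open.

```python
import random
from itertools import groupby


def match_once(cases, controls, max_match, seed=1, age_caliper=2.5):
    """One pass of the basic technique (illustrative sketch).

    cases / controls: lists of dicts with the hypothetical keys
    'id', 'gender' and 'age'. Returns a list of (case_id, control_id).
    """
    rng = random.Random(seed)

    # Steps 1-2: build every pair that meets the exact (gender) and
    # caliper (age) criteria and attach a random number to it.
    pairs = [
        (ctl["id"], rng.random(), case["id"])
        for case in cases
        for ctl in controls
        if case["gender"] == ctl["gender"]
        and abs(case["age"] - ctl["age"]) <= age_caliper
    ]

    # Step 3: sort by control, then by random number.
    pairs.sort()

    # Step 4: keep the first case for each control, so that every
    # control is assigned to at most one case.
    first_per_control = [next(grp) for _, grp in groupby(pairs, key=lambda p: p[0])]

    # Step 5: sort by case, then by random number.
    first_per_control.sort(key=lambda p: (p[2], p[1]))

    # Step 6: keep at most max_match controls per case.
    result = []
    for case_id, grp in groupby(first_per_control, key=lambda p: p[2]):
        for ctl_id, _rnd, _case in list(grp)[:max_match]:
            result.append((case_id, ctl_id))
    return result
```

Because each control is reduced to one row before the per-case cut-off is applied, the result never reuses a control, which corresponds to matching without replacement.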

Basic technique

By round

Closest match
Calculate all absolute differences between the case and the controls. Sort by control, then by absolute difference and random number, so that the closest case comes first for each control.

    PROC SQL;
      CREATE TABLE _Input AS
      SELECT a.*, b.*,
             ranuni(&Seed) AS randomnum,
             Abs(CaseVal-RefVal) AS AbsDif
      FROM Cases AS a
        INNER JOIN Controls AS b
        ON … (all exact and caliper criteria)
      ORDER BY Pt_control, AbsDif, randomnum;
    QUIT;
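The effect of adding the absolute difference to the sort key can be shown in a few lines of Python. This is a sketch under assumptions: the tuple layout `(case_id, control_id, case_val, ref_val)` and the function name `closest_case_per_control` are invented for the illustration and mirror only the `ORDER BY Pt_control, AbsDif, randomnum` idea.

```python
import random


def closest_case_per_control(pairs, seed=1):
    """Assign each control to the case with the smallest absolute difference.

    pairs: iterable of (case_id, control_id, case_val, ref_val) tuples
    (hypothetical layout). Ties on the difference are broken at random.
    """
    rng = random.Random(seed)
    # Sort key: control, absolute difference, random number.
    ranked = sorted(
        (ctl, abs(case_val - ref_val), rng.random(), case)
        for case, ctl, case_val, ref_val in pairs
    )
    chosen = {}
    for ctl, _absdif, _rnd, case in ranked:
        # Only the first row per control is kept.
        chosen.setdefault(ctl, case)
    return chosen


example = [
    ("A", "k1", 50.0, 52.0),   # difference 2.0
    ("B", "k1", 51.5, 52.0),   # difference 0.5, so k1 goes to B
    ("A", "k2", 50.0, 49.0),   # only candidate for k2
]
print(closest_case_per_control(example))
```

The random number still matters: it decides between cases that lie at exactly the same distance from a control.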

Closest match (note on slide: flip the picture)
[Figure; surviving labels: 1: 1.5, 2: 1.7, 3: …, 2.0]

Tests

    Match 1 control   Distance rank   Least number of     Run time
    by round          priority        matches priority
    No                No              No                  1 min, 4 sec
    No                Yes             No                  1 min, 0 sec
    No                No              Yes                 1 min, 19 sec
    No                Yes             Yes                 1 min, 57 sec
    Yes               No              No                  4 min, 41 sec
    Yes               Yes             No                  4 min, 37 sec
    Yes               No              Yes                 5 min, 29 sec
    Yes               Yes             Yes                 9 min, 37 sec

Further columns on the slide: total number of matched cases, total number of matched pairs, number of iterations.
Test data: … cases, … possible matches, maximum of 8 controls per case.

Least number of matches method

    Proc SQL;
      Create table _input2 as
      select *,
             ranuni(&Seed) AS randomnum,
             Count(*) as Nmatches
      from _InputMe
      group by pt_case
      order by pt_control, Nmatches, randomnum;
    Quit;

    data _Result1;
      set _Input2;
      by Pt_control;
      if first.pt_control then output;
    run;
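A short Python sketch may make the priority rule easier to follow: each control goes to the candidate case that has the fewest possible matches overall. The pair layout `(case_id, control_id)` and the function name `scarcest_case_per_control` are assumptions for the illustration.

```python
import random
from collections import Counter


def scarcest_case_per_control(pairs, seed=1):
    """Give each control to the candidate case with the fewest candidate pairs.

    pairs: list of (case_id, control_id) tuples (hypothetical layout).
    """
    rng = random.Random(seed)
    # Equivalent of Count(*) ... group by pt_case: candidate pairs per case.
    n_matches = Counter(case for case, _ctl in pairs)
    # Sort key: control, number of matches of the case, random number.
    ranked = sorted(
        (ctl, n_matches[case], rng.random(), case) for case, ctl in pairs
    )
    chosen = {}
    for ctl, _n, _rnd, case in ranked:
        # First row per control wins, as with first.pt_control.
        chosen.setdefault(ctl, case)
    return chosen


# Case A has three candidate controls, case B only one (k1),
# so k1 is given to B and A keeps k2 and k3.
example = [("A", "k1"), ("A", "k2"), ("A", "k3"), ("B", "k1")]
print(scarcest_case_per_control(example))
```

Giving scarce cases priority in this way reduces the chance that a case with few candidates loses its only control to a case that has plenty of alternatives.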

Least number of matches method (2)

    Proc SQL;
      Create table _input2 as
      select *,
             ranuni(&Seed) AS randomnum,
             case
               when (count(*) <= 10)    then count(*)
               when (count(*) <= 100)   then round(count(*), 10.)
               when (count(*) <= 1000)  then round(count(*), 100.)
               when (count(*) <= 10000) then round(count(*), 1000.)
               else …
             end as Nmatches
      from _InputMe
      group by pt_case
      order by pt_control, Nmatches, AbsDif, randomnum;
    Quit;
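The CASE expression coarsens the match count into buckets, so that cases with a similar number of candidates tie on `Nmatches` and the next sort key, `AbsDif`, decides between them. A minimal Python sketch of that bucketing follows. The function name `bucket_count` is invented, and because the slide's ELSE branch is not preserved, the sketch leaves counts above 10000 unrounded purely as a placeholder assumption.

```python
def round_to(n, unit):
    """Round a non-negative integer to the nearest multiple of unit.

    Halves are rounded up, which is how SAS ROUND(x, unit) treats
    positive values (Python's built-in round() rounds halves to even).
    """
    return ((n + unit // 2) // unit) * unit


def bucket_count(n):
    """Coarsen a match count as in the CASE expression on the slide."""
    if n <= 10:
        return n
    if n <= 100:
        return round_to(n, 10)
    if n <= 1000:
        return round_to(n, 100)
    if n <= 10000:
        return round_to(n, 1000)
    # ASSUMPTION: the ELSE branch is missing from the transcript;
    # the count is returned unchanged here as a placeholder.
    return n


for n in (7, 15, 44, 149, 150, 1499):
    print(n, bucket_count(n))
```

With these buckets, cases with 41 and 44 possible matches both fall into bucket 40, so the control goes to whichever of them is closer.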

Example
2415 cases, … possible matches
Match on:
- gender
- age range (+/- 2.5 years)
Max 10 matches per case
No replacement
All at once: 7 rounds, 47 seconds
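The slides report a number of rounds but do not show the loop itself. One possible reading, offered as an assumption rather than as the presenters' implementation, is that the single pass is repeated on the controls that are still unused and the cases that are still below the maximum, until no candidate pairs remain. The Python sketch below illustrates that reading; `match_in_rounds` and the pair layout `(case_id, control_id)` are hypothetical.

```python
import random


def match_in_rounds(candidate_pairs, max_match, seed=1):
    """Repeat the single-pass technique until no eligible pairs remain.

    candidate_pairs: list of (case_id, control_id) tuples that already
    satisfy the exact and caliper criteria (hypothetical layout).
    Returns (matched, rounds), where matched maps case -> list of controls.
    """
    rng = random.Random(seed)
    matched = {}
    used = set()
    rounds = 0
    while True:
        # Pairs whose control is still free and whose case is not yet full.
        open_pairs = [
            (ctl, rng.random(), case)
            for case, ctl in candidate_pairs
            if ctl not in used and len(matched.get(case, [])) < max_match
        ]
        if not open_pairs:
            break
        rounds += 1
        open_pairs.sort()
        # First case per control (sorted by control, random number).
        first = {}
        for ctl, rnd, case in open_pairs:
            first.setdefault(ctl, (rnd, case))
        # Per case, accept controls in random order up to the maximum.
        by_case = sorted(first.items(), key=lambda kv: (kv[1][1], kv[1][0]))
        for ctl, (_rnd, case) in by_case:
            if len(matched.setdefault(case, [])) < max_match:
                matched[case].append(ctl)
                used.add(ctl)
    return matched, rounds
```

Every round with open pairs adds at least one match, because the first control offered to any case is always accepted, so the loop terminates.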

Example
2415 cases, … possible matches
Match on:
- gender
- age range (+/- 2.5 years)
Max 10 matches per case
No replacement
Round by round, 10% saturation: 16 rounds, 1 min 50 seconds

Example
2415 cases, … possible matches
Match on:
- gender
- age range (+/- 2.5 years)
Max 10 matches per case
No replacement
Round by round, 60% saturation: 19 rounds, 1 min 58 seconds

Example
2415 cases, … possible matches
Match on:
- gender
- age range (+/- 2.5 years)
Max 10 matches per case
No replacement
Round by round, full saturation: 41 rounds, 2 min 21 seconds

Conclusions
- Efficient and fast
- Useful with big data
- Optimal
- Can handle any combination of exact and caliper variables
- Can handle any number of matches to controls
- Final distribution can be examined and best options can be chosen

Questions?