Improving Overlap Based on Expected Value


1 Improving Overlap Based on Expected Value
By: Joo Li Lee

2 STEP 1: CLEAN TABLE
SELECT [ID]
  ,CAST([Cancer] AS float) Cancer
  ,[I305 1] I305_1 ,[I309 81] I309_81 ,[I311 ] I311 ,[IE849 7] IE849_7
  ,[I150 9] I150_9 ,[I276 1] I276_1 ,[I276 8] I276_8 ,[I530 81] I530_81
  ,[I263 9] I263_9 ,[I276 51] I276_51 ,[IV15 82] IV15_82 ,[I511 9] I511_9
  ,[I401 9] I401_9 ,[I787 20] I787_20 ,[I564 00] I564_00 ,[I272 4] I272_4
  ,[I280 9] I280_9 ,[I285 9] I285_9 ,[I496 ] I496 ,[I458 9] I458_9
  ,[I486 ] I486 ,[IV58 61] IV58_61 ,[I197 7] I197_7 ,[I578 9] I578_9
  ,[I584 9] I584_9 ,[IV66 7] IV66_7 ,[I244 9] I244_9 ,[I414 01] I414_01
  ,[I599 0] I599_0 ,[I414 00] I414_00 ,[I585 9] I585_9 ,[I600 00] I600_00
  ,[I428 0] I428_0 ,[I427 31] I427_31 ,[I403 90] I403_90
  ,CAST([Dead] AS float) Dead
INTO StomachCancer_Table
FROM [HAP823].[dbo].[StomachCancer]
This code cleans the StomachCancer table: it gives the diagnosis columns names without spaces, casts Cancer and Dead to float, and writes the result to StomachCancer_Table.

3 STEP 2: CREATING STRATA
DROP TABLE IF EXISTS #STRATA_TABLE  -- IF EXISTS requires SQL Server 2016 or later
SELECT ID
  ,Cancer
  ,Dead
  ,[I150_9]+[I197_7]+[I244_9]+[I263_9]+[I272_4]+[I276_1]+[I276_51]+[I276_8]+[I280_9]
  +[I285_9]+[I305_1]+[I309_81]+[I311]+[I401_9]+[I403_90]+[I414_00]+[I414_01]+[I427_31]
  +[I428_0]+[I458_9]+[I486]+[I496]+[I511_9]+[I530_81]+[I564_00]+[I578_9]+[I584_9]
  +[I585_9]+[I599_0]+[I600_00]+[I787_20]+[IE849_7]+[IV15_82]+[IV58_61]+[IV66_7] AS Strata
INTO #STRATA_TABLE
FROM StomachCancer_Table
ORDER BY STRATA;
The strata key is created by concatenating the independent variables (the diagnosis indicators) in a fixed order.
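As a rough illustration of the strata idea, here is a minimal Python sketch (not the presenter's T-SQL). It assumes each diagnosis indicator is stored as a '0' or '1' string, which is also what the T-SQL `+` needs in order to concatenate rather than add; the shortened column list, the sample patients, and the name `make_strata` are all hypothetical.

```python
# Shortened, hypothetical column list; the slide concatenates 35 diagnosis columns.
DIAGNOSES = ["I150_9", "I197_7", "I244_9"]


def make_strata(row, columns=DIAGNOSES):
    """Concatenate the indicator values, in a fixed column order, into one key."""
    return "".join(str(row[col]) for col in columns)


# Made-up patients: indicators are '0'/'1' strings (an assumption about the data).
patients = [
    {"ID": 1, "Cancer": 1, "Dead": 0, "I150_9": "1", "I197_7": "0", "I244_9": "1"},
    {"ID": 2, "Cancer": 0, "Dead": 1, "I150_9": "1", "I197_7": "0", "I244_9": "1"},
    {"ID": 3, "Cancer": 0, "Dead": 0, "I150_9": "0", "I197_7": "0", "I244_9": "0"},
]

strata = {p["ID"]: make_strata(p) for p in patients}
print(strata)
```

Patients 1 and 2 share every diagnosis indicator, so they receive the same key and fall into the same stratum.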

4 STEP 3: CREATING CASES / STEP 4: CREATING CONTROLS
-- Step 3: cases (Cancer = 1), counted per stratum
DROP TABLE IF EXISTS #Cases
SELECT COUNT(DISTINCT ID) AS nCases
  ,SUM(IIF(Dead = 1, 1., 0.)) AS a  -- cancer & dead
  ,SUM(IIF(Dead = 0, 1., 0.)) AS b  -- cancer & not dead
  ,STRATA
INTO #Cases
FROM #STRATA_TABLE
WHERE CANCER = 1
GROUP BY STRATA

-- Step 4: controls (Cancer = 0), counted per stratum
DROP TABLE IF EXISTS #Controls
SELECT COUNT(DISTINCT ID) AS nControls
  ,SUM(IIF(Dead = 1, 1., 0.)) AS c  -- no cancer & dead
  ,SUM(IIF(Dead = 0, 1., 0.)) AS d  -- no cancer & not dead
  ,STRATA
INTO #Controls
FROM #STRATA_TABLE
WHERE CANCER = 0
GROUP BY STRATA
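The per-stratum counts built in Steps 3 and 4 can be sketched in Python as follows. This is an illustration under assumptions, not the presenter's code: the rows are made up, each ID is assumed to appear once, and `count_by_strata` is a hypothetical helper name.

```python
from collections import defaultdict


def count_by_strata(rows, cancer_flag):
    """Count patients, deaths, and survivors per stratum for one group.

    cancer_flag=1 gives the case counts (nCases, a, b);
    cancer_flag=0 gives the control counts (nControls, c, d).
    """
    counts = defaultdict(lambda: {"n": 0, "dead": 0, "alive": 0})
    for row in rows:
        if row["Cancer"] != cancer_flag:
            continue
        cell = counts[row["Strata"]]
        cell["n"] += 1
        if row["Dead"] == 1:
            cell["dead"] += 1
        elif row["Dead"] == 0:
            cell["alive"] += 1
    return dict(counts)


# Made-up rows, one per patient.
rows = [
    {"ID": 1, "Cancer": 1, "Dead": 1, "Strata": "101"},
    {"ID": 2, "Cancer": 1, "Dead": 0, "Strata": "101"},
    {"ID": 3, "Cancer": 0, "Dead": 0, "Strata": "101"},
    {"ID": 4, "Cancer": 1, "Dead": 1, "Strata": "110"},
]

cases = count_by_strata(rows, cancer_flag=1)
controls = count_by_strata(rows, cancer_flag=0)
print(cases)
print(controls)
```

Stratum "110" appears only among the cases here, which is exactly the situation the later steps try to repair.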

5 STEP 5: INNER JOIN – FIND CASES WITH MATCHES
DROP TABLE IF EXISTS #MATCH
SELECT ROUND(CAST([nCases] AS float) / CAST([nControls] AS float), 2) AS [Weight]
  ,nControls
  ,c  -- no cancer & dead
  ,d  -- no cancer & not dead
  ,#Cases.*
INTO #MATCH
FROM #Cases
JOIN #Controls ON #Cases.Strata = #Controls.STRATA
ORDER BY [WEIGHT]
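The inner join and the weight column can be pictured with a small Python sketch. It is only an illustration with made-up counts; `match_weights` is a hypothetical name, and the dictionaries stand in for the #Cases and #Controls tables.

```python
# Hypothetical per-stratum counts: strata key -> number of patients.
cases = {"101": 4, "110": 2, "011": 1}
controls = {"101": 8, "011": 3, "000": 20}


def match_weights(cases, controls):
    """Keep only strata present in both groups (inner join) and compute
    weight = nCases / nControls, rounded to 2 decimals as on the slide."""
    return {
        strata: round(n_cases / controls[strata], 2)
        for strata, n_cases in cases.items()
        if strata in controls
    }


weights = match_weights(cases, controls)
print(weights)
```

Stratum "110" has cases but no controls, so the inner join drops it; those are the cases that Step 6 goes looking for.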

6 STEP 5.1: CALCULATING OVERLAP
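-- The variable name and the SELECT expression were lost in the transcript.
-- @TotalCases is a placeholder name, and the ratio is reconstructed as matched cases over all cases.
DECLARE @TotalCases AS FLOAT = (SELECT SUM(nCases) FROM #Cases)
SELECT SUM(nCases) / @TotalCases AS overlap
FROM #MATCH
-- Overlap: (value missing from the transcript)
Using the partial matching method, we are going to increase this overlap.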
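To make the overlap measure concrete, here is a minimal Python sketch. It assumes overlap means the share of cases whose stratum also occurs among the controls, which is how the surrounding steps read; the counts and the name `overlap` are hypothetical.

```python
def overlap(cases, controls):
    """Share of cases whose stratum has at least one control.

    cases and controls map a strata key to a patient count.
    """
    total_cases = sum(cases.values())
    if total_cases == 0:
        return 0.0
    matched_cases = sum(n for strata, n in cases.items() if strata in controls)
    return matched_cases / total_cases


# Made-up counts: 5 of the 7 cases sit in strata that also contain controls.
cases = {"101": 4, "110": 2, "011": 1}
controls = {"101": 8, "011": 3, "000": 20}
print(overlap(cases, controls))
```

An overlap below 1 means some cases are excluded from the matched analysis, which is what the partial matching in the following steps is meant to reduce.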

7 STEP 6: LEFT JOIN – FIND CASES WITH NO MATCH
DROP TABLE IF EXISTS #MATCH_LEFT
/* Left join: keeps every case stratum; Weight is NULL where no control stratum matches */
SELECT ROW_NUMBER() OVER (ORDER BY #Cases.STRATA DESC) AS ID
  ,'0' STRATA_SET
  ,ROUND(CAST([nCases] AS float) / CAST([nControls] AS float), 2) AS [Weight]
  ,nControls
  ,c  -- no cancer & dead
  ,d  -- no cancer & not dead
  ,#Cases.*
INTO #MATCH_LEFT
FROM #Cases
LEFT JOIN #Controls ON #Cases.Strata = #Controls.STRATA

SELECT * FROM #MATCH_LEFT

8 STEP 6.1: CREATING TWO TABLES (MATCHED AND NOT MATCHED)
-- Cases that did NOT match go into #MATCH_NULL
DROP TABLE IF EXISTS #MATCH_NULL
SELECT *
INTO #MATCH_NULL
FROM #MATCH_LEFT
WHERE [WEIGHT] IS NULL
--(96 rows affected)

-- Cases that did match go into #MATCH_NOT_NULL
DROP TABLE IF EXISTS #MATCH_NOT_NULL
SELECT *
INTO #MATCH_NOT_NULL
FROM #MATCH_LEFT
WHERE [WEIGHT] IS NOT NULL
--(84 rows affected)
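The left join followed by the NULL / NOT NULL split amounts to partitioning the case strata by whether a control stratum exists. A minimal Python sketch with made-up counts and a hypothetical `split_by_match` helper:

```python
def split_by_match(cases, controls):
    """Partition case strata into those with and without a matching control stratum."""
    matched, unmatched = {}, {}
    for strata, n_cases in cases.items():
        target = matched if strata in controls else unmatched
        target[strata] = n_cases
    return matched, unmatched


# Made-up counts: strata key -> number of patients.
cases = {"101": 4, "110": 2, "011": 1}
controls = {"101": 8, "011": 3, "000": 20}

matched, unmatched = split_by_match(cases, controls)
print(matched)
print(unmatched)
```

The `unmatched` dictionary plays the role of #MATCH_NULL: it is the input to the attribute-dropping steps that follow.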

9 STEP 6.2: CREATING STRATA TABLE FOR CONTROLS
DROP TABLE IF EXISTS #CONTROLS_STRATA
SELECT nControls, c, d
  ,STRATA
  ,RIGHT(STRATA, LEN(STRATA) - 1) STRATA_1
  ,RIGHT(STRATA, LEN(STRATA) - 2) STRATA_2
  ,RIGHT(STRATA, LEN(STRATA) - 3) STRATA_3
  ,RIGHT(STRATA, LEN(STRATA) - 4) STRATA_4
  ,RIGHT(STRATA, LEN(STRATA) - 5) STRATA_5
  ,RIGHT(STRATA, LEN(STRATA) - 6) STRATA_6
  ,RIGHT(STRATA, LEN(STRATA) - 7) STRATA_7
  ,RIGHT(STRATA, LEN(STRATA) - 8) STRATA_8
  ,RIGHT(STRATA, LEN(STRATA) - 9) STRATA_9
  ,RIGHT(STRATA, LEN(STRATA) - 10) STRATA_10
INTO #CONTROLS_STRATA
FROM #Controls
This part of the code creates one new column per dropped diagnosis: STRATA_1 omits the first diagnosis in the strata key, STRATA_2 omits the first two, and so on up to STRATA_10.
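The RIGHT(STRATA, LEN(STRATA) - k) pattern simply removes the first k characters of the key. A small Python sketch of the same idea, assuming one character per diagnosis; `truncated_strata` is a hypothetical name and the key is made up.

```python
def truncated_strata(strata, max_drop=10):
    """Return {k: key with its first k characters removed} for k = 1..max_drop.

    Values of k that would leave an empty key are skipped (a choice made for
    this sketch, not something stated on the slide).
    """
    return {k: strata[k:] for k in range(1, max_drop + 1) if k < len(strata)}


# A made-up 5-diagnosis key, dropping up to 3 leading diagnoses.
print(truncated_strata("10110", max_drop=3))
```

Because the characters are removed from the left, the order in which the diagnoses were concatenated in Step 2 determines which attributes are given up first.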

10 STEP 6.3: CREATING STRATA TABLE FOR CASES
DROP TABLE IF EXISTS #CASES_STRATA
SELECT ID, nCases, a, b
  ,RIGHT(STRATA, LEN(STRATA) - 1) STRATA_1
  ,RIGHT(STRATA, LEN(STRATA) - 2) STRATA_2
  ,RIGHT(STRATA, LEN(STRATA) - 3) STRATA_3
  ,RIGHT(STRATA, LEN(STRATA) - 4) STRATA_4
  ,RIGHT(STRATA, LEN(STRATA) - 5) STRATA_5
  ,RIGHT(STRATA, LEN(STRATA) - 6) STRATA_6
  ,RIGHT(STRATA, LEN(STRATA) - 7) STRATA_7
  ,RIGHT(STRATA, LEN(STRATA) - 8) STRATA_8
  ,RIGHT(STRATA, LEN(STRATA) - 9) STRATA_9
  ,RIGHT(STRATA, LEN(STRATA) - 10) STRATA_10
INTO #CASES_STRATA
FROM #MATCH_NULL
As for the controls, this creates one new column per dropped diagnosis, here for the cases in #MATCH_NULL that had no exact match: STRATA_1 omits the first diagnosis, STRATA_2 the first two, and so on.

11 STEP 7: DROPPING ONE ATTRIBUTE AT A TIME – FINDING LONGEST MATCH FOR CASES THAT HAD NO MATCH
DROP TABLE IF EXISTS #MATCH_1
SELECT DISTINCT ID
  ,'1' STRATA_SET
  ,ROUND(CAST([nCases] AS float) / CAST([nControls] AS float), 2) [Weight]
  ,nControls
  ,t2.c
  ,t2.d
  ,nCases
  ,a, b
  ,t1.STRATA_1 STRATA
INTO #MATCH_1
FROM #CASES_STRATA t1
LEFT JOIN #CONTROLS_STRATA t2 ON t1.STRATA_1 = t2.STRATA_1

DROP TABLE IF EXISTS #MATCHED_NULL
SELECT *
INTO #MATCHED_NULL
FROM #MATCH_1
WHERE [WEIGHT] IS NOT NULL

12 STEP 7.1: DROPPING ANOTHER ATTRIBUTE – FINDING LONGEST MATCH FOR CASES THAT HAD NO MATCH DURING FIRST ATTRIBUTE DROP
DROP TABLE IF EXISTS #MATCH_2
SELECT DISTINCT ID
  ,'2' STRATA_SET
  ,ROUND(CAST(SUM([nCases]) AS float) / CAST(SUM([nControls]) AS float), 2) [Weight]
  ,SUM(nControls) nControls
  ,SUM(t2.c) C, SUM(t2.d) D
  ,SUM(nCases) nCases, SUM(a) A, SUM(b) B
  ,t1.STRATA_2 STRATA
INTO #MATCH_2
FROM #CASES_STRATA t1
LEFT JOIN #CONTROLS_STRATA t2 ON t1.STRATA_2 = t2.STRATA_2
WHERE ID NOT IN (SELECT ID FROM #MATCHED_NULL)  -- skip cases already matched in Step 7
GROUP BY ID, t1.STRATA_2

INSERT INTO #MATCHED_NULL
SELECT * FROM #MATCH_2
WHERE [WEIGHT] IS NOT NULL

13 CONTINUE DROPPING ATTRIBUTES
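The repeated "drop one more attribute, match what is still unmatched" pattern of Steps 7 and 7.1 can be written as a loop. The Python sketch below is one possible reading of that procedure, not the presenter's code: the strata keys are made up, `progressive_match` is a hypothetical name, and k = 0 stands for the exact match from Step 5.

```python
def progressive_match(case_strata, control_strata, max_drop=10):
    """For each case stratum, find the smallest k such that the key with its
    first k characters removed equals some control key truncated the same way.

    Returns {case key: (k, truncated key)}; cases that never match are absent.
    """
    matches = {}
    for k in range(max_drop + 1):
        # Control keys with their first k diagnoses dropped.
        control_keys = {c[k:] for c in control_strata if len(c) > k}
        for strata in case_strata:
            # Skip cases matched at a smaller k (like the NOT IN filter on the slide).
            if strata in matches or len(strata) <= k:
                continue
            if strata[k:] in control_keys:
                matches[strata] = (k, strata[k:])
    return matches


# Made-up 5-diagnosis strata keys.
case_strata = ["10110", "01101", "11111"]
control_strata = ["10110", "11101", "00000"]

print(progressive_match(case_strata, control_strata))
print(progressive_match(case_strata, control_strata, max_drop=3))
```

With this data the first case matches exactly, the second after dropping one attribute, and the third only after dropping four, so capping the loop at three drops leaves it unmatched. A larger k trades match quality for coverage.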

14 STEP 8: CREATING A FINAL TABLE TO ANALYZE ON
DROP TABLE IF EXISTS #FINAL
SELECT *
INTO #FINAL
FROM #MATCH_NOT_NULL

INSERT INTO #FINAL
SELECT * FROM #MATCHED_NULL

SELECT * FROM #FINAL
The #MATCH_NOT_NULL table contains the cases that matched from the beginning. The #MATCHED_NULL table contains the cases for which matches were found by dropping attributes one by one. All matched cases end up in the table #FINAL, which is used to analyze the improved overlap.

15 STEP 8.1: CALCULATING OVERLAP
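-- The variable name and the SELECT expression were lost in the transcript.
-- @TotalCases is a placeholder name, and the ratio is reconstructed as matched cases over all cases.
DECLARE @TotalCases AS FLOAT = (SELECT SUM(nCases) FROM #Cases)
SELECT SUM(nCases) / @TotalCases AS overlap
FROM #FINAL

Type                         Overlap
Original                     (value missing from the transcript)
Dropped up to 7 Attributes   (value missing from the transcript)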

