Prepare data for Microdeletion
Jianfang Chen
1. Original Data Set.
2. Objective Data Sets.
(1) snp_homozygosity_data.
(2) snp_location_data.
(3) Parameter file.
snp_homozygosity_data -- The first row is the title line. The first column is affection_status (0 for controls, 1 for cases). The remaining columns are the homozygosity data at each site (0 for missing, 1 for homozygotes, 2 for heterozygotes).
Example of "snp_homozygosity_data" (with two controls, two cases, and 6 SNPs):
indicator v1 v2 v3 v4 v5 v6
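A minimal Python sketch of writing this file, assuming space-separated columns (as in the title line of the example) and rows already in the desired order. The function name format_homozygosity_data is hypothetical:

```python
from typing import Sequence


def format_homozygosity_data(rows: Sequence[Sequence[int]]) -> str:
    """Return snp_homozygosity_data text for the given rows.

    Each row is [affection_status, g1, ..., gn]: affection_status is 0 (control)
    or 1 (case); each g is 0 (missing), 1 (homozygote) or 2 (heterozygote).
    Rows are written in the order given.
    """
    if not rows:
        raise ValueError("no rows to write")
    num_site = len(rows[0]) - 1
    # Title line: "indicator v1 v2 ... vn" (space separator is an assumption)
    lines = [" ".join(["indicator"] + [f"v{i}" for i in range(1, num_site + 1)])]
    for row in rows:
        if len(row) != num_site + 1:
            raise ValueError("every row must have the same number of sites")
        if row[0] not in (0, 1):
            raise ValueError(f"affection_status must be 0 or 1, got {row[0]}")
        if any(g not in (0, 1, 2) for g in row[1:]):
            raise ValueError(f"genotype codes must be 0, 1 or 2: {row}")
        lines.append(" ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"
```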
snp_location_data -- The first row is the title line. The first column is the SNP index number, and the second column is the SNP location. The locations are sorted in increasing order.
Example of "snp_location_data" (with 6 SNPs):
order position
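A matching sketch for the location file. It assumes a 1-based SNP index that follows the column order of snp_homozygosity_data (v1 is SNP 1, and so on) and rejects equal neighbouring locations, since markers with duplicate positions are removed during preparation. The function name format_location_data is hypothetical:

```python
from typing import Sequence


def format_location_data(positions: Sequence[int]) -> str:
    """Return snp_location_data text: "order position", then one line per SNP."""
    # Locations must be in increasing order; equal neighbours are rejected
    # (assumption: duplicate-position markers were already removed).
    for prev, cur in zip(positions, positions[1:]):
        if cur <= prev:
            raise ValueError(f"locations not increasing: {prev} then {cur}")
    lines = ["order position"]
    # SNP index number assumed to start at 1.
    lines += [f"{i} {pos}" for i, pos in enumerate(positions, start=1)]
    return "\n".join(lines) + "\n"
```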
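Parameter file -- It contains the following inputs (one input per line):
snp_homozygosity_data_name (file name of the snp_homozygosity_data)
snp_location_data_name (file name of the snp_location_data)
output_file_name
num_cont (number of controls)
num_case (number of cases)
num_site (number of SNPs)
maximum_window_size
num_rep1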
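A small sketch that writes the parameter file in the listed order, one input per line. The Parameters class and format_parameter_file are hypothetical names, and treating maximum_window_size and num_rep1 as integers is an assumption:

```python
from dataclasses import astuple, dataclass


@dataclass
class Parameters:
    snp_homozygosity_data_name: str
    snp_location_data_name: str
    output_file_name: str
    num_cont: int             # number of controls
    num_case: int             # number of cases
    num_site: int             # number of SNPs
    maximum_window_size: int  # assumed to be an integer
    num_rep1: int             # assumed to be an integer


def format_parameter_file(p: Parameters) -> str:
    # astuple keeps the field order declared above, i.e. the order on the slide.
    return "\n".join(str(v) for v in astuple(p)) + "\n"
```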
3. Algorithm.
Sort the original data by FamilyID, Position, and Marker_name.
When two markers have the same position, remove one of them.
For each family within a marker (3 individuals: father, mother, and child), keep the child as the case.
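A minimal sketch of these first steps. It assumes each input record carries FamilyID, Position, Marker_name, a member role, and two alleles; the Genotype record, its member field, and the rule of keeping the alphabetically first marker name at a shared position are all assumptions, since the slides do not say which duplicate is removed:

```python
from itertools import groupby
from typing import Iterator, NamedTuple


class Genotype(NamedTuple):
    family_id: str
    position: int
    marker_name: str
    member: str  # hypothetical role field: "father", "mother" or "child"
    allele1: int
    allele2: int


def record_key(r: Genotype) -> tuple[str, int, str]:
    return (r.family_id, r.position, r.marker_name)


def sort_and_dedup(records: list[Genotype]) -> list[Genotype]:
    # Sort by FamilyID, Position and Marker_name.
    records = sorted(records, key=record_key)
    # For each position keep only the alphabetically first marker name
    # (assumed rule), so every family drops the same duplicate marker.
    keep: dict[int, str] = {}
    for r in records:
        if r.position not in keep or r.marker_name < keep[r.position]:
            keep[r.position] = r.marker_name
    return [r for r in records if r.marker_name == keep[r.position]]


def trios(records: list[Genotype]) -> Iterator[tuple]:
    """Yield (family_id, position, marker_name, father, mother, child) for each
    family and marker; a missing member gets the pair (0, 0)."""
    for (fam, pos, marker), group in groupby(records, key=record_key):
        pairs = {r.member: (r.allele1, r.allele2) for r in group}
        yield (fam, pos, marker,
               pairs.get("father", (0, 0)),
               pairs.get("mother", (0, 0)),
               pairs.get("child", (0, 0)))
```

The child pair from each trio is the case; the father and mother pairs feed the control rules.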
Combine the father and mother into one line as the control, based on the following rules. Suppose father = (a,b), mother = (c,d), and child = (e,f); the first rule that matches applies:
if e=a and f=c then control will be (b,d)
else if e=a and f=d then control will be (b,c)
else if e=b and f=c then control will be (a,d)
else if e=b and f=d then control will be (a,c)
else if e=c and f=a then control will be (d,b)
else if e=c and f=b then control will be (d,a)
else if e=d and f=a then control will be (c,b)
else if e=d and f=b then control will be (c,a)
else if a=b=c=d=1 and e=f=2 then control will be (1,1)
else if a=b=c=d=2 and e=f=1 then control will be (2,2)
else if a=b=1 and c=d=2 and e=f=1 then control will be (1,2)
else if a=b=2 and c=d=1 and e=f=1 then control will be (1,2)
else if a=b=1 and c=d=2 and e=f=2 then control will be (1,2)
else if a=b=2 and c=d=1 and e=f=2 then control will be (1,2)
else if a=b=c=d=2 and e=1 and f=2 then control will be (2,2)
else if a=b=c=d=1 and e=1 and f=2 then control will be (1,1)
else control will be (0,0)
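The rules above translate directly into a function. This sketch (make_control and SPECIAL_CASES are hypothetical names) checks the eight transmission rules in order, then the explicitly listed special cases, and otherwise returns (0,0):

```python
Pair = tuple[int, int]

# Special cases from the rule list: (a, b, c, d, e, f) -> control.
SPECIAL_CASES: dict[tuple[int, ...], Pair] = {
    (1, 1, 1, 1, 2, 2): (1, 1),
    (2, 2, 2, 2, 1, 1): (2, 2),
    (1, 1, 2, 2, 1, 1): (1, 2),
    (2, 2, 1, 1, 1, 1): (1, 2),
    (1, 1, 2, 2, 2, 2): (1, 2),
    (2, 2, 1, 1, 2, 2): (1, 2),
    (2, 2, 2, 2, 1, 2): (2, 2),
    (1, 1, 1, 1, 1, 2): (1, 1),
}


def make_control(father: Pair, mother: Pair, child: Pair) -> Pair:
    """Return the control pair for one family at one marker, applying the rules in order."""
    a, b = father
    c, d = mother
    e, f = child
    # Child's first allele matches the father, second matches the mother:
    # the control takes the father's other allele and the mother's other allele.
    if e == a and f == c:
        return (b, d)
    if e == a and f == d:
        return (b, c)
    if e == b and f == c:
        return (a, d)
    if e == b and f == d:
        return (a, c)
    # Child's first allele matches the mother, second matches the father:
    # the control lists the mother's other allele first.
    if e == c and f == a:
        return (d, b)
    if e == c and f == b:
        return (d, a)
    if e == d and f == a:
        return (c, b)
    if e == d and f == b:
        return (c, a)
    # Listed special cases, otherwise (0, 0).
    return SPECIAL_CASES.get((a, b, c, d, e, f), (0, 0))
```

Because the first matching rule wins, a trio with every allele equal to 1 is handled by the first rule and yields (1,1) before the special cases are reached.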
Recode every allele pair (x,y), i.e. each control pair built from a, b, c, d and each child pair, as follows:
if x*y=0 then output 0, else if x*y=2 then output 1, else output 2.
Dump out the Middle Step Output, as I put on the website.
For each family, write "0" followed by the recoded numbers of the parents (the control) across all markers, and "1" followed by the recoded numbers of the child across all markers, both obtained in step 4.
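A sketch of the recoding and of the per-family output lines, assuming each family's markers are already in sorted order and stored as (control_pair, child_pair) tuples; recode and dump_rows are hypothetical names. As written, the rule maps a heterozygous pair such as (1,2) to 1 and a homozygous pair to 2, the reverse of the 1 = homozygote, 2 = heterozygote coding described for snp_homozygosity_data, so it is worth confirming which convention the downstream program expects:

```python
Pair = tuple[int, int]


def recode(pair: Pair) -> int:
    """Recode an allele pair (x, y): 0 if x*y == 0, 1 if x*y == 2, else 2."""
    x, y = pair
    product = x * y
    if product == 0:
        return 0  # at least one allele missing
    if product == 2:
        return 1  # (1,2) or (2,1)
    return 2      # (1,1) or (2,2) with alleles coded 1/2


def dump_rows(families: dict[str, list[tuple[Pair, Pair]]]) -> list[str]:
    """For each family, return a "0" line of recoded control pairs followed by
    a "1" line of recoded child pairs, one value per marker."""
    lines = []
    for markers in families.values():  # dicts keep insertion order
        lines.append(" ".join(["0"] + [str(recode(ctrl)) for ctrl, _ in markers]))
        lines.append(" ".join(["1"] + [str(recode(child)) for _, child in markers]))
    return lines
```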
4. Data Sets.
(1) data_all.txt
(2) data_clean.txt