
1 Bioinformatics Tools for Genotyping
Frances Tong
Dr. Garry Larson, Ph.D.
City of Hope, Department of Molecular Medicine
Southern California Bioinformatics Institute, Summer 2003
Funded by the National Science Foundation and the National Institutes of Health

2 Overview of Summer Program
* Learn ASP and VBScript
* Learn the biology
* Programming Project I : writing code for mining online genetic data
* Programming Project II : writing a program to graph linkage disequilibrium data

3 Intro to ASP & VBScript
ASP : Microsoft Active Server Pages
* server-generated web pages
* similar to CGI but easier
* works well with databases
VBScript : Microsoft Visual Basic Scripting Edition
* scripting language to enhance HTML web pages
* default language of ASP

4 Hello World!
Sample ASP file (one line only!)

5 Genetic Mapping of ASPs
ASPs : affected sibling pairs
* Identify genes associated with cancer in patients and siblings who both have cancer (breast, prostate, lung or colon)
* Determine allele-sharing statistics of susceptibility genes
* Look at gene-gene interactions
=> Provide information on a person's genetic risk of developing cancer

6 DNA Marker Genotyping
Genetic marker : a polymorphic gene or section of DNA with an identifiable physical location on a chromosome, used to trace inheritance
Ex. microsatellite and SNP markers

7 Programming Project I: Tag Selection for Markers
* Need a unique way to identify markers (like social security numbers for people)
* Chromosome locations are relative and change frequently (UCSC)
* Use ASP to automate data mining and ease the generation of a unique 50 base-pair tag for each marker in the database
* Tags will be used to locate markers in the genome

8 UCSC Genome Browser

9 Marker Tag Selection
* Submit sequence surrounding simple repeat
* Submit accession number for microsatellite
* Submit accession number for SNP

10 Output
* Link to UCSC browser
* Chromosome
* Sequence start position
* Sequence end position
* Inputted sequence with repeats highlighted in blue

11 Choosing a 50 bp tag
* Copy and paste here
* Send sequence to UCSC

12 UCSC BLAT Results
BLAT is similar to BLAST : it searches for alignments in the genome

13 List of markers and their tags

14 Convert to FASTA format
FASTA format: a ">name" header line followed by the sequence on the next line
The program converts the marker tag file into FASTA format automatically
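The conversion step can be sketched in Python (the original program was written in ASP, which is not shown in the slides). This is a minimal sketch under assumptions: the marker tag file is taken to hold one marker per line as a name and its tag separated by whitespace, and the function name and sample tags are hypothetical.

```python
def tags_to_fasta(lines, width=60):
    """Convert 'name tag' lines into FASTA text (input layout is an assumption)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, seq = line.split(None, 1)
        seq = "".join(seq.split()).upper()  # drop inner whitespace, normalize case
        records.append(">" + name)
        # Wrap long sequences at `width` characters per line.
        for i in range(0, len(seq), width):
            records.append(seq[i:i + width])
    return "".join(record + "\n" for record in records)


# Hypothetical marker names and shortened tags, for illustration only.
example = ["marker1 ACGTACGTAC", "marker2 ttgacca"]
fasta_text = tags_to_fasta(example)
print(fasta_text)
```

A real 50 bp tag fits on one line at the default width, so each marker yields exactly two lines of output.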

15 Check tag selection
The program sends the FASTA file to UCSC BLAT

16 Linkage Disequilibrium
A condition in which two polymorphisms are found together on the same chromosome at a greater frequency than predicted from the product of their individual frequencies.

17 Two SNPs and their base frequencies
SNP 1 (G/A) : G = 0.88, A = 0.12
SNP 2 (T/C) : T = 0.75, C = 0.25
Expected frequencies
* G & T : (0.88)(0.75) = 0.66
* G & C : (0.88)(0.25) = 0.22
* A & T : (0.12)(0.75) = 0.09
* A & C : (0.12)(0.25) = 0.03

18 Expected vs. Observed Frequencies
Pair     Expected   Observed
G & T    0.66       0.54
G & C    0.22       0.20
A & T    0.09       0.24
A & C    0.03       0.02
IF observed frequency of 2 variants together > expected frequency => LINKAGE DISEQUILIBRIUM
A and T together are in linkage disequilibrium
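The comparison on this slide can be reproduced with a short Python sketch. It uses the allele and observed frequencies exactly as given in the slides; the variable and function names are illustrative, and expected values are rounded to two decimals as on the slide.

```python
from itertools import product

# Allele frequencies from the two-SNP example.
snp1 = {"G": 0.88, "A": 0.12}
snp2 = {"T": 0.75, "C": 0.25}

# Observed frequencies of each pair, copied from the slide's table.
observed = {
    ("G", "T"): 0.54,
    ("G", "C"): 0.20,
    ("A", "T"): 0.24,
    ("A", "C"): 0.02,
}


def expected_frequencies(freq1, freq2):
    """Expected pair frequency under independence: product of allele frequencies."""
    return {(a, b): round(freq1[a] * freq2[b], 2) for a, b in product(freq1, freq2)}


expected = expected_frequencies(snp1, snp2)

# Pairs seen together more often than independence predicts.
in_excess = [pair for pair in expected if observed[pair] > expected[pair]]

for pair in expected:
    print(pair, "expected", expected[pair], "observed", observed[pair])
print("Observed > expected:", in_excess)
```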

19 A Quantitative Measure of LD
One of the most common measures of linkage disequilibrium is $r^2$
It is a squared correlation coefficient => the correlation of alleles at two sites.
Special case: $r^2 = 1$ ("perfect LD")
~ Exactly two out of the four possible haplotypes are observed.
~ Markers NOT separated by recombination
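For reference, the squared correlation coefficient is conventionally computed as r^2 = D^2 / (pA (1 - pA) pB (1 - pB)), where D = pAB - pA pB. The slide does not spell the formula out, so the Python sketch below uses this standard definition; the function name and example frequencies are illustrative.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared correlation between alleles at two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Perfect LD: only two of the four haplotypes occur (A-B at 0.6, a-b at 0.4).
perfect = r_squared(p_ab=0.6, p_a=0.6, p_b=0.6)

# Independence: the haplotype frequency equals the product of allele frequencies.
independent = r_squared(p_ab=0.88 * 0.75, p_a=0.88, p_b=0.75)

print(perfect, independent)
```

The first call illustrates the "perfect LD" special case, and the second shows that the score drops to zero when the haplotype occurs exactly as often as independence predicts.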

20 Programming Project II
Program that helps visualize linkage disequilibrium by graphing scores such as $r^2$
Each pair of markers has such a score => pairwise comparisons
[Figure: grid of pairwise scores for Marker 1, Marker 2 and Marker 3, with 1 on the diagonal and off-diagonal values (0.7, 0.2) mirrored across it]
Symmetric!
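The symmetric layout can be sketched in Python as follows. The marker names follow the slide, but the assignment of scores to specific pairs is hypothetical, since the figure's cell positions are not recoverable from the transcript.

```python
def pairwise_matrix(markers, scores):
    """Fill a square grid from pairwise scores, mirroring each score across the diagonal."""
    index = {marker: i for i, marker in enumerate(markers)}
    n = len(markers)
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # each marker compared with itself
    for (m1, m2), score in scores.items():
        i, j = index[m1], index[m2]
        matrix[i][j] = score
        matrix[j][i] = score  # same value on the other side of the diagonal
    return matrix


markers = ["Marker 1", "Marker 2", "Marker 3"]
# Hypothetical scores for illustration only.
scores = {
    ("Marker 1", "Marker 2"): 0.7,
    ("Marker 1", "Marker 3"): 0.2,
    ("Marker 2", "Marker 3"): 0.5,
}

matrix = pairwise_matrix(markers, scores)
for name, row in zip(markers, matrix):
    print(name, row)
```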

21 Sample data for graphing
Read data by row: pairwise comparison of marker 1 and marker 7 results in two different kinds of measurements

22 GOLD: Graphical Overview of Linkage Disequilibrium
* Existing program from the Univ. of Michigan to graph linkage disequilibrium
* http://www.sph.umich.edu/csg/abecasis/GOLD/
* Graphs based on a chromosomal position scale
* Works very well for long-range pattern analysis, but it is hard to distinguish each specific measurement

23 Comparison of Program Output (same input file)
Output from GOLD : difficult to see individual points on graph
Output from LD Color (my program) : easier to distinguish individual points

24 LD Color Program
* Program written in ASP to graphically depict linkage disequilibrium in human genetic data
* Color coded for specific numerical ranges of different measures of each pairwise comparison of markers
* Complete program: 4 files ; >1,000 lines of code
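The color-coding idea can be illustrated with a small Python sketch. It is not the LD Color implementation (which was written in ASP and is not shown here); the thresholds, color names and function name below are assumptions chosen for illustration.

```python
from bisect import bisect_right


def color_for(score, thresholds, colors):
    """Pick the color of the numerical range that contains `score`.

    thresholds: sorted cut points separating the ranges
    colors: one color per range, so len(colors) == len(thresholds) + 1
    """
    if len(colors) != len(thresholds) + 1:
        raise ValueError("need exactly one more color than thresholds")
    return colors[bisect_right(thresholds, score)]


# Hypothetical user-defined ranges: [0, 0.2), [0.2, 0.5), [0.5, 0.8), [0.8, 1].
thresholds = [0.2, 0.5, 0.8]
colors = ["blue", "green", "orange", "red"]

for score in (0.05, 0.2, 0.7, 1.0):
    print(score, color_for(score, thresholds, colors))
```

Keeping the cut points in a sorted list means changing a range only requires editing one number, which suits user-defined colors and ranges.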

25 Program Features
* Data input : file uploading or text pasting
* Allows for variable file formats for input
* User-defined colors and ranges
* Switch between different measures of LD
* View actual data on graph or just the colors
* Change size of graph
* Option to select specific rows of data

26 Upload your file
Paste data

27 Specify marker columns

28 Choose label for numerical data inputted

29 Choose measure of linkage disequilibrium
Specify which column the data is located in

30 Same as before => used to specify data for other side of diagonal
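The column choices on the preceding slides can be combined into one Python sketch: read pairwise rows, take the marker names from two chosen columns, and place one measure above the diagonal and a second measure below it. The whitespace-delimited layout, 0-based column numbers, function name and sample values are all assumptions for illustration.

```python
def build_split_matrix(text, col_a, col_b, col_upper, col_lower):
    """Build a grid with one measure above the diagonal and another below it."""
    rows = [line.split() for line in text.strip().splitlines()]
    markers = []
    for row in rows:
        for name in (row[col_a], row[col_b]):
            if name not in markers:
                markers.append(name)  # keep first-seen order
    index = {marker: i for i, marker in enumerate(markers)}
    n = len(markers)
    matrix = [[None] * n for _ in range(n)]
    for row in rows:
        i, j = sorted((index[row[col_a]], index[row[col_b]]))
        matrix[i][j] = float(row[col_upper])  # above the diagonal
        matrix[j][i] = float(row[col_lower])  # below the diagonal
    return markers, matrix


# Hypothetical input: marker, marker, first measure, second measure.
sample = """
M1 M2 0.90 0.70
M1 M3 0.40 0.20
M2 M3 0.65 0.50
"""

markers, matrix = build_split_matrix(sample, col_a=0, col_b=1, col_upper=2, col_lower=3)
for name, row in zip(markers, matrix):
    print(name, row)
```

Passing the same column for both `col_upper` and `col_lower` gives the symmetric display instead.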

31 Choose to display data on graph

32 Choose different sizes for the graph

33 Select only the markers you want graphed by choosing rows
Default : all are graphed

34 Specify the ranges for the colors you want graphed.

35 Manual

36 Color Legend

37 Sample: Symmetric

38 Sample: Big Size!

39 Sample: Data On, Asymmetric

40 Sample: Row Select

41 Future Directions for LD Color
* Mouseover tag on each cell of the graph to show marker id (JavaScript)
* Ability to accept more kinds of file formats
* Better form validation and error checking
* More functionality and linking to outside sources

42 Acknowledgements
* Dr. Garry Larson, Ph.D.
* Dave Ko, City of Hope Senior Programmer Analyst
* Louis Geller, City of Hope Senior Research Associate
* Dr. Ted Krontiris, M.D., Ph.D., Principal Investigator
* The rest of the Krontiris Lab
* Southern California Bioinformatics Institute: Dr. Jamil Momand, Dr. Nancy Warter-Perez, Dr. Sandra Sharp & Dr. Wendie Johnston, Jackie Leung & rest of SoCalBSI staff
* Fellow interns
* NSF & NIH

