The Co-Evolution of Genetics and Statistics. Bio-Stat seminar, 2 February 2011
From the First: Gregor Mendel is recognized as the founder of genetics. He was the first to use “math” to define a biological process.
Role of Biostat: In 1936, Fisher suggested that Mendel’s data were a little too good. Fisher is thought of as a geneticist as well as a statistician.
The Chemistry of DNA
Structure of DNA: Double helix = 2 nm wide. About 10 base pairs per turn. Nucleosome fiber = about 10 nm. About 140 bp per nucleosome.
How Has “Math” Driven Genetics? Genotype: GG = 0.25, Gg = 0.50, gg = 0.25. Phenotype: Gx (at least one dominant G) = 0.75, gg = 0.25.
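The ratios on this slide are what a cross between two Gg parents would give. A minimal sketch of that arithmetic follows, assuming a single gene with two alleles where G is dominant; the function names `cross` and `phenotype_frequencies` are illustrative and not from the talk.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Return offspring genotype frequencies for a one-gene cross.

    Each parent is a two-letter genotype such as "Gg". Every allele
    from one parent pairs with every allele from the other with
    equal probability (a Punnett square).
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sorting puts the uppercase allele first, so "gG" counts as "Gg".
        counts["".join(sorted(allele1 + allele2))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def phenotype_frequencies(genotypes: dict, dominant: str = "G") -> dict:
    """Collapse genotypes into dominant ("Gx") and recessive ("gg") classes."""
    freqs = {"Gx": Fraction(0), "gg": Fraction(0)}
    for genotype, freq in genotypes.items():
        key = "Gx" if dominant in genotype else "gg"
        freqs[key] += freq
    return freqs


genotypes = cross("Gg", "Gg")
print(genotypes)                         # GG 1/4, Gg 1/2, gg 1/4
print(phenotype_frequencies(genotypes))  # Gx 3/4, gg 1/4
```

Exact fractions are used so the 1:2:1 and 3:1 ratios come out without rounding.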
Some Questions in Genetics: There are 4 bases in DNA. There are 20 amino acids. How do you order 4 to code for 20?
Simple Math: 4 = 4. 4 × 4 = 16. 4 × 4 × 4 = 64. So a minimum of 3 bases is required.
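The same counting argument can be written as a short sketch: find the smallest word length whose number of possible words is at least the number of amino acids. The function name `min_codon_length` is illustrative only.

```python
def min_codon_length(n_bases: int, n_amino_acids: int) -> int:
    """Smallest word length k such that n_bases ** k >= n_amino_acids."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


# Number of distinct words for lengths 1, 2 and 3 with 4 bases.
for k in (1, 2, 3):
    print(k, 4 ** k)          # 1 4, then 2 16, then 3 64

print(min_codon_length(4, 20))  # 3
```

Two bases give only 16 words, which is fewer than 20, so three is the minimum; the 64 possible triplets leave room for more than one codon per amino acid.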
More Questions: If 3 are required, how are they spaced? Boxcar = ATGCAGT. Sequential = ATGCAGT. Spaced = ATGaCAGaT.
Solution: First, a homopolymer (TTTTTTT). This produces a peptide of phenylalanine. Then a copolymer (TTCCTTCCTTCC). The pattern of amino acids (AA) would allow the reading scheme to be dissected.
Example: TTCCTTCCTTCCTTCCTTCC
Boxcar      Sequential
TTC=Phe     TTC=Phe
TCC=Ser     CTT=Leu
CCT=Pro     CCT=Pro
CTT=Leu     TCC=Ser
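The two readings of the copolymer can be reproduced with a small sketch. It assumes, based only on the codon assignments in the example, that "boxcar" means a window sliding one base at a time and "sequential" means consecutive non-overlapping triplets. The codon table holds just the codons needed here, and the function names are illustrative.

```python
# Only the codons that occur in the homopolymer and copolymer examples.
CODON_TABLE = {
    "TTT": "Phe",
    "TTC": "Phe",
    "TCC": "Ser",
    "CCT": "Pro",
    "CTT": "Leu",
}


def read_codons(seq: str, step: int) -> list:
    """Read triplets from seq, advancing `step` bases between reads.

    step=1 is the overlapping (sliding window) reading assumed for
    "boxcar"; step=3 is the non-overlapping reading assumed for
    "sequential".
    """
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def translate(seq: str, step: int) -> list:
    """Translate each triplet read from seq into an amino acid."""
    return [CODON_TABLE[codon] for codon in read_codons(seq, step)]


copolymer = "TTCC" * 5  # TTCCTTCCTTCCTTCCTTCC

print(translate("TTTTTTT", 3))       # ['Phe', 'Phe']
print(translate(copolymer, 1)[:4])   # ['Phe', 'Ser', 'Pro', 'Leu']
print(translate(copolymer, 3)[:4])   # ['Phe', 'Leu', 'Pro', 'Ser']
```

Both readings use the same four amino acids but in a different order, which is why the pattern in the resulting peptide can distinguish the schemes.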
How Did We Get Here? Genetics is the study of variation. “Easy” genetics involved variation caused by genes of major effect. Sickle cell disease and cystic fibrosis are examples of single-gene diseases.
Finding Single Genes: Collect families that show the trait. Analyze their DNA to find sections that are shared by those with the trait. Assess the probability that these are shared at random (LOD score, the log of the odds ratio).
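As a rough illustration of the last step, a two-point LOD score compares the likelihood of the observed recombinants at a recombination fraction theta against the likelihood under no linkage (theta = 0.5). This is a minimal sketch that assumes phase-known, fully informative meioses; the counts used are hypothetical and the function name `lod_score` is illustrative.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 of the odds of linkage at `theta` versus no linkage.

    Likelihood at theta: theta**k * (1 - theta)**(n - k)
    Likelihood unlinked: 0.5**n
    """
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_linked = (recombinants * math.log10(theta)
                  + non_recombinants * math.log10(1 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family data: 1 recombinant among 10 informative meioses.
for theta in (0.05, 0.1, 0.2, 0.5):
    print(theta, round(lod_score(1, 10, theta), 2))
```

By convention, a LOD score of 3 or more (odds of 1000 to 1) is usually taken as evidence of linkage, so a single small family like this one would not be enough on its own.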
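How is DNA Measured? Before the age of the genome, in centimorgans. Humans have 22 pairs of autosomes plus the sex chromosomes. Different chromosomes segregate independently at meiosis. Along a chromosome, the distance between two positions is measured in centimorgans: 1 centimorgan corresponds to about a 1% chance that the positions are separated by recombination in one generation.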
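To make the unit concrete, the sketch below converts a map distance in centimorgans to an expected recombination fraction. It uses Haldane's map function, which is one common choice and an assumption here (it ignores crossover interference); the talk does not name a map function.

```python
import math


def recombination_fraction(centimorgans: float) -> float:
    """Haldane's map function: theta = 0.5 * (1 - exp(-2d)), d in Morgans."""
    morgans = centimorgans / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


for distance in (1, 10, 50, 1000):
    print(distance, round(recombination_fraction(distance), 4))
```

At short distances the fraction is close to 1% per centimorgan, and at very long distances it approaches 0.5, which is the same value expected for positions on different chromosomes that segregate independently.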
DNA is in Base Pairs Now: The chromosomes are numbered roughly from largest to smallest (1-22). Positions are now located by chromosome number and position along that chromosome (Chr2: ). There are a little more than 3 × 10^9 bp.
Mutations vs. SNPs: Currently the trend is to talk of “variation”, not mutation. SNP = Single Nucleotide Polymorphism. Most SNPs are biallelic (two alleles, e.g. A/G), and each allele has a frequency (e.g. 0.895/0.105). SNPs mark positions, not “mutations”!
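The allele frequencies quoted for a SNP come from counting alleles in a sample. A minimal sketch follows, using hypothetical genotype counts chosen to reproduce 0.895/0.105. The second function gives the genotype proportions expected under Hardy-Weinberg equilibrium, which is an added assumption (random mating) and not something stated in the talk.

```python
def allele_frequencies(n_aa: int, n_ag: int, n_gg: int) -> tuple:
    """Frequencies of alleles A and G from counts of AA, AG and GG people."""
    total_alleles = 2 * (n_aa + n_ag + n_gg)  # two alleles per person
    freq_a = (2 * n_aa + n_ag) / total_alleles
    return freq_a, 1.0 - freq_a


def hardy_weinberg(freq_a: float) -> tuple:
    """Expected AA, AG and GG proportions under random mating."""
    freq_g = 1.0 - freq_a
    return freq_a ** 2, 2 * freq_a * freq_g, freq_g ** 2


# Hypothetical sample of 1000 people: 800 AA, 190 AG, 10 GG.
freq_a, freq_g = allele_frequencies(800, 190, 10)
print(round(freq_a, 3), round(freq_g, 3))

expected = hardy_weinberg(freq_a)
print([round(x, 3) for x in expected])
```

Comparing observed genotype counts against these expected proportions is one common quality check on SNP data before looking for association with a trait.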
Other Terms: InDel, VNTR, marker, coding, non-coding, synonymous, promoter, epigenetic, imprinting, mitochondrial, intron/exon
Data Sets: Arrays (SNP, Expression). Genotype: SNP - looking for regions of DNA associated with a trait. Phenotype: Expression - what genes are “produced” and how the biochemistry is changed.
Help From the “Math” Gifted! These are complex datasets. Analysis can be “simple”; it shouldn’t be! Getting in early is critical.
Next Big Challenge: Network analysis. Andrew Mugler, Boris Grinshpun, Riley Franks, and Chris H. Wiggins. Statistical method for revealing form-function relations in biological networks. PNAS (2); published ahead of print December 23, 2010, doi: /pnas.