Published by Patience Bradford. Modified over 9 years ago.
1
Of Sea Urchins, Birds and Men Algorithmic Functions of Computational Biology – Course 1 Professor Istrail
2
Darwin’s Finches 2 and Coco
3
The Father of All Dot Plots: The Human Genome
4
The Synteny Problem
Between distant species: can reveal function; conservation reveals selective pressure
Between near species: conservation reveals evolutionary history
Between similar or the same species: recent events in subpopulations; phenotypic differences
5
Matching, Chaining, Extension
Matching Phase, Chaining Phase, Extension Phase
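A minimal Python sketch of the three phases, assuming exact k-mer seeds for matching, a longest-chain dynamic program over non-overlapping anchors for chaining, and ungapped extension to the right. The function names and the toy sequences are hypothetical; real whole-genome aligners use faster chaining and gapped extension, so treat this only as an illustration of how the phases hand off to each other.

```python
from typing import Dict, List, Tuple

Anchor = Tuple[int, int]  # (start in sequence a, start in sequence b)


def match_kmers(a: str, b: str, k: int) -> List[Anchor]:
    """Matching phase: every exact length-k word shared by a and b becomes an anchor."""
    index: Dict[str, List[int]] = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    anchors = [(i, j)
               for j in range(len(b) - k + 1)
               for i in index.get(b[j:j + k], [])]
    return sorted(anchors)


def chain_anchors(anchors: List[Anchor], k: int) -> List[Anchor]:
    """Chaining phase: the longest chain of non-overlapping anchors that
    increase in both sequences, via a simple O(n^2) dynamic program.
    Anchors must be sorted so every valid predecessor comes first."""
    if not anchors:
        return []
    n = len(anchors)
    best = [1] * n   # best[j] = number of anchors in the longest chain ending at j
    prev = [-1] * n  # back-pointers used to recover that chain
    for j in range(n):
        aj, bj = anchors[j]
        for i in range(j):
            ai, bi = anchors[i]
            if ai + k <= aj and bi + k <= bj and best[i] + 1 > best[j]:
                best[j], prev[j] = best[i] + 1, i
    j = max(range(n), key=lambda t: best[t])
    chain = []
    while j != -1:
        chain.append(anchors[j])
        j = prev[j]
    return chain[::-1]


def extend_right(a: str, b: str, i: int, j: int) -> int:
    """Extension phase (ungapped): length of the exact match starting at a[i], b[j]."""
    length = 0
    while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
        length += 1
    return length


a, b = "AAAACCCCGGGG", "AAAATTCCCCGGGG"
chain = chain_anchors(match_kmers(a, b, 4), 4)
print(chain)  # [(0, 0), (4, 6), (8, 10)]
```

The chain maximizes the number of anchors; weighting anchors by length or penalizing gaps between them are common alternatives.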
6
Dot Plots 101
a, b, c, d stand for letters; A, B, C, D stand for words.
Where letters match, put a dot.
Where words match, put a line (words can be rc-ed, i.e., reverse-complemented).
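To make the letter/word distinction concrete, here is a small sketch that prints a letter-level dot plot and lists word-level hits on both strands, reading "rc-ed" as reverse-complemented. The helper names and the word length k are illustrative choices, not anything the slides specify.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement ("rc") of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def letter_dot_plot(a: str, b: str) -> List[str]:
    """Letter level: row i, column j holds '*' where a[i] == b[j], else '.'."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


def word_hits(a: str, b: str, k: int) -> List[Tuple[int, int, str]]:
    """Word level: (i, j, strand) for every k-letter word a[i:i+k] that equals
    b[j:j+k] (strand '+') or its reverse complement (strand '-').
    A word that is its own reverse complement is reported once, as '+'."""
    hits = []
    for i in range(len(a) - k + 1):
        word = a[i:i + k]
        rc = reverse_complement(word)
        for j in range(len(b) - k + 1):
            other = b[j:j + k]
            if other == word:
                hits.append((i, j, "+"))
            elif other == rc:
                hits.append((i, j, "-"))
    return hits


for row in letter_dot_plot("ACGT", "ACGA"):
    print(row)
print(word_hits("AAAC", "GTTT", 4))  # [(0, 0, '-')]
```

Consecutive '+' hits along a diagonal are what get drawn as a line; '-' hits form the anti-diagonal lines of reverse-complemented words.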
7
Dot Plots 101
When words line up; Reversed; Misplaced; Something gained (relative to horizontal); Something lost (relative to horizontal)
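The patterns listed above can be illustrated with a toy word-level plot. This sketch assumes a made-up notation in which a trailing "~" marks a word that appears reversed in the vertical sequence; same-orientation matches print as "\" and reversed ones as "/". A reversed block then shows up as an anti-diagonal, a gained word as an empty row, and a lost word as an empty column.

```python
from typing import List


def word_dot_plot(horizontal: List[str], vertical: List[str]) -> List[str]:
    """Rows follow `vertical`, columns follow `horizontal`.

    Hypothetical notation: a word in `vertical` with a trailing '~' is a
    reversed copy. Same-orientation matches print as a backslash, reversed
    matches as '/', and non-matching cells as '.'.
    """
    rows = []
    for v in vertical:
        name, reversed_copy = v.rstrip("~"), v.endswith("~")
        rows.append("".join(
            ("/" if reversed_copy else "\\") if h == name else "."
            for h in horizontal))
    return rows


examples = {
    "line up": ["A", "B", "C", "D"],
    "reversed": ["A", "C~", "B~", "D"],     # block B C inverted
    "misplaced": ["A", "C", "B", "D"],      # block order swapped
    "gained": ["A", "B", "X", "C", "D"],    # X absent from horizontal
    "lost": ["A", "B", "D"],                # C absent from vertical
}
for label, vertical in examples.items():
    print(label)
    print("\n".join(word_dot_plot(["A", "B", "C", "D"], vertical)))
```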
8
Some large reversals in GP
9
NCBI has more of the centromere than anyone else (or is that N’s?)
10
Many reversals in GP; a piece of the end is re-ordered to the middle; Celera assemblies are boringly good.
11
Again, everyone misses the first 10 Mb (or are those N’s?) of NCBI31
12
Rube Goldberg’s Innovation: GENOMIC REGULATORY SYSTEMS
Mixed character of the problem: continuous mathematics and discrete mathematics
13
Rube Goldberg’s Pencil Sharpener invention
Open window (A) and fly kite (B). String (C) lifts small door (D) allowing moths (E) to escape and eat red flannel shirt (F). As weight of shirt becomes less, shoe (G) steps on switch (H) which heats electric iron (I) and burns hole in pants (J). Smoke (K) enters hole in tree (L), smoking out opossum (M) which jumps into basket (N), pulling rope (O) and lifting cage (P), allowing woodpecker (Q) to chew wood from pencil (R), exposing lead. Emergency knife (S) is always handy in case opossum or the woodpecker gets sick and can't work.
14
A Tale of Two Networks: Sea Urchin and Drosophila
15
A Proposal for a Nobel Prize
“Programs built into the DNA of every animal.”
Eric H. Davidson, Genomic Regulatory Systems
One gene, 30 years of study, 300 docs and postdocs
16
The Dogma
17
Genomic Regulatory Regions
18
TF Binding Site Complexity
19
Genome Complexity
1 Billion DNA bases
20,000 Genes
20
cis-Regulatory Modules Complexity
200,000 cis-Modules
21
THE FIRST GENE
The DNA program that regulates the expression of endo16 in sea urchin
22
THE FIRST NETWORK
23
The View from the Genome
24
The View from the Nucleus
25
Building Protein-DNA Assemblies
Inter-cis-module linkage, insulation, communication, cis-module DNA, cooperativity, linear-amp, gates, potentiality
26
The Building Blocks: Protein, DNA, Free Energy
Protein-DNA Binding (free energy)
Free energy is the “GLUE”
27
Information Processing
28
Boolean Circuit
[Figure: inputs 0 1 1 0 0 1 0 0; output 0]
Synchronous input and output
Completely defined gates
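A Boolean circuit with synchronous inputs and completely defined gates can be evaluated in a single pass over its gates in topological order. The wiring below is hypothetical; only the input pattern 0 1 1 0 0 1 0 0 comes from the slide.

```python
from typing import Callable, Dict, List, Tuple

GATES: Dict[str, Callable[..., int]] = {
    "AND": lambda *xs: int(all(xs)),
    "OR": lambda *xs: int(any(xs)),
    "NOT": lambda x: 1 - x,
}

# (output name, gate type, input names), listed in topological order
Circuit = List[Tuple[str, str, List[str]]]


def evaluate(circuit: Circuit, inputs: Dict[str, int]) -> Dict[str, int]:
    """Synchronous evaluation: all inputs are present at once, and every
    gate is completely defined (a 0/1 output for every 0/1 input)."""
    values = dict(inputs)
    for out, gate, args in circuit:
        values[out] = GATES[gate](*(values[name] for name in args))
    return values


# Hypothetical wiring over the slide's input pattern
circuit: Circuit = [
    ("g1", "OR", ["x1", "x2"]),
    ("g2", "AND", ["x3", "x6"]),
    ("g3", "NOT", ["g2"]),
    ("out", "AND", ["g1", "g3"]),
]
bits = [0, 1, 1, 0, 0, 1, 0, 0]
inputs = {f"x{k}": bit for k, bit in enumerate(bits, start=1)}
print(evaluate(circuit, inputs)["out"])  # 0
```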
29
Boolean Circuit vs. Boolinear Circuit
Boolean Circuit: synchronous input and output; completely defined gates
Boolinear Circuit: asynchronous input and output; incompletely defined gates
[Figure: Boolean inputs 0 1 1 0 0 1 0 0; real-valued signals such as 1.4, 0.5 and 1.1]
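The slides do not define the Boolinear circuit precisely. One possible reading, sketched here purely as an assumption, is a gate whose decision is Boolean but whose output is a real-valued (linear) quantity, so values such as 1.4 or 0.5 can flow through it. The threshold, weights, and function names are hypothetical, and the sketch does not model asynchronous timing.

```python
from typing import Sequence


def is_on(level: float, threshold: float = 1.0) -> bool:
    """Hypothetical Boolean reading of a real-valued input level."""
    return level >= threshold


def boolinear_gate(levels: Sequence[float], weights: Sequence[float],
                   enable_level: float) -> float:
    """Boolean part: the gate is open only if `enable_level` reads as 'on'.
    Linear part: when open, the output is the weighted sum of input levels,
    so it can take any real value rather than just 0 or 1."""
    if not is_on(enable_level):
        return 0.0
    return sum(w * x for w, x in zip(weights, levels))


print(boolinear_gate([1.4, 0.5], [1.0, 2.0], enable_level=1.1))  # 2.4
print(boolinear_gate([1.4, 0.5], [1.0, 2.0], enable_level=0.5))  # 0.0
```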
30
[Figure: OR, AND and NOT gates; inputs 1 1 0 1; OR output 1]
IF (x1 = 1 AND x2 = 1) THEN …..
GTAGGATTAAG …...
CATCCTAATTC …….
GTATCTAGAAG …….
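The IF rule above can be sketched as a cis-regulatory AND gate over binding-site occurrences. The sketch assumes that a site counts as present if it occurs on either DNA strand, and it reuses the slide's sequences GTAGGATTAAG and GTATCTAGAAG only as stand-in motifs for x1 and x2; they are not claimed to be real transcription-factor sites. CATCCTAATTC on the slide is the base-by-base complement of GTAGGATTAAG, i.e. the partner strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def site_present(dna: str, site: str) -> int:
    """1 if `site` occurs on either strand of `dna` (the top strand as given,
    or the bottom strand read 5'->3', i.e. the reverse complement), else 0."""
    return int(site in dna or reverse_complement(site) in dna)


def module_output(dna: str, site1: str, site2: str) -> int:
    """IF (x1 = 1 AND x2 = 1) THEN the module is on (1); otherwise off (0)."""
    x1 = site_present(dna, site1)
    x2 = site_present(dna, site2)
    return int(x1 == 1 and x2 == 1)


SITE1, SITE2 = "GTAGGATTAAG", "GTATCTAGAAG"   # slide sequences used as stand-in motifs
print(SITE1.translate(COMPLEMENT))            # CATCCTAATTC
# Hypothetical test DNA: SITE1 on the top strand, SITE2 on the bottom strand
promoter = "TT" + SITE1 + "CC" + reverse_complement(SITE2) + "TT"
print(module_output(promoter, SITE1, SITE2))  # 1
```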
31
Web page: http://www.its.caltech.edu/~chyuh/cathy-mirsky-info.html
Caltech, Davidson Lab, October 2004
32
Introduction SNPs, HAPLOTYPES
33
Single Nucleotide Polymorphism (SNP)
A SNP is a position in a genome at which two or more different bases occur in the population, each with a frequency >1%.
The most abundant type of polymorphism.
GATTTAGATCGCGATAGAG
GATTTAGATCTCGATAGAG
The two alleles at the site are G and T.
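The SNP definition translates directly into a column scan over aligned sequences. This sketch assumes the sequences are already aligned to equal length and applies the >1% rule to every allele in a column; on the slide's two sequences it reports the G/T site.

```python
from collections import Counter
from typing import Dict, List


def find_snps(aligned: List[str], min_freq: float = 0.01) -> Dict[int, Dict[str, float]]:
    """Return {position: {base: frequency}} for every alignment column where
    at least two different bases each occur with frequency > min_freq."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned)
    snps = {}
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        common = {base: c / n for base, c in counts.items() if c / n > min_freq}
        if len(common) >= 2:
            snps[pos] = common
    return snps


print(find_snps(["GATTTAGATCGCGATAGAG", "GATTTAGATCTCGATAGAG"]))
# {10: {'G': 0.5, 'T': 0.5}}
```

Positions are 0-based, so the reported column 10 is the 11th base.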
34
tttctccatttgtcgtgacacctttgttgacaccttcatttctgcattctcaattctatttcactggtctatggcagagaacacaaaatatggccagtggc ctaaatccagcctactaccttttttttttttttgtaacattttactaacatagccattcccatgtgtttccatgtgtctgggctgcttttgcactctaatggcag agttaagaaattgtagcagagaccacaatgcctcaaatatttactctacagccctttataaaaacagtgtgccaactcctgatttatgaacttatc attatgtcaataccatactgtctttattactgtagttttataagtcatgacatcagataatgtaaatcctccaactttgtttttaatcaaaagtgttttggcc atcctagatatactttgtattgccacataaatttgaagatcagcctgtcagtgtctacaaaatagcatgctaggattttgatagggattgtgtagaat ctatagattaattagaggagaatgactatcttgacaatactgctgcccctctgtattcgtgggggattggttccacaacaacacccaccccccac tcggcaacccctgaaacccccacatcccccagcttttttcccctgctaccaaaatccatggatgctcaagtccatataaaatgccatactatttgc atataacctctgcaatcctcccctatagtttagatcatctctagattacttataatactaataaaatctaaatgctatgtaaatagttgctatactgtgtt gagggttttttgttttgttttgttttatttgtttgtttgtttgtattttaagagatggtgtcttgctttgttgcccaggctggagtgcagtggtgagatcatagctt actgcagcctcaaactcctggactcaaacagtcctcccacctcagcctcccaaagtgctgggatacaggtgtgacccactgtgcccagttatt attttttatttgtattattttactgttgtattatttttaattattttttctgaatattttccatctatagttggttgaatcatggatgtggaacaggcaaatatggag ggctaactgtattgcatcttccagttcatgagtatgcagtctctctgtttatttaaagttttagtttttctcaaccatgtttacttttcagtatacaagactttg acgttttttgttaaatgtatttgtaagtattttattatttgtgatgttatttaaaaagaaattgttgactgggcacagtggctcacgcctgtaatcccagca ctttgggaggctgaggcgggcagatcacgaggtcaggagatcaagaccatcctggctaacatggtaaaaccccgtctctactaaaaataga aaaaaattagccaggcgtggtggcgagtgcctgtagtcccagctactcgggaggctgaggcaggagaatggtgtgaacctgggaggcgg agcttgcagtgagctgagatcgtgccactgcattccagcctgcgtgacagagcgagactctgtcaaaaaaataaataaaatttaaaaaaag aagaagaaattattttcttaatttcattttcaggttttttatttatttctactatatggatacatgattgatttttgtatattgatcatgtatcctgcaaactagct aacatagtttattatttctctttttttgtggattttaaaggattttctacatagataaataaacacacataaacagttttacttctttcttttcaacctagactg gatgcattttttgtttttgtttgtttgtttgctttttaacttgctgcagtgactagagaatgtattgaagaatatattgttgaacaaaagcagtgagagtgg 
acatccctgctttccccctgattttagggggaatgttttcagtctttcactatttaatatgattttagctataggtttatcctagatccctgttatcatgttga ggaaattcccttctatttctagtttgttgagattttttaattcatgtgattgcgctatctggctttgctctca
Human Genome contains ~3 G base pairs per haploid set of 23 chromosomes (46 chromosomes in a diploid cell).
Two individuals are 99.9% the same, i.e., they differ at ~3 M base pairs.
SNPs occur about once every ~600 bp.
The average gene in the human genome spans ~27 kb, giving ~50 SNPs per gene.
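The back-of-the-envelope numbers above can be checked with a few lines of arithmetic; the constants are the slide's approximate values.

```python
GENOME_BP = 3_000_000_000   # ~3 G base pairs (haploid)
SNP_SPACING_BP = 600        # one SNP every ~600 bp
GENE_SPAN_BP = 27_000       # average gene spans ~27 kb

# 99.9% identical means 0.1% = 1/1000 of positions differ
differing_bp = GENOME_BP // 1000
snps_per_gene = GENE_SPAN_BP / SNP_SPACING_BP

print(differing_bp)   # 3000000, i.e. ~3 M base pairs
print(snps_per_gene)  # 45.0, which the slide rounds to ~50
```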
35
SNP, Haplotype
Two individuals:
G C T C G A C A A C A G
G T T C G T C A A C A G
Haplotypes: C A G and T T G
36
Mutations
Infinite Sites Assumption: each site mutates at most once
37
Haplotype Pattern
[Figure: haplotypes C A G T T T G A C A T G C T G T and their 0/1 pattern 0 0 1 1 0 1 0 0 1 0 0 1]
At each SNP site, label the two alleles as 0 and 1. The choice of which allele is 0 and which is 1 is arbitrary.
38
Recombination
G T T C G A C T A T T A
G T T C G A C A A C A T
A C G T A T C T A T T A
39
Recombination
G T T C G A C T A T T A
G T T C G A C A A C A T
A C G T A T C T A T T A
The two alleles are linked, i.e., they are “traveling together”. Recombination disrupts the linkage.
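The three sequences on this slide are consistent with a single crossover: the first one is the first seven bases of the second joined to the last five bases of the third. The sketch below reproduces that and, under the simplifying assumption of exactly one crossover placed uniformly at random between adjacent sites, shows why closely spaced sites are rarely separated. The function names are illustrative.

```python
def crossover(parent1: str, parent2: str, point: int) -> str:
    """Single crossover: parent1 up to `point`, parent2 from `point` onward."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have equal length")
    return parent1[:point] + parent2[point:]


def still_linked(i: int, j: int, point: int) -> bool:
    """True if sites i and j end up on the same parental segment."""
    return (i < point) == (j < point)


def separation_probability(i: int, j: int, length: int) -> float:
    """Chance that one crossover, placed uniformly at one of the
    length - 1 gaps between sites, separates sites i and j."""
    gaps = range(1, length)
    return sum(not still_linked(i, j, p) for p in gaps) / (length - 1)


print(crossover("GTTCGACAACAT", "ACGTATCTATTA", 7))  # GTTCGACTATTA
print(separation_probability(2, 3, 12), separation_probability(0, 11, 12))
```

Under this model the separation probability is (j - i) / (length - 1), so adjacent sites almost always stay linked while the two end sites are always split.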
40
Variations in Chromosomes Within a Population
[Figure: from a common ancestor, variations emerge over time up to the present; a disease mutation and the linkage disequilibrium (LD) around it]
41
[Figure: extent of linkage disequilibrium around a disease-causing mutation, 2,000 generations ago, 1,000 generations ago, and at time = present]
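Why the extent of LD shrinks with time can be sketched with the standard random-mating result (not stated on the slide) that the LD coefficient D between two sites decays by a factor (1 - r) per generation, where r is their recombination fraction. The value r = 0.001 below is an arbitrary assumption.

```python
def ld_remaining(r: float, generations: int) -> float:
    """Fraction of the initial LD left after t generations:
    D_t / D_0 = (1 - r) ** t, with r the per-generation recombination fraction."""
    return (1 - r) ** generations


def half_ld_distance(generations: int) -> float:
    """Recombination fraction r at which half of the initial LD survives after t generations."""
    return 1 - 0.5 ** (1 / generations)


for t in (1_000, 2_000):
    print(t, round(ld_remaining(0.001, t), 3), half_ld_distance(t))
```

Doubling the age of the mutation roughly halves the recombination distance over which half of the LD survives, which is the narrowing the figure depicts.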
42
A Data Compression Problem
Select SNPs to use in an association study.
We would like to associate single nucleotide polymorphisms (SNPs) with disease.
Very large number of candidate SNPs: chromosome-wide studies, whole-genome scans.
For cost-effectiveness, select only a subset.
Closely spaced SNPs are highly correlated: it is less likely that there has been a recombination between two SNPs if they are close to each other.
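A common way to exploit the correlation between nearby SNPs is to pick tag SNPs that capture the others at some r² threshold. The greedy sketch below is a generic illustration with a hypothetical threshold of 0.8 and toy data; it is not the Gabriel et al. block method, the Zhang et al. selection algorithm, or the block-free algorithm mentioned later in the deck.

```python
from typing import List


def r_squared(x: List[int], y: List[int]) -> float:
    """LD measure r^2 between two 0/1 SNP columns over the same haplotypes."""
    n = len(x)
    px, py = sum(x) / n, sum(y) / n
    pxy = sum(a & b for a, b in zip(x, y)) / n
    d = pxy - px * py
    denom = px * (1 - px) * py * (1 - py)
    return d * d / denom if denom > 0 else 0.0


def greedy_tags(snps: List[List[int]], threshold: float = 0.8) -> List[int]:
    """Repeatedly pick the SNP that covers (r^2 >= threshold) the most
    not-yet-covered SNPs; every SNP covers itself, so the loop terminates."""
    untagged = set(range(len(snps)))
    tags: List[int] = []
    while untagged:
        best, covered = -1, set()
        for k in range(len(snps)):
            cov = {m for m in untagged
                   if m == k or r_squared(snps[k], snps[m]) >= threshold}
            if len(cov) > len(covered):
                best, covered = k, cov
        tags.append(best)
        untagged -= covered
    return tags


# Toy columns (one list per SNP, one entry per haplotype); SNP 2 is SNP 0 relabeled
snps = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(greedy_tags(snps))  # [0, 3]
```

Because r² is unchanged when a SNP's 0/1 labels are swapped, the arbitrary allele labeling does not affect which SNPs get tagged.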
43
Disease Associations
44
Association studies
Disease vs. Control; Responder vs. Non-responder
Marker A: Allele 0, Allele 1
Marker A is associated with Phenotype
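One standard way to test whether a marker's alleles are associated with the phenotype is a Pearson chi-square test on a 2×2 table of allele counts in the two groups. The counts below are hypothetical, and 3.841 is the 5% critical value for one degree of freedom.

```python
from typing import Tuple


def allele_chi_square(cases: Tuple[int, int], controls: Tuple[int, int]) -> float:
    """Pearson chi-square for a 2x2 table of allele counts:
    rows = cases / controls, columns = allele 0 / allele 1.
    Assumes every row and column total is non-zero."""
    table = [cases, controls]
    total = sum(cases) + sum(controls)
    row_sums = [sum(row) for row in table]
    col_sums = [cases[c] + controls[c] for c in range(2)]
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_sums[r] * col_sums[c] / total
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: allele 0 is rarer among cases than among controls
chi2 = allele_chi_square(cases=(30, 70), controls=(50, 50))
print(round(chi2, 2), chi2 > 3.841)  # 8.33 True
```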
45
Association studies
Evaluate whether nucleotide polymorphisms associate with phenotype
TA GA A
CG GA A
CG TA A
TA TC G
TG TA G
TG GA G
47
11 00 0
00 00 0
00 10 0
11 11 1
10 10 1
10 00 1
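The 0/1 rows follow from the letter rows site by site. Reading the two slides side by side gives one consistent allele-to-bit mapping per site (for example T = 1 and C = 0 at the first site); that mapping is inferred here, and as the haplotype-pattern slide notes, any other labeling would be equally valid. The sketch re-derives all six encoded rows.

```python
from typing import Dict, List

# Per-site allele -> bit mapping, inferred from the two slides (the labeling is arbitrary)
SITE_CODES: List[Dict[str, str]] = [
    {"C": "0", "T": "1"},
    {"G": "0", "A": "1"},
    {"G": "0", "T": "1"},
    {"A": "0", "C": "1"},
    {"A": "0", "G": "1"},
]

ROWS = ["TA GA A", "CG GA A", "CG TA A", "TA TC G", "TG TA G", "TG GA G"]


def encode(row: str) -> str:
    """Replace each allele by its site's bit, keeping the row's grouping."""
    bits = [SITE_CODES[site][allele] for site, allele in enumerate(row.replace(" ", ""))]
    groups, start = [], 0
    for group in row.split():
        groups.append("".join(bits[start:start + len(group)]))
        start += len(group)
    return " ".join(groups)


for row in ROWS:
    print(row, "->", encode(row))
```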
48
Data Compression
ACGATCGATCATGAT
GGTGATTGCATCGAT
ACGATCGGGCTTCCG
ACGATCGGCATCCCG
GGTGATTATCATGAT

A------A---TG--
G------G---CG--
A------G---TC--
A------G---CC--
G------A---TG--
Haplotype Blocks based on LD (Method of Gabriel et al. 2002)
Selecting Tagging SNPs in blocks
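The compressed rows are the original haplotypes projected onto four columns (positions 1, 8, 12 and 13, counting from 1). This sketch reproduces that projection and checks that the four kept columns still distinguish all five haplotypes; how those columns were chosen (blocks plus tag SNP selection) is not modeled here.

```python
from typing import Set

HAPLOTYPES = [
    "ACGATCGATCATGAT",
    "GGTGATTGCATCGAT",
    "ACGATCGGGCTTCCG",
    "ACGATCGGCATCCCG",
    "GGTGATTATCATGAT",
]
TAG_COLUMNS = {0, 7, 11, 12}  # 0-based positions of the letters kept on the slide


def compress(haplotype: str, keep: Set[int]) -> str:
    """Keep the tagged columns and replace every other position with '-'."""
    return "".join(ch if pos in keep else "-" for pos, ch in enumerate(haplotype))


compressed = [compress(h, TAG_COLUMNS) for h in HAPLOTYPES]
print("\n".join(compressed))
# The 4 kept columns still tell all 5 haplotypes apart
print(len(set(compressed)) == len(set(HAPLOTYPES)))  # True
```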
49
Real Haplotype Data
A region of Chr. 22, 45 Caucasian samples
Two different runs of the Gabriel et al. block detection method + Zhang et al. SNP selection algorithm
Our block-free algorithm