Published by Linette Small. Modified over 9 years ago.
1
CSE280 Vineet Bafna
CSE280a: Projects
Vineet Bafna
2
Project Logistics
Research project (70%)
Work individually, or in groups of 2
Two presentations:
– Introductory presentation: first week of February (20 minutes) (20% grade)
    Describe the goals of the project
    Describe your (computational) formulation
    Summarize/critique reading assignment
    Present an algorithm
    Constructive criticism of other projects
– One-on-one meeting with instructor (end of February) (10% grade)
    Discuss preliminary results
– Final presentation (last 2-3 classes) (30% grade)
    Submit a final report
    Final presentation
3
Project 1: Disease gene mapping
Recall linkage disequilibrium (LD)
In the absence of recombination:
– Columns (sites) are correlated
– The joint probability Pr[A=a, B=b] differs from Pr[A=a] Pr[B=b]
With extensive recombination:
– Pr[A=a, B=b] = Pr[A=a] Pr[B=b]
4
Measures of LD
Consider two bi-allelic sites with alleles marked 0 and 1
Define
– P_{00} = Pr[allele 0 at locus 1 and allele 0 at locus 2]
– P_{0*} = Pr[allele 0 at locus 1]
– P_{*0} = Pr[allele 0 at locus 2]
Linkage equilibrium if P_{00} = P_{0*} P_{*0}
D = abs(P_{00} - P_{0*} P_{*0}) = abs(P_{01} - P_{0*} P_{*1}) = …
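The definition of D can be sketched in a few lines of Python. This is only an illustration: the function name `ld_d` and the toy haplotype lists are assumptions, not part of the slides, and the frequencies are simple sample estimates.

```python
from collections import Counter

def ld_d(haplotypes):
    """Return D = |P00 - P0* P*0| for two bi-allelic sites.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) pairs,
    with alleles coded 0 or 1 (hypothetical input format).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p00 = counts[(0, 0)] / n                                # Pr[0 at locus 1, 0 at locus 2]
    p0_star = sum(1 for a, _ in haplotypes if a == 0) / n   # Pr[0 at locus 1]
    p_star0 = sum(1 for _, b in haplotypes if b == 0) / n   # Pr[0 at locus 2]
    return abs(p00 - p0_star * p_star0)

# Toy data: perfectly correlated sites versus independent sites.
linked = [(0, 0)] * 5 + [(1, 1)] * 5
independent = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3

print(ld_d(linked))       # 0.25
print(ld_d(independent))  # 0.0
```

With two equally frequent alleles at each site, D ranges from 0 (equilibrium) to 0.25 (complete correlation), which is why the two toy inputs give those values.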
5
LD can be used to map disease genes
LD decays with distance from the disease allele.
By plotting LD, one can shortlist the region containing the disease gene.
[Figure: SNP column 011001011001 shown against disease status labels DNNDDNDNNDDN; LD plot]
6
Multiple loci
In complex diseases, multiple loci interact to confer disease susceptibility.
[Figure: SNP columns 001001001001 and 011000011000 shown against disease status labels DNNDDNDNNDDN; LD plot]
7
Testing for multiple loci
Assume a SNP matrix with n individuals and m loci.
Testing all sets of 5 SNPs requires a huge number of computations.
Can you come up with computational strategies that speed this up?
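To get a feel for how large the search space is, the number of 5-SNP sets among m loci is the binomial coefficient C(m, 5). A small sketch follows; the helper name and the values of m are illustrative assumptions.

```python
from math import comb

def num_snp_sets(m, k=5):
    """Number of distinct k-subsets of m SNPs that an exhaustive test would visit."""
    return comb(m, k)

for m in (100, 1000, 10000):
    print(m, num_snp_sets(m))
```

Since C(m, 5) grows roughly as m^5 / 120, even m = 100 loci already gives about 75 million sets, before multiplying by the cost of each test over n individuals.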
8
Speeding up multiple locus computations
A filtering strategy?
Input: a SNP matrix in which one or more pairs of SNPs interact to associate with the disease
Output: a set of SNP pairs that includes the interacting pair(s)
The method should be fast, and should NOT consider all pairs.
9
Speeding up the computations
Correlated SNPs should have low Hamming distance.
Random SNPs should have high Hamming distance.
Strategy: select k individuals at random.
– Hash each SNP restricted to those k individuals
– Correlated SNPs should fall in the same bin with high probability
[Figure: SNP columns 110011110011, 001001001001 and 101011101011, hashed with k=2]
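The hashing strategy can be sketched as follows. This is a minimal illustration under stated assumptions: SNPs are given as 0/1 strings with one character per individual, the function name `candidate_pairs`, the `rounds` parameter (repeating the random sampling several times), and the toy SNPs are all hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

def candidate_pairs(snps, k, rounds, seed=0):
    """Return SNP index pairs that share a bin in at least one round.

    snps: list of equal-length 0/1 strings, one string per SNP,
          one character per individual (assumed input format).
    """
    rng = random.Random(seed)
    n = len(snps[0])
    pairs = set()
    for _ in range(rounds):
        sample = rng.sample(range(n), k)      # k individuals chosen at random
        bins = defaultdict(list)
        for idx, column in enumerate(snps):
            key = "".join(column[i] for i in sample)  # SNP restricted to the sample
            bins[key].append(idx)
        for members in bins.values():         # only pairs inside a bin are kept
            pairs.update(combinations(members, 2))
    return pairs

# Toy input: SNPs 0 and 1 are identical; SNP 2 differs from them at every individual.
snps = ["110011110011", "110011110011", "001100001100"]
print(candidate_pairs(snps, k=2, rounds=5))
```

Two SNPs at Hamming distance d over n individuals agree on a random sample of k individuals with probability roughly (1 - d/n)^k, so larger k gives smaller bins but more missed pairs, and repeating the sampling recovers some of them. Note that hashing raw alleles only groups SNPs that agree in phase; a perfectly anti-correlated SNP, like SNP 2 here, never shares a bin.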
10
Project 2: mtDNA phylogeny
In the absence of recombination, the history of mitochondrial DNA can be expressed by a tree.
The goal of this project is to build a robust phylogeny using a heuristic modification of the perfect phylogeny.
11
The Genographic Project
The Genographic Project aims to trace the geographic origins of humans using mitochondrial DNA.
https://www3.nationalgeographic.com/genographic/atlas.html
12
Without recurrent mutations
A unique tree can explain the evolutionary history.

   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0

[Figure: tree rooted at r with leaves E, B, C, D, A and edges labeled with mutations 1, 2, 4, 3, 5]
13
With recurrent mutations
Adding another individual F destroys the perfect phylogeny. Why?
It is not so easy to place F. Can you suggest a strategy?

   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
F  0 1 0 0 0

[Figure: the tree rooted at r with leaves E, B, C, D, A and edge labels 1, 2, 4, 3, 5, with individual F and additional edge labels 1 and 2]
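One way to see why F breaks the perfect phylogeny is the standard pairwise test for a rooted perfect phylogeny with an all-zero root: for every pair of sites, the sets of individuals carrying the mutation must be disjoint or nested. The sketch below applies that test to the matrices on the slides; the function name and data layout are assumptions.

```python
from itertools import combinations

def conflicting_columns(rows):
    """Return pairs of (1-based) sites whose carrier sets overlap without nesting.

    rows: dict mapping individual name -> list of 0/1 alleles (root assumed all 0).
    An empty result means a rooted perfect phylogeny exists.
    """
    ncols = len(next(iter(rows.values())))
    carriers = [{name for name, r in rows.items() if r[j] == 1} for j in range(ncols)]
    conflicts = []
    for i, j in combinations(range(ncols), 2):
        a, b = carriers[i], carriers[j]
        if a & b and not (a <= b or b <= a):   # overlap, but neither contains the other
            conflicts.append((i + 1, j + 1))
    return conflicts

base = {
    "A": [1, 1, 0, 0, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [1, 1, 0, 1, 0],
    "D": [0, 0, 1, 0, 1],
    "E": [1, 0, 0, 0, 0],
}
with_f = dict(base, F=[0, 1, 0, 0, 0])

print(conflicting_columns(base))    # []
print(conflicting_columns(with_f))  # [(1, 2)]
```

Before F is added, the carriers of site 2 ({A, C}) are nested inside the carriers of site 1 ({A, C, E}). F carries mutation 2 without mutation 1, so the two sets now overlap without nesting, and one of the two sites must have mutated more than once. A heuristic could start from such conflict lists, for example by choosing which sites to treat as recurrent.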
14
Tests of Selection
In class, we have discussed alleles that can be selectively neutral, or under active selection
– Active selection may be positive or negative
How do we identify regions under positive or negative selection?
Balancing selection: sometimes it is helpful for a population to
15
Adaptive Selection
Selection leads to loss of heterozygosity (will be explained in detail in the next lecture).
Can you come up with a test for selection?
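As one possible starting point (not a statistical test by itself), heterozygosity can be estimated per site as 2p(1-p), where p is the frequency of allele 1, and averaged over sliding windows; windows with unusually low values are candidates for a closer look. The function names, window width, and toy haplotypes below are assumptions for illustration.

```python
def site_heterozygosity(column):
    """Expected heterozygosity 2p(1-p) at one bi-allelic site (alleles coded 0/1)."""
    p = sum(column) / len(column)
    return 2 * p * (1 - p)

def window_heterozygosity(haplotypes, width):
    """Average per-site heterozygosity over each window of `width` consecutive sites."""
    m = len(haplotypes[0])
    per_site = [site_heterozygosity([h[j] for h in haplotypes]) for j in range(m)]
    return [sum(per_site[s:s + width]) / width for s in range(m - width + 1)]

# Toy haplotypes: sites 2 and 3 are monomorphic, the flanking sites are variable.
haps = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
]
print(window_heterozygosity(haps, width=2))  # [0.5, 0.25, 0.0, 0.25, 0.5]
```

A real test would also need a null model, since low diversity can arise for reasons other than selection; deciding how to calibrate that is part of the project question.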
16
Balancing selection
Sometimes both alleles are useful in a population, and it helps to have both around
A simple example is when diversity is important (the two variants help maintain diversity)
Bipolar disorder genes could be under balancing selection
– High creativity, which might confer some selective/reproductive advantage
– Depression, which confers a disadvantage
If so, the tests for this disorder might be tricky.
How can we identify regions under balancing selection?
17
Testing for Balancing Selection
Adaptive selection leads to loss of heterozygosity (will be explained in detail in the next lecture).
Balancing selection leads to two dominant haplotypes.
Can you come up with a test for balancing selection?
18
Project: Primer design for cancer genomics
19
The Science behind Gleevec
Fusions
– Observed in leukemia, lymphoma, and sarcomas
"Philadelphia Translocation"
– Drugs target this fusion protein
20
Fluorescent in situ hybridization
Cancer genomes show extensive structural variation
21
Assaying for tumor variants
Most tumors start off with a single cell, which then proliferates.
Drugs like Gleevec are used well after cancer has taken hold.
Can we detect the cancer early by detecting the genomic abnormality?
– If only a very few cells in the person are cancerous, can we still detect it?
Can we track a patient through treatment?
22
Cancer genomics
In cancers, large genetic changes can occur, including deletions, inversions, and rearrangements of genomes
In the early stages, only a few cells will show this deletion
23
Polymerase Chain Reaction
PCR is a technique for amplifying and detecting a specific portion of the genome
Amplification takes place if the primers are an 'appropriate' distance apart (<2 kb)
24
Assaying for Rare Variants
PCR can be used to assay for a given genomic abnormality, even in a heterogeneous population of tumor and normal cells
[Figure: extract genomic DNA, then PCR; in normal cells the primer distance is too large for amplification, while in the tumor cell the product is detected]
25
Variant Variants
What if the variant is the minority in the cell population?
What if deletion boundaries are uncertain?
[Figure: deletion in Patient A, Patient B, Patient C]
26
Observed variation in deletion size
[Figure: sizes of homozygous deletions in cell lines from different human cancers (scale in megabases)]
27
Primer Approximation Multiplex PCR (PAMP)*
Multiple primers are optimally spaced, flanking a breakpoint of interest
– Upstream of breakpoint, forward primers
– Downstream of breakpoint, reverse primers
The primers are run in a multiplex PCR reaction
– Any pair can form a viable product
[Figure: deletion in Patient B, Patient C]
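The idea that only primer pairs brought close together by a deletion will amplify can be sketched as below. Everything here is a simplified model under assumptions: primers are reduced to genomic coordinates, only pairs that straddle the deletion are considered, a primer whose site lies inside the deletion is lost, and the 2 kb limit from the PCR slide is used as the amplification threshold.

```python
def amplifying_pairs(forward, reverse, del_start, del_end, max_product=2000):
    """Return (forward, reverse) primer pairs expected to amplify across a deletion.

    forward, reverse: primer coordinates on the reference genome (hypothetical).
    The deletion removes reference positions [del_start, del_end).
    """
    deleted = del_end - del_start
    pairs = []
    for f in forward:
        if f >= del_start:            # primer site deleted, or not upstream of the deletion
            continue
        for r in reverse:
            if r < del_end:           # primer site deleted, or not downstream
                continue
            if (r - f) - deleted < max_product:   # distance in the tumor genome
                pairs.append((f, r))
    return pairs

forward = [0, 1000, 2000, 3000]
reverse = [10000, 11000, 12000]

# Tumor genome with a deletion of [2500, 9800).
print(amplifying_pairs(forward, reverse, 2500, 9800))
# Normal genome (empty deletion): every pair is at least 7 kb apart.
print(amplifying_pairs(forward, reverse, 5000, 5000))
```

In this toy setup the deletion brings three pairs within 2 kb, and the normal genome yields none, which mirrors how the assay can pick up a deletion without knowing its exact boundaries in advance.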
28
Experimental Design (500 kb region)
10 sets of 25 primers, both upstream and downstream
– 250 upstream
– 250 downstream
Primer pairs closest to the breakpoint are amplified
Assay by oligo array
Goal: computational selection of an 'optimal' primer set
29
Goal
Input: a collection of primers
Identify a subset of primers that do not cross-hybridize, are unique, yet cover the region completely
Use combinatorial optimization, simulated annealing, integer linear programming, …
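As a baseline to compare against the optimization methods listed above, a simple greedy heuristic can be sketched. It is not one of the methods named on the slide and makes no optimality claim. The model is an assumption: one strand only, primers given as coordinates, cross-hybridization given as a set of forbidden pairs, and "coverage" meaning that no gap between consecutive chosen primers (or the region ends) exceeds `max_gap`.

```python
def greedy_primers(positions, conflicts, max_gap, region_end):
    """Greedily pick primers left to right so that every gap is at most max_gap.

    positions: candidate primer coordinates in (0, region_end] (hypothetical).
    conflicts: set of frozenset({p, q}) pairs that cross-hybridize.
    Returns the chosen coordinates, or None if the greedy walk gets stuck.
    """
    chosen = []
    last = 0
    while region_end - last > max_gap:
        reachable = [
            p for p in sorted(positions)
            if last < p <= last + max_gap
            and all(frozenset((p, q)) not in conflicts for q in chosen)
        ]
        if not reachable:
            return None                 # no compatible primer within reach
        last = reachable[-1]            # take the farthest compatible primer
        chosen.append(last)
    return chosen

positions = [400, 900, 1000, 1800, 1900, 2800]
conflicts = {frozenset((1000, 1900))}
print(greedy_primers(positions, conflicts, max_gap=1000, region_end=3000))
# [1000, 1800, 2800]
```

Because the greedy choice never backtracks, it can fail or use more primers than necessary when conflicts are dense, which is where simulated annealing or an integer linear program would be expected to do better.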
30
Spectral Networks: Algorithms for De Novo Interpretation of Tandem Mass Spectra
Nuno Bandeira, Ph.D.
Department of Computer Science and Engineering, University of California, San Diego
ProtIG seminar series, September 21, 2007
31
Proteins and their modifications
Proteins are fundamental players in the regulation of biological processes.
[Figure: DNA encodes for proteins; proteins regulate]
Knowing proteins involves knowing many things. This dissertation focuses on:
- Identification
- Sequencing
- Post-translational modifications
32
Protein sequences and modifications
From a computational perspective, a protein can be represented as a string over a weighted alphabet: …AFSRLEMILGF…
Subsequences are called peptides (obtained via enzymatic digestion), e.g. AFSRL, SRLEMILGF, EMILG

Amino acid  Mass
A           71
F           147
S           87
R           156
L           113
E           129
M           131
I           113
G           57

Modifications change amino acid masses (the modified residue is marked M* here):
Peptide: SRLEMILGF, Mass(SRLEMILGF) = 1047, Mass(M) = 131
Modified peptide: SRLEM*ILGF, Mass(SRLEM*ILGF) = 1063, Mass(M*) = 147, Mass(*) = 16
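The weighted-alphabet view translates directly into code: a peptide's mass is the sum of its residue masses, and a modification adds an offset at one position. The sketch below uses the integer masses from the table; the function name and the `mods` format are assumptions.

```python
# Integer residue masses copied from the table on the slide.
MASS = {"A": 71, "F": 147, "S": 87, "R": 156, "L": 113,
        "E": 129, "M": 131, "I": 113, "G": 57}

def peptide_mass(peptide, mods=None):
    """Sum of residue masses; mods maps a 0-based position to a mass offset."""
    mods = mods or {}
    return sum(MASS[aa] + mods.get(i, 0) for i, aa in enumerate(peptide))

unmodified = peptide_mass("SRLEMILGF")
modified = peptide_mass("SRLEMILGF", mods={4: 16})   # +16 on the M at position 4
print(unmodified, modified, modified - unmodified)
```

With the rounded integer masses the sums come out as 1046 and 1062, one unit below the 1047 and 1063 quoted on the slide. The slide's figures were presumably obtained by summing more precise residue masses before rounding. The +16 shift caused by the modification is the same either way.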
33
Nobel prize in chemistry, 2002
34
What is mass spectrometry?
http://nobelprize.org/chemistry/laureates/2002/chemadv02.pdf
[Figure: amino acid mass table (A 71, F 147, S 87, …)]
35
Tandem Mass Spectrometry (MS/MS)
Protein sequence: …THISISAVERYLARGESAMPLEPRTEINSEQENCE…
Peptide: LARGE
Modified peptide: LARG*E
Modification: any event that changes the mass at a specific site.
[Figure: MS/MS spectra of the peptide and the modified peptide, with b-ion and y-ion peaks and PM marked]
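To make the b and y peaks concrete, the sketch below lists prefix (b) and suffix (y) fragment masses for LARGE and for a modified LARG*E. It uses integer residue masses and nominal offsets of +1 for b-ions (a proton) and +19 for y-ions (water plus a proton); the +16 modification on G and all names are illustrative assumptions.

```python
# Integer residue masses for the residues of LARGE.
MASS = {"L": 113, "A": 71, "R": 156, "G": 57, "E": 129}

def fragment_masses(peptide, mods=None, b_offset=1, y_offset=19):
    """Return (b, y): nominal masses of prefix and suffix fragments.

    mods maps a 0-based position to a mass offset (hypothetical format).
    b[i] covers the first i+1 residues; y[i] covers the last i+1 residues.
    """
    mods = mods or {}
    residues = [MASS[aa] + mods.get(i, 0) for i, aa in enumerate(peptide)]
    n = len(residues)
    b = [sum(residues[:i]) + b_offset for i in range(1, n)]
    y = [sum(residues[i:]) + y_offset for i in range(n - 1, 0, -1)]
    return b, y

print(fragment_masses("LARGE"))
# ([114, 185, 341, 398], [148, 205, 361, 432])
print(fragment_masses("LARGE", mods={3: 16}))   # LARG*E with a +16 modification on G
# ([114, 185, 341, 414], [148, 221, 377, 448])
```

Only fragments that contain the modified site shift: b4 moves by 16 while b1 to b3 stay put, and y2 to y4 move while y1 does not. That pattern of shifted and unshifted peaks is what localizes a modification to a specific site.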
36
Example of a real MS/MS spectrum
[Figure: spectrum with symmetric peaks b_10 and y_12 labeled]
37
Tandem Mass Spectrometry (MS/MS)
Proteins -> enzymatic digestion -> peptides -> tandem mass spectrometry -> large set of MS/MS spectra
Database of known peptides: MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT, HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE, ALKIIMNVRT, SEQUENCE, HEWAILF, GHNLWAMNAC, GVFGSVLRA, EKLNKAATYIN, …
Two routes from a spectrum to the peptide SEQUENCE: database search and de novo sequencing
[Figure: spectra matched against database peptides (database search) versus letters of SEQUENCE assembled directly from peaks (de novo sequencing)]
38
Mixture spectra
Sometimes, the instrument generates a single spectrum from two or more peptides:
Peptide A: NLAFFQLR
Peptide B: ALDDILNLK
[Figure: mixture spectrum of peptides A and B]
39
How to identify mixture spectra?
40
Proposed approach
When identifying a mixture spectrum of peptides A and B, assume you have non-mixture spectra for the same peptides.
Compare the non-mixture spectra of known peptides to putative mixture spectra to determine the peptide identifications.
41
Project description
Implement an algorithm to identify mixture spectra from pairs of peptides by combining previously identified spectra from isolated peptides.
Test the above implementation by simulating mixture spectra using an existing database of spectra from isolated peptides.
Propose a scoring procedure to separate correct from false identifications.
Nuno Bandeira, bandeira@ucsd.edu
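A minimal sketch of the three steps above (simulate, identify, score) is given below. It is one possible baseline, not the project's prescribed method. The assumptions: spectra are dictionaries from an m/z bin to an intensity, a mixture is modeled as a weighted sum of two library spectra, and cosine similarity is the score. The peak values are invented toy data; only the peptide names come from the slides.

```python
from itertools import combinations
from math import sqrt

def cosine(s, t):
    """Cosine similarity between two spectra given as {mz_bin: intensity}."""
    dot = sum(v * t.get(mz, 0.0) for mz, v in s.items())
    ns = sqrt(sum(v * v for v in s.values()))
    nt = sqrt(sum(v * v for v in t.values()))
    return dot / (ns * nt) if ns and nt else 0.0

def combine(s, t, alpha=0.5):
    """Simulate a mixture spectrum as alpha*s + (1-alpha)*t (assumed mixing model)."""
    out = {mz: alpha * v for mz, v in s.items()}
    for mz, v in t.items():
        out[mz] = out.get(mz, 0.0) + (1 - alpha) * v
    return out

def best_pair(mixture, library):
    """Return (score, peptide_a, peptide_b) for the best-scoring library pair."""
    return max(
        (cosine(mixture, combine(library[a], library[b])), a, b)
        for a, b in combinations(sorted(library), 2)
    )

# Toy library of isolated-peptide spectra (peaks are invented for illustration).
library = {
    "NLAFFQLR": {100: 1.0, 200: 2.0, 300: 1.0},
    "ALDDILNLK": {150: 2.0, 250: 1.0, 300: 1.0},
    "GVFGSVLRA": {120: 1.0, 220: 2.0, 320: 1.0},
}
mixture = combine(library["NLAFFQLR"], library["ALDDILNLK"])
print(best_pair(mixture, library))
```

In practice the mixing proportion is unknown and spectra are noisy, so a realistic scoring procedure would need to search over `alpha` and compare the best pair score against single-peptide and decoy scores to separate correct from false identifications.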