1
Computational Biology
2
An Exciting Time
Huge amounts of DNA sequence information have been generated by the Human Genome Project
Huge amounts of gene expression data for entire genomes are becoming available
Computer hardware and software technology is advancing rapidly
The confluence of the rapidly advancing fields of molecular biology and computational science opens new opportunities
New algorithms are needed to analyze the data and transform it into knowledge and understanding
3
DNA and Protein Databases
GenBank - run by the US National Library of Medicine
EMBL - European Molecular Biology Laboratory
DDBJ - DNA Data Bank of Japan
PIR and SwissProt - protein databases
dbEST - expressed sequence tags
dbSNP - single nucleotide polymorphisms
5
Other Kinds of Databases
Entrez - DNA sequences, protein sequences and associated literature
TFD - transcription factor database (TFs, motifs, and binding sites on DNA)
Prosite - protein motifs
Flybase - Drosophila genes, products, protocols, literature
OMIM - Online Mendelian Inheritance in Man
Structure databases - PDB contains experimental data from crystallographic and NMR structure determinations
Cancer genome anatomy project - information related to cancer
Human Genome resources
Other genome databases
6
Algorithms
interpreting sequence analysis results intelligently requires an understanding of the algorithm used
algorithms are the basis of nearly all computer programs
a number of algorithms have been developed to compare sequences and genomes
7
Algorithm
a finite sequence of well-defined actions whose purpose is to accomplish a given task
8
Physical Characteristics
positive charge
negative charge
net charge
isoelectric point - no net charge
Using the physical properties of amino acids, it is possible to calculate a number of basic properties of the protein as a whole.
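As a rough illustration of calculating such properties, the sketch below estimates net charge at a given pH and searches for the isoelectric point. The pKa values are assumed textbook-style numbers (published tables differ slightly), the function names are made up for this example, and the model ignores any effect of neighboring residues or folding.

```python
# Assumed pKa values for ionizable groups; real tables vary by source.
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of the net charge of a peptide at a pH."""
    seq = seq.upper()
    charge = 0.0
    # Basic groups carry a fractional positive charge below their pKa.
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "N_TERM" else seq.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    # Acidic groups carry a fractional negative charge above their pKa.
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "C_TERM" else seq.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(seq, low=0.0, high=14.0, tolerance=1e-4):
    """Bisection search for the pH at which the estimated net charge is zero."""
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid   # still positive, so the pI lies at a higher pH
        else:
            high = mid  # negative or zero, so the pI lies at a lower pH
    return (low + high) / 2.0
```

Bisection works here because the estimated charge only decreases as pH rises, so it crosses zero exactly once between pH 0 and pH 14.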
9
Sliding Window Analyses
calculate a value for a range (a window) of residues, either nucleotides or amino acids
plot result
move window along sequence
calculate new value
plot result
repeat until end of sequence
produces a view of the property being measured as a function of position along the sequence
can demonstrate trends that are not detectable by other methods
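The loop described on this slide can be sketched in a few lines. The function and parameter names below are illustrative only, and G+C fraction is used as one example of a property to measure. Plotting is left out, so the function just returns the values that would be plotted.

```python
def sliding_window(seq, window, offset, measure):
    """Return (start, value) pairs, one for each window position."""
    results = []
    # Move the window along the sequence in steps of `offset`.
    for start in range(0, len(seq) - window + 1, offset):
        fragment = seq[start:start + window]
        results.append((start, measure(fragment)))
    return results


def gc_fraction(fragment):
    """Fraction of bases in the fragment that are G or C."""
    fragment = fragment.upper()
    return (fragment.count("G") + fragment.count("C")) / len(fragment)


profile = sliding_window("ATATATGCGCGCATATAT", window=6, offset=3,
                         measure=gc_fraction)
print(profile)
```

A small window tracks local fluctuations, while a large window smooths them out and shows broad trends.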
10
Sliding Window Base Composition 1
window = 20, offset = 2
Shows a plot of the base composition as a function of position in the DNA sequence.
11
Sliding Window Base Composition 2
window = 500, offset = 5
12
Hydropathy Analysis
Plots the distribution of hydrophobic and hydrophilic amino acids as a function of position along the protein sequence.
Note the seven transmembrane regions for the rhodopsin protein (labeled 7 through 1 on the plot).
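A hydropathy plot is a sliding window average over a per-residue hydrophobicity scale. The slides do not say which scale was used, so the sketch below assumes the widely used Kyte-Doolittle values. The 19-residue default window is also an assumption, chosen because a membrane-spanning helix is roughly that long.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slides do not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return (center_index, mean_hydropathy) for each window position.

    Raises KeyError if the sequence contains a non-standard residue code.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        # Report the value at the middle residue of the window.
        profile.append((start + window // 2, mean))
    return profile
```

Sustained stretches of high positive values suggest candidate membrane-spanning segments, while negative stretches suggest exposed hydrophilic regions.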
13
Predicting Protein Structure
14
CF Structure Prediction (graph)
The Chou-Fasman method predicts protein structure by looking for nucleation sites that can start the formation of either alpha helices or beta sheets, then extending those regions.
15
GOR Structure Prediction (Squiggles)
The Garnier-Osguthorpe-Robson method also predicts regions of particular structure in proteins by using a modified sliding window approach. This “squiggles” plot graphically represents the structures.
16
Helical Wheel
Helical wheel analysis looks down the center of an alpha helix and displays the distribution of hydrophobic and hydrophilic amino acid side chains.
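The geometry behind a helical wheel is simple: an ideal alpha helix has about 3.6 residues per turn, so each residue is rotated about 100 degrees from the previous one. The sketch below computes those angles. The set of residues treated as hydrophobic is an assumed grouping for illustration, not one given in the slides.

```python
# Assumed hydrophobic grouping; other classifications exist.
HYDROPHOBIC = set("AVLIMFWC")

DEGREES_PER_RESIDUE = 100  # 360 degrees / 3.6 residues per turn


def helical_wheel(peptide):
    """Return (residue, angle_in_degrees, is_hydrophobic) for each position."""
    positions = []
    for index, residue in enumerate(peptide.upper()):
        angle = (index * DEGREES_PER_RESIDUE) % 360
        positions.append((residue, angle, residue in HYDROPHOBIC))
    return positions


for residue, angle, hydrophobic in helical_wheel("LKKLLKKL"):
    label = "hydrophobic" if hydrophobic else "hydrophilic"
    print(residue, angle, label)
```

If the hydrophobic residues cluster within one range of angles, the helix has a hydrophobic face and is likely amphipathic.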
17
Kinds of Sequence Comparisons
Searching a database for sequences that are similar to a query sequence
Comparing two genes for similarities
Examining families of genes from different organisms to understand evolutionary relationships
18
Comparing Sequences
19
Dot Matrix Comparisons
compares two sequences graphically
uses sliding window approach
provides immediate visual feedback of similarities
very sensitive to parameters used
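A minimal sketch of the dot matrix idea, assuming the usual rule that a dot is placed wherever two windows differ at no more than an allowed number of positions. The function name and return format are made up for this example; a real program would draw the dots instead of listing them.

```python
def dot_matrix(seq_a, seq_b, window=10, max_mismatches=0):
    """Return (i, j) positions where windows starting at i and j are similar."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count positions at which the two windows differ.
            mismatches = sum(
                1
                for x, y in zip(seq_a[i:i + window], seq_b[j:j + window])
                if x != y
            )
            if mismatches <= max_mismatches:
                dots.append((i, j))
    return dots


# Exact matching finds nothing here; allowing one mismatch shows the diagonal.
print(dot_matrix("ACGTAC", "ACCTAC", window=4, max_mismatches=0))
print(dot_matrix("ACGTAC", "ACCTAC", window=4, max_mismatches=1))
```

The two calls show the parameter sensitivity: with zero mismatches the related sequences produce no dots, while one allowed mismatch reveals the full diagonal.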
20
dog vs. rat chymotrypsin cDNA Window = 10 Mismatches = 0
21
dog vs. rat chymotrypsin cDNA Window = 10 Mismatches = 2
22
dog vs. rat chymotrypsin cDNA Window = 20 Mismatches = 5
23
Dot Matrix - Color Conveys Information
24
Dot Matrix - cDNA vs Genomic DNA
Comparing genomic and cDNA sequences can identify the location of introns.
25
Ribosomal DNA Dot Matrix
Window = 10; plot legend: ≤1 mismatch, 2 mismatches
Comparing a sequence to itself can reveal internal information about sequence organization
26
Dot Matrix (Protein - Identity Table)
Looking for sequence identities at the protein level might not indicate regions of similarity
27
PAM250 Table
28
Dot Matrix (Protein - PAM250 Table)
Scoring tables use information about the similarity of amino acid physical characteristics
29
Summary of Dot Matrix Analysis
Strengths
–visual representation is easy to evaluate and can reveal relationships not apparent through other means
–no need to deal with gaps
–ideal first choice for initial sequence comparison
Weaknesses
–does not produce a score
–too slow for database searching
–does not actually align sequences
30
Searching a Database
Define what is being searched for
–similarity
–identity
–consensus
Choose a scoring system
Establish an efficient searching plan
31
Defining Similarity
their vs. there
here vs. there
I will assemble the bicycle
I will not assemble the bicycle
I will disassemble the bicycle
I will assemble the bike
green yellow blue house
Lys vs. Arg or Lys vs. Phe
32
Consider These Sequences
ACGGTCGAAT (a)
ACGGACGAAT (b)
ACGGTTCGAAT (c)
CGCGACGGTCGAATAT (d)
33
Alignment of a vs a
ACGGTCGAAT (a)
||||||||||
ACGGTCGAAT (a)
what is the score?
34
Alignment of a vs b
ACGGTCGAAT (a)
|||| |||||
ACGGACGAAT (b)
subtract for mismatch? or just not add?
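The question of whether to subtract for a mismatch or simply not add can be made concrete with a small scoring function. This is only a sketch: the function name and the default values (+1 per match, 0 per mismatch) are assumptions chosen for illustration.

```python
def score_ungapped(seq_a, seq_b, match=1, mismatch=0):
    """Score two equal-length sequences position by position, with no gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped scoring needs sequences of equal length")
    return sum(match if x == y else mismatch for x, y in zip(seq_a, seq_b))


a = "ACGGTCGAAT"
b = "ACGGACGAAT"

print(score_ungapped(a, a))               # every position matches
print(score_ungapped(a, b))               # the mismatch is simply not counted
print(score_ungapped(a, b, mismatch=-1))  # the mismatch is penalized
```

With these values, a vs a scores 10, while a vs b scores 9 if the mismatch is ignored and 8 if it costs one point.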
35
Alignment of a vs c
ACGGTCGAAT  (a)
|||||   |
ACGGTTCGAAT (c)
36
Gapped Alignment of a vs c
ACGGT-CGAAT (a)
||||| |||||
ACGGTTCGAAT (c)
gap penalty? subtract? relative to mismatch penalty?
37
Gapped Alignment of a vs d
----ACGGTCGAAT-- (a)
    ||||||||||
CGCGACGGTCGAATAT (d)
end gaps? same as gap of 6 internally?
38
Before Doing an Alignment
Define gap insertion penalty
Define mismatch penalty ( = 0 ? )
Define scoring table for proteins
All these must be considered before doing a database search
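To see how these choices change the outcome, the sketch below computes the best alignment score by dynamic programming in the style of Needleman-Wunsch. The slides do not name an algorithm, so this technique is an assumption, as are the linear gap penalty, the default values, and the `free_end_gaps` switch.

```python
def alignment_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2,
                    free_end_gaps=False):
    """Best alignment score under a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not free_end_gaps:
        # Leading gaps are charged like any other gap.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    if free_end_gaps:
        # Trailing gaps are free: take the best cell on the last row or column.
        return max(max(score[-1]), max(row[-1] for row in score))
    return score[-1][-1]


a, c, d = "ACGGTCGAAT", "ACGGTTCGAAT", "CGCGACGGTCGAATAT"
print(alignment_score(a, c))                      # one internal gap
print(alignment_score(a, d))                      # end gaps penalized
print(alignment_score(a, d, free_end_gaps=True))  # end gaps free
```

With these assumed penalties, a vs c scores 8 (ten matches minus one gap). For a vs d the score is -2 when the six end-gap positions are penalized and 10 when they are free, which is why the treatment of end gaps has to be decided before searching.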
39
Hybridization of Nucleic Acids
If strands are dissociated, they can form double-stranded structures again if bases match up
Single-stranded DNA can base-pair with single-stranded DNA or RNA
Can use this “hybridization” procedure to detect the presence of specific sequences in a mixture
40
DNA MicroArrays
DNA MicroArrays use hybridization technology to examine gene expression
Attach different DNAs onto a slide as a grid of small spots, one for each gene - up to 50,000 per slide
Hybridize a mixture of fluorescently labeled cDNAs extracted from cells after different treatments - experimental is green and control is red
Examine expression patterns
Stanford MicroArray Database (SMD)
41
DNA MicroArray Hybridization Data
[Figure labels: decreased expression, increased expression]
mRNAs at higher levels in treated cells are green, those at lower levels in treated cells are red, and those that are unchanged are yellow.
each spot corresponds to a different gene
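One common way to turn the two color intensities of a spot into a call is the log2 ratio of treated to control signal. The sketch below assumes positive, background-corrected intensities and a two-fold change cutoff (a log2 ratio of at least 1 in either direction). Both the cutoff and the function name are illustrative assumptions, not values from the slides.

```python
import math


def classify_spot(treated, control, threshold=1.0):
    """Return (call, log2_ratio) for one spot on the array.

    `treated` and `control` are assumed to be positive intensities.
    """
    log_ratio = math.log2(treated / control)
    if log_ratio >= threshold:
        return ("increased", log_ratio)
    if log_ratio <= -threshold:
        return ("decreased", log_ratio)
    return ("unchanged", log_ratio)


print(classify_spot(800, 200))  # four-fold higher in treated cells
print(classify_spot(100, 400))  # four-fold lower in treated cells
print(classify_spot(300, 300))  # same level in both samples
```

Using a log ratio makes increases and decreases symmetric: a four-fold increase gives +2 and a four-fold decrease gives -2.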
42
Some Uses of DNA MicroArrays
Rapid scanning of DNA or RNA populations
Used for identifying bacterial infections in hospitals
Used in research on molecular biology of gene expression
MicroArray databases - gene expression information
43
Identifying Genetic Information
Methods exist to identify individuals in a population using various enzyme treatments of DNA and computer analysis of the results
DNA fingerprinting allows for identification of individuals
–Animal breeding
–Forensic testing
–Identification of carriers of mutations
Databases of individuals
Databases for countries
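One enzyme treatment of this kind is a restriction digest, where an enzyme cuts DNA at every copy of a short recognition site and the pattern of fragment lengths differs between individuals. The slides do not say which treatments are meant, so the sketch below is an assumption throughout. It uses EcoRI, which cuts G^AATTC, on one strand of a linear molecule, and the function names are made up for this example.

```python
def digest(dna, site="GAATTC", cut_offset=1):
    """Cut a linear sequence at every occurrence of the recognition site.

    `cut_offset` is the cut position within the site (1 for EcoRI, G^AATTC).
    """
    dna = dna.upper()
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])  # piece after the last cut
    return fragments


def fragment_pattern(dna):
    """Fragment lengths sorted longest first, as they would appear on a gel."""
    return sorted((len(fragment) for fragment in digest(dna)), reverse=True)


print(fragment_pattern("AAGAATTCCCGAATTCTT"))
print(fragment_pattern("AAGAATTCCCGAGTTCTT"))  # one site lost to a mutation
```

A single base change that destroys a site merges two fragments into one, so the two samples give different length patterns.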
44
Pharmacogenomics
DNA fingerprint of patient
compare to known profiles
what is the most effective treatment for this particular allele?
designing drugs for individuals
45
What to do with genetic information databases some thorny ethical issues
46
Huntington’s Disease
autosomal dominant
onset usually after age 40
disease is devastating and fatal
test is 100% accurate
Would you want to take this test if you knew there was a possibility of you having Huntington’s?
Should physicians be obligated to offer the test?
Would they have to share results with insurance companies?
47
BRCA1 Testing
can detect perhaps a 5-10x increased risk of developing breast cancer
if positive, can have more regular breast exams and therefore earlier detection, leading to increased survival
chances of getting breast cancer are still only in the 10% range even with BRCA1
Would you want to take this test?
Would you want to know the results of your sister’s test?
Who should pay for this test?
Can insurance companies demand results?
What if you pay for it privately?
48
Other “Increased Risk” Genes
some believe that there are genes for increased risk of heart attacks or strokes
Should testing be mandatory for jobs involving public safety (e.g. airline pilots, bus drivers, etc.)?
How will the test results affect your chances of promotion?
Can there be genome discrimination?
49
Some Ethical Issues to Consider
genetic testing - mandatory or voluntary?
obligations of health care providers
Insurance - who pays? Who has rights to info?
job security - privacy of genetic information