Last lecture summary.

Last lecture summary

Sequencing strategies: hierarchical genome shotgun (HGS), used in the Human Genome Project. “Map first, sequence second”; clone-by-clone … cloning is performed twice (BAC, plasmid).

Sequencing strategies: whole genome shotgun (WGS), used by Celera. Shotgun, no mapping. Coverage: the average number of reads representing a given nucleotide in the reconstructed sequence. HGS: 8, WGS: 20.
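The coverage definition can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the usual simplification that average coverage equals the total number of sequenced bases divided by the genome length; the function names are hypothetical, and the 600 bp read length is only an illustrative value for Sanger reads.

```python
import math


def average_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average coverage = total sequenced bases / genome length (simplified model)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return num_reads * read_length / genome_length


def reads_needed(coverage: float, read_length: int, genome_length: int) -> int:
    """Number of reads required to reach a target average coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(coverage * genome_length / read_length)


if __name__ == "__main__":
    genome = 3_000_000_000  # roughly 3 billion bp (human genome)
    read_len = 600          # illustrative read length, an assumption
    for label, target in (("HGS", 8), ("WGS", 20)):
        print(label, reads_needed(target, read_len, genome), "reads")
```

Real projects need more reads than this estimate suggests, because reads are not distributed evenly and some are discarded.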

Human genome: 3 billion bp, ~20,000–25,000 genes. Only 1.1–1.4% of the genome sequence codes for proteins. State of completion: the best estimate is that 92.3% is complete. Problematic unfinished regions: centromeres and telomeres (both contain highly repetitive sequences) and some unclosed gaps. It is likely that the centromeres and telomeres will remain unsequenced until new technology is developed. The genome is stored in databases. Primary database: GenBank (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucleotide). Additional data and annotation, plus tools for visualizing and searching: UCSC (http://genome.ucsc.edu), Ensembl (http://www.ensembl.org).

New stuff

Personal human genomes. Personal genomes were not sequenced in the Human Genome Project, in order to protect the identity of the volunteers who provided DNA samples. The following personal genomes were available by July 2011: Japanese male (2010, PMID: 20972442), Korean male (2009, PMID: 19470904), Chinese male (2008, PMID: 18987735), Nigerian male (2008, PMID: 18987734), J. D. Watson (2008, PMID: 18421352), J. C. Venter (2007, PMID: 17803354). The HGP sequence is haploid; the sequence maps of Venter and Watson, however, are diploid.

Next generation sequencing (NGS). The completion of the human genome was just the start of the modern DNA sequencing era: “high-throughput next generation sequencing” (NGS). New approaches reduce time and cost. The Holy Grail of sequencing: a complete human genome for less than $1,000.

1st and 2nd generation of sequencers. 1st generation: ABI Prism 3700 (Sanger, fluorescence, 96 capillaries), used in the HGP and at Celera. The Sanger method still outperforms NGS in read length (600 bp). 2nd generation: birth of HT-NGS in 2005, when 454 Life Sciences developed the GS 20 sequencer, which combines PCR with pyrosequencing. Pyrosequencing is sequencing-by-synthesis: it relies on detection of pyrophosphate release on nucleotide incorporation rather than on chain termination with ddNTPs. The release of pyrophosphate is detected as a flash of light (chemiluminescence). Average read length: 400 bp. The Roche GS-FLX 454 (successor of the GS 20) was used to sequence J. Watson’s genome.
Pyrosequencing: “sequencing by synthesis” involves taking a single strand of the DNA to be sequenced and synthesizing its complementary strand enzymatically. The method is based on detecting the activity of DNA polymerase (a DNA-synthesizing enzyme) with another, chemiluminescent enzyme. The complementary strand is synthesized along the template one base pair at a time, and the base added at each step is detected. The template DNA is immobile, and solutions of A, C, G, and T nucleotides are sequentially added to and removed from the reaction. Light is produced only when the nucleotide solution complements the first unpaired base of the template. The sequence of solutions that produce chemiluminescent signals determines the sequence of the template.
Based on the review: Pareek CS, Smoczynski R, Tretyn A. Sequencing technologies and genome sequencing. J Appl Genet. 2011 Jun 23. PubMed PMID: 21698376
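The flow-and-detect cycle of pyrosequencing can be illustrated with a toy simulation. This is a minimal sketch, not a model of any real instrument: the function name, the fixed A, C, G, T flow order, and the assumption that the light signal is proportional to the number of bases incorporated in one flow (relevant for runs of identical bases) are illustrative choices that are not taken from the slides. Strand orientation is ignored for simplicity.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pyrosequence(template: str, flow_order: str = "ACGT", max_flows: int = 1000):
    """Simulate nucleotide flows over an immobile template.

    Returns (flowgram, synthesized_strand). Each flowgram entry is
    (nucleotide, signal), where signal is the number of bases incorporated
    during that flow; 0 means no light was produced.
    """
    position = 0          # index of the first unpaired template base
    flowgram = []
    synthesized = []
    flows = 0
    while position < len(template) and flows < max_flows:
        nucleotide = flow_order[flows % len(flow_order)]
        incorporated = 0
        # The polymerase keeps extending while the flowed nucleotide
        # complements the next unpaired template base.
        while (position < len(template)
               and COMPLEMENT[template[position]] == nucleotide):
            incorporated += 1
            position += 1
        flowgram.append((nucleotide, incorporated))
        synthesized.append(nucleotide * incorporated)
        flows += 1
    return flowgram, "".join(synthesized)


if __name__ == "__main__":
    flows, strand = pyrosequence("TTGCA")
    print(flows)   # one (nucleotide, signal) pair per flow
    print(strand)  # complementary strand read from the light signals
```

The template sequence is then recovered as the complement of the synthesized strand.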

3rd generation. The 2nd generation still uses PCR amplification, which may introduce base sequence errors or favor certain sequences over others. To overcome this, the emerging 3rd generation of sequencers performs single-molecule sequencing (i.e. the sequence is determined directly from one DNA molecule, with no amplification or cloning). Compared with the 2nd generation, these instruments offer higher throughput, longer reads (~1000 bp), higher accuracy, a smaller amount of starting material, and lower cost.

Moore’s law. [Lecturer's note: draw on the board with chalk and explain Moore's law and the logarithmic axis of the graph.] 1) Moore's law: computing power doubles every two years. If the computing power is n now, it is 2n in two years, 2*2n = 4n after another two years, 2*4n = 8n after another two, then 16n, and so on. On a linear y-axis this is an exponential curve. 2) What must be done so that the distances between n, 2n, 4n, 8n, 16n, … on the y-axis are equal? Take the logarithm: log(2n) = log(n) + log(2), log(4n) = log(n) + 2 log(2), log(8n) = log(n) + 3 log(2), … Each doubling adds the same constant, log(2), so on a logarithmic axis Moore's law is a straight line.
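The effect of the logarithmic axis can be checked numerically. The following sketch uses an arbitrary starting value and hypothetical function names; it compares the step sizes of a doubling series on a linear scale and on a log2 scale.

```python
import math


def doubling_series(start: float, periods: int) -> list[float]:
    """Values after 0, 1, ..., periods doublings (one doubling per two years)."""
    return [start * 2 ** k for k in range(periods + 1)]


def steps(values: list[float]) -> list[float]:
    """Differences between consecutive values."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


if __name__ == "__main__":
    series = doubling_series(1.0, 4)            # n, 2n, 4n, 8n, 16n with n = 1
    print(steps(series))                        # growing steps on a linear axis
    print(steps([math.log2(v) for v in series]))  # constant steps on a log axis
```

Equal steps on the logarithmic scale are exactly what makes the exponential trend appear as a straight line.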

[Figure: sequencing cost graph with the transition to 2nd generation marked; labeled values: $5,000, $4,905, $0.054. Source: http://www.genome.gov/27541954]

Illumina HiSeq X Ten. On 14 January 2014 Illumina announced the new HiSeq X Ten Sequencing System. Illumina claims that it enables the $1,000 genome. It uses Illumina SBS (sequencing-by-synthesis) technology and sells for at least $10 million.

Human Longevity. 4 March 2014: Human Longevity was founded by Craig Venter. Its main aim: to slow down the process of ageing. It is the largest human DNA sequencing operation in the world, capable of processing 40,000 human genomes a year. DNA data will be combined with other data on the health and body composition of the people whose DNA is sequenced, in the hope of gleaning insights into the molecular causes of aging and age-related illnesses like cancer and heart disease. Equipment: 2x Illumina HiSeq X Ten.

Which genomes were sequenced? http://www.ncbi.nlm.nih.gov/sites/genome; GOLD – Genomes OnLine Database (http://www.genomesonline.org/): information regarding complete and ongoing genome projects.

Important genomics projects. The analysis of personal genomes has demonstrated how difficult it is to draw medically or biologically relevant conclusions from individual sequences. More genomes need to be sequenced to learn how genotype correlates with phenotype. The 1000 Genomes project (http://www.1000genomes.org/) started in 2009. Its goal is to sequence the genomes of at least 1000 people from around the world to create a detailed and medically useful picture of human genetic variation. The 2nd generation of sequencers is used in 1000 Genomes. 10 000 Genomes will start soon.

Important genomics projects. The ENCODE project (ENCyclopedia Of DNA Elements, http://www.genome.gov/ENCODE/) by NHGRI aims to identify all functional elements in the human genome sequence. Defined regions of the human genome corresponding to 30 Mb (1%) have been selected. These regions serve as the foundation on which to test and evaluate the effectiveness and efficiency of a diverse set of methods and technologies for finding various functional elements in human DNA.

Sequence Alignment

What is sequence alignment? Fragment overlaps.
Exact overlap:
CTTTTCAAGGCTTA
        GGCTTATTATTGC
Approximate overlap, without a gap:
CTTTTCAAGGCTTA
        GGCTATTATTGC
Approximate overlap, with an inserted gap (the better of the two alignments):
CTTTTCAAGGCTTA
        GGCT-ATTATTGC
[Lecturer's note: introduce sequence alignment with examples where it already appeared in the previous lecture.]
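Detecting an exact fragment overlap amounts to finding the longest suffix of one read that equals a prefix of the other. The sketch below is a naive illustration with hypothetical function names; it handles only exact overlaps, which is why the approximate overlap on the slide needs a real alignment with a gap instead.

```python
def longest_overlap(left: str, right: str, min_length: int = 1) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for length in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_fragments(left: str, right: str) -> str:
    """Join two fragments, writing the shared overlap only once."""
    return left + right[longest_overlap(left, right):]


if __name__ == "__main__":
    read_a = "CTTTTCAAGGCTTA"
    exact = "GGCTTATTATTGC"    # overlaps read_a exactly
    approx = "GGCTATTATTGC"    # overlaps only approximately
    print(longest_overlap(read_a, exact), merge_fragments(read_a, exact))
    print(longest_overlap(read_a, approx))
```

The second call finds no exact overlap at all, even though the two fragments clearly belong together once a single gap is allowed.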

What is sequence alignment? Reads:
CCCCATGGTGGCGGCAGGTGACAG
CATGGGGGAGGATGGGGACAGTCCGG
TTACCCCATGGTGGCGGCTTGGGAAACTT
TGGCGGCTCGGGACAGTCGCGCATAAT
CCATGGTGGTGGCTGGGGATAGTA
TGAGGCAGTCGCGCATAATTCCG
Clustering the aligned reads leads to a consensus sequence (this is only a demonstration alignment):
TTACCCCATGGTGGCGGCTGGGGACAGTCGCGCATAATTCCG (consensus)
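Once reads are placed against each other, a consensus can be derived column by column. The sketch below uses a simple majority vote, which is one common simplification and not necessarily the method used for the slide. The three toy reads, their offsets (padded with "."), and the function name are made up for illustration; they are not the reads shown on the slide.

```python
from collections import Counter


def majority_consensus(rows: list[str], pad: str = ".") -> str:
    """Column-wise majority vote over pre-aligned, equally long rows.

    `pad` marks positions a read does not cover; uncovered columns yield "N".
    Ties are resolved in favor of the base seen first in the column.
    """
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same length")
    consensus = []
    for column in zip(*rows):
        bases = [base for base in column if base != pad]
        consensus.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(consensus)


if __name__ == "__main__":
    toy_rows = [              # hypothetical, already positioned reads
        "ACGTAC....",
        "..GTTCGG..",         # contains one sequencing error (T instead of A)
        "....ACGGTT",
    ]
    print(majority_consensus(toy_rows))
```

The single disagreeing base is outvoted by the two reads that agree, which is the reason higher coverage gives a more reliable consensus.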

Sequence alphabet
Nucleotides: Adenine A, Thymine T, Cytosine C, Guanine G
Amino acids (name, 3-letter code, 1-letter code), grouped by side chain (charge at physiological pH 7.4):
Positively charged side chains: Arginine Arg R, Histidine His H, Lysine Lys K
Negatively charged side chains: Aspartic Acid Asp D, Glutamic Acid Glu E
Polar uncharged side chains: Serine Ser S, Threonine Thr T, Asparagine Asn N, Glutamine Gln Q
Special: Cysteine Cys C, Selenocysteine Sec U, Glycine Gly G, Proline Pro P
Hydrophobic side chains: Alanine Ala A, Leucine Leu L, Isoleucine Ile I, Methionine Met M, Phenylalanine Phe F, Tryptophan Trp W, Tyrosine Tyr Y, Valine Val V

Sequence alignment: the procedure of comparing sequences.
Point mutations are easy (gapless alignment):
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
A more difficult example:
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
However, gaps can be inserted to get something like this (gapped alignment):
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
Gaps correspond to an insertion in one sequence or a deletion in the other (an indel). When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless the ancestry is known:
ACGTCTGATACGCCGTATCGTCTATCT
ACGTCTGAT---CCGTATCGTCTATCT
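An alignment can be summarized by classifying each column as a match, a mismatch (point mutation), or a gap (indel). The sketch below does this for two equally long alignment rows, using "-" as the gap symbol as on the slide; the function name is hypothetical.

```python
def alignment_stats(row_a: str, row_b: str, gap: str = "-") -> dict[str, int]:
    """Count matching, mismatching, and gapped columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    stats = {"matches": 0, "mismatches": 0, "gaps": 0}
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            stats["gaps"] += 1
        elif a == b:
            stats["matches"] += 1
        else:
            stats["mismatches"] += 1
    return stats


if __name__ == "__main__":
    reference = "ACGTCTGATACGCCGTATAGTCTATCT"
    print(alignment_stats(reference, "ACGTCTGATTCGCCCTATCGTCTATCT"))  # gapless
    print(alignment_stats(reference, "----CTGATTCGC---ATCGTCTATCT"))  # gapped
```

Dividing the number of matches by the number of columns gives the percent identity that is often quoted for an alignment.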

Why align sequences – continuation. The draft human genome is available. Automated gene finding is possible. Gene: AGTACGTATCGTATAGCGTAA. What does it do? One approach: is there a similar gene in another species? Align the sequence with known genes and find the gene with the “best” match.

Flavors of sequence alignment: pairwise alignment × multiple sequence alignment

Flavors of sequence alignment: global alignment × local alignment.
Global: the entire sequences are aligned. Sequences that are quite similar and approximately the same length are suitable candidates for global alignment.
Local: stretches of sequence with the highest density of matches are aligned, generating islands of matches or subalignments in the aligned sequences. Local alignments are more suitable for aligning sequences that are similar along some of their lengths but dissimilar in others, sequences that differ in length, or sequences that share a conserved region or domain.
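The difference between the two flavors shows up clearly in the standard dynamic-programming formulations, usually attributed to Needleman and Wunsch (global) and Smith and Waterman (local). The slides do not name these algorithms here, so treat the sketch as background. The scoring values (match +1, mismatch -1, gap -1) and the function names are arbitrary assumptions, and only the optimal score is computed, not the alignment itself.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score for aligning all of `a` against all of `b`."""
    prev = [j * gap for j in range(len(b) + 1)]   # leading gaps are penalized
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]                               # score of the full alignment


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings of `a` and `b`."""
    best = 0
    prev = [0] * (len(b) + 1)                     # an alignment may start anywhere
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # 0 restarts
            curr.append(cell)
            best = max(best, cell)                # and may end anywhere
        prev = curr
    return best


if __name__ == "__main__":
    x, y = "TTACGTTT", "GGACGTGG"   # share only the island ACGT
    print("global:", global_score(x, y))
    print("local: ", local_score(x, y))
```

For the two toy sequences the local score rewards the shared island alone, while the global score is dragged down by the dissimilar flanks.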

Evolution: common ancestors [figure; source: wikipedia.org]

Evolution of sequences. The sequences are the products of molecular evolution. When sequences share a common ancestor, they tend to exhibit similarity in their sequences, structures and biological functions. [Diagram: DNA1, DNA2, Protein1, Protein2.] Sequence similarity → similar 3D structure → similar function. Similar sequences produce similar proteins: this is probably the most powerful idea of bioinformatics because it enables us to make predictions. Often little is known about the function of a new sequence from a genome sequencing program, but if similar sequences for which functional or structural information is available can be found in a database, they can be used as the basis for predicting the function or structure of the new sequence. However, this statement is not a rule. See Gerlt JA, Babbitt PC. Can sequence determine function? Genome Biol. 2000;1(5) PMID: 11178260

Homology. Over time, molecular sequences undergo random changes, some of which are selected during the process of evolution. Selected sequences accumulate mutations and diverge over time. Two sequences are homologous when they are descended from a common ancestor sequence. Traces of evolution may still remain in certain portions of the sequences and allow the common ancestry to be identified. Residues performing key roles are preserved by natural selection; less crucial residues mutate more frequently.

Orthology, paralogy I: two flavors of homology. Orthologs: homologous proteins from different species that possess the same function (e.g. corresponding kinases in a signal transduction pathway in humans and mice). Paralogs: homologous proteins that have different functions in the same species (e.g. two kinases in different signal transduction pathways of humans). However, these terms are the subject of controversy: Jensen RA. Orthologs and paralogs - we need to get it right. Genome Biol. 2001;2(8), PMID: 11532207 and references therein. Note on kinases: a kinase performs phosphorylation, i.e. it transfers phosphate groups from high-energy donor molecules, such as ATP, to specific substrates.

Orthology, paralogy II. Orthologs: genes separated by the event of speciation. The sequences are direct descendants of a common ancestor and most likely have similar domain structure, 3D structure and biological function. Paralogs: genes separated by the event of gene duplication. Gene duplication: an extra copy of a gene. Gene duplication is a key mechanism in evolution. Once a gene is duplicated, the identical genes can undergo changes and diverge to create two different genes. Source: http://www.globalchange.umich.edu/globalchange1/current/lectures/speciation/speciation.html

Gene duplication. Gene duplication is believed to play a major role in evolution. It can arise in three ways.
Unequal cross-over: duplications typically arise from an event termed unequal crossing-over (recombination) that occurs between misaligned homologous chromosomes during meiosis (germ cell formation). The chance of this event happening is a function of the degree of sharing of repetitive elements between the two chromosomes. The recombination products of such an event are a duplication at the site of the exchange and a reciprocal deletion.
Entire chromosome is replicated twice: this error results in one of the daughter cells having an extra copy of the chromosome and all the extra genetic material. If this cell fuses with another cell during reproduction, it may or may not result in a viable zygote.
Retrotransposition: sequences of DNA are copied to RNA and then back to DNA instead of being translated into proteins. This results in extra copies of that DNA being present within the cell, which can rejoin the chromosomes that are already present. Any genes found along these sequences of DNA will have been duplicated in the process.

Unequal cross-over. Homologous chromosomes are misaligned during meiosis. The probability of misalignment is a function of the degree of sharing of repetitive elements. The underlying DNA sequence homology of the similar maternal and paternal chromosome pairs guides the pairing search and the eventual alignment along the entire length of each chromosome. The alignment is further mediated and cemented by a three-dimensional zipperlike structure surrounding each set of paired homologous chromosomes, the synaptonemal complex. Source: Meiosis, Biology Encyclopedia, http://www.biologyreference.com/Ma-Mo/Meiosis.html#ixzz1cXdSgeg7

Comparing sequences through alignment: patterns of conservation and variation can be identified. The degree of sequence conservation in the alignment reveals the evolutionary relatedness of different sequences. The variation between sequences reflects the changes that have occurred during evolution in the form of substitutions and/or indels. Identifying the evolutionary relationships between sequences helps to characterize the function of unknown sequences. Protein sequence comparison can identify homologous sequences that share a common ancestor 1 billion years ago (BYA); DNA sequence comparison typically reaches back only 600 million years (MYA).