CSCI 1810 Computational Molecular Biology 2018


Genome Assembly – short intro

Assembly Progression (Macro View)

Review: Assembly
Step 1: Compare all sequences against each other and find every fragment overlap of at least 40 bases with up to 6% error. (For the human genome this took 10,000 CPU hours.)
Step 2: Cluster the fragments into groups of overlapping fragments that agree on a common sequence and do not overlap fragments that dispute this sequence. Such clusters are called contigs.
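Step 1 can be sketched as a naive overlap test. This is a minimal sketch, not the assembler's actual overlapper. It assumes all reads are on the same strand, counts only substitutions (no insertions or deletions), and checks every ordered pair of reads. The function names `suffix_prefix_overlap` and `all_against_all` are hypothetical. The 40-base and 6% defaults come from the slide.

```python
def suffix_prefix_overlap(a: str, b: str, min_len: int = 40, max_err: float = 0.06) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b` with at most
    `max_err` mismatches per base, or 0 if no such overlap reaches `min_len`.
    Simplification: only substitutions are counted, not insertions/deletions."""
    for length in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-length:], b[:length]))
        if mismatches <= max_err * length:
            return length
    return 0


def all_against_all(reads: dict[str, str], min_len: int = 40,
                    max_err: float = 0.06) -> list[tuple[str, str, int]]:
    """Naive all-pairs overlap detection. Every ordered pair of reads is tested,
    so the cost grows quadratically with the number of reads."""
    overlaps = []
    for name_a, seq_a in reads.items():
        for name_b, seq_b in reads.items():
            if name_a != name_b:
                olen = suffix_prefix_overlap(seq_a, seq_b, min_len, max_err)
                if olen:
                    overlaps.append((name_a, name_b, olen))
    return overlaps
```

The quadratic all-pairs loop is what makes this step expensive. Practical overlappers first index shared k-mers so that only promising pairs are aligned.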

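Review: Assembly (cont.)
Step 3: Identify contigs that originated from repeats by using the fragment "depth". Because copies of a repeat collapse into a single contig, such contigs contain unusually many fragments for their length.
Step 4: Determine the consensus sequence of each contig.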
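Steps 3 and 4 can be illustrated with two small, hypothetical helpers. `log_odds_unique` is a Poisson depth test in the spirit of the arrival-rate statistic used by the Celera assembler. A contig that collapses two copies of a repeat should collect about twice as many reads as its length predicts. `consensus` takes a simple majority vote per column and assumes the reads are already laid out without gaps. Real consensus calling also weighs base quality and handles indels.

```python
import math
from collections import Counter


def log_odds_unique(contig_len: int, reads_in_contig: int,
                    total_reads: int, genome_len: int) -> float:
    """Poisson log-odds that a contig is single-copy rather than a collapsed
    two-copy repeat.

    lam = expected number of reads in a unique stretch of this length.
    log[ P(k | lam) / P(k | 2*lam) ] = lam - k * ln 2
    Positive values favor 'unique'; negative values suggest a repeat.
    """
    lam = contig_len * total_reads / genome_len
    return lam - reads_in_contig * math.log(2)


def consensus(placed_reads: list[tuple[int, str]]) -> str:
    """Majority-vote consensus over reads already placed as (offset, sequence).
    Assumes a gap-free layout; columns covered by no read become 'N'."""
    length = max(offset + len(seq) for offset, seq in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, seq in placed_reads:
        for i, base in enumerate(seq):
            columns[offset + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)
```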

Repeats
Classes of repeats:
Transposon-derived repeats (45% of the genome)
Pseudogenes (inactive copies of genes)
Short k-mer repeats ((A)n, (CA)n)
Segmental duplications
Blocks of tandemly repeated segments
Uses of repeats:
Passively, repeats help us study evolution.
Actively, repeats cause genome rearrangements.
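Short k-mer repeats such as (A)n and (CA)n can be located with a regular expression that uses a backreference. This sketch illustrates the idea under stated assumptions and is not a dedicated repeat finder. The hypothetical `find_tandem_repeats` accepts units of up to `max_unit` bases and requires at least `min_copies` back-to-back copies. Both defaults were chosen arbitrarily here.

```python
import re


def find_tandem_repeats(seq: str, max_unit: int = 6,
                        min_copies: int = 5) -> list[tuple[int, str, int]]:
    """Find runs of a short unit repeated back-to-back, e.g. (A)n or (CA)n.

    Returns (start, unit, copies). The lazy quantifier makes the regex try the
    shortest unit first, so 'CACACA...' is reported with unit 'CA', not 'CACA'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq):
        unit = m.group(1)
        hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits
```

For example, `find_tandem_repeats("GGT" + "CA" * 6 + "GT" + "A" * 10 + "G")` reports a (CA)6 run at position 3 and an (A)10 run at position 17.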

Repeats in the Human Genome
Hitch-hikers: molecules, such as viruses and repeats, that use our genetic machinery for their own replication.
DNA transposons: 3% of our genome. They use our DNA replication machinery and encode a transposase. Many small, unrelated families (each with a common ancestor).
RNA transposons (retroposons): 41% of our genome. They use our transcription machinery and encode reverse transcriptase. The Alu family alone contributes elements of about 300 bp in roughly 10^6 copies.

History of Sequencing
BAC-to-BAC sequencing: used by the Human Genome Project (HGP) in the early stages, when sequencing was slow and time-consuming.
BAC-end shotgun sequencing: used by the HGP in later stages.
Whole genome shotgun sequencing: used by Celera.
The success of whole genome shotgun sequencing is a victory for computer science.

BAC-to-BAC sequencing
Several copies of the genome are randomly cut into pieces of about 150,000 bp.
Each fragment is inserted into a BAC (bacterial artificial chromosome), creating a BAC library of the entire genome.
Fingerprint each fragment using restriction enzymes.
Use the fingerprints to build a physical map that determines the order and orientation of the fragments (a tedious process on which many CS people earned their living).
Distribute the BACs among laboratories and perform shotgun sequencing on each BAC.
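Fingerprinting can be sketched as an in-silico restriction digest. Each BAC is cut at every recognition site, and the sorted fragment lengths (the bands on a gel) are recorded. Two BACs that share many bands probably overlap. The helpers `digest` and `shared_bands` are hypothetical, and EcoRI (G^AATTC) is only an example enzyme. The band-matching tolerance `tol` is an assumed stand-in for gel sizing error. Mapping software uses more careful probabilistic scores than a raw count of shared bands.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Sorted fragment lengths after cutting `seq` at every occurrence of `site`.
    The default site is EcoRI (G^AATTC), which cuts one base into the site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]) if end > start)


def shared_bands(fp1: list[int], fp2: list[int], tol: int = 0) -> int:
    """Count bands (fragment lengths) of fp1 that pair with a distinct band of fp2
    within `tol` bases. Two BACs sharing many bands are likely to overlap."""
    remaining = list(fp2)
    shared = 0
    for length in fp1:
        for i, other in enumerate(remaining):
            if abs(length - other) <= tol:
                shared += 1
                del remaining[i]  # each band of fp2 may be matched only once
                break
    return shared
```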

BAC-end shotgun sequencing
Several copies of a chromosome are randomly cut into pieces of about 150,000 bp.
Sequence 500 bp at both ends of each BAC.
Randomly choose a single BAC and shotgun-sequence it.
“Walk” along the chromosome, using the sequenced BAC ends to choose the next BAC.
Problem: each step depends on the previous one, so the process is not parallel.
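The walk can be sketched as a loop. At each step it picks, among the BACs not yet used, the one whose sequenced end lands furthest along the BAC just finished. This is a minimal sketch under stated assumptions. `next_bac`, `walk`, and the `left_ends` mapping are hypothetical, matching is exact, and only one orientation is considered. The loop makes the sequential bottleneck explicit.

```python
from typing import Optional


def next_bac(current_seq: str, left_ends: dict[str, str],
             visited: set[str]) -> Optional[str]:
    """Choose the unvisited BAC whose left-end read occurs furthest to the right
    in the current (already sequenced) BAC, i.e. the one that extends the walk
    most. Uses exact string matching and ignores orientation."""
    best_name, best_pos = None, -1
    for name, end_read in left_ends.items():
        if name in visited:
            continue
        pos = current_seq.find(end_read)  # -1 if the end is not found
        if pos > best_pos:
            best_name, best_pos = name, pos
    return best_name


def walk(start: str, bac_seqs: dict[str, str], left_ends: dict[str, str]) -> list[str]:
    """Each step needs the previous BAC fully sequenced before the next one can
    be chosen, which is why this strategy is inherently sequential."""
    order, visited = [start], {start}
    while True:
        nxt = next_bac(bac_seqs[order[-1]], left_ends, visited)
        if nxt is None:
            return order
        order.append(nxt)
        visited.add(nxt)
```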

Whole genome shotgun sequencing
Several copies of the whole genome are randomly cut into pieces of about 2,000 bp and 10,000 bp.
Sequence 500 bp from both ends of each fragment. Each such pair of end sequences is called a pair of mates.
Perform assembly over all sequences to create contigs.
Use the mates to join the contigs together.
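Mate pairs can be simulated from a known genome, which is handy for testing an assembler. This sketch is hypothetical: `simulate_mates` and its parameters are illustrative, and only the 2,000/10,000 bp inserts and 500 bp reads come from the slide. It assumes error-free reads and fixed insert sizes, whereas real libraries have a spread of insert lengths.

```python
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_mates(genome: str, n_pairs: int,
                   insert_sizes: tuple[int, ...] = (2000, 10000),
                   read_len: int = 500, seed: int = 0) -> list[tuple[int, int, str, str]]:
    """Cut random fragments of the given insert sizes and read `read_len` bases
    from each end. The second mate is reverse-complemented because it is read
    from the opposite strand, so the two mates point toward each other.
    Returns (true start, insert size, mate 1, mate 2) for each pair."""
    if len(genome) < max(insert_sizes):
        raise ValueError("genome is shorter than the largest insert size")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        insert = rng.choice(insert_sizes)
        start = rng.randrange(len(genome) - insert + 1)
        fragment = genome[start:start + insert]
        pairs.append((start, insert, fragment[:read_len],
                      reverse_complement(fragment[-read_len:])))
    return pairs
```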

Whole genome shotgun sequencing (cont.)
We know that the two reads of each mate pair are either 2,000 or 10,000 bp apart, and we know their orientation.
The process of ordering and placing the contigs is called scaffolding.
Each link between two contigs is supported by more than one mate pair.
The long 10,000 bp fragments allow us to jump over problematic repetitive regions.
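Because the insert size is known, each mate pair becomes a distance constraint between two contigs. This is a minimal sketch, assuming that contig A lies before contig B in the same orientation and that insert sizes are exact. The hypothetical `estimate_gap` averages the gap implied by each mate pair. Following the slide, it refuses to join contigs supported by fewer than two pairs (`min_links`).

```python
from statistics import mean
from typing import Optional


def estimate_gap(len_a: int, links: list[tuple[int, int, int]],
                 min_links: int = 2) -> Optional[float]:
    """Estimate the gap between contig A and contig B (A placed before B).

    Each link is (start of mate 1 in A, end of the fragment in B, insert size).
    The fragment spans (len_a - start_in_a) bases of A, then the gap, then
    end_in_b bases of B, so: gap = insert - (len_a - start_in_a) - end_in_b.
    A negative estimate suggests that the contigs actually overlap.
    Returns None when fewer than `min_links` mate pairs support the join.
    """
    if len(links) < min_links:
        return None
    return mean([insert - (len_a - start_a) - end_b for start_a, end_b, insert in links])
```

For example, suppose contig A is 5,000 bp long and a 2,000 bp mate pair starts at position 4,000 of A and ends 700 bases into B. The implied gap is 2,000 - 1,000 - 700 = 300 bp.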

Handling repeats
The assembler classifies repeat sequences by size and reliability.
Rocks are the most reliable. Each must be supported by at least 2 mates, one to each neighboring contig.
Stones are linked by only one mate.
Finally, pebbles fill in the remaining holes.
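The rock/stone/pebble rule can be sketched as a classification of a repeat contig by the mates that link it to its two neighboring contigs in a scaffold. The function `classify_repeat` is hypothetical, and this sketch makes two assumptions. First, any repeat contig with mate support to only one side is treated as a stone. Second, a contig with no mate support is reported as a pebble candidate, to be placed by sequence overlaps alone.

```python
from collections import Counter


def classify_repeat(mate_links: list[str], left_neighbor: str, right_neighbor: str) -> str:
    """Classify a repeat contig that sits in the gap between two scaffold contigs.

    mate_links lists, for each mate pair with one read in the repeat contig, the
    contig that holds the other read. Links to any other contig are ignored.
      rock   - at least one mate to each neighboring contig (so >= 2 mates)
      stone  - mate support to one side only (assumption of this sketch)
      pebble - no mate support; left to be placed by overlaps (assumption)
    """
    counts = Counter(mate_links)
    left, right = counts[left_neighbor], counts[right_neighbor]
    if left >= 1 and right >= 1:
        return "rock"
    if left + right >= 1:
        return "stone"
    return "pebble"
```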