BioInformatics (2)

Physical Mapping - I
• Low resolution: megabase-scale
• High resolution: kilobase-scale or better
Methods for low resolution mapping
• Somatic cell hybrids (human and mouse or hamster)
  - Fast chromosomal localisation of genes
  - Subchromosomal mapping possible
• Fluorescence in situ hybridisation (FISH)
• Chromosome painting
• Fractionation of chromosomes by flow cytometry
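Somatic cell hybrid mapping rests on concordance: each hybrid line retains a different subset of human chromosomes, so a gene should be detected in exactly those lines that retain the chromosome carrying it. A minimal sketch, assuming each line is described by the set of chromosomes it retains and the gene is scored present or absent (for example by PCR); the panel, the gene scores and all function names are hypothetical:

```python
def concordance(panel, gene_scores, chromosome):
    """Fraction of hybrid lines in which gene presence matches chromosome retention."""
    agree = sum(
        (chromosome in retained) == gene_scores[line]
        for line, retained in panel.items()
    )
    return agree / len(panel)


def assign_chromosome(panel, gene_scores, chromosomes):
    """Return the most concordant chromosome and the score for every candidate."""
    scores = {c: concordance(panel, gene_scores, c) for c in chromosomes}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical panel: hybrid line -> human chromosomes retained in that line.
PANEL = {
    "H1": {"1", "7", "12"},
    "H2": {"7", "X"},
    "H3": {"1", "12"},
    "H4": {"7", "12", "X"},
    "H5": {"X"},
}
# Hypothetical PCR results for one gene in each hybrid line.
GENE = {"H1": True, "H2": True, "H3": False, "H4": True, "H5": False}

if __name__ == "__main__":
    best, scores = assign_chromosome(PANEL, GENE, ["1", "7", "12", "X"])
    print(best, scores)
```

Here only chromosome 7 is fully concordant. Real panels are larger and a few discordant lines are tolerated, which is why a score rather than a strict yes/no test is used.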

Physical Mapping - II
Methods for high resolution mapping
• Long-range restriction mapping
• Pulsed-field gel electrophoresis (PFGE)
• Assembly of clone contigs
• The double digest problem
  - Ordering the fragments produced by digestion with two restriction enzymes
• Sequence Tagged Sites (STSs)
  - Sequence fragments in the genome described uniquely by a pair of PCR primers
  - Usually bases
  - Very useful as 'landmarks' on the physical map
  - Can be mapped to individual clones by FISH
• Assembly of STS-content physical maps
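The double digest problem can be stated concretely: given the fragment lengths from digesting a DNA molecule with enzyme A, with enzyme B, and with A and B together, find orderings of the A and B fragments whose combined cut sites reproduce the double digest. A minimal brute-force sketch, assuming exact fragment sizes; the function names and the example fragments are hypothetical, and exhaustive search is only practical for a handful of fragments (the problem is NP-complete in general):

```python
from collections import Counter
from itertools import permutations


def cut_positions(order):
    """Cumulative positions (from 0 to the total length) for an ordering of fragments."""
    positions = [0]
    for length in order:
        positions.append(positions[-1] + length)
    return positions


def fragments_from_cuts(cuts):
    """Fragment lengths between consecutive distinct cut positions."""
    ordered = sorted(set(cuts))
    return [b - a for a, b in zip(ordered, ordered[1:])]


def solve_double_digest(a, b, ab):
    """Return every (order_a, order_b) consistent with the double digest.

    a, b: fragment lengths from single digests with enzymes A and B.
    ab:   fragment lengths from the double digest with A and B together.
    """
    if not sum(a) == sum(b) == sum(ab):
        raise ValueError("all three digests must add up to the same length")
    target = Counter(ab)
    solutions = set()
    for order_a in set(permutations(a)):
        cuts_a = cut_positions(order_a)
        for order_b in set(permutations(b)):
            # Union of both enzymes' cut sites gives the double-digest fragments.
            cuts = cuts_a + cut_positions(order_b)
            if Counter(fragments_from_cuts(cuts)) == target:
                solutions.add((order_a, order_b))
    return sorted(solutions)


if __name__ == "__main__":
    # Hypothetical 10 kb molecule: A gives 3, 3, 4 kb; B gives 5, 5 kb.
    for order_a, order_b in solve_double_digest([3, 3, 4], [5, 5], [1, 2, 3, 4]):
        print("A:", order_a, "B:", order_b)
```

The example returns two answers that are mirror images of each other, a reminder that restriction data alone often cannot fix the orientation of a map.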

Physical Mapping - III
Map units (human genome)
• 1 cM ≈ 1 Mb
• 1 cR ≈ 30 kb
  - 1 centiRay = 1% chance of a radiation-induced break between 2 markers
• Major information resources
  - Stanford Human Genome Center (RH maps)
  - Whitehead/MIT Genome Center (STS content maps)
  - Centre d'Etude du Polymorphisme Humain - CEPH (YAC maps)
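The map-unit rules of thumb above can be turned into quick conversions. A minimal sketch, assuming the slide's human-genome averages (local ratios vary considerably); the function names are hypothetical. The radiation hybrid distance uses the usual Poisson model of breakage, D = -ln(1 - θ) Rays, which for small θ is close to the slide's "1 cR = 1% chance of a break":

```python
import math

# Rules of thumb quoted on the slide (human genome averages only; the cR
# figure also depends on the radiation dose used to build the panel).
KB_PER_CM = 1_000  # 1 cM ~ 1 Mb
KB_PER_CR = 30     # 1 cR ~ 30 kb


def cm_to_kb(cm):
    """Approximate physical distance (kb) for a genetic distance in centiMorgans."""
    return cm * KB_PER_CM


def cr_to_kb(cr):
    """Approximate physical distance (kb) for a radiation hybrid distance in centiRays."""
    return cr * KB_PER_CR


def rh_distance_cr(theta):
    """RH distance in cR from the breakage probability theta between two markers.

    Poisson model: D = -ln(1 - theta) Rays, i.e. about 100 * theta cR when theta is small.
    """
    if not 0 <= theta < 1:
        raise ValueError("theta must be in [0, 1)")
    return -100 * math.log(1 - theta)


if __name__ == "__main__":
    print(cm_to_kb(2.5))                   # 2500.0
    print(cr_to_kb(10))                    # 300
    print(round(rh_distance_cr(0.01), 3))  # 1.005
    print(round(rh_distance_cr(0.30), 1))  # 35.7
```

The correction matters only for larger breakage probabilities: at θ = 0.30 the linear reading would give 30 cR rather than about 35.7 cR.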

Physical Mapping - IV
Conclusions
• The value of physical mapping
  - Confirmation of chromosomal location of clones and genes
  - Correction of genetic map errors
  - Correlation with the genetic map reveals 'hot' and 'cold' regions of recombinational activity on chromosomes
  - Provides useful information for duplicated regions
  - High resolution mapping provides the framework necessary for high quality sequencing of large genomic regions
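Correlating the genetic and physical maps amounts to comparing, for each pair of adjacent markers, how many centiMorgans separate them per megabase of DNA. A minimal sketch, assuming markers placed on both maps and the slide's human average of about 1 cM/Mb; the two-fold threshold for calling an interval "hot" or "cold", the marker data and the function name are all hypothetical:

```python
def classify_intervals(markers, genome_avg=1.0, fold=2.0):
    """Label intervals between adjacent markers as 'hot', 'cold' or 'average'.

    markers: list of (name, genetic_position_cM, physical_position_Mb),
             sorted by physical position.
    genome_avg: genome-wide average recombination rate in cM/Mb.
    fold: hypothetical threshold; rates at least `fold` times above or
          below the average are called 'hot' or 'cold'.
    """
    intervals = []
    for (n1, g1, p1), (n2, g2, p2) in zip(markers, markers[1:]):
        if p2 <= p1:
            raise ValueError("markers must be sorted by physical position")
        rate = (g2 - g1) / (p2 - p1)  # local recombination rate, cM per Mb
        if rate >= genome_avg * fold:
            label = "hot"
        elif rate <= genome_avg / fold:
            label = "cold"
        else:
            label = "average"
        intervals.append((n1, n2, rate, label))
    return intervals


if __name__ == "__main__":
    # Hypothetical markers placed on both the genetic and the physical map.
    markers = [("M1", 0.0, 0.0), ("M2", 5.0, 1.0), ("M3", 5.5, 3.0), ("M4", 7.5, 5.0)]
    for n1, n2, rate, label in classify_intervals(markers):
        print(f"{n1}-{n2}: {rate:.2f} cM/Mb ({label})")
```

An interval whose genetic length is wildly out of line with its physical length can also point to a genetic map error, the second use listed above.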

System for Assembling Markers (SAM)

DNA Sequencing
Ordered clone library
• Sequencing of overlapping clones of known order, as determined by restriction analysis
• Advantage
  - Easy ordering of the resulting sequence reads
• Disadvantage
  - Detailed mapping is time-consuming
Shotgun sequencing
• Partial digestion of DNA with a 4-cutter enzyme
• Sequencing of randomly overlapping clones
• Computer-aided assembly of reads
• Advantage
  - Speed
• Disadvantage
  - High data redundancy due to random sequencing
  - Not suitable for large genomes (>300 Mb)
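The redundancy cost of shotgun sequencing can be estimated with the Lander-Waterman model: with N reads of length L from a target of size G, the coverage is c = NL/G, a base is missed with probability e^(-c), and about N·e^(-c) separate contigs are expected. A minimal sketch, assuming uniformly random read positions and that every overlap is detectable (so real projects do worse); the function name and the cosmid-sized example are hypothetical:

```python
import math


def shotgun_expectations(genome_size, n_reads, read_length):
    """Lander-Waterman expectations for random (shotgun) sequencing.

    Returns (coverage, expected fraction of bases never read, expected contigs).
    """
    coverage = n_reads * read_length / genome_size  # redundancy c = N * L / G
    uncovered = math.exp(-coverage)                 # P(a given base is never read)
    contigs = n_reads * math.exp(-coverage)         # expected number of contigs
    return coverage, uncovered, contigs


if __name__ == "__main__":
    # Hypothetical example: a 40 kb cosmid insert sequenced with 500-base reads.
    for reads in (80, 240, 640):
        c, u, n = shotgun_expectations(40_000, reads, 500)
        print(f"{reads} reads: {c:.0f}x coverage, {u:.2%} unread, ~{n:.1f} contigs")
```

Going from 1x to 8x coverage is what turns dozens of fragments into, on average, a single contig, which is the redundancy the slide lists as a disadvantage.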

Assembly of Sequence Contigs
The problem:
• Semi-automated assembly of a contiguous DNA sequence from overlapping gel readings
Steps
• Base identification
• Trimming of ends
• Vector clipping
• Assembly of fragments
Major software packages
• Sequencher™ from Gene Codes Corporation, Ann Arbor, Michigan
  - Platforms: PowerMac, Windows NT
  - Contigs of up to 70 kb
• The Staden package by Staden et al., MRC, Cambridge
• PHRED/PHRAP by Green et al., University of Washington, Seattle
  - Platforms: Unix
  - Megabase-range contigs
  - Mutation detection capabilities
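The trimming, clipping and assembly steps can be illustrated with a toy pipeline. A minimal sketch, assuming error-free reads in a single orientation, Phred-style quality scores, and a known vector sequence flanking the insert; the vector string, reads and function names are hypothetical, and the greedy exact-overlap merge shown here is far simpler than the quality-aware alignment used by the packages listed above:

```python
def trim_ends(read, quals, min_q=20):
    """Remove low-quality bases (Phred score < min_q) from both ends of a read."""
    start, end = 0, len(read)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return read[start:end]


def clip_vector(read, vector_end, min_match=6):
    """Strip the longest tail of the vector sequence found at the start of the read."""
    for k in range(min(len(read), len(vector_end)), min_match - 1, -1):
        if read.startswith(vector_end[-k:]):
            return read[k:]
    return read


def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Hypothetical vector whose tail "GAATTC" precedes the first insert read.
    vector = "GGATCCGAATTC"
    reads = ["GAATTCATGCGTAC", "GTACCTTA", "CTTAGGA"]
    cleaned = [clip_vector(r, vector) for r in reads]
    print(greedy_assemble(cleaned))
```

Contained reads, reverse-complement reads and sequencing errors are not handled, which is exactly where real assemblers spend most of their effort.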

Quality Control of Sequence Data
Source: US DOE Joint Genome Institute
Goals
• Complete sequence continuity across a target region (both within and between clones)
  - No more than one gap in 200 kb
  - Size of all gaps no larger than 1% of the size of the total region
• 'Allowable gaps' include
  - Regions unclonable or unstable in conventional cloning vectors
  - Repetitive regions
  - Regions with significant secondary structure or abnormally high GC content
  - Gap size measured by PCR or restriction digest analysis
• Accuracy of finished sequence: 1 error in 10,000 bases
  - At least 95% double-stranded coverage
• Assembly verification
  - A minimum of three independent restriction digests
  - Reassembly with an independent algorithm
  - Re-sequencing of random clones
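The accuracy target of 1 error in 10,000 bases corresponds to a Phred quality of 40, since Q = -10·log10(P). A minimal sketch of checking a finished region against the gap and accuracy goals, assuming per-base Phred scores are available; the function names are hypothetical, and the reading of "one gap in 200 kb" and "size of all gaps" below is an interpretation, not something the source spells out:

```python
import math


def phred_to_error(q):
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred quality score for an error probability: Q = -10 log10(P)."""
    return -10 * math.log10(p)


def check_finishing(region_length, gap_sizes, phred_scores):
    """Check a finished region against the slide's targets.

    Interpretations (assumptions, not taken from the source):
    - 'no more than one gap in 200 kb' -> at most ceil(length / 200 kb) gaps;
    - 'size of all gaps' -> the combined gap size must be <= 1% of the region;
    - accuracy is estimated as the mean Phred error probability per base.
    """
    allowed_gaps = math.ceil(region_length / 200_000)
    error_rate = sum(phred_to_error(q) for q in phred_scores) / len(phred_scores)
    return {
        "gap_count_ok": len(gap_sizes) <= allowed_gaps,
        "gap_size_ok": sum(gap_sizes) <= 0.01 * region_length,
        "accuracy_ok": error_rate <= 1e-4,  # 1 error in 10,000 bases (Q40)
    }


if __name__ == "__main__":
    print(round(error_to_phred(1e-4)))  # 40
    # Hypothetical 300 kb region with two gaps and uniformly Q50 base calls.
    print(check_finishing(300_000, [500, 800], [50] * 1_000))
```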

Submission and Annotation of Sequence Data
Source: US DOE Joint Genome Institute
• The minimum size of a submission to the public databases is the size of the starting clone
  - 95% of the sequence represented on both strands
  - All ambiguities resolved or annotated
  - Missing data from the end of a clone allowed if sequence overlap is detected with the adjacent clone in the tiling path
• Level of annotation
  - All sequences annotated in a largely automated fashion
  - Identification of putative or known genes, repetitive elements, EST matches and any other useful "miscellaneous features"
  - Computationally derived predictions must be indicated as such
• Immediate release of finished annotated sequence
  - Global assembly of meta-contigs from previously submitted data will be performed periodically
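The requirement that 95% of the sequence be represented on both strands can be checked from the positions and orientations of the reads in a clone. A minimal sketch, assuming reads are given as 0-based, half-open (start, end, strand) intervals on the clone; the function name and read layout are hypothetical:

```python
def double_strand_fraction(length, reads):
    """Fraction of a clone's bases covered by reads on both strands.

    reads: iterable of (start, end, strand) with 0-based, half-open
           coordinates on the clone and strand '+' or '-'.
    """
    covered = {"+": [False] * length, "-": [False] * length}
    for start, end, strand in reads:
        if strand not in covered:
            raise ValueError(f"unknown strand {strand!r}")
        for i in range(max(0, start), min(length, end)):
            covered[strand][i] = True
    # A base counts only if at least one read on each strand spans it.
    both = sum(f and r for f, r in zip(covered["+"], covered["-"]))
    return both / length


if __name__ == "__main__":
    # Hypothetical read layout on a 100-base clone.
    reads = [(0, 100, "+"), (0, 90, "-")]
    frac = double_strand_fraction(100, reads)
    print(frac, "meets 95% target" if frac >= 0.95 else "below 95% target")
```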

International Strategy Meeting on Human Genome Sequencing
Bermuda, 25th-28th February 1996
Sponsored by the Wellcome Trust
Summary of agreed principles
• Primary genomic sequence should be in the public domain
• Primary genomic sequence should be rapidly released
• Assemblies of greater than 1 kb should be automatically released on a daily basis
• Finished annotated sequence should be immediately submitted to the public databases
Coordination
• Large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the human genome

Annotating the Human Genome Sequence
Identification of coding regions
• Exon/intron prediction
• High-throughput comparison of genomic sequence to protein information
  - Full-length protein sequences
  - Databases of protein domains
How automated is automated annotation in reality?
• Advantages
  - High speed
  - Good for tRNA genes and repetitive regions
  - Good for high-scoring matches in databases, but
• Disadvantages
  - Error propagation can be detrimental
  - Domain 'recycling' in evolution causes misinterpretation, e.g. in the case of transcription factors similar to peptidases
• Very computer-intensive task!
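The simplest automated step toward identifying coding regions is a scan for open reading frames (ATG to the next in-frame stop codon). This is a swapped-in illustration, not exon/intron prediction: it ignores introns, splice signals and codon statistics, so in human genomic DNA it only flags candidates. A minimal sketch, assuming the forward strand only (the reverse strand would need its reverse complement scanned too); the function name and the length cut-off are hypothetical:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and half-open; `end` includes the stop codon.
    min_codons counts codons before the stop (a hypothetical length cut-off).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [(2, 17, 2)]
```

Short ORFs arise by chance in any sequence, so the length cut-off trades sensitivity for specificity; this is one reason purely automated annotation needs the database comparisons listed above.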