

1 BioInformatics (2)

2 Physical Mapping - I
- Low resolution: megabase-scale
- High resolution: kilobase-scale or better
- Methods for low-resolution mapping
  - Somatic cell hybrids (human and mouse or hamster)
    - Fast chromosomal localisation of genes
    - Subchromosomal mapping possible
  - Fluorescence in situ hybridisation (FISH)
  - Chromosome painting
  - Fractionation of chromosomes by flow cytometry
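
A minimal sketch of how a somatic cell hybrid panel assigns a gene to a chromosome: each hybrid retains a different subset of human chromosomes, and the gene is assigned to the chromosome whose presence/absence pattern across the panel agrees best (is most concordant) with the pattern of gene detection. The panel contents, hybrid names and typing results are invented for illustration, and `concordance` is a hypothetical helper, not part of any mapping package.

```python
def concordance(panel, gene_present):
    """For each human chromosome, the fraction of hybrids in which the chromosome's
    presence/absence agrees with detection of the gene."""
    chromosomes = sorted(set().union(*panel.values()))
    scores = {}
    for chrom in chromosomes:
        agree = sum((chrom in retained) == gene_present[hybrid]
                    for hybrid, retained in panel.items())
        scores[chrom] = agree / len(panel)
    return scores


# Hypothetical panel: human chromosomes retained by each human-rodent hybrid.
panel = {
    "H1": {"1", "7", "X"},
    "H2": {"7", "12"},
    "H3": {"1", "12", "X"},
    "H4": {"3", "7"},
    "H5": {"3", "X"},
}
# Hypothetical typing results: was the human gene detected in each hybrid?
gene_present = {"H1": True, "H2": True, "H3": False, "H4": True, "H5": False}

scores = concordance(panel, gene_present)
best = max(scores, key=scores.get)
print(f"Gene assigned to chromosome {best} (concordance {scores[best]:.0%})")
```

In this toy panel only chromosome 7 is present in exactly the hybrids that carry the gene. Hybrids that retain only part of a chromosome are what make subchromosomal mapping possible.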

3 Physical Mapping - II
- Methods for high-resolution mapping
  - Long-range restriction mapping
    - Pulsed-field gel electrophoresis (PFGE)
  - Assembly of clone contigs
  - The double digest problem
    - Ordering the fragments obtained by digesting the same DNA with two restriction enzymes, separately and together
  - Sequence Tagged Sites (STSs)
    - Short genomic sequences, usually 200-300 bases, each uniquely defined by a pair of PCR primers
    - Very useful as ‘landmarks’ on the physical map
    - Can be assigned to individual clones by PCR; the clones can in turn be localised on chromosomes by FISH
  - Assembly of STS-content physical maps
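
The double digest problem can be stated concretely: given the fragment lengths from digest A alone, digest B alone and digest A+B, find orderings of the A and B fragments whose combined cut sites reproduce the A+B fragments. The brute-force sketch below is illustrative only (the function names and the example fragment sizes are hypothetical). It tries every ordering, so its cost grows factorially with the number of fragments and it is usable only on toy inputs.

```python
from itertools import permutations


def cut_sites(ordered_fragments):
    """Interior cut positions implied by an ordered list of fragment lengths."""
    sites, position = [], 0
    for length in ordered_fragments[:-1]:
        position += length
        sites.append(position)
    return sites


def fragments_from_cuts(cuts, total_length):
    """Sorted fragment lengths produced by cutting at the given positions.
    A site cut by both enzymes counts once (set removes duplicates)."""
    points = [0] + sorted(set(cuts)) + [total_length]
    return sorted(b - a for a, b in zip(points, points[1:]))


def solve_double_digest(digest_a, digest_b, digest_ab):
    """Brute force: every ordering of A and B fragments whose combined cuts reproduce A+B."""
    total = sum(digest_a)
    if not total == sum(digest_b) == sum(digest_ab):
        raise ValueError("all three digests must sum to the same DNA length")
    target = sorted(digest_ab)
    solutions = set()
    for order_a in set(permutations(digest_a)):
        for order_b in set(permutations(digest_b)):
            cuts = cut_sites(order_a) + cut_sites(order_b)
            if fragments_from_cuts(cuts, total) == target:
                solutions.add((order_a, order_b))
    return solutions


# Hypothetical 10 kb fragment: A gives 2+3+5 kb, B gives 4+6 kb, A+B gives 1+2+2+5 kb.
print(solve_double_digest([2, 3, 5], [4, 6], [1, 2, 2, 5]))
```

The two solutions found here are mirror images (the same map read from the other end), a symmetry every double digest solution has. Real gel data would also need a tolerance for fragment-sizing errors, which this exact-match sketch ignores.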

4 Physical Mapping - III
- Map units (human genome)
  - 1 cM ~ 1 Mb
  - 1 cR ~ 30 kb (the exact figure depends on the radiation dose used to build the RH panel)
    - 1 centiRay = 1% chance of a radiation-induced break between 2 markers
- Major information resources
  - Stanford Human Genome Center (RH maps): http://www-shgc.stanford.edu
  - Whitehead/MIT Genome Center (STS-content maps): http://www-genome.wi.mit.edu/
  - Centre d’Etude du Polymorphisme Humain - CEPH (YAC maps): http://www.cephb.fr/bio/ceph-genethon-map.html
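
The centiRay definition above can be turned into a distance calculation. A common convention in radiation hybrid mapping converts the observed breakage probability θ between two markers into a distance D = -ln(1 - θ) Rays, which corrects for multiple breaks and is close to 100θ cR when θ is small. The sketch below uses that conversion together with the slide's approximate human scale factors (1 cM ~ 1 Mb, 1 cR ~ 30 kb); the constants and function names are assumptions for illustration, since the kb-per-cR figure depends on the panel.

```python
import math

# Approximate human-genome scale factors taken from the slide (assumptions;
# kb per cR in particular depends on the radiation dose used to build the panel).
MB_PER_CM = 1.0
KB_PER_CR = 30.0


def rh_distance_cr(theta):
    """Convert a breakage probability theta into centiRays: D = -ln(1 - theta) Rays."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


def cr_to_kb(cr):
    """Approximate physical distance for an RH distance, using KB_PER_CR."""
    return cr * KB_PER_CR


def cm_to_mb(cm):
    """Approximate physical distance for a genetic distance, using MB_PER_CM."""
    return cm * MB_PER_CM


for theta in (0.01, 0.1, 0.3):
    cr = rh_distance_cr(theta)
    print(f"theta={theta:.2f}: {cr:.1f} cR ~ {cr_to_kb(cr):.0f} kb")
```

For θ = 0.01 the result is essentially 1 cR, matching the slide's definition; the log correction matters only for widely separated markers.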

5 Physical Mapping - IV
- Conclusions: the value of physical mapping
  - Confirmation of the chromosomal location of clones and genes
  - Correction of genetic map errors
  - Correlation with the genetic map reveals ‘hot’ and ‘cold’ regions of recombinational activity on chromosomes
  - Provides useful information for duplicated regions
  - High-resolution mapping provides the framework necessary for high-quality sequencing of large genomic regions
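
Correlating the genetic and physical maps amounts to computing a local recombination rate (cM per Mb) between adjacent markers placed on both maps and comparing it with the genome average (about 1 cM/Mb in humans, per the previous slide). In the sketch below the marker positions and the 2x / 0.5x thresholds for calling an interval ‘hot’ or ‘cold’ are hypothetical choices for illustration.

```python
GENOME_AVERAGE_CM_PER_MB = 1.0  # human average from "1 cM ~ 1 Mb"


def classify_intervals(markers, hot=2.0, cold=0.5):
    """markers: (name, genetic position in cM, physical position in Mb), ordered
    along the chromosome. Returns (left, right, cM/Mb, label) per adjacent pair.
    The hot/cold thresholds are illustrative multiples of the genome average."""
    results = []
    for (n1, g1, p1), (n2, g2, p2) in zip(markers, markers[1:]):
        if p2 <= p1:
            raise ValueError("markers must be sorted by increasing physical position")
        rate = (g2 - g1) / (p2 - p1)
        ratio = rate / GENOME_AVERAGE_CM_PER_MB
        if ratio >= hot:
            label = "hot"
        elif ratio <= cold:
            label = "cold"
        else:
            label = "average"
        results.append((n1, n2, rate, label))
    return results


# Hypothetical markers placed on both the genetic (cM) and physical (Mb) maps.
markers = [("M1", 0.0, 0.0), ("M2", 4.0, 1.0), ("M3", 4.5, 3.0), ("M4", 6.0, 4.5)]
for left, right, rate, label in classify_intervals(markers):
    print(f"{left}-{right}: {rate:.2f} cM/Mb ({label})")
```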

6 System for Assembling Markers (SAM)

8 DNA Sequencing
- Ordered clone library
  - Sequencing of overlapping clones whose order has been determined by restriction analysis
  - Advantage
    - Easy ordering of the resulting sequence reads
  - Disadvantage
    - Detailed mapping is time-consuming
- Shotgun sequencing
  - Partial digestion of DNA with a 4-cutter restriction enzyme
  - Sequencing of randomly overlapping clones
  - Computer-aided assembly of reads
  - Advantage
    - Speed
  - Disadvantages
    - High data redundancy due to random sequencing
    - Not suitable for large genomes (>300 Mb)
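
The redundancy cost of shotgun sequencing can be estimated with the Lander-Waterman model, which treats reads as placed at random along the target. With N reads of length L on a target of size G, the coverage is c = NL/G, the expected fraction of bases never sequenced is about e^-c, and the expected number of contigs is about N·e^(-c(1-σ)), where σ is the minimum detectable overlap as a fraction of L. The sketch below evaluates these formulas; the 100 kb target, 500-base reads and 40-base minimum overlap are hypothetical numbers chosen only for illustration.

```python
import math


def lander_waterman(genome_size, read_length, num_reads, min_overlap=0):
    """Expected outcome of random (shotgun) sequencing under the Lander-Waterman model.
    Returns (coverage c, expected unsequenced fraction, expected number of contigs)."""
    c = num_reads * read_length / genome_size
    sigma = min_overlap / read_length  # minimum detectable overlap as a fraction of L
    unsequenced_fraction = math.exp(-c)
    expected_contigs = num_reads * math.exp(-c * (1 - sigma))
    return c, unsequenced_fraction, expected_contigs


GENOME = 100_000  # hypothetical 100 kb target
READ = 500        # hypothetical read length in bases
for num_reads in (200, 800, 1600):
    c, gap_frac, contigs = lander_waterman(GENOME, READ, num_reads, min_overlap=40)
    print(f"{num_reads} reads: {c:.0f}x coverage, "
          f"~{gap_frac * GENOME:.0f} bases unsequenced, ~{contigs:.1f} contigs")
```

The unsequenced fraction falls exponentially with coverage, so closing the last few gaps by random sequencing alone requires many redundant reads.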

9 Assembly of Sequence Contigs
- The problem: semi-automated assembly of a contiguous DNA sequence from overlapping gel readings
- Steps
  - Base identification
  - Trimming of ends
  - Vector clipping
  - Assembly of fragments
- Major software packages
  - Sequencher™ from GeneCodes Inc., Ann Arbor, Michigan
    - Platforms: PowerMac, Windows NT
    - Contigs of up to 70 kb
  - The Staden package by Staden et al., MRC, Cambridge
  - PHRED/PHRAP by Green et al., University of Washington, Seattle
    - Platform: Unix
    - Megabase-range contigs
    - Mutation detection capabilities
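
The final step, assembly of fragments, can be illustrated with a toy greedy assembler: repeatedly merge the pair of reads with the longest exact suffix-prefix overlap until no overlaps remain. This is a simplified sketch, not the algorithm used by Sequencher, the Staden package or PHRAP. It ignores base-call quality, sequencing errors, reverse-complement reads and repeats, and the reads and minimum overlap length are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: merge the best-overlapping pair until no overlap >= min_len remains."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]  # drop contained reads
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: several contigs remain
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]  # hypothetical reads
print(greedy_assemble(reads))
```

When a repeat is longer than the reads, greedy merging can join the wrong pair, one reason production assemblers weigh base-call quality and why assemblies are verified independently.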

10 Quality Control of Sequence Data
Source: US DOE Joint Genome Institute
- Goals
  - Complete sequence continuity across a target region (both within and between clones)
    - No more than one gap in 200 kb
    - Size of all gaps no larger than 1% of the size of the total region
  - ‘Allowable gaps’ include
    - regions unclonable or unstable in conventional cloning vectors
    - repetitive regions
    - regions with significant secondary structure or abnormally high GC content
    - Gap size measured by PCR or restriction digest analysis
  - Accuracy of finished sequence: 1 error in 10,000 bases
    - At least 95% double-strand coverage
  - Assembly verification
    - A minimum of three independent restriction digests
    - Reassembly with an independent algorithm
    - Re-sequencing of random clones
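
The numerical goals on this slide translate directly into checks. The sketch below encodes the gap criteria and the double-strand coverage requirement, and converts the accuracy target into a Phred-style quality value, Q = -10 log10(error rate), so 1 error in 10,000 bases corresponds to Q40. The function names and example inputs are hypothetical, and reading the 1% rule as a limit on the combined size of all gaps is an assumption.

```python
import math


def phred_quality(error_rate):
    """Phred-style quality value for a per-base error probability."""
    return -10.0 * math.log10(error_rate)


def meets_gap_goals(region_bp, gap_sizes_bp, bp_per_gap=200_000, max_gap_fraction=0.01):
    """At most one gap per 200 kb, and gaps totalling at most 1% of the region
    (the 'total size' reading of the 1% rule is an assumption)."""
    max_gaps = region_bp / bp_per_gap
    return (len(gap_sizes_bp) <= max_gaps
            and sum(gap_sizes_bp) <= max_gap_fraction * region_bp)


def double_strand_fraction(forward_depth, reverse_depth):
    """Fraction of positions read at least once on each strand."""
    both = sum(1 for f, r in zip(forward_depth, reverse_depth) if f > 0 and r > 0)
    return both / len(forward_depth)


print(phred_quality(1 / 10_000))                   # accuracy target as a quality value
print(meets_gap_goals(1_000_000, [3_000, 2_500]))  # 2 gaps, 5.5 kb in a 1 Mb region
print(meets_gap_goals(1_000_000, [3_000] * 6))     # too many gaps for 1 Mb
fwd = [2, 1, 1, 0, 3, 2, 1, 1, 2, 1]  # hypothetical per-base read depth, forward strand
rev = [1, 1, 2, 1, 0, 1, 1, 2, 1, 1]  # hypothetical per-base read depth, reverse strand
print(double_strand_fraction(fwd, rev) >= 0.95)
```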

11 Submission and Annotation of Sequence Data
Source: US DOE Joint Genome Institute
- The minimum size of a submission to the public databases is the size of the starting clone
  - 95% of the sequence represented on both strands
  - All ambiguities resolved or annotated
  - Missing data from the end of a clone allowed if sequence overlap is detected with the adjacent clone in the tiling path
- Level of annotation
  - All sequences annotated in a largely automated fashion
  - Identification of putative or known genes, repetitive elements, EST matches and any other useful “miscellaneous features”
  - Computationally derived predictions must be indicated as such
- Immediate release of finished, annotated sequence
  - Global assembly of meta-contigs from previously submitted data will be performed periodically

12 International Strategy Meeting on Human Genome Sequencing
Bermuda, 25th-28th February 1996, sponsored by the Wellcome Trust
- Summary of agreed principles
  - Primary genomic sequence should be in the public domain
  - Primary genomic sequence should be rapidly released
  - Assemblies of greater than 1 kb should be automatically released on a daily basis
  - Finished annotated sequence should be immediately submitted to the public databases
- Coordination
  - Large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the human genome

13 Annotating the Human Genome Sequence
- Identification of coding regions
  - Exon/intron prediction
- High-throughput comparison of genomic sequence with protein information
  - Full-length protein sequences
  - Databases of protein domains
- How automated is automated annotation in reality?
  - Advantages
    - High speed
    - Good for tRNA genes and repetitive regions
    - Good for high-scoring database matches, but...
  - Disadvantages
    - Error propagation can be detrimental
    - Domain ‘recycling’ in evolution causes misinterpretation, e.g. in the case of transcription factors similar to peptidases
- A very computer-intensive task!
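
As a much simpler stand-in for exon/intron prediction, the sketch below scans the three forward reading frames for open reading frames (an ATG start codon followed in frame by a stop codon). It only illustrates what identifying candidate coding regions means computationally: it ignores introns, splice sites, the reverse strand and the statistical coding measures that real gene predictors use. The minimum ORF length and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Forward-strand ORFs as (frame, start, end), end exclusive and including the stop codon.
    Length in codons counts the start codon but not the stop codon.
    The default of 100 codons is an illustrative assumption."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; later in-frame ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "G"  # hypothetical toy sequence
print(find_orfs(toy, min_codons=3))
```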

