Published by Rosalyn Gibson. Modified over 9 years ago.
BioInformatics (2)
Physical Mapping - I
- Low resolution: megabase scale
- High resolution: kilobase scale or better
- Methods for low resolution mapping
  - Somatic cell hybrids (human and mouse or hamster)
    - Fast chromosomal localisation of genes
    - Subchromosomal mapping possible
  - Fluorescence in situ hybridisation (FISH)
  - Chromosome painting
  - Fractionation of chromosomes by flow cytometry
Physical Mapping - II
- Methods for high resolution mapping
  - Long-range restriction mapping
  - Pulsed-field gel electrophoresis (PFGE)
  - Assembly of clone contigs
  - The double digest problem
    - Ordering fragments from a digest with 2 restriction enzymes
  - Sequence Tagged Sites (STSs)
    - Sequence fragments in the genome described uniquely by a pair of PCR primers
    - Usually 200-300 bases
    - Very useful as ‘landmarks’ on the physical map
    - Can be mapped to individual clones by FISH
  - Assembly of STS-content physical maps
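The double digest problem can be illustrated with a small brute-force search. This is only a sketch: the function names `cut_sites` and `double_digest` are made up for illustration, fragment lengths are assumed to be exact integers, and the exhaustive search is practical only for a handful of fragments.

```python
from itertools import permutations

def cut_sites(fragments):
    """Cut positions implied by laying fragments end to end in the given order."""
    sites, pos = [], 0
    for length in fragments[:-1]:
        pos += length
        sites.append(pos)
    return sites

def double_digest(a, b, ab):
    """Return every (order_a, order_b) whose combined cuts reproduce the double digest ab.

    a, b -- fragment lengths from the two single-enzyme digests
    ab   -- fragment lengths from the digest with both enzymes
    """
    target = sorted(ab)
    total = sum(a)
    solutions = []
    for order_a in sorted(set(permutations(a))):
        for order_b in sorted(set(permutations(b))):
            # Merge the cut sites of both enzymes, plus the two ends of the molecule.
            sites = sorted(set(cut_sites(order_a)) | set(cut_sites(order_b)) | {0, total})
            pieces = sorted(y - x for x, y in zip(sites, sites[1:]))
            if pieces == target:
                solutions.append((order_a, order_b))
    return solutions

# Hypothetical 10 kb molecule: enzyme A gives 3, 5, 2; enzyme B gives 4, 6.
solutions = double_digest([3, 5, 2], [4, 6], [1, 2, 3, 4])
for order_a, order_b in solutions:
    print(order_a, order_b)
```

Even this tiny example has more than one consistent ordering (every solution also has a mirror image), which is why additional evidence such as partial digests or STS content is usually needed to pick the true map.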
Physical Mapping - III
- Map units (human genome)
  - 1 cM = ~1 Mb
  - 1 cR = ~30 kb
    - 1 centiRay = 1% chance of a radiation-induced break between 2 markers
- Major information resources
  - Stanford Human Genome Center (RH maps)
    - http://www-shgc.stanford.edu
  - Whitehead/MIT Genome Center (STS content maps)
    - http://www-genome.wi.mit.edu/
  - Centre d’Etude du Polymorphisme Humain - CEPH (YAC maps)
    - http://www.cephb.fr/bio/ceph-genethon-map.html
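The map-unit equivalences above can be written as simple conversions. This is a rough sketch: the two constants are the approximations quoted on the slide (genome-wide averages, not exact values), the function names are invented, and the `-ln(1 - theta)` mapping from breakage frequency to distance is the usual radiation hybrid mapping function, added here as an assumption rather than taken from the slides.

```python
import math

KB_PER_CR = 30.0   # slide's approximation: 1 cR ~ 30 kb
MB_PER_CM = 1.0    # slide's approximation: 1 cM ~ 1 Mb

def cr_to_kb(centirays):
    """Approximate physical distance in kb for a radiation hybrid distance in cR."""
    return centirays * KB_PER_CR

def cm_to_kb(centimorgans):
    """Approximate physical distance in kb for a genetic distance in cM."""
    return centimorgans * MB_PER_CM * 1000.0

def breakage_to_cr(theta):
    """Distance in cR from the breakage frequency theta between two markers (0 <= theta < 1)."""
    return -100.0 * math.log(1.0 - theta)

print(cr_to_kb(10))          # 10 cR expressed in kb
print(cm_to_kb(2))           # 2 cM expressed in kb
print(breakage_to_cr(0.01))  # close to 1 cR for a 1% break frequency
```

For small breakage frequencies the logarithmic form agrees with the slide's definition (1% breakage is about 1 cR); it diverges for widely separated markers, where multiple breaks become likely.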
Physical Mapping - IV
- Conclusions
- The value of physical mapping
  - Confirmation of chromosomal location of clones and genes
  - Correction of genetic map errors
  - Correlation to genetic map reveals ‘hot’ and ‘cold’ regions of recombinational activity on chromosomes
  - Provides useful information for duplicated regions
  - High resolution mapping provides the framework necessary for high quality sequencing of large genomic regions
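One way to picture the correlation between genetic and physical maps is to compute cM per Mb for each interval between adjacent markers. The sketch below is illustrative only: the marker names and positions are invented, and the "hot"/"cold" thresholds (twice or half an assumed genome average of 1 cM/Mb) are arbitrary choices, not values from the slides.

```python
def recombination_rates(markers):
    """markers: list of (name, genetic_pos_cM, physical_pos_Mb), ordered along the chromosome.

    Returns (left_marker, right_marker, cM_per_Mb) for each adjacent pair.
    """
    rates = []
    for (n1, g1, p1), (n2, g2, p2) in zip(markers, markers[1:]):
        rates.append((n1, n2, (g2 - g1) / (p2 - p1)))
    return rates

def classify(rate, genome_avg=1.0, factor=2.0):
    """Label an interval relative to an assumed genome-wide average rate."""
    if rate >= genome_avg * factor:
        return "hot"
    if rate <= genome_avg / factor:
        return "cold"
    return "average"

# Hypothetical markers: (name, cM, Mb)
markers = [("M1", 0.0, 0.0), ("M2", 4.0, 1.0), ("M3", 4.5, 3.0)]
for left, right, rate in recombination_rates(markers):
    print(left, right, rate, classify(rate))
```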
System for Assembling Markers (SAM)
DNA Sequencing
- Ordered clone library
  - Sequencing of overlapping clones of known order as determined by restriction analysis
  - Advantage
    - Easy ordering of resulting sequence reads
  - Disadvantage
    - Detailed mapping is time-consuming
- Shotgun sequencing
  - Partial digestion of DNA with a 4-cutter enzyme
  - Sequencing of randomly overlapping clones
  - Computer-aided assembly of reads
  - Advantage
    - Speed
  - Disadvantages
    - High data redundancy due to random sequencing
    - Not suitable for large genomes (>300 Mb)
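The redundancy cost of shotgun sequencing can be estimated with the Lander-Waterman model, which is not mentioned on the slide and is brought in here as a standard approximation. The sketch assumes reads of equal length placed uniformly at random and ignores the minimum overlap needed to join two reads; the function names are illustrative.

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Average redundancy: total bases sequenced divided by target length."""
    return n_reads * read_len / genome_len

def fraction_uncovered(c):
    """Expected fraction of the target hit by no read at coverage c."""
    return math.exp(-c)

def expected_contigs(n_reads, c):
    """Expected number of contigs (islands) at coverage c, ignoring minimum overlap."""
    return n_reads * math.exp(-c)

# Hypothetical project: 10,000 reads of 500 bases over a 1 Mb target.
c = coverage(10_000, 500, 1_000_000)
print(c, fraction_uncovered(c), expected_contigs(10_000, c))
```

Under these assumptions, even several-fold redundancy still leaves a small fraction of the target unsequenced, which is why random shotgun projects are typically followed by a directed finishing phase.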
Assembly of Sequence Contigs
- The problem: semi-automated assembly of a contiguous DNA sequence from overlapping gel readings
- Steps
  - Base identification
  - Trimming of ends
  - Vector clipping
  - Assembly of fragments
- Major software packages
  - Sequencher™ from GeneCodes Inc., Ann Arbor, Michigan
    - Platforms: PowerMac, Windows NT
    - Up to 70 kb contigs
  - The Staden package by Staden et al., MRC, Cambridge
  - PHRED/PHRAP by Green et al., University of Washington, Seattle
    - Platforms: Unix
    - Megabase range contigs
    - Mutation detection capabilities
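The steps after base identification can be sketched in a few lines. This is a toy illustration, not how Sequencher, the Staden package, or PHRAP work internally: it assumes per-base quality scores are already available, clips only an exact vector prefix, uses exact suffix-prefix overlaps, and ignores reverse complements and sequencing errors. All function names and thresholds are hypothetical.

```python
def trim_ends(read, quals, min_q=20):
    """Trimming of ends: drop bases below min_q from both ends of a read."""
    start, end = 0, len(read)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return read[start:end]

def clip_vector(read, vector_prefix):
    """Vector clipping: remove a known vector sequence if the read starts with it."""
    return read[len(vector_prefix):] if read.startswith(vector_prefix) else read

def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Assembly of fragments: repeatedly merge the pair with the longest exact overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

contigs = greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"])
print(contigs)
```

Greedy merging can misassemble repeats, so it should be read as an illustration of the idea of overlap-based assembly rather than as a robust method.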
Quality Control of Sequence Data
- Source: US DOE Joint Genome Institute
- Goals
  - Complete sequence continuity across a target region (both within and between clones)
    - No more than one gap in 200 kb
    - Size of all gaps no larger than 1% of the size of the total region
  - ‘Allowable gaps’ include
    - regions unclonable/unstable in conventional cloning vectors
    - repetitive regions
    - regions with significant secondary structure or abnormally high GC content
  - Gap size measured by PCR or restriction digest analysis
- Accuracy of finished sequence: 1 error in 10,000 bases
  - At least 95% double-strand coverage
- Assembly verification
  - a minimum of three independent restriction digests
  - reassembly with an independent algorithm
  - re-sequencing of random clones
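The numeric goals above lend themselves to simple automated checks. The sketch below makes two interpretive assumptions that the slide does not settle: "one gap in 200 kb" is treated as a gap density over the whole region, and the 1% limit is applied to the combined size of all gaps. The function names are illustrative. The Phred conversion Q = -10 log10(p) is the standard definition, under which 1 error in 10,000 bases corresponds to about Q40.

```python
import math

def meets_gap_goals(region_len_bp, gap_sizes_bp):
    """Check gap density (<= 1 per 200 kb) and combined gap size (<= 1% of the region)."""
    density_ok = len(gap_sizes_bp) * 200_000 <= region_len_bp
    size_ok = sum(gap_sizes_bp) * 100 <= region_len_bp
    return density_ok and size_ok

def phred_quality(error_probability):
    """Phred-scaled quality for a per-base error probability."""
    return -10.0 * math.log10(error_probability)

def double_strand_fraction(forward_depth, reverse_depth):
    """Fraction of positions covered by at least one read on each strand."""
    both = sum(1 for f, r in zip(forward_depth, reverse_depth) if f > 0 and r > 0)
    return both / len(forward_depth)

print(meets_gap_goals(400_000, [1_000, 2_000]))
print(phred_quality(1e-4))
print(double_strand_fraction([1, 1, 0, 2], [1, 0, 1, 1]))
```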
Submission and Annotation of Sequence Data
- Source: US DOE Joint Genome Institute
- The size of the starting clone is the minimum size of a submission to public databases
  - 95% of the sequence represented on both strands
  - all ambiguities resolved or annotated
  - missing data from the end of a clone allowed if sequence overlap is detected with the adjacent clone in the tiling path
- Level of annotation
  - all sequences annotated in a largely automated fashion
  - identification of putative or known genes, repetitive elements, EST matches and any other useful “miscellaneous features”
  - computationally-derived predictions must be indicated as such
- Immediate release of finished annotated sequence
- Global assembly of meta-contigs from previously submitted data will be performed periodically
International Strategy Meeting on Human Genome Sequencing
- Bermuda, 25th-28th February 1996
- Sponsored by the Wellcome Trust
- Summary of agreed principles
  - Primary genomic sequence should be in the public domain
  - Primary genomic sequence should be rapidly released
    - Assemblies of greater than 1 kb should be automatically released on a daily basis
    - Finished annotated sequence should be immediately submitted to the public databases
- Coordination
  - Large-scale sequencing centres should inform HUGO of their intention to sequence particular regions of the human genome
Annotating the Human Genome Sequence
- Identification of coding regions
  - Exon/intron prediction
- High throughput comparison of genomic sequence to protein information
  - Full-length protein sequences
  - Databases of protein domains
- How automated is automated annotation in reality?
  - Advantages
    - High speed
    - Good for tRNA genes, repetitive regions
    - Good for high-scoring matches in databases, but
  - Disadvantages
    - Error propagation can be detrimental
    - Domain ‘recycling’ in evolution causes misinterpretation, e.g. in the case of transcription factors similar to peptidases
- Very computer-intensive task!
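As a very rough stand-in for the first annotation step, the sketch below scans the forward strand for open reading frames (ATG to the next in-frame stop codon). It is a deliberate simplification and not a gene prediction method: real exon/intron prediction in human sequence has to model splice sites, both strands, and coding statistics. The function name and the `min_codons` threshold are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG..stop ORF on the forward strand.

    start is the 0-based index of the A in ATG; end is one past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons from ATG up to (not including) the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

seq = "CCATGAAACCCGGGTAACC"
orfs = find_orfs(seq)
for start, end, frame in orfs:
    print(frame, seq[start:end])
```

A candidate ORF found this way would still need the protein and domain database comparisons listed above before it could be annotated, and any such annotation would have to be flagged as a computational prediction.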