Applications of genome sequencing projects
  1) Molecular medicine
  2) Energy sources and environmental applications
  3) Risk assessment
  4) Bioarchaeology, anthropology, human evolution, human migration
  5) DNA forensics
  6) Agriculture, livestock breeding, and bioprocessing
Molecular medicine
  - Improved diagnosis of disease
  - Earlier detection of genetic predisposition to disease
  - Rational drug design
  - Gene therapy and control systems for drugs
  - Pharmacogenomics: "custom drugs"
Definitions
  - DNA polymorphism: a DNA sequence that occurs in two or more variant forms
  - Alleles: the variant forms of a gene found at a particular location (locus)
  - Haplotype: combination of alleles at multiple, tightly linked loci that are transmitted together over many generations
  - Anonymous locus: position on the genome with no known function
  - DNA marker: polymorphic locus useful for mapping studies
  - RFLP: variation in the length of a restriction fragment due to nucleotide changes at a restriction site, detected by a particular probe or by PCR
  - SNP: presence of two different nucleotides at the same locus in genomic DNA from different individuals
  - DNA fingerprinting: detection of the genotype at a number of unlinked, highly polymorphic loci using one probe
  - Genetic testing: testing an individual for a pathogenic mutation in a certain gene that indicates the person's risk of developing or transmitting a disease
The spectrum of human diseases
  [Figure labels: cystic fibrosis, thalassemia, Huntington's disease, cancer; <5%]
'Mendelian' diseases (<5%)
  - Autosomal dominant inheritance: e.g. Huntington's disease
  - Autosomal codominant inheritance: e.g. sickle cell disease (Hb-S)
  - Autosomal recessive inheritance: e.g. cystic fibrosis, thalassemias
  - X-linked inheritance: e.g. Duchenne muscular dystrophy (DMD)
How to identify disease genes
  - Identify the pathology
  - Find families in which the disease is segregating
  - Find a 'candidate gene'
  - Screen for mutations in the segregating families
How to map candidate genes
Two broad strategies have been used:
  A. Position-independent approach (based on knowledge of gene function)
     1) Biochemical approach
     2) Animal model approach
  B. Position-dependent approach (based on mapped position)
Position-independent approach
  1) Biochemical approach: used when the causative protein has been identified
     E.g. Factor VIII in haemophilia
     Blood-clotting cascade: vessel damage triggers a cascade in which inactive factors are converted to active factors
Blood tests determine whether the active form of each factor in the cascade is present (Fig c)
Techniques used to purify Factor VIII and clone the gene (Fig d, Hartwell)
  2) Animal model approach: compares animal mutant models with a phenotypically similar human disease
     E.g. identification of the SOX10 gene in human Waardenburg syndrome type 4 (WS4)
     - Dom (dominant megacolon) mutant mice, which carry a Sox10 mutation, share phenotypic traits with human WS4 patients (Hirschsprung disease, hearing loss, pigment abnormalities)
     - Screening WS4 patients for SOX10 mutations confirmed the role of this gene in WS4
     [Figure labels: Dom mouse; Hirschsprung; Waardenburg]
B) Position-dependent approach
  - Positional cloning identifies a disease gene from its approximate chromosomal location alone
  - Used when the nature of the gene product is unknown and no candidate genes are known
  - Candidate genes can then be identified by combining map position with expression, function or homology
B) Positional cloning steps
  Step 1 – Collect as many affected families as possible
  Step 2 – Identify a candidate region based on genetic mapping (~10 Mb or more)
  Step 3 – Establish a transcript map, cataloguing all the genes in the region
  Step 4 – Identify potential candidate genes
  Step 5 – Confirm a candidate gene and screen for mutations in affected families
Step 2 – Identifying a candidate region
  - Genetic map of <1 Mb
  - Genetic markers: RFLPs, SSLPs, SNPs
  - Linkage association: lod score (log of the odds) = log10 of the ratio of the odds that two loci are linked to the odds that they are not linked; a lod of 3 or more is needed to accept linkage, a lod of -2 or less to reject it
  - Chromosomal abnormalities
  - Haplotype association: HapMap published in Nature (Oct)
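The lod thresholds above can be illustrated with a minimal sketch. It assumes the simplest textbook case, phase-known and fully informative meioses, where the likelihood at recombination fraction theta is theta^k * (1 - theta)^(n - k) for k recombinants in n meioses. The function names lod_score and interpret_lod are hypothetical and are not taken from the lecture.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ), assuming phase-known, fully informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Complete linkage is impossible once a single recombinant is observed.
        if recombinants > 0:
            return float("-inf")
        log_l_theta = 0.0
    else:
        log_l_theta = (recombinants * math.log10(theta)
                       + non_recombinants * math.log10(1.0 - theta))
    # Under no linkage (theta = 0.5) every meiosis has probability 0.5.
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def interpret_lod(lod):
    """Apply the conventional thresholds quoted on the slide."""
    if lod >= 3.0:
        return "linkage accepted"
    if lod <= -2.0:
        return "linkage rejected"
    return "inconclusive"


if __name__ == "__main__":
    for k in (0, 1, 3):
        z = lod_score(k, 10, 0.1)
        print(f"{k} recombinants in 10 meioses, theta=0.1: lod={z:.2f} ({interpret_lod(z)})")
```

In practice lod scores are summed across families and maximised over theta, usually with dedicated linkage software; the sketch only shows where the numbers 3 and -2 enter.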
DNA markers/polymorphisms
  - RFLPs (restriction fragment length polymorphisms): size changes in fragments due to the loss or gain of a restriction site
  - SSLPs (simple sequence length polymorphisms) or microsatellite repeats: copies of di-, tri- or tetranucleotide repeats of differing lengths, e.g. 25 copies of a CA repeat; can be detected using PCR analysis
  - SNPs (single nucleotide polymorphisms): presence of two different nucleotides at the same locus in genomic DNA from different individuals
RFLPs (Fig: Genetics, Hartwell)
  - Amplify fragment
  - Expose to restriction enzyme
  - Gel electrophoresis
  E.g. sickle-cell genotyping with a PCR-based protocol
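The amplify, digest and run-on-a-gel workflow can be sketched in a few lines. This is an illustration under stated assumptions, not the protocol from the lecture: the two amplicons are short toy sequences modelled on the region around the sickle-cell codon, the recognition site CCTNAGG (the MstII pattern) and its cut position are used as an example enzyme, and the names digest and gel_bands are hypothetical.

```python
import re

# Assumed example enzyme: recognition site CCTNAGG, cutting after the second base.
SITE = re.compile(r"CCT[ACGT]AGG")
CUT_OFFSET = 2

# Toy amplicons (illustrative only): the variant allele differs by one base (A -> T)
# inside the recognition site, which destroys the site.
NORMAL = "ATGGTGCATCTGACTCCTGAGGAGAAGTCTGCCGTTACTGCC"
SICKLE = NORMAL[:19] + "T" + NORMAL[20:]


def digest(amplicon):
    """Return the fragment lengths produced by cutting at every recognition site."""
    cuts = [m.start() + CUT_OFFSET for m in SITE.finditer(amplicon)]
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def gel_bands(allele_a, allele_b):
    """Band sizes expected on a gel for a diploid sample carrying the two alleles."""
    return sorted(set(digest(allele_a)) | set(digest(allele_b)))


if __name__ == "__main__":
    print("normal / normal:", gel_bands(NORMAL, NORMAL))
    print("normal / sickle:", gel_bands(NORMAL, SICKLE))
    print("sickle / sickle:", gel_bands(SICKLE, SICKLE))
```

A heterozygote shows both the uncut band and the two cut fragments, which is why the three genotypes can be told apart on a single gel.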
SSLPs
  - Detected using principles similar to those used for RFLPs
  - However, there is no change in restriction sites
  - Alleles differ in the length of the repeats
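Because SSLP alleles differ only in repeat number, the PCR product length changes by one repeat unit per copy. The sketch below shows this with toy flanking sequences; the flanks, the 22-copy allele and the function names are assumptions made for illustration.

```python
import re

# Hypothetical primer-to-primer flanking sequences around a CA microsatellite.
LEFT_FLANK = "GGATCC"
RIGHT_FLANK = "TTGACG"


def make_allele(copies, unit="CA"):
    """Build a toy PCR product containing the given number of repeat copies."""
    return LEFT_FLANK + unit * copies + RIGHT_FLANK


def longest_repeat_run(sequence, unit="CA"):
    """Number of copies in the longest uninterrupted run of the repeat unit."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [m.group(0) for m in pattern.finditer(sequence)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


if __name__ == "__main__":
    for copies in (25, 22):
        product = make_allele(copies)
        print(f"{longest_repeat_run(product)} CA copies -> PCR product of {len(product)} bp")
```

The two toy alleles differ by 6 bp, a difference that gel or capillary electrophoresis of the PCR products can resolve.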
SNPs (single nucleotide polymorphisms)
  - SNP: presence of two different nucleotides at the same locus in genomic DNA from different individuals
  - SNP detection using allele-specific oligonucleotides (ASOs)
  - Very short probes (<21 bp) that hybridize specifically to one allele or the other
  - ASOs can determine the genotype at any SNP locus (Fig)
Fig a-c
Hybridized and labeled with ASO for allele 1; hybridized and labeled with ASO for allele 2 (Fig d, e)
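The logic of ASO genotyping can be sketched as follows. The sketch makes a strong simplifying assumption: under stringent conditions a probe gives a signal only when it is perfectly complementary to the target, so hybridization is modelled as an exact string match. All sequences and function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridizes(probe, target):
    """Assumed model: a signal only if the probe is perfectly complementary to the target."""
    return reverse_complement(probe) in target


def aso_genotype(sample_alleles, aso_1, aso_2):
    """Call the genotype from which of the two ASO probes give a signal."""
    signal_1 = any(hybridizes(aso_1, allele) for allele in sample_alleles)
    signal_2 = any(hybridizes(aso_2, allele) for allele in sample_alleles)
    if signal_1 and signal_2:
        return "1/2"
    if signal_1:
        return "1/1"
    if signal_2:
        return "2/2"
    return "no call"


# Toy locus: the two alleles differ at a single central base (A versus G).
LEFT = "ACGTTGCAAGCTAGGC"
RIGHT = "TTAGCCGATCGATACG"
ALLELE_1 = LEFT + "A" + RIGHT
ALLELE_2 = LEFT + "G" + RIGHT

# 19-nt probes centred on the SNP, one per allele.
ASO_1 = reverse_complement(LEFT[-9:] + "A" + RIGHT[:9])
ASO_2 = reverse_complement(LEFT[-9:] + "G" + RIGHT[:9])

if __name__ == "__main__":
    print(aso_genotype([ALLELE_1, ALLELE_1], ASO_1, ASO_2))
    print(aso_genotype([ALLELE_1, ALLELE_2], ASO_1, ASO_2))
    print(aso_genotype([ALLELE_2, ALLELE_2], ASO_1, ASO_2))
```

Centring the SNP in a short probe is what makes a single mismatch destabilising enough to discriminate the alleles; real assays depend on temperature and salt conditions that this exact-match model ignores.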
Step 2 – Identifying candidate regions
Chromosomal abnormalities: rare patients who show chromosomal abnormalities linked to an unexplained phenotype
E.g. DMD: boy 'BB', with a single large Xp21 deletion, who had
  - Duchenne muscular dystrophy (DMD gene)
  - Chronic granulomatous disease (CYBB gene)
  - Retinitis pigmentosa (RPGR gene)
  - McLeod phenotype (XK gene)
Step 3 – Transcript map: defines all genes within the candidate region
  - Search genome browsers, e.g. Ensembl
  - Computational analysis
      – Usually about 17 genes per 1000 kb fragment
      – Identify coding regions, sequences conserved between species, and exon-like sequences by looking at codon usage, ORFs, splice sites etc.
  - Experimental checks: double-check sequences, clones, alignments etc.
  - Direct searches: cDNA library screen
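At about 17 genes per 1000 kb, a 10 Mb candidate region would hold on the order of 170 genes, so computational screening matters. One of the signals listed above, the open reading frame, can be found with a very small scan. The sketch below is deliberately naive: it assumes a forward-strand search only, ignores introns and splice sites, and uses a hypothetical function name find_orfs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end) pairs of forward-strand ORFs, ATG through stop codon inclusive.

    min_codons counts every codon in the ORF, including the start and stop codons.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume searching after the stop codon
    return orfs


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"
    for begin, end in find_orfs(toy):
        print(begin, end, toy[begin:end])
```

Gene-prediction tools combine this kind of signal with codon usage, splice-site models and cross-species conservation, which is why the slide lists them together.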
Step 4 – Identifying candidate genes
Expression: gene expression patterns can pinpoint candidate genes
  - Measure RNA expression by Northern blot, RT-PCR or microarrays
  - Look for misexpression (no expression, underexpression, overexpression)
  - E.g. CFTR gene: Northern blot analysis reveals that only one of the candidate genes is expressed in lungs and pancreas
Step 4 – Identifying candidate genes
Function: look for an obvious function, or the most likely function, based on sequence analysis
  E.g. retinitis pigmentosa
  - Linkage analysis mapped the disease gene to 3q (close to RHO)
  - Candidate gene RHO is part of the phototransduction pathway
  - Patient-specific mutations were identified within a year
Step 4 – Identifying candidate genes
Homology: look for a homolog (paralog or ortholog)
  - Marfan syndrome: fibrillin gene FBN1
  - Beals syndrome: fibrillin gene FBN2, a paralog of FBN1
  - Beals syndrome and FBN2 both mapped to 5q
Step 4 – Identifying candidate genes
Animal models: look for homologous genes in animal models, especially mouse
  E.g. Waardenburg syndrome type 1 (WS1)
  - Linkage analysis localised WS1 to 2q
  - The Splotch (Sp) mouse mutant showed a similar phenotype
  - Could Sp and WS1 be orthologous genes?
  - Pax-3 mapped to the Sp locus and is homologous to the human gene HuP2
  [Figure labels: Splotch mouse; WS type 1]
Step 5 – Confirm a candidate gene
  - Mutation screening: sequence differences, e.g. missense mutations, identified by sequencing the coding region of the candidate gene from normal and affected individuals
  - Transgenic model: knock out the gene, or knock in the mutant gene, in a model organism and look for modification of the phenotype
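Mutation screening by sequence comparison can be sketched as a codon-by-codon comparison of the normal and patient coding sequences. The sketch assumes two aligned, equal-length coding sequences with substitutions only (no insertions or deletions) and the standard genetic code; the sequences and the name classify_variants are illustrative.

```python
from itertools import product

# Standard genetic code, built in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_variants(normal_cds, patient_cds):
    """Return (codon number, normal aa, patient aa, kind) for every codon that differs."""
    if len(normal_cds) != len(patient_cds) or len(normal_cds) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length and a multiple of 3")
    variants = []
    for i in range(0, len(normal_cds), 3):
        ref_codon, alt_codon = normal_cds[i:i + 3], patient_cds[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        if ref_aa == alt_aa:
            kind = "silent"
        elif alt_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        variants.append((i // 3 + 1, ref_aa, alt_aa, kind))
    return variants


if __name__ == "__main__":
    normal = "ATGGTGCATCTGACTCCTGAGGAG"
    patient = "ATGGTGCATCTGACTCCTGTGGAG"  # toy single-base substitution
    print(classify_variants(normal, patient))
```

A missense change found in affected but not in unaffected family members supports the candidate gene; it does not prove causation on its own, which is where the transgenic model comes in.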
Transgenic analysis can prove that a candidate gene is the disease locus (Fig)
Reading
  - HMG3 by T Strachan & AP Read: Chapter 14
  AND/OR
  - Genetics by Hartwell (2e): Chapter 11
Optional reading on molecular medicine
  - Nature (May 2004) Vol 429, Insight series: Human genomics and medicine
      pp439 (editorial)
      Predicting disease using genomics, by John Bell, pp