Human Genetics: Weibin Shi, Michele Sale
Contact Information Shi: Sale:
Recommended textbooks:
Medical Genetics - Jorde, Carey, Bamshad & White. Mosby, ISBN 13:
Human Molecular Genetics - Strachan T, Read A. Garland Science, ISBN-10:
Overview of course content
1. Organization of the human genome
2. Genetic variation
3. Patterns of inheritance
4. Population genetics
5. Linkage disequilibrium
6. Genetic epidemiology
7. Applied research in human genetics
Organization of the human genome
Human genome sequence published: February 2001
Genes are found in the nucleus and mitochondria
Nuclear genome packaged with proteins to form chromatin
Human chromosomes: 46 chromosomes in 23 pairs, comprising 22 pairs of autosomes and 1 pair of sex chromosomes.
46,XY: normal male
46,XX: normal female
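The karyotype shorthand above (total chromosome count, then the sex chromosomes) can be unpacked mechanically. A minimal sketch that handles only simple karyotypes such as 46,XY and 46,XX; `parse_karyotype` is a hypothetical helper, and full ISCN notation (rearrangements, mosaics) is out of scope.

```python
def parse_karyotype(karyotype: str) -> tuple[int, str, int]:
    """Split a simple karyotype like '46,XY' into (total, sex chromosomes, autosomes).

    Hypothetical helper: handles only the basic 'count,sex' form, not full ISCN.
    """
    count_text, sex_chromosomes = karyotype.split(",", 1)
    total = int(count_text)
    # Each letter in the sex-chromosome field is one chromosome (X or Y).
    if not sex_chromosomes or set(sex_chromosomes) - {"X", "Y"}:
        raise ValueError(f"unsupported karyotype: {karyotype!r}")
    autosomes = total - len(sex_chromosomes)
    return total, sex_chromosomes, autosomes


for k in ("46,XY", "46,XX"):
    total, sex, autosomes = parse_karyotype(k)
    print(f"{k}: {total} chromosomes = {autosomes} autosomes ({autosomes // 2} pairs) + {sex}")
```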
A little more basic terminology
Human genome = nuclear genome + mitochondrial genome
Nuclear genome: 24 distinct chromosomes (22 autosomes + X + Y), 3,200 Mb, 25,000 genes
Mitochondrial genome: 16,569 bp, 37 genes
Human mitochondrial genome: small (16.5 kb) circular DNA; 37 genes encoding rRNA, tRNA and proteins; 1 gene per 0.45 kb; very few repeats; no introns; 93% coding; genes are transcribed as polycistronic transcripts; maternal inheritance.
What are the mitochondrial genes? 24 of the 37 genes are RNA-coding: 22 tRNA genes and 2 ribosomal RNA genes (12S, 16S). 13 of the 37 genes are protein-coding, encoding some subunits of the respiratory complexes and oxidative phosphorylation enzymes.
Limited autonomy of the mitochondrial genome (subunits encoded by mtDNA vs. by the nucleus):
NADH dehydrogenase: 7 mt-encoded, 35 nuclear-encoded subunits
Cytochrome b-c1 complex: 1 mt-encoded, 10 nuclear-encoded subunits
Cytochrome c oxidase: 3 mt-encoded, 10 nuclear-encoded subunits
ATP synthase complex: 2 mt-encoded, 14 nuclear-encoded subunits
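Summing the columns of this table makes the "limited autonomy" point concrete: the mtDNA-encoded subunits add up to exactly the 13 protein-coding mitochondrial genes, while the nucleus supplies most of each complex. A small tally using only the numbers in the table (the dictionary layout is just an illustrative choice):

```python
# Subunit counts per complex, copied from the table:
# (mitochondrially encoded, nuclear encoded)
SUBUNITS = {
    "NADH dehydrogenase": (7, 35),
    "Cytochrome b-c1 complex": (1, 10),
    "Cytochrome c oxidase": (3, 10),
    "ATP synthase complex": (2, 14),
}

mt_total = sum(mt for mt, _ in SUBUNITS.values())
nuclear_total = sum(nuc for _, nuc in SUBUNITS.values())

print(f"mtDNA-encoded subunits: {mt_total}")  # equals the 13 protein-coding mt genes
print(f"nuclear-encoded subunits: {nuclear_total}")
print(f"fraction encoded by mtDNA: {mt_total / (mt_total + nuclear_total):.0%}")
```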
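Two overlapping genes encoded by the same strand of mtDNA (an unusual example): the two genes start at independent ATG codons that lie in different reading frames relative to each other, and the stop codon of the second gene is completed as TAA from a genomic TA plus an A supplied by the poly-A tail.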
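The stop-codon completion can be illustrated with a toy sequence. This sketch assumes a made-up open reading frame ending in an incomplete `TA`; it scans codons in frame and shows that no stop codon exists until polyadenylation supplies the final `A`. The stop set used is the vertebrate mitochondrial one (TAA, TAG, AGA, AGG); the helper names are hypothetical.

```python
from typing import Optional

# Stop codons of the vertebrate mitochondrial genetic code.
MT_STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def in_frame_codons(seq: str) -> list[str]:
    """Split seq into complete codons from position 0; a trailing partial codon is dropped."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def first_stop(seq: str) -> Optional[int]:
    """Return the index of the first in-frame stop codon, or None if there is none."""
    for index, codon in enumerate(in_frame_codons(seq)):
        if codon in MT_STOP_CODONS:
            return index
    return None


orf = "ATGGCCTA"             # hypothetical ORF ending in an incomplete TA
transcript = orf + "A" * 10  # poly-A tail added after transcription

print(first_stop(orf))         # None: no complete stop codon in the genomic sequence
print(first_stop(transcript))  # 2: the third codon is now TAA
```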
Mitochondrial codon table
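The codon table itself is not reproduced here. As a sketch, the vertebrate mitochondrial code (NCBI translation table 2) can be expressed as the standard code plus four reassignments: UGA read as Trp, AUA as Met, and AGA and AGG as stop. The code below uses DNA letters (T for U), and `translate` is a hypothetical helper that reads complete codons only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial code: the standard code with four reassignments.
MITO = dict(STANDARD)
MITO.update({
    "TGA": "W",  # stop -> Trp
    "ATA": "M",  # Ile -> Met
    "AGA": "*",  # Arg -> stop
    "AGG": "*",  # Arg -> stop
})


def translate(seq: str, code: dict[str, str]) -> str:
    """Translate complete in-frame codons; '*' marks a stop codon."""
    return "".join(code[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


for codon in ("TGA", "ATA", "AGA", "AGG"):
    print(codon, STANDARD[codon], "->", MITO[codon])
print(translate("ATGATATGAAGA", STANDARD))  # MI*R
print(translate("ATGATATGAAGA", MITO))      # MMW*
```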
Human nuclear genome: 3,200 Mb; 23 (XX) or 24 (XY) distinct linear chromosomes; 25,000 genes; 1 gene per 120 kb; introns in most genes; 1.5% of DNA is coding; genes are transcribed individually; repetitive DNA sequences (45%); inherited from both parents.
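The gene-density figures for the two genomes follow from dividing genome size by gene count. A quick check with the slides' round numbers gives about 128 kb per nuclear gene (close to the ~120 kb quoted) versus about 0.45 kb per mitochondrial gene:

```python
# Approximate figures quoted on the genome slides.
NUCLEAR_BP = 3_200_000_000
NUCLEAR_GENES = 25_000
MITO_BP = 16_569
MITO_GENES = 37

nuclear_kb_per_gene = NUCLEAR_BP / NUCLEAR_GENES / 1_000
mito_kb_per_gene = MITO_BP / MITO_GENES / 1_000

print(f"nuclear: ~1 gene per {nuclear_kb_per_gene:.0f} kb")     # ~128 kb
print(f"mitochondrial: ~1 gene per {mito_kb_per_gene:.2f} kb")  # ~0.45 kb
print(f"mtDNA is ~{nuclear_kb_per_gene / mito_kb_per_gene:.0f}x more gene-dense")
```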
In the human nuclear genome, gene-rich regions are separated by gene deserts. Chr. 19 has the highest gene density; Chr. 13 and Y have the lowest.
Human genome base content: 41% GC on average; 38% GC for Chr. 4 and Chr …, …% for Chr. 19. Some regions show wide swings in GC content (e.g. from 33.1% to 59.3%). Gene density correlates with higher GC content.
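GC content is the fraction of G and C bases, and regional swings show up when it is computed in successive windows along a sequence. A minimal sketch; the window and step sizes are arbitrary choices, ambiguous bases such as N are skipped, and the test sequence is made up.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    return (seq.count("G") + seq.count("C")) / acgt


def gc_windows(seq: str, window: int, step: int) -> list[float]:
    """GC fraction in successive windows, exposing regional swings in base content."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


print(gc_content("ATGCGCNNAT"))                   # 0.5 (the two Ns are skipped)
print(gc_windows("AT" * 50 + "GC" * 50, 100, 50))  # AT-rich, mixed, GC-rich windows
```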
CpG dinucleotide depletion: the expected frequency is 4.2%, but the observed frequency is about five times lower.
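The 4.2% figure follows from the genome's average base composition: with 41% GC, C and G each make up about 20.5% of bases, so if neighbouring bases were independent a CpG would be expected at 0.205 × 0.205 ≈ 0.042. Depletion in a particular sequence is usually expressed as the observed/expected CpG ratio. A minimal sketch; `cpg_obs_exp` is a hypothetical name.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count).

    Returns 0.0 when the sequence lacks C or G, where the ratio is undefined.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return cpg * len(seq) / (c * g)


# Expected CpG frequency from base composition alone (41% GC, so C = G = 20.5%),
# assuming neighbouring bases are independent.
gc_fraction = 0.41
expected_cpg = (gc_fraction / 2) ** 2
observed_cpg = expected_cpg / 5  # "five times lower", as stated on the slide

print(f"expected CpG frequency: {expected_cpg:.1%}")      # 4.2%
print(f"implied observed frequency: {observed_cpg:.2%}")  # 0.84%, i.e. obs/exp ~ 0.2
```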
Location of CpG islands in the gene: CpG islands are found in the regulatory regions of human genes.
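CpG islands stand out from the CpG-depleted background and are often located with a sliding window. This sketch assumes the widely used thresholds of a 200 bp window, GC content above 50% and an observed/expected CpG ratio above 0.6; other, stricter definitions exist. The function name and the 600 bp test sequence are hypothetical.

```python
def cpg_island_windows(seq: str, window: int = 200, min_gc: float = 0.5,
                       min_obs_exp: float = 0.6, step: int = 1) -> list[int]:
    """0-based start positions of windows meeting the assumed CpG-island criteria."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        gc = (c + g) / window
        # Observed/expected CpG ratio; undefined (treated as 0) without both C and G.
        obs_exp = w.count("CG") * window / (c * g) if c and g else 0.0
        if gc > min_gc and obs_exp > min_obs_exp:
            hits.append(start)
    return hits


# Hypothetical sequence: a 200 bp CpG-rich block flanked by AT-rich DNA.
sequence = "AT" * 100 + "CG" * 100 + "AT" * 100
hits = cpg_island_windows(sequence)
print(len(hits), hits[0], hits[-1])  # 199 101 299
```

Adjacent qualifying windows would normally be merged into one island; that step is left out to keep the sketch short.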
Human nuclear genome: gene density varies widely. On average 9 exons per gene; 363 exons in the titin gene; certain genes are intronless. Largest intron: 800 kb (WWOX gene); smallest introns: 10 bp. Average 5’ UTR: … kb; average 3’ UTR: 0.77 kb. Largest protein: titin, 38,138 aa.
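Exon and intron sizes like these are derived from exon coordinates on the genome. A minimal sketch, assuming a hypothetical gene described by exon (start, end) pairs in 1-based inclusive coordinates; `intron_lengths` is an illustrative name.

```python
def intron_lengths(exons: list[tuple[int, int]]) -> list[int]:
    """Intron lengths between consecutive exons given as 1-based, inclusive (start, end) pairs."""
    ordered = sorted(exons)
    lengths = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:
            raise ValueError("exons overlap")
        # Bases strictly between the end of one exon and the start of the next.
        lengths.append(next_start - prev_end - 1)
    return lengths


# Hypothetical three-exon gene (coordinates invented for illustration).
exons = [(1, 100), (201, 300), (1001, 1100)]
print(len(exons), "exons ->", intron_lengths(exons))  # 3 exons -> [100, 700]
```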
Gene density varies substantially between chromosomal regions
Genes vary in size and exon content
Intronless genes: interferon genes, histone genes, many ribonuclease genes, heat shock protein genes, many G-protein coupled receptor genes, and various neurotransmitter receptor and hormone receptor genes.
Genes within genes
Classical gene families: members show a high degree of sequence similarity. Examples: alpha-albumin, serum albumin and vitamin D-binding protein; four placenta-specific genes found only in primates (CS = chorionic somatomammotropin).
Gene families whose products share short conserved amino acid motifs: DEAD box proteins (DEAD = Asp-Glu-Ala-Asp) are involved in mRNA splicing and translation initiation. WD proteins take part in a variety of regulatory functions; the GH (Gly-His) dipeptide should be at … aa distance from WD (Trp-Asp).
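Short conserved motifs like these can be scanned for with regular expressions. A minimal sketch on a made-up protein sequence; the allowed GH-to-WD spacing (5 to 40 residues) is a hypothetical placeholder rather than the exact distance, and real WD-repeat detection relies on profile models rather than a single pattern.

```python
import re

# DEAD box: Asp-Glu-Ala-Asp in one-letter code.
DEAD_BOX = re.compile("DEAD")
# WD-repeat core: a GH dipeptide followed, some residues later, by WD.
# The 5-40 residue spacing is a placeholder assumption, not a published rule.
GH_WD = re.compile(r"GH.{5,40}?WD")


def find_motifs(protein: str) -> dict[str, list[int]]:
    """0-based start positions of each motif in a one-letter protein sequence."""
    return {
        "DEAD": [m.start() for m in DEAD_BOX.finditer(protein)],
        "GH...WD": [m.start() for m in GH_WD.finditer(protein)],
    }


toy = "MSDEADKL" + "GH" + "A" * 10 + "WD" + "K"  # hypothetical sequence
print(find_motifs(toy))  # {'DEAD': [2], 'GH...WD': [8]}
```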
Gene superfamily: genes whose protein products are functionally related in a general sense but show only weak sequence homology.
Functionally similar genes are occasionally clustered, but usually dispersed throughout the genome
Non-coding RNA genes encode functional RNAs. ncRNAs represent 98% of all transcripts in a mammalian cell. ncRNAs can be structural, catalytic or regulatory.
How many genes are in the nuclear genome? ~3,000 RNA genes in the nuclear genome, about 10% of the human gene count, have not been taken into account in gene counts.
Non-coding RNA
tRNA (transfer RNA): involved in translation
rRNA (ribosomal RNA): structural component of the ribosome, where translation takes place
snoRNA (small nucleolar RNA): functional/catalytic role in rRNA maturation
Antisense RNA: gene regulation/silencing
microRNA: a new class of non-coding RNA genes. Products are 19–25 nt RNAs; precursors are … nt. They block translation of target mRNAs or cause their degradation.
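Target recognition by a microRNA depends largely on base pairing between its "seed" (commonly nucleotides 2–8 from the 5' end) and a complementary site, often in the target mRNA's 3' UTR. A simplified sketch that looks only for perfect seed matches; both sequences are made up, and real target prediction also weighs site type, context and conservation.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (A<->U, G<->C)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> list[int]:
    """0-based positions in utr that pair perfectly with miRNA nucleotides start..end (1-based)."""
    site = reverse_complement(mirna[start - 1:end])
    hits, pos = [], utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)
    return hits


mirna = "UCCGAUAGCAAGGUUACGAUCA"  # made-up miRNA
utr = "GGGCUAUCGGAAA"             # made-up 3' UTR fragment
print(reverse_complement(mirna[1:8]))  # CUAUCGG
print(seed_sites(mirna, utr))          # [3]
```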
Tandem repeats and interspersed repeats
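Satellite DNA is repetitive DNA that can be separated from the bulk of genomic DNA by equilibrium density gradient centrifugation (sheared DNA in a cesium chloride gradient): its distinctive base composition gives it a different buoyant density, so it forms separate "satellite" bands.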
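The separation works because buoyant density in CsCl rises with GC content. This sketch assumes the commonly cited empirical linear relation ρ ≈ 1.660 + 0.098 × (GC fraction) g/cm³; the AT-rich "satellite" value below is hypothetical.

```python
def buoyant_density(gc_fraction: float) -> float:
    """Approximate buoyant density of DNA in CsCl (g/cm^3) from its GC fraction.

    Uses the empirical linear relation rho = 1.660 + 0.098 * GC fraction.
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 1.660 + 0.098 * gc_fraction


bulk = buoyant_density(0.41)       # average human GC content (41%)
satellite = buoyant_density(0.30)  # hypothetical AT-rich satellite
print(f"bulk DNA: {bulk:.3f} g/cm^3, satellite: {satellite:.3f} g/cm^3")
```

The small density difference is enough for the satellite fraction to band separately from bulk DNA in the gradient.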
Satellite DNA: alpha-satellite (centromeric DNA), minisatellites, microsatellites.
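Microsatellites: di-, tri- and tetranucleotide repeats, ~10% of the nuclear genome. Example allele pairs (dashes mark missing repeat units):
Trinucleotide: TGCTCATCATCATCAGC / TGCTCATCA------GC
Dinucleotide: TGCCACACACACACACACAGC / TGCCACACACACA------GC
Tetranucleotide: TGCTCAGTCAGTCAGTCAGGC / TGCTCAGTCAG--------GC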
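Microsatellite alleles such as these differ in the number of repeat units, which is what genotyping measures. A minimal sketch that counts the longest uninterrupted run of a given unit, applied to the example allele pairs with the alignment dashes removed; `longest_repeat_run` is a hypothetical helper.

```python
import re


def longest_repeat_run(seq: str, unit: str) -> int:
    """Largest number of consecutive copies of `unit` found by a left-to-right scan of seq."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    return max((len(m.group()) // len(unit) for m in pattern.finditer(seq)), default=0)


# Allele pairs from the slide, with alignment gaps removed.
examples = {
    "TCA": ("TGCTCATCATCATCAGC", "TGCTCATCAGC"),
    "CA": ("TGCCACACACACACACACAGC", "TGCCACACACACAGC"),
    "TCAG": ("TGCTCAGTCAGTCAGTCAGGC", "TGCTCAGTCAGGC"),
}
for unit, alleles in examples.items():
    print(unit, [longest_repeat_run(a, unit) for a in alleles])
```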
Minisatellites: 6–64 bp repeating pattern. Example sequence (repeat: AGGAATTTTT):
1 tgattggtct ctctgccacc gggagatttc cttatttgga ggtgatggag gatttcagga
61 attttttagg aattttttta atggattacg ggattttagg gttctaggat tttaggatta
121 tggtatttta ggatttactt gattttggga ttttaggatt gagggatttt agggtttcag
181 gatttcggga tttcaggatt ttaagttttc ttgattttat gattttaaga ttttaggatt
241 tacttgattt tgggatttta ggattacggg attttagggt ttcaggattt cgggatttca
301 ggattttaag ttttcttgat tttatgattt taagatttta ggatttactt gattttggga
361 ttttaggatt acgggatttt agggtgctca ctatttatag aactttcatg gtttaacata
421 ctgaatataa atgctctgct gctctcgctg atgtcattgt tctcataata cgttcctttg
α-Satellite repeat: 171 bp repeat unit
Interspersed repetitive DNA
SINEs (short interspersed nuclear elements):
Alu: ~0.3 kb, ~10.7% of human DNA (1,200,000 copies)
MIR: ~0.13 kb, 3% of human DNA (500,000 copies)
LINEs (long interspersed nuclear elements): ~0.8 kb, ~21% of human DNA (~1,00,000 copies)
Chromosomal location of repeats
Pseudogenes: non-functional copies of a gene
Non-processed pseudogenes: non-functional copies of the genomic DNA sequence of a gene; they contain exons, introns and flanking sequences.
Processed pseudogenes: non-functional copies of the exonic sequences of a gene, reverse-transcribed from an RNA transcript; no 5’ promoter, no introns, and often a poly-A tail.
Both types carry events that make the gene non-functional, such as frameshifts and premature stop codons.
As many as 20–30% of all genomic sequence predictions could be pseudogenes.
We assume pseudogenes have no function, but we really don’t know!
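Frameshifts and premature stop codons are among the signals typically checked when flagging a candidate pseudogene. A deliberately simplified sketch using the standard genetic code and made-up sequences; `disabling_features` is a hypothetical name, and real annotation compares the candidate against its parent gene rather than inspecting one sequence in isolation.

```python
STANDARD_STOPS = {"TAA", "TAG", "TGA"}


def disabling_features(cds: str) -> list[str]:
    """Flag simple coding-disabling lesions in a putative coding sequence."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3:
        problems.append(f"length {len(cds)} not a multiple of 3 (possible frameshift)")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # The last codon may legitimately be the normal stop, so only earlier codons are checked.
    for index, codon in enumerate(codons[:-1]):
        if codon in STANDARD_STOPS:
            problems.append(f"premature stop {codon} at codon {index + 1}")
    return problems


print(disabling_features("ATGAAACCCTAA"))     # [] : intact toy ORF
print(disabling_features("ATGAAATAGCCCTAA"))  # premature TAG at codon 3
print(disabling_features("ATGAAACCCCTAA"))    # one extra base: frameshift
```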
Human genome organization
HUMAN GENOME
  Nuclear genome: 3,200 Mb, 25,000 genes
    Genes and gene-related sequences (unique or moderately repetitive)
      Coding DNA
      Noncoding DNA: pseudogenes; gene fragments; introns, untranslated sequences, etc.
    Extragenic DNA
      Unique or low copy number
      Moderate to highly repetitive: tandemly repeated; interspersed repeats
  Mitochondrial genome: 16.5 kb, 37 genes
    Two rRNA genes, 22 tRNA genes, 13 polypeptide-encoding genes