Presentation is loading. Please wait.

Presentation is loading. Please wait.

Genomics An introduction. Aims of genomics I Establishing integrated databases – being far from merely a storage Linking genomic and expressed gene sequences.

Similar presentations


Presentation on theme: "Genomics An introduction. Aims of genomics I Establishing integrated databases – being far from merely a storage Linking genomic and expressed gene sequences."— Presentation transcript:

1 Genomics An introduction

2 Aims of genomics I Establishing integrated databases – being far from merely a storage Linking genomic and expressed gene sequences cDNA

3 Aims of genomics II Describing every gene: function/expression data/relationships/phenotype 3-d structure and features (introns/exons, domains, repeats) similarities to other genes Characterize sequence diversity in population

4 Genomics can be: Structural –where it is? Functional –what it does? –DNA microarrays: Comparative –finding important fragments

5 Mapping genomes Past –Genetic maps Distance between simple markers expressed in units of recombination –Cytological maps Stained chromosomes, observable under microscope Present –Physical maps Distance between nucleotides expressed in bases –Comparative map Corresponding genes detection; Regulatory sequence detection;

6 Genome sizes OrganismDNA length Genes Mycoplasma genitalium 0.5 Mb470 Deinococcus radiodurans 3 Mb in 4- 10 copies! 3 200 Escherichia coli4.5 Mb4 400 Saccharomyces cerevisiae 12 Mb6 200 Caenorhabditis elegans 97 Mb22 000 Drosophila melanogaster 120 Mb18 000 Homo sapiens3200 Mb32 000

7 Genetic differences among humans Goals –Genetic diseases –Identifying criminals Methods –Genetic markers (fingerprints) and DNA sequence. Repeats: Microsatellites (repeats of 1-12 nucleotides) Minisatellites (> 12) –Other types of variation Genome rearrangements Single nucleotide mutations

8 Microsatellites and disease Huntington’s disease –Huntingtin gene of unknown (!) function –Repeats #: 6-35: normal; 36-120: disease Friedrich ataxia disease –GAA repeat in non-coding (intron) region –Repeats #: 7-34: normal; 35 up: disease –Repeat expansion reduces expression of frataxin gene

9 SNP - Single Nucleotide Polymorphism Definition –SNP and phenotype Occurrence in genome –Rarity of most SNPs (agrees with neutral molecular evolutionary theory) –SNPs in human population: High variance in genome! Detection of SNPs: Hybridization Inter-genic regionsCoding regions Every 1400bpEvery 1430bp

10 Sickle cell anemia SNP on Beta Globin gene, which is recessive: 2 faulty copies: red blood cells change shape under stress - anemia 1 faulty copy: red blood cells change shape under heavy stress – but gives resistance to malaria parasite Sickle looks like this:

11 SNPs and haplotypes Passengers and their evolutionary vehicles

12 SNP - Phase inference In the data from sequencing the genome the origin of SNP is scrambled GTGT... CT AC GT... GAGA... CTGACGGT...... CTTACAGT...... CTGACAGT...... CTTACGGT... chromosome Possibility 1Possibility 2 Which SNPs are on the same chromosome (are in phase)?

13 SNP – phase inference Determining the parent of origin for each SNP GG In this case: GAGA... CT AC GT... CGCG CTCT A GTGT GAGA TA Phase inference – the reason why many SNPs sequencing is done for child and two parents.

14 Linkage Disequilibrium, intro How hard is it to break a chromosome An allele/trait/SNP A and a are on the same position in genome (locus), thus on a single chromosome an individual can have either of them – but not both –f A - frequency of occurrences of trait A in population –f a = 1- f A –f B, f b = 1 - f B are frequency occurrences of B and b Probabilities of occurences of both traits on the same chromosome: f AB f Ab f aB f ab LD and genomic recombination AB Ab aB ab

15 Linkage Disequilibrium, calculation When these alleles are not correlated we expect them to occur together by chance alone: f AB = f A f B f Ab = f A f b f aB = f a f B f ab = f a f b But if A and B are occurring together more often (disequilibrium state), we can write f AB = f A f B + D f Ab = f A f b - D f aB = f a f B - D f ab = f a f b + D where D is called the measure of disequlibrium Of course from definitions above we have D = f AB - f A f B

16 How can we use it? Phase inference tells us how SNPs are organized on chromosome Linkage disequilibrium measures the correlation between SNPs

17 Back to SNPs Daly et al (2001), Figure 1

18 Haplotypes - vehicles for SNPs Daly et al (2001) were able to infer offspring haplotypes largely from parents. They say that “it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity“ The haplotype blocks: –Up to 100kb –5 or more SNPs For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes

19 Haplotypes on the genome fragment a) Observed haplotypes with dotted lines wherever probability of switching to another line is > 2% b) Percent of explanation by haplotypes c) Contribution of specific haplotypes

20 Another genetic test Does haplotypes exist? -Each row represents an SNP -Blue dot = major -yellow = minor -Each column represents a single chromosome -The 147 SNPs are divided into 18 blocks defined by black lines. -The expanded box on the right is an SNP block of 26 SNPs over 19kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs

21 How much SNPs we can ignore? …and still predict haplotypes with high accuracy?

22 Literature Gibson, Muse „A Primer of Genome Science” N Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21 Science 294 2001:1719-1723. M J Daly et al. High-resolution haplotype structure in the human genome Nat. Genet. 29 2001: 229-232.


Download ppt "Genomics An introduction. Aims of genomics I Establishing integrated databases – being far from merely a storage Linking genomic and expressed gene sequences."

Similar presentations


Ads by Google