Genomics An introduction. Aims of genomics I Establishing integrated databases – being far from merely a storage Linking genomic and expressed gene sequences.

Slides:



Advertisements
Similar presentations
applications of genome sequencing projects
Advertisements

Association Studies, Haplotype Blocks and Tagging SNPs Prof. Sorin Istrail.
SNP Applications statwww.epfl.ch/davison/teaching/Microarrays/snp.ppt.
Single Nucleotide Polymorphism Copy Number Variations and SNP Array Xiaole Shirley Liu and Jun Liu.
Polymorphisms: Clinical Implications By Amr S. Moustafa, M.D.; Ph.D. Assistant Prof. & Consultant, Medical Biochemistry Dept. College of Medicine, KSU.
Understanding GWAS Chip Design – Linkage Disequilibrium and HapMap Peter Castaldi January 29, 2013.
Signatures of Selection
This presentation was originally prepared by C. William Birky, Jr. Department of Ecology and Evolutionary Biology The University of Arizona It may be used.
CS177 Lecture 9 SNPs and Human Genetic Variation Tom Madej
16 and 20 February, 2004 Chapter 9 Genomics Mapping and characterizing whole genomes.
Human Migrations Saeed Hassanpour Spring Introduction Population Genetics Co-evolution of genes with language and cultural. Human evolution: genetics,
Something related to genetics? Dr. Lars Eijssen. Bioinformatics to understand studies in genomics – São Paulo – June Image:
Genomes summary 1.>930 bacterial genomes sequenced. 2.Circular. Genes densely packed Mbases, ,000 genes 4.Genomes of >200 eukaryotes (45.
Restriction Fragment Length Polymorphisms (RFLPs) By Amr S. Moustafa, M.D.; Ph.D. Assistant Prof. & Consultant, Medical Biochemistry Dept. College of.
Haplotype Discovery and Modeling. Identification of genes Identify the Phenotype MapClone.
Computational Molecular Biology Biochem 218 – BioMedical Informatics Simple Nucleotide.
Introduction Basic Genetic Mechanisms Eukaryotic Gene Regulation The Human Genome Project Test 1 Genome I - Genes Genome II – Repetitive DNA Genome III.
Reading the Blueprint of Life
HAPLOID GENOME SIZES (DNA PER HAPLOID CELL) Size rangeExample speciesEx. Size BACTERIA1-10 Mb E. coli: Mb FUNGI10-40 Mb S. cerevisiae 13 Mb INSECTS.
Haplotype Blocks An Overview A. Polanski Department of Statistics Rice University.
Genomics BIT 220 Chapter 21.
Computational research for medical discovery at Boston College Biology Gabor T. Marth Boston College Department of Biology
Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1.
Fig Chapter 12: Genomics. Genomics: the study of whole-genome structure, organization, and function Structural genomics: the physical genome; whole.
Doug Brutlag 2011 Genomics & Medicine Doug Brutlag Professor Emeritus of Biochemistry &
SNPs Daniel Fernandez Alejandro Quiroz Zárate. A SNP is defined as a single base change in a DNA sequence that occurs in a significant proportion (more.
National Taiwan University Department of Computer Science and Information Engineering Haplotype Inference Yao-Ting Huang Kun-Mao Chao.
Non-Mendelian Genetics
CS177 Lecture 10 SNPs and Human Genetic Variation
MAPPING GENOMES – genetic, physical & cytological maps Genetic distance (in cM) 1 centimorgan = 1 map unit, corresponding to recombination frequency of.
SNPs and the Human Genome Prof. Sorin Istrail. A SNP is a position in a genome at which two or more different bases occur in the population, each with.
Gene Hunting: Linkage and Association
Announcements: Proposal resubmission deadline 4/23 (Thursday).
CATALYST Recall and Review: – What are chromosomes? – What are genes? – What are alleles? How do these terms relate to DNA? How do these terms relate to.
National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department.
10cM - Linkage Mapping Set v2 ABI Median intermarker distance: 4.7 Mb Mean intermarker distance: 5.6 Mb Mean genetic gap distance: 8.9 cM Average Heterozygosity.
Finnish Genome Center Monday, 16 November Genotyping & Haplotyping.
What is a SNP?. Lecture topics What is a SNP? What use are they? SNP discovery SNP genotyping Introduction to Linkage Disequilibrium.
Polymorphism Haixu Tang School of Informatics. Genome variations underlie phenotypic differences cause inherited diseases.
Linkage and Mapping. Figure 4-8 For linked genes, recombinant frequencies are less than 50 percent.
Julia N. Chapman, Alia Kamal, Archith Ramkumar, Owen L. Astrachan Duke University, Genome Revolution Focus, Department of Computer Science Sources
Genomics and Forensics
Chapter 5 The Content of the Genome 5.1 Introduction genome – The complete set of sequences in the genetic material of an organism. –It includes the.
1 DNA Polymorphisms: DNA markers a useful tool in biotechnology Any section of DNA that varies among individuals in a population, “many forms”. Examples.
Lecture 6. Functional Genomics: DNA microarrays and re-sequencing individual genomes by hybridization.
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Molecular markers Non-PCR based 1courtesy of Carol Ritland.
SNPs, Haplotypes, Disease Associations Algorithmic Foundations of Computational Biology II Course 1 Prof. Sorin Istrail.
February 20, 2002 UD, Newark, DE SNPs, Haplotypes, Alleles.
1 Balanced Translocation detected by FISH. 2 Red- Chrom. 5 probe Green- Chrom. 8 probe.
Basic Biology for Bioinformatics: genes as information The central dogma of molecular genetics DNA to RNA to protein to phenotype Protein functions, synthesis.
In The Name of GOD Genetic Polymorphism M.Dianatpour MLD,PHD.
Simple-Sequence Length Polymorphisms SSLPs Short tandemly repeated DNA sequences that are present in variable copy numbers at a given locus. Scattered.
Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs
Notes: Human Genome (Right side page)
Linkage and Mapping Bonus #2 due now. The relationship between genes and traits is often complex Complexities include: Complex relationships between alleles.
The Haplotype Blocks Problems Wu Ling-Yun
Inferences on human demographic history using computational Population Genetic models Gabor T. Marth Department of Biology Boston College Chestnut Hill,
Restriction Fragment Length Polymorphism. Definition The variation in the length of DNA fragments produced by a restriction endonuclease that cuts at.
GENOME ORGANIZATION AS REVEALED BY GENOME MAPPING WHY MAP GENOMES? HOW TO MAP GENOMES?
Simple-Sequence Length Polymorphisms
Of Sea Urchins, Birds and Men
DNA Marker Lecture 10 BY Ms. Shumaila Azam
Recombination (Crossing Over)
Relationship between Genotype and Phenotype
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Sequential Steps in Genome Mapping
Ho Kim School of Public Health Seoul National University
CATALYST Recall and Review: How do these terms relate to DNA?
Presentation transcript:

Genomics An introduction

Aims of genomics I Establishing integrated databases – being far from merely a storage Linking genomic and expressed gene sequences cDNA

Aims of genomics II Describing every gene: function/expression data/relationships/phenotype 3-d structure and features (introns/exons, domains, repeats) similarities to other genes Characterize sequence diversity in population

Genomics can be: Structural –where it is? Functional –what it does? –DNA microarrays: Comparative –finding important fragments

Mapping genomes Past –Genetic maps Distance between simple markers expressed in units of recombination –Cytological maps Stained chromosomes, observable under microscope Present –Physical maps Distance between nucleotides expressed in bases –Comparative map Corresponding genes detection; Regulatory sequence detection;

Genome sizes OrganismDNA length Genes Mycoplasma genitalium 0.5 Mb470 Deinococcus radiodurans 3 Mb in copies! Escherichia coli4.5 Mb4 400 Saccharomyces cerevisiae 12 Mb6 200 Caenorhabditis elegans 97 Mb Drosophila melanogaster 120 Mb Homo sapiens3200 Mb32 000

Genetic differences among humans Goals –Genetic diseases –Identifying criminals Methods –Genetic markers (fingerprints) and DNA sequence. Repeats: Microsatellites (repeats of 1-12 nucleotides) Minisatellites (> 12) –Other types of variation Genome rearrangements Single nucleotide mutations

Microsatellites and disease Huntington’s disease –Huntingtin gene of unknown (!) function –Repeats #: 6-35: normal; : disease Friedrich ataxia disease –GAA repeat in non-coding (intron) region –Repeats #: 7-34: normal; 35 up: disease –Repeat expansion reduces expression of frataxin gene

SNP - Single Nucleotide Polymorphism Definition –SNP and phenotype Occurrence in genome –Rarity of most SNPs (agrees with neutral molecular evolutionary theory) –SNPs in human population: High variance in genome! Detection of SNPs: Hybridization Inter-genic regionsCoding regions Every 1400bpEvery 1430bp

Sickle cell anemia SNP on Beta Globin gene, which is recessive: 2 faulty copies: red blood cells change shape under stress - anemia 1 faulty copy: red blood cells change shape under heavy stress – but gives resistance to malaria parasite Sickle looks like this:

SNPs and haplotypes Passengers and their evolutionary vehicles

SNP - Phase inference In the data from sequencing the genome the origin of SNP is scrambled GTGT... CT AC GT... GAGA... CTGACGGT CTTACAGT CTGACAGT CTTACGGT... chromosome Possibility 1Possibility 2 Which SNPs are on the same chromosome (are in phase)?

SNP – phase inference Determining the parent of origin for each SNP GG In this case: GAGA... CT AC GT... CGCG CTCT A GTGT GAGA TA Phase inference – the reason why many SNPs sequencing is done for child and two parents.

Linkage Disequilibrium, intro How hard is it to break a chromosome An allele/trait/SNP A and a are on the same position in genome (locus), thus on a single chromosome an individual can have either of them – but not both –f A - frequency of occurrences of trait A in population –f a = 1- f A –f B, f b = 1 - f B are frequency occurrences of B and b Probabilities of occurences of both traits on the same chromosome: f AB f Ab f aB f ab LD and genomic recombination AB Ab aB ab

Linkage Disequilibrium, calculation When these alleles are not correlated we expect them to occur together by chance alone: f AB = f A f B f Ab = f A f b f aB = f a f B f ab = f a f b But if A and B are occurring together more often (disequilibrium state), we can write f AB = f A f B + D f Ab = f A f b - D f aB = f a f B - D f ab = f a f b + D where D is called the measure of disequlibrium Of course from definitions above we have D = f AB - f A f B

How can we use it? Phase inference tells us how SNPs are organized on chromosome Linkage disequilibrium measures the correlation between SNPs

Back to SNPs Daly et al (2001), Figure 1

Haplotypes - vehicles for SNPs Daly et al (2001) were able to infer offspring haplotypes largely from parents. They say that “it became evident that the region could be largely decomposed into discrete haplotype blocks, each with a striking lack of diversity“ The haplotype blocks: –Up to 100kb –5 or more SNPs For example, this block shows just two distinct haplotypes accounting for 95% of the observed chromosomes

Haplotypes on the genome fragment a) Observed haplotypes with dotted lines wherever probability of switching to another line is > 2% b) Percent of explanation by haplotypes c) Contribution of specific haplotypes

Another genetic test Does haplotypes exist? -Each row represents an SNP -Blue dot = major -yellow = minor -Each column represents a single chromosome -The 147 SNPs are divided into 18 blocks defined by black lines. -The expanded box on the right is an SNP block of 26 SNPs over 19kb of genomic DNA. The 4 most common of 7 different haplotypes include 80% of the chromosomes, and can be distinguished with 2 SNPs

How much SNPs we can ignore? …and still predict haplotypes with high accuracy?

Literature Gibson, Muse „A Primer of Genome Science” N Patil et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21 Science : M J Daly et al. High-resolution haplotype structure in the human genome Nat. Genet :