Problem Set I review BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD
Tissue Specificity & Top Tissues
Life is a complex orchestration of genes that must be expressed at the right time, place, and level. Basic cellular functions require the expression of certain genes in all cells and tissues (ubiquitous expression), while specialized functions require restricted expression of other genes in a single or small number of cells and tissues (tissue-specific expression).
Tissue Specificity vs Tissues with Most Frequent Expression
Not always the same.
Tissue specificity: tissues expressing the gene above the median value.
OMIM: just lists a few tissues where the gene was found.
Microarray-based expression data: See e.g.
Expressed sequence tag (EST)-based expression data: see Stanford Source, UniGene.
RT-PCR data: literature, commercial software; no good databases.
MTHFR: lymphoma; cardiac muscle (next probe)
MTHFR: Pancreas, liver
Stanford Source: MTHFR: lymph
GeneCards: MTHFR: heart, lung
Stanford Source Calculation: EST example
Clones for a gene were isolated from skeletal muscle (8 unique clones) and cardiac muscle (2 unique clones).
Number of all clones isolated from skeletal muscle: 16000, so frequency is 8/16000 = 0.0005
Number of cardiac muscle clones is 10000, so frequency is 2/10000 = 0.0002
Normalized gene expression is calculated by dividing each frequency by the sum of the frequencies (0.0005 + 0.0002 = 0.0007):
Skeletal muscle = 0.0005 / 0.0007 = 71%
Cardiac muscle = 0.0002 / 0.0007 = 29%
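The EST arithmetic above can be sketched in a few lines of Python. This is a minimal illustration using the clone counts from the slide, not Stanford Source's actual code; the function name `normalized_expression` and the dictionary layout are assumptions made for the example.

```python
def normalized_expression(gene_clones, total_clones):
    """Return each tissue's share of the summed per-tissue clone frequencies."""
    # Frequency = clones for the gene / all clones isolated from that tissue.
    freqs = {tissue: gene_clones[tissue] / total_clones[tissue]
             for tissue in gene_clones}
    # Normalize so the shares across tissues add up to 1.
    total = sum(freqs.values())
    return {tissue: freq / total for tissue, freq in freqs.items()}


# Clone counts taken from the slide.
gene_clones = {"skeletal muscle": 8, "cardiac muscle": 2}
total_clones = {"skeletal muscle": 16000, "cardiac muscle": 10000}

shares = normalized_expression(gene_clones, total_clones)
for tissue, share in shares.items():
    print(f"{tissue}: {share:.0%}")
```

Note that skeletal muscle has four times as many clones for the gene, but its normalized share is only about 2.5 times larger, because more clones were sampled from that tissue overall.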
Tissue-Specificity Calculation
2 unique clones for gene X were isolated from cardiac muscle. Among the clones isolated from cardiac muscle, 9999 genes are represented by only one clone each and one gene (gene X) is represented by 2 clones. With 2 clones against a median of 1, gene X is expressed above the median value in cardiac muscle, so it counts as tissue-specific.
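One way to read this example is through the "above the median value" definition of tissue specificity given on the earlier slide. The sketch below applies that reading to the clone counts from this slide. Whether Stanford Source uses exactly this test is an assumption, and the gene labels (`gene_0`, `gene_X`, ...) and the function name are hypothetical.

```python
from statistics import median


def above_tissue_median(clone_counts, gene):
    """True if the gene has more clones than the median gene in this tissue."""
    return clone_counts[gene] > median(clone_counts.values())


# Cardiac muscle library from the slide: 9999 genes with one clone each,
# plus gene X with two clones (gene labels are placeholders).
cardiac = {f"gene_{i}": 1 for i in range(9999)}
cardiac["gene_X"] = 2

print(above_tissue_median(cardiac, "gene_X"))  # gene X: 2 clones vs median of 1
print(above_tissue_median(cardiac, "gene_0"))  # a single-clone gene
```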
dbSNP queries
SLC19A1[gene] AND human[orgn]
SLC19A1[gene] AND human[orgn] AND snp[snp_class]
SLC19A1[gene] AND human[orgn] AND "coding nonsynonymous"[FUNC]
SLC19A1[gene] AND human[orgn] AND "coding synonymous"[FUNC]
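Queries of this shape are easy to assemble programmatically when the same filters are applied to several genes. The sketch below only builds the query strings; it does not contact NCBI. The helper name `dbsnp_query` is hypothetical, and the field tags (`[gene]`, `[orgn]`, `[snp_class]`, `[FUNC]`) are copied from the slide, so they may not match the field names accepted by the current dbSNP search interface.

```python
def dbsnp_query(gene, organism="human", snp_class=None, func=None):
    """Build a dbSNP search string in the style used on the slide."""
    terms = [f"{gene}[gene]", f"{organism}[orgn]"]
    if snp_class is not None:
        # Quote the value so multi-word classes stay one term.
        terms.append(f'"{snp_class}"[snp_class]')
    if func is not None:
        terms.append(f'"{func}"[FUNC]')
    return " AND ".join(terms)


print(dbsnp_query("SLC19A1"))
print(dbsnp_query("SLC19A1", snp_class="snp"))
print(dbsnp_query("SLC19A1", func="coding nonsynonymous"))
print(dbsnp_query("SLC19A1", func="coding synonymous"))
```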
dbSNP queries
ADRB1[gene] AND human[orgn] = 48
ADRB1[gene] AND human[orgn] AND "snp"[SNP_CLASS] = 40
ADRB1[gene] AND human[orgn] AND "in-del"[snp_class] = 5
ADRB1[gene] AND human[orgn] AND heterozygous[snp_class] = 0
ADRB1[gene] AND human[orgn] AND mixed[snp_class] = 0
ADRB1[gene] AND human[orgn] AND microsatellite[snp_class] = 3
ADRB1[gene] AND human[orgn] AND "multinucleotide polymorphism"[snp_class] = 0
ADRB1[gene] AND human[orgn] AND "named locus"[snp_class] = 0
ADRB1[gene] AND human[orgn] AND "no variation"[snp_class] = 0
ADRB1 SNP summary
48 SNPs in total
40 true SNPs
5 insertion-deletions (in-dels)
3 microsatellites
no other types
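A quick sanity check on a summary like this is to confirm that the per-class counts add up to the total returned by the unfiltered query. The sketch below uses the ADRB1 counts reported on the preceding slides; the variable names are chosen only for this example.

```python
# Per-class hit counts for ADRB1, as reported on the query slides.
class_counts = {
    "snp": 40,
    "in-del": 5,
    "heterozygous": 0,
    "mixed": 0,
    "microsatellite": 3,
    "multinucleotide polymorphism": 0,
    "named locus": 0,
    "no variation": 0,
}
total_hits = 48  # ADRB1[gene] AND human[orgn]

# Every entry should fall into exactly one class.
accounted = sum(class_counts.values())
observed = {name: n for name, n in class_counts.items() if n > 0}

print(f"accounted for {accounted} of {total_hits} entries")
print(observed)
```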
Type of variation
SNP[snp_class]: true single nucleotide polymorphism
In-del: insertion/deletion polymorphism ('-'/'+')
Heterozygous: variation has unknown sequence composition but is observed to be heterozygous
Microsatellite/simple sequence repeat
Named: allele sequences defined by name tag instead of raw sequence, e.g., (Alu)/
No-variation: invariant region in surveyed sequence
Multiple nucleotide polymorphism: all alleles same length, where length > 1
Definitions
Homozygote - has two identical alleles at a particular locus (for a given gene)
Heterozygote - has two different alleles at a particular locus
Hemizygote - has only one copy of a gene instead of the usual pair. Example: a male is hemizygous for genes on the X chromosome
Definitions
Heterozygous genotype - occurs when the two alleles at a particular gene locus are different. A heterozygous genotype may include one normal allele and one mutation, or two different mutations. The latter is called a compound heterozygote.
Heterozygous SNP vs Avg. Heterozygosity
More on dbSNP
An ss number is the unique ID number assigned to each submitted SNP. Once submissions are aligned and processed, they are clustered; each "reference SNP cluster", or "refSNP", is given a unique rs ID number.
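The relationship between ss and rs numbers can be pictured as a grouping step: submissions that map to the same genomic position end up in one cluster. The toy sketch below illustrates only that idea. It is not dbSNP's pipeline; the IDs (`ss_demo_...`, `rs_demo_...`), positions, and function name are all hypothetical, and clustering purely by (chromosome, position) is a simplifying assumption.

```python
from collections import defaultdict


def cluster_submissions(submissions):
    """Group submitted SNPs (ss) that map to the same position into clusters."""
    by_position = defaultdict(list)
    for ss_id, chrom, pos in submissions:
        by_position[(chrom, pos)].append(ss_id)
    # Give each position one toy cluster label, in sorted position order.
    return {
        f"rs_demo_{i}": sorted(ss_ids)
        for i, (_, ss_ids) in enumerate(sorted(by_position.items()), start=1)
    }


# Hypothetical submissions: (ss ID, chromosome, mapped position).
submissions = [
    ("ss_demo_101", "chr10", 1000),
    ("ss_demo_102", "chr10", 1000),  # same site as ss_demo_101
    ("ss_demo_103", "chr10", 2500),
]

clusters = cluster_submissions(submissions)
for rs_id, ss_ids in clusters.items():
    print(rs_id, ss_ids)
```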
Drugs
Some proteins are drug targets.
Example: glimepiride (an antidiabetic) targets KCNJ11 as a blocker (other drugs act as antagonists or agonists)
Some drugs regulate the activity of proteins indirectly.
Diazoxide activates KCNJ11
Glucocorticoid decreases expression of Kcnj11 mRNA
Regulates binding of KCNJ11
Some drugs are associated even more indirectly, through SNPs in proteins that cause drug sensitivities
Haplotypes
Haplotypes are groups of linked SNPs that tend to be inherited together
Haplotype blocks are regions of closely located SNPs that are inherited as blocks
More generally: a set of closely linked genes that tends to be inherited together as a unit. A haplotype may refer to only one locus or to an entire genome
- the HapMap project
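To make the idea concrete, a haplotype can be written as the string of alleles one chromosome carries at a set of linked SNP sites, and haplotype frequencies are then simple counts over chromosomes. The sketch below assumes phased data (the alleles on each chromosome are already known); the six chromosomes and three sites are made-up values for illustration, not HapMap data.

```python
from collections import Counter


def haplotype_frequencies(chromosomes):
    """Return the fraction of chromosomes carrying each haplotype."""
    counts = Counter(chromosomes)
    n = len(chromosomes)
    return {haplotype: count / n for haplotype, count in counts.items()}


# Hypothetical phased chromosomes: one letter per linked SNP site.
chromosomes = ["CGC", "CGC", "TTT", "TTT", "TTT", "CGT"]

freqs = haplotype_frequencies(chromosomes)
for haplotype, freq in sorted(freqs.items(), key=lambda item: -item[1]):
    print(f"{haplotype}: {freq:.2f}")
```

In this toy data only three of the eight possible allele combinations occur, which is the pattern expected when nearby SNPs are inherited together.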
Haplotype block names
Sometimes different for different populations/families.
Still "in progress"
Sometimes linked via dbSNP (haplotype-tagged), available in other variation sites
Haplotype analysis of ABCB1 revealed 2 major haplotypes, ABCB1*1 and ABCB1*13. ABCB1*13 contains T1236, T2677T, T3435, and 3 intronic variants.