Evolution Aristotle: classification of animals theories on change


Evolution. Aristotle: classification of animals; theories on change (change is the actuality of the potential). Darwin: descent with modification; natural selection. There is no evolution without change.

Evolving nomenclature. Change in DNA code = genetic variation. Change with respect to what? Any consequence? Terms in use: Mutations; Single Nucleotide Polymorphisms (SNPs); Deletion/insertion polymorphisms (DIPs); Short Nucleotide Polymorphisms (SNPs); Short Nucleotide Variants (SNVs); Short Genetic Variants.

Definitions
pol·y·mor·phism n. 1. Biology: The occurrence of different forms, stages, or types in individual organisms or in organisms of the same species, independent of sexual variations. 2. Chemistry: Crystallization of a compound in at least two distinct forms. Also called pleomorphism.
var·i·ant adj. 1. Having or exhibiting variation; differing. 2. Tending or liable to vary; variable. 3. Deviating from a standard, usually by only a slight difference. n. Something that differs in form only slightly from something else, as a different spelling or pronunciation of the same word.

Human Genome Project; ENCODE project; HapMap project; SNP consortium. Individual human genomes: James Watson, Craig Venter, three Asian gentlemen.

Evolving SNV analysis needs: from a single SNP to millions of SNPs. How to structure the analysis is based on the same theories; it's a question of scale and heuristics. Finding SNPs in a single gene sequence. Finding SNPs in GWAS studies, exome sequencing, etc.

Calling SNPs in NGS. Polymorphisms are called with respect to a reference genome. Challenging because of alignment errors and variable depth of coverage. Accuracy is essential: diagnostics, risk assessment. False positives and false negatives are both a problem. Given 1% sequencing error, how many high quality reads do we need to call a variant? Quality scores differ per experiment. The tools we use should have prior knowledge of known SNPs and their relevance to our question, i.e. causing disease or not.
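The question of how many reads are needed at a 1% error rate can be explored with a simple binomial model. The sketch below is only an illustration under strong assumptions: every read errs independently with the same probability, all errors are counted as supporting the same alternative base (which is conservative), and alignment errors and per-base quality scores are ignored. The function names and the `alpha` threshold are hypothetical choices, not part of any variant caller.

```python
from math import comb


def prob_at_least_k_errors(depth, k, error_rate):
    """P(X >= k) for X ~ Binomial(depth, error_rate).

    Interpreted here as the chance that k or more reads at one site
    show a non-reference base purely through sequencing error.
    """
    return sum(
        comb(depth, i) * error_rate**i * (1 - error_rate) ** (depth - i)
        for i in range(k, depth + 1)
    )


def min_supporting_reads(depth, error_rate=0.01, alpha=1e-6):
    """Smallest number of variant-supporting reads whose chance of arising
    from error alone is below alpha (assumed threshold).

    Returns None if even 'all reads support the variant' is not convincing
    at this depth.
    """
    for k in range(depth + 1):
        if prob_at_least_k_errors(depth, k, error_rate) < alpha:
            return k
    return None


if __name__ == "__main__":
    for depth in (10, 30, 100):
        print(depth, min_supporting_reads(depth))
```

Because millions of sites are tested at once, a stringent `alpha` is used here; real callers combine this kind of reasoning with base qualities, mapping qualities and prior knowledge of known SNPs.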

Prioritization of SNPs. You have millions. How do you know which are important for your research? First let's look at what SNPs can do…

So you have a SNP. Is it associated with disease? If so, why? Is it to do with protein function or transcriptional regulation or both, or none, or what? If none of the above, then why is it associated with disease? How do you begin to imagine its function?

SNP function prediction (summary)
In coding sequence, protein function: ligand binding affinity; co-factor binding affinity; targeting to a different cellular compartment.
In coding or non-coding sequence, gene processing: transcriptional regulation; translational regulation; splicing.

Assessment of SNP Function (3/14/2018)
Position of SNP: for a dbSNP entry or a new SNP, first identify its location.
In a coding sequence and non-synonymous: Protein Data Bank, PolyPhen, UniProt, PsiPred (secondary structure prediction tool), ProSite, InterPro.
These checks can be done individually, or incorporated into software to scale up for high throughput.
Check the SNP position at dbSNP.
If it is in a coding sequence of a gene, is it a synonymous SNP? If yes, then it is probably affecting something other than protein function; treat it as a SNP in a UTR (below). If it is a non-synonymous SNP, check the amino acid substitution, check conservation of the domain in UniProt and Pfam, and see if there is a 3D structure (at the Protein Data Bank). If yes, conduct analysis in PolyPhen. If no, conduct secondary structure analysis on the domain as defined by UniProt or InterPro.
If it is in a UTR or just upstream, check whether it is on a known regulatory element: a transcription factor binding site (TRANSFAC), a miRNA (miRNA registry), or an alternative transcriptional start site (DBTSS).
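The decision procedure on this slide can be written down as a small function, which is one way to start scaling it up. This is a minimal sketch: the function name, the argument names and the returned step descriptions are hypothetical, and the function only lists which checks to run. It does not query dbSNP, PolyPhen or any other resource.

```python
# Checks suggested on the slide for SNPs in a UTR or just upstream.
REGULATORY_CHECKS = [
    "transcription factor binding site (TRANSFAC)",
    "miRNA (miRNA registry)",
    "alternative transcriptional start sites (DBTSS)",
]


def suggest_checks(location, synonymous=None, has_3d_structure=None):
    """Return the list of follow-up checks for one SNP.

    location: "coding", "utr" or "upstream" (assumed labels).
    synonymous: only used for coding SNPs.
    has_3d_structure: whether a 3D structure exists in the Protein Data Bank.
    """
    if location == "coding":
        if synonymous:
            # Probably not acting through protein function: treat like a UTR SNP.
            return list(REGULATORY_CHECKS)
        steps = [
            "check amino acid substitution",
            "check conservation of domain (UniProt, Pfam)",
        ]
        if has_3d_structure:
            steps.append("conduct analysis in PolyPhen")
        else:
            steps.append("secondary structure analysis of the domain (UniProt/InterPro)")
        return steps
    if location in ("utr", "upstream"):
        return list(REGULATORY_CHECKS)
    raise ValueError(f"unknown location: {location!r}")


if __name__ == "__main__":
    print(suggest_checks("coding", synonymous=False, has_3d_structure=True))
    print(suggest_checks("utr"))
```

Keeping the logic as plain data and branches makes it easy to apply the same rules to thousands of SNPs in a loop.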

Example: AGT & Hyperoxaluria

SNP mutation causes disease: CCA > CTA => Proline > Leucine (P11L). [Figure: side chains of Leu (L) and Pro (P)]
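The codon change on this slide can be checked by translating both codons with the standard genetic code. The sketch below builds the standard codon table and labels a substitution in the usual one-letter form; the function name `describe_substitution` is a hypothetical helper for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def describe_substitution(ref_codon, alt_codon, residue_number):
    """Return e.g. 'P11L' for a non-synonymous change, or 'synonymous'."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"{ref_aa}{residue_number}{alt_aa}"


if __name__ == "__main__":
    # The AGT example: CCA > CTA at residue 11.
    print(describe_substitution("CCA", "CTA", 11))
```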

Two more in AGT: Gly82Glu blocks binding to the cofactor; Gly41Arg disrupts intermonomer interactions. [Figure: side chains of Glu (E), Gly (G) and Arg (R)]

Assessment of SNP Function - I
Position of SNP.
In CDS and non-synonymous: Protein Data Bank, PolyPhen, UniProt, PsiPred, ProSite, InterPro.
Upstream of CDS, or in CDS and synonymous: SignalP, ProSite (rate of processing?), TRANSFAC, DBTSS, NXSensor.
Is it in a regulatory element?

[Figure: gene structure from 5' to 3': promoter with transcription factor binding sites (TFBSs), transcriptional start site (TSS), 5' UTR, translation initiation site (initiation codon ATG), Exon 1, Exon 2]

SNP in a regulatory element (TFBS)
ACAGTCGTAAGGCTGATTGGCTGGATAGCAGTACG
ACAGTCGTAAGGCTAATTGGCTGGATAGCAGTACG
A single nucleotide polymorphism may disrupt TF binding and therefore functionality.
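The single difference between the two sequences on this slide is hard to spot by eye. A short sketch that compares two aligned sequences of equal length and reports each mismatch (1-based position, reference base, alternative base) is given below; `find_mismatches` is a hypothetical helper name, and the sketch assumes the sequences are already aligned with no indels.

```python
def find_mismatches(reference, alternative):
    """List (position, ref_base, alt_base) for every differing position.

    Positions are 1-based. Both sequences must have the same length,
    because no alignment is performed here.
    """
    if len(reference) != len(alternative):
        raise ValueError("sequences must be the same length")
    return [
        (i, ref, alt)
        for i, (ref, alt) in enumerate(zip(reference, alternative), start=1)
        if ref != alt
    ]


if __name__ == "__main__":
    ref = "ACAGTCGTAAGGCTGATTGGCTGGATAGCAGTACG"
    alt = "ACAGTCGTAAGGCTAATTGGCTGGATAGCAGTACG"
    for position, ref_base, alt_base in find_mismatches(ref, alt):
        print(f"{ref_base}>{alt_base} at position {position}")
```

Whether the changed base actually lies in a binding site would still have to be checked against a resource such as TRANSFAC.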

Example: CYP2E1 SNP. [Figure: track from DBTSS, with the TSS and ATG marked]

Nucleosomes

Assessment of SNP Function - II
In non-coding sequence: first, assess conservation. Then TRANSFAC, miRNA registry, RepeatMasker, alternative splicing, HapMap.
Is it in a regulatory element?

Prioritization of SNPs. You have millions. How do you know which are important for your research? How do you (can you?) implement this in a pipeline so you can do thousands at once? How can you come up with strategies to prioritise?

Statistical genetics. If a SNV is present in all members of the family, affected and unaffected, then it is probably innocuous. Some methods are based on how common these variants are in families, i.e. shared ancestral variants, genetic linkage and co-segregation. These need pedigree and haplotype information and are mostly used in GWAS studies. Tools: BEAGLE, GERMLINE, PLINK IBD, MERLIN.
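The family-based reasoning above can be illustrated with a very simple co-segregation filter: keep only variants carried by every affected family member and by no unaffected member. This is a toy sketch, not how BEAGLE, GERMLINE, PLINK or MERLIN work; it ignores haplotype phase, incomplete penetrance and genotyping error, and the function name and the family data are made up for illustration.

```python
def cosegregating_variants(carriers, affected, unaffected):
    """Return variants present in all affected and absent from all unaffected.

    carriers: dict mapping variant id -> set of individuals carrying it.
    affected, unaffected: sets of individual ids.
    """
    kept = []
    for variant, individuals in carriers.items():
        in_all_affected = affected <= individuals
        in_no_unaffected = not (unaffected & individuals)
        if in_all_affected and in_no_unaffected:
            kept.append(variant)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical family: mother and child1 are affected.
    carriers = {
        "var_A": {"father", "mother", "child1", "child2"},  # in everyone
        "var_B": {"mother", "child1"},                      # tracks with disease
        "var_C": {"child1"},                                # missing in mother
    }
    affected = {"mother", "child1"}
    unaffected = {"father", "child2"}
    print(cosegregating_variants(carriers, affected, unaffected))
```

A variant found in every family member, such as `var_A` here, is dropped, which matches the slide's point that such variants are unlikely to explain the disease.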

Several Tools Out There. For example: SeattleSeq; dbNSFP; built into other NGS analysis software. New ideas continue to emerge…

The Plot Thickens…

If you Google directly to dbSNP (10 Nov 2015)

The NCBI homepage: if you go to dbSNP from here

You get this: but no worries, both access the same underlying database.

Combining gene expression and variation. eQTL: expression quantitative trait locus. Correlation between gene expression and frequency of variation. Simple linear regression (Matrix eQTL). Significance is assessed by p-value.
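The regression behind a single eQTL test can be sketched in a few lines: expression of one gene is regressed on genotype coded as the number of alternative alleles (0, 1 or 2). The code below is an illustration in plain Python, not the Matrix eQTL implementation. The function name and the example data are hypothetical, and it stops at the t statistic; a two-sided p-value would come from the Student t distribution with n - 2 degrees of freedom (for instance `2 * scipy.stats.t.sf(abs(t), n - 2)` if SciPy is available).

```python
import math


def simple_eqtl(genotypes, expression):
    """Ordinary least squares of expression on genotype dosage.

    Returns (slope, intercept, t_statistic) for the slope.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 samples")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotype shows no variation")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares and standard error of the slope.
    rss = sum((e - (intercept + slope * g)) ** 2 for g, e in zip(genotypes, expression))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, intercept, t_stat


if __name__ == "__main__":
    # Made-up data: expression rises with each copy of the alternative allele.
    genotypes = [0, 0, 1, 1, 2, 2]
    expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
    slope, intercept, t_stat = simple_eqtl(genotypes, expression)
    print(f"slope={slope:.3f} intercept={intercept:.3f} t={t_stat:.2f}")
```

In a genome-wide setting this test is repeated for every SNP and gene pair, so the resulting p-values need correction for multiple testing.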