Supervisor: Yihong Jennifer Tan Eric Gähwiler Karim Hamidi

Presentation transcript:

Identification of Auto-Immune disease associated Intergenic Long noncoding RNAs
Supervisor: Yihong Jennifer Tan
Eric Gähwiler, Karim Hamidi, Virginie Ricci

Plan
Introduction
  LincRNAs: identification, conservation and functions
  Project interests
  Datasets
Reminder of our last presentation
New project goals
Tools and methods
  Data manipulations
  Correlation test
  Multiple test correction
Results
Conclusions
Prospects
Questions

LincRNA Identification
Long intergenic non-coding RNAs
  > 200 base pairs
  Not coding for proteins
  No apparent open reading frame
Similarities with mRNAs:
  Cap, polyA tail, splice junctions
  Transcribed by Pol II
Differences from mRNAs:
  Expressed at lower levels
  More tissue-specific
  Many are found in the nucleus, although some are found in the cytoplasm

lincRNA conservation and functions
Some lincRNAs are conserved across species
Examples of lincRNA functions
Question: does conservation mean that their expression is also conserved in particular tissues?

Project interests
Human genome completely sequenced in 2003
Use genome sequencing data to understand human biology
Identify links between lincRNAs and various human phenotypes
  lincRNAs and disease traits

Dataset – LincRNAs & Genotype
LCLs (lymphoblastoid cell lines) of 373 European individuals from the Geuvadis dataset
Expression levels of lincRNAs (Gencode)
  RNA sequencing, measured in RPKM
Genotypes of the individuals
  SNP sequencing, e.g. C/C, C/T, T/T

Reminder
Establish a correlation between the expression of lincRNAs and genetic variants recently linked to obesity and BMI (cis-eQTL analysis)
Wrong tissues used to study BMI traits
Note: add the plot where there is no correlation

New goals
Determine whether long intergenic noncoding RNAs play a functional role in autoimmune traits and diseases
Establish a correlation between lincRNA expression levels and genetic variants associated with immune traits (cis-eQTL analysis)

Dataset - SNPs
SNPs associated with autoimmune traits
NIH: In genetic epidemiology, a genome-wide association study (GWA study, or GWAS), also known as whole genome association study (WGA study, or WGAS) or common-variant association study (CVAS), is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait. GWAS typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major diseases. These studies normally compare the DNA of two groups of participants: people with the disease (cases) and similar people without (controls).

Dataset
Autoimmune traits:
  Crohn's disease
  Hypothyroidism
  Multiple sclerosis
  Psoriatic arthritis
  Rheumatoid arthritis
  Systemic lupus erythematosus and systemic sclerosis
  Type 1 diabetes
Only SNPs associated with the traits at a p-value < 5 × 10^-8
579 SNPs associated with immune traits
Notes: explain each disease and show some pictures.
Systemic sclerosis (SSc) is a systemic connective tissue disease. Its characteristics include essential vasomotor disturbances; fibrosis; subsequent atrophy of the skin, subcutaneous tissue, muscles, and internal organs (e.g. alimentary tract, lungs, heart, kidney, CNS); and accompanying immunologic disturbances.
Multiple sclerosis (MS), also known as disseminated sclerosis or encephalomyelitis disseminata, is an inflammatory disease in which the insulating covers of nerve cells in the brain and spinal cord are damaged. This damage disrupts the ability of parts of the nervous system to communicate, resulting in a wide range of signs and symptoms, including physical, mental, and sometimes psychiatric problems. MS takes several forms, with new symptoms either occurring in isolated attacks (relapsing forms) or building up over time (progressive forms). Between attacks, symptoms may disappear completely; however, permanent neurological problems often occur, especially as the disease advances.
Rheumatoid arthritis (RA) is a chronic, systemic inflammatory disorder that primarily affects joints. It may result in deformed and painful joints, which can lead to loss of function. The disease may also have signs and symptoms in organs other than joints.

Methodology
Data collection and manipulation
Correlation tests between lincRNA expression levels and genotypes of autoimmune-disease SNPs (cis-eQTL)
Randomized multiple correlation test
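
As a minimal sketch of the correlation step, the genotypes can be turned into allele counts and correlated with expression. The slides do not say how genotypes were encoded, so the 0/1/2 dosage coding, the function names, and the toy values below are assumptions for illustration only (the project itself used R).

```python
from math import sqrt


def encode_genotype(genotype, counted_allele):
    """Count copies of counted_allele in a genotype string such as 'C/T'."""
    return genotype.split("/").count(counted_allele)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy / sqrt(sxx * syy)


# Toy data (made up): one SNP and one lincRNA measured in six individuals.
genotypes = ["C/C", "C/T", "T/T", "C/T", "C/C", "T/T"]
rpkm = [1.2, 2.1, 3.3, 1.9, 0.8, 2.9]

dosage = [encode_genotype(g, "T") for g in genotypes]
r = pearson(dosage, rpkm)
print(dosage, round(r, 3))
```

In the real analysis the same calculation would be repeated for every lincRNA:SNP pair, with 373 values per vector instead of six.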

Methodology
LincRNA locations (7256) + SNP locations (579)
  lincRNAs close to the SNPs (2409 pairs)
Genotypes of the SNPs (402) + lincRNA expression levels (467)
  Pearson's correlation test
  Multiple test correction
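
The pairing of each SNP with nearby lincRNAs could look roughly like the sketch below. The slides do not state how "close" was defined, so the 1 Mb window, the identifiers, and the coordinates are all hypothetical.

```python
def pair_cis(snps, lincrnas, window):
    """Return (snp_id, gene_id) pairs where the SNP lies within `window`
    base pairs of the lincRNA on the same chromosome."""
    pairs = []
    for snp_id, snp_chrom, snp_pos in snps:
        for gene_id, chrom, start, end in lincrnas:
            if chrom != snp_chrom:
                continue
            if start - window <= snp_pos <= end + window:
                pairs.append((snp_id, gene_id))
    return pairs


# Hypothetical records: (id, chromosome, position) and (id, chromosome, start, end).
snps = [
    ("snpA", "1", 1_500_000),
    ("snpB", "1", 9_000_000),
    ("snpC", "14", 2_000_000),
]
lincrnas = [
    ("lincX", "1", 2_000_000, 2_010_000),
    ("lincY", "14", 2_500_000, 2_520_000),
]

# Assumed window size; the actual distance cutoff is not given in the slides.
cis_pairs = pair_cis(snps, lincrnas, window=1_000_000)
print(cis_pairs)
```

The nested loop is fine for a few hundred SNPs and a few thousand lincRNAs; a sorted index per chromosome would only matter at much larger scale.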

Multiple Correlation Tests
Multiple tests: many genotypes ~ many expression levels (373 values per gene)
This corresponds to running one correlation test for each pair of expression level and genotype
Multiple testing problem: each individual correlation test has an α error of 0.05
False Discovery Rate (FDR)
Notes: the α error is the probability of rejecting H0 when H0 is true. If we test 1000 true H0s at α = 0.05, we expect about 1000 × 0.05 = 50 false positives. The errors accumulate across tests, so the error rate for the "global" H0 is far above 0.05.
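
The accumulation of α errors can be made concrete with a short calculation. This sketch assumes the tests are independent and that every H0 is true, which is a simplification; the function names are illustrative.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of rejections when every null hypothesis is true."""
    return n_tests * alpha


def family_wise_error_rate(n_tests, alpha):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


alpha = 0.05
for n_tests in (1, 10, 1000, 2409):  # 2409 is the number of lincRNA:SNP pairs
    print(
        n_tests,
        expected_false_positives(n_tests, alpha),
        family_wise_error_rate(n_tests, alpha),
    )
```

With a single test the family-wise error rate equals α, but it approaches 1 quickly as the number of tests grows, which is why the observed correlations need a corrected threshold.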

Multiple Test Correction
1) For each lincRNA:SNP pair:
   Randomize the 373 lincRNA expression values 1000 times
   Evaluate 1000 correlation tests with the permuted data
   Store the maximum permuted correlation value
2) Obtain the 95% quantile of the permuted correlation values (5% FDR)
3) Compare the observed correlations with the 5% FDR threshold, and accept an observed correlation as significant only if it passes this threshold
The false discovery rate (FDR) is designed to control the proportion of false positives among the set of rejected hypotheses.
Notes: we may not need to talk about the FDR, because we do not compute it directly (?)
1) We ran 1000 correlation tests with permuted data to find out whether the observed correlation values are significant.
2) We took the 95% quantile of the permuted values to use as a threshold in the last step.
3) We used the 95% quantile of the permuted values as the threshold, keeping only the observed values that exceed 95% of the permuted distribution (?)
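
The three steps above can be sketched as follows. This is one possible reading of the slide, not the authors' code: it assumes absolute correlations are compared (so negative correlations can pass), uses a simple empirical quantile, and runs on made-up data with fewer permutations to stay fast.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def max_permuted_r(genotype, expression, n_perm, rng):
    """Step 1: shuffle expression n_perm times, keep the largest |r|."""
    shuffled = list(expression)
    best = 0.0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        best = max(best, abs(pearson(genotype, shuffled)))
    return best


def empirical_quantile(values, q):
    """Simple empirical quantile: the value at rank floor(q * n)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]


def permutation_threshold(pairs, n_perm=1000, q=0.95, seed=0):
    """Step 2: quantile q of the per-pair maximum permuted correlations."""
    rng = random.Random(seed)
    maxima = [max_permuted_r(g, e, n_perm, rng) for g, e in pairs]
    return empirical_quantile(maxima, q)


# Toy data (made up): 20 pairs, 30 individuals each, no true association.
data_rng = random.Random(1)
pairs = [
    ([data_rng.choice([0, 1, 2]) for _ in range(30)],
     [data_rng.random() for _ in range(30)])
    for _ in range(20)
]
threshold = permutation_threshold(pairs, n_perm=100)

# Step 3: keep only pairs whose observed |r| exceeds the threshold.
significant = [i for i, (g, e) in enumerate(pairs)
               if abs(pearson(g, e)) > threshold]
print(threshold, significant)
```

Taking the maximum over permutations makes the threshold conservative: an observed correlation has to beat the most extreme values that chance alone produced.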

Results
Gene name: ENSG00000224950
Chromosome: 1
SNP name: rs2300747
Correlation coefficient: 0.210
Associated disease: Multiple sclerosis
Corrected p-value: 0.079

Results
Gene name: ENSG00000224950
Chromosome: 1
SNP name: rs1335532
Correlation coefficient: 0.210
Associated disease: Multiple sclerosis
Corrected p-value: 0.079

Visualization
lincRNA (ENSG00000224950) with SNPs rs1335532 and rs2300747
Image source: http://www.carefecthomecareservices.com/blog/multiple-sclerosis-definition-causes-types-symptoms/

Results
Gene name: ENSG00000258701
Chromosome: 14
SNP name: rs2841277
Correlation coefficient: -0.220 (negative correlation)
Associated disease: Rheumatoid arthritis
Corrected p-value: 0.055

Visualization
Visualization tool: lincRNA (ENSG00000258701) with SNP rs2841277 (rheumatoid arthritis)
Note: is it always necessary?
Image source: http://fr.wikipedia.org/wiki/Polyarthrite_rhumato%C3%AFde#/media/File:Rheumatoid_Arthritis.JPG

Conclusions
No correlation at FDR < 5%
Found 2 lincRNAs whose expression levels are correlated with SNPs associated with multiple sclerosis and rheumatoid arthritis at FDR < 10%
Notes: at an FDR of 10% we do not have a clear correlation, but the result suggests there may be something worth analysing further.

Prospects
Use other datasets to see whether the same results can be reproduced
  Possibly in the same or different tissues (e.g. neuronal tissues, skin)
Further analyse the characteristics and functions of the lincRNAs
  Whether the lincRNAs are implicated in the respective diseases
    Multiple sclerosis
    Rheumatoid arthritis
  Roles of lincRNAs

Feedback
Difficulties
  Keeping a global view of the project
  Data manipulation
  Finding an error among many lines of code
Learnings
  LincRNAs
  R programming
  Methodologies used in a study

Thank you for your attention
Questions?