Geuvadis WP4: RNA sequencing Progress, Aims and Data Tuuli Lappalainen University of Geneva Geuvadis Analysis Group Meeting, April 16, 2012, Geneva.

1 Geuvadis WP4: RNA sequencing Progress, Aims and Data Tuuli Lappalainen University of Geneva Geuvadis Analysis Group Meeting, April 16, 2012, Geneva

2 Genomics, meet transcriptomics
RNA sequencing of ~500 individuals from the 1000 Genomes: FIN, GBR, TSI, CEU, YRI
  Population   Geuvadis   in 1000G Phase1
  TSI          93         92
  GBR          96         86
  FIN          95         89
  CEU          92         79
  YRI          89         77
  TOTAL        465        423
Integrated haplotypes of SNPs, indels, structural variants of total ~13M variants + mRNAseq + miRNAseq

3 Why are we doing this?
Q: I have all these variants from my sequencing study but I don’t know what’s functional.
A: Here’s a pretty good catalogue of regulatory variants. We can also start to predict functional consequences of novel variants based on their properties.
Q: We might want to do RNAseq on a big scale. What do we get out of it? How should we do it?
A: At least we did lots of cool science. This is how we created the data and analyzed it.
Q: I want to use 1000g data in my research, but is there any functional data available?
A: Yes, this is the largest genome+transcriptome reference dataset thus far. You can use it in your own research (after our paper is out).

4 Samples
1. Transformed lymphoblastoid cell lines from Coriell & UNIGE
2. Cell culture at ECACC: cell pellets for RNA isolation + cell banks for all the partners
3. RNA extracted at UNIGE
4. Sequencing in 7 partner labs
Randomization of the sample processing
[Diagram labels: UU 48 72; ICMB, MPIMG, HMGU, UNIGE, CRG/CNAG/USC, LUMC; 48 72 96 116+168]

5 Sequencing
mRNAseq: 2 x 75bp, minimum of 20M mapping reads per sample; total ~15 billion mapping reads
miRNAseq: 1 x 36bp, minimum of 3M total reads per sample; total ~1 billion mapping reads
All sequencing on HiSeq with the latest TruSeq kits; standardization of the methods as much as possible
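The per-sample minimums above lend themselves to a simple depth check. The following is a minimal sketch, assuming read counts are already available as a mapping from sample ID to count; the function name and the example counts are hypothetical and are not project data.

```python
# Minimum depth thresholds quoted on the slide.
MIN_MRNA_MAPPING_READS = 20_000_000  # mRNAseq: mapping reads per sample
MIN_MIRNA_TOTAL_READS = 3_000_000    # miRNAseq: total reads per sample


def samples_below_threshold(read_counts, minimum):
    """Return the sorted sample IDs whose read count is below `minimum`."""
    return sorted(sample for sample, n in read_counts.items() if n < minimum)


# Hypothetical counts, for illustration only (not Geuvadis data).
example_mrna_counts = {
    "sample_A": 31_500_000,
    "sample_B": 18_200_000,
    "sample_C": 20_000_000,  # exactly at the minimum, so it passes
}

print(samples_below_threshold(example_mrna_counts, MIN_MRNA_MAPPING_READS))
```

The same function can be applied to miRNA counts by passing `MIN_MIRNA_TOTAL_READS` instead.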

6 Progress and timeline
2010: Pilot of 5 samples, 7 labs
2011, 2012 (timeline chart): study design; sample selection; cell line shipments and growing; RNA extraction pilot; RNA extraction; sequencing; mapping; QC; paper submission
Chart labels: Thomas / Tuuli; 10/1212

7 Documentation: wiki
http://www.geuvadis.org/group/geuvadis/wikis
Tech support from Gabrielle (gabrielle.bertier@crg.eu)
Contents of the WP4 Wiki
  Analysis: analysis results, methods, etc
  Data storage: locations and descriptions of data files found in EBI (ENA/Arrayexpress or FTP site). The Wiki is only for sharing small result files, not actual data
  Partners and contact info: WP4 participants
  Protocols: protocols, from cells to fastq files
  Samples: information of the samples included in the project, including sample lists for sequencing
  Teleconference minutes
  Presentations: presentation slides and abstracts
Documentation of any analysis that is used by the consortium is obligatory

8 Data storage: ftp
ftp:ftp-private.ebi.ac.uk/upload/geuvadis/wp4_rnaseq/main_project/
Tech support from Natalja (natalja@ebi.ac.uk)

9 Status of the data: mRNA
Fastqs
  All filtered, uploaded to ftp, sample information sheets sorted out, checksums OK
  464 samples in total (1 failed sequencing QC)
Mapping
  bwa (Tuuli/Ismael): all done and uploaded to the ftp site
  GEM (Micha/Thasso/Paolo): GEM files are done. Bam conversion coming
Quantifications
  Exon quantifications
    bwa: all done and uploaded to the ftp site
    GEM, deconvoluted from flux: ready to upload?
    read counts: once the bams are done
  Transcript quantifications from flux: ready to upload?
QC and normalization
  No sample swaps. 5 samples that show signs of cross-contamination.
  Expression outliers: soon
  QTL analysis needs normalization to remove technical variation
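The "checksums OK" step can be illustrated with a short verification routine. This is only a sketch: the slides do not say which checksum algorithm was used, so MD5 is an assumption here, and the function names and the temporary demo file are hypothetical.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_md5):
    """Return True if the file's MD5 matches the expected hex digest."""
    return file_md5(path) == expected_md5.strip().lower()


# Demo on a throwaway file (assumed content, not a real fastq).
fd, demo_path = tempfile.mkstemp(suffix=".fastq")
os.write(fd, b"@read1\nACGT\n+\nIIII\n")
os.close(fd)
expected = file_md5(demo_path)  # stands in for the digest recorded at upload
print(verify_checksum(demo_path, expected))
os.remove(demo_path)
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte fastq files.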

10 mRNA quality statistics

13 mRNA quality statistics: replicates
                Estimate    Std.Error   z value    p
  (Intercept)    4.25749    0.01379     308.706    <10^-16
  HG00355        0.27618    0.01271      21.729    <10^-16
  NA06986       -0.65074    0.01041     -62.518    <10^-16
  NA19095       -0.22808    0.01125     -20.265    <10^-16
  NA20527        0.22239    0.01253      17.752    <10^-16
  lab1_2        -0.19326    0.01556     -12.417    <10^-16
  lab2          -0.22091    0.01547     -14.279    <10^-16
  lab3          -1.17157    0.01329     -88.144    <10^-16
  lab4          -0.34313    0.01509     -22.745    <10^-16
  lab5           0.02166    0.01635       1.325    0.185
  lab6          -0.09454    0.01591      -5.942    10^-9
  lab7           0.26147    0.0174       15.027    <10^-16
reference: HG00117, lab1_batch1
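The z value and p columns in the table are consistent with a Wald test: z is the estimate divided by its standard error, and p is the two-sided tail probability under a standard normal. The sketch below assumes that interpretation (the slide does not name the model or test) and uses a hypothetical helper name.

```python
import math


def wald_z_and_p(estimate, std_error):
    """Return (z, two-sided p) for a coefficient under a standard normal reference."""
    z = estimate / std_error
    # For a standard normal, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# lab5 row from the table: Estimate 0.02166, Std.Error 0.01635.
z, p = wald_z_and_p(0.02166, 0.01635)
print(f"lab5: z = {z:.3f}, p = {p:.3f}")
```

For the lab5 row this gives z of about 1.325 and p of about 0.185, matching the table; lab5 is the only lab term that is not clearly different from the reference.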

14 mRNA quality statistics: all full-coverage samples

15 Status of the data: miRNA
Fastqs: all except 48 from Kiel uploaded to ftp, sample information sheets sorted out, checksums OK
Processing of the data ongoing (Marc F): trimming, mapping, QC

16 Status of the data: genotypes
422 individuals from 1000g Phase 1 are OK: genotypes in the final format uploaded to the ftp site
Imputation of the Phase 2 individuals: issues either with the input haplotypes from 1000g or filtering of the reference panel…
Annotation of the variants: most of the information from 1000g Functional Interpretation Group + additional info by Tuuli and Manny; will be included in the vcf files, format customized from VAT and documented in the wiki
Example VA field, shown as value:FieldName
  VA=1:AlleleNumber
  C1orf159:GeneName
  ENSG00000131591.12:GeneID
  -:Strand
  nonsynonymous:Type
  2/8:FractionOfTranscriptsAffected
  C1orf159-201:TranscriptName
  ENST00000294576.5:TranscriptID
  23468_23597:ExonStartPosGenomic_ExonEndPosGenomic
  3/7:ExonNumber/TotalExonNumberInTranscript
  1035_944_315_R->Q_1035:TranscriptLength_PositionOfVariantInTranscript_PositionOfAminoAcidInPeptide_AminoAcidChange_AltAlleleTranscriptLength
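The annotated example can be turned into a small parser. This is a sketch under stated assumptions: it assumes the values are joined by colons in the order shown on the slide, with the last value split on underscores, and that each entry describes a single transcript. The joined example string and the function name are reconstructions for illustration; the authoritative format is the one documented in the wiki.

```python
# Field order as annotated on the slide.
FIELD_NAMES = [
    "AlleleNumber", "GeneName", "GeneID", "Strand", "Type",
    "FractionOfTranscriptsAffected", "TranscriptName", "TranscriptID",
    "ExonStartPosGenomic_ExonEndPosGenomic",
    "ExonNumber/TotalExonNumberInTranscript",
]
# Sub-fields of the final, underscore-separated value.
DETAIL_NAMES = [
    "TranscriptLength", "PositionOfVariantInTranscript",
    "PositionOfAminoAcidInPeptide", "AminoAcidChange",
    "AltAlleleTranscriptLength",
]


def parse_va(value):
    """Parse one single-transcript VA entry into a dict of named fields."""
    if value.startswith("VA="):
        value = value[3:]
    parts = value.split(":")
    if len(parts) != len(FIELD_NAMES) + 1:
        raise ValueError(f"expected {len(FIELD_NAMES) + 1} fields, got {len(parts)}")
    record = dict(zip(FIELD_NAMES, parts[:-1]))
    details = parts[-1].split("_")
    if len(details) != len(DETAIL_NAMES):
        raise ValueError(f"expected {len(DETAIL_NAMES)} detail values, got {len(details)}")
    record.update(zip(DETAIL_NAMES, details))
    return record


# Assumed joined form of the slide's example values.
EXAMPLE_VA = ("VA=1:C1orf159:ENSG00000131591.12:-:nonsynonymous:2/8:"
              "C1orf159-201:ENST00000294576.5:23468_23597:3/7:"
              "1035_944_315_R->Q_1035")

record = parse_va(EXAMPLE_VA)
print(record["GeneName"], record["Type"], record["AminoAcidChange"])
```

Entries covering several transcripts or alleles would need an extra splitting step that this sketch does not attempt.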

17 (Some of the) questions that we should address
1. How to do transcriptomics on a big scale?
   technical covariates, batch effects, replicates
   low-level data processing
2. SNP calling from RNAseq data
3. How does the transcriptome vary and interact?
   quantitative/qualitative mRNA variation
   population variation of miRNAs
   interactions (mRNA-miRNA), coexpression networks
4. Catalogue of genetic variants in 1000g that affect transcriptome variation
   common eQTLs, sQTLs, variation QTLs, loss of function variants…
5. What are the mechanisms underlying regulatory variants?
   Functional annotation of regulatory variants
   Mapping of causal regulatory variants
6. Interpretation: population and evolutionary genetic analysis, disease aspects…

18 The consortium
UNIGE (Geneva): Manolis Dermitzakis, Stylianos Antonarakis, Tuuli Lappalainen, Thomas Giger, Emilie Falconnet, Luciana Romano, Alexandra Planchon, Ismael Padioleau, Alisa Yurovsky
CRG/CNAG/USC (Barcelona): Xavier Estivill, Ivo Gut, Roderic Guigo, Angel Carracedo Alvarez, Gabrielle Bertier, Micha Sammeth, Thasso Griber, Paolo Ribeca, Pedro Ferreira, Jean Monlong, Esther Lizano, Marc Friedländer, Marta Gut, Sergi Bertran Agullo
ICMB (Kiel): Stefan Schreiber, Philip Rosenstiel, Matthias Barann
MPIMG (Berlin): Hans Lehrach, Ralf Sudbrak, Marc Sultan, Vyacheslav Amstislavskiy
LUMC (Leiden): Gert-Jan van Ommen, Peter ‘t Hoen, Irina Pulyakhina
UU (Uppsala): Ann-Christine Syvänen, Olof Karlberg, Jonas Almlöf, Mathias Brännvall
HMGU (Munich): Thomas Meitinger, Tim Strom, Thomas Wieland, Thomas Schwarzmayr
EBI: Alvis Brazma, Natalja Kurbatova
Oxford University: Manuel Rivas
Massachusetts General Hospital: Daniel McArthur
ECACC: Bryan Bolton, Karen Ball, Edward Burnett, Jim Cooper
Who is missing??

