BIF-30806 Group Project Group (A)rabidopsis: David Nieuwenhuijse Matthew Price Qianqian Zhang Thijs Slijkhuis.

Presentation transcript:

BIF Group Project Group (A)rabidopsis: David Nieuwenhuijse Matthew Price Qianqian Zhang Thijs Slijkhuis

Species: Caenorhabditis elegans
Nematode worm
Genome of ~100 Mbp (completed 2002)
~20,000 genes

Project choice: Advanced Project
Investigation of differences in gene expression over multiple conditions

Project Overview
Dataset Preparation; Transcriptome Construction Pipeline
Differentially Expressed Genes: Gene Function; Biological Explanation
Co-expressed Genes Modules: Functional Description & Explanation; Module Conservation between species
Gene Expression (Basic Project): Relationship to Transcript Properties; Visualisation of Interaction Network

Datasets to use
We will use four different conditions, corresponding to four different life stages of the organism (L2, L3, L4 & YA).
For each life stage, there are 2-3 datasets (runs) of transcript reads, available on the NCBI SRA online database.
A reference genome is also required.

Dataset preparation
.sra files are first converted to .fastq files via fastq-dump.
.fastq run files are merged together to create a single .fastq file per life stage, via a command-line script (cat).
Reference genome selected from the Ensembl database, after a reference genome from WormBase failed to work.
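The conversion and merge steps can be sketched in Python as follows. This is a minimal illustration rather than the group's actual script: the file names and output directory are hypothetical, the `fastq-dump` command is only built (not executed), and `merge_fastq` simply reproduces what `cat run1.fastq run2.fastq > merged.fastq` does.

```python
import shutil


def fastq_dump_command(sra_file, out_dir):
    """Build (but do not run) a fastq-dump call for one .sra file.

    Only the --outdir option is used here; paths are hypothetical.
    """
    return ["fastq-dump", "--outdir", str(out_dir), str(sra_file)]


def merge_fastq(run_files, merged_path):
    """Concatenate FASTQ run files byte for byte, like `cat a b > merged`."""
    with open(merged_path, "wb") as out:
        for run_file in run_files:
            with open(run_file, "rb") as src:
                shutil.copyfileobj(src, out)
    return merged_path


if __name__ == "__main__":
    # Hypothetical run names for a single life stage (L2).
    print(" ".join(fastq_dump_command("L2_run1.sra", "fastq")))
```

Plain concatenation is valid for FASTQ because each read is a self-contained four-line record, so the merged file is just the records of all runs in sequence.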

Pipeline Overview (flow diagram)
Inputs: Transcript reads (.fastq file); Reference genome (.gtf file)
Programs: TopHat, CuffLinks, CuffMerge, CuffDiff
Products: Reads splice-aligned to genome; Reconstructed transcriptome; Transcriptome quantified (4 files); Merged transcriptome file; Differential gene expression
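As a rough sketch of how the four programs chain together, the Python snippet below builds one command line per step. It assumes the usual TopHat, then Cufflinks, then Cuffmerge, then Cuffdiff order, which the slide does not spell out. All file and directory names (the Bowtie index, `assemblies.txt`, the output folders) are hypothetical, and the commands are only printed, not run.

```python
STAGES = ["L2", "L3", "L4", "YA"]  # life stages named on the slides


def tuxedo_commands(stages, bowtie_index, annotation_gtf, genome_fasta):
    """Return the command lines for a TopHat/Cufflinks/Cuffmerge/Cuffdiff run."""
    commands = []
    for stage in stages:
        # Splice-aware alignment of the merged reads for this life stage.
        commands.append(["tophat", "-o", f"tophat_{stage}",
                         "-G", annotation_gtf, bowtie_index, f"{stage}.fastq"])
        # Transcript assembly from the alignments of this life stage.
        commands.append(["cufflinks", "-o", f"cufflinks_{stage}",
                         f"tophat_{stage}/accepted_hits.bam"])
    # assemblies.txt (hypothetical name) lists each cufflinks_*/transcripts.gtf.
    commands.append(["cuffmerge", "-g", annotation_gtf, "-s", genome_fasta,
                     "-o", "merged_asm", "assemblies.txt"])
    # Differential expression across all stages against the merged assembly.
    bams = [f"tophat_{stage}/accepted_hits.bam" for stage in stages]
    commands.append(["cuffdiff", "-o", "diff_out", "-L", ",".join(stages),
                     "merged_asm/merged.gtf", *bams])
    return commands


if __name__ == "__main__":
    for cmd in tuxedo_commands(STAGES, "celegans_index", "genes.gtf", "genome.fa"):
        print(" ".join(cmd))
```

With four life stages this yields ten commands: four alignments, four assemblies, one merge, and one differential expression test.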

Project Task Delegation
Differentially Expressed Genes:
(M) Determine most differentially expressed genes, and (M) Visualisation of these genes ~Qianqian
(M) Link these genes to the NCBI database to determine gene function ~David
(M) Biological explanation of differential gene expression across the different conditions
Co-expressed Genes Modules:
(S) Find modules of co-expressed genes using WGCNA ~Thijs
(C) Visualisation of these genes in Cytoscape
(S) Functional description and explanation of the identified modules
(S) Conservation of modules in a closely related species
Gene Expression (Basic Project):
(S) Determine most highly expressed genes, for all 4 conditions, and (C) Any correlation between gene expression and transcript properties ~Matthew
(W) Visualisation of these genes in an interaction network
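One possible way to approach the first task, picking out the most differentially expressed genes, is sketched below. It assumes the input follows the tab-separated column layout of Cuffdiff's `gene_exp.diff` file. The toy table, the gene names, and the ranking rule (largest absolute log2 fold change among significant genes) are illustrative assumptions, not the group's method.

```python
import csv
import io
import math

HEADER = ["test_id", "gene_id", "gene", "locus", "sample_1", "sample_2",
          "status", "value_1", "value_2", "log2(fold_change)",
          "test_stat", "p_value", "q_value", "significant"]

# Toy rows with made-up genes and values, for illustration only.
ROWS = [
    ["g1", "g1", "geneA", "I:1-100", "L2", "L3", "OK", "10", "40", "2.0", "3.1", "0.001", "0.01", "yes"],
    ["g2", "g2", "geneB", "I:200-300", "L2", "L3", "OK", "8", "4", "-1.0", "-1.2", "0.2", "0.4", "no"],
    ["g3", "g3", "geneC", "II:1-100", "L2", "L3", "OK", "1", "32", "5.0", "4.5", "0.0001", "0.002", "yes"],
    ["g4", "g4", "geneD", "II:200-300", "L2", "L3", "NOTEST", "0", "0", "0", "0", "1", "1", "no"],
    ["g5", "g5", "geneE", "III:1-100", "L2", "L3", "OK", "0", "5", "inf", "0", "0.001", "0.01", "yes"],
]
TOY_DIFF = "\n".join("\t".join(row) for row in [HEADER] + ROWS) + "\n"


def top_differential_genes(diff_text, n=10):
    """Rank significant genes by absolute log2 fold change, largest first."""
    reader = csv.DictReader(io.StringIO(diff_text), delimiter="\t")
    hits = []
    for row in reader:
        if row["status"] != "OK" or row["significant"] != "yes":
            continue
        lfc = float(row["log2(fold_change)"])
        # Skip infinite or undefined fold changes (zero expression in one sample).
        if math.isinf(lfc) or math.isnan(lfc):
            continue
        hits.append((row["gene"], row["sample_1"], row["sample_2"], lfc))
    hits.sort(key=lambda hit: abs(hit[3]), reverse=True)
    return hits[:n]


if __name__ == "__main__":
    for gene, s1, s2, lfc in top_differential_genes(TOY_DIFF, n=2):
        print(gene, s1, s2, lfc)
```

Genes with an infinite fold change are dropped here for simplicity; whether they should instead be ranked first is a judgment call that depends on the biological question.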

Problem Management
Problem: Overloaded server. Solution: Run overnight.
Problem: Online database/software unavailable. Solution: Wait; good time management.
Problem: Online queries too large (overloading APIs). Solution: Download database and run queries locally.
Problem: Bad time management. Solution: Good time management.

Data Validation
Run the pipeline on another closely related organism for comparable results?
Do the biological explanations of the gene expression make sense in light of the conditional contexts?