Working with gene lists: Finding data using GEO & BioMart June 5, 2014.

Analyzing a gene list
- With hundreds of genes but a limited budget and few lab personnel, you need to narrow the gene list down to candidate genes for follow-up
- Pick ones that are "interesting":
  - Known to be involved in other, related processes but not (yet) in your process of interest
  - Have protein features that suggest a function in your process, but have not been characterized
  - No known function or domain, but show up in other, related high-throughput experiments, suggesting a key role in your process of interest

Our approach
Analyzing gene lists by:
1. Finding overlap with other high-throughput experiments
2. Finding additional information using BioMart
   1. Mouse/human homologs
   2. Protein domain content
   3. GO classification
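Step 1 above (finding overlap with another high-throughput experiment) comes down to a set intersection once both gene lists use the same kind of identifier. The following is a minimal sketch under that assumption; the function names are made up for illustration, and every gene ID except C04F5.7 is a placeholder rather than data from the slides.

```python
def normalize(ids):
    """Strip whitespace and uppercase IDs so formatting differences do not hide matches."""
    return {i.strip().upper() for i in ids if i.strip()}


def overlap(my_genes, other_genes):
    """Compare two gene lists that are assumed to use the same ID type."""
    mine, other = normalize(my_genes), normalize(other_genes)
    shared = mine & other
    return {
        "shared": sorted(shared),
        "only_mine": sorted(mine - other),
        # Fraction of my list that also appears in the other experiment.
        "fraction_of_mine": len(shared) / len(mine) if mine else 0.0,
    }


# Placeholder IDs for illustration only (C04F5.7 is the one ID used in the slides).
my_list = ["C04F5.7", "F35E12.5", "K08D8.5 "]
published = ["c04f5.7", "T24B8.5", "F35E12.5"]

result = overlap(my_list, published)
print(result["shared"])
print(result["only_mine"])
```

If the two lists use different ID types (for example WormBase IDs versus sequence names), they would need to be converted to a common type first, for instance with the BioMart ID converter.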

GEO (Gene Expression Omnibus)
- GEO DataSets
  - Curated gene expression datasets; because they are curated, there is a backlog of experiments that haven't made it into the database yet
  - Can search for experiments and run differential gene expression queries on some datasets
  - Can download datasets and do offline analyses
- GEO Profiles
  - Profiles of expression data for individual genes

Why search GEO?
- What other experiments have been done that are similar to yours?
  - GEO DataSets
- How do my genes of interest behave in other large-scale experiments?
  - GEO Profiles

GEO Profile search
- Search on a gene name (e.g. C04F5.7)

GEO Dataset search “C. elegans”: 4434

GEO Dataset searches QueryTotal datasets C. elegans datasets C. elegans C. elegans AND response C. elegans AND host response55 C. elegans AND immune2420 C. elegans AND antimicrobial10994

Once a dataset is identified
- Download the data
  - SOFT format: tab-delimited data
- Issues:
  - The data are not necessarily processed into experiment/control ratios
  - If starting from raw data, you may not be able to replicate exactly what the authors did, or you may lack the expertise or software to generate a list of differentially expressed (DE) genes
- Look for supplementary data from the publication
  - Authors usually provide a list of all DE genes
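To illustrate the first issue, here is a minimal sketch of reading the tab-delimited table from a SOFT file and computing experiment/control ratios yourself. It assumes that metadata lines start with `^`, `!` or `#`, that the first remaining line is the column header, and that the values are untransformed intensities (many GEO submissions are already log-transformed, in which case a ratio like this would be wrong). The file contents, sample names and values below are invented for the example.

```python
import csv
import io
import math


def read_soft_table(handle):
    """Return (header, rows) from SOFT text, skipping blank and metadata lines."""
    data_lines = (line for line in handle if line.strip() and line[0] not in "^!#")
    reader = csv.reader(data_lines, delimiter="\t")
    header = next(reader)
    return header, list(reader)


def log2_ratios(header, rows, exp_col, ctrl_col):
    """Compute log2(experiment/control) per row, keyed by the first column."""
    e, c = header.index(exp_col), header.index(ctrl_col)
    ratios = {}
    for row in rows:
        try:
            exp, ctrl = float(row[e]), float(row[c])
        except (ValueError, IndexError):
            continue  # skip non-numeric placeholders such as "null" and short rows
        if exp > 0 and ctrl > 0:
            ratios[row[0]] = math.log2(exp / ctrl)
    return ratios


# Invented SOFT-like text; a real file would be opened with open(path).
example = io.StringIO(
    "^DATASET = GDS0000\n"
    "!dataset_title = made-up example\n"
    "#ID_REF = probe identifier\n"
    "!dataset_table_begin\n"
    "ID_REF\tIDENTIFIER\tGSM_infected\tGSM_control\n"
    "probe_1\tC04F5.7\t200\t50\n"
    "probe_2\tgeneB\tnull\t80\n"
    "!dataset_table_end\n"
)

header, rows = read_soft_table(example)
ratios = log2_ratios(header, rows, "GSM_infected", "GSM_control")
print(ratios)
```

A single-sample ratio like this is no substitute for a proper differential expression analysis with replicates, which is why the authors' supplementary DE list is usually the better starting point.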

Choice of dataset for comparison
- In-class demo

BioMart – EBI Ensembl
- Uses a series of menus:
  - Data source: organism and data type (genes, variation, etc.)
  - Filters: reduce the number of results
  - Attributes: what data to return
- Can set up very precise, multilayered queries
- Can query across multiple organisms
- Simple query:
  - Given a list of gene IDs, you can obtain attributes or sequences for the entire list
- Tools
  - ID converter: very useful, easy to use
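The data source / filters / attributes structure of the menus maps directly onto the XML query that BioMart can also accept programmatically. The sketch below only builds such a query as a string and does not submit it. The dataset name `celegans_gene_ensembl` and the filter and attribute names are assumptions based on Ensembl's naming and may differ between BioMart sites and releases; the gene IDs are placeholders, and `build_biomart_query` is a made-up helper.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string from a dataset, filters and attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated output
        "header": "1",            # include column names in the result
        "uniqueRows": "1",        # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which genes are returned; list values are comma-joined.
    for name, values in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": ",".join(values)})
    # Attributes choose which columns come back for each gene.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


xml_query = build_biomart_query(
    dataset="celegans_gene_ensembl",  # assumed Ensembl dataset name
    filters={"ensembl_gene_id": ["WBGene00000001", "WBGene00000002"]},  # placeholder IDs
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(xml_query)
```

A query like this is typically sent as the `query` parameter to a BioMart site's `martservice` URL; the web interface can also display the XML for a query built through the menus, which is a convenient way to check the exact filter and attribute names.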

Two sites for BioMart access

Database journal issue on BioMart

Filtering in BioMart

Attributes in BioMart

BioMart
- Filters (examples)
  - C. elegans genes with a human homolog
  - Only genes with at least a specified number of isoforms
  - Protein-coding genes with a transmembrane domain
- Attributes (examples)
  - Entrez Gene IDs, WormBase IDs, Affy IDs
  - Sequence data: transcript, protein, UTRs, flanking regions, etc.

BioMart
- In-class demo

Today's exercise
- Compare the current dataset from the PLoS Pathogens paper to data from a different dataset
- Identify and retrieve additional information about C. elegans genes using BioMart