Data Mining with BioMart (www.ensembl.org/biomart/martview, www.biomart.org/biomart/martview)


1 Data Mining with BioMart: www.ensembl.org/biomart/martview, www.biomart.org/biomart/martview

2 What is BioMart? A data export tool; a quick table generator; a web interface to mine Ensembl data.

3 BioMart: Data Mining. BioMart is a search engine that can look up multiple terms and return them as a table, for example mouse gene IDs with their chromosome and base-pair position. No programming required!

4 General or Specific Data Tables. All the genes for one species; or only the genes in one specific region of a chromosome; or let BioMart select the genes (e.g. all transcripts that match a microarray probe set, GO term, or InterPro domain).

5 Results: Tables or sequences.

6 The First Step: Choose the Dataset. Dataset: current Ensembl, Human genes.

7 The Second Step: Filters. Filters define a gene set.

8 Attributes Attach Information. Attributes determine the output columns.

9 Query: For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s).

10 Query: For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s). In the query: Filters are what we know; Attributes are what we want to know.
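The web interface needs no programming, but the same dataset / filters / attributes split can also be written down as a BioMart XML query. The sketch below is illustrative only: the `martservice` endpoint, the dataset name `hsapiens_gene_ensembl`, and the internal filter and attribute names (`hgnc_symbol`, `ensembl_gene_id`, `entrezgene_id`, `affy_hg_u133_plus_2`) are assumptions that are not stated in the slides and may differ between Ensembl releases. `build_query` is a hypothetical helper, not part of any BioMart library.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Assumed service endpoint (not given in the slides).
MARTSERVICE_URL = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build a BioMart XML query string (hypothetical helper).

    filters    -- dict of filter name -> value: what we know
    attributes -- list of attribute names: what we want to know
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",   # tab-separated result table
        "header": "1",        # include column names
        "uniqueRows": "1",    # drop duplicate rows
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Filter: the human CFTR gene. Attributes: Entrez Gene ID and Affy probesets.
# All internal names here are assumptions; check them against the mart itself.
cftr_query = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "CFTR"},
    attributes=["ensembl_gene_id", "entrezgene_id", "affy_hg_u133_plus_2"],
)

# URL that would submit the query; nothing is sent over the network here.
request_url = MARTSERVICE_URL + "?" + urlencode({"query": cftr_query})
print(cftr_query)
```

The filter dictionary carries what is already known (the gene symbol), and the attribute list names the output columns, mirroring the two panels of the web form.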


13 A Brief Example. Use the current Ensembl (archives are also available). Select Homo sapiens genes.

14 Select the Genes with Filters. Click ‘Filters’, then expand the ‘GENE’ panel to enter the gene ID(s).

15 Filters (and Count). Change the ID type to HGNC curated name and enter “CFTR” in the box. Click “Count” to see whether any genes passed through your filters.

16 Attributes (Output Options). Click on ‘Attributes’. ‘Attributes’ lets you choose which information to output.

17 Attributes (Output Options). Select ‘EntrezGene ID’.

18 Attributes (Output Options). Select the Affy platform ‘HG U133-PLUS-2’ in the ‘Microarray’ section.

19 The Results Table: Preview. For the full result table, click “Go” or view “ALL” rows.

20 Full Result Table. Columns: Ensembl Gene ID for CFTR, Ensembl Transcript IDs, EntrezGene ID, Affy HG probeset.

21 Other Export Options (Attributes)
• Sequences: UTRs, flanking sequences, cDNA, peptides, etc.
• Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)
• Microarray data
• Protein functions/descriptions (InterPro, GO)
• Orthologous gene sets
• SNP/variation data

22 BioMart around the world… BioMart started at Ensembl… Where has it travelled to?

23 Central Portal: www.biomart.org

24 WormBase

25 HapMap: Population frequencies; inter-population comparisons; gene annotation.

26 DictyBase

27 GRAMENE: www.gramene.org

28 The Potato Center

29 How to Get There: http://www.biomart.org/biomart/martview or http://www.ensembl.org/biomart/martview, or click on ‘BioMart’ from Ensembl.

30 Worked Example. Follow the worked example on pg 26. Then do the exercises on pg 34 (answers on pg 37). This module should show you how to export multiple data types from Ensembl for gene IDs or chromosomal regions.

31 Ensembl Core Databases. Relational database, normalised: each data point is stored only once. Therefore: quick updates; minimal storage requirements. But: many tables; many joins for complicated queries; slow for data mining applications.

32 Normalised Schema

gene_id  gene.symbol
9970     SMAD1
1712     SMAD2
8240     SMAD3
1967     SMAD4
…        …

gene_id  transcript
9970     ENST00000302085
1712     ENST00000262160
1712     ENST00000356825
8240     ENST00000327367
1967     ENST00000342988
…        …

gene_id  stable_id
9970     ENSG00000170365
1712     ENSG00000175387
8240     ENSG00000166949
1967     ENSG00000141646
…        …
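To make the "many joins" cost concrete, here is a small sketch that loads the three example tables into an in-memory SQLite database and asks for the stable ID and transcripts of SMAD2. The table names (`gene_symbol`, `gene_transcript`, `gene_stable_id`) are hypothetical, since the slide shows only column headers, and SQLite stands in for whatever database engine Ensembl actually uses.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Three normalised tables; table names are illustrative, rows are from the slide.
con.executescript("""
CREATE TABLE gene_symbol     (gene_id INTEGER, symbol TEXT);
CREATE TABLE gene_transcript (gene_id INTEGER, transcript TEXT);
CREATE TABLE gene_stable_id  (gene_id INTEGER, stable_id TEXT);
""")
con.executemany("INSERT INTO gene_symbol VALUES (?, ?)", [
    (9970, "SMAD1"), (1712, "SMAD2"), (8240, "SMAD3"), (1967, "SMAD4"),
])
con.executemany("INSERT INTO gene_transcript VALUES (?, ?)", [
    (9970, "ENST00000302085"), (1712, "ENST00000262160"),
    (1712, "ENST00000356825"), (8240, "ENST00000327367"),
    (1967, "ENST00000342988"),
])
con.executemany("INSERT INTO gene_stable_id VALUES (?, ?)", [
    (9970, "ENSG00000170365"), (1712, "ENSG00000175387"),
    (8240, "ENSG00000166949"), (1967, "ENSG00000141646"),
])

# Even this simple question needs two joins across three tables.
rows = con.execute("""
    SELECT s.stable_id, t.transcript, g.symbol
    FROM gene_symbol AS g
    JOIN gene_transcript AS t ON t.gene_id = g.gene_id
    JOIN gene_stable_id  AS s ON s.gene_id = g.gene_id
    WHERE g.symbol = 'SMAD2'
    ORDER BY t.transcript
""").fetchall()

for row in rows:
    print(row)
```

Each value is stored once, so an update touches a single row, but every additional data type in the output adds another join to the query.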

33 BioMart Database. Data warehouse, de-normalised and query-optimised. Therefore: fast and flexible; ideal for data mining. But: tables with apparent “redundancy”; needs rebuilding from scratch from the normalised core databases for every release.

34 De-Normalised Schema

gene_id          transcript_id    gene.symbol
ENSG00000170365  ENST00000302085  SMAD1
ENSG00000175387  ENST00000262160  SMAD2
ENSG00000175387  ENST00000356825  SMAD2
ENSG00000166949  ENST00000327367  SMAD3
ENSG00000141646  ENST00000342988  SMAD4
…                …                …
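The trade-off in the de-normalised table can be seen in a few lines. The sketch below loads the example rows into one flat SQLite table (the name `gene_main` is hypothetical) and shows that a lookup needs no joins, while the gene ID and symbol for SMAD2 are stored twice.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# One flat table; the table name is illustrative, rows are from the slide.
con.execute(
    "CREATE TABLE gene_main (gene_id TEXT, transcript_id TEXT, symbol TEXT)"
)
con.executemany("INSERT INTO gene_main VALUES (?, ?, ?)", [
    ("ENSG00000170365", "ENST00000302085", "SMAD1"),
    ("ENSG00000175387", "ENST00000262160", "SMAD2"),
    ("ENSG00000175387", "ENST00000356825", "SMAD2"),
    ("ENSG00000166949", "ENST00000327367", "SMAD3"),
    ("ENSG00000141646", "ENST00000342988", "SMAD4"),
])

# A single-table lookup: no joins needed.
smad2 = con.execute(
    "SELECT gene_id, transcript_id FROM gene_main "
    "WHERE symbol = ? ORDER BY transcript_id",
    ("SMAD2",),
).fetchall()

# The apparent redundancy: more rows than distinct genes.
n_rows, n_genes = con.execute(
    "SELECT COUNT(*), COUNT(DISTINCT gene_id) FROM gene_main"
).fetchone()

print(smad2)
print(n_rows, n_genes)
```

The repeated gene ID and symbol are the price of join-free queries, which is why such a table is rebuilt from the normalised source rather than edited in place.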

35 Information Flow (diagram). Stages: DATASET, FILTER, ATTRIBUTES. Labels: SPECIES, FOCUS; REGION, SNP, PROTEIN, HOMOLOGY, GENE, EXPRESSION; REFSEQ, INTERPRO, GO, SWISSPROT, EMBL, AFFYMETRIX; FASTA, FILE, EXCEL, TEXT, GTF, HTML.

