1 / 30 Data Mining with BioMart
www.ensembl.org/biomart/martview
www.biomart.org/biomart/martview


2 / 30 What is BioMart?
- A data export tool
- A quick table generator
- A web interface to mine Ensembl data

3 / 30 BioMart: Data Mining
BioMart is a search engine that can retrieve several data types at once and put them into a table, for example mouse gene IDs together with their chromosome and base-pair positions.
No programming required!

4 / 30 General or Specific Data Tables
- All the genes for one species
- Or only the genes in one specific region of a chromosome
- Or let BioMart select the genes for you (e.g. all transcripts that match a microarray probe set, GO term or InterPro domain)
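
Restricting genes to "one specific region of a chromosome" amounts to an interval-overlap test on each gene's coordinates. The sketch below shows that idea on in-memory records; the Gene fields, the 1-based inclusive coordinates and the overlap (rather than full containment) rule are assumptions for illustration, not a description of how BioMart evaluates its region filters, and the records are hypothetical toy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Minimal gene record; field names are illustrative, not a BioMart schema."""
    gene_id: str
    chromosome: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive


def genes_in_region(genes, chromosome, start, end):
    """Return genes on `chromosome` that overlap the closed interval [start, end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


# Hypothetical toy records, not real Ensembl coordinates.
genes = [
    Gene("GENE_A", "7", 1_000, 5_000),
    Gene("GENE_B", "7", 4_500, 9_000),
    Gene("GENE_C", "7", 20_000, 25_000),
    Gene("GENE_D", "X", 1_000, 5_000),
]

# "All the genes for one species" would simply be the unfiltered list.
hits = genes_in_region(genes, "7", 4_000, 10_000)
print([g.gene_id for g in hits])  # → ['GENE_A', 'GENE_B']
```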

5 / 30 Results
Tables or sequences
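
When results are exported as a tab-separated table, they can be read back with Python's standard csv module. This is a minimal sketch assuming the first line holds column headers and that fields are not quoted; the column names and values in the sample are hypothetical.

```python
import csv
import io


def parse_tsv_results(text):
    """Parse tab-separated results whose first line holds the column headers.

    Returns one dict per data row; empty cells come back as empty strings.
    QUOTE_NONE treats any quote characters as ordinary text.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Hypothetical export: the second gene has no value in the last column.
sample = "gene_id\tsymbol\tentrez_id\nENSG_X\tGENE1\t101\nENSG_Y\tGENE2\t\n"
rows = parse_tsv_results(sample)
print(rows[1])  # → {'gene_id': 'ENSG_Y', 'symbol': 'GENE2', 'entrez_id': ''}
```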

6 / 30 The First Step: Choose the Dataset
Dataset: current Ensembl, human genes

7 / 30 The Second Step: Filters
Filters: define a gene set

8 / 30 Attributes Attach Information
Attributes: determine the output columns

9 / 30 Query
For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s)

10 / 30 Query: For the human CFTR gene, export the Entrez Gene ID(s) and matching Affy HG U133-PLUS-2 probeset(s)
In the query:
- Filters: what we know
- Attributes: what we want to know
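
One way to picture "filters: what we know, attributes: what we want to know" is as row selection versus column selection over one table. The sketch below runs that idea on a small in-memory table; the table, its column names and the `run_query` helper are hypothetical, and the optional de-duplication is only loosely analogous to BioMart's unique-results option.

```python
# Hypothetical in-memory "mart" table: one row per gene/transcript pair.
# Column names and values are illustrative only.
MART = [
    {"symbol": "GENE1", "gene_id": "ENSG_1", "transcript_id": "ENST_1a", "entrez_id": "101"},
    {"symbol": "GENE1", "gene_id": "ENSG_1", "transcript_id": "ENST_1b", "entrez_id": "101"},
    {"symbol": "GENE2", "gene_id": "ENSG_2", "transcript_id": "ENST_2a", "entrez_id": "202"},
]


def run_query(table, filters, attributes, unique=True):
    """Filters (what we know) pick rows; attributes (what we want to know) pick columns."""
    selected = [
        row for row in table
        if all(row.get(name) == value for name, value in filters.items())
    ]
    results = [tuple(row[a] for a in attributes) for row in selected]
    if unique:
        # Drop duplicate output rows while keeping first-seen order.
        results = list(dict.fromkeys(results))
    return results


print(run_query(MART, {"symbol": "GENE1"}, ["gene_id", "entrez_id"]))
# → [('ENSG_1', '101')]
print(run_query(MART, {"symbol": "GENE1"}, ["gene_id", "transcript_id"]))
# → [('ENSG_1', 'ENST_1a'), ('ENSG_1', 'ENST_1b')]
```

Asking for a transcript-level attribute returns one row per transcript, which is why a single gene can produce several output rows.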

13 / 30 A Brief Example
- Use the current Ensembl (archives are also available)
- Select Homo sapiens genes

14 / 30 Select the Genes with Filters
- Click ‘Filters’.
- Expand the ‘GENE’ panel to enter the gene ID(s).

15 / 30 Filters (and Count)
- Change the ID type to HGNC curated name.
- Enter “CFTR” in the box.
- Click “Count” to see whether any genes passed your filters.

16 / 30 Attributes (Output Options)
Click ‘Attributes’. The ‘Attributes’ page lets you choose which information is output.

17 / 30 Attributes (Output Options)
Select ‘EntrezGene ID’.

18 / 30 Attributes (Output Options)
In the ‘Microarray’ section, select the Affy platform ‘HG U133-PLUS-2’.

19 / 30 The Results Table: Preview
For the full results table, click “Go” or view “ALL” rows.

20 / 30 Full Result Table
Columns: Ensembl Gene ID (for CFTR), Ensembl Transcript IDs, EntrezGene ID, Affy HG probeset
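
The same CFTR query can, in principle, be expressed as an XML document for BioMart's martservice web service instead of clicking through the web pages. The sketch below only builds the XML and a GET URL; it does not contact any server. The endpoint URL and the dataset, filter and attribute names (hsapiens_gene_ensembl, hgnc_symbol, ensembl_gene_id, ensembl_transcript_id, entrezgene_id, affy_hg_u133_plus_2) are assumptions based on recent Ensembl releases; names change between releases, so check them against the mart you actually use.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint of the Ensembl BioMart web service.
MARTSERVICE = "https://www.ensembl.org/biomart/martservice"


def build_query_xml(dataset, filters, attributes, formatter="TSV", header=True):
    """Build a BioMart-style XML query with one Dataset, its Filters and Attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": formatter,
        "header": "1" if header else "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters: what we know.
    for name, value in filters.items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    # Attributes: what we want to know (output columns, in this order).
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


def martservice_url(query_xml):
    """Return a GET URL carrying the XML in the `query` parameter (URL-encoded)."""
    return MARTSERVICE + "?" + urllib.parse.urlencode({"query": query_xml})


# The CFTR query from the slides; all names below are assumptions to verify.
cftr_xml = build_query_xml(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "CFTR"},
    attributes=["ensembl_gene_id", "ensembl_transcript_id",
                "entrezgene_id", "affy_hg_u133_plus_2"],
)
print(cftr_xml)
print(martservice_url(cftr_xml))
```

To actually run the query, the URL could be fetched with `urllib.request.urlopen`; that needs network access and is left out of the sketch.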

21 / 30 Other Export Options (Attributes)
- Sequences: UTRs, flanking sequences, cDNA, peptides, etc.
- Gene IDs from Ensembl and external sources (MGI, Entrez, etc.)
- Microarray data
- Protein functions/descriptions (InterPro, GO)
- Orthologous gene sets
- SNP/variation data
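
Sequence exports (cDNA, peptides, UTRs, flanking regions) are typically delivered as FASTA text. A minimal sketch of reading such output back into (header, sequence) pairs is shown below; the sample headers are hypothetical, and real header contents depend on the attributes chosen.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples, in file order.

    The header is the text after '>' kept verbatim; sequence lines are joined.
    A list (not a dict) is used so that repeated headers are not silently merged.
    """
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # text before the first header is ignored
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical sample with one wrapped nucleotide record and one peptide record.
sample = ">seq1|example\nACGT\nTTGA\n>seq2|example\nMKV\n"
for header, seq in parse_fasta(sample):
    print(header, len(seq))
# → seq1|example 8
# → seq2|example 3
```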

22 / 30 BioMart around the world…
BioMart started at Ensembl… Where has it travelled since?

23 / 30 Central Portal

24 / 30 WormBase

25 / 30 HapMap
- Population frequencies
- Inter-population comparisons
- Gene annotation

26 / 30 DictyBase

27 / 30 GRAMENE

28 / 30 The Potato Center

29 / 30 How to Get There
Go to www.ensembl.org/biomart/martview, or click ‘BioMart’ from Ensembl.

30 / 30 Worked Example
- Follow the worked example on p. 26.
- Then do the exercises on p. 34 (answers on p. 37).
This module should show you how to export multiple data types from Ensembl for gene IDs or chromosomal regions.

31 / 30 Ensembl Core Databases
Relational database, normalised: each data point is stored only once.
Therefore:
- Quick updates
- Minimal storage requirements
But:
- Many tables
- Many joins for complicated queries
- Slow for data-mining applications

32 / 30 Normalised Schema

gene_id   gene.symbol
9970      SMAD1
1712      SMAD2
8240      SMAD3
1967      SMAD4
…         …

gene_id   transcript
9970      ENST
          ENST
          ENST
          ENST
          ENST
…         …

gene_id   stable_id
9970      ENSG
          ENSG
          ENSG
          ENSG
…         …
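
The cost of a normalised layout shows up at query time: every extra piece of information lives in another table and needs another join. The sketch below recreates the three tables above in an in-memory SQLite database. The internal gene_ids 9970/1712 and symbols come from the slide; the ENSG/ENST values and the assignment of transcripts to genes are hypothetical placeholders, since the real stable IDs are not shown, and this is an illustration of normalisation, not the actual Ensembl core schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each fact is stored once, in its own table, keyed on the internal gene_id.
CREATE TABLE gene       (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE transcript (gene_id INTEGER, transcript TEXT);
CREATE TABLE stable_id  (gene_id INTEGER, stable_id TEXT);
INSERT INTO gene VALUES (9970, 'SMAD1'), (1712, 'SMAD2');
INSERT INTO transcript VALUES (9970, 'ENST_A1'), (1712, 'ENST_B1'), (1712, 'ENST_B2');
INSERT INTO stable_id VALUES (9970, 'ENSG_A'), (1712, 'ENSG_B');
""")

# Asking for "stable gene ID, transcript and symbol" needs one join per extra table.
rows = conn.execute("""
    SELECT s.stable_id, t.transcript, g.symbol
    FROM gene AS g
    JOIN transcript AS t ON t.gene_id = g.gene_id
    JOIN stable_id  AS s ON s.gene_id = g.gene_id
    ORDER BY t.transcript
""").fetchall()
print(rows)
# → [('ENSG_A', 'ENST_A1', 'SMAD1'), ('ENSG_B', 'ENST_B1', 'SMAD2'), ('ENSG_B', 'ENST_B2', 'SMAD2')]
```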

33 / 30 BioMart Database
Data warehouse: de-normalised and query-optimised.
Therefore:
- Fast and flexible
- Ideal for data mining
But:
- Tables with apparent “redundancy”
- Must be rebuilt from scratch from the normalised core databases for every release

34 / 30 De-Normalised Schema

gene_id   transcript_id   gene.symbol
ENSG      ENST            SMAD1
ENSG      ENST            SMAD2
ENSG      ENST            SMAD2
ENSG      ENST            SMAD3
ENSG      ENST            SMAD4
…         …               …
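
The trade-off between the two layouts can be sketched by flattening the normalised tables into one wide table once, then querying it without joins. The example below does this in SQLite with CREATE TABLE … AS SELECT; it illustrates the de-normalisation idea only and is not how the BioMart build tools actually construct a mart. The ENSG/ENST values are hypothetical placeholders, and the table name gene_mart is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalised source tables (hypothetical placeholder IDs).
CREATE TABLE gene       (gene_id INTEGER PRIMARY KEY, symbol TEXT);
CREATE TABLE transcript (gene_id INTEGER, transcript TEXT);
CREATE TABLE stable_id  (gene_id INTEGER, stable_id TEXT);
INSERT INTO gene VALUES (9970, 'SMAD1'), (1712, 'SMAD2');
INSERT INTO transcript VALUES (9970, 'ENST_A1'), (1712, 'ENST_B1'), (1712, 'ENST_B2');
INSERT INTO stable_id VALUES (9970, 'ENSG_A'), (1712, 'ENSG_B');

-- Build step: pay for the joins once and store the flattened result.
-- It has to be rerun from scratch whenever the source tables change.
CREATE TABLE gene_mart AS
    SELECT s.stable_id AS gene_id, t.transcript AS transcript_id, g.symbol AS symbol
    FROM gene AS g
    JOIN transcript AS t ON t.gene_id = g.gene_id
    JOIN stable_id  AS s ON s.gene_id = g.gene_id;
""")

# Queries now read one wide table with no joins; SMAD2 repeats once per transcript.
rows = conn.execute(
    "SELECT gene_id, transcript_id FROM gene_mart WHERE symbol = ? ORDER BY transcript_id",
    ("SMAD2",),
).fetchall()
print(rows)  # → [('ENSG_B', 'ENST_B1'), ('ENSG_B', 'ENST_B2')]
```

The repeated SMAD2 rows are the apparent "redundancy" mentioned above: extra storage and a full rebuild per release, in exchange for join-free queries.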

35 / 30 Information Flow
[Diagram: DATASET (species, focus) → FILTERS and ATTRIBUTES (region, SNP, protein, homology, gene, expression; RefSeq, InterPro, GO, SwissProt, EMBL, Affymetrix) → output (FASTA, file, Excel, text, GTF, HTML)]