Introduction and Applications of Microarray Databases. Chen-hsiung Chan, Department of Computer Science and Information Engineering, National Taiwan University.

1 Introduction and Applications of Microarray Databases Chen-hsiung Chan Department of Computer Science and Information Engineering National Taiwan University

2 MIAME (Minimum Information About a Microarray Experiment)  MIAME describes the minimum information about a microarray experiment that is needed to interpret the results of the experiment unambiguously and, potentially, to reproduce the experiment. [Brazma et al., Nature Genetics]

3 MIAME  raw data (CEL or GPR files)  final processed (normalized) data  essential sample annotation including experimental factors and their values  experimental design including sample data relationships  sufficient annotation of the array  essential laboratory and data processing protocols

4 Databases using MIAME  ArrayExpress at EBI  GEO at NCBI  CIBEX at DDBJ

5 ArrayExpress http://www.ebi.ac.uk/microarray-as/aer/  Stores transcriptomics and related data  Data warehouse stores gene-indexed expression profiles  Follows the MGED recommendations (MIAME)

7 ArrayExpress statistics  Experiment repository: 2,914 experiments (each with at least 6 microarrays) and growing  Expression profiles: 267 experiments covering 121,891 genes  Data warehouse updated every day

8 Searching ArrayExpress  Keywords: breast cancer, cell cycle, etc.  Accession numbers: E-XXXX-d, e.g. E-AFFY-1281, E-TIGR-372, etc.  Secondary accession numbers: GEO accession, e.g. GSE5389.  Species names: mainly Latin names (e.g. Homo sapiens); common names may be used as well (e.g. human).
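
The query types on this slide can be told apart with a couple of patterns. The sketch below is only an illustration: the regular expressions are inferred from the "E-XXXX-d" description and the GSE5389 example on the slide, not taken from an official ArrayExpress grammar, and the function name `classify_query` is hypothetical.

```python
import re

# Assumption: "E-XXXX-d" means the letter E, a four-letter code, and a number.
ARRAYEXPRESS_RE = re.compile(r"^E-[A-Z]{4}-\d+$")
# Assumption: a GEO Series accession is "GSE" followed by digits.
GEO_SERIES_RE = re.compile(r"^GSE\d+$")


def classify_query(term: str) -> str:
    """Guess which kind of search term the user typed."""
    term = term.strip()
    if ARRAYEXPRESS_RE.match(term):
        return "arrayexpress_accession"
    if GEO_SERIES_RE.match(term):
        return "geo_series_accession"
    # Anything else (keywords, species names) is treated as free text.
    return "keyword"
```

For instance, `classify_query("E-AFFY-1281")` falls into the first branch, while "breast cancer" or "Homo sapiens" is treated as free text.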

10 ArrayExpress interface

12 ArrayExpress Search/Browse Result Keyword: lung cancer

13 ArrayExpress Search/Browse Result Detailed view

19 Expression Profile results  Thumbnail view  BigPlot view  Gene ranking (most differentially expressed experiments are top ranked)  Similarity search: search for genes with similar expression levels
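
The slide does not say how ArrayExpress measures similarity between expression profiles. As one plausible illustration only, the sketch below ranks genes by Pearson correlation with a query gene. The function names and the toy profiles are hypothetical, and a real search may use a different measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile has no defined correlation; treat as 0
    return cov / (sd_x * sd_y)


def most_similar(query, profiles, top=3):
    """Return up to `top` (gene, correlation) pairs, most similar first."""
    reference = profiles[query]
    scored = [(pearson(reference, values), gene)
              for gene, values in profiles.items() if gene != query]
    scored.sort(reverse=True)
    return [(gene, r) for r, gene in scored[:top]]


# Hypothetical profiles: one expression value per sample, same sample order.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 1.0, 1.0, 1.0],
}
ranking = most_similar("geneA", toy_profiles)
```

Correlation compares the shape of two profiles rather than their absolute levels, which is why geneB ranks above the others here even though its values are twice as large as those of geneA.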

22 Take a break …

23 Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/  Gene expression/molecular abundance repository  MIAME compliant  Supports browsing, query and retrieval

25 GEO record types  Platform  Sample  Series  DataSet  Profile

26 GEO Platform  Platform record defines the list of elements that may be detected and quantified in that experiment (e.g., cDNAs, oligonucleotide probesets)  Each Platform record is assigned a unique and stable GEO accession number (GPLxxx)  A Platform may reference many Samples that have been submitted by multiple submitters

27 GEO Sample  Sample record describes the conditions under which an individual Sample was handled, the manipulations it underwent, and the abundance measurement of each element derived from it  Each Sample record is assigned a unique and stable GEO accession number (GSMxxx)  A Sample entity must reference only one Platform and may be included in multiple Series

29 GEO Series  A Series record links together a group of related Samples and provides a focal point and description of the whole study  Series records may also contain tables describing extracted data, summary conclusions, or analyses  Each Series record is assigned a unique and stable GEO accession number (GSExxx)
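
The relationships between Platform, Sample and Series described on the last three slides can be sketched as a small data model. This is an assumption-laden illustration, not GEO's actual schema: the class names, field names and the placeholder accessions (GPL1, GSM1, GSE1, ...) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    accession: str   # GSMxxx
    platform: str    # exactly one GPLxxx per Sample


@dataclass
class Series:
    accession: str                                        # GSExxx
    samples: List[Sample] = field(default_factory=list)   # related Samples


def series_containing(sample_accession, all_series):
    """A Sample may be included in multiple Series; list their accessions."""
    return [s.accession for s in all_series
            if any(x.accession == sample_accession for x in s.samples)]


def samples_on_platform(platform_accession, all_series):
    """A Platform may be referenced by many Samples; list them without repeats."""
    found = []
    for series in all_series:
        for sample in series.samples:
            if sample.platform == platform_accession and sample.accession not in found:
                found.append(sample.accession)
    return found


# Placeholder accessions for illustration only.
gsm1 = Sample("GSM1", "GPL1")
gsm2 = Sample("GSM2", "GPL1")
gsm3 = Sample("GSM3", "GPL2")
all_series = [Series("GSE1", [gsm1, gsm2]), Series("GSE2", [gsm1, gsm3])]
```

Storing the Platform as a single field on each Sample, and the Samples as a list on each Series, mirrors the "only one Platform, possibly many Series" rule from the slides.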

31 GEO DataSet  Assembled by NCBI  Samples are all equivalently measured and normalized  Can be viewed and analyzed with NCBI's advanced data display and analysis tools

33 GEO Profile  Profile consists of the expression measurements for an individual gene across all Samples in a DataSet  Profiles can be searched using Entrez GEO Profiles  Similar to Expression Profile in ArrayExpress

36 SOFT (Simple Omnibus Format in Text)  Text based  Line based  Easily parsed with text processing languages, including Perl, Python, Ruby, PHP, etc.
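
To illustrate why a line-based format is easy to parse, here is a minimal sketch of a SOFT reader. It relies on line prefixes that are not shown on the slide but that GEO's SOFT files use as far as I know: `^` starts an entity, `!` marks an attribute, `#` describes a table column, and other lines are tab-delimited table rows. The function name, the returned structure and the example record are hypothetical.

```python
def parse_soft(text):
    """Parse SOFT text into a list of entity dictionaries."""
    entities = []
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("^"):
            # Entity indicator line, e.g. "^SAMPLE = <accession>"
            kind, _, value = line[1:].partition("=")
            current = {"type": kind.strip(), "id": value.strip(),
                       "attributes": {}, "columns": {}, "rows": []}
            entities.append(current)
        elif current is None:
            continue  # ignore anything before the first entity
        elif line.startswith("!"):
            key, sep, value = line[1:].partition("=")
            if sep:  # markers without "=" (table begin/end) are skipped
                current["attributes"].setdefault(key.strip(), []).append(value.strip())
        elif line.startswith("#"):
            key, _, value = line[1:].partition("=")
            current["columns"][key.strip()] = value.strip()
        else:
            # Table line; the first one is the column header row.
            current["rows"].append(line.split("\t"))
    return entities


# Hypothetical record, written only to exercise the parser.
EXAMPLE_SOFT = "\n".join([
    "^SAMPLE = GSM_example",
    "!Sample_title = hypothetical sample",
    "!Sample_platform_id = GPL_example",
    "#ID_REF = probe identifier",
    "#VALUE = normalized signal",
    "!sample_table_begin",
    "ID_REF\tVALUE",
    "probe_1\t7.5",
    "probe_2\t3.2",
    "!sample_table_end",
])
records = parse_soft(EXAMPLE_SOFT)
```

Attribute values are kept as lists because the same attribute name can appear on several lines of one record.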

38 Take a break …

39 Network Biology Visualization and Analysis

40 Cytoscape  Open source network visualization and analysis software  'Core' features include network layout and query, and integration of visualizations with state data  Can be extended by plugins

41 Cytoscape developers  University of California at San Diego (Trey Ideker)  Institute for Systems Biology (Leroy Hood)  Memorial Sloan-Kettering Cancer Center (Chris Sander)  Institut Pasteur (Benno Schwikowski)  Agilent Technologies (Annette Adler)  University of California at San Francisco (Bruce Conklin)

42 Cytoscape  A Java application  Requires Java 5 or 6 (JDK 5/6 or JRE 5/6)

44 Simple Interaction Format (SIF)  Each line denotes one interaction: InteractorA xx InteractorB  'xx' is the interaction type: pp (protein-protein interaction), pd (protein-DNA interaction, i.e. transcription factor/regulation), pr (protein-reaction), rc (reaction-compound), cr (compound-reaction), gl (genetic-lethal), pm (protein-metabolite), mp (metabolite-protein)
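
A SIF file as described above can be read in a few lines. The sketch below is a minimal illustration, not Cytoscape's own loader: the function names and the node names (geneA, geneB, ...) are hypothetical. It also assumes that fields are separated by tabs or spaces and that any extra names on a line are further targets of the same source, which Cytoscape's SIF permits as far as I know.

```python
from collections import defaultdict


def parse_sif(text):
    """Return a list of (source, interaction_type, target) tuples."""
    edges = []
    for line in text.splitlines():
        # Assumption: tab-delimited if a tab is present, else whitespace-delimited.
        fields = line.split("\t") if "\t" in line else line.split()
        fields = [f.strip() for f in fields if f.strip()]
        if len(fields) < 3:
            continue  # blank line or a node listed without interactions
        source, kind, *targets = fields
        for target in targets:
            edges.append((source, kind, target))
    return edges


def neighbors(edges):
    """Build an undirected adjacency map from the parsed edges."""
    adjacency = defaultdict(set)
    for source, _, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Hypothetical network for illustration only.
EXAMPLE_SIF = "geneA pp geneB\ngeneA pd geneC geneD\ngeneE\n"
example_edges = parse_sif(EXAMPLE_SIF)
```

Keeping the interaction type in each tuple makes it easy to filter, for example, only the `pd` (protein-DNA) edges before drawing a regulatory network.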

45 Other interaction formats supported  GML  XGMML  SBML  BioPAX  PSI-MI  Tab-delimited text tables and Excel

46 Cytoscape Demonstration

47 Applications of Gene Expression  Gene selection (differentially expressed genes)  State annotation in networks (expression level)  Gene regulatory network identification
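
The slides do not specify a method for selecting differentially expressed genes. As a deliberately simple stand-in, the sketch below keeps genes whose mean log2 expression differs between two sample groups by at least a threshold. The function name, the threshold default and the toy data are hypothetical, and a real analysis would normally add a statistical test.

```python
from statistics import mean


def select_genes(expression, group_a, group_b, min_abs_log2_fc=1.0):
    """Return {gene: log2 fold change} for genes passing the threshold.

    `expression` maps gene -> {sample: log2 expression value}.
    The fold change is mean(group_b) - mean(group_a) on the log2 scale.
    """
    selected = {}
    for gene, values in expression.items():
        diff = mean(values[s] for s in group_b) - mean(values[s] for s in group_a)
        if abs(diff) >= min_abs_log2_fc:
            selected[gene] = diff
    return selected


# Hypothetical log2 expression values for two control and two treated samples.
toy_expression = {
    "geneA": {"ctrl1": 5.0, "ctrl2": 5.0, "trt1": 8.0, "trt2": 8.0},
    "geneB": {"ctrl1": 6.0, "ctrl2": 6.2, "trt1": 6.1, "trt2": 6.3},
}
hits = select_genes(toy_expression, ["ctrl1", "ctrl2"], ["trt1", "trt2"])
```

The resulting per-gene values could also serve as the "state annotation" mentioned on the slide, for example as node colors in a network view.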

