Transcriptomics
Patrick Kemmeren
European Bioinformatics Institute; Genomics Lab, UMC Utrecht
What are microarrays? Transcriptomics?
Figure: mRNA is copied into cDNA, which is hybridised to the microarray.
Figure: a microarray experiment. For each sample: sample → RNA extract → labelled nucleic acid → hybridisation to an array (built from an array design). The hybridised microarrays yield the gene expression data matrix (genes), which then goes through normalization and integration. Each step is described by a protocol.
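The slide names normalization as a step but does not say how it is done. As a minimal sketch, assuming a two-colour array where each spot yields a red (e.g. Cy5) and a green (e.g. Cy3) intensity, one simple and common scheme converts each spot to a log2 ratio and subtracts the array's median ratio, on the assumption that most genes do not change. The function name and intensities are hypothetical, and this is one scheme among many, not the method of any specific tool in this talk.

```python
import math
from statistics import median


def median_normalise_log_ratios(red, green):
    """Return per-spot log2(red/green) ratios centred on the array median.

    Sketch only (hypothetical input layout): assumes background-corrected,
    strictly positive intensities from a two-colour array.
    """
    if len(red) != len(green):
        raise ValueError("channels must have the same number of spots")
    ratios = [math.log2(r / g) for r, g in zip(red, green)]
    centre = median(ratios)  # assumes most genes are unchanged between channels
    return [m - centre for m in ratios]


# Hypothetical intensities for five spots on one array.
red = [1200.0, 800.0, 400.0, 3200.0, 1000.0]
green = [600.0, 400.0, 400.0, 800.0, 500.0]
print(median_normalise_log_ratios(red, green))  # [0.0, 0.0, -1.0, 1.0, 0.0]
```

Median centring is robust to a few strongly changed genes, which is why it is often preferred over mean centring for this purpose.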
Microarray data and annotation
Figure: the gene expression matrix (rows: genes, columns: samples, cells: gene expression levels), accompanied by sample annotation and gene annotation.
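An expression matrix is only interpretable together with its annotation. The sketch below keeps the matrix (rows = genes, columns = samples) next to per-gene and per-sample annotation dictionaries, so one can pull out a gene's profile or select samples by an annotation value. The class, field, gene, and sample names are all hypothetical and not taken from ArrayExpress.

```python
from dataclasses import dataclass, field


@dataclass
class ExpressionMatrix:
    """Genes x samples matrix plus annotation (hypothetical structure)."""

    genes: list[str]
    samples: list[str]
    values: list[list[float]]  # values[i][j] = level of genes[i] in samples[j]
    gene_annotation: dict[str, dict[str, str]] = field(default_factory=dict)
    sample_annotation: dict[str, dict[str, str]] = field(default_factory=dict)

    def __post_init__(self):
        # Enforce the matrix shape: one row per gene, one column per sample.
        if len(self.values) != len(self.genes):
            raise ValueError("expected one row per gene")
        if any(len(row) != len(self.samples) for row in self.values):
            raise ValueError("expected one column per sample")

    def profile(self, gene):
        """Expression levels of one gene across all samples."""
        return dict(zip(self.samples, self.values[self.genes.index(gene)]))

    def samples_where(self, key, value):
        """Names of samples whose annotation has key == value."""
        return [s for s in self.samples
                if self.sample_annotation.get(s, {}).get(key) == value]


m = ExpressionMatrix(
    genes=["geneA", "geneB"],
    samples=["s1", "s2", "s3"],
    values=[[0.5, 1.2, -0.3], [2.0, 0.1, 0.0]],
    gene_annotation={"geneA": {"description": "hypothetical gene"}},
    sample_annotation={"s1": {"strain": "wild type"},
                       "s2": {"strain": "mutant"},
                       "s3": {"strain": "mutant"}},
)
print(m.profile("geneB"))                   # {'s1': 2.0, 's2': 0.1, 's3': 0.0}
print(m.samples_where("strain", "mutant"))  # ['s2', 's3']
```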
Traditions of data sharing in the Life Sciences
Data used in publications should be made available so that:
–the experiments can be reproduced and the conclusions verified
–others can build on the results
In genome sequencing, this has evolved into submissions to the public sequence databases DDBJ/EMBL/GenBank; most journals require such submissions.
Sharing microarray data: which data?
Figure: candidate data levels (A to D): array scans, spots, quantitations, and the genes × samples matrix.
MGED (Microarray Gene Expression Data Society) standards: MIAME (Minimum Information About a Microarray Experiment)
MGED: MIAME, the 6 parts of a microarray experiment (figure)
–Array: array design information (location of each element, description of each element)
–Sample: sample source, sample treatments, extraction protocol, labeling protocol
–Hybridisation: hybridization protocol
–Image: scanning protocol, software specifications
–Quantification matrix: analysis protocol, software specifications
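MIAME is a checklist of what must be reported, not a file format. A minimal sketch of how a submission tool might check a submission against the MIAME parts listed on this slide; the part and field keys are hypothetical identifiers chosen for the example, not the official MIAME or MAGE-ML vocabulary, and the submission values are invented.

```python
# Checklist adapted from the MIAME slide; keys are hypothetical identifiers,
# not the official MIAME/MAGE-ML vocabulary.
MIAME_CHECKLIST = {
    "array": ["element_locations", "element_descriptions"],
    "sample": ["source", "treatments", "extraction_protocol", "labeling_protocol"],
    "hybridisation": ["hybridization_protocol"],
    "image": ["scanning_protocol", "software"],
    "quantification": ["analysis_protocol", "software"],
}


def missing_annotation(submission):
    """Return 'part.field' names that are absent or empty in a submission."""
    missing = []
    for part, fields in MIAME_CHECKLIST.items():
        section = submission.get(part, {})
        for name in fields:
            if not section.get(name):
                missing.append(f"{part}.{name}")
    return missing


# Hypothetical submission: labeling protocol and quantification details missing.
submission = {
    "array": {"element_locations": "layout.txt", "element_descriptions": "probes.txt"},
    "sample": {"source": "S. pombe culture", "treatments": "none",
               "extraction_protocol": "P-EXTR-1"},
    "hybridisation": {"hybridization_protocol": "P-HYB-1"},
    "image": {"scanning_protocol": "P-SCAN-1", "software": "scanner software"},
}
print(missing_annotation(submission))
# ['sample.labeling_protocol', 'quantification.analysis_protocol', 'quantification.software']
```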
Figure: structure of a microarray experiment. Samples → extracts → labelled extracts (colours relate to labels) → hybridizations (shapes relate to array designs), all grouped under an experiment name. Example: Rustici et al., S. pombe cell-cycle mutant data (2004).
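The experiment figure is a directed graph: each extract derives from a sample, each labelled extract from an extract, and each hybridization combines one or more labelled extracts. A sketch, with hypothetical node IDs for a two-colour hybridization, that walks the graph backwards to find which samples contributed to a given hybridization:

```python
from collections import deque

# Hypothetical two-colour experiment: each node lists what it was derived from.
derived_from = {
    "extract_wt": ["sample_wt"],
    "extract_mut": ["sample_mut"],
    "labelled_wt_Cy3": ["extract_wt"],
    "labelled_mut_Cy5": ["extract_mut"],
    "hyb_1": ["labelled_wt_Cy3", "labelled_mut_Cy5"],
}


def source_samples(node, graph):
    """Breadth-first walk back from `node`; nodes with no recorded parent are samples."""
    seen, roots = {node}, set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        parents = graph.get(current, [])
        if not parents:
            roots.add(current)  # nothing upstream: treat as a source sample
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(roots)


print(source_samples("hyb_1", derived_from))  # ['sample_mut', 'sample_wt']
```

Storing "derived from" links rather than fixed levels lets the same walk handle experiments where extracts are pooled or split.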
Database architecture (figure)
Components: MIAMExpress, the ArrayExpress Repository, the AE Data Warehouse, and external applications exchanging data as MAGE-ML (XML).
User functionality: data upload (submissions, with submission support and curation), data download, visualisation, gene-, sample-, and experiment-centric queries, and retrieval of raw and processed data for analysis.
MIAMExpress
–Submission and annotation tool
–Potential local data annotation tool
–Based on MIAME concepts
–Accepts protocol, array, and experiment submissions
–User accounts allow re-use of protocols and arrays
–Works with your own or commercial arrays
MIAMExpress schema
ArrayExpress: a public repository for microarray data at the EBI
Data in ArrayExpress
Figure: submissions by pipelines vs. online (MIAMExpress) submissions
ArrayExpress data by organism (figure); total: ~7000 hybridisations
Gene-centric query prototype (new!), driven by a BioMart backend
Expression Profiler: an online microarray data analysis platform
What can you do with the data?
...view it as a heatmap (Expression Profiler Data Viewer component)
What can you do with the data? ...cluster it (Expression Profiler Hierarchical Clustering component)
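Hierarchical clustering repeatedly merges the two closest clusters until one remains. A minimal pure-Python sketch of one common configuration: average linkage with 1 - Pearson correlation as the distance between gene profiles. This is an illustration, not necessarily the distance or linkage the Expression Profiler component uses; it is naive (O(n³)) and meant only for small examples.

```python
import math
from itertools import combinations


def pearson_distance(x, y):
    """1 - Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Naive agglomerative clustering with average linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are sorted row indices.
    """
    pair = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i, j in combinations(range(len(profiles)), 2)}

    def dist(a, b):
        # Average of all gene-to-gene distances between the two clusters.
        return sum(pair[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [frozenset([i]) for i in range(len(profiles))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: dist(*ab))
        merges.append((sorted(a), sorted(b), dist(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges


# Hypothetical profiles: genes 0/1 rise together, genes 2/3 fall together.
genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1], [8, 6, 4, 2]]
for merge in average_linkage(genes):
    print(merge)
# ([0], [1], 0.0)
# ([2], [3], 0.0)
# ([0, 1], [2, 3], 2.0)
```

A correlation-based distance groups genes by the shape of their profiles rather than their absolute levels, which is why genes 0 and 1 merge at distance 0 despite differing twofold.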
What can you do with the data? ...look at Gene Ontology enrichment of a selected cluster (Expression Profiler GO Annotation component)
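GO enrichment asks whether a GO term is over-represented in a selected cluster relative to all genes on the array. A common way to score this, and an assumption here since the slide does not name the component's statistic, is the one-sided hypergeometric test: the probability of seeing at least k annotated genes in a random draw of n genes. The function name and counts are hypothetical, and no correction for testing many GO terms is applied.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes on the array, K: of those annotated with the GO term,
    n: genes in the selected cluster, k: of those annotated with the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Sum P(X = i) for i = k..upper; comb() returns 0 when i exceeds what is possible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical counts: all 3 genes in a cluster carry a term that
# 4 of the 10 genes on the array carry.
print(go_enrichment_p(3, 3, 4, 10))  # 4/120, about 0.033
```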
What can you do with the data? ...see how different clusterings compare (Expression Profiler Clustering Comparison component)
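One simple, widely used way to compare two clusterings of the same genes is the Rand index: the fraction of gene pairs on which the two clusterings agree (placed together in both, or apart in both). It is shown here only as an illustration; the Clustering Comparison component may use a different method. Cluster labels are arbitrary, so renaming them does not change the score.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    labels_a[i] and labels_b[i] are the cluster labels of item i;
    requires at least two items.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same items")
    agree = total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b  # True counts as 1
        total += 1
    return agree / total


# Hypothetical clusterings of four genes: the second splits cluster 1 in two.
print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))  # 5/6, about 0.833
```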
What can you do with the data? ...integrate several data types together (Expression Profiler Three-way Similarity Analysis component)
Available components
–Data Selection
–Data Transformation
–Missing Value Imputation
–Hierarchical Clustering & K-groups Clustering
–Clustering Comparison
–Signature Algorithm
–Sequence Homology
–SPEXS: Promoter Discovery
–Visual Pattern Matching
–Ordination (COA, PCA)
–Between Group Analysis
–Three-way Similarity Analysis
–GO Annotation
Uses:
–ArrayExpress suite of tools
–Standalone tool
–Locally installed (UJI, UMC Utrecht)
–Teaching tool
–Pipelines, workflows, high-throughput analysis
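Missing value imputation is listed as a component without a method. A minimal sketch of one common choice, k-nearest-neighbour imputation: each missing value is replaced by the mean of that column in the k rows (genes) most similar to it, measured over the columns both rows have observed. The function name, the root-mean-square distance, the default k, and the data are assumptions for illustration, not Expression Profiler's implementation.

```python
import math


def knn_impute(rows, k=2):
    """Replace None entries with the mean of that column in the k nearest rows.

    Distance: root-mean-square difference over columns observed in both rows.
    Neighbours must have the target column observed; values with no usable
    neighbour stay None. The input is not modified.
    """
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[col] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                rms = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((rms, j))
            candidates.sort()  # nearest first; ties broken by row index
            nearest = [rows[j][col] for _, j in candidates[:k]]
            if nearest:
                imputed[i][col] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical expression values; gene 0 is missing its third measurement.
rows = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 2.0],
    [5.0, 6.0, 9.0],
]
print(knn_impute(rows, k=2)[0])  # [1.0, 2.0, 2.5]
```

Using neighbouring genes rather than a row or column mean exploits the fact that co-regulated genes tend to have similar profiles.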
Acknowledgements
Expression Profiler contributors
–Original EP development: Jaak Vilo (Tartu), Patrick Kemmeren (Utrecht), Misha Kapushesky
–EP:NG framework development: Patrick Kemmeren (Utrecht), Misha Kapushesky, Caroline Johnston (UCL)
–Visualization components: Misha Kapushesky, Steffen Durinck (Leuven), Phil Hyoun Lee
–Clustering comparison: Aurora Torrente, Christine Körner (Leipzig)
–PCA/COA/BGA: Aedín Culhane (Cork)
–Signature Algorithm: Jan Ihmels (Tel-Aviv)
–Gene ordering: Karlis Freivalds (Riga)
–Normalisation: Caroline Johnston (UCL)
–Web services: Antonio Estruch (UJI)
EBI Microarray Informatics Team
–Alvis Brazma, Head of Microarray Informatics Group
–Ahmet Oezcimen, Scientist (Oracle DBA)
–Anastasia Samsonova, PhD student
–Anjan Sharma, Scientist (Software Developer)
–Anna Farne, Scientist (Curation)
–Aurora Torrente, PhD student
–Bhuwan Tiwari, Trainee
–Catherine Leroy, Summer Student
–Ele Holloway, Scientist (Curation)
–Gabriella Rustici, Scientist (Postdoc)
–Gaurab Mukherjee, Scientist (Curation)
–Gonzalo Garcia Lara, Scientist (Web Designer/Programmer)
–Helen Parkinson, Scientist (Curation Coordinator)
–Jaak Vilo, Consultant
–Lev Soinov, Scientist (Postdoc Wellcome Trust)
–Misha Kapushesky, Scientist (Scientific Application Programmer)
–Mohammadreza Shojatalab, Scientist (Database Programmer)
–Niran Abeygunawardena, Scientist (Web Designer/Programmer)
–Patrick Kemmeren, Consultant
–Per Lilja, Scientist (Database Programmer)
–Philippe Rocca-Serra, Scientist (Nutrigenomics Proj. Coordinator)
–Pierre Marguerite, Summer Student
–Richard Coulson, Scientist (Biosapiens Project)
–Sergio Contrino, Scientist (Database Programmer)
–Steffen Durinck, Student
–Susanna-Assunta Sansone, Scientist (Toxicogenomics Proj. Coordinator)
–Tim Rayner, Scientist (Curation)
–Ugis Sarkans, Scientist (Database Development Coordinator)