ArrayExpress: a public database for microarray-based gene expression data. European Bioinformatics Institute (EMBL-EBI). Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team. MGED IV, Boston, February 2002.
ArrayExpress. Standards: MIAME-compliant. Data model: MAGE-OM. Data input: MAGE-ML, web. Data output: HTML, MAGE-ML, TAB-delimited, link to Expression Profiler. Data curation: team of curators. Data sets: yeast, human. Opened to the public: Tuesday, February 12th, 2002.
General overview (diagram). Components: ArrayExpress, MIAMExpress, Expression Profiler. Connections labelled: MAGE-ML, Internet (www), MAGE-ML.
ArrayExpress component architecture (diagram). Main database: SQL, derived from MAGE-OM. Data warehouse: gene-centred queries. Application server: Java servlets, MAGE-OM. Images: file server. Submission/curation: MAGE-ML. Access: Internet (www).
ArrayExpress: features. MIAME-compliant; MAGE-ML, MAGE-OM. Can deal with: raw quantitation data, processed data, data transformations. Independent of: experimental platforms, image analysis methods, data normalization methods.
ArrayExpress: details. Database schema derived from MAGE-OM. Standard SQL (we use Oracle). Data loader for MAGE-ML (generated). Web interface (first release): queries by experiment, array, sample; browsing. Object model-based query mechanism with automatic mapping to SQL.
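The idea of an object model-based query that is mapped automatically to SQL can be sketched roughly as follows. This is only an illustration, not the ArrayExpress implementation: the `Query` class, the `MAPPING` dictionary, and the `experiment` table with its `name` and `species` columns are all hypothetical, and Python's built-in `sqlite3` stands in for the Oracle database mentioned on the slide.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Query:
    """A tiny query object: an entity name plus equality filters (hypothetical)."""
    entity: str
    filters: dict = field(default_factory=dict)


# Hypothetical mapping from object-model classes to tables and columns.
MAPPING = {
    "Experiment": {
        "table": "experiment",
        "columns": {"name": "name", "species": "species"},
    },
}


def to_sql(query):
    """Translate a Query into a parameterised SELECT statement."""
    meta = MAPPING[query.entity]
    clauses, params = [], []
    for attr, value in query.filters.items():
        clauses.append(f'{meta["columns"][attr]} = ?')
        params.append(value)
    sql = f'SELECT * FROM {meta["table"]}'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# Demo against an in-memory database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE experiment (name TEXT, species TEXT)")
conn.executemany(
    "INSERT INTO experiment VALUES (?, ?)",
    [("exp-a", "yeast"), ("exp-b", "human")],
)
sql, params = to_sql(Query("Experiment", {"species": "yeast"}))
rows = conn.execute(sql, params).fetchall()
print(sql, params, rows)
```

Keeping the class-to-table mapping in one data structure means the query code never hard-codes table names, which is one plausible way a schema derived from an object model could be queried generically.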
Simplified ArrayExpress model
MIAMExpress: data annotation and submission tool. MIAME-based web interface. Experiment, array and protocol submissions. Uses controlled vocabularies (CV) and ontologies wherever possible. Creates MAGE-ML files for loading into ArrayExpress. Based on MySQL, Perl, CGI, Apache.
MIAMExpress submission procedure (flow diagram). Create account, login, then pending/new experiment. Samples 1…n (sample protocol); extracts E1…En (extraction protocol); hybridisations on arrays 1…n (hyb protocol); scanning protocol; data 1…n (image analysis protocol); combined experiment data (data transformation protocol); final free-text comment; submit.
MIAMExpress design and future. Species- and domain-specific pages and ontologies; ontology development. Life-span of data submissions is long. Curation control, submissions tracking. Interaction with ArrayExpress. Full MAGE-OM, data updating. Usability, flexibility, scalability, platform independence. User needs, free in-house installation.
ArrayExpress curation effort. User support and help documentation. Submission support for MIAMExpress. Support on ontologies and CVs. Minimize free text, removal of synonyms. MIAME encouragement. Help on MAGE-ML. Goal: to provide high-quality, well-annotated data to allow automated data analysis.
Accession numbers
E-MEXP-234    Experiment 234 via MIAMExpress
E-SANG-25     Experiment 25 from Sanger Institute
A-AFFY-1034   Array description 1034 from Affymetrix
P-LABL-5      Protocol 5 for labeling
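The accession examples suggest a three-part layout: a one-letter object type (E, A or P), a source code, and a running number. A minimal parsing sketch is given below. It assumes that source codes are always four uppercase letters and that only the three type letters shown occur; both are guesses from the four examples, and the names `Accession`, `KINDS` and `parse_accession` are invented for illustration.

```python
import re
from typing import NamedTuple


class Accession(NamedTuple):
    kind: str     # "experiment", "array description" or "protocol"
    source: str   # e.g. "MEXP", "SANG", "AFFY", "LABL"
    number: int


# Type letters as they appear in the slide's examples.
KINDS = {"E": "experiment", "A": "array description", "P": "protocol"}

# Assumption: four uppercase letters for the source code.
PATTERN = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")


def parse_accession(text):
    """Split an accession such as 'E-MEXP-234' into its three parts."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    letter, source, number = match.groups()
    return Accession(KINDS[letter], source, int(number))


for acc in ["E-MEXP-234", "E-SANG-25", "A-AFFY-1034", "P-LABL-5"]:
    print(acc, parse_accession(acc))
```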
Data in ArrayExpress (columns: "Now" and "Work underway"). Human data (ironchip) from EMBL; yeast data from EMBL; S. pombe data, Sanger Institute; TIGR array descriptions; Affymetrix chip designs; direct pipeline from Sanger (Rob Andrews); HGMP mouse; EMBL mosquito; (Add your name here!)
Data browsing and queries
Experiment info
Sample info
Expression Profiler: EPCLUST (diagram). Labels: DATASELECT, FOLDER, ANALYZE, A CLUSTER. URLMAP: GeneOntology, Pathways, Databases, SPEXS, other tools.
>YAL036C chromo=1 coord=( (C)) start=-600 end=+2 seq=( )
TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTG
CTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTT
CTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTT
CACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTT
TTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTG
TTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_
>YAL025C chromo=1 coord=( (C)) start=-600 end=+2 seq=( )
CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACC
ACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTT
GTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTAT
AATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACC
TTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTG
ACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_
...
>YBR084W chromo=2 coord=( ) start=-600 end=+2 seq=( )
CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCAT
TACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACG
TATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTT
CTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGG
ACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTAC
TGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_
101 sequences, relative to ORF start
GATGAG.T      1:52/70  2:453/508  R:  BP: e-33
G.GATGAG.T    1:39/49  2:193/222  R:  BP: e-33
AAAATTTT      1:63/77  2:833/911  R:  BP: e-32
TGAAAA.TTT    1:45/53  2:333/350  R:  BP: e-31
TG.AAA.TTT    1:53/61  2:538/570  R:  BP: e-31
TG.AAA.TTTT   1:40/43  2:254/260  R:  BP: e-30
TGAAA..TTT    1:54/65  2:608/645  R:  BP:1.0887e
Plot labels: GATGAG.T, TGAAA..TTT, YGR128C + 100
Upstream sequence (600 bp), figure. Patterns shown: GATGAG.T, TGAAA..TTT. Legend: GATGAG.T W/30; TGAAA..TTT, 1 mismatch.
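Patterns such as GATGAG.T and TGAAA..TTT can be located in an upstream sequence with a simple sliding-window scan. The sketch below treats "." as a position that matches any nucleotide and optionally tolerates a number of mismatches, in the spirit of the "1 mismatch" legend. It is an illustrative assumption about how such matching could work, not the PATMATCH or SPEXS code; `find_matches` is a made-up name, the first test string is a short excerpt of the YAL036C upstream region, and the second is invented.

```python
def find_matches(pattern, sequence, max_mismatches=0):
    """Return 0-based start positions where pattern occurs in sequence.

    A '.' in the pattern matches any character and never counts as a
    mismatch. Windows with at most max_mismatches differing positions
    are reported, so overlapping hits are found as well.
    """
    hits = []
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            1 for p, s in zip(pattern, window) if p != "." and p != s
        )
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Short excerpt of the YAL036C upstream sequence.
excerpt = "TTTCATCGATGAGCTATAAG"
print(find_matches("GATGAG.T", excerpt))

# Invented sequence that only matches when one mismatch is allowed.
toy = "CCTGAAAGGTTACC"
print(find_matches("TGAAA..TTT", toy))
print(find_matches("TGAAA..TTT", toy, max_mismatches=1))
```

To report positions relative to the ORF start, one could subtract 600 from each index, assuming index 0 of the stored sequence corresponds to position -600.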
Components of Expression Profiler. EPCLUST: expression data. GENOMES: sequence, function, annotation. SPEXS: discover patterns. URLMAP: provide links. PATMATCH: visualise patterns. EP:GO: GeneOntology. EP:PPI: protein-protein interactions. SEQLOGO. Inputs: expression data; external data and tools (pathways, function, etc.).
Acknowledgments: the team (3), November 1999, MGED 1 in Hinxton, EBI. Alvis Brazma, Alan Robinson, Jaak Vilo.
Acknowledgments: the team (5), August 2000. Alvis Brazma, Alan Robinson. Database: Ugis Sarkans. Expression Profiler: Jaak Vilo. Research, students: Thomas Schlitt.
Acknowledgments: the team (9), June 2001. Alvis Brazma. Database: Ugis Sarkans. Curation: Helen Parkinson. MIAMExpress: Mohammadreza Shojatalab. Expression Profiler: Jaak Vilo. Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren.
Acknowledgments: the team (19), February 2002. Alvis Brazma. Database, Curation, MIAMExpress: Ugis Sarkans, Gonzalo Garcia, Helen Parkinson, Mohammadreza Shojatalab. Expression Profiler: Jaak Vilo. Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren. Further team members: Misha Kapushesky, Lev Soinov, Koichi Tazaki, Anastasia Samsonova, Susanna Sansone, Philippe Rocca-Serra, Ele Holloway, Niran Abeygunawardena, Ahmet Oezcimen.