ArrayExpress: a public database for microarray-based gene expression data (European Bioinformatics Institute, EMBL-EBI)


1 ArrayExpress: a public database for microarray-based gene expression data. http://www.ebi.ac.uk/microarray/ European Bioinformatics Institute (EMBL-EBI). Alvis Brazma, Helen Parkinson, Ugis Sarkans, Mohammadreza Shojatalab, Jaak Vilo + team. MGED IV, Boston, February 2002.

2 ArrayExpress. Standards: MIAME-compliant. Data model: MAGE-OM. Data input: MAGE-ML, web. Data output: HTML, MAGE-ML, tab-delimited, link to Expression Profiler. Data curation: team of curators. Data sets: yeast, human. Opened to the public on Tuesday, February 12th, 2002.

3 General overview (diagram labels): ArrayExpress; MIAMExpress; Expression Profiler; MAGE-ML; Internet; www; MAGE-ML.

4 ArrayExpress component architecture (diagram labels): main database, SQL derived from MAGE-OM; data warehouse, gene-centred queries; application server, Java servlets; MAGE-OM; images file server; ArrayExpress; MAGE-ML; submission/curation; Internet; www.

5 ArrayExpress features: MIAME-compliant, MAGE-ML, MAGE-OM. Can deal with: raw quantitation data, processed data, data transformations. Independent of: experimental platforms, image analysis methods, data normalization methods.

6 ArrayExpress details: database schema derived from MAGE-OM. Standard SQL; we use Oracle. Data loader for MAGE-ML (generated). Web interface (first release 12.2.2002). Queries by experiment, array, sample. Browsing. Object model-based query mechanism with automatic mapping to SQL.
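The slide mentions an object model-based query mechanism that is mapped automatically to SQL, with queries by experiment, array and sample. The idea can be sketched roughly as follows. This is only an illustration: the class `ExperimentQuery`, the view `experiment_view` and all column names are hypothetical, since the transcript does not show the actual schema or query classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical mapping from query fields to column names.
# The real schema is derived from MAGE-OM and is not shown in the slides.
COLUMNS = {
    "experiment": "experiment_name",
    "array": "array_name",
    "sample": "sample_name",
}


@dataclass
class ExperimentQuery:
    """A query object with one optional criterion per field named on the slide."""
    experiment: Optional[str] = None
    array: Optional[str] = None
    sample: Optional[str] = None


def to_sql(query: ExperimentQuery) -> Tuple[str, List[str]]:
    """Translate the query object into a parameterised SQL string and its values."""
    clauses: List[str] = []
    params: List[str] = []
    for field_name, column in COLUMNS.items():
        value = getattr(query, field_name)
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT experiment_id FROM experiment_view"  # hypothetical view name
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Keeping the values in a separate parameter list, rather than pasting them into the SQL text, is a design choice of this sketch and not something the slides describe.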

7 Simplified ArrayExpress model

8 MIAMExpress: data annotation and submission tool. MIAME-based web interface. Experiment, array and protocol submissions. Uses CV/ontology wherever possible. Creates MAGE-ML files for loading into ArrayExpress. Based on MySQL, Perl, CGI, Apache.

10 MIAMExpress submission procedure (flow diagram labels): create account; login; pending/new experiment; samples 1…n, sample protocol; extracts 1…n (E1, E2, …, En), extraction protocol; hybridisations, hyb protocol; arrays 1…n, scanning protocol; data 1…n, image analysis protocol; combined experiment data, transformation protocol; final free-text comment; submit.
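The submission procedure on this slide can be modelled as a small data structure with a completeness check before submitting. The sketch below is an assumption-laden illustration: the class names, field names and the rule that every protocol and data file is required are hypothetical and are not taken from MIAMExpress itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Protocol kinds named on the slide; treating all of them as mandatory is an assumption.
REQUIRED_PROTOCOLS = (
    "sample",
    "extraction",
    "hybridisation",
    "scanning",
    "image analysis",
    "data transformation",
)


@dataclass
class Hybridisation:
    """One sample -> extract -> array chain with its resulting data file."""
    sample: str
    extract: str
    array: str
    data_file: str = ""  # empty until a data file has been attached


@dataclass
class Submission:
    """A pending experiment submission (hypothetical structure)."""
    experiment: str
    protocols: Dict[str, str] = field(default_factory=dict)
    hybridisations: List[Hybridisation] = field(default_factory=list)
    combined_data: str = ""
    comment: str = ""  # final free-text comment, optional in this sketch


def missing_items(sub: Submission) -> List[str]:
    """List what still has to be supplied before the submission is complete."""
    missing = [f"{name} protocol" for name in REQUIRED_PROTOCOLS
               if name not in sub.protocols]
    if not sub.hybridisations:
        missing.append("at least one hybridisation")
    for index, hyb in enumerate(sub.hybridisations, start=1):
        if not hyb.data_file:
            missing.append(f"data file for hybridisation {index}")
    if not sub.combined_data:
        missing.append("combined experiment data")
    return missing
```

A submission would be ready to send when `missing_items` returns an empty list.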

11 MIAMExpress design and future: species- and domain-specific pages and ontologies, ontology development; life-span of data submissions is long; curation control, submissions tracking; interaction with ArrayExpress; full MAGE-OM, data updating; usability, flexibility, scalability, platform independence; user needs, free in-house installation.

12 ArrayExpress curation effort: user support and help documentation; submission support for MIAMExpress; support on ontologies and CVs; minimize free text, removal of synonyms; MIAME encouragement; help on MAGE-ML. Goal: to provide high-quality, well-annotated data to allow automated data analysis.

13 Accession numbers: E-MEXP-234, experiment 234 via MIAMExpress; E-SANG-25, experiment 25 from Sanger Institute; A-AFFY-1034, array description 1034 from Affymetrix; P-LABL-5, protocol 5 for labeling.
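The four accession numbers on this slide share one shape: a type letter, a source code and a running number, joined by hyphens. A minimal parser for that shape might look like the sketch below. The restriction to the letters E, A and P and to four-letter source codes is an assumption drawn only from the examples shown; the function and class names are made up for illustration.

```python
import re
from typing import NamedTuple

# Type letters as they appear in the slide's examples.
ACCESSION_TYPES = {
    "E": "experiment",
    "A": "array description",
    "P": "protocol",
}

# One type letter, a four-letter source code, and a number (assumed format).
ACCESSION_RE = re.compile(r"^([EAP])-([A-Z]{4})-(\d+)$")


class Accession(NamedTuple):
    kind: str
    source: str
    number: int


def parse_accession(text: str) -> Accession:
    """Split an accession such as 'E-MEXP-234' into its three parts."""
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised accession: {text!r}")
    type_letter, source, number = match.groups()
    return Accession(ACCESSION_TYPES[type_letter], source, int(number))
```

For example, `parse_accession("A-AFFY-1034")` yields the kind "array description", the source "AFFY" and the number 1034.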

14 Data in ArrayExpress (grouped on the slide under "Now" and "Work underway"): human data (ironchip) from EMBL; yeast data from EMBL; S. pombe data from the Sanger Institute; TIGR array descriptions; Affymetrix chip designs; direct pipeline from Sanger (Rob Andrews); HGMP mouse; EMBL mosquito; (Add your name here!).

15 Data browsing and queries

17 Experiment info

18 Sample info

19 General overview (diagram labels): ArrayExpress; MIAMExpress; Expression Profiler; MAGE-ML; Internet; www; MAGE-ML.

20 Expression Profiler (diagram labels): EPCLUST; DATASELECT; FOLDER; ANALYZE A CLUSTER; URLMAP; GeneOntology; Pathways; Databases; SPEXS; other tools.

21 Upstream sequences (FASTA):

>YAL036C chromo=1 coord=(76154-75048(C)) start=-600 end=+2 seq=(76152-76754)
TGTTCTTTCTTCTTCTGCTTCTCCTTTTCCTTTTTTTCCTTCTCCTTTTCCTTCTTGGACTTTAGTATAGGCTTACCATCCTTCTTCTCTTCAATAACCTTCTTTTCTTG
CTTCTTCTTCGATTGCTTCAAAGTAGACATGAAGTCGCCTTCAATGGCCTCAGCACCTTCAGCACTTGCACTTGCTTCTCTGGAAGTGTCATCTGCACCTGCGCTGCTTT
CTGGATTTGGAGTTGGCGTGGCACTGATTTCTTCGTTCTGGGCGGCGTCTTCTTCGAATTCCTCATCCCAGTAGTTCTGTTGGTTCTTTTTACTCTTTTTCGCCATCTTT
CACTTATCTGATGTTCCTGATTGCCCTTCTTATCCCCTCAAAGTTCACCTTTGCCACTTATTCTAGTGCAAGATCTCTTGCTTTCAATGGGCTTAAAGCTTGAAAAATTT
TTTCACATCACAAGCGACGAGGGCCCGTTTTTTTCATCGATGAGCTATAAGAGTTTTCCACTTTTAAGATGGGATATTACGGTGTGATGAGGGCGCAATGATAGGAAGTG
TTTGAAGCTAGATGCAGTAGGTGCAAGCGTAGAGTTGTTGATTGAGCAAA_ATG_
>YAL025C chromo=1 coord=(101147-100230(C)) start=-600 end=+2 seq=(101145-101747)
CTTAGAAGATAAAGTAGTGAATTACAATAAATTCGATACGAACGTTCAAATAGTCAAGAATTTCATTCAAAGGGTTCAATGGTCCAAGTTTTACACTTTCAAAGTTAACC
ACGAATTGCTGAGTAAGTGTGTTTATATTAGCACATTAACACAAGAAGAGATTAATGAACTATCCACATGAGGTATTGTGCCACTTTCCTCCAGTTCCCAAATTCCTCTT
GTAAAAAACTTTGCATATAAAATATACAGATGGAGCATATATAGATGGAGCATACATACATGTTTTTTTTTTTTTAAAAACATGGACTCGAACAGAATAAAAGAATTTAT
AATGATAGATAATGCATACTTCAATAAGAGAGAATACTTGTTTTTAAATGAGAATTGCTTTCATTAGCTCATTATGTTCAGATTATCAAAATGCAGTAGGGTAATAAACC
TTTTTTTTTTTTTTTTTTTTTTTTGAAAAATTTTCCGATGAGCTTTTGAAAAAAAATGAAAAAGTGATTGGTATAGAGGCAGATATTGCATTGCTTAGTTCTTTCTTTTG
ACAGTGTTCTCTTCAGTACATAACTACAACGGTTAGAATACAACGAGGAT_ATG_
...
>YBR084W chromo=2 coord=(411012-413936) start=-600 end=+2 seq=(410412-411014)
CCATGTATCCAAGACCTGCTGAAGATGCTTACAATGCCAATTATATTCAAGGTCTGCCCCAGTACCAAACATCTTATTTTTCGCAGCTGTTATTATCATCACCCCAGCAT
TACGAACATTCTCCACATCAAAGGAACTTTACGCCATCCAACCAATCGCATGGGAACTTTTATTAAATGTCTACATACATACATACATCTCGTACATAAATACGCATACG
TATCTTCGTAGTAAGAACCGTCACAGATATGATTGAGCACGGTACAATTATGTATTAGTCAAACATTACCAGTTCTCGAACAAAACCAAAGCTACTCCTGCAACACTCTT
CTATCGCACATGTATGGTTCTTATTGTTTCCCGAGTTCTTTTTTACTGACGCGCCAGAACGAGTAAGAAAGTTCTCTAGCGCCATGCTGAAATTTTTTTCACTTCAACGG
ACAGCGATTTTTTTTCTTTTTCCTCCGAAATAATGTTGCAGCGGTTCTCGATGCCTCAAGAATTGCAGAAGTAAACCAGCCAATACACATCAAAAAACAACTTTCATTAC
TGTGATTCTCTCAGTCTGTTCATTTGTCAGATATTTAAGGCTAAAAGGAA_ATG_

101 Sequences relative to ORF start

Patterns:
GATGAG.T 1:52/70 2:453/508 R:7.52345 BP:1.02391e-33
G.GATGAG.T 1:39/49 2:193/222 R:13.244 BP:2.49026e-33
AAAATTTT 1:63/77 2:833/911 R:4.95687 BP:5.02807e-32
TGAAAA.TTT 1:45/53 2:333/350 R:8.85687 BP:1.69905e-31
TG.AAA.TTT 1:53/61 2:538/570 R:6.45662 BP:3.24836e-31
TG.AAA.TTTT 1:40/43 2:254/260 R:10.3214 BP:3.84624e-30
TGAAA..TTT 1:54/65 2:608/645 R:5.82106 BP:1.0887e-29
...

Figure labels: GATGAG.T; TGAAA..TTT; YGR128C + 100
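The pattern listing on this slide pairs each pattern with counts of the form a/b for two sequence sets. A plausible reading, assumed here and not stated on the slide, is that a is the number of sequences containing the pattern and b the total number of occurrences, with "." acting as a single-character wildcard. The sketch below reads FASTA text like that shown above and computes such counts. The function names are hypothetical, counting overlapping occurrences is an assumption, and `ratio` is only one possible interpretation of the R column.

```python
import re
from typing import Dict, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Return {identifier: sequence}; the identifier is the first token after '>'."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            # Drop the underscores that mark the start codon (_ATG_) on the slide.
            records[name] += line.replace("_", "")
    return records


def count_pattern(pattern: str, sequences: Dict[str, str]) -> Tuple[int, int]:
    """Return (sequences containing the pattern, total occurrences).

    '.' matches any single character. A lookahead is used so that
    overlapping occurrences are counted (an assumption of this sketch).
    """
    body = "".join("." if ch == "." else re.escape(ch) for ch in pattern)
    regex = re.compile(f"(?={body})")
    with_hit = 0
    occurrences = 0
    for sequence in sequences.values():
        hits = sum(1 for _ in regex.finditer(sequence))
        if hits:
            with_hit += 1
        occurrences += hits
    return with_hit, occurrences


def ratio(hits_1: int, size_1: int, hits_2: int, size_2: int) -> float:
    """Fraction of set 1 containing the pattern divided by the same fraction for set 2."""
    return (hits_1 / size_1) / (hits_2 / size_2)
```

Calling `count_pattern("GATGAG.T", read_fasta(text))` on a cluster file and on a background file would give the two count pairs needed for `ratio`.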

22 Upstream sequence (600 bp), figure labels: GATGAG.T; TGAAA..TTT; GATGAG.T W/30; TGAAA..TTT 1 mismatch.
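The label "1 mismatch" suggests that pattern occurrences are also located when one position differs. A simple way to do this, shown below as a sketch and not as the method used by Expression Profiler, is to slide the pattern along the sequence and count disagreeing positions, with "." always counting as a match. The function name and return format are hypothetical.

```python
from typing import List, Tuple


def find_approximate(pattern: str, sequence: str,
                     max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every window within the mismatch limit.

    '.' in the pattern is a wildcard and never counts as a mismatch.
    """
    hits: List[Tuple[int, int]] = []
    width = len(pattern)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            1 for p, s in zip(pattern, window) if p != "." and p != s
        )
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

With `max_mismatches=0` the function reduces to exact wildcard matching, so the same routine could serve both labels on the slide.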

24 Components of Expression Profiler (http://ep.ebi.ac.uk/): EPCLUST, expression data; GENOMES, sequence, function, annotation; SPEXS, discover patterns; URLMAP, provide links; PATMATCH, visualise patterns; EP:GO, GeneOntology; EP:PPI, protein-protein interactions; SEQLOGO. Inputs: expression data; external data, tools, pathways, function, etc.

25 Acknowledgments: the team (3). Alvis Brazma, Alan Robinson, Jaak Vilo. November 1999, MGED 1 in Hinxton, EBI.

26 Acknowledgments: the team (5). Alvis Brazma, Alan Robinson. Database: Ugis Sarkans. Expression Profiler: Jaak Vilo. Research, students: Thomas Schlitt. August 2000.

27 Acknowledgments: the team (9). Alvis Brazma. Database: Ugis Sarkans. Curation: Helen Parkinson. MIAMExpress: Mohammadreza Shojatalab. Expression Profiler: Jaak Vilo. Research, students: Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren. June 2001.

28 Acknowledgments: the team (19). Groups: Database; Curation; MIAMExpress; Expression Profiler; Research, students. Alvis Brazma, Ugis Sarkans, Gonzalo Garcia, Helen Parkinson, Mohammadreza Shojatalab, Jaak Vilo, Thomas Schlitt, Katja Kivinen, Johan Rung, Patrick Kemmeren, Misha Kapushesky, Lev Soinov, Koichi Tazaki, Anastasia Samsonova, Susanna Sansone, Philippe Rocca-Serra, Ele Holloway, Niran Abeygunawardena, Ahmet Oezcimen. February 2002.

