Portals and myGrid
  Stefan Rennick Egglestone
  Mixed Reality Laboratory
  University of Nottingham
Introduction to myGrid
  A computer science pilot project working in the field of bioinformatics
  A consortium of the European Bioinformatics Institute, IT Innovations, 5 universities and some industrial partners
  Ends June 2005; other projects will develop the infrastructure further
Presentation aims
  Introduce myGrid
  Introduce bioinformatics
  Introduce portal work in myGrid
  Show some screenshots of portlets
Introduction to bioinformatics
  How to store, process and publish large volumes of biological data
  Large databases, access and analysis services
  Composite processes involve multiple databases and services
  Automation through workflows
Data in bioinformatics
  Commonly genetic sequences
    – DNA: GCGCATAGCGATGA
    – Protein: MAHPLGPHGVANA
  Meta information
    – Species, chromosome
    – Interesting features
    – Equipment used
    – First published paper referring to sequence
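
As a rough illustration of the slide above, a sequence and its meta information could be held together in one record. This is only a sketch: the class name `SequenceRecord`, its field names and the `is_valid` check are assumptions made for this example, not a format used by myGrid or by any of the databases.

```python
from dataclasses import dataclass, field
from typing import List

DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


@dataclass
class SequenceRecord:
    """Hypothetical record: a sequence plus the meta information from the slide."""
    sequence: str
    kind: str                      # "DNA" or "protein"
    species: str = ""
    chromosome: str = ""
    features: List[str] = field(default_factory=list)
    equipment: str = ""
    first_paper: str = ""          # reference to the first paper on the sequence

    def is_valid(self) -> bool:
        """Check that the sequence only uses letters from its alphabet."""
        alphabet = DNA_ALPHABET if self.kind == "DNA" else PROTEIN_ALPHABET
        return bool(self.sequence) and set(self.sequence.upper()) <= alphabet


dna = SequenceRecord("GCGCATAGCGATGA", "DNA")
protein = SequenceRecord("MAHPLGPHGVANA", "protein")
```

Both example sequences from the slide pass `is_valid()`, whereas the protein string would fail if it were labelled as DNA.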
Data storage
  3 international databases aim to store all DNA sequences (EMBL, GenBank, DDBJ)
  Protein sequences in SwissProt
  Journals require sequences to be submitted to a database before publication
  Smaller databases hold specialist information
Using bioinformatics data
  Database access services
    – Fetch sequence for given ID
    – Fetch similar sequences
  Sequence analysis
    – Look for interesting regions of sequence
  Sequence prediction
    – Predict proteins generated by DNA sequence
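
To give a feel for the last item, the sketch below translates a DNA sequence into a protein sequence using the standard genetic code, reading a single frame from the first base. Real prediction services do considerably more (gene finding, multiple reading frames), so this should be read as the simplest possible case; the function name `translate` is chosen for this example only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon, stopping at the first stop codon.

    Trailing bases that do not form a full codon are ignored.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":      # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GCGCATAGCGATGA"))  # AHSD
```

The DNA example from the earlier slide has 14 bases, so the final two are dropped and four codons are translated.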
Service interface types
  Web-page
  Command-line tool set
  Programming language library client
  SOAP web-service with WSDL interface
Using services
  Often need to combine services with different interface types
  Cut-and-paste from web-page to file and run command-line tool
  Repetitive and time-consuming
  Can be automated using scripts
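
A script that automates the cut-and-paste routine might look roughly like the sketch below. Everything in it is a stand-in: `fetch_sequence` reads from a made-up in-memory table where a real script would call a database access service, and the "command-line tool" is a one-line Python program that just prints the length of the sequence in a file.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Made-up stand-in for a remote sequence database.
FAKE_DATABASE = {"seq1": "GCGCATAGCGATGA", "seq2": "ATGGCGTAA"}


def fetch_sequence(seq_id: str) -> str:
    """Stand-in for 'fetch sequence for given ID'."""
    return FAKE_DATABASE[seq_id]


def run_length_tool(path: Path) -> int:
    """Run a stand-in command-line tool on a file and parse its output."""
    tool = "import sys; print(len(open(sys.argv[1]).read().strip()))"
    result = subprocess.run(
        [sys.executable, "-c", tool, str(path)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def process_ids(seq_ids):
    """For each ID: fetch the sequence, save it to a file, run the tool on it."""
    lengths = {}
    with tempfile.TemporaryDirectory() as tmp:
        for seq_id in seq_ids:
            path = Path(tmp) / f"{seq_id}.txt"
            path.write_text(fetch_sequence(seq_id) + "\n")
            lengths[seq_id] = run_length_tool(path)
    return lengths


print(process_ids(["seq1", "seq2"]))
```

The loop replaces the manual steps, but the glue code is specific to these two services, which is part of the motivation for describing such chains as workflows instead.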
Workflows
myGrid workflow technology
  Freefluo workflow enactor
  Taverna – graphical workbench allowing users to
    – Author workflows
    – Enact and browse results
  myGrid Information Repository
Authoring a workflow
Enacting a workflow
Browsing results
Including services in workflows
  Service invocation done by processor
  Generic processor for SOAP/WSDL web-services
  Custom processor can wrap custom client
  SOAPlab exposes command-line tools as web-services
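
The processor idea can be sketched as follows: each workflow step is a processor that knows how to invoke one service, and a custom processor simply wraps an existing client. The classes `Processor` and `CustomClientProcessor`, the `enact` function and the two toy clients are all hypothetical names invented for this example; they are not the Taverna or Freefluo API.

```python
from typing import Callable, Dict, List


class Processor:
    """Hypothetical workflow step: reads named inputs, returns named outputs."""

    def invoke(self, data: Dict[str, str]) -> Dict[str, str]:
        raise NotImplementedError


class CustomClientProcessor(Processor):
    """Wraps an ordinary client function so it can be used as a workflow step."""

    def __init__(self, client: Callable[[str], str],
                 input_name: str, output_name: str):
        self.client = client
        self.input_name = input_name
        self.output_name = output_name

    def invoke(self, data: Dict[str, str]) -> Dict[str, str]:
        return {self.output_name: self.client(data[self.input_name])}


def enact(processors: List[Processor], inputs: Dict[str, str]) -> Dict[str, str]:
    """Run each processor in order; its outputs become available to later steps."""
    data = dict(inputs)
    for processor in processors:
        data.update(processor.invoke(data))
    return data


# Toy clients standing in for real services (made up for this example).
def to_rna(dna: str) -> str:
    return dna.replace("T", "U")


def count_bases(sequence: str) -> str:
    return str(len(sequence))


workflow = [
    CustomClientProcessor(to_rna, "dna", "rna"),
    CustomClientProcessor(count_bases, "rna", "length"),
]
results = enact(workflow, {"dna": "GCGCATAGCGATGA"})
```

Because every step exposes the same `invoke` interface, the enactor does not need to know whether a step is backed by a web-service, a library client or a wrapped command-line tool.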
Portal in myGrid
  Taverna/Freefluo is a production workflow system, so its interface can’t be hacked around with
  Some interface limitations
    – Difficult to start a new workflow running using the results of an enactment
    – Complex interface, so takes time to master
Text services work
  If enactment of a workflow produces a SwissProt protein sequence record, can extract from this the PubMed ID of the first paper referring to this protein
  Add extra workflow stages which look up related papers
  Might like to re-run these stages as a separate workflow on any new papers found
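
The extraction step could be sketched as below. It assumes the SwissProt flat-file layout in which each reference carries an `RX` cross-reference line containing `PubMed=<id>;`, and it takes the first such line to be the first paper. The sample record is entirely made up (placeholder name and IDs), and `first_pubmed_id` is a name chosen for this example.

```python
import re
from typing import Optional

PUBMED_PATTERN = re.compile(r"PubMed=(\d+);")

# Made-up record fragment with placeholder IDs, for illustration only.
SAMPLE_RECORD = """\
ID   EXAMPLE_PROT
RN   [1]
RX   MEDLINE=00000001; PubMed=11111111;
RN   [2]
RX   PubMed=22222222;
SQ   SEQUENCE
     MAHPLGPHGV ANA
"""


def first_pubmed_id(record_text: str) -> Optional[str]:
    """Return the PubMed ID on the first RX line that has one, else None."""
    for line in record_text.splitlines():
        if line.startswith("RX"):
            match = PUBMED_PATTERN.search(line)
            if match:
                return match.group(1)
    return None


print(first_pubmed_id(SAMPLE_RECORD))  # 11111111
```

The returned ID would then be the input to the extra workflow stages that look up related papers.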
Input form
Monitoring progress
Results
MIR portal work
  Taverna/Freefluo/MIR interface caters for expert user
  Large numbers of users who won’t write workflows but might enact them
  Provide a simpler workflow enactment interface
  Portal useful – all biologists have browser on their desk
Collections of workflows
View workflow
View workflow results
View individual output param
Further details
  Twiki.mygrid.org.uk
  Stefan Rennick Egglestone
  Ian Roberts
  Presentation and notes will be at