Data Integration & Data Mining Tool
Donald Dunbar, BHF CoRE Bioinformatics Team
Edinburgh Bioinformatics Meeting, April 2013
BHF CoRE Bioinformatics
Data Integration and our Data Mining Tool
Our strategy is to help biologists make the most of their ‘-omics’ data
We analyse array and sequence data using current methods
Biologists mine their results in a custom-built, secure, web-based platform
We help integrate other relevant data from the biologists’ labs and the literature
Data Mining Tool
Wish list
– Web accessible
– Secure
– Complex queries across datasets
– Technology agnostic
– Query cross species
– Annotation, statistics and graphs
– Links to external databases
– Include downstream tools
Data Mining Tool
Login via EASE or htaccess (+ VPN)
Built in PHP with a MySQL back end
Generic database structure for statistics
  – counts, intensity, fold change, p-value
Separate annotation tables
Includes experiment details and QC info
Query builder type interface
Output as tables with links
Gene set enrichment, heat-map and literature
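To make the "generic database structure" idea concrete, here is a minimal sketch of one way such a layout could look. It is not the tool's actual schema: the table names, column names, gene symbols and the `build_query` helper are all hypothetical, and SQLite (via Python's standard library) stands in for the MySQL back end. Statistics are stored one row per value with a `stat_type` label, so the same table can hold counts, intensities, fold changes and p-values from any technology, while annotation lives in its own table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL back end (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (            -- hypothetical table: experiment details
    experiment_id INTEGER PRIMARY KEY,
    name TEXT, technology TEXT, species TEXT
);
CREATE TABLE annotation (            -- hypothetical table: separate annotation
    feature_id TEXT PRIMARY KEY,
    entrez_gene_id INTEGER, symbol TEXT
);
CREATE TABLE statistic (             -- hypothetical table: one row per statistic
    experiment_id INTEGER REFERENCES experiment(experiment_id),
    feature_id TEXT REFERENCES annotation(feature_id),
    stat_type TEXT,                  -- e.g. 'count', 'intensity', 'fold_change', 'p_value'
    value REAL
);
""")

# Made-up example data: one experiment, two features.
conn.execute("INSERT INTO experiment VALUES (1, 'array_study', 'array', 'mouse')")
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                 [("probe_1", 1001, "GeneA"), ("probe_2", 1002, "GeneB")])
conn.executemany("INSERT INTO statistic VALUES (?, ?, ?, ?)", [
    (1, "probe_1", "fold_change", 2.5), (1, "probe_1", "p_value", 0.01),
    (1, "probe_2", "fold_change", 3.0), (1, "probe_2", "p_value", 0.20),
])

ALLOWED_OPS = {"<", "<=", ">", ">=", "="}

def build_query(filters):
    """Turn (stat_type, operator, threshold) filters into parameterised SQL.

    Each filter becomes an EXISTS clause, so a feature is returned only if
    it satisfies every filter within the same experiment.
    """
    clauses, params = [], []
    for stat_type, op, threshold in filters:
        if op not in ALLOWED_OPS:          # whitelist operators; values are bound
            raise ValueError(f"unsupported operator: {op}")
        clauses.append(
            "EXISTS (SELECT 1 FROM statistic s"
            " WHERE s.experiment_id = e.experiment_id"
            " AND s.feature_id = a.feature_id"
            f" AND s.stat_type = ? AND s.value {op} ?)"
        )
        params += [stat_type, threshold]
    sql = ("SELECT DISTINCT e.name, a.symbol FROM experiment e"
           " JOIN statistic st ON st.experiment_id = e.experiment_id"
           " JOIN annotation a ON a.feature_id = st.feature_id")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY e.name, a.symbol", params

sql, params = build_query([("fold_change", ">=", 2.0), ("p_value", "<=", 0.05)])
rows = conn.execute(sql, params).fetchall()
print(rows)
```

Because every statistic is just a typed row, adding a new technology or a new kind of statistic in this sketch needs no schema change, which is one plausible reading of "technology agnostic" on the wish list.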
Data Integration
Across technologies
  – array, sequencing
  – gene expression, methylation, proteomics, genetics
Across species
  – Human, mouse, rat, fly, fish
At the gene level
  – Probe level for within array
  – Entrez gene within species
  – orthologous groups across species
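The three integration levels above (probe, Entrez gene, orthologous group) can be sketched as two successive lookups. This is only an illustration under stated assumptions: the platform names, probe IDs, gene IDs and group labels are invented, and averaging several probes of the same gene is one simple summarisation choice, not necessarily what the tool does.

```python
from collections import defaultdict

# Hypothetical mapping tables; real ones would come from annotation sources.
PROBE_TO_GENE = {                      # (platform, probe) -> Entrez-style gene ID
    ("human_array", "probe_h1"): 1001,
    ("human_array", "probe_h2"): 1001,
    ("mouse_array", "probe_m1"): 2001,
}
GENE_TO_GROUP = {1001: "OG1", 2001: "OG1"}   # gene ID -> orthologous group

def by_orthologous_group(measurements):
    """Collapse (platform, probe, fold_change) rows to one value per
    orthologous group and platform.

    Probes are first mapped to genes and averaged per gene (an assumption),
    then genes are mapped to their orthologous group. Unmapped probes or
    genes are skipped.
    """
    per_gene = defaultdict(list)
    for platform, probe, fold_change in measurements:
        gene = PROBE_TO_GENE.get((platform, probe))
        if gene is not None:
            per_gene[(platform, gene)].append(fold_change)

    grouped = defaultdict(dict)
    for (platform, gene), values in per_gene.items():
        group = GENE_TO_GROUP.get(gene)
        if group is not None:
            grouped[group][platform] = sum(values) / len(values)
    return dict(grouped)

integrated = by_orthologous_group([
    ("human_array", "probe_h1", 2.0),
    ("human_array", "probe_h2", 4.0),
    ("mouse_array", "probe_m1", 1.5),
    ("mouse_array", "probe_x", 9.0),   # no annotation: dropped
])
print(integrated)
```

Keying the comparison on the orthologous group rather than on a gene symbol lets a human and a mouse result sit side by side even when the two species use different identifiers.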
Development
New platform: Drupal
  – Some nice features
  – New look and feel
Web services
  – interactions, diseases, TF binding sites, miRNA…
More use of literature data
  – Top 10 co-cited on gene detail page
  – Better visualisation
  – Better text mining
Correlation data (expression profiles)
  – searchable with other stats
Cross experiment gene sets
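As an illustration of what "correlation data (expression profiles)" might involve, the sketch below ranks genes by the Pearson correlation of their expression profile with a gene of interest. The slides do not say which correlation measure is planned, so Pearson is an assumption here, and the profile values, gene names and the `top_correlated` helper are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)

def top_correlated(target, profiles, n=10):
    """Return up to n (gene, r) pairs most positively correlated with target."""
    scores = [(gene, pearson(profiles[target], profile))
              for gene, profile in profiles.items() if gene != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:n]

# Made-up expression profiles across four samples.
profiles = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.0, 6.0, 8.0],   # same shape as GeneA
    "GeneC": [4.0, 3.0, 2.0, 1.0],   # opposite trend
    "GeneD": [1.0, 3.0, 2.0, 4.0],   # partly similar
}
ranking = top_correlated("GeneA", profiles, n=2)
print(ranking)
```

Stored as ordinary per-gene values, coefficients like these could be filtered alongside fold change and p-value, which is the sense of "searchable with other stats". Ranking here is by signed r; ranking by absolute value would also surface anti-correlated genes.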
Thanks
Jon Manning
John Mullins
Our collaborators
British Heart Foundation
“Providing bioinformatics services to biology teams throughout the research process”