Optimizing Biological Data Integration
Bioinformatics depends not just on the numbers, but also on correct molecular identification (MI) and mapping between high-throughput platforms. But the databases and algorithms providing MI disagree wildly. TCGA (The Cancer Genome Atlas) provides thousands of samples and over a dozen platforms for data integration. Integrating both samples and semantics gives us a way to measure the accuracy of MI for filtering and mapping. In this way we can evaluate and compare data preparation strategies: ID mapping among genes, transcripts, and proteins; algorithms for predicting microRNA targets, for aligning NGS data with reference genomes, for calling copy number variations, and so on. Integrating multiple-platform data correctly will open up a new level of comprehensive systems biology modeling.

We have built some Bioconductor R packages to support this work, and published our first application. We look to greatly expand the scope, to aid bioinformaticians and curators.

We pre-process raw data into processed data ripe for answering medical and biological questions.

[Figure: raw data pass through pre-processing choices (annotation, ID mapping, filtering, algorithms, ...) to become data for analysis and modeling. Good choices lead to meaningful translational bioinformatics and systems biology; bad choices lead to results that are not so meaningful.]
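
The idea of scoring an ID mapping by how well the mapped features agree across platforms, on the same samples, might be sketched roughly as below. This is only an illustration under stated assumptions, not the method implemented in the authors' Bioconductor packages: the scoring rule (mean Pearson correlation over mapped pairs), the function names `pearson` and `score_mapping`, and the identifiers and values in the toy data are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no information
    return sxy / sqrt(sxx * syy)


def score_mapping(mapping, platform_a, platform_b):
    """Mean cross-platform correlation over the ID pairs in `mapping`.

    Assumption: a more accurate mapping pairs features whose measurements
    across the same samples correlate more strongly.
    """
    scores = [
        pearson(platform_a[a], platform_b[b])
        for a, b in mapping
        if a in platform_a and b in platform_b
    ]
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical toy data: each profile is measured on the same four samples.
transcripts = {"tx1": [1, 2, 3, 4], "tx2": [4, 3, 2, 1]}
proteins = {"protA": [2, 4, 6, 8], "protB": [8, 6, 4, 2]}

# Two competing (hypothetical) ID mappings to compare.
mapping_1 = [("tx1", "protA"), ("tx2", "protB")]
mapping_2 = [("tx1", "protB"), ("tx2", "protA")]

score_1 = score_mapping(mapping_1, transcripts, proteins)
score_2 = score_mapping(mapping_2, transcripts, proteins)
print(score_1, score_2)
```

On this toy data the first mapping scores higher than the second, which is the kind of comparison the abstract describes. Real data would also need decisions about missing values, many-to-many mappings, and features that are not expected to correlate across platforms.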