Published by Juliana Bryant
2
Genome Informatics 2005
– ~220 participants
– 1 keynote speaker: David Haussler
– 47 talks
– 121 posters
3
Rodger Voelker: Two classes of splice junctions
– Searched for 5- to 7-base motifs in the exonic and intronic sequences flanking known splice junctions
– Computational analysis of collocations (co-occurrences) between different motifs
– Many collocations found between exonic and intronic motifs
– Known ESEs (exonic splicing enhancers) show collocations with intronic motifs, including ISEs (intronic splicing enhancers)
– Nearly all introns (89%) can be classified into 2 classes
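As a rough illustration of the kind of collocation count described above (not Voelker's actual pipeline, which the slide does not detail), the sketch below collects every distinct 5- to 7-base k-mer from each junction's exonic and intronic flank and counts how many junctions contain each exonic/intronic pair. The flank sequences, the `min_support` threshold and the function names are hypothetical; a real analysis would also test each pair against a background expectation.

```python
from collections import Counter
from itertools import product


def kmers(seq, k_min=5, k_max=7):
    """Return the set of distinct k-mers (k_min <= k <= k_max) in seq."""
    seq = seq.upper()
    return {
        seq[i:i + k]
        for k in range(k_min, k_max + 1)
        for i in range(len(seq) - k + 1)
    }


def collocations(junctions, min_support=2, k_min=5, k_max=7):
    """Count exonic/intronic k-mer pairs that co-occur at the same junction.

    junctions: iterable of (exonic_flank, intronic_flank) strings.
    Returns {(exonic_kmer, intronic_kmer): n_junctions} for pairs found
    at >= min_support junctions.
    """
    counts = Counter()
    for exonic, intronic in junctions:
        # k-mers are sets, so each pair is counted at most once per junction
        counts.update(product(kmers(exonic, k_min, k_max),
                              kmers(intronic, k_min, k_max)))
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical toy flanks, not real splice-junction data
    junctions = [
        ("AAGAAGAAC", "GTAAGTTTT"),
        ("CCGAAGAAG", "GTAAGTCCC"),
        ("TTTTTTTTT", "CCCCCCCCC"),
    ]
    for (exon_kmer, intron_kmer), n in sorted(collocations(junctions).items()):
        print(exon_kmer, intron_kmer, n)
```

The number of pairs grows with the product of the two k-mer sets, so on genome-scale data one would restrict the candidate motifs first.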
4
Chip Lawrence: The futility of optima in inference
– The strong focus in bioinformatics on optimal solutions is fundamentally flawed, because the asymptotic underpinnings of these solutions, such as consistency, do not apply
– The curse of dimensionality can make optimal solutions very unlikely and therefore misleading
– Example: minimum free energy (MFE) predictions of RNA structures
– Reason: the energy function used is incomplete; only secondary structure is considered, not tertiary interactions
5
Minimum free energy predictions of RNA structures
Assumptions:
– the molecule folds into its lowest-energy state
– the folding problem has a unique solution (the optimum)
Many programs (e.g. Zuker's Mfold) use the Boltzmann probability function
– most include calculations of suboptimal structures
– but not all structures are computed
– PPV (positive predictive value) of MFE predictions: 48%
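Lawrence's point can be illustrated numerically. Under the Boltzmann distribution a structure s with free energy E(s) has probability p(s) = exp(-E(s)/RT) / Z, where Z sums the weights of all structures. The sketch below computes these probabilities for a hypothetical energy landscape (one MFE structure plus 99 near-optimal alternatives, 37 °C, energies in kcal/mol); the MFE structure is still the single most probable one, yet it carries only about 2% of the ensemble.

```python
import math

# Gas constant in kcal/(mol*K), matching energies given in kcal/mol
R_KCAL = 1.98720425864083e-3


def boltzmann_probabilities(energies, temp_c=37.0):
    """Return p(s) = exp(-E(s)/RT) / Z for each free energy E(s) in kcal/mol."""
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies)
    # Shifting by the minimum energy avoids overflow; the shift cancels in p(s)
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical landscape: one MFE structure at -20.0 kcal/mol and
    # 99 distinct suboptimal structures at -19.5 kcal/mol
    energies = [-20.0] + [-19.5] * 99
    probs = boltzmann_probabilities(energies)
    print(f"P(MFE structure) = {probs[0]:.3f}")
```

Real folding programs compute Z with dynamic programming rather than by listing structures; the explicit list here only keeps the arithmetic visible.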
6
Alternative prediction of RNA structures
– Sample the ensemble of secondary structures in proportion to their Boltzmann weights
– Cluster the sampled structures
– Use the centroid structure for prediction, which improves PPV compared to MFE
Srna module of Sfold (http://sfold.wadsworth.org/)
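A minimal sketch of the sample-then-centroid idea, assuming a small, explicitly enumerated toy ensemble (Sfold itself samples by stochastic traceback through the partition function, and the clustering step is omitted here). Structures are sets of base pairs (i, j); the centroid keeps every pair that occurs in more than half of the samples. Function names and energies are hypothetical.

```python
import math
import random
from collections import Counter

R_KCAL = 1.98720425864083e-3  # gas constant, kcal/(mol*K)


def sample_structures(ensemble, n, temp_c=37.0, seed=0):
    """Draw n structures with probability proportional to their Boltzmann weights.

    ensemble: list of (base_pairs, energy), base_pairs a frozenset of (i, j).
    """
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(e for _, e in ensemble)
    weights = [math.exp(-(e - e_min) / rt) for _, e in ensemble]
    rng = random.Random(seed)
    return rng.choices([s for s, _ in ensemble], weights=weights, k=n)


def centroid(samples):
    """Return the base pairs that occur in more than half of the samples."""
    counts = Counter(pair for s in samples for pair in s)
    return frozenset(p for p, c in counts.items() if c > len(samples) / 2)


if __name__ == "__main__":
    # Hypothetical toy ensemble of three structures with energies in kcal/mol
    a = frozenset({(1, 20), (2, 19), (3, 18)})
    b = frozenset({(1, 20), (2, 19), (5, 15)})
    c = frozenset({(4, 12), (5, 11)})
    samples = sample_structures([(a, -10.0), (b, -9.5), (c, -6.0)], n=1000)
    print(sorted(centroid(samples)))
```

Because a base pairs with at most one partner in any structure, two conflicting pairs cannot both appear in more than half of the samples, so the centroid is always a valid secondary structure.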
7
A. tumefaciens 5S rRNA energy landscape
8
Alternative prediction of RNA structures
Improved PPV compared to MFE:
– Ensemble centroid: +30%
– Largest cluster centroid: +18%
– Best centroid: +47%
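For reference, PPV for structure prediction is usually counted over base pairs: PPV = TP / (TP + FP), the fraction of predicted pairs that also occur in the reference structure. The sketch below assumes that definition (the slides do not spell it out) and adds sensitivity for comparison; the base-pair sets are hypothetical.

```python
def ppv(predicted, reference):
    """Positive predictive value: share of predicted base pairs found in the reference.

    Returns 0.0 for an empty prediction (a convention chosen here).
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted:
        return 0.0
    return len(predicted & reference) / len(predicted)


def sensitivity(predicted, reference):
    """Share of reference base pairs recovered by the prediction."""
    predicted, reference = set(predicted), set(reference)
    if not reference:
        return 0.0
    return len(predicted & reference) / len(reference)


if __name__ == "__main__":
    # Hypothetical predicted and reference base-pair sets
    predicted = {(1, 20), (2, 19), (3, 18), (4, 17)}
    reference = {(1, 20), (2, 19), (6, 14)}
    print(ppv(predicted, reference), sensitivity(predicted, reference))
```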
9
Data mining
Geneseer: searchable name-translation database (http://geneseer.cshl.org/)
– Access to genomic information through gene names
– Mapping of sequences to gene names
– Identification of homologs across several species for a given gene
– Used in RNAi Codex (http://codex.cshl.edu)
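The core of a name-translation service is an index from every known name or alias to one canonical identifier, plus links between homologs. The class below is a hypothetical sketch of that data structure, not Geneseer's schema or API; keying aliases by species avoids merging, for example, human and mouse symbols that differ only in letter case.

```python
from collections import defaultdict


class NameIndex:
    """Toy gene name-translation index (hypothetical; not Geneseer's schema or API)."""

    def __init__(self):
        # (species, NAME in upper case) -> canonical gene ID
        self._alias_to_id = {}
        # (species, gene ID) -> {other species: homolog gene ID}
        self._homologs = defaultdict(dict)

    def add_gene(self, species, gene_id, aliases=()):
        for name in (gene_id, *aliases):
            self._alias_to_id[(species, name.upper())] = gene_id

    def add_homolog(self, species, gene_id, other_species, other_id):
        # Store the link in both directions
        self._homologs[(species, gene_id)][other_species] = other_id
        self._homologs[(other_species, other_id)][species] = gene_id

    def resolve(self, species, name):
        """Map a known name or alias (case-insensitive) to the canonical ID, or None."""
        return self._alias_to_id.get((species, name.upper()))

    def homologs(self, species, name):
        """Return {species: gene ID} of homologs for a gene given by any alias."""
        gene_id = self.resolve(species, name)
        if gene_id is None:
            return {}
        return dict(self._homologs.get((species, gene_id), {}))


if __name__ == "__main__":
    # Illustrative entries only
    idx = NameIndex()
    idx.add_gene("human", "TP53", ["P53"])
    idx.add_gene("mouse", "Trp53", ["p53"])
    idx.add_homolog("human", "TP53", "mouse", "Trp53")
    print(idx.resolve("human", "p53"), idx.homologs("human", "p53"))
```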
10
Data mining
Ulysses: annotates human genes based on gene interactions in model organisms (http://www.cisreg.ca:8080/ulysses/)
– Interologs: conserved protein-protein interactions
– Regulogs: conserved protein-DNA interactions
– Almost no overlap between the data in interaction databases (BIND and DIP: 984 refs; BIND and 5 DBs: 3 refs)
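The interolog idea can be sketched in a few lines: if genes A and B interact in a model organism and A' and B' are their human orthologs, then A'-B' becomes a candidate human interaction (regulogs follow the same pattern with a transcription factor and its target). This is a minimal illustration, not Ulysses's implementation; the gene names and the ortholog map are hypothetical.

```python
def predict_interologs(model_interactions, orthologs):
    """Transfer interactions from a model organism to human via orthology.

    model_interactions: iterable of (gene_a, gene_b) pairs in the model organism.
    orthologs: dict mapping a model-organism gene to a set of human orthologs.
    Returns a set of sorted (human_a, human_b) tuples; an interaction is
    transferred only when both partners have at least one human ortholog.
    """
    predicted = set()
    for a, b in model_interactions:
        for ha in orthologs.get(a, ()):
            for hb in orthologs.get(b, ()):
                # Sort so that (X, Y) and (Y, X) count as the same interaction
                predicted.add(tuple(sorted((ha, hb))))
    return predicted


if __name__ == "__main__":
    # Hypothetical worm interactions and worm-to-human ortholog map
    worm_pairs = [("w1", "w2"), ("w2", "w3")]
    worm_to_human = {"w1": {"H1"}, "w2": {"H2a", "H2b"}}
    print(sorted(predict_interologs(worm_pairs, worm_to_human)))
```

One-to-many orthology (here w2 maps to two human paralogs) multiplies the predictions, which is one reason such transfers are usually scored rather than accepted outright.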
11
Data mining
Integrated Genome Browser (IGB) visualizes:
– genomic annotations from multiple data resources
– experimental data from Affymetrix arrays
(http://www.affymetrix.com/support/developer/tools/download_igb.affx)
12
Gene expression and pathways
Skypainter tool in the Reactome database (http://www.reactome.org/):
– overlays gene expression data on pathway graphs
– generates a "movie" of a time series
13
Gene expression
ArrayBlast (http://seq.mc.vanderbilt.edu/arrayBlast/):
– Compares gene expression signatures generated on different platforms
– Uses public microarray data sets (GEO)
– Used to create a conserved cancer-related expression signature
14
Gene expression
C. elegans Gene Expression Consortium (http://elegans.bcgsc.ca/home/ge_consortium.html):
– SAGE data from specific stages, tissues and cell types
– Database of expression data, pictures and movies of transgenic worms carrying promoter::GFP fusions for 2000 genes with human orthologs
15
Michael Caudy: Whole-genome analysis of combinatorial and architectural transcription codes
– Search for transcription factor binding sites (TFBSs) in known neural pathway genes
– Determine the architecture: number, type, order, orientation and spacing of TFBSs
– Compare the architecture of activated and repressed genes
– Determine the activity of promoters with TFBS mutations
– Architecture is critical for the differential response to Notch signalling
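The five architectural features listed above can be computed directly from a list of annotated sites. The sketch below assumes a hypothetical `Site` record (factor name, 0-based start, exclusive end, strand) and returns one descriptor per feature; it is an illustration of the feature set, not Caudy's analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    factor: str  # transcription factor name
    start: int   # 0-based start on the reference strand
    end: int     # exclusive end
    strand: str  # "+" or "-"


def architecture(sites):
    """Summarize number, type, order, orientation and spacing of TFBSs."""
    ordered = sorted(sites, key=lambda s: s.start)
    return {
        "number": len(ordered),
        "types": sorted({s.factor for s in ordered}),
        "order": [s.factor for s in ordered],
        "orientation": [s.strand for s in ordered],
        # Gap between the end of one site and the start of the next;
        # negative values mean the sites overlap
        "spacing": [b.start - a.end for a, b in zip(ordered, ordered[1:])],
    }


if __name__ == "__main__":
    # Hypothetical sites for two factors, TF1 and TF2
    sites = [
        Site("TF2", 40, 48, "-"),
        Site("TF1", 10, 18, "+"),
        Site("TF1", 25, 33, "+"),
    ]
    print(architecture(sites))
```

Comparing these descriptors between activated and repressed gene sets is then a matter of tabulating them per gene.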
16
Regulatory sequence identification
Evoprinter: highlights multi-species conserved sequences within orthologous DNAs in the context of a single species of interest (http://evoprinter.ninds.nih.gov/)
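The idea of reading conservation "in the context of a single species" can be sketched as a projection onto the reference sequence: each reference base is marked as conserved only if every other species has the same base at that position. This sketch assumes gap-free, pre-aligned orthologous sequences and marks conserved bases in upper case and the rest in lower case; both are simplifications chosen here, not a description of Evoprinter's alignment method or output.

```python
def evoprint(reference, others):
    """Upper-case reference bases identical in all other aligned sequences; lower-case the rest.

    Assumes gap-free sequences already aligned to the same length (a simplification).
    """
    if any(len(s) != len(reference) for s in others):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for i, base in enumerate(reference.upper()):
        conserved = all(s[i].upper() == base for s in others)
        out.append(base if conserved else base.lower())
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical reference and two orthologous sequences
    print(evoprint("ACGTACGT", ["ACGAACGT", "ACGTACTT"]))
```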
17
Regulatory sequence identification
NestedMICA (http://www.sanger.ac.uk/Software/analysis/nmica/):
– method for discovering many over-represented short motifs in large sets of sequences in a single run
– finds candidate transcription factor binding sites
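NestedMICA learns probabilistic motif models with nested sampling; the sketch below is a much simpler, swapped-in technique that only conveys what "over-represented" means: it ranks exact k-mers by the log2 ratio of their pseudocount-smoothed frequency in a foreground set (for example, promoters) versus a background set. The sequences, `k`, `pseudocount` and `top` values are hypothetical choices.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every (overlapping) k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    return counts


def enriched_kmers(foreground, background, k=6, pseudocount=1.0, top=10):
    """Rank foreground k-mers by log2(foreground freq / background freq)."""
    fg, bg = kmer_counts(foreground, k), kmer_counts(background, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    vocab = len(set(fg) | set(bg))
    scores = {}
    for kmer in fg:
        # Pseudocounts keep k-mers absent from the background finite
        f = (fg[kmer] + pseudocount) / (fg_total + pseudocount * vocab)
        b = (bg[kmer] + pseudocount) / (bg_total + pseudocount * vocab)
        scores[kmer] = math.log2(f / b)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical foreground sequences sharing a core, and unrelated background
    fg_seqs = ["TTGACGTCAA", "GGTGACGTCA", "TGACGTCATT"]
    bg_seqs = ["ACACACACAC", "GTGTGTGTGT", "TTTTAAAACC"]
    for kmer, score in enriched_kmers(fg_seqs, bg_seqs, k=5, top=4):
        print(kmer, round(score, 2))
```

Unlike NestedMICA, exact k-mer counting cannot represent degenerate positions, which is why probabilistic motif models are preferred for real binding sites.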