Published by Paula Kelley. Modified over 9 years ago.
1
Do not reproduce without permission. Gerstein.info/talks (c) 2004. (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu
Gerstein Lab Aims in ModENCODE
▪ Scoring Arrays Using Tilescope (Normalization + HMM Segmentation) ==> DART classification of un-annotated transcription
▪ Pseudogene Annotation Using PseudoPipe ==> Pseudogene.org
2
zdz © mmvii
Tilescope 101
▪ It is available at tilescope.gersteinlab.org
▪ It was designed for high-density tiling microarray data analysis.
▪ It is useful
▫ Most existing data processing software was designed for traditional microarrays.
▫ It is flexible: several microarray data processing methods are available.
▫ It is easy to use. It has a graphical user interface. The data analysis process is streamlined. It is online software, so there is no need to install anything.
▫ It is free!
Zhang et al. GenomeBiology (2007)
3
Tilescope: system implementation
▪ Written in Java
▪ Composed of 3 parts: applet, servlet, and pipeline program
[Diagram labels: Users, Applet, Internet, Servlet, Pipeline, Server]
Zhang et al. GenomeBiology (2007)
4
Tilescope: user interface
Zhang et al. GenomeBiology (2007)
5
Tilescope: data processing
▪ Array data can be normalized by mean, median, quantile, and loess.
▪ Tile scoring generates the signal map and the P-value map.
▪ Feature identification produces ‘hits’.
Zhang et al. GenomeBiology (2007)
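As a rough illustration of two of the normalization options named on this slide (median and quantile), here is a minimal Python sketch. It is not Tilescope's code (Tilescope is written in Java); the function names `median_normalize` and `quantile_normalize`, the choice of the mean of the per-array medians as the common target, and the simple handling of ties are all assumptions made for the example.

```python
from statistics import median


def median_normalize(arrays):
    """Scale each array so that all arrays share the same median.

    Assumption: the common target is the mean of the per-array medians,
    and every array has a non-zero median.
    """
    medians = [median(a) for a in arrays]
    target = sum(medians) / len(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]


def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    Ties are broken by probe order, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by the reference quantile
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    replicate_arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]  # made-up intensities
    print(quantile_normalize(replicate_arrays))
    print(median_normalize(replicate_arrays))
```

After quantile normalization every array contains the same set of values, only assigned to different probes, which is why the method is often used to make replicate arrays comparable before tile scoring.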
6
Du et al. (2006) Bioinformatics
7
Segmenting with an HMM and Selecting the regions for validation
▪ Different selection schemes
▫ For a certain model (e.g. HMM), would one selection scheme generally outperform the others?
Du et al. (2006) Bioinformatics
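To make the idea of HMM segmentation concrete, here is a minimal Python sketch of Viterbi decoding over a sequence of tile scores. The slide does not specify the model, so everything here is an assumption for illustration: two states (0 = background, 1 = active), Gaussian emissions with hand-picked means and standard deviations, a single `stay` probability for self-transitions, and the hypothetical function names `viterbi_segment` and `states_to_segments`. It is not the method of Du et al.

```python
import math


def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def viterbi_segment(scores, means=(0.0, 2.0), sds=(1.0, 1.0), stay=0.9):
    """Most likely state path (0 = background, 1 = active) for tile scores.

    All parameters are illustrative assumptions, not fitted values.
    """
    if not scores:
        return []
    states = range(2)
    log_trans = [[math.log(stay if i == j else 1.0 - stay) for j in states]
                 for i in states]

    # best[s]: log probability of the best path ending in state s.
    best = [math.log(0.5) + gaussian_logpdf(scores[0], means[s], sds[s])
            for s in states]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for x in scores[1:]:
        new_best, pointers = [], []
        for s in states:
            prev = max(states, key=lambda p: best[p] + log_trans[p][s])
            pointers.append(prev)
            new_best.append(best[prev] + log_trans[prev][s]
                            + gaussian_logpdf(x, means[s], sds[s]))
        best = new_best
        back.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def states_to_segments(path):
    """Half-open (start, end) tile index ranges where the state is 1."""
    segments, start = [], None
    for i, s in enumerate(path):
        if s == 1 and start is None:
            start = i
        elif s != 1 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments


if __name__ == "__main__":
    tile_scores = [0.1, -0.2, 0.0, 2.1, 1.9, 2.3, 2.0, 0.1, -0.1, 0.2]  # made up
    print(states_to_segments(viterbi_segment(tile_scores)))
```

The segments returned this way are the candidate regions; a selection scheme, as raised on the slide, would then decide which of them to send for experimental validation.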
8
DART Classification of Un-annotated Transcription
Rozowsky et al. Genome Research (in press)
9
DART: Database & Tools
- Interfaces with UCSC
- Tools use Ensembl API
Rozowsky et al. Genome Research (in press)
DART.gersteinlab.org
10
PseudoPipe
[Flowchart; box labels in extracted order, original layout not recoverable: Routine D; Full Length Protein Queries (simulate processed genes); Human Gene Annotation; ENCODE Sequences with Repeats & Exons Masked; Lists of Hits similar to Queries; Unique Hits; Eliminate Redundant Hits; Resolve Paternity & Extend Clusters by Referring to the Query Proteins; TFASTY; DNA Sequences of Exons + 50 bp Overhang on Either Side; exon Candidates; In-frame Translation; Eliminate Redundant Hits; Select Hits 50% Coverage of Exons; Assemble Pseudo-exons by Referring to the Intron-Exon Structure of Query Genes; GeneWise; Queries of Exon Peptides (simulate duplicated genes); Rapid Coarse Indexing (by TBLASTN); Analyze Gaps, Sequence Identity, Coverage of the alignment; Check Disablements, Poly(A) tails etc.; Processed genes; gene Fragments; Duplicated genes; Putative genes aligned to parent proteins; Genes; Candidates with clear parents; Hit Clusters; Merge & Cluster; Routine P; Dyn. Prog. Alignment]
Zheng et al., GenomeBiology (2006)