Do not reproduce without permission 1 Gerstein.info/talks (c) 2004 1 (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Gerstein Lab Aims in ModENCODE.


1 Gerstein Lab Aims in ModENCODE
Scoring Arrays: using Tilescope (Normalization + HMM Segmentation) ==> DART classification of un-annotated transcription
Pseudogene Annotation: using PseudoPipe ==> Pseudogene.org

2 Tilescope 101 (zdz © mmvii)
▪ It is available at tilescope.gersteinlab.org
▪ It was designed for high-density tiling microarray data analysis.
▪ It is useful
▫ Most existing data processing software was designed for traditional microarrays.
▫ It is flexible: several microarray data processing methods are available.
▫ It is easy to use: it has a graphical user interface, the data analysis process is streamlined, and it is online software, so there is no need to install it.
▫ It is free!
Zhang et al. GenomeBiology (2007)

3 Tilescope: system implementation
▪ Written in Java
▪ Composed of 3 parts: applet, servlet, and pipeline program
[Diagram labels: Users, Applet, Internet, Servlet, Pipeline, Server]
Zhang et al. GenomeBiology (2007)

4 Tilescope: user interface
Zhang et al. GenomeBiology (2007)

5 Tilescope: data processing
▪ Array data can be normalized by mean, median, quantile, and loess.
▪ Tile scoring generates the signal map and the P-value map.
▪ Feature identification produces ‘hits’.
Zhang et al. GenomeBiology (2007)
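As an illustration of one of the normalization options named on this slide, here is a minimal sketch of quantile normalization in plain Python. It is not Tilescope's code (Tilescope is written in Java, and its implementation is not shown in these slides); the function name `quantile_normalize` and the list-of-lists input layout are assumptions made for the example.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical intensity distribution.

    arrays: list of equal-length lists of probe intensities, one list per array.
    Hypothetical helper for illustration; ties are broken by probe position
    rather than by averaging tied ranks.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]

    normalized = []
    for a in arrays:
        # Probe indices ordered by rising intensity within this array.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace each value by the reference quantile
        normalized.append(out)
    return normalized
```

For example, `quantile_normalize([[5, 2, 3], [4, 1, 6]])` maps both arrays onto the shared reference values 1.5, 3.5 and 5.5 while preserving each array's own probe ranking.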

6 Du et al. (2006) Bioinformatics

7 Segmenting with an HMM and selecting the regions for validation
▪ Different selection schemes
▫ For a certain model (e.g. HMM), would one selection scheme generally outperform the others?
Du et al. (2006) Bioinformatics
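To make "segmenting with an HMM" concrete, the sketch below runs generic Viterbi decoding on a toy two-state model and collapses the decoded path into segments. The slides do not give the model used by Du et al., so everything here is an assumption for illustration: the state names `bg` and `tx`, the discretized observations `L`/`H`, and all probabilities are hypothetical.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most likely state path for obs under a discrete HMM."""
    # best[t][s]: log-probability of the best path that ends in state s at tile t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][o]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def segments(path, target):
    """Collapse a state path into (start, end) runs of `target`, end exclusive."""
    runs, start = [], None
    for i, s in enumerate(path):
        if s == target and start is None:
            start = i
        elif s != target and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(path)))
    return runs

# Hypothetical parameters: "bg" = background, "tx" = transcribed; tiles are
# discretized to low ("L") or high ("H") signal. None of these come from the slides.
states = ("bg", "tx")
log_start = {"bg": math.log(0.5), "tx": math.log(0.5)}
log_trans = {"bg": {"bg": math.log(0.9), "tx": math.log(0.1)},
             "tx": {"bg": math.log(0.1), "tx": math.log(0.9)}}
log_emit = {"bg": {"L": math.log(0.8), "H": math.log(0.2)},
            "tx": {"L": math.log(0.2), "H": math.log(0.8)}}

path = viterbi("LLHHHHLL", states, log_start, log_trans, log_emit)
tx_segments = segments(path, "tx")
```

With these toy parameters the four high-signal tiles are decoded as a single `tx` segment. Candidate regions for validation could then be chosen from such segments under whichever selection scheme is being compared.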

8 DART Classification of Un-annotated Transcription
Rozowsky et al. Genome Research (in press)

9 DART: Database & Tools
- Interfaces with UCSC
- Tools use Ensembl API
Rozowsky et al. Genome Research (in press)
DART.gersteinlab.org

10 PseudoPipe
[Flowchart; recoverable box labels:]
▪ Routine P; Routine D
▪ Human Gene Annotation; ENCODE Sequences with Repeats & Exons Masked
▪ Full Length Protein Queries (simulate processed pseudogenes); Queries of Exon Peptides (simulate duplicated pseudogenes); DNA Sequences of Exons + 50 bp Overhang on Either Side
▪ Rapid Coarse Indexing (by TBLASTN); Lists of Hits similar to Queries; Eliminate Redundant Hits; Unique Hits; Merge & Cluster; Hit Clusters
▪ Resolve Paternity & Extend Clusters by Referring to the Query Proteins; Pseudogene Candidates with clear parents
▪ In-frame Translation; Eliminate Redundant Hits; Select Hits 50% Coverage of Exons; Pseudo-exon Candidates; Assemble Pseudo-exons by Referring to the Intron-Exon Structure of Query Genes
▪ GeneWise; TFASTY; Dyn. Prog. Alignment; Putative pseudogenes aligned to parent proteins
▪ Analyze Gaps, Sequence Identity, Coverage of the alignment; Check Disablements, Poly(A) tails etc.
▪ Processed pseudogenes; Pseudogene Fragments; Duplicated pseudogenes
Zheng et al., GenomeBiology (2006)
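The "Merge & Cluster" label on this slide suggests grouping nearby hits into clusters. The sketch below shows one common way to do that, by merging overlapping or closely spaced coordinate intervals. It is not PseudoPipe's code: the function name `merge_hits`, the `(start, end)` tuple layout, and the `max_gap` parameter are assumptions for the example, and it presumes all hits lie on the same sequence and strand.

```python
def merge_hits(hits, max_gap=0):
    """Merge (start, end) hits that overlap or lie within max_gap of each other.

    Hypothetical helper for illustration; coordinates are inclusive integers
    on a single sequence and strand.
    """
    clusters = []
    for start, end in sorted(hits):
        if clusters and start <= clusters[-1][1] + max_gap:
            # The hit touches the current cluster: extend the cluster's end if needed.
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            # The hit is too far from the current cluster: open a new one.
            clusters.append([start, end])
    return [tuple(c) for c in clusters]
```

For example, `merge_hits([(10, 20), (15, 30), (50, 60)])` yields two clusters, and raising `max_gap` lets hits separated by short gaps join the same cluster.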

