Functional and structural genomics using PEDANT

Bioinformatics Program, Institute of Biotechnology, Yang-Ming University: 林千涵

Introduction
With the growing volume of biological sequence data, genome analysis needs:
  a system able to store and retrieve tens of gigabytes of data
  a mature database management system
  good visualization tools
The work is shifting from case-oriented sequence analysis to automated large-scale genome annotation.

Introduction-PEDANT
Existing genome analysis programs differ in:
  protein-oriented vs. DNA-oriented analysis
  interactive work vs. command-line operation
  the bioinformatics methods applied
  user interface convenience features, project management and data editors
  fidelity of the results produced
Benchmarks may vary with the chosen balance between sensitivity and selectivity of the analyses.
PEDANT (Protein Extraction, Description, and ANalysis Tool) has been available since mid-1997 (then using FASTA for similarity searches). It serves as:
  a workhorse for general bioinformatics research
  a common framework for a number of genome analysis projects
  a complete database of automatically annotated genomes
  a tool for routine analysis of large amounts of genomic contigs and ESTs

System Architecture Overview
database module: storing, modifying and accessing data
processing module: bioinformatics computations
user interface: web-based communication

System Architecture-Cont.
Data access
  primary tables: store raw data (e.g. DNA and protein sequences) and raw program results (e.g. BLAST output)
  secondary tables: store parsed program results
  simplified schema
Operation in command-line mode
  applying bioinformatics methods to sequences
  parsing data tables
  querying the resulting databases
Web interface
  no static HTML pages required
  DNA and protein viewers access the SQL tables directly
Implementation and system requirements
  Perl 5, and C++ for the graphical viewer
Performance
  parallel capabilities
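The split between primary and secondary tables can be illustrated with a minimal sketch. The slide states that PEDANT itself is implemented in Perl 5 on SQL tables; the Python and SQLite used here, the table and column names, and the two-column "hit_id evalue" text format are all assumptions made for illustration, not the real PEDANT schema or BLAST output.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from PEDANT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blast_raw (          -- primary table: unparsed program output
    orf_id TEXT PRIMARY KEY,
    output TEXT NOT NULL
);
CREATE TABLE blast_hits (         -- secondary table: parsed results
    orf_id TEXT NOT NULL,
    hit_id TEXT NOT NULL,
    evalue REAL NOT NULL
);
""")


def parse_hits(raw_output):
    """Parse a made-up 'hit_id evalue' format, one hit per line."""
    hits = []
    for line in raw_output.splitlines():
        fields = line.split()
        if len(fields) == 2:
            hits.append((fields[0], float(fields[1])))
    return hits


def store_result(conn, orf_id, raw_output):
    """Keep the raw output and fill the parsed table from it."""
    conn.execute("INSERT INTO blast_raw VALUES (?, ?)", (orf_id, raw_output))
    conn.executemany(
        "INSERT INTO blast_hits VALUES (?, ?, ?)",
        [(orf_id, hit_id, evalue) for hit_id, evalue in parse_hits(raw_output)],
    )


def significant_hits(conn, orf_id, max_evalue=1e-5):
    """Query the parsed table; the cutoff is an arbitrary example value."""
    rows = conn.execute(
        "SELECT hit_id FROM blast_hits "
        "WHERE orf_id = ? AND evalue <= ? ORDER BY evalue",
        (orf_id, max_evalue),
    )
    return [row[0] for row in rows]


store_result(conn, "orf0001", "hitA 1e-50\nhitB 0.002\n")
print(significant_hits(conn, "orf0001"))
```

Keeping the raw output alongside the parsed rows means the secondary table can be rebuilt whenever the parser changes, without rerunning the analysis program.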

Schema

Bioinformatics Method
Overview of the PEDANT processing pipeline
  identification of coding regions and various genetic elements
  homology search
  detection of protein motifs, prediction of secondary structure and other protein features, and sensitive fold recognition
  automatic assignment of gene products to pre-defined functional categories
Prediction of genes and other genetic elements
  Table 1
  choose one of 15 genetic codes
Functional and structural categories
  similarity search: PSI-BLAST (Position-Specific Iterated BLAST)
  special datasets: MIPS, COG, PROSITE, PFAM and BLOCKS
  significant matches in PIR: annotations, keywords, enzyme classification and superfamily information
  for sequences with a significant relationship to PDB entries, secondary structure information: STRIDE (upper case), PREDATOR (lower case)
  low-complexity regions, membrane regions, coiled coils and signal peptides
  comparison with SCOP using IMPALA
  functional structural
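Why the choice of genetic code matters for gene prediction can be shown with a small sketch. The table numbers below follow the NCBI translation tables (1 = standard, 4 = the code in which TGA encodes tryptophan, used for example by Mycoplasma); whether PEDANT's 15 codes use the same numbering is an assumption, and `translate` is a hypothetical helper, not a PEDANT function.

```python
from itertools import product

BASES = "TCAG"
# Standard code in NCBI order (first base varies slowest, third fastest).
STANDARD_AA = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# Only two of the available codes are sketched here.
GENETIC_CODES = {
    1: STANDARD,
    4: {**STANDARD, "TGA": "W"},  # TGA is read as Trp instead of stop
}


def translate(dna, code_id=1):
    """Translate from the first base up to the first stop codon."""
    table = GENETIC_CODES[code_id]
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGTGGTGATTT", code_id=1))  # stops at TGA
print(translate("ATGTGGTGATTT", code_id=4))  # reads through TGA
```

With the wrong code, an ORF is truncated or extended at every reassigned codon, so the setting has to be fixed before coding regions are identified.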

Table 1

Bioinformatics Method-Cont.
Yeast biological role categories
  the first system of biological role categories was developed for E. coli
  MIPS: advanced hierarchical functional catalogue (yeast)
  multidimensionality: the protein:gene relationship is many-to-many (M:M)
  automated assignment to MIPS categories is a first approximation, to be refined by manual annotation
  Distribution of ORFs
Visualization
  an integrated, hypertext-linked protein report with calculated parameters and sequences, serving as a reference for further manual annotation
  Protein report page

Distribution of ORFs

Protein report page

Bioinformatics Method-Cont.2
Automatic versus manual annotation
  problem of error propagation: erroneous annotation caused by human error and spurious similarity hits
  can filtering algorithms and domain structure help?
  quality improvement through manual review by human experts!
Manual annotation
  catalogue independent
  flexibility: first place a gene product in a higher category, then move it to a finer category at a later step
  528 categories: 20 main categories and 6 levels
  confidence levels: “reject”, “low”, “medium”, “high”; the default is “auto”
Data release management
  new release data can be intelligently merged with the existing data pool
  manual annotation is transferred between subsequent data releases
  “manual” field: “yes” or “no”; initially the default is “no”
  example: a PFAM domain identified in an ORF of a new release gets “manual: no” and “conf: auto”
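The defaults and the release-merging idea described above can be sketched as follows. The field values ("auto", "reject", "low", "medium", "high", manual yes/no) come from the slide; the `Annotation` class, the `merge_release` function and its rule are hypothetical, and the sketch assumes that ORF identifiers stay stable between releases, which does not hold when genes or contigs fuse or gene boundaries change.

```python
from dataclasses import dataclass

CONF_LEVELS = ("auto", "reject", "low", "medium", "high")


@dataclass(frozen=True)
class Annotation:
    orf_id: str
    category: str
    conf: str = "auto"    # default confidence for automatic assignments
    manual: bool = False  # "manual: no" until a curator edits the entry

    def __post_init__(self):
        if self.conf not in CONF_LEVELS:
            raise ValueError(f"unknown confidence level: {self.conf!r}")


def merge_release(old, new):
    """Merge a new automatic release with the existing data pool.

    Assumed rule: automatic entries come from the new release, and
    manual entries from the old pool are carried over for every ORF
    that still exists, replacing an automatic entry of the same
    category.
    """
    merged = {(a.orf_id, a.category): a for a in new}
    orfs_in_new = {a.orf_id for a in new}
    for a in old:
        if a.manual and a.orf_id in orfs_in_new:
            merged[(a.orf_id, a.category)] = a
    return sorted(merged.values(), key=lambda a: (a.orf_id, a.category))


old_pool = [
    Annotation("orf1", "domain_A", conf="high", manual=True),
    Annotation("orf2", "domain_B", conf="low", manual=True),
]
new_release = [
    Annotation("orf1", "domain_A"),  # rediscovered automatically
    Annotation("orf1", "domain_C"),  # newly identified domain
]
for annotation in merge_release(old_pool, new_release):
    print(annotation)
```

In this example the curated "high" entry for orf1 survives the update, the new domain enters as "manual: no, conf: auto", and the annotation of orf2 is dropped because the ORF is absent from the new release; a real system would need to handle that case by mapping old ORFs onto new ones.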

Manual annotation transfer
Two genes fuse into one contig
Two contigs fuse into one
A gene boundary changes
A new gene appears

The PEDANT Genome Database
Annotation of publicly available completely sequenced and unfinished genomes
  genomes annotated by MIPS
  completely sequenced and published genomic sequences
  unfinished and/or unpublished genomic sequences
  gene prediction by ORPHEUS, allowing large overlaps between ORFs
PEDANT as a structural genomics resource (0.3M proteins)
  class-based approach, cost-saving
  (i) non-redundant protein sequence databases
  (ii) PSI-BLAST search with SCOP sequences against (i), saving the resulting profiles
  (iii) construct a SCOP profile library using IMPALA
  (iv) IMPALA search with each genomic sequence against the SCOP library
  same procedure for the non-redundant PDB sequence database
  performance of IMPALA
Cross-genome comparison
  treat each genome as an individual contig: create cross-genome datasets without any modification
  44 genomes

Performance of IMPALA

Applications
Arabidopsis thaliana chromosome IV
  3744 predicted protein-coding genes
  roughly 30% are known proteins or strongly similar to known proteins
  compared with unicellular species, multicellular organisms have a higher ratio of all-alpha and a smaller ratio of mixed alpha/beta structural domains
Assembled human transcripts
  human UniGene subjected to PEDANT analysis; comparison over contigs
  the MySQL database is close to 8 GB
  acceptable query times show the suitability of PEDANT for large-scale EST sequencing projects
Analysis of the GroEL substrates
  GroEL: a common E. coli chaperonin
  structural motif common to 52 substrates that rely on GroEL for folding in vivo: two or more alpha/beta domains involving buried beta-sheets with large hydrophobic surfaces, which aggregate easily

Classification of predicted genes
Classification by the degree of homology to functionally characterized proteins based on BLAST scores
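A score-based classification of this kind could look like the sketch below. The slide does not give the class names or cutoffs, so every label and threshold here is a hypothetical placeholder chosen only to show the mechanism.

```python
# Hypothetical cutoffs on the best BLAST score of a predicted gene,
# checked from the most stringent class downwards.
CLASSES = [
    (1000.0, "strong similarity"),
    (300.0, "similarity"),
    (100.0, "weak similarity"),
]


def classify(best_score):
    """Return a homology class for the best score (None = no hit)."""
    if best_score is None:
        return "no similarity"
    for min_score, label in CLASSES:
        if best_score >= min_score:
            return label
    return "no similarity"


for score in (1500.0, 420.0, 120.0, 35.0, None):
    print(score, "->", classify(score))
```

Checking the classes in descending order of stringency guarantees that each gene falls into exactly one class.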

Summary and Outlook
PEDANT is a useful tool for genome annotation and bioinformatics research. It provides:
  automated and manual assignment of gene products to functional and structural categories
  extensive hyperlinked protein reports and advanced viewers
Outlook
  better decision rules need to be employed
  manual annotation of predicted genetic elements (e.g. LTRs)
  support for the Oracle RDBMS
  an automatic gene prediction pipeline for higher eukaryotes
  interactive capabilities
