Project No. 7 Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource Dahai Gai Samuel Kalet Hongbo.

Presentation transcript:

Project No. 7
Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource
Dahai Gai, Samuel Kalet, Hongbo Yang
May

Signal Transduction
[Figure: signal transduction pathway; labels: Nucleus, DNA replication, RNA synthesis, New mRNA, New Protein X]

RGS in G protein signaling regulation
- Gα-GTP: active form
- Gα-GDP-βγ: inactive form

Mechanism of RGS Regulation
RGS increases the turnover rate of the intrinsic GTPase activity of Gα subunits, resulting in an increased "off" rate and therefore a decreased signal.

RGS proteins & RGS domain

RGS domain: 9 α-helices

[Figure: RGS-Web architecture; labels: RGS-Web (Interface), Genomic Data, Interaction Data, Structural Data, Informatics Engine, Expression Data, Visualization, Simulation Engine, RGSdb, RGS Web]

[Figure: project workflow; labels: 24 human genome, Predicted genes, Potential RGS proteins, RGS HMM model, Known RGS protein sequences, Genscan, Prediction of to-be-discovered RGS proteins, Subfamily HMM models, Subfamily allocation HMM model, RGS subfamilies protein sequences, Function prediction and biological test, ClustalW]

Human genomic data

Gene finding: troubleshooting
- Trouble: Genscan cannot handle sequences larger than 5 Mb.
  Solution: Split each long sequence into multiple shorter sequences.
- Trouble: If a split falls inside a gene, Genscan will miss that gene.
  Solution: Make adjacent split sequences overlap by 10 kb.
- Trouble: Long runs of unknown sequence (Ns) slow Genscan down.
  Solution: Replace any run of Ns longer than 0.5 kb with 0.5 kb of Ns.
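The two preprocessing fixes above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual script: the function names `collapse_n_runs` and `split_with_overlap` are hypothetical, and the defaults simply restate the numbers on the slide (5 Mb pieces, 10 kb overlap, 0.5 kb cap on runs of Ns).

```python
import re


def collapse_n_runs(seq, max_run=500):
    """Replace every run of more than max_run N's with exactly max_run N's."""
    pattern = re.compile(r"N{%d,}" % (max_run + 1), flags=re.IGNORECASE)
    return pattern.sub("N" * max_run, seq)


def split_with_overlap(seq, chunk_size=5_000_000, overlap=10_000):
    """Yield (start_offset, piece) pairs; consecutive pieces share `overlap` bases."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    start = 0
    while True:
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # the last piece reached the end of the sequence
        start += step


if __name__ == "__main__":
    # Toy sequence with small parameters so the behaviour is visible.
    toy = collapse_n_runs("ACGT" + "N" * 10 + "ACGT", max_run=3)
    for offset, piece in split_with_overlap(toy, chunk_size=6, overlap=2):
        print(offset, piece)
```

Keeping the start offset of each piece makes it possible to map predicted gene coordinates back to the original chromosome and to drop genes predicted twice inside an overlap.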

Gene finding: results
- Tool: Genscan
- Machine: Hydra.capsl
- Speed: 0.1 to 0.2 s/kb
- CPU usage: 75%
- Mem. usage: 7.5G
- Total running time: 3 days

Covered by Gai's talk
[Figure: project workflow, repeated; labels: 24 human genome, Predicted genes, Potential RGS proteins, RGS HMM model, Known RGS protein sequences, Genscan, Prediction of to-be-discovered RGS proteins, Subfamily HMM models, Subfamily allocation HMM model, RGS subfamilies protein sequences, Function prediction and biological test, ClustalW]

Multiple sequence alignment
- ClustalW
- ClustalW is a general-purpose multiple alignment program for DNA or proteins.
- Multiple alignments are carried out in 3 stages:
  - all sequences are compared to each other (pairwise alignments)
  - a dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity
  - the final multiple alignment is carried out, using the dendrogram as a guide.
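The first two stages can be illustrated with a toy Python sketch. It is not ClustalW's implementation: the scoring values are arbitrary, the distance is simply one minus the fraction of identical aligned columns, and plain average-linkage clustering stands in for ClustalW's own tree-building step. The names `pair_distance` and `guide_tree` are hypothetical, and the third stage (progressive alignment along the tree) is not shown.

```python
from itertools import combinations


def pair_distance(a, b, match=1, mismatch=-1, gap=-2):
    """Stage 1: globally align two non-empty sequences (Needleman-Wunsch)
    and return 1 - (identical columns / alignment length)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 1.0 - identical / columns


def guide_tree(seqs):
    """Stage 2: average-linkage clustering; returns the tree as nested tuples."""
    dist = {frozenset(p): pair_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(seqs, 2)}
    clusters = [(name, [name]) for name in seqs]  # (subtree, member names)
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            mx, my = clusters[x][1], clusters[y][1]
            d = sum(dist[frozenset((p, q))] for p in mx for q in my) / (len(mx) * len(my))
            if best is None or d < best[0]:
                best = (d, x, y)
        _, x, y = best
        merged = ((clusters[x][0], clusters[y][0]), clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    toy = {"s1": "ACDEFGHIK", "s2": "ACDEFGHIR", "s3": "WWYYPPTTS"}
    print(guide_tree(toy))
```

In the toy input, the two near-identical sequences are joined first, so they would be aligned to each other before the divergent third sequence is added.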

HMM training
- HMMer
- The RGS domain sequences found so far are used to train the HMM.
- Two sets of source data:
  - Set A: only those RGS domains that begin the protein sequence
  - Set B: all RGS domain sequences in proteins
  - Elements in set A are highly similar to one another, while those in set B have low similarity

HMMer usage
- hmmbuild
  - Input: aligned sequences
  - Output: the hidden Markov model
- hmmcalibrate
  - Calibrates the HMM to improve E-value sensitivity
- hmmsearch
  - Input: the built HMM and the target protein sequences
  - Output: the domains found, the position of each domain, score and E-value
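As a rough sketch, the three steps could be chained from Python as below. The file names are placeholders, not the project's files, and the helper names `hmmer_commands` and `run_pipeline` are hypothetical. The argument order follows the HMMER 2 command-line tools, which the presence of hmmcalibrate suggests; HMMER 3 no longer ships hmmcalibrate, so that step would be dropped there. No optional flags are used.

```python
import shlex
import subprocess


def hmmer_commands(alignment, hmm_file, target_fasta):
    """Return the build, calibrate and search commands as argument lists."""
    return [
        ["hmmbuild", hmm_file, alignment],      # aligned sequences -> HMM
        ["hmmcalibrate", hmm_file],             # calibrate E-value statistics
        ["hmmsearch", hmm_file, target_fasta],  # scan target protein sequences
    ]


def run_pipeline(alignment, hmm_file, target_fasta):
    """Run the three steps in order; stop at the first failure."""
    for cmd in hmmer_commands(alignment, hmm_file, target_fasta):
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder file names; only prints the commands, does not run them.
    for cmd in hmmer_commands("rgs.aln", "rgs.hmm", "predicted_proteins.fa"):
        print(shlex.join(cmd))
```

Building the commands separately from running them keeps the sketch usable on a machine without HMMer installed.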

HMM search result

Q: Is there a correlation between length and the number of RGS?
Q: Is there a metric for the density (affinity) of the RGS?

Summary of HMMer result

HMMer result summary in detail

Reasons for the miss
- The genome sequence is incomplete or contains errors.
- Genscan prediction is not 100% accurate.
- Other possible reasons need further investigation.

Covered by Gai's talk
[Figure: project workflow, repeated; labels: 24 human genome, Predicted genes, Potential RGS proteins, RGS HMM model, Known RGS protein sequences, Genscan, Prediction of to-be-discovered RGS proteins, Subfamily HMM models, Subfamily allocation HMM model, RGS subfamilies protein sequences, Function prediction and biological test, ClustalW]

Subfamily identification
- Build an HMM for each subfamily (A to F)
- Use each subfamily HMM to search the high-scoring to-be-discovered RGS proteins
- Result
  - chr1-4901, subfamily A
  - chr4-5038, subfamily A
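One plausible way to turn the six searches into an assignment is to give each candidate the subfamily whose HMM scores it highest. The slides do not spell out the rule that was used, so the sketch below is an assumption; the function name `assign_subfamily`, the candidate names and all scores are invented for illustration and are not results from the project.

```python
def assign_subfamily(scores):
    """Map each candidate to its best-scoring (subfamily, score) pair.

    scores: {candidate: {subfamily: hmmsearch score}}
    """
    assignment = {}
    for candidate, by_subfamily in scores.items():
        best = max(by_subfamily, key=by_subfamily.get)
        assignment[candidate] = (best, by_subfamily[best])
    return assignment


if __name__ == "__main__":
    # Invented scores for two hypothetical candidates against subfamilies A to F.
    toy_scores = {
        "cand_1": {"A": 212.4, "B": 80.1, "C": 65.0, "D": 71.3, "E": 59.8, "F": 40.2},
        "cand_2": {"A": 33.0, "B": 41.5, "C": 150.7, "D": 38.9, "E": 29.4, "F": 35.1},
    }
    for name, (subfamily, score) in assign_subfamily(toy_scores).items():
        print(name, subfamily, score)
```

A minimum score threshold could be added so that candidates with no convincing hit are left unassigned.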

Summary
- Integrated tools
  - Genscan
  - ClustalW
  - HMMer
- Our framework works for finding genes, performing multiple sequence alignment, building HMMs and searching for to-be-discovered RGS domains in protein sequences.

References
- De Vries, L., Zheng, B., Fischer, T., Elenko, E., Farquhar, M. G. (2000). The regulator of G protein signaling family. Annu. Rev. Pharmacol. Toxicol. 40:
- De Vries, L., and Farquhar, M. G. (1999). RGS proteins: more than just GAPs for heterotrimeric G proteins. Trends. Cell. Biol. 9(4):
- Zheng, B., De Vries, L., and Farquhar, M. G. (1999). Divergence of RGS proteins: evidence for the existence of six mammalian RGS subfamilies. Trends. Biochem. Sci. 24(11):411-4
- Berman, D. M., and Gilman, A. G. (1998). Mammalian RGS Proteins: Barbarians at the Gate. J. Biol. Chem :
- Dohlman, H. G., and Thorner, J. (1997). RGS Proteins and Signaling by Heterotrimeric G Proteins. J. Biol. Chem :