Friday July 18th Update

“WesList” proteins

Wes and Mark (?) identified a number of potential targets based upon the presence of patents; the enzyme name, PlasmoDB ID and number of patents found were provided. It was decided that a round of target selection would be carried out to identify P. falciparum targets which might have medical relevance.

I have cross-referenced this list against the P. falciparum proteins available in the SGPP Target Selection database: 314 proteins were identified.

Selection was carried out based upon:
- 0 predicted TM regions
- Homology to PDB proteins: 55% identity threshold
- Homology to the human redundant dataset: 55% identity threshold

These proteins were also BLASTed against the T. brucei and L. major proteins currently present in the SGPP Target Selection database, in order to identify a homologous protein dataset for target selection in these species.
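The selection criteria above can be sketched as a simple filter. This is only an illustration, not the SGPP pipeline itself: the `Candidate` record, its field names and the example IDs are hypothetical, and it assumes that a hit above the 55% identity threshold excludes a protein while a hit at or below it does not.

```python
from dataclasses import dataclass

IDENTITY_CUTOFF = 55.0  # percent identity threshold quoted in the update


@dataclass
class Candidate:
    """Hypothetical record for one protein; field names are illustrative."""
    protein_id: str
    tm_regions: int             # number of predicted transmembrane regions
    best_pdb_identity: float    # best % identity to a PDB protein (0 if no hit)
    best_human_identity: float  # best % identity to the human redundant set


def passes_selection(c: Candidate, cutoff: float = IDENTITY_CUTOFF) -> bool:
    """Keep proteins with no predicted TM regions and no close PDB/human match."""
    return (
        c.tm_regions == 0
        and c.best_pdb_identity <= cutoff
        and c.best_human_identity <= cutoff
    )


def select_targets(candidates):
    """Return the IDs of candidates that pass every criterion, in input order."""
    return [c.protein_id for c in candidates if passes_selection(c)]
```

For example, a candidate with two predicted TM regions, or with an 80% identity match to a PDB entry, would be dropped, while one with no TM regions and best matches of 30% (PDB) and 20% (human) would be kept.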

Following exclusion of proteins based upon “normal” selection procedures, as well as the previously discussed homology exclusions, 135 P. falciparum proteins were identified as possible targets and sent to Chris. 73 of these had previously been targeted, most likely in the earlier enzyme selection.

In addition to the usual sequence, length etc., the following data were sent to Chris:
- Matches vs redundant human proteins (>55% identity excluded)
- Matches vs PDB (>55% identity excluded)
- Matches vs Structural Genomics Targets
- Enzyme name; number of patents; priority

This should allow him to select 96 targets from the list of 135. I did not apply the size thresholds, so that Chris can obtain 96 proteins, if necessary including a few above the 850 amino acid threshold. The average length in this set is 764 amino acids; 51 proteins are above the normal size threshold of 850; 15 are longer than 1,000 amino acids.

Chris will send me back a list of the proteins selected and will also send this list directly to Frank for inclusion in the web pages and in the DB being maintained there.
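The length figures quoted above (average length, number above the 850 amino acid threshold, number longer than 1,000) are the kind of summary that can be recomputed from a list of sequence lengths. A minimal sketch, assuming the lengths are available as plain integers; the function name and the returned keys are illustrative, and “above” is taken to mean strictly greater than:

```python
from statistics import mean


def summarise_lengths(lengths, size_threshold=850, long_cutoff=1000):
    """Summarise protein lengths against the size thresholds used in selection."""
    return {
        "count": len(lengths),
        "mean_length": round(mean(lengths)),
        # proteins strictly above the normal size threshold
        "above_threshold": sum(1 for n in lengths if n > size_threshold),
        # proteins strictly longer than the upper cutoff
        "above_long_cutoff": sum(1 for n in lengths if n > long_cutoff),
    }
```

Reporting the counts rather than filtering on them mirrors the decision above to leave the size threshold unapplied, so that longer proteins remain available if they are needed to reach 96 targets.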

It was also decided that we should attempt to obtain homologous proteins for this set for both L. major and T. brucei. I have primarily been concentrating upon T. brucei target selection.

93 “WesList” homologues were identified for T. brucei. Unfortunately, the majority of those identifiable from the T. brucei sequences already present in the SGPP Target Selection database were incomplete and therefore unusable.

New T. brucei sequences have been downloaded from a variety of sources: 9,782 proteins are currently being parsed (this list is redundant). It appears that continued sequencing and reannotation have provided an increased number of proteins with start and stop codons. I am waiting to complete parsing and analysis of this set before I send the “WesList” T. brucei homologue list to Chris. This will also provide T. brucei targets that are not homologous to the “WesList” set.
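The parsing step described above has to discard incomplete sequences and collapse redundant entries. The sketch below shows one way this could be done; it is not the parser actually used. It assumes each record is a hypothetical `(protein_id, sequence)` pair, that a complete translation begins with methionine (`M`) and ends with a `*` marking the stop codon, and that redundancy means identical sequences.

```python
def is_complete(seq: str) -> bool:
    """Assumed completeness test: starts with M and ends with a '*' stop marker."""
    seq = seq.strip().upper()
    return len(seq) > 1 and seq.startswith("M") and seq.endswith("*")


def complete_nonredundant(records):
    """Keep the first copy of each complete sequence, with the stop marker removed."""
    seen = set()
    kept = []
    for protein_id, seq in records:
        norm = seq.strip().upper()
        if not is_complete(norm) or norm in seen:
            continue  # skip incomplete sequences and exact duplicates
        seen.add(norm)
        kept.append((protein_id, norm.rstrip("*")))
    return kept
```

Exact-match deduplication is the simplest choice; near-identical entries from different sources would need a similarity-based clustering step instead.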

Few suitable homologues for the “WesList” set were identified for L. major. The L. major genome project was frozen as of the beginning of this month, and I am now working (together with Chris Peacock at the Sanger Institute) on the reannotation of previously incomplete L. major chromosome sequences. We are also annotating a number of L. major chromosomes (5 chromosomes' worth) which were not included in previous L. major SGPP selections, primarily due to concerns about the quality of annotation arising from the large number of incomplete protein sequences (no start or stop codon).

At SBRI we have been working on redeveloping our annotation database. These changes have been reflected in the SGPP Target Selection database and allow us to populate both databases with information on the newly identified L. major proteins. I will then identify “WesList” homologues. This will also allow target selection for L. major proteins in general. A new set of L. major targets can be sent to Rochester by next week.

I have obtained the progress information for targets, but we have not yet begun any analysis aimed at identifying correlations between sequence features and “clonability”, “expressability” etc.