Introduction to Gene Mining Part B: How similar are plant and human versions of a gene? After completing part B, you will demonstrate how to use NCBI BLASTp.

Presentation transcript:

Introduction to Gene Mining Part B: How similar are plant and human versions of a gene? After completing part B, you will demonstrate how to use NCBI BLASTp and data to determine whether Arabidopsis thaliana and human muscle protein genes and gene products are homologous. 1

The Arabidopsis Information Portal is funded by a grant from the National Science Foundation (#DBI ) and co-funded by a grant from the Biotechnology and Biological Sciences Research Council (BB/L027151/1). These lessons were developed during the summer of 2015 as education outreach for the portal in conjunction with the J. Craig Venter Institute, Rockville, MD, 20850, USA. Contact information: General information: Jason Miller, Grant Co-Principal Investigator, JCVI. This lesson was prepared by Andrea Cobb, Ph.D., with the help of Margot Goldberg. 2

In Part A, our sample question was: Can we study your muscle disease using a plant model? 3

We used the NCBI portal to find names of human muscle genes. 4

We also found the function of the human actin-alpha 1 gene (ACTA1) and asked, “Might plants need that same function?” 5

We used NCBI BLASTn to search in Arabidopsis thaliana for genes which align to human ACTA1. 6

We learned that “alignment” is achieved by using an algorithm that maximizes local matches between two sequences. 7
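The idea of maximizing local matches can be sketched with the classic Smith-Waterman scoring scheme. This is only a minimal illustration: BLAST itself uses a faster heuristic, and the match, mismatch and gap values below are assumed for the example, not taken from the lesson.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    The scoring values are illustrative assumptions, not BLAST defaults.
    """
    # prev holds the previous row of the scoring matrix; row 0 is all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never go below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Toy sequences (made up): the shared stretch "ACG" drives the score.
score = local_alignment_score("TTACGTT", "GGACGGG")
```

A higher score means a longer or cleaner shared stretch. Sequences with nothing in common score 0.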

We learned how to use the BLASTn report scores with Query cover, Ident and the E-values to choose a statistically meaningful alignment. 8

In a group of 3-4 students, examine your gene discovery scorecard and then: Infer characteristics of genes which were in both A. thaliana and humans. Identify characteristics of genes present in humans but not found in plants. 9

What information so far indicates whether or not plants have animal muscle genes? What additional information might you need to be certain whether or not plants have animal muscle genes? 10

Part B: Evaluating homology: How similar are plant and human versions of a gene? 11

Recipes handed down often change 12

Which parts of the recipes were conserved (were almost the same) in all generations’ recipes? Which parts were not conserved? 13

Reasons why a recipe might be changed Discuss in groups and report your ideas. 14

How might you track the passage of a recipe from one generation to the next if you can’t ask the cooks? 15

How is a gene like a recipe? Discuss in groups and report your ideas. 16

What features of a gene might make it a version of another gene? Record your answers. v=gCxrkl2igGY is a song you might remember. 17

What is homology? What criteria do scientists use to classify particular genes and their protein products as homologs? Explore 18

Homology: a general term describing 2 or more genes which share an ancestral gene. How might recipes be “homologous”? 19

To use a plant model for my patient’s disease, I need to find a plant homolog to his ACTA1 gene. We found that the Arabidopsis thaliana ACT7 gene is a version, but is it similar enough to be a homolog? 20

Should we search for homologs using a gene sequence or a protein sequence? 21

The structure of a eukaryotic gene is complex! The amino acid sequence of the protein is more likely to be conserved than the gene sequence. Translation (protein synthesis): /lectures/lecture24/lecture24.html 22

A BLASTp using the gene product’s amino acid sequence is likely to find protein homologs. A BLASTn might find more differences than similarities. 23
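One reason a protein comparison can show more similarity than a nucleotide comparison is that several codons encode the same amino acid. The sketch below uses two made-up DNA fragments and only the handful of standard-code codons they need; it does not model introns or other features of eukaryotic gene structure.

```python
# Partial codon table: only the codons used in this example (standard genetic code).
CODONS = {
    "ATG": "M", "GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E",
    "ACA": "T", "ACC": "T", "GCA": "A", "GCC": "A", "CTG": "L", "TTA": "L",
}


def translate(dna):
    """Translate a DNA string whose length is a multiple of 3."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


def percent_identity(x, y):
    """Percent of positions that match between two equal-length strings."""
    same = sum(1 for p, q in zip(x, y) if p == q)
    return 100.0 * same / len(x)


# Hypothetical gene fragments that differ at several nucleotide positions.
gene_1 = "ATGGATGAAACAGCACTG"
gene_2 = "ATGGACGAGACCGCCTTA"

dna_identity = percent_identity(gene_1, gene_2)
protein_identity = percent_identity(translate(gene_1), translate(gene_2))
```

Here the two DNA fragments differ at a third of their positions, yet both translate to the same six amino acids.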

We will use a protein BLAST tool, BLASTp, to find homologous proteins. We need to first find the protein sequence coded by the human ACTA1 gene on the NCBI protein page. 24

From the ACTA1 protein information page, select FASTA, then copy and paste the amino acid sequence into a Word document.
>gi| |emb|CAG | ACTA1 [Homo sapiens]
MCDEDETTALVCDNGSGLVKAGFAGDDAPRAVFPSIVGRPRHQGVMVGMGQKDSYVGDEAQSKRGILTLK
YPIEHGIITNWDDMEKIWHHTFYNELRVAPEEHPTLLTEAPLNPKANREKMTQIMFETFNVPAMYVAIQA
VLSLYASGRTTGIVLDSGDGVTHNVPIYEGYALPHAIMRLDLAGRDLTDYLMKILTERGYSFVTTAEREI
VRDIKEKLCYVALDFENEMATAASSSSLEKSYELPDGQVITIGNERFRCPETLFQPSFIGMESAGIHETT
YNSIMKCDIDIRKDLYANNVMSGGTTMYPGIADRMQKEITALAPSTMKIKIIAPPERKYSVWIGGSILAS
LSTFQQMWITKQEYDEAGPSIVHRKCF
Each amino acid is represented by a particular letter. 25
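A FASTA record is a header line starting with ">" followed by the sequence letters. As a rough sketch, a pasted record can be cleaned up in Python like this; the function name and the example header are hypothetical, and the short sequence is just the first residues shown above.

```python
def parse_fasta(text):
    """Return (header, sequence) for a single-record FASTA string."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    header = lines[0][1:].strip()
    # Remove any spaces left inside the sequence lines by copy and paste.
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return header, sequence


# Hypothetical record: a made-up header and the start of the ACTA1 sequence.
example = """>example_protein (hypothetical header)
MCDEDETTAL VCDNGSGLVK
AGFAGDD
"""

header, sequence = parse_fasta(example)
```

Stripping stray spaces and line breaks this way gives one continuous string of amino acid letters.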

Navigate to the BLASTp link on NCBI. 26

Paste the protein sequence for ACTA1 here. Enter Arabidopsis thaliana for the search database. Select blastp and then click on the BLAST button. 27

The BLASTp report is similar to the BLASTn report. Query sequence 28

“Descriptions” shows 4 actins with the same query coverage, E-value and Ident! There appear to be 4 possible homologous proteins but which is most similar to the human ACTA1 protein? 29

There are a number of actin proteins with high Query coverage, very low E-values and high identity. Check them all (for some whose numbers are represented more than once, check the first listing). Then select “Multiple Alignment” to directly compare those sequences. 30

Conserved amino acids are shown in red. Which differences can you find quickly? Can you spot a deletion? Where is an amino acid replaced by a chemically similar type? Where is an amino acid replaced by a chemically different type? 31
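The questions above can also be answered by counting column types in a pair of aligned sequences. The sketch below assumes one simplified chemical grouping of amino acids (other groupings exist) and uses made-up toy sequences, with "-" marking a deletion.

```python
# Simplified chemical grouping (an assumption for illustration only).
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}


def classify_columns(seq_a, seq_b):
    """Count identical, similar, different and gap columns in an aligned pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"identical": 0, "similar": 0, "different": 0, "gap": 0}
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            counts["gap"] += 1          # a deletion in one of the sequences
        elif a == b:
            counts["identical"] += 1    # conserved amino acid
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            counts["similar"] += 1      # replaced by a chemically similar type
        else:
            counts["different"] += 1    # replaced by a chemically different type
    return counts


# Toy aligned pair (made up): K/R is a similar swap, E/A a different one.
result = classify_columns("MKDELSA", "MRDAL-A")
```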

Protein sequence homology is analyzed by constructing a Distance tree of results. Check the desired “hits”, then select “Distance tree”. 32

Query: human ACTA1 protein. Nodes represent a shared ancestral gene. These proteins are all homologs. 33

34

Of the proteins in Arabidopsis thaliana, ACT7 has the highest identity (88%) and lowest E-value (0.0) when compared to human ACTA1. A gene tree program predicts the presence of ancestral genes between ACT7 and ACTA1. Is that sufficient to confirm protein homology for experimental modeling? 35

A more restricted alignment between human ACTA1 and the closest 3 Arabidopsis proteins can check that ACT7 is the protein closest to the ancestral gene. Check “Align two or more sequences”, then copy and paste the protein sequences for ACT7, ACT8 and ACT2 into the Subject Sequence box. 36

Multiple alignment results for human ACTA1 protein and the 3 closest Arabidopsis proteins. 37

What do the distance tree results indicate? 38

Do you have enough data to use Arabidopsis ACT7 gene as a model for the human ACTA1 gene? Discuss and report your ideas. 39

What criteria from published work indicated that these plant processes and human diseases involved homologous genes or proteins ? 40

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog; similar co-expression of genes for each homolog; similar function Gene Ontology (GO terms); conserved sequences and protein domains. * rs2012/form_blast_intro.pdf 41
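The two sequence-based criteria in this list (E-value and conserved sequence over more than 100 aa) are simple enough to check in code. The sketch below is only an illustration: the `Hit` record and its field names are hypothetical, the alignment lengths are invented, and only the ACT7 identity (88%) and E-value (0.0) come from the BLASTp report discussed earlier. The interaction, co-expression, GO and domain criteria still need separate evidence.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLASTp hit (hypothetical record layout for this example)."""
    name: str
    e_value: float
    percent_identity: float
    alignment_length: int  # number of aligned amino acids


def passes_sequence_criteria(hit, max_e=1e-5, min_identity=25.0, min_length=100):
    """Check only the E-value and identity-over-length criteria."""
    return (
        hit.e_value < max_e
        and hit.percent_identity > min_identity
        and hit.alignment_length > min_length
    )


hits = [
    Hit("ACT7", 0.0, 88.0, 370),        # alignment length is an assumed value
    Hit("weak_hit", 0.5, 30.0, 120),    # made-up hit with a poor E-value
    Hit("short_hit", 1e-10, 90.0, 40),  # made-up hit that is too short
]
candidates = [h.name for h in hits if passes_sequence_criteria(h)]
```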

Let’s find homology information and data about the Arabidopsis ACT7 gene in Use the pull-down menu to access the ThaleMine tool. 42

Enter information about your gene of interest, in this case, ACT7 43

Results show 1 gene, 2 articles and 1 mRNA in the database. We are only interested in studying the gene for now, so we will select the category – Gene or just select the identifier for the gene from the list at right 44

This is the Gene information sheet for the Arabidopsis thaliana ACT7 gene. How did the function listed under Curator Summary compare to your previous prediction? 45

The blue bar under Curator Summary has tabs that take you quickly to that section down the page. Click on the Homology tab. Links to information about human ACT7 homologs. 46

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog; similar co-expression of genes for each homolog; similar function Gene Ontology (GO terms); conserved protein domains. * rs2012/form_blast_intro.pdf 47

Compare the first (human ACTA1) and second (Arabidopsis ACT7) sequences in each alignment and it is evident that many more than 25% of any 100 amino acids in any of the regions align. 48
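The statement that every stretch of 100 amino acids is more than 25% identical can be checked with a sliding window over the aligned pair. This is a sketch under assumptions: the function name is hypothetical, a gap never counts as a match, and the toy sequences below use a small window only to keep the example short.

```python
def min_window_identity(aln_a, aln_b, window=100):
    """Lowest percent identity found in any window of aligned columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    if len(aln_a) < window:
        raise ValueError("alignment is shorter than the window")
    # 1 where the two sequences carry the same amino acid, 0 otherwise.
    matches = [1 if a == b and a != "-" else 0 for a, b in zip(aln_a, aln_b)]
    current = sum(matches[:window])
    lowest = current
    for i in range(window, len(matches)):
        # Slide the window one column to the right.
        current += matches[i] - matches[i - window]
        lowest = min(lowest, current)
    return 100.0 * lowest / window


# Toy aligned pair (made up) with a window of 2 columns.
lowest = min_window_identity("AAAA", "AATA", window=2)
```

If the returned value stays above 25 for a window of 100, the alignment meets that criterion along its whole length.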

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog; similar co-expression of genes for each homolog; similar function Gene Ontology (GO terms); conserved protein domains. * rs2012/form_blast_intro.pdf 49

Actin interacts with many proteins zZk 50

ACT7 and ACTA1 proteins each interact with a variety of other proteins. Because the same protein may have a plant name and a different animal name, further investigation is needed to know from this data whether ACTA1 and ACT7 are interacting with identical proteins. Arabidopsis ACT7 interacts with these proteins. Human ACTA1 interacts with these proteins. 51

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are similar to protein-protein interactions of the other homolog ??; similar co-expression of genes for each homolog; similar function Gene Ontology (GO terms); conserved protein domains. * rs2012/form_blast_intro.pdf 52

Co-expression (transcription of 2 or more genes at the same time in the same cell) is required for gene products (proteins) to work together. 0/fpls HTML/image_m/fpls g001.jpg In the image above, two differently colored fluorescent proteins are co-expressed in Arabidopsis. 53

What genes are co-expressed (same time, same location) for ACT7 or ACTA1? Arabidopsis ACT7 is co-expressed with these genes. Human ACTA1 co-expression is shown with purple lines. Scientists would need to confirm that the different plant and animal names were actually the same protein. 54

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??; some similar co-expression of genes for each homolog ??; some similar function Gene Ontology (GO terms); conserved protein domains. * rs2012/form_blast_intro.pdf 55

Gene Ontology provides information about biological process, molecular function and cellular location. Are any ACT7 GO terms similar to human ACTA1 GO terms? Arabidopsis ACT7. Human ACTA1. 56

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??; some similar co-expression of genes for each homolog ??; some similar function Gene Ontology (GO terms); conserved protein domains. * rs2012/form_blast_intro.pdf 57

58

Homologous proteins will have: very low E-values for sequence alignment (< 0.00001); >25% conserved sequences for >100 aa*; protein-protein interactions of one homolog which are somewhat similar to protein-protein interactions of the other homolog ??; some similar co-expression of genes for each homolog ??; some similar function Gene Ontology (GO terms); conserved protein domains. * rs2012/form_blast_intro.pdf 59

Members of the Arabidopsis actin family of genes are homologous with each other. Does that mean that the Arabidopsis actins are homologous with human ACTA1? 60

Arabidopsis actin gene ACT7 plays an essential role in germination and root growth. The Plant Journal, Volume 33, Issue 2, pages , 16 JAN 2003, DOI: /j X x. Figure panels: Wild-type, no ACT7 mutation; Mutant ACT7+. We have an ACT7 mutant with an observable phenotype difference compared to the normal wild type. 61

Have we found a suitable plant research model for nemaline myopathy? What additional information would you want? Scientific literature searches for Arabidopsis information are easy to access in  apps  50 years of Arabidopsis research! 62