This tutorial is designed to be used in a “follow along” fashion


Working with the Conifer_dbMagic database: A short tutorial on mining conifer assembly data. This tutorial is designed to be used in a “follow along” fashion. You will need to have the Conifer_dbMagic database launched to replicate steps shown in this tutorial. If you do not already have the conifer_dbMagic.jnlp (Java web start) file on your desktop, use the following URL to download and launch the file now: http://ancangio.uga.edu/ng-genediscovery/conifer_dbMagic.jnlp

Upon launching the program, the Assemblies menu will appear. The drop down menu is used to select the species and the assembly that you wish to query.

Each species ID is followed by an extension that identifies the assembler used for de novo transcriptome assembly. For example, the _MIRA, _NGEN, and _NBLR extensions indicate that miraEST, NGen, or Newbler, respectively, was used to assemble the transcript data. The three P. taeda libraries are listed slightly differently, as PtMIRA, PtNBLR1, and PtNGen2.
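The naming convention above can be sketched in a few lines of Python. This is only an illustration for working with assembly IDs outside the program: the helper name assembler_from_id is hypothetical, and it handles only the underscore-style extensions, not the differently named P. taeda libraries.

```python
# Suffix-to-assembler mapping taken from the tutorial text.
ASSEMBLER_BY_SUFFIX = {
    "MIRA": "miraEST",
    "NGEN": "NGen",
    "NBLR": "Newbler",
}


def assembler_from_id(assembly_id):
    """Return the assembler name for an ID such as 'C.atl_MIRA', or None.

    Hypothetical helper: it only understands the '<species>_<SUFFIX>'
    pattern and returns None for anything else (e.g. 'PtMIRA').
    """
    _species, sep, suffix = assembly_id.rpartition("_")
    if not sep:
        return None
    return ASSEMBLER_BY_SUFFIX.get(suffix.upper())


print(assembler_from_id("C.atl_MIRA"))
```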

We have selected the C.atl_MIRA assembly for our example. Now click on the “Submit” button to open the Assembly Display screen.

This is the main Assembly Display panel. Two tabs are on the upper right: Search UniScript and Blast Annotation. Note that C.atlantica is listed in the box on the right under “Genotypes.” Do not click on it; instead, click on the Submit button in the center of the window. IMPORTANT: Do not click anywhere within the “Select Contigs containing all these genotypes” box. This feature is not used in conifer_dbMagic, since no genotype information is parsed into the database.

To see and sort the list of all clusters in this assembly, click the “Submit” button in the upper left center of the window. We see that there are 30,658 matches (total clusters) found in this assembly. Do not highlight “C. atlantica” above the “Submit” and “Reset” buttons. Each of the four populated columns can now be sorted (increasing or decreasing) by clicking on its column header:
Num- database numeric identifier
UniScript- cluster name
UniScript Length- total consensus length in bases
Total Seq- total number of sequence reads associated with each cluster (note that there are “clusters” of one, i.e. singletons)
*If you want to open a new Assemblies menu to select a different species or a different assembler, click on “New Display” at the top left of this window.
*In this and in all other windows, the term “UniScript” in column 2 is a legacy term meaning unique transcript; it is simply the contig name (or isotig name in the case of Newbler assemblies) associated with each cluster in the database.

To search the assembly by UniScript or Sequence Name, or to filter the assembly by UniScript length or by the number of sequence reads in a cluster, use the “UniScript filters” box at the upper left. Two drop down menus are available. First, select UniScript Length “between x,y” and type in a range of 2000 to 3000 bases. Next, select Number of Sequences >= and type in the number 10. Now click the Submit button.

The result is 768 clusters that have consensus sequences between 2000 and 3000 bases long and at least 10 sequence reads per cluster. Note that all of the column values have also changed to reflect the new query results. Now click twice on the Total Seq column header to sort from highest to lowest values.
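For readers who copy the table out of the program, the same filter and sort can be sketched in Python. This is a minimal illustration, not how conifer_dbMagic works internally: the Cluster class, the function name, and the example rows are all made up, and treating “between x,y” as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str        # UniScript (contig or isotig name)
    length: int      # UniScript Length, in bases
    total_seq: int   # Total Seq, number of reads in the cluster


def filter_clusters(clusters, min_len, max_len, min_reads):
    """Keep clusters with min_len <= length <= max_len (assumed inclusive)
    and at least min_reads reads, sorted by read count, highest first."""
    kept = [c for c in clusters
            if min_len <= c.length <= max_len and c.total_seq >= min_reads]
    return sorted(kept, key=lambda c: c.total_seq, reverse=True)


# Made-up rows for illustration only; not values from the database.
rows = [
    Cluster("contig_a", 2500, 12),
    Cluster("contig_b", 1500, 40),   # too short
    Cluster("contig_c", 2900, 303),
    Cluster("contig_d", 2100, 1),    # singleton, too few reads
]

for c in filter_clusters(rows, 2000, 3000, 10):
    print(c.name, c.length, c.total_seq)
```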

After sorting, we will click on the first row to highlight cluster C.atlantica_rep_c103, which has the largest number of total reads (303). Next, click on “View Alignment” at the bottom of the window to see the cluster alignment. *Multiple clusters can be selected here and multiple alignment windows can be opened for viewing or comparing several clusters at once.

A new UniScript Alignment window now appears with the consensus sequence shown at the top, and a pileup view of all aligned sequences listed below. Individual sequence read names are seen on the left. The red blocks indicate inconsistencies among the sequenced reads and the consensus sequence (some of these may be interpreted as possible indel/SNP containing reads). The slider bars located on the bottom and right side of the window are used to scroll through the alignment.

Now, return to the Assembly Display window by clicking on it, and then click on “Blast Annotation” at the bottom of the window.

The view switches to the Blast Annotation tab (one can also go here directly, as will be shown later). The UniScript Name for the cluster we identified in the “Search UniScript” tab has been auto-filled with a database-generated ID. Next, click to highlight a target blast database (NCBI NR) in the Select Target Database(s) panel. Click “Submit” to see the Blastx returns for the selected contig. *Note that multiple target blast databases can be highlighted at one time, if so desired.

Here we see the blastx results panel, with 10 records returned for the C.atlantica_rep_c103 cluster. Just as in the Search UniScript tables, one can sort the blast data table columns by clicking on any column header. Column widths can also be modified by clicking on the dividing line and dragging to the desired width. In any list obtained from the database, e.g. in the Search UniScript or the Blast Annotation tabs, one can highlight contiguous or multiple, separated rows of interest using standard Windows Shift or Ctrl key/mouse click combinations. Use Ctrl C to copy a highlighted table or individual rows of table data for pasting into text or Excel files.
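Once rows have been pasted into a text file, they can be read back for further processing. The sketch below assumes the pasted rows are tab-separated with one row per line, which is typical for tables copied from Java applications but is not confirmed for conifer_dbMagic; the function name and the example rows are hypothetical.

```python
import csv
import io


def parse_pasted_rows(text, columns):
    """Turn tab-separated clipboard text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(columns, row)) for row in reader if row]


# Made-up example rows; the tab-separated layout is an assumption.
pasted = "1\tcontig_a\t2500\t12\n2\tcontig_c\t2900\t303\n"
columns = ["Num", "UniScript", "UniScript Length", "Total Seq"]

for record in parse_pasted_rows(pasted, columns):
    print(record["UniScript"], int(record["Total Seq"]))
```

All values come back as text, so numeric columns such as Total Seq need an explicit conversion before sorting or filtering.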

Next, we will click on the “Expect” column and sort the blast data by their expect values. Note that whenever a row is highlighted, the amino acid alignment between the query sequence and the target sequence appears at the bottom of the window, which itself can be scrolled through using the slider bar. Now, click the “Reset” button to clear this query result.

Next, we type the word “actin” in the Annotation box, select < from the drop down menu next to Expect Val, and type 1e-75 in the Expect Val box. Click to highlight the TAIR_9 database. Click Submit.

We see that 442 records are returned whose TAIR blast description records contain the term “actin” and that also have expect values < 1e-75. *Note in the highlighted row (e.g. Num=9) that any record whose description contains the string “actin” is returned, including matches inside longer words such as “interacting,” whether or not it is actually an actin gene. Also note that up to five different blast records may be returned for any given cluster.
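The difference between a substring match and a whole-word match is easy to check on exported descriptions. The following Python sketch is illustrative only: the function names and example descriptions are made up, and the case-insensitive substring test is an assumption about how the Annotation box behaves.

```python
import re

WHOLE_WORD_ACTIN = re.compile(r"\bactin\b", re.IGNORECASE)


def substring_hit(description, term="actin"):
    """Case-insensitive substring match, assumed to mimic the Annotation box."""
    return term.lower() in description.lower()


def whole_word_hit(description):
    """Match 'actin' only when it stands as a separate word."""
    return WHOLE_WORD_ACTIN.search(description) is not None


def passes_expect(expect_text, cutoff=1e-75):
    """Compare an expect value given as text (e.g. '3e-80') with the cutoff."""
    return float(expect_text) < cutoff


# Made-up descriptions for illustration.
for desc in ["actin 1", "CBL-interacting protein kinase"]:
    print(desc, substring_hit(desc), whole_word_hit(desc))
```

Filtering exported rows with whole_word_hit removes hits such as “interacting,” at the cost of also skipping compound names that contain “actin” without a word boundary.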

Now, we will sort the blast data by clicking on the “Match Length” column, sorting from highest to lowest values. Next, scroll down and highlight the first entry for ACT1 in the Seq Description column (you will need to increase this column width to see it): record Num=46, cluster C.atlantica_rep_c1017. Now click “Search UniScript” at the bottom of the window.

We are returned to the Search UniScript tab and the “UniScript Name(s)” box has been auto-filled with a database generated ID. Click Submit and the information for the ACT1 cluster is returned.

Click to highlight the UniScript row. Now, we can either click to view the alignment of the cluster, as we saw previously, or we can click on “Make Fasta.” After clicking Make Fasta, a dialog box appears for selection of either just the consensus sequence, or the consensus sequence plus all individual sequence reads associated with it. The fasta file can then be downloaded to a local directory of choice. As this short demonstration has shown, the Search UniScript and the Blast Annotation features in the Assembly Display panel are interrelated. One can start in either tab, depending on whether one wants to begin searching based on assembly metrics or on blast data, and end up with the full complement of data for any cluster in any assembly.
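For reference, a fasta file is plain text in which each sequence is introduced by a header line beginning with “>”. The Python sketch below writes that layout for a consensus sequence plus its reads. It is not the program's own Make Fasta code; the function name, the 60-character line width, and the example sequences are assumptions made for illustration.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        # Split the sequence into fixed-width chunks, one per line.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up names and sequences for illustration only.
example = [("consensus", "ATGGCTGATG"), ("read_1", "ATGGCT")]
print(to_fasta(example, width=4), end="")
```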

This concludes the conifer_dbMagic tutorial. Here are some helpful commands for working in or copying information from Java database tables:
Ctrl A = all rows selected.
Click/Shift/Click = a defined group of rows, with the range selected using the mouse.
Click/Ctrl/Click = multiple, ungrouped rows selected using the mouse.
Ctrl C = copy rows that have been highlighted.