Pairwise Alignment, Part I: Constructing the Values and Directions Tables from 2 Related DNA (or Protein) Sequences


Pairwise Alignment of DNA (or Amino Acid) Sequences

Given a pair of DNA gene sequences, each from a different species, one way to determine how closely related the species are is to calculate the degree of homology that the sequences share. We assume that the more similar the sequences, the more closely related the species. The problem is complicated because, in the course of evolution, bases may have been inserted into or deleted from a sequence. The algorithm must therefore also take into account how best to align the two sequences for comparison.

We are going to use a technique based on the LCS (Longest Common Subsequence) algorithm. The first part of the algorithm constructs a TABLE of DATA, similar to a multiplication table. However, instead of numbers in the headers (along the top and left sides), we place the letters of each DNA sequence, for example ATCTGATG and TGCATAC. Before we write a Java program (in the subsequent lesson), we will model this algorithm in an Excel spreadsheet.
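Here is what "common subsequence" means. A subsequence keeps the letters of a sequence in their original order but may skip some of them. A common subsequence is one that both sequences contain. The short Java sketch below checks this for a candidate such as TGAT. The class and method names are hypothetical and do not come from the lesson.

```java
// Hypothetical helper, not part of the lesson: tests whether every letter of
// `sub` appears in `seq` in the same order (gaps allowed).
final class SubsequenceDemo {
    static boolean isSubsequence(String sub, String seq) {
        int k = 0; // index of the next letter of sub still to be matched
        for (int i = 0; i < seq.length() && k < sub.length(); i++) {
            if (seq.charAt(i) == sub.charAt(k)) {
                k++; // found it; look for the following letter next
            }
        }
        return k == sub.length();
    }

    // A common subsequence must be a subsequence of both sequences.
    static boolean isCommon(String sub, String seq1, String seq2) {
        return isSubsequence(sub, seq1) && isSubsequence(sub, seq2);
    }

    public static void main(String[] args) {
        System.out.println(isCommon("TGAT", "ATCTGATG", "TGCATAC"));
    }
}
```

For these two sequences, TGAT and TCTA are both common subsequences of length 4. The table built in this lesson finds the maximum such length without trying candidates one by one.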

Type the labels Seq 1 and Seq 2 in A1 and A2, and center- or right-align A1:A2. The actual sequences go in B1 and B2; left-align B1:B2. It’s best to use the Paste Special → Values command for the sequences so as not to overwrite the formatting.

The number series 1-10 goes in the horizontal range D8:M8 and in the vertical range A11:A20. Hard-code the number 1 in the first cell of each range (D8 and A11). Then copy a formula of the form “previous cell + 1” to the remaining cells, for example =D8+1 in E8 and =A11+1 in A12. Use Ctrl+~ to toggle in and out of formula view.

Next we fill in the HEADER row and column with the individual letters of Sequences 1 and 2, respectively. We do this using formulas…

Using the horizontal and vertical 1-10 series, we can EXTRACT individual letters from each sequence with the MID() function. This technique lets us paste different sequences into cells B1 or B2 without manually entering each character into the header cells of the table. MID() takes three arguments:

- The 1st argument is the cell address of the sequence. We use a $ to make the column absolute for Seq 1 ($B1) and the row absolute for Seq 2 (B$2), so that each formula can be copied across or down, respectively.
- The 2nd argument is the position of the character we want to extract, taken from the 1-10 series.
- The 3rd argument is the number of characters we want to extract, in this case 1.

Note that the formulas conveniently place blanks in the cells beyond the last letter of each sequence. This happens because MID() returns an empty string when the position is past the end of the text.
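The header formulas can be mimicked in Java to show why the extra cells come out blank. This is a sketch that assumes Excel's MID() behaviour for a start position of 1 or more, which is to return an empty result once the start is past the end of the text. The class and method names are our own.

```java
// Sketch of the header-row logic; class and method names are hypothetical.
final class HeaderLetters {
    // Mimics Excel's MID(text, start, count) for start >= 1 (1-based position).
    static String mid(String text, int start, int count) {
        int from = start - 1;                    // convert to a 0-based index
        if (from >= text.length()) {
            return "";                           // start is past the end: empty result
        }
        int to = Math.min(from + count, text.length());
        return text.substring(from, to);
    }

    // One header cell per position in the 1-10 series; blank past the last letter.
    static String[] headers(String seq, int size) {
        String[] cells = new String[size];
        for (int pos = 1; pos <= size; pos++) {
            cells[pos - 1] = mid(seq, pos, 1);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Print the 10 header cells for a 7-letter sequence, separated by |.
        System.out.println(String.join("|", headers("TGCATAC", 10)));
    }
}
```

For a 7-letter sequence, the last three of the ten cells are empty strings. The last three header cells in the spreadsheet are blank in the same way.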

Below is the pseudo-code for generating the table data for the LCS algorithm. The algorithm uses a 2-D array whose cells are referenced by row and column (r, c), as depicted in the grid above. The variable i refers to the rows and the variable j to the columns. In the pseudo-code, c[i, j] is the value placed in the cell (r = i, c = j).
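The pseudo-code itself is an image that is not reproduced in this text. We assume it is the standard LCS-LENGTH procedure, which appears consistent with the references to lines 9-16 below. Under that assumption, the values-table part of it can be sketched in Java as follows. To match the spreadsheet, the row sequence (down the left side) and the column sequence (across the top) are separate parameters. The names are our own.

```java
import java.util.Arrays;

// Sketch of the values-table part of LCS-LENGTH; names are our own.
final class LcsValues {
    // c[i][j] is the value in the cell at row i, column j.
    // Row 0 and column 0 are the initial zeroes (Java int arrays start out as 0).
    static int[][] values(String rowSeq, String colSeq) {
        int m = rowSeq.length(), n = colSeq.length();
        int[][] c = new int[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (rowSeq.charAt(i - 1) == colSeq.charAt(j - 1)) {
                    c[i][j] = c[i - 1][j - 1] + 1;                // same bases: top-left + 1
                } else {
                    c[i][j] = Math.max(c[i - 1][j], c[i][j - 1]); // different: larger of above, left
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        String rowSeq = "TGCATAC", colSeq = "ATCTGATG";
        int[][] c = values(rowSeq, colSeq);
        for (int[] row : c) {
            System.out.println(Arrays.toString(row));
        }
        // The bottom-right cell holds the length of the longest common subsequence.
        System.out.println("LCS length: " + c[rowSeq.length()][colSeq.length()]);
    }
}
```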

We first initialize the cells in the row and column next to the headers to zeroes.

Rather than hard-code each zero, we hard-code only the zero in C10. We use formulas in the other cells so that zeroes appear only in the columns and rows that have header letters. Note again that these formulas conveniently place blanks (rather than zeroes) in the cells beyond the last letter of each sequence. If we paste sequences of different lengths into B1 or B2, we will not have to reprogram these cells. Copy the formula in D10 across and the formula in C11 down.
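In Java, the same idea might look like the sketch below, which uses hypothetical names. The corner zero is fixed, and every other boundary cell holds 0 only when the matching position has a header letter. Testing the position against the sequence length is equivalent to testing whether the header cell is blank.

```java
// Sketch of the zero row/column; class and method names are hypothetical.
final class BoundaryZeros {
    // Returns size + 1 cells: index 0 is the corner (C10), and index k (1..size)
    // is "0" when the sequence has a k-th letter and "" (blank) otherwise.
    static String[] zeroes(String seq, int size) {
        String[] cells = new String[size + 1];
        cells[0] = "0";                                 // the one hard-coded zero
        for (int k = 1; k <= size; k++) {
            cells[k] = (k <= seq.length()) ? "0" : "";  // blank past the last letter
        }
        return cells;
    }

    public static void main(String[] args) {
        // The same helper serves the zero row (column sequence) and zero column (row sequence).
        System.out.println(String.join("|", zeroes("TGCATAC", 10)));
    }
}
```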

We now create a single formula in D11 that we will copy to every other data cell in the table. The formula implements the if … else if … else statements on lines 9-16 of the pseudo-code.

Lines 9-10: If the two bases in the headers are the SAME, the value of the cell where their row and column intersect is 1 plus the value in the cell along the diagonal in the previous row and column, i.e. the value to the cell’s top left. For example, the 2 bases are the same for cell (2,1), since T = T. So VALUE in cell (2,1) = VALUE in cell (1,0) + 1: we add 1 to the value in the diagonally placed cell (1,0) in the previous row and column.

Lines 12-13, 15: If the two bases in the headers are DIFFERENT, the value of the cell where their row and column intersect is the larger of the value in the previous row and the value in the previous column, i.e. the values above and to the left. For example, the 2 bases are different for cell (2,2), since G <> T. So VALUE in cell (2,2) = MAX(cell (1,2), cell (2,1)): we take the larger of the 2 values in the cells above and to the left.
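Together with the zero boundary, the two cases amount to the recurrence below. Here we write x_i for the base in row i's header and y_j for the base in column j's header; this is our notation, not the slide's. The last line repeats the cell (2,1) example from above.

```latex
c[i, j] =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
c[i-1, j-1] + 1 & \text{if } i, j > 0 \text{ and } x_i = y_j,\\
\max\bigl(c[i-1, j],\, c[i, j-1]\bigr) & \text{if } i, j > 0 \text{ and } x_i \neq y_j.
\end{cases}

c[2, 1] = c[1, 0] + 1 = 0 + 1 = 1 \qquad (x_2 = y_1 = \mathrm{T})
```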

When we combine these two conditions, we end up with the formula for D11 below. Copy this formula to all the data cells in the table (D11:M20).

A problem emerges after copying the D11 formula to the rest of the table’s data cells: the cells beyond the last letter of each sequence are filled in too. We really want the cells in columns 7-10 and rows 8-10 to be blank. We can accomplish this by prepending an extra IF condition to our formula:
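The finished formula's logic for a single data cell can be sketched in Java as below. This illustrates the decision order (blank check first, then same bases, then MAX); it is not the spreadsheet formula itself. We assume the extra condition tests whether either header cell is blank. The names are hypothetical, and null stands in for a blank cell.

```java
// One data cell, mirroring the order of the IF tests; names are hypothetical.
final class CellRule {
    // rowBase / colBase are the header letters ("" past the end of a sequence);
    // diag, above and left are the neighbouring values. null means "blank".
    static Integer value(String rowBase, String colBase, int diag, int above, int left) {
        if (rowBase.isEmpty() || colBase.isEmpty()) {
            return null;                  // assumed extra condition: blank outside the sequences
        }
        if (rowBase.equals(colBase)) {
            return diag + 1;              // same bases: top-left value + 1
        }
        return Math.max(above, left);     // different bases: larger of above and left
    }

    public static void main(String[] args) {
        System.out.println(value("T", "T", 0, 0, 0)); // same bases
        System.out.println(value("G", "T", 1, 0, 1)); // different bases
        System.out.println(value("", "T", 0, 0, 0));  // blank header
    }
}
```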

The 1st data table is now complete and should look like the image below.

Constructing the Directions Table

The 2nd data table will hold direction data indicating one of the 3 neighbor cells (left, above, diagonal) relative to the current cell. If the 2 bases are the same, the arrow points back to the diagonal cell. If the 2 bases are different, we use either the ↑ or the ← arrow to indicate the cell above or the cell to the left, depending on which one holds the larger value. This information will allow us to backtrack and reconstruct the entire path traveled in calculating a cell’s value. To start, delete the other empty worksheets in your workbook and RENAME your remaining worksheet Values.

Make an exact copy of the worksheet by right-clicking its worksheet tab (Values) and selecting Move or Copy from the pop-up menu. In the Move or Copy dialog, (1) select (move to end) and (2) check Create a copy. Rename the duplicate worksheet Directions.

From Excel’s main menu, select the View menu (1). Select New Window (2), then Arrange All (3). In the Arrange Windows dialog, select Vertical (4). You should now see 2 windows side-by-side. Select the Values worksheet in one and the Directions worksheet in the other.

Every cell in the Directions table will mirror its counterpart in the Values table, using a formula that contains just the address of the Values table cell. Create a new formula in cell A1 of the Directions table by typing the = sign. Then click on cell A1 in the Values table and press Enter. The formula =Values!A1 will appear in cell A1 of the Directions table. Next, in the Directions table, use the Paste Special command to copy only the FORMULA part of cell A1 to the ranges A1:B2, A11:C20, D8:M10, and C10. Pasting only the formula lets each cell keep its own formatting. Because the reference is relative, each copy points to the matching cell in the Values table.

Arrow characters: From the Windows Start menu, select Accessories → System Tools → Character Map to access the Left Arrow (“←”, Unicode U+2190) and Up Arrow (“↑”, U+2191) characters. Because there is no easy way to depict a diagonal arrow, we’ll use the backslash character (“\”, U+005C, decimal 92) in its place. (Or simply copy them from here: ← ↑ \)

We will now modify the formula in cell D11 of the Directions table. For the part of the formula that checks whether the 2 bases are the same, simply substitute “\” into the true part of the IF statement. In the false part, we replace the MAX() function, because MAX() does not tell us which cell (left or above) supplied the value. Instead, we use another IF statement to check whether the value in the cell above (D10) is greater than or equal to the value in the cell to the left (C11). If so, we place “↑” in the cell; if not, we place “←”. You can now copy the formula in D11 to the 99 other data cells in the Directions table, the range D11:M20.

NOTE: The formula must check the D10 and C11 cells in the Values table (Values!D10 and Values!C11). The corresponding cells in the Directions table will be filled with arrows, not numbers.
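A Java sketch of the Directions logic follows; the names are hypothetical. As the NOTE says, the comparisons use the numeric values, so the sketch fills a values array alongside the arrows. The backslash stands in for the diagonal arrow, and ties go to ↑ because of the >= test.

```java
// Sketch of the Directions table; class and method names are hypothetical.
final class DirectionsTable {
    static final String DIAG = "\\";      // backslash stands in for a diagonal arrow
    static final String UP = "\u2191";    // up arrow
    static final String LEFT = "\u2190";  // left arrow

    // Arrow for one cell, decided from the neighbouring VALUES (not arrows).
    static String arrow(char rowBase, char colBase, int above, int left) {
        if (rowBase == colBase) {
            return DIAG;
        }
        return (above >= left) ? UP : LEFT;   // ties point up, matching the >= test
    }

    // Row 0 and column 0 hold no arrows, so they stay null.
    static String[][] directions(String rowSeq, String colSeq) {
        int m = rowSeq.length(), n = colSeq.length();
        int[][] c = new int[m + 1][n + 1];       // values, needed for the comparisons
        String[][] b = new String[m + 1][n + 1];
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                char r = rowSeq.charAt(i - 1), s = colSeq.charAt(j - 1);
                b[i][j] = arrow(r, s, c[i - 1][j], c[i][j - 1]);
                c[i][j] = (r == s) ? c[i - 1][j - 1] + 1
                                   : Math.max(c[i - 1][j], c[i][j - 1]);
            }
        }
        return b;
    }

    public static void main(String[] args) {
        String[][] b = directions("TGCATAC", "ATCTGATG");
        for (int i = 1; i < b.length; i++) {
            StringBuilder line = new StringBuilder();
            for (int j = 1; j < b[i].length; j++) {
                line.append(b[i][j]).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```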

The completed Directions table should appear as in the image below.

Below is a view of the 2 completed tables for the original test sequences.

Below is a view of the 2 completed tables for the original test sequences, with Seq 1 and Seq 2 swapped. Make the change in the Values table, using Paste Special → Values to preserve cell formatting; the Directions table will automatically reflect the changes.
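The recurrence treats rows and columns the same way, because both the equality test and MAX are symmetric. Swapping the sequences should therefore give a Values table that is the transpose of the original, so the bottom-right value (the LCS length) is unchanged. The Directions table is not simply transposed, since a tie produces ↑ in either orientation. Below is a quick Java check of the Values claim; the names are hypothetical.

```java
// Checks that swapping the sequences transposes the Values table; names are hypothetical.
final class SwapCheck {
    static int[][] values(String rowSeq, String colSeq) {
        int m = rowSeq.length(), n = colSeq.length();
        int[][] c = new int[m + 1][n + 1];   // row 0 and column 0 stay 0
        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                c[i][j] = (rowSeq.charAt(i - 1) == colSeq.charAt(j - 1))
                        ? c[i - 1][j - 1] + 1
                        : Math.max(c[i - 1][j], c[i][j - 1]);
            }
        }
        return c;
    }

    // True when the table for (b, a) is the transpose of the table for (a, b).
    static boolean swapIsTranspose(String a, String b) {
        int[][] c = values(a, b), t = values(b, a);
        for (int i = 0; i <= a.length(); i++) {
            for (int j = 0; j <= b.length(); j++) {
                if (c[i][j] != t[j][i]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(swapIsTranspose("TGCATAC", "ATCTGATG"));
    }
}
```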