Histogram of percent identity for significant and insignificant hits


Why is % identity not a good measure for the quality of a BLAST hit?

Histogram of percent identity for significant and insignificant hits

Number of identical residues for significant and insignificant hits
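
One reason percent identity alone can mislead is that it ignores alignment length: a very short hit can be 100% identical by chance. The sketch below illustrates this under some assumptions: the hits are in BLAST tabular format (`-outfmt 6`, where columns 3, 4 and 11 hold percent identity, alignment length and E-value), the two hits are made up for illustration, and the helper name `identical_residues` and the E-value cutoff of 1e-4 are hypothetical choices, not taken from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: number of identical residues in one tabular BLAST hit,
# recovered from percent identity (column 3) and alignment length (column 4).
sub identical_residues {
    my ($line) = @_;
    my @f = split /\t/, $line;
    my ($pident, $length) = @f[2, 3];
    return sprintf("%.0f", $pident * $length / 100);
}

# Two made-up hits: a short perfect match and a long, more divergent match.
my @hits = (
    "q1\ts1\t100.00\t12\t0\t0\t1\t12\t30\t41\t2.1\t24.3",
    "q1\ts2\t35.00\t400\t250\t10\t1\t400\t5\t396\t1e-50\t190",
);

my $cutoff = 1e-4;    # assumed significance threshold on the E-value

for my $hit (@hits) {
    my @f = split /\t/, $hit;
    my $label = $f[10] <= $cutoff ? 'significant' : 'insignificant';
    printf "%s: %s%% identity, %d identical residues (%s)\n",
        $f[1], $f[2], identical_residues($hit), $label;
}
```

In this toy case the hit with the higher percent identity is the insignificant one, while the number of identical residues separates the two hits in the expected direction.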

GC bias in a window: (G-C)/(G+C)

Window=100, printed every 100

Window=1000, printed every 100

Window=10000, printed every 100
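
The window plots can be approximated with a short script. This is a minimal sketch, not the script used for the slides: it computes (G-C)/(G+C) in a window of a given size and moves the window by a given step ("printed every 100" is read here as a step of 100). The sub names `gc_skew` and `windowed_skew` and the toy sequence are assumptions made for illustration.

```perl
use strict;
use warnings;

# (G - C) / (G + C) for one stretch of sequence; returns 0 if it has no G or C.
sub gc_skew {
    my ($seq) = @_;
    my $g = ($seq =~ tr/Gg//);    # tr with an empty replacement only counts
    my $c = ($seq =~ tr/Cc//);
    return 0 if $g + $c == 0;
    return ($g - $c) / ($g + $c);
}

# Slide a window along the sequence; return a list of [start, skew] pairs.
sub windowed_skew {
    my ($seq, $window, $step) = @_;
    my @points;
    for (my $start = 0; $start + $window <= length($seq); $start += $step) {
        push @points, [ $start, gc_skew(substr($seq, $start, $window)) ];
    }
    return @points;
}

# Toy sequence; for a genome one would use e.g. a window of 1000 and a step of 100.
my $toy = 'GGGGCCGGCCCC';
for my $point (windowed_skew($toy, 4, 4)) {
    printf "%d\t%.2f\n", @$point;
}
```

A larger window smooths the curve, which is what the three slides with window sizes 100, 1000 and 10000 compare.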

Cumulative GCSkew SUM(C-G) measured along the genome from the ORI

Part of script to calculate cumulative GC bias
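
The script itself is not reproduced in this transcript. As a stand-in, here is a minimal sketch of a cumulative skew calculation, using SUM(C-G) as written on the slide above. It assumes the sequence string already starts at the ORI; the sub name `cumulative_skew` and the toy sequence are hypothetical.

```perl
use strict;
use warnings;

# Running sum of (C - G) along the sequence, recorded every $step bases.
# Returns a list of [position, cumulative sum] pairs (positions are 1-based).
sub cumulative_skew {
    my ($seq, $step) = @_;
    my $sum = 0;
    my @points;
    for my $i (0 .. length($seq) - 1) {
        my $base = uc substr($seq, $i, 1);
        $sum++ if $base eq 'C';
        $sum-- if $base eq 'G';
        push @points, [ $i + 1, $sum ] if ($i + 1) % $step == 0;
    }
    return @points;
}

# Toy sequence, assumed to start at the origin of replication.
for my $point (cumulative_skew('CCCCGGGGGGCC', 4)) {
    print join("\t", @$point), "\n";
}
```

Plotting the second column against the first gives the cumulative curve; a change in the sign of the slope marks a switch in strand bias.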

Cumulative Strand Bias SG0

Tetramer bias for Thermus thermophilus SG0: oligonucleotide bias (how often an oligonucleotide occurs on one strand minus its occurrence on the other strand)
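
The definition above can be sketched in a few lines of Perl. This is an illustration under stated assumptions, not the script behind the plot: occurrences on the other strand are counted by running the same counter over the reverse complement, and the sub names `revcomp`, `count_kmers` and `oligo_bias` as well as the toy sequence are hypothetical.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;              # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Count every overlapping oligonucleotide of length $k.
sub count_kmers {
    my ($seq, $k) = @_;
    my %count;
    for my $i (0 .. length($seq) - $k) {
        $count{ uc substr($seq, $i, $k) }++;
    }
    return %count;
}

# Bias = occurrences on the given strand minus occurrences on the other strand.
sub oligo_bias {
    my ($seq, $k) = @_;
    my %fwd = count_kmers($seq, $k);
    my %rev = count_kmers(revcomp($seq), $k);
    my %bias;
    for my $oligo (keys %fwd, keys %rev) {
        $bias{$oligo} = ($fwd{$oligo} // 0) - ($rev{$oligo} // 0);
    }
    return %bias;
}

# Toy sequence; use $k = 4 for tetramers.
my %bias = oligo_bias('AAAAAAGGGG', 4);
for my $oligo (sort keys %bias) {
    print "$oligo\t$bias{$oligo}\n";
}
```

Oligonucleotides that are their own reverse complement (such as ACGT) always have a bias of zero, because every occurrence on one strand is also an occurrence on the other.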

Within genome recombination

Aeromonas_hydrophila_ML09_119_uid205540 versus Aeromonas_salmonicida A449: about 26 recombination events equidistant to the ORI

Aeromonas_veronii B565 versus Aeromonas_hydrophila_ML09_119_uid205540

Two Reasons for Recombination Patterns in Microbial Genomes: A) Recombination events occur at the time of replication

Figure 2. Rearrangement of gene order by translocation of genes across the replication axis. A hypothetical ancestral gene order is indicated (left). After passage of the replication forks (triangles), genes C and D have exchanged positions with W and X by translocation across the replication axis (vertical dashed line) in the descendant genome. For simplicity, the diagram shows a reciprocal translocation that might occur in a single round of replication through two reciprocal recombination events. The diagram does not specify a mechanism for the translocation of genes, which may also occur in several steps as a series of recombination events in separate rounds of replication through intermediate genome organizations. The two replication forks are proposed to be across the replication axis and physically close together, promoting translocation of sequences at the forks. Numbers indicate the percentage of distance from the origin.

Two Reasons for Recombination Patterns in Microbial Genomes: B) Recombination events that do not occur in a symmetric fashion result in misplaced genome Architecture IMparting Sequences (AIMS): dnaA boxes, one-way signals for the replication fork, ter sites, Tus binding

Selection for Chromosome Architecture in Bacteria

Go through array-scalar_example.pl

Past Assignments

From Wednesday: For the following array declaration
@myArray = ('A', 'B', 'C', 'D', 'E');
what is the value of the following expressions?
$#myArray            4
length(@myArray)     1 (the array is evaluated in scalar context, giving 5, and the string "5" has length 1)
$myArray[1]          B
$n=@myArray          5
reverse (@myArray)   EDCBA
Run and discuss class03_answers.pl (see scripts folder).

%GC counter, part A: read in seqs

%GC counter, part B: move seqs to array

%GC counter, part C: calculate %GC
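
The code from the %GC counter slides is not included in this transcript. The following is a minimal sketch of the same three steps (read in the sequences, move them to an array, calculate %GC), assuming FASTA input. To keep the example self-contained it reads from an in-memory string instead of a file; the sub names `read_fasta` and `percent_gc` and the two toy sequences are assumptions.

```perl
use strict;
use warnings;

# Read in seqs: FASTA records from a filehandle into a hash (name => sequence).
sub read_fasta {
    my ($fh) = @_;
    my (%seq, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            $name = $1;
            $seq{$name} = '';
        }
        elsif (defined $name) {
            $seq{$name} .= $line;    # sequences may span several lines
        }
    }
    return %seq;
}

# Calculate %GC: percent G+C of one sequence (0 for an empty sequence).
sub percent_gc {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);
    return 100 * $gc / length($seq);
}

# Toy input, read through an in-memory filehandle instead of a real file.
my $fasta = ">seq1\nATGC\nGGCC\n>seq2\nATATAT\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
my %seqs = read_fasta($fh);
close $fh;

# Move seqs to array: here, a sorted array of sequence names.
my @names = sort keys %seqs;
for my $name (@names) {
    printf "%s\t%.1f\n", $name, percent_gc($seqs{$name});
}
```

For a real file, replace the in-memory open with `open my $fh, '<', $filename`.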

(G-C)/(G+C) in a window

Control structures: Sum 1..50
while ( ) { }
for ( ; ; ) { }

Control structures: Sum 1..50
foreach ( ) { }
Infinite loop with last: while () { if ( ) { last } }
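
The loop bodies are blank in this transcript. One possible way to fill them in, summing the numbers 1 to 50 with each control structure, is sketched here; the sub names are hypothetical and the original slides may have used different variables.

```perl
use strict;
use warnings;

# while: test the condition before every pass.
sub sum_while {
    my ($sum, $i) = (0, 1);
    while ($i <= 50) {
        $sum += $i;
        $i++;
    }
    return $sum;
}

# C-style for: initialisation; condition; increment (separated by semicolons).
sub sum_for {
    my $sum = 0;
    for (my $i = 1; $i <= 50; $i++) {
        $sum += $i;
    }
    return $sum;
}

# foreach: iterate over a list, here the range 1..50.
sub sum_foreach {
    my $sum = 0;
    foreach my $i (1 .. 50) {
        $sum += $i;
    }
    return $sum;
}

# Infinite loop with last: an empty while condition is always true,
# so the loop only ends when last is reached.
sub sum_last {
    my ($sum, $i) = (0, 1);
    while () {
        if ($i > 50) { last }
        $sum += $i;
        $i++;
    }
    return $sum;
}

print join("\t", sum_while(), sum_for(), sum_foreach(), sum_last()), "\n";
```

All four subs return the same value, 1275.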

Control structures: Sum 1..50
while (defined ( )) { }
for ( ; ; ) { } Counting elements of an array. Could have started at 0.
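
A plausible reading of this slide, sketched under the assumption that the `while (defined ...)` loop removes elements from an array with `shift` until none are left, and that the `for` loop walks the array indices to count elements. The sub names `sum_shift` and `count_elements` are hypothetical.

```perl
use strict;
use warnings;

# Sum a list by shifting elements off a copy until shift returns undef.
# defined() matters: without it, an element equal to 0 would end the loop early.
sub sum_shift {
    my @numbers = @_;
    my $sum = 0;
    while (defined(my $n = shift @numbers)) {
        $sum += $n;
    }
    return $sum;
}

# Count the elements of an array with a C-style for loop.
# Indices start at 0 and run up to $#items, the index of the last element.
sub count_elements {
    my @items = @_;
    my $count = 0;
    for (my $i = 0; $i <= $#items; $i++) {
        $count++;
    }
    return $count;
}

print sum_shift(1 .. 50), "\n";
print count_elements('A', 'B', 'C', 'D', 'E'), "\n";
```

In practice `scalar(@items)` gives the element count directly; the loop is shown only to exercise the control structure.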

Hashes are tables that relate keys and values. (In an array, the number of the field could be considered the key: @a=(1..51) => $a[0]=1, $a[50]=51.) In a hash, the entry for the key is the address where the value is stored. E.g., you could have a hash where the student's age is stored as the value and the student ID is the key. But you could also use the student's name as the key and the ID or age or ... as the value. This works very economically, especially if the table is sparse.
my (%studentID, %student_first_name, %studentGPA);
$studentID{gogarten}=9999;
$student_first_name{gogarten}='Johann Peter';
$studentGPA{gogarten}="nd";
In many instances you need to make sure that the key you want to use has not yet been assigned:
if (exists($studentID{gogarten})) {};
Go through class04_2018.pl
http://gogarten.uconn.edu/mcb5472_2018/class04_2018.pl
http://gogarten.uconn.edu/mcb5472_2018/gi_list.txt
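
The hash example from this slide can be run as a small script. The sketch below reuses the slide's three hashes and adds a hypothetical helper, `add_if_new`, that only assigns a value when the key does not exist yet; the key `doe` and the ID 1234 are made up for illustration.

```perl
use strict;
use warnings;

my (%studentID, %student_first_name, %studentGPA);
$studentID{gogarten}          = 9999;
$student_first_name{gogarten} = 'Johann Peter';
$studentGPA{gogarten}         = 'nd';

# Hypothetical helper: store $value under $key only if the key is still unused.
# Returns 1 if the value was stored and 0 if the key already existed.
sub add_if_new {
    my ($hash_ref, $key, $value) = @_;
    return 0 if exists $hash_ref->{$key};
    $hash_ref->{$key} = $value;
    return 1;
}

add_if_new(\%studentID, 'gogarten', 1234);    # key exists: nothing is overwritten
add_if_new(\%studentID, 'doe',      1234);    # made-up key: value is stored

for my $name (sort keys %studentID) {
    print "$name\t$studentID{$name}\n";
}
```

`exists` tests for the key itself, so it also returns true for a key whose value is 0, an empty string or undef.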

For Next Monday
Write a script that reads in a sequence and prints out the reverse complement. Modify your script so that it can handle a sequence that goes over several lines.
Background: $comp =~ tr/ATGC/TACG/; # translates every A in $comp into a T, every T into an A, every G into a C and every C into a G
Read P10, From strings to arrays and back.
Read P14 on hashes, and write the program suggested in the chapter (P14.1-P14.3).
Solve the tasks given in class04_2018.pl.

For Next Monday (cont.)
Do the following statements evaluate to true or false? (Check P5)
1
0 && 1
0 || 1
45
45-45
45/45
45==45
45<=50
$a=45
50 <=> 70
70 <=> 70
70 <=> 50
45!=45
45!=50
from http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part3

From P6 in http://korflab.ucdavis.edu/Unix_and_Perl/current.html#part3

Most of the smaller assignments should be solvable within half an hour. Using the notes, the textbook and the internet, try to solve one problem for no more than one hour. Then ask me or Dyanna for help. In total, the assignments for one week might take a few hours, but if it goes beyond 3 hours total, ask for help, or hand in the latest version of your attempt to solve the assignment. Sometimes, a little help can go a long way. The main reason for the assignments is to make you actually write code and to learn from the mistakes you make.