Bioinformatics: Computing Perspective Primary source: Beginning Perl for Bioinformatics by James Tisdall
Definition and Example [text] Bioinformatics is the application of computational tools and techniques to the management and analysis of biological data. [example] Does an interesting segment of mouse DNA hold a clue to the development of fatal brain tumors in humans?
More on the example After sequencing the DNA, you search online data sources using web-based sequence alignment tools. This gives some related sequences but not a direct match to the suspected link to brain tumors. However, public genetic databases are growing daily at a rapid rate. Checking daily would be best, and Perl allows you to do so: a program can rerun the search, compare the results with the previous day's to detect changes, and email you if there is a change.
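The comparison step in this example can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the helper name new_hits and the two sample reports are made up, and a real script would fetch the reports from a search service and send mail rather than print.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines that appear in today's report
# but not in yesterday's.
sub new_hits {
    my ($old_text, $new_text) = @_;
    my %seen;
    $seen{$_} = 1 for split /\n/, $old_text;
    return grep { !$seen{$_} } split /\n/, $new_text;
}

# Made-up sample data standing in for two daily search reports.
my $yesterday = "hit_A\nhit_B\n";
my $today     = "hit_A\nhit_B\nhit_C\n";

my @added = new_hits($yesterday, $today);
if (@added) {
    print "New results: @added\n";   # a real script would send email here
}
else {
    print "No change\n";
}
```

Storing yesterday's lines in a hash makes each lookup cheap, which matters once the reports grow to thousands of lines.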
Wikipedia definition (en.wikipedia.org) Bioinformatics and computational biology involve the use of techniques including applied mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems usually on the molecular level. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution.
Organization of DNA DNA is a polymer composed of four molecules (called bases or nucleotides): adenine (A) [originally found in the glands], cytosine (C) [originally found in the cell], guanine (G) [originally found in guano], and thymine (T) [originally found in the thymus]. Bases joined end to end form a single strand of DNA.
More about DNA In the cell, DNA usually appears in a double-stranded form, with two strands wrapped around each other in a double helix shape. The two strands have matching bases, known as base pairs. An A on one strand is always paired with a T on the other, and a G is always paired with a C. The two strands also run in opposite directions, so each strand is the reverse complement of the other: read one strand backwards, swap each base for its partner, and you get the other strand.
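The base-pairing rule can be turned into a short Perl routine. The sketch below is one common way to compute a reverse complement; the subroutine name revcom and the sample strand are chosen here for illustration.

```perl
use strict;
use warnings;

# Return the reverse complement of a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # swap each base for its partner
    return $rc;
}

my $strand = 'AAGC';                # made-up sample strand
print revcom($strand), "\n";        # prints GCTT
```

Applying revcom twice returns the original strand, which is a quick sanity check on the routine.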
Biological study types in vitro means “in glass” (test tube); in vivo means “in life” (living organism); in silico refers to biological studies done on the computer. Experimental data must be collected, searched, and analyzed, which usually requires computers to manage the information. Computer simulation is another important tool for studying biological problems.
Perl Developed by Larry Wall in 1987 Popular language for bioinformatics and web programming Works well with ASCII text files (flat files) Designed to make it easy for one program to control other programs Supports rapid prototyping Portable
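As a small illustration of working with flat files, the sketch below reads tab-separated records line by line and reports the length of each sequence. The record layout and the sample data are assumptions made for this example. The string is opened as an in-memory file so the sketch runs without any file on disk.

```perl
use strict;
use warnings;

# Made-up flat-file content: one record per line, name and sequence
# separated by a tab.
my $flat_file = "seq1\tACGT\nseq2\tGGCCA\n";

# Open the string as if it were a file.
open(my $fh, '<', \$flat_file) or die "Cannot open in-memory file: $!";

my %length_of;
while (my $line = <$fh>) {
    chomp $line;
    my ($name, $seq) = split /\t/, $line;
    $length_of{$name} = length $seq;
}
close $fh;

for my $name (sort keys %length_of) {
    print "$name\t$length_of{$name}\n";
}
```

To read a real file, replace \$flat_file with the file name.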
Larry Wall and Perl Wall continues to oversee further development of Perl and serves as the Benevolent Dictator for Life of the Perl project. His role in Perl is best conveyed by the so-called 2 Rules, taken from the official Perl documentation: 1. Larry is always by definition right about how Perl should behave. This means he has final veto power on the core functionality. 2. Larry is allowed to change his mind about any matter at a later date, regardless of whether he previously invoked Rule 1. Got that? Larry is always right, even when he was wrong.
Humor and Perl Wall, along with Randal L. Schwartz and Tom Christiansen, writing in the second edition of Programming Perl, outlined the Three Virtues of a Programmer: Laziness - The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris. Impatience - The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris. Hubris - Excessive pride, the sort of thing Zeus zaps you for. Also the quality that makes you write (and maintain) programs that other people won't want to say bad things about. Hence, the third great virtue of a programmer. See also laziness and impatience.
Getting Perl Can be downloaded from Version on CSE machines (J251 and 263) is or will be on Linux and the vanilla open source version
Text appendices Appendix A – Resources Appendix B – Perl Summary (focusing on what is most useful for our purposes in this course)
Programming Edit – Run – Revise (and Save) Program Process –Identify required inputs –Design the program, usually an algorithm that computes the output from the inputs –How will output be handled? (display or file) –Refine the design with more detail –Write the Perl program code and revise until working correctly
Pseudocode and Code getanswer sub getanswer { print "Type in your answer here: "; my $answer = <STDIN>; chomp $answer; return $answer; }
More pseudocode
get the name of DNAfile from the user
read in the DNA from the DNAfile
for each regulatory element
    if element is in DNA, then add one to the count
print count
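One way this pseudocode might look in Perl is sketched below. To keep the example runnable without a file or user input, the first two steps are replaced by an in-memory string. The subroutine name, the DNA string, and the element list are all made up for illustration.

```perl
use strict;
use warnings;

# Count how many of the given elements occur at least once in the DNA.
sub count_elements_present {
    my ($dna, @elements) = @_;
    my $count = 0;
    for my $element (@elements) {
        # index returns -1 when the element is not found
        $count++ if index($dna, $element) >= 0;
    }
    return $count;
}

# Hypothetical stand-ins for the DNA file contents and the elements.
my $dna      = 'GGCTATAAAGGCGCCAATCT';
my @elements = ('TATAAA', 'CCAAT', 'GGGCGG');

print count_elements_present($dna, @elements), "\n";   # prints 2
```

Here index does a plain substring search. A regular expression match would be the usual choice if the elements were patterns rather than fixed strings.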
Comments A comment is everything on a line from a # sign to the end of the line. The first line of many Perl programs, something like #!/usr/bin/perl, is a special case: it also begins with #, but the operating system reads it to find the Perl interpreter.
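A short example of both kinds of line follows. The variable and its value are made up for illustration.

```perl
#!/usr/bin/perl
# The first line above names the Perl interpreter; this line is an
# ordinary comment.
use strict;
use warnings;

my $dna = 'ACGT';           # everything after the # is ignored by Perl
# print "never runs\n";     # a whole line can be commented out
print length($dna), "\n";   # prints 4
```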
Assignment Read Chapters 1-3 of Perl text and skim Appendices A and B at the back of the book Check out Get ready for Chapter 4