Bioinformatics Computing 1 CMP 807 – Day 1 Kevin Galens.

2 Today’s Objectives
- Overview/Introduction
- What is Bioinformatics?
- Molecular Biology Overview
- Unix Introduction
- Sequence Alignment Introduction

3 Course Objectives
- Define Bioinformatics
- Review/Learn basic Molecular Biology
- Develop Unix skills
- Understand:
  - Sequence alignment
  - Gene finding
  - Structure analysis
- Utilize bioinformatics software
  - Web-based/Local
- Understand Data Storage Techniques

4 Textbook
- Developing Bioinformatics Computer Skills, Gibas and Jambeck, ISBN: 1-56592-664-1
- Bioinformatics: Sequence and Genome Analysis, David W. Mount, ISBN: 0-87969-608-7
- O’Reilly Books:

5 Course Requirements
- 80% Attendance
- Ask Questions
- Have Fun

6 Introduction What is Bioinformatics?


8 Introduction
Bioinformatics – “the science of using information to understand biology”
Combination of:
- “Wet Lab” sciences:
  - Biology
  - Chemistry
- “Theoretical” sciences:
  - Physics
  - Mathematics
  - Computer Science
  - Information Technology

9 Introduction
What you need to know to be a bioinformaticist:
- Molecular Biology/Biochemistry
  - DNA->RNA->Protein (Central Dogma)
- Unix
- Programming
  - Perl
  - Java/C/C++
  - Python
- Web Development
- Database Management
- How to adapt

10 Introduction
What Bioinformaticists do:
- Create/manage databases
  - DNA/RNA/Protein Sequence/Structure
  - Microarray
  - Phylogenetic
- Develop computational analysis methods
- Assist ‘wet-lab’ scientists
  - Usable interfaces
- Everything

11 Bioinformatics Introduction Questions?

12 Molecular Biology An introduction/Review

13 What is the Central Dogma?

14 What is DNA?
- Deoxyribonucleic acid
- Polymer of nucleotides:
  - Adenine (A)
  - Thymine (T)
  - Guanine (G)
  - Cytosine (C)
- Double Helix (show PDB: 142D)

15 What is DNA Replication?
- Copy DNA molecule (5’->3’)
- Cell division
- Passage of genetic information
- Propagation/source of mutation

16 What is RNA?
- Ribonucleic acid
- Polymer of nucleotides:
  - Adenine (A)
  - Uracil (U) – substitute for T
  - Guanine (G)
  - Cytosine (C)
- Single chain
- Varied structures (PDB: 1evv – tRNA)

17 What is Transcription?
- DNA -> complementary RNA
- Genes – Transcribed DNA
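The "DNA -> complementary RNA" step can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` and the example template strand are made up for this sketch, and the template is assumed to be written 3'->5' so that the RNA comes out 5'->3'.

```python
# Base-pairing rules: DNA template base -> RNA base (U replaces T in RNA).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> str:
    """Return the RNA complementary to a DNA template strand.

    Assumes the template is written 3'->5', so the result reads 5'->3'.
    """
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


# Hypothetical template strand, chosen only for illustration.
template = "TACGGTCTCATT"
mrna = transcribe(template)
print(mrna)  # AUGCCAGAGUAA
```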

18 What is a protein?
- Polymer of amino acids
- Encoded by DNA via mRNA
- Synthesized at ribosome
- Enzymatic/structural
- PDB: 1gzx – hemoglobin

19 What is translation?
- Protein synthesis
- mRNA -> protein
- Ribosome
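As a rough sketch of "mRNA -> protein", the snippet below reads an mRNA three bases at a time and looks each codon up in the standard genetic code. The names `CODON_TABLE` and `translate` and the example mRNA are assumptions made for this illustration; a real ribosome also needs a start codon and a reading frame, which this sketch takes for granted.

```python
from itertools import product

# Standard genetic code, one-letter amino acids in U, C, A, G codon order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (assumed to start in frame) until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical mRNA: AUG (Met), CCA (Pro), GAG (Glu), UAA (stop).
print(translate("AUGCCAGAGUAA"))  # MPE
```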

20 Molecular Biology Review Questions?


22 What is UNIX?
- Operating System
- Uniplexed Information and Computing Service
- 1970s – Bell Labs
- Multiuser Environment

23 Why do we use UNIX?
- Multiuser abilities
- Network capabilities
- Process Chaining
- Easy Text file manipulation
- Software development capabilities
- Free!

24 UNIXploration
- Command Line
- Shell – Command line environment
  - bash (Bourne-again shell)
  - csh (C shell)
  - tcsh (improved version of the C shell)

25 Important UNIX Commands
- man – view the manual page for a given command
- apropos – search man pages
- ls – list contents of a directory
- pwd – report current directory
- cd – change directory
- more/less – page through text
- clear – clear the terminal

26 Important UNIX Commands
- > – redirect output (standard output)
- < – redirect input (standard input)
- | – pipe (send one command's output to the next command's input)
- cat – concatenate files/input to standard output
- grep – print the lines of a file that match a pattern
- cut – extract selected sections (fields or characters) from each line of a file
- find – search for files
- sort – sort lines of a file
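To make the idea of chaining concrete, here is a small Python sketch that mimics what a pipeline such as `cat seqs.fasta | grep '>' | cut -d ' ' -f 1 | sort` computes. The file name, the FASTA content, and the helper functions are all invented for this illustration; they are simplified stand-ins, not the real commands (for example, this `grep` only does plain substring matching).

```python
import io


def cat(stream):
    """Like cat: pass every input line through unchanged (newline stripped)."""
    for line in stream:
        yield line.rstrip("\n")


def grep(lines, needle):
    """Like grep: keep only lines containing the pattern (plain substring here)."""
    return (line for line in lines if needle in line)


def cut(lines, delimiter, field):
    """Like cut -d DELIM -f N: keep the N-th (1-based) field of each line."""
    return (line.split(delimiter)[field - 1] for line in lines)


# Hypothetical FASTA content standing in for a file on disk.
fasta = io.StringIO(">seq2 human\nACGT\n>seq1 mouse\nGGCC\n")

# Each stage consumes the output of the previous one, as a pipe does.
result = sorted(cut(grep(cat(fasta), ">"), " ", 1))
print(result)  # ['>seq1', '>seq2']
```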

27 Text Editors
vi/vim
- u – undo
- i – insert
- A – append at end of line
- : – enter command line
- :help – view help page
- :q – quit
- ZZ – save/quit
- Esc – exit the current mode (return to normal mode)
- x – delete character
- dw – delete word
- dd – delete line

28 UNIX Exercise
- Create ~/software and ~/bin
- Install BLAST ftp

29 Fundamentals of Sequence Alignment

30 Global Alignment: Needleman-Wunsch
What is Global alignment?
- Uses whole length of both sequences
- Result: 1 optimal score (more than one alignment can achieve it)
Needleman-Wunsch:
- Utilize a 2-d matrix
Scenario:
- Align: COELACANTH and PELICAN
- +1 – Match
- -1 – Mismatch
- -1 – Gap

31 Global Alignment: Needleman-Wunsch


33 Resulting alignment (score 0: 5 matches, 2 mismatches, 3 gaps):
COELACANTH
P-ELICAN--
or
COELACANTH
-PELICAN--
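The global alignment scenario (COELACANTH vs PELICAN, +1 match, -1 mismatch, -1 gap) can be reproduced with a short Python sketch of Needleman-Wunsch. The function name and the tie-breaking order in the traceback are choices made for this illustration, so the alignment it returns is one optimal alignment and may differ from another equally good one.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, top, bottom)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against nothing costs only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        """Score for pairing a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the 2-d matrix: best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback from the bottom-right corner to the top-left corner.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


best_score, top, bottom = needleman_wunsch("COELACANTH", "PELICAN")
print(best_score)  # 0
print(top)
print(bottom)
```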

34 Local Alignment: Smith-Waterman
What is a local alignment?
- Find the highest-scoring pair of substrings
- No assumption on sequence length
Smith-Waterman:
- Use a 2-d matrix
Scenario:
- Align: COELACANTH and PELICAN
- +1 – Match
- -1 – Mismatch
- -1 – Gap

35 Local Alignment: Smith-Waterman


37 Resulting alignment (score 4: 5 matches, 1 mismatch):
ELACAN
ELICAN
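The local alignment scenario can be sketched the same way. Compared with a global alignment, this sketch of Smith-Waterman never lets a cell drop below zero, starts the traceback at the highest-scoring cell, and stops when it reaches a zero. The function name and the tie-breaking order are assumptions made for this illustration.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment with a linear gap penalty; returns (score, top, bottom)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i, j):
        """Score for pairing a[i-1] with b[j-1]."""
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix; the extra 0 lets an alignment restart anywhere.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Traceback from the best cell until a zero-scoring cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("COELACANTH", "PELICAN"))  # (4, 'ELACAN', 'ELICAN')
```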

38 Sequence Alignment
More sophisticated scoring: Substitution Matrix
- PAMX (Point Accepted Mutation)
  - Scaled according to evolutionary distance; derived from closely related proteins
  - PAM1 = 1% of amino acid positions have changed
  - PAM250 – most commonly used
- BLOSUMX (BLOck SUbstitution Matrix)
  - Derived from more distantly related proteins
  - BLOSUM62 – sequences at least 62% identical are clustered, so the counts come from pairs that are less than 62% identical
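To show how a substitution matrix changes the scoring, the sketch below re-scores the local alignment ELACAN / ELICAN using a handful of entries taken from the BLOSUM62 table instead of the flat +1/-1 scheme. Only the six entries needed for this example are included, and they should be checked against an authoritative copy of the matrix before being relied on. The names `BLOSUM62_SUBSET`, `pair_score`, and `score_ungapped` are invented for this illustration.

```python
# A few BLOSUM62 entries (not the full 20x20 matrix). The matrix is symmetric,
# so each unordered pair is stored once.
BLOSUM62_SUBSET = {
    ("A", "A"): 4,
    ("C", "C"): 9,
    ("E", "E"): 5,
    ("L", "L"): 4,
    ("N", "N"): 6,
    ("A", "I"): -1,
}


def pair_score(x, y, matrix):
    """Look up a residue pair in either order, since the matrix is symmetric."""
    if (x, y) in matrix:
        return matrix[(x, y)]
    return matrix[(y, x)]


def score_ungapped(top, bottom, matrix):
    """Sum the substitution scores column by column for a gap-free alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(x, y, matrix) for x, y in zip(top, bottom))


# E/E=5, L/L=4, A/I=-1, C/C=9, A/A=4, N/N=6
print(score_ungapped("ELACAN", "ELICAN", BLOSUM62_SUBSET))  # 27
```

Note how the rare cysteine match (C/C) contributes far more than the common alanine match (A/A), which a flat +1 scheme cannot express.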

39 Sequence Alignment Intro Questions?

