Lecture 1 BNFO 136 Usman Roshan. Course overview Pre-req: BNFO 135 or approval of instructor Python progamming language and Perl for continuing students.

Slides:



Advertisements
Similar presentations
INTRODUCTION TO DNA By the end of this lecture you will know:
Advertisements

Evolution and proteins You can see the effects of evolution, not only in the whole organism, but also in its molecules - DNA and protein For a mutation.
CSE/Beng/BIMM 182: Biological Data Analysis Instructor: Vineet Bafna TA: Yuan Zhao Course Link Course Link.
1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
The Chemistry of Life Macromolecules
1 Genetics The Study of Biological Information. 2 Chapter Outline DNA molecules encode the biological information fundamental to all life forms DNA molecules.
BNFO 602 Multiple sequence alignment Usman Roshan.
Lecture 1 BNFO 601 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Dynamic programming.
Introduction to Bioinformatics Spring 2008 Yana Kortsarts, Computer Science Department Bob Morris, Biology Department.
Lecture 1 BNFO 135 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Programs for comparing.
Lecture 1 BNFO 240 Usman Roshan. Course overview Perl progamming language (and some Unix basics) Sequence alignment problem –Algorithm for exact pairwise.
BNFO 602, Lecture 2 Usman Roshan Some of the slides are based upon material by David Wishart of University.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 18: Application-Driven Hardware Acceleration (4/4)
BNFO 602 Lecture 2 Usman Roshan. Sequence Alignment Widely used in bioinformatics Proteins and genes are of different lengths due to error in sequencing.
BNFO 602 Lecture 1 Usman Roshan.
BNFO 240 Usman Roshan. Last time Traceback for alignment How to select the gap penalties? Benchmark alignments –Structural superimposition –BAliBASE.
CSE 182: Biological Data Analysis Instructor: Vineet Bafna TA: Ryan Kelley
Lecture 1 BNFO 601 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Sequence alignment.
1. Primary Structure: Polypeptide chain Polypeptide chain Amino acid monomers Peptide linkages Figure 3.6 The Four Levels of Protein Structure.
Lecture 4 BNFO 235 Usman Roshan. IUPAC Nucleic Acid symbols.
BNFO 235 Lecture 5 Usman Roshan. What we have done to date Basic Perl –Data types: numbers, strings, arrays, and hashes –Control structures: If-else,
Proteins Structures Primary Structure.
© Wiley Publishing All Rights Reserved. Biological Sequences.
Building Blocks of life Molecular Structure: DNA, RNA and amino acids Lecture 3.
Genetic Code All of the information to make a new organism is contained in the chromosomes of the cell. Chromosomes are made of tightly coiled DNA or Deoxyribonucleic.
Proteins and DNA Chapter 3.
CSE 6406: Bioinformatics Algorithms. Course Outline
Insulin: Weight = 5733, 51 amino acids Glutamine Synthetase: Weight = 600,000, 468 amino acids.
The structure of DNA.
Intelligent Systems for Bioinformatics Michael J. Watts
Protein Synthesis (Eukaryotes)
COT 6930 HPC and Bioinformatics Introduction to Molecular Biology Xingquan Zhu Dept. of Computer Science and Engineering.
DNA and Protein Synthesis A Brief Tutorial. Background DNA is the genetic material. DNA is the genetic material. Sometimes called “the blueprint of.
D. NUCLEIC ACIDS 1.ARE MADE OF THE ELEMENTS C,H,O,N,P.
DNA VISUAL PPT QUIZ #2 CH. 12. Question #1 : Cytosine will form a base pair only with: a. cytosine b. adenine c. thymine d. Katzine e. guanine.
Chapter 11 DNA and GENES. DNA: The Molecule of Heredity DNA, the genetic material of organisms, is composed of four kinds nucleotides. A DNA molecule.
Proteins. Proteins Chains of amino acids Basic structure below:
PROTEINS The final product of the DNA blueprint Hemoglobin.
Introduction to Bioinformatics Algorithms Algorithms for Molecular Biology CSCI Elizabeth White
Protein Structure  The structure of proteins can be described at 4 levels – primary, secondary, tertiary and quaternary.  Primary structure  The sequence.
Gene Expression Role of DNA. Where is DNA? In the chromosomes in the nucleus.
Teaching Bioinformatics Nevena Ackovska Ana Madevska - Bogdanova.
Structure of DNA Notes 12/17. The Double Helix Made up of units called nucleotides, which have three parts.
Lecture 1 BNFO 601 Usman Roshan. Course overview Perl progamming language (and some Unix basics) –Unix basics –Intro Perl exercises –Dynamic programming.
Introduction to molecular biology Data Mining Techniques.
DNA & GENES DNA: the molecule of heredity DNA ultimately determines an organism’s traits. Within the structure of DNA is the complete instructions for.
Data-intensive Computing: Case Study Area 1: Bioinformatics
Lecture 1 BNFO 601 Usman Roshan.
Protein Synthesis and Protein Folding
Bell Ringer On a clean sheet of paper, this will be turned in today.
Protein Synthesis.
. Nonpolar (hydrophobic) Nonpolar (hydrophobic) Amino Acid Side Chains
Macromolecules.
Introduction to Python
Nucleic Acids.
BNFO 602 Lecture 2 Usman Roshan.
BNFO 602 Lecture 2 Usman Roshan.
The Cell Cycle and Protein Synthesis
Protein Structures.
بیوشیمی : پروتئین ها و لیپیدها
DNA and RNA.
DNA:The cells Information system
Biological Chemistry.
The final product of the DNA blueprint
Nucleic and Amino Acids
UNIT A IN YOUR GRADE 9 SCIENCE 9 PROBE TEXTBOOK
BC Science Connections 10
Four Levels of Protein Structure
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

Lecture 1 BNFO 136 Usman Roshan

Course overview Pre-req: BNFO 135 or approval of instructor Python progamming language and Perl for continuing students –Some unix basics –Input/output, lists, dictionaries, loops, if-else, counting Sequence analysis –Comparison of protein and DNA sequences –Scoring a sequence alignment –Picking the best alignment from a set –Computing a sequence alignment –Finding the most similar sequence to a query

Overview (contd) Grade: 15% programming assignments, two 25% mid- terms and 35% final exam Recommended Texts: –Introduction to Bioinformatics by Arthur M. Lesk –Beginning Perl for Bioinformatics by James Tisdall –Python Scripting for Computational Science by Hans Petter Langtangen

Nothing in biology makes sense, except in the light of evolution AAGACTT -3 mil yrs -2 mil yrs -1 mil yrs today AAGACTT T_GACTTAAGGCTT _GGGCTTTAGACCTTA_CACTT ACCTT (Cat) ACACTTC (Lion) TAGCCCTTA (Monkey) TAGGCCTT (Human) GGCTT (Mouse) T_GACTTAAGGCTT AAGACTT _GGGCTTTAGACCTTA_CACTT AAGGCTTT_GACTT AAGACTT TAGGCCTT (Human) TAGCCCTTA (Monkey) A_C_CTT (Cat) A_CACTTC (Lion) _G_GCTT (Mouse) _GGGCTTTAGACCTTA_CACTT AAGGCTTT_GACTT AAGACTT

Representing DNA in a format manipulatable by computers DNA is a double-helix molecule made up of four nucleotides: –Adenosine (A) –Cytosine (C) –Thymine (T) –Guanine (G) Since A (adenosine) always pairs with T (thymine) and C (cytosine) always pairs with G (guanine) knowing only one side of the ladder is enough We represent DNA as a sequence of letters where each letter could be A,C,G, or T. For example, for the helix shown here we would represent this as CAGT.

Transcription and translation

Amino acids Proteins are chains of amino acids. There are twenty different amino acids that chain in different ways to form different proteins. For example, FLLVALCCRFGH (this is how we could store it in a file) This sequence of amino acids folds to form a 3-D structure

Protein structure Primary structure: sequence of amino acids. Secondary structure: parts of the chain organizes itself into alpha helices, beta sheets, and coils. Helices and sheets are usually evolutionarily conserved and can aid sequence alignment. Tertiary structure: 3-D structure of entire chain Quaternary structure: Complex of several chains

Getting started FASTA format: >human ACAGTAT >mouse ACGTA >cat AGGTGAAA

Python basics Basic types: –Scalar: number or a string, –Lists: collection of objects –Dictionaries: collection of (key,value) pairs Reading sequences from a file into a data structure Basic if-else conditionals and for loops

Some simple programs How many sequences in a FASTA file? What is the length and name of the longest and shortest sequence? What is the average sequence length? Verify that input contains DNA sequences. Compute the reverse complement of a DNA sequence

Two dimensional lists Represented as list of lists For example the matrix would be stored as [ [23, 4, 1], [12, 5, 12], [1, 6, 20] ]