Welcome to Introduction to Bioinformatics Computing aka BIC1.

Team taught by:
Rhys Price Jones, Ph.D. – rpjavp@rit.edu – Bldg. 7B-2250; 5-5866 – Office Hours: Monday, Wednesday, Friday 10-11 a.m.
Anne R. Haake, Ph.D. – arh@it.rit.edu – Bldg. 70-2627; 5-5365 – Office Hours: Tuesday, Thursday 10-11:30 a.m.

The Focus of Bioinformatics Using computers to answer biological questions –Storage –Visualization –Analysis Using computers to figure out which biological questions to ask

What is this course about? We will focus on analysis: –We will study techniques for quickly and effectively applying computing resources to solve problems raised in biology –We will study algorithms (more on this later) that underlie many of the popular bioinformatics software packages The majority of these algorithms are concerned with sequence analysis (more on this, too)

The Context of Bioalgorithms It is important to keep in mind that a mathematically perfect solution to an ideally posed problem may not be the most biologically relevant. We need flexibility: a willingness to rephrase the question, to rethink the process, to adapt and re-adapt

Course Structure 3 Classroom sessions each week to introduce the biological perspective and computational approaches for each biological problem 1 Laboratory session to give you hands-on experience in applying and refining computational methods in the context of biology

Readings Textbook: Introduction to Computational Molecular Biology, Setubal and Meidanis, PWS Publishing: An International Thomson Publishing Company, Boston, 1997, ISBN: 0-534-95262-3 Papers from the current literature, as assigned Lecture notes and lab manuals as posted and linked to from the course home page Note that, unless otherwise noted, net-based resources should be accessed using Netscape. Other browsers may not be able to correctly interpret the JavaScript code.

Expectations – Computing Background There are skills you should possess in part already, but which will be significantly enhanced by being exercised in this course: –identifying and clearly phrasing a computational problem from a general biological question –locating existing tools –understanding the capabilities and limitations of such tools –rapidly developing, testing and analyzing tools for the solution of such problems if necessary

Computing Background – Specific skills Programming in a language such as Lisp, Perl, Scheme, Java, C, Python, etc. (if in doubt, ask!) Static and dynamic data structures – arrays, lists, trees, etc. Control structures, especially recursion Rapid prototyping, careful version control Understanding of mathematics for: –analysis –proof –modeling

Biological Motivation The fundamental building blocks of life are proteins and nucleic acids 100,000 or so different proteins in a human –Enzymes, structural proteins, transport molecules, antibodies Their properties and interactions are what make us what we are

Biological Motivation Nucleic Acids –DNA and RNA Encode the information necessary to build proteins Pass this information on from generation to generation

Biological Motivation What are proteins? –Polymers of amino acids (20 different) –Sequence of these amino acids (primary structure) determines the protein’s shape (secondary and tertiary structures) –Protein shape and the chemical composition of its amino acids determine protein function

[Figure from W. Gilbert, Ph.D., New Hampshire Biotech. Center] So, in theory, we can infer protein function if we know the protein sequence

Biological Motivation How do we find out protein sequence so that we can understand structure, function, and ultimately systems biology? State of the art: –Can sequence proteins directly; this has been technically difficult but is getting better –More often we determine protein sequence from the nucleic acid sequences that encode them

The Central Dogma Hereditary information for a complete individual is stored in the DNA, which is self-replicating and is organized into units of expression (genes) A gene is expressed in 2 steps: DNA is transcribed into RNA; RNA is translated into protein
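
The two expression steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `transcribe` and `translate`, the example sequence, and the deliberately partial `CODON_TABLE` are not from the course materials, and the input is assumed to be the coding strand of the DNA.

```python
# Partial codon table, for illustration only (an assumption of this sketch);
# the full standard genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "UUC": "F", "GGC": "G", "GGU": "G",
    "GCU": "A", "GCC": "A", "UAA": "*", "UAG": "*", "UGA": "*",
}

def transcribe(coding_strand):
    """Step 1: return the mRNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Step 2: translate mRNA from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        # "X" marks a codon that is missing from the partial table above
        amino = CODON_TABLE.get(mrna[i:i + 3], "X")
        if amino == "*":  # stop codon ends translation
            break
        protein.append(amino)
    return "".join(protein)

dna = "CCATGTTTGGCGCTTAAGG"   # hypothetical example sequence
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

Real genes are more complicated than this (for example, the cellular pathway from DNA to RNA to protein can involve further processing), so the sketch captures only the basic information flow.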

Using DNA Sequence to Discover Protein Information Why do it? Availability of DNA sequence information –Rapid development of DNA sequencing technology –Genomes of many different species have now been sequenced Difficulties? –Data sets are large –Cellular pathway from DNA to RNA to protein can be complicated

Some Genomes
E. coli: 4.6 x 10^6 bases – Approx. 4,000 genes
Yeast: 15 x 10^6 bases – Approx. 6,000 genes
Smallest human chromosome: 50 x 10^6 bases
Human: 3 x 10^9 bases – Approx. 30,000 genes?

The Computational Approach The nucleotide sequence of a genome contains all information necessary to produce a functional organism Therefore, we should, in theory, be able to duplicate this decoding using computers –what do you think about this?

Why Use Computational Techniques? The datasets are too large to analyze by hand Efficient algorithms are the only way to perform the analyses that we need to answer the biological questions

The Biologist’s View of Sequence Analysis Many common biological questions can be answered through comparison of DNA sequences

Some Biological Questions Answered Through Sequence Analysis Determine if an interesting DNA sequence has been seen by anyone else Find all the protein coding regions in a genome Infer the function of a new gene from a known one by matching two amino acid sequences Measure the evolutionary distance between species Predict local secondary structure of a peptide sequence, predict protein conformation, predict function Study protein families

The Computer Scientists’ View Problems on biological sequences are string matching problems Operations on strings are standard in the CS algorithm toolbox DNA is a string of A’s, T’s, G’s, C’s
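
As a minimal sketch of treating DNA as a string, the Python below finds every occurrence of a short pattern in a longer sequence using the built-in `str.find`. The function name `find_all` and the example sequence are assumptions made for illustration, not part of the course materials.

```python
def find_all(pattern, text):
    """Return every start index where pattern occurs in text (overlaps included)."""
    hits = []
    pos = text.find(pattern)
    while pos != -1:
        hits.append(pos)
        # Resume one position later so overlapping matches are not missed
        pos = text.find(pattern, pos + 1)
    return hits

genome = "GATTACAGATTACA"     # hypothetical example sequence
positions = find_all("ATTA", genome)
print(positions)              # 0-based start positions
```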

Many Molecular Biology Problems on Sequences Can Be Formulated As String Matching Problems Comparing two or more strings for similarities Searching databases for related strings Looking for new patterns occurring frequently in DNA Reconstructing long strings of DNA from overlapping string fragments And more…
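
To make the fragment-reconstruction item concrete, here is a small Python sketch that measures how far the end of one fragment overlaps the start of another and merges the two. The names `overlap` and `merge` and the example fragments are hypothetical; practical assembly also has to cope with sequencing errors and many fragments at once, which this sketch ignores.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def merge(a, b):
    """Join two fragments, writing their shared overlap only once."""
    return a + b[overlap(a, b):]

left, right = "GATTAC", "TACAGA"   # hypothetical overlapping fragments
merged = merge(left, right)
print(overlap(left, right), merged)
```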

We Will Be Studying Algorithms For: Exact string matching Inexact string matching Sequence alignment problems Multiple alignment problems And more…
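
One common way to quantify inexact matching is edit distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The Python sketch below computes it with the classic dynamic-programming recurrence. It is offered as an illustration of the general idea, with the function name `edit_distance` and the example sequences chosen here, and is not necessarily the formulation used later in the course.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

distance = edit_distance("GATTACA", "GACTATA")  # hypothetical example sequences
print(distance)
```

Keeping only two rows of the table keeps memory proportional to the length of the second string, which matters when sequences are long.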

Role of Evolutionary Theory Central to computational biology Evolution is descent with modification, driven by: –Diversity: different individuals carry different variants of the basic blueprint –Mutations: DNA sequence can be changed –Selection bias

Role of Evolutionary Theory Related organisms have: –similar DNA –similar protein sequences –similar organization of genes Similar structures tend to have similar functions The bottom line: –evolution is the reason that we can assume similarity is meaningful in computational biology
