BIOINFORMATICS Summary


1 BIOINFORMATICS Summary
Mark Gerstein, Yale University gersteinlab.org/courses/452 (last edit in fall '06, handout version, including in-class changes)

2 Used in class M11 [2006,12.06]

3 You'll Forget… [From S Harris's Science Cartoons,

4 … So I'll distill

5 What is Bioinformatics?
(Molecular) Bio-informatics. One idea for a definition: Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules on a large scale. Bioinformatics is “MIS” for molecular biology information. It is a practical discipline with many applications.

6 Data Types Sequences Structures Functional Genomics

7 "Core" Bioinformatics
Core Stuff
  Computing with sequences and structures
  Protein structure prediction
  Biological databases and mining them
New Stuff: Networks and Expression Analysis
  Will teach these in CS 545 (Data Mining) next semester
Fairly Speculative: simulating cells

8 Hierarchical Structure of Course Information
Memorize the previous summary
Good familiarity with main points in lectures (quizzes)
Rest of overheads and readings for reference on projects and …

9 Cross-cutting Themes
Algorithms for Comparison
  Dynamic programming
  Different measures of similarity (RMS vs. structural similarity; PAM & BLOSUM vs. %ID)
  Generalized similarity matrix in threading
  Statistical scoring schemes (with P-values)
  For sequences, structures, sequence to structure, and even expression data
  Time complexity of the comparisons
Predictions
  LOD scores (# with features / expectation)
  Progressively more complex features
  Amount of feature information IN vs. prediction OUT
  Testing against benchmarks with cross-validation (sec. struc. prediction, seq. comparison scoring, datamining)
  Other methods, need for heuristics
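As an illustration of the dynamic programming and time-complexity points on this slide, here is a minimal sketch of global sequence alignment scoring in the Needleman-Wunsch style. The function name and the scoring values (match +1, mismatch -1, linear gap -2) are assumptions chosen for the example, not taken from the lectures; a real comparison would use a substitution matrix such as PAM or BLOSUM.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of strings a and b.

    Fills an (len(a)+1) x (len(b)+1) table, so time and memory are O(m*n).
    Scoring values are illustrative assumptions, not a standard matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap, or b[j-1] with a gap.
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Each table cell depends only on its three neighbors, which is why the optimal score over exponentially many alignments can be found in time proportional to the product of the two sequence lengths.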

10 Cross-cutting Themes
Increasing the chemical reality and complexity of genes
  Character strings, fold (just CAs), volumes and surfaces from all-atom representation, energy and minimization, dynamics (time and velocity)
Simulation
  Vector configuration boiled down to a scalar E through a potential
  Compute-intensive exploration of configurations (MC, MD)
  Averages over correctly weighted configurations
  Importance of simplification
The Survey Mode
  Collecting information in DB tables
  Importance of integration and interoperation
  Organizing it around "part" classifications
  Surveying it for useful statistics (taking into account biases)
  Doing datamining to find more tenuous relationships
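The simulation points on this slide (a configuration reduced to a scalar E, MC exploration, averages over correctly weighted configurations) can be sketched with a toy Metropolis Monte Carlo run. Everything here is an assumption for illustration: the one-dimensional harmonic potential E(x) = x²/2, the function names, and the parameter values. A molecular simulation would use a many-dimensional configuration and a force-field potential.

```python
import math
import random


def energy(x):
    """Toy potential: maps a configuration (here one number) to a scalar E."""
    return 0.5 * x * x


def metropolis_average(observable, n_steps=100_000, kT=1.0, step=1.0, seed=0):
    """Estimate the Boltzmann-weighted average of observable(x).

    Trial moves are accepted with probability min(1, exp(-dE/kT)), so
    configurations are visited in proportion to exp(-E/kT) and a plain
    mean over the visited states is the correctly weighted average.
    """
    rng = random.Random(seed)
    x = 0.0
    e = energy(x)
    total = 0.0
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
        # A rejected move counts the current configuration again.
        total += observable(x)
    return total / n_steps


if __name__ == "__main__":
    # For this potential the exact average of x^2 is kT.
    print(metropolis_average(lambda x: x * x))
```

For the harmonic potential the exact answer is known (the average of x² equals kT), which makes it a convenient check that the sampling weights are right before moving to a potential with no closed-form answer.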

11 Anti-Themes

12 Depth v Breadth

13 Historical Perspective
[Timeline figure; axis ticks: 1980, 1985, 1990, 1995, 2000, 2005]
Single Structures: Modeling & Geometry; Forces & Simulation; Docking
Sequences, Sequence-Structure Relationships: Alignment; Structure Prediction; Fold recognition
Genomics: Dealing with many sequences; Gene finding & Genome Annotation; Databases
Integrative Analysis: Expression & Proteomics Data; Datamining; Simulation again….

14 (from CooperToons, http://members.aol.com/ChipCooper/cartoon26.html)

