Published by Arlene Newton. Modified over 6 years ago.
1
Impact of Formal Methods in Biology and Medicine
Debashis Sahoo Department of Computer Science CSE291 – H00 – Lecture 1
2
Outline
Introduction
History of Bioinformatics
Introduction to computing
Data collection
Experiment design
Data analysis
3
Bioinformatics Definition
Biological data: representation, storage, access, processing.
bi·o·in·for·mat·ics [bahy-oh-in-fer-mat-iks] noun (used with a singular verb): the retrieval and analysis of biochemical and biological data using mathematics and computer science, as in the study of genomes.
6
Professor Donald E. Knuth
The "father" of the analysis of algorithms and the author of the seminal multi-volume work The Art of Computer Programming.
“It is hard for me to say confidently that, after fifty more years of explosive growth of computer science, there will still be a lot of fascinating unsolved problems at peoples' fingertips, that it won't be pretty much working on refinements of well-explored things. I can't be as confident about computer science as I can about biology. Biology easily has 500 years of exciting problems to work on, it's at that level.”
7
Historical perspective
8
History of Bioinformatics
1866 – Gregor Mendel (Verhandlungen des naturforschenden Vereins Brünn)
1951 – structure for the alpha-helix and beta-sheet: Pauling and Corey (PNAS, 1951)
1953 – double helix model for DNA: Watson and Crick (Nature 171, 1953)
1955 – protein sequence of bovine insulin: F. Sanger
9
History of Bioinformatics
1958 – 1990 – Revolution in computer science and engineering: computer, network, internet
1990 – BLAST: Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J. (1990) "Basic local alignment search tool." J. Mol. Biol. 215.
1995 – The Haemophilus influenzae genome (1.8 Mb) is sequenced.
1993 – 2013 – Microarrays
2005 – 2013 – High-throughput sequencing
10
Introduction to computing
11
What is a computer?
Turing Machine (1936)
[Figure: a tape of cells holding 1s, a read/write head over one cell, and a controller.]
Alan Turing, "On computable numbers, with an application to the Entscheidungsproblem", Proceedings of the London Mathematical Society, Series 2, 42 (1937), pp 230–265.
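As a minimal sketch of the machine in the figure (tape, read/write head, controller), the Python below simulates a one-tape Turing machine. The function name `run_turing_machine`, the blank symbol `_`, and the unary-increment rule table are illustrative assumptions and not part of the lecture.

```python
from collections import defaultdict

BLANK = "_"  # assumed blank symbol for cells that were never written


def run_turing_machine(transitions, tape, state="start",
                       halt_state="halt", max_steps=10_000):
    """Simulate a one-tape Turing machine and return the non-blank tape.

    transitions maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is "L" or "R".
    """
    # The tape is unbounded in both directions; unseen cells read as BLANK.
    cells = defaultdict(lambda: BLANK, enumerate(tape))
    head = 0
    steps = 0
    while state != halt_state:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before halting")
        symbol = cells[head]                               # read
        write, move, state = transitions[(state, symbol)]  # controller
        cells[head] = write                                # write
        head += 1 if move == "R" else -1                   # move the head
        steps += 1
    used = [i for i, s in cells.items() if s != BLANK]
    if not used:
        return ""
    return "".join(cells[i] for i in range(min(used), max(used) + 1))


# Hypothetical rule table: add one to a number written in unary (a run of 1s).
INCREMENT = {
    ("start", "1"): ("1", "R", "start"),   # skip over the existing 1s
    ("start", BLANK): ("1", "R", "halt"),  # append one more 1, then halt
}

print(run_turing_machine(INCREMENT, "111"))
```

The controller here is just a lookup table; the tape and head position are the only memory the machine has.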
12
Modern Computer
Components: main memory, processor, disk drives, IO controller, display, keyboard, mouse.
13
What is a Computer Program?
Assembly program, C program, executable file, load to memory, run the program.
14
Data collection
15
Public Databases
Gene Expression Omnibus (GEO)
ArrayExpress
National Center for Biotechnology Information (NCBI)
UCSC Genome Browser
The Human Protein Atlas
Catalogue of Somatic Mutations in Cancer (COSMIC)
The Cancer Genome Atlas (TCGA)
22
https://tcga-data.nci.nih.gov/tcga/
23
Experiment design
24
To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.
R. A. Fisher
26
Independent Samples
Statistical tests are based on the assumption that each subject was sampled independently. Independent sampling provides the maximum amount of information and a better estimate of the mean.
27
The Gaussian Approximation
Everybody believes in the normal approximation, the experimenters because they think it is a mathematical theorem, the mathematicians because they think it is an experimental fact.
G. Lippmann (1845 – 1921)
28
Sample Size Estimation
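One common way to estimate sample size, sketched here under stated assumptions: a two-sided comparison of two group means, a known common standard deviation, and the normal approximation. The function name `samples_per_group` and the example numbers are hypothetical; the slide does not specify a method.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = z.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Detecting a difference of half a standard deviation at 80% power.
print(samples_per_group(effect=0.5, sd=1.0))  # → 63
```

Halving the detectable effect roughly quadruples the required sample size, which is why the target effect should be fixed before the experiment is run.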
29
Data analysis
30
Correlation
31
Hypothesis Testing
Randomly select samples from the population
State the null hypothesis: the distributions of values in the two populations are the same
Perform the statistical test: t test, F test, chi-squared test
Get the P-value
Set a threshold for significance (usually P < 0.05)
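The steps above can be sketched in code. To stay within the Python standard library, this example swaps in a permutation test on the difference in means rather than the t, F, or chi-squared tests named on the slide; its null hypothesis is the same (both groups come from one distribution). The function name `permutation_test` and the sample values are made up for illustration.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=2_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an estimated P-value under the null hypothesis that
    both samples were drawn from the same distribution.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the subjects at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements from two groups.
control = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
treated = [6.9, 7.1, 7.0, 6.8, 7.2, 7.3]

p_value = permutation_test(control, treated)
print("reject null" if p_value < 0.05 else "fail to reject null")
```

The threshold of 0.05 is applied only at the end; the test itself returns a P-value and leaves the decision to the caller.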
32
Multiple Comparisons
The Bonferroni correction: P < 0.05/N (N = number of comparisons)
False Discovery Rate (FDR) – Q value: what fraction of all the discoveries are false?
Example: with Q = 10% and N = 100, the smallest p-value must be < Q/N = 0.001
Permutation based approaches
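A minimal sketch of both corrections follows. The slide does not name an FDR procedure; the Benjamini-Hochberg step-up rule is assumed here because its first threshold is Q/N, matching the slide's example. The function names and the list of p-values are illustrative.

```python
def bonferroni(p_values, alpha=0.05):
    """Flag each test as significant if its p-value is below alpha / N."""
    n = len(p_values)
    return [p < alpha / n for p in p_values]


def benjamini_hochberg(p_values, q=0.10):
    """Flag discoveries while controlling the false discovery rate at q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / N,
    then reject the k smallest p-values.
    """
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])  # indices by p-value
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / n:
            cutoff_rank = rank
    rejected = [False] * n
    for i in order[:cutoff_rank]:
        rejected[i] = True
    return rejected


# Hypothetical p-values from N = 10 comparisons.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.061, 0.074, 0.205, 0.212, 0.216]

print(sum(bonferroni(p_values)))          # → 1
print(sum(benjamini_hochberg(p_values)))  # → 5
```

Bonferroni controls the chance of any false positive and keeps only the smallest p-value here, while the FDR rule tolerates a fraction of false discoveries and keeps five.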