Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.

1 Bioinformatics Methods and Applications Dr. Hongyu Zhang Ceres Inc.

2 Goals of the talk
– The major battlefields in bioinformatics research
– The most popular weapons used in the battle

3 History
– Human Genome Project
– Overlap with other fields
  – Computational biology
  – Biocomputing
  – Biostatistics
  – Cheminformatics

4 The Central Dogma of Molecular Biology
DNA → (transcription) → RNA → (translation) → Protein

5 Major battlefields in bioinformatics
DNA
– Genome sequencing
– Gene discovery
mRNA
– Microarray analysis
– Sequencing
Protein
– Structure modeling and prediction
– Proteomics
…

6 Major weapons
Computational algorithms
– Hashing methods
– Dynamic programming
– Strings and trees (binary trees, suffix trees)
– Clustering
Probability and statistical theory and methods
– Bayes' theorem, Markov chains and hidden Markov models (HMMs), principal component analysis
– Monte Carlo simulation
– Neural networks
Physical chemistry
– Functions describing the physicochemical interactions in biomolecules
– Molecular mechanics and molecular dynamics algorithms
Data storage and access
– Databases: Oracle, MySQL, etc.
– Web interfaces
Large-scale computing platforms
– Hardware
– Software

7 Genome sequencing: Celera shotgun assembly (Venter et al., 2001)
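
Celera's whole-genome shotgun assembler is far more elaborate than anything that fits on a slide. It has to handle repeats, mate pairs, and sequencing errors. The basic idea of stitching overlapping reads into longer contigs can still be sketched with a toy greedy merge. The sketch assumes error-free reads and no repeats. The names `overlap`, `greedy_assemble`, and `min_overlap` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len letters in a; the leftmost match gives the longest overlap
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_overlap=3):
    """Toy greedy assembler: repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads contained inside another read; they add no new sequence
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads
```

For example, `greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"])` returns the single contig `["ATTAGACCTGCCGGAATAC"]`. Greedy merging is not guaranteed to find the true genome when reads come from repeated regions.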

8 Gene discovery based on sequence comparison
Finding new genes based on their sequence similarity and evolutionary relationship to known genes
Methods
– Hash-based database search methods, such as BLAST (PSI-BLAST), FASTA, BLAT, etc.
– Sequence alignment using a dynamic programming algorithm

9 BLAST database search (http://www.ncbi.nih.gov/BLAST/)
[Figure: a query sequence is searched against the database sequences]
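
The "hash-based" part of BLAST-style search works in two steps. First, index short words of the database sequences so that exact seed matches with the query can be found quickly. Then extend each seed. The following is a simplified sketch of that seed-and-extend idea, not BLAST itself. Real BLAST uses neighborhood words scored with substitution matrices, gapped extension, and E-value statistics. This sketch only extends ungapped to the right, using match = 1, mismatch = -1 and an X-drop cutoff of 2, all chosen for illustration. The function names are hypothetical.

```python
from collections import defaultdict


def build_index(db, k):
    """Hash every k-letter word of every database sequence to its (seq_id, offset) positions."""
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def extend_hit(query, subject, q_pos, s_pos, k, match=1, mismatch=-1, x_drop=2):
    """Ungapped rightward extension of a seed; stop once the score falls x_drop below its best."""
    score = best = k * match
    best_len = length = k
    while q_pos + length < len(query) and s_pos + length < len(subject):
        score += match if query[q_pos + length] == subject[s_pos + length] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= x_drop:
            break
    return best, best_len


def seed_and_extend(query, db, k=4):
    """Return (score, seq_id, query_start, subject_start, length) hits, best first."""
    index = build_index(db, k)
    hits = []
    for q_pos in range(len(query) - k + 1):
        for seq_id, s_pos in index.get(query[q_pos:q_pos + k], []):
            score, length = extend_hit(query, db[seq_id], q_pos, s_pos, k)
            hits.append((score, seq_id, q_pos, s_pos, length))
    return sorted(hits, reverse=True)
```

Indexing the database once lets each query word be looked up in constant expected time. This avoids running a full dynamic-programming alignment against every database sequence.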

10 Sequence alignment
BLAST
||| |
BLA-T
Example programs: CLUSTALW, DIALIGN

11 Dynamic programming algorithm
Sequence $A = (A_1, A_2, \ldots, A_i, \ldots, A_m)$
Sequence $B = (B_1, B_2, \ldots, B_j, \ldots, B_n)$
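
A standard global dynamic-programming alignment (Needleman-Wunsch) fills a matrix where $S(i, j)$ is the best score for aligning $A_1 \ldots A_i$ with $B_1 \ldots B_j$:

$S(i, j) = \max\{S(i-1, j-1) + s(A_i, B_j),\ S(i-1, j) + g,\ S(i, j-1) + g\}$

A traceback from $S(m, n)$ then recovers the alignment. The following sketch assumes a linear gap penalty $g = -1$ and a match/mismatch score $s$ of +1/-1. These values are illustrative choices, not the ones from the talk.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    m, n = len(a), len(b)

    def s(x, y):
        return match if x == y else mismatch

    # S[i][j] = best score for aligning a[:i] with b[:j]
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Trace back from the bottom-right corner to rebuild one optimal alignment
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))
```

`needleman_wunsch("BLAST", "BLAT")` returns `(3, "BLAST", "BLA-T")`. That is the alignment shown on slide 10: four matches and one gap.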

12 Ab initio gene prediction methods
Statistics-based gene prediction
– Nucleotide distribution frequencies in coding regions
– Exon/intron boundary signals
Examples
– GENSCAN, Burge and Karlin 1997
– Fgenesh, Solovyev and Salamov 1994
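
GENSCAN itself is a generalized hidden Markov model with submodels for exons, introns, and splice signals. The simpler idea behind "nucleotide distribution frequencies" is a log-likelihood ratio between two Markov chains: one trained on coding sequence and one on non-coding sequence. The following is a minimal first-order sketch of that idea. Real gene finders typically use higher-order, codon-position-dependent models. The training strings in the usage note are hypothetical toy data.

```python
import math
from itertools import product

BASES = "ACGT"


def train_markov(seqs, pseudocount=1.0):
    """Estimate first-order transition probabilities P(next base | previous base)."""
    counts = {(x, y): pseudocount for x, y in product(BASES, repeat=2)}
    for s in seqs:
        for x, y in zip(s, s[1:]):
            counts[(x, y)] += 1
    probs = {}
    for x in BASES:
        total = sum(counts[(x, y)] for y in BASES)
        for y in BASES:
            probs[(x, y)] = counts[(x, y)] / total
    return probs


def log_odds(seq, coding, noncoding):
    """Sum of log2 likelihood ratios over transitions; positive values favour the coding model."""
    return sum(math.log2(coding[(x, y)] / noncoding[(x, y)]) for x, y in zip(seq, seq[1:]))
```

For example, train `coding = train_markov(["GCGCGCGC"])` and `noncoding = train_markov(["ATATATAT"])`. A sliding window scored with `log_odds` is then positive over GC-rich stretches and negative over AT-rich ones.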

13 Hybrid gene prediction method
Example: Celera's Otto program
– BLAST against the RefSeq database
– BLAST against EST databases, other genomic sequences, etc.
– GENSCAN, Fgenesh

14 Problems in gene discovery
Example: given a cDNA sequence, find its true location in the genome map among many alternatives
[Figure: query transcript/protein segments 1, 2, 3 matched to genomic components 1′, 2′, 3′]

15 Two-step solution
1. BLAST search of the cDNA sequence against the whole genome map
2. Use an LIS (longest increasing subsequence) algorithm to find the correct genomic component hit
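
One plausible reading of step 2 treats each BLAST hit as a pair (genomic start, query start). The correct genomic component is then the largest set of hits that appear in the same order on both sequences. Sorting by genomic position and taking the longest increasing subsequence of query positions finds that set. The following is a sketch under that assumption. It uses the standard O(n log n) LIS and ignores hit scores, which the slide does not specify. The function names are hypothetical.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Indices of one longest strictly increasing subsequence, in O(n log n)."""
    tails = []      # tails[k] = index of the smallest tail of any increasing run of length k + 1
    tail_vals = []  # values at those indices, kept sorted for binary search
    prev = [-1] * len(values)
    for i, v in enumerate(values):
        k = bisect_left(tail_vals, v)
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_vals.append(v)
        else:
            tails[k] = i
            tail_vals[k] = v
    # Walk predecessor links back from the end of the longest run
    out, i = [], (tails[-1] if tails else -1)
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]


def collinear_chain(hits):
    """hits: (genomic_start, query_start) pairs; keep the largest chain ordered in both coordinates."""
    hits = sorted(hits)
    idx = longest_increasing_subsequence([q for _, q in hits])
    return [hits[i] for i in idx]
```

In `collinear_chain([(100, 1), (300, 120), (600, 250), (5000, 2), (9000, 500)])`, the out-of-order hit at genomic position 5000 is dropped. The function returns the four collinear hits.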

16 Phylogenetic analysis
Goal: study the functional and evolutionary relationships among a group of genes
– Divide homologous genes into functional families
– Find the evolutionary relationships between orthologous genes in different species (e.g., the Out of Africa theory)
Methods
– Hierarchical clustering
– Neighbor-joining, etc.
PHYLIP program, Univ. of Washington
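
Neighbor-joining builds a tree from a pairwise distance matrix. At each step it joins the pair $(i, j)$ that minimizes $Q(i, j) = (n-2)\,d(i, j) - \sum_k d(i, k) - \sum_k d(j, k)$. It then replaces the pair with a new node whose distances are averaged from the pair. The following is a compact sketch, not PHYLIP's implementation. It labels each new internal node by its Newick text and breaks ties by taking the first pair found.

```python
def neighbor_joining(names, d):
    """Join taxa into an unrooted tree from a distance matrix; returns a Newick string."""
    nodes = list(names)
    # Distances keyed by node label; new internal nodes are labelled by their Newick text
    dist = {(a, b): d[i][j] for i, a in enumerate(names) for j, b in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[(a, b)] for b in nodes) for a in nodes}
        # Q(i, j) = (n - 2) d(i, j) - r(i) - r(j); ties go to the first pair found
        a, b = min(((x, y) for i, x in enumerate(nodes) for y in nodes[i + 1:]),
                   key=lambda p: (n - 2) * dist[p] - r[p[0]] - r[p[1]])
        # Branch lengths from a and b to their new parent node u
        la = dist[(a, b)] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[(a, b)] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        for c in nodes:
            if c not in (a, b):
                dist[(u, c)] = dist[(c, u)] = (dist[(a, c)] + dist[(b, c)] - dist[(a, b)]) / 2
        dist[(u, u)] = 0
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    return f"({a},{b}:{dist[(a, b)]:g});"
```

Take the five-taxon matrix with rows `[0,5,9,9,8]`, `[5,0,10,10,9]`, `[9,10,0,8,7]`, `[9,10,8,0,3]`, `[8,9,7,3,0]` for `a` to `e`. The sketch returns `((c:4,(a:2,b:3):3),(d:2,e:1):2);`, whose branch lengths reproduce every input distance.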

18 Microarray analysis
Expression genomics
Primary goals
– Find genes whose expression levels differ between experiments; these are candidate functional genes
– Find groups of genes with correlated expression levels, which may indicate that they belong to the same biological pathway
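
The two goals above map onto two simple per-gene computations. A difference statistic between conditions, such as log2 fold change or Welch's t, flags differentially expressed genes. A correlation between expression profiles groups co-expressed genes. The following is a minimal standard-library sketch. It assumes the data are already normalized and does not apply the multiple-testing correction a real genome-wide screen would need. The function names are hypothetical.

```python
import math
import statistics


def log2_fold_change(treated, control):
    """log2 ratio of mean expression, treated over control."""
    return math.log2(statistics.mean(treated) / statistics.mean(control))


def welch_t(x, y):
    """Welch's t statistic for two groups with possibly unequal variances."""
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx / len(x) + vy / len(y))


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Genes with a large |fold change| and |t| are candidates for the first goal. Pairs with a Pearson correlation near +1 across a time series are candidates for the same pathway.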

19 Methods
– General probability and statistics methods
– Dimension reduction: principal components
– Normalization: Lowess
– Clustering
Tools
– S-PLUS, R
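
Principal component analysis finds the direction of greatest variance in the data, which is the leading eigenvector of the covariance matrix. A pure-Python sketch using power iteration is shown below. It assumes rows are samples (arrays) and columns are genes. For real expression matrices, an SVD routine from a numerical library in R or S-PLUS would be the practical choice. `first_principal_component` is a hypothetical name.

```python
import math
import random


def first_principal_component(rows, iterations=200, seed=0):
    """Leading principal component of rows (samples x genes) by power iteration on the covariance."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(p)]  # positive start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient: the variance captured along v (the leading eigenvalue)
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue
```

Projecting each centered sample onto `v` gives its score on the first component. Repeating the procedure after removing that component yields further components.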

20 Example: herbicide
– Plants were treated with herbicide, and their gene expression profiles were measured over a series of time points.
– Genes expressed just before the plants died (12 hours) are possible “death” genes.
– If these “death” genes are knocked down in normal plants, the plants might survive herbicide treatment longer than unmodified plants.

21 Protein structure prediction
Why is protein structure important?
– The function of a gene depends on the structure of the protein it encodes
  – Protein binding with its ligands
  – Protein-protein interactions
– A protein molecule usually keeps one stable structure under normal physiological conditions (Anfinsen, 1960s)
– Drug design
  – Docking and high-throughput drug screening

22 Sequence → protein structure → function (bioinformatics)

23 Protein structure prediction methods

24 Homology modeling procedure
Protein sequence → database search → sequence alignment → select template structure → build conserved regions first → loop modeling → build side chains → optimization

25 Homology modeling programs
Academic software
– MODELLER, Sali A.
– COMPOSER, Blundell T.
– SWISS-MODEL
– RasMol (graphics)
Commercial software
– QUANTA, MSI Inc.
– SYBYL, Tripos Inc.

26 Threading
– Find the best fold among a limited number of candidate folds
– Add 3D structural information to the scoring function of dynamic programming
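
A simple way to "add 3D information to the score function" is to label each template position with its structural environment, for example buried or exposed. The dynamic-programming alignment then scores a query residue against that environment instead of against another residue. Each candidate fold is scored this way, and the best-scoring fold wins. The following is a toy sketch in the spirit of 3D-1D profile methods. The two-class hydrophobic/polar split, the score table `ENV_SCORE`, and the gap penalty are hypothetical. Real threading programs use richer environments and often pairwise contact potentials.

```python
HYDROPHOBIC = set("AVILMFWC")

# Hypothetical environment score table: (residue class, structural environment) -> score
ENV_SCORE = {("H", "buried"): 2, ("P", "buried"): -2,
             ("H", "exposed"): -1, ("P", "exposed"): 1}


def thread_score(query, environments, gap=-2):
    """Local DP alignment of a sequence onto a template's per-position 3D environments."""
    rows, cols = len(query) + 1, len(environments) + 1
    S = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        cls = "H" if query[i - 1] in HYDROPHOBIC else "P"
        for j in range(1, cols):
            S[i][j] = max(0,
                          S[i - 1][j - 1] + ENV_SCORE[(cls, environments[j - 1])],
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
            best = max(best, S[i][j])
    return best


def best_fold(query, library):
    """Pick the candidate fold (name -> environment list) with the highest threading score."""
    return max(library, key=lambda name: thread_score(query, library[name]))
```

The recurrence is the same as in ordinary sequence alignment. Only the substitution term changes, from residue-versus-residue to residue-versus-environment.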

27 Ab initio protein structure principle

28 Threading programs
– Topits, Eisenberg D.
– Threader, Jones D.
– ProSup, Sippl M.
– 123D, Alexandrov N.
Ab initio programs
– Rosetta, David Baker

29 Current status of the protein structure prediction field
– Moult J., CASP (Critical Assessment of Techniques for Protein Structure Prediction)
– Homology modeling is already very mature
– Threading and ab initio methods have been used in industry
– Structural genomics

30 Large-scale computing platforms
Hardware
– Supercomputers: Cray/SGI, DEC/Compaq, Intel
– Linux clusters
– Blade servers
Software
– Parallel computing (MPP, PVM, etc.)
– Linux
– Grid computing: the Globus Project

31 Linux clusters

33 Data storage and access
Bioinformatics produces a huge amount of data every day
– How to organize and store the data
– How to access the data
Database software
– Commercial: Oracle, DB2, Sybase
– Freeware: MySQL, PostgreSQL
Current popular databases
– DNA and protein sequences: GenBank, Swiss-Prot, PIR, etc.
– Protein structures: PDB, SCOP
– DNA, mRNA, and protein function: GO, Pfam
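
The "how to organize, store, and access" questions can be illustrated with a small relational table of sequences loaded from FASTA text. The following sketch uses Python's built-in `sqlite3` module so that it stays self-contained. It stands in for Oracle, DB2, MySQL, or PostgreSQL, and the SQL would be similar on those systems. The table name `sequence`, its columns, and the helper functions are hypothetical.

```python
import sqlite3


def make_db(path=":memory:"):
    """Open a database with one table holding sequence records."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS sequence (
                        accession TEXT PRIMARY KEY,
                        description TEXT,
                        residues TEXT NOT NULL)""")
    return conn


def parse_fasta(text):
    """Yield (accession, description, sequence) tuples from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield (*header, "".join(chunks))
            accession, _, description = line[1:].partition(" ")
            header, chunks = (accession, description), []
        elif line:
            chunks.append(line)
    if header is not None:
        yield (*header, "".join(chunks))


def load_fasta(conn, text):
    """Insert all records; the 'with' block commits them as one transaction."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO sequence VALUES (?, ?, ?)", parse_fasta(text))


def fetch(conn, accession):
    """Return the residues for an accession, or None if it is not stored."""
    row = conn.execute("SELECT residues FROM sequence WHERE accession = ?",
                       (accession,)).fetchone()
    return row[0] if row else None
```

Parameterized queries (`?` placeholders) keep user-supplied accessions out of the SQL text. The same applies when the lookup sits behind a web interface.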

34 Database example: Gene Ontology (GO)
– Molecular function
– Biological process
– Cellular component
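
Each of the three GO ontologies is a directed acyclic graph, in which a term can have more than one parent. A gene annotated with a specific term is implicitly annotated with all of that term's ancestors. The following sketch shows that propagation as a graph traversal. The term names in the usage note are hypothetical placeholders, not real GO identifiers.

```python
def ancestors(term, parents):
    """All terms reachable by following parent edges upward from term in the DAG."""
    seen, stack = set(), [term]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def propagate(annotations, parents):
    """Expand each gene's direct annotations to include every ancestor term."""
    return {gene: set(terms).union(*(ancestors(t, parents) for t in terms))
            for gene, terms in annotations.items()}
```

Take a hypothetical DAG `{"term_C": ["term_A", "term_B"], "term_A": ["root"], "term_B": ["root"]}`. A gene annotated only with `term_C` propagates to `{"term_C", "term_A", "term_B", "root"}`. The shared ancestor `root` is visited once.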

35 Data access
Web interface
– Technologies: CGI, JSP, ASP
– Programming languages: Perl, Java, C/C++, Visual Basic, Visual C++

36 Looking forward
Where are the markets?
– Develop new programs
– Assemble current programs into more efficient data mining pipelines
– Data storage and access
– Integrate current databases so they can be used more effectively
– Computing platforms, including hardware, software support, consulting, etc.
What we can offer
– Diverse talents
– Teamwork
– Networking

37 http://www.hongyu.org/paper/bioinformatics.ppt