Published by Warren Taylor. Modified over 8 years ago.
1
Practical Use of Bioinformatics Analysis Tools, with Hands-on Exercises. Hyungyong Kim (김형용), Development Team, Insilicogen, Inc.
2
Contents: Introduction to biological sequence; Pairwise alignment (BLAST); Multiple alignment (ClustalW); Phylogenetic analysis (Phylip); Genome analysis (Apollo)
3
Rosetta stone Hieroglyphic, Demotic Egyptian, Greek How can I translate it?
4
Biological sequence: a kind of language. DNA: “AGTCAGTCAGTCAGTCAGTTTCCCAAA”. Protein: “PEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWT”. Formats: FASTA format, GenBank (EMBL, DDBJ) format, XML.
5
FASTA format
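As a rough illustration of the FASTA format, the sketch below reads records in which a header line starts with ">" and the sequence follows on one or more lines. The function name `parse_fasta` and the two record headers are hypothetical; the sequences are the two examples from the previous slide.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]        # text after '>' is the description line
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may wrap over several lines
    return records


# Hypothetical headers; sequences taken from the "Biological sequence" slide.
example = """>seq1 example DNA fragment
AGTCAGTCAGTCAGTC
AGTTTCCCAAA
>seq2 example protein fragment
PEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWT
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```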
6
Transformational grammar. Regular grammar: [A|G](C.+)*. Context-free grammar: DNA palindromes, or the Korean palindrome “다시합창합시다” (“Let's sing in chorus again”). Context-sensitive grammar. Unrestricted grammar: natural language.
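The difference between the first two grammar classes can be sketched in a few lines. The regular expression is the slide's pattern, under the assumption that [A|G] was meant as "A or G" (written [AG] in Python). A DNA palindrome, a sequence equal to its own reverse complement, needs nested pairing of bases, which a regular expression cannot express for arbitrary lengths, so it is checked procedurally. The function names are hypothetical.

```python
import re

# Slide pattern as a Python regex (assumption: [A|G] means the class [AG]).
PATTERN = re.compile(r"[AG](C.+)*")


def matches_pattern(seq: str) -> bool:
    """True if the whole sequence is generated by the regular pattern."""
    return PATTERN.fullmatch(seq) is not None


COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_dna_palindrome(seq: str) -> bool:
    """True if seq equals its own reverse complement (e.g. GAATTC)."""
    return seq == seq.translate(COMPLEMENT)[::-1]


print(matches_pattern("ACGT"), matches_pattern("TCGT"))
print(is_dna_palindrome("GAATTC"), is_dna_palindrome("GATTACA"))
```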
7
Sequence analysis methods. Sequence-to-sequence comparison: alignment. Pattern search: using a regular grammar. RNA secondary structure modeling: using a context-free grammar.
Aligned:
ADCNY-RQCLCR-PM
AYC-YNR-CKCRDP-
Unaligned:
ADCNYRQCLCRPM
AYCYNRCKCRDP
8
Substitution matrix: DNA, Protein. BLOSUM (BLOcks SUbstitution Matrix). PAM (Percent Accepted Mutation).
10
Sequence alignment
11
Aligned:
ADCNY-RQCLCR-PM
AYC-YNR-CKCRDP-
Unaligned:
ADCNYRQCLCRPM
AYCYNRCKCRDP
12
Pairwise alignment Global alignment Needleman & Wunsch algorithm Local alignment Smith & Waterman algorithm Repeated matches Overlap matches
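The two main algorithms differ only in a few details of the dynamic-programming recurrence, as the sketch below tries to show. It computes scores only (no traceback) and assumes a simple scheme of match = +1, mismatch = -1 and a linear gap penalty of -1; these values and the function names are illustrative assumptions, not taken from the slides.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap]                          # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman: best score over all pairs of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)                    # borders are 0, not gap penalties
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # the extra 0 lets an alignment restart anywhere
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


print(global_score("ADCNYRQCLCRPM", "AYCYNRCKCRDP"))
print(local_score("ADCNYRQCLCRPM", "AYCYNRCKCRDP"))
```

The global version charges for every unaligned end, while the local version clamps each cell at zero and reports the best cell anywhere in the table.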
13
BLAST: unknown sequence vs. known sequence database
14
NCBI toolkit: BLAST analysis on your own computer. ftp://ftp.ncbi.nih.gov/blast/executables/LATEST/ncbiz.exe Programs: formatdb, blastall, bl2seq
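A typical session with these three programs might look like the sketch below, which only assembles the command lines and runs them if the tools are installed. All file names are hypothetical placeholders, and the choice of blastp with an E-value cutoff of 1e-5 is an assumption made for illustration.

```python
import shutil
import subprocess

# File names below are hypothetical placeholders.
commands = [
    # Index a protein FASTA file as a BLAST database (-p T means protein).
    ["formatdb", "-i", "known_proteins.fasta", "-p", "T"],
    # Search the database with a protein query using blastp.
    ["blastall", "-p", "blastp", "-d", "known_proteins.fasta",
     "-i", "unknown.fasta", "-e", "1e-5", "-o", "unknown_vs_known.txt"],
    # Align two sequences directly, without building a database.
    ["bl2seq", "-p", "blastp", "-i", "seq1.fasta", "-j", "seq2.fasta",
     "-o", "seq1_vs_seq2.txt"],
]


def run_all(cmds):
    """Run each command in order, skipping tools that are not on PATH."""
    for cmd in cmds:
        if shutil.which(cmd[0]) is None:
            print("not installed, skipping:", " ".join(cmd))
            continue
        subprocess.run(cmd, check=True)


# Print the command lines; call run_all(commands) to actually execute them.
for cmd in commands:
    print(" ".join(cmd))
```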
15
Multiple alignment Purpose Predicting protein structure and function Phylogenetic analysis Confirm SNPs or other polymorphism Criteria Structural similarity Evolutionary similarity Functional similarity Sequence similarity
16
Multiple alignment Main application Extrapolation Phylogenetic analysis Pattern identification Domain identification DNA regulatory elements Structure prediction PCR analysis
17
Example of multiple alignment: cellulose-binding domain of cellobiohydrolase I (30-35 residues)
19
Multiple alignment formats. MSF: Multiple Sequence alignment Format. Selex: extended version of MSF. ALN: default output of ClustalW. Phylip: variant of ALN. Converting formats with Fmtseq: http://bioweb.pasteur.fr/seqanal/interfaces/fmtseq.html
20
ClustalW. For every pair of sequences, build a pairwise evolutionary distance matrix using Kimura's model. Build a guide tree with the neighbor-joining clustering algorithm. Align the sequences progressively, in order of decreasing similarity. Windows download: ftp://ftp.ebi.ac.uk/pub/software/dos/clustalw/
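The first step, turning pairwise comparisons into evolutionary distances, can be sketched as follows. One widely used form of Kimura's correction for proteins is d = -ln(1 - p - 0.2 p^2), where p is the observed fraction of differing residues; whether the slides intend exactly this form is an assumption, as are the function names. The input is the pair of aligned sequences shown earlier in the deck.

```python
import math


def kimura_protein_distance(a: str, b: str) -> float:
    """Kimura-corrected distance between two aligned protein sequences."""
    # Only columns without a gap in either sequence are compared.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)  # observed difference
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        return float("inf")  # too divergent for the correction to apply
    return -math.log(inner)


def distance_matrix(aligned):
    """Return {(name1, name2): distance} for every ordered pair of names."""
    return {(m, n): kimura_protein_distance(aligned[m], aligned[n])
            for m in aligned for n in aligned}


aligned = {"seq1": "ADCNY-RQCLCR-PM", "seq2": "AYC-YNR-CKCRDP-"}
matrix = distance_matrix(aligned)
print(round(matrix["seq1", "seq2"], 4))
```

The corrected distance is always at least as large as the raw fraction p, because it accounts for multiple substitutions at the same site.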
21
Phylogenetic analysis Phylogeny inference or “tree building” Character and rate analysis Practical approach Multiple fasta format (*.fasta) Multiple sequence alignment format (*.msf, *.aln, *.phy, *.nex) Tree format (*.tre) Result image (*.ps, *.png, *.jpg)
22
Common phylogenetic tree terminology
25
Types of tree
26
Phylogenetic tree building method
27
Types of data: character-based methods, distance-based methods
28
Similarity vs. Evolutionary Relationship Similar : having likeness or resemblance (an observation) Related : genetically connected (an historical fact)
29
Parsimony method. The ‘most parsimonious’ tree is the one that requires the fewest evolutionary events. Advantages: simple, intuitive, logical; can be used to infer the sequences of extinct ancestors. Disadvantages: derived from medieval logic, not statistics.
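Counting "evolutionary events" on a given tree can be illustrated with Fitch's small-parsimony algorithm, one standard way to score a fixed tree topology; the slides do not name a specific algorithm, so this choice is an assumption. The tree encoding (nested tuples), the function names and the four toy sequences are all hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for one column.

    tree   : a leaf name (str) or a (left, right) tuple of subtrees
    states : dict mapping leaf name -> observed character
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        # children agree on at least one state: no change needed here
        return common, left_cost + right_cost
    # children disagree: one substitution is required at this node
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "CAG", "D": "CAT"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree1, alignment), parsimony_score(tree2, alignment))
```

Here the first topology needs fewer changes than the second, so it would be preferred under the parsimony criterion.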
30
Maximum likelihood method. The tree with the highest likelihood value is chosen. Advantages: statistical and based on an explicit evolutionary model; the most ‘consistent’; can be used to infer ancestral sequences. Disadvantages: computationally very intensive (limits the number of taxa and the length of the sequences).
31
Minimum evolution method. The tree with the smallest sum of branch lengths is chosen as the best tree. Advantages: can use indirectly measured distances (immunological, hybridization); usually faster than character-based methods; has an objective function. Disadvantages: information is lost when characters are transformed to distances; slower than clustering methods.
32
Clustering methods (UPGMA & Neighbor-Joining). The algorithm itself builds ‘the’ tree. Advantages: can use indirectly measured distances (immunological, hybridization); fastest (handles very large databases quickly). Disadvantages: similarity and relationship are not necessarily the same thing; no explicit optimization criterion.
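How a clustering algorithm "builds the tree" directly can be seen in a minimal UPGMA sketch: repeatedly merge the two closest clusters and average their distances to everything else, weighted by cluster size. Branch lengths are omitted, and the function name, the nested-tuple tree encoding and the toy distances are hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names : list of taxon names
    dist  : dict mapping (a, b) -> distance, one entry per unordered pair
    """
    size = {name: 1 for name in names}  # current clusters -> number of leaves
    d = {}
    for (a, b), value in dist.items():
        d[a, b] = d[b, a] = value       # store both orders for easy lookup
    while len(size) > 1:
        # pick the closest pair of current clusters
        a, b = min(combinations(size, 2), key=lambda pair: d[pair])
        merged = (a, b)
        for c in size:
            if c != a and c != b:
                # size-weighted average distance to the merged cluster
                avg = (size[a] * d[a, c] + size[b] * d[b, c]) / (size[a] + size[b])
                d[merged, c] = d[c, merged] = avg
        size[merged] = size[a] + size[b]
        del size[a], size[b]
    return next(iter(size))


toy = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
       ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
tree = upgma(["A", "B", "C", "D"], toy)
print(tree)
```

No score is ever computed for the tree as a whole, which is the sense in which such methods have no explicit optimization criterion.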
33
Phylip: Phylogeny Inference Package. Main programs. Dnaml, proml: maximum likelihood. Dnapenny, protpars: parsimony method. Fitch, neighbor: distance method. Drawgram, drawtree: drawing.
34
Other programs. PAUP: generates *.tre files. TreeView: views *.tre files. BioEdit: performs most tasks in a GUI environment (fastdnaml is useful).
35
Genome Analysis Genome sequencing Transcriptome sequencing (EST) Microsatellite, SNP, Genotyping
36
EST: Expressed Sequence Tag
37
Eukaryotic gene structure
38
Genome annotation Repeat identification : RepeatMasker Gene prediction : GenScan, FGENESH Other region : tRNAScan-SE, CpG-island Regulatory region : TESS BLAST (dbEST, other genome, known genes)
39
Gene modeling
40
Genome Browser Ensembl UCSC Genome browser AceDB Apollo GAVI
41
Apollo Genome browser & annotation tool Input data XML : GAME, Chado Ensembl : GFF, direct MySQL connection GenBank, EMBL Analysis result : BLAST, sim4, blat, FgenesH, Genscan, tRNAScan-SE http://www.fruitfly.org/annot/apollo/
42
GAVI: Genome Ajax Viewer. Insilicogen’s web service. Manually add your own features. Zoom in/out, move left/right. Import analysis results: Genscan, RepeatMasker.
43
Hands-on practice. Pairwise alignment: bl2seq. BLAST search against your own data: blastall. Multiple alignment of a protein of interest: ClustalW. Phylogenetic tree drawing: Phylip. Genome annotation: Apollo, GAVI.