張家銘 | Chang Jia Ming. Presentation outline: 1. Self-introduction 2. Past academic research.


1 張家銘 | Chang Jia Ming

2 Presentation outline: 1. Self-introduction 2. Past academic research

3 Presentation outline: 1. Self-introduction 2. Past academic research

4 2002~2008 substitute military service @ Institute of Information Science, Academia Sinica; 1996~2000 Bachelor (admitted through recommendation-based screening); 2002~2002 Master @ Computer Science, National Tsing Hua Univ.; Dr. Chuan Yi Tang, Dr. Ting-Yi Sung, Dr. Wen-Lian Hsu; 2008~2013 PhD, la Caixa fellowship @ The Centre for Genomic Regulation, Barcelona, Spain, Dr. Cedric Notredame; 2014~2016 Postdoc @ Institute of Human Genetics, Montpellier, France, Dr. Giacomo Cavalli

5

6 The Institute of Human Genetics, The French National Centre for Scientific Research: 302 publications between 2008 and 2012; 20% of these research papers have an IF > 10; 2 Nature, 9 in other Nature series, 6 Cell, 2 Science, 7 Genes & Dev, 6 Mol Cell, 7 EMBO J, 5 PNAS; the mean IF is 6.4. Genome Dynamics, Giacomo Cavalli’s Lab: 25 papers in high-impact journals (Cell, Science, Nature Genetics, PLoS Biology). Awards: the silver medal of the CNRS; EMBO fellow. He is directly involved in the FP7 EpiGeneSys NoE as a board member. 2008 ERC Advanced Investigator Grant, 2.2 million euro.

7 Activity: Badminton. Group photo at the Academia Sinica President's Cup; French badminton organization.

8 Presentation outline: 1. Self-introduction 2. Past academic research

9 Computational approaches to biological data: document classification in protein subcellular localization prediction; database search to enrich sequence homology information; sampling approach to detect alignment uncertainty; graph algorithm in NMR backbone assignment; network approach in 3D organization of chromosomes (future work).

10 Chang J-M, Su EC-Y, Lo A, Chiu H-S, Sung T-Y, & Hsu W-L (2008) PSLDoc: Protein subcellular localization prediction based on gapped-dipeptides and probabilistic latent semantic analysis. Proteins: Structure, Function, and Bioinformatics 72(2):693-710. IF: 2.63, 34 citations. [Figure: a prokaryotic cell, protein structure, and the protein sequence MPLDLYNTLTRRKERF…; source: http://2012books.lardbucket.org/books/introduction-to-chemistry-general-organic-and-biological/s21-04-proteins.html]

11 Document classification: documents -> classifier -> categories, using Salton’s Vector Space Model.

12 What could be the terms of proteins? Gapped-dipeptides: let XdZ denote the amino acid coupling pattern of amino acid types X and Z that are separated by d amino acids (Liang HK, Huang CM, Ko MT, Hwang JK. The Amino Acid-Coupling Patterns in Thermophilic Proteins. Proteins: Structure, Function and Bioinformatics (2005), 59, 58-63). If d ranges from 0 to 20, there are 8400 (= 20*20*21) features per vector.
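
The feature count above can be checked with a short sketch. This is a minimal illustration, not the PSLDoc implementation: the function name `gapped_dipeptide_counts` and the use of raw counts (with no weighting) are assumptions, and the example sequence is the fragment shown on slide 10.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid types


def gapped_dipeptide_counts(sequence, max_gap):
    """Count every pattern XdZ (X and Z separated by d residues, 0 <= d <= max_gap)."""
    counts = {
        (x, d, z): 0
        for x, z in product(AMINO_ACIDS, repeat=2)
        for d in range(max_gap + 1)
    }
    for d in range(max_gap + 1):
        for i in range(len(sequence) - d - 1):
            key = (sequence[i], d, sequence[i + d + 1])
            if key in counts:  # skip non-standard residue letters
                counts[key] += 1
    return counts


features = gapped_dipeptide_counts("MPLDLYNTLTRRKERF", max_gap=20)
nonzero = sum(1 for value in features.values() if value)
print(len(features), "features,", nonzero, "of them non-zero")
```

With `max_gap=20` the vector has 20*20*21 = 8400 entries, and a short sequence fills only a small fraction of them, which is the "too big, too sparse" issue raised on the next slide.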

13 Two problems. Too big: if d ranges from 0 to 13, there are 5600 (= 20*20*14) features per vector => how to reduce the feature size? Too sparse: most of the features are zero => how to enrich the sequence information?

14 Protein subcellular localization prediction by document classification: too big, too sparse.

15 Feature reduction – topic model

16 Probabilistic Latent Semantic Analysis. A joint probability between a term w and a document d can be modeled through a latent variable z (with a “small” number of states), concept expression probabilities, and document-specific mixing proportions. The parameters can be estimated by maximizing the likelihood function with the EM algorithm. Hofmann T: Unsupervised Learning by Probabilistic Latent Semantic Analysis. Mach Learn 2001, 42(1-2):177-196.
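
The slide's equation is not in the transcript. The standard form from the cited Hofmann paper, which matches the three labels on the slide, is given here as a reconstruction; the notation is an assumption. P(w | z) are the concept expression probabilities and P(z | d) the document-specific mixing proportions.

```latex
P(d, w) = P(d) \sum_{z} P(w \mid z)\, P(z \mid d)
```

In the protein setting of this talk, a document d would be a protein and a term w a gapped-dipeptide, so each protein is summarized by its mixing proportions over a small number of latent topics z.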

17 PLSA model fitting. Likelihood function. E-step: compute the probability that a term w in a particular document d is explained by the class corresponding to z. M-step: update the model parameters using those probabilities.
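
The fitting equations are also missing from the transcript. The standard EM updates for PLSA, following Hofmann's formulation, are sketched below; here n(d, w) is the count of term w in document d, a symbol that is an assumption and does not appear on the slide.

```latex
% Log-likelihood to be maximized
\mathcal{L} = \sum_{d} \sum_{w} n(d, w) \, \log \Big[ P(d) \sum_{z} P(w \mid z)\, P(z \mid d) \Big]

% E-step: posterior probability that topic z explains term w in document d
P(z \mid d, w) = \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)}

% M-step: re-estimate the parameters from the expected counts
P(w \mid z) = \frac{\sum_{d} n(d, w)\, P(z \mid d, w)}{\sum_{d} \sum_{w'} n(d, w')\, P(z \mid d, w')}
\qquad
P(z \mid d) = \frac{\sum_{w} n(d, w)\, P(z \mid d, w)}{\sum_{w} n(d, w)}
```

The two steps alternate until the log-likelihood stops improving.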

18 Probabilistic Latent Semantic Analysis

19 Evaluation and Results. *HYBRID combines the results of CELLO II and ALIGN.

20 PSLDoc, PSLDoc 2. Too sparse. Chang J-M, Taly J-F, Erb I, Sung T-Y, Hsu W-L, Tang CY, Notredame C, & Su ECY (2013) Efficient and interpretable prediction of protein functional classes by correspondence analysis and compact set relations. PLoS ONE 8, e75542. (IF: 3.23)

21 PSI-BLAST: Position Specific Score Matrix. Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–402 (1997). 55650 citations by Google Scholar.

22 Database size (data set: number of sequences). UniRef50: 3,077,464; UniRef90: 6,544,144; UniRef100: 9,865,668; UniProt: 11,009,767; NCBI NR: 10,565,004. Sources: UniProt (release 15.15 – 2010) with UniRef50, UniRef90, and UniRef100; NCBI non-redundant (NR).

23 Performance comparison of fast search and normal search on databases of different sizes for the GramNeg_1444 data set. IPP (information per position) provides a quantitative measure of sequence conservation, at each sequence position, among the homologous sequences used to construct the PSSM.

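24 Compact set. Distance matrix over S1..S6 (upper triangle, each row starting at the diagonal): S1: 0 10 16 18 13 8; S2: 0 14 17 15 9; S3: 0 9 10 12; S4: 0 8 19; S5: 0 11; S6: 0. Edge weights shown in the figure: 11, 10, 9, 8. C is a compact set if min{ D(v_i, v_k) | v_i ∈ C, v_k ∈ V \ C } > max{ D(v_i, v_j) | v_i, v_j ∈ C }, that is, every distance from inside C to outside C exceeds every distance within C.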
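
The definition can be checked directly on the slide's distance matrix. The sketch below is a brute-force illustration, not the algorithm used in the cited work; the function names are assumptions, and indices 0..5 stand for S1..S6.

```python
from itertools import combinations

# Upper triangle of the slide's distance matrix; each row starts at the diagonal.
UPPER = [
    [0, 10, 16, 18, 13, 8],  # S1
    [0, 14, 17, 15, 9],      # S2
    [0, 9, 10, 12],          # S3
    [0, 8, 19],              # S4
    [0, 11],                 # S5
    [0],                     # S6
]


def full_matrix(upper):
    """Expand an upper-triangular list of rows into a symmetric matrix."""
    n = len(upper)
    dist = [[0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for offset, value in enumerate(row):
            dist[i][i + offset] = dist[i + offset][i] = value
    return dist


DIST = full_matrix(UPPER)


def is_compact(members, dist):
    """True if every inside-to-outside distance exceeds every inside distance."""
    members = set(members)
    outside = set(range(len(dist))) - members
    if len(members) < 2 or not outside:
        return True  # singletons and the whole set are trivially compact
    max_inside = max(dist[i][j] for i, j in combinations(sorted(members), 2))
    min_outside = min(dist[i][k] for i in members for k in outside)
    return min_outside > max_inside


def nontrivial_compact_sets(dist):
    """Enumerate all compact sets of size 2..n-1 by exhaustive search."""
    n = len(dist)
    return [
        subset
        for size in range(2, n)
        for subset in combinations(range(n), size)
        if is_compact(subset, dist)
    ]


for subset in nontrivial_compact_sets(DIST):
    print([f"S{i + 1}" for i in subset])
```

Exhaustive enumeration is exponential in the number of items, so it is only suitable for a toy example of this size.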

25 Hierarchical clustering. [Figure: compact set tree with internal nodes C1, C2, C3 over the leaves s1, s6, s2, s3, s4, s5, compared with the single-linkage clustering tree with internal nodes S1, S2, S3, S4 over the same leaves; both use the S1..S6 distance matrix from slide 24]

26 CS+1NN on Gram-Negative

27 Chang J-M, Di Tommaso P, Taly J-F, & Notredame C (2012) Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. BMC Bioinformatics 13(Suppl 4). IF: 2.58, 27 citations

28 Sequence alignment http://phylo.cs.mcgill.ca/

29 Homology extension: Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res 2005, 33(3):816-824. Pair-hidden Markov Model: Do CB, Mahabhashyam MS, Brudno M, Batzoglou S: ProbCons: Probabilistic consistency-based multiple sequence alignment. Genome Res 2005, 15(2):330-340.

30 10% more columns are correctly aligned when compared with PRALINE TM. The rows Pairs and Cols denote the sums of correctly aligned pairs and columns, respectively. The numbers of pairs and columns in the reference alignments are 3,294,102 and 1,781, respectively.

31 Chang J-M, Di Tommaso P, & Notredame C (2014) TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction. Molecular Biology and Evolution 31(6):1625-37. IF: 9.105, 15 citations

32 Alignment uncertainty - data. Aligning OPOSSUM with BLOSUM62, and aligning the reversed sequences MUSSOPO with 26MUSOLB, can give two different alignments. Aln1: OPOSSUM-- / BLOS-UM62. Aln2: OPOSSUM-- / BLO-SUM62. Landan G, Graur D (2007) Heads or Tails: A Simple Reliability Check for Multiple Sequence Alignments. Molecular Biology and Evolution 24: 1380–1383.

33 Alignment uncertainty - data. Aln1: OPOSSUM-- / BLOS-UM62. Aln2: OPOSSUM-- / BLO-SUM62. [Figure: dynamic-programming paths for OPOSSUM against BLOSUM62, with two equally good paths around the S] If there are two paths { choose low-road; } Landan G, Graur D (2007) Heads or Tails: A Simple Reliability Check for Multiple Sequence Alignments. Molecular Biology and Evolution 24: 1380–1383.
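
The tie-breaking rule can be made concrete with a small global aligner. This is a minimal sketch under stated assumptions: the scoring scheme (match +1, mismatch -1, gap -1), the function names, and the mapping of "high" and "low" to a preference for one gap direction are illustrative choices, not taken from the cited paper.

```python
def align_with_tiebreak(a, b, prefer, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch alignment; `prefer` decides which co-optimal path is traced."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
    # Order in which equally good moves are tried during traceback.
    order = ("up", "diag", "left") if prefer == "high" else ("left", "diag", "up")
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        for move in order:
            if move == "diag" and i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                if score[i][j] == score[i - 1][j - 1] + sub:
                    top.append(a[i - 1])
                    bottom.append(b[j - 1])
                    i, j = i - 1, j - 1
                    break
            elif move == "up" and i > 0 and score[i][j] == score[i - 1][j] + gap:
                top.append(a[i - 1])
                bottom.append("-")
                i -= 1
                break
            elif move == "left" and j > 0 and score[i][j] == score[i][j - 1] + gap:
                top.append("-")
                bottom.append(b[j - 1])
                j -= 1
                break
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def alignment_score(top, bottom, match=1, mismatch=-1, gap=-1):
    """Re-score an alignment column by column."""
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


score_hi, top_hi, bot_hi = align_with_tiebreak("OPOSSUM", "BLOSUM62", "high")
score_lo, top_lo, bot_lo = align_with_tiebreak("OPOSSUM", "BLOSUM62", "low")
print(score_hi, top_hi, bot_hi)
print(score_lo, top_lo, bot_lo)
```

Both calls return the same optimal score, but the alignments themselves may differ. That difference is the uncertainty a reliability measure should expose.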

34 Alignment uncertainty - data. It gets worse with a multiple sequence alignment. Aln1: BLOS-UM45 / OPOSSUM-- / BLOS-UM62. Aln2: BLO-SUM45 / OPOSSUM-- / BLOS-UM62. Aln3: BLO-SUM45 / OPOSSUM-- / BLO-SUM62. Aln4: BLOS-UM45 / OPOSSUM-- / BLO-SUM62. Telling apart the uncertain parts of the alignment is more important than the overall accuracy.

35 Which alignment task is difficult? For three sequences of length l, the three pairwise alignments cost 3*l^2, while the multiple sequence alignment of all three costs l^3. If l = 200, the second is about 66 times slower than the first.
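
The factor of 66 follows from counting dynamic-programming cells. The sketch below assumes three sequences of equal length and exact dynamic programming in both cases; the function names are illustrative.

```python
def pairwise_cells(length, n_sequences=3):
    """Cells filled when every pair of sequences is aligned separately."""
    n_pairs = n_sequences * (n_sequences - 1) // 2
    return n_pairs * length ** 2


def simultaneous_cells(length, n_sequences=3):
    """Cells filled when all sequences are aligned in one DP hypercube."""
    return length ** n_sequences


l = 200
ratio = simultaneous_cells(l) / pairwise_cells(l)
print(pairwise_cells(l), simultaneous_cells(l), round(ratio, 1))
```

For three sequences the ratio simplifies to l/3, so it grows linearly with sequence length.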

36 Consistency between the MSA and the pairwise alignment for an aligned pair x, y is 0/1: the pairwise alignment either also aligns x with y or it does not. Where are the samples? How can we increase the resolution of confidence?

37 Transitive relation. In mathematics, a binary relation R over a set X is transitive if whenever an element a is related to an element b, and b is in turn related to an element c, then a is also related to c. (Wikipedia)

38 Transitive relation in the alignment setting. Consistency: the multiple sequence alignment aligns x with y, and the pairwise alignments align x with a and a with y.

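39 [Figure: the MSA pair x-y compared with the pairwise alignments x-a, x-d, a-y, x-b, e-y, c-y. A chain through a shared residue, such as x-a followed by a-y, is consistent with the MSA; chains that do not connect x to y are inconsistent.]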
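
One way to turn the 0/1 check into a graded score is to count how many chains through third sequences agree with the MSA pair. The sketch below is a toy illustration only: it is not the TCS scoring used in T-Coffee, it ignores pair weights, and the sequence names, residue indices, and function names are all hypothetical.

```python
def build_library(pairwise):
    """pairwise maps (seq_s, seq_t) to a list of aligned residue index pairs (i, j)."""
    library = {}
    for (s, t), pairs in pairwise.items():
        library[(s, t)] = {i: j for i, j in pairs}
        library[(t, s)] = {j: i for i, j in pairs}
    return library


def consistency(library, x, y, sequences):
    """Fraction of direct and transitive evidence that supports aligning x with y."""
    (sx, ix), (sy, iy) = x, y
    support = checked = 0
    # Direct evidence from the pairwise alignment of the two sequences.
    direct = library.get((sx, sy), {})
    if ix in direct or iy in library.get((sy, sx), {}):
        checked += 1
        support += int(direct.get(ix) == iy)
    # Transitive evidence: x aligns to a in a third sequence, and a aligns into sy.
    for s in sequences:
        if s in (sx, sy):
            continue
        a = library.get((sx, s), {}).get(ix)
        if a is None:
            continue
        b = library.get((s, sy), {}).get(a)
        if b is None:
            continue
        checked += 1
        support += int(b == iy)
    return support / checked if checked else 0.0


# Hypothetical pairwise alignments among four sequences X, A, B, Y.
toy = {
    ("X", "Y"): [(5, 7)],  # direct support for x5 ~ y7
    ("X", "A"): [(5, 3)],
    ("A", "Y"): [(3, 7)],  # x5 ~ a3 ~ y7: consistent chain
    ("X", "B"): [(5, 4)],
    ("B", "Y"): [(4, 8)],  # x5 ~ b4 ~ y8: inconsistent chain
}
library = build_library(toy)
score = consistency(library, ("X", 5), ("Y", 7), ["X", "A", "B", "Y"])
print(round(score, 3))
```

Here two of the three pieces of evidence agree with the MSA pair, so the pair gets a fractional confidence rather than a plain 0 or 1.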

40 Test 2 - structural modeling @ alignment level. Two alternative alignments are compared with a reference alignment. Alignment 1: Seq1 …SALMLWLSARESIKREN…YPD…, Seq2 …SAYNIYVSFQ----RESA…KD…, …, Seqn …SAYNIYVSAQ----RENA…KD…; it has score SP1 against the reference and confidence1 from GUIDANCE/TCS. Alignment 2: Seq1 …SALMLWLSARESIKREN…YPD…, Seq2 …SAYNIYVSF----QRESA…KD…, …, Seqn …SAYNIYVSA----QRENA…KD…; it has score SP2 and confidence2. Question: does SP1 - SP2 agree with confidence1 - confidence2?

41 Guidance = 71.10%, TCS = 83.5%. Penn O, Privman E, Ashkenazy H, Landan G, Graur D, Pupko T: GUIDANCE: a web server for assessing alignment confidence scores. Nucleic Acids Res 2010, 38(Web Server issue):W23-28. 209 citations by Google. Penn O, Privman E, Landan G, Graur D, Pupko T: An alignment confidence score capturing robustness to guide tree uncertainty. Mol Biol Evol 2010, 27(8):1759-1767. 152 citations by Google.

42 Chang J-M, Di Tommaso P, Lefort V, Gascuel O, & Notredame C (2015) TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction. Nucleic Acids Research, 43(W1):W3-6. IF: 9.112, 1 citation since 1 July 2015

43 Wu K-P, Chang J-M, Chen J-B, Chang C-F, Wu W-J, Huang T-H, Sung T-Y, & Hsu W-L (2006) RIBRA--an error-tolerant algorithm for the NMR backbone assignment problem. Journal of Computational Biology 13(2):229-244. IF: 1.74, 32 citations, 1st paper from Taiwan in RECOMB

44 NMR assignment problem. Sequence: AKFERQHMDSSTSRNLTKDR… Spin system (SS): chemical shifts (Cα(i-1), Cβ(i-1), Cα(i), Cβ(i)) = (58.4, 32.7, 56.3, 40.8). [Figure: the atoms of one amino acid] http://www.bioc.aecom.yu.edu/labs/girvlab/MolBiophys/structure.html

45 Coding. Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376. One spin system could be located at many possible positions in the protein sequence. [Figure: the spin system (Cα(i-1), Cβ(i-1), Cα(i), Cβ(i)) = (58.4, 32.7, 56.3, 40.8) and its possible places along AKFERQHMDSSTSRNLTKDR]

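46 Natural Language Processing - noise or ambiguity? Speech recognition: homophone selection. Example sentence: 台北市一位小孩走失了 ("A child went missing in Taipei City"). Candidate words for the same sounds: 台北市 (Taipei City), 台北 (Taipei), 適宜 (suitable), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (shift position), 小孩 (child), 走失 (go missing).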

47 Spin System Positioning. Residue codes: D = 50, G = 10, R = 40, I = 50|51. Spin systems and their codes: (55.266, 38.675, 44.555, 0) => 50 10; (44.417, 0, 55.043, 30.04) => 10 40; (44.417, 0, 30.665, 28.72) => 10 40; (55.356, 29.782, 60.044, 37.541) => 40 50. We assign spin system groups to a protein sequence according to their codes.
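
Matching a spin system's code pair against the sequence can be sketched as a simple scan. Only the four residue codes shown on the slide are used; the function name is an assumption, and the second test sequence is hypothetical, added only to show an ambiguous case. How chemical shifts are turned into codes is not covered here.

```python
# Residue codes as listed on the slide; I is ambiguous (50 or 51).
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, spin_code):
    """Indices i where residue i-1 matches the first code and residue i the second."""
    prev_code, own_code = spin_code
    hits = []
    for i in range(1, len(sequence)):
        prev_ok = prev_code in RESIDUE_CODES.get(sequence[i - 1], set())
        own_ok = own_code in RESIDUE_CODES.get(sequence[i], set())
        if prev_ok and own_ok:
            hits.append(i)
    return hits


spin_codes = [(50, 10), (10, 40), (10, 40), (40, 50)]  # from the slide
for code in spin_codes:
    print(code, candidate_positions("DGRI", code))

# Hypothetical longer sequence: the (10, 40) spin systems now fit two places.
print(candidate_positions("DGRIGR", (10, 40)))
```

Two spin systems share the code (10, 40), so codes alone cannot decide which one belongs at a given G-R position. This is the ambiguity the later linking steps have to resolve.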

48 Link Spin System groups. [Figure: the spin systems (55.266, 38.675, 44.555, 0), (44.417, 0, 55.043, 30.04), (44.417, 0, 30.665, 28.72), and (55.356, 29.782, 60.044, 37.541) are linked along DGRI into Segment 1, Segment 2, and Segment 3]

49 Iterative Concatenation. [Figure: spin systems 1, 2, …, 56 are concatenated step by step (Step 1, Step 2, …, Step n-1, Step n) into segments (Segment 1, Segment 2, Segment 31, …, Segment 78, Segment 79, …, Segment 99) placed along the sequence DGRI….FKJJREKL….]

50 Conflict Segments. Sequence: DGRIGEIKGRKTLATPAVRRLAMENNIKLS, with Segments 71, 78, 79, 97, 98, and 99 placed along it. Two kinds of conflict segments: (1) overlap (e.g. segment 71 and segment 99); (2) use of the same spin system (e.g. both segment 78 and segment 79 contain spin system 1).

51 A Graph Model for Spin System Linking. G(V, E): V is a set of nodes (segments); E is the set of edges (u, v) with u, v ∈ V such that u and v are in conflict. Independent set: a subset S of vertices such that no two vertices in S are connected. Goal: assign as many non-conflicting segments as possible => find the maximum independent set of G.

52 An Example of G. Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE. Segment1: SP12->SP13->SP14; Segment2: SP9->SP13->SP20->SP4; Segment3: SP8->SP15->SP21; Segment4: SP7->SP1->SP15->SP3. [Figure: the segments placed along the sequence in the order Seg1, Seg3, Seg4, Seg2, and the conflict graph on Seg1..Seg4 with edges labeled SP13, SP15, and Overlap]
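
The conflict graph and its maximum independent set can be sketched for the four segments above. The spin-system lists come from the slide. The start positions are hypothetical, because the transcript does not say where each segment sits or which pair overlaps. The exhaustive search is an illustrative stand-in, not the method of the RIBRA paper.

```python
from itertools import combinations

# Spin systems per segment are from the slide; "start" values are hypothetical.
SEGMENTS = {
    "Seg1": {"start": 0, "spins": ["SP12", "SP13", "SP14"]},
    "Seg2": {"start": 14, "spins": ["SP9", "SP13", "SP20", "SP4"]},
    "Seg3": {"start": 2, "spins": ["SP8", "SP15", "SP21"]},
    "Seg4": {"start": 8, "spins": ["SP7", "SP1", "SP15", "SP3"]},
}


def in_conflict(a, b):
    """Two segments conflict if they overlap on the sequence or share a spin system."""
    a_end = a["start"] + len(a["spins"])
    b_end = b["start"] + len(b["spins"])
    overlap = a["start"] < b_end and b["start"] < a_end
    shared = bool(set(a["spins"]) & set(b["spins"]))
    return overlap or shared


def conflict_edges(segments):
    """Edge set of the conflict graph G(V, E)."""
    return {
        frozenset((u, v))
        for u, v in combinations(segments, 2)
        if in_conflict(segments[u], segments[v])
    }


def maximum_independent_set(names, edges):
    """Largest set of pairwise non-conflicting segments, by exhaustive search."""
    for size in range(len(names), 0, -1):
        for subset in combinations(names, size):
            if all(frozenset(pair) not in edges for pair in combinations(subset, 2)):
                return list(subset)
    return []


edges = conflict_edges(SEGMENTS)
chosen = maximum_independent_set(list(SEGMENTS), edges)
print(sorted(tuple(sorted(e)) for e in edges))
print(chosen)
```

Maximum independent set is NP-hard in general, so exhaustive search like this is only practical for a handful of segments.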

53 Segment Extension. Sequence: DGRGEKGRKTLATPAVRRLAMENNIKLS. [Figure: numbered segments (23, 24, 26, 27, 28, 29, 31, 32, 33, 45, 71, 77, 78, 97, 99) and the extended segments 97' and 99' placed along the sequence around the MaxIndSet step]

54 www.researchgate.net/profile/Jia-Ming_Chang4 scholar.google.com/citations?user=5TgmGNEAAAAJ github.com/warnname Thank you! Lunch!

