Jeffrey Zheng School of Software, Yunnan University August 4, 2014 2 nd International Summit on Integrative Biology August 4-5, 2014 Chicago, USA.

Slides:



Advertisements
Similar presentations
Applications of one-class classification
Advertisements

Molecular Biomedical Informatics Machine Learning and Bioinformatics Machine Learning & Bioinformatics 1.
Document Summarization using Conditional Random Fields Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, Zheng Chen IJCAI 2007 Hao-Chin Chang Department of Computer.
Integrating Genomes D. R. Zerbino, B. Paten, D. Haussler Science 336, 179 (2012) Teacher: Professor Chao, Kun-Mao Speaker: Ho, Bin-Shenq June 4, 2012.
Image Analysis Phases Image pre-processing –Noise suppression, linear and non-linear filters, deconvolution, etc. Image segmentation –Detection of objects.
Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.
1 DNA Analysis Amir Golnabi ENGS 112 Spring 2008.
August 19, 2002Slide 1 Bioinformatics at Virginia Tech David Bevan (BCHM) Lenwood S. Heath (CS) Ruth Grene (PPWS) Layne Watson (CS) Chris North (CS) Naren.
Introduction to BioInformatics GCB/CIS535
Molecular Evolution with an emphasis on substitution rates Gavin JD Smith State Key Laboratory of Emerging Infectious Diseases & Department of Microbiology.
Gene Regulatory Networks - the Boolean Approach Andrey Zhdanov Based on the papers by Tatsuya Akutsu et al and others.
1 Markov Chains Algorithms in Computational Biology Spring 2006 Slides were edited by Itai Sharon from Dan Geiger and Ydo Wexler.
1 Integration of Background Modeling and Object Tracking Yu-Ting Chen, Chu-Song Chen, Yi-Ping Hung IEEE ICME, 2006.
1 Bayesian inference of genome structure and application to base composition variation Nick Smith and Paul Fearnhead, University of Lancaster.
Learning Programs Danielle and Joseph Bennett (and Lorelei) 4 December 2007.
Aula 4 Radial Basis Function Networks
341: Introduction to Bioinformatics Dr. Natasa Przulj Deaprtment of Computing Imperial College London
IE 594 : Research Methodology – Discrete Event Simulation David S. Kim Spring 2009.
Artificial Intelligence (AI) Addition to the lecture 11.
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
University of Toronto Department of Computer Science © 2001, Steve Easterbrook CSC444 Lec22 1 Lecture 22: Software Measurement Basics of software measurement.
9/30/2004TCSS588A Isabelle Bichindaritz1 Introduction to Bioinformatics.
JM - 1 Introduction to Bioinformatics: Lecture VIII Classification and Supervised Learning Jarek Meller Jarek Meller Division.
Genetic network inference: from co-expression clustering to reverse engineering Patrik D’haeseleer,Shoudan Liang and Roland Somogyi.
Hierarchical Distributed Genetic Algorithm for Image Segmentation Hanchuan Peng, Fuhui Long*, Zheru Chi, and Wanshi Siu {fhlong, phc,
Kristen Horstmann, Tessa Morris, and Lucia Ramirez Loyola Marymount University March 24, 2015 BIOL398-04: Biomathematical Modeling Lee, T. I., Rinaldi,
Bayesian networks Classification, segmentation, time series prediction and more. Website: Twitter:
NEURAL NETWORKS FOR DATA MINING
Motif finding with Gibbs sampling CS 466 Saurabh Sinha.
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
Fuzzy Reinforcement Learning Agents By Ritesh Kanetkar Systems and Industrial Engineering Lab Presentation May 23, 2003.
10 December, 2008 CIMCA2008 (Vienna) 1 Statistical Inferences by Gaussian Markov Random Fields on Complex Networks Kazuyuki Tanaka, Takafumi Usui, Muneki.
Initial sequencing and analysis of the human genome Averya Johnson Nick Patrick Aaron Lerner Joel Burrill Computer Science 4G October 18, 2005.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
CISC Machine Learning for Solving Systems Problems Presented by: Ashwani Rao Dept of Computer & Information Sciences University of Delaware Learning.
©2009 Mladen Kezunovic. Improving Relay Performance By Off-line and On-line Evaluation Mladen Kezunovic Jinfeng Ren, Chengzong Pang Texas A&M University,
Parallel & Distributed Systems and Algorithms for Inference of Large Phylogenetic Trees with Maximum Likelihood Alexandros Stamatakis LRR TU München Contact:
Comp. Genomics Recitation 9 11/3/06 Gene finding using HMMs & Conservation.
LOGO iDNA-Prot|dis: Identifying DNA-Binding Proteins by Incorporating Amino Acid Distance- Pairs and Reduced Alphabet Profile into the General Pseudo Amino.
Cellular Automata BIOL/CMSC 361: Emergence 2/12/08.
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features 王荣 14S
Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine 朱林娇 14S
Variant calling: number of individuals vs. depth of read coverage Gabor T. Marth Boston College Biology Department 1000 Genomes Meeting Cold Spring Harbor.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Computational Biology and Genomics at Boston College Biology Gabor T. Marth Department of Biology, Boston College
Modeling K The Common Core State Standards in Mathematics Geometry Measurement and Data The Number System Number and Operations.
A field of study that encompasses computational techniques for performing tasks that require intelligence when performed by humans. Simulation of human.
By Jyh-haw Yeh Department of Computer Science Boise State University.
Discovering Evolutionary Theme Patterns from Text -An exploration of Temporal Text Mining KDD’05, August 21–24, 2005, Chicago, Illinois, USA. Qiaozhu Mei.
Organic Evolution and Problem Solving Je-Gun Joung.
T-tests Chi-square Seminar 7. The previous week… We examined the z-test and one-sample t-test. Psychologists seldom use them, but they are useful to understand.
Bayesian Modeling of Quantum-Dot-Cellular-Automata Circuits Sanjukta Bhanja and Saket Srivastava Electrical Engineering, University of South Florida, Tampa,
Nawanol Theera-Ampornpunt, Seong Gon Kim, Asish Ghoshal, Saurabh Bagchi, Ananth Grama, and Somali Chaterji Fast Training on Large Genomics Data using Distributed.
A Robust and Accurate Binning Algorithm for Metagenomic Sequences with Arbitrary Species Abundance Ratio Zainab Haydari Dr. Zelikovsky Summer 2011.
HW7: Evolutionarily conserved segments ENCODE region 009 (beta-globin locus) Multiple alignment of human, dog, and mouse 2 states: neutral (fast-evolving),
SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.
1 Artificial Regulatory Network Evolution Yolanda Sanchez-Dehesa 1, Loïc Cerf 1, José-Maria Peña 2, Jean-François Boulicaut 1 and Guillaume Beslon 1 1.
DNA Sequences Analysis Hasan Alshahrani CS6800 Statistical Background : HMMs. What is DNA Sequence. How to get DNA Sequence. DNA Sequence formats. Analysis.
1 Survey of Biodata Analysis from a Data Mining Perspective Peter Bajcsy Jiawei Han Lei Liu Jiong Yang.
Spectral Algorithms for Learning HMMs and Tree HMMs for Epigenetics Data Kevin C. Chen Rutgers University joint work with Jimin Song (Rutgers/Palentir),
Sub-fields of computer science. Sub-fields of computer science.
Software Defects Cmpe 550 Fall 2005
Big data classification using neural network
Daniil Chivilikhin and Vladimir Ulyantsev
Article Review Todd Hricik.
High-throughput Biological Data The data deluge
SVM-based Deep Stacking Networks
Network Inference Chris Holmes Oxford Centre for Gene Function, &,
Predicting Gene Expression from Sequence
GWAS-eQTL signal colocalisation methods
Presentation transcript:

Jeffrey Zheng School of Software, Yunnan University August 4, nd International Summit on Integrative Biology August 4-5, 2014 Chicago, USA

 Frontier of Non-Coding DNAs/RNAs  General Comparison Model for Pseudo DNAs & Real DNAs  Sample Cases  Conclusion

Ratios on Non-Coding DNAs Tools for Analysis Current Situation Assumption & Question

 ENCODE: over 80% of DNA in the human genome "serves some purpose, biochemically speaking".  However, this conclusion is strongly criticized... 3% U. Gibba 90% Takifugu 98% Human 30% Arabidopsis

 Frequency Distribution  GC densities  Repeat sub-sequences  …  Machine Learning  Bayesian Inference and Induction  Neural Network  Hidden Markov Model  …

A hairpin Analysis Results in various conditions Refined Distributions on different parameters A DNA Sequence

 Total DNA varies widely between organisms  Ratios of coding DNAs and Non-coding DNAs in genomes are different significantly  98% human genomes are Non-coding DNAs  Non-coding RNAs/DNAs may be drivers of complexity, they are a larger heterogeneous group Due to various criteria, no a general classification can be used to sub- classify this group

Assumption: A general classification of Non-Coding DNA interactions could be relevant to higher levels of pair structures between a distance on a DNA sequence. Both 0-1 outputs & DNA segments are random sequences Question: Can interaction models of Stream Cipher mechanism simulate a general classification for Non-Coding DNAs?

Variant Logic DNAs & Pseudo DNAs General Model Main Procedure

 An unified 0-1 logic framework base on input/output and logic functions using four Meta symbols: {⊥, +, -, ⊤} ◦ 0-0 : ⊥, 0-1 : +, ◦ 1-0 : -, 1-1 : ⊤.  Multiple Maps of Variant Phase Spaces can be visualized

DNA Sequences Variant Logic G0-0 : ⊥ A0-1: + T1-0: - C1-1: ⊤ Results of automated chain- termination DNA sequencing. Four Meta States

 Two input sources: ◦ Pseudo DNAs – Artificial Sequences using Stream Cipher on Interactions – HC256 ◦ Real DNAs – Human DNAs  Variant Construction to measure & quantity input sequences on 4 meta bases {ACGT}  Using Visual Maps to identify higher levels of global symmetries between A&T and C&G maps for both artificial & real DNAs

Stream Cipher Mechanism 0-1 Sequences + Interaction Models Pseudo DNA Sequences DNA Sequences Variant Construction Visual Maps …TAACTTAGCA…HC256 Human … Virus Sample Cases on Pseudo DNA: Probability Statistics on 4 Meta symbols Different Maps Artificial DNAs vs. Real DNAs in Visual Maps Y = mode = 1 X r=1 =TGACCTGATACC X r=2 =TAACTTAGCACT X r=3 = CAATTCGACATT mode = 2 X r=1 =TACGTC X r=2 =TATTCA X r=3 =CAAGAC

Input: Pseudo DNA/Real DNA Vector X t : GGTACTTGCAT… Projected as Four 0-1 vectors M G : … M A : … M T : … M C : … Calculated as four Probability Vectors Determine four pairs of map position Collected all DNA Vectors Four Maps constructed

2700 DNA Sequences Human DNAs vs. HC256 Pseudo DNAs Sets of Maps

 Two Sets of T=2700 sequences ◦ Non-Coding DNAs for Human Genomes  SRR xxxxxxx, N= 500bp  For a sample point, a sequence could be >SRR TAATTCTTGAGTTCATGTCCCGCATCCAGGGCACACTTGTGCAAGGGGTGGGTTCCCAAGACCTTAT GCAGCTCTGCCTCTGTGGCTTTGCAGTGTACAGTCACCATGGCTGCTGTCTTGGATCAGAGTTGAGT GCCTGTGGTATTTCTAGGCTCAGGATGAAAGCTTCCCGTGGCTCTACCATTCAGGGATCTTGACGTG GCGGCCCCATTCCCACAGCTCCTGTAGGTAGTGCCCCAGTGGGGACTCTGTGTGGAGGCTTCAATC CCATATTTCCTGTTGGCACTGCCCTAGTGGACTTTTGATTTCTTTCTGATTCAGTCTTGGAAGGTTGT GTGTTTCCAGGAATTTATCCATTTTCTCTAGGTTTTCTAGTTTATGCACACAAAGATATTCTGAGGATCT TTTTTTGTGTCAGTGGTATCCTTTGCAATGTCTCATTTGTAATTTTTGATTGTGCTTATTGGAATCTTCTT TTTTCTTGTATAATCTAACTAGCA

Human DNA: Pseudo DNA: HC256

 Using Variant Logic, Four DNA Meta States correspond Four Variant Meta States  Pseudo DNAs can be generated under Various conditions to form Visual Maps  Both Real & Artificial DNAs have stronger similarity  Visual Maps may provide a General Classification for Genomic analysis on DNA Interactions  Further Explorations are required…

1. B. Banfai, H. Jia, J. Khatun et al. (2012) Long noncoding RNAs are rarely translated in two human cell lines, Genome Research, Cold Spring Harbor Laboratory Press, 22: Doi: /gr J.M. Engreitz, A. Pandya-Jones, P. McDonel et al. (2013) Large Noncoding RNAs can Localize to Regulatory DNA Targets by Exploriting the 3D Architecture of the Genome, Proceedings of The Biology of Genomes, Cold Spring Harbor Laboratory Press, J. Zheng, C. Zheng and T. Kunii (2011) A Framework of Variant Logic Construction for Cellular Automata, in Cellular Automata – Innovative Modelling for Science and Engineering, Edited by A. Salcido, InTech Press, , J. Zheng, W. Zhang, J. Luo, W. Zhou and R. Shen (2013) Variant Map System to Simulate Complex Properties of DNA Interactions Using Binary Sequences, Advances in Pure Mathematics, 3 (7A) doi: /apm A /apm A J. Zheng, J. Luo and W. Zhou (2014) Pseudo DNA Sequence Generation of Non-Coding Distributions Using Variant Maps on Cellular Automata," Applied Mathematics, 5(1) doi: /am /am J. Zheng, W. Zhang, J. Luo, W. Zhou, V. Liesaputra (2014) Variant Map Construction to Detect Symmetric Properties of Genomes on 2D Distributions. J Data Mining Genomics Proteomics 5:150. doi: /