Seojin Bang

The goal of this review paper is to address problems and computational solutions that arise in the analysis of omics data, and to highlight fundamental algorithmic ideas that serve as a launching point for extracting biological insights from omics data.

This review focuses on three important areas.
PART 1. Processing, storage, and retrieval of high-throughput sequencing data
PART 2. Data mining for transcriptomics
PART 3. Integrative interactomics
R.Q. Wu et al. J DENT RES 2010;90:

PART 1. Processing, storage, and retrieval of high-throughput sequencing data

PART 1
[Figure: sequencing workflow. The original sequence is fragmented and sequenced into short reads (for example TAGCCG, CCGTGT, GTAG, CGTAC, TACCA). Assembly joins the reads into the assembled sequence TAGCCGTGTAGTTCTCGTTAGTACTCGTAGGACGAATGTCGTACCA. Alignment (read mapping) places the reads on the reference genome TAGCCGTGTAGTTCTCGTTAGTACTCGTAGGACGAATGTCGTACCA to give the aligned sequence.]

PART 1
[Figure: sequencing workflow from fragmentation and sequencing through assembly and read mapping, captioned "Genome Assembly".]

PART 1: Genome Assembly
Problem: There are too many possible pairs of short sequences to compare.
Solution: Use graph-based approaches such as the de Bruijn graph.
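To make the de Bruijn graph idea concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the method of any particular assembler: reads are error-free, every k-mer is used once, and the genome has no repeated (k-1)-mers, so the graph has a single Eulerian path. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: each distinct k-mer becomes an edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph):
    """Walk every edge exactly once (Hierholzer's algorithm)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((u for u in graph if len(graph[u]) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path of the de Bruijn graph."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])
```

With the toy reads `["TAGCCG", "GCCGTG", "CGTGA"]` and `k = 4`, `assemble` returns `"TAGCCGTGA"`. No read is ever compared with another read directly; overlaps are implied by shared (k-1)-mers, which is what avoids the pairwise comparison problem.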

PART 1
[Figure: sequencing workflow from fragmentation and sequencing through assembly and read mapping, captioned "Read Mapping".]

PART 1: Read Mapping
Problem: Running times are huge, and there is not enough storage to hold the reference genome.
Solution: Use the FM-index, a hybrid of the BWT (Burrows-Wheeler transform) and the suffix array.
[Figure: BWT and suffix array]
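The following Python sketch shows how a BWT and a suffix array can be combined for exact read lookup by backward search. It is a teaching sketch under stated assumptions: the suffix array is built naively by sorting suffixes, the occurrence table is stored uncompressed, and only exact matches are found. Real read mappers use sampled, compressed tables and allow mismatches. The names `build_fm_index` and `find` are hypothetical.

```python
def build_fm_index(text):
    """Return (suffix array, C table, occurrence table) for text + '$'."""
    text += "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around at 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # C[c]: number of characters in the text that are smaller than c.
    C, total = {}, 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, C, occ


def find(pattern, index):
    """Return the sorted start positions of exact matches of pattern."""
    sa, C, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffix-array rows
    for c in reversed(pattern):  # backward search: last character first
        if c not in C:
            return []
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

Each character of the read narrows the row range with two table lookups, so the lookup cost depends on the read length rather than on the length of the reference.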

PART 1
[Figure: sequencing workflow from fragmentation and sequencing through assembly and read mapping, captioned "Large-scale genome sequence compressed storage and search".]

Problem: Previous compressive techniques require the data to be decompressed before computational analysis. As genomic libraries grow larger, any computational analysis that runs on the full library takes a long time.
Solution: Compress the data in such a way that they can be searched efficiently and accurately without decompressing first.
[Figure: Flow chart of CaBLAST]
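As a toy illustration of searching compressed data directly, the Python sketch below stores each distinct sequence once, keeps links back to every library entry that contains it, and runs the search on the non-redundant store only. This is an assumption-heavy simplification and not CaBLAST itself: it handles exact duplicates only, whereas the approach on the slide is about sequences that are similar rather than identical. The function names are hypothetical.

```python
def compress(library):
    """Store each distinct sequence once, with links to all entries that have it."""
    unique = {}
    for name, seq in library.items():
        unique.setdefault(seq, []).append(name)
    return unique


def search(compressed, query):
    """Search the non-redundant store, then follow links to the original entries."""
    hits = []
    for seq, names in compressed.items():
        if query in seq:  # each distinct sequence is scanned only once
            hits.extend(names)
    return sorted(hits)
```

The search cost scales with the amount of non-redundant sequence rather than with the size of the full library, which is the property the slide describes.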

PART 2. Data mining for transcriptomics

Identifying cell-specific expression signals
Problem: Heterogeneity of cell types may confound gene expression analysis.
Solution: Use a linear mixed model to identify an expression profile for each cell type from the overall expression signals.
[Figure: overall expression signals and cell-type-specific signals]
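The sketch below illustrates the underlying idea of separating cell-type signals from an overall signal. It swaps in a much simpler technique than the linear mixed model named on the slide: ordinary least squares for one gene and two cell types, assuming the cell-type proportions of each sample are already known and that the overall signal is a proportion-weighted sum of the cell-type signals. All names are hypothetical.

```python
def estimate_cell_type_expression(proportions, bulk):
    """Least-squares fit of bulk[s] = p1[s] * x1 + p2[s] * x2 for one gene.

    proportions: list of (p1, p2) cell-type fractions, one pair per sample.
    bulk: overall expression of the gene in each sample.
    Returns (x1, x2), the estimated expression in each cell type.
    """
    # Normal equations (P^T P) x = P^T y for the two unknowns.
    a11 = sum(p1 * p1 for p1, _ in proportions)
    a12 = sum(p1 * p2 for p1, p2 in proportions)
    a22 = sum(p2 * p2 for _, p2 in proportions)
    b1 = sum(p1 * y for (p1, _), y in zip(proportions, bulk))
    b2 = sum(p2 * y for (_, p2), y in zip(proportions, bulk))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("proportions do not vary enough across samples")
    # Solve the 2x2 system with Cramer's rule.
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a12 * b1) / det
    return x1, x2
```

The fit is only possible when the cell-type proportions differ between samples; if every sample had the same mixture, the two cell-type signals could not be told apart.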

Identifying regulatory genes and modules in a disease-based analysis
Problem: How can we construct a gene regulatory network so that the network is sparsely structured?
Solution:
1. Remove non-significant correlations between pairs of genes using a predefined threshold or a penalized method such as the lasso.
2. Find a gene set of minimum size whose expression profiles linearly fit the given genes of interest (SPARCLE).
[Figure: Subnetwork of the breast cancer gene regulatory network for the biological process cell cycle. Emmert-Streib et al. Front. Genet. 2014]
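The first option, thresholding pairwise correlations, can be sketched in a few lines of Python. This is a minimal sketch under assumptions: it uses Pearson correlation with a fixed cutoff chosen by the user, it does not implement the lasso or SPARCLE, and the gene names and function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(vx * vy)


def correlation_network(expression, threshold):
    """Keep an edge only when |correlation| reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges
```

Raising the threshold removes more edges and makes the network sparser; the cutoff trades off missed interactions against spurious ones.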

Identifying gene expression alterations in disease
Problem: How can we distinguish passenger genes from driver genes of a cancer in a copy number variation region?
Solution: CONEXIC integrates copy number variations and gene expression data from tumor samples to identify driving mutations and the processes they influence.
Problem: Genetic alterations can differ between patients with the same disease, but they often involve common pathways.
Solution: PARADIGM and PARADIGM-SHIFT construct pathways from cancer transcriptomic profiling data sets, because genetic alterations between patients often involve common pathways.

PART 3. Integrative interactomics

Analysis of heterogeneous genomic data sets
Networks or interactomes are commonly represented as graphs. We can define subnetworks (modules) as we did for protein-protein and regulatory interaction networks.
Problem: How can we find modules that are specific to conditions of interest?
Solution: [Figure: network graph. Node: gene, RNA, protein, or metabolite. Edge: known interactions among them.]

Interactome analysis of disease data sets
Problem: How can we test the modularity of genes that are putatively associated with a specific disease?
Solution: Assess the significance value of each module by comparing it with those computed on randomized networks.
Although the genes underlying a disease may differ among individuals, pathways are likely to be shared, and thus proteins associated with the same disease have a tendency to interact.
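A comparison against randomized networks can be sketched as follows in Python. The choices here are assumptions, not details given on the slide: the module score is the number of interactions among the disease genes, the network is undirected, and randomization is done by degree-preserving double-edge swaps. The function names are hypothetical.

```python
import random


def edges_within(edges, genes):
    """Module score: number of interactions with both ends in the gene set."""
    genes = set(genes)
    return sum(1 for u, v in edges if u in genes and v in genes)


def rewire(edges, n_swaps, rng):
    """Randomize an undirected network while keeping every node's degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both ways of pairing the four endpoints
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def module_p_value(edges, genes, n_random=1000, seed=0):
    """Empirical p-value: how often a randomized network scores as high."""
    rng = random.Random(seed)
    observed = edges_within(edges, genes)
    hits = sum(
        edges_within(rewire(edges, 10 * len(edges), rng), genes) >= observed
        for _ in range(n_random)
    )
    return (hits + 1) / (n_random + 1)
```

A small p-value means the disease genes interact with each other more than expected for nodes with the same degrees, which is the tendency the slide describes.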

Conclusion and Future Prospects
We addressed problems and computational solutions that arise in the analysis of omics data.
Developing compressive techniques for next-generation sequencing read data sets and their quality scores remains a major challenge.
Transcriptomic data is shifting from microarrays to next-generation sequencing, so we will also need to develop transcriptomic analysis methods that handle the new form of data.
Much future work in integrative interactomics will focus on characterizing the differences that distinguish individuals and cells from each other.