Overview

Objective: identify similarities present in biological sequences and present them to biologists in a comprehensible manner.

[Figure: workflow grouped under "Capturing Similarity" and "Presenting Similarity", with stages in the order P1 Distance Calculation, P2 Dimension Reduction, P3 Clustering, P4 Visualization, and data D1 to D5. Sample data shown: input sequences in FASTA format (">G0H13NN01D34CL GTCGTTTAAGCCATTACGTC …", ">G0H13NN01DK2OZ GTCGTTAAGCCATTACGTC …"), three-dimensional coordinates with columns # X Y Z (rows such as 0.358 0.262 0.295 and, for point 1, 0.252 0.422 0.372), and a cluster mapping with a "# Cluster" column.]

Processes:
P1 – Pairwise distance calculation
P2 – Multi-dimensional scaling
P3 – Pairwise clustering
P4 – Visualization

Data:
D1 – Input sequences
D2 – Distance matrix
D3 – Three-dimensional coordinates
D4 – Cluster mapping
D5 – Plot file

8/23/2013
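Stage P1 (D1 input sequences to D2 distance matrix) can be sketched as below. This is a minimal illustration under stated assumptions, not the presenters' implementation: the slides do not say which alignment or distance measure is used, so a plain Levenshtein edit distance, normalized by the longer sequence length, stands in for pairwise alignment. The helper names (`edit_distance`, `normalized_distance`, `compute_block`, `distance_matrix`) and the toy sequences are hypothetical. Rows are split into independent blocks that are computed separately, and a final combine step merges the partial results into one symmetric N×N matrix, mirroring the "pleasingly parallel with a combine step" structure described for this application.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def edit_distance(a, b):
    """Levenshtein distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # match (cost 0) or substitution (cost 1)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the length of the longer sequence."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def compute_block(seqs, rows):
    """Distances for one block of rows: the independent unit of work."""
    return [(i, j, normalized_distance(seqs[i], seqs[j]))
            for i in rows for j in range(i + 1, len(seqs))]


def distance_matrix(seqs, n_workers=4):
    """Build the symmetric N x N distance matrix from independent row blocks."""
    n = len(seqs)
    # Interleaved row blocks roughly balance the upper-triangular workload.
    blocks = [range(w, n, n_workers) for w in range(n_workers)]
    matrix = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Combine step: merge each block's partial results into the matrix.
        for block in pool.map(partial(compute_block, seqs), blocks):
            for i, j, d in block:
                matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical toy sequences, for illustration only.
seqs = ["GTCGTTTAAG", "GTCGTTAAG", "GTCATTACGT"]
D = distance_matrix(seqs, n_workers=2)
```

Threads keep the sketch simple, but in CPython they do not speed up this pure-Python loop; real parallelism would need separate processes (for example `ProcessPoolExecutor`, which works here because `compute_block` is a top-level function) or distributed workers.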

Applications

Pairwise Distance Calculation: given a set of gene sequences, performs pairwise alignment and distance computation. Pleasingly parallel SPMD implementation with a combine step at the end.

Pairwise Clustering with Deterministic Annealing: given an N×N distance matrix for N sequences, classifies the sequences into clusters. Threading is used in fork-join style parallel "for" loops.

Multi-dimensional Scaling: given an N×N distance matrix for N sequences, maps the sequences into x-dimensional points (usually x = 3) while preserving pairwise distances.

Vector Sponge Clustering with Deterministic Annealing: solves problems where k-means is applicable, i.e. where points have vectors, allowing trimmed clusters of user-determined size and a sponge that picks up points not in any cluster.
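The Multi-dimensional Scaling step, mapping an N×N distance matrix to 3D points, can be sketched with classical (Torgerson) MDS. This is a simple stand-in chosen for brevity: the slides do not state which MDS algorithm is used, and the function name `classical_mds` is hypothetical. The sketch assumes NumPy and a symmetric distance matrix with a zero diagonal.

```python
import numpy as np


def classical_mds(dist, dim=3):
    """Embed an N x N distance matrix into `dim` dimensions (Torgerson MDS)."""
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, J = I - 11^T / n.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # B is symmetric, so eigh returns real eigenvalues (in ascending order).
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dim]
    # Clip negative eigenvalues (non-Euclidean input or round-off) to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    # Each row is one point's coordinates, like the # X Y Z rows of D3.
    return eigvecs[:, order] * scale


rng = np.random.default_rng(0)
pts = rng.random((20, 3))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
coords = classical_mds(d, dim=3)
```

When the input distances come from genuinely three-dimensional points, as in this usage example, the embedding reproduces them up to rotation and reflection. Sequence-alignment distances are generally not Euclidean, so the 3D result only approximates them; stress-minimizing methods such as SMACOF optimize that fit directly, at the cost of an iterative solver.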

[Figure: example plots titled "Metagenomics with DA clusters", "Pathology 54D", "COG Database with a few biology clusters", "LC-MS 2D" and "Lymphocytes 4D".]