Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex sociotechnical systems: An organizational genetics approach

Multi-level predictive analytics and motif discovery across large dynamic spatiotemporal networks and in complex sociotechnical systems: An organizational genetics approach. Sunil Wattal, Fox School of Business, with Rob Kulathinal, College of Science and Technology; Zoran Obradovic, College of Science and Technology; Youngjin Yoo, Case Western University.

Motivation: Massive amounts of digital trace data. Human actors and man-made artifacts comprise complex socio-technical systems. Massively interconnected.

Digital Trace Data: massive, unstructured, granular, heterogeneous, dynamic, performative.

Need for new methods: Limitations of traditional econometric models. Need for newer approaches. Parallel with evolutionary systems biology.

Research Questions: (1) Can we characterize a stream of digital trace data from a complex socio-technical system with a finite set of genetic elements? (2) Can we explain and predict the behavior of complex socio-technical systems (i.e., phenotypes) based on the underlying pattern of “behavioral gene” (i.e., genotype) interactions? (3) How do mutational input, gene flow, and recombination in “behavioral genes” affect the evolution of socio-technical systems?

Construction of Behavioral Genes and Behavioral Genomes. Inputs: structured user-created data, unstructured user-created data, structured sensor data. Steps: Construction of Behavioral Genes and Behavioral Genomes; Sequence Analysis of Behavioral Genes; Construction of Behavioral Gene Co-expression Networks; Multi-level Dynamic Analysis of Gene Co-expression Networks.

Data transformation and alignment: A generic model of digital trace data with encoding based on a behavioral ontology. Colors, letters, and numbers can represent activities, objects, or individuals. An example alignment using our GitHub data: time-stamped digital trace data are transformed into analyzable sequence data, which capture the four elements of each event: actor, action, object, and time. Blocks are colored according to activity type; the numbers 1 and 2 indicate whether an activity is executed by a core or a peripheral developer.
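
As a rough illustration of the encoding described on this slide, the sketch below turns time-stamped events into one token sequence per object. The `Event` type, the letter codes in `ACTION_CODES`, and the `CORE_DEVELOPERS` set are hypothetical names chosen for the example; the slide does not specify the actual ontology or data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    actor: str   # who performed the activity
    action: str  # activity type, e.g. "commit"
    obj: str     # artifact acted upon, e.g. a repository
    time: int    # any sortable timestamp


# Hypothetical behavioral ontology: one letter per activity type.
ACTION_CODES = {"commit": "C", "issue": "I", "pull_request": "P", "comment": "M"}

# Hypothetical set of core developers; everyone else counts as peripheral.
CORE_DEVELOPERS = {"alice"}


def encode_by_object(events):
    """Return {object: [token, ...]} with tokens in time order.

    Each token is an activity letter followed by 1 (core) or 2 (peripheral).
    """
    sequences = {}
    for e in sorted(events, key=lambda ev: ev.time):
        role = "1" if e.actor in CORE_DEVELOPERS else "2"
        sequences.setdefault(e.obj, []).append(ACTION_CODES[e.action] + role)
    return sequences


events = [
    Event("bob", "issue", "repoA", 2),
    Event("alice", "commit", "repoA", 1),
    Event("bob", "commit", "repoB", 3),
]
print(encode_by_object(events))
```

In this sketch the object becomes the sequence key, the action and actor role form the token, and time determines token order, so all four event elements are retained.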

Motif discovery: Example motif discovery using GitHub data. Frequently repeated subsequences can be detected through sequence mining techniques. Together, these subsequences form the patterns of the project sequences, while each carries a related but different social meaning.
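
The slide does not name a specific sequence mining technique. As a minimal stand-in, the sketch below counts contiguous subsequences of a fixed length and keeps those that appear in at least `min_support` project sequences. The function name, parameters, and token values are assumptions made for illustration.

```python
from collections import Counter


def frequent_motifs(sequences, length, min_support):
    """Return {motif: support} for contiguous motifs of the given length.

    Support is the number of sequences that contain the motif at least once.
    """
    support = Counter()
    for seq in sequences:
        # Use a set so repeats inside one sequence count only once.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        support.update(seen)
    return {motif: n for motif, n in support.items() if n >= min_support}


project_sequences = [
    ["C1", "I2", "C1", "P2"],
    ["I2", "C1", "P2"],
    ["C1", "P2", "M1"],
]
print(frequent_motifs(project_sequences, length=2, min_support=2))
```

Counting fixed-length contiguous windows is far simpler than gapped sequential pattern mining, but it is enough to show how recurring activity patterns can surface from encoded sequences.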

The phylogeny of two projects on GitHub. Each node in the phylogenetic tree represents one version of the project’s source code and is labeled “developer’s name + creation time”. jQuery-Box-Slider (A) has three major groups of revisions, and most revisions fall into the group at the bottom. The jQuery-Fast-Click project (B) also has three major groups, but each group has a relatively similar number of revisions. This difference may be attributable to whether a project involves a single developer or several developers.

Network Analysis: An example network using WordPress data from December 2012. The network shows 302 WordPress (internal) and external APIs connected by 7,894 edges. Red represents internal APIs; blue represents external API interactions; green represents all other APIs. Although the pattern of interactions among genes is non-linear and selective, certain combinations of genes are repeatedly used across a diverse set of functions.
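
The slide does not state how edges between APIs are defined. One plausible construction, shown here purely as an assumption, links two APIs whenever the same plugin uses both and weights the edge by the number of such plugins. The plugin names and usage data below are made up for the example.

```python
from collections import Counter
from itertools import combinations


def co_usage_network(usage):
    """Build weighted edges from {unit: set of APIs it uses}.

    Returns a Counter mapping (api_a, api_b), in sorted order,
    to the number of units that use both APIs.
    """
    edges = Counter()
    for apis in usage.values():
        for a, b in combinations(sorted(apis), 2):
            edges[(a, b)] += 1
    return edges


# Illustrative data only: which APIs each hypothetical plugin calls.
usage = {
    "plugin_1": {"get_option", "add_action"},
    "plugin_2": {"add_action", "get_option", "wp_mail"},
    "plugin_3": {"wp_mail"},
}
for edge, weight in sorted(co_usage_network(usage).items()):
    print(edge, weight)
```

Edges with high weights correspond to API combinations that recur across many plugins, which is the kind of repeated combination the slide describes.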

Heat Map: Plugin topological overlap matrix plots across time. Topological overlap matrices for 2011-2014 API clusters. API clusters are represented by the colored blocks on the axes of the heatmaps. Warmer colors within the heatmaps indicate higher similarity between APIs.
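
For readers unfamiliar with topological overlap, the sketch below computes it using the definition common in gene co-expression network analysis: for nodes i and j with adjacency a and connectivity k, TOM(i, j) = (sum over u of a[i][u] * a[u][j] + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j]). Whether the presenters used exactly this formula is an assumption; the function name and toy adjacency matrix are illustrative.

```python
def topological_overlap(adj):
    """Compute the topological overlap matrix of a symmetric adjacency matrix.

    adj is a list of lists with entries in [0, 1] and a zero diagonal.
    Two nodes score high when they are linked and share many neighbors.
    """
    n = len(adj)
    # Connectivity of each node: sum of its adjacencies to other nodes.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]  # diagonal set to 1 by convention
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Toy network: node 0 is linked to nodes 1 and 2; nodes 1 and 2 are not linked.
adj = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
for row in topological_overlap(adj):
    print(row)
```

Nodes 1 and 2 are not directly connected, yet they receive a nonzero overlap because they share neighbor 0. Capturing this kind of indirect similarity is why topological overlap is often preferred to raw adjacency for clustering.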

Future Outcomes: Predicting the future performance of complex socio-technical systems with dynamic individual components, based on the ongoing behaviors of those components. Predicting “what’s next” for individual components’ behaviors.

Questions!!!