Network Mapping of Large Data Sets Al Ozonoff, Ph.D. Joel Bernanke, M.Sc. Boston University School of Public Health.


May 22, 2008 Interface - RISK : Reality

Network Analyses of Linked Data Sets
─ Yook (2002) developed network generators that captured the Internet’s topology, and postulated preferential attachment and linear distance dependence.
  Yook, S.-H., Jeong, H., & Barabasi, A.-L. (2002). Modeling the Internet’s large-scale topology. PNAS, 99.
─ Schwikowski (2000) built a protein-protein interaction network in yeast to predict protein function.
  Schwikowski, B., Uetz, P., & Fields, S. (2000). A network of protein-protein interactions in yeast. Nature Biotechnology, 18(12).

Networks in Public Health
─ Jones (2003) reported on power-law scaling in sexual contact networks, relating the scaling coefficient to the rate of disease transmission and the threat of epidemic.
  Jones, J. H., & Handcock, M. S. (2003). An assessment of preferential attachment as a mechanism for human sexual network formation. Proc. R. Soc. Lond. B, 270.
─ De (2004) used network centrality measures to identify key individuals in a gonorrhea outbreak.
  De, P., Singh, A. E., Wong, T., Yacoub, W., & Jolly, A. M. (2004). Sexual network analysis of a gonorrhea outbreak. Sex Transm Infect, 80.

Natural Mapping of a Data Set
When linkages are not predefined, suitable criteria for identifying linkages must be developed.
We propose a natural mapping of a data set onto a network: variables map to nodes, and the associations among variables map to edges.

The NHANES Data Set
The National Health and Nutrition Examination Survey (NHANES) assesses the health and nutritional status of adults and children in the United States through interviews and physical examinations.
The NHANES data set includes:
─ Demographics
─ Laboratory test results
─ Dietary records
─ Physiological measurements
─ General health information

Selecting Data to Map
A selected subset of continuous measures from all four of the NHANES modules was included in the analysis. Continuous measures with small numbers of observations (< 20) were excluded.
Examples:
─ Age (years)
─ Blood titers
─ Number of green vegetables eaten per month
─ Cardiovascular stress test measurements

Generating a Correlation Matrix
We generated a correlation matrix containing the Spearman correlation between every pair of variables.
All correlations were converted to their absolute values.
We included correlations in the matrix regardless of their significance.
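The correlation step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code (the slides list SAS 9.1 for this step): the variable names and values are invented toy data, every column is assumed complete (no missing values) and non-constant, and Spearman's rho is computed as the Pearson correlation of average ranks.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def abs_spearman_matrix(data):
    """Map each unordered variable pair to |Spearman rho|.

    data: dict of variable name -> list of values (hypothetical layout).
    All pairs are kept, regardless of statistical significance.
    """
    ranks = {name: average_ranks(col) for name, col in data.items()}
    return {
        (a, b): abs(pearson(ranks[a], ranks[b]))
        for a, b in combinations(data, 2)
    }


# Invented toy data, for illustration only.
toy = {
    "age": [10, 20, 30, 40, 50],
    "bmi": [18, 22, 21, 27, 30],
    "x": [5, 4, 3, 2, 1],
}
matrix = abs_spearman_matrix(toy)
```

Taking the absolute value here means that a perfectly inverse relationship (such as "age" and "x" in the toy data) is treated the same as a perfectly direct one.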

Mapping the NHANES Data Set
Variables were mapped to nodes.
Spearman correlations among the variables were mapped to edges.
The exact correlation was either retained as a measure of the strength of an association or was dichotomized (0, 1) based on a cutoff.
[Figure: the edge between Age (years) and Body Mass Index, shown weighted (0.6) and dichotomized (cutoff = 0.7)]
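A minimal sketch of this mapping, assuming the absolute correlations are held in a dictionary keyed by variable pair. The function name `build_edges` is hypothetical, and the choice of `>=` rather than `>` at the cutoff is an assumption, since the slides do not say how a correlation exactly equal to the cutoff is treated.

```python
def build_edges(abs_corr, cutoff=None):
    """Turn absolute correlations into network edges.

    abs_corr: dict of (var_a, var_b) -> absolute correlation.
    cutoff=None keeps every pair as a weighted edge.
    Otherwise a pair becomes an edge of weight 1 when its correlation
    is at or above the cutoff (assumed), and is dropped (0) when below.
    """
    edges = {}
    for pair, rho in abs_corr.items():
        if cutoff is None:
            edges[pair] = rho
        elif rho >= cutoff:
            edges[pair] = 1
    return edges


# The 0.6 value for this pair is the one shown on the slide.
corr = {("Age (years)", "Body Mass Index"): 0.6}

weighted = build_edges(corr)               # edge kept with weight 0.6
binary_07 = build_edges(corr, cutoff=0.7)  # 0.6 < 0.7, so no edge
binary_05 = build_edges(corr, cutoff=0.5)  # 0.6 >= 0.5, so edge = 1
```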

Software
SAS 9.1 – Integrate NHANES data modules and generate the correlation matrix.
UCINET – Convert correlation data to network data.
Netdraw – Visualize and analyze network data.
KeyPlayer – Identify key players.

Networks by Cutoff
[Figure panels: Cutoff = 0.2, Cutoff = 0.5, Cutoff = 0.8]

Distribution of Connections by Cutoff
[Figure panels: Cutoff = 0.2, Cutoff = 0.5, Cutoff = 0.8]

Degrees and Unlinked Nodes
[Figure: mean number of connections per node (degree) and percentage of unlinked nodes (isolates), plotted against cutoff]
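The two quantities on this slide can be computed for any cutoff as sketched below. The node names and correlation values are invented for illustration, and the edge rule (correlation at or above the cutoff) is an assumption.

```python
def degree_summary(nodes, abs_corr, cutoff):
    """Return (mean degree, percentage of isolates) at a given cutoff.

    nodes: list of variable names.
    abs_corr: dict of (var_a, var_b) -> absolute correlation.
    A pair counts as an edge when its correlation is >= cutoff (assumed).
    """
    degree = {n: 0 for n in nodes}
    for (a, b), rho in abs_corr.items():
        if rho >= cutoff:
            degree[a] += 1
            degree[b] += 1
    mean_degree = sum(degree.values()) / len(nodes)
    isolates = sum(1 for d in degree.values() if d == 0)
    return mean_degree, 100 * isolates / len(nodes)


# Invented toy correlations among four hypothetical variables.
nodes = ["A", "B", "C", "D"]
abs_corr = {
    ("A", "B"): 0.9, ("A", "C"): 0.6, ("B", "C"): 0.3,
    ("C", "D"): 0.1, ("A", "D"): 0.05, ("B", "D"): 0.25,
}
summary = {c: degree_summary(nodes, abs_corr, c) for c in (0.2, 0.5, 0.8)}
```

In this toy example, raising the cutoff lowers the mean degree and raises the share of isolates, because fewer pairs qualify as edges.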

Hubs and Key Players
Hubs – Nodes with many connections (edges).
Key Players – A set of N nodes that, in this case, is maximally correlated with the rest of the network.

10 Key Players
For the entire weighted network:
─ Age (years)
─ CD4 count (cells/mm³)
─ Urine creatinine (mg/dl)
─ CD8 count (cells/mm³)
─ Upper arm length (cm)
─ Alcohol fasting time (min)
─ Antacid / laxative fasting time (min)
─ Number of years taking insulin
─ How often wore hearing aid in the past year (number)
─ Lipid adjusted dioxin (pg/g)

Hubs and Key Players - Creatinine
Nodes with higher degrees are larger. The purple squares are the 10 key players.
Notice that the key players are not necessarily the largest hubs.

Urine Creatinine Ego Network
[Figure: ego network of Urine Creatinine; labeled groups: Urine Elements (e.g. Molybdenum), Urine Phthalates, Urine Phosphates]

Hubs and Key Players – CD4, CD8
Nodes with higher degrees are larger. The blue squares are the 10 key players.
Notice that the key players are not necessarily the largest hubs.

CD4, CD8, and Immunotoxins
[Figure: network with labeled nodes: Isoflavones, CD-4 counts, CD-8 counts, PCBs, TCDDs]

Conclusion
Future directions:
─ Further exploration of scale-free (power law) properties of the NHANES data network.
─ Extend methodology to binary outcomes.
─ Account for negative correlations.
─ Investigate confounding.
─ Analyze additional data sets.

Network Terms
Node – a junction point.
Edge – a line connecting two nodes.
Degree – the number of edges a node has.
Hub – a node with many connections (edges).
Key players – a group of nodes that together are connected to the maximum number of distinct nodes.
Power-law distribution – f(x) ~ x^(-γ)
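The key-player definition above can be illustrated with a simple greedy heuristic: repeatedly add the node that most increases the number of distinct outside nodes adjacent to the group. This is a sketch under stated assumptions and not the algorithm used by the KeyPlayer software; the function names and the toy network are hypothetical, and a greedy search is not guaranteed to find the best possible group.

```python
def reach(adjacency, group):
    """Count distinct nodes outside the group adjacent to any member."""
    group = set(group)
    neighbours = set()
    for node in group:
        neighbours |= adjacency[node]
    return len(neighbours - group)


def greedy_key_players(adjacency, k):
    """Pick up to k nodes, each time adding the node that maximizes reach.

    adjacency: dict of node -> set of neighbouring nodes (undirected).
    Ties are broken by sorted node order.
    """
    chosen = []
    for _ in range(k):
        candidates = [n for n in sorted(adjacency) if n not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda n: reach(adjacency, chosen + [n]))
        chosen.append(best)
    return chosen


# Invented toy network: H is the largest hub; p sits in a separate cluster.
adjacency = {
    "H": {"a", "b", "c"},
    "a": {"H", "b"},
    "b": {"H", "a"},
    "c": {"H"},
    "p": {"q", "r"},
    "q": {"p"},
    "r": {"p"},
}
players = greedy_key_players(adjacency, 2)
```

In this toy network the second key player is "p", which has the same degree as "a" and "b" but reaches nodes that the hub "H" does not, which is one way key players can differ from the largest hubs.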

A Basic Undirected Network
Isolate – a node that is not connected to the rest of the network.
Pendant – a node that is connected to the rest of the network by only one edge.
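Both terms reduce to a check on node degree, as the sketch below shows for an undirected network without repeated edges. The function name and the toy network are invented for illustration.

```python
def classify_nodes(nodes, edges):
    """Return (isolates, pendants) for an undirected network.

    nodes: list of node names.
    edges: list of (node_a, node_b) pairs, each edge listed once.
    An isolate has degree 0; a pendant has degree 1.
    """
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    isolates = sorted(n for n, d in degree.items() if d == 0)
    pendants = sorted(n for n, d in degree.items() if d == 1)
    return isolates, pendants


# Toy network: a triangle A-B-C, D hanging off C, and E unconnected.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
isolates, pendants = classify_nodes(nodes, edges)
```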