Link Reconstruction from Partial Information Gong Xiaofeng, Li Kun & C. H. Lai

General situations where problems may arise
Observed network: an NxN matrix A filled with 0s and 1s.
Scenarios:
A) No side information: statistical analysis, clustering, modeling, processes, etc.
B) Some links are uncertain (positions known): the link reconstruction problem, based on a model or a similarity measure.
C) Some 1s have been set to 0s (positions unknown): a variant of the link reconstruction problem, possibly related to link prediction.
D) The network is subject to change: a kind of prediction problem (link prediction, node prediction, network evolution, etc.).

B.1 Problem of network reconstruction
Infer the values (0 or 1) of the dashed arrows. Some links are unknown: they may be corrupted, missing, or impossible to measure at the time.
Presumptions:
o The network has structure.
o Unknown links are fairly sampled.
o The number of unknown links is small.

B.2 Procedure for reconstructing links
Available information -> fitted probabilistic model P(NxN) -> connection probability p(i,j) of each unknown link (i,j) -> determine a threshold pt on the connection probability -> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise.
[Flow diagram labels: observed network, parameters, model function, optimization, connection probability, threshold, reconstruction or prediction; stages: modeling, prediction]
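The final thresholding step of this procedure can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function name `reconstruct_links`, the example matrices, and the threshold value 0.5 are all assumptions made for the example, and the connection probabilities are taken as given rather than fitted.

```python
import numpy as np

def reconstruct_links(adj, unknown, prob, p_t):
    """Fill unknown entries of an adjacency matrix by thresholding.

    adj     : (N, N) array of 0s and 1s (entries at unknown positions are overwritten)
    unknown : list of (i, j) pairs whose link status is unknown
    prob    : (N, N) array of connection probabilities p(i, j) from some fitted model
    p_t     : threshold on the connection probability
    """
    result = adj.copy()
    for i, j in unknown:
        # Set the link to 1 if its connection probability exceeds the threshold.
        result[i, j] = 1 if prob[i, j] > p_t else 0
    return result

# Hypothetical 3-node example; the probabilities are made up for illustration.
adj = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 0]])
prob = np.array([[0.0, 0.9, 0.7],
                 [0.9, 0.0, 0.2],
                 [0.7, 0.2, 0.0]])
unknown = [(0, 2), (1, 2)]
reconstructed = reconstruct_links(adj, unknown, prob, p_t=0.5)
print(reconstructed)
```

Only the listed unknown positions are modified; every observed entry is left as it was.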

B.3 Reformulated signal detection problem
Observed network -> 3 types of signals: 0, 1 and ?.
Fitted model -> connection probabilities P0 and P1.
Signals (P?) to be classified -> ?.
Problem: given the connection probability P?, decide the type of signal (0 or 1).
Assumption under a given model: the unknown links do not significantly influence the reliability of the fitted model (P0 and P1), i.e., the connection probability P? of any unknown link can be regarded as sampled from P0 or P1.

B.4 An equivalent hypothesis testing problem
Searching for an optimal detection scheme, e.g., the Neyman-Pearson criterion.
Observation (data): connection probability (p)
Hypotheses: H0: 0-link and H1: 1-link
Data space E: acceptance regions R0 and R1
Decisions D: D0 (accept H0) and D1 (accept H1)

B.5 Measuring reconstruction performance
Contingency table (or confusion matrix):

                          actual value
                          p                      n                      total
predicted outcome   p'    True Positive (TP)     False Positive (FP)    P'
                    n'    False Negative (FN)    True Negative (TN)     N'
total                     P                      N

Statistics defined:
Sensitivity or True Positive Rate (TPR): TPR = TP/P = TP/(TP+FN)
False Positive Rate (FPR): FPR = FP/N = FP/(FP+TN)
Accuracy (ACC): ACC = (TP+TN)/(P+N)
True Negative Rate or Specificity (SPC): SPC = TN/N = 1-FPR
Positive Predictive Value (PPV): PPV = TP/(TP+FP)
Receiver Operating Characteristic (ROC): TPR vs. FPR
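The statistics defined on this slide can be computed directly from the four cells of the contingency table. The sketch below assumes the actual and predicted link values are given as equal-length lists of 0s and 1s; the function name `confusion_stats` and the sample data are hypothetical.

```python
def confusion_stats(actual, predicted):
    """Compute TPR, FPR, ACC, SPC and PPV from 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)

    def ratio(num, den):
        # Return NaN instead of raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "TPR": ratio(tp, tp + fn),            # TP / P
        "FPR": ratio(fp, fp + tn),            # FP / N
        "ACC": ratio(tp + tn, len(pairs)),    # (TP + TN) / (P + N)
        "SPC": ratio(tn, fp + tn),            # TN / N = 1 - FPR
        "PPV": ratio(tp, tp + fp),            # TP / (TP + FP)
    }

# Made-up labels for eight unknown links.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
stats = confusion_stats(actual, predicted)
print(stats)
```

Sweeping the threshold pt and recording (FPR, TPR) at each value traces out the ROC curve.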

B.6 Relation to performance measures
[Figure: density functions f0(p) and f1(p) plotted against connection probability, with threshold pt and regions R1, R2, R3, R4]

B.7 Criterion of MAP
For the reconstruction problem, we choose the maximum a posteriori (MAP) criterion: accept whichever of the two hypotheses has the larger a posteriori probability.
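In the notation of B.4 and B.6, the standard form of the MAP rule can be written as below. The prior symbols \pi_0 and \pi_1 (the prior probabilities of a 0-link and a 1-link) are introduced here for illustration and do not appear in the transcript; f_0 and f_1 are the density functions of the connection probability under H0 and H1.

```latex
\text{accept } H_1
\iff P(H_1 \mid p) > P(H_0 \mid p)
\iff \pi_1 f_1(p) > \pi_0 f_0(p),
\qquad \pi_0 + \pi_1 = 1 .
```

The second equivalence follows from Bayes' rule, since both posteriors share the same denominator \pi_0 f_0(p) + \pi_1 f_1(p).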

A.1 Probabilistic model of structured networks

A.2 Estimate model parameters (MLE)

B.8 Example network

B.9 Density function of connection probabilities

B.10 MAP detector minimizes average error
The density function is usually jagged and difficult to work with, so the distribution function is preferred. Consider the minimum average error (cost).
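One way to work with distribution functions rather than densities is to pick the threshold that minimizes the empirical average error. The sketch below is an illustration under stated assumptions, not the presenters' procedure: it assumes equal costs for both error types, estimates the priors from the class sizes, and uses made-up samples `p0` and `p1` of connection probabilities for known 0-links and known 1-links. The function name `min_error_threshold` is hypothetical.

```python
import numpy as np

def min_error_threshold(p0, p1):
    """Threshold t minimizing the empirical error of the rule 'p > t -> 1-link'.

    p0 : connection probabilities of known 0-links
    p1 : connection probabilities of known 1-links
    Returns (t, average_error).
    """
    p0 = np.sort(np.asarray(p0, dtype=float))
    p1 = np.sort(np.asarray(p1, dtype=float))
    # Candidate thresholds: every observed value, plus -inf (classify all as 1).
    candidates = np.unique(np.concatenate(([-np.inf], p0, p1)))
    # Empirical distribution counts: number of samples <= t in each class.
    n0_le = np.searchsorted(p0, candidates, side="right")
    n1_le = np.searchsorted(p1, candidates, side="right")
    false_pos = len(p0) - n0_le   # 0-links with p > t
    false_neg = n1_le             # 1-links with p <= t
    errors = false_pos + false_neg
    best = int(np.argmin(errors))  # first minimizer if there are ties
    return candidates[best], errors[best] / (len(p0) + len(p1))

# Made-up samples for illustration.
p0 = [0.05, 0.1, 0.2, 0.6]
p1 = [0.4, 0.7, 0.8, 0.9]
t_best, avg_err = min_error_threshold(p0, p1)
print(t_best, avg_err)
```

Because the counts come from sorted samples, the error curve is a step function of the threshold and needs no density estimate or binning.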

B.11 Distribution of connection probabilities

B.12 Generalizability of algorithm
Do the unknown links approximately follow the same distribution? Possible reasons for the unfavorable burst at the tail; sources of model error.

B.13 Robustness of algorithm
Is the algorithm sensitive to the number of unknown links?

B.14 Comparison of operation points

B.15 Reconstruction results
USAir network, 10% of links missing.
[Table columns: P, N, ACC (%), TP/P (%), TN/N (%), TP/(TP+FP) (%)]

C.1 A variant problem of link reconstruction
Observed network -> types of signals: 0 and 1. Some 0s were originally 1s but have been set to 0. Their positions are unknown; their number may be known or unknown.

C.2 Procedures for the variant problem
Available information -> fitted probabilistic model P(NxN) -> connection probability p(i,j) of each 0-link (i,j) ->
(a) number (M) unknown -> determine a threshold pt on the connection probability -> set (i,j) to 1 if p(i,j) > pt, and to 0 otherwise.
(b) number (M) known -> scoring: rank the connection probabilities of the candidate links (all 0-links) -> set the M links with the highest scores to 1.
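Case (b), where the number M of hidden links is known, can be sketched as a ranking over all 0-links. The function name `top_m_links`, the treatment of the matrix as directed (every off-diagonal 0 entry is a candidate), and the example values are assumptions made for illustration.

```python
import numpy as np

def top_m_links(adj, prob, m):
    """Return the m zero-entries of adj with the highest connection probability."""
    n = adj.shape[0]
    # Candidate links: all off-diagonal positions currently observed as 0.
    candidates = [(i, j) for i in range(n) for j in range(n)
                  if i != j and adj[i, j] == 0]
    # Rank candidates by connection probability, highest first (stable for ties).
    ranked = sorted(candidates, key=lambda ij: -prob[ij])
    return ranked[:m]

# Hypothetical directed 3-node network with made-up probabilities.
adj = np.array([[0, 1, 0],
                [0, 0, 0],
                [1, 0, 0]])
prob = np.array([[0.0, 0.9, 0.3],
                 [0.8, 0.0, 0.1],
                 [0.9, 0.6, 0.0]])
chosen = top_m_links(adj, prob, m=2)

restored = adj.copy()
for i, j in chosen:
    restored[i, j] = 1
print(chosen)
```

Ranking avoids having to choose a threshold, at the price of requiring M in advance.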

C.3 Algorithm based on common neighbor
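A widely used form of common-neighbor score counts the nodes adjacent to both endpoints of a candidate link; for a symmetric 0/1 adjacency matrix A this count is the (i,j) entry of A squared. The sketch below shows that form only. Whether the presentation uses exactly this score is an assumption, and the function name and example graph are hypothetical.

```python
import numpy as np

def common_neighbor_scores(adj):
    """Number of common neighbors for every node pair of an undirected graph.

    adj : symmetric (N, N) array of 0s and 1s with a zero diagonal.
    Entry (i, j) of the result counts nodes k with adj[i, k] = adj[k, j] = 1.
    """
    adj = np.asarray(adj)
    return adj @ adj

# Hypothetical 4-cycle: edges 0-1, 1-3, 3-2, 2-0.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
cn = common_neighbor_scores(adj)
print(cn)
```

These scores can take the place of the connection probabilities when ranking candidate 0-links.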

C.4 Comparison between two methods
[Figure panels: probability density functions; distribution functions]

C.5 Generalizability and robustness of algorithms

C.6 Reconstruction performance by ranking

D.1 Problem of link prediction
The procedure is identical to that of the variant link reconstruction problem.
Econophysics co-authorship network (N=506, m=519, nL=379)

D.2 Factors affecting prediction performance
Problem of generalizability: a) size of the training set, or time span of the prediction; b) time-varying growth mechanism.

D.3 Effects of training set size
Assume the new links are known and examine the variant problem above. The training data set cannot capture the underlying distribution faithfully: either its size is too small or the growth rule is time dependent.

Conclusions
The problem of network reconstruction has been thoroughly studied. Under a more general framework, the problem can be reformulated as a hypothesis testing problem, which gives deeper insight into the problem and enables us to relate the reconstruction performance of various methods to quantities at a more fundamental level.

THANK YOU