Multivariate Ranking, Prioritization, and Selection Using Partial Order for Comparative Knowledge Discovery in Multi-Indicator Information Fusion Systems.

Presentation transcript:

Multivariate Ranking, Prioritization, and Selection Using Partial Order for Comparative Knowledge Discovery in Multi-Indicator Information Fusion Systems with Multi-Disciplinary Applications: Recent Past, Present, and Near Future. Statistics seminar. G. P. Patil

Surveillance Geoinformatics of Hotspot Detection, Prioritization and Early Warning. NSF Digital Government Project # PI: G. P. Patil Websites:
NSF Digital Government surveillance geoinformatics project, federal agency partnership and national applications for digital governance.
Federal Agency Partnership: CDC, DOD, EPA, NASA, NIH, NOAA, USFS, USGS.
Diagram labels: Agency Databases; Thematic Databases; Other Databases; Homeland Security; Disaster Management; Public Health; Ecosystem Health; Other Case Studies; Statistical Processing: Hotspot Detection, Prioritization, etc.; Data Sharing, Interoperable Middleware; Standard or De Facto Data Model, Data Format, Data Access; Arbitrary Data Model, Data Format, Data Access; Application Specific De Facto Data/Information Standard; Cellular Surface.
National and International Applications: Biosurveillance, Carbon Management, Coastal Management, Community Infrastructure, Crop Surveillance, Disaster Management, Disease Surveillance, Ecosystem Health, Environmental Justice, Environmental Management, Environmental Policy, Homeland Security, Invasive Species, Poverty Policy, Public Health, Public Health and Environment, Robotic Networks, Sensor Networks, Social Networks, Syndromic Surveillance, Tsunami Inundation, Urban Crime, Water Management.

Prioritization Innovation: Partial Order Set Ranking. We present a prioritization innovation: the ability to rank and prioritize objects and indicators based on intrinsic multiple-indicator structural characteristics, without having to integrate the indicators into an index, using partially ordered sets (posets) and related novel concepts, methods, techniques, and tools. This leads us to early warning systems, and also to the selection of investigational entities.

Preliminaries. Data Matrix: Multivariate Data Set. Indicator data matrix $[x_{ij}]$ with $n$ rows (objects $a_1, \ldots, a_n$) and $m$ columns (indicators $I_1, \ldots, I_m$). Objects may be entities such as individuals, units, pixels, areas, regions, patients, genes, drugs, documents, clients, products, or tools, with relevant characteristics as potential indicators for some single or multiple outcomes, endpoints, concepts, or domains. This is an $m$-dimensional data set consisting of $n$ data points; no measurement column is available on a response variable $y$. To begin with, there is a latent (abstract) concept for the objects, with indicator values/measurements that share a common orientation.

As a simple example, consider the size of an individual as the abstract concept. Consider the height, weight, and volume of the individual as indicators of size, with an assumed common orientation of positive monotonicity/positive correlations. Generally speaking, the larger the size, the larger the indicator; the larger the indicator, the larger the size. The three indicator measurements may have a three-dimensional elliptical distribution with pairwise positive correlations.

The multivariate data set is usually a nonlinear partially ordered set (poset): not all pairs of objects are comparable. For a two-indicator setup: [figures not reproduced in the transcript]. Ranking usually amounts to linearizing the poset by assigning the objects scalar rank-scores that are consistent with the comparabilities in the data matrix. Rank-scores need to inherit the comparabilities in the data set. Incomparable pairs are expected to become comparable, in either direction.

On which line is the linearized set to lie? Without loss of generality, on which axis passing through the origin? With what separations between successive objects? Projections on a ray through the origin have been popular. The ray is determined by $w = (w_1, \ldots, w_m)$, where $w_j > 0$ and $\sum_j w_j = 1$: a differential weight vector measuring the relative importance of the indicators for the abstract concept. The projection is a fixed scalar multiple of what is popularly called the weighted composite index with weight vector $w$.

The choice of $w$ involves subjective trade-offs/compensation among indicators. It becomes a sensitive issue between stakeholders. Reconciliation in view of the data matrix evidence becomes a practical challenge and a scientific/statistical opportunity. Can we think of a data-based $w$ intrinsic to the data matrix? And relative to such a $w$, and its corresponding ray, can we think of alternative ways of computing appropriate rank-scores that do not involve indicator trade-offs? And if we can think of several methods of rank-scores and resultant rankings, is it possible to measure their individual performance to help find a best method among them for the given data set? All of these are frontier questions that we wish to address. Fortunately, we now have some initial answers to share on these issues, which have challenged multivariate ranking over the past several decades.

Intrinsic Differential Weight Vector $w_I$ for the Data-Matrix-Based Indicator Set, Measuring Relative Importance of Indicators. Method 1: $L_0$-distance: Pairwise Object Comparisons, and Indicator Agreements among Object Comparison Disagreements. Method 2: $L_1$-distance: Pairwise Indicator Ranking Comparisons. Method 3: $L_2$-distance: Pairwise Indicator Ranking Comparisons.

Method 1: Consider the Multivariate Zeta Matrix: the $n \times n$ object-by-object comparability matrix. Each cell entry is an $m$-variate bit (a string of $m$ binary digits such as 111…, 000…, …01), where the digit for an indicator is 1 if $a_i > a_j$ on that indicator, and 0 otherwise. A comparability cell has all 1's or all 0's in its bit, indicating collective agreement among indicators. An incomparability cell has some 1's and some 0's in its bit, indicating collective disagreement among indicators. For each incomparability cell, count for each indicator the number of agreements with the collectivity of indicators. Add up these counts for each indicator over all of the incomparability cells. Normalize/unitize to give the intrinsic $w_I$ we are looking for. Incidentally, and importantly, this intrinsic $w_I$ also provides a powerful basis for comparison and selection of indicators. Method 2 and Method 3: Will come back, if time permits.
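The counting steps of Method 1 can be sketched in Python. This is a minimal illustration, not the speaker's implementation. The slide does not say how "agreement with the collectivity of indicators" is scored; the sketch assumes it means agreeing with the majority digit in the cell's bit, skips cells where the digits split evenly, and visits each unordered pair once (which assumes no tied indicator values). The function names are hypothetical.

```python
import numpy as np


def zeta_bits(X):
    """bits[i, j, k] is True when object i exceeds object j on indicator k."""
    X = np.asarray(X, dtype=float)
    return X[:, None, :] > X[None, :, :]


def intrinsic_weights(X):
    """Sketch of the intrinsic weight vector w_I (majority-vote assumption)."""
    bits = zeta_bits(X)
    n, _, m = bits.shape
    counts = np.zeros(m)
    for i in range(n):
        for j in range(i + 1, n):
            b = bits[i, j]
            ones = int(b.sum())
            if ones == 0 or ones == m:
                continue  # comparability cell: all indicators agree
            if 2 * ones == m:
                continue  # ASSUMPTION: an even split has no collective verdict
            majority = 2 * ones > m
            counts += (b == majority)  # indicators siding with the majority
    total = counts.sum()
    if total == 0:
        return np.full(m, 1.0 / m)  # no usable cells: fall back to equal weights
    return counts / total


X = np.array([[1, 2, 3],
              [2, 1, 4],
              [3, 3, 1]])
print(intrinsic_weights(X))
```

Under the majority reading, two indicators always split evenly on an incomparable pair, so the sketch only becomes informative for three or more indicators.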

Conceptualizing and Computing the Performance Measure of a Comparability-Invariant Partial Order Ranking Method: Consider the Multivariate Zeta Matrix as before, but this time with each cell entry an $(m+1)$-variate bit: the first $m$ variates as before, and the $(m+1)$-th variate corresponding to the Ranking. For each incomparability cell, count for each indicator the agreement with the Ranking. Add up for each indicator over all the incomparability cells. Normalize/unitize to give the $w_R$ induced by the Ranking $R$. Define its performance measure $PM_R$ by corr/gen. corr$(w_I, w_R)$.
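A hedged Python sketch of this performance measure follows. It assumes the ranking is supplied as a vector of rank-scores (higher means ranked higher), that pairs tied in rank-score are skipped, and that "corr" is the ordinary Pearson correlation; the slide also allows a generalized correlation, which is not specified. The intrinsic vector is passed in as an argument, and all function names are hypothetical.

```python
import numpy as np


def ranking_induced_weights(X, scores):
    """Sketch of w_R: per-indicator agreement with a ranking on incomparable pairs."""
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n, m = X.shape
    counts = np.zeros(m)
    for i in range(n):
        for j in range(i + 1, n):
            b = X[i] > X[j]  # first m digits of the cell's bit
            ones = int(b.sum())
            if ones == 0 or ones == m:
                continue  # comparability cell
            if scores[i] == scores[j]:
                continue  # ASSUMPTION: a tied ranking gives no verdict
            r = scores[i] > scores[j]  # the (m+1)-th digit, from the ranking
            counts += (b == r)
    total = counts.sum()
    return counts / total if total > 0 else np.full(m, 1.0 / m)


def performance_measure(w_I, w_R):
    """Pearson correlation of the two weight vectors (NaN if either is constant)."""
    w_I = np.asarray(w_I, dtype=float)
    w_R = np.asarray(w_R, dtype=float)
    if np.std(w_I) == 0 or np.std(w_R) == 0:
        return float("nan")
    return float(np.corrcoef(w_I, w_R)[0, 1])


X = np.array([[1, 2, 3], [2, 1, 4], [3, 3, 1]])
w_I = np.array([1 / 2, 1 / 3, 1 / 6])  # intrinsic weights, supplied by the caller
w_R = ranking_induced_weights(X, X @ w_I)
print(w_R, performance_measure(w_I, w_R))
```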

Some Comparability-Invariant/Partial-Order-Based Ranking Methods: Method 1: Weighted Composite Index for Rank-score: WCI. Method 2: Comparability Weighted Net Superiority Index for Rank-score: CWNSI. Method 3: MCMC-based Weighted Indicator Average Rank for Rank-score: WIARI. Method 4: MCMC-based Weighted Indicator Cumulative Rank Frequency Distribution for Stochastic Rank-score: WICRFDI.

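Method 1: Weighted Composite Index for Rank-score. Existence of an intrinsic differential weight vector $w$ and the correspondingly weighted composite index: $w \cdot x = |w|\,|x|\cos(w, x) = |w| \times (\text{projection of } x \text{ on } w)$. For two objects with difference $d = x_1 - x_2$, one of $w \cdot d = 0$, $w \cdot d > 0$, or $w \cdot d < 0$ holds, so the sign of $w \cdot d$ decides whether $x_1$ and $x_2$ tie or which of them receives the higher rank-score.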
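As a small numerical check of the identity above, the following Python sketch computes the weighted composite index and the projection length on the ray through $w$. The data and weights are made up for illustration, and the function name is hypothetical.

```python
import numpy as np


def weighted_composite_index(X, w):
    """Return (w . x, projection length of x on w) for every row x of X."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if np.any(w <= 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be positive and sum to one")
    scores = X @ w                            # weighted composite index
    projections = scores / np.linalg.norm(w)  # scores = |w| * projection
    return scores, projections


X = np.array([[1.0, 2.0], [3.0, 1.0]])  # two incomparable objects
w = np.array([0.5, 0.5])                # illustrative weights only
scores, projections = weighted_composite_index(X, w)
d = X[0] - X[1]
print(scores, projections, w @ d)       # sign of w . d orders the pair
```

The two objects here are incomparable in the partial order, yet the sign of $w \cdot d$ still places one above the other, which is exactly the trade-off that the choice of $w$ introduces.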

An illustration in a two-indicator space

Method 2: Comparability Weighted Net Superiority Index for Rank-score. $\text{Rank-score}(x) = \big(O(x) - F(x)\big) \times \dfrac{O(x) + F(x)}{n - 1} = \text{Net Superiority} \times \text{Comparability}$. [Figure not reproduced in the transcript.]
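The transcript does not define $O(x)$ and $F(x)$. The Python sketch below assumes $O(x)$ is the number of objects that $x$ dominates in the partial order and $F(x)$ is the number of objects that dominate $x$, so that $O(x) - F(x)$ is the net superiority and $(O(x) + F(x))/(n - 1)$ is the fraction of other objects comparable to $x$. Both readings, and the function name, are assumptions made for illustration.

```python
import numpy as np


def cwnsi(X):
    """Sketch of the CWNSI rank-score under the assumed meanings of O and F."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two objects")
    ge = (X[:, None, :] >= X[None, :, :]).all(axis=2)
    gt = (X[:, None, :] > X[None, :, :]).any(axis=2)
    dominates = ge & gt         # i >= j on every indicator, > on at least one
    O = dominates.sum(axis=1)   # ASSUMPTION: objects that x dominates
    F = dominates.sum(axis=0)   # ASSUMPTION: objects that dominate x
    return (O - F) * (O + F) / (n - 1)


X = np.array([[1, 1], [2, 2], [3, 1], [1, 3]])
print(cwnsi(X))
```

Because the score uses only dominance counts, any transformation of an indicator that preserves its ordering leaves the rank-scores unchanged.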

Method3: MCMC based Weighted Indicator Average Rank for Rank-score

Method4: MCMC based Weighted Indicator Cumulative Rank Frequency Distribution for Stochastic Rank-score

Multivariate Nonparametrics with Partial Order: Multivariate Ranking with the Multivariate Data Set as Data Matrix. Data matrix $[x_{ij}]$, $n \times m$, with columns as variables that are not necessarily known to be indicators with a common orientation.

Consider the $2^m$ transforms of the data matrix with columns retained or reversed. The transform ID is given by an $m$-dimensional bit (e.g. …01), with 0 meaning retain and 1 meaning reverse. Multivariate Median: For each transform, compute CWNSI to provide a triplet consisting of its median object and the objects immediately above and below it in rank.

Consider the object frequency distribution over the $3 \times 2^m$ objects thus centrally discovered. Declare the modal object of this frequency distribution to be the multivariate median estimate we are looking for. It is possible to have several maximal modes, in which case their centroid may be declared as the estimate.

Alternatively, allocate to each object the minimum rank from among its $2^m$ CWNSI rank values from the $2^m$ transforms. Call this minimum rank its data depth. Maximum data depth will then yield the multivariate median estimate. It is possible to have several objects with maximum data depth; their centroid will then play the role of the estimate. We conjecture approximate affine invariance.
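The data-depth construction can be sketched in Python as follows. This is an illustration under stated assumptions rather than the speaker's procedure: column reversal is done by negation, the CWNSI score uses dominance counts with $O$ and $F$ read as "objects dominated" and "objects dominating" (not defined in the transcript), and tied scores share the smallest rank. All function names are hypothetical.

```python
import itertools

import numpy as np


def cwnsi(X):
    """CWNSI rank-score from dominance counts (assumed meanings of O and F)."""
    ge = (X[:, None, :] >= X[None, :, :]).all(axis=2)
    gt = (X[:, None, :] > X[None, :, :]).any(axis=2)
    dom = ge & gt
    O, F = dom.sum(axis=1), dom.sum(axis=0)
    return (O - F) * (O + F) / (X.shape[0] - 1)


def min_ranks(scores):
    """Rank 1 = lowest score; tied scores share the smallest rank (assumption)."""
    return (scores[:, None] > scores[None, :]).sum(axis=1) + 1


def depth_median(X):
    """Data depth = minimum rank over the 2^m retain/reverse transforms."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    depth = np.full(n, n + 1)
    for bits in itertools.product((0, 1), repeat=m):  # transform IDs
        signs = np.where(np.array(bits) == 1, -1.0, 1.0)  # 1 means reverse
        depth = np.minimum(depth, min_ranks(cwnsi(X * signs)))
    deepest = depth == depth.max()
    return depth, X[deepest].mean(axis=0)  # centroid if several are deepest


X = np.array([[2, 2], [1, 1], [3, 3], [1, 3], [3, 1]])
depth, median = depth_median(X)
print(depth, median)
```

In this toy configuration the four corner points each reach rank 1 under some transform, while the central point never does, so it alone attains the maximum depth.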

Multivariate Order Statistics Relative to the Multivariate Median. Construct the $n \times m$ data matrix of coordinate-wise/column-wise separations of each object from the estimated multivariate median. Consider the $2^m$ transforms, yielding $2^m$ rank values for each object.

Now choose the maximum rank and call it the outlyingness measure of the object; it gives the rank-score that ranks the object as a multivariate order statistic. Appropriately weighted linear combinations of these multivariate order statistics will help improve and sharpen the multivariate median. Iterative moving windows on grids will help with image enhancement and spatial data smoothing.

Some Applications: Genome Wide Association Studies: Knut Wittkowski; Eli Lilly; Debashis Ghosh; Human Environment Interface; Ashbindu Singh; Myers and Patil; Bruggemann and Patil; Several Other Applications; Four Current Monographs Here