SiZer Background Finance "tick data":


SiZer Background
Finance "tick data": (time, price) of single stock transactions.
Idea: "on line" version of SiZer for viewing and understanding trends.
Notes: trends depend heavily on scale; double points and more; background color transition (flop over at top).
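To make "trends depend heavily on scale" concrete, here is a minimal sketch of a SiZer-style map. It assumes a Gaussian-kernel local linear fit and a pointwise normal quantile, which is a simplification rather than the interval construction used in the SiZer papers. The name `sizer_map`, its parameters, and the sparse-data threshold are hypothetical and do not reflect the interface of any existing SiZer software.

```python
import numpy as np

def sizer_map(t, y, grid, bandwidths, z=1.96, min_ess=5.0):
    """Sketch of a SiZer-style significance map (not the published method).

    Returns an int array of shape (len(bandwidths), len(grid)) with
    +1 where the local slope is significantly positive,
    -1 where it is significantly negative, and 0 otherwise
    (including locations where the data are too sparse).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.zeros((len(bandwidths), len(grid)), dtype=int)
    for i, h in enumerate(bandwidths):
        for j, x0 in enumerate(grid):
            u = t - x0
            w = np.exp(-0.5 * (u / h) ** 2)      # Gaussian kernel weights, max 1
            if w.sum() < min_ess:                # assumed sparse-data cutoff
                continue
            X = np.column_stack([np.ones_like(u), u])
            XtW = X.T * w
            A_inv = np.linalg.inv(XtW @ X)
            beta = A_inv @ (XtW @ y)             # beta[1] is the local slope
            resid = y - X @ beta
            sigma2 = np.sum(w * resid ** 2) / np.sum(w)
            M = (X.T * w ** 2) @ X
            cov = sigma2 * A_inv @ M @ A_inv     # variance of weighted LS fit
            se = np.sqrt(cov[1, 1])
            if beta[1] - z * se > 0:
                out[i, j] = 1
            elif beta[1] + z * se < 0:
                out[i, j] = -1
    return out
```

Each row of the result corresponds to one bandwidth (scale), so the same location can be flagged as a trend at a coarse scale and as not significant at a fine scale.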

SiZer Background
Internet traffic data analysis: SiZer analysis of time series of packet times at internet hub (UNC).
Hannig, Marron, and Riedi (2001)

SiZer Background
Internet traffic data analysis: SiZer analysis of time series of packet times at internet hub (UNC), across a very wide range of scales.
This needs more pixels than the screen allows, thus do a zooming view (zoom in over time): zoom in to the yellow boundary in the next frame, and readjust the vertical axis.

SiZer Background
Internet traffic data analysis (cont.)
Insights from SiZer analysis:
Coarse scales: amazing amount of significant structure. Evidence of self-similar fractal type process?
Fewer significant features at small scales. But they exist, so not a Poisson process. Poisson approximation OK at small scale???
Smooths (top part) stable at large scales?

Dependent SiZer
Rondonotti, Marron, and Park (2007)
SiZer compares data with white noise, which is inappropriate in time series.
Dependent SiZer compares data with an assumed model.
Visual Goodness of Fit test.

Dependent SiZer: 2002 Apr 13, Sat, 1 pm to 3 pm
Internet traffic at UNC main link, 2 hour span

Dependent SiZer: 2002 Apr 13, Sat, 1 pm to 3 pm
Big spike in traffic is “really there”

Dependent SiZer: 2002 Apr 13, Sat, 1 pm to 3 pm
Zoom in for closer look

Zoomed view (to red region, i.e. “flat top”)
Strange “hole in middle” is “really there”

Zoomed view (to red region, i.e. “flat top”)
Zoom in for closer look

Further zoom: finds very periodic behavior!
SiZer found interesting structure, but what it finds depends on scale.

Possible Physical Explanation
IP “Port Scan”: a common device of hackers, searching for “break in points”.
Send query to every possible (within UNC domain) IP address and port number.
Replies can indicate system weaknesses.
Internet traffic is hard to model.

SiZer Background
Historical Note & Acknowledgements:
Scale Space: S. M. Pizer
SiZer: Probal Chaudhuri
Main References: Chaudhuri & Marron (1999), Chaudhuri & Marron (2000), Hannig & Marron (2006)

SiZer Background
Extension to 2-d: Significance in Scale Space
Main Challenge: Visualization
References: Godtliebsen et al (2002, 2004, 2006)

SiZer Overview
Would you like to try smoothing & SiZer?
Marron Software Website, as before.
In “Smoothing” directory: kdeSM.m, nprSM.m, sizerSM.m
Recall: “>> help sizerSM” for usage

PCA to find clusters Return to PCA of Mass Flux Data: MassFlux1d1p1.ps

PCA to find clusters SiZer analysis of Mass Flux, PC1 MassFlux1d1p1s.ps

PCA to find clusters SiZer analysis of Mass Flux, PC1 All 3 significant MassFlux1d1p1s.ps

PCA to find clusters SiZer analysis of Mass Flux, PC1 Also in Curvature MassFlux1d1p1s.ps

PCA to find clusters SiZer analysis of Mass Flux, PC1 And in other components MassFlux1d1p1s.ps

PCA to find clusters
SiZer analysis of Mass Flux, PC1
Conclusion: found 3 significant clusters! Worth deeper investigation.
Correspond to 3 known “cloud types”.

Recall Yeast Cell Cycle Data
“Gene Expression”: micro-array data
Data (after major preprocessing): expression “level” of thousands of genes (d ~ 1,000s), but only dozens of “cases” (n ~ 10s)
Interesting statistical issue: High Dimension Low Sample Size data (HDLSS)

Yeast Cell Cycle Data, FDA View CellCycRaw.ps Central question: Which genes are “periodic” over 2 cell cycles?

Yeast Cell Cycle Data, FDA View Periodic genes? Naïve approach: Simple PCA spellman_alpah_complete_pca.ps

Yeast Cell Cycles, Freq. 2 Proj. PCA on Freq. 2 Periodic Component Of Data spellman_alpah_complete_proj_pca.ps

Frequency 2 Analysis
Project data onto 2-dim space of sin and cos (freq. 2).
Useful view: scatterplot. Angle (in polar coordinates) shows phase.
Approach from Zhao, Marron & Wells (2004)
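The projection described on this slide can be sketched as follows. This is an illustration only, assuming equally spaced time points that cover exactly two cell cycles, so that "frequency 2" means two full periods over the sampled window. The function name `freq2_projection` and the genes-by-time layout of the data matrix are hypothetical.

```python
import numpy as np

def freq2_projection(X):
    """Project each row (gene) of X onto the frequency-2 cos/sin pair.

    X has shape (n_genes, n_time). Time points are assumed equally spaced
    over a window that contains exactly two periods.
    Returns (a, b, radius, angle): the cos and sin coordinates, and the
    same point in polar coordinates, where angle is the phase.
    """
    X = np.asarray(X, dtype=float)
    n_time = X.shape[1]
    t = np.arange(n_time) / n_time           # fraction of the sampled window
    c = np.cos(2 * np.pi * 2 * t)
    s = np.sin(2 * np.pi * 2 * t)
    c /= np.linalg.norm(c)                   # unit-length basis vectors
    s /= np.linalg.norm(s)
    Xc = X - X.mean(axis=1, keepdims=True)   # remove each gene's mean level
    a = Xc @ c
    b = Xc @ s
    return a, b, np.hypot(a, b), np.arctan2(b, a)
```

Plotting `a` against `b` gives the scatterplot view: genes far from the origin have a strong frequency-2 component, and the angle of each point is its phase.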

Frequency 2 Analysis CellCyc2dSpellClass.ps

Frequency 2 Analysis
Project data onto 2-dim space of sin and cos (freq. 2).
Useful view: scatterplot. Angle (in polar coordinates) shows phase.
Colors: Spellman’s cell cycle phase classification. Black was labeled “not periodic”.
Within-class phases are approximately the same, but with notable differences.
Now try to improve “phase classification”.

Yeast Cell Cycle
Revisit “phase classification”. Approach:
Use outer 200 genes (other numbers tried, less resolution).
Study distribution of angles.
Use SiZer analysis (finds significant bumps, etc., in histogram).
Carefully redrew boundaries.
Check by studying kernel density estimate (k.d.e.) of angles.
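The selection and density steps listed above can be sketched as below. This is a hedged illustration, not the code behind the slides: it reads "outer" as "largest radius in the frequency-2 scatterplot", and it handles the circular nature of angles by replicating the data at plus and minus 2π before applying a Gaussian kernel. The names `outer_angles` and `circular_kde`, and the bandwidth `h`, are hypothetical.

```python
import numpy as np

def outer_angles(radius, angle, n_outer=200):
    """Indices and angles of the n_outer points with the largest radius."""
    radius = np.asarray(radius, dtype=float)
    angle = np.asarray(angle, dtype=float)
    idx = np.argsort(radius)[::-1][:n_outer]
    return idx, angle[idx]

def circular_kde(angles, grid, h):
    """Gaussian kernel density estimate for angles in [-pi, pi).

    Wrap-around is approximated by adding copies of the data shifted by
    -2*pi and +2*pi, which is adequate when h is small relative to 2*pi.
    """
    angles = np.asarray(angles, dtype=float)
    grid = np.asarray(grid, dtype=float)
    ext = np.concatenate([angles - 2 * np.pi, angles, angles + 2 * np.pi])
    z = (grid[:, None] - ext[None, :]) / h
    kern = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    return kern.sum(axis=1) / len(angles)
```

Bumps in the resulting density over the angle axis are candidate phase groups. Whether a bump is statistically significant is the question the SiZer step addresses, and this sketch does not cover that step.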

SiZer Study of Distribution of Angles CellCycTh200SiZer.ps

Reclassification of Major Genes CellCycTh200ScatPlot.ps

Compare to Previous Classification CellCyc2dSpellClass.ps

New Subpopulation View CellCycTh200KDE.ps

New Subpopulation View
Note: subdensities have the same bandwidth and proportional areas (so the areas sum to 1).
CellCycTh200KDE.ps
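The note about subdensities can be checked with a small sketch. It assumes a Gaussian kernel and takes "proportional areas" to mean that each class's density is scaled by that class's share of the data. Under that reading, the subdensities add up pointwise to the overall kernel density estimate, so their areas sum to 1. The names `kde` and `subdensities` are hypothetical.

```python
import numpy as np

def kde(x, grid, h):
    """Gaussian kernel density estimate of x, evaluated on grid."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(x) * h * np.sqrt(2 * np.pi))

def subdensities(x, labels, grid, h):
    """Per-class densities with a common bandwidth h.

    Each class density is weighted by its proportion n_k / n, so the
    returned curves sum to the density estimate of the pooled data.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n = len(x)
    result = {}
    for k in np.unique(labels):
        members = x[labels == k]
        result[k] = (len(members) / n) * kde(members, grid, h)
    return result
```

Using one bandwidth for every class is what makes the curves directly comparable: differences in height then reflect class size and spread, not different amounts of smoothing.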

Participant Presentation Jasmine Yang OODA of PNC Data