15-826: Multimedia Databases and Data Mining

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
Advertisements

Eigen Decomposition and Singular Value Decomposition
Eigen Decomposition and Singular Value Decomposition
CMU SCS : Multimedia Databases and Data Mining Lecture #20: SVD - part III (more case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU
Dimensionality Reduction PCA -- SVD
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part II (more case studies) C. Faloutsos.
Indexing Time Series. Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time.
Indexing Time Series Based on Slides by C. Faloutsos (CMU) and D. Gunopulos (UCR)
Multimedia and Text Indexing. Multimedia Data Management The need to query and analyze vast amounts of multimedia data (i.e., images, sound tracks, video.
Multimedia Databases SVD II. Optimality of SVD Def: The Frobenius norm of a n x m matrix M is (reminder) The rank of a matrix M is the number of independent.
Multimedia Databases SVD II. SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies SVD properties More case.
Multimedia Databases Text I. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image.
Spatial and Temporal Data Mining
Clustering In Large Graphs And Matrices Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay Presented by Eric Anderson.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.
Multimedia Databases LSI and SVD. Text - Detailed outline text problem full text scanning inversion signature files clustering information filtering and.
Singular Value Decomposition and Data Management
E.G.M. PetrakisDimensionality Reduction1  Given N vectors in n dims, find the k most important axes to project them  k is user defined (k < n)  Applications:
Dimensionality Reduction
10-603/15-826A: Multimedia Databases and Data Mining Text - part II C. Faloutsos.
Spatial and Temporal Databases Efficiently Time Series Matching by Wavelets (ICDE 98) Kin-pong Chan and Ada Wai-chee Fu.
Dimensionality Reduction. Multimedia DBs Many multimedia applications require efficient indexing in high-dimensions (time-series, images and videos, etc)
Indexing Time Series.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
Overview of Kernel Methods Prof. Bennett Math Model of Learning and Discovery 2/27/05 Based on Chapter 2 of Shawe-Taylor and Cristianini.
Multimedia and Time-series Data
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image and.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
Multimedia and Time-Series Data When Is “ Nearest Neighbor ” Meaningful? Group member: Terry Chan, Edward Chu, Dominic Leung, David Mak, Henry Yeung, Jason.
1. Systems of Linear Equations and Matrices (8 Lectures) 1.1 Introduction to Systems of Linear Equations 1.2 Gaussian Elimination 1.3 Matrices and Matrix.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Time Series databases Text databases.
CS654: Digital Image Analysis Lecture 11: Image Transforms.
IMinMax B.C. Ooi, K.-L Tan, C. Yu, S. Stephen. Indexing the Edges -- A Simple and Yet Efficient Approach to High dimensional Indexing. ACM SIGMOD-SIGACT-
Summer School on Hashing’14 Dimension Reduction Alex Andoni (Microsoft Research)
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Data statistics and transformation revision Michael J. Watts
ANALYSES OF THE CHAOTIC BEHAVIOR OF THE ELECTRICITY PRICE SERIES
Keogh, E. , Chakrabarti, K. , Pazzani, M. & Mehrotra, S. (2001)
Spatial Data Management
Text Based Information Retrieval
Indexing and Data Mining in Multimedia Databases
LSI, SVD and Data Management
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Lecture 7: Dynamic sampling Dimension Reduction
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Lecture 16: Earth-Mover Distance
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Graph and Tensor Mining for fun and profit
Lecture 15: Least Square Regression Metric Embeddings
15-826: Multimedia Databases and Data Mining
Latent Semantic Analysis
Presentation transcript:

15-826: Multimedia Databases and Data Mining C. Faloutsos 15-826 15-826: Multimedia Databases and Data Mining Lecture #32: Conclusions C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs Data Mining 15-826 (c) 2016, C. Faloutsos

Indexing - similarity search 15-826 (c) 2016, C. Faloutsos

Indexing - similarity search R-trees z-ordering / hilbert curves M-trees (DON’T FORGET … ) A B C D E F G H I J P1 P2 P3 P4 15-826 (c) 2016, C. Faloutsos

Indexing - similarity search R-trees z-ordering / hilbert curves M-trees beware of high intrinsic dimensionality A B C D E F G H I J P1 P2 P3 P4 15-826 (c) 2016, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs Data Mining 15-826 (c) 2016, C. Faloutsos

Text searching ‘find all documents with word bla’ 15-826 (c) 2016, C. Faloutsos

Text searching Full text scanning (‘grep’) Inversion (B-tree or hash index) (signature files) Vector space model Ranked output Relevance feedback String editing distance (-> dynamic prog.) 15-826 (c) 2016, C. Faloutsos

Multimedia indexing day 1 365 S1 Sn 15-826 (c) 2016, C. Faloutsos

‘GEMINI’ - Pictorially day 1 365 S1 Sn eg,. std F(S1) F(Sn) eg, avg 15-826 (c) 2016, C. Faloutsos

Multimedia indexing Feature extraction for indexing (GEMINI) Lower-bounding lemma, to guarantee no false alarms MDS/FastMap 15-826 (c) 2016, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs Data Mining 15-826 (c) 2016, C. Faloutsos

Time series & forecasting Goal: given a signal (eg., sales over time and/or space) Find: patterns and/or compress count lynx caught per year year 15-826 (c) 2016, C. Faloutsos

Wavelets Q: baritone/silence/soprano - DWT? f t value time 15-826 (c) 2016, C. Faloutsos

Time series + forecasting Fourier; Wavelets Box/Jenkins and AutoRegression non-linear/chaotic forecasting (fractals again) ‘Delayed Coordinate Embedding’ ~ nearest neighbors 15-826 (c) 2016, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs Data Mining 15-826 (c) 2016, C. Faloutsos

Graphs Real graphs: surprising patterns ?? 15-826 (c) 2016, C. Faloutsos

Graphs Real graphs: surprising patterns ‘six degrees’ Skewed degree distribution (‘rich get richer’) Super-linearities (2x nodes -> 3x edges ) Diameter: shrinks (!) Might have no good cuts 15-826 (c) 2016, C. Faloutsos

Graphs - SVD Hubs/Authorities (SVD on adjacency matrix) PageRank (fixed point -> eigenvector) 15-826 (c) 2016, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining 15-826 (c) 2016, C. Faloutsos

Data Mining - DB 15-826 (c) 2016, C. Faloutsos

Data Mining - DB Association Rules (‘diapers’ -> ‘beer’ ) 15-826 (c) 2016, C. Faloutsos

Taking a step back: We saw some fundamental, recurring concepts and tools: 15-826 (c) 2016, C. Faloutsos

T1: Powerful, recurring tools Fractals/ self similarity 15-826 (c) 2016, C. Faloutsos

T1: Powerful, recurring tools Fractals/ self similarity Zipf, Korcak, Pareto’s laws intrinsic dimension (Sierpinski triangle) correlation integral Barnsley’s IFS compression (Kronecker graphs) 15-826 (c) 2016, C. Faloutsos

T1: Powerful, recurring tools Fractals/ self similarity Zipf, Korcak, Pareto’s laws intrinsic dimension (Sierpinski triangle) correlation integral Barnsley’s IFS compression (Kronecker graphs) ‘Take logarithms’ mean-> meaningless 15-826 (c) 2016, C. Faloutsos

T2: Powerful, recurring tools SVD (optimal L2 approx) v1 first singular vector 15-826 (c) 2016, C. Faloutsos

T2: Powerful, recurring tools SVD (optimal L2 approx) LSI, KL, PCA, ‘eigenSpokes’, (& in ICA ) HITS (PageRank) v1 first singular vector 15-826 (c) 2016, C. Faloutsos

T3: Powerful, recurring tools Discrete Fourier Transform Wavelets 15-826 (c) 2016, C. Faloutsos

T4: Powerful, recurring tools Matrix inversion lemma Recursive Least Squares Sherman-Morrison(-Woodbury) 15-826 (c) 2016, C. Faloutsos

Summary T1: fractals / power laws probably lead to the most startling discoveries (‘the mean may be meaningless’) T2: SVD: behind PageRank/HITS/tensors/… T3: Wavelets: Nature seems to prefer them T4: RLS: matrix inversion, without inverting [T5: approximate counting (do the impossible!) ] 15-826 (c) 2016, C. Faloutsos

Thank you! Feel free to contact me: Reminder: faculty course eval’s: C. Faloutsos 15-826 Thank you! Feel free to contact me: Cell#; christos@cs; GHC 8019 Reminder: faculty course eval’s: www.cmu.edu/hub/fce/ Final: as announced in Hub Mo 5/9/2016, 8:30-11:30am, WEH 7500 (double-check with www.cmu.edu/hub/docs/final-exams.pdf ) Have a great summer! ‘Take logarithms’ mean -> meaningless 15-826 (c) 2016, C. Faloutsos