15-826: Multimedia Databases and Data Mining

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
Advertisements

Eigen Decomposition and Singular Value Decomposition
CMU SCS : Multimedia Databases and Data Mining Lecture #20: SVD - part III (more case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
Dimensionality Reduction PCA -- SVD
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part II (more case studies) C. Faloutsos.
Principal Component Analysis
Indexing Time Series. Time Series Databases A time series is a sequence of real numbers, representing the measurements of a real variable at equal time.
Indexing Time Series Based on Slides by C. Faloutsos (CMU) and D. Gunopulos (UCR)
Multimedia and Text Indexing. Multimedia Data Management The need to query and analyze vast amounts of multimedia data (i.e., images, sound tracks, video.
Efficient Similarity Search in Sequence Databases Rakesh Agrawal, Christos Faloutsos and Arun Swami Leila Kaghazian.
Multimedia Databases SVD II. Optimality of SVD Def: The Frobenius norm of a n x m matrix M is (reminder) The rank of a matrix M is the number of independent.
Multimedia Databases SVD II. SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies SVD properties More case.
Multimedia DBs. Time Series Data
Multimedia Databases Text I. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image.
Spatial and Temporal Data Mining
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.
Multimedia Databases LSI and SVD. Text - Detailed outline text problem full text scanning inversion signature files clustering information filtering and.
E.G.M. PetrakisDimensionality Reduction1  Given N vectors in n dims, find the k most important axes to project them  k is user defined (k < n)  Applications:
Dimensionality Reduction
10-603/15-826A: Multimedia Databases and Data Mining Text - part II C. Faloutsos.
Spatial and Temporal Databases Efficiently Time Series Matching by Wavelets (ICDE 98) Kin-pong Chan and Ada Wai-chee Fu.
Indexing Time Series.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
Fast Subsequence Matching in Time-Series Databases Christos Faloutsos M. Ranganathan Yannis Manolopoulos Department of Computer Science and ISR University.
Overview of Kernel Methods Prof. Bennett Math Model of Learning and Discovery 2/27/05 Based on Chapter 2 of Shawe-Taylor and Cristianini.
Multimedia and Time-series Data
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
1 CSC 594 Topics in AI – Text Mining and Analytics Fall 2015/16 6. Dimensionality Reduction.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Data Mining Multimedia Databases Text databases Image and.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
Multimedia and Time-Series Data When Is “ Nearest Neighbor ” Meaningful? Group member: Terry Chan, Edward Chu, Dominic Leung, David Mak, Henry Yeung, Jason.
1. Systems of Linear Equations and Matrices (8 Lectures) 1.1 Introduction to Systems of Linear Equations 1.2 Gaussian Elimination 1.3 Matrices and Matrix.
Indexing Time Series. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Time Series databases Text databases.
CS654: Digital Image Analysis Lecture 11: Image Transforms.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
High-Dimensional Data. Topics Motivation Similarity Measures Index Structures.
1 Dongheng Sun 04/26/2011 Learning with Matrix Factorizations By Nathan Srebro.
Data statistics and transformation revision Michael J. Watts
ANALYSES OF THE CHAOTIC BEHAVIOR OF THE ELECTRICITY PRICE SERIES
Keogh, E. , Chakrabarti, K. , Pazzani, M. & Mehrotra, S. (2001)
Time Series Indexing II
Fast Subsequence Matching in Time-Series Databases.
Spatial Data Management
Text Based Information Retrieval
Instance Based Learning
Web News Sentence Searching Using Linguistic Graph Similarity
Indexing and Data Mining in Multimedia Databases
Self-Organizing Maps for Content-Based Image Database Retrieval
LSI, SVD and Data Management
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Graph and Tensor Mining for fun and profit
Lecture 15: Least Square Regression Metric Embeddings
15-826: Multimedia Databases and Data Mining
Latent Semantic Analysis
Presentation transcript:

15-826: Multimedia Databases and Data Mining C. Faloutsos 15-826 15-826: Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs 15-826 (c) 2017, C. Faloutsos

Indexing - similarity search 15-826 (c) 2017, C. Faloutsos

Indexing - similarity search R-trees z-ordering / hilbert curves M-trees (DON’T FORGET … ) A B C D E F G H I J P1 P2 P3 P4 15-826 (c) 2017, C. Faloutsos

Indexing - similarity search R-trees z-ordering / hilbert curves M-trees beware of high intrinsic dimensionality A B C D E F G H I J P1 P2 P3 P4 15-826 (c) 2017, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs 15-826 (c) 2017, C. Faloutsos

Text searching ‘find all documents with word bla’ 15-826 (c) 2017, C. Faloutsos

Text searching Full text scanning (‘grep’) Inversion (B-tree or hash index) signature files – Bloom filters Vector space model Ranked output Relevance feedback String editing distance (-> dynamic prog.) 15-826 (c) 2017, C. Faloutsos

Multimedia indexing day 1 365 S1 Sn 15-826 (c) 2017, C. Faloutsos

‘GEMINI’ - Pictorially day 1 365 S1 Sn eg,. std F(S1) F(Sn) eg, avg 15-826 (c) 2017, C. Faloutsos

Multimedia indexing Feature extraction for indexing (GEMINI) Lower-bounding lemma, to guarantee no false dismissals MDS/FastMap 15-826 (c) 2017, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs 15-826 (c) 2017, C. Faloutsos

Time series & forecasting Goal: given a signal (eg., sales over time and/or space) Find: patterns and/or compress count DFT year 15-826 (c) 2017, C. Faloutsos

Wavelets Q: baritone/silence/soprano - DWT? f ?? value t f time 15-826 (c) 2017, C. Faloutsos

Time series + forecasting Fourier; Wavelets Box/Jenkins and AutoRegression non-linear/chaotic forecasting (fractals again) ‘Delayed Coordinate Embedding’ ~ nearest neighbors xt-1 xt 15-826 (c) 2017, C. Faloutsos

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Points Text Time sequences; images etc Graphs 15-826 (c) 2017, C. Faloutsos

Graphs Real graphs: surprising patterns ?? 15-826 (c) 2017, C. Faloutsos

Graphs Real graphs: surprising patterns ‘six degrees’ Skewed degree distribution (‘rich get richer’) Super-linearities (2x nodes -> 3x edges ) Diameter: shrinks (!) Might have no good cuts 15-826 (c) 2017, C. Faloutsos

Graphs - SVD Hubs/Authorities (SVD on adjacency matrix) ‘meat-eaters’ users M products ‘meat-eaters’ ‘steaks’ ‘vegetarians’ ‘plants’ ‘kids’ ‘cookies’ ~ + 15-826 (c) 2017, C. Faloutsos

Graphs - PageRank Hubs/Authorities (SVD on adjacency matrix) PageRank (fixed point -> eigenvector) 15-826 (c) 2017, C. Faloutsos

Tensors Eg., time evolving graphs; Subject-verb-object triplets; etc = + subject object verb politicians artists athletes 15-826 (c) 2017, C. Faloutsos

Taking a step back: We saw some fundamental, recurring concepts and tools: 15-826 (c) 2017, C. Faloutsos

T1: Powerful, recurring tools Fractals/ self similarity 15-826 (c) 2017, C. Faloutsos

T1: Powerful, recurring tools Fractals/ self similarity <-> Power laws Zipf, Korcak, Pareto’s laws intrinsic dimension (Sierpinski triangle) correlation integral Barnsley’s IFS compression Kronecker graphs 15-826 (c) 2017, C. Faloutsos

T1: Powerful, recurring tools Fractals/ self similarity Zipf, Korcak, Pareto’s laws intrinsic dimension (Sierpinski triangle) correlation integral Barnsley’s IFS compression (Kronecker graphs) ‘Take logarithms’ mean-> meaningless Gaussian trap 15-826 (c) 2017, C. Faloutsos

T2: Powerful, recurring tools SVD (optimal L2 approx) v1 first singular vector 15-826 (c) 2017, C. Faloutsos

T2: Powerful, recurring tools SVD (optimal L2 approx) LSI, KL, PCA, ‘eigenSpokes’, (& in ICA ) HITS (PageRank) v1 first singular vector 15-826 (c) 2017, C. Faloutsos

T3: Powerful, recurring tools Discrete Fourier Transform Wavelets 15-826 (c) 2017, C. Faloutsos

T4: Powerful, recurring tools Matrix inversion lemma Recursive Least Squares Sherman-Morrison(-Woodbury) 15-826 (c) 2017, C. Faloutsos

Summary T1: fractals / power laws lead to startling discoveries ‘the mean may be meaningless’ Don’t assume Gaussian (average, k-means, etc) T2: SVD: behind PageRank/HITS/tensors/… T3: Wavelets: Nature seems to prefer them T4: RLS: matrix inversion, without inverting 15-826 (c) 2017, C. Faloutsos

Thank you! Feel free to contact me: Reminder: faculty course eval’s: C. Faloutsos 15-826 Thank you! Feel free to contact me: Cell#; christos@cs; GHC 8019 Reminder: faculty course eval’s: www.cmu.edu/hub/fce/ Final: as announced in Hub Mo 5/8/2017, 8:30-11:30am, DH 2315 (double-check with www.cmu.edu/hub/docs/final-exams.pdf ) Have a great summer! ‘Take logarithms’ mean -> meaningless Gaussian trap 15-826 (c) 2017, C. Faloutsos