Multimedia Data Mining
Arvind Balasubramanian
Multimedia Lab (ECSS 4.416)
The University of Texas at Dallas
Me and My Research
Research Interests:
– Machine Learning
– Data Mining
– Statistical Analysis
– Applications of the above in Multimedia
I am currently working on:
– developing a clustering algorithm guided by statistical analysis
– deriving a composite grading scale for speech and language disorders, in collaboration with the UTD Callier Center
Data Mining and Multimedia
Uncovering hidden information in data.
Exploiting data to obtain new knowledge and interpret results.
Wide-ranging applications in multimedia.
Data Mining Techniques
– Classification
– Prediction
– Cluster Analysis & Class Discovery
– Extraction and Retrieval
– Statistical Analysis
Ideas for Projects
Text Mining
Information Extraction from Domain-Specific Documents
– involves extracting data from free-text passages and populating a database
– serves to organize required information that is available only in unorganized form
– not enough in itself; combine with class discovery
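A minimal sketch of rule-based information extraction, assuming hand-written regular expressions and a made-up corpus of sentences about songs: each sentence is scanned for a title, an artist and a year, and only complete records are inserted into an in-memory SQLite table. The patterns, the `songs` table and the sample sentences are hypothetical; real domain-specific extraction would need far richer patterns or a learned extractor.

```python
import re
import sqlite3

# Hypothetical free-text records; a real project would read a domain-specific corpus.
DOCS = [
    "The song Blue Sky was released in 1999 by Ana Lee.",
    "Released in 2004 by The Owls, the song Night Train topped the charts.",
    "This sentence mentions no song at all.",
]

# Hand-written patterns (an assumption; they only cover the phrasing used above).
CAPS = r"((?:[A-Z][a-z]+ ?)+)"          # a run of capitalised words
TITLE = re.compile(r"\bsong " + CAPS)
ARTIST = re.compile(r"\bby " + CAPS)
YEAR = re.compile(r"\b((?:19|20)\d{2})\b")

def extract(text):
    """Return (title, artist, year) if every field is found, else None."""
    matches = [p.search(text) for p in (TITLE, ARTIST, YEAR)]
    if not all(matches):
        return None
    title, artist, year = (m.group(1).strip() for m in matches)
    return title, artist, int(year)

# Populate a hypothetical structured table from the unstructured text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (title TEXT, artist TEXT, year INTEGER)")
for doc in DOCS:
    row = extract(doc)
    if row is not None:
        conn.execute("INSERT INTO songs VALUES (?, ?, ?)", row)
rows = conn.execute("SELECT title, artist, year FROM songs ORDER BY year").fetchall()
print(rows)
```

The third sentence yields no record, which illustrates why extraction alone leaves gaps that class discovery has to address.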
Ideas for Projects
Text Mining (contd.)
New Class Discovery using Clustering Techniques
– identifying groups of keywords that do not fall into known categories
– creating new categories and validating them
– possibly employ clustering algorithms with an appropriate similarity measure or distance function
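One way to sketch class discovery, assuming each uncategorized keyword is represented by the set of documents it occurs in: Jaccard similarity serves as the similarity measure, and single-linkage clustering merges any two keywords whose similarity reaches a threshold. The keywords, document ids and the 0.5 threshold are illustrative assumptions; each resulting group is only a candidate category that still has to be validated.

```python
from itertools import combinations

# Hypothetical input: the documents each uncategorised keyword appears in.
KEYWORD_DOCS = {
    "tempo": {1, 2, 3},
    "rhythm": {1, 2, 3, 4},
    "beat": {2, 3, 4},
    "pixel": {7, 8},
    "resolution": {7, 8, 9},
    "codec": {12},
}

def jaccard(a, b):
    """Similarity of two keywords = overlap of their document sets."""
    return len(a & b) / len(a | b)

def cluster(keyword_docs, threshold=0.5):
    """Single-linkage clustering: link any pair whose similarity >= threshold."""
    parent = {k: k for k in keyword_docs}          # union-find forest

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]          # path halving
            k = parent[k]
        return k

    for a, b in combinations(keyword_docs, 2):
        if jaccard(keyword_docs[a], keyword_docs[b]) >= threshold:
            parent[find(a)] = find(b)              # merge the two groups

    groups = {}
    for k in keyword_docs:
        groups.setdefault(find(k), set()).add(k)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])

print(cluster(KEYWORD_DOCS))
```

Raising the threshold yields smaller, tighter candidate categories; choosing it is part of validating the discovered classes.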
Ideas for Projects
Text Mining (contd.)
Query-Based Document Retrieval System
– employ one of several base models, such as a probabilistic model or a vector space model
– design an efficient indexing system
– include a relevance-ranking feature
– possibly make the system intelligent using machine learning techniques
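A small vector space retrieval sketch, assuming whitespace tokenization and TF-IDF weighting with idf = log(N / df): an inverted index maps each term to the documents containing it, so a query only touches matching postings, and results are ranked by cosine similarity. The corpus and document ids are made up; a real system would add stemming, stop-word removal and a persistent index.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus; ids and texts are made up for illustration.
DOCS = {
    "d1": "jazz piano recording live",
    "d2": "classical piano sonata",
    "d3": "live rock concert video",
}

def build_index(docs):
    """Inverted index: term -> {doc_id: term frequency}, plus IDF weights."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.split()).items():
            index[term][doc_id] = tf
    n = len(docs)
    idf = {t: math.log(n / len(postings)) for t, postings in index.items()}
    return index, idf

def doc_norms(index, idf):
    """Euclidean length of each document's TF-IDF vector (for cosine scoring)."""
    sq = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            sq[doc_id] += (tf * idf[term]) ** 2
    return {d: math.sqrt(s) for d, s in sq.items()}

def search(query, index, idf, norms):
    """Rank documents by cosine similarity of TF-IDF query and document vectors."""
    q = Counter(t for t in query.split() if t in index)
    scores = defaultdict(float)
    for term, qtf in q.items():
        w_q = qtf * idf[term]
        for doc_id, tf in index[term].items():   # only docs containing the term
            scores[doc_id] += w_q * tf * idf[term]
    q_norm = math.sqrt(sum((qtf * idf[t]) ** 2 for t, qtf in q.items()))
    ranked = [(d, s / (norms[d] * q_norm)) for d, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda x: x[1], reverse=True)

index, idf = build_index(DOCS)
norms = doc_norms(index, idf)
print(search("live piano", index, idf, norms))
```

For the query "live piano", `d1` ranks first because it matches both query terms; the other two documents each match only one.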
Ideas for Projects
Pattern Recognition in Multimedia Data
Scope
– analyze and identify interrelationships within multimedia data sets
– derive a composite score from several different sub-scores
Methods
– classic techniques such as Principal Component Analysis (PCA) and Factor Analysis (FA)
– statistical methods such as regression analysis
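Deriving a composite score can be sketched as a weighted sum of standardized sub-scores, so that sub-scores measured on different scales contribute comparably. The sub-scores and weights below are hypothetical; PCA, FA or regression would be ways of choosing the weights rather than fixing them by hand.

```python
import statistics

# Hypothetical sub-scores for four subjects on three made-up tests.
SUB_SCORES = {
    "test_a": [10, 12, 14, 16],
    "test_b": [50, 55, 60, 65],
    "test_c": [3, 3, 4, 6],
}
# Assumed weights; in practice PCA, FA or regression could supply them.
WEIGHTS = {"test_a": 0.5, "test_b": 0.3, "test_c": 0.2}

def z_scores(values):
    """Standardise so sub-scores on different scales become comparable."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def composite(sub_scores, weights):
    """Composite score per subject = weighted sum of standardised sub-scores."""
    z = {name: z_scores(vals) for name, vals in sub_scores.items()}
    n = len(next(iter(sub_scores.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]

print([round(c, 3) for c in composite(SUB_SCORES, WEIGHTS)])
```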
Ideas for Projects
Pattern Recognition in Multimedia Data (contd.)
Methods – Principal Component Analysis (PCA)
(a) Dimensionality reduction
(b) Efficient storage and retrieval of media data
(c) Applications in any multi-dimensional media: images (noise reduction), video (content analysis), audio (voice signature recognition)
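A pure-Python PCA sketch, assuming a small made-up set of 2-D feature vectors: the data are mean-centred, the covariance matrix is computed, and power iteration finds the first principal component, onto which each point is projected to reduce it to a single number. Power iteration is used here only to stay dependency-free; it assumes a clear gap between the two largest eigenvalues, and a library SVD would be the usual choice in practice.

```python
import math

def mean_center(rows):
    """Subtract the per-column mean so the data are centred at the origin."""
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Power iteration: converges to the eigenvector of the largest eigenvalue."""
    v = [1.0] * len(cov)
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pca_1d(rows):
    """Reduce each row to one number: its projection on the first component."""
    centred = mean_center(rows)
    pc = top_component(covariance(centred))
    return pc, [sum(a * b for a, b in zip(r, pc)) for r in centred]

# Hypothetical 2-D feature vectors (e.g. two correlated media features).
data = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]]
pc, scores = pca_1d(data)
print([round(x, 3) for x in pc])
```

The 1-D `scores` keep more of the variance than either original feature alone, which is what makes the reduced representation useful for storage and retrieval.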
Ideas for Projects
Pattern Recognition in Multimedia Data (contd.)
Methods – Factor Analysis (FA)
(a) Minimize data redundancy
(b) Reveal hidden patterns
(c) Combine attributes into a single attribute by determining the importance and contribution of each attribute
(d) Applications: medical analysis, IQ tests, personality tests, software measurement, multimedia content analysis, motion capture data analysis
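A sketch of one-factor iterated principal-axis factoring, one of several FA extraction methods: the diagonal of the correlation matrix is replaced by communality estimates, the leading eigenvector gives the loadings, and the communalities are updated until they settle. The correlation matrix is hypothetical, constructed from assumed loadings of 0.9, 0.8, 0.7 and 0.3 so that a single factor fits exactly; the recovered loadings indicate how much each attribute contributes to the combined attribute.

```python
import math

# Hypothetical correlation matrix for four attributes, built from assumed
# loadings (0.9, 0.8, 0.7, 0.3) so that a one-factor model fits exactly.
R = [
    [1.00, 0.72, 0.63, 0.27],
    [0.72, 1.00, 0.56, 0.24],
    [0.63, 0.56, 1.00, 0.21],
    [0.27, 0.24, 0.21, 1.00],
]

def top_eigen(m, iters=100):
    """Power iteration: (largest eigenvalue, unit eigenvector) of a symmetric matrix."""
    p = len(m)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v

def one_factor_paf(r, outer=200):
    """Iterated principal-axis factoring, extracting a single common factor."""
    p = len(r)
    # Starting communality: largest absolute correlation with another attribute.
    h2 = [max(abs(r[i][j]) for j in range(p) if j != i) for i in range(p)]
    loadings = [0.0] * p
    for _ in range(outer):
        # Replace the unit diagonal with the current communality estimates.
        reduced = [[h2[i] if i == j else r[i][j] for j in range(p)] for i in range(p)]
        lam, v = top_eigen(reduced)
        loadings = [math.sqrt(lam) * x for x in v]
        # New communality = squared loading (capped at 1 as a crude Heywood guard).
        h2 = [min(l * l, 1.0) for l in loadings]
    return loadings

print([round(l, 3) for l in one_factor_paf(R)])
```

On this matrix the iteration should recover loadings close to the assumed ones, with the fourth attribute contributing least to the common factor.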
Ideas for Projects
Pattern Recognition in Multimedia Data (contd.)
Methods – Statistical Analysis
(a) Correlation analysis to bring out interrelationships between data attributes
(b) Regression analysis to assess how well a set of data attributes predicts other data attributes
Ideas for Projects
Prediction and Suggestion Systems
An intelligent media hosting application that
– learns from user queries and requests, and accordingly suggests other media items
– suggested items would be retrieved by querying on the media items' features and metadata
– Example: Esnips music hosting
– many machine learning techniques could be employed, e.g. Bayesian reasoning and classification algorithms
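A sketch of the Bayesian classification idea, assuming the application records categorical features (here a hypothetical genre, tempo and era) of items a user liked or did not like: a naive Bayes classifier with add-one smoothing scores each candidate item, and items are suggested in order of how much more probable "liked" is than "not liked". The history, catalogue and feature names are all illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical listening history: item features plus whether the user liked it.
HISTORY = [
    ({"genre": "jazz", "tempo": "slow", "era": "60s"}, True),
    ({"genre": "jazz", "tempo": "fast", "era": "60s"}, True),
    ({"genre": "rock", "tempo": "fast", "era": "90s"}, False),
    ({"genre": "pop", "tempo": "fast", "era": "90s"}, False),
    ({"genre": "jazz", "tempo": "slow", "era": "90s"}, True),
]

# Hypothetical catalogue items the system could suggest next.
CANDIDATES = {
    "track_a": {"genre": "jazz", "tempo": "slow", "era": "60s"},
    "track_b": {"genre": "rock", "tempo": "fast", "era": "90s"},
    "track_c": {"genre": "pop", "tempo": "slow", "era": "60s"},
}

def train(history):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(label for _, label in history)
    value_counts = defaultdict(Counter)        # (label, feature) -> value counts
    values = defaultdict(set)                  # feature -> observed values
    for feats, label in history:
        for f, v in feats.items():
            value_counts[(label, f)][v] += 1
            values[f].add(v)
    return class_counts, value_counts, values

def log_posterior(feats, label, model):
    """log P(label) + sum of log P(value | label), with add-one smoothing."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    score = math.log(class_counts[label] / total)
    for f, v in feats.items():
        count = value_counts[(label, f)][v]
        score += math.log((count + 1) / (class_counts[label] + len(values[f])))
    return score

def suggest(candidates, model):
    """Rank candidates by log-odds of 'liked' versus 'not liked'."""
    margin = {name: log_posterior(feats, True, model) - log_posterior(feats, False, model)
              for name, feats in candidates.items()}
    return sorted(margin, key=margin.get, reverse=True)

model = train(HISTORY)
print(suggest(CANDIDATES, model))
```

Retraining the counts after each new request is cheap, which is one reason naive Bayes suits a system that keeps learning from user queries.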
Ideas for Projects
Ideas for alternative projects involving applications of machine learning, data mining and statistical analysis in the multimedia domain are welcome.
Tools
– Weka, Matlab, statistical software packages (even Excel helps a lot!)
Thank You