Published by Nicolas Hoch. Modified over 6 years ago.
1
Beyond Machine Learning - What Is Hidden In Your Data?
Marvin Weinstein
2
Not Big - Complex
The problem with BIG data is not that it is big... it is that it is complex and:
It is noisy and full of artifacts
It contains irrelevant data
We don't know how to model it
It is dense
Complex: unstructured, coming from many sources, and we don't know how to model it. Being noisy and full of artifacts usually implies the need to clean the data, which is dangerous and introduces bias. Queries only find what we are looking for, so queries introduce bias. Queries and hypotheses can't find hidden, unexpected surprises.
3
A Paradigm Shift: Dynamic Quantum Clustering (DQC)
DQC provides an unbiased view of data and discovers hidden information without knowing in advance that there is something to be found. Key properties of DQC:
Data agnostic
Unbiased
No false positives
Visual
Maintains contact with the original data
Data agnostic: the analyst doesn't need subject-matter expertise before exploring the data.
Unbiased: no assumptions are made in advance, so DQC can be used to vet the data for usefulness.
Robust: works with raw data; no cleaning is necessary.
Sensitive: can find small outliers.
Ayasdi, among others, finds structure but has trouble relating it back to the original data.
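The deck does not give the mathematics behind DQC. As described in the published quantum clustering and DQC papers (Horn and Gottlieb; Weinstein and Horn), and summarized here from that literature rather than from this deck, each of the N data points x_i in d dimensions contributes a Gaussian of width sigma to a wave function psi. The Schrödinger equation is then solved for the potential V for which psi is an eigenfunction, with E chosen so that the minimum of V is zero:

```latex
\psi(\mathbf{x}) = \sum_{i=1}^{N} e^{-\lvert\mathbf{x}-\mathbf{x}_i\rvert^{2}/2\sigma^{2}},
\qquad
-\frac{\sigma^{2}}{2}\nabla^{2}\psi + V\psi = E\psi
\;\Longrightarrow\;
V(\mathbf{x}) = E - \frac{d}{2}
 + \frac{1}{2\sigma^{2}\,\psi(\mathbf{x})}\sum_{i=1}^{N}\lvert\mathbf{x}-\mathbf{x}_i\rvert^{2}\,
   e^{-\lvert\mathbf{x}-\mathbf{x}_i\rvert^{2}/2\sigma^{2}}
```

The minima of V sit in dense regions of the data. DQC then lets a Gaussian wave packet centered on each data point evolve in this fixed potential and follows the packet's center, which is why every moving point stays tied to exactly one original data point.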
4
How DQC Can Benefit Your Business
Cleaning data is unnecessary
No hypothesis generation
No domain knowledge required
Can validate data
Can identify structures hidden in data that we don't suspect and wouldn't know how to model
Cleaning data is hard, time-consuming and dangerous. Not having to clean your data makes complex projects doable and also gets analysts up to speed much faster and at much lower cost.
Hypothesis generation takes a lot of time and guesswork. It is hard and dangerous, in that an incorrect hypothesis that sort of works can negatively affect future analyses. Not having to form a working hypothesis before exploring the data lets the data speak for itself. It saves time and money and produces unexpected insights.
No domain knowledge means that an analyst doesn't need a lot of time to get up to speed before taking an initial look at the data.
An unbiased approach that can tell whether you are measuring the right things is a huge benefit. You need to know whether what you are measuring and storing is important; otherwise you could be building up a big data warehouse that contains no actionable information.
The value of being able to reveal hidden information that you don't know is there, and wouldn't know how to look for, is obvious. It is also important that the identity of the data points in a structure is never lost: all patterns are immediately translatable back to a subset of the original data.
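A minimal sketch of how a pattern can be translated back to a subset of the original data. It assumes that the evolution preserves row order, so row k of the evolved array is always the moved copy of original row k; the function names and the linking threshold `tol` are hypothetical. Points whose final positions are joined by a chain of neighbors closer than `tol` share a label (single-linkage grouping, chosen here because it keeps elongated threads together as well as compact clusters), and each label maps to a list of original row indices:

```python
import numpy as np


def link_components(final, tol):
    """Label points connected by chains of neighbours closer than tol."""
    n = len(final)
    dist = np.linalg.norm(final[:, None, :] - final[None, :, :], axis=-1)
    adjacent = dist < tol
    labels = np.full(n, -1, dtype=int)
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first flood fill of one connected component
            k = stack.pop()
            for j in np.flatnonzero(adjacent[k] & (labels == -1)):
                labels[j] = next_label
                stack.append(j)
        next_label += 1
    return labels


def members(labels):
    """Map each structure label to the original row indices it contains."""
    return {int(c): np.flatnonzero(labels == c).tolist() for c in np.unique(labels)}


# Hypothetical final positions after evolution (row k = original data row k).
final = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [5.02, 5.0], [0.0, 0.02]])
labels = link_components(final, tol=0.1)
groups = members(labels)
print(labels.tolist())  # [0, 0, 1, 1, 0]
print(groups)           # {0: [0, 1, 4], 1: [2, 3]}
```

Indexing the original data array with `groups[0]` then returns the untouched original records behind structure 0, so nothing about the pattern has to be reverse-engineered.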
5
DQC Works Across All Domains
DQC has succeeded in dealing with data from:
On-line gaming (player segmentation)
X-ray chemistry
Genomics (Alzheimer's data)
Proteomics (Glycophorin/Aquaporin data)
Homeland security (search for contraband nuclear material)
Hyperspectral data
6
What Does Dense Data Look Like?
[Figure panels: X-ray chemistry; on-line gaming; homeland security/agriculture; cosmology]
7
What Does DQC Reveal?
[Figure panels: Before DQC; After DQC (35 frames)]
This data is from the Sloan Digital Sky Survey. The points are galaxies, and the coordinates are real spatial coordinates, i.e. the actual angles on the sky and the distance to the galaxy (redshift). On the left is the original three-dimensional data; on the right is the movie generated by DQC evolution (35 frames), showing the galaxies being attracted to the nearest region of high density. The animation reveals the filaments and voids that one reads about in the NY Times.
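The deck shows the evolution only as a movie. The sketch below is a hedged classical stand-in, not DQC itself: it builds the quantum-clustering potential V from the data (Horn and Gottlieb's formula) and moves a copy of every point downhill by plain gradient descent, recording one frame per step, so points drift toward the nearest region of high density. Published DQC instead evolves a wave packet for each point under the time-dependent Schrödinger equation. The width `sigma`, the step size `lr`, and the function names are illustrative assumptions.

```python
import numpy as np


def potential_gradient(x, data, sigma):
    """Gradient of the quantum-clustering potential V at positions x.

    x:    (M, d) positions being moved
    data: (N, d) original data points; they define psi and V and never move
    """
    diff = x[:, None, :] - data[None, :, :]       # x - x_i, shape (M, N, d)
    r2 = np.sum(diff**2, axis=-1)                  # |x - x_i|^2, shape (M, N)
    logits = -r2 / (2 * sigma**2)
    logits -= logits.max(axis=1, keepdims=True)    # stability; cancels in w
    g = np.exp(logits)
    w = g / g.sum(axis=1, keepdims=True)           # w_i = g_i / psi
    m = w @ data                                   # weighted mean of the x_i
    to_xi = data[None, :, :] - m[:, None, :]       # x_i - m, shape (M, N, d)
    term = np.einsum("mn,mnd->md", w * r2, to_xi)  # sum_i w_i r_i^2 (x_i - m)
    return (term / sigma**2 + 2 * (x - m)) / (2 * sigma**2)


def evolve(data, sigma, n_frames=35, lr=0.2):
    """Return n_frames snapshots; row k of every frame tracks original row k.

    Assumption: gradient descent on V stands in for DQC's Schrödinger evolution.
    """
    x = data.astype(float).copy()
    frames = [x.copy()]
    for _ in range(n_frames - 1):
        # Scaling by sigma^2 makes lr a fraction of the distance to a lone minimum.
        x = x - lr * sigma**2 * potential_gradient(x, data, sigma)
        frames.append(x.copy())
    return frames


# Example: two well-separated blobs should each contract toward their own minimum.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(20, 2))
blob_b = rng.normal(10.0, 0.3, size=(20, 2))
points = np.vstack([blob_a, blob_b])
frames = evolve(points, sigma=1.0)
```

Because the potential is always built from the fixed original `data`, only the copies move; plotting `frames` in order gives a crude version of the movie on this slide.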
8
Hyperspectral Example
[Figure panels: After SVD (before DQC); After DQC (35 frames)]
Each data point represents one of ~600,000 spectra (the strength of reflected light from the quarry at each of ~600,000 pixels). On the left is one view of this dense data; on the right is the movie created by DQC evolution, revealing the complex structure hidden in this data. Note that after the rapid initial changes things slow down, and the complex final structure evolves very slowly, so there is no problem deciding when the evolution has essentially come to an end. Each shape in the final structure is important to the final interpretation of the data.
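The left panel is labeled "After SVD", which suggests the spectra were projected onto a few singular vectors before DQC was run. A minimal sketch of that step, together with one way to make "the evolution has essentially come to an end" operational: stop once no point moves farther than a tolerance between consecutive frames. The number of components `k`, the tolerance `tol`, skipping mean-centering, and the function names are all assumptions, not details given in the deck.

```python
import numpy as np


def svd_reduce(spectra, k):
    """Project each spectrum (one row per pixel) onto the top-k singular directions."""
    u, s, vt = np.linalg.svd(spectra, full_matrices=False)
    return u[:, :k] * s[:k]  # equals spectra @ vt[:k].T


def has_settled(frames, tol):
    """True once the largest single-step move between the last two frames is below tol."""
    if len(frames) < 2:
        return False
    step = np.linalg.norm(frames[-1] - frames[-2], axis=1)
    return bool(step.max() < tol)


# Tiny stand-in for a spectra matrix: 3 "pixels", 4 "bands", rank 2.
example = np.array([[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
coords = svd_reduce(example, 2)
```

When `k` is at least the rank of the data, the projection keeps every row's length, which is a quick sanity check that no information was thrown away; in practice `k` is chosen much smaller than the number of bands.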
9
Some Examples of Hidden Structure
[Figure panels: Thread, string, segment; Simple cluster structure]
10
What These Structures Mean
Colors come from individual threads. Note the dark blue on the right: this unusual distribution, which creates lines on the ground, corresponds to a single thread in the final structure. The black-and-white picture also shows that the ground is striated. That this is a single material, as indicated by its unique spectral signature, is a surprise.
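Because every thread maps back to its pixels, its spectral signature can be read straight off the original data. A minimal sketch, assuming per-pixel structure labels from some grouping step (the function names are hypothetical): average the original spectra of a structure's members, then measure each member's spectral angle to that average. The spectral angle is a standard hyperspectral similarity measure swapped in here for illustration; the deck does not say how signatures were compared.

```python
import numpy as np


def structure_signature(spectra, labels, label):
    """Mean original spectrum of all pixels assigned to one structure."""
    rows = np.flatnonzero(labels == label)
    return spectra[rows].mean(axis=0)


def spectral_angles(spectra, reference):
    """Angle in radians between each spectrum and a reference spectrum.

    The angle ignores overall brightness and compares only spectral shape,
    which is why it is often preferred to Euclidean distance for reflectance data.
    """
    dots = spectra @ reference
    norms = np.linalg.norm(spectra, axis=1) * np.linalg.norm(reference)
    cos = np.clip(dots / norms, -1.0, 1.0)
    return np.arccos(cos)


# Rows 0 and 1 share a shape (row 1 is just brighter); row 2 is a different shape.
example_spectra = np.array([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]])
example_labels = np.array([0, 0, 1])
sig = structure_signature(example_spectra, example_labels, 0)
angles = spectral_angles(example_spectra, sig)
```

Uniformly small angles across a structure are consistent with a single material; a wide spread of angles would instead suggest a mixture.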
11
How Can DQC Benefit You?
It can save time and money and produce better insights by:
Avoiding the need to clean the data
Avoiding time spent generating hypotheses before getting started
Identifying the important set of features to use in later analysis with conventional tools
Validating that your data contains the information you need
Finding hidden information that you wouldn't know to look for