1
Recommendation Engines & Accumulo - Sqrrl
Data Science Group May 21, 2013
2
Agenda
6:15 - 6:30 More data trumps better algorithms by Michael Walker
6:30 - 7:30 Recommendation Engines by Tom Rampley
7:30 - 8:30 Accumulo - Sqrrl by John Dougherty
8:30 - 9:30 Network at Old Chicago at 14th and Market
3
Data Science Group New Sponsors
Cloudera
O'Reilly Media
6
More data is better
Even if less exact or messier
11
One sensor = strict accuracy
Multiple sensors = less accurate & messy
More data points = greater value
Aggregate = more comprehensive picture
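The sensor slide can be illustrated with a small simulation. This is only a sketch under stated assumptions: the true value, the noise levels, and the function name `mean_abs_error` are hypothetical, and the noise is assumed to be independent and unbiased across sensors. If the messy sensors shared a common bias, averaging them would not remove it.

```python
import random
import statistics

TRUE_VALUE = 20.0  # hypothetical quantity being measured


def mean_abs_error(n_sensors, noise_sd, trials=2000, seed=0):
    """Average absolute error of the mean of n_sensors noisy readings."""
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        # Each sensor reports the true value plus independent Gaussian noise.
        readings = [rng.gauss(TRUE_VALUE, noise_sd) for _ in range(n_sensors)]
        errors.append(abs(statistics.mean(readings) - TRUE_VALUE))
    return statistics.mean(errors)


# One accurate sensor versus 100 sensors that are each four times noisier.
one_precise = mean_abs_error(n_sensors=1, noise_sd=0.5)
many_messy = mean_abs_error(n_sensors=100, noise_sd=2.0)
print(f"one precise sensor: {one_precise:.3f}")
print(f"100 messy sensors:  {many_messy:.3f}")
```

Under these assumptions the aggregate of the messy sensors lands closer to the true value than the single precise sensor, because the error of an average shrinks roughly with the square root of the number of independent readings.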
17
Increase frequency of sensor readings
One measure per min = accurate
100 readings per second = less accurate
> volume vs. exactitude
Accept messiness to get scale
20
Sacrifice accuracy in return for knowing general trend
Big data = probabilistic (not precise)
Good yet has problems
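As a rough illustration of trading per-reading accuracy for a reliable trend, the sketch below fits a least-squares line through many noisy high-frequency readings. The drift rate, noise level, sampling window, and the helper `fit_slope` are all hypothetical choices made for this example, and the noise is assumed to be independent with zero mean.

```python
import random


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(1)
TRUE_SLOPE = 0.05  # hypothetical drift per second
NOISE_SD = 1.0     # hypothetical noise on every single reading

# 100 readings per second for 60 seconds, each one individually unreliable.
times = [i / 100 for i in range(6000)]
noisy = [TRUE_SLOPE * t + rng.gauss(0.0, NOISE_SD) for t in times]

estimated = fit_slope(times, noisy)
print(f"true slope {TRUE_SLOPE}, estimated slope {estimated:.4f}")
```

No single reading here is trustworthy, since its noise is far larger than the drift over one second, yet the fitted slope over the whole window recovers the general trend closely.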
21
Internet of Things
27
Internet of Things
"Data Science" means the scientific study of the creation, manipulation and transformation of data to create meaning.
28
Internet of Things
"Data Scientist" means a professional who uses scientific methods to liberate and create meaning from raw data.
29
Internet of Things
"Big Data" means large data sets that have different properties from small data sets, require special data science methods to differentiate signal from noise and extract meaning, and require special compute systems and power.
30
Data Science
32
Data Science
"Signal" means a meaningful interpretation of data based on science that may be transformed into scientific evidence and knowledge.
33
Data Science
"Noise" means a competing interpretation of data not grounded in science that may not be considered scientific evidence. Yet noise may be manipulated into a form of knowledge (what does not work).
37
Machine Learning
Field of study that gives computers the ability to learn without being explicitly programmed.
41
Algorithms
Process or set of rules to be followed in calculations or other problem-solving operations to achieve a goal, especially a mathematical rule or procedure used to compute a desired result, produce the answer to a question or the solution to a problem in a finite number of steps.
42
More data trumps better algorithms
46
More data trumps better algorithms
Microsoft Word Grammar Checker
Improve algorithms
New techniques
New features
49
More data trumps better algorithms
Feed more data into existing methods
Most ML algorithms: one million words or less
Experiment: 10 million - 1 billion words
50
More data trumps better algorithms
Results: algorithms improved dramatically
The simple algorithm that was the worst performer with 1/2 million words performed better than all others with 1 billion words
The algorithm that worked best with 1/2 million words performed worst with 1 billion words
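The effect of feeding more data into an unchanged method can be sketched as a learning curve. This is not the experiment from the slides: the data is synthetic, and the task, the constants `VOCAB` and `LABEL_NOISE`, and every function name below are hypothetical. The learner is deliberately simple: it memorizes the majority label for each context it has seen and falls back to the overall majority label otherwise.

```python
import random
from collections import Counter, defaultdict

VOCAB = 40         # hypothetical number of distinct values per context slot
LABEL_NOISE = 0.1  # hypothetical fraction of flipped labels


def true_label(a, b):
    # Arbitrary hidden rule, used only to generate synthetic data.
    return (a * 7 + b * 3) % 2


def sample(rng, n):
    """Draw n (context, label) pairs with noisy labels."""
    data = []
    for _ in range(n):
        a, b = rng.randrange(VOCAB), rng.randrange(VOCAB)
        y = true_label(a, b)
        if rng.random() < LABEL_NOISE:
            y = 1 - y
        data.append(((a, b), y))
    return data


def train(data):
    """Count labels per context and remember the overall majority label."""
    table = defaultdict(Counter)
    for ctx, y in data:
        table[ctx][y] += 1
    overall = Counter(y for _, y in data)
    return table, overall.most_common(1)[0][0]


def predict(model, ctx):
    table, fallback = model
    if ctx in table:
        return table[ctx].most_common(1)[0][0]
    return fallback  # unseen context: guess the overall majority label


def accuracy(model, data):
    return sum(predict(model, ctx) == y for ctx, y in data) / len(data)


def learning_curve(sizes, seed=0):
    rng = random.Random(seed)
    test = sample(rng, 2000)
    return [accuracy(train(sample(rng, n)), test) for n in sizes]


sizes = [100, 1_000, 10_000, 100_000]
curve = learning_curve(sizes)
for n, acc in zip(sizes, curve):
    print(f"{n:>7} training examples: accuracy {acc:.2f}")
```

The algorithm never changes between runs; only the amount of training data does. On this synthetic task the accuracy climbs from near chance toward the ceiling set by the label noise as the training set grows.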
51
More data trumps better algorithms
Conclusions:
More trumps less
More trumps smarter (not always)
52
More data trumps better algorithms
Tradeoff between spending time and money on algorithm development versus spending it on data development
53
More data trumps better algorithms
Google language translation
1 billion words
1 trillion words
Larger yet messier data set - entire internet