Published by Annabel Aileen McKenzie. Modified over 8 years ago.
1
Big Data
Javad Azimi
May 2015
2
First of all…
Sorry about the language
Feel free to ask any questions
Please share similar experiences
3
Outline
Why Big Data?
Hadoop and Next
Distributed Machine Learning Algorithms
A Few Projects to Share
4
Definition
Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
5
The Importance of Big Data
The real issue is not that you are acquiring large amounts of data. It's what you do with the data that counts. The hopeful vision is that organizations will be able to take data from any source, harness the relevant data, and analyze it to find answers that enable:
Cost reductions
Time reductions
New product development and optimized offerings
Smarter business decision making
6
The FOUR V’s of Big Data
From traffic patterns and music downloads to web history and medical records, data is recorded, stored, and analyzed to enable the technology and services that the world relies on every day. But what exactly is big data? According to IBM scientists, big data can be broken into four dimensions: Volume, Velocity, Variety, and Veracity.
8
Volume
Many factors contribute to the increase in data volume:
Transaction-based data stored through the years
Unstructured data streaming in from social media
Increasing amounts of sensor and machine-to-machine data being collected
In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
Lesson: We have enough data for many applications.
10
Variety
Data today comes in all types of formats:
Structured, numeric data in traditional databases
Information created from line-of-business applications
Unstructured text documents, email, video, audio, stock ticker data and financial transactions
Managing, merging and governing different varieties of data is something many organizations still grapple with.
Lesson: Data from different sources can be used for a specific target application.
12
Velocity
Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.
Lesson: Online processing time MUST be very quick!
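One way to keep online processing fast is to use single-pass algorithms whose cost per record does not grow with the amount of data seen so far. As an illustration (not something taken from the talk), here is a minimal sketch of Welford's algorithm, which maintains a running mean and variance in constant time and memory per incoming value. The `RunningStats` class name and the sample readings are hypothetical.

```python
import math


class RunningStats:
    """Single-pass mean/variance (Welford's algorithm): O(1) work per record."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the statistics, then discard it."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two observations.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)


if __name__ == "__main__":
    stats = RunningStats()
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # hypothetical sensor stream
        stats.update(reading)
    print(stats.n, stats.mean, stats.variance)
```

Because no past readings are stored, the same object can sit behind a stream of any length; the trade-off is that only statistics that can be updated incrementally fit this pattern.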
14
Veracity
Big Data veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed? Inderpal feels that veracity is the biggest challenge in data analysis when compared to things like volume and velocity. In scoping out your big data strategy, you need your team and partners to help keep your data clean, and processes in place to keep ‘dirty data’ from accumulating in your systems.
Lesson: Noise is different from minority data, and this is everyone's problem.
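To illustrate why noise and minority data are hard to tell apart, here is a minimal sketch (an assumption for illustration, not a method from the talk) that flags suspicious values with the modified z-score based on the median absolute deviation (MAD). The statistic cannot say whether a flagged value is a sensor glitch or a rare but genuine case, so the function only reports candidates for review instead of deleting them. The function name `flag_suspect_values` and the sample data are hypothetical; the 0.6745 constant and the 3.5 cut-off follow the commonly used modified z-score convention.

```python
from statistics import median


def flag_suspect_values(values, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    The median and MAD are far less distorted by the outliers being searched
    for than the mean and standard deviation. Flagged points are candidates
    for review, not automatic deletion: a rare but real value (minority data)
    is flagged exactly like noise.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure deviations against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 10, 95]  # hypothetical data with one extreme value
    print(flag_suspect_values(readings))
```

Whether index 6 is an error or the most interesting record in the set still requires domain knowledge, which is the point of the lesson above.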
15
Applications (1)
Netflix: Collaborative Filtering [figure: User × Movie matrix]
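Collaborative filtering predicts how a user would rate a movie from the ratings other users have given. One common family of approaches, popularized during the Netflix Prize, factorizes the sparse user × movie rating matrix into low-dimensional user and movie vectors whose dot product approximates each rating. The sketch below is a minimal stochastic-gradient version in plain Python; the toy ratings, the rank `k`, the learning rate, and the regularization strength are arbitrary assumptions, and nothing here describes Netflix's production system or the specific method used in the talk.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating.

    `ratings` is a list of (user, item, rating) triples for observed entries
    only; missing cells of the matrix are never used during training.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)  # visit observed ratings in random order each epoch
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(p * q for p, q in zip(P[u], Q[i]))


if __name__ == "__main__":
    # Hypothetical 4 users x 4 movies; two taste groups, some cells unrated.
    toy = [
        (0, 0, 5), (0, 1, 4), (0, 3, 1),
        (1, 0, 4), (1, 2, 1), (1, 3, 1),
        (2, 1, 1), (2, 2, 5), (2, 3, 4),
        (3, 0, 1), (3, 2, 4),
    ]
    P, Q = factorize(toy, n_users=4, n_items=4)
    print("user 0, unrated movie 2:", round(predict(P, Q, 0, 2), 2))
```

The learned vectors let the model fill in cells that were never rated, such as user 0 and movie 2, which is what a recommender ranks on. Production systems typically add per-user and per-item bias terms and use distributed solvers, but the core idea is the same.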
16
Applications (2)
17
Applications (3)
18
More…
19
Age detection using social media
Breakup estimation using Twitter data
Gender detection using profile avatars
Leveraging information across the web (entity detection using tables on the web)
Matching patients with drugs (CollabRX)
What can we do with Twitter?