Mohammad J. Mansourzadeh

Presentation transcript:

Big Data. Mohammad J. Mansourzadeh

Big Data. What is Big Data? Analog storage vs. digital. The FOUR V’s of Big Data. Who’s generating Big Data? The importance of Big Data. Optimization. HDFS.

Definition. Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.

The FOUR V’s of Big Data. From traffic patterns and music downloads to web history and medical records, data is recorded, stored, and analyzed to enable the technology and services that the world relies on every day. But what exactly is big data, and how can it be used? According to IBM scientists, big data can be broken down into four dimensions: Volume, Velocity, Variety and Veracity.

The FOUR V’s of Big Data: Volume. Many factors contribute to the increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. But with decreasing storage costs, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.

The FOUR V’s of Big Data: Variety. Data today comes in all types of formats: structured, numeric data in traditional databases; information created from line-of-business applications; unstructured text documents, email, video, audio, stock ticker data and financial transactions. Managing, merging and governing different varieties of data is something many organizations still grapple with.

The FOUR V’s of Big Data: Velocity. Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations.

The FOUR V’s of Big Data: Veracity. Big data veracity refers to the biases, noise and abnormality in data. Is the data that is being stored and mined meaningful to the problem being analyzed? Veracity in data analysis is the biggest challenge when compared with things like volume and velocity. In scoping out your big data strategy, you need your team and partners to help keep your data clean, and you need processes that keep ‘dirty data’ from accumulating in your systems.
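
As a rough illustration of a process that keeps dirty data out, the sketch below separates records into clean and dirty sets before they are stored. The field names (`id`, `value`), the valid range and the function names are hypothetical choices for this example and do not come from the slides.

```python
def is_clean(record, required_fields, valid_range):
    """Return True if every required field is present and `value` is a number in range."""
    # Hypothetical rule: a missing or empty required field makes the record dirty.
    if any(record.get(field) in (None, "") for field in required_fields):
        return False
    value = record.get("value")
    low, high = valid_range
    return isinstance(value, (int, float)) and low <= value <= high


def split_clean_dirty(records, required_fields=("id", "value"), valid_range=(0, 100)):
    """Partition records into (clean, dirty) lists so dirty data can be reviewed, not stored."""
    clean, dirty = [], []
    for record in records:
        target = clean if is_clean(record, required_fields, valid_range) else dirty
        target.append(record)
    return clean, dirty


if __name__ == "__main__":
    sample = [
        {"id": 1, "value": 42},     # valid
        {"id": 2, "value": -5},     # out of range
        {"id": None, "value": 10},  # missing id
        {"id": 4},                  # missing value
    ]
    clean, dirty = split_clean_dirty(sample)
    print(len(clean), "clean,", len(dirty), "dirty")
```

Real pipelines would use rules specific to the problem being analyzed; the point is only that validation happens before data accumulates.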

Who’s Generating Big Data? Social media and networks (all of us are generating data); scientific instruments (collecting all sorts of data); mobile devices (tracking all objects all the time); sensor technology and networks (measuring all kinds of data). Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.

The importance of Big Data. The real issue is not that you are acquiring large amounts of data. It's what you do with the data that counts. The hopeful vision is that organizations will be able to take data from any source, harness relevant data and analyze it to find answers that enable: cost reductions; time reductions; new product development and optimized offerings; smarter business decision making.

The importance of Big Data. For instance, by combining big data and high-powered analytics, it is possible to: determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually; optimize routes for many thousands of package delivery vehicles while they are on the road; analyze millions of SKUs to determine prices that maximize profit and clear inventory; generate retail coupons at the point of sale based on the customer's current and past purchases; send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers; recalculate entire risk portfolios in minutes; quickly identify customers who matter the most; use clickstream analysis and data mining to detect fraudulent behavior.

Applications. Science: databases from astronomy, genomics, environmental data, transportation data, … Humanities and social sciences: scanned books, historical documents, social interactions data, new technology like GPS, … Business and commerce: corporate sales, stock market transactions, census, airline traffic, … Entertainment: Internet images, Hollywood movies, MP3 files, … Medicine: MRI and CT scans, patient records, …

HDFS / Hadoop. Data in an HDFS cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster. In this way, the map and reduce functions can be executed on smaller subsets of your larger data sets, and this provides the scalability that is needed for big data processing. The goal of Hadoop is to use commonly available servers in a very large cluster, where each server has a set of inexpensive internal disk drives.
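
The idea of running map and reduce functions on smaller blocks can be sketched in plain Python, without Hadoop itself. This is only an in-memory illustration: the function names and the block size (counted in lines here, whereas real HDFS blocks are measured in bytes) are assumptions made for the example.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(records, block_size):
    """Break a data set into fixed-size blocks (a stand-in for HDFS block splitting)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count words using only the lines in one block."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(records, block_size=2):
    """Count words by mapping over each block independently, then reducing the partials."""
    blocks = split_into_blocks(records, block_size)
    # In a real cluster each block could be processed on a different node.
    partials = [map_block(block) for block in blocks]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    lines = ["big data is big", "data is everywhere", "hadoop stores big data"]
    print(word_count(lines))
```

Because each map call sees only its own block, adding more blocks (and more machines to process them) does not change the map or reduce logic, which is where the scalability comes from.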

PROS OF HDFS. Scalable: new nodes can be added as needed, without needing to change data formats, how data is loaded, how jobs are written, or the applications on top. Cost effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizeable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data. Flexible: Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways, enabling deeper analyses than any one system can provide. Fault tolerant: when you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
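
The fault-tolerance point can be illustrated with a small simulation: each block is stored on several nodes, so work can be redirected to a surviving copy when one node is lost. The round-robin placement and the names `place_replicas` and `locate` are simplifications invented for this sketch; actual HDFS replica placement follows its own policy.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (simplified placement)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def locate(block, placement, live_nodes):
    """Return the first live node holding the block, or None if every replica is lost."""
    for node in placement[block]:
        if node in live_nodes:
            return node
    return None


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_replicas(["b0", "b1", "b2"], nodes, replication=2)
    live = set(nodes) - {"n1"}  # simulate losing node n1
    for block in placement:
        print(block, "is read from", locate(block, placement, live))
```

With two replicas per block, losing any single node still leaves every block readable; losing both nodes that hold a block would make `locate` return `None`.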

Thank you for your attention. Authors: Tomasz Wis Krzysztof Rudnicki