Silicon Valley Big Data Technology: Applications and Development Trends. Sanmeng (三盟) Education Big Data Forum, Haikou, Hainan


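1 Silicon Valley Big Data Technology: Applications and Development Trends. Sanmeng (三盟) Education Big Data Forum, Haikou, Hainan
Dr. Zhiyu Jiang (蒋志予), Dean, Sanmeng Silicon Valley Big Data Research Institute 5/28/2016 1059 East Meadow Circle, Suite 147, Palo Alto, CA 94303, USA (United States), (China)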

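2 Outline: Introduction; A brief history of big data and its development trends; Big data technology and applications; Big data in education and big data education; Silicon Valley College of Big Data; Connections, consulting, and cooperation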

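3 Biography: Zhiyu Jiang (蒋志予) holds a PhD in experimental particle physics from the State University of New York at Stony Brook and was a postdoctoral researcher at Stanford. He is a promoter of enterprise IT transformation, innovation, and the new industrial revolution of the IoT, big data, and cloud era. He provides strategic and tactical consulting services and solutions to Fortune Global 500 companies, universities, and government agencies in product and solution innovation, systems integration, management and technology consulting, data center transformation, data-centric decision support systems, and the industrial internet.
Dean, Sanmeng Silicon Valley Big Data Research Institute
TCS Lead Solution Architect and Data Scientist
Hitachi America R&D Chief Research Architect
Dell Global Services Lead Solution Architect
IBM GBS Executive Architect
BearingPoint Director of Big Data Solution Practice
Lockheed Martin IT CTO for Homeland Security
Various positions with leading software companies in Silicon Valley
Senior researcher at U.S. Department of Energy national laboratories and NASA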

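4 Big data: data is a key resource and the center of future enterprise and information architecture
Big data is characterized by Volume, Variety, Velocity, and Value.
Turning data into knowledge, knowledge into understanding, understanding into action, and action into results is the new productive force of the enterprise.
Big data is not a new concept: it draws on thirty years of practice in science, government, business, and the internet, and is a decision support system built on an IT foundation.
Value and return on investment drive big data projects.
Domain expertise, platform, and analytics are the three pillars of big data.
Data scientists need to orchestrate these three pillars, find the useful small data inside big data, and discover and exploit patterns.
Big data is the center of industrial management and the cornerstone of the new industrial revolution.
The goal is real-time and non-real-time monitoring, automation, and optimization.
The aim is to be cheaper, faster, and better.
The cost structure of a big data project is like an iceberg: data management is the submerged base and the bulk of the work.
Data must be complete, consistent, accurate, timely, trustworthy, and valuable.
The industrial internet and the Internet of Things will lead the era of industrial big data.
Predictive equipment maintenance is one of the important application areas.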
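As a rough illustration of the predictive maintenance idea mentioned on this slide, the sketch below flags sensor readings that deviate sharply from a rolling baseline. It is a minimal example under stated assumptions, not the method used in the talk: the function name `flag_anomalies`, the window size, the z-score threshold, and the sample vibration values are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings far from the mean of the preceding window.

    Hypothetical helper: window and threshold are illustrative defaults.
    """
    flagged = []
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has zero spread; skip it to avoid
        # dividing by zero.
        if sigma == 0:
            continue
        z_score = abs(readings[i] - mu) / sigma
        if z_score > threshold:
            flagged.append(i)
    return flagged


# Made-up vibration readings with one obvious spike at index 7.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 4.0, 1.0, 1.1]
print(flag_anomalies(vibration))  # prints [7]
```

A real deployment would more likely combine many sensors, maintenance logs, and a trained model; the rolling z-score only shows the shape of the problem, which is to learn what normal looks like and act on departures from it before the equipment fails.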

5 Discovery of the top quark, November 1995

6 Tevatron at Fermilab

7 The Highest Energy Proton Anti-Proton Collider (1991-2011)

8 Collider Detectors at the Tevatron

9 Dzero Detector

10 Dzero Collaboration

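11 Big data computing: online processing; offline analysis; data mining; statistical analysis; machine learning (neural networks); models and algorithms; large-scale distributed storage and parallel computing; image reconstruction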
WWW (World Wide Web)
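The distinction between online processing and offline analysis on this slide can be sketched with a small example. The online path below uses Welford's algorithm to update a mean and variance one event at a time without storing the stream, and the offline path computes the same statistics over the full stored data set. The class name `RunningStats` and the sample events are assumptions made for illustration.

```python
from statistics import mean, pvariance


class RunningStats:
    """Welford's online algorithm: update mean and variance per event."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.n if self.n else 0.0


events = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Online: one pass, constant memory, a result is available after every event.
online = RunningStats()
for event in events:
    online.update(event)

# Offline: the full data set is available and processed in one batch.
offline_mean = mean(events)
offline_variance = pvariance(events)

print(online.mean, online.variance)
print(offline_mean, offline_variance)
```

Both paths agree on the final numbers; the difference is that the online version can answer at any moment during data taking, while the offline version needs the complete data set first.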

12 US-VISIT Entry/Exit Program

13

14 The data science process

15 A brief history and future of enterprise IT and big data
1. Volume 2. Velocity 3. Variety 4. Value
1x 100x 10000x
BUSINESS DRIVEN: Client Server, OLTP
HUMAN DRIVEN: Social Internet, WEB LOGS, DOCUMENTS, SOCIAL
MACHINE DRIVEN: Industrial Internet, SATELLITE IMAGES, BIO-INFORMATICS, M2M, LOG FILES, SENSORS, VIDEO, AUDIO, All data types
Data Landscape evolution; Sensors / IOT; Big data today; Big data tomorrow
I.T. MUST MANAGE, GOVERN AND ANALYZE MORE DATA WITH MORE COMPLEX RELATIONSHIPS … IN REAL TIME … AT SCALE
Volume: Volume will be in the exabytes. The new VSP will manage exabytes as a common pool of resources. Volumes will be so big that we will not be able to back them up in the traditional way. Applications will need to be mapped to the applications and reduced. Volume is not just about capacity but also about the number of things. File systems must scale to petabytes.
Velocity: How fast can we capture data? This calls for faster pipes (FC and IP) and faster media. Disks will get slower as they increase in capacity, so we will need non-volatile memory in front and automatic tiering of data to the right level of storage. Most of this data will not be changing, so it can be stored in an active archive and replicated once for protection. Some data will be so cold that it will need to be stored on cold devices such as optical or tape for scores of years. How fast can we find data? A directory crawl or a table scan to find one item out of a billion will be too slow; we need to run queries against metadata, so object stores become important. As objects age, the metadata becomes more active as we query it for information. Processing for information will be done with metadata and will rarely need to access the data object. How fast can we provision? We can't take months to stand up an application; we need to roll in a converged solution or go to the cloud.
Variety: There will be many modalities of data and applications. To avoid silos, we need to virtualize the data from the applications with an object store and use RESTful interfaces.
Value: This is where big data comes in. It is about correlating all this data and applying analytics. We differentiate the big data of today from the big data of tomorrow. The big data of tomorrow will require deep expertise in verticals such as energy, transportation, and healthcare. We will be a major player, since Hitachi is in those verticals with deep R&D capability.
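The speaker notes argue that finding one item in a billion by directory crawl or table scan is too slow, and that queries should run against metadata instead. A toy sketch of that idea follows: an inverted index from metadata key/value pairs to object IDs, compared with a full scan. The class `MetadataIndex`, its methods, and the object IDs are hypothetical and do not represent any particular object store's API.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index from (key, value) metadata pairs to object IDs."""

    def __init__(self):
        self._index = defaultdict(set)
        self._metadata = {}

    def put(self, object_id, metadata):
        """Register an object's metadata; the data object itself is not stored."""
        self._metadata[object_id] = dict(metadata)
        for key, value in metadata.items():
            self._index[(key, value)].add(object_id)

    def query(self, **criteria):
        """Return IDs matching all criteria using index lookups only."""
        matches = [self._index.get((k, v), set()) for k, v in criteria.items()]
        if not matches:
            return set(self._metadata)
        return set.intersection(*matches)

    def scan(self, **criteria):
        """Baseline for comparison: examine every object's metadata."""
        return {
            object_id
            for object_id, md in self._metadata.items()
            if all(md.get(k) == v for k, v in criteria.items())
        }


store = MetadataIndex()
store.put("obj-1", {"type": "video", "site": "plant-a"})
store.put("obj-2", {"type": "log", "site": "plant-a"})
store.put("obj-3", {"type": "video", "site": "plant-b"})

print(store.query(type="video", site="plant-a"))  # prints {'obj-1'}
```

The index lookup touches only the entries for the requested key/value pairs, while `scan` visits every object, which is the cost the notes warn about at billion-object scale.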

16 Evolution of Big Data
“By 2020, there will be more information coming from everywhere”
“50 ZB of data created”
Machine Data; Human-Generated Data; Enterprise Data
2.5 × 10^18 bytes of data created every day
1.3T Tags and sensors
4B people online
31B Devices
144.8 billion messages per day
340 million tweets per day
684,000 bits of content shared on Facebook per day
11 million instant messages per day
72 hours of video uploaded on YouTube per minute
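To put the slide's daily figure in the same units as its 50 ZB figure, a short unit conversion helps. The sketch below reads the daily number as 2.5 × 10^18 bytes and uses the decimal definition 1 ZB = 10^21 bytes; the constant-rate comparison is only an illustration, since the slide does not say whether 50 ZB is an annual or a cumulative total.

```python
BYTES_PER_DAY = 2.5e18  # daily figure from the slide
BYTES_PER_ZB = 1e21     # decimal zettabyte


def yearly_zettabytes(bytes_per_day, days=365):
    """Convert a daily byte rate into zettabytes per year."""
    return bytes_per_day * days / BYTES_PER_ZB


def years_to_reach(target_zb, bytes_per_day):
    """Years needed to accumulate target_zb at a constant daily rate."""
    return target_zb / yearly_zettabytes(bytes_per_day)


per_year = yearly_zettabytes(BYTES_PER_DAY)
print(f"{per_year:.2f} ZB per year")
print(f"{years_to_reach(50, BYTES_PER_DAY):.1f} years to reach 50 ZB")
```

At a constant 2.5 × 10^18 bytes per day the yearly total stays just under 1 ZB, so the 50 ZB figure only makes sense if the daily rate keeps growing steeply, which is the point of the slide's evolution chart.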

17 Ten major applications of big data
Understanding and predicting customer behavior. Companies are increasingly adding unstructured data such as social media data, web browsing data, sensor data, and text to their databases. Customer behavior, customer churn, product sales, moral hazard, and even election results can all be predicted accurately.
Optimizing business processes. Retailers adjust inventory based on social media data, web search trends, and weather forecasts; logistics companies optimize supply chains and delivery routes using geographic positioning, RFID sensors, and live traffic data; corporate HR uses big data tools to optimize talent acquisition and to measure company culture and employee loyalty.
Quantifying personal information. Wearable devices help users follow their own body data in real time. More importantly, they can collect large amounts of group data, generate new insights, and feed them back to individual users.
Improving public healthcare. Big data speeds up genome decoding, increases clinical study sample sizes by orders of magnitude, and monitors and predicts the trends and outbreaks of epidemics.
Improving sports performance.
1. Understanding and Targeting Customers. This is one of the biggest and most publicized areas of big data use today. Here, big data is used to better understand customers and their behaviors and preferences. Companies are keen to expand their traditional data sets with social media data, browser logs, text analytics, and sensor data to get a more complete picture of their customers. The big objective, in many cases, is to create predictive models. You might remember the example of U.S. retailer Target, which can now predict quite accurately when one of its customers is expecting a baby. Using big data, telecom companies can better predict customer churn, Wal-Mart can predict what products will sell, and car insurance companies can understand how well their customers actually drive. Even election campaigns can be optimized using big data analytics. Some believe Obama's win in the 2012 presidential election was due to his team's superior ability to use big data analytics.
2. Understanding and Optimizing Business Processes. Big data is also increasingly used to optimize business processes. Retailers are able to optimize their stock based on predictions generated from social media data, web search trends, and weather forecasts. One business process that is seeing a lot of big data analytics is supply chain or delivery route optimization. Here, geographic positioning and radio frequency identification sensors are used to track goods or delivery vehicles, and routes are optimized by integrating live traffic data. HR business processes are also being improved using big data analytics. This includes the optimization of talent acquisition, Moneyball style, as well as the measurement of company culture and staff engagement using big data tools.
3. Personal Quantification and Performance Optimization. Big data is not just for companies and governments but also for all of us individually. We can now benefit from the data generated by wearable devices such as smart watches or smart bracelets. Take the Up band from Jawbone as an example: the armband collects data on our calorie consumption, activity levels, and sleep patterns. While it gives individuals rich insights, the real value is in analyzing the collective data. In Jawbone's case, the company now collects 60 years' worth of sleep data every night. Analyzing such volumes of data will bring entirely new insights that it can feed back to individual users. The other area where we benefit from big data analytics is finding love, online that is. Most online dating sites apply big data tools and algorithms to find us the most appropriate matches.
4. Improving Healthcare and Public Health. The computing power of big data analytics enables us to decode entire DNA strings in minutes and will allow us to find new cures and better understand and predict disease patterns. Just think of what happens when the individual data from smart watches and wearable devices can be applied to millions of people and their various diseases. The clinical trials of the future won't be limited by small sample sizes but could potentially include everyone. Big data techniques are already being used to monitor babies in a specialist premature and sick baby unit. By recording and analyzing every heartbeat and breathing pattern of every baby, the unit was able to develop algorithms that can now predict infections 24 hours before any physical symptoms appear. That way, the team can intervene early and save fragile babies in an environment where every hour counts. What's more, big data analytics allow us to monitor and predict the development of epidemics and disease outbreaks. Integrating data from medical records with social media analytics enables us to monitor flu outbreaks in real time, simply by listening to what people are saying, e.g. “Feeling rubbish today - in bed with a cold”.
5. Improving Sports Performance. Most elite sports have now embraced big data analytics. We have the IBM SlamTracker tool for tennis tournaments; we use video analytics that track the performance of every player in a football or baseball game; and sensor technology in sports equipment such as basketballs or golf clubs allows us to get feedback (via smart phones and cloud servers) on our game and how to improve it. Many elite sports teams also track athletes outside of the sporting environment, using smart technology to track nutrition and sleep, as well as social media conversations to monitor emotional wellbeing.
6. Improving Science and Research. Science and research are currently being transformed by the new possibilities big data brings. Take, for example, CERN, the European particle physics laboratory near Geneva, with its Large Hadron Collider, the world's largest and most powerful particle accelerator. Experiments to unlock the secrets of our universe, how it started and how it works, generate huge amounts of data. The CERN data center has 65,000 processors to analyze its 30 petabytes of data. In addition, it uses the computing power of thousands of computers distributed across 150 data centers worldwide to analyze the data. Such computing power can be leveraged to transform many other areas of science and research.
7. Optimizing Machine and Device Performance. Big data analytics help machines and devices become smarter and more autonomous. For example, big data tools are used to operate Google's self-driving car. The Toyota Prius is fitted with cameras, GPS, and powerful computers and sensors to drive safely on the road without human intervention. Big data tools are also used to optimize energy grids using data from smart meters. We can even use big data tools to optimize the performance of computers and data warehouses.
8. Improving Security and Law Enforcement. Big data is applied heavily in improving security and enabling law enforcement. I am sure you are aware of the revelations that the National Security Agency (NSA) in the U.S. uses big data analytics to foil terrorist plots (and maybe spy on us). Others use big data techniques to detect and prevent cyber attacks. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data to detect fraudulent transactions.
9. Improving and Optimizing Cities and Countries. Big data is used to improve many aspects of our cities and countries. For example, it allows cities to optimize traffic flows based on real-time traffic information as well as social media and weather data. A number of cities are currently piloting big data analytics with the aim of turning themselves into Smart Cities, where the transport infrastructure and utility processes are all joined up, where a bus would wait for a delayed train, and where traffic signals predict traffic volumes and operate to minimize jams.
10. Financial Trading. My final category of big data application comes from financial trading. High-Frequency Trading (HFT) is an area where big data finds a lot of use today. Here, big data algorithms are used to make trading decisions. Today, the majority of equity trading takes place via data algorithms that increasingly take into account signals from social media networks and news websites to make buy and sell decisions in split seconds.

18 Ten major applications of big data
Advancing scientific research. High-energy physics research at the particle accelerators of CERN and Fermilab in the United States depends on big data analytics.
Optimizing machine and device performance. Smart meters help optimize power grids; self-driving cars; data centers can also be optimized with big data tools.
Public security and law enforcement. The Department of Homeland Security uses big data for counterterrorism; credit card companies use big data to guard against fraudulent transactions; police use big data to predict criminal activity and pursue criminals.
Smart cities.
Finance.

19 2013

20

21 Big Data Project Methodology Define Design Develop Test Activities Outcomes Storage Server
Network Non-blocking Oversubscribed Physical or virtual Hadoop distribution Distributed file stores NoSQL vs. Hadoop optimized databases Data Integration Ingestion Quality Governance Analytics Scalability Tuning Capacity plan Infrastructure Hardware Software Big Data readiness analysis Proposed Big Data (ref) architecture Overall analysis and recommendations Solution delivery project plan System performance Reliability Proof of Concept Activities Product mapping Server Storage Network Software Analytics tools selection Open Source Search Tableau Informatica Others Data governance model Capacity Plan Big Data Readiness analysis Delivery collaterals Recommendations Reference architecture Consulting / training / support Scalability Limitations Performance Big Data Demo environment Outcomes

22 Data Lake: Why Enterprise Information Fabric (EIF)?
Most enterprises prefer to go down the EIF path for the following reasons:
01 Design for the unknown and the future
02 Plug & Play model
03 Horizontally and vertically scalable
04 Adopting any or all of the digital data sources
05 Focus on value creation rather than integration
06 High degree of automation
07 Higher reusability
08 Process consistency
09 Heterogeneous workload management
10 Seamless integration with cloud and 3rd party platforms
Real Time Reporting & Events/Insights/Dashboards/Exploration; Data Security (access, audit, masking); Data Governance; Enterprise Analytics Fabric; Enterprise Data Fabric

23 Orchestration creates value: IoT big data cloud platform
Verticals: -Personalized education-
Data acquisition and management: -Information systems- -Sensors- -IoT-
Data science: -Analytics- -Machine learning-
There is strong customer demand for complex big data analytics applications across verticals such as energy, healthcare, automotive, and mining. One approach to building such applications is to build them from scratch for every customer. However, this approach has the following drawbacks:
Inefficient, resource intensive, and expensive
Hard for domain experts to be involved in the development, hence incomplete
Loss of opportunities to leverage learnings from other applications
This means that we need a different approach, which is shown on the right-hand side of Fig. 1 (the left-hand side shows the first approach). By building a common analytics framework, we will be able to significantly reduce application development time as well as deployment time. The framework consists of a front end that lets application builders easily construct analytics logic and dashboards, and a back end that handles multiple execution platforms on which analytics operations run, ranging from simple calculations such as a mean to complex machine learning algorithms such as predictive maintenance. The framework enables domain experts to build applications, and it can keep incorporating “knowledge” from each new application.
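The common analytics framework described in these notes can be sketched as a registry of named operations that applications compose into pipelines, so that each new application reuses what earlier ones contributed. This is a minimal illustration under stated assumptions: the class `AnalyticsFramework`, its `register` and `run` methods, the `exceeds_limit` helper, and the temperature data are all hypothetical and are not taken from the framework the notes refer to.

```python
from statistics import mean
from typing import Callable, Dict, List


class AnalyticsFramework:
    """Registry of named analytics operations shared across applications."""

    def __init__(self):
        self._operations: Dict[str, Callable[[List[float]], object]] = {}

    def register(self, name, operation):
        """Add an operation; later applications can reuse it by name."""
        self._operations[name] = operation

    def run(self, pipeline, data):
        """Apply each named operation to the data and collect results by name."""
        return {name: self._operations[name](data) for name in pipeline}


def exceeds_limit(limit):
    """Build an operation that returns indices of values above a limit."""
    def operation(values):
        return [i for i, v in enumerate(values) if v > limit]
    return operation


framework = AnalyticsFramework()
framework.register("mean", mean)                       # simple calculation
framework.register("over_limit", exceeds_limit(80.0))  # stand-in for a richer model

temperatures = [70.0, 75.0, 90.0, 65.0]
report = framework.run(["mean", "over_limit"], temperatures)
print(report)  # prints {'mean': 75.0, 'over_limit': [2]}
```

A domain expert only needs to name the operations in a pipeline, while the back end decides how and where they execute; in this sketch everything runs in-process, whereas the notes envisage multiple execution platforms behind the same interface.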

24 Sofia University College of Data Science
Silicon Valley College of Big Data, Sofia University College of Data Science, 维思 February 2016

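25 Mission of the College of Big Data - Why
Right time: leading the era of the Fourth Industrial Revolution
Right people: you, me, and them
Right place: Silicon Valley and the Silicon Valley spirit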

26 The three pillars of big data: orchestration creates value - What
Platform: DAQ Integration Management, Sensors, IoT, Intelligent Systems
Verticals: Psychology, Management Science, Communication Media & Entertainment, Finance
Data Science: Visualization, Analysis, Algorithm & Modeling, Machine Learning
Cloud; Cyber Security
Visibility, Automation, Optimization
Cheaper, Faster, Better

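27 Features and focus - How
Purpose: combine industry, academia, and research to train global leaders in big data
Faculty: academic and industry experts, Silicon Valley elites
Curriculum: lean and practical, combining theory with practice
Students: cross-border and diverse
Degrees: US-accredited degrees and certificates
Delivery: classroom, online, forums, US and China campuses, industry chain
Career paths: internships, referrals, partnerships, incubation, and other forms
Development: open innovation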

28 Connections, consulting, and cooperation: Silicon Valley industry chain, IGT

29 Thank You! Dr. Zhiyu Jiang (蒋志予), Dean, Sanmeng Silicon Valley Big Data Research Institute 5/28/2016
WeChat: jzyjiang1986

