1
Understanding big data…
Interesting things I’ve discovered on my fellowship so far… Michelle Darlymple Endeavour Teacher Fellow
4
Workshop plan…
5
My fellowship… Project LOs Fellowship… Project Pedagogy Leadership
The Teacher Fellow will report on the development of their knowledge and understanding of: Key terms and concepts relevant to big data. Software for analysis and visualisation of big data. Utilising big data to develop data sets for use in the classroom. Fellowship… Project Pedagogy Leadership
6
Vocabulary Lots of buzzwords are going around when you start looking at anything to do with “big data”. I’m still sorting out what they all mean!
7
Vocabulary
8
Vocabulary Anything that’s not small data!
Megabyte = 10^6 bytes
Gigabyte = 10^9 bytes
Terabyte = 10^12 bytes
Petabyte = 10^15 bytes
Exabyte = 10^18 bytes
Zettabyte = 10^21 bytes
Yottabyte = 10^24 bytes
Gartner's "3Vs" model describes big data in terms of increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). Gartner, and now much of the industry, continue to use this model for describing big data.
Although Gartner’s definition (the 3Vs) is still widely used, the growing maturity of the concept supports a sounder distinction between big data and Business Intelligence, regarding data and their use:
Business Intelligence uses descriptive statistics with data with high information density to measure things, detect trends etc.
Big data uses inductive statistics and concepts from nonlinear system identification [25] to infer laws (regressions, nonlinear relationships, and causal effects) from large data sets [26] to reveal relationships, dependencies, and to perform predictions of outcomes and behaviors. [25][27]
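For classroom use, the unit ladder on this slide can be turned into a small conversion helper. This is only a sketch: the function name `convert` and the dictionary `UNITS` are made up for illustration, and it uses the decimal (power-of-ten) definitions shown on the slide rather than binary units.

```python
# Decimal byte units from the slide: each step up is a factor of 1000.
UNITS = {
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
    "zettabyte": 10**21,
    "yottabyte": 10**24,
}


def convert(amount, from_unit, to_unit):
    """Convert an amount of data between two of the units listed above."""
    # Go through bytes: amount -> bytes -> target unit.
    return amount * UNITS[from_unit] / UNITS[to_unit]


# How many gigabytes are in one petabyte?
print(convert(1, "petabyte", "gigabyte"))
```

Students could extend the dictionary downwards (kilobyte, byte) or compare these decimal units with the binary ones some operating systems report.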
10
Interesting story 1 DigitalGlobe is focusing its search on the oceans around Malaysia, not on land. Its satellites take scores of photos, which are transmitted into its big data storage banks. Corrections are made to the photos as needed, such as making colors consistent and the contrast uniform, adjusting for the different camera angles [because the satellites are always moving], and detecting clouds that obscure the view. The system then eliminates photos that are unusable. A crowdsourcing website, Tomnod.com, is being used to enable the public to join in the hunt for the missing aircraft.
11
Interesting story 1 http://www.tomnod.com/nod/
"So what [DigitalGlobe] does is get the input of many thousands of people, run it through big data filters on the back end that say things like: 'Are there areas of the Indian Ocean where a lot of people have flagged an item of interest?' They then do cluster analysis on that. Then, experts in search-and-rescue may say, 'There's a hot spot, go fly over this.'"
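The idea of finding areas "where a lot of people have flagged an item of interest" can be illustrated with a very simple stand-in for cluster analysis: divide the map into grid squares and count the flags in each. This is a sketch for discussion only, not DigitalGlobe's or Tomnod's actual method. The function name, the grid size, the threshold and the coordinates are all invented.

```python
from collections import Counter


def hot_spots(flags, cell_size=1.0, min_flags=3):
    """Return the grid cells that contain at least `min_flags` flags.

    flags: list of (latitude, longitude) pairs marked by volunteers.
    cell_size: width of a grid square in degrees (an arbitrary choice here).
    """
    # Floor division assigns every flag to the grid square it falls in.
    counts = Counter(
        (int(lat // cell_size), int(lon // cell_size)) for lat, lon in flags
    )
    return sorted(cell for cell, n in counts.items() if n >= min_flags)


# Made-up flags: four close together, one far away.
flags = [(-31.2, 95.1), (-31.7, 95.4), (-31.5, 95.9), (-20.3, 88.0), (-31.1, 95.5)]
print(hot_spots(flags))
```

Real cluster analysis does not depend on where the grid lines happen to fall, which is a useful point to raise with students: two flags either side of a grid line are close together but are counted in different squares.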
13
Careers
14
Careers http://www.icrunchdata.com/big-data-jobs-index.aspx
Big Data refers to the immense amount of data collected and analyzed from every imaginable device in our modern culture, and has fueled one of the most hyper-growth niches of employment in a century. Some sources predict that by 2015, Big Data will create 4.4 million Technology jobs globally, of which 1.9 million will be in the United States. Needless to say, there is a shortage of skilled talent in the industry to support this demand, which creates incredible career opportunities for the highly skilled icrunchdata user community.
15
Careers
16
Data scientist?
17
Data scientist? What Does a Data Scientist Do.flv
18
Data scientist?
19
Data scientist? What Does a Data Scientist Do.flv
But here's the best part: Since it's not a one-dimensional discipline, data scientists can emerge from just about any field. A good data scientist is someone who has the right tools (math, programming, critical thinking), is self-sufficient (doesn't need someone else to implement his or her ideas) and has an interest in understanding the context in which the skills can be applied. This is what the marketplace seeks. /10/data-scientist-the-sexiest-job-of-the-21st-century/
20
Cats & dogs Activity 1 https://www.kaggle.com/
Kaggle is a platform for predictive modelling and analytics competitions on which companies and researchers post their data and statisticians and data miners from all over the world compete to produce the best models. This crowdsourcing approach relies on the fact that there are countless strategies that can be applied to any predictive modelling task and it is impossible to know at the outset which technique or analyst will be most effective. Kaggle has approximately 95,000 data scientists worldwide, from fields such as computer science, statistics, economics and mathematics.[3] It has partnered with organisations such as NASA, Wikipedia, Deloitte and Allstate for its competitions. Kaggle is best known as the platform that's hosting the $3 million Heritage Health Prize.[4] Another recent competition looks at improving gesture recognition for Microsoft Kinect.[5] Competitions have resulted in many successful projects including furthering the state of the art in HIV research,[6] chess ratings[7] and traffic forecasting.[8] Several academic papers have been published on the basis of findings made in Kaggle competitions. A key to this is the effect of the live leaderboard, which encourages participants to continue innovating beyond existing best practice.[9] The winning methods are frequently written up on the Kaggle blog, No Free Hunch.
21
Cats & dogs Activity 1 As a customer shops for an insurance policy, he/she will receive a number of quotes with different coverage options before purchasing a plan. This is represented in this challenge as a series of rows that include a customer ID, information about the customer, information about the quoted policy, and the cost. Your task is to predict the purchased coverage options using a limited subset of the total interaction history. If the eventual purchase can be predicted sooner in the shopping window, the quoting process is shortened and the issuer is less likely to lose the customer's business. Using a customer’s shopping history, can you predict what policy they will end up choosing?
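To make the task concrete, here is a sketch of the simplest baseline students might try: guess that each customer buys whatever plan they were quoted most recently. It is not a competition solution. The row layout (customer ID, quoted plan), the function names and the data are all hypothetical.

```python
def predict_last_quoted(rows):
    """Predict each customer's purchase as the plan in their latest quote.

    rows: (customer_id, quoted_plan) pairs in the order the quotes were shown.
    """
    prediction = {}
    for customer_id, quoted_plan in rows:
        # A later quote for the same customer overwrites the earlier one.
        prediction[customer_id] = quoted_plan
    return prediction


def accuracy(prediction, purchased):
    """Fraction of customers whose predicted plan matches what they bought."""
    hits = sum(1 for cid, plan in purchased.items() if prediction.get(cid) == plan)
    return hits / len(purchased)


# Made-up shopping history and purchases for two customers.
rows = [("c1", "A"), ("c1", "B"), ("c2", "C"), ("c1", "B"), ("c2", "A")]
purchased = {"c1": "B", "c2": "C"}

prediction = predict_last_quoted(rows)
print(prediction, accuracy(prediction, purchased))
```

A baseline like this gives students a number to beat, and leads to the question of which extra information about the customer might improve the guess.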
22
Cats & dogs Activity 1
23
Cats & dogs Activity 1 Your turn… Cat or dog?
24
Cats & dogs Activity 1 Picture 1
25
Cats & dogs Activity 1 Picture 1 Cat or dog?
26
Cats & dogs Activity 1 Picture 2
27
Cats & dogs Activity 1 Picture 2 Cat or dog?
28
Cats & dogs Activity 1 Picture 3
29
Cats & dogs Activity 1 Picture 3 Cat or dog?
30
Cats & dogs Activity 1 Picture 4
31
Cats & dogs Activity 1 Picture 4 Cat or dog?
32
Cats & dogs Activity 1 Picture 5
33
Cats & dogs Activity 1 Picture 5 Cat or dog?
34
Cats & dogs Activity 1 Picture 6
35
Cats & dogs Activity 1 Picture 6 Cat or dog?
36
Cats & dogs Activity 1 Picture 7
37
Cats & dogs Activity 1 Picture 7 Cat or dog?
38
Cats & dogs Activity 1 Picture 8
39
Cats & dogs Activity 1 Picture 8 Cat or dog?
40
Cats & dogs Activity 1 Picture 9
41
Cats & dogs Activity 1 Picture 9 Cat or dog?
42
Cats & dogs Activity 1 Picture 10
43
Cats & dogs Activity 1 Picture 10 Cat or dog?
44
Cats & dogs Activity 1 Picture 2 Picture 1 Picture 5 Picture 6 Picture 3 Picture 4 Picture 7 Picture 8 Picture 9 Picture 10
45
Cats & dogs Activity 1 Where could this activity lead with your students? What direction might the conversations head…?
46
Interesting story 2 http://www.datakind.org/
DataKind™ brings together leading data scientists with high impact social organizations through a comprehensive, collaborative approach that leads to shared insights, greater understanding, and positive action through data in the service of humanity. We believe that improving the quality of, access to, and understanding of data in the social sector will lead to better decision-making and greater social impact. To do this, we offer the following services:
DataDives - A DataDive™ is a weekend event that teams selected social organizations that have well-defined data problems with volunteer data scientists to tackle their data challenges.
DataCorps - The DataCorps™ is an elite group of data scientists, technologists, project managers, and designers who have been vetted by DataKind and engage with social organizations on a pro bono basis. These projects last between one and six months and are structured so that members can work in their spare time.
47
Interesting story 2 Every minute in India, 11 people die from a treatable disease. Why? In part, because millions living in rural outreaches have severely limited access to a doctor. This problem became the foundation of Mobilizing Health’s mission. The San Francisco nonprofit uses text messaging to connect doctors with sick patients living far away from the health services they need. The docs can collect information, make a diagnosis, and even prescribe medicine. One of the great things about using cell phone technology is the vast amounts of data that you can capture—data that can be used, in this case, to make patient services more effective. Could Mobilizing Health use data from the text communications to manage doctor behavior? Or to anticipate which doctors will be available most quickly? Or to identify the onset of a health crisis? What happened? In December 2011, our data surgeons, led by Chris Diehl, joined forces at the San Francisco DataDive with the crew from Mobilizing Health to find out what could be done with all the data captured in these texts. The team analyzed the number of requests docs were getting, their response rates, and the actual messages between doctors and patients. And they were able to process the text in the messages to identify the medications doctors were prescribing. What's Next? Given that we were able to put together a good picture of doctor behavior, especially with regard to how they prescribe medications, there’s an opportunity to create new protocols that can streamline the provision of services. By understanding how factors drive doctors’ decisions, there’s a real opportunity to help move the process forward. Human interaction should always be a part of health care, but health outcomes may improve dramatically when services are prompt and predictable.
48
What if… Activity 2 https://what-if.xkcd.com/63/
If all digital data were stored on punch cards, how big would Google’s data warehouse be?
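One way into this question is to let students choose their own estimates and then do the arithmetic. The sketch below makes its assumptions explicit: 80 bytes per card (one character per column on a standard 80-column card), a card thickness of 0.18 mm, and a data size that is purely a placeholder. None of these figures come from the slide, and the function names are made up.

```python
BYTES_PER_CARD = 80          # assumption: one character per column, 80 columns
CARD_THICKNESS_MM = 0.18     # assumption: approximate thickness of one card


def cards_needed(total_bytes, bytes_per_card=BYTES_PER_CARD):
    """Number of cards needed to hold total_bytes, rounding up to whole cards."""
    return -(-total_bytes // bytes_per_card)  # ceiling division on integers


def stack_height_km(cards, thickness_mm=CARD_THICKNESS_MM):
    """Height in kilometres of a single stack of that many cards."""
    return cards * thickness_mm / 1_000_000  # 1 km = 1,000,000 mm


# Placeholder data size: one exabyte (10^18 bytes). Students should replace
# this with their own estimate of how much data is being stored.
cards = cards_needed(10**18)
print(cards, "cards in a stack about", stack_height_km(cards), "km high")
```

Changing any one assumption and rerunning shows students how sensitive the final answer is to each estimate.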
49
What if… Activity 2 What information do students need to get started on this problem? What can they find themselves? Where do we step in to help?
50
What if… Activity 2 https://what-if.xkcd.com/55/
51
Data visualisations
52
Data visualisations http://guns.periscopic.com/?year=2013
53
Interesting story 3 Every time you go shopping, you share intimate details about your consumption patterns with retailers. And many of those retailers are studying those details to figure out what you like, what you need, and which coupons are most likely to make you happy. As Pole’s computers crawled through the data, he was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy. One Target employee I spoke to provided a hypothetical example. Take a fictional Target shopper named Jenny Ward, who is 23, lives in Atlanta and in March bought cocoa-butter lotion, a purse large enough to double as a diaper bag, zinc and magnesium supplements and a bright blue rug. There’s, say, an 87 percent chance that she’s pregnant and that her delivery date is sometime in late August. 2/19/magazine/shopping-habits.html?pagewanted=1&_r=1&hp
54
Interesting story 3 Target knows before it shows. So Target started sending coupons for baby items to customers according to their pregnancy scores. Duhigg shares an anecdote, so good that it sounds made up, that conveys how eerily accurate the targeting is. An angry man went into a Target outside of Minneapolis, demanding to talk to a manager: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?” The manager didn’t have any idea what the man was talking about. He looked at the mailer. Sure enough, it was addressed to the man’s daughter and contained advertisements for maternity clothing, nursery furniture and pictures of smiling infants. The manager apologized and then called a few days later to apologize again. On the phone, though, the father was somewhat abashed. “I had a talk with my daughter,” he said. “It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August. I owe you an apology.”
55
Big data art European travel patterns
56
Data visualisations
57
iNZight