1 applications and skills required
Data Science: applications and skills required

2 A Society that is “Always On”
Society, organizations, and people are “Always On”.
Your “data plan” keeps you (always) in touch.
Data are collected about anything, at any time, and at any place:
• Register for courses
• Check/post on social media
• Pay tolls with your PeachPass
• Track your exercise with FitBit

3 Internet of Events

4 Examples of Big Data
Units:
• Bit (0 or 1) and Byte (8 bits: big enough for a character)
• Kilo-Byte ≈ 1000 Bytes (1024 to be exact)
• Mega-, Giga-, and Tera- are common now
• Peta-, Exa-, and Zetta-
Examples:
• An IDC study estimates that the amount of digital information stored in 2014 already exceeded 4 Zettabytes and predicts that the “digital universe” will grow to 44 Zettabytes in 2020. The study characterizes 44 Zettabytes as “6.6 stacks of iPads from Earth to the Moon”.
• Twitter produces over 90 million tweets per day.
• eBay uses two data warehouses, at 7.5 petabytes and 40 petabytes, as well as a 40-petabyte Hadoop cluster for search, consumer recommendations, and merchandising.
• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (2560 terabytes) of data, the equivalent of 167 times the information contained in all the books in the US Library of Congress.
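The unit arithmetic on this slide can be checked with a short sketch. It assumes binary prefixes (each step is a factor of 1024), which is what the slide's "2.5 petabytes (2560 terabytes)" implies. The names `UNITS` and `convert` are illustrative only.

```python
# Storage prefixes in ascending order. Assumption: each step up is a
# factor of 1024, matching the slide's "1024 to be exact".
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def convert(value, from_unit, to_unit):
    """Convert a quantity between two units listed in UNITS."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1024 ** steps


# The slide's Walmart figure: 2.5 petabytes expressed in terabytes.
walmart_tb = convert(2.5, "PB", "TB")

# Rough growth factor from the IDC figures: 4 ZB (2014) to 44 ZB (2020).
growth = 44 / 4

print(walmart_tb)  # 2560.0
print(growth)      # 11.0
```

Since the 2014 figure is stated as "exceeded 4 Zettabytes", the factor of 11 is an upper estimate of the predicted growth.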

5 Big Data – characteristics
• Volume: the quantity of generated and stored data.
• Variety: the type and nature of the data.
• Velocity: the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development.
• Veracity: the quality of captured data can vary greatly, affecting accurate analysis.
• Variability: inconsistency of the data set can hamper processes to handle and manage it.

6 The Big Data Mindset
• Design marketing processes with data in mind: reengineer marketing processes to collect relevant data.
• Engage in R&D everywhere: promote a culture of testing throughout the organization.
• Use predictive analytics: identify customer patterns and generate targeted offers.
• Challenge conventional wisdom: data analytics can provide definitive answers, so there’s no excuse for using the status quo as a default.

7 A Big Challenge …
One of the main challenges of today’s organizations is to extract information and value from data stored in their information systems.

8 Data Science: definition

9 Data Science: illustration
Data science aims to turn data into real value…
• From data to value: extraction, transformation, learning
• Structured data: DB; spreadsheet
• Unstructured data: text
• Big or small
• Static or streaming
• Any type of visualization delivering insights

10 Contributing Disciplines
Courses required in the CSC/CPS programs:
• Programming
• STA/MAT courses
• Database
• DM/ML methods
• Graphics & Visualization

11

12 Data Scientists: what they do
Assist organizations in turning data into value. A data scientist answers questions like:
• (Reporting) What happened?
• (Diagnosis) Why did it happen?
• (Prediction) What will happen?
• (Recommendation) What is the best that can happen?

13 Positions in a DS Team
• Data analyst
• Data engineer (data wrangler)
• Data scientist
• Specialists: algorithms & performance, visualization, big data tools

14 8 Skills You Need to Be a Data Scientist

15 Machine Learning/Data Mining Tasks
• Classification (map data into predefined groups)
• Regression (map a data item to a real-valued prediction variable)
• Prediction (similar to classification, but deals with a future state)
• Clustering (similar to classification, but the groups are defined by the data)
• Association rules (identify associations among data)
• Sequence discovery (determine sequential patterns in data)
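As one concrete instance of the tasks listed above, association rules are usually scored by support and confidence. The following is a minimal sketch on made-up shopping baskets. The data and the helper names `support` and `confidence` are assumptions for illustration, not part of the presentation.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    return support(antecedent | consequent) / support(antecedent)


print(support({"bread", "milk"}))                  # 0.5
print(round(confidence({"bread"}, {"milk"}), 3))   # 0.667
```

A rule such as bread -> milk is typically reported only when both scores exceed chosen thresholds.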

16 What can we do in class?
Data science awareness in CSC 125:
• Formulas, functions, charts (with Excel), queries (Access)
• Importing, transforming, sorting and filtering
CSC/CPS courses:
• Programming, DB, Software Engineering, Visualization, HPC
Intro to Data Science:
• Data exploration and processing
• Machine learning

17 Sample Application: Customer Attrition
A customer attrition analysis (telecom): churn or not churn
Dataset with 3333 rows (customers) and 21 columns:
• State, area code, phone number
• AccountLength, IntlPlan, VMailPlan, VMailMessage
• Minutes, number of calls, and charge for Day/Eve/Night time
• IntlMins, IntlCalls, IntlCharge
• CustServCalls

18 Going Through the Process
Data exploration: correlations between DayMins, DayCalls, and DayCharge
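A correlation check of this kind can be sketched with the Pearson coefficient. The values below are made up for five customers, and the charge is assumed to be a fixed per-minute rate (0.17 is an illustrative number), so it is perfectly correlated with the minutes by construction.

```python
from math import sqrt

# Hypothetical values for five customers; the real dataset has 3333 rows.
day_mins = [265.1, 161.6, 243.4, 299.4, 166.7]
day_calls = [110, 123, 114, 71, 113]
# Assumption: the charge is a fixed rate per minute, i.e. a linear
# function of the minutes.
day_charge = [0.17 * m for m in day_mins]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(round(pearson(day_mins, day_charge), 3))  # 1.0
print(round(pearson(day_mins, day_calls), 3))
```

When two columns correlate this strongly, one of them carries no extra information and can usually be dropped before modeling.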

19 Going Through the Process
Data exploration: ratios
• Impact of the international plan
• Impact of the number of service calls
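The ratios in question are churn rates per group. A minimal sketch, using six made-up customer records and a hypothetical helper `churn_rate_by`, might look like this:

```python
from collections import defaultdict

# Hypothetical records: (IntlPlan, CustServCalls, churned).
customers = [
    ("yes", 1, True),
    ("yes", 0, False),
    ("no", 4, True),
    ("no", 1, False),
    ("no", 2, False),
    ("no", 0, False),
]


def churn_rate_by(records, key_index):
    """Share of churned customers for each value of one field."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        key = record[key_index]
        totals[key] += 1
        churned[key] += record[2]  # True counts as 1, False as 0
    return {key: churned[key] / totals[key] for key in totals}


print(churn_rate_by(customers, 0))  # {'yes': 0.5, 'no': 0.25}
print(churn_rate_by(customers, 1))  # churn rate per number of service calls
```

Comparing the per-group rates with the overall churn rate shows which fields are worth feeding into a model.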

20 Visual Data Mining
Impact of multiple variables:
• Number of service calls
• Day minutes

21 Machine Learning Models
Decision tree
• Divide the data into training and test groups (with similar distributions)
• Training group (80%, or 2666 of 3333): build the model
• Test group (the remaining 20%): evaluate the model
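One common way to get training and test groups with similar distributions is a stratified split: shuffle each class separately and cut each at 80%. The sketch below is an assumption about how such a split could be done, not the tool used in the presentation, and the count of 500 churners is invented for illustration.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Return (train_idx, test_idx) with similar class ratios in both."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return train_idx, test_idx


# Hypothetical labels: 3333 customers, 500 of whom churn (assumed count).
labels = [True] * 500 + [False] * 2833
train, test = stratified_split(labels)

print(len(train), len(test))  # 2666 667
```

With these assumed class sizes the training group has 2666 rows, matching the figure on the slide, and both groups keep roughly the same share of churners.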

22 Machine Learning Models
The tree model Evaluation Correlation is

23 Sample Application: process mining
• Adds a process perspective to ML and DM
• Seeks the confrontation between event data (observed) and process models (hand-made or discovered)

24 Merging Framework

25 Affinity Function
How to evaluate the strength of the bindings between antigen and antibody:
• occurrence frequency (AOF)
• temporal relation (OLT)
• event attribute value (EAV)

26 Occurrence Frequency
<𝑎,𝑏,𝑐,𝑑,𝑒,𝑓>: an execution sequence

27 Occurrence Frequency
<𝑎,𝑏,𝑐,𝑑,𝑒,𝑓>
<𝑎,𝑏,𝑐,𝑑,𝑒,𝑓>
The occurrence frequency of <𝑎,𝑏,𝑐,𝑑,𝑒,𝑓> is 2 in this fragment of the log.
18 June 2018

28 Occurrence Frequency
• The occurrence frequency of <𝑎,𝑏,𝑐,𝑑,𝑒,𝑓> is 2.
• The occurrence frequency of <𝐼,𝐽,𝐿,𝑀,𝑄,𝑅> is 2.
When two cases match, the occurrence frequencies of their execution sequences are statistically ‘equivalent’.
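Occurrence frequency can be computed by counting how many cases share each execution sequence. The sketch below uses two made-up log fragments and treats exactly equal counts as "equivalent", which is a simplification of whatever statistical test the original work applies.

```python
from collections import Counter

# Hypothetical fragments of two event logs: case id -> execution sequence.
log_1 = {
    "c1": ("a", "b", "c", "d", "e", "f"),
    "c2": ("a", "b", "c", "d", "e", "f"),
    "c3": ("a", "c", "b", "d", "e", "f"),
}
log_2 = {
    "k1": ("I", "J", "L", "M", "Q", "R"),
    "k2": ("I", "J", "L", "M", "Q", "R"),
}


def occurrence_frequency(log):
    """Count how many cases in the log follow each execution sequence."""
    return Counter(log.values())


freq_1 = occurrence_frequency(log_1)
freq_2 = occurrence_frequency(log_2)

seq_1 = ("a", "b", "c", "d", "e", "f")
seq_2 = ("I", "J", "L", "M", "Q", "R")
print(freq_1[seq_1], freq_2[seq_2])  # 2 2

# Simplification: equal counts stand in for "statistically equivalent".
print(freq_1[seq_1] == freq_2[seq_2])  # True
```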

29 Temporal Relation
If two cases match, there is some time overlap between them.
IN000001 T23:08:04 T14:03:02
TK00005 T09:18:32 T16:55:37
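The overlap condition can be sketched as a standard interval test. The dates below are invented because the dates on the slide are not recoverable. Only the times of day are taken from it, and reading each pair as (start, end) is an assumption.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """True if the intervals [start_a, end_a] and [start_b, end_b] intersect."""
    return start_a <= end_b and start_b <= end_a


# Hypothetical dates; each tuple is assumed to be (start, end) of one case.
case_in = (datetime(2018, 1, 1, 23, 8, 4), datetime(2018, 1, 3, 14, 3, 2))
case_tk = (datetime(2018, 1, 2, 9, 18, 32), datetime(2018, 1, 2, 16, 55, 37))

print(overlaps(*case_in, *case_tk))  # True
```

Two cases whose intervals do not intersect can be ruled out as a match before any costlier comparison is made.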

30 Temporal Relation

31 Event Attribute Value
In real-life processes, values are often passed from event to event between two cases that belong to two different logs but to the same overall process.
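A simple way to detect such passed values is to intersect the attribute values seen in the events of two cases. The event records below are made up. That the incident case carries a reference to the ticket identifier is an assumption chosen to illustrate the idea.

```python
def shared_values(case_a, case_b):
    """Attribute values that occur in events of both cases."""
    values_a = {value for event in case_a for value in event.values()}
    values_b = {value for event in case_b for value in event.values()}
    return values_a & values_b


# Hypothetical events from two different logs of the same overall process.
incident_case = [
    {"activity": "register", "id": "IN000001"},
    {"activity": "resolve", "id": "IN000001", "ticket_ref": "TK00005"},
]
ticket_case = [
    {"activity": "create", "id": "TK00005"},
    {"activity": "assign", "id": "TK00005"},
]

print(shared_values(incident_case, ticket_case))  # {'TK00005'}
```

A non-empty intersection is evidence that the two cases belong together. A real implementation would likely weight the attributes rather than treat all values alike.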

32 Process Mining Apps in Healthcare
What happened?
• What is the typical treatment of patients having acute myeloid leukemia?
• What is the typical working day of a surgeon?
Why did it happen?
• What caused the unusual amount of incidents in the department?
• Why was the service level agreement not reached?
• What caused the long waiting list?
What will happen?
• Is this patient likely to deviate from the normal treatment plan?
• How many beds are needed tomorrow?
• Is it possible to handle these five new cases in time?
What is the best that can happen?
• Which check should be done first to reduce flow time?
• How many physicians are needed to reduce the waiting list by 50%?

33 Processes for Experiment

