GOALS: Some tools and a discussion. Patrick Bailey, MS
Do you have a magical mystery tour?
Consider … as the saying goes, “even a blind squirrel finds a nut.” But the best, most predictive models are fundamentally influenced by a modeler with expert knowledge and context of the problem. Kuhn, Max, Johnson, Kjell, Applied Predictive Modeling, Springer, 2013
Process Zumel et al, Practical Data Science with R, Manning 2014
Collaboration Zumel et al, Practical Data Science with R, Manning 2014
An Observation This challenge partly results from the scenario that current data mining is a data driven trial-and-error process (Ankerst 2002) where data mining algorithms extract patterns from converted data via some predefined models based on an expert’s hypothesis. Data mining is presumed to be an automated process producing automatic algorithms and tools without human involvement and the capability to adapt to external environment constraints… In real world data mining, the requirement for discovering actionable knowledge in constraint-based context is satisfied by interaction between humans (domain experts) and the computerized data mining system. This is achieved by integrating human qualitative intelligence with computational capability.
A Facilitator Brings this Together. Facilitation applies to a structured meeting with an objective. This does not imply a predetermined decision. The facilitator's job: keep the focus on the objective; capture what is said (parking boards, mind mapping software, etc.); prod participants with leading and follow-up questions; keep a healthy environment.
Asch’s Experiment (figure: comparison lines labeled A, B, and C)
Value Big Data Imperatives, Mohanty et al., Apress, 2013
Influence Diagrams An influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following maximum expected utility criterion) can be modeled and solved. http://en.wikipedia.org/wiki/Influence_diagram
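The maximum expected utility criterion mentioned above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the decision names, the outcome probabilities, and the utility values are hypothetical and do not come from the slides or the cited article.

```python
# Minimal sketch of the maximum expected utility criterion.
# ASSUMPTION: all probabilities and utilities below are made-up illustration values.

def expected_utility(outcome_probs, utilities):
    """Sum of probability * utility over all outcomes of one uncertainty."""
    return sum(p * utilities[outcome] for outcome, p in outcome_probs.items())

def best_decision(outcome_probs, utility_table):
    """Return the decision with the highest expected utility, plus all scores.

    utility_table maps decision -> {outcome -> utility}.
    """
    scores = {
        decision: expected_utility(outcome_probs, utilities)
        for decision, utilities in utility_table.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical uncertainty node: probability of each weather outcome.
weather_probs = {"wet": 0.3, "dry": 0.7}

# Hypothetical value node: utility of each (decision, outcome) pair.
utility_table = {
    "plant_early": {"wet": 10, "dry": 40},
    "plant_late": {"wet": 30, "dry": 20},
}

choice, scores = best_decision(weather_probs, utility_table)
print(choice, scores)
```

With these assumed numbers, the early option scores higher because the more likely outcome favors it. Changing the probabilities or utilities can flip the choice, which is why the inputs deserve scrutiny from people who know the domain.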
Components of Influence Diagrams and Decision Trees. Represents relationships & sequence of context. Node types: Decision, Uncertainty, Objective/Value.
Relevance Arrows (arcs) in Influence Diagrams. For an arc from A to B: Event A outcome is relevant to the probability of Event B outcome; outcome of Event A is known when making Decision B; Decision A is necessary to estimate the probability of Event B; Decision A is made prior to Decision B. Uncertainties become events after they happen.
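The four arc meanings above depend only on the types of the two nodes an arrow connects, so they can be written as a small lookup table. The sketch below is one reading of the slide: pairing each statement with a (source type, target type) combination is an assumption, and the names `NodeType`, `ARC_MEANING`, and `describe_arc` are hypothetical.

```python
from enum import Enum

class NodeType(Enum):
    DECISION = "decision"
    UNCERTAINTY = "uncertainty"

# ASSUMPTION: each statement from the slide is paired with the node types it mentions.
ARC_MEANING = {
    (NodeType.UNCERTAINTY, NodeType.UNCERTAINTY):
        "Outcome of event {a} is relevant to the probability of event {b}'s outcome",
    (NodeType.UNCERTAINTY, NodeType.DECISION):
        "Outcome of event {a} is known when making decision {b}",
    (NodeType.DECISION, NodeType.UNCERTAINTY):
        "Decision {a} is necessary to estimate the probability of event {b}",
    (NodeType.DECISION, NodeType.DECISION):
        "Decision {a} is made prior to decision {b}",
}

def describe_arc(a, a_type, b, b_type):
    """Return the plain-language meaning of an arc from node a to node b."""
    return ARC_MEANING[(a_type, b_type)].format(a=a, b=b)

print(describe_arc("A", NodeType.UNCERTAINTY, "B", NodeType.DECISION))
```

Reading each arrow aloud this way is a quick check during a facilitated session: if the sentence sounds wrong to the domain experts, the arrow probably is.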
Example for Farming. Influence diagram nodes: Weather Forecast, Actual Weather, Seed Recommendations, Market Preference, Select Crop, Determine Date to Plant, Optimized Growing Season Outcome.
Considerations. Take an iterative approach. Do you believe the goal can be measured? Can the influences be measured? Consider the data that supports the outcomes of decisions and events. Consider data outside of your boundaries. Does the data reveal a new influence? (Curiosity at play.) Look for noticeable patterns, such as nodes with a high degree of input.
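The last consideration, spotting nodes with a high degree of input, can be checked mechanically once a diagram is written down as a list of arcs. The sketch below uses the node names from the farming example, but the arcs themselves are an assumption: the original figure's arrows are not recoverable from the text, so the list is a hypothetical wiring for illustration. The function names are also hypothetical.

```python
from collections import Counter

# ASSUMPTION: these arcs are a guessed wiring of the farming example,
# not the arrows from the original slide. Each pair is (source, target).
ARCS = [
    ("Actual Weather", "Weather Forecast"),
    ("Actual Weather", "Optimized Growing Season Outcome"),
    ("Weather Forecast", "Determine Date to Plant"),
    ("Seed Recommendations", "Select Crop"),
    ("Market Preference", "Select Crop"),
    ("Select Crop", "Determine Date to Plant"),
    ("Select Crop", "Optimized Growing Season Outcome"),
    ("Determine Date to Plant", "Optimized Growing Season Outcome"),
]

def in_degrees(arcs):
    """Count incoming arcs per node; nodes with no inputs get 0."""
    counts = Counter(target for _, target in arcs)
    for source, _ in arcs:
        counts.setdefault(source, 0)
    return dict(counts)

def high_input_nodes(arcs, threshold):
    """Nodes whose in-degree is at least threshold, highest first."""
    degrees = in_degrees(arcs)
    hits = [node for node, degree in degrees.items() if degree >= threshold]
    return sorted(hits, key=lambda node: (-degrees[node], node))

for node in high_input_nodes(ARCS, threshold=2):
    print(node, in_degrees(ARCS)[node])
```

Nodes that surface here are the ones many influences feed into, so they are natural places to ask whether supporting data exists and whether an influence is missing.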
Also… https://www.moresteam.com/toolbox/fishbone-diagram.cfm
GQM (Goal, Question, Metric)
Recommended Reading