

Lab 2

Exploring the Data with Graphs

During data mining, it is often useful to explore the data by creating visual summaries. Clementine offers several different types of graphs to choose from, depending on the kind of data that you want to summarize.

For example, to find out what proportion of the patients responded to each drug, use a Distribution node.

Place a Distribution node in the workspace and connect it to the Source node (don't forget to use your middle mouse button). Then double-click the Distribution node to open its dialog box and set the display options.

Select Drug as the target field whose distribution you want to show. Then click Execute in the dialog box.

The distribution graph helps you see the "shape" of the data. It shows that patients responded to drug Y most often and to drugs B and C least often.
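Outside Clementine, the same kind of summary can be sketched in a few lines of Python. This is only an illustration of what a distribution graph computes, not Clementine's implementation; the records below are hypothetical sample data, and the helper name distribution is an assumption made for this example.

```python
from collections import Counter

# Hypothetical sample records. The field name "Drug" follows the lab,
# but the values are made up for illustration only.
records = [
    {"Drug": "drugY"}, {"Drug": "drugX"}, {"Drug": "drugY"},
    {"Drug": "drugA"}, {"Drug": "drugY"}, {"Drug": "drugX"},
    {"Drug": "drugB"}, {"Drug": "drugC"},
]


def distribution(rows, field):
    """Return (value, count, proportion) tuples, most frequent first."""
    counts = Counter(row[field] for row in rows)
    total = sum(counts.values())
    return [(value, n, n / total) for value, n in counts.most_common()]


drug_distribution = distribution(records, "Drug")

# Print one text bar per category, similar in spirit to a distribution graph.
for value, n, share in drug_distribution:
    print(f"{value:<6} {n:>3} {share:6.1%} {'#' * n}")
```

Each row of the result pairs a category with its count and its share of all records, which is the information the bars of a distribution graph convey.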

Now let's look more closely at what factors might influence Drug, the target variable. As a researcher, you know that the concentrations of sodium and potassium in the blood are important factors. So let's create another graph, this time looking at how the Na and K values influence the choice of drug.

Since these are both numeric values, you can create a scatterplot of sodium versus potassium, using the drug categories as a color overlay.

Place a Plot node in the workspace and connect it to the Source node. (Remember to drag with your middle mouse button.) Then, double-click the Plot node to open its dialog box.

Select K as the X field, Na as the Y field, and Drug as the overlay field. Then, click Execute.

Note: You can also create the plot by clicking the Execute button in the dialog box.

The plot clearly shows a threshold above which the correct drug is always drug Y and below which the correct drug is never drug Y. This threshold is a ratio: the ratio of sodium (Na) to potassium (K).
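A ratio threshold appears in a scatterplot as a straight line through the origin: for positive K, the condition Na/K > c is the same as Na > c * K. The following Python sketch checks which side of such a line a point falls on. The function name, the sample points, and the threshold value are all assumptions chosen for illustration; they are not taken from the lab's data set.

```python
def side_of_ratio_line(na, k, threshold):
    """Return 'above' if the point lies above the line Na = threshold * K.

    For k > 0, na > threshold * k is equivalent to na / k > threshold,
    so a ratio threshold is a straight line through the origin.
    """
    if k <= 0:
        raise ValueError("K must be positive")
    return "above" if na > threshold * k else "below"


# Hypothetical threshold and (K, Na) points, for illustration only.
THRESHOLD = 15.0
points = [(0.03, 0.79), (0.06, 0.74), (0.05, 0.80), (0.07, 0.56)]

sides = [side_of_ratio_line(na, k, THRESHOLD) for k, na in points]
for (k, na), side in zip(points, sides):
    print(f"K={k:.2f} Na={na:.2f} -> {side} the line Na = {THRESHOLD} * K")
```

With K on the X axis and Na on the Y axis, as in the plot, points with a high sodium-to-potassium ratio sit above the line and points with a low ratio sit below it.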

So far, you have been exploring the data using graphs. Next, we'll move on to data preparation, where we'll perform a common data mining operation: deriving a new field.

Before moving on, you may want to clean up the workspace. Delete the two Graph nodes and the Table node. To delete a node, right-click on it and choose Delete from the context menu. Or, select multiple nodes with your mouse and press the Delete key.

Since the ratio of sodium to potassium seems to predict when to use drug Y, you should derive a field that contains the value of this ratio for each record. This field might be useful later when you build a model to predict when to use each of the five drugs.

To derive a new field, start by inserting a Derive node into the stream.

Remember, you can automatically connect nodes by first selecting the Source node on the canvas and then double-clicking the Derive node in the palette.

Then, double-click the Derive node to open its dialog box and specify a method for creating the new field.

Name the new field Na_to_K. Since you obtain the new field by dividing the sodium value by the potassium value, enter Na/K for the formula. You can also create a formula by clicking the icon just to the right of the field.
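Conceptually, deriving a field adds one computed value to every record. A minimal Python sketch of that idea follows. The helper derive_field and the two sample records are hypothetical and only mirror the Na/K formula from the lab; this is not how the Derive node itself is implemented.

```python
def derive_field(rows, name, formula):
    """Return new rows, each with an extra field computed by formula(row).

    The input rows are left unchanged.
    """
    return [{**row, name: formula(row)} for row in rows]


# Hypothetical records, with values chosen so the ratios come out exact.
records = [
    {"Na": 0.75, "K": 0.0625, "Drug": "drugX"},
    {"Na": 0.50, "K": 0.03125, "Drug": "drugY"},
]

# Same formula as in the lab: sodium divided by potassium.
derived = derive_field(records, "Na_to_K", lambda row: row["Na"] / row["K"])

for row in derived:
    print(row)
```

Returning new rows instead of modifying the originals keeps the source data intact, much as a Derive node adds a field downstream without altering the Source node.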

This opens the Expression Builder, a way to interactively create expressions using built-in lists of functions, operands, and fields and their values.

The Expression Builder is covered in depth later in this guide.

You can check the distribution of your new field by attaching a Histogram node to the Derive node. In the Histogram node dialog box, specify Na_to_K as the field to be plotted and Drug as the overlay field.

When you execute the stream, you should get the graph shown here. Based on the display, you can conclude that when the Na_to_K value is about 15 or above, drug Y is the drug of choice.
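A histogram with an overlay field counts records per bin and splits each count by category. The Python sketch below shows the idea under stated assumptions: the function name, the bin width of 5, and the sample records are all hypothetical, and the sample was constructed to be consistent with the conclusion above rather than drawn from the lab's data.

```python
from collections import Counter, defaultdict


def histogram_with_overlay(rows, field, overlay, bin_width):
    """Count rows per bin of `field`, split by the `overlay` category.

    Returns a dict mapping each bin's lower edge to a Counter of categories,
    ordered by lower edge.
    """
    bins = defaultdict(Counter)
    for row in rows:
        lower_edge = (row[field] // bin_width) * bin_width
        bins[lower_edge][row[overlay]] += 1
    return dict(sorted(bins.items()))


# Hypothetical records, for illustration only.
records = [
    {"Na_to_K": 8.0, "Drug": "drugX"},
    {"Na_to_K": 11.5, "Drug": "drugA"},
    {"Na_to_K": 13.0, "Drug": "drugC"},
    {"Na_to_K": 16.0, "Drug": "drugY"},
    {"Na_to_K": 22.5, "Drug": "drugY"},
    {"Na_to_K": 27.0, "Drug": "drugY"},
]

BIN_WIDTH = 5.0
hist = histogram_with_overlay(records, "Na_to_K", "Drug", BIN_WIDTH)

for lower_edge, counts in hist.items():
    print(f"[{lower_edge:4.0f}, {lower_edge + BIN_WIDTH:4.0f})  {dict(counts)}")
```

In this made-up sample, every bin starting at 15 or above contains only drug Y, which is the pattern to look for in the overlaid histogram.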

So far, by exploring and manipulating the data, you have been able to form some hypotheses. The ratio of sodium to potassium in the blood seems to affect the choice of drug. But you cannot fully explain all of the relationships yet.

This is where modeling will likely provide some answers.