B.Ramamurthy Partially Based on Ben Jones Book [1]


1 Communicating Data
B. Ramamurthy
Partially based on Ben Jones's book [1]
Rich's Big Data Training 11/7/2018

2 Review
We spent most of Sessions 1, 2 and 3 with the R data analysis software and the RStudio integrated development environment: packages, libraries, plots, charts, maps and external data access, along with many exploratory data analysis exercises.
In Session 4 we looked at Amazon cloud services; we will revisit this in our next session.
In Session 5 we looked at JavaScript and the expressive JS libraries d3.js, three.js and jQuery.
Today, in Session 6, we move to a level of abstraction above all of these, with two diverse approaches to data analytics and visualization: Gephi and Tableau.

3 Overview
In this session we will learn how to communicate data with tools such as Gephi and Tableau.
We will begin with Gephi, which is focused on networks and graphs, and then work with Tableau, which is quite broad in its application area.
Gephi and Tableau are somewhat complementary to each other.

4 Gephi
Gephi is free, open-source software for analyzing and visualizing networks and graphs, detecting communities and discovering relationships.
Gephi was originally a contribution of an open-source community in France.
It is especially useful for social network data analysis.
Download Gephi for Windows, or the latest version.

5 Gephi Applications
Analyzing social networks
Detecting communities
Dynamic networks
Twitter data analysis
Text network analysis (“mine the talk”)

6 Gephi Data and Algorithms
Gephi data is very simple and comes in three types: nodes, edges and attributes.
An edge always connects two nodes; attributes are data associated with nodes or edges, such as a string or an integer.
The structure of nodes and edges is called the network topology. Attributes are called network data.
Gephi uses well-known algorithms to create the graph and to perform graph operations. Information about the algorithm in use is available at every step through the information icon.
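The distinction between topology and network data can be sketched in a few lines of Python. This is only an illustration of the data model described above, not Gephi's internal representation; the node names, attribute names and the `degree` helper are all made up for the example.

```python
# Topology: nodes and the edges that connect pairs of nodes.
nodes = ["alice", "bob", "carol"]            # hypothetical node ids
edges = [("alice", "bob"), ("bob", "carol")]

# Network data: attributes attached to nodes or to edges (hypothetical values).
node_attributes = {
    "alice": {"department": "sales", "age": 34},
    "bob": {"department": "support", "age": 41},
    "carol": {"department": "sales", "age": 29},
}
edge_attributes = {
    ("alice", "bob"): {"weight": 3},
    ("bob", "carol"): {"weight": 1},
}


def degree(node):
    """Count the edges that touch a node, using the topology only."""
    return sum(node in edge for edge in edges)


# Every edge must join two known nodes.
assert all(a in nodes and b in nodes for a, b in edges)

print({n: degree(n) for n in nodes})
```

Note that `degree` never looks at the attribute dictionaries: structural questions depend on the topology, while the attributes are what a tool can use for labeling, coloring or filtering.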

7 Creating Gephi Dataset
At the fundamental level, Gephi needs two sets of data: nodes and edges.
Gephi accepts a variety of input formats; we will look at graph data represented by (i) an XML file with node and edge tags, and (ii) a pair of CSV files, one for nodes and one for edges.
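As a rough sketch of option (ii), the following Python builds the text of a node table and an edge table. The data is invented, and the column headers (Id, Label, Source, Target, Weight) are an assumption based on the headers Gephi's spreadsheet import commonly recognizes; check them against the Gephi version you use.

```python
import csv
import io


def write_table(header, rows):
    """Return CSV text for one table: a header row followed by data rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical data: three people and two relationships between them.
nodes_csv = write_table(["Id", "Label"],
                        [[1, "Alice"], [2, "Bob"], [3, "Carol"]])
edges_csv = write_table(["Source", "Target", "Weight"],
                        [[1, 2, 3], [2, 3, 1]])

print(nodes_csv)
print(edges_csv)
```

Saving each string to its own file (for example nodes.csv and edges.csv, names chosen here for illustration) gives the two files that would then be imported through the Data Laboratory.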

8 Gephi Tool Layout
Overview, Data Laboratory, Preview

9 Gephi Workflow
Data Laboratory facilitates importing data into the Gephi workspace.
Overview provides various options for creating and manipulating the basic graph: clustering, ranking, labeling and filtering.
The Preview panel provides aesthetics and export features for capturing the graph created.

10 Force Atlas Analysis
Graphs are drawn based on similarities and differences (no similarities) in the data.
Settings can be customized to place more emphasis on either the nodes' independence from one another or their relative (semantic) proximity to one another.
For example, you can specify Attraction Strength and Repulsion Strength: the former brings out similarities when the graph is created, and the latter adjusts the graph for dissimilarities. Attraction pulls similar nodes together toward the center, while repulsion drives dissimilar nodes to the perimeter.
Clusters/communities can be identified, and actions can then be taken to target specific communities for a sales campaign or similar activities.
Reference: ForceAtlas2_Paper.pdf.
For those with a statistical background, the tool offers many models along with detailed analysis.
Other layout algorithms: Fruchterman-Reingold, Yifan Hu.
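The interplay of attraction and repulsion can be illustrated with a deliberately simplified force-directed layout in Python. This is a sketch under stated assumptions, not the ForceAtlas or ForceAtlas2 algorithm as implemented in Gephi: repulsion between every pair of nodes is taken as `repulsion / distance`, attraction along each edge as `attraction * distance`, and the `layout` function, its parameters and the four-node graph are all hypothetical.

```python
import math


def layout(nodes, edges, positions, attraction=1.0, repulsion=1.0,
           step=0.05, iterations=200):
    """Simplified force-directed layout (illustrative, not ForceAtlas2)."""
    pos = {n: list(positions[n]) for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair pushes apart, more strongly when close.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                push = repulsion / dist
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Attraction: each edge pulls its endpoints together,
        # more strongly when they are far apart.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move every node a small step along its net force.
        for n in nodes:
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return pos


def distance(pos, a, b):
    """Euclidean distance between two placed nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Two tiny "communities": a-b are linked, c-d are linked, nothing joins them.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("c", "d")]
start = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (1.0, 1.0)}

final = layout(nodes, edges, start)
print("linked pair a-b:  ", round(distance(final, "a", "b"), 2))
print("unlinked pair a-c:", round(distance(final, "a", "c"), 2))
```

All four nodes start on the corners of a unit square, so linked and unlinked neighbors are equally far apart at first. After the iterations the linked pairs stay close while the two pairs drift away from each other, which is the visual separation into clusters that the slide describes.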

11 Application of Gephi
Gephi graph analysis is appropriate for applications that need to discover clusters of people: customers, employees, candidates, salespeople, etc.
Examples: customer relationship management, customer segmentation for targeted sales campaigns, employee/people resource clusters for various business activities, election campaigns and fundraising campaigns (development).

12 Gephi Exercises
We introduce the basic features of Gephi using a data set from a digital humanities project. This is a partial project that simply introduces the main features.
Exercise 2 is a complete network example with a known data set of the Les Miserables Broadway show.

13 Tableau
Based on the book by B. Jones [1]

14 Outline
There is a huge opportunity to find and share the insights contained in data: “data-driven” applications.
Communication involves numbers, words, images and videos.
There are challenges: meaningful? fidelity? appeal? engaging? useful? breathtaking?
Tableau Software has developed a visualization querying engine and user interface that make it easier to discover and communicate with data.
It frees the data from tables and spreadsheets, which were originally meant to be an input medium.
Tableau is for everyone; there is no need to know a programming language.
Tableau Desktop can connect to a wide variety of data sources: relational databases, cloud sources, Hadoop technologies, etc.
Available only for the Windows operating system.

15 Data
Data refers to any kind of factual information that can be stored and digitally transmitted: news articles, financial information in tables, databases and so on.
Communicating data is an important step in the data discovery process, as shown in the next slide.

16 The discovery process
Question
Gathering data
Structuring data
Exploring data
Communicating data

17 Discovery process (contd.)
This is a highly iterative process that begins with a question; the question is domain-specific.
Specific question, such as “which combination of products occurs most often?”
General question, such as “what can we learn about historical sales of our products?”
Gathering data:
Internal or external sources
Buy the data, or gather it yourself through feeds and APIs or from free data available online (R data, Amazon data)
The Data Science book we used for earlier sessions gives quite a few sources for gathering data
Verify the sources for reliability and fidelity
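To make the specific question concrete, here is a minimal Python sketch that counts which pair of products occurs together most often. The transactions are invented for illustration, and restricting the count to pairs (rather than larger combinations) is a simplifying assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each is the set of products bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

pair_counts = Counter()
for basket in transactions:
    # Sort so ("bread", "milk") and ("milk", "bread") count as the same pair.
    pair_counts.update(combinations(sorted(basket), 2))

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)  # ('bread', 'milk') 3
```

A question phrased this precisely dictates what data has to be gathered: one record per transaction, listing the products it contains.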

18 Discovery Process (contd.)
Data structuring: this is an arduous process often referred to as “data wrangling” or “data munging”.
It involves cleaning up tags and fillers and filtering out unwanted data.
Data is formatted, shaped, merged, converted and made ready for the data exploration step.
We looked at this with an R exercise in Session 3.
Our Data Science book has many examples: see the example using data extracted via the NYTimes API in Chapter 5.
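A small Python sketch of the cleaning and filtering steps might look like the following. The raw records and the list of filler values are hypothetical, and the regular expression is a naive way to strip tags that is adequate for a demonstration only; real markup calls for a proper parser.

```python
import re

# Hypothetical raw records as they might arrive from a feed or a scrape.
raw_records = [
    "<p>Sales up 5%</p>",
    "   ",
    "N/A",
    "<b>New  store</b> opened in <i>Buffalo</i>",
]

# Values treated as fillers (an assumption made for this example).
FILLERS = {"", "n/a", "null", "-"}


def clean(text):
    """Strip markup tags and collapse runs of whitespace."""
    no_tags = re.sub(r"<[^>]+>", "", text)
    return re.sub(r"\s+", " ", no_tags).strip()


cleaned = [clean(record) for record in raw_records]
usable = [record for record in cleaned if record.lower() not in FILLERS]

print(usable)  # ['Sales up 5%', 'New store opened in Buffalo']
```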

19 Discovery Process (contd.)
Exploring data: data is viewed and analyzed from various points of view until one or more insights are gleaned. This exploration provides the insights/discoveries/knowledge/quantitative results.
Communicating data involves representing the discoveries in a form that decision makers can easily understand.

20 Six principles of communicating data [1]
Know your goal
  Who? Target audience
  What? Intended meaning
  Why? Desired effect
Use the right data
  Does not have to be big data, but the right data. Example: the story of a single data point 14.
  Right amount of data: big or small
  Ethically and legally collected
Select suitable visualizations
  Quantitative, ordinal and nominal data types each demand different types of visualization
  Choices: position, length, angle, area, grey ramp, color ramp, color hue, shape, maps

21 Six Principles (contd.) [1]
Design for aesthetics (of course)
Choose an effective medium and channel
  Medium: the form the message takes
  Channel: how it gets delivered
Check the results
  Check the reach, understanding and impact

22 Tableau
Tableau is drag-and-drop analysis and visualization software.
It is a level of abstraction above d3.js, three.js and R in that it requires no programming.
The learning curve for Tableau is gentle; one can quickly ramp up and create useful and impressive visuals and analytics.

23 Main Components of Tableau
Workbook
Worksheet: data sources, plots, charts
Dashboard(s): a single interactive visual with one or more worksheets
Story: a sequence of interactive visuals with one or more dashboards and worksheets, with navigation facilitating presentation

24 Dimensions and Measures
When a user connects to a data source, Tableau automatically classifies each field as either a Dimension or a Measure.
Dimensions are fields that are used to group or categorize the data.
  Examples: Country, State, Measure Names
Measures are fields that can be used to compute, for example by summing or averaging.
  Examples: Area, Population, Latitude, Longitude, Measure Values
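The roles of the two field types can be imitated in Python: a dimension supplies the groups and a measure supplies the numbers to aggregate. This is only a sketch of the idea, not how Tableau works internally; the rows contain made-up values and the `aggregate` helper is hypothetical.

```python
from collections import defaultdict

# Made-up rows: Country and State act as dimensions,
# Population and Area act as measures.
rows = [
    {"Country": "A", "State": "A1", "Population": 10, "Area": 100},
    {"Country": "A", "State": "A2", "Population": 30, "Area": 300},
    {"Country": "B", "State": "B1", "Population": 5, "Area": 50},
]


def aggregate(rows, dimension, measure, how="sum"):
    """Group rows by a dimension and aggregate a measure within each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    if how == "sum":
        return {key: sum(values) for key, values in groups.items()}
    if how == "avg":
        return {key: sum(values) / len(values) for key, values in groups.items()}
    raise ValueError(f"unsupported aggregation: {how}")


print(aggregate(rows, "Country", "Population"))   # {'A': 40, 'B': 5}
print(aggregate(rows, "Country", "Area", "avg"))  # {'A': 200.0, 'B': 50.0}
```

Dragging a dimension and a measure onto a worksheet amounts to a request of this shape: pick the grouping field, pick the numeric field, pick the aggregation.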

25 Usage of Tableau
An excellent tool for team interaction: it encourages discussion during team meetings to explore “what if” questions. No prepared dashboard or story is needed, just data exploration.
Dashboards enable you to communicate facts to your management team, or to your customers via your web page. Example: create a dashboard and display it on your web page, let your audience interact with it, and monitor their interest.
Story: lets you communicate results to any audience, specifically clients, decision makers, the sales force and upper management.

26 Tableau Exercises
Exercise 3 introduces the main features, basic plots and “worksheet” of Tableau using world data about GDP and population.
Exercise 4 is a comprehensive example covering most features of Tableau, using an interesting real data set of the top 100 NHL point scorers.
Exercise 5 continues with the same NHL data, with a focus on preparing a Tableau “Dashboard”.
The final exercise is on designing a Tableau “Story” using the world data on GDP and population.

27 Summary
We studied principles and methods for communicating data.
More specifically, we looked at Gephi for network/graph analysis and Tableau for drag-and-drop data analytics and visualization.
We also worked on complete examples illustrating the features of the two tools.

28 References
[1] B. Jones. Communicating Data with Tableau: Designing, Developing, and Delivering Data Visualizations. O’Reilly, 2014.

