Data Virtualization Demoette… Data Lineage Reporting


1 Data Virtualization Demoette… Data Lineage Reporting
Hello, and welcome to the Demoette series for Cisco Information Server, or CIS. In this Demoette, we discuss Data Lineage Reporting in CIS.

2 Agenda What is it and why does it matter? A basic demo Summary
Here is our agenda. We begin by defining Data Lineage Reporting and outlining its importance for our customers. Next we walk through a very basic demo of Data Lineage Reporting. Finally, we summarize the contents of this demoette.

3 Agenda What is it and why does it matter? A basic demo Summary
Let’s begin by discussing what Data Lineage Reporting is and why it is important for our customers.

4 What is it?
CIS Developers are familiar with the Data Lineage graph that can be displayed for any resource in CIS Studio. <CLICK> In contrast, Data Lineage Reporting is a way to produce lineage analysis reports that may include many CIS resources.

5 What is it? Data Lineage Reporting CIS Stored Procedures: GetColumnDependencies, GetColumnReferences; SQL Scripts: Batch reporting on groups of resources
CIS ships with two stored procedures that form the basis of its Data Lineage Reporting capability. GetColumnDependencies provides a column-by-column report of the upstream data lineage of a CIS resource. GetColumnReferences provides a similar report of downstream data lineage. These procedures are great for reporting on any individual resource. As we shall see in this demoette, however, we can write some very simple SQL Scripts that allow us to build reports on meaningful groups of resources, and we can define these groups in any way that matters to our business.

6 Why does it matter? Data Lineage Reporting: Impact analysis; Reusability analysis; Data curation; Input to third-party software
Data Lineage reporting is important to our customers for many reasons. First of all, it enables impact analysis, as organizations seek to understand the overall enterprise impact of schema changes on underlying physical data sources. Lineage reporting may also be used for other purposes, such as uncovering CIS artifacts that may be reusable across projects, or for data curation efforts. Lineage data may also be used as input to third-party software, such as report generators.

7 Agenda What is it and why does it matter? A basic demo Summary
Next, let’s walk through a very basic demo of Data Lineage Reporting.

8 Demo: Here is the business problem… CIS Developers
Here is the business problem we are trying to solve in this demoette. Our CIS developers can use the graphical lineage view in CIS Studio to understand lineage for any given resource they need to touch. This is sufficient for their purposes.

9 Demo: Here is the business problem… Project Managers
However, Project Managers need to take a larger view, especially to do impact analysis on schema changes to underlying data sources over time. These people need a single view of lineage data that spans all the resources in a CIS Studio project-level folder, and they need it in a form that is easy to share, like the Excel spreadsheet shown here.

10 Demo: before you begin…
Before you begin the demo, you will need to install the CAR file that is found in the additional resources that accompany the demo. This demo uses CIS system tables and the Example views that ship with the product, so there is no need for any external data sources. The CAR file consists of four scripts and a view. There are two very simple scripts that show how we can report Dependencies and References for an individual resource.

11 Demo: before you begin…
We also have two more interesting scripts that show how we can report on a group of CIS resources.

12 Demo: before you begin…
Finally, we have a view that we use to provide a list of resources to our multiple-resource scripts.

13 Demo: Dependencies for a single resource
Let’s begin our demo by seeing how we can call the Dependencies stored procedure for a single resource. We have written a very simple script, whose signature is shown here. <CLICK> Our script accepts an input parameter that names the fully-qualified CIS resource we want to analyze. We also provide two other parameters required by the Dependencies procedure. The first tells whether or not we want to ignore caches in our analysis, and the second specifies whether or not we want to perform recursive analysis along the entire dependencies chain. <CLICK> The script returns an output result set defined by the Dependencies procedure. You can find the meaning of each column on the Info tab of the Dependencies procedure. Note that the CIS Dependencies procedure only analyzes Views; it does not accept requests to analyze folders, SQL Scripts, or other resource types. <CLICK> At run time, we enter the parameters for a single resource, and execute our script. <CLICK> The script returns a result set with one row for each column in the resource we specified. <CLICK> Here is a view of a single row from our results. The “derivations” column is especially interesting. It gives us a complete description of the trail from our resource back to the original data source. Of course, we could save the result set to a file, and open it in Excel if we wanted to share the information with non-CIS users.

14 Demo: References for a single resource
Our single-resource script for References is similar to that for Dependencies. The CIS References procedure is a bit simpler, so we only need to supply the name of the resource we want to analyze. <CLICK> At run time, we enter the name of a view. <CLICK> A result set is returned with one row for each column in the view. <CLICK> We can drill down on details for any row.

15 Demo: Dependencies for multiple resources
At this point, we have seen how we can use very simple scripts to call the Dependencies and References procedures in CIS. However, we still haven’t solved our business problem, because our Project Managers want to create reports containing groups of CIS Views. Let’s solve this problem next. We want to build lineage scripts that can accept a parameter list containing all of the view names for a report. This parameter view might come from an Excel spreadsheet, or from any other source. We need to be able to accept any view that contains a fully-qualified resource name. Here is an example that we will use for this demo. <CLICK> This example creates a filtered view from the CIS system table ALL_RESOURCES. <CLICK> We filter ALL_RESOURCES so that our view only returns Tables from the shared/examples folder in the CIS namespace. Remember that when CIS views are published, they are exposed as tables, so the CIS system data uses the term table. Our resources, however, are actually CIS virtual views. <CLICK> Finally, we generate a fully-qualified resource name column by concatenating the path and name columns for each selected view. <CLICK> When we execute this view, we get information about all of the Views in the shared/examples folder that ships with CIS. <CLICK> We will use our generated column, FULLY_QUALIFIED_RESOURCE_NAME, as input to our new scripts.
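For readers who want to experiment with the idea of a parameter view outside CIS, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for ALL_RESOURCES. The column names (RESOURCE_NAME, RESOURCE_TYPE, PARENT_PATH), the view name ResourceList, and the sample rows are assumptions made for illustration; they are not taken from the CIS system table definition or from the CAR file.

```python
import sqlite3


def build_parameter_view(conn):
    """Create a stand-in ALL_RESOURCES table and a filtered view over it."""
    # Hypothetical schema: the real CIS system table may use other columns.
    conn.execute(
        "CREATE TABLE ALL_RESOURCES ("
        "RESOURCE_NAME TEXT, RESOURCE_TYPE TEXT, PARENT_PATH TEXT)"
    )
    # Invented sample rows, used only to exercise the filter.
    conn.executemany(
        "INSERT INTO ALL_RESOURCES VALUES (?, ?, ?)",
        [
            ("ViewOrder", "TABLE", "/shared/examples"),
            ("ViewSales", "TABLE", "/shared/examples"),
            ("LookupProduct", "PROCEDURE", "/shared/examples"),
            ("OtherView", "TABLE", "/shared/other"),
        ],
    )
    # Keep only tables in one folder, and build the fully-qualified name
    # by concatenating the path and name columns.
    conn.execute(
        """
        CREATE VIEW ResourceList AS
        SELECT RESOURCE_NAME,
               PARENT_PATH,
               PARENT_PATH || '/' || RESOURCE_NAME
                   AS FULLY_QUALIFIED_RESOURCE_NAME
        FROM ALL_RESOURCES
        WHERE RESOURCE_TYPE = 'TABLE'
          AND PARENT_PATH = '/shared/examples'
        """
    )


conn = sqlite3.connect(":memory:")
build_parameter_view(conn)
names = [
    row[0]
    for row in conn.execute(
        "SELECT FULLY_QUALIFIED_RESOURCE_NAME FROM ResourceList ORDER BY 1"
    )
]
print(names)
# ['/shared/examples/ViewOrder', '/shared/examples/ViewSales']
```

The procedure row and the row from the other folder are filtered out, so only the two example tables appear in the generated column.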

16 Demo: Dependencies for multiple resources
Here is the signature for our new general-purpose Dependencies script. <CLICK> As you can see, it returns the same data structure as our earlier script. <CLICK> However, instead of accepting a single input view name, it accepts the name of a view that contains a list of views we want to analyze. It also accepts the column name we want to use within that view that contains fully-qualified resource names.

17 Demo: Dependencies for multiple resources
Our new script builds a dynamic SQL statement using the view name and column name we supplied as input parameters. <CLICK> It then iterates over the results, and calls the Dependencies procedure once for each view in our parameter result set.
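The pattern just described (build a dynamic query from a view name and a column name, then call the lineage procedure once per row) can be sketched in Python as follows. This is an illustration of the control flow only, not the script from the CAR file: `get_column_dependencies` is a hypothetical stub standing in for the CIS Dependencies procedure, the row fields it returns are invented, and SQLite stands in for the CIS query engine.

```python
import sqlite3


def get_column_dependencies(resource_path, ignore_caches, recursive):
    """Hypothetical stub for the CIS Dependencies procedure.

    A real call would return one row per column of the resource.
    """
    return [
        {"resource": resource_path, "column": "(stub)", "derivations": "(stub)"}
    ]


def report_dependencies(conn, view_name, column_name,
                        ignore_caches=True, recursive=True):
    """Run the dependencies stub for every resource listed in a view."""
    # Identifiers cannot be bound as query parameters, so validate them
    # before placing them into the dynamic SQL text.
    for ident in (view_name, column_name):
        if not ident.replace("_", "").isalnum():
            raise ValueError(f"unsafe identifier: {ident!r}")
    sql = f'SELECT "{column_name}" FROM "{view_name}"'

    report = []
    for (resource_path,) in conn.execute(sql):
        # One procedure call per listed resource; rows are appended so the
        # caller receives a single combined report.
        report.extend(
            get_column_dependencies(resource_path, ignore_caches, recursive)
        )
    return report


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ResourceList (FULLY_QUALIFIED_RESOURCE_NAME TEXT)")
conn.executemany(
    "INSERT INTO ResourceList VALUES (?)",
    [("/shared/examples/ViewOrder",), ("/shared/examples/ViewSales",)],
)
report = report_dependencies(
    conn, "ResourceList", "FULLY_QUALIFIED_RESOURCE_NAME"
)
print(len(report), "rows in combined report")
```

Validating the identifiers is a design choice of this sketch; the source does not say how the original script guards its dynamic SQL.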

18 Demo: Dependencies for multiple resources
As our results tab shows, we get a single report that contains lineage analysis for all of the tables in our parameter view. <CLICK> We can save the results to a file, and open it in Excel for sharing and further analysis.
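If the combined report is available as rows in a program rather than in the Studio results tab, saving it for Excel could look like the sketch below. The row fields (resource, column, derivations) and the sample values are assumptions for illustration; the actual result set columns are those defined by the CIS procedure.

```python
import csv
import io


def write_report_csv(rows, fileobj):
    """Write a list of dict rows as CSV, with a header row."""
    fieldnames = list(rows[0].keys()) if rows else []
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample rows standing in for a lineage result set.
sample_rows = [
    {
        "resource": "/shared/examples/ViewOrder",
        "column": "OrderID",
        "derivations": "(stub)",
    },
    {
        "resource": "/shared/examples/ViewSales",
        "column": "ProductID",
        "derivations": "(stub)",
    },
]

# An in-memory buffer keeps the example free of side effects. To write a
# real file instead, use:
#   open(path, "w", newline="", encoding="utf-8-sig")
# The utf-8-sig encoding adds a byte-order mark that helps Excel detect UTF-8.
buffer = io.StringIO()
write_report_csv(sample_rows, buffer)
lines = buffer.getvalue().splitlines()
print(lines[0])
# resource,column,derivations
```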

19 Demo: References for multiple resources
Our new References script is similar to the Dependencies script. It accepts a view name and column name for any view that contains a list of resources to be analyzed. <CLICK> When we execute the procedure, we get References analysis for all of the views in our parameter set. Our Project Managers now have all the information they need to do impact analysis at the project level. Our demo is complete.

20 Agenda What is it and why does it matter? A basic demo Summary
Let’s summarize what we have seen in this presentation.

21 Summary CIS Lineage Procedures: GetColumnDependencies, GetColumnReferences; SQL Scripts: Batch reporting on groups of resources; Benefits: Impact analysis, Reusability analysis, Data curation, Input to third-party software
CIS ships with two stored procedures that form the basis of its Data Lineage Reporting capability. GetColumnDependencies provides a column-by-column report of the upstream data lineage of a CIS resource. GetColumnReferences provides a similar report of downstream data lineage. As we have seen, it is easy to build SQL Scripts that allow us to report on meaningful groups of resources, and we can define these groups in any way that matters to our business. Data Lineage reporting is important to our customers for many reasons. First of all, it enables impact analysis, as organizations seek to understand the overall enterprise impact of schema changes on underlying physical data sources. Lineage reporting may also be used for other purposes, such as uncovering CIS artifacts that may be reusable across projects, or for data curation efforts. Lineage data may also be used as input to third-party software, such as report generators. Thank you.

22 TOMORROW starts here.

