1
The ABCs of SSIS!
Glenda Gable
Email: ggable.it@tig313.com
Blog: sql313.com
LinkedIn: linkedin.com/in/tig313

Welcome to a crash course on the ABCs of SSIS. I am Glenda Gable. My contact information is listed here and will also be available at the end, in case you have further questions. Speaking of questions, I will take a few right before the demo if there are any, but the majority will be held until the end. I tried to pack quite a bit in, knowing there is a recorded version you can reference later when you need it. The URL is on the resources slide.
2
Agenda
This presentation shows the basics of SSIS to help with automating database tasks, such as maintenance, importing and exporting data, or ETL transactions. The discussion will start with an understanding of when SSIS should be used vs. when a database object, such as a view or stored procedure, should be used. You will also see how to create a basic package, how to use the built-in logging and configurations, and a view of a decent sized ETL used for creating a data warehouse for BI. Lastly, we will talk about the different ways SSIS is used, based on the role of the person using it, and the importance of organizing the overall SSIS structure.
- What is SSIS?
- When to use SSIS?
- SSIS versus T-SQL
- Synchronous vs. Asynchronous
- Non, Semi and Full Blocking
- Demo!!
- View a decent sized ETL used for creating BI warehouse
- Importance of overall ETL organization

As for the agenda, we are going to go through SSIS step by step, so you can see how useful it is as a tool for automating database tasks (maintenance plans, importing/exporting data, etc.). I have listed the topics here so you can see the order of what we will cover. Personally, I love events like this because these short sessions give me enough to start out. Then I can research and grow my knowledge on my own. I tried to take that approach with this presentation: throw enough info at you to get you started, give you some things to think about, and give you resources to find out more.
3
What is SSIS?
Microsoft Definition: Microsoft Integration Services is a platform for building high performance data integration solutions, including extraction, transformation, and load (ETL) packages for data warehousing. Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, the Integration Services service, for administering package execution and storage; and application programming interfaces (APIs) for programming the Integration Services object model.
Origins and How to Develop in SSIS:
- For SQL Server 2000, SSIS didn’t exist; instead, Data Transformation Services (DTS) was used.
- SSIS was introduced in SQL Server 2005. It isn’t just a new version of DTS; it was actually built from the ground up as a separate application.
- For SQL 2005, 2008 and 2008 R2, all development of SSIS packages happens in Business Intelligence Development Studio (BIDS), from within Visual Studio.
- For SQL 2012, SSIS development occurs in SQL Server Data Tools (SSDT).

SSIS stands for SQL Server Integration Services. I am not going to read the official definition, but it is mainly used for data warehousing, although it can be useful for other things as well. Being able to perform workflow functions and to extract, import, clean, aggregate, merge, and copy data, and then debug all of it, is great. Although the debugging isn’t as robust as I personally would like (and I don’t think I am alone in that), it is much better than nothing. SQL Server introduced SSIS in the 2005 version. Before that, DTS was the only tool to use. SSIS was a big step up from DTS. From what I have heard, 2012 has made another big step up in many ways.
I haven’t used 2012 yet – but I am chomping at the bit for it!!
4
When to use SSIS?
- Performance
- Hardware
- SQL Server version
- Capabilities/Features
- Data Sources/Destinations
- Development/Maintenance/Upgrade considerations
- Extensibility
- What are you and/or your co-workers comfortable with?
James Serra - Vincent Rainardi - Dan English - Martin Schoombee -

I wanted to discuss the reasons for using SSIS, not just show you how to use it, mostly because there are so many valid differences of opinion on this topic. I think most agree that each situation should be evaluated independently, but the items listed here are the main considerations when you need the type of functionality that SSIS provides. In this presentation, I am talking about using either SSIS or T-SQL. I will go into each topic listed in more detail.
5
Performance
SSIS uses server memory to do manipulation, whereas T-SQL uses the SQL engine. Some things are done much quicker and easier in T-SQL. For example, a JOIN statement in T-SQL is quicker than a lookup transformation in SSIS. There are ways of using T-SQL within SSIS to leverage both, but the performance should be compared to see what works best in your environment.
Hardware; SQL Server version
What does your hardware look like? Can it support memory intensive transactions without impacting performance, or did your company spend more on disks? Also, which version are you using? On SQL 2000, for example, SSIS doesn’t exist (psst... perfect excuse to upgrade if you can!).
Capabilities/Features; Data Sources/Destinations
Some things are much easier to do in T-SQL and others in SSIS. Each has its own set of capabilities that should be leveraged. For example, importing files is much easier in SSIS. Along the same lines, knowing where the data is coming from and where it is going helps decide which is the better option. With multiple data sources and types of sources, such as Oracle, XML, files, etc., SSIS is better equipped than T-SQL. The complexity of the task can also make an impact; T-SQL has advantages here.
Development/Maintenance/Upgrade considerations
Just because SSIS has a graphical interface doesn’t mean it is always quicker to use for development; sometimes T-SQL is faster. For instance, if you have several tables to combine, it may take longer to drag and drop the sources and configure the unions than it would to write the full select statement in T-SQL. Also, think about the likelihood that things will need to be “fixed” when SQL Server is upgraded. An example of this is T-SQL code that has to be changed because a feature changed in the upgrade.

Performance: Later on, I will talk briefly about a script component that helps gauge performance very well.
Hardware: Did your business spend more money on disks and not as much on memory, or vice versa?
Development/Maintenance/Upgrade considerations: Also consider that when a new column is added, the T-SQL or database object (view, stored procedure, etc.) might break (if you used select * ), but the SSIS package won’t break. A notification will be displayed or logged indicating that the data structure is out of sync, but the package will still work without a change. Now of course, if you didn’t use select * (which I would fuss at you for doing), you won’t have a problem with T-SQL or database objects either.
Extensibility: In SSIS, the C# script task allows soooo much more to be accomplished, both database and non-database related tasks, that it is a big consideration. T-SQL is limited to database tasks only. SSIS also has built-in logging features, which T-SQL doesn’t have.
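The select * point from the notes above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database as a stand-in for SQL Server, and the `source` and `target` tables and the function names are hypothetical. It shows a load written with select * failing once a column is added to the source, while the same load with an explicit column list keeps working.

```python
import sqlite3


def load_with_star(conn):
    # Relies on source and target having exactly the same column layout.
    conn.execute("INSERT INTO target SELECT * FROM source")


def load_with_columns(conn):
    # Names every column, so extra columns in source are simply ignored.
    conn.execute("INSERT INTO target (id, name) SELECT id, name FROM source")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE source (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO source VALUES (1, 'a')")

    load_with_star(conn)  # works while both tables have two columns

    # Someone adds a column to the source table.
    conn.execute("ALTER TABLE source ADD COLUMN created TEXT")

    try:
        load_with_star(conn)
        star_broke = False
    except sqlite3.OperationalError:
        # The column counts no longer match, so the statement is rejected.
        star_broke = True

    load_with_columns(conn)  # still works after the schema change
    rows_loaded = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
    conn.close()
    return star_broke, rows_loaded
```

Calling `demo()` returns a flag saying whether the select * load broke and the number of rows that made it into `target`. The exact error raised by SQL Server would differ; only the failure mode is the point here.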
6
What are you and/or your co-workers comfortable with?
Lastly, what is your environment going to be able to work with most comfortably? Does your company have certain standards or best practices that will dictate which tool is to be used? What do you feel comfortable receiving from a co-worker with no verbal instructions at all? This can be a scary thing to think about: how many developers/administrators comment or document as well as they should? It’s almost like documentation and testing are bad words. Everyone has had something dropped in their lap that they didn’t know how to do (if you haven’t, you will, lol). If everyone is using something that is commonly known, it takes some of the pressure off when tasks get assigned with an URGENT deadline.
Consider the following situations:
- You are on vacation and some high-up executive (i.e. the wicked witch) has changed priorities and there is work to be done IMMEDIATELY! (EEEK!!)
- You are on vacation and something breaks (ugggg…). It’s better to have something that someone else can fix, so you don’t have to dial in from the beach, the pool, or the slopes!
- Someone left the company and you are the “lucky” soul that inherited their work.
7
SSIS versus T-SQL
All in all, each situation should warrant a review of what is best to use. I tried to show some advantages of each, and possible thoughts to follow, without showing preferential treatment for one over the other. There are MANY debates about which is better. There is no right answer. I have a few URLs at the end to check out if you want to read more. You can also google: when to use ssis vs tsql
8
Synchronous vs. Asynchronous
The difference between synchronous and asynchronous components is whether the output requires a new buffer to be created or the existing buffer can be reused.
- Synchronous components: use the same buffer; # of records IN equals # of records OUT
- Asynchronous components: create a new buffer; # of records IN may or may not equal # of records OUT
Synchronous is faster, and MUCH preferred. However, most of the time a transformation is chosen for its capabilities, not for whether it is synchronous.
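The buffer distinction above can be sketched in plain Python. This is an analogy, not SSIS code: a list of dicts stands in for a data flow buffer, and the function and column names are hypothetical. The first function changes rows in the buffer it was handed; the second has to build a new buffer, and its row count differs from the input.

```python
def add_line_total(buffer):
    """Synchronous-style: rows are changed in the same buffer, rows in == rows out."""
    for row in buffer:
        row["total"] = row["qty"] * row["price"]
    return buffer  # the very same list object that came in


def total_qty_per_customer(buffer):
    """Asynchronous-style: a new output buffer is built, row count may differ."""
    totals = {}
    for row in buffer:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["qty"]
    # A brand-new list with one row per customer.
    return [{"customer": c, "qty": q} for c, q in totals.items()]


orders = [
    {"customer": "A", "qty": 2, "price": 5},
    {"customer": "B", "qty": 1, "price": 7},
    {"customer": "A", "qty": 3, "price": 5},
]

same_buffer = add_line_total(orders)
new_buffer = total_qty_per_customer(orders)
```

Here `same_buffer` is the original `orders` list with three rows, while `new_buffer` is a separate list with one row per customer. Allocating and filling that second buffer is the extra work that makes asynchronous components slower.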
9
Non, Semi and Full Blocking
A dataflow within SSIS contains three types of transformations. Each type determines how the data passes through it before going to the next component.
- Non-blocking components: the data flows through to the next component with minimal pause (synchronous)
- Semi-blocking components: as the data flows in, small chunks of data are held and then passed to the next component (asynchronous)
- Full-blocking components: all data is stopped and held until the component is completely done with its task before passing to the next component (asynchronous)
Note: It might be good to drop these names in an interview; it makes you sound like you know what you are talking about.

Non-blocking: adding a computed column. As long as it is based on the data already there, it can be done for each individual row.
Semi-blocking: when you are using a union, data is grabbed and held in chunks.
Full-blocking: when you sort, you need ALL the data before you can sort it.
SQL Saturday – Aug 2012 – Baton Rouge – first session with Tim Costello
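One way to get a feel for the three blocking types is to count how many input rows a component has to read before it can hand its first row downstream. The sketch below is a Python analogy, not SSIS code; every name in it is hypothetical. A per-row computation stands in for a non-blocking component, `heapq.merge` over two pre-sorted inputs stands in for a semi-blocking Merge, and `sorted` stands in for the full-blocking Sort.

```python
import heapq


class CountingSource:
    """Wraps a list of rows and counts how many have been read so far."""

    def __init__(self, rows):
        self.rows = rows
        self.read = 0

    def __iter__(self):
        for row in self.rows:
            self.read += 1
            yield row


def non_blocking(rows):
    # Each row is transformed and passed on immediately.
    for row in rows:
        yield row * 2


def semi_blocking(left, right):
    # Merges two pre-sorted inputs, holding only a few rows at a time.
    yield from heapq.merge(left, right)


def full_blocking(rows):
    # Sorting needs every row before the first one can be released.
    yield from sorted(rows)


def rows_read_before_first_output(transform, *sources):
    output = transform(*sources)
    next(output)  # pull just the first output row
    return [source.read for source in sources]


non = rows_read_before_first_output(non_blocking, CountingSource([3, 1, 2, 4]))
semi = rows_read_before_first_output(
    semi_blocking, CountingSource([1, 3, 5, 7]), CountingSource([2, 4, 6, 8])
)
full = rows_read_before_first_output(full_blocking, CountingSource([3, 1, 2, 4]))
```

With these inputs, the non-blocking stand-in reads a single row before producing output, the merge reads only part of each input, and the sort reads all four rows first. On a large data flow, that last case is the one that holds up everything downstream.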
10
Non-blocking transformations
- Audit
- Character Map
- Conditional Split
- Copy Column
- Data Conversion
- Derived Column
- Lookup
- Multicast
- Percent Sampling
- Row Count
- Script Component
- Export Column
- Import Column
- Slowly Changing Dimension
- OLE DB Command
Semi-blocking transformations
- Data Mining Query
- Merge
- Merge Join
- Pivot
- Unpivot
- Term Lookup
- Union All
Full-blocking transformations
- Aggregate
- Fuzzy Grouping
- Fuzzy Lookup
- Row Sampling
- Sort
- Term Extraction
* On the original slide, frequently used components are bolded.

There are ways of getting around using some of the semi- or full-blocking transformations, but not in all cases. I will show an example in the demo of getting around the Sort transformation specifically.
11
DEMO
- How to create a basic package
- Deploying and executing packages
- SOME tricks of the trade
- Event Handlers
- Built-In Logging capabilities
- Package Configurations
- File system capabilities (ftp, move, etc.)
Time to look behind the curtain and see what is behind the wizard’s magic!
QUESTIONS??

Creating a basic package: Microsoft BOL has step by step procedures to use:
There are also MANY posts online about how to start learning and using SSIS.
SOME tricks of the trade:
- If you use the “Execute SQL Task” component, it can call a stored procedure or run T-SQL directly. This allows for a hybrid approach. The only downside is that you might have to open the SSIS package to see what is being run, and then flip back to the database to pull up the stored procedure.
- Use the “SQL command” option rather than “Table or view” to pick columns; there is a performance issue with using the table name.
- Deploying: the “Don’t Save Sensitive Data” property helps when migrating from one database instance to another.
- Deploying: for ease of deployment, the BI Helper add-on is VERY helpful!
- Sort can be done in some cases without using the Sort transformation.
- Marking a file read-only allows it to be open and still used when a package is executed.
- Data Viewers are great for seeing what the data truly looks like.
- A package must be executed from the server itself to use the server’s package configuration files; otherwise, the configuration files on your PC will be used.
- If you use a view as a data source, the data types aren’t all recognized. If you put the view’s select statement within the data source, the data types are recognized very easily.
- Performance checks: When I attended SQL Saturday in Baton Rouge in Aug 2012, one of the sessions I went to was with Tim Costello. He mentioned the script component that Todd McDermid wrote to gauge the performance of each component, or package. You can find out more here:
12
This is what I look like when I forget 1 small detail when working with SSIS!! I find I like having someone else to talk to, and I end up solving it myself half the time. It’s ok to be frustrated (at least that is what I keep telling myself).
The Devil is in the Details…
Personal side-note….
Properties/setting data types/configurations/etc
13
View a working BI - ETL
This is the ETL used by my company. I started here in March and have been making improvements here and there. I am in the planning stages for a major overhaul of how the packages and logging are organized. After that, I plan to optimize the packages, starting with the low-hanging fruit. This is not a display of best practices, nor is it a scare tactic. Mostly, I just wanted to show how big an ETL process can be, for those not exposed to it before. Keep in mind, there are many ETLs that are MUCH bigger than this.
14
Importance of overall ETL organization
- Think about how to log multiple ETLs running at the same time.
- When an ETL is initially created, it normally “morphs” into more as you add more sources, so structure your packages in a way that is easy to get to when you are in development.
- Think about the order of the process when structuring the packages.
- Think about how to keep everything about one ETL separate from another, such as package configuration files, data source files, etc.
- Also think about parallelism.

When the company first wanted to start doing BI reporting, the ETL was small, and nobody planned for it to grow. Therefore, there are a lot of things that should be separated but are all pushed together.
15
Summary
- Get to know the transformations well, so you know if SSIS is the best tool for the job at hand.
- The more you get into SSIS, the more you realize you didn’t know. There are a lot of settings and tweaks that can make things run better/faster.
- Think about how you want this organized for when there are so many packages that you can’t remember what they all do, and you have multiple ETLs going at the same time!
- Have fun playing!!
16
Resources
Presentation Slides/Recording – available on my blog
Microsoft BOL – SSIS Tutorial
Performance WireTap
Todd McDermid -
GOOGLE! also MANY great books!
SSIS vs. T-SQL debate
James Serra -
Vincent Rainardi -
Dan English -
Martin Schoombee -
Send Mail Task – uses SMTP
17
Q&A Glenda Gable Email: ggable.it@tig313.com Blog: sql313.com
LinkedIn: linkedin.com/in/tig313