Learn MSBI Part 2:- Performing Validation on Records

Posted by in MSBI (SSIS, SSRS, SSAS) category on for Beginner level | Points: 250 | Views : 5154 red flag
Rating: 5 out of 5  
 1 vote(s)

In this article we will explain how to perform validation in SSIS.

Links to other Labs of Learn MSBI Step by Step.

Learn MSBI Part 1:- Creating SQL tables, Practicals of SSIS-ETL(Extraction, Transformation & Loading)

Learn MSBI Part 3:- SCD (Slow Changing Data)

Learn MSBI Part 4:- Fact and Dimension.

Learn MSBI Part 5:- CUBE Creation

Learn MSBI Part 6:- Displaying reports using SSRS

Learn MSBI Part 7:- DQS (Data Quality Services)

Learn MSBI Part 8:- Data Profiling Task

Learn MSBI Part 9:- Pivot, UnPivot and Aggregation

Learn MSBI Part 10:- Execute SQL Task component (SSIS)

Learn MSBI Part 11:- Send Mail Task & Script Task Component

Learn MSBI Part 12:- CDC Concept using SSIS

Learn MSBI Part 13:- Asynchronous, Synchronous, Full-blocking, Semi-blocking & No-blocking

Learn MSBI Part 14:- Understanding Page Split and maintaining SSIS Performance

Learn MSBI Part 15:- Debugging in deployment – Instrumentation using Data Taps

Learn MSBI Part 16:- Using Multi-threading and its performance in SSIS program

This article will demonstrate how to do validation in SSIS practically. If you are directly landed on this article do not forgot to refer Lab 1

In practical scenario always there are chances to have unwanted or junk records which are not useful so in such scenarios there should be some mechanism to handle or sort out such issues.

In programming world we call it as Validation and is performed on the actual raw data which comes from CSV file in our case.

For example in the following image of CSV you will find some zero records and some negative records under “CustomerAmount” so such records should be validated and moved to error file or bad record files.

Our goal in this lab would be to do validation in practical with proper steps demonstrating to all readers. Following image algorithm shows how validation should work. All proper records will go into database whereas the bad records i.e. “CustomerAmount” which are equal to zero(=0) and less than zero(<0) should go in error log files.

In order to achieve it practically go to SSIS toolbox on SSDT – VS 2013 from toolbox and introduce new component called “Conditional Split” as shown in the below image

  1. Drag and drop from toolbox Conditional Split on Data Flow between Capital Letter and Customer DataWare.
  2. Drag output of Capital Letter and drop it as input to Conditional Split.
  3. Do right click on Conditional Split and click on Edit to configure the component.

Here you will find the Condition expression is in red colour even though syntax are correct. Red colour of syntax is due to

“Data Type issue which is non-numeric in form and comparison can only be done on Numeric Data Type”

Here there is a need to do conversion from String to Numeric.

Before we proceed to conversion lets understand basics of datatypes.

Identifying DataType

Connection Manager plays here a vital role when we have Extraction and Loading. During Extraction we have created a Connection Manager named as “CSV Connection” which you will be able to find it below the design mode as shown in the following image

SSIS package is designed to think that data can be of any type for example in our case we have CSV file as an input raw data and it is a free form text where data can be of any type i.e. numeric, alphanumeric, date and time etc. so in order to accommodate such text it is treated and by default it is kept as string Data Type.

So you will find that “CustomerAmount” column is by default available as string DataType as shown in the below figure: –

In order to do comparison we have to introduce new component from Common Transformation available under SSIS Toolbox as “Data Conversion” which we have renamed it as CustomerAmount Conversion before Conditional Split and after Derived Column -“Capital Letter”

Steps to ADD and CONFIGURE “Data Conversion” component: –

  1. Drag and drop “Data Conversion” component on Data Flow.
  2. Rename the component to “CustomerAmount Conversion” for ease of understanding the component use. It will convert DataType of incoming raw data which is string[dt_string] to currency[DT_CY].
  3. Drag and drop the Capital Letter output blue arrow as input to “CustomerAmount Conversion” component.
  4. Now double click or do right click “CustomerAmount Conversion” component in order to configure it.

It will now open Data Conversion Transformation Editor Window so that you can configure it. This window will show Available Input Column whose DataType can be converted using this component.

  1. Here we will select “CustomerAmount” column under Available Input Column for DataType conversion.
  2. Set an Output Alias name to existing “CustomerAmount” column here we have named it to CustomerAmountCurrency.
  3. Select DataType in order to convert it according to your requirement. Here we have set it to currency [DT_CY
  4. And now finally click on OK in order to close the Data Conversion Transformation Editor Window.

After you have configured “CustomerAmount Conversion” component: –

1. Now drag the output of this component as input to “Condition Split” component as you see in the following image.

2. Open “Condition Split” component in Edit mode in order to configure it, by double clicking or right click Edit component.

3. Under Condition Split Transformation Editor give an Output Name “Error”.

4. Write the condition if you are good enough to write the condition or just drag and drop the condition in order to form expression as shown in below image where it validate data coming from “CustomerAmountCurrency” less than 0 i.e. negative values or equal to 0.Below is the formed Condition written as expression: –

(([CustomerAmountCurrency < 0] || ([CustomerAmountCurrency == 0]))

5. Once Condition writing is finished this time you will find that equation will not be marked in red colour as here comparison of DataType is done successfully and Condition will execute properly.

6. Click on OK to close the Conditional Split Transformation Editor Window.

After you have configured “CustomerAmount Conversion” component.

1. Now drag and drop Conditional Split output blue arrow as input to “Customer DataWare House”. As soon as blue arrow is drag and dropped it will show “Input Output Selection” window here you will see two output: –

  • one is “Conditional Split Default Output” and
  • other is “Error”.

“Conditional Split Default Output” will carry actual output which will go to table of our database under “Customer DataWare House” where Error will carry output of the Condition expression i.e. negative values(<0) or equal to 0(==0) as written above under Conditional Split Transformation Editor.

2. We will select “Conditional Split Default Output” under “Input Output Selection” window.

3. And click on OK.

Now we have configured one output of Conditional Split to Customer DataWare House where actual output will go in to database table and other output of Conditional Split is the Error where bad records i.e. negative values(<0) or equal to 0(==0) will be moved.

Configure second output “Error” of Conditional Split

For that from SSIS Toolbox we have to add “Other Destination” with “Flat File Destination“ as shown in the following image

  • Drag and drop from Other Destinations à Flat File Destination on Data Flow.
  • Rename the same for easy understanding to Error Destination.
  • Configure the component by double clicking it or do right click Edit.

Once Flat File Destination Editor window is opened then Add New Connection Manager for Error Destination File

  • Click on Connection Manager.
  • Cick New for Adding New Connection Manager.
  • Flat File Format selector window will open here select “Delimited” and click on OK. As our input selected file are comma separated file for New column we have used Delimited. There are also other format selectors are available as you can see in the following image with description and can be selected as when they are needed.

After Destination Flat File format is selected then Editor window will open: –

  1. Give suitable name to Connection Manager, here we have given name “ErrorConnectionManager”.
  2. From left select “General” tab, browse and select file text file from local computer where you want to save Destination file data. Here we have selected “ErrorFile.txt” where our bad data columns will go. Keep rest values set it to default as shown in the image below
  3. Now from left select “Column” tab, keep default values for Row as {CR}{LF} & Column Delimiter as Comma{,}
  4. Under “Advanced” tab, just go through each column’s properties here you will find its properties are available with default values so let it remain set to default values.
  5. Next under “Preview” tab, you will find all column names without data. After you do debug then destination flat file will be filled with data if any bad is present.
  6. Once you go through all tabs then click on ok to close current window.

Now before clicking OK and closing Flat File Destination Editor window just ensure that new ‘ErrorConnectionManager” is selected.

Next click on Mapping tab to check proper Mappings between Input Column & Destination Column as you can see in the following image snapshot below. Now click on OK and close the editor window.

After configuring all component press ctrl+F5 on keyboard or click on “Start” available on top menu in order to run and debug the package. Refer to the following image here you will find that two components are running successfully where as there is an error in the third component as red cross marked sign gets appeared on component and the application gets stopped straight away. This is because of default behavior of the application which states that it should stop if any error occurs in it.

Now in order to keep application running we need to change the current default behavior of the running application. So to handle occurred error we will now fix it by introducing new component which take into account red arrow error output.

Very first we will change default behavior of component where we got error. So in our case we will change default behavior of “CustomerAmount Conversion” component. Below are detailed steps for the same: –

  1. Do double click or do right click and Edit on “CustomerAmount Conversion” component.
  2. Editor will open and on the bottom left side you will see “Configure Error Output” just click on it.
  3. Here on Error you will see drop down out and on that you will see other three options, select “Redirect row” which means if error occurs row will be redirected through red arrow output. Second option is “Failed component” means it will as soon as error occurs which is default behavior of component and last is “Ignore failure” which stated that if error occurs then it will not stop application instead it will move ahead with error and execute application.
  4. Just next to Error you will see Truncation which is also having same dropdown and option under it as Error. Truncation will have truncated data whenever we have size mismatch of the characters within the application for example if we have six characters allowed within the application and 9 characters are being passed then those 3 characters will be truncated. And depending on option selected it will “Redirect row” or “Failed component” or it will “Ignore failure”.
  5. In case of big application where there are number of rows and if you want to mark anyone dropdown option to all columns then just select desired rows and select option from here and click “Apply” button, which will mark the same option to those selected rows.

Once red arrow error output of “CustomerAmount Conversion” is set to “Redirect Row” then you will see following a yellow warning image icon as shown in the following image which expresses that if it is not configured to any transformation or destination component then its output through red arrow will be lost.

Now in order get/store those error output we will follow below written steps: –

  1. We will drop a destination component from other destination on the Data Flow from SSIS Toolbox and the output or “CustomerAmount Conversion” Error coming from red arrow will be given to this new component.
  2. Rename this new component to “ConnectionErrorRows”.
  3. Drag and drop red arrow output of “CustomerAmount Conversion” component as input to “ConnectionErrorRows” component. As soon as red arrow is drag and dropped it will open a new window of “Configure Error Output”.
  4. By selecting it from dropdown now set Error and Truncation to “Redirect Row” option.
  5. Click OK to close “Configure Error Output” window.

Red arrow gets pointed to “ConversionErrorRows” component with name Data Conversion Error Output on red arrow.

Next we will configure “ConversionErrorRows” similarly which we did earlier for “Error Destination” do double clicking it or do right click Edit on “ConversionErrorRows” to open it Editor.

Once Flat File Destination Format Editor window opens: –

  1. On the left side click on Connection Manager.
  2. Click on New.
  3. Select Flat File Format as Delimited and click OK.
  4. Flat File Connection Manager Editor window will open over here give suitable name to Connection Manager, here we have given name “ConversionErrorFile”.
  5. From left select “General” tab, browse and select file text file from local computer where you want to save technical error data. Here we have selected “ErrorOutput.txt”. Keep rest values set it to default as shown in the image below
  6. Now from left select “Column” tab, keep default values for Row as {CR}{LF} & Column Delimiter as Comma{,}
  7. Under “Advanced” tab, just go through each column’s properties here you will find its properties are available with default values so let it remain set to default values.
  8. Next under “Preview” tab, you will find all column names without data. After you do debug then destination flat file will be filled with data if any bad is present.
  9. Once you go through all tabs then click on ok to close current window.

Now on Connection Manager, check “ConversionErrorFile” is correctly set.

Now on the left click on Mappings tab, give a final check on Input Column is correctly Mapping to Destination Column.

Once Mappings details found are correctly set then click on OK.

Now our all components are configured, we will debug the package by clicking ctrl+F5 on keyboard or click on Start icon available on the top of Menu.

Once package is running in debug mode you will see green colour checked mark on the top-right side of the all the component which means that each component is working and running correctly.

  1. Total 8 rows are being read from CSV file and goes for transformation process for capitalization.
  2. After capitalization it will go through Customer Amount Datatype conversion process under it has found one non-numeric form and is detected as bad data and moved it ConversionErrorRows component this is one type of technical error found in the input raw data.
  3. Once DataType conversion from string to money/currency is done now it will go for Conditional split component for business validation where it will check for amount values with zero or less than zero i.e. negative values if found any it will be moved to Error Destination.
  4. Error Destination will has found two rows with business validation errors which it has moved to Error Destination.
  5. Finally correct Data with correct values i.e. 5 rows will further go to Customer DataWare House i.e. in database table “dbo.Customer”.
  6. Go to tables and check correctly 5 rows are updated in the database.

Also you can go and check on the location where each text file is saved,

  • First is “ErrorFile.txt” which will contain output of Error Destination which is Business Validation error containing equal to zero(==0) and negative(-50) less than 0(<0) values in it.
  • Second is “ErrorOutput.txt” and it will save technical error which comes from raw data, here we have non-numeric data in one of the row which will go to this “ConversionErrorRows” component.

To see default output i.e. 5 rows which goes in “Customer DataWare House” component can be seen under SQL Server Management Studio in “CustomerDataWareHouse” database with table named “dbo.tblCustomer”.

On SQL Server Management Studio under Object Explorer go to “CustomerDataWareHouse” database –> Tables –> dbo.tblCustomer just do right click and click on “Select Top 1000 Rows”.

It will display query on right top side and at bottom you will see table with 5 rows present in it.

That’s all above we have receive output as per set validation criteria.Hope you understood written each step of practical for Conditional split, Data conversion and Error handling.

Below is one of the video from our project series called MSBI(SSIS, SSAS and SSRS) hoping you will enjoy it: -

Page copy protected against web site content infringement by Copyscape

About the Author

Full Name: Ahtesham Shaikh
Member Level: Bronze
Member Status: Member
Member Since: 10/3/2016 2:36:04 AM
Country: India
Ahtesham Shaikh
Hey Friends, Myself Ahtesham a coder, writer & technical supporter for IT world and for my company Questpond. Having 10+ years of experience in microsoft technologies like WCF, WPF, MVC, Sharepoint, AngularJS, MSBI, Excel 2000 - 2013, Powerpoint, C#, SQL Server and so on. I just find myself happy and satisfy by doing this job. Apart from profession love reading novels and other digital network books. I'm a faculty at Questpond and do take trainings on c#, object oriented programming, msbi, sharepoint, mvc, mvc, angularJS and so on. Feel free get in touch me for one-to-one trainings, offline trainings @Andheri Mumbai, Online training via microsoft live meeting.

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)