In this article we will explain about Data Profiling Task.
Links to other Labs of Learn MSBI Step by Step.
Learn MSBI Part 1:- Creating SQL tables, Practicals of SSIS-ETL(Extraction, Transformation & Loading)
Learn MSBI Part 2:- Performing Validation on Records
Learn MSBI Part 3:- SCD (Slow Changing Data)
Learn MSBI Part 4:- Fact and Dimension.
Learn MSBI Part 5:- CUBE Creation
Learn MSBI Part 6:- Displaying reports using SSRS
Learn MSBI Part 7:- DQS (Data Quality Services)
Learn MSBI Part 9:- Pivot, UnPivot and Aggregation
Learn MSBI Part 10:- Execute SQL Task component (SSIS)
Learn MSBI Part 11:- Send Mail Task & Script Task Component
Learn MSBI Part 12:- CDC Concept using SSIS
Learn MSBI Part 13:- Asynchronous, Synchronous, Full-blocking, Semi-blocking & No-blocking
Learn MSBI Part 14:- Understanding Page Split and maintaining SSIS Performance
Learn MSBI Part 15:- Debugging in deployment – Instrumentation using Data Taps
Learn MSBI Part 16:- Using Multi-threading and its performance in SSIS program
Data Profiling Task component primarily serves checking quality of the incoming data from excel file, CSV file or SQL Server file. Data Profiling exactly runs algorithm and does quality checking of the incoming data by checking null records, unique records, empty records or some junk records. And by getting to know quality of data managing SSIS project can be very much easy.
Following is the sample input data from CSV file contains many field with comma separated like Name, Email Address, DOB, Salary etc and many more. Next running of Data Profiling task will be done which will show its profile and quality of the data of this file would be known.
Data Profiling Task works only on SQL Server as source in other words it only profile data if it is available on SQL Server it does not perform profiling directly on excel or CSV file. So to make running Data Profiling Task on above mentioned CSV file it has to be imported on SQL Server
Import CSV file into Database i.e. SQL Server
Go to SQL Server Management Studio and open Object Explorer do a right click on the database “CustomerDataWareHouse” and click on Task then click on Import Data
It will open wizard to import and export, very first screen is welcome screen which can be skipped by doing a tick mark on “Do not show this starting page again” and then click on Next.
Next screen is to choose Data Source.
- Click on General tab and under it go and select “Flat File Source” from dropdown as Data Source.
- Select CSV file from local system just by clicking on “Browse” button.
- Do a Preview of the uploaded CSV file by clicking on Preview tab. Once found things correct under Preview go ahead and click on Next.
Next screen of the wizard is to choose Destination type, on Destination field select from dropdown –> “Microsoft OLE DB Provider for SQL Server” and give server name, here server name is “KHADAK-PC”.
Use Authentication as Windows and select Database as CustomerDataWareHouse.
Then click Next to move to following next screen.
Here in this new screen select and check input, here set Source to CSV file selected from local drive and then set Destination on SQL Server table here name is automatically taken by the wizard which is [dbo].[DataProfiling.csv]
Once the fields are selected then click on Next.
Under Next screen keep things as it is only i.e. keep fields on this screen remain set to default values only and just click on Finish over here.
Here comes final screen of the wizard to do preview of all the field selected in the wizard if all things are found correct just click on Finish.
It will start executing the wizard and start importing the CSV file once it is successfully executed it will show message displayed as “The execution was successful”.
And then click on Close.
Next go to Object Explorer and click on refresh to update the latest addition of the imported table under Tables of Database “CustomerDataWareHouse” called as “dbo.DataProfiling.csv”
Once CSV file is successfully imported and gets properly displayed on SQL table now it is time to check imported data quality. Now on Visual Studio 2013, go to Solution Explorer and do right click on SSIS Packages to add new package as shown in the below image.
Once new design package is open then drag and drop Data Profiling Task from SSIS Toolbox. Data Profiling Task is a Control Flow Task component which means that it is going to be visible on Control Flow tab only and it available under SSIS Toolbox as shown in the image below.
Once the component is on Control Flow now rename the component and Edit to configure it. Once Edit mode is open click on the first General tab. Under General tab specify location to save the profile output of data generated which will be running on imported CSV currently available as SQL table.
For that do a right click on the Data Profiling Task and click on “Edit” in order to configure it. It will open Task Editor on that click on left side click on the General tab. Now under General tab on Destination field click dropdown mark and then click on “New File connection” which will open File Connection Manager Editor.
Under Connection Manager Editor set field “Usage type” from dropdown as “Create file” and give File location by clicking on “Browse” to save text file of profile created on local drive. Once connection properties are set click on OK to close File Connection Manager editor.
Next is to select various profile that will run on the imported data for that click on Quick Profile available button at the bottom as shown in the image below. Once clicked it will open “Quick File Form” screen where there are following fields: –
ADO.NET Connection: click on New and select Server and Database name
Table or view: Here select Table name on which profile need to run.
Compute: Under this select all necessary profile or selected profile just tick beside to select profile which will run on input data and then click on OK to close Quick Profile screen.
Now under Profile request tab will see all the requested profile and on after clicking each profile its properties can be seen under below window. After checking details of each profile and click on OK to close Data Profiling Task Editor.
Once the component is configured red cross mark on the component has gone off.
Next is to set “DataProfiling.dtsx” as startup package by doing right click on “DataProfiling.dtsx” and click on “Set as Starting Object”.
Now execute the package by clicking on Start button available at the top as shown in the image below.
After the package is executed it will show green mark on the component which states that it has executed successfully also go to respective folder on the drive selected earlier in the wizard it has created file of profile called DataProfiling.txt.
Once file is found open “DataProfiling.txt” it contains content which is in XML format but it is not readable.
In order to read the output of profile do right click and then click Edit on the “Data Profiling Task“ component which will open “Data Profiling Task” Editor on that click on Open Profile Viewer which will make readable the profile file .
Once “Data Profile Viewer” screen is opened it will show each profile in details with figures(counts & percentage) and graphical presentation as shown in the image below.
Also view our below video Learn MSBI Project Series:-