Learn MSBI Part 7:- DQS (Data Quality Services)

Posted by in MSBI (SSIS, SSRS, SSAS) category on for Beginner level | Points: 250 | Views : 3924 red flag

In this article we will explain about DQS (Data Quality Services)

Links to other Labs of Learn MSBI Step by Step.

Learn MSBI Part 1:- Creating SQL tables, Practicals of SSIS-ETL(Extraction, Transformation & Loading)

Learn MSBI Part 2:- Performing Validation on Records

Learn MSBI Part 3:- SCD (Slow Changing Data)

Learn MSBI Part 4:- Fact and Dimension.

Learn MSBI Part 5:- CUBE Creation

Learn MSBI Part 6:- Displaying reports using SSRS

Learn MSBI Part 8:- Data Profiling Task

Learn MSBI Part 9:- Pivot, UnPivot and Aggregation

Learn MSBI Part 10:- Execute SQL Task component (SSIS)

Learn MSBI Part 11:- Send Mail Task & Script Task Component

Learn MSBI Part 12:- CDC Concept using SSIS

Learn MSBI Part 13:- Asynchronous, Synchronous, Full-blocking, Semi-blocking & No-blocking

Learn MSBI Part 14:- Understanding Page Split and maintaining SSIS Performance

Learn MSBI Part 15:- Debugging in deployment – Instrumentation using Data Taps

Learn MSBI Part 16:- Using Multi-threading and its performance in SSIS program

DQS (Data Quality Services) is all about giving end user such a tool from where data coming is proper i.e. bad data is itself stopped by end user to get into the system and only good or quality data is in.

To understand the exact meaning for the above text mentioned let make clearer of an idea how that can be done. Consider below example where data has been randomly entered into the sheet where for same product under the list there are different names for each case like

Cello Gripper, Lexi and Pen are -> Pen so they should be clubbed together

Nataraj, Staedler and Pencil -> Pencil

Kores and Rubber -> Rubber(eraser)

And such data is best judged by end user so such stationary or the particulars rather than putting it differently it can listed as one entity so there are chances to reduce bad data content entering the system. Below is the image which depicts the same: -

  1. With DQS such authority can be given to end user by rectifying bad data and entering only correct data into the system
  2. End user will correct bad data or remove bad data based on his/her knowledge base and this knowledge base should be improved over a period of time.
  3. And last some approval mechanism should be in place for the good knowledge base which is implanted to keep good data and rectify bad data.

Once DQS tool is enabled into our SQL server then using updated knowledge base system end user can distinguish between good data and bad data.

Next this article will demonstrate complete practical steps for DQS

Very first step to do installation of DQS from SQL Server version available in the system.

For that go to Start -> Menu -> All Programs then expand SQL Server folder under that expand Data Quality Services and under to that click on Data Quality Server Installer.

Once clicked on the "Data Quality Server" installer it will open installer in command prompt as shown in the image below

It will prompt to set password for Database Master Key. Set here password with mixed combination of keys with uppercase letters, alphanumeric, numeric and special characters.

Please Note : For installation of DQS will take maximum 5 to 10 mins depending on hardware of the system on which it is installed.

Once password is set then it will proceed for installation and once the installation is done it will show successfully executed screen of DQS as shown in the image below.

After the installation is done go to SQL Server Management Studio and under Databases it will show DQS installed with following databases under it :-

DQS_MAIN: It is the most important databases out of all three. Knowledge base actually gets saved into it.

DQS_PROJECTS: So whatever knowledge base is saved under DQS_MAIN is used and applied on the data also it gets saved under DQS_PROJECTS.

DQS_STAGING_DATA: Data uploaded from various sources like excel, CSV comes to DQS_STAGING_DATA on temporarily basis over here data is reformatted again and goes into database.

Please Note :- With these DQS database table a user is not directly use here introduction is done so that in future if server migration is needed to be done then at that time required migration database should be known. All these three DQS database should be migrated from server to server.

It is highly recommended to not to directly use these three DQS databases instead make use of Data Quality Client which is used to work on DQS.

In order to open DQS Client go start menu of the system -> All Programs -> expand Microsoft SQL Server 2012 -> expand Data Quality Services -> Data Quality Client.

It will open the DQS client start up screen as shown in the image below :-

After being asked for Server name, provide the server name where the database of DQS is available. If not sure about the server name then go to open SQL Server Management Studio click on File a Connect Object Explorer copy server name from here and paste it on the SQL Server DQS screen and then click on "Connect" as shown in the image below.

Once proper server name is given SQL Server DQS it will get connected and then open following screen. Here under DQS there are three sections :-

Knowledge Base Management - DQS first will create Knowledge base then

Data Quality Projects - this Knowledge base can be utilized by creating a project and

Administration - once project is created finally it will be administered/monitored the progress of the project.

As this is a new project, first a New Knowledge Base will be created by clicking first option under Knowledge Base Management.

It will open screen to create new Knowledge Base Management give it a suitable name here name given to it is "CustomerClear" and then select Activity as "Domain Management" followed by clicking "Next" to move to following screen.

Following screen is to create new Domain Management, for that click on icon of creating new domain which will open new screen called "Create Domain". Give a suitable name to domain, here given as "StationaryName" uncheck "Enable Speller" as spell check will be done manually in this case and then click on OK.

Once domain is created for that next is to define rules and establish relations. So starting with first tab named "Term Based Relation" under which it will correct input values to a common synonym like Cello Gripper and Lexi are pens so their common synonym is pen so it will be put under pen and similarly it is for Nataraj and Staedler are pencils and for it common synonym is pencils.

For that click on the Add icon and manually enter the values as shown in the image below.

Next tab is of "Domain Rules", in other words this tab will help to set validation on the fields by using domain rule which will be applied on input field.

For that click on Add Icon to add new rule then give suitable name and add description to it. Thereafter click on Add icon below given to add expression. Here set expression to

Values does not match a regular expression = [a-z]{1,20}

Above expression states that it will expect only input values having only alphanumeric values (special characters and numeric characters not allowed) with minimum length of 1 and maximum 20 characters will be accepted.

Next tab is of "Domain Values" for time being it is going to be kept unchanged as this is required when there are multiple projects. And in this example there is only one project.

And last tab is of "Reference data" this is also not going to be used as it will be required when other data provider are referred too.

Thereafter click on Finish and then finally click on Publish.

Once it is published it can be seen on the main section at the bottom of Knowledge Base. Thereafter it can be edited any time by clicking Domain Management.

Next after Knowledge base it created then create Date Quality Projects by clicking on "New Data Quality Project".

After Data Quality Projects screen is open the give suitable name to it and add description to it. Then select the Knowledge base created from the dropdown and finally before clicking on Finish select activity as Cleansing.

Next Data Quality Projects is four steps process and for each step there are tabs: -

1. Starting with first tab called "Map" under which selection of the database (in this case excel file will be selected) is going to be done to cleanse against mapped domains.

  • Under Map tab select "Excel fiie" as data source from the dropdown.
  • Select Excel from location of local drive just by clicking on Browse.
  • Under Mappings set Source Column –> StationaryName(String) and Domain –> StationaryName from dropdown option available under Mappings.
  • Then click on Next to move on to following tab.

2. Following tab is of Cleanse, here click on Start to perform cleanse the file selected as Data Source once it has done the process of cleansing it will show the Cleansing records in percentage how much it is successful and also it will show Source Statistics. Once found everything is correct then click on Next.

3. In the third tab called Manage and View results is to set Approval for all the record selected from datasource excel file starting with first correct records from "Invalid" tab as shown in the image below. Set record "909890" as correct to "Rubber" & set null record "DQS_NULL" correct to "NotKnown". Since now these invalid record has been corrected click on Approve available button at the right side.

And other record available under "New" tab mark as Approve and then click on Next.

Finally is the time for setting Export tab, before exporting do a final review of Output data which is available on the left side. Once review is done and found output data is correct then on right side select Destination Type as "CSV File" for the output and browse the output to save it on local system. Also select the Output Format as "Data Only" and click on Export.

Once the output is produced in CSV format go to the location on local hard drive and open CSV file and get the output file compared with old excel file named DQS file. Notice the changes which has been done on old fields & resulting the change as it should be which is clearly notice on new output file.

And the third section is of Administration which will work on earlier created Knowledge Base and Data Quality Project. Here under Administration monitoring will be done on Data Quality activities and also system settings can be done.

Once clicked on Activity monitoring it will open Administration screen from where activity can be track like is it active or inactive, elapsed time and start or end time. Activity of both knowledge base and Data Quality Project can be checked.

Also get to see following video on MSBI project series which start from the basic to start SSIS : -

Page copy protected against web site content infringement by Copyscape

About the Author

Full Name: Ahtesham Shaikh
Member Level: Bronze
Member Status: Member
Member Since: 10/3/2016 2:36:04 AM
Country: India
Ahtesham Shaikh
Hey Friends, Myself Ahtesham a coder, writer & technical supporter for IT world and for my company Questpond. Having 10+ years of experience in microsoft technologies like WCF, WPF, MVC, Sharepoint, AngularJS, MSBI, Excel 2000 - 2013, Powerpoint, C#, SQL Server and so on. I just find myself happy and satisfy by doing this job. Apart from profession love reading novels and other digital network books. I'm a faculty at Questpond and do take trainings on c#, object oriented programming, msbi, sharepoint, mvc, mvc, angularJS and so on. Feel free get in touch me for one-to-one trainings, offline trainings @Andheri Mumbai, Online training via microsoft live meeting.

Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)