Let us work with Data Factory - Step by step explanation

Rajnilari2015
Posted in the Azure category, for the Beginner level.




Introduction

Data Factory is a cloud-based data integration service that orchestrates and automates the movement and transformation of data. Azure does not support running an SSIS package as-is; however, we can achieve the same result with Data Factory. In this article, we will build a simple workflow using the Data Factory approach to trigger a stored procedure, which in turn inserts records into an Azure SQL table, using the Stored Procedure Activity.

Prerequisite

Please go through Step by Step approach to create a Azure SQL Database and export data from Non-Azure SQL Server 2014 Database article before proceeding with this exercise.

Step by Step Approach

Step 1: Create the DB objects like Table and Stored Procedure

Let us execute the script below in our Azure SQL Database using SQL Server Management Studio. It creates the tblStudent table and the usp_InsertStudent stored procedure, which inserts records into tblStudent.

--Script to create the Student table
CREATE TABLE [dbo].[tblStudent](
	[StudentID] [int] IDENTITY,
	[StudentName] [varchar](50) NOT NULL,
	[DOB] [datetime] NOT NULL
)
GO

--Script to insert records into the Student table
CREATE PROCEDURE usp_InsertStudent
AS
BEGIN

	--Recursive CTE that generates the numbers 1 to 20
	;WITH numberCte AS(
		SELECT Rn = 1 UNION ALL
		SELECT Rn = Rn + 1 FROM numberCte WHERE Rn < 20)

	--StudentID is an identity column, so only the remaining columns are supplied
	INSERT INTO [dbo].[tblStudent] (StudentName, DOB)
	SELECT
		StudentName = 'StudentName' + CAST(Rn AS VARCHAR(20))
		,DOB = DATEADD(DAY, Rn, GETDATE())
	FROM numberCte

END
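Before wiring the procedure into Data Factory, we can sanity-check it directly from SQL Server Management Studio. The queries below are only an illustrative check (they assume the script above has already been executed against the Azure SQL Database):

```sql
-- Execute the procedure once and inspect the generated rows.
-- Each execution inserts 20 rows named StudentName1 .. StudentName20.
EXEC dbo.usp_InsertStudent;

SELECT COUNT(*) AS RowsInserted FROM dbo.tblStudent;

SELECT TOP (5) StudentID, StudentName, DOB
FROM dbo.tblStudent
ORDER BY StudentID;
```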

Step 2: Create a data factory

After logging in to the Azure portal (https://portal.azure.com), click Intelligence + Analytics, followed by Data Factory.

Enter the below details in the New data factory blade

Then click the Create button.

Once the deployment is over, we can find our resource.

Next, we can view the contents of the newly created data factory from the Data Factory page.

Step 3: Create an Azure SQL linked service

Next we need to create an Azure SQL linked service that links the Azure SQL Database to the data factory. The steps are outlined below:

  1. Launch the Data Factory Editor by clicking Author and deploy on the Data Factory blade for RNADF.

  2. Click New data store on the command bar and, from the several available options, choose Azure SQL Database.

  3. The JSON script for creating an Azure SQL linked service will appear in the editor.

  4. Replace the placeholders so that the final JSON looks like the following:

        {
            "name": "AzureSqlLinkedService",
            "properties": 
            {
                "type": "AzureSqlDatabase",
                "typeProperties": 
                {
                    "connectionString": "Data Source=tcp:rnaazuretestdb.database.windows.net,1433;Initial Catalog=RNAAzureDB;Integrated Security=False;User ID=rna@rnaazuretestdb;Password=**********;Connect Timeout=30;Encrypt=True"
                }
            }
        }
    		

    Finally, click the Deploy button.

    Our AzureSqlLinkedService appears under Linked Services on the left.

Step 4: Create an output dataset

Next we need to create an output dataset. The steps are outlined below:

  1. Click More..., then New Dataset.

  2. From the several available options, choose Azure SQL.

  3. The JSON script for creating an output dataset will appear in the editor.

  4. Replace the placeholders so that the final JSON looks like the following:

        {
            "name": "InsertStudentDS",
            "properties": {
                "published": false,
                "type": "AzureSqlTable",
                "linkedServiceName": "AzureSqlLinkedService",
                "typeProperties": {
                    "tableName": "tblStudent"
                },
                "availability": {
                    "frequency": "Hour",
                    "interval": 1
                }
            }
        }
    		

    Kindly note that the linkedServiceName node is the name of the Azure SQL linked service which we created in Step 3. The tableName node is the name of the table in the Azure SQL Database instance that the linked service refers to, which in our case is tblStudent, created in Step 1.

    Finally, click the Deploy button.

    Our InsertStudentDS appears under Datasets on the left.
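The availability node controls how often a slice of the dataset is produced; our dataset produces one slice per hour. As a purely illustrative variant (not used in this exercise), a dataset producing one slice per day would declare:

```json
"availability": {
    "frequency": "Day",
    "interval": 1
}
```

Supported frequency values in this version of Data Factory include Minute, Hour, Day, Week and Month.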

Step 5: Create a pipeline with SqlServerStoredProcedure activity

Next we need to create a pipeline with a SqlServerStoredProcedure activity. The steps are outlined below:

  1. Click More..., then New pipeline.

  2. The JSON script for creating a new pipeline will appear in the editor.

  3. Replace the pipeline default template with the JSON below:

        {
            "name": "DataMovementPipeline",
            "properties": 
            {
                "activities": 
                [
                    {
                        "type": "SqlServerStoredProcedure",
                        "typeProperties": 
                        {
                            "storedProcedureName": "usp_InsertStudent"
                        },
                        "outputs": 
                        [
                            {
                                "name": "InsertStudentDS"
                            }
                        ],
                        "scheduler": 
                        {
                            "frequency": "Hour",
                            "interval": 1
                        },
                        "name": "SqlServerStoredProcedureActivity"
                    }
                ],
                "start": "2017-03-10T00:00:00Z",
                "end": "2017-03-10T05:00:00Z",
                "isPaused": false,
                "pipelineMode": "Scheduled"
            }
        }
    		

    A few points about the above pipeline JSON: the activity type SqlServerStoredProcedure invokes the stored procedure named in storedProcedureName (usp_InsertStudent); the outputs node points to the InsertStudentDS dataset created in Step 4; the scheduler node makes the activity run once every hour, matching the dataset's availability; and the start and end properties define the active window of the pipeline, so slices are produced hourly between 2017-03-10T00:00:00Z and 2017-03-10T05:00:00Z.

  4. Finally, click the Deploy button. Our DataMovementPipeline appears under Pipelines on the left.
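As a side note, the SqlServerStoredProcedure activity can also pass parameters to the procedure through a storedProcedureParameters node. Our usp_InsertStudent takes no parameters, so the snippet below is purely hypothetical; if the procedure accepted a @BatchSize parameter, the typeProperties could be sketched as:

```json
"typeProperties": 
{
    "storedProcedureName": "usp_InsertStudent",
    "storedProcedureParameters": {
        "BatchSize": "20"
    }
}
```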

Step 6: Monitor the pipeline

Now let's come back to the Data Factory blade and click Diagram

In the Diagram View we can find an overview of the pipelines and datasets.

Double-click the InsertStudentDS dataset in the Diagram View. We can see the various slices in which the activity will execute between the start and end times, at the interval that was set during pipeline creation with the SqlServerStoredProcedure activity in Step 5.

We can see that one of the slices has executed. Let us verify the result in the Azure SQL Database.
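The check itself can be a simple query from SQL Server Management Studio; a sketch, assuming one hourly slice has completed, is:

```sql
-- Each successful slice execution runs usp_InsertStudent once,
-- which inserts 20 rows into tblStudent.
SELECT COUNT(*) AS TotalStudents FROM dbo.tblStudent;

SELECT StudentID, StudentName, DOB
FROM dbo.tblStudent
ORDER BY StudentID;
```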


Conclusion

To summarize what we have learnt in this exercise :

  1. Create the DB objects like Table and Stored Procedure
  2. Create a data factory
  3. Create an Azure SQL linked service
  4. Create an output dataset (AzureSqlTable)
  5. Create a pipeline with SqlServerStoredProcedure activity
  6. Monitor the pipeline

Data Factory helps us to orchestrate and automate the movement and transformation of data. Hope this article will be helpful for others. Thanks for reading. Zipped file attached.


About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)
Member Level: Platinum
Member Status: Member,Microsoft_MVP,MVP
Member Since: 3/17/2015 2:41:06 AM
Country: India
-- Thanks & Regards, RNA Team

