Load CSV and Text files into SQL Server 2016 using R and TSQL

Rajnilari2015
Posted by in R Language category on for Beginner level | Points: 250 | Views : 556 red flag
Rating: 5 out of 5  
 1 vote(s)

This article will help us to learn how to load CSV and Text files into SQL Server 2016 using R and TSQL.


 Download source code for Load CSV and Text files into SQL Server 2016 using R and TSQL

Introduction

R is an open source programming language and environment originally written by Ross Ihaka and Robert Gentleman and is an ideal fit for statistical analysis, graphics representation,reporting, predictive analytics and machine learning.

Since SQL Server 2016, Microsoft has incorporated R as a component into their database engine after acquiring the same with a vision of solving data analytics problem within

  1. On-Premise
  2. MS Azure
  3. Hybrid Cloud Platform

In the previous article, we have learnt how to execute R-Script into SQL Server 2016 Environment. This article will help us to learn how to load CSV and Text files into SQL Server 2016 using R and TSQL.

1. Load CSV file into SQL Server 2016 using R and TSQL

Let us first create a "Employee.csv" file as under

Once done, let us open SSMS and execute the below script

	EXECUTE sp_execute_external_script
			  @language = N'R'
			, @script = N'	setwd("d:/RTestFiles/CSV");
							empRecords <- read.csv("Employee.csv",header = TRUE);
							OutputDataSet <- as.data.frame(empRecords);'
			, @input_data_1 = N' '
		WITH RESULT SETS ((
							[EmpId] INT NOT NULL,		
							[FirstName] VARCHAR(20) NOT NULL, 
							[LastName] VARCHAR(20) NOT NULL, 
							[LandPhone] VARCHAR(20) NOT NULL,							
							[Address] VARCHAR(20) NOT NULL,
							[DOB] DATE NOT NULL,
							[Gender] CHAR(1) NOT NULL
						))

and the output is as under

Code explanation

sp_execute_external_script is a system stored procedure that is use to execute a script (R in this case) provided as argument at an external location.

setwd command is to set the working directory which is d:/RTestFiles/CSV in our case

read.csv() is R function which reads csv file [ Employee.csv in our case ]. We need to retain the header of the header of the CSV file and so header property is set to TRUE

Next we converted the csv file data to Data frame and pass it to OutputDataSet variable.

@input_data_1 specifies the name of the variable used to represent the query defined by @input_data_1.In case of R, the data type of the input variable is a data frame.

A more explanation on the sp_execute_external_script can be obtained from here.

The With Result Sets feature of Denali(SQL Server 2012) allow us to change the column names and data types of the result set that a stored procedure returns.

As a next step we will put the above TSQL script into a stored procedure as under

CREATE PROCEDURE usp_LoadCSVContentUsingR_TSQL
AS
	BEGIN
		EXECUTE sp_execute_external_script
			  @language = N'R'
			, @script = N'	setwd("d:/RTestFiles/CSV");
							empRecords <- read.csv("Employee.csv",header = TRUE);
							OutputDataSet <- as.data.frame(empRecords);'
			, @input_data_1 = N' '
		WITH RESULT SETS ((
							[EmpId] INT NOT NULL,		
							[FirstName] VARCHAR(20) NOT NULL, 
							[LastName] VARCHAR(20) NOT NULL, 
							[LandPhone] VARCHAR(20) NOT NULL,							
							[Address] VARCHAR(20) NOT NULL,
							[DOB] DATE NOT NULL,
							[Gender] CHAR(1) NOT NULL
						))
	END

As a final step, we will hold the result of the usp_LoadCSVContentUsingR_TSQL into @EmployeeRecords table variable and then project it as under

--Declare @EmployeeRecords table variable
DECLARE @EmployeeRecords TABLE 
(	[EmpId] INT NOT NULL,		
	[FirstName] VARCHAR(20) NOT NULL, 
	[LastName] VARCHAR(20) NOT NULL, 
	[LandPhone] VARCHAR(20) NOT NULL,							
	[Address] VARCHAR(20) NOT NULL,
	[DOB] DATE NOT NULL,
	[Gender] CHAR(1) NOT NULL
)

-- Insert records into the @EmployeeRecords table variable from usp_LoadCSVContentUsingR_TSQL
INSERT INTO @EmployeeRecords
EXECUTE usp_LoadCSVContentUsingR_TSQL

-- Project the Employee Records
SELECT * FROM @EmployeeRecords

We can perform the same using R-Studio as under

read.csv('d:/RTestFiles/CSV/Employee.csv', sep = ',', quote="\"", check.names=F)

2. Load TEXT file into SQL Server 2016 using R and TSQL

The file content is same as the CSV content except that it is "TAB" separated.

Once done, let us open SSMS and execute the below script

EXECUTE sp_execute_external_script
			  @language = N'R'
			, @script = N'	setwd("d:/RTestFiles/Textfile");
							empRecords <- read.table("Employee.txt", sep = "\t",header = TRUE, fileEncoding = "UTF-16LE");
							OutputDataSet <- as.data.frame(empRecords);'
			, @input_data_1 = N' '
		WITH RESULT SETS ((							
							[EmpId] INT NOT NULL,		
							[FirstName]  VARCHAR(20) NOT NULL, 
							[LastName]  VARCHAR(20) NOT NULL, 
							[LandPhone] VARCHAR(20) NOT NULL,							
							[Address] VARCHAR(20) NOT NULL,
							[DOB] VARCHAR(20) NOT NULL,
							[Gender] VARCHAR(20) NOT NULL
						))

The result is as under

The command using R-Studio will be

read.table('d:/RTestFiles/Textfile/Employee.txt', sep = '\t',header = TRUE, fileEncoding = "UTF-16LE")

N.B. : We can also read an online text file using R script and TQL as described below

EXECUTE sp_execute_external_script
			  @language = N'R'
			, @script = N'	url <- "https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission.txt";
							leukemiaRemissionRecords <- read.table(url, sep = "\t",header = TRUE, fileEncoding = "UTF-16LE");
							OutputDataSet <- as.data.frame(leukemiaRemissionRecords);'
			, @input_data_1 = N' '
		WITH RESULT SETS ((							
							[REMISS] INT NOT NULL,		
							[CELL]  VARCHAR(20) NOT NULL, 
							[SMEAR]  VARCHAR(20) NOT NULL, 
							[INFIL] VARCHAR(20) NOT NULL,							
							[INFIL] VARCHAR(20) NOT NULL,
							[BLAST] VARCHAR(20) NOT NULL,
							[TEMP] VARCHAR(20) NOT NULL
						))

The result is as under

The command using R-Studio will be

>url <- "https://onlinecourses.science.psu.edu/stat501/sites/onlinecourses.science.psu.edu.stat501/files/data/leukemia_remission.txt";
>leukemiaRemissionRecords <- read.table(url, sep = "\t",header = TRUE, fileEncoding = "UTF-16LE");
>print(leukemiaRemissionRecords);

N.B.: Suppose we have a file with special character say [ ~ ].In this case we have to read the file as below

read.table('d:/RTestFiles/Textfile/EmployeeSpecialCharacter.txt', sep = '~',header = TRUE)

Reference

Data Import

Conclusion

In this article we have learnt how to Load CSV and Text file into SQL Server DB using R and TSQL. Hope this helps. Thanks for reading. Zipped file attached.

Page copy protected against web site content infringement by Copyscape

About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)
Member Level: Platinum
Member Status: Member,Microsoft_MVP,MVP
Member Since: 3/17/2015 2:41:06 AM
Country: India
-- Thanks & Regards, RNA Team


Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)