Read a Excel blob file using Excel Data Reader in Azure

Rajnilari2015
Posted by in Azure category on for Beginner level | Points: 250 | Views : 32295 red flag

Azure Blob storage is a service for storing large amounts of unstructured data. Excel Data Reader is a lightweight and fast library written in C# for reading Microsoft Excel files. In this article we will look how we can read excel blob using Excel Data Reader.


 Download source code for Read a Excel blob file using Excel Data Reader in Azure

Introduction

Azure Blob storage is a service for storing large amounts of unstructured data. Excel Data Reader is a lightweight and fast library written in C# for reading Microsoft Excel files. In this article we will look how we can read excel blob using Excel Data Reader.

Step 1: Create a Source Blob Container in the Azure Portal

Open the Azure portal(https://portal.azure.com), and then choose Storage Account say Containers

Create a source container say sourcecontainer

It will appear as

Step 2: Upload a file in the Source Container

Create an excel file (say test.xlsx) whose content is as under

Now let us upload the file in sourcecontainer.

Step 3: Read Excel blob file using Excel Data Reader programatically

Fire up a console application and add the below Nuget packages

Install-Package WindowsAzure.Storage

Install-Package Microsoft.WindowsAzure.ConfigurationManager

Install-Package ExcelDataReader

Next write the below function

/// <summary>
/// GetExcelBlobData
/// Gets the Excel file Blob data and returns a dataset
/// </summary>
/// <param name="filename"></param>
/// <param name="connectionString"></param>
/// <param name="containerName"></param>
/// <returns></returns>
private static DataSet GetExcelBlobData(string filename, string connectionString, string containerName)
{
    // Retrieve storage account from connection string.
    CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);

    // Create the blob client.
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // Retrieve reference to a previously created container.
    CloudBlobContainer container = blobClient.GetContainerReference(containerName);

    // Retrieve reference to a blob named "test.xlsx"
    CloudBlockBlob blockBlobReference = container.GetBlockBlobReference(filename);

    DataSet ds;
    using (var memoryStream = new MemoryStream())
    {
        //downloads blob's content to a stream
        blockBlobReference.DownloadToStream(memoryStream);


        /*                 
            used open source Excel Data Reader - Read Excel files in .NET(http://exceldatareader.codeplex.com/)
            Nuget: Install-Package ExcelDataReader (https://www.nuget.org/packages/ExcelDataReader/)
         
        */

        var excelReader = ExcelReaderFactory.CreateOpenXmlReader(memoryStream);
        ds = excelReader.AsDataSet();
        excelReader.Close();
    }

    return ds;
}

The above function gets the blob's reference and using the DownloadToStream function, it downloads the blob's content to a memory stream. This stream of data is then passed to the ExcelReaderFactory.CreateOpenXmlReader and is finally returned as data set.

Now from inside the main function, let us write the below code

static void Main(string[] args)
{           
    string connectionString = CloudConfigurationManager.GetSetting("StorageConnectionString"); //blob connection string
    string sourceContainerName = ConfigurationManager.AppSettings["sourcecontainerName"]; //source blob container name
    string sourceBlobFileName = "test.xlsx"; //source blob name

    var excelData = GetExcelBlobData(sourceBlobFileName, connectionString, sourceContainerName);            
}

The program invokes the GetExcelBlobData function to read the excel blob content and returns a dataset. At this point the Data set Visualizer reveals.

As can be figure out that, though the excel content has been read but it could not detected the header's properly. Inorder to do so, let us write the below function.

/// <summary>
/// ExtractExcelData
/// Extracts the data and leave the header
/// </summary>
/// <param name="excelData"></param>
/// <returns></returns>
private static DataTable ExtractExcelData(DataSet excelData)
{
    var dt = new DataTable();
    dt.Columns.Add("Name", typeof(string));
    dt.Columns.Add("Age", typeof(string));
    dt.Columns.Add("Gender", typeof(string));
    dt.Columns.Add("Country", typeof(string));

    excelData
       .Tables[0]
       .AsEnumerable()
       .Skip(1)
       .ToList()
       .ForEach(dr => dt.Rows.Add(dr[0], dr[1], dr[2], dr[3]));
    return dt;
}

The program leaves the first row of the table and reads back all other rows into a new data table as revealed by the Data set Visualizer tool.

The app.config file looks as under

  <appSettings>    
    <add key="StorageConnectionString" value="DefaultEndpointsProtocol=https;AccountName=testaccount;AccountKey=CRd****==;EndpointSuffix=core.windows.net" />   
    <add key="sourcecontainerName" value="sourcecontainer" />   
  </appSettings>

The complete program is as under

using Excel;
using Microsoft.Azure;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using System.Configuration;
using System.Data;
using System.IO;
using System.Linq;

namespace UploadFilesToBlob
{
    class Program
    {
        static void Main(string[] args)
        {           
            string connectionString = CloudConfigurationManager.GetSetting("StorageConnectionString"); //blob connection string
            string sourceContainerName = ConfigurationManager.AppSettings["sourcecontainerName"]; //source blob container name           
            string sourceBlobFileName = "test.xlsx"; //source blob name

            var excelData = GetExcelBlobData(sourceBlobFileName, connectionString, sourceContainerName);
            var refinedExcelData = ExtractExcelData(excelData);            
        }        

        /// <summary>
        /// GetExcelBlobData
        /// Gets the Excel file Blob data and returns a dataset
        /// </summary>
        /// <param name="filename"></param>
        /// <param name="connectionString"></param>
        /// <param name="containerName"></param>
        /// <returns></returns>
        private static DataSet GetExcelBlobData(string filename, string connectionString, string containerName)
        {
            // Retrieve storage account from connection string.
            CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);

            // Create the blob client.
            CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

            // Retrieve reference to a previously created container.
            CloudBlobContainer container = blobClient.GetContainerReference(containerName);

            // Retrieve reference to a blob named "test.xlsx"
            CloudBlockBlob blockBlobReference = container.GetBlockBlobReference(filename);

            DataSet ds;
            using (var memoryStream = new MemoryStream())
            {
                //downloads blob's content to a stream
                blockBlobReference.DownloadToStream(memoryStream);


                /*                 
                    used open source Excel Data Reader - Read Excel files in .NET(http://exceldatareader.codeplex.com/)
                    Nuget: Install-Package ExcelDataReader (https://www.nuget.org/packages/ExcelDataReader/)
                 
                */

                var excelReader = ExcelReaderFactory.CreateOpenXmlReader(memoryStream);
                ds = excelReader.AsDataSet();
                excelReader.Close();
            }

            return ds;
        }

		/// <summary>
        /// ExtractExcelData
        /// Extracts the data and leave the header
        /// </summary>
        /// <param name="excelData"></param>
        /// <returns></returns>
        private static DataTable ExtractExcelData(DataSet excelData)
        {
            var dt = new DataTable();
            dt.Columns.Add("Name", typeof(string));
            dt.Columns.Add("Age", typeof(string));
            dt.Columns.Add("Gender", typeof(string));
            dt.Columns.Add("Country", typeof(string));

            excelData
               .Tables[0]
               .AsEnumerable()
               .Skip(1)
               .ToList()
               .ForEach(dr => dt.Rows.Add(dr[0], dr[1], dr[2], dr[3]));
            return dt;
        }
             
    }
}

Conclusion

In this article we looked into how we can read a Excel blob file using Excel Data Reader. Hope this will be helpful. Thanks for reading. Zipped file attached.

Page copy protected against web site content infringement by Copyscape

About the Author

Rajnilari2015
Full Name: Niladri Biswas (RNA Team)
Member Level: Platinum
Member Status: Member,Microsoft_MVP,MVP
Member Since: 3/17/2015 2:41:06 AM
Country: India
-- Thanks & Regards, RNA Team


Login to vote for this post.

Comments or Responses

Login to post response

Comment using Facebook(Author doesn't get notification)