Converting a Web Page to PDF using iTextSharp

Raj.Trivedi
Posted in ASP.NET category for Intermediate level

Hello Team,

In this article we will see how to export an entire web page to PDF.

The page contains both data and images.

We will use the iTextSharp DLL, a free and open-source library available on sourceforge.net.

Introduction


While writing this article, I realized how a normal discussion at a table can lead to valuable information. Recently, a friend of mine had a requirement to convert an entire page into PDF.

A while back we saw how to generate a PDF using iTextSharp; you can refer to that article in the DotNetFunda articles section. Here is the link:-


In this article we will convert an entire web page, which contains a logo at the top and a grid with images and text, to PDF.

First, let's download the iTextSharp DLL from

http://sourceforge.net/projects/itextsharp/

Once you have downloaded the zip file, extract it to a folder; inside it you will find multiple zip files.

From these, extract the zip file named itextsharp-dll-core (see the image).



 



Once you extract that zip, you will get itextsharp.dll and an XML file (see the image).




Objective


Converting the webpage to PDF.



Using the code


  1. First, create an empty website in Visual Studio 2010.
  2. Add a Bin folder to the ASP.NET website.
  3. To add a Bin folder: right-click the website -> Add ASP.NET Folder -> Bin.
  4. Add the itextsharp.dll downloaded in the earlier step to the Bin folder.
  5. Drag and drop a GridView onto the page.
  6. Go to the code-behind and import the following namespaces:
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;
using System.IO;

The iTextSharp namespaces contain the classes used to generate PDF documents from text and HTML.

System.IO provides the StringWriter and StringReader classes that we use to capture the rendered page and read it back.

  7. Now we will bind the data to the GridView, using the following table and stored procedure.

-- Table and stored procedure
CREATE TABLE [dbo].[ImageGallery](
	[ID] [int] IDENTITY(1,1) NOT NULL,
	[ImageName] [varchar](50) NULL,
	[Images] [varchar](max) NULL
) ON [PRIMARY]
GO

-- CREATE PROCEDURE must be the first statement in its batch; use ALTER only once the procedure exists
CREATE PROCEDURE [dbo].[GetImages]
AS
BEGIN
	SELECT * FROM ImageGallery
END
GO
HTML Markup
<%@ Page Language="C#" AutoEventWireup="true" CodeFile="AjaxAsyncFileUploader.aspx.cs" Inherits="AjaxAsyncFileUploader" EnableEventValidation="false" %>



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
    
    </div>
    <asp:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server">
    </asp:ToolkitScriptManager>
    <br />
    <div align="center">
       <asp:gridview ID="gvdetails" runat="server" AutoGenerateColumns="False">
        <Columns>
            <asp:BoundField DataField="ID" HeaderText="ID" />
            <asp:BoundField DataField="ImageName" HeaderText="Image Name" />
            <asp:TemplateField HeaderText="Image">
                <ItemTemplate>
                    <asp:Image ID="img1" ImageUrl='<%#Eval("Images",GetUrl("{0}")) %>' Width="75" Height="50"  runat="server" />
                </ItemTemplate>
            </asp:TemplateField>
        </Columns>
        </asp:gridview>
    </div>
    <br />
    <div align="center"> <asp:Button ID="Button1" runat="server" onclick="ExporttoPDF_Click" Text="Export to PDF" /></div>
    
    </form>
</body>
</html>	
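The ImageUrl expression in the markup relies on the format overload of Eval: GetUrl("{0}") returns the page's folder URL followed by a literal {0} placeholder, and Eval then substitutes the Images column value into that string with String.Format semantics. A minimal stand-alone sketch of that two-step substitution follows; ImageUrlBindingSketch and BindImageUrl are hypothetical names, and GetUrl is copied from the code-behind except that it takes the page URL as a parameter instead of reading Request.Url.

```csharp
using System;

// Hypothetical helper that reproduces the ImageUrl binding outside ASP.NET.
public static class ImageUrlBindingSketch
{
    // Same logic as GetUrl in the code-behind, but the page URL is passed in
    // instead of being read from Request.Url.AbsoluteUri.
    public static string GetUrl(string pageAbsoluteUri, string imagepath)
    {
        string[] splits = pageAbsoluteUri.Split('/');
        if (splits.Length >= 2)
        {
            // splits[0] is the scheme ("http:"), splits[1] is the empty string between "//".
            string url = splits[0] + "//";
            // Keep the host and every folder, dropping the last segment (the .aspx page).
            for (int i = 2; i < splits.Length - 1; i++)
            {
                url += splits[i];
                url += "/";
            }
            return url + imagepath;
        }
        return imagepath;
    }

    // Eval("Images", format) formats the column value with String.Format, so
    // GetUrl("{0}") yields a format string whose {0} is replaced by the value.
    public static string BindImageUrl(string pageAbsoluteUri, string imagesColumnValue)
    {
        string format = GetUrl(pageAbsoluteUri, "{0}");
        return string.Format(format, imagesColumnValue);
    }
}
```

Because the pieces are simply concatenated, an Images value with a leading slash such as /images/imagename.jpg would produce a double slash after the folder name; the sketch assumes paths stored relative to the website folder, such as Images/3d-desktop-wallpaper-640x480.jpg.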
// Code behind (AjaxAsyncFileUploader.aspx.cs)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Data;
using System.Data.SqlClient;
using System.Configuration;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.html.simpleparser;
using System.IO;

public partial class AjaxAsyncFileUploader : System.Web.UI.Page
{
    string errdesc = "0";

    // Connection to the data source (in a real application, keep the connection string in web.config rather than in code)
    SqlConnection sqlconn = new SqlConnection("Data Source=.;Initial Catalog=DotNetFunda;Persist Security Info=True;User ID=sa;Password=sqluser");
    protected void Page_Load(object sender, EventArgs e)
    {
        BindGridview(); // Calling the function on the page load event
    }


    // This function binds the data to the GridView.
    private void BindGridview()
    {
        try
        {
            // Calling the GetImages stored procedure to get the data.
            SqlCommand cmd = new SqlCommand("GetImages", sqlconn);
            cmd.CommandType = CommandType.StoredProcedure;
            SqlDataAdapter da = new SqlDataAdapter(cmd);
            DataSet ds = new DataSet();
            // Fill opens the connection, loads the images into the DataSet and closes the connection again.
            da.Fill(ds, "ImageGallery");
            gvdetails.DataSource = ds; // assigning the data source to the GridView.
            gvdetails.DataBind();
        }
        catch (Exception ex)
        {
            errdesc = ex.Message;
        }
    }

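    public override void VerifyRenderingInServerForm(Control control)
    {
        // Intentionally empty: the base implementation throws an HttpException when a
        // server control such as the GridView is rendered outside <form runat="server">.
        // Overriding it skips that check when controls are rendered manually with RenderControl.
    }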

    // This function returns an absolute URL: when the web page is converted to PDF, the images have to be downloaded, and for that we need their absolute URL.

    protected string GetUrl(string imagepath)
    {
        string[] splits = Request.Url.AbsoluteUri.Split('/');
        if (splits.Length >= 2)
        {
            string url = splits[0] + "//";
            for (int i = 2; i < splits.Length - 1; i++)
            {
                url += splits[i];
                url += "/";
            }
            return url + imagepath;
        }
        return imagepath;
    }

    // Exporting to PDF
    protected void ExporttoPDF_Click(object sender, EventArgs e)
    {
        Response.ContentType = "application/pdf"; // Setting the content type to PDF

        // Assigning the header that names the downloaded file
        Response.AddHeader("content-disposition", "attachment;filename=Image.pdf");
        Response.Cache.SetCacheability(HttpCacheability.NoCache);
        // Creating the object of the StringWriter.
        StringWriter sw = new StringWriter();

        // Creating the HtmlTextWriter on top of the StringWriter and rendering the whole page into it.
        // EnableEventValidation="false" in the @ Page directive is what allows RenderControl to be
        // called here, outside the normal Render phase.
        HtmlTextWriter hw = new HtmlTextWriter(sw);
        this.Page.RenderControl(hw);

        // Whatever was rendered is now in the StringWriter; wrap it in a StringReader so the parser can read it.
        StringReader srdr = new StringReader(sw.ToString());

        // Creating the PDF document using the Document class from the iTextSharp.text namespace
        // (A4 page; left, right, top and bottom margins in points).
        Document pdfDoc = new Document(PageSize.A4, 15F, 15F, 75F, 0.2F);

        // HTMLWorker parses HTML content into the PDF document, so we pass the Document object as a parameter.
        HTMLWorker hparse = new HTMLWorker(pdfDoc);
        // Attach a PdfWriter that writes the document to the response stream, then open the document.
        PdfWriter.GetInstance(pdfDoc, Response.OutputStream);
        pdfDoc.Open();

        // Pass the rendered HTML held by the StringReader to HTMLWorker, which converts it into PDF content.
        hparse.Parse(srdr);

        // Closing the document completes the PDF and flushes it to Response.OutputStream.
        pdfDoc.Close();
        // End the response so that the normal page HTML is not appended after the PDF.
        Response.End();
    }
}
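GetUrl rebuilds the folder URL by splitting Request.Url.AbsoluteUri on '/' and dropping the last segment. System.Uri can perform the same resolution following the standard relative-reference rules. The following is a minimal sketch with hypothetical names (AbsoluteUrlSketch, ToAbsoluteUrl); it is meant to be applied to an actual image path, not to the "{0}" placeholder that the markup passes to GetUrl, since Uri may percent-encode the braces.

```csharp
using System;

// Hypothetical helper: resolves an image path against the URL of the current page.
public static class AbsoluteUrlSketch
{
    public static string ToAbsoluteUrl(Uri pageUrl, string imagePath)
    {
        // Relative paths ("Images/a.jpg") resolve against the page's folder;
        // root-relative paths ("/images/a.jpg") resolve against the site root.
        return new Uri(pageUrl, imagePath).AbsoluteUri;
    }
}
```

A path such as Images/3d-desktop-wallpaper-640x480.jpg resolves against the page's folder, while a root-relative path such as /images/imagename.jpg resolves against the site root, which is not the same place when the site runs in a virtual directory like /ConverttingWebPagetoPDF/. Unlike splitting on '/', a query string on the page URL (even one containing slashes) does not leak into the result.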


There are two important things to note:-

  1. GetUrl(string imagepath) function
This function converts a relative URL into an absolute URL. Normally, when we bind images to a GridView and check the output, the image path looks like /images/imagename.jpg. But when HTML that contains images is converted to PDF, we need the complete path of each image, because the image has to be downloaded from that URL. So if you check the page source now, you will see something like

http://localhost:49105/ConverttingWebPagetoPDF/Images/3d-desktop-wallpaper-640x480.jpg



So now it is possible to download the image when parsing the HTML to PDF.

  2. Exporting to PDF Click (ExporttoPDF_Click)

  1. First we set the content type to PDF.
  2. Then we assign the file name through the content-disposition header.
  3. Then we create a StringWriter object.
  4. Now we create an HtmlTextWriter and pass the StringWriter to it. Whatever is written to the HtmlTextWriter (here, the whole rendered page) goes into the StringWriter.
  5. Next we create a StringReader (from the System.IO namespace) over the contents of the StringWriter, so that the parser can read what was captured.
  6. Now we create an instance of the Document class from the iTextSharp DLL, to build the PDF document on the fly.
  7. Now we create an instance of the HTMLWorker class (namespace iTextSharp.text.html.simpleparser), so that the rendered HTML can be parsed into the PDF document.
  8. Now we attach a PdfWriter (namespace iTextSharp.text.pdf) that writes the PDF to the response stream, and open the document.
  9. Then we pass the StringReader object to the HTMLWorker object, i.e. hparse, which reads the HTML and adds the resulting content to the document.
  10. Finally we close the document, which completes the PDF and writes it to the response.
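Steps 3 to 5 boil down to capturing everything the page renders into an in-memory buffer and then exposing that buffer as a TextReader, which is the type HTMLWorker.Parse reads from. The sketch below isolates that hand-off with plain System.IO types; RenderCaptureSketch and Capture are hypothetical names, and the render callback merely stands in for this.Page.RenderControl(hw), since HtmlTextWriter itself lives in System.Web and needs a running page.

```csharp
using System;
using System.IO;

public static class RenderCaptureSketch
{
    // 'render' is a hypothetical stand-in for this.Page.RenderControl(hw).
    public static TextReader Capture(Action<TextWriter> render)
    {
        // Everything the callback writes is kept in memory by the StringWriter (steps 3 and 4).
        using (var sw = new StringWriter())
        {
            render(sw);
            // A StringReader over the captured text is what would be handed to the parser (step 5).
            return new StringReader(sw.ToString());
        }
    }
}
```

The StringReader keeps its own reference to the captured string, so disposing the StringWriter afterwards does not affect what the reader returns.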

Output Page



PDF Generated




Conclusion


If you look closely at the screenshot of the generated PDF, it has 6 pages in total. This shows that the entire page was converted, since we had around 17 images on the page.

So, in the end, a small discussion over some refreshments led me to write this article.

Just love DOTNET FUNDA for allowing me to put keys into pages :D

Reference


DLL downloaded from :- http://sourceforge.net/projects/itextsharp/


About the Author

Raj.Trivedi
Full Name: Raj Trivedi
Member Status: Member,MVP
Member Since: 6/16/2012 2:04:41 AM
Country: India
http://www.dotnetfunda.com/profile/raj.trivedi.aspx
I, Raj Trivedi, started my career as a support professional and then moved into software development, eventually building these skills: Software Development | Enthusiastic Blogger | Content Writer | Technical Writer | Problem Solver | Lecturer on Technology Subjects | Runner-up Award Winner on www.dotnetfunda.com. I am a firm believer in sharing as a way of caring. Even with this much achieved, there is still a long way to go, and my biggest dream is to be one of the best technology entrepreneurs in India. The dream has just started and I hope it continues. Highlights are detailed in my profile at http://in.linkedin.com/pub/raj-trivedi/30/61/b30/
