HTML to PDF Converter Element

The HTML to PDF Converter integrated in the Pdf Creator SDK is implemented by the HtmlToPdfElement graphic element. It lets you specify the position and size of the PDF content rendered from HTML, and you can add several HTML to PDF conversions to the same document.

A very useful feature is the ability to determine the size of the rendered content on each page when that content spans several pages. Information about the last rendered page is available from the AddElementResult object returned when the element is added to a renderer such as a page or a template.

The HtmlToPdfElement class offers many constructors. They all call one of the following two constructors, supplying default values for the omitted parameters, to convert either a URL or an HTML string to PDF:

public HtmlToPdfElement(float x, float y, float width, float height,
    string urlToConvert, int htmlViewerWidth, int htmlViewerHeight);

public HtmlToPdfElement(float x, float y, float width, float height,
    string htmlStringToConvert, string htmlStringBaseURL,
    int htmlViewerWidth, int htmlViewerHeight);

The constructor parameters are explained below.

The first constructor creates a URL to PDF converter element at the specified x and y coordinates with the specified width and height. The virtual browser width and height in pixels are specified by the htmlViewerWidth and htmlViewerHeight parameters.

  • x - The x position in points where the rendered content will be placed.

  • y - The y position in points where the rendered content will be placed.

  • width - The destination width in points for the rendered content. If the specified width is negative, the destination width is the available width in the page or template.

  • height - The destination height in points for the rendered content. If the specified height is negative, the destination height is determined automatically so that all the content can be rendered. Note that the specified height is the effective height rendered in the PDF document; it does not include, for example, the empty space introduced by custom or automatic page breaks.

  • urlToConvert - The URL to convert to PDF.

  • htmlViewerWidth - The virtual browser width in pixels. The default value is 1024 pixels. The effect of this parameter is similar to viewing the HTML page in a browser window of the specified width. When this parameter is negative, the converter tries to determine the HTML page width automatically from the width of the HTML body element.

  • htmlViewerHeight - The virtual browser height in pixels. The default value is 0, which means the height is determined automatically. The effect of this parameter is similar to viewing the HTML page in a browser window of the specified width and height. When this parameter is negative, the converter tries to determine the HTML page height automatically from the height of the HTML body element.

The second constructor creates an HTML string to PDF converter element at the specified x and y coordinates with the specified width and height. The virtual browser width and height in pixels are specified by the htmlViewerWidth and htmlViewerHeight parameters.

  • htmlStringToConvert - The HTML string to convert to PDF.

  • htmlStringBaseURL - The full URL of the page from which the string was taken. It is used to resolve the images and CSS files referenced by relative URLs in the HTML string. This parameter is optional and its default value is null, in which case no base URL is used.
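
As a minimal sketch of how the two constructors described above might be called, the fragment below passes negative values for width and height to request automatic sizing. It assumes the SDK types are already in scope and that page is an existing PdfPage. The URLs and the HTML string are placeholders chosen for illustration.

```csharp
// Sketch only: "page" is assumed to be an existing PdfPage.
// The URLs and the HTML string below are placeholders.

// URL to PDF: placed at (0, 0) with automatic width and height (negative values),
// rendered in a virtual browser 1024 pixels wide with auto-determined height (0)
HtmlToPdfElement urlElement = new HtmlToPdfElement(0, 0, -1, -1,
    "http://www.example.com/Default.aspx", 1024, 0);
AddElementResult urlResult = page.AddElement(urlElement);

// HTML string to PDF: the base URL is used to resolve the relative image path
string html = "<html><body><img src=\"logo.png\" /><p>Hello</p></body></html>";
HtmlToPdfElement htmlElement = new HtmlToPdfElement(0, 0, -1, -1,
    html, "http://www.example.com/", 1024, 0);
```

The second element is only constructed here; it would be added to a page or template in the same way as the first.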

Below is sample code taken from the HtmlToPdf sample that demonstrates how easy it is to add an HTML to PDF conversion to a PDF page and to the document header and footer using the HtmlToPdfElement:

    protected void btnConvert_Click(object sender, EventArgs e)
    {
        //create a PDF document
        Document document = new Document();

        //optional settings for the PDF document like margins, compression level,
        //security options, viewer preferences, document information, etc
        document.CompressionLevel = CompressionLevel.NormalCompression;
        document.Margins = new Margins(10, 10, 0, 0);
        document.Security.CanPrint = true;
        document.Security.UserPassword = "";
        document.DocumentInformation.Author = "HTML to PDF Converter";
        document.ViewerPreferences.HideToolbar = false;

        //Add a first page to the document. The next pages will inherit the settings from this page 
        PdfPage page = document.Pages.AddNewPage(PageSize.A4, new Margins(10,10,0, 0), PageOrientation.Portrait);

        // the code below can be used to create a page with default settings A4, document margins inherited, portrait orientation
        //PdfPage page = document.Pages.AddNewPage();

        // add a font to the document that can be used for the text elements
        PdfFont font = document.Fonts.Add(new System.Drawing.Font(new System.Drawing.FontFamily("Times New Roman"), 10, 
            System.Drawing.GraphicsUnit.Point));

        // add header and footer before rendering the content
        if (cbAddHeader.Checked)
            AddHtmlHeader(document);
        if (cbAddFooter.Checked)
            AddHtmlFooter(document, font);

        // the result of adding an element to a PDF page
        AddElementResult addResult;

        // Get the specified location and size of the rendered content
        // A negative value for width and height means to auto determine
        // The auto determined width is the available width in the PDF page
        // and the auto determined height is the height necessary to render all the content
        float xLocation = float.Parse(textBoxXLocation.Text.Trim());
        float yLocation = float.Parse(textBoxYLocation.Text.Trim());
        float width = float.Parse(textBoxWidth.Text.Trim());
        float height = float.Parse(textBoxHeight.Text.Trim());

        if (radioConvertToSelectablePDF.Checked)
        {
            // convert HTML to PDF
            HtmlToPdfElement htmlToPdfElement;

            if (radioConvertURL.Checked)
            {
                // convert a URL to PDF
                string urlToConvert = textBoxWebPageURL.Text.Trim();

                htmlToPdfElement = new HtmlToPdfElement(xLocation, yLocation, width, height, urlToConvert);
            }
            else
            {
                // convert a HTML string to PDF
                string htmlStringToConvert = textBoxHTMLCode.Text;
                string baseURL = textBoxBaseURL.Text.Trim();

                htmlToPdfElement = new HtmlToPdfElement(xLocation, yLocation, width, height, htmlStringToConvert, baseURL);
            }

            //optional settings for the HTML to PDF converter
            htmlToPdfElement.FitWidth = cbFitWidth.Checked;
            htmlToPdfElement.EmbedFonts = cbEmbedFonts.Checked;
            htmlToPdfElement.LiveUrlsEnabled = cbLiveLinks.Checked;
            htmlToPdfElement.RightToLeftEnabled = cbRTL.Checked;
            htmlToPdfElement.ScriptsEnabled = cbClientScripts.Checked;
            htmlToPdfElement.ActiveXEnabled = cbActiveXEnabled.Checked;

            // add the HTML to PDF converter element to the page
            addResult = page.AddElement(htmlToPdfElement);
        }
        else
        {
            HtmlToImageElement htmlToImageElement;

            // convert HTML to image and add image to PDF document
            if (radioConvertURL.Checked)
            {
                // convert a URL to an image
                string urlToConvert = textBoxWebPageURL.Text.Trim();

                htmlToImageElement = new HtmlToImageElement(xLocation, yLocation, width, height, urlToConvert);
            }
            else
            {
                // convert an HTML string to an image
                string htmlStringToConvert = textBoxHTMLCode.Text;
                string baseURL = textBoxBaseURL.Text.Trim();

                htmlToImageElement = new HtmlToImageElement(xLocation, yLocation, width, height, htmlStringToConvert, baseURL);
            }

            //optional settings for the HTML to image converter
            htmlToImageElement.FitWidth = cbFitWidth.Checked;
            htmlToImageElement.ScriptsEnabled = cbClientScripts.Checked;
            htmlToImageElement.ActiveXEnabled = cbActiveXEnabled.Checked;

            addResult = page.AddElement(htmlToImageElement);
        }

        if (cbAdditionalContent.Checked)
        {
            // The code below can be used to add some other elements right under the conversion result
            // like texts or another HTML to PDF conversion

            // add a text element right under the HTML to PDF document
            PdfPage endPage = document.Pages[addResult.EndPageIndex];
            TextElement nextTextElement = new TextElement(0, addResult.EndPageBounds.Bottom + 10, "Below there is another HTML to PDF Element", font);
            nextTextElement.ForeColor = System.Drawing.Color.Green;
            addResult = endPage.AddElement(nextTextElement);

            // add another HTML to PDF converter element right under the text element
            endPage = document.Pages[addResult.EndPageIndex];
            HtmlToPdfElement nextHtmlToPdfElement = new HtmlToPdfElement(0, addResult.EndPageBounds.Bottom + 10, "http://www.google.com");
            addResult = endPage.AddElement(nextHtmlToPdfElement);
        }

        // send the generated PDF document to client browser
        document.Save(Response, false, "HtmlConvert.pdf");
    }

    private void AddHtmlHeader(Document document)
    {
        string thisPageURL = HttpContext.Current.Request.Url.AbsoluteUri;
        string headerAndFooterHtmlUrl = thisPageURL.Substring(0, thisPageURL.LastIndexOf('/')) + "/HeaderAndFooterHtml.htm";

        //create a template to be used as the document header
        document.HeaderTemplate = document.AddTemplate(document.Pages[0].ClientRectangle.Width, 60);
        // create a HTML to PDF converter element to be added to the header template
        HtmlToPdfElement headerHtmlToPdf = new HtmlToPdfElement(headerAndFooterHtmlUrl);
        document.HeaderTemplate.AddElement(headerHtmlToPdf);
    }

    private void AddHtmlFooter(Document document, PdfFont footerPageNumberFont)
    {
        string thisPageURL = HttpContext.Current.Request.Url.AbsoluteUri;
        string headerAndFooterHtmlUrl = thisPageURL.Substring(0, thisPageURL.LastIndexOf('/')) + "/HeaderAndFooterHtml.htm";

        //create a template to be used as the document footer
        document.FooterTemplate = document.AddTemplate(document.Pages[0].ClientRectangle.Width, 60);
        // create a HTML to PDF converter element to be added to the footer template
        HtmlToPdfElement footerHtmlToPdf = new HtmlToPdfElement(headerAndFooterHtmlUrl);
        document.FooterTemplate.AddElement(footerHtmlToPdf);

        // add page number to the footer
        TextElement pageNumberText = new TextElement(document.FooterTemplate.ClientRectangle.Width - 100, 30,
                            "This is page &p; of &P; pages", footerPageNumberFont);
        document.FooterTemplate.AddElement(pageNumberText);
    }

Page Breaks, Keep Together in HtmlToPdfElement

The converter supports the following CSS styles to control the page breaks: page-break-before: always, page-break-after: always and page-break-inside: avoid. For example, with the page-break-after: always style applied to a HTML element (image, text, etc), you instruct the converter to insert a page break right after that element is rendered.

By default the converter tries to avoid breaking text between PDF pages. You can disable this behavior with the AvoidTextBreak property. You can also make the converter avoid breaking images between PDF pages with the AvoidImageBreak property, which is false by default.
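
The two properties might be set as in the short sketch below. The text names the properties but not the object that exposes them, so the sketch assumes they are bool properties of an existing HtmlToPdfElement instance named htmlToPdfElement.

```csharp
// Sketch only: assumes AvoidTextBreak and AvoidImageBreak are bool properties
// of HtmlToPdfElement; the text above does not name their owner.
htmlToPdfElement.AvoidTextBreak = false;  // allow text to be split between pages
htmlToPdfElement.AvoidImageBreak = true;  // try to keep each image on a single page
```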

An advanced and very useful feature when creating PDF reports is Keep Together, which is implemented with the page-break-inside: avoid style. This style instructs the converter to avoid breaking across pages the content of a group of HTML elements that you want to keep on the same page. Because it can be applied to a table, a table row or a div element, it is useful in many layouts.

Below is an example of using the page-break-inside: avoid style. The table contains a large number of rows, each with an image on the left and text on the right, and no row should be split across two pages.

<table>
        <tr style="page-break-inside: avoid">
            <td>
                <img width="100" height="100" src="img1.jpg">
            </td>
            <td>
                My text 1
            </td>
        </tr>

        <tr style="page-break-inside: avoid">
            <td>
                <img width="100" height="100" src="img2.jpg">
            </td>
            <td>
                My text 2
            </td>
        </tr>
</table>

The converter can convert any HTTP link in the HTML document into a link in the PDF document. This works for links wrapping text, images or any other content that HTML allows inside a link. This is the default behavior of the converter. If you don't want active links in the generated PDF document, set the LiveUrlsEnabled property to false.

Enable/Disable Client Scripts and ActiveX/Flash from HTML Page

When you use ExpertPdf PdfCreator, you can specify whether JavaScript, ActiveX controls or plug-ins are allowed to run in the web page being converted to PDF. JavaScript is enabled by default. ActiveX controls (IE rendering engine only) are disabled by default. Plug-ins (WebKit rendering engine only) are enabled by default.

The properties of the HtmlToPdfElement class that let you manage client scripts and plug-ins when converting to PDF are:

public bool ScriptsEnabled { get; set; }
public bool ActiveXEnabled { get; set; }
public bool PluginsEnabled { get; set; }
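
As an illustration, the fragment below turns off all three kinds of active content, which might be a reasonable choice when converting pages from a source you do not control. It is a sketch that assumes htmlToPdfElement is an existing HtmlToPdfElement instance.

```csharp
// Sketch only: "htmlToPdfElement" is assumed to be an existing HtmlToPdfElement.
htmlToPdfElement.ScriptsEnabled = false;  // do not run JavaScript in the page
htmlToPdfElement.ActiveXEnabled = false;  // already the default (IE rendering engine)
htmlToPdfElement.PluginsEnabled = false;  // disable plug-ins (WebKit rendering engine)
```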

Server Authentication in HtmlToPdfElement

ExpertPdf PdfCreator supports several types of server authentication when working with HtmlToPdfElement objects. Integrated Windows Authentication is supported if the NTLM provider is used; the converter authenticates automatically with the credentials of the current user.

The converter can handle IIS authentication like Basic HTTP Authentication. This type of authentication is disabled by default. To enable authentication you have to set the AuthenticationOptions property of the HtmlToPdfElement object.

Below you can find sample code for setting the username and password for authentication when converting HTML to PDF using an HtmlToPdfElement object:

htmlToPdfElement.AuthenticationOptions.Username = username;
htmlToPdfElement.AuthenticationOptions.Password = password;

Bookmarks in HtmlToPdfElement

The converter can produce bookmarks automatically in the generated PDF document for a list of specified HTML tags. The bookmarking is controlled by the PdfBookmarkOptions property and is enabled only when a list of HTML tag names is specified by the TagNames property.

For example, to enable bookmarking of the H1 and H2 tags you can use the following line of code:

htmlToPdfElement.PdfBookmarkOptions.TagNames = new string[] { "H1", "H2" };

The tags to be bookmarked can be further filtered by CSS class name using the ClassNameFilter property. For example, to filter only the H1 and H2 tags having the CSS class bookmark, the following line of code can be added to the previous one:

htmlToPdfElement.PdfBookmarkOptions.ClassNameFilter = "bookmark";

The ClassNameFilter property is case sensitive and the string value set for this property must textually match the class attribute of the HTML tag to be bookmarked.

The WebKit rendering engine supports an even more advanced way of identifying the elements that will generate the bookmarks. That is done using the HtmlElementSelectors property. This property specifies the CSS selectors of the HTML elements to be bookmarked. For example, the selector for all the H1 elements is "H1", the selector for all the elements with the CSS class name 'myclass' is "*.myclass" and the selector for the elements with the id 'myid' is "*#myid".
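
A selector-based bookmark setup might look like the line below. This is a sketch under two assumptions not stated in the text: that HtmlElementSelectors is exposed on PdfBookmarkOptions, and that it takes a string array in the same way as TagNames.

```csharp
// Sketch only: assumes HtmlElementSelectors is a string[] property of PdfBookmarkOptions.
// Bookmark every H1, every element with class "myclass" and the element with id "myid".
htmlToPdfElement.PdfBookmarkOptions.HtmlElementSelectors = new string[] { "H1", "*.myclass", "*#myid" };
```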

The converter automatically converts HTML links with anchors found in the HTML document to internal links in the PDF. This feature can be used to easily create a table of contents in the generated PDF document.

An HTML link with an anchor consists of two HTML elements: a link defined by an <a href="#target">Internal Link</a> tag and the target of the link defined by an <a name="target">Link Target</a> tag. When PdfCreator finds this construction it automatically generates an internal link in the PDF from "Internal Link" to "Link Target".

The generation of internal links can be disabled by setting htmlToPdfElement.InternalLinksEnabled = false.

A few conditions must be met for the internal links to be generated correctly in the PDF document. When converting a URL to PDF, the URL must be fully qualified. For example, suppose a website MyWebsite has a Default.aspx page with internal links, and the web server serves that page automatically when the address http://MyWebsite is typed in the browser address bar. Converting http://MyWebsite directly might not produce the correct internal links, because the converter cannot determine which page the web server served. Converting http://MyWebsite/Default.aspx, by contrast, always generates the internal links correctly. The HtmlToPdfElement constructors have an additional parameter called InternalLinksDocUrl which lets you specify, before the conversion runs, the fully qualified URL referenced by the internal links.

When converting an HTML string to PDF, it is recommended to always pass the baseUrl and internalLinksDocUrl parameters to the HtmlToPdfElement constructor.

JPEG Compression of Images in PDF

The converter automatically compresses the images in the generated PDF using the JPEG compression algorithm, which greatly reduces the size of the PDF document. JPEG compression reduces image quality: the higher the compression level, the lower the quality of the images in the PDF.

The JpegCompressionLevel property defines the level used for JPEG compression on a scale from 0 to 100. A level of 0 applies the least compression and gives the best image quality. The default JPEG compression level used by the converter is 10, which offers a good balance between image quality and the size of the generated PDF document.

If you want the best image quality, you can disable JPEG compression of the images completely by setting the JpegCompressionEnabled property to false.

Retrieve HTML Elements Mapping to PDF from HtmlToPdfElement

This is a very powerful feature of the converter which allows you to obtain the position in the generated PDF document for any HTML element. Knowing the position in the generated PDF document of any element from the HTML document allows you to create bookmarks for elements from the HTML document, create internal links between HTML elements, place texts or images over the HTML elements or assign a digital signature to a certain element from HTML.

This feature is accessed through the HtmlElementsMappingOptions property of the HtmlToPdfElement object. Use its HtmlElementIds property to list the HTML IDs of the elements whose position you want to retrieve, or its HtmlTagNames property to list the tag names of those elements.

The WebKit rendering engine supports an even more advanced way of specifying the elements for which you want to retrieve position. That is done using the HtmlElementSelectors property. This property specifies the CSS selectors of the HTML elements. For example, the selector for all the H1 elements is "H1", the selector for all the elements with the CSS class name 'myclass' is "*.myclass" and the selector for the elements with the id 'myid' is "*#myid".
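
For the mapping case, the selectors might be supplied as in the sketch below. It assumes that HtmlElementSelectors sits on HtmlElementsMappingOptions next to HtmlTagNames and HtmlElementIds and takes a string array like they do; the text does not confirm either point.

```csharp
// Sketch only: assumes HtmlElementSelectors is a string[] property of HtmlElementsMappingOptions.
// Request the PDF position of every H1 and of every element with class "myclass".
htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementSelectors = new string[] { "H1", "*.myclass" };
```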

The HtmlElementsMappingOptions property must be set before calling the convert method.

The HTML elements mapping is returned in the HtmlElementsMappingResult property. It is a collection of HtmlElementMapping objects, each of which provides the PDF page index where the converter mapped the element, the rectangle where the element was rendered inside that page, and the element's HTML ID, tag name, text and outer HTML code.

In the code sample below, all the H1 and IMG elements and the elements with the ID MyID1 and MyID2 will be highlighted with a red rectangle in the generated PDF:

// create the HTML to PDF converter element
HtmlToPdfElement htmlToPdfElement = new HtmlToPdfElement("http://www.html-to-pdf.net");

// define the list with the HTML tags of the elements for which you want to retrieve position
htmlToPdfElement.HtmlElementsMappingOptions.HtmlTagNames = new string[] { "IMG", "H1" };

// define the list with the HTML IDs of the elements for which you want to retrieve position
htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementIds = new string[] { "MyID1", "MyID2" };

// add the element to a PDF page
page.AddElement(htmlToPdfElement);

// iterate over the HTML elements mappings and draw a red rectangle around the element in PDF
foreach (HtmlElementMapping elementMapping in htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementsMappingResult)
{
    // iterate over the positions of the HTML element in PDF because a HTML element
    // can span on many PDF pages
    foreach (HtmlElementPdfRectangle pdfRectangle in elementMapping.PdfRectangles)
    {
        // get the PDF page where the HTML element was rendered
        PdfPage elementPdfPage = document.Pages[pdfRectangle.PageIndex];
        // get the rectangle inside the PDF page where the element was rendered
        RectangleF elementPdfRectangle = pdfRectangle.Rectangle;

        // create a red rectangle element covering that area
        RectangleElement elementHighlightRectangle = new RectangleElement(elementPdfRectangle.X, elementPdfRectangle.Y,
            elementPdfRectangle.Width, elementPdfRectangle.Height);
        elementHighlightRectangle.ForeColor = Color.Red;

        elementPdfPage.AddElement(elementHighlightRectangle);
    }
}