HTML to PDF Converter Element

The integrated HTML to PDF Converter is implemented by the HtmlToPdfElement graphic element. It lets you specify the position and the size of the PDF content rendered from HTML, and it lets you add several HTML to PDF conversions to the same document.

A very useful feature is the ability to find out the size of the rendered content in each page when the rendered content spans multiple pages. The information about the last rendered page can be taken from the AddElementResult object returned after adding the element to a renderer such as a page or a template.

The HtmlToPdfElement class offers many constructors. They all end up calling one of the following two constructors, filling in default values where needed, to convert either a URL or an HTML string to PDF:

public HtmlToPdfElement(float x, float y, float width, float height,
    string urlToConvert, int htmlViewerWidth, int htmlViewerHeight);

public HtmlToPdfElement(float x, float y, float width, float height,
    string htmlStringToConvert, string htmlStringBaseURL,
    int htmlViewerWidth, int htmlViewerHeight);

Public Sub New(x As Single, y As Single, width As Single, height As Single, urlToConvert As String, htmlViewerWidth As Integer, _
    htmlViewerHeight As Integer)

Public Sub New(x As Single, y As Single, width As Single, height As Single, htmlStringToConvert As String, htmlStringBaseURL As String, _
    htmlViewerWidth As Integer, htmlViewerHeight As Integer)

The constructor parameters are explained below.

The first constructor creates a URL to PDF converter element at the specified x and y coordinates with the specified width and height. The virtual browser width and height in pixels are specified by the htmlViewerWidth and htmlViewerHeight parameters.

x - The x position in points where the rendered content will be placed.
y - The y position in points where the rendered content will be placed.
width - The destination width in points for the rendered content. If the specified width is negative, the destination width is given by the available width in the page or template.
height - The destination height in points for the rendered content. If the specified height is negative, the destination height is auto-determined so that all the content can be rendered. Note that the specified height is the effective height that will be rendered in the PDF document; it does not include, for example, the empty spaces introduced by custom or automatic page breaks.
urlToConvert - The URL to convert to PDF.
htmlViewerWidth - The virtual browser width in pixels. The default value is 1024 pixels. The effect of this parameter is similar to viewing the HTML page in a browser window with the specified width. When this parameter is negative, the converter tries to auto-determine the HTML page width from the width of the HTML body element.
htmlViewerHeight - The virtual browser height in pixels. The default value is 0, which means the height is auto-determined. The effect of this parameter is similar to viewing the HTML page in a browser window with the specified width and height. When this parameter is negative, the converter tries to auto-determine the HTML page height from the height of the HTML body element.
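
Because x, y, width and height are expressed in points while page sizes are usually known in millimeters or inches, a small conversion helper can be handy. The sketch below relies only on the standard definition of a PDF point (1/72 inch). The PdfUnits class and its method names are hypothetical and are not part of the ExpertPdf API.

```csharp
using System;

// Hypothetical helper, not part of the ExpertPdf API.
// Converts common length units to PDF points (1 point = 1/72 inch).
public static class PdfUnits
{
    public const float PointsPerInch = 72f;
    public const float MillimetersPerInch = 25.4f;

    // Inches to points.
    public static float FromInches(float inches)
    {
        return inches * PointsPerInch;
    }

    // Millimeters to points.
    public static float FromMillimeters(float millimeters)
    {
        return millimeters / MillimetersPerInch * PointsPerInch;
    }
}
```

For example, PdfUnits.FromMillimeters(210f) gives roughly 595.28 points, the width of an A4 page, and PdfUnits.FromMillimeters(10f) gives roughly 28.35 points for a 10 mm offset that could be passed as x or y.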

The second constructor creates an HTML string to PDF converter element at the specified x and y coordinates with the specified width and height. The virtual browser width and height in pixels are specified by the htmlViewerWidth and htmlViewerHeight parameters.

htmlStringToConvert - The HTML string converted to PDF.
htmlStringBaseURL - The full URL of the page the HTML string was taken from. It is used to resolve the images and CSS files that the HTML string references by a relative URL. This parameter is optional and its default value is null. When this parameter is null, no base URL is used.
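
To see why the base URL matters, it helps to look at how a relative reference is resolved against it. The sketch below uses the standard System.Uri class to show the general URL resolution rules. It is an illustration only: the BaseUrlDemo class is hypothetical, the example.com addresses are placeholders, and the assumption is that the converter resolves relative references in the usual browser-like way.

```csharp
using System;

// Hypothetical illustration, not part of the ExpertPdf API.
public static class BaseUrlDemo
{
    // Resolves a relative URL found in an HTML string against a base URL.
    // With no base URL the reference is returned unchanged, so it cannot
    // be turned into an absolute address.
    public static string Resolve(string baseUrl, string relativeUrl)
    {
        if (string.IsNullOrEmpty(baseUrl))
        {
            return relativeUrl;
        }

        Uri baseUri = new Uri(baseUrl, UriKind.Absolute);
        return new Uri(baseUri, relativeUrl).AbsoluteUri;
    }
}
```

With the base URL http://www.example.com/reports/page.html, the reference img1.jpg resolves to http://www.example.com/reports/img1.jpg and the reference /css/site.css resolves to http://www.example.com/css/site.css.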

Page Breaks, Keep Together in HtmlToPdfElement

The converter supports the following CSS styles to control page breaks: page-break-before: always, page-break-after: always and page-break-inside: avoid. For example, applying the page-break-after: always style to an HTML element (image, text, etc.) instructs the converter to insert a page break right after that element is rendered.

By default the converter always tries to avoid breaking text between PDF pages. You can disable this behavior using the AvoidTextBreak property. You can also make the converter avoid breaking images between PDF pages using the AvoidImageBreak property, which is false by default.

An advanced and very useful feature when creating PDF reports is the Keep Together feature, which can be implemented with the page-break-inside: avoid style. This style instructs the converter to avoid breaking the content of a group of HTML elements that you want to keep together on the same page. It is particularly useful when applied to a table, a table row or a div element.

Below is an example of using the page-break-inside: avoid style. The table contains a large number of rows, each with an image on the left and a text on the right, and no row should be split across two pages.

<table>
    <tr style="page-break-inside: avoid">
        <td>
            <img width="100" height="100" src="img1.jpg">
        </td>
        <td>
            My text 1
        </td>
    </tr>

    <tr style="page-break-inside: avoid">
        <td>
            <img width="100" height="100" src="img2.jpg">
        </td>
        <td>
            My text 2
        </td>
    </tr>
</table>
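
In a report the rows are usually generated from data rather than written by hand. The sketch below shows one possible way to build such a table in C# so that every row carries the page-break-inside: avoid style. The KeepTogetherTable class is hypothetical and uses only the .NET base class library. The resulting string could then be passed as the htmlStringToConvert parameter.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the ExpertPdf API.
public static class KeepTogetherTable
{
    // Builds an HTML table with one row per (image URL, text) pair.
    // Each row is marked so that it is not split across two PDF pages.
    public static string Build(IEnumerable<KeyValuePair<string, string>> rows)
    {
        StringBuilder html = new StringBuilder();
        html.AppendLine("<table>");

        foreach (KeyValuePair<string, string> row in rows)
        {
            html.AppendLine("    <tr style=\"page-break-inside: avoid\">");
            html.AppendLine("        <td><img width=\"100\" height=\"100\" src=\""
                + WebUtility.HtmlEncode(row.Key) + "\"></td>");
            html.AppendLine("        <td>" + WebUtility.HtmlEncode(row.Value) + "</td>");
            html.AppendLine("    </tr>");
        }

        html.AppendLine("</table>");
        return html.ToString();
    }
}
```

The image URL and the text are HTML-encoded so that characters such as < or & in the data do not break the markup.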

Live HTTP Links in HtmlToPdfElement

The converter can convert any HTTP link from the HTML document into a link in the PDF document. This works for links that contain text, images or any other content that HTML allows. This is the default behavior of the converter. If you don't want active links in the generated PDF document, set the LiveUrlsEnabled property to false.

Enable/Disable Client Scripts and ActiveX/Flash from HTML Page

When you use ExpertPdf Html To Pdf Converter, you can specify whether JavaScript, ActiveX controls and plug-ins are allowed to run in the web page that is converted to PDF. JavaScript is enabled by default. ActiveX controls (IE rendering engine only) are disabled by default. Plug-ins (WebKit rendering engine only) are enabled by default.

The properties of the HtmlToPdfElement class that let you manage client scripts and add-ins when converting to PDF are:

public bool ScriptsEnabled { get; set; }
public bool ActiveXEnabled { get; set; }
public bool PluginsEnabled { get; set; }

Public Property ScriptsEnabled() As Boolean
Public Property ActiveXEnabled() As Boolean
Public Property PluginsEnabled() As Boolean

Server Authentication in HtmlToPdfElement

ExpertPdf Html To Pdf Converter supports several types of server authentication when working with HtmlToPdfElement objects. ExpertPdf supports Integrated Windows Authentication if the NTLM provider is used. In that case the converter authenticates automatically with the credentials of the current user.

The converter can handle IIS authentication like Basic HTTP Authentication. This type of authentication is disabled by default. To enable authentication you have to set the AuthenticationOptions property of the HtmlToPdfElement object.

Below you can find sample code for setting the username and password for authentication when converting HTML to PDF using an HtmlToPdfElement object:

htmlToPdfElement.AuthenticationOptions.Username = username;
htmlToPdfElement.AuthenticationOptions.Password = password;

htmlToPdfElement.AuthenticationOptions.Username = username
htmlToPdfElement.AuthenticationOptions.Password = password

Bookmarks in HtmlToPdfElement

The converter can produce bookmarks automatically in the generated PDF document for a list of specified HTML tags. The bookmarking is controlled by the PdfBookmarkOptions property and is enabled only when a list of HTML tag names is specified by the HtmlElementSelectors property.

For example, to enable bookmarking of the H1 and H2 tags you can use the following line of code:

htmlToPdfElement.PdfBookmarkOptions.HtmlElementSelectors = new string[] { "H1", "H2" };

htmlToPdfElement.PdfBookmarkOptions.HtmlElementSelectors = New String() {"H1", "H2"}

The WebKit rendering engine supports a more advanced way of identifying the elements that generate bookmarks: the HtmlElementSelectors property accepts CSS selectors of the HTML elements to be bookmarked, not just tag names. For example, the selector for all the H1 elements is "H1", the selector for all the elements with the CSS class name 'myclass' is "*.myclass" and the selector for the element with the id 'myid' is "*#myid".
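
The three selector forms described above can be produced with a few small functions, which avoids typing the "*." and "*#" prefixes by hand. This is only a sketch: the BookmarkSelectors class is hypothetical and simply concatenates strings in the format given in this section.

```csharp
using System;

// Hypothetical helper, not part of the ExpertPdf API.
// Builds CSS selector strings in the forms described in this section.
public static class BookmarkSelectors
{
    // Selector for all elements with the given tag name, e.g. "H1".
    public static string ForTag(string tagName)
    {
        return tagName;
    }

    // Selector for all elements with the given CSS class, e.g. "*.myclass".
    public static string ForClass(string className)
    {
        return "*." + className;
    }

    // Selector for the element with the given id, e.g. "*#myid".
    public static string ForId(string id)
    {
        return "*#" + id;
    }
}
```

An array such as new string[] { BookmarkSelectors.ForTag("H1"), BookmarkSelectors.ForClass("myclass"), BookmarkSelectors.ForId("myid") } could then be assigned to the HtmlElementSelectors property.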

Internal Links in HtmlToPdfElement

The converter automatically converts the HTML links with anchors found in the HTML document to internal links in the PDF. This feature can be used to easily create a table of contents in the generated PDF document.

An HTML link with an anchor consists of two HTML elements: a link defined by an <a href="#target">Internal Link</a> tag and the target of the link defined by an <a name="target">Link Target</a> tag. When the HTML to PDF converter finds this construction, it automatically generates an internal link in the PDF from "Internal Link" to "Link Target".
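
As an illustration of the link and target pairs described above, the sketch below generates the HTML for a simple table of contents followed by the section headings it points to. The TocBuilder class is hypothetical and uses only the .NET base class library. It assumes the anchor names are simple identifiers without spaces.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical helper, not part of the ExpertPdf API.
public static class TocBuilder
{
    // sections: anchor name -> section title.
    // Emits one <a href="#name"> link per section, followed by one
    // <a name="name"> target per section.
    public static string Build(IList<KeyValuePair<string, string>> sections)
    {
        StringBuilder html = new StringBuilder();

        // The table of contents: links that point to the anchors.
        foreach (KeyValuePair<string, string> section in sections)
        {
            html.AppendLine("<a href=\"#" + WebUtility.HtmlEncode(section.Key) + "\">"
                + WebUtility.HtmlEncode(section.Value) + "</a><br>");
        }

        // The link targets: named anchors placed in the section headings.
        foreach (KeyValuePair<string, string> section in sections)
        {
            html.AppendLine("<h2><a name=\"" + WebUtility.HtmlEncode(section.Key) + "\">"
                + WebUtility.HtmlEncode(section.Value) + "</a></h2>");
        }

        return html.ToString();
    }
}
```

In a real document the section content would follow each heading. The sketch only shows how each href value matches the name of its target.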

The generation of internal links can be disabled using htmlToPdfElement.InternalLinksEnabled = false;.

A few conditions must be met for the internal links to be generated correctly in the PDF document. When converting a URL to PDF, the URL must be fully qualified. For example, suppose the website MyWebsite has a Default.aspx page with internal links, and the web server serves that page automatically when the address http://MyWebsite is typed in the browser address bar. Converting http://MyWebsite directly might not produce the correct internal links, because the converter cannot determine which web page the server returned by default. Converting http://MyWebsite/Default.aspx instead always generates the internal links correctly. The HtmlToPdfElement constructors have an additional parameter called InternalLinksDocUrl which allows you to specify the fully qualified URL referenced by the internal links before the conversion is performed.
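
A quick check before conversion can catch URLs that rely on a default document. The sketch below is a heuristic built on the standard System.Uri class: it only verifies that the URL is an absolute HTTP or HTTPS address whose path does not end in a slash. The InternalLinksUrlCheck class is hypothetical, and the check cannot tell whether the last path segment is really a page or a folder.

```csharp
using System;

// Hypothetical helper, not part of the ExpertPdf API.
public static class InternalLinksUrlCheck
{
    // Returns true when the URL is an absolute http/https address whose
    // path ends with a document name rather than with "/".
    public static bool NamesDocumentExplicitly(string url)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
        {
            return false;
        }

        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        // For "http://MyWebsite" the only segment is "/".
        // For "http://MyWebsite/Default.aspx" the last segment is "Default.aspx".
        string lastSegment = uri.Segments[uri.Segments.Length - 1];
        return !lastSegment.EndsWith("/", StringComparison.Ordinal);
    }
}
```

With this check, http://MyWebsite and http://MyWebsite/reports/ are rejected, while http://MyWebsite/Default.aspx is accepted.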

When converting an HTML string to PDF, it is recommended to always pass the baseUrl and internalLinksDocUrl parameters to the HtmlToPdfElement constructor used to create the element.

Retrieve HTML Elements Mapping to PDF from HtmlToPdfElement

This is a very powerful feature of the converter which allows you to obtain the position in the generated PDF document for any HTML element. Knowing the position in the generated PDF document of any element from the HTML document allows you to create bookmarks for elements from the HTML document, create internal links between HTML elements, place texts or images over the HTML elements or assign a digital signature to a certain element from HTML.

This feature can be accessed using the HtmlElementsMappingOptions property of the HtmlToPdfElement object. This property lets you define the list of elements whose position you want to retrieve, through its HtmlElementSelectors property, which specifies the CSS selectors of the HTML elements. For example, the selector for all the H1 elements is "H1", the selector for all the elements with the CSS class name 'myclass' is "*.myclass" and the selector for the element with the id 'myid' is "*#myid".

The HtmlElementsMappingOptions property must be set before calling the convert method.

The HTML elements mapping is returned in the HtmlElementsMappingResult property. The result is a collection of HtmlElementMapping objects. Each object provides the PDF page index where the element was mapped by the converter, the rectangle where the element was rendered inside that page, the element HTML ID, the element tag name, the element text and the element outer HTML code.

In the code sample below, all the H1 and IMG elements are highlighted with a red rectangle in the generated PDF. The page and document variables are assumed to refer to an existing PDF page and its PDF document:

// create the HtmlToPdfElement
HtmlToPdfElement htmlToPdfElement = new HtmlToPdfElement("http://www.html-to-pdf.net");
// define the list with the HTML tags of the elements for which you want to retrieve position
htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementSelectors = new string[] { "IMG", "H1" };

// add the element to a PDF page
page.AddElement(htmlToPdfElement);

// iterate over the HTML elements mappings and draw a red rectangle around the element in PDF
foreach (HtmlElementMapping elementMapping in htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementsMappingResult)
{
    // iterate over the positions of the HTML element in PDF because a HTML element
    // can span on many PDF pages
    foreach (HtmlElementPdfRectangle pdfRectangle in elementMapping.PdfRectangles)
    {
        // get the PDF page and the rectangle inside that page where the HTML element was rendered
        PdfPage elementPdfPage = document.Pages[pdfRectangle.PageIndex];
        RectangleF elementPdfRectangle = pdfRectangle.Rectangle;

        // create a red rectangle with the same position and size to highlight the element
        RectangleElement elementHighlightRectangle = new RectangleElement(elementPdfRectangle.X, elementPdfRectangle.Y,
            elementPdfRectangle.Width, elementPdfRectangle.Height);
        elementHighlightRectangle.ForeColor = Color.Red;

        elementPdfPage.AddElement(elementHighlightRectangle);
    }
}

' create the HtmlToPdfElement
Dim htmlToPdfElement As New HtmlToPdfElement("http://www.html-to-pdf.net")
' define the list with the HTML tags of the elements for which you want to retrieve position
htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementSelectors = New String() {"IMG", "H1"}

' add the element to a PDF page
page.AddElement(htmlToPdfElement)

' iterate over the HTML elements mappings and draw a red rectangle around the element in PDF
For Each elementMapping As HtmlElementMapping In htmlToPdfElement.HtmlElementsMappingOptions.HtmlElementsMappingResult
    ' iterate over the positions of the HTML element in PDF because a HTML element
    ' can span on many PDF pages
    For Each pdfRectangle As HtmlElementPdfRectangle In elementMapping.PdfRectangles
        ' get the PDF page and the rectangle inside that page where the HTML element was rendered
        Dim elementPdfPage As PdfPage = document.Pages(pdfRectangle.PageIndex)
        Dim elementPdfRectangle As RectangleF = pdfRectangle.Rectangle

        ' create a red rectangle with the same position and size to highlight the element
        Dim elementHighlightRectangle As New RectangleElement(elementPdfRectangle.X, elementPdfRectangle.Y, elementPdfRectangle.Width, elementPdfRectangle.Height)
        elementHighlightRectangle.ForeColor = Color.Red

        elementPdfPage.AddElement(elementHighlightRectangle)
    Next
Next