Advanced Post Convert Customization of the Generated PDF Document

ExpertPdf is more than an HTML to PDF converter; it is a full-featured PDF library. The main purpose of the component is to convert web pages to PDF, but it can also be used to work with existing PDF documents: to merge several PDF documents into a single document, or to add pages, text, images and shapes to PDF documents.

The PdfConverter class offers a set of render methods that return the Document object created by the converter as the result of the HTML to PDF conversion.

public Document GetPdfDocumentObjectFromUrl(string url);
public Document GetPdfDocumentObjectFromUrl(string url, string internalLinksDocUrl);
public Document GetPdfDocumentObjectFromHtmlFile(string htmlFilePath);
public Document GetPdfDocumentObjectFromHtmlFile(string htmlFilePath, string internalLinksDocUrl);
public Document GetPdfDocumentObjectFromHtmlString(string htmlString);
public Document GetPdfDocumentObjectFromHtmlString(string htmlString, string urlBase);
public Document GetPdfDocumentObjectFromHtmlString(string htmlString, string urlBase, string internalLinksDocUrl);
public Document GetPdfDocumentObjectFromHtmlStream(System.IO.Stream htmlStream, Encoding streamEncoding);
public Document GetPdfDocumentObjectFromHtmlStream(System.IO.Stream htmlStream, Encoding streamEncoding, string urlBase);
public Document GetPdfDocumentObjectFromHtmlStream(System.IO.Stream htmlStream, Encoding streamEncoding, string urlBase, string internalLinksDocUrl);

Public Function GetPdfDocumentObjectFromUrl(url As String) As Document
Public Function GetPdfDocumentObjectFromUrl(url As String, internalLinksDocUrl As String) As Document
Public Function GetPdfDocumentObjectFromHtmlFile(htmlFilePath As String) As Document
Public Function GetPdfDocumentObjectFromHtmlFile(htmlFilePath As String, internalLinksDocUrl As String) As Document
Public Function GetPdfDocumentObjectFromHtmlString(htmlString As String) As Document
Public Function GetPdfDocumentObjectFromHtmlString(htmlString As String, urlBase As String) As Document
Public Function GetPdfDocumentObjectFromHtmlString(htmlString As String, urlBase As String, internalLinksDocUrl As String) As Document
Public Function GetPdfDocumentObjectFromHtmlStream(htmlStream As System.IO.Stream, streamEncoding As Encoding) As Document
Public Function GetPdfDocumentObjectFromHtmlStream(htmlStream As System.IO.Stream, streamEncoding As Encoding, urlBase As String) As Document
Public Function GetPdfDocumentObjectFromHtmlStream(htmlStream As System.IO.Stream, streamEncoding As Encoding, urlBase As String, internalLinksDocUrl As String) As Document
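
As a minimal sketch of how one of these methods might be called, the console program below converts an in-memory HTML string and saves the result. It uses only members that appear on this page: the GetPdfDocumentObjectFromHtmlString overload that takes a urlBase, and the Save and Close calls used in the sample code later on this page. The HTML content, the base URL and the output file name are placeholders chosen for illustration. The urlBase parameter is assumed here to be the base URL against which relative references in the HTML are resolved, and the project is assumed to reference the ExpertPdf assembly.

```csharp
using System;

using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

class HtmlStringToPdfSketch
{
    static void Main()
    {
        // placeholder HTML; the relative image URL relies on urlBase (assumed behavior)
        string htmlString = "<html><body><h1>Report</h1><img src=\"logo.png\" /></body></html>";
        string urlBase = "http://www.example.com/"; // placeholder base URL

        PdfConverter pdfConverter = new PdfConverter();

        // get a Document object instead of the final PDF bytes,
        // so the document can still be modified before saving
        Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlString(htmlString, urlBase);

        try
        {
            // placeholder output location
            string outFilePath = System.IO.Path.Combine(Environment.CurrentDirectory, "FromHtmlString.pdf");
            pdfDocument.Save(outFilePath);
            Console.WriteLine("Saved " + outFilePath);
        }
        finally
        {
            // close the Document to release all the resources
            pdfDocument.Close();
        }
    }
}
```

Wrapping the work in try/finally right after the Document is obtained means the document is closed even if a later step throws.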

The PDF document object resulting from the conversion is an instance of the Document class. This Document object offers access to the collection of pages of the PDF document. You can iterate through the document pages, add new pages to the document, append external PDF documents, or add new elements such as text and images to the document pages. After modification, the document can be saved to a file or to a stream using one of the Save methods of the Document class. The classes and methods that can be used to customize the generated PDF document are available in the ExpertPdf.HtmlToPdf.PdfDocument namespace.

The most important class defined in the ExpertPdf.HtmlToPdf.PdfDocument namespace is the Document class. When the Document object is returned by one of the render methods mentioned above, it already contains the PDF pages generated by the converter from the URL or from the HTML string being converted. The collection of PDF pages can be accessed through the Pages property. New pages with the desired size, orientation and margins can be added to the collection, and new elements can be added to any page in the collection.
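
The page iteration described above could look roughly like the following sketch, which draws a small green rectangle near the top-left corner of every page produced by the converter. The Pages indexer, PdfPage.AddElement and RectangleElement are taken from the sample code later on this page; the Count property on the Pages collection is an assumption, and the URL, rectangle coordinates and output file name are placeholders.

```csharp
using System;
using System.Drawing;

using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

class MarkEveryPageSketch
{
    static void Main()
    {
        PdfConverter pdfConverter = new PdfConverter();

        // placeholder URL; the returned Document already contains the converted pages
        Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromUrl("http://www.example.com/");

        try
        {
            // assumption: the Pages collection exposes a Count property
            for (int pageIndex = 0; pageIndex < pdfDocument.Pages.Count; pageIndex++)
            {
                PdfPage pdfPage = pdfDocument.Pages[pageIndex];

                // placeholder position and size for the marker rectangle
                RectangleElement marker = new RectangleElement(10, 10, 100, 20);
                marker.ForeColor = Color.Green;

                pdfPage.AddElement(marker);
            }

            // placeholder output location
            pdfDocument.Save(System.IO.Path.Combine(Environment.CurrentDirectory, "MarkedPages.pdf"));
        }
        finally
        {
            // close the Document to release all the resources
            pdfDocument.Close();
        }
    }
}
```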

The following sections describe in detail the elements that can be added to a page in a PDF document. You can add an HtmlToPdfElement object, which makes multiple conversions possible in the same PDF document, or an HtmlToImageElement object, which embeds the image of an HTML document into the PDF document. You can also add text, images, shapes, digital signatures, bookmarks, templates, watermarks, file attachments and notes.

Sample Code

In the code sample below, taken from the WinForms_HtmlElementsLocationInPdf demo application, all the IMG, H1 and H2 elements are highlighted with a green rectangle in the generated PDF:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

namespace WinForms_HtmlElementsLocationInPdf
{
    public partial class HtmlElementsLocationInPdf : Form
    {
        public HtmlElementsLocationInPdf()
        {
            InitializeComponent();
        }

        private void btnConvert_Click(object sender, EventArgs e)
        {
            try
            {
                PdfConverter pdfConverter = new PdfConverter();

                // inform the converter about the HTML elements for which we want the location in PDF
                // in this sample we want the location of IMG, H1 and H2 elements
                pdfConverter.HtmlElementsMappingOptions.HtmlElementSelectors = new string[] { "IMG", "H1", "H2" };

                // call the converter and get a Document object from URL
                Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromUrl(textBoxURL.Text.Trim());

                // iterate over the HTML element locations and highlight each element with a green rectangle
                foreach (HtmlElementMapping elementMapping in pdfConverter.HtmlElementsMappingOptions.HtmlElementsMappingResult)
                {
                    // because an HTML element can span many PDF pages, the mapping
                    // of the HTML element in the PDF document consists of a list of rectangles,
                    // one rectangle for each PDF page where this element was rendered
                    foreach (HtmlElementPdfRectangle elementLocationInPdf in elementMapping.PdfRectangles)
                    {
                        // get the PDF page
                        PdfPage pdfPage = pdfDocument.Pages[elementLocationInPdf.PageIndex];
                        RectangleF pdfRectangleInPage = elementLocationInPdf.Rectangle;

                        // create a RectangleElement to highlight the HTML element
                        RectangleElement highlightRectangle = new RectangleElement(pdfRectangleInPage.X, pdfRectangleInPage.Y,
                            pdfRectangleInPage.Width, pdfRectangleInPage.Height);
                        highlightRectangle.ForeColor = Color.Green;

                        pdfPage.AddElement(highlightRectangle);
                    }
                }

                // save the PDF bytes in a file on disk
                string outFilePath = System.IO.Path.Combine(Application.StartupPath, "Result.pdf");

                try
                {
                    pdfDocument.Save(outFilePath);
                }
                finally
                {
                    // close the Document to release all the resources
                    pdfDocument.Close();
                }

                // open the generated PDF document in an external viewer
                DialogResult dr = MessageBox.Show("Open the rendered file in an external viewer?", "Open Rendered File", 
                    MessageBoxButtons.YesNo);
                if (dr == DialogResult.Yes)
                {
                    System.Diagnostics.Process.Start(outFilePath);
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
                return;
            }
        }
    }
}

Imports System
Imports System.Collections.Generic
Imports System.ComponentModel
Imports System.Data
Imports System.Drawing
Imports System.Text
Imports System.Windows.Forms

Imports ExpertPdf.HtmlToPdf
Imports ExpertPdf.HtmlToPdf.PdfDocument

Namespace WinForms_HtmlElementsLocationInPdf
    Public Partial Class HtmlElementsLocationInPdf
        Inherits Form
        Public Sub New()
            InitializeComponent()
        End Sub

        Private Sub btnConvert_Click(sender As Object, e As EventArgs)
            Try
                Dim pdfConverter As New PdfConverter()

                ' inform the converter about the HTML elements for which we want the location in PDF
                ' in this sample we want the location of IMG, H1 and H2 elements
                pdfConverter.HtmlElementsMappingOptions.HtmlElementSelectors = New String() {"IMG", "H1", "H2"}

                ' call the converter and get a Document object from URL
                Dim pdfDocument As Document = pdfConverter.GetPdfDocumentObjectFromUrl(textBoxURL.Text.Trim())

                ' iterate over the HTML element locations and highlight each element with a green rectangle
                For Each elementMapping As HtmlElementMapping In pdfConverter.HtmlElementsMappingOptions.HtmlElementsMappingResult
                    ' because an HTML element can span many PDF pages, the mapping
                    ' of the HTML element in the PDF document consists of a list of rectangles,
                    ' one rectangle for each PDF page where this element was rendered
                    For Each elementLocationInPdf As HtmlElementPdfRectangle In elementMapping.PdfRectangles
                        ' get the PDF page
                        Dim pdfPage As PdfPage = pdfDocument.Pages(elementLocationInPdf.PageIndex)
                        Dim pdfRectangleInPage As RectangleF = elementLocationInPdf.Rectangle

                        ' create a RectangleElement to highlight the HTML element
                        Dim highlightRectangle As New RectangleElement(pdfRectangleInPage.X, pdfRectangleInPage.Y, pdfRectangleInPage.Width, pdfRectangleInPage.Height)
                        highlightRectangle.ForeColor = Color.Green

                        pdfPage.AddElement(highlightRectangle)
                    Next
                Next

                ' save the PDF bytes in a file on disk
                Dim outFilePath As String = System.IO.Path.Combine(Application.StartupPath, "Result.pdf")

                Try
                    pdfDocument.Save(outFilePath)
                Finally
                    ' close the Document to release all the resources
                    pdfDocument.Close()
                End Try

                ' open the generated PDF document in an external viewer
                Dim dr As DialogResult = MessageBox.Show("Open the rendered file in an external viewer?", "Open Rendered File", MessageBoxButtons.YesNo)
                If dr = DialogResult.Yes Then
                    System.Diagnostics.Process.Start(outFilePath)
                End If
            Catch ex As Exception
                MessageBox.Show(ex.Message)
                Return
            End Try
        End Sub
    End Class
End Namespace