
Advanced Post-Conversion Customization of the Generated PDF Document

ExpertPdf is more than an HTML to PDF converter; it is a full-featured PDF library. The main purpose of the component is to convert web pages to PDF, but it can also be used to work with existing PDF documents, merge several PDF documents into a single document, and add pages, text, images and shapes to PDF documents.

The PdfConverter class offers a set of render methods that return the result of the HTML to PDF conversion as a Document object:

public Document GetPdfDocumentObjectFromUrl(string url);
public Document GetPdfDocumentObjectFromUrl(string url, string internalLinksDocUrl);
public Document GetPdfDocumentObjectFromHtmlFile(string htmlFilePath);
public Document GetPdfDocumentObjectFromHtmlFile(string htmlFilePath, string internalLinksDocUrl);
public Document GetPdfDocumentObjectFromHtmlString(string htmlString);
public Document GetPdfDocumentObjectFromHtmlString(string htmlString, string urlBase);
public Document GetPdfDocumentObjectFromHtmlString(string htmlString, string urlBase, string internalLinksDocUrl);
public Document GetPdfDocumentObjectFromHtmlStream(System.IO.Stream htmlStream, Encoding streamEncoding);
public Document GetPdfDocumentObjectFromHtmlStream(System.IO.Stream htmlStream, Encoding streamEncoding, string urlBase);
public Document GetPdfDocumentObjectFromHtmlStream(System.IO.Stream htmlStream, Encoding streamEncoding, string urlBase, string internalLinksDocUrl);
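The overloads above differ in where the HTML comes from and in the optional urlBase and internalLinksDocUrl arguments. As a minimal sketch, the console program below converts an HTML string with the GetPdfDocumentObjectFromHtmlString(string, string) overload, then saves and closes the resulting Document with the Save(string) and Close() calls used in the sample code of this section. The markup, the base URL http://www.example.com/ and the output file name are placeholders. The assumption that urlBase is used to resolve relative paths, such as the image source below, is inferred from the parameter name.

```csharp
using System;
using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

class HtmlStringToPdf
{
    static void Main()
    {
        // placeholder markup; the relative image path is expected to be resolved against urlBase (assumption)
        string html = "<html><body><h1>Hello</h1><img src='images/logo.png'/></body></html>";
        string urlBase = "http://www.example.com/"; // hypothetical base URL

        PdfConverter pdfConverter = new PdfConverter();

        // convert the HTML string into a Document object that can still be modified
        Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlString(html, urlBase);
        try
        {
            // post-conversion customization of pdfDocument would go here
            pdfDocument.Save("HtmlString.pdf");
        }
        finally
        {
            // close the Document to release all the resources, as in the sample code
            pdfDocument.Close();
        }

        Console.WriteLine("Saved HtmlString.pdf");
    }
}
```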

The PDF document produced by the conversion is an instance of the Document class, which gives access to the collection of pages of the PDF document. You can iterate through the document pages, add new pages to the document, append external PDF documents or add new elements such as text and images to the document pages. After modification, the document can be saved to a file or to a stream using one of the Save methods of the Document class. The classes and methods that can be used to customize the generated PDF document are available in the ExpertPdf.HtmlToPdf.PdfDocument namespace.
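Saving to a stream is convenient when the PDF should never touch the disk, for example when it is sent back in a web response. The sketch below assumes that Document has a Save(System.IO.Stream) overload: the text mentions saving to a stream, but the exact signature is not shown in this section, so treat it as an assumption to check against the API reference. ConvertToBytes is a hypothetical helper name.

```csharp
using System;
using System.IO;
using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

class SaveToStream
{
    // hypothetical helper: converts an HTML string and returns the PDF as a byte array
    public static byte[] ConvertToBytes(string html)
    {
        PdfConverter pdfConverter = new PdfConverter();
        Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlString(html);
        try
        {
            using (MemoryStream outStream = new MemoryStream())
            {
                // assumption: Document.Save has an overload taking a System.IO.Stream
                pdfDocument.Save(outStream);
                // ToArray also works if Save closed the MemoryStream
                return outStream.ToArray();
            }
        }
        finally
        {
            // always release the document resources
            pdfDocument.Close();
        }
    }

    static void Main()
    {
        byte[] pdfBytes = ConvertToBytes("<html><body><p>Stream test</p></body></html>");
        // the bytes can now be written to an HTTP response, a database, etc.
        Console.WriteLine("PDF size: " + pdfBytes.Length + " bytes");
    }
}
```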

The most important class defined in the ExpertPdf.HtmlToPdf.PdfDocument namespace is the Document class. When a Document object is returned by one of the render methods listed above, it already contains the PDF pages generated by the converter from the URL or HTML string being converted. The collection of PDF pages can be accessed with the Pages property. New pages with the desired size, orientation and margins can be added to this collection, and new elements can be added to any page in it.
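As a sketch of working with the Pages collection, the program below adds a small red rectangle near the top-left corner of every page, reusing the Pages indexer, the RectangleElement constructor, the ForeColor property and the PdfPage.AddElement method from the sample code of this section. It assumes that the Pages collection exposes a Count property, as .NET collections usually do; the marker position and size are arbitrary values in the PDF page coordinates used by RectangleElement.

```csharp
using System;
using System.Drawing;
using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

class MarkEveryPage
{
    static void Main()
    {
        PdfConverter pdfConverter = new PdfConverter();
        Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromHtmlString(
            "<html><body><h1>Page marker test</h1></body></html>");
        try
        {
            // assumption: the Pages collection exposes Count like other .NET collections
            for (int i = 0; i < pdfDocument.Pages.Count; i++)
            {
                PdfPage pdfPage = pdfDocument.Pages[i];

                // small marker rectangle near the top-left corner; position and size are arbitrary
                RectangleElement marker = new RectangleElement(10, 10, 50, 20);
                marker.ForeColor = Color.Red;

                pdfPage.AddElement(marker);
            }

            pdfDocument.Save("MarkedPages.pdf");
        }
        finally
        {
            pdfDocument.Close();
        }
    }
}
```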

In the following sections we offer a detailed description of the elements that can be added to a page in a PDF document. You can add an HtmlToPdfElement object, which makes multiple HTML conversions possible in the same PDF document, or an HtmlToImageElement object, which embeds the image of an HTML document into the PDF document. You can also add text, images, shapes, digital signatures, bookmarks, templates, watermarks, file attachments and notes.

Sample Code

In the code sample below, taken from the WinForms_HtmlElementsLocationInPdf demo application, all the IMG, H1 and H2 elements are highlighted with a green rectangle in the generated PDF:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

using ExpertPdf.HtmlToPdf;
using ExpertPdf.HtmlToPdf.PdfDocument;

namespace WinForms_HtmlElementsLocationInPdf
{
    public partial class HtmlElementsLocationInPdf : Form
    {
        public HtmlElementsLocationInPdf()
        {
            InitializeComponent();
        }

        private void btnConvert_Click(object sender, EventArgs e)
        {
            try
            {
                PdfConverter pdfConverter = new PdfConverter();

                // inform the converter about the HTML elements for which we want the location in PDF
                // in this sample we want the location of IMG, H1 and H2 elements
                pdfConverter.HtmlElementsMappingOptions.HtmlElementSelectors = new string[] { "IMG", "H1", "H2" };

                // call the converter and get a Document object from URL
                Document pdfDocument = pdfConverter.GetPdfDocumentObjectFromUrl(textBoxURL.Text.Trim());

                // iterate over the HTML element locations and highlight each element with a green rectangle
                foreach (HtmlElementMapping elementMapping in pdfConverter.HtmlElementsMappingOptions.HtmlElementsMappingResult)
                {
                    // because an HTML element can span several PDF pages, the mapping
                    // of the HTML element in the PDF document consists of a list of rectangles,
                    // one rectangle for each PDF page where the element was rendered
                    foreach (HtmlElementPdfRectangle elementLocationInPdf in elementMapping.PdfRectangles)
                    {
                        // get the PDF page
                        PdfPage pdfPage = pdfDocument.Pages[elementLocationInPdf.PageIndex];
                        RectangleF pdfRectangleInPage = elementLocationInPdf.Rectangle;

                        // create a RectangleElement to highlight the HTML element
                        RectangleElement highlightRectangle = new RectangleElement(pdfRectangleInPage.X, pdfRectangleInPage.Y,
                            pdfRectangleInPage.Width, pdfRectangleInPage.Height);
                        highlightRectangle.ForeColor = Color.Green;

                        pdfPage.AddElement(highlightRectangle);
                    }
                }

                // save the PDF document to a file on disk
                string outFilePath = System.IO.Path.Combine(Application.StartupPath, "Result.pdf");

                try
                {
                    pdfDocument.Save(outFilePath);
                }
                finally
                {
                    // close the Document to release all the resources
                    pdfDocument.Close();
                }

                // open the generated PDF document in an external viewer
                DialogResult dr = MessageBox.Show("Open the rendered file in an external viewer?", "Open Rendered File", 
                    MessageBoxButtons.YesNo);
                if (dr == DialogResult.Yes)
                {
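                    // UseShellExecute = true opens the file with the default PDF viewer (required on .NET Core and later)
                    System.Diagnostics.Process.Start(new System.Diagnostics.ProcessStartInfo(outFilePath) { UseShellExecute = true });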
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
                return;
            }
        }
    }
}