Frequently Asked Questions about ExpertPdf Html To Pdf Converter

I am using ExpertPdf v8 (or older) and after I installed IE9 (or newer) on the computer where the converter runs, the generated PDF no longer contains searchable text.

Versions older than v9 depend on IE for HTML rendering. Starting with v9, we have introduced a new rendering engine that does not have the problem described. Please upgrade to v9 or above.

When I convert an HTML string to PDF, the external CSS files and images are not applied in the rendered PDF document.

When you convert an HTML string that references external CSS files and images by relative URLs, the converter cannot determine the full URLs just by looking at the string.

To solve this, set the baseURL parameter of the HTML string conversion function to the full URL of the page from which you retrieved the HTML string. This requires the HTML string to have a valid HEAD tag. When the relative paths of your images are textually prefixed with the baseURL value, the result should be the full URLs of the images, and the images should then start to appear in the PDF.

This issue might also indicate an authentication or permissions problem on the server when accessing external resources like images and CSS files. The HTML string is loaded into the converter and the text is converted to PDF, but the images and CSS files are still accessed from a URL and they might not be accessible.

If you are sure that the base URL you have set is correct but the images still don't appear in the PDF, you can prefix a relative image URL from the HTML string with the base URL to construct a full image URL. Then you can open this URL in a web browser on the server where you perform the conversion and check that the image is correctly displayed. After that you can try to convert the image URL to PDF using the GetPdfBytesFromUrl() method.
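
The check described above can be sketched with the standard System.Uri class. This is only an illustration: the UrlResolver class and the example.com addresses are hypothetical and not part of the ExpertPdf API, and System.Uri applies browser-style resolution, which matches simple textual prefixing only when the base URL ends with a folder slash.

```csharp
using System;

// Hypothetical helper, not part of the ExpertPdf library.
public static class UrlResolver
{
    // Resolves a relative resource path (for example the SRC of an IMG tag)
    // against the base URL, the same way a web browser would.
    public static string ToAbsolute(string baseUrl, string relativeUrl)
    {
        Uri baseUri = new Uri(baseUrl, UriKind.Absolute);
        return new Uri(baseUri, relativeUrl).AbsoluteUri;
    }
}
```

For example, UrlResolver.ToAbsolute("http://www.example.com/app/", "images/logo.png") gives http://www.example.com/app/images/logo.png, which is the URL you would paste into the browser on the server.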

Another special situation where this problem might occur is for ASP.NET applications using forms authentication. The ASP.NET forms authentication implementation usually stores the forms authentication ticket in a cookie, which should be sent back to the server each time a resource is requested. The converter cannot automatically send this kind of cookie back to the server, so the authentication of requests for external resources like images and CSS files can fail. A possible workaround is to store the images and CSS files referenced by the web page to be converted in a location which doesn't require authentication. If this is not an acceptable solution, you can try to set forms authentication to cookieless mode. In this mode the encrypted authentication ticket is carried in the URL and not in a cookie, and the base URL parameter of the HTML string conversion function should be set to the URL containing the authentication ticket.

In the rendered PDF document the images and text fonts appear to be smaller than they are in the source HTML document. For other HTML documents the rendered content is not shrunk but it appears on the left side of the document and is not centered. How can I control this?

The ExpertPDF HTML to PDF converter allows very fine control of the PDF rendering process. The default settings of the converter should be acceptable for the majority of situations, but sometimes more control and customization are necessary. If you want to learn more about HTML to PDF rendering, please check this article.

The converter internally uses a virtual display in which it renders the HTML page, very similar to what the web browser does on the screen. This virtual display is different from the display of your computer and it has a fixed resolution (normally 96 dpi), independent of the resolution of your computer screen. Web page element dimensions are usually measured in pixels, and this is the reason why the virtual display of the converter is also specified in pixels. The virtual display dimensions are the only dimensions used by the converter which are expressed in pixels. All the other dimensions are specified in points (1 point is 1/72 inch). However, because of the fixed resolution of the virtual display, the pixel dimensions of your web page can be easily converted to dimensions expressed in points. The converter API offers the UnitsConverter class, which can be used to convert dimensions from pixels to points and from points to pixels.
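
The arithmetic behind this conversion is simple enough to write by hand. The sketch below assumes the 96 dpi virtual display resolution mentioned above. The DisplayUnits class is a hypothetical illustration and is not the library's UnitsConverter class, whose exact method names are not shown here.

```csharp
// Hypothetical helper illustrating the pixel/point arithmetic only.
public static class DisplayUnits
{
    public const double PixelsPerInch = 96.0; // assumed virtual display resolution
    public const double PointsPerInch = 72.0; // 1 point = 1/72 inch

    // Converts a length on the virtual display from pixels to points.
    public static double PixelsToPoints(double pixels)
    {
        return pixels * PointsPerInch / PixelsPerInch;
    }

    // Converts a length from points to virtual display pixels.
    public static double PointsToPixels(double points)
    {
        return points * PixelsPerInch / PointsPerInch;
    }
}
```

With these factors, a 1024 pixel wide virtual display corresponds to 768 points, and 72 points (one inch) correspond to 96 pixels.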

You can specify the virtual display width and height in pixels using the PageWidth and PageHeight properties or you can specify the same values as parameters when you construct the PdfConverter object.

By default the virtual display width is set to 1024 pixels, which should be sufficient to display the majority of web pages. If the web page you are converting cannot be completely displayed in this width, you can increase this value or you can set the PageWidth property to 0 to allow the converter to automatically determine your web page width from the width of the HTML elements. The PageHeight property is 0 by default, which means the virtual display will be automatically resized to display the whole HTML page. There are situations when the converter cannot automatically determine the web page height, for example when the web page is a frame set. In this case, you can manually set PageHeight to a certain value in pixels so that the page is displayed in the way you expect.

After the HTML content is displayed in the virtual display, the virtual display content is transferred into the PDF, as if you took a picture of the virtual display and put that picture into a PDF document. PDF document pages have a fixed size in points. For example, the A4 page with portrait orientation is 595 points in width and 842 points in height. If the virtual display width is more than 595 points, the rendered HTML content is shrunk to fit the PDF page width so that the whole HTML content is displayed in the PDF document. If the virtual display width is less than 595 points, the rendered HTML content is not resized and is rendered in the top left corner of the PDF page at real size.

The dimensions of the A4 portrait page in virtual display pixels are 793 x 1122 pixels. This means that at the default virtual display width of 1024 pixels, the HTML content will be shrunk to fit the PDF page. This is the reason why you can see smaller fonts and images in the PDF than in the source HTML document.

The FitWidth property in PdfDocumentOptions can be used to specify whether the HTML content is resized to fit the PDF page width. The default value is True, which causes the HTML content to be resized if necessary to fit the PDF page width. When False, the HTML content will not be resized and it will be rendered at real size in the PDF (the size it has in the virtual display at the current virtual display resolution).
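
The shrinking described above can be estimated with a small calculation. The following is a rough model only, assuming a 96 dpi virtual display and ignoring page margins. The FitWidthModel class is hypothetical and does not call the converter.

```csharp
// Hypothetical model of the fit-to-width behaviour; not part of the ExpertPdf API.
public static class FitWidthModel
{
    // Returns the factor applied to the HTML content when it is placed on the PDF page.
    // 1.0 means the content is rendered at real size.
    public static double ScaleFactor(double displayWidthPx, double pageWidthPt, bool fitWidth)
    {
        // Virtual display width in points, at the assumed 96 dpi (72 points per inch).
        double displayWidthPt = displayWidthPx * 72.0 / 96.0;

        // Content is shrunk only when fitting is enabled and it is wider than the page.
        if (fitWidth && displayWidthPt > pageWidthPt)
        {
            return pageWidthPt / displayWidthPt;
        }
        return 1.0;
    }
}
```

For an A4 portrait page (595 points) and the default 1024 pixel display, the factor is 595 / 768, or about 0.77, which is why fonts and images look smaller. At a display width of 793 pixels the factor is 1.0 and nothing is shrunk.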

The FitHeight property in PdfDocumentOptions is similar to FitWidth and can be used to force the converter to fit all the HTML content into the height of the PDF page. If both FitWidth and FitHeight are True, then the HTML content will fit both the width and height of the PDF page.

When the FitWidth property is False, the HTML content could be wider than the PDF page width and therefore the HTML content will be cut off on the right in the PDF. In this case, in order to get the whole content in the PDF, you have to set a wider page for the PDF document. You can first try to set Landscape orientation for the PDF page by setting PdfConverter.PdfDocumentOptions.PdfPageOrientation = PDFPageOrientation.Landscape. If this is not enough you can choose a wider standard page like A3 or A2. You can even set a custom size for the PDF page as described in a section above. You can set PdfConverter.PdfDocumentOptions.PdfPageSize = PdfPageSize.Custom and in this case the custom size of the PDF page will be taken from the PdfConverter.PdfDocumentOptions.CustomPdfPageSize property.

The AutoSizePdfPage property was added to control the PDF page width. It has effect only when the FitWidth property is False. When FitWidth is False and AutoSizePdfPage is True, the PDF page width will be automatically resized to a custom value such that all the HTML content is displayed in PDF at real size.

If you don't want to resize the PDF page but you want to keep it A4 portrait for example, then you have to decrease the virtual display width. If your page can be correctly and entirely displayed in 793 pixels (which is the width of the A4 portrait page in pixels) you can set this value for PdfConverter.PageWidth property and you should get the whole HTML rendered at real size in PDF.

The HTML content can appear off-center in the PDF page when it can normally be displayed at a width of less than 1024 pixels. In this case there will normally be an empty space on the right side of the virtual display. When the virtual display content is transferred to the PDF, the content will appear off-center. You can solve this as well by setting PdfConverter.PageWidth to a value of 793 pixels or less.

How can I obtain the HTML string from a web page and convert it to PDF?

If you are trying to convert an ASP.NET page you can use the Server.Execute method to obtain the HTML string. Here is some code that can be used to obtain the HTML string from a page of your application:

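// Requires "using System.IO;" and must run inside an ASP.NET page or handler, where Server is available.
StringWriter sw = new StringWriter();
// Execute the page in the current request and capture its HTML output in the writer.
Server.Execute("PageToConvert.aspx", sw);
string htmlCodeToConvert = sw.ToString();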

You can also use the methods of the ConverterUtils class that we provide in the library to get the HTML code of a web page on the Internet.

Is the ASP.NET session data available in the converted ASP.NET page during conversion?

The converter executes the web page to be converted in a new session, different from the session in which your ASP.NET application runs. This happens because the converter does not send the session cookie from the browser back to the server. Therefore, the data currently stored in the session is not available in the converted web page even if the page is part of your application.

You have two options to overcome this situation: send the data needed to load the page in the query string of the converted page URL, or get the web page HTML code using the Server.Execute(Url) method. The Server.Execute method is executed in your application session, so all the session data and the existing authentication should be valid.
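
For the first option, the values the page needs have to be encoded into the URL that is passed to the converter. A minimal sketch using only standard .NET classes is shown below. The ConversionUrl class, the page address and the parameter name are hypothetical examples.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the ExpertPdf library.
public static class ConversionUrl
{
    // Appends the given name/value pairs to the page URL as an escaped query string.
    public static string WithQuery(string pageUrl, IEnumerable<KeyValuePair<string, string>> values)
    {
        StringBuilder sb = new StringBuilder(pageUrl);
        // Use '&' if the URL already has a query string, '?' otherwise.
        char separator = pageUrl.Contains("?") ? '&' : '?';
        foreach (KeyValuePair<string, string> pair in values)
        {
            sb.Append(separator)
              .Append(Uri.EscapeDataString(pair.Key))
              .Append('=')
              .Append(Uri.EscapeDataString(pair.Value));
            separator = '&';
        }
        return sb.ToString();
    }
}
```

The converted page then reads the values from Request.QueryString instead of Session. Keep in mind that anything placed in the query string is visible in the URL, so this option is only suitable for data that is not sensitive.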

When you get the HTML string with the Server.Execute(Url) method, the resulting HTML code should reference the external CSS, images and JavaScript code by a full URL, not by a relative URL. Here is some code to obtain the HTML string from a page of your application:

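// Requires "using System.IO;" and must run inside an ASP.NET page or handler, where Server is available.
StringWriter sw = new StringWriter();
// The page is executed in the current application session, so session data is available to it.
Server.Execute("PageToConvert.aspx", sw);
string htmlCodeToConvert = sw.ToString();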

To have the converter automatically turn all the relative URLs into absolute URLs, pass the full URL of the page from which you have taken the HTML string as the baseURL parameter of the conversion function.

What type of fonts does the converter support and how can I embed the fonts in the generated PDF document?

The converter supports any TrueType font preinstalled in the Windows operating system. It also supports OpenType fonts, on condition that the OpenType font has TrueType outlines, not PostScript outlines. To check what type of outlines an installed font has, you can open the font from the Fonts folder in Control Panel. When a font is not supported by the converter, the MS Sans Serif font is used by default.

The converter also supports custom TrueType fonts you can install from a .ttf or a .otf file. After you have installed a font it becomes available only for the currently logged in Windows user who installed the font. In order to make the font available for all users, including the ASP.NET user if you are using the converter from an ASP.NET application, you have to reboot the computer after the font installation.

To embed the TrueType fonts in the generated PDF you have to set the EmbedFonts property to True. By default this property is False. The same property is also available in the HtmlToPdfElement class defined by the HTML to PDF Converter library.

Starting with v9, the html to pdf converter supports custom web fonts (TTF, WOFF).

I installed a TrueType font on the server but the converter is still using a default font instead of the installed font.

First make sure that you restarted the server after installing the TrueType font. After you have installed a font it becomes available only for the currently logged in Windows user who installed the font. In order to make the font available for all users, including the ASP.NET user if you are using the converter from an ASP.NET application, you have to restart the server after the font installation.

If restarting the server didn't solve the problem, make sure you installed a supported type of font. The converter supports any TrueType font preinstalled in the Windows operating system. It also supports OpenType fonts, on condition that the OpenType font has TrueType outlines, not PostScript outlines. To check what type of outlines an installed font has, you can open the font from the Fonts folder in Control Panel. When a font is not supported by the converter, the MS Sans Serif font is used by default.

Another situation when the default font is used occurs when the style requested for a font is not supported by that font. TrueType fonts have separate glyphs for different styles. For example, a font can have separate glyphs for the normal, italic or bold styles. Some fonts support only the normal style, while other fonts support only the italic style. You have to use only the styles supported by a TrueType font. You can see what styles are available for a font in the Fonts folder of the Control Panel. If a font does not have the required style, the converter first tries to use a supported style and, if that is not possible, a default font is used.

Some images are cut off between PDF pages. Is there any option to avoid this?

Set the AvoidImageBreak property to True. By default this property is False and the images might get cut off between PDF pages. You can also set the page-break-inside: avoid CSS style inline on the IMG tag to achieve the same result.

Can I deploy my ASP.NET application using the HTML to PDF Converter on a shared server?

The converter requires the Full Trust level for the ASP.NET application calling it. The default trust level for an ASP.NET application is Full Trust, but shared hosting providers usually change the trust level to Medium Trust, which prevents our converter from running properly in such environments. To solve this issue you can ask your shared hosting provider to grant the Full Trust level to your ASP.NET application. Another possibility is to create an ASP.NET web service around the converter library, install that web service on a machine where full trust is allowed and call the web service from your application. You have to ensure that the web service you create can be used only from your application.

Does ExpertPdf Html To Pdf Converter work on Windows Azure?

ExpertPdf works on Windows Azure only if it is used in a virtual machine or a cloud service. Due to some security restrictions, ExpertPdf does not work on Azure if App Service execution mode is used.

You can read more details about Windows Azure execution models here: https://azure.microsoft.com/en-us/product-categories/compute/.

If you need to convert from HTML to PDF in a website hosted on Windows Azure, run your website using a virtual machine or cloud service execution model, which offer more flexibility than the simple web site execution model.

Later edit: ExpertPdf (v12.2 or above) works on Azure Web Apps, on Windows, starting with the Basic plan (it does not work with the Free/Shared plans). The Web Apps version requires a restricted rendering engine (the WebKit2 rendering engine was added to the library) and because of that, some features are not available. To name a few: no support for web fonts, support only for single page HtmlToPdfElement objects, no support for excluding elements from conversion.

Conversion failure. Exception of type 'System.OutOfMemoryException' was thrown.

This error might occur when converting large web pages that contain many or very large images to PDF.

Our tool initially converts the images to bitmaps and then compresses them to JPEG internally. If you have several large images (with large resolutions), they are still large even if you display them in a 20x30 box in HTML, and when converted to bitmaps they consume a lot of memory.
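
To get a feeling for the numbers, the memory needed for a decoded bitmap can be estimated from the pixel dimensions of the image file, not from the size of the box it is displayed in. The sketch below is a back-of-the-envelope estimate. The BitmapMemory class is hypothetical, and the 4 bytes per pixel used in the example is an assumption about the internal pixel format, not a documented value.

```csharp
// Hypothetical estimate helper; not part of the ExpertPdf library.
public static class BitmapMemory
{
    // Estimated size in bytes of an uncompressed bitmap with the given dimensions.
    public static long DecodedBytes(int widthPx, int heightPx, int bytesPerPixel)
    {
        // Cast to long first so that large images do not overflow 32-bit arithmetic.
        return (long)widthPx * heightPx * bytesPerPixel;
    }
}
```

Under that assumption, a 4000 x 3000 photo needs about 48,000,000 bytes (roughly 46 MB) once decoded, no matter how small it appears in the page, so a few dozen such images can exhaust the address space of a 32-bit process.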

To work around this problem, you need to have your images optimized for the web. Alternatively, if you use an x64 system, you can use ExpertPdf x64 optimized version that allows the allocation of a lot more memory than the default x86 version.