How To Check Font Embedding In Pdf

Posted By admin On 02/05/19

I'm aware of the pdftk.exe utility that can indicate which fonts are used by a PDF, and wether they are embedded or not.

The font tab list fonts which are embedded in the PDF file it doesn't mean they are used. – user2284570 Dec 1 '14 at 21:03 add a comment up vote 4 down vote. This video is to show how you can check if your fonts are embedded in your pdf file using Adobe Acrobat.

Now the problem: given I had PDF files with embedded fonts -- how can I extract those fonts in a way that they are re-usable as regular font files? Are there (preferably free) tools which can do that? Also: can this be done programmatically with, say, iText?

simplybest55

7 Answers

You have several options. All these methods work on Linux as well as on Windows or Mac OS X. However, be aware that most PDFs do not include to full, complete fontface when they have a font embedded. Mostly they include just the subset of glyphs used in the document.

Using pdftops

One of the most frequently used methods to do this on *nix systems consists of the following steps:

  1. Convert the PDF to PostScript, for example by using XPDF's pdftops (on Windows: pdftops.exe helper program.
  2. Now fonts will be embedded in .pfa (PostScript) format + you can extract them using a text editor.
  3. You may need to convert the .pfa (ASCII) to a .pfb (binary) file using the t1utils and pfa2pfb.
  4. In PDFs there are never .pfm or .afm files (font metric files) embedded (because PDF viewer have internal knowledge about these). Without these, font files are hardly usable in a visually pleasing way.

Using fontforge

Another method is to use the Free font editor FontForge:

  1. Use the 'Open Font' dialogbox used when opening files.
  2. Then select 'Extract from PDF' in the filter section of dialog.
  3. Select the PDF file with the font to be extracted.
  4. A 'Pick a font' dialogbox opens -- select here which font to open.

Check the FontForge manual. You may need to follow a few specific steps which are not necessarily straightforward in order to save the extracted font data as a file which is re-usable.

Using mupdf

Next, MuPDF. This application comes with a utility called pdfextract (on Windows: pdfextract.exe) which can extract fonts and images from PDFs. (In case you don't know about MuPDF, which still is relatively unknown and new: 'MuPDF is a Free lightweight PDF viewer and toolkit written in portable C.', written by Artifex Software developers, the same company that gave us Ghostscript.)
(Update: Newer versions of MuPDF have moved the former functionality of 'pdfextract' to the command 'mutool extract'. Download it here: mupdf.com/downloads)

Note: pdfextract.exe is a command-line program. To use it, do the following:

This command will dump all of the extractable files from the pdf file referenced into the current directory. Generally you will see a variety of files: images as well as fonts. These include PNG, TTF, CFF, CID, etc. The image names will be like img-0412.png if the PDF object number of the image was 412. The fontnames will be like FGETYK+LinLibertineI-0966.ttf, if the font's PDF object number was 966.

CFF (Compact Font Format) files are a recognized format that can be converted to other formats via a variety of converters for use on different operating systems.

Again: be aware that most of these font files may have only a subset of characters and may not represent the complete typeface.

Update: (Jul 2013) Recent versions of mupdf have seen an internal reshuffling and renaming of their binaries, not just once, but several times. The main utility used to be a 'swiss knife'-alike binary called mubusy (name inspired by busybox?), which more recently was renamed to mutool. These support the sub-commands info, clean, extract, poster and show. Unfortunatey, the official documentation for these tools isn't up to date (yet). If you're on a Mac using 'MacPorts': then the utility was renamed in order to avoid name clashes with other utilities using identical names, and you may need to use mupdfextract.

To achieve the (roughly) equivalent results with mutool as its previous tool pdfextract did, just run mubusy extract ....*

How to embed a font

So to extract fonts and images, you may need to run one of the following commandlines:

Downloads are here: mupdf.com/downloads

Using gs (Ghostscript)

Then, Ghostscript can also extract fonts directly from PDFs. However, it needs the help of a special utility program named extractFonts.ps, written in PostScript language, which is available from the Ghostscript source code repository.

Now use it, you need to run both, this file extractFonts.ps and your PDF file. Ghostscript will then use the instructions from the PostScript program to extract the fonts from the PDF. It looks like this on Windows (yes, Ghostscript understands the 'forward slash', /, as a path separator also on Windows!):

or on Linux, Unix or Mac OS X:

I've tested the Ghostscript method a few years ago. At the time it did extract *.ttf (TrueType) just fine. I don't know if other font types will also be extracted at all, and if so, in a re-usable way. I don't know if the utility does block extracting of fonts which are marked as protected.

Using pdf-parser.py

Finally, Didier Stevens' pdf-parser.py: this one is probably not as easy to use, because you need to have some know-how about internal PDF structures. pdf-parser.py is a Python script which can do a lot of other things too. It can also decompress and extract arbitrary streams from objects, and therefore it can extract embedded font files too.

But you need to know what to look for. Let's see it with an example. I have a file named big.pdf. As a first step I use the -s parameter to search the PDF for any occurrence of the keyword FontFile (pdf-parser.py does not require a case sensitive search):

In my case, for my big1.pdf, I get this result:

It tells me that there are two instances of FontFile2 inside the PDF, and these are in PDF objects no. 15 and no. 16, respectively. Object no. 15 holds the /FontFile2 for font /ArialMT, object no. 16 holds the /FontFile2 for font /Arial-BoldMT.

To show this more clearly:

A quick peeking into the PDF specification reveals the the keyword /FontFile2 relates to a 'stream containing a TrueType font program' (/FontFile would relate to a 'stream containing a Type 1 font program' and /FontFile3 would relate to a 'stream containing a font program whose format is specified by the Subtype entry in the stream dictionary' {hence being either a Type1C or a CIDFontType0C subtype}.)

To look specifically at PDF object no. 15 (which holds the font /ArialMT), one can use the -o 15 parameter:

This pdf-parser.py output tells us that this object contains a stream (which it will not directly display) that has a length of 1.581.435 Bytes and is encoded ( 'compressed') with ASCIIHexEncode and needs to be decoded ( 'de-compressed' or 'filtered') with the help of the standard /ASCIIHexDecode filter.

To dump any stream from an object, pdf-parser.py can be called with the -d dumpname parameter. Let's do it:

Our extracted data dump will be in the file named dumped-data.ext. Let's see how big it is:

Oh look, it is 1.581.435 Bytes. We saw this figure in the previous command's output. Opening this file with a text editor confirms that its content is ASCII hex encoded data.

Opening the file with a font reading tool like otfinfo (this is a part of the lcdf-typetools package) will lead to some disappointment at first:

OK, this is because we did not (yet) let pdf-parser.py make use of its full magic: to dump a filtered, decoded stream. For this we have to add the -f parameter:

What's the size is this new file?

How

Oh, look: that exact number was also already stored in the PDF object no. 15 dictionary as the value for key /Length1...

What does file think it is?

What does otfinfo tell us about it?

So Bingo!, we have a winner: pdf-parser.py did indeed extract a valid font file for us. Given the size of this file (778.552 Bytes), it looks like this font had been embedded even completely in the PDF...

We could rename it to arial-regular.ttf and install it as such and happily make use of it.

How To Check Font Embedding In Pdf

Caveats:

  • In any case you need to follow the license that applies to the font. Some font licences do not allow free use and/or distribution. Pirating fonts is like pirating any software or other copyrighted material.

  • Most PDFs which are in the wild out there do not embed the full font anyway, but only subsets. Extracting a subset of a font is only useful in a very limited scope, if at all.

Please do also read the following about Pros and (more) Cons regarding font extraction efforts:

  • http://typophile.com/node/34377 — not available anymore, but can bee seen on Wayback Machine at https://web.archive.org/web/20110717120241/typophile.com/node/34377
Kurt PfeifleKurt Pfeifle

Use online service http://www.extractpdf.com. No need to install anything.

igoigo

Free Check Font

Eventually found the FontForge Windows installer package and opened the PDF through the installed program. Worked a treat, so happy.

DapizzDapizz
Check

http://www.verypdf.com/app/pdf-font-extractor/pdf-font-extracting-tool.htmlIMO easiest way to extract fonts (Windows).

l00kl00k

PDF2SVG version 6.0 from PDFTron does a reasonable job. It produces OpenType (.otf) fonts by default. Use --preserve_fontnames to preserve 'the font/font-family naming scheme as obtained from the source file.'

PDF2SVG is a commercial product, but you can download a free demo executable (which includes watermarks on the SVG output but doesn't otherwise restrict usage). There may be other PDFTron products that also extract fonts, but I only recently discovered PDF2SVG myself.

Sean LeatherSean Leather

One of the best online tools currently available to extract pdf fonts is http://www.pdfconvertonline.com/extract-pdf-fonts-online.html

Riyafa Abdul HameedRiyafa Abdul Hameed

This is a followup to the font-forge section of @Kurt Pfeifle's answer, specific to Red Hat (and possibly other Linux distros).

  1. After opening the PDF and selecting the font you want, you will want to select 'File -> Generate Fonts...' option.
  2. If there are errors in the file, you can choose to ignore them or save the file and edit them. Most of the errors can be fixed automatically if you click 'Fix' enough times.
  3. Click 'Element -> Font Info...', and 'Fontname', 'Family Name' and 'Name for Humans' are all set to values you like. If not, modify them and save the file somewhere. These names will determine how your font appears on the system.
  4. Select your file name and click 'Save...'

Once you have your TTF file, you can install it on your system by

  1. Copying it to folder /usr/share/fonts (as root)
  2. Running fc-cache -f /usr/share/fonts/ (as root)
Mad PhysicistMad Physicist

protected by CommunityFeb 14 '14 at 17:52

Thank you for your interest in this question. Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).
Would you like to answer one of these unanswered questions instead?