I recently ran into a PDF file that I was unable to read: I could see the images, but none of the text. My guess is that the person who created the PDF used a font that was unavailable on my Linux laptop. Having no access to the author or to the original document, I was left to my own devices. The obvious solution was to copy the text and paste it into a word processor or text editor; unfortunately, this PDF was protected and did not allow copying text. Since the information in that PDF was important, I decided to see whether I could write some code to unlock it. I searched the web and found iText, a Java library for generating and manipulating PDF files. I had heard of iText before (it is one of the libraries used by JasperReports) but had no experience with it. I went to the iText site, read the documentation, and figured out a way to do what I needed. This article summarizes the procedure for writing code that unlocks PDF files using iText.
Looking at the iText documentation, I noticed that iText includes the com.lowagie.text.pdf.PdfReader class which, unsurprisingly, reads a PDF into memory. The PdfReader class has a number of constructors; for my purposes, the simplest one is the one that takes a String containing the location of the source PDF file.
I then looked for a method to decrypt PDFs and could not find one. I did, however, find a method to encrypt PDFs that takes a parameter specifying what can be done with the PDF when it is opened without the owner password. I figured that if I allowed all actions, I would effectively create an unlocked PDF; after trying the experiment, it turned out I was right. The method in question is the encrypt() method in the com.lowagie.text.pdf.PdfEncryptor class.
There are several overloaded versions of this method; the one with the following signature fit my purpose:
```java
encrypt(PdfReader reader, OutputStream os, byte[] userPassword, byte[] ownerPassword,
        int permissions, boolean strength128Bits)
```
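I passed the PdfReader as the first parameter, a FileOutputStream corresponding to the unlocked PDF to be written as the second parameter, null as both the user password and the owner password, all the PDF permissions OR'ed together as the permissions parameter, and false as the last parameter, indicating that I didn't want 128-bit encryption (40-bit encryption is used instead). Strictly speaking, the resulting file is still encrypted, since iText generates a random owner password when none is supplied, but it opens without a password and grants every permission. The permissions are defined as static fields in the com.lowagie.text.pdf.PdfWriter class.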
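The permissions argument is a plain int bit mask, which is why OR'ing the constants together grants everything at once. The following is a minimal, JDK-only sketch of that idea, assuming the permission bit positions defined by the PDF specification for the /P entry of the encryption dictionary. The class PdfPermissionBits and its methods are hypothetical and not part of iText; iText's own PdfWriter constants may combine these bits differently (for example, for printing).

```java
/**
 * Hypothetical sketch of the PDF permission bit mask. Values follow the
 * bit positions in the PDF specification's /P entry; they are not taken
 * from iText's PdfWriter constants.
 */
public final class PdfPermissionBits {
    // Bit 3: print the document (possibly at reduced quality)
    public static final int PRINT = 1 << 2;               // 4
    // Bit 4: modify the contents of the document
    public static final int MODIFY_CONTENTS = 1 << 3;     // 8
    // Bit 5: copy or otherwise extract text and graphics
    public static final int COPY = 1 << 4;                // 16
    // Bit 6: add or modify annotations, fill in form fields
    public static final int MODIFY_ANNOTATIONS = 1 << 5;  // 32
    // Bit 9: fill in existing form fields
    public static final int FILL_IN = 1 << 8;             // 256
    // Bit 10: extract text and graphics (e.g. for accessibility)
    public static final int SCREEN_READERS = 1 << 9;      // 512
    // Bit 11: assemble the document (insert, rotate, delete pages)
    public static final int ASSEMBLY = 1 << 10;           // 1024
    // Bit 12: print at full quality
    public static final int HIGH_QUALITY_PRINT = 1 << 11; // 2048

    private PdfPermissionBits() {}

    /** OR every permission bit together, mirroring the article's approach. */
    public static int all() {
        return PRINT | MODIFY_CONTENTS | COPY | MODIFY_ANNOTATIONS
                | FILL_IN | SCREEN_READERS | ASSEMBLY | HIGH_QUALITY_PRINT;
    }

    /** A viewer that honors the mask would test a single bit like this. */
    public static boolean isAllowed(int permissions, int flag) {
        return (permissions & flag) == flag;
    }

    public static void main(String[] args) {
        int mask = all();
        System.out.println("All permissions: " + mask);               // 3900
        System.out.println("Copy allowed: " + isAllowed(mask, COPY)); // true
        // Clearing only the COPY bit is enough to block copying
        int noCopy = mask & ~COPY;
        System.out.println("Copy allowed without bit 5: " + isAllowed(noCopy, COPY)); // false
    }
}
```

The design point is that each permission is an independent bit, so a locked PDF like the one described above can differ from an unlocked one by a single cleared bit, and setting every bit removes all of the viewer-enforced restrictions at once.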
Here is a code fragment demonstrating what I just explained; the procedure is much easier to visualize with an example:
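```java
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfEncryptor;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;

// inputFile and outputFile are String paths to the locked and unlocked PDFs
PdfReader reader = new PdfReader(inputFile);
PdfEncryptor.encrypt(reader, new FileOutputStream(outputFile), null,
        null, PdfWriter.AllowAssembly | PdfWriter.AllowCopy
        | PdfWriter.AllowDegradedPrinting | PdfWriter.AllowFillIn
        | PdfWriter.AllowModifyAnnotations | PdfWriter.AllowModifyContents
        | PdfWriter.AllowPrinting | PdfWriter.AllowScreenReaders, false);
```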
See Resources to download the complete source. iText must be in the classpath to compile and run it; the code was built and executed with iText 1.3.
iText makes it pretty easy to manipulate PDF files. I had no previous experience with the library, but was able to accomplish what I needed in very little time. This says a lot about the quality of the library and accompanying documentation. I will definitely consider iText my first choice whenever I need to do PDF manipulation from Java code in the future.