PDF files were designed to promote sharing. Everyone can open them—in their web browser if they have nothing else. Linux lets you manipulate, merge, and split PDF files on the command line.
The Portable Document Format
The Portable Document Format (PDF) solved a problem. When you created a document on a computer and wanted to share it with someone else, sending them the document didn’t always work.
Even if they had the same software package you’d used to create your document, they might not have the same fonts installed on their computer that you had on yours. They’d be able to open the document but it would look wrong.
If they didn’t have a copy of the software you used to create the document, they wouldn’t be able to open it at all. If you used software that was only available on Linux, it was pointless sending that document to someone who only used Windows.
Adobe created a new file format in 1992 and called it the portable document format. Documents created to that standard—ISO 32000—contain the images and fonts needed to correctly render the contents of the file. PDF files can be opened by PDF viewers on any platform. It was a cross-platform, simple, and elegant solution.
PDF files aren’t intended to be malleable like word-processor documents. They don’t readily lend themselves to editing. If you need to change the content of a PDF, it’s almost always better to go back to the source material, edit that, and generate a new PDF. Structural manipulations, by contrast, can be performed on PDF files with relative ease.
Here are some ways to create PDF files on Linux, and how to perform some of the transforms that can be applied to them.
Creating PDF Files on Linux
Many of the applications available on Linux can generate PDF files directly. LibreOffice has a button right on the toolbar that generates a PDF of the current document. It couldn’t be easier.
For fine-grained control of PDF creation, the Scribus desktop publishing application is hard to beat.
If you need to create documents with scientific or mathematical content, perhaps for submission to academic journals, an application that uses LaTeX, such as Texmaker, will be perfect for you.
If you prefer a plain-text workflow, perhaps using Markdown, you can use pandoc to convert to, and from, a great many file formats, including PDF. We have a guide dedicated to pandoc, but a simple example will show you how easy it is to use.
Install Texmaker first. pandoc relies on some LaTeX libraries for PDF generation. Installing Texmaker is a convenient way to meet those dependencies.
The -o (output) option names the file that will be created. pandoc infers the output format from the file extension, which is “.pdf” here. The “raw-notes.md” file is a plain-text Markdown file.
pandoc -o new.pdf raw-notes.md
If we open the “new.pdf” file in a PDF viewer we see that it is a correctly-formed PDF.
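If you don’t have a Markdown file to hand, the following sketch creates one and converts it. The contents of “raw-notes.md” are purely illustrative, and the conversion step is skipped if pandoc isn’t installed.

```shell
# Write a small Markdown file to convert (hypothetical sample content).
cat > raw-notes.md <<'EOF'
# Project notes

Some *emphasized* text and a short list:

- first item
- second item
EOF

# Convert it only if pandoc is available; a LaTeX engine is also needed for PDF output.
if command -v pandoc >/dev/null 2>&1; then
    pandoc -o new.pdf raw-notes.md || echo "pandoc could not create new.pdf" >&2
else
    echo "pandoc is not installed; skipping conversion" >&2
fi
```

The heading, emphasis, and list in the Markdown file should appear as formatted text in the resulting PDF.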
The qpdf Command
The qpdf command allows you to manipulate existing PDF files, whilst preserving their content. The changes you can make are structural. With qpdf you can perform tasks such as merging PDF files, extracting pages, rotating pages, and setting and removing encryption.
To install qpdf on Ubuntu use this command:
sudo apt install qpdf
The command on Fedora is:
sudo dnf install qpdf
On Manjaro you must type:
sudo pacman -S qpdf
Merging PDF Files
At first, some of the qpdf command line syntax may seem confusing. For example, many of the commands expect an input PDF file.
If a command doesn’t require one, you need to use the --empty option instead. This tells qpdf not to expect an input file. The --pages option lets you choose pages. If you just provide the PDF names, all pages are used.
To combine two PDF files to form a new PDF file, use this command format.
qpdf --empty --pages first.pdf second.pdf -- combined.pdf
This command is made up of:
- qpdf: Calls the qpdf command.
- --empty: Tells qpdf there is no input PDF. You could argue that “first.pdf” and “second.pdf” are input files, but qpdf considers them to be command line parameters.
- --pages: Tells qpdf we’re going to be working with pages.
- first.pdf second.pdf: The two files we’re going to extract the pages from. We’ve not used page ranges, so all pages will be used.
- --: Marks the end of the page selection started by --pages.
- combined.pdf: The name of the PDF that will be created.
If we look for PDF files with ls, we’ll see our two original files, untouched, and the new PDF called “combined.pdf.”
ls -hl first.pdf second.pdf combined.pdf
There are two pages in “first.pdf” and one page in “second.pdf.” The new PDF file has three pages.
You can use wildcards instead of listing a great many source files. This command creates a new file called “all.pdf” that contains the pages of every PDF file in the current directory.
qpdf --empty --pages *.pdf -- all.pdf
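One thing to watch with the wildcard is that the shell expands “*.pdf” before qpdf runs, so an “all.pdf” left over from an earlier run would be picked up as an input too. The sketch below is one way to guard against that. The merge_all function name and the QPDF variable are our own inventions for illustration, not part of qpdf.

```shell
# merge_all OUTPUT
# Merge every PDF in the current directory into OUTPUT, skipping OUTPUT itself.
# QPDF is a hypothetical override: set QPDF=echo to print the arguments
# instead of running qpdf.
merge_all() {
    out=$1
    set --                              # reuse the positional parameters as the input list
    for f in *.pdf; do
        [ -e "$f" ] || continue         # the pattern matched nothing
        [ "$f" = "$out" ] && continue   # never feed the output file back in
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no input PDFs found" >&2
        return 1
    fi
    ${QPDF:-qpdf} --empty --pages "$@" -- "$out"
}
```

Calling merge_all all.pdf performs the merge. The files are combined in the order the shell sorts them, so names such as “part10.pdf” may come before “part2.pdf.”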
We can use page ranges by adding the page numbers or ranges behind the file names the pages are to be extracted from.
This will extract pages one and two from “first.pdf” and page one from “second.pdf.” Note that if “combined.pdf” already exists, it is overwritten with the newly selected pages rather than having them added to it.
qpdf --empty --pages first.pdf 1-2 second.pdf 1 -- combined.pdf
Page ranges can be as detailed as you like. Here, we’re asking for a very specific set of pages from a large PDF file, and we’re creating a summary PDF file.
qpdf --empty --pages large.pdf 1-3,7,11,18-21,55 -- summary.pdf
The output file, “summary.pdf,” contains pages 1 to 3, 7, 11, 18 to 21, and 55 from the input PDF file. This means there are 10 pages in “summary.pdf.”
We can see that page 10 is page 55 from the source PDF.
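The page count is easy to check by hand: 3 + 1 + 1 + 4 + 1 = 10. As a rough sketch, the same arithmetic can be done in the shell. The count_pages function is our own helper, and it assumes the range only uses plain numbers and ascending “first-last” pairs.

```shell
# count_pages RANGE
# Count the pages selected by a simple qpdf-style range such as 1-3,7,11.
# Assumes plain page numbers and ascending first-last pairs only.
count_pages() {
    total=0
    old_ifs=$IFS
    IFS=,                                # split the range on commas
    for part in $1; do
        case $part in
            *-*)
                first=${part%%-*}
                last=${part##*-}
                total=$((total + last - first + 1))
                ;;
            *)
                total=$((total + 1))
                ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

count_pages 1-3,7,11,18-21,55    # prints 10
```

Once the file has been created, qpdf --show-npages summary.pdf should report the same number.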
Splitting PDF Files
The opposite of merging PDF files is splitting PDF files. To split a PDF into separate PDF files each holding a single page, the syntax is simple.
The file we’re splitting is “summary.pdf”, and the output file is given as “page.pdf.” This is used as the base name. Each new file has a number added to the base name. The --split-pages option tells qpdf what type of action we’re performing.
qpdf summary.pdf page.pdf --split-pages
The output is a series of sequentially numbered PDF files.
ls page*.pdf
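To give an idea of the names to expect, this sketch prints the file names for a 10-page split. It assumes qpdf inserts a dash and the page number before “.pdf,” zero-padded to the width of the highest page number. The expected_split_names function is our own helper and doesn’t call qpdf.

```shell
# expected_split_names BASE PAGES
# Print the file names we'd expect from splitting a PDF of PAGES pages
# with an output name of BASE.pdf.
# Assumption: numbers are zero-padded to the width of the highest page number.
expected_split_names() {
    base=$1
    pages=$2
    width=${#pages}                      # number of digits in the page count
    fmt="%s-%0${width}d.pdf\n"
    i=1
    while [ "$i" -le "$pages" ]; do
        printf "$fmt" "$base" "$i"
        i=$((i + 1))
    done
}

expected_split_names page 10
```

The zero-padding is what keeps the files in page order when they are listed with ls.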
If you don’t want to split out every page, use page ranges to select the pages you want.
If we issue this next command, we’ll split out a collection of single-page PDF files. The page ranges are used to specify the pages or ranges we want, but each page is still stored in its own PDF.
qpdf large.pdf section.pdf --pages large.pdf 1-5,11-14,60,70-100 -- --split-pages
The extracted pages have names based on “section.pdf” with a sequential number added to them.
ls section*.pdf
If you want to extract a page range and have it stored in a single PDF, use a command of this form. Note that we don’t include the --split-pages option. Effectively, what we’re doing here is a PDF merge, but we’re only “merging” pages from one source file.
qpdf --empty --pages large.pdf 8-13 -- chapter2.pdf
This creates a single, multi-page PDF called “chapter2.pdf.”
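If you need several ranges pulled out of the same source file, the command can be wrapped in a loop. This is a minimal sketch: the extract_ranges function, the QPDF variable, and any page ranges other than 8-13 are hypothetical.

```shell
# extract_ranges SOURCE PREFIX RANGE...
# Write each page range from SOURCE to PREFIX1.pdf, PREFIX2.pdf, and so on.
# QPDF is a hypothetical override: set QPDF=echo to print the arguments
# instead of running qpdf.
extract_ranges() {
    src=$1
    prefix=$2
    shift 2
    n=1
    for range in "$@"; do
        ${QPDF:-qpdf} --empty --pages "$src" "$range" -- "${prefix}${n}.pdf" || return 1
        n=$((n + 1))
    done
}
```

For example, extract_ranges large.pdf chapter 1-7 8-13 would write “chapter1.pdf” and “chapter2.pdf,” each a multi-page PDF.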
Rotating Pages
To rotate a page, we create a new PDF that’s the same as the input PDF with the specified page rotated.
We use the --rotate option to do this. The +90 means rotate the page 90 degrees clockwise. You can rotate a page 90, 180, or 270 degrees. You can also specify the rotation in degrees anticlockwise, by using a negative number, but there’s little need to do so. A rotation of -90 is the same as a rotation of +270.
The number separated from the rotation by a colon “:” is the number of the page you want to rotate. This could be a list of page numbers and page ranges, but we’re just rotating the first page. To rotate all pages use a page range of 1-z.
qpdf --rotate=+90:1 summary.pdf rotated1.pdf
The first page has been rotated for us.
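The equivalence between anticlockwise and clockwise rotations is modular arithmetic on 360 degrees. This small sketch, using a helper function of our own called normalize_rotation, converts any signed angle to its clockwise equivalent.

```shell
# normalize_rotation DEGREES
# Convert a signed rotation to the equivalent clockwise angle from 0 to 359.
normalize_rotation() {
    echo $(( ( $1 % 360 + 360 ) % 360 ))
}

normalize_rotation -90     # prints 270
normalize_rotation +270    # prints 270
normalize_rotation -180    # prints 180
```

So --rotate=-90:1 and --rotate=+270:1 leave the first page in the same orientation.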
Encrypting and Decrypting
PDF documents can be encrypted so that they require a password to open them. That password is called the user password. There’s another password that’s required to change the security and other permission settings for a PDF. It’s called the owner password.
To encrypt a PDF we need to use the --encrypt option and provide both passwords. The user password comes first on the command line.
We also specify the strength of encryption to use. You’d only need to drop from 256-bit encryption to 128-bit if you want to support very old PDF file viewers. We suggest you stick with 256-bit encryption.
We’re going to create an encrypted version of the “summary.pdf” called “secret.pdf.”
qpdf --encrypt hen.rat.squid goose.goat.gibbon 256 -- summary.pdf secret.pdf
When we try to open the PDF, the PDF viewer prompts us for a password. Entering the user password authorizes the viewer to open the file.
Remember that qpdf doesn’t change the existing PDF. It creates a new one with the changes we’ve asked it to make. So if you make an encrypted PDF you’ll still have the original, unencrypted version. Depending on your circumstances you might want to delete the original PDF or safely store it away.
To decrypt a file, use the --decrypt option. Obviously, you must know the owner password for this to work. We need to use the --password option to identify the password.
qpdf --decrypt --password=goose.goat.gibbon secret.pdf unlocked.pdf
The “unlocked.pdf” can be opened without a password.
qpdf is an Excellent Tool
We’re deeply impressed with qpdf. It provides a flexible and richly featured toolset for working with PDF files. And it is very fast, too.
Check out their well-written and detailed documentation to see just how much more it can do.