Situation
PDF files were designed to promote sharing. Everyone can open them—in their web browser if they have nothing else. Linux lets you manipulate, merge, and split PDF files on the command line.
Solution
The Portable Document Format
The Portable Document Format (PDF) solved a problem. When you created a document on a computer and wanted to share it with someone else, sending them the document didn’t always work.
Even if they had the same software package you’d used to create your document, they might not have the same fonts installed on their computer that you had on yours. They’d be able to open the document but it would look wrong.
If they didn’t have a copy of the software you used to create the document, they wouldn’t be able to open it at all. If you used software that was only available on Linux, it was pointless sending that document to someone who only used Windows.
Adobe created a new file format in 1992 and called it the portable document format. Documents created to that standard—ISO 32000—contain the images and fonts needed to correctly render the contents of the file. PDF files can be opened by PDF viewers on any platform. It was a cross-platform, simple, and elegant solution.
PDF files aren’t intended to be malleable like word-processor documents, and they don’t readily lend themselves to editing. If you need to change the content of a PDF, it’s almost always better to go back to the source material, edit that, and generate a new PDF. Structural manipulations, by contrast, can be performed on PDF files with relative ease.
Here are some ways to create PDF files on Linux, and how to perform some of the transformations that can be applied to them.
Creating PDF Files on Linux
Many of the applications available on Linux can generate PDF files directly. LibreOffice has a button right on the toolbar that generates a PDF of the current document. It couldn’t be easier.
For fine-grained control of PDF creation, the Scribus desktop publishing application is hard to beat.
If you need to create documents with scientific or mathematical content, perhaps for submission to academic journals, an application that uses LaTeX, such as Texmaker, will be perfect for you.
If you prefer to write in plain-text Markdown, pandoc can convert your files to PDF. Install Texmaker first, because pandoc relies on some LaTeX libraries for PDF generation. Installing Texmaker is a convenient way to meet those dependencies.
The -o (output) option names the file that will be created, and pandoc infers the output format from that file’s extension. The “raw-notes.md” file is a plain-text Markdown file.
pandoc -o new.pdf raw-notes.md
If we open the “new.pdf” file in a PDF viewer, we see that it is a correctly formed PDF.
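If you have several Markdown files, the same pandoc command can be run in a loop. This is a minimal sketch, assuming every Markdown file in the current directory ends in “.md”. The function name convert_all_markdown is our own and is not part of pandoc.

```shell
# Hypothetical helper: convert every Markdown file in the current
# directory to a PDF with the same base name.
convert_all_markdown() {
    for md in *.md; do
        [ -e "$md" ] || continue          # the glob matched nothing
        # ${md%.md} strips the ".md" suffix: raw-notes.md -> raw-notes.pdf
        pandoc -o "${md%.md}.pdf" "$md"
    done
}
```

Calling convert_all_markdown in a directory that contains “raw-notes.md” should produce “raw-notes.pdf” alongside it.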
The qpdf command allows you to manipulate existing PDF files, whilst preserving their content. The changes you can make are structural. With qpdf you can perform tasks such as merging PDF files, extracting pages, rotating pages, and setting and removing encryption.
To install qpdf on Ubuntu use this command:
sudo apt install qpdf
The command on Fedora is:
sudo dnf install qpdf
On Manjaro you must type:
sudo pacman -S qpdf
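If you’re not sure which of these applies to your computer, a short script can check which package manager is present. This is only a sketch: it looks for the three package managers mentioned above and nothing else, and the function name qpdf_install_command is our own.

```shell
# Hypothetical helper: print the install command that matches whichever
# package manager is found on the PATH. Only apt, dnf, and pacman are checked.
qpdf_install_command() {
    if command -v apt >/dev/null 2>&1; then
        echo "sudo apt install qpdf"
    elif command -v dnf >/dev/null 2>&1; then
        echo "sudo dnf install qpdf"
    elif command -v pacman >/dev/null 2>&1; then
        echo "sudo pacman -S qpdf"
    else
        echo "no supported package manager found" >&2
        return 1
    fi
}
```

Running qpdf_install_command prints the command to use. It doesn’t install anything itself.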
Merging PDF Files
At first, some of the qpdf command line syntax may seem confusing. For example, many of the commands expect an input PDF file.
If a command doesn’t require one, you need to use the --empty option instead. This tells qpdf not to expect an input file. The --pages option lets you choose pages. If you just provide the PDF names, all pages are used.
To combine two PDF files to form a new PDF file, use this command format.
qpdf --empty --pages first.pdf second.pdf -- combined.pdf
This command is made up of:
- qpdf: Calls the qpdf command.
- --empty: Tells qpdf there is no input PDF. You could argue that “first.pdf” and “second.pdf” are input files, but qpdf considers them to be command line parameters.
- --pages: Tells qpdf we’re going to be working with pages.
- first.pdf second.pdf: The two files we’re going to extract the pages from. We’ve not used page ranges, so all pages will be used.
- --: Marks the end of the list of files and page ranges given to --pages.
- combined.pdf: The name of the PDF that will be created.
If we look for PDF files with ls, we’ll see our two original files—untouched—and the new PDF called “combined.pdf.”
ls -hl first.pdf second.pdf combined.pdf
There are two pages in “first.pdf” and one page in “second.pdf.” The new PDF file has three pages.
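You can check page counts from the command line instead of opening each file in a viewer. The sketch below uses the --show-npages option of qpdf, which prints the number of pages in a file. The wrapper function print_page_counts is our own naming.

```shell
# Hypothetical helper: print "<file>: <page count>" for each PDF
# passed as an argument, using qpdf --show-npages.
print_page_counts() {
    for pdf in "$@"; do
        printf '%s: %s\n' "$pdf" "$(qpdf --show-npages "$pdf")"
    done
}
```

For the files used here, you would call it as print_page_counts first.pdf second.pdf combined.pdf.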
You can use wildcards instead of listing a great many source files. This command creates a new file called “all.pdf” that contains all the PDF files in the current directory.
qpdf --empty --pages *.pdf -- all.pdf
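One thing to watch with the wildcard: the shell expands *.pdf before qpdf runs, so if “all.pdf” is left over from an earlier run, it is matched as well. Here is a rough sketch of one way around that, using a helper of our own called merge_all_pdfs that skips the output file when it builds the list of inputs.

```shell
# Hypothetical helper: merge every PDF in the current directory into "$1",
# leaving "$1" itself out of the input list if it already exists.
merge_all_pdfs() {
    out=$1
    set --                              # clear the arguments; rebuild them as the input list
    for f in *.pdf; do
        [ -e "$f" ] || continue         # the glob matched nothing
        [ "$f" = "$out" ] && continue   # never feed the output file back in
        set -- "$@" "$f"
    done
    if [ "$#" -eq 0 ]; then
        echo "no input PDFs found" >&2
        return 1
    fi
    qpdf --empty --pages "$@" -- "$out"
}
```

Calling merge_all_pdfs all.pdf then runs the same qpdf command as above, with the file names listed explicitly.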
We can use page ranges by adding the page numbers or ranges after the names of the files the pages are to be extracted from.
This will extract pages one and two from “first.pdf” and page one from “second.pdf.” Note that if “combined.pdf” already exists, it is overwritten with the newly selected pages rather than having them added to it.
qpdf --empty --pages first.pdf 1-2 second.pdf 1 -- combined.pdf
Page ranges can be as detailed as you like. Here, we’re asking for a very specific set of pages from a large PDF file, and we’re creating a summary PDF file.
qpdf --empty --pages large.pdf 1-3,7,11,18-21,55 -- summary.pdf
The output file, “summary.pdf,” contains pages 1 to 3, 7, 11, 18 to 21, and 55 from the input PDF file. That is 3 + 1 + 1 + 4 + 1 pages, so there are 10 pages in “summary.pdf.”
We can see that page 10 is page 55 from the source PDF.
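If you want to check how many pages a range will select before running qpdf, the arithmetic can be scripted. This sketch is our own helper, not a qpdf feature. It assumes the range contains only single page numbers and ascending N-M ranges, and it doesn’t understand the other forms qpdf accepts, such as z for the last page.

```shell
# Hypothetical helper: count the pages selected by a simple range
# such as "1-3,7,18-21". Handles single pages and ascending N-M ranges only.
count_pages() {
    total=0
    old_ifs=$IFS
    IFS=,                               # split the range on commas
    for part in $1; do
        case $part in
            # N-M selects M - N + 1 pages
            *-*) total=$((total + ${part#*-} - ${part%-*} + 1)) ;;
            # a single page number selects one page
            *)   total=$((total + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$total"
}

count_pages "1-3,7,11,18-21,55"   # prints 10
```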