Importing files
How to import files?

Paperwork can import various file formats (PDF, JPEG, etc.). Its exact behavior varies slightly depending on the file format.


Each PDF is always treated as a whole document. PDFs are never appended to an existing document. They are copied as is into the work directory, and Paperwork never modifies their content (the file is only moved and renamed).

You can import a single PDF, or you can select a folder. If you select a folder, Paperwork browses it and looks for PDFs to import. Already-imported PDFs are simply ignored. The folder is browsed recursively (all the folders inside it are examined as well).
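
The folder import described above can be sketched as follows. This is only an illustration, not Paperwork's actual code: the function names `file_digest` and `find_new_pdfs` are hypothetical, and detecting already-imported PDFs by comparing SHA-256 content digests is an assumption made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's content."""
    h = hashlib.sha256()
    with open(path, "rb") as fd:
        # Read in chunks so large PDFs do not have to fit in memory.
        for chunk in iter(lambda: fd.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_new_pdfs(folder, already_imported):
    """
    Walk `folder` recursively and return the PDFs not yet imported.

    `already_imported` is a set of content digests. This is an assumption:
    how Paperwork really recognizes imported PDFs is not described here.
    """
    new_pdfs = []
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() != ".pdf":
            continue  # not a PDF file
        if file_digest(path) in already_imported:
            continue  # already imported: simply ignored
        new_pdfs.append(path)
    return new_pdfs
```

Calling `find_new_pdfs("/some/folder", set())` would return every PDF found in the folder and its subfolders, since nothing has been imported yet.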


Regarding images, Paperwork supports many file formats, including JPEG, PNG, GIF, BMP, and TIFF.

Each image is treated as one page. Currently, you can only import one image file at a time.

Images are always appended to the document that is currently open. To create a new document while importing, simply select an empty document ("New document") first.

OCR and automatic labeling

Paperwork immediately integrates your document into its work directory and quickly displays it. However, it still has to examine the document afterwards.

  • PDF: It will look for pages with no text attached and automatically run OCR on them. Once all the pages have been examined, it will automatically apply document labels. Note that this process runs mostly in the background and may take a few minutes for large PDF files.
  • Images: It will run OCR on the image to extract its text. If the image is the first page of a new document, it will then automatically apply document labels.
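
The examination steps above can be summarized in a short sketch. Everything in it is illustrative: `examine_pdf`, `examine_image`, and the `run_ocr` and `guess_labels` callbacks are hypothetical names standing in for the real OCR and labeling code, whose actual interfaces are not described here.

```python
def examine_pdf(page_texts, run_ocr, guess_labels):
    """
    Examine an imported PDF.

    page_texts: one string per page, empty when the PDF holds no text for it.
    run_ocr(page_nb): hypothetical callback returning the OCR text of a page.
    guess_labels(texts): hypothetical callback returning the guessed labels.
    """
    texts = []
    for page_nb, text in enumerate(page_texts):
        if not text.strip():
            # No text attached to this page: run OCR on it.
            text = run_ocr(page_nb)
        texts.append(text)
    # Labels are only guessed once every page has been examined.
    return texts, guess_labels(texts)


def examine_image(page_nb, run_ocr, guess_labels):
    """
    Examine an imported image, stored as page `page_nb` of its document.

    OCR always runs; labels are only guessed for the first page (page 0),
    which is how this sketch models "first page of a new document".
    """
    text = run_ocr(page_nb)
    labels = guess_labels([text]) if page_nb == 0 else None
    return text, labels
```

In this sketch a PDF page that already carries text is never sent to OCR, which is why importing a text-based PDF should be much faster than importing a scanned one.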

Regarding images

Paperwork is a document manager. Although it can import them, it is not designed to handle photos or images containing very little text. Automatic labeling will not work correctly on such documents.

The OCR engine (Tesseract) works well with black text on a white background. Automatic labeling requires as many keywords as possible on the first page.






