Private Homepage of Hartmut Henkel

Experimenting with JBIG2 Support to pdfTeX

Introduction

The following is just an informal write-up of my private experiments:

Adobe Systems has defined a new filter, /JBIG2Decode, in the newest version of its PDF format, version 1.4, which allows decoding of image data encoded according to the JBIG2 standard. Adobe Acroread version 5.0 seems to be the first version that supports this feature.

JBIG2 encoding is for bi-level images only, e.g. scanned texts, where it is said to give very high compression ratios, lossy or lossless. It is especially well suited to compressing multi-page documents, since it can use a global page holding information shared by all pages. This rather new standard is being worked out by the JBIG Committee. The latest JBIG2 draft standard is available from here as a PDF file.

JBIG2 Data Streams

I don't yet have any program that produces JBIG2 files, but some sample data streams are available from here. There is also a small but working ASCII JBIG2 example in section 3.3.6 of the PDF reference, which can be typed in and converted to binary, e.g. with a small awk script. It produces two letters 'C', stacked one above the other.
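As an alternative to awk, the conversion of the typed-in example can be sketched in C. This is only a minimal sketch, assuming the example is typed in as plain hexadecimal digits; the function name hex_to_bytes is made up here and is not part of pdfTeX. Whitespace is skipped, a '>' ends the input, and a single trailing digit is padded with a zero nibble.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: convert hex text to bytes.
   Whitespace is skipped; '>' or the end of the string ends the input.
   An odd trailing digit is padded with a zero low nibble.
   Returns the number of bytes written, or (size_t)-1 on an invalid
   character or if the output buffer is too small. */
size_t hex_to_bytes(const char *hex, unsigned char *out, size_t cap)
{
    size_t n = 0;
    int high = -1;                      /* pending high nibble, -1 if none */

    for (; *hex != '\0' && *hex != '>'; hex++) {
        int v;
        if (isspace((unsigned char)*hex))
            continue;
        v = hex_value((unsigned char)*hex);
        if (v < 0)
            return (size_t)-1;
        if (high < 0) {
            high = v;
        } else {
            if (n >= cap)
                return (size_t)-1;
            out[n++] = (unsigned char)((high << 4) | v);
            high = -1;
        }
    }
    if (high >= 0) {                    /* odd number of digits */
        if (n >= cap)
            return (size_t)-1;
        out[n++] = (unsigned char)(high << 4);
    }
    return n;
}
```

The resulting bytes can then be written to a file with fwrite() and given one of the file endings the driver expects.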

The Driver

I have experimented with JBIG2 image inclusion in PDF streams generated by pdfTeX, which is part of the teTeX bundle, using the freshest beta version at that time (teTeX-src-beta-20020530.tar.gz). pdfTeX already allows JPEG image inclusion, so I could start from the source file writejpg.c. The experimental driver is writejbig2.c. I put it into the pdftexdir directory of the teTeX tree on my Linux PC (debian 2.2r6), together with the other drivers. A few other files required patching, just to add jbig2 things analogous to the already existing jpeg things. Here is the list of new/patched files, all in the subdirectory pdftexdir:

The file names of JBIG2 pictures must end in '.jb2' or '.jbig2'.
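The file ending rule can be illustrated with a small C function. This is a sketch only and not the code from the patched pdfTeX files; the names ends_with and is_jbig2_name are made up, and the comparison is assumed to be case-sensitive.

```c
#include <string.h>

/* Return 1 if name ends with suffix, else 0. */
static int ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Hypothetical check: does the file name carry one of the two
   endings accepted for JBIG2 pictures? (Case-sensitive by assumption.) */
int is_jbig2_name(const char *name)
{
    return ends_with(name, ".jb2") || ends_with(name, ".jbig2");
}
```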

Experimenting

The driver as it stands only allows inclusion of a single page, preset to page number 1. I could test the driver only on the roughly 28 available JBIG2 files, which are of type:

The driver could process all three types. The fresh Linux Acroread (Version x86 linux 5.05 Apr 25 2002) chokes on only one file from the above-mentioned set, 042_13.jbig2, with the message 'Bad error code'. I don't know why. Another problem is that a horizontal black hairline appears above the included images, and it also shows up in print. I don't know its origin, but similar lines sometimes also appear when JPEG and other types of pictures are included.

Open Points

Avoiding the above-mentioned hairline.

There is no check of .jbig2 file validity. pdfTeX might crash completely on a corrupted file (not tested).

Real JBIG2 multi-page inclusion would be nice to have, exploiting the full JBIG2 compression power by sharing the same global page information among several image objects from the same JBIG2 file.

Determining the segment data length by detecting two-byte sequences (section 7.2.7 of the JBIG2 draft standard) is not supported.

To do more, I would have to understand the JBIG2 standard :-)

There are lots of fprintf(stderr,) statements in the driver code. Well, it's experimental.

It is unclear to me in which funny way Pascal variables are related to C variables in the web2c system. The underscores seem to play a magic role there...
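Regarding the missing validity check mentioned above, a very first step could be to test for the 8-byte ID string that begins a standalone JBIG2 file. The following is only a sketch and not part of writejbig2.c: the function name has_jbig2_file_header is made up, and it assumes that the input files carry the file header at all (embedded JBIG2 streams, as stored inside a PDF file, do not). It also says nothing about whether the segments that follow are sane.

```c
#include <stddef.h>
#include <string.h>

/* ID string at the start of a standalone JBIG2 file. */
static const unsigned char JBIG2_ID[8] = {
    0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A
};

/* Hypothetical check: return 1 if the buffer is long enough and
   starts with the JBIG2 file ID string, else 0. */
int has_jbig2_file_header(const unsigned char *buf, size_t len)
{
    return len >= sizeof JBIG2_ID
        && memcmp(buf, JBIG2_ID, sizeof JBIG2_ID) == 0;
}
```

A driver could read the first eight bytes of the file and refuse to include it if this test fails, instead of running into undefined territory later.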

End Remark

For now, this is the result of about one week's evenings and nights of experimenting. It was just fun to dig (for the first time) a little into PDF objects and the pdfTeX driver interface, to see that additions like a new image driver can be made rather easily through pdfTeX's very practical driver interface, without changing the pdftex.web code, and finally to see Acroread even showing the result on paper.

This page was first put online on 2 June 2002.

News