The mp4 file has an additional English audio track.
> March 2015: The German Federal Office for Information Security bans JBIG2 from being used for archival purposes.
Here, it would be fair to describe what's going on as OCR with the glyphs not being fixed in advance, but rather being discovered on the fly by the algorithm. The entire concept is to identify sections of the image that "show the same thing", and replace the data in those sections with pointers to a single representative patch. That's really not so different from compressing image data that looks suspiciously similar to a capital A down to the one byte 0x41. It's just that different image sections are being Optically Recognized as "similar to each other" rather than "similar to this hardcoded reference glyph".
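To make that concrete, here's a minimal sketch of pattern matching & substitution (Python; the patch format, the pixel-difference threshold, and the function names are invented for illustration, this is not the actual JBIG2 codec): glyph-sized patches get filed into a dictionary, and anything "close enough" to an existing entry is stored as a mere pointer to it.

```python
import numpy as np

def encode_symbols(patches, max_diff=12):
    """patches: list of equally-sized binary (0/1) numpy arrays.
    max_diff: how many pixels two patches may differ by and still count as 'the same'."""
    dictionary = []   # representative patches actually stored in the file
    indices = []      # per-patch pointer into the dictionary
    for patch in patches:
        match = None
        for i, rep in enumerate(dictionary):
            # crude similarity measure: count differing pixels
            if np.count_nonzero(patch != rep) <= max_diff:
                match = i
                break
        if match is None:
            dictionary.append(patch)
            match = len(dictionary) - 1
        indices.append(match)
    return dictionary, indices

def decode_symbols(dictionary, indices):
    # every occurrence is redrawn from its representative -- if a '6' was
    # matched to an '8', the decoder happily draws an '8' in its place
    return [dictionary[i] for i in indices]
```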
"Generating a font from the image and replacing the original image data with that" is a very good description of what's going on here.
[1] Or numbers, or symbols like parentheses; letters are just the basic case.
Can the author actually legally make this guarantee?
Suppose this were to go to court. If there are multiple interpretations of a phrase, and one interpretation is not realizable (for almost tautological reasons), then the courts are very unlikely to use that interpretation. Instead, they will likely say there were implicit qualifiers like "to within the limits of what is allowed by the law", "unless believably threatened with the loss of life, limb, home, or a similarly serious physical threat", or "following information security principles appropriate for the expected threat model of a civil/economic matter".
But usually there's a step after it called residual coding, where you subtract the predicted image from the original and send the difference to make up for errors. Just leaving that out is, um, interesting.
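For what it's worth, here's what that step looks like in toy form for a bilevel image, where "subtracting" amounts to an XOR (the function names and pipeline shape are invented for illustration, not Xerox's actual implementation):

```python
import numpy as np

def encode(original, predicted):
    # predicted = page rendered purely from the symbol dictionary;
    # the residual is 1 wherever the prediction got a pixel wrong
    residual = np.bitwise_xor(original, predicted)
    return predicted, residual      # both get entropy-coded in a real codec

def decode(predicted, residual):
    # XOR-ing the residual back flips exactly the wrong pixels, so the
    # decoded page matches the original bit for bit -- the safety net
    # that a no-residual mode throws away
    return np.bitwise_xor(predicted, residual)
```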
Obviously because it would make the file bigger if that was included...
The concept behind JBIG2 is good - small variations and random pixels are likely to be scanner noise/dust, so suppressing them can reduce filesize significantly. The problem here is that some JBIG2 implementations can be too lossy, and throw away the few pixels that could make all the difference between e.g. a 6 and an 8.
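A toy example of how few pixels that can be (glyphs made up for illustration):

```python
# In this made-up 5x3 font the glyphs for '6' and '8' differ in a single
# pixel, so any matcher that tolerates even one differing pixel will file
# the '8' under the '6' it already has.
import numpy as np

SIX = np.array([[1,1,1],
                [1,0,0],
                [1,1,1],
                [1,0,1],
                [1,1,1]])

EIGHT = np.array([[1,1,1],
                  [1,0,1],
                  [1,1,1],
                  [1,0,1],
                  [1,1,1]])

print(np.count_nonzero(SIX != EIGHT))   # -> 1 differing pixel
```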
> Both operation modes have the basics in common: Images are cut into small segments, which are grouped by similarity. For every group only a representative segment is saved that gets reused instead of other group members, which may cause character substitution. Different to PM&S, SPM corrects such errors by additionally saving difference images containing the differences of the reused symbols in comparison to the original image. At Xerox, by error the PM&S mode seems to have been used not only in the “normal” compression mode but also in the “higher” and “high” modes.
I don't know… This whole thing reminds me of "The King's Breakfast" by A. A. Milne. Is it so hard for a scanner to make an exact copy of a piece of paper? It doesn't sound like rocket science, really. I don't want it to try and compress anything, to leave watermarks or whatever. If I feel I need better compression, I'll use a separate tool (which Xerox can provide if it wants to); I don't want my scanner/photocopier to even try modifying the image without my specific request to do so.
Maybe I don't understand the real technological reasons for doing it this way… but then, I really don't: I cannot think of any possible reason for such a step to be necessary, except perhaps not having enough RAM to store the image as is, but that really sounds unlikely.
http://everist.org/NobLog/20131122_an_actual_knob.htm#jbig2
(JBIG2 discussion near bottom, rest of page is about electronics.)