Tag Type of UserComment #702

@apologize441

metadata-extractor 2.19.0, Java 8

When the tag type is set to ASCII, metadata-extractor fails to parse the UserComment value correctly. The UserComment tag is defined with an 8-byte character code prefix (e.g., "ASCII\0\0\0", "UNICODE\0", etc.), followed by the actual comment data. When the tag type is ASCII, however, the value is expected to be null-terminated (\0), so it is read only up to the first \0, which truncates the value during parsing. See

public String getNullTerminatedString(int index, int maxLengthBytes, @NotNull Charset charset) throws IOException
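To illustrate the truncation, here is a minimal sketch. `readNullTerminated` is a hypothetical stand-in written for this example, not the library's actual implementation; it only assumes that a null-terminated read stops at the first `\0` byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NullTerminatedDemo {
    // Hypothetical stand-in for a null-terminated read (NOT metadata-extractor's code):
    // collects bytes until the first 0x00 or until maxLengthBytes is reached.
    static String readNullTerminated(byte[] data, int maxLengthBytes, Charset charset) {
        int length = 0;
        while (length < data.length && length < maxLengthBytes && data[length] != 0) {
            length++;
        }
        return new String(data, 0, length, charset);
    }

    public static void main(String[] args) {
        // A UserComment value: 8-byte character code prefix followed by the comment.
        byte[] raw = "ASCII\0\0\0AI-generated".getBytes(StandardCharsets.US_ASCII);
        // The read stops at the first \0, which sits inside the prefix itself.
        System.out.println(readNullTerminated(raw, raw.length, StandardCharsets.US_ASCII));
        // prints "ASCII"
    }
}
```

With this kind of read, the comment text after the prefix is never reached.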

The current implementation is strictly compliant with the EXIF 3.0 specification, except that it does not validate the tag type defined in the standard. However, ExifTool is able to correctly recognize the UserComment tag in the image. This makes me wonder whether adding some special handling would be beneficial. After all, most users of this library would probably expect to extract as much information as possible.

Recently, China issued a regulation requiring all companies to embed AI identifiers into the UserComment tag of uploaded files. It is foreseeable that differences in tool implementations will lead to various metadata formats. For example, piexifjs, which we use on the frontend, forces UserComment to be of type ASCII. 😡

A simple potential improvement could be to add special handling for UserComment and similar tags: for instance, if parsing the 8-byte prefix succeeds, treat the tag type as UNDEFINED. Alternatively, as a more general approach, the library could preserve the raw data of each tag value and provide users with an interface to access it. The commons-imaging library has a similar facility: TiffField.getByteArrayValue() returns the raw data.
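The special handling could look roughly like the sketch below. Everything in it is hypothetical (`UserCommentSketch`, `decode`, `trimTrailing` are names made up for this example, not metadata-extractor API), and it makes two assumptions worth flagging: only the "ASCII" and "UNICODE" character codes are handled, and "UNICODE" payloads are decoded as UTF-16 with big-endian as the default when there is no byte-order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class UserCommentSketch {
    private static final int PREFIX_LENGTH = 8;

    // Hypothetical helper: decode raw UserComment bytes regardless of the declared tag type.
    static String decode(byte[] raw) {
        if (raw.length >= PREFIX_LENGTH) {
            String code = new String(raw, 0, PREFIX_LENGTH, StandardCharsets.US_ASCII);
            Charset charset = null;
            if (code.equals("ASCII\0\0\0")) {
                charset = StandardCharsets.US_ASCII;
            } else if (code.equals("UNICODE\0")) {
                // Assumption: UTF-16, honoring a BOM and defaulting to big-endian.
                charset = StandardCharsets.UTF_16;
            }
            if (charset != null) {
                // Prefix recognized: treat the value like UNDEFINED and decode the rest.
                String body = new String(raw, PREFIX_LENGTH, raw.length - PREFIX_LENGTH, charset);
                return trimTrailing(body);
            }
        }
        // No recognized prefix: fall back to a plain null-terminated ASCII read.
        int length = 0;
        while (length < raw.length && raw[length] != 0) {
            length++;
        }
        return new String(raw, 0, length, StandardCharsets.US_ASCII);
    }

    // Drops trailing NUL and space padding.
    private static String trimTrailing(String s) {
        int end = s.length();
        while (end > 0 && (s.charAt(end - 1) == '\0' || s.charAt(end - 1) == ' ')) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        byte[] raw = "ASCII\0\0\0AI-generated\0".getBytes(StandardCharsets.US_ASCII);
        System.out.println(decode(raw)); // prints "AI-generated"
    }
}
```

Values without a recognized prefix still go through the null-terminated path, so files that follow the current behaviour would be unaffected by this kind of change.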

Here is the image.
Image
