swscale/aarch64: fix uyvy/yuyv to yuv420p/yuv422p on odd width

interleaved_yuv_to_planar, shared by uyvytoyuv422, uyvytoyuv420,
yuyvtoyuv422 and yuyvtoyuv420, only handled even widths. The packed
UYVY/YUYV macroblocks are pixel pairs and the trailing half macroblock of
an odd width was mishandled:
- the slow path (width <= 31) decrements its pixel counter by two from an
odd value, so it never reaches zero and the loop runs far past the line,
overwriting the destination (observed as a crash in checkasm);
- the fast path (width >= 32) shifts the tail pointers back by width-32 and
reprocesses an overlapping, misaligned tuple, producing wrong samples and
dropping the last chroma column.
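
The slow-path failure is plain counter arithmetic, which the following C
model illustrates. It is a sketch only and not the assembly itself: the
function names and the max_iters bound are hypothetical, and the bound exists
purely so the model terminates where the real loop would not.

```c
#include <stdbool.h>

/* Hypothetical model of a loop counter that steps down by two and exits
 * only when it hits exactly zero. max_iters is an artificial bound so the
 * model itself always terminates. Assumes width > 0. */
static bool counter_reaches_zero(int width, int max_iters)
{
    int n = width;

    for (int i = 0; i < max_iters; i++) {
        n -= 2;              /* one pixel pair consumed */
        if (n == 0)
            return true;     /* even width: clean exit */
    }
    return false;            /* odd width: 5, 3, 1, -1, ... skips zero */
}

/* Counting whole pixel pairs instead gives a well-defined trip count for
 * any non-negative width, including 0 and 1 (zero pairs). */
static int whole_pairs(int width)
{
    return width >> 1;
}
```

With a width of 5 the counter takes the values 3, 1, -1 and so on, so an
exit test against zero never fires, while whole_pairs(5) is 2 and leaves one
trailing column for separate handling.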
Process only whole pixel pairs and emit the trailing odd column from a
per-line epilogue that matches the C reference: for yuv422 one Y, U and V
sample; for yuv420 the Y of both lines of the pair with the chroma averaged
across them, and luma only for the final line when the height is odd. The
empty even part (width 0 or 1) is guarded so the slow path no longer enters
its run-past loop.
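
For illustration, the yuv422 case of this scheme can be sketched in C for a
single UYVY line. This is not the C reference or the assembly; the function
name is hypothetical, and it assumes the source still carries the U, Y and V
bytes of the trailing half macroblock when the width is odd.

```c
#include <stdint.h>

/* Sketch only: deinterleave one UYVY line into Y, U and V planes.
 * Assumption: for an odd width, bytes 0..2 of the last 4-byte group
 * (U, Y, V) are readable in src. Assumes width >= 0. */
static void uyvy_line_to_planar(const uint8_t *src, uint8_t *y,
                                uint8_t *u, uint8_t *v, int width)
{
    int pairs = width >> 1;   /* whole pixel pairs; 0 for width 0 or 1 */

    /* Main part: whole pairs only, so the loop is skipped when empty. */
    for (int i = 0; i < pairs; i++) {
        u[i]         = src[4 * i + 0];
        y[2 * i]     = src[4 * i + 1];
        v[i]         = src[4 * i + 2];
        y[2 * i + 1] = src[4 * i + 3];
    }

    /* Per-line epilogue: one Y, U and V sample for the odd column. */
    if (width & 1) {
        u[pairs]     = src[4 * pairs + 0];
        y[2 * pairs] = src[4 * pairs + 1];
        v[pairs]     = src[4 * pairs + 2];
    }
}
```

For a width of 3 this writes three luma samples and two chroma samples per
plane, so the last chroma column is no longer dropped.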
All four variants are now bit-exact with the C reference for even and odd
widths. Verified with checkasm under qemu-aarch64.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
(cherry picked from commit a554d0aa8a64673364baf04affb62ce0df219db7)
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>