Commit 9288f95

Another round on SF patch 618135: gzip.py and files > 2G
The last round boosted "the limit" from 2GB to 4GB. This round gets rid of the 4GB limit. For files > 4GB, gzip stores just the low-order 32 bits of the file size (the size mod 2**32), and now we play along with that too. Tested by hand (on a 6+GB file) on Win2K. Boosting from 2GB to 4GB was arguably enough of "a bugfix". Going beyond that smells more like a "new feature" to me.
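To make the truncation concrete, take a hypothetical uncompressed size of exactly 6 GiB. The hand test used a "6+GB" file whose exact size isn't given, so this figure is only illustrative. The gzip trailer's 32-bit ISIZE field then holds:

$$
\mathrm{ISIZE} = n \bmod 2^{32}, \qquad n = 6 \cdot 2^{30} \;\Rightarrow\; \mathrm{ISIZE} = 6 \cdot 2^{30} - 2^{32} = 2 \cdot 2^{30} = 2^{31}
$$

A reader that trusts ISIZE alone would therefore see 2 GiB for a 6 GiB file.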
1 parent cd8fdbb commit 9288f95

2 files changed

Lines changed: 15 additions & 7 deletions

Lib/gzip.py

Lines changed: 9 additions & 4 deletions
@@ -24,6 +24,10 @@ def U32(i):
         i += 1L << 32
     return i
 
+def LOWU32(i):
+    """Return the low-order 32 bits of an int, as a non-negative int."""
+    return i & 0xFFFFFFFFL
+
 def write32(output, value):
     output.write(struct.pack("<l", value))
 
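The sketch below is a rough Python 3 rendering of the two helpers. It shows why the size must be masked before it is written and why `U32` is still needed after it is read back. The lowercase names `u32` and `low_u32` are hypothetical stand-ins for the Python 2 `U32` and `LOWU32` above. Python 3 has no separate `long` type, so the `L` suffixes disappear. The 7 GiB size is an arbitrary illustrative value.

```python
import struct


def u32(i: int) -> int:
    """Map a signed 32-bit value to its unsigned equivalent (like U32)."""
    if i < 0:
        i += 1 << 32
    return i


def low_u32(i: int) -> int:
    """Keep only the low-order 32 bits, as a non-negative int (like LOWU32)."""
    return i & 0xFFFFFFFF


size = 7 * 2**30  # hypothetical 7 GiB uncompressed size

# Packing the raw size into an unsigned 32-bit field fails outright...
try:
    struct.pack("<I", size)
    packed_raw = True
except struct.error:
    packed_raw = False

# ...so the writer masks it first, as close() now does via LOWU32.
field = struct.pack("<I", low_u32(size))

# A signed read (which is why _read_eof wraps read32() in U32()) goes
# negative once the low 32 bits reach 2**31; u32() undoes that.
(signed,) = struct.unpack("<i", field)
recovered = u32(signed)

print(packed_raw, signed, recovered)  # False -1073741824 3221225472
```

After the round trip, `recovered` equals `low_u32(size)`, which is the comparison `_read_eof` now makes.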
@@ -295,21 +299,22 @@ def _read_eof(self):
         # We've read to the end of the file, so we have to rewind in order
         # to reread the 8 bytes containing the CRC and the file size.
         # We check the that the computed CRC and size of the
-        # uncompressed data matches the stored values.
+        # uncompressed data matches the stored values. Note that the size
+        # stored is the true file size mod 2**32.
         self.fileobj.seek(-8, 1)
         crc32 = read32(self.fileobj)
         isize = U32(read32(self.fileobj)) # may exceed 2GB
         if U32(crc32) != U32(self.crc):
             raise ValueError, "CRC check failed"
-        elif isize != self.size:
+        elif isize != LOWU32(self.size):
             raise ValueError, "Incorrect length of data produced"
 
     def close(self):
         if self.mode == WRITE:
             self.fileobj.write(self.compress.flush())
             write32(self.fileobj, self.crc)
-            # self.size may exceed 2GB
-            write32u(self.fileobj, self.size)
+            # self.size may exceed 2GB, or even 4GB
+            write32u(self.fileobj, LOWU32(self.size))
             self.fileobj = None
         elif self.mode == READ:
             self.fileobj = None
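The same trailer check can be written against Python 3's standard library. The sketch below assumes a single-member gzip stream with no trailing data, and `check_trailer` is a hypothetical helper name. Python 3's `gzip.decompress` already performs equivalent checks on its own. The sketch repeats them explicitly only to show what `_read_eof` compares: the CRC-32 exactly, and the length only modulo 2**32.

```python
import gzip
import struct
import zlib


def check_trailer(blob: bytes) -> bytes:
    """Decompress a single-member gzip blob and re-check its 8-byte trailer."""
    data = gzip.decompress(blob)
    # The trailer is CRC-32 followed by ISIZE, both little-endian uint32.
    stored_crc, stored_isize = struct.unpack("<II", blob[-8:])
    # zlib.crc32 returns an unsigned value in Python 3, so no U32() step.
    if stored_crc != zlib.crc32(data):
        raise ValueError("CRC check failed")
    # ISIZE holds the true length mod 2**32: compare the low 32 bits only.
    if stored_isize != (len(data) & 0xFFFFFFFF):
        raise ValueError("Incorrect length of data produced")
    return data


payload = b"hello world" * 100
assert check_trailer(gzip.compress(payload)) == payload
```

Masking the computed length on the reading side is what lets a file larger than 4 GiB pass the check, even though its trailer can't hold the full size.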

Misc/NEWS

Lines changed: 6 additions & 3 deletions
@@ -355,9 +355,12 @@ Extension modules
 Library
 -------
 
-- gzip.py now handles files exceeding 2GB. Note that 4GB is still a
-  fundamental limitation of the underlying gzip file format (it only
-  has 32 bits to record the file size).
+- gzip.py now handles files exceeding 2GB. Files over 4GB also work
+  now (provided the OS supports it, and Python is configured with large
+  file support), but in that case the underlying gzip file format can
+  record only the least-significant 32 bits of the file size, so that
+  some tools working with gzipped files may report an incorrect file
+  size.
 
 - xml.sax.saxutils.unescape has been added, to replace entity references
   with their entity value.
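The NEWS entry's warning about tools reporting an incorrect size can be illustrated with two hypothetical helpers. `reported_size` reads only the final 4 bytes (ISIZE), the way a trailer-only tool would. `true_size` counts bytes by streaming decompression. Both assume a single-member gzip stream. For files under 4 GiB they agree. For larger files `reported_size` is off by a multiple of 2**32, and only decompressing recovers the real figure.

```python
import gzip
import struct
import zlib


def reported_size(blob: bytes) -> int:
    """Size a trailer-only tool would report: ISIZE, i.e. true size mod 2**32."""
    (isize,) = struct.unpack("<I", blob[-4:])
    return isize


def true_size(blob: bytes, chunk: int = 1 << 16) -> int:
    """Count uncompressed bytes by streaming decompression; no 32-bit wrap."""
    d = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)  # expect a gzip header
    total = 0
    for start in range(0, len(blob), chunk):
        total += len(d.decompress(blob[start:start + chunk]))
    total += len(d.flush())
    return total


if __name__ == "__main__":
    blob = gzip.compress(b"x" * 100_000)
    print(reported_size(blob), true_size(blob))  # 100000 100000
```

Streaming in chunks keeps only one chunk of decompressed output in memory at a time, which matters when the point is to measure sizes that don't fit in 32 bits.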