refactor(merge,split,json): adopt streaming approach and standardize types, address gradle warnings #5803
balazs-szucs wants to merge 14 commits into Stirling-Tools:main
🚀 PR Test Deployment
Your PR has been deployed for testing!
🔗 Test URL: http://23.22.230.180:5803
This deployment will be automatically cleaned up when the PR is closed.
Pull request overview
This pull request implements significant architectural improvements focused on memory efficiency and code modernization for handling large PDF files and archives. The changes include adopting streaming I/O patterns, standardizing data types from boxed collections to primitive arrays, and updating to Jackson 3 with Spring Boot 3 compatibility.
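The move from boxed collections to primitive arrays can be illustrated with a minimal sketch. The class `PrimitiveArrays` and its methods are hypothetical names for illustration only; they are not taken from the PR, and mapping `null` elements to zero is an assumption made here.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of the PR. */
final class PrimitiveArrays {
    private PrimitiveArrays() {}

    /** Copies a boxed list into a float[]; null elements become 0f (assumption). */
    static float[] toFloatArray(List<Float> boxed) {
        float[] out = new float[boxed.size()];
        for (int i = 0; i < out.length; i++) {
            Float value = boxed.get(i);
            out[i] = value == null ? 0f : value;
        }
        return out;
    }

    /** Copies a boxed list into an int[]; null elements become 0 (assumption). */
    static int[] toIntArray(List<Integer> boxed) {
        return boxed.stream().mapToInt(v -> v == null ? 0 : v).toArray();
    }
}
```

A `float[]` stores its values contiguously at four bytes each, while a `List<Float>` holds a reference per element that points to a separate `Float` object, which is where the reduced object overhead comes from.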
Changes:
- Introduced streaming file I/O via `FileStorage.retrieveInputStream()` and `storeInputStream()` methods, replacing byte-array-based operations for large files (ZIP, CBZ, PDF conversions)
- Standardized PDF JSON model types from `List<Float>` and `List<Integer>` to `float[]` and `int[]` for better memory efficiency and reduced object overhead
- Migrated to Jackson 3 (`tools.jackson.databind.ObjectMapper`) with backwards-compatible deserialization settings for primitives
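A minimal sketch of what a stream-based store could look like, assuming files are kept on disk under a root directory. `StreamingStore` is a hypothetical stand-in, not the PR's `FileStorage`; only the method names `storeInputStream` and `retrieveInputStream` come from the summary above, and their signatures here are assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

/** Hypothetical stand-in for a streaming file store; not the PR's FileStorage. */
final class StreamingStore {
    private final Path root;

    StreamingStore(Path root) {
        this.root = root;
    }

    /** Streams the input to disk without buffering it fully; returns a generated id. */
    String storeInputStream(InputStream in) throws IOException {
        String id = UUID.randomUUID().toString();
        Files.copy(in, root.resolve(id), StandardCopyOption.REPLACE_EXISTING);
        return id;
    }

    /** Opens the stored file for reading; the caller is responsible for closing it. */
    InputStream retrieveInputStream(String id) throws IOException {
        return Files.newInputStream(root.resolve(id));
    }

    /** Size is read from the file system after storage rather than from a buffer. */
    long size(String id) throws IOException {
        return Files.size(root.resolve(id));
    }
}
```

With this shape, peak memory stays close to the copy buffer size regardless of file size, at the cost of the caller having to manage stream lifetimes.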
Reviewed changes
Copilot reviewed 55 out of 57 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/src/core/hooks/tools/split/useSplitOperation.ts | Added empty string case handling in split endpoint selection |
| frontend/package-lock.json | Updated npm dependencies (iconify, posthog, svelte, autoprefixer, etc.) |
| build.gradle | Refactored syncAppVersion task with inline code; reduced LINE coverage threshold from 16% to 13% |
| app/core/src/main/resources/application.properties | Updated Spring Boot 3 error handling properties; disabled built-in ProblemDetails to use custom GlobalExceptionHandler |
| app/core/src/main/java/stirling/software/SPDF/model/json/*.java | Changed `List<Float>` to `float[]` and `List<Integer>` to `int[]` across PDF JSON models |
| app/core/src/main/java/stirling/software/SPDF/service/PdfJsonConversionService.java | Updated to use primitive arrays; renamed isRealJobId to useLazyImages for clarity |
| app/core/src/main/java/stirling/software/SPDF/controller/api/converters/*.java | Updated return types from byte[] to StreamingResponseBody/TempFile for large file operations |
| app/core/src/main/java/stirling/software/SPDF/controller/api/SplitPdf*.java | Refactored split operations to use streaming ZIP output and TempFile instead of ByteArrayOutputStream |
| app/core/src/main/java/stirling/software/SPDF/controller/api/misc/ExtractImagesController.java | Changed to streaming ZIP creation, removed intermediate ByteArrayOutputStream |
| app/core/src/main/java/stirling/software/SPDF/controller/api/pipeline/PipelineController.java | Implemented streaming responses for single and multiple file outputs |
| app/core/src/main/java/stirling/software/SPDF/controller/api/pipeline/PipelineProcessor.java | Added MAX_UNZIP_DEPTH=10 limit to prevent zip bomb attacks; changed character literals |
| app/common/src/main/java/stirling/software/common/util/CbzUtils.java | Refactored to return TempFile and process images one at a time, reducing peak memory usage |
| app/common/src/main/java/stirling/software/common/util/PdfToCbzUtils.java | Changed to return TempFile with streaming ZIP creation |
| app/common/src/main/java/stirling/software/common/util/PdfUtils.java | Optimized single-image rendering by calculating dimensions mathematically instead of rendering |
| app/common/src/main/java/stirling/software/common/service/FileStorage.java | Added retrieveInputStream() and storeInputStream() methods for streaming file operations |
| app/common/src/main/java/stirling/software/common/service/TaskManager.java | Updated ZIP extraction to use streaming instead of buffering entire files in memory |
| app/common/src/main/java/stirling/software/common/model/ApplicationProperties.java | Changed getter calls to direct field access for OAuth and analytics settings |
| Multiple signature/form/provider files | Replaced String literals with character literals for single-character operations (e.g., '.' instead of ".") |
| Multiple controller/util files | Pre-compiled regex patterns as static constants for reuse |
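The `MAX_UNZIP_DEPTH=10` entry for PipelineProcessor.java can be illustrated with a sketch, assuming the limit caps how deeply nested archives are unpacked. `DepthLimitedUnzip` and its methods are hypothetical; the PR's actual implementation is not shown on this page and may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical sketch of a nesting cap for ZIP extraction; not the PR's code. */
final class DepthLimitedUnzip {
    static final int MAX_UNZIP_DEPTH = 10;

    private DepthLimitedUnzip() {}

    /** Lists non-ZIP entry names, descending into nested ZIPs up to maxDepth levels. */
    static List<String> listFiles(InputStream in, int maxDepth) throws IOException {
        List<String> names = new ArrayList<>();
        walk(in, 1, maxDepth, names);
        return names;
    }

    private static void walk(InputStream in, int depth, int maxDepth, List<String> names)
            throws IOException {
        if (depth > maxDepth) {
            throw new IOException("ZIP nesting exceeds limit of " + maxDepth);
        }
        // Deliberately not closed here: closing would also close the enclosing stream.
        ZipInputStream zis = new ZipInputStream(in);
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            if (entry.getName().toLowerCase(Locale.ROOT).endsWith(".zip")) {
                // The current entry's bytes are read as a nested archive, one level deeper.
                walk(zis, depth + 1, maxDepth, names);
            } else {
                names.add(entry.getName());
            }
        }
    }
}
```

A depth cap alone does not bound the total uncompressed size, so a real defense would likely pair it with size or entry-count limits.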
Files not reviewed (1)
- frontend/package-lock.json: Language not supported
/deploypr
Description of Changes
This pull request introduces several improvements and refactorings focused on memory efficiency, stream-based file handling, and code simplification, especially for large files such as ZIP archives and CBZ-to-PDF conversion. The most significant changes include switching from byte-array-based file processing to streaming via `InputStream`, optimizing CBZ-to-PDF conversion to reduce memory usage, and generalizing field access patterns for better code clarity.

File streaming and memory efficiency:
- Added `retrieveInputStream` and `storeInputStream` methods to `FileStorage`, enabling streaming file reads and writes instead of buffering entire files in memory. This is crucial for handling large files efficiently.
- Updated `TaskManager.extractZipToIndividualFiles` to use streaming extraction from ZIP files, storing each extracted file directly via `storeInputStream` and determining the file size after storage. This eliminates the need to buffer entire files in memory.

CBZ-to-PDF conversion improvements:
- Refactored `CbzUtils.convertCbzToPdf` to return a `TempFile` instead of a byte array, process images one at a time (reducing peak memory usage), and write the output PDF directly to disk. Also, Ghostscript optimization now writes to a new temp file if requested.

General code simplification and consistency:
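As an illustration of the temp-file-based output described for `CbzUtils.convertCbzToPdf`, the following sketch writes a ZIP archive straight to disk and hands back a handle that deletes the file on close. `ScratchFile` and `ZipToDisk` are hypothetical names; the project's own `TempFile` class is not shown on this page, so its API is not assumed here.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical temp-file handle that deletes its file on close; not the project's TempFile. */
final class ScratchFile implements AutoCloseable {
    private final Path path;

    ScratchFile(String suffix) throws IOException {
        this.path = Files.createTempFile("sketch-", suffix);
    }

    Path path() {
        return path;
    }

    @Override
    public void close() throws IOException {
        Files.deleteIfExists(path);
    }
}

/** Hypothetical helper: streams source files into a ZIP on disk, one entry at a time. */
final class ZipToDisk {
    private ZipToDisk() {}

    static ScratchFile write(Map<String, Path> sources) throws IOException {
        ScratchFile out = new ScratchFile(".zip");
        try (OutputStream os = Files.newOutputStream(out.path());
                ZipOutputStream zos = new ZipOutputStream(os)) {
            for (Map.Entry<String, Path> source : sources.entrySet()) {
                zos.putNextEntry(new ZipEntry(source.getKey()));
                // Copies the file into the archive without loading it fully into memory.
                Files.copy(source.getValue(), zos);
                zos.closeEntry();
            }
        } catch (IOException | RuntimeException ex) {
            out.close(); // do not leak the temp file when writing fails
            throw ex;
        }
        return out;
    }
}
```

Callers would typically wrap the returned handle in try-with-resources so the temp file is removed once the response has been sent.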
Checklist
General
Documentation
Translations (if applicable)
scripts/counter_translation.py
UI Changes (if applicable)
Testing (if applicable)