Implementation of streaming YAML parsing #106
Merged
jsvd merged 9 commits into logstash-plugins:main on Aug 4, 2025
jsvd
commented
Aug 4, 2025
andsel
requested changes
Aug 4, 2025
andsel
left a comment
Left a couple of suggestions, then I'll LGTM
Co-authored-by: Andrea Selva <selva.andre@gmail.com>
Instead of running Psych::Parse, which builds the entire Psych tree before emitting the Ruby hash (the dictionary), the plugin should be able to build the final hash incrementally as the parser identifies YAML elements (streaming parsing).
This changeset adds opt-in streaming parsing of YAML files, based on snakeyaml-engine.
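To illustrate the idea of streaming parsing described above, here is a minimal sketch in plain Ruby. It is not the code in this PR: it uses Psych's own event API (`Psych::Parser` with a `Psych::Handler` subclass) rather than snakeyaml-engine, and the class name `StreamingHashBuilder` is hypothetical. It assumes the simplest case: every scalar is kept as a string (no type resolution) and anchors and aliases are ignored.

```ruby
require "psych"

# Hypothetical handler (illustration only, not the plugin's implementation).
# It fills the final Hash/Array structure directly from parser events,
# so no intermediate Psych node tree is ever built.
class StreamingHashBuilder < Psych::Handler
  NO_KEY = Object.new.freeze                    # marks "no key read yet"
  Frame  = Struct.new(:container, :pending_key) # one frame per open collection

  attr_reader :result

  def initialize
    super
    @stack  = []   # collections that are currently open
    @result = nil  # set once the top-level node is closed
  end

  def start_mapping(*)
    @stack.push(Frame.new({}, NO_KEY))
  end

  def start_sequence(*)
    @stack.push(Frame.new([], NO_KEY))
  end

  def end_mapping
    add(@stack.pop.container)
  end

  def end_sequence
    add(@stack.pop.container)
  end

  # Assumption: scalars stay as strings; tags and styles are ignored.
  def scalar(value, *)
    add(value)
  end

  private

  # Attach a finished value to the innermost open collection.
  def add(value)
    frame = @stack.last
    if frame.nil?
      @result = value                           # top-level node
    elsif frame.container.is_a?(Array)
      frame.container << value                  # sequence item
    elsif frame.pending_key.equal?(NO_KEY)
      frame.pending_key = value                 # first scalar of a pair is the key
    else
      frame.container[frame.pending_key] = value
      frame.pending_key = NO_KEY
    end
  end
end

handler = StreamingHashBuilder.new
Psych::Parser.new(handler).parse("a:\n  b: 1\n  c: [x, y]\n")
dictionary = handler.result
```

The only state kept while parsing is the stack of open collections plus the result being built, which is where the memory savings of an event-driven approach come from. `Psych::Parser#parse` also accepts an IO object, so a file can be passed without first reading it into a string.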
The difference when loading a 26 MB YAML file with 50,000 object entries is significant in terms of memory pressure.
With the current Psych parser in non-streaming mode, loading the YAML and generating the final dictionary requires about 1 GB of memory.
With this PR, the streaming snakeyaml-engine parser requires about 330 MB.
It also reduces loading time from 6 seconds to 2 seconds on my laptop.
fixes #107