implementation of streaming yaml parsing by jsvd · Pull Request #106 · logstash-plugins/logstash-filter-translate

jsvd · 2025-07-29T14:06:30Z

Instead of running Psych::Parse, which builds the entire Psych tree before emitting the Ruby hash dictionary, the plugin should be able to build the final hash dictionary incrementally as the parser identifies YAML elements (streaming parsing).
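To illustrate the idea, here is a minimal sketch of event-driven dictionary building. It is not the code in this PR: it uses Psych's own event API (`Psych::Handler` and `Psych::Parser`) rather than snakeyaml-engine, and `FlatDictionaryHandler` and `load_flat_dictionary` are hypothetical names. It assumes a flat top-level mapping of scalar keys to scalar values and does no type resolution, so every value stays a string.

```ruby
require "psych"

# Hypothetical handler (not the plugin's class): fills a Hash directly from
# parser events, so no intermediate YAML tree is built.
class FlatDictionaryHandler < Psych::Handler
  attr_reader :dictionary

  def initialize
    super
    @dictionary = {}
    @depth = 0          # current mapping nesting level
    @pending_key = nil  # key waiting for its value
  end

  def start_mapping(anchor, tag, implicit, style)
    @depth += 1
    raise ArgumentError, "nested mappings are not handled in this sketch" if @depth > 1
  end

  def end_mapping
    @depth -= 1
  end

  def start_sequence(anchor, tag, implicit, style)
    raise ArgumentError, "sequences are not handled in this sketch"
  end

  # Scalars alternate between key and value inside the top-level mapping.
  def scalar(value, anchor, tag, plain, quoted, style)
    return unless @depth == 1

    if @pending_key.nil?
      @pending_key = value
    else
      @dictionary[@pending_key] = value
      @pending_key = nil
    end
  end
end

# Accepts a String or an IO; entries are added as events arrive.
def load_flat_dictionary(yaml_source)
  handler = FlatDictionaryHandler.new
  Psych::Parser.new(handler).parse(yaml_source)
  handler.dictionary
end
```

Calling `load_flat_dictionary(File.open(path))` with a hypothetical `path` would let the parser read from the file handle instead of a fully loaded string.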

This changeset adds opt-in streaming parsing for YAML files, based on snakeyaml-engine.

When loading a 26MB YAML file with 50,000 object entries, the difference in memory pressure is significant.

With the current Psych parser in non-streaming mode, the memory required to load the YAML and generate the final dictionary is about 1GB.

With this PR, using the streaming snakeyaml-engine parser, it is about 330MB.

It also reduces loading time from 6 seconds to 2 on my laptop.
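A rough way to reproduce this kind of comparison is sketched below. `measure_load` is a hypothetical helper, the input is a small synthetic dictionary rather than the 26MB file, and the allocated-object count is only a proxy for the heap figures quoted above. It assumes MRI, where `GC.stat(:total_allocated_objects)` is available; on JRuby the JVM's own tooling would be the more natural way to observe heap usage.

```ruby
require "benchmark"
require "psych"

# Hypothetical helper: runs the block once and reports wall-clock seconds and
# the number of Ruby objects allocated while it ran (MRI only).
def measure_load
  GC.start
  before = GC.stat(:total_allocated_objects)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  allocated = GC.stat(:total_allocated_objects) - before
  { result: result, seconds: seconds, allocated_objects: allocated }
end

# Synthetic flat dictionary, far smaller than the file described in the PR.
yaml_text = (1..1_000).map { |i| "key#{i}: value#{i}\n" }.join

stats = measure_load { Psych.safe_load(yaml_text) }
puts format("%.3fs, %d objects allocated", stats[:seconds], stats[:allocated_objects])
```

Wrapping a different loader in the same block gives numbers that can be compared side by side.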

fixes #107

spec/fixtures/dict.yml

andsel

Left a couple of suggestions, then I'll LGTM

lib/logstash/filters/dictionary/streaming_yaml_parser.rb

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

andsel

LGTM

jsvd added 4 commits July 29, 2025 15:06

draft implementation of streaming yaml parsing

bcf4bb2

remove instance variable and add yaml_load_strategy

d576b09

fix tests and fix non quoted events

e16be4d

add docs for yaml_load_strategy

bbb8547

jsvd marked this pull request as ready for review July 30, 2025 18:48

jsvd changed the title ~~draft implementation of streaming yaml parsing~~ implementation of streaming yaml parsing Jul 30, 2025

jsvd added 2 commits July 31, 2025 10:23

[skip ci] bump to 3.5.0

ace3d23

remove global java_imports and remove debugging puts

c2c4586

jsvd closed this Aug 4, 2025

jsvd reopened this Aug 4, 2025

andsel self-requested a review August 4, 2025 10:44

jsvd commented Aug 4, 2025

spec/fixtures/dict.yml (outdated, resolved)

Update spec/fixtures/dict.yml

3e342fd

andsel requested changes Aug 4, 2025

lib/logstash/filters/dictionary/streaming_yaml_parser.rb (outdated, resolved)

lib/logstash/filters/dictionary/streaming_yaml_parser.rb (resolved)

jsvd and others added 2 commits August 4, 2025 14:19

Update lib/logstash/filters/dictionary/streaming_yaml_parser.rb

d0c92ba

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

Update lib/logstash/filters/dictionary/streaming_yaml_parser.rb

226ac12

Co-authored-by: Andrea Selva <selva.andre@gmail.com>

andsel approved these changes Aug 4, 2025

jsvd merged commit 38e4424 into logstash-plugins:main Aug 4, 2025
2 of 3 checks passed

jsvd deleted the streaming_yaml_parsing branch August 4, 2025 14:04

jsvd restored the streaming_yaml_parsing branch August 4, 2025 14:04

jsvd deleted the streaming_yaml_parsing branch August 4, 2025 14:14

jsvd merged 9 commits into logstash-plugins:main from jsvd:streaming_yaml_parsing

2 participants