[Observability:Streams] Fix too_small zod error for ai pipeline suggestions that have empty string grok patterns#251113

Merged
couvq merged 5 commits into elastic:main from couvq:fix_ai_suggestion_too_small_error
Feb 17, 2026

Conversation

@couvq
Contributor

@couvq couvq commented Jan 30, 2026

Closes https://github.com/elastic/observability-error-backlog/issues/407
Closes https://github.com/elastic/observability-error-backlog/issues/452

Description

The suggestions pipeline was generating grok processors with empty-string patterns, which triggered a zod `too_small` validation error when generating a pipeline suggestion. This PR filters out any empty-string patterns, which resolves the error we have been seeing.
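The filter can be sketched roughly like this (a minimal sketch with illustrative names, not the actual Kibana code; zod's `z.string().min(1)` is what reports a `too_small` issue for an empty string):

```typescript
// Minimal sketch, assuming the suggestion schema validates each grok
// pattern with something like zod's z.string().min(1), which reports a
// `too_small` issue for an empty string. Names are illustrative only.
function sanitizeGrokPatterns(patterns: string[]): string[] {
  // Drop empty-string patterns before validation so the schema never sees them.
  return patterns.filter((pattern) => pattern.length > 0);
}

const suggested = ['%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:message}', ''];
const clean = sanitizeGrokPatterns(suggested);
// clean keeps only the non-empty pattern
```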

Before

Screen.Recording.2026-01-30.at.9.22.55.AM.mov

After

Screen.Recording.2026-01-30.at.12.28.12.PM.mov
@couvq couvq changed the title fix too_small error for ai pipeline suggestions Jan 30, 2026
@couvq couvq added labels: backport:version, release_note:fix, Team:obs-onboarding, Feature:Streams, v9.4.0, v9.3.1 Jan 30, 2026
@couvq couvq marked this pull request as ready for review January 30, 2026 17:33
@couvq couvq requested review from a team as code owners January 30, 2026 17:33
@elasticmachine
Contributor

Pinging @elastic/obs-onboarding-team (Team:obs-onboarding)

@flash1293
Contributor

@couvq Sorry for the late reply - I tested this and it seems to work fine in the UI.

However, the evals return a zero score:

Screenshot 2026-02-06 at 12 10 30

I took a look at the traces and it can't figure out how to create a pipeline that's actually parsing something, so it just gives up. Which might be OK for the data at hand, but then it's not a good eval, since the expected thing happens, the score shouldn't be 0.

Actually for this data this is probably the better behavior than trying to invent a meaningless pipeline that breaks more than it actually does (a good test we are currently missing). I'd say we should change the eval to expect no pipeline (0 processing steps) in this case.

Wdyt @LucaWintergerst ? If you take a look at the sample data, what outcome would you expect from the LLM?

@LucaWintergerst
Contributor

I agree, not getting a result here is the better outcome we'd want to test for. If it does suggest one, that would indicate that it's very, very eager to do things even if it has very few good reasons to actually try processing things.

@couvq couvq force-pushed the fix_ai_suggestion_too_small_error branch from 4cd4c7b to ea59472 Compare February 15, 2026 21:21
@couvq
Contributor Author

couvq commented Feb 15, 2026

> @couvq Sorry for the late reply - I tested this and it seems to work fine in the UI.
>
> However, the evals return a zero score:
>
> Screenshot 2026-02-06 at 12 10 30
>
> I took a look at the traces and it can't figure out how to create a pipeline that's actually parsing something, so it just gives up. Which might be OK for the data at hand, but then it's not a good eval, since the expected thing happens, the score shouldn't be 0.
>
> Actually for this data this is probably the better behavior than trying to invent a meaningless pipeline that breaks more than it actually does (a good test we are currently missing). I'd say we should change the eval to expect no pipeline (0 processing steps) in this case.

@flash1293 I've added a commit to change the eval to expect no pipeline ea59472

@flash1293
Contributor

flash1293 commented Feb 16, 2026

@couvq it still returns a super low score, I think we need a bit of a deeper change here.
Screenshot 2026-02-16 at 10 01 28

Check how to run the evals locally to iterate: x-pack/platform/packages/shared/kbn-evals-suite-streams/README.md

@couvq
Contributor Author

couvq commented Feb 16, 2026

> @couvq it still returns a super low score, I think we need a bit of a deeper change here.
>
> Screenshot 2026-02-16 at 10 01 28
>
> Check how to run the evals locally to iterate: x-pack/platform/packages/shared/kbn-evals-suite-streams/README.md

@flash1293 Are we targeting to get pretty close to a 1.0 score ideally? Which model are you using?

@flash1293
Contributor

@couvq If we think the behavior is correct, then a good score would make sense, right? 1 is perfect. I'm using 4.5 sonnet

@couvq
Contributor Author

couvq commented Feb 17, 2026

@flash1293 I made some changes to the LLM prompt to add instructions on when not to create a pipeline, to handle this case. I also updated the eval logic to give a perfect score when no pipeline was generated and none was expected, and a 0 score when no pipeline was generated but one was expected. Now the LLM properly generates an empty pipeline and gets a perfect score on the new eval. How do you feel about this approach?
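The scoring rule described above can be sketched as follows (hypothetical names, not the actual eval suite code):

```typescript
// Hedged sketch of the described rule: an empty suggestion scores 1 when
// no pipeline was expected and 0 when processors were expected; any other
// case falls through to the regular quality scoring (null here).
// Interface and function names are illustrative, not from the PR.
interface PipelineEvalCase {
  expectedStepCount: number;
  generatedStepCount: number;
}

function scoreEmptyPipeline(c: PipelineEvalCase): number | null {
  if (c.generatedStepCount === 0) {
    return c.expectedStepCount === 0 ? 1 : 0;
  }
  return null; // defer to the normal quality scorer
}
```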

@flash1293
Contributor

@couvq Thanks a lot for this! Could you run the whole pipeline suggestion eval suite and paste the result here? Soon this should work automatically, but it doesn't yet.

@couvq
Contributor Author

couvq commented Feb 17, 2026

> @couvq Thanks a lot for this! Could you run the whole pipeline suggestion eval suite and paste the result here? Soon this should work automatically, but it doesn't yet.

      ╔══════════════════════════════════╤═══╤══════════════════════╤════════════════════════╗
      ║ Dataset                          │ # │ llm_pipeline_quality │ pipeline_quality_score ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - structured │ 1 │              mean: 1 │                mean: 1 ║
      ║                                  │   │            median: 1 │              median: 1 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │                 min: 1 ║
      ║                                  │   │               max: 1 │                 max: 1 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - HDFS       │ 1 │              mean: 1 │             mean: 0.95 ║
      ║                                  │   │            median: 1 │           median: 0.95 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │              min: 0.95 ║
      ║                                  │   │               max: 1 │              max: 0.95 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Overall                          │ 2 │              mean: 1 │             mean: 0.97 ║
      ║                                  │   │            median: 1 │           median: 0.97 ║
      ║                                  │   │               std: 0 │              std: 0.04 ║
      ║                                  │   │               min: 1 │              min: 0.95 ║
      ║                                  │   │               max: 1 │                 max: 1 ║
      ╚══════════════════════════════════╧═══╧══════════════════════╧════════════════════════╝

@flash1293 Looks like my changes broke 4 of the preexisting tests, as it is now a bit too eager to return an empty pipeline. I'll tweak the LLM prompt again to fix those.

@couvq
Contributor Author

couvq commented Feb 17, 2026

@flash1293 fixed and now all the evals run properly

 ═══ EVALUATION RESULTS ═══
      ╔══════════════════════════════════╤═══╤══════════════════════╤════════════════════════╗
      ║ Dataset                          │ # │ llm_pipeline_quality │ pipeline_quality_score ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - Apache     │ 1 │              mean: 1 │             mean: 0.98 ║
      ║                                  │   │            median: 1 │           median: 0.98 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │              min: 0.98 ║
      ║                                  │   │               max: 1 │              max: 0.98 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - OpenSSH    │ 1 │              mean: 1 │             mean: 0.88 ║
      ║                                  │   │            median: 1 │           median: 0.88 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │              min: 0.88 ║
      ║                                  │   │               max: 1 │              max: 0.88 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - structured │ 1 │              mean: 1 │                mean: 1 ║
      ║                                  │   │            median: 1 │              median: 1 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │                 min: 1 ║
      ║                                  │   │               max: 1 │                 max: 1 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - Spark      │ 1 │              mean: 1 │             mean: 0.97 ║
      ║                                  │   │            median: 1 │           median: 0.97 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │              min: 0.97 ║
      ║                                  │   │               max: 1 │              max: 0.97 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - HDFS       │ 1 │              mean: 1 │             mean: 0.95 ║
      ║                                  │   │            median: 1 │           median: 0.95 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │              min: 0.95 ║
      ║                                  │   │               max: 1 │              max: 0.95 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Pipeline Suggestion - Zookeeper  │ 1 │              mean: 1 │             mean: 0.95 ║
      ║                                  │   │            median: 1 │           median: 0.95 ║
      ║                                  │   │               std: 0 │                 std: 0 ║
      ║                                  │   │               min: 1 │              min: 0.95 ║
      ║                                  │   │               max: 1 │              max: 0.95 ║
      ╟──────────────────────────────────┼───┼──────────────────────┼────────────────────────╢
      ║ Overall                          │ 6 │              mean: 1 │             mean: 0.95 ║
      ║                                  │   │            median: 1 │           median: 0.95 ║
      ║                                  │   │               std: 0 │              std: 0.04 ║
      ║                                  │   │               min: 1 │              min: 0.88 ║
      ║                                  │   │               max: 1 │                 max: 1 ║
      ╚══════════════════════════════════╧═══╧══════════════════════╧════════════════════════╝
@flash1293
Contributor

@couvq I'm not sure how 9c7d4c9 discourages the LLM from committing an empty pipeline, can you explain?

@elasticmachine
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

@couvq
Contributor Author

couvq commented Feb 17, 2026

> @couvq I'm not sure how 9c7d4c9 discourages the LLM from committing an empty pipeline, can you explain?

@flash1293 The intention there is to explicitly discourage committing an empty pipeline when a parsing processor is provided. I added it because the 4 failing tests expected processors but the LLM was committing an empty pipeline.

Contributor

@flash1293 flash1293 left a comment


ooh, got it, LGTM

@couvq couvq merged commit 8405020 into elastic:main Feb 17, 2026
16 checks passed
@couvq
Contributor Author

couvq commented Feb 17, 2026

@flash1293 thanks for the thorough review!

@kibanamachine
Contributor

Starting backport for target branches: 9.3

https://github.com/elastic/kibana/actions/runs/22109625238

@kibanamachine
Contributor

💔 All backports failed

| Branch | Result |
| --- | --- |
| 9.3 | Backport failed because of merge conflicts |

Manual backport

To create the backport manually run:

`node scripts/backport --pr 251113`

Questions?

Please refer to the Backport tool documentation

patrykkopycinski pushed a commit to patrykkopycinski/kibana that referenced this pull request Feb 19, 2026
…stions that have empty string grok patterns (elastic#251113)

Closes elastic/observability-error-backlog#407
Closes elastic/observability-error-backlog#452

## Description
The suggestions pipeline was generating grok patterns that had empty
string patterns, leading to a `too_small` error when generating a
pipeline suggestion. This PR filters out any patterns that have empty
string inputs, which resolved the error we have been seeing.

## Before

https://github.com/user-attachments/assets/c8cdb277-d0f0-4272-b94d-0aa244c841a9

## After

https://github.com/user-attachments/assets/8864ad1a-51c9-4b6a-b11c-e3d48668a5ad
ersin-erdal pushed a commit to ersin-erdal/kibana that referenced this pull request Feb 19, 2026
…stions that have empty string grok patterns (elastic#251113)

@kibanamachine kibanamachine added the backport missing label Feb 19, 2026
@kibanamachine
Contributor

Friendly reminder: Looks like this PR hasn't been backported yet.
To create backports automatically, add a backport:* label, or prevent reminders by adding the backport:skip label.
You can also create backports manually by running `node scripts/backport --pr 251113` locally.
cc: @couvq

@couvq couvq added the backport:skip label and removed the backport missing and backport:version labels Feb 20, 2026