[Observability:Streams] Fix too_small zod error for ai pipeline suggestions that have empty string grok patterns#251113
Pinging @elastic/obs-onboarding-team (Team:obs-onboarding) |
|
@couvq Sorry for the late reply - I tested this and it seems to work fine in the UI. However, the evals return a zero score:
I took a look at the traces and it can't figure out how to create a pipeline that actually parses something, so it just gives up. That might be OK for the data at hand, but then it's not a good eval: the expected thing happens, so the score shouldn't be 0. Actually, for this data this is probably better behavior than trying to invent a meaningless pipeline that breaks more than it fixes (a good test we are currently missing). I'd say we should change the eval to expect no pipeline (0 processing steps) in this case. Wdyt @LucaWintergerst? If you take a look at the sample data, what outcome would you expect from the LLM? |
|
I agree, not getting a result here is the better outcome we'd want to test for. If it does suggest one, that would indicate that it's very, very eager to do things even if it has very few good reasons to actually try processing things |
Force-pushed from 4cd4c7b to ea59472
@flash1293 I've added a commit to change the eval to expect no pipeline ea59472 |
|
@couvq it still returns a super low score, I think we need a bit of a deeper change here. Check how to run the evals locally to iterate: |
@flash1293 Are we targeting a score pretty close to 1.0, ideally? Which model are you using? |
This reverts commit ea59472.
|
@couvq If we think the behavior is correct, then a good score would make sense, right? 1 is perfect. I'm using 4.5 sonnet |
|
@flash1293 I made some changes to the LLM prompt to add instructions on when not to create a pipeline to handle this case. I also updated the eval logic to provide a perfect score when no pipeline was generated and that was the expected result, as well as providing a 0 score if no pipeline was generated but one was expected. Now the LLM is properly generating an empty pipeline with a perfect score for the new eval. How do you feel about this approach? |
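The scoring rule described in the comment above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `PipelineEvalInput` shape, the `scorePipelineSuggestion` name, and the `qualityScore` field are all hypothetical, and the handling of "a pipeline was generated but none was expected" is an assumption, since the comment does not state it.

```typescript
// Hypothetical input shape; the real eval types in the PR may differ.
interface PipelineEvalInput {
  expectedStepCount: number; // 0 means "no pipeline expected"
  generatedStepCount: number; // 0 means the LLM returned an empty pipeline
  qualityScore: number; // score in [0, 1] from the regular evaluator
}

function scorePipelineSuggestion(input: PipelineEvalInput): number {
  const expectsPipeline = input.expectedStepCount > 0;
  const generatedPipeline = input.generatedStepCount > 0;

  if (!generatedPipeline) {
    // No pipeline generated: perfect score if none was expected, zero otherwise.
    return expectsPipeline ? 0 : 1;
  }

  if (!expectsPipeline) {
    // Assumption: a pipeline invented for unparseable data scores zero.
    return 0;
  }

  // Both expected and generated: defer to the regular quality score.
  return input.qualityScore;
}
```

Handling the empty case before the regular scoring keeps "correctly did nothing" from being treated as a failed generation, which is what caused the zero score discussed earlier in the thread.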
|
@couvq Thanks a lot for this! Could you run the whole pipeline suggestion eval suite and paste the result here? Soon this should work automatically, but it doesn't yet. |
@flash1293 Looks like my changes broke 4 of the preexisting tests, as it is now a bit too eager to return an empty pipeline. I'll tweak the LLM prompt again to fix those. |
|
@flash1293 fixed and now all the evals run properly |
💚 Build Succeeded
|
@flash1293 The intention there is to explicitly discourage committing an empty pipeline when a parsing processor is provided. I added it because the 4 failing tests expected processors but the LLM was committing an empty pipeline. |
|
@flash1293 thanks for the thorough review! |
|
Starting backport for target branches: 9.3 |
💔 All backports failed
Manual backport
To create the backport manually run:
Questions?
Please refer to the Backport tool documentation |
…stions that have empty string grok patterns (elastic#251113) Closes elastic/observability-error-backlog#407 Closes elastic/observability-error-backlog#452 ## Description The suggestions pipeline was generating grok patterns that had empty string patterns, leading to a `too_small` error when generating a pipeline suggestion. This PR filters out any patterns that have empty string inputs, which resolved the error we have been seeing. ## Before https://github.com/user-attachments/assets/c8cdb277-d0f0-4272-b94d-0aa244c841a9 ## After https://github.com/user-attachments/assets/8864ad1a-51c9-4b6a-b11c-e3d48668a5ad
|
Friendly reminder: Looks like this PR hasn’t been backported yet. |




Closes https://github.com/elastic/observability-error-backlog/issues/407
Closes https://github.com/elastic/observability-error-backlog/issues/452
Description
The suggestions pipeline was generating grok patterns that had empty string patterns, leading to a `too_small` error when generating a pipeline suggestion. This PR filters out any patterns that have empty string inputs, which resolved the error we have been seeing.
Before
Screen.Recording.2026-01-30.at.9.22.55.AM.mov
After
Screen.Recording.2026-01-30.at.12.28.12.PM.mov
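For readers without access to the diff, the filtering described in the Description above might look roughly like the sketch below. It is illustrative only: `GrokProcessorSuggestion` and `removeEmptyGrokPatterns` are hypothetical names, and dropping the whole processor when no patterns remain is an assumption rather than something the PR states. In zod, `too_small` is the issue code reported when a value falls below a schema's minimum size, which is consistent with an empty string reaching a schema that requires non-empty patterns.

```typescript
// Hypothetical shape of a suggested grok processor; the real type may differ.
interface GrokProcessorSuggestion {
  from: string; // source field to parse
  patterns: string[]; // grok patterns suggested by the LLM
}

function removeEmptyGrokPatterns(
  processor: GrokProcessorSuggestion
): GrokProcessorSuggestion | undefined {
  // Keep only patterns that are not the empty string.
  const patterns = processor.patterns.filter((pattern) => pattern !== "");

  // Assumption: a grok processor with no patterns left is dropped entirely.
  return patterns.length > 0 ? { ...processor, patterns } : undefined;
}
```

Filtering before validation means a single empty entry no longer fails the whole suggestion, while the remaining patterns are passed through unchanged.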