Skip to content

Add contains_any and contains_all checksFeat/contains any all checks#2431

Open
cosmicdk wants to merge 6 commits intoGiskard-AI:mainfrom
cosmicdk:feat/contains-any-all-checks
Open

Add contains_any and contains_all checksFeat/contains any all checks#2431
cosmicdk wants to merge 6 commits intoGiskard-AI:mainfrom
cosmicdk:feat/contains-any-all-checks

Conversation

@cosmicdk
Copy link
Copy Markdown

Summary

Fixes #2361.

This PR adds two built-in text matching checks:

  • ContainsAny registered as contains_any
  • ContainsAll registered as contains_all

They allow users to validate whether an output contains at least one or all values from a provided list of strings.

Changes

  • Reused existing text extraction and normalization behavior
  • Added case sensitivity support
  • Exported the new checks from giskard.checks.builtin and giskard.checks
  • Added tests for any/all matching, case sensitivity, JSONPath extraction, Unicode normalization, empty lists, and missing text key handling

Validation

  • Added unit tests in tests/builtin/test_string_matching.py
  • Maintainer CI / local pytest run
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces two new text matching checks, ContainsAny and ContainsAll, along with a shared base class ListStringMatching to support validation against lists of strings with Unicode normalization and case sensitivity. Comprehensive unit tests have been added to verify the new functionality. Feedback suggests aligning the default case_sensitive parameter with existing checks for consistency and optimizing the ContainsAll logic to identify matched and missing values in a single pass.

description="Unicode normalization form to apply (NFC, NFD, NFKC, NFKD). Defaults to NFKC.",
)
case_sensitive: bool = Field(
default=False,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The default value for case_sensitive is set to False, which is inconsistent with the existing StringMatching check in the same module (which defaults to True at line 222). To ensure a consistent user experience across similar text matching checks, it is recommended to align the default values unless there is a specific reason for this difference.

    case_sensitive: bool = Field(
        default=True,
Comment on lines +433 to +441
missing_values = [
value
for value, formatted_value in zip(self.values, formatted_values, strict=True)
if formatted_value not in formatted_text
]
matched_values = [value for value in self.values if value not in missing_values]

details["matched_values"] = matched_values
details["missing_values"] = missing_values
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation of ContainsAll.run performs two separate passes over the values and includes an O(N*M) check when calculating matched_values (where N is the number of values and M is the number of missing values). This can be optimized into a single pass that populates both lists simultaneously, improving performance and readability.

Suggested change
missing_values = [
value
for value, formatted_value in zip(self.values, formatted_values, strict=True)
if formatted_value not in formatted_text
]
matched_values = [value for value in self.values if value not in missing_values]
details["matched_values"] = matched_values
details["missing_values"] = missing_values
matched_values = []
missing_values = []
for value, formatted_value in zip(self.values, formatted_values, strict=True):
if formatted_value in formatted_text:
matched_values.append(value)
else:
missing_values.append(value)
details["matched_values"] = matched_values
details["missing_values"] = missing_values
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

1 participant