feat(checks): add ContainsAny and ContainsAll builtin checks (#2361) by abhigyan631 · Pull Request #2380 · Giskard-AI/giskard-oss

abhigyan631 · 2026-04-10T09:11:03Z

Description

Summary

Added two new built-in string checks (ContainsAny and ContainsAll) to validate output text against a list of strings. This enables evaluating topic coverage and disclaimers natively without writing custom lambda functions.

Implementation Details

Implemented in contains_matching.py, following the existing StringMatching pattern.
Both checks support JSONPath extraction, Unicode normalization (NFKC by default), and a case-sensitivity setting.
Exported at the top level so users can import them directly: from giskard.checks import ContainsAny, ContainsAll
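
To illustrate what normalization and case handling mean for matching, here is a minimal standalone sketch. It does not import giskard: `format_str` is a hypothetical stand-in for the PR's `_format_str`, and it assumes that `normalize_string` behaves like `unicodedata.normalize`, which this page does not confirm.

```python
import unicodedata
from typing import Optional


def format_str(
    value: str,
    normalization_form: Optional[str] = "NFKC",
    case_sensitive: bool = False,
) -> str:
    """Hypothetical stand-in for the PR's _format_str helper."""
    if normalization_form is not None:
        # Assumption: normalize_string wraps unicodedata.normalize.
        value = unicodedata.normalize(normalization_form, value)
    if not case_sensitive:
        value = value.lower()
    return value


# The first character is a fullwidth 'M' (U+FF2D); NFKC folds it to ASCII 'M'.
text = "\uff2dachine learning is a subset of AI."

# Default settings: NFKC plus case-insensitive substring matching.
found_default = format_str("machine learning") in format_str(text)

# Case-sensitive matching: "Machine Learning" differs from "Machine learning".
found_case_sensitive = format_str(
    "Machine Learning", case_sensitive=True
) in format_str(text, case_sensitive=True)

# Without normalization the fullwidth character no longer matches ASCII 'm'.
found_unnormalized = format_str(
    "machine learning", normalization_form=None
) in format_str(text, normalization_form=None)
```

Both the text and each value go through the same formatting step before the substring test, which is why the defaults match across case and compatibility characters.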

Testing

Added unit tests in the test_contains_matching.py suite.
Validated all permutations of case sensitivity, extraction failure, missing text, and vacuous truths against empty lists. All 10 added tests pass.
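
The "vacuous truths" case follows from the control flow in the diff: with an empty list, the ContainsAny loop never finds a match, while ContainsAll finds nothing missing. A minimal sketch with hypothetical helper names (plain functions, not the actual check classes):

```python
def contains_any(text: str, values: list[str]) -> bool:
    # Mirrors the loop in ContainsAny.run: succeed on the first hit.
    for value in values:
        if value in text:
            return True
    return False


def contains_all(text: str, values: list[str]) -> bool:
    # Mirrors ContainsAll.run: succeed when nothing is missing.
    missing = [v for v in values if v not in text]
    return not missing


any_on_empty = contains_any("some output", [])
all_on_empty = contains_all("some output", [])
```

With an empty `values` list, the "any" variant fails and the "all" variant passes, matching Python's built-in `any([])` and `all([])`.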

Related Issue

Resolves #2361

Type of Change

📚 Examples / docs / tutorials / dependencies update
🔧 Bug fix (non-breaking change which fixes an issue)
🥂 Improvement (non-breaking change which improves an existing feature)
🚀 New feature (non-breaking change which adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to change)
🔐 Security fix

Checklist

I've read the CODE_OF_CONDUCT.md document.
I've read the CONTRIBUTING.md guide.
I've written tests for all new methods and classes that I created.
I've written the docstring in NumPy format for all the methods and classes that I created or modified.
I've updated uv.lock by running uv lock (only applicable when pyproject.toml has been modified)

gemini-code-assist

Code Review

This pull request introduces two new check implementations, ContainsAny and ContainsAll, which verify if specific keywords exist within a text string. These checks support Unicode normalization and case-insensitive matching. Feedback suggests refactoring the classes to a common base to reduce code duplication, pre-formatting values for better performance, and handling potential serialization issues with the NoMatch object in the result details. Additionally, it was recommended to ensure that missing values in ContainsAll are reported uniquely for better clarity.

gemini-code-assist · 2026-04-10T09:12:54Z

+@Check.register("contains_any")
+class ContainsAny[InputType, OutputType, TraceType: Trace](  # pyright: ignore[reportMissingTypeArgument]
+    Check[InputType, OutputType, TraceType]
+):
+    """Check that passes if the text contains AT LEAST ONE value from the list.
+
+    Examples
+    --------
+    Direct text and values::
+
+        check = ContainsAny(
+            text="Machine learning is a subset of AI.",
+            values=["machine learning", "ML", "artificial intelligence"],
+            case_sensitive=False,
+        )
+
+    Extract text from trace::
+
+        check = ContainsAny(
+            text_key="trace.last.outputs.response",
+            values=["consult a doctor", "medical advice"],
+        )
+    """
+
+    text: str | None = Field(
+        default=None,
+        description="The text string to search within. If None, extracted from trace using text_key.",
+    )
+    text_key: str = Field(
+        default="trace.last.outputs",
+        description="JSONPath expression to extract the text from the trace.",
+    )
+    values: list[str] = Field(
+        description="List of strings to check against. Passes if at least one is found.",
+    )
+    normalization_form: NormalizationForm | None = Field(
+        default="NFKC",
+        description="Unicode normalization form to apply (NFC, NFD, NFKC, NFKD). Defaults to NFKC.",
+    )
+    case_sensitive: bool = Field(
+        default=False,
+        description="If True, matching is case-sensitive. Defaults to False.",
+    )
+
+    def _format_str(self, value: str) -> str:
+        """Format a string for matching by applying normalization and case handling."""
+        value = normalize_string(value, self.normalization_form)
+        if not self.case_sensitive:
+            value = value.lower()
+        return value
+
+    @override
+    async def run(self, trace: TraceType) -> CheckResult:
+        """Execute the contains-any check."""
+        text = provided_or_resolve(
+            trace, key=self.text_key, value=provide_not_none(self.text)
+        )
+
+        details = {"text": text, "values": self.values}
+
+        if isinstance(text, NoMatch):
+            return CheckResult.failure(
+                message=f"No value found for text key '{self.text_key}'.",
+                details=details,
+            )
+
+        if not isinstance(text, str):
+            return CheckResult.failure(
+                message=f"Value for text key '{self.text_key}' is not a string, got {type(text)}.",
+                details=details,
+            )
+
+        formatted_text = self._format_str(text)
+
+        for value in self.values:
+            if self._format_str(value) in formatted_text:
+                return CheckResult.success(
+                    message=f"The text contains '{value}'.",
+                    details={**details, "matched": value},
+                )
+
+        return CheckResult.failure(
+            message=f"The text does not contain any of: {self.values}.",
+            details=details,
+        )
+
+
+@Check.register("contains_all")
+class ContainsAll[InputType, OutputType, TraceType: Trace](  # pyright: ignore[reportMissingTypeArgument]
+    Check[InputType, OutputType, TraceType]
+):
+    """Check that passes if the text contains EVERY value from the list.
+
+    Examples
+    --------
+    Direct text and values::
+
+        check = ContainsAll(
+            text="The dosage is 200mg. Consult a doctor before use.",
+            values=["dosage", "mg", "consult"],
+            case_sensitive=False,
+        )
+
+    Extract text from trace::
+
+        check = ContainsAll(
+            text_key="trace.last.outputs.response",
+            values=["definition", "example"],
+        )
+    """
+
+    text: str | None = Field(
+        default=None,
+        description="The text string to search within. If None, extracted from trace using text_key.",
+    )
+    text_key: str = Field(
+        default="trace.last.outputs",
+        description="JSONPath expression to extract the text from the trace.",
+    )
+    values: list[str] = Field(
+        description="List of strings to check against. Passes only if every item is found.",
+    )
+    normalization_form: NormalizationForm | None = Field(
+        default="NFKC",
+        description="Unicode normalization form to apply (NFC, NFD, NFKC, NFKD). Defaults to NFKC.",
+    )
+    case_sensitive: bool = Field(
+        default=False,
+        description="If True, matching is case-sensitive. Defaults to False.",
+    )
+
+    def _format_str(self, value: str) -> str:
+        """Format a string for matching by applying normalization and case handling."""
+        value = normalize_string(value, self.normalization_form)
+        if not self.case_sensitive:
+            value = value.lower()
+        return value
+
+    @override
+    async def run(self, trace: TraceType) -> CheckResult:
+        """Execute the contains-all check."""
+        text = provided_or_resolve(
+            trace, key=self.text_key, value=provide_not_none(self.text)
+        )
+
+        details = {"text": text, "values": self.values}
+
+        if isinstance(text, NoMatch):
+            return CheckResult.failure(
+                message=f"No value found for text key '{self.text_key}'.",
+                details=details,
+            )
+
+        if not isinstance(text, str):
+            return CheckResult.failure(
+                message=f"Value for text key '{self.text_key}' is not a string, got {type(text)}.",
+                details=details,
+            )
+
+        formatted_text = self._format_str(text)
+
+        missing = [v for v in self.values if self._format_str(v) not in formatted_text]
+
+        if missing:
+            return CheckResult.failure(
+                message=f"The text is missing: {missing}.",
+                details={**details, "missing": missing},
+            )
+
+        return CheckResult.success(
+            message=f"The text contains all of: {self.values}.",
+            details=details,
+        )


There is significant code duplication between ContainsAny and ContainsAll. Both classes share identical fields (text, text_key, values, normalization_form, case_sensitive), the _format_str method, and the initial logic in the run method.

Consider refactoring this by introducing a private base class (e.g., _BaseContainsCheck) to encapsulate the common fields and helper methods. This would significantly improve maintainability and reduce the risk of inconsistent updates in the future.
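
One possible shape for that refactor, sketched with plain dataclasses rather than the library's Check and Field machinery. All class names here are hypothetical, the trace and JSONPath resolution are omitted, and results are reduced to a (passed, message) tuple:

```python
import unicodedata
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseContainsCheck(ABC):
    """Hypothetical shared base holding the common fields and helper."""

    values: list[str]
    text: Optional[str] = None
    normalization_form: Optional[str] = "NFKC"
    case_sensitive: bool = False

    def _format_str(self, value: str) -> str:
        if self.normalization_form is not None:
            value = unicodedata.normalize(self.normalization_form, value)
        return value if self.case_sensitive else value.lower()

    def run(self) -> tuple[bool, str]:
        # Shared validation lives in one place.
        if not isinstance(self.text, str):
            return False, "No text to search."
        return self._evaluate(self._format_str(self.text))

    @abstractmethod
    def _evaluate(self, formatted_text: str) -> tuple[bool, str]:
        """Subclasses implement only the matching rule."""


@dataclass
class ContainsAnySketch(BaseContainsCheck):
    def _evaluate(self, formatted_text: str) -> tuple[bool, str]:
        for value in self.values:
            if self._format_str(value) in formatted_text:
                return True, f"The text contains '{value}'."
        return False, f"The text does not contain any of: {self.values}."


@dataclass
class ContainsAllSketch(BaseContainsCheck):
    def _evaluate(self, formatted_text: str) -> tuple[bool, str]:
        missing = [v for v in self.values if self._format_str(v) not in formatted_text]
        if missing:
            return False, f"The text is missing: {missing}."
        return True, f"The text contains all of: {self.values}."
```

Each subclass then carries only its matching rule, so a change to the fields or to the validation path is made once.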

gemini-code-assist · 2026-04-10T09:12:54Z

+        details = {"text": text, "values": self.values}
+
+        if isinstance(text, NoMatch):
+            return CheckResult.failure(
+                message=f"No value found for text key '{self.text_key}'.",
+                details=details,
+            )
+
+        if not isinstance(text, str):
+            return CheckResult.failure(
+                message=f"Value for text key '{self.text_key}' is not a string, got {type(text)}.",
+                details=details,
+            )
+
+        formatted_text = self._format_str(text)
+
+        for value in self.values:
+            if self._format_str(value) in formatted_text:
+                return CheckResult.success(
+                    message=f"The text contains '{value}'.",
+                    details={**details, "matched": value},
+                )


The run method has a few areas for improvement:

Efficiency: self._format_str(value) is called inside the loop, so every value is re-formatted on each run. Since self.values and the formatting settings are constant for the check instance, the values could be pre-formatted once to avoid repeating the normalization and case-handling work on every execution.

Serialization: The details dictionary includes the text variable even if it is a NoMatch object. This can lead to serialization errors (e.g., when converting to JSON) if NoMatch is not a serializable type. It is safer to use None or a string representation for the NoMatch case.
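
Both points can be sketched outside the library. In the sketch below, `NoMatch` is a hypothetical stand-in for the library's sentinel (whether the real one is JSON-serializable is not shown on this page), and `functools.cached_property` is one assumed way to pre-format values once per instance:

```python
import json
import unicodedata
from functools import cached_property


class NoMatch:
    """Hypothetical stand-in for the library's NoMatch sentinel."""


class ContainsAnySketch:
    def __init__(self, values, case_sensitive=False):
        self.values = values
        self.case_sensitive = case_sensitive

    def _format_str(self, value):
        value = unicodedata.normalize("NFKC", value)
        return value if self.case_sensitive else value.lower()

    @cached_property
    def formatted_values(self):
        # Computed on first access, then reused by later run() calls.
        return [(v, self._format_str(v)) for v in self.values]

    def run(self, text):
        # Store None instead of the sentinel so details stay JSON-friendly.
        safe_text = None if isinstance(text, NoMatch) else text
        details = {"text": safe_text, "values": self.values}
        if isinstance(text, NoMatch):
            return False, details
        formatted_text = self._format_str(text)
        for original, formatted in self.formatted_values:
            if formatted in formatted_text:
                return True, {**details, "matched": original}
        return False, details


check = ContainsAnySketch(["consult a doctor", "medical advice"])
passed, details = check.run(NoMatch())
serialized = json.dumps(details)  # works because "text" is None
```

A cached value goes stale if `values` or the settings are mutated after first use, so this approach fits best when check instances are treated as immutable.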

gemini-code-assist · 2026-04-10T09:12:54Z

+        details = {"text": text, "values": self.values}
+
+        if isinstance(text, NoMatch):
+            return CheckResult.failure(
+                message=f"No value found for text key '{self.text_key}'.",
+                details=details,
+            )
+
+        if not isinstance(text, str):
+            return CheckResult.failure(
+                message=f"Value for text key '{self.text_key}' is not a string, got {type(text)}.",
+                details=details,
+            )
+
+        formatted_text = self._format_str(text)
+
+        missing = [v for v in self.values if self._format_str(v) not in formatted_text]
+
+        if missing:
+            return CheckResult.failure(
+                message=f"The text is missing: {missing}.",
+                details={**details, "missing": missing},
+            )


The run method can be optimized and improved for clarity:

Efficiency: Pre-format self.values to avoid redundant calls to _format_str in the list comprehension.

Serialization: Handle NoMatch in details to prevent potential serialization errors.

Clarity: If self.values contains duplicates that are missing from the text, they are currently reported multiple times in the missing list. Using unique values for the missing report is generally clearer for the user.
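
For the clarity point, an order-preserving de-duplication is enough. A minimal sketch with a hypothetical helper name, using `dict.fromkeys` to keep the first occurrence of each missing value:

```python
def unique_missing(text: str, values: list[str]) -> list[str]:
    """Return the values absent from text, without duplicates, in input order."""
    missing = [v for v in values if v not in text]
    # dict keys are unique and keep insertion order.
    return list(dict.fromkeys(missing))


report = unique_missing(
    "the dosage is 200mg.",
    ["dosage", "refund", "mg", "refund", "warranty"],
)
```

Unlike `set(missing)`, this keeps the report in the order the user listed the values.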

github-actions Bot added the Scope: Checks label Apr 10, 2026

gemini-code-assist Bot reviewed Apr 10, 2026

abhigyan631 force-pushed the feat/contains-any-all-checks branch from 68df0cc to c061197 on April 10, 2026 10:13

abhigyan631 and others added 2 commits April 10, 2026 15:44

feat(checks): add ContainsAny and ContainsAll builtin checks (Giskard…

c061197

…-AI#2361)

Merge branch 'main' into feat/contains-any-all-checks

a083194

abhigyan631 wants to merge 2 commits into Giskard-AI:main from abhigyan631:feat/contains-any-all-checks
