Conversation

Contributor

Copilot AI commented Dec 3, 2025

PR Details

Description

Implements automatic validation of YAML front matter against content type definitions using JSON Schema. When a markdown file is opened, its front matter is validated in real-time and errors are surfaced as inline warnings.

Implementation

  • ContentTypeSchemaGenerator - Converts content type field definitions to JSON Schema

    • Maps all field types (string, number, boolean, datetime, choice, tags, categories, nested fields, blocks)
    • Handles required fields, enums, number ranges, and nested structures
    • Caches schemas per content type for performance
    • NEW: Validates tags against known taxonomy values from TaxonomyHelper
    • NEW: Validates categories against known taxonomy values from TaxonomyHelper
    • NEW: Validates custom taxonomy fields against their configured options
  • FrontMatterValidator - Validates front matter using AJV

    • Generates user-friendly error messages from JSON Schema validation errors
    • Handles all AJV error types (required, type, enum, format, min/max)
    • Async validation to support taxonomy data retrieval
  • StatusListener integration - Adds schema validation to existing diagnostics

    • Runs alongside existing required field checks
    • Displays errors as warnings in Problems panel and inline
    • Searches only within front matter section (between --- delimiters)
    • Special handling for required field errors using missingProperty parameter
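
The field-to-schema mapping described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the `Field` and `ContentType` shapes are modeled on the example content type in this description, `fieldToSchema`, `generateSchema`, and `schemaCache` are hypothetical names, and mapping `datetime` to `format: "date-time"` is an assumption (nested fields and blocks are left out).

```typescript
// Hypothetical field shape, modeled on the example content type in this PR.
interface Field {
  name: string;
  type: "string" | "number" | "boolean" | "datetime" | "choice" | "tags" | "categories";
  required?: boolean;
  choices?: string[];
  numberOptions?: { min?: number; max?: number };
}

interface ContentType {
  name: string;
  fields: Field[];
}

type JsonSchema = Record<string, unknown>;

// Map one field definition to a JSON Schema fragment.
function fieldToSchema(field: Field): JsonSchema {
  switch (field.type) {
    case "string":
      return { type: "string" };
    case "number": {
      const schema: JsonSchema = { type: "number" };
      if (field.numberOptions?.min !== undefined) schema["minimum"] = field.numberOptions.min;
      if (field.numberOptions?.max !== undefined) schema["maximum"] = field.numberOptions.max;
      return schema;
    }
    case "boolean":
      return { type: "boolean" };
    case "datetime":
      // Assumption: dates are validated as ISO 8601 strings.
      return { type: "string", format: "date-time" };
    case "choice":
      return field.choices ? { type: "string", enum: field.choices } : { type: "string" };
    case "tags":
    case "categories":
      return { type: "array", items: { type: "string" } };
  }
}

// Cache generated schemas per content type name.
const schemaCache = new Map<string, JsonSchema>();

function generateSchema(contentType: ContentType): JsonSchema {
  const cached = schemaCache.get(contentType.name);
  if (cached) return cached;

  const properties: Record<string, JsonSchema> = {};
  const required: string[] = [];
  for (const field of contentType.fields) {
    properties[field.name] = fieldToSchema(field);
    if (field.required) required.push(field.name);
  }

  const schema: JsonSchema = { type: "object", properties, required };
  schemaCache.set(contentType.name, schema);
  return schema;
}
```

Keying the cache on the content type name keeps repeated validations cheap, but it also means the cache would need to be cleared whenever a content type definition changes.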

Example

Given a content type:

{
  name: "article",
  fields: [
    { name: "title", type: "string", required: true },
    { name: "rating", type: "number", numberOptions: { min: 1, max: 5 } },
    { name: "status", type: "choice", choices: ["draft", "published"] },
    { name: "tags", type: "tags" },
    { name: "categories", type: "categories" }
  ]
}

Invalid front matter triggers inline warnings:

---
# Missing required "title" - shows warning
rating: 10  # Out of range - shows "must be less than or equal to 5"
status: invalid  # Not in enum - shows "must be one of: draft, published"
tags:
  - javascript
  - reactjs  # Unknown tag - shows "must be one of: javascript, typescript, react"
categories:
  - tutoral  # Typo - shows "must be one of: blog, tutorial, guide"
---
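
The warning texts in the example above can be derived from the validator's error objects. The sketch below assumes the subset of the AJV v8 error shape that matters here (`instancePath`, `keyword`, `params`, `message`); `ValidationError` and `formatError` are hypothetical names, and the exact wording for missing required fields is an assumption rather than the PR's actual message.

```typescript
// Minimal subset of the error object shape reported by AJV v8.
interface ValidationError {
  instancePath: string; // e.g. "/tags/1"
  keyword: string; // e.g. "enum", "required", "maximum"
  params: Record<string, unknown>;
  message?: string;
}

// Turn a schema validation error into a user-friendly message.
function formatError(err: ValidationError): string {
  const field = err.instancePath.replace(/^\//, "");
  switch (err.keyword) {
    case "required":
      // Required errors point at the parent object; the field name is in params.
      return `Missing required field '${String(err.params["missingProperty"])}'`;
    case "enum": {
      const allowed = err.params["allowedValues"];
      const list = Array.isArray(allowed) ? allowed.join(", ") : "";
      return `Field '${field}' must be one of: ${list}`;
    }
    case "maximum":
      return `Field '${field}' must be less than or equal to ${String(err.params["limit"])}`;
    case "minimum":
      return `Field '${field}' must be greater than or equal to ${String(err.params["limit"])}`;
    default:
      return `Field '${field}' ${err.message ?? "is invalid"}`;
  }
}

// The unknown tag "reactjs" sits at index 1 of the tags array.
console.log(
  formatError({
    instancePath: "/tags/1",
    keyword: "enum",
    params: { allowedValues: ["javascript", "typescript", "react"] },
  })
);
// prints "Field 'tags/1' must be one of: javascript, typescript, react"
```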

Taxonomy Validation

The validation now checks tags, categories, and custom taxonomy fields against known values:

  • Tags: Validated against available tags from TaxonomyHelper.get(TaxonomyType.Tag)
  • Categories: Validated against available categories from TaxonomyHelper.get(TaxonomyType.Category)
  • Custom Taxonomy: Validated against configured options in the taxonomy definition

This helps catch typos early, enforces consistency, and ensures only valid taxonomy values are used across content.
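
As an illustration of the enum approach, the sketch below builds the schema fragment for a taxonomy field from a list of known values. It assumes the known values have already been retrieved (for example from `TaxonomyHelper.get(...)`, whose exact signature is not shown here); `taxonomySchema` and `findUnknown` are hypothetical helper names. Skipping the constraint when no values are known is a design assumption, since an empty `enum` would reject every value.

```typescript
type JsonSchema = Record<string, unknown>;

// Build the schema for a tags/categories field from the known taxonomy values.
function taxonomySchema(knownValues: string[]): JsonSchema {
  const items: JsonSchema = { type: "string" };
  if (knownValues.length > 0) {
    // Only constrain the values when the taxonomy actually has entries.
    items["enum"] = [...knownValues];
  }
  return { type: "array", items };
}

// Plain-TypeScript equivalent of what the enum constraint would flag.
function findUnknown(values: string[], knownValues: string[]): string[] {
  const known = new Set(knownValues);
  return values.filter((value) => !known.has(value));
}

const knownTags = ["javascript", "typescript", "react"];
console.log(JSON.stringify(taxonomySchema(knownTags)));
// prints {"type":"array","items":{"type":"string","enum":["javascript","typescript","react"]}}
console.log(findUnknown(["javascript", "reactjs"], knownTags));
// prints [ 'reactjs' ]
```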

Known Limitations

Field location detection uses simple string matching (searching for fieldName:) within the front matter section. More precise location reporting would require parsing the YAML into an AST.
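
To make the limitation concrete, here is a minimal sketch of string-based field location, assuming the front matter starts on the first line and is closed by a second --- line. `findFieldLine` is a hypothetical name and this is not the PR's actual implementation. It matches top-level keys only, which is exactly where the approach falls short for nested fields.

```typescript
// Return the zero-based line index of a top-level `fieldName:` key inside
// the front matter, or -1 when the key is not found there.
function findFieldLine(text: string, fieldName: string): number {
  const lines = text.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") return -1; // no front matter block

  for (let i = 1; i < lines.length; i++) {
    const line = lines[i] ?? "";
    if (line.trim() === "---") break; // closing delimiter: stop searching
    if (line.startsWith(`${fieldName}:`)) return i;
  }
  return -1;
}

const doc = ["---", "title: Hello", "rating: 10", "---", "rating: mentioned in the body"].join("\n");
console.log(findFieldLine(doc, "rating")); // prints 2
console.log(findFieldLine(doc, "status")); // prints -1
```

Because the search stops at the closing delimiter, a matching string in the markdown body is never reported.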

Related Issue

Fixes enhancement request for schema validation of front matter.

Motivation and Context

Content type definitions specify structure, but that structure wasn't enforced at edit time. This PR creates a tight feedback loop by validating as users type, catching errors before they cause issues downstream. The addition of taxonomy validation prevents inconsistent tag/category usage and catches typos immediately.

How Has This Been Tested

  • Built successfully with webpack development build
  • CodeQL security scan: 0 vulnerabilities
  • Tested schema generation for all field types including tags, categories, and custom taxonomy
  • Verified validation errors appear correctly in Problems panel
  • Tested taxonomy enum validation with various known and unknown values

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
Original prompt

This section details the original issue you should resolve

<issue_title>Enhancement: Schema and validation for front matter in markdown files</issue_title>
<issue_description>Is your feature request related to a problem? Please describe.
Currently, markdown files with YAML front matter can vary in structure and syntax, which may lead to inconsistencies and errors during processing.

Describe the solution you'd like

  • Use the content type definition as if it were a JSON Schema definition.
  • Add automatic validation for the YAML section of markdown files to ensure they conform to the defined schema.

Describe alternatives you've considered

  • Manual validation through external tools.
  • Relying on user attention to syntax and structure without validation.

Additional context
This will help improve reliability, catch errors early, and ensure a consistent structure for all markdown documents using front matter in the project.</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits December 3, 2025 15:02
Co-authored-by: estruyf <2900833+estruyf@users.noreply.github.com>
Copilot AI changed the title [WIP] Enhance schema and validation for YAML front matter in markdown Dec 3, 2025
Copilot AI requested a review from estruyf December 3, 2025 15:16
@estruyf
Owner

estruyf commented Dec 4, 2025

@copilot when you are using tags or categories, there should also be validation on the available tags and categories you can pick from. Show a warning when a tag/category is used that is not in the known list.

…n values

Co-authored-by: estruyf <2900833+estruyf@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 4, 2025

@copilot when you are using tags or categories, there should also be validation on the available tags and categories you can pick from. Show a warning when a tag/category is used that is not in the kn...

Implemented in commit 5de91cf.

The schema generator now:

  • Retrieves available tags via TaxonomyHelper.get(TaxonomyType.Tag) and adds them as enum constraints
  • Retrieves available categories via TaxonomyHelper.get(TaxonomyType.Category) and adds them as enum constraints
  • For custom taxonomy fields, retrieves options from the taxonomy definition by taxonomyId

When users type unknown tags/categories, they'll see inline warnings like: "Field 'tags/0' must be one of: javascript, typescript, react"

Copilot AI changed the title Add JSON Schema validation for front matter based on content type definitions Dec 4, 2025