11# Frontmatter Format
22
3+ **Frontmatter format** is a simple convention for adding metadata as frontmatter on any
4+ text file in a tool-compatible way.
5+ It extends
6+ [Jekyll-style YAML frontmatter](https://docs.github.com/en/contributing/writing-for-github-docs/using-yaml-frontmatter)
7+ to work with more file formats.
8+
39## Motivation
410
511Simple, readable metadata attached to files can be useful in numerous situations, such
612as recording title, author, source, copyright, or the provenance of a file.
713
8- Unfortunately, it's often unclear how to format such metadata consistently across
9- different file types while preserving valid syntax, making parsing easy, and not breaking
10- interoperability with existing tools.
14+ Unfortunately, it’s often unclear how to format such metadata consistently across
15+ different file types while preserving valid syntax, making parsing easy, and not
16+ breaking interoperability with existing tools.
1117
12- **Frontmatter format** is a way to add metadata as frontmatter on any file.
13- It is basically a micro-format: a simple set of conventions to put structured metadata
14- as YAML at the top of a file in a syntax that is broadly compatible with programming
15- languages, browsers, editors, and other tools.
18+ Frontmatter format is basically a micro-format: a simple set of conventions to put
19+ structured metadata as YAML at the top of a file in a syntax that is broadly compatible
20+ with programming languages, browsers, editors, and other tools.
1621
1722Frontmatter format specifies a syntax for the metadata as a comment block at the top of
1823a file. This approach works while ensuring the file remains valid Markdown, HTML, CSS,
1924Python, C/C++, Rust, SQL, or most other text formats.
2025
21- Frontmatter format is a generalization of the common format for frontmatter used by
22- Jekyll and other CMSs for Markdown files.
23- In that format, frontmatter is enclosed in lines containing `---` delimiters.
26+ Frontmatter format is a generalization of the YAML frontmatter already used by
27+ [Jekyll](https://jekyllrb.com/docs/front-matter/),
28+ [11ty](https://www.11ty.dev/docs/data-frontmatter/#front-matter-formats), and other CMSs
29+ for Markdown files. In that format, frontmatter is enclosed in lines containing `---`
30+ delimiters.
2431
2532In this generalized format, we allow several styles of frontmatter demarcation, with the
2633first line of the file indicating the format and style.
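
As a rough illustration of that first-line check, the sketch below maps an opening delimiter to a style label. It covers only the two delimiters quoted in this document (`---` and `#---`); the function name `detect_style` and the labels `"yaml"` and `"hash"` are hypothetical and not taken from the reference implementation.

```python
from typing import Optional

# Opening delimiters quoted in this README. A complete table would list
# more styles; the labels on the right are illustrative names only.
KNOWN_DELIMITERS = {
    "---": "yaml",
    "#---": "hash",
}


def detect_style(text: str) -> Optional[str]:
    """Return a style label if the first line is a known delimiter, else None."""
    first_line = text.split("\n", 1)[0].rstrip()
    return KNOWN_DELIMITERS.get(first_line)


print(detect_style("---\ntitle: Test Title\n---\nBody text\n"))  # yaml
print(detect_style("# Just a heading\n"))  # None
```

Because the decision depends only on the first line, a file without frontmatter falls through to `None` and can be handled as ordinary content.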
@@ -100,24 +107,24 @@ type Point = tuple[float, float]
100107print(Point)
101108```
102109
103- Here's an example of a richer metadata in use, from a tool that does video
104- transcription. You can see how it's useful having a simple and clear format for title,
110+ Here’s an example of richer metadata in use, from a tool that does video
111+ transcription. You can see how it’s useful having a simple and clear format for title,
105112description, history, source of the content, etc.
106113
107114![Credit for video to @KBoges on YouTube](images/example.png)
108115
109116## Advantages of this Approach
110117
111118- **Compatible with existing syntax:** By choosing a style for the metadata consistent
112- with any given file, it generally doesn't break existing tools.
119+ with any given file, it generally doesn’t break existing tools.
113120 Almost every language has a style for which frontmatter works as a comment.
114121
115122- **Auto-detectable format:** Frontmatter and its format can be recognized by the first
116123 few bytes of the file.
117- That means it's possible to detect metadata and parse it automatically.
124+ That means it’s possible to detect metadata and parse it automatically.
118125
119126- **Metadata is optional:** Files with or without metadata can be read with the same
120- tools. So it's easy to roll out metadata into files gracefully, as needed file by file.
127+ tools. So it’s easy to roll out metadata into files gracefully, as needed file by file.
121128
122129- **YAML syntax:** JSON, YAML, XML, and TOML are all used for metadata in some
123130 situations. YAML is the best choice here because it is already in widespread use with
@@ -197,7 +204,7 @@ Rules:
197204
198205- As a special case, *hash style* files may have an arbitrary number of additional lines
199206 starting with `#` before the initial `#---` delimiter.
200- This allows for "shebang" lines like `#!/usr/bin/bash` at the top of a file, or for
207+ This allows for “shebang” lines like `#!/usr/bin/bash` at the top of a file, or for
201208 Python
202209 [inline script metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/#inline-script-metadata)
203210 to work.
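
The special case above can be sketched as a small scan over the leading lines. This is only an illustration under stated assumptions: `find_hash_delimiter` is a hypothetical helper, not the library's API, and the layout of the metadata line in the sample script is assumed.

```python
from typing import Optional


def find_hash_delimiter(lines: list[str]) -> Optional[int]:
    """Return the index of the opening `#---` line, or None if absent.

    Lines starting with `#` (a shebang, other comments) may precede the
    delimiter; any other line before it means there is no frontmatter.
    """
    for index, line in enumerate(lines):
        if line.rstrip() == "#---":
            return index
        if not line.startswith("#"):
            return None
    return None


# The "# title: ..." line shows an assumed layout for hash style metadata.
script = ["#!/usr/bin/bash", "#---", "# title: Test Title", "#---", "echo hi"]
print(find_hash_delimiter(script))  # 1
```

Stopping at the first line that does not start with `#` keeps the scan short for files that carry no frontmatter at all.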
@@ -314,18 +321,18 @@ print(raw_metadata) # 'title: Test Title\nauthor: Test Author\n'
314321
315322## FAQ
316323
317- - **Hasn't this been done before?** Possibly, but as far as I can tell, not in a
324+ - **Hasn’t this been done before?** Possibly, but as far as I can tell, not in a
318325 systematic way for multiple file formats.
319- I needed this myself, and think we'd all be better off if more tools used YAML
320- metadata consistently, so I've released the format and implementation here.
326+ I needed this myself, and think we’d all be better off if more tools used YAML
327+ metadata consistently, so I’ve released the format and implementation here.
321328
322- - **Is this mature?** This is the first draft of this format.
323- But I've been using this on my own projects for a couple months.
329+ - **Is this mature?** This is pretty new.
330+ But I’ve been using this format and package on my own projects successfully.
324331 The flexibility of just having metadata on all your text files has been great for
325332 workflows, pipelines, etc.
326333
327334- **When should we use it?** All the time if you can!
328- It's especially important for command-line tools, AI agents, LLM workflows, since you
335+ It’s especially important for command-line tools, AI agents, and LLM workflows, since you
329336 often want to store extra metadata in a consistent way on text inputs of various
330337 formats like Markdown, HTML, CSS, and Python.
331338
@@ -335,12 +342,19 @@ print(raw_metadata) # 'title: Test Title\nauthor: Test Author\n'
335342 Standardizing headings like title, author, description, let alone other more
336343 application-specific information is beyond the scope of this frontmatter format.
337344
345+ - **Why not JSON?** Well, JSON is also valid [YAML 1.2](https://yaml.org/spec/1.2.2/)!
346+ You can simply use JSON if desired and it should work.
347+ This library uses [ruamel.yaml](https://pypi.org/project/ruamel.yaml/), which is YAML
348+ 1.2 compliant. A few YAML parsers do have issues with corner cases of JSON, like
349+ duplicated keys, special numbers like NaN, etc.,
350+ but if you are using simple and clean metadata this isn’t likely to be a problem.
351+
338352- **Can this work with Pydantic?** Yes, definitely.
339- In fact, I think it's probably a good practice to define self-identifiable Pydantic
353+ In fact, I think it’s probably a good practice to define self-identifiable Pydantic
340354 (or Zod) schemas for all your metadata, and then just serialize and deserialize them
341355 to frontmatter everywhere.
342356
343- - **Isn't this the same as what some CMSs use, Markdown files and YAML at the top?**
357+ - **Isn’t this the same as what some CMSs use, Markdown files and YAML at the top?**
344358 Yes! But this generalizes that format, and removes the direct tie-in to Markdown or any
345359 CMS. This can work with any tool.
346360 For HTML and code, it works basically with no changes at all since the frontmatter is
@@ -353,17 +367,16 @@ print(raw_metadata) # 'title: Test Title\nauthor: Test Author\n'
353367- **Does this work for CSV files?** Sort of.
354368 Some tools do properly honor hash style comments when parsing CSV files.
355369 A few do not. Our recommendation is to go ahead and use it, and find ways to strip the
356- metadata at the last minute if you really can't get a tool to work with the metadata.
357-
370+ metadata at the last minute if you really can’t get a tool to work with the metadata.
358371
359372- **Does this also work for YAML files?** Yes!
360- It's fine to have YAML metadata on YAML metadata.
373+ It’s fine to have YAML metadata on YAML metadata.
361374 There are just two nuances.
362375
363376 Firstly, watch out for duplicate `---` separators, if you insert frontmatter in front
364377 of a file that already has it.
365378
366- Secondly, it's up to you to use the YAML itself to distinguish whether a file has
379+ Secondly, it’s up to you to use the YAML itself to distinguish whether a file has
367380 frontmatter or is just a plain YAML file.
368381 Both of these can be avoided if you use plain YAML with `---` separators only when
369382 using frontmatter format.
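
One way to guard against the duplicate-separator problem is to check for an existing block before prepending a new one. The sketch below is a minimal illustration, not the package's behavior: `has_yaml_frontmatter` and `prepend_frontmatter` are hypothetical names, and the check assumes that `---` lines appear only as frontmatter delimiters, as recommended above.

```python
def has_yaml_frontmatter(text: str) -> bool:
    """True if the text opens with a `---` line and a later `---` line closes it."""
    lines = text.splitlines()
    if not lines or lines[0].rstrip() != "---":
        return False
    return any(line.rstrip() == "---" for line in lines[1:])


def prepend_frontmatter(text: str, raw_metadata: str) -> str:
    """Add a `---` delimited block unless the text already starts with one."""
    if has_yaml_frontmatter(text):
        # Leave the existing block alone rather than stacking a second one.
        return text
    if not raw_metadata.endswith("\n"):
        raw_metadata += "\n"
    return f"---\n{raw_metadata}---\n{text}"


doc = prepend_frontmatter("key: value\n", "title: Test Title\n")
print(doc == prepend_frontmatter(doc, "title: Other\n"))  # True
```

A plain multi-document YAML file that uses `---` between documents would be misread by this check, which is exactly the ambiguity the second nuance describes.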