Commit 3d5935a

Readme update/clarification.
1 parent a45ee1e commit 3d5935a

1 file changed

+41
-28
lines changed

‎README.md‎

Lines changed: 41 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -1,26 +1,33 @@
11
# Frontmatter Format
22

3+
**Frontmatter format** is a simple convention for adding metadata as frontmatter on any
4+
text file in a tool-compatible way.
5+
It extends
6+
[Jekyll-style YAML frontmatter](https://docs.github.com/en/contributing/writing-for-github-docs/using-yaml-frontmatter)
7+
to work with more file formats.
8+
39
## Motivation
410

511
Simple, readable metadata attached to files can be useful in numerous situations, such
612
as recording title, author, source, copyright, or the provenance of a file.
713

8-
Unfortunately, it's often unclear how to format such metadata consistently across
9-
different file types while preserving valid syntax, making parsing easy, and not breaking
10-
interoperability with existing tools.
14+
Unfortunately, it’s often unclear how to format such metadata consistently across
15+
different file types while preserving valid syntax, making parsing easy, and not
16+
breaking interoperability with existing tools.
1117

12-
**Frontmatter format** is a way to add metadata as frontmatter on any file.
13-
It is basically a micro-format: a simple set of conventions to put structured metadata
14-
as YAML at the top of a file in a syntax that is broadly compatible with programming
15-
languages, browsers, editors, and other tools.
18+
Frontmatter format is basically a micro-format: a simple set of conventions to put
19+
structured metadata as YAML at the top of a file in a syntax that is broadly compatible
20+
with programming languages, browsers, editors, and other tools.
1621

1722
Frontmatter format specifies a syntax for the metadata as a comment block at the top of
1823
a file. This approach works while ensuring the file remains valid Markdown, HTML, CSS,
1924
Python, C/C++, Rust, SQL, or most other text formats.
2025

21-
Frontmatter format is a generalization of the common format for frontmatter used by
22-
Jekyll and other CMSs for Markdown files.
23-
In that format, frontmatter is enclosed in lines containing `---` delimiters.
26+
Frontmatter format is a generalization of the YAML frontmatter already used by
27+
[Jekyll](https://jekyllrb.com/docs/front-matter/),
28+
[11ty](https://www.11ty.dev/docs/data-frontmatter/#front-matter-formats), and other CMSs
29+
for Markdown files. In that format, frontmatter is enclosed in lines containing `---`
30+
delimiters.
2431

2532
In this generalized format, we allow several styles of frontmatter demarcation, with the
2633
first line of the file indicating the format and style.
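
As a rough illustration of how the first line can indicate the style, here is a minimal Python sketch. It covers only the two opening delimiters that appear in this excerpt (`---` and `#---`). The function name `detect_style` and the style labels are hypothetical and are not the library's API.

```python
from typing import Optional

# Only the two opening delimiters named in this excerpt are listed.
# The labels "yaml" and "hash" are illustrative, not the library's identifiers.
OPENING_DELIMITERS = {
    "---": "yaml",
    "#---": "hash",
}


def detect_style(text: str) -> Optional[str]:
    """Return a style label based on the first line, or None if none matches."""
    # rstrip() tolerates trailing whitespace and Windows line endings.
    first_line = text.split("\n", 1)[0].rstrip()
    return OPENING_DELIMITERS.get(first_line)
```

Because the check only looks at the first line, a file without frontmatter is simply reported as `None` and can be read as usual.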
@@ -100,24 +107,24 @@ type Point = tuple[float, float]
100107
print(Point)
101108
```
102109

103-
Here's an example of a richer metadata in use, from a tool that does video
104-
transcription. You can see how it's useful having a simple and clear format for title,
110+
Here’s an example of richer metadata in use, from a tool that does video
111+
transcription. You can see how it’s useful having a simple and clear format for title,
105112
description, history, source of the content, etc.
106113

107114
![Credit for video to @KBoges on YouTube](images/example.png)
108115

109116
## Advantages of this Approach
110117

111118
- **Compatible with existing syntax:** By choosing a style for the metadata consistent
112-
with any given file, it generally doesn't break existing tools.
119+
with any given file, it generally doesn’t break existing tools.
113120
Almost every language has a style for which frontmatter works as a comment.
114121

115122
- **Auto-detectable format:** Frontmatter and its format can be recognized by the first
116123
few bytes of the file.
117-
That means it's possible to detect metadata and parse it automatically.
124+
That means it’s possible to detect metadata and parse it automatically.
118125

119126
- **Metadata is optional:** Files with or without metadata can be read with the same
120-
tools. So it's easy to roll out metadata into files gracefully, as needed file by file.
127+
tools. So it’s easy to roll out metadata into files gracefully, as needed file by file.
121128

122129
- **YAML syntax:** JSON, YAML, XML, and TOML are all used for metadata in some
123130
situations. YAML is the best choice here because it is already in widespread use with
@@ -197,7 +204,7 @@ Rules:
197204

198205
- As a special case, *hash style* files may have an arbitrary number of additional lines
199206
starting with `#` before the initial `#---` delimiter.
200-
This allows for "shebang" lines like `#!/usr/bin/bash` at the top of a file, or for
207+
This allows for “shebang” lines like `#!/usr/bin/bash` at the top of a file, or for
201208
Python
202209
[inline script metadata](https://packaging.python.org/en/latest/specifications/inline-script-metadata/#inline-script-metadata)
203210
to work.
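
The special case above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `find_hash_opener` is hypothetical, and the sample file assumes that hash style metadata lines are prefixed with `# ` and closed by a second `#---` line.

```python
from typing import Optional


def find_hash_opener(lines: list[str]) -> Optional[int]:
    """Return the index of the opening `#---` line, or None if absent.

    Lines starting with `#` (a shebang, other comments) may precede the
    delimiter. Any other line before it means there is no frontmatter.
    """
    for index, line in enumerate(lines):
        if line.rstrip() == "#---":
            return index
        if not line.startswith("#"):
            return None
    return None


# Assumed layout: shebang first, then the frontmatter block.
script = ["#!/usr/bin/bash", "#---", "# title: Example", "#---", "echo hi"]
print(find_hash_opener(script))  # 1
```

Stopping at the first non-`#` line keeps the scan cheap, since only the leading comment lines of a file are ever inspected.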
@@ -314,18 +321,18 @@ print(raw_metadata) # 'title: Test Title\nauthor: Test Author\n'
314321

315322
## FAQ
316323

317-
- **Hasn't this been done before?** Possibly, but as far as I can tell, not in a
324+
- **Hasn’t this been done before?** Possibly, but as far as I can tell, not in a
318325
systematic way for multiple file formats.
319-
I needed this myself, and think we'd all be better off if more tools used YAML
320-
metadata consistently, so I've released the format and implementation here.
326+
I needed this myself, and think we’d all be better off if more tools used YAML
327+
metadata consistently, so I’ve released the format and implementation here.
321328

322-
- **Is this mature?** This is the first draft of this format.
323-
But I've been using this on my own projects for a couple months.
329+
- **Is this mature?** This is pretty new.
330+
But I’ve been using this format and package on my own projects successfully.
324331
The flexibility of just having metadata on all your text files has been great for
325332
workflows, pipelines, etc.
326333

327334
- **When should we use it?** All the time if you can!
328-
It's especially important for command-line tools, AI agents, LLM workflows, since you
335+
It’s especially important for command-line tools, AI agents, and LLM workflows, since you
329336
often want to store extra metadata in a consistent way on text inputs of various
330337
formats like Markdown, HTML, CSS, and Python.
331338

@@ -335,12 +342,19 @@ print(raw_metadata) # 'title: Test Title\nauthor: Test Author\n'
335342
Standardizing headings like title, author, description, let alone other more
336343
application-specific information, is beyond the scope of this frontmatter format.
337344

345+
- **Why not JSON?** Well, JSON is also valid [YAML 1.2](https://yaml.org/spec/1.2.2/)!
346+
You can simply use JSON if desired and it should work.
347+
This library uses [ruamel.yaml](https://pypi.org/project/ruamel.yaml/), which is YAML
348+
1.2 compliant. A few YAML parsers do have issues with corner cases of JSON, like
349+
duplicated keys, special numbers like NaN, etc.,
350+
but if you are using simple and clean metadata this isn’t likely to be a problem.
351+
338352
- **Can this work with Pydantic?** Yes, definitely.
339-
In fact, I think it's probably a good practice to define self-identifiable Pydantic
353+
In fact, I think it’s probably a good practice to define self-identifiable Pydantic
340354
(or Zod) schemas for all your metadata, and then just serialize and deserialize them
341355
to frontmatter everywhere.
342356

343-
- **Isn't this the same as what some CMSs use, Markdown files and YAML at the top?**
357+
- **Isn’t this the same as what some CMSs use, Markdown files and YAML at the top?**
344358
Yes! But this generalizes that format, and removes the direct tie-in to Markdown or any
345359
CMS. This can work with any tool.
346360
For HTML and code, it works basically with no changes at all since the frontmatter is
@@ -353,17 +367,16 @@ print(raw_metadata) # 'title: Test Title\nauthor: Test Author\n'
353367
- **Does this work for CSV files?** Sort of.
354368
Some tools do properly honor hash style comments when parsing CSV files.
355369
A few do not. Our recommendation is to go ahead and use it, and find ways to strip the
356-
metadata at the last minute if you really can't get a tool to work with the metadata.
357-
370+
metadata at the last minute if you really can’t get a tool to work with the metadata.
358371

359372
- **Does this also work for YAML files?** Yes!
360-
It's fine to have YAML metadata on YAML metadata.
373+
It’s fine to have YAML metadata on YAML metadata.
361374
There are just two nuances.
362375

363376
Firstly, watch out for duplicate `---` separators, if you insert frontmatter in front
364377
of a file that already has it.
365378

366-
Secondly, it's up to you to use the YAML itself to distinguish whether a file has
379+
Secondly, it’s up to you to use the YAML itself to distinguish whether a file has
367380
frontmatter or is just a plain YAML file.
368381
Both of these can be avoided if you use plain YAML with `---` separators only when
369382
using frontmatter format.
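
The first nuance for YAML files can be guarded against with a small check before inserting frontmatter. The following is a minimal Python sketch, not the library's API: the names `starts_with_yaml_delimiter` and `prepend_frontmatter` are hypothetical. Note that it cannot tell existing frontmatter apart from a plain YAML file that merely begins with a `---` document marker, which is the second nuance described above.

```python
def starts_with_yaml_delimiter(text: str) -> bool:
    """True if the first line of the text is a bare `---` line."""
    return text.split("\n", 1)[0].rstrip() == "---"


def prepend_frontmatter(text: str, metadata_yaml: str) -> str:
    """Prepend `---`-delimited frontmatter to the text.

    Refuses if the text already opens with `---`, so a second block of
    separators is never stacked in front of an existing one.
    """
    if starts_with_yaml_delimiter(text):
        raise ValueError("text already starts with a `---` line")
    if not metadata_yaml.endswith("\n"):
        metadata_yaml += "\n"
    return f"---\n{metadata_yaml}---\n{text}"
```

Raising an error, rather than silently merging, leaves the decision about an existing `---` block to the caller.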
