Commit 13f0dbd

First version.
1 parent 5cb0c3f commit 13f0dbd

10 files changed

+1542
-0
lines changed

‎.github/workflows/ci.yml‎

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# This workflow will install Python dependencies, run tests and lint with a single version of Python
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: CI

on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
        with:
          # Important for versioning plugins:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install Poetry
        uses: snok/install-poetry@v1
        with:
          version: latest

      - name: Cache Poetry dependencies
        uses: actions/cache@v4
        with:
          path: |
            ~/.cache/pypoetry
            ~/.cache/pip
          key: ${{ runner.os }}-poetry-${{ hashFiles('**/poetry.lock') }}
          restore-keys: |
            ${{ runner.os }}-poetry-

      - name: Install Poetry plugins
        run: |
          poetry self add "poetry-dynamic-versioning[plugin]"

      - name: Install dependencies
        run: poetry install

      - name: Run linting
        run: poetry run lint

      - name: Run tests
        run: poetry run test

‎README.md‎

Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
# Frontmatter Format

## Motivation

Simple, readable metadata attached to files can be useful in numerous situations, such as
recording the title or author of a file.
Unfortunately, it's often unclear how to store such metadata consistently across different
file types without breaking interoperability with existing tools.

**Frontmatter format** is simply a set of conventions to read and write metadata on many
kinds of files in a syntax that is broadly compatible with programming languages, browsers,
editors, Markdown parsers, and other tools.

Frontmatter format stores metadata as YAML, in a frontmatter block or a comment block at
the top of the file.
This approach works with Markdown, HTML, CSS, Python, C/C++, Rust, SQL, and most other
common text formats.

This is a description of the format and a simple reference implementation.

This implementation is in Python but the format is very simple and easy to implement in any
language.

The purpose of this repo is to explain the idea of the format so anyone can use it, and
encourage the adoption of the format, especially for workflows around text documents that are
becoming common in AI pipelines.

## Examples

Frontmatter format is a generalization of the common format for frontmatter used by Jekyll
and other CMSs for Markdown files.
In that format, frontmatter is enclosed in `---` delimiters.

Frontmatter format is a way to add metadata as frontmatter on any file.
In this generalized format, we allow multiple styles of frontmatter demarcation, allowing
for easy auto-detection, parsing, and compatibility.

Below are a few examples to illustrate:

```markdown
---
title: Sample Markdown File
state: draft
created_at: 2022-08-07 00:00:00
tags:
- yaml
- examples
---
Hello, *World*!
```

```html
<!---
title: Sample HTML File
--->
Hello, <i>World</i>!
```

```python
#---
# author: Jane Doe
# description: A sample Python script
#---
print("Hello, World!")
```

```css
/*---
filename: styles.css
---*/
.hello {
    color: green;
}
```

```sql
----
-- title: Sample SQL Script
----
SELECT * FROM world;
```
83+
## Format Definition
84+
85+
A file is in frontmatter format if the first characters are one of the following:
86+
87+
- `---`
88+
89+
- `<!---`
90+
91+
- `#---`
92+
93+
- `//---`
94+
95+
- `/*---`
96+
97+
and if this prefix is followed by a newline (`\n`).
98+
99+
The prefix determines the *style* of the frontmatter.
100+
The style specifies the matching terminating delimiter for the end of the frontmatter as
101+
well as an optional prefix (which is typically a comment character in some language).
102+
103+
The supported frontmatter styles are:
104+
105+
1. *YAML style*: delimiters `---` and `---` with no prefix on each line.
106+
Useful for text or Markdown content.
107+
108+
2. *HTML style*: delimiters `<!---` and `--->` with no prefix on each line.
109+
Useful for HTML or XML or similar content.
110+
111+
3. *Hash style*: delimiters `#---` and `#---` with `# ` prefix on each line.
112+
Useful for Python or similar code content.
113+
Also works for CSV files with many tools.
114+
115+
4. *Rust style*: delimiters `//---` and `//---` with `// ` prefix on each line.
116+
Useful for Rust or C++ or similar code content.
117+
118+
5. *C style*: delimiters `/*---` and `---*/` with no prefix on each line.
119+
Useful for CSS or C or similar code content.
120+
121+
6. *Dash style*: delimiters `----` and `----` with `-- ` prefix on each line.
122+
Useful for SQL or similar code content.
123+
124+
The delimiters must be alone on their own lines, terminated with a newline.
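
As an illustration of the detection rule above, here is a minimal sketch of how a reader might map the first line of a file to a style.
The `Style` class, the `STYLES` table, and `detect_style()` are hypothetical names chosen for this example, not the reference implementation's API.
It assumes the opening delimiter must be the entire first line and must be terminated by `\n`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Style:
    """One frontmatter style: delimiters plus an optional per-line prefix."""

    name: str
    start: str
    end: str
    line_prefix: str


# Hypothetical table mirroring the six styles listed above.
STYLES = [
    Style("yaml", "---", "---", ""),
    Style("html", "<!---", "--->", ""),
    Style("hash", "#---", "#---", "#"),
    Style("rust", "//---", "//---", "//"),
    Style("c", "/*---", "---*/", ""),
    Style("dash", "----", "----", "--"),
]


def detect_style(text: str) -> Optional[Style]:
    """Return the style whose opening delimiter is the whole first line, else None."""
    first_line, newline, _rest = text.partition("\n")
    if not newline:
        return None  # The delimiter must be followed by a newline.
    for style in STYLES:
        if first_line == style.start:
            return style
    return None
```

Because the whole first line is compared, `---` and `----` are never confused with each other, and a Markdown file that merely starts with a heading is treated as having no frontmatter.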

Any style is acceptable on any file as it can be automatically detected.
When writing, you can specify the style.

For all frontmatter styles, the content between the delimiters can be any text in UTF-8
encoding.
But it is recommended to use YAML.

For some of the styles, each frontmatter line begins with a prefix to make sure the
entire file remains valid in a given syntax (Python, Rust, SQL, etc.). This prefix is
stripped during parsing.

It is recommended to use a prefix with a trailing space (such as `# `) but a bare prefix
without the trailing space is also allowed.
Other whitespace is preserved (before parsing with YAML).
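
The prefix rule can be sketched as follows.
This is only an illustration, assuming the prefix is given without its trailing space and that lines lacking the prefix are left unchanged; `strip_line_prefix()` is a hypothetical helper, not a function of the reference implementation.

```python
def strip_line_prefix(lines: list[str], prefix: str) -> list[str]:
    """Strip `prefix` (e.g. "#") and at most one following space from each line."""
    stripped = []
    for line in lines:
        # An empty prefix (YAML, HTML, C styles) means lines are kept as-is.
        if prefix and line.startswith(prefix):
            line = line[len(prefix):]
            # Only a single space after the prefix is removed, so YAML
            # indentation beyond it survives.
            if line.startswith(" "):
                line = line[1:]
        stripped.append(line)
    return stripped


lines = ["# title: Test", "#tags:", "#   - yaml"]
print(strip_line_prefix(lines, "#"))
# Prints: ['title: Test', 'tags:', '  - yaml']
```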

There is no restriction on the content of the file after the frontmatter.
It may even contain other content in frontmatter format, but this will not be parsed as
frontmatter.
Typically, it is text, but it could be binary as well.

Frontmatter is optional.
This means almost any text file can be read as frontmatter format.

## Reference Implementation

This is a simple Python reference implementation.
It auto-detects all the frontmatter styles above.
It supports reading small files easily into memory, but also allows extracting or changing
frontmatter without reading an entire file.

Both raw (string) and parsed YAML frontmatter (using ruamel.yaml) are supported.
For readability, there is also support for preferred sorting of YAML keys.
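
To make the idea of preferred key sorting concrete, here is a small sketch in plain Python.
The key-sorting API of the reference implementation is not shown in this README, so `sort_keys_preferred()` is a hypothetical helper: it assumes a list of preferred keys that should come first, with all remaining keys following alphabetically.

```python
from typing import Any, Dict, List


def sort_keys_preferred(metadata: Dict[str, Any], preferred: List[str]) -> Dict[str, Any]:
    """Return a copy with `preferred` keys first (in that order), then the rest A-Z."""
    rank = {key: i for i, key in enumerate(preferred)}
    # Keys not in `preferred` all share the same rank and fall back to name order.
    ordered = sorted(metadata, key=lambda k: (rank.get(k, len(preferred)), k))
    return {k: metadata[k] for k in ordered}


metadata = {"tags": ["yaml"], "author": "Jane Doe", "title": "Sample", "state": "draft"}
print(list(sort_keys_preferred(metadata, ["title", "author"])))
# Prints: ['title', 'author', 'state', 'tags']
```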

## Installation

```
# Use pip
pip install frontmatter-format
# Or poetry
poetry add frontmatter-format
```

## Usage

```python
from frontmatter_format import fmf_read, fmf_read_raw, fmf_write, FmStyle

# Write some content:
content = "Hello, World!"
metadata = {"title": "Test Title", "author": "Test Author"}
fmf_write("example.md", content, metadata, style=FmStyle.yaml)

# Or any other desired style:
html_content = "<p>Hello, World!</p>"
fmf_write("example.html", html_content, metadata, style=FmStyle.html)

# Read it back. Style is auto-detected:
content, metadata = fmf_read("example.md")
print(content)  # Outputs: Hello, World!
print(metadata)  # Outputs: {'title': 'Test Title', 'author': 'Test Author'}

# Read metadata without parsing:
content, raw_metadata = fmf_read_raw("example.md")
print(content)  # Outputs: Hello, World!
print(raw_metadata)  # Outputs: 'title: Test Title\nauthor: Test Author\n'
```

The above is easiest for small files, but you can also operate more efficiently directly on
files, without reading the file contents into memory.

```python
from frontmatter_format import (
    FmStyle,
    fmf_insert_frontmatter,
    fmf_read_frontmatter_raw,
    fmf_strip_frontmatter,
)

# Strip and discard the metadata from a file:
fmf_strip_frontmatter("example.md")

# Insert the metadata at the top of an existing file:
new_metadata = {"title": "New Title", "author": "New Author"}
fmf_insert_frontmatter("example.md", new_metadata, fm_style=FmStyle.yaml)

# Read the raw frontmatter metadata and get the offset for the rest of the content:
raw_metadata, offset = fmf_read_frontmatter_raw("example.md")
print(raw_metadata)  # Outputs: 'title: New Title\nauthor: New Author\n'
print(offset)  # Outputs the byte offset where the content starts
```
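
To show why a byte offset is useful, here is a standalone sketch that does not use the library at all.
It handles only one pair of delimiters (YAML style by default), and `find_content_offset()` and `copy_content_only()` are hypothetical names for this example.
Treating a missing closing delimiter as "no frontmatter" is an assumption of the sketch; a real implementation might raise an error instead.

```python
import shutil
from pathlib import Path


def find_content_offset(path: Path, start: bytes = b"---", end: bytes = b"---") -> int:
    """Return the byte offset where content begins, or 0 if there is no frontmatter."""
    with open(path, "rb") as f:
        if f.readline() != start + b"\n":
            return 0
        # Read line by line, so only the frontmatter is ever held in memory.
        for line in iter(f.readline, b""):
            if line == end + b"\n":
                return f.tell()
    return 0  # Assumption: an unclosed block is treated as no frontmatter.


def copy_content_only(src: Path, dst: Path) -> None:
    """Copy everything after the frontmatter from src to dst in chunks."""
    offset = find_content_offset(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        shutil.copyfileobj(fin, fout)
```

Since the content is copied in chunks from the offset onward, the same approach works for large or binary content after the frontmatter.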

## FAQ

- **Isn't this the same as what some CMSs use, Markdown files with YAML at the top?** Yes!
  But this generalizes that format, and removes the direct tie-in to Markdown or any CMS.
  This can work with any tool.
  For HTML and code, it works basically with no changes at all since the frontmatter is
  considered a comment.

- **Does this specify the format of the YAML itself?** No.
  This is simply a format for attaching metadata.
  What metadata you attach is up to your use case.
  Standardizing headings like title, author, description, let alone other more
  application-specific information, is beyond the scope of this frontmatter format.

- **Can this work with binary files?** No reason why not, if it makes sense for you!
  You can use `fmf_insert_frontmatter()` to add metadata of any style to any file.
  Whether this works for your application depends on the file format.

- **Does this work for CSV files?** Sort of.
  Some tools do properly honor hash style comments when parsing CSV files.
  A few do not. Our recommendation is to go ahead and use it, and find ways to strip the
  metadata at the last minute if you really can't get a tool to work with the metadata.
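
As one possible way to strip the metadata at the last minute for CSV, the sketch below removes a leading hash style block before handing the text to Python's standard `csv` module.
`csv_rows_without_frontmatter()` is a hypothetical helper written for this example, and it assumes the whole file fits in memory.

```python
import csv
import io


def csv_rows_without_frontmatter(text: str) -> list[list[str]]:
    """Parse CSV text, skipping a leading hash style (`#---`) block if present."""
    lines = text.splitlines(keepends=True)
    if lines and lines[0] == "#---\n":
        try:
            # Find the closing delimiter after the opening one.
            end = lines.index("#---\n", 1)
        except ValueError:
            end = -1  # No closing delimiter: assume there is no frontmatter.
        lines = lines[end + 1:]
    return list(csv.reader(io.StringIO("".join(lines))))


text = "#---\n# title: Sample CSV\n#---\nname,count\napple,3\n"
print(csv_rows_without_frontmatter(text))
# Prints: [['name', 'count'], ['apple', '3']]
```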

‎devtools/lint.py‎

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
import subprocess
from rich import print as rprint


def _run(cmd: list[str]) -> int:
    rprint(f"[bold green]❯ {' '.join(cmd)}[/bold green]")
    errcount = 0
    try:
        subprocess.run(cmd, text=True, check=True)
    except subprocess.CalledProcessError as e:
        rprint(f"[bold red]Error: {e}[/bold red]")
        errcount = 1
    rprint()

    return errcount


def main():
    rprint()

    errcount = 0
    paths = ["frontmatter_format", "tests"]
    doc_paths = ["README.md"]
    errcount += _run(["codespell", "--write-changes", *paths, *doc_paths])
    errcount += _run(["usort", "format", *paths])
    errcount += _run(["ruff", "check", "--fix", *paths])
    errcount += _run(["black", *paths])
    errcount += _run(["mypy", *paths])  # TODO: Enable.

    rprint()

    if errcount != 0:
        rprint(f"[bold red]✗ Lint failed with {errcount} errors.[/bold red]")
    else:
        rprint("[bold green]✔️ Lint passed![/bold green]")
    rprint()

    return errcount


if __name__ == "__main__":
    exit(main())

‎frontmatter_format/__init__.py‎

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
from .frontmatter_format import (
    fmf_insert_frontmatter,
    fmf_read,
    fmf_read_frontmatter_raw,
    fmf_read_raw,
    fmf_strip_frontmatter,
    fmf_write,
    FmFormatError,
    FmStyle,
    Metadata,
)
from .yaml_util import (
    dump_yaml,
    from_yaml_string,
    new_yaml,
    read_yaml_file,
    to_yaml_string,
    write_yaml_file,
)

__all__ = [
    "FmStyle",
    "FmFormatError",
    "fmf_write",
    "fmf_read",
    "fmf_read_raw",
    "fmf_read_frontmatter_raw",
    "fmf_strip_frontmatter",
    "fmf_insert_frontmatter",
    "Metadata",
    "new_yaml",
    "to_yaml_string",
    "from_yaml_string",
    "dump_yaml",
    "read_yaml_file",
    "write_yaml_file",
]
