This is a module for ConTeXt to typeset Pandoc JSON files directly with ConTeXt.
This is how it works:
- JSON documents are converted on the fly (in memory) to XML, as if they were parsed XML files, which means that a Lua table is created, representing the document object model (DOM);
- as with other XML-native documents, you define xmlsetups in ConTeXt to associate typesetting templates (xmlsetups) to XML elements; the association is made through XPath-like (lpath) or CSS selectors.
Since version 3.8 (released on September 6th, 2025), Pandoc provides an XML format that is equal to the one produced by this module.
There were some small differences, but the version of September 21st, 2025, fixes them:
- the align attribute becomes alignment
- the colspan attribute becomes col-span
- the rowspan attribute becomes row-span
- the default value of col-width attribute is "0" instead of "ColWidthDefault"
- the head-columns attribute of <TableBody> becomes row-head-columns
- the head-rows attribute of <TableBody> is suppressed, because now the children of that element are <header> and <body>, that contain respectively the <Row> elements of the body's header and body
- the <Citation> elements inside a <Cite> element are now embedded in a <citations> element
There are still some differences between the XML output of Pandoc and the output of this module:
- this module does not add the <?xml version='1.0' ?> line at the top
- the space before the "/>" that closes empty elements
So now you can also typeset Pandoc documents written in the XML format.
If you enable SyncTeX support with the --synctex option, you can
use the xml.functions.addSynctexFields Lua function, in particular
in the xml setup matching the <Pandoc> root element.
Example:
\startxmlsetups xml:pandoc
...
\xmlsetsetup{#1}{Pandoc|meta|blocks}{xml:pandoc:*}
...
\stopxmlsetups
\startxmlsetups xml:pandoc:Pandoc
\xmlfunction{#1}{addSynctexFields}
\xmlall{#1}{meta}
\xmlall{#1}{blocks}
\stopxmlsetups
That function is compatible with pundok-editor and pandoc-include-doc.
If you assembled a book from sub-documents following the conventions
of pandoc-include-doc, you can retrieve the source file matching
a certain point on a certain page in the PDF file.
In particular, it works with the mtx-synctex.lua script in the ConTeXt
distribution. Example:
mtxrun --script synctex --goto --x=200 --y=120 --page=12 book.synctex
where book.synctex is the SyncTeX file generated by ConTeXt while
typesetting book.xml into book.pdf.
On a GNU/Linux machine, you can create a zip file of the module
with the script create_module_zip.sh in the devhelpers directory:
cd devhelpers
./create_module_zip.sh
You can install it with:
mtxrun --script install-modules --install --module dist/t-pandocxml-????.??.??.zip
An alternative is unzipping it in the ~/texmf directory, but that is likely to increase
the startup time of ConTeXt, because that directory is scanned every time ConTeXt is run.
In the test directory there are some examples.
Once you have installed the module, you can do this:
cd test
context test1
These are the contents of test1.tex:
% load the module to process pandoc JSON as XML
\usemodule[t][pandocxml]
% load the file that defines xmlsetups for Pandoc (xml:pandoc)
\environment pandoc-xmlsetups
% process test1.json with xml:pandoc setups
\starttext
\xmlprocesspandocjsonfile{test}{test1.json}{xml:pandoc}
\stoptext
You can convert a Pandoc JSON file to its XML equivalent. It's a good way to see how this module converts Pandoc AST items into XML elements.
mtxrun --script pandocjsontoxml mydoc.json mydoc.xml
where mydoc.json is your JSON document, and mydoc.xml is its translation to XML.
If you specify only the source file,
mtxrun --script pandocjsontoxml mydoc.json
you'll see the XML in the standard output.
Most Pandoc items are JSON-encoded with a t (type) field and a c (content) field.
The most natural way to make a conversion to XML is using the t (type) field
as the name of an element tag.
So Para items become <Para>...</Para> elements, Emph inlines become <Emph>...</Emph>
elements, and so on.
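The mapping from the t field to an element tag can be sketched in a few lines of Python. This is only an illustration of the idea, not the module's Lua code: the function names are hypothetical, it handles only items whose c field is a list of inlines, and rendering Str and Space as plain text is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

def inlines_to_xml(parent, inlines):
    """Append Pandoc inlines to parent, using each item's "t" as the tag name.

    Assumption for this sketch: Str and Space become plain text; every other
    inline is expected to carry a plain list of inlines in its "c" field.
    """
    last = None  # last child element added, needed to place trailing text
    for item in inlines:
        t = item["t"]
        if t in ("Str", "Space"):
            text = item["c"] if t == "Str" else " "
            if last is None:
                parent.text = (parent.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            last = ET.SubElement(parent, t)
            inlines_to_xml(last, item["c"])

def block_to_xml(block):
    """Convert a block whose content is a list of inlines (e.g. Para, Plain)."""
    element = ET.Element(block["t"])
    inlines_to_xml(element, block["c"])
    return element

para = {"t": "Para", "c": [
    {"t": "Str", "c": "Hello"},
    {"t": "Space"},
    {"t": "Emph", "c": [{"t": "Str", "c": "world"}]},
]}
para_xml = ET.tostring(block_to_xml(para), encoding="unicode")
print(para_xml)  # <Para>Hello <Emph>world</Emph></Para>
```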
This is the skeleton of the XML version of a JSON document:
<Pandoc api-version="1,23,1">
<meta>
...
</meta>
<blocks>
...
</blocks>
</Pandoc>
So the JSON outer keys, pandoc-api-version, meta and blocks, become, respectively,
an attribute of the root element and its two child elements.
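As a rough Python sketch of that skeleton (an illustration only, not the module's implementation; the skeleton function is a hypothetical name), the version numbers are joined with commas into the api-version attribute, and meta and blocks become empty child elements to be filled by the rest of the conversion:

```python
import json
import xml.etree.ElementTree as ET

# A minimal Pandoc JSON document with no metadata and no blocks.
doc = json.loads('{"pandoc-api-version": [1, 23, 1], "meta": {}, "blocks": []}')

def skeleton(doc):
    """Build the <Pandoc> root with its api-version attribute and two children."""
    version = ",".join(str(n) for n in doc["pandoc-api-version"])
    root = ET.Element("Pandoc", {"api-version": version})
    ET.SubElement(root, "meta")    # would receive the converted metadata
    ET.SubElement(root, "blocks")  # would receive the converted blocks
    return root

skeleton_xml = ET.tostring(skeleton(doc), encoding="unicode")
print(skeleton_xml)
```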
For elements, I kept the capitalization of Pandoc types (meta and blocks are exceptions,
but they are not Pandoc items).
Lowercase tags are also used for items that don't have an explicit name in Pandoc, like list items (<item>), the lines of a LineBlock (<line>), or the terms (<term>) and definitions (<def>) of DefinitionList.
The citations in a <Cite> inline element become <Citation> elements inside a single <citations> element.
For attributes, I preferred a lowercase version (kebab-case when attributes are multi-word). So,
- Quoted(SingleQuote) becomes <Quoted quote-type="SingleQuote">...</Quoted>
- Math(DisplayMath) becomes <Math math-type="DisplayMath">...</Math>
- RawInline becomes <RawInline format="...">...</RawInline>
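The three cases above can be sketched in Python as follows. This is an illustration under assumptions, not the module's code: typed_inline_to_xml is a hypothetical helper, the content is placed as element text, and the Quoted branch is simplified to Str inlines only.

```python
import xml.etree.ElementTree as ET

def typed_inline_to_xml(item):
    """Map the first component of the "c" field to a kebab-case attribute."""
    t, c = item["t"], item["c"]
    if t == "Math":
        math_type, tex = c                      # [{"t": "DisplayMath"}, "x^2"]
        element = ET.Element("Math", {"math-type": math_type["t"]})
        element.text = tex
    elif t == "RawInline":
        fmt, raw = c                            # ["context", "\\crlf"]
        element = ET.Element("RawInline", {"format": fmt})
        element.text = raw
    elif t == "Quoted":
        quote_type, inlines = c                 # [{"t": "SingleQuote"}, [...]]
        element = ET.Element("Quoted", {"quote-type": quote_type["t"]})
        # Simplification for this sketch: keep only the Str inlines as text.
        element.text = "".join(i["c"] for i in inlines if i["t"] == "Str")
    else:
        raise ValueError("not handled in this sketch: " + t)
    return element

math = {"t": "Math", "c": [{"t": "DisplayMath"}, "x^2"]}
quoted = {"t": "Quoted", "c": [{"t": "SingleQuote"}, [{"t": "Str", "c": "hi"}]]}
print(ET.tostring(typed_inline_to_xml(math), encoding="unicode"))
# <Math math-type="DisplayMath">x^2</Math>
print(ET.tostring(typed_inline_to_xml(quoted), encoding="unicode"))
# <Quoted quote-type="SingleQuote">hi</Quoted>
```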
Items with an Attr behave like this:
- the identifier goes into the id attribute
- classes are encoded the same way as in HTML: their values are joined with spaces and put in the class attribute
- other attributes are mapped to XML attributes with the same name (no prefix like data- in HTML)
You should not have id and class attributes in Attr; if you do have them,
they are ignored, because identifier and classes take precedence.
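The Attr rules can be sketched in Python like this. The function name is hypothetical, and omitting id and class when the identifier or the class list is empty is an assumption of the sketch, not something stated about the module.

```python
def attr_to_xml_attributes(attr):
    """Turn a Pandoc Attr [identifier, classes, key-value pairs] into a dict
    of XML attributes, following the rules listed above."""
    identifier, classes, keyvals = attr
    attributes = {}
    for key, value in keyvals:
        # "id" and "class" among the key-value pairs are ignored:
        # identifier and classes take precedence.
        if key not in ("id", "class"):
            attributes[key] = value
    if identifier:  # assumption: no id attribute for an empty identifier
        attributes["id"] = identifier
    if classes:     # assumption: no class attribute for an empty class list
        attributes["class"] = " ".join(classes)
    return attributes

attr = ["intro", ["note", "wide"], [["lang", "it"], ["id", "ignored"]]]
xml_attributes = attr_to_xml_attributes(attr)
print(xml_attributes)
# {'lang': 'it', 'id': 'intro', 'class': 'note wide'}
```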
Pandoc can already export documents as ConTeXt .tex files.
It may also call ConTeXt to typeset them as PDF files.
So why typeset Pandoc (JSON) documents directly with ConTeXt?
Because the standard Pandoc conversion to ConTeXt format can lose much of the information that can be carried by a Pandoc document.
You may use filters to retain some of that information and use it
by injecting RawBlock or RawInline elements of "context" format,
but passing all the items of the Pandoc AST to ConTeXt is easier.
This way, you can also extend the textual elements that Pandoc can handle.
In particular, I'm using some conventions to provide indices and different
kinds of notes (not only footnotes), which are not supported by Pandoc
and its writer for the ConTeXt format.
Another way, instead of converting to XML, is parsing the native or JSON Pandoc formats directly in ConTeXt; it's possible, because ConTeXt has Lua libraries to parse JSON or even the native format (through LPEGs). But you must transform the parsed information into ConTeXt macro calls.
I had already done some XML typesetting with ConTeXt, and the Pandoc internal format can be converted to XML in quite a natural way (see above), so I prefer transforming the JSON files into XML and having all the tools ConTeXt provides for XML typesetting.
Moreover, the conversion from JSON to XML can be done on the fly in memory by ConTeXt.
Pandoc always converts input files into an internal format, before writing the document in the desired output format.
That internal format carries the most information that can pass through Pandoc, and it can be stored as a file in two formats:
- native: a textual format that represents a document in the way you would instantiate it through Haskell constructors; it's fairly human-readable;
- json: a JSON representation of the document, where nearly every textual item has a "t" (type) field and a "c" field (content); it's really granular and rather unreadable, despite being a textual format.
The internal model is tree-like, and a transformation of it into XML is pretty straightforward (see above).
SyncTeX information can be injected in the Lua table of the XML representation of a Pandoc JSON document; that information can later be used by a PDF reader to open the source file at the position corresponding to the point clicked in the PDF preview.
In particular, the PDF reader should make pundok-editor, an editor for Pandoc JSON files, open the right JSON file at the start of the paragraph clicked on the PDF preview.
The documents in pundok-editor can be spread in a tree of JSON source files.
Div elements with an include-doc class and a few other attributes are
used as references for the sub-documents to be included as a replacement
of their placeholder contents.
The inclusion and assembling of sub-documents is done with pandoc and a filter: pandoc-include-doc.
By tracing those Div elements and counting the Para elements, it should be possible
to populate the cf and cl fields of XML elements in the XML Lua table in ConTeXt.
The cl field is meant for the line in a text source file; in this case I would use
it as a counter of Para (paragraph) elements in Pandoc, the textual element
that is most similar to a line in a plain text document.
Feeding back those coordinates (source file + paragraph counter), the editor should be able to open the right file, and then count paragraphs to put the cursor at the start of the desired one.
As a refinement, the count should be extended to other paragraph-like blocks
in Pandoc: the ones that contain a list of Inlines,
like Header, Plain, every line in LineBlock, every term of a DefinitionList
(see Pandoc model).
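The counting idea can be sketched in Python as follows. This is only an illustration of the counter, not the module's or the editor's code: the function name is hypothetical, it works on the blocks of a single source file, and it does not descend into nested containers such as Div, BlockQuote, lists, or the definitions of a DefinitionList.

```python
def count_paragraph_like(blocks):
    """Return (counter, kind) pairs for the paragraph-like units found in a
    flat list of Pandoc JSON blocks, numbering them from 1."""
    units = []
    counter = 0
    for block in blocks:
        t = block["t"]
        if t in ("Para", "Plain", "Header"):
            counter += 1
            units.append((counter, t))
        elif t == "LineBlock":
            # "c" is a list of lines, each line being a list of inlines
            for _line in block["c"]:
                counter += 1
                units.append((counter, "line"))
        elif t == "DefinitionList":
            # "c" is a list of [term inlines, definitions] pairs
            for _term, _definitions in block["c"]:
                counter += 1
                units.append((counter, "term"))
    return units

blocks = [
    {"t": "Header", "c": [1, ["", [], []], [{"t": "Str", "c": "Title"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "First"}]},
    {"t": "LineBlock", "c": [[{"t": "Str", "c": "a"}], [{"t": "Str", "c": "b"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "Last"}]},
]
units = count_paragraph_like(blocks)
print(units)
# [(1, 'Header'), (2, 'Para'), (3, 'line'), (4, 'line'), (5, 'Para')]
```

With a counter like this, the pair (source file, counter) plays the role that (file, line) plays for a plain text source.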