markup: add --citeproc to pandoc converter
Adds the citeproc filter to the pandoc converter.

There are several PRs for this feature already. However, I think
simply adding `--citeproc` is the cleanest way to enable this feature,
with the option to flesh it out later, e.g., in #7529.

Some PRs and issues attempt adding more config options to Hugo which
indirectly configure pandoc, but I think simply configuring Pandoc via
Pandoc itself is simpler, as it is already possible with two YAML
blocks -- one for Hugo, and one for Pandoc:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    ...
    Document content with @citation!

There are other useful options, e.g., #4800 attempts to use `nocite`,
which works out of the box with this PR:

    ---
    title: This is the Hugo YAML block
    ---
    ---
    bibliography: assets/pandoc-yaml-block-bibliography.bib
    nocite: |
      @*
    ...
    Document content with no citations but a full bibliography:

    ## Bibliography

Other useful options are `csl: ...` and `link-citations: true`, which
set the path to a custom CSL file and create HTML links between the
citations and the bibliography entries, respectively.

The following issues and PRs are related:

- Add support for parsing citations and Jupyter notebooks via Pandoc and/or Goldmark extension #6101
  Bundles multiple requests, this PR tackles citation parsing.

- WIP: Bibliography with Pandoc #4800
  Passes the frontmatter to Pandoc and still uses
  `--filter pandoc-citeproc` instead of `--citeproc`.
- Allow configuring Pandoc #7529
  That PR is much more extensive and might eventually supersede this PR,
  but I think --bibliography and --citeproc should be independent
  options (--bibliography should be optional and citeproc can always be
  specified).
- Pandoc - allow citeproc extension to be invoked, with bibliography. #8610
  Similar to #7529, #8610 adds a new config option to Hugo.
  I think passing --citeproc and letting the users decide on the
  metadata they want to pass to pandoc is better, albeit uglier.
shoeffner committed Mar 20, 2025
commit 861f9d5b7c2f4868861f9e7238eaccc3e88048eb
50 changes: 50 additions & 0 deletions docs/content/en/content-management/bibliography.md
@@ -0,0 +1,50 @@
---
title: Bibliographies in Markdown
linkTitle: Bibliography
description: Include citations and a bibliography in Markdown using LaTeX markup.
categories: [content management]
keywords: [latex,pandoc,citation,reference,bibliography]
menu:
  docs:
    parent: content-management
    weight: 320
weight: 320
toc: true
---

{{< new-in 0.144.0 />}}

## Citations and Bibliographies

[Pandoc](https://pandoc.org) is a universal document converter and can be used to convert Markdown files.

With **Pandoc >= 2.11**, you can use [citations](https://pandoc.org/MANUAL.html#extension-citations).
One way is to cite entries from a [BibTeX file](https://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX):

```
---
title: Citation document
---
---
bibliography: assets/bibliography.bib
...
This is a citation: @Doe2022
```

Note that Hugo does **not** pass its own front matter to Pandoc. It does pass the **second** metadata block, delimited by `---` and `...`.
Thus, all Pandoc-specific settings should go there.
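
To illustrate which part of such a document reaches Pandoc, here is a minimal Go sketch. The helper `stripHugoFrontMatter` is hypothetical and only approximates the behavior described above; it is not Hugo's actual front matter parser.

```go
package main

import (
	"fmt"
	"strings"
)

// stripHugoFrontMatter is a hypothetical helper, not part of Hugo's API.
// It removes a leading YAML block delimited by "---" lines and returns the
// remainder, which stands in for the content handed to Pandoc.
func stripHugoFrontMatter(doc string) string {
	lines := strings.Split(doc, "\n")
	if strings.TrimSpace(lines[0]) != "---" {
		return doc // no front matter at the top of the document
	}
	for i := 1; i < len(lines); i++ {
		if strings.TrimSpace(lines[i]) == "---" {
			return strings.Join(lines[i+1:], "\n")
		}
	}
	return doc // unterminated block: leave the document untouched
}

func main() {
	doc := "---\ntitle: Citation document\n---\n" +
		"---\nbibliography: assets/bibliography.bib\n...\n" +
		"This is a citation: @Doe2022\n"
	// Prints the second metadata block followed by the body.
	fmt.Print(stripHugoFrontMatter(doc))
}
```

In this sketch only the first block is removed, so the `bibliography` setting in the second block stays in the text and Pandoc can read it as document metadata.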

You can also add all elements from a bibliography file (without citing them explicitly) using:

```
---
title: My Publications
---
---
bibliography: assets/bibliography.bib
nocite: |
  @*
...
```

It is also possible to provide a custom [CSL style](https://citationstyles.org/authors/) by adding `csl: path-to-style.csl` to the Pandoc metadata block.
6 changes: 6 additions & 0 deletions docs/content/en/content-management/formats.md
@@ -111,6 +111,12 @@ Hugo passes these CLI flags when calling the Pandoc executable:
--mathjax
```

If your Pandoc version is 2.11 or later, Hugo also passes this CLI flag:

```text
--citeproc
```

[Pandoc]: https://pandoc.org/

### reStructuredText
46 changes: 45 additions & 1 deletion markup/pandoc/convert.go
@@ -15,10 +15,12 @@
package pandoc

import (
"bytes"
"sync"

"github.com/gohugoio/hugo/common/hexec"
"github.com/gohugoio/hugo/htesting"
"github.com/gohugoio/hugo/identity"

"github.com/gohugoio/hugo/markup/converter"
"github.com/gohugoio/hugo/markup/internal"
)
@@ -64,6 +66,9 @@ func (c *pandocConverter) getPandocContent(src []byte, ctx converter.DocumentCon
return src, nil
}
args := []string{"--mathjax"}
if supportsCitations(c.cfg) {
args = append(args, "--citeproc")
}
return internal.ExternallyRenderContent(c.cfg, ctx, src, binaryName, args)
}

@@ -76,6 +81,45 @@ func getPandocBinaryName() string {
return ""
}

var (
	pandocSupportsCiteprocOnce sync.Once
	pandocSupportsCiteproc     bool
)

// getPandocSupportsCiteproc runs pandoc with --dump-args to determine whether
// it knows the --citeproc argument. The probe runs only once and its result is
// cached. A failure to create or run the command is treated as "not supported"
// rather than reported, so the returned error is always nil.
func getPandocSupportsCiteproc(cfg converter.ProviderConfig) (bool, error) {
	pandocSupportsCiteprocOnce.Do(func() {
		// Capture stdout; the dumped arguments themselves are not needed.
		var out bytes.Buffer
		argsv := []any{"--dump-args", "--citeproc", hexec.WithStdout(&out)}

		cmd, err := cfg.Exec.New(pandocBinary, argsv...)
		if err != nil {
			return // pandocSupportsCiteproc stays false
		}
		if err := cmd.Run(); err != nil {
			return // pandoc exits with an error on an unknown option
		}
		pandocSupportsCiteproc = true
	})

	return pandocSupportsCiteproc, nil
}

// supportsCitations returns true if citeproc is available
func supportsCitations(cfg converter.ProviderConfig) bool {
if Supports() {
supportsCiteproc, err := getPandocSupportsCiteproc(cfg)
return supportsCiteproc && err == nil
}
return false
}

// Supports returns whether Pandoc is installed on this computer.
func Supports() bool {
hasBin := getPandocBinaryName() != ""
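
The once-only capability probe used by `getPandocSupportsCiteproc` can be sketched in isolation as follows. The type `cachedProbe` and its `run` field are hypothetical names introduced for this illustration; the probe function is injected so the example runs without Pandoc installed.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// cachedProbe is a hypothetical type for illustration only.
// It runs an expensive capability check at most once and caches the answer.
type cachedProbe struct {
	once sync.Once
	ok   bool
	run  func() error // for example, a call to `pandoc --dump-args --citeproc`
}

// Supported runs the probe on first use and returns the cached result afterwards.
func (p *cachedProbe) Supported() bool {
	p.once.Do(func() { p.ok = p.run() == nil })
	return p.ok
}

func main() {
	calls := 0
	probe := &cachedProbe{run: func() error {
		calls++
		return errors.New("unknown option --citeproc") // simulate an old Pandoc
	}}
	first := probe.Supported()
	second := probe.Supported()
	fmt.Println(first, second, calls) // false false 1
}
```

Treating any probe error as "not supported" keeps the caller simple, at the cost of hiding the reason the probe failed.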
106 changes: 102 additions & 4 deletions markup/pandoc/convert_test.go
@@ -25,7 +25,7 @@ import (
qt "github.com/frankban/quicktest"
)

-func TestConvert(t *testing.T) {
+func setupTestConverter(t *testing.T) (*qt.C, converter.Converter, converter.ProviderConfig) {
if !Supports() {
t.Skip("pandoc not installed")
}
@@ -34,11 +34,109 @@ func TestConvert(t *testing.T) {
var err error
sc.Exec.Allow, err = security.NewWhitelist("pandoc")
c.Assert(err, qt.IsNil)
-p, err := Provider.New(converter.ProviderConfig{Exec: hexec.New(sc, "", loggers.NewDefault()), Logger: loggers.NewDefault()})
+cfg := converter.ProviderConfig{Exec: hexec.New(sc, "", loggers.NewDefault()), Logger: loggers.NewDefault()}
+p, err := Provider.New(cfg)
c.Assert(err, qt.IsNil)
conv, err := p.New(converter.DocumentContext{})
c.Assert(err, qt.IsNil)
-b, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
+return c, conv, cfg
}

func TestConvert(t *testing.T) {
c, conv, _ := setupTestConverter(t)
output, err := conv.Convert(converter.RenderContext{Src: []byte("testContent")})
c.Assert(err, qt.IsNil)
c.Assert(string(output.Bytes()), qt.Equals, "<p>testContent</p>\n")
}

func runCiteprocTest(t *testing.T, content string, expectContained []string, expectNotContained []string) {
c, conv, cfg := setupTestConverter(t)
if !supportsCitations(cfg) {
t.Skip("pandoc does not support citations")
}
output, err := conv.Convert(converter.RenderContext{Src: []byte(content)})
c.Assert(err, qt.IsNil)
-c.Assert(string(b.Bytes()), qt.Equals, "<p>testContent</p>\n")
for _, expected := range expectContained {
c.Assert(string(output.Bytes()), qt.Contains, expected)
}
for _, notExpected := range expectNotContained {
c.Assert(string(output.Bytes()), qt.Not(qt.Contains), notExpected)
}
}

func TestGetPandocSupportsCiteprocCallTwice(t *testing.T) {
c, _, cfg := setupTestConverter(t)

supports1, err1 := getPandocSupportsCiteproc(cfg)
supports2, err2 := getPandocSupportsCiteproc(cfg)
c.Assert(supports1, qt.Equals, supports2)
c.Assert(err1, qt.IsNil)
c.Assert(err2, qt.IsNil)
}

func TestCiteprocWithHugoMeta(t *testing.T) {
content := `
---
title: Test
published: 2022-05-30
---
testContent
`
expected := []string{"testContent"}
unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
runCiteprocTest(t, content, expected, unexpected)
}

func TestCiteprocWithPandocMeta(t *testing.T) {
content := `
---
---
---
...
testContent
`
expected := []string{"testContent"}
unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
runCiteprocTest(t, content, expected, unexpected)
}

func TestCiteprocWithBibliography(t *testing.T) {
content := `
---
---
---
bibliography: testdata/bibliography.bib
...
testContent
`
expected := []string{"testContent"}
unexpected := []string{"Doe", "Mustermann", "2022", "Treatise"}
runCiteprocTest(t, content, expected, unexpected)
}

func TestCiteprocWithExplicitCitation(t *testing.T) {
content := `
---
---
---
bibliography: testdata/bibliography.bib
...
@Doe2022
`
expected := []string{"Doe", "Mustermann", "2022", "Treatise"}
runCiteprocTest(t, content, expected, []string{})
}

func TestCiteprocWithNocite(t *testing.T) {
content := `
---
---
---
bibliography: testdata/bibliography.bib
nocite: |
  @*
...
`
expected := []string{"Doe", "Mustermann", "2022", "Treatise"}
runCiteprocTest(t, content, expected, []string{})
}
6 changes: 6 additions & 0 deletions markup/pandoc/testdata/bibliography.bib
@@ -0,0 +1,6 @@
@article{Doe2022,
author = "Jane Doe and Max Mustermann",
title = "A Treatise on Hugo Tests",
journal = "Hugo Websites",
year = "2022",
}