Add CJK Friendly Emphasis extension #529

tats-u · 2025-09-14T14:57:19Z

https://github.com/tats-u/markdown-cjk-friendly/

This reduces the need for escaped spaces.

Before (with the Escaped Space extension):

太郎は\ **「こんにちわ」**\ と言った。

After:

太郎は**「こんにちわ」**と言った。
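Here is a minimal sketch of why the escaped spaces are needed. It assumes the standard CommonMark left-/right-flanking definitions: for `*`, a run can open emphasis only if it is left-flanking, and it can close emphasis only if it is right-flanking. The Go code is illustrative and not taken from this PR. `flanking`, `isCJK`, and the `relaxed` switch are hypothetical names, and `isCJK` is a crude script-based test. The relaxed branch assumes the CJK-friendly rule amounts to accepting a CJK neighbour wherever CommonMark requires whitespace or punctuation. The real specification in the linked repository handles more cases.

```go
package main

import (
	"fmt"
	"unicode"
)

// isPunct approximates CommonMark's "Unicode punctuation character"
// (general categories P and S).
func isPunct(r rune) bool { return unicode.IsPunct(r) || unicode.IsSymbol(r) }

// isCJK is a deliberately crude, script-based stand-in; it is NOT the
// character class defined by markdown-cjk-friendly.
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

// flanking classifies a delimiter run from the runes just before and after it.
// With relaxed=false it follows the CommonMark definitions; with relaxed=true
// it also accepts a CJK neighbour where CommonMark demands whitespace or
// punctuation (an assumed simplification of the CJK-friendly rule).
func flanking(before, after rune, relaxed bool) (left, right bool) {
	okBefore := unicode.IsSpace(before) || isPunct(before) || (relaxed && isCJK(before))
	okAfter := unicode.IsSpace(after) || isPunct(after) || (relaxed && isCJK(after))
	left = !unicode.IsSpace(after) && (!isPunct(after) || okBefore)
	right = !unicode.IsSpace(before) && (!isPunct(before) || okAfter)
	return left, right
}

func main() {
	// 太郎は**「こんにちわ」**と言った。
	// The opening ** sits between は and 「; the closing ** between 」 and と.
	// For '*', "can open" = left-flanking and "can close" = right-flanking.
	for _, relaxed := range []bool{false, true} {
		canOpen, _ := flanking('は', '「', relaxed)
		_, canClose := flanking('」', 'と', relaxed)
		fmt.Printf("relaxed=%v: opener can open=%v, closer can close=%v\n",
			relaxed, canOpen, canClose)
	}
}
```

The program prints `false` for both checks under the standard rule and `true` for both under the relaxed one. は is a letter and 「 is punctuation, so the standard rule does not treat the opening run as left-flanking. The closing run, between 」 and と, fails in the same way on the other side.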

tats-u/markdown-cjk-friendly@09202e5

Martin005 · 2025-09-30T12:29:44Z

Hi @yuin. I know you are very busy, but could you please take a look at this PR and possibly merge it (and release a new version of goldmark)? 🙂
As goldmark is used in Hugo, many CJK-language websites built with Hugo would benefit from this change!

yuin · 2025-10-04T11:18:51Z

@Martin005 goldmark already has an extension that supports CJK emphasis (extension.WithEscapedSpace).
I favor the extension.WithEscapedSpace approach to making emphasis work in CJK. It is a simpler and proven approach; reStructuredText uses it.

fail
ok

(Personally, I dislike the fact that CommonMark incorporated Unicode into its specification in the first place. Hardly any other markup language does this, because Unicode brings unnecessary complexity into a spec.)

In addition, CommonMark's emphasis issues are not limited to CJK; they occur in other situations too, e.g.:

192*.168.*1.1:
- fail
- ok with escaped space

So I do not plan to merge this as built-in functionality, at least for now.

@tats-u Could you implement this as your own extension? If you need to modify the core functionality to implement your extension, please create a PR in this repository.

tats-u · 2025-10-04T13:24:13Z

> I favor the extension.WithEscapedSpace approach to making emphasis work in CJK. It is a simpler and proven approach

Markdown is written not only by humans but also by LLMs, and LLMs are much more likely to produce emphasis that fails to render in CJK languages. (Have you not seen it in ChatGPT, GitHub Copilot, or Gemini?) This is borne out by the fact that the Chinese AI chat interfaces https://github.com/lobehub/lobe-chat (★66K) and https://github.com/CherryHQ/cherry-studio (★34K) (cf. goldmark's 4.3K ★) recognized the need and adopted my remark package. VitePress 2.x also adopted my markdown-it package as an opt-out feature (enabled by default). My extension compensates for such unexpected mistakes: in most cases, writers (especially non-programmers and LLMs) no longer have to watch out for which characters trigger this strange behavior. With only the escaped space, LLMs need extra effort (instructions) to pay attention to these troublesome characters.

https://tats-u.github.io/markdown-cjk-friendly/?s16=KlnOkG8wKgAqAAwwUzCTMGswYTCPMA0wKgAqAGgwAIpjMF8wAjAKAAoAKlnOkG8wKgAqAFMwkzBrMGEwjzAqACoAaDAAimMwXzA&gfm=1&engine=markdown-it

↑ Writers do not have to pay attention to the brackets 「 and 」.

Most importantly, my extension is compatible with yours. If you hit a corner case that mine cannot handle, you can fall back on yours. I recommend enabling both, whether in goldmark or Hugo.

https://github.com/tats-u/goldmark/blob/1c5490c25066d69b6481833e97ed3d82ebedce8c/extension/cjk_test.go#L458

> 192*.168.*1.1:

If I were you, I would write 192.*168*.1.1 to emphasize 168. I do not think the CommonMark authors will change this part.
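The two placements of the asterisks can be compared with a small Go sketch of CommonMark's delimiter-run classification for `*`. For that delimiter, "can open" and "can close" coincide with left- and right-flanking. The helper names (`scanStars`, `starRun`) are hypothetical, and this is not goldmark's code. The third input is the escaped-space form: a plain space sits before the opening `*`, and a backslash sits after the closing one.

```go
package main

import (
	"fmt"
	"unicode"
)

// isPunct approximates CommonMark's "Unicode punctuation character"
// (general categories P and S).
func isPunct(r rune) bool { return unicode.IsPunct(r) || unicode.IsSymbol(r) }

// starRun records where a run of '*' starts (rune index) and what it may do.
// For '*', canOpen equals left-flanking and canClose equals right-flanking.
type starRun struct {
	pos               int
	canOpen, canClose bool
}

// scanStars classifies every run of '*' in s with the CommonMark flanking
// rules, treating the start and end of s as whitespace.
func scanStars(s string) []starRun {
	rs := []rune(s)
	var out []starRun
	for i := 0; i < len(rs); {
		if rs[i] != '*' {
			i++
			continue
		}
		j := i
		for j < len(rs) && rs[j] == '*' {
			j++
		}
		before, after := ' ', ' '
		if i > 0 {
			before = rs[i-1]
		}
		if j < len(rs) {
			after = rs[j]
		}
		left := !unicode.IsSpace(after) &&
			(!isPunct(after) || unicode.IsSpace(before) || isPunct(before))
		right := !unicode.IsSpace(before) &&
			(!isPunct(before) || unicode.IsSpace(after) || isPunct(after))
		out = append(out, starRun{pos: i, canOpen: left, canClose: right})
		i = j
	}
	return out
}

func main() {
	for _, s := range []string{
		`192*.168.*1.1`,     // placement from the example above
		`192.*168*.1.1`,     // placement suggested in the reply
		`192\ *.168.*\ 1.1`, // escaped spaces around the delimiters
	} {
		fmt.Printf("%s %+v\n", s, scanStars(s))
	}
}
```

In 192*.168.*1.1, the first run can only close and the second can only open, so no emphasis forms. In 192.*168*.1.1 the roles are reversed, and 168 can be emphasized. In the escaped form, the space before the first run makes that run left-flanking. The closing run sits between . and \ and is right-flanking because both of its neighbours are punctuation.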

> CommonMark's emphasis issues are not limited to CJK; they occur in other situations too.

> (Personally, I dislike the fact that CommonMark incorporated Unicode into its specification in the first place. Hardly any other markup language does this, because Unicode brings unnecessary complexity into a spec.)

That is like saying, "I don't like Yu Gothic UI. Such a font has quirky hiragana and katakana glyphs." Yu Gothic UI really is a crappy font, but spilled milk cannot be put back in the bowl. What we can do now is soak up the milk with rags and then clean the floor properly. We have to live with such a font, and with a specification that has become an artificial monster.

> Could you implement this as your own extension?

My extension also affects the Strikethrough extension, and I have no idea how to implement it without changes to the goldmark core. That is why I submitted this PR.

https://github.com/tats-u/goldmark/blob/1c5490c25066d69b6481833e97ed3d82ebedce8c/extension/strikethrough.go#L91-L93

https://github.com/tats-u/goldmark/blob/1c5490c25066d69b6481833e97ed3d82ebedce8c/extension/cjk_test.go#L416

yuin · 2025-10-04T16:47:04Z

First of all, I'm not denying your opinion.

It seems that you have implemented this for other libraries as plugins. Could you please do the same for goldmark?

> I have no idea how to implement it without changes to the goldmark core.

LLMs may help :) goldmark is widely used, so LLMs know a lot about it.

As a final note, if your spec suggestion is adopted as an official spec, I will implement it (or merge the PR) promptly.

tats-u · 2025-10-05T00:16:50Z

I just want to override scanDelimiter for both the standard emphasis and the GFM strikethrough; markdown-it let me do that easily. Should I fork the strikethrough extension, as with the micromark plugin? That would not be so terrible, since the extension does not change very often.
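One way to picture the requested hook is to have the emphasis and strikethrough scanners consult a single replaceable flanking rule. A CJK-friendly rule could then be swapped in for both at once. The Go sketch below is a toy model and is not goldmark's API. `FlankingRule`, `scanner`, `scanDelimiter` (here a method on the toy type), `wraps`, and `cjkRule` are hypothetical names. `cjkRule` makes a simplifying assumption about the CJK-friendly spec: it accepts a CJK neighbour where whitespace or punctuation would otherwise be required.

```go
package main

import (
	"fmt"
	"unicode"
)

// FlankingRule reports whether a delimiter run with neighbouring runes
// before/after is left-flanking and right-flanking.
// Hypothetical type for illustration; it is not part of goldmark's API.
type FlankingRule func(before, after rune) (left, right bool)

// isPunct approximates CommonMark's Unicode punctuation (categories P and S).
func isPunct(r rune) bool { return unicode.IsPunct(r) || unicode.IsSymbol(r) }

// isCJK is a crude, script-based stand-in for a real CJK character test.
func isCJK(r rune) bool {
	return unicode.In(r, unicode.Han, unicode.Hiragana, unicode.Katakana, unicode.Hangul)
}

// commonMarkRule implements the standard CommonMark flanking definitions.
func commonMarkRule(before, after rune) (bool, bool) {
	left := !unicode.IsSpace(after) &&
		(!isPunct(after) || unicode.IsSpace(before) || isPunct(before))
	right := !unicode.IsSpace(before) &&
		(!isPunct(before) || unicode.IsSpace(after) || isPunct(after))
	return left, right
}

// cjkRule additionally accepts a CJK neighbour where commonMarkRule requires
// whitespace or punctuation (a simplified assumption, not the exact spec).
func cjkRule(before, after rune) (bool, bool) {
	left, right := commonMarkRule(before, after)
	left = left || (!unicode.IsSpace(after) && isCJK(before))
	right = right || (!unicode.IsSpace(before) && isCJK(after))
	return left, right
}

// scanner is a toy inline parser: the '*' and '~' paths share one Rule, so
// replacing the rule changes emphasis and strikethrough together.
type scanner struct{ Rule FlankingRule }

// scanDelimiter measures the run of c starting at rs[i] and classifies it.
// Text boundaries count as whitespace. For '*' and '~', "can open" and
// "can close" are simply left- and right-flanking.
func (s scanner) scanDelimiter(rs []rune, i int, c rune) (length int, canOpen, canClose bool) {
	j := i
	for j < len(rs) && rs[j] == c {
		j++
	}
	before, after := ' ', ' '
	if i > 0 {
		before = rs[i-1]
	}
	if j < len(rs) {
		after = rs[j]
	}
	canOpen, canClose = s.Rule(before, after)
	return j - i, canOpen, canClose
}

// wraps reports whether the first run of c in text can open and the last run
// can close. It is a toy check, not a full delimiter-matching algorithm.
func (s scanner) wraps(text string, c rune) bool {
	rs := []rune(text)
	var opens, closes []bool
	for i := 0; i < len(rs); {
		if rs[i] != c {
			i++
			continue
		}
		n, o, cl := s.scanDelimiter(rs, i, c)
		opens, closes = append(opens, o), append(closes, cl)
		i += n
	}
	return len(opens) >= 2 && opens[0] && closes[len(closes)-1]
}

func main() {
	texts := map[rune]string{
		'*': "太郎は**「こんにちわ」**と言った。",
		'~': "太郎は~~「こんにちわ」~~と言った。",
	}
	rules := []struct {
		name string
		rule FlankingRule
	}{{"commonmark", commonMarkRule}, {"cjk", cjkRule}}
	for _, r := range rules {
		s := scanner{Rule: r.rule}
		for _, c := range []rune{'*', '~'} {
			fmt.Printf("%s %c: %v\n", r.name, c, s.wraps(texts[c], c))
		}
	}
}
```

With `commonMarkRule`, neither `**` nor `~~` wraps 「こんにちわ」. With `cjkRule`, both do. This illustrates why a change confined to emphasis would leave strikethrough behaving differently on the same text.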

Ravlen · 2025-10-14T03:22:03Z

@yuin Regarding this reply: #529 (comment)

> I favor the extension.WithEscapedSpace approach to making emphasis work in CJK. It is a simpler and proven approach; reStructuredText uses it.

I disagree that this is simpler, as it ends up creating two different standards for Markdown formatting. We are translating a few thousand Markdown files into multiple languages using human translators, along with other tooling that includes automation, scripts, and AI. If this emphasis issue is not fixed, we would unexpectedly need a different set of Markdown syntax rules for each language to make it work with goldmark.

So, with the current workaround, when translating:

- English -> French: use standard Markdown.
- English -> Japanese: start inserting escaped spaces (\ ) around emphasis, but only where the emphasis is also adjacent to punctuation.
- English -> German: use standard Markdown.
- English -> Chinese: start inserting escaped spaces (\ ) around emphasis, but only where the emphasis is also adjacent to punctuation.

This is very inconsistent and instead of making things simpler, it creates a huge amount of complexity.

While I cannot verify the code change here (I am not a programmer), enabling one standard for emphasis regardless of the language is logical to me and feels like the best solution for everyone.

Also, regarding the view of extension.WithEscapedSpace as "an extension that supports CJK emphasis": I feel that it is not real CJK support, because:

- As you suggest, it is for edge cases unrelated to any of the CJK languages, like 192*.168.*1.1.
- It is not required in all cases, only in the punctuation edge cases.
- It feels like we need to double-escape things. We first add spaces as a kind of escape from the formatting issue, then we need to escape that escape with a \. That does not feel like official support.
- CommonMark itself says that inserting spaces around emphasis syntax breaks the spec, so inserting spaces (even escaped ones) feels very much like an anti-pattern: https://spec.commonmark.org/0.30/#example-353
- The extremely popular markdownlint strictly prevents adding spaces within emphasis markers: https://github.com/DavidAnson/markdownlint/blob/main/doc/md037.md
  - So inserting spaces, even escaped spaces, causes markdownlint to fail. We would need two different markdownlint configurations, enabling or disabling that check depending on the language. On top of that, if the check is disabled, markdownlint would no longer warn about spaces around delimiters in any case, even the ones we do want to flag.

yuin · 2025-10-14T05:00:55Z

@Ravlen

Again,

> First of all, I'm not denying your opinion.

I'm not going to discuss the spec here.
goldmark is a CommonMark parser. You can discuss the spec with the CommonMark project instead.

goldmark is an extensible library.
I'm not going to implement non-standard specs in its core anymore.

yuin · 2025-10-14T15:12:17Z

There are people who do not understand no matter how many times I explain, so I will write this again.

I am not denying your opinions.

goldmark is a CommonMark parser, and anything that is not in the CommonMark spec will not be implemented in the core or the builtin extensions any further.

The existing built-in extensions exist only for the following reasons:

- Historical reasons: goldmark was originally designed as a replacement for blackfriday, so features included in blackfriday were built-in.
- Implementation reasons: Due to the architecture, implementing them as extensions would require major modifications to the core.

There are many PRs that have not been merged for the same reasons as this PR. This is the design philosophy of goldmark.

I think only those who have fully implemented a CommonMark parser can understand this, but the CommonMark spec is extremely complex—even compared to other lightweight markup languages (such as reStructuredText, Asciidoc, etc.). Why has CommonMark become so complex? I do not want to make the goldmark core any more complicated.

Also, as with this PR, I often receive requests like, "I think this spec is better, so please do it like this." It is not possible to fulfill all of these requests at the same time. In fact, trying to fulfill all such requests has led to the proliferation of Markdown-like languages and has broken compatibility.

Therefore, goldmark is designed to be extensible. If you do not like the CommonMark specification or the existing extensions, please feel free to implement your own extensions. If you absolutely need to add functionality to the core in order to implement an external extension, please submit a PR for only that part, and I will incorporate it as much as possible.

Also, I do not intend to have discussions about specifications in the goldmark repository. Specifications that have been incorporated into the CommonMark spec will be implemented as quickly as possible. If you want a specification to be included in the CommonMark spec, please discuss it in the CommonMark spec repository. Specifications that are not included in the CommonMark spec should basically be implemented as external extensions.

Add CJK Friendly Emphasis extension

445795f

tats-u force-pushed the cjk-friendly branch from 49e0150 to 445795f on September 14, 2025 15:02

tats-u mentioned this pull request Sep 21, 2025

Rename cjkFriendly to cjkFriendlyEmphasis vuejs/vitepress#4952

Closed

4 tasks

Update IsCJK to Unicode 17

1c5490c

tats-u/markdown-cjk-friendly@09202e5

tats-u force-pushed the cjk-friendly branch from 7773f83 to 1c5490c on September 21, 2025 15:08

yuin closed this Oct 14, 2025

Repository owner locked as off-topic and limited conversation to collaborators Oct 14, 2025

4 participants
