Add CJK Friendly Emphasis extension #529
|
Hi @yuin. I know you are super-busy. But could you please take a look at this PR and possibly merge it (and release a new version of Goldmark)? 🙂 |
|
@Martin005 goldmark already has an extension that supports CJK emphasis ( (Personally, I don't like the fact that CommonMark incorporates Unicode into its specification in the first place. Such markup languages hardly exist, since Unicode brings unnecessary complexity into a spec.) In addition, CommonMark emphasis issues are not limited to CJK; they occur in other situations too, e.g.
So I do not plan to merge this as built-in functionality, at least for now. @tats-u Could you implement this as your own extension? If you need to modify the core functionality to implement your extension, please create a PR in this repository. |
Those who write Markdown are not only humans but also LLMs. LLMs are much more likely to make such a mistake in CJK languages. (Have you not seen it in ChatGPT, GitHub Copilot, or Gemini?) This is evidenced by the fact that two Chinese AI chat interfaces, https://github.com/lobehub/lobe-chat (★66K) and https://github.com/CherryHQ/cherry-studio (★34K) (cf. goldmark's ★4.3K), understood the necessity of my remark package and adopted it. VitePress 2.x has also adopted my markdown-it package as opt-out (enabled by default). My extension makes up for such unexpected mistakes: in most cases, writers (especially non-programmers and LLMs) do not have to be careful about which characters cause this strange behavior. With only the escaped space, LLMs need extra instructions to pay attention to such trouble-making characters. ↑ Writers do not have to pay attention to the brackets 「 and 」. Most importantly, my extension is compatible with yours. If you meet a corner case that mine cannot handle, you can rely on yours. I recommend enabling both in goldmark or Hugo.
I would write
It is something like "I don't like Yu Gothic UI. Such a font has quirky hiragana and katakana glyphs.". Yu Gothic UI is a really crappy font but spilled milk cannot be put back in the bowl. What we can do now is to soak up the milk with rags and then clean the floor properly. We have to deal with such a font and a specification like an artificial monster.
My extension interferes with the Strikethrough extension. I have no idea how to implement it without relying on goldmark main body. This is why I submitted this PR. |
|
First of all, I'm not denying your opinion. It seems that you have implemented this for other libraries as plugins. Could you please do the same for goldmark?
LLMs may help :) goldmark is widely used, so LLMs know a lot about goldmark. As a final note, if your spec suggestion is adopted as an official spec, I will implement it (or merge a PR) promptly. |
|
I just want to take over scanDelimiter for both standard emphasis and GFM strikethrough. markdown-it easily allowed me to do that. Should I fork the strikethrough extension, as with the micromark plugin? That would not be so terrible, since it does not change very frequently. |
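For readers unfamiliar with what a delimiter scanner has to decide, here is a minimal, stand-alone sketch of the left-flanking and right-flanking classification from the CommonMark spec, which is where the CJK problem discussed in this thread originates. It is not goldmark's actual scanDelimiter code and not the implementation in this PR; the helper names (`isPunct`, `isSpace`, `leftFlanking`, `rightFlanking`) are hypothetical, a rune value of 0 stands in for the start or end of a line, and `unicode.IsSpace` is only an approximation of the spec's "Unicode whitespace".

```go
package main

import (
	"fmt"
	"unicode"
)

// isPunct approximates CommonMark's "Unicode punctuation character"
// (general categories P and S in recent spec versions).
func isPunct(r rune) bool {
	return unicode.IsPunct(r) || unicode.IsSymbol(r)
}

// isSpace treats 0 (used here for start/end of line) as whitespace.
// unicode.IsSpace is an approximation of the spec's definition.
func isSpace(r rune) bool {
	return r == 0 || unicode.IsSpace(r)
}

// leftFlanking reports whether a delimiter run between prev and next
// is left-flanking, i.e. whether a run of `*` may open emphasis.
func leftFlanking(prev, next rune) bool {
	if isSpace(next) {
		return false
	}
	return !isPunct(next) || isSpace(prev) || isPunct(prev)
}

// rightFlanking reports whether a delimiter run between prev and next
// is right-flanking, i.e. whether a run of `*` may close emphasis.
func rightFlanking(prev, next rune) bool {
	if isSpace(prev) {
		return false
	}
	return !isPunct(prev) || isSpace(next) || isPunct(next)
}

func main() {
	// Closing run in `**test.** test`: punctuation before, space after.
	fmt.Println(rightFlanking('.', ' ')) // true
	// Closing run in `**テスト。**テスト`: punctuation before, letter after.
	fmt.Println(rightFlanking('。', 'テ')) // false
	// Opening run in `これは**「テスト」**です`: letter before, bracket after.
	fmt.Println(leftFlanking('は', '「')) // false
}
```

English text usually has a space next to the delimiter run, so the rule works as intended. CJK text often has none, so a run adjacent to 。 or 「 」 is rejected as an opener or closer even though the writer clearly meant emphasis.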
|
@yuin Regarding this reply: #529 (comment)
I disagree that this is simpler, as it ends up creating two different standards for markdown formatting. We are translating a few thousand markdown files into multiple languages, using human translators, along with other tooling that includes automation/scripts/AI. But this emphasis issue, if not fixed, would mean an unexpected need to use a different set of markdown syntax rules for each language, to make it work with goldmark. So using the current workaround, if translating:
This is very inconsistent and instead of making things simpler, it creates a huge amount of complexity. While I cannot verify the code change here (I am not a programmer), enabling one standard for emphasis regardless of the language is logical to me and feels like the best solution for everyone. And also, regarding thinking of
|
|
There are people who do not understand no matter how many times I explain, so I will write this again. I am not denying your opinions. goldmark is a CommonMark parser, and anything that is not in the CommonMark spec will not be implemented in the core or the builtin extensions any further. The existing built-in extensions exist only for the following reasons:
There are many PRs that have not been merged for the same reasons as this PR. This is the design philosophy of goldmark. I think only those who have fully implemented a CommonMark parser can understand this, but the CommonMark spec is extremely complex—even compared to other lightweight markup languages (such as reStructuredText, Asciidoc, etc.). Why has CommonMark become so complex? I do not want to make the goldmark core any more complicated. Also, as with this PR, I often receive requests like, "I think this spec is better, so please do it like this." It is not possible to fulfill all of these requests at the same time. In fact, trying to fulfill all such requests has led to the proliferation of Markdown-like languages and has broken compatibility. Therefore, goldmark is designed to be extensible. If you do not like the CommonMark specification or the existing extensions, please feel free to implement your own extensions. If you absolutely need to add functionality to the core in order to implement an external extension, please submit a PR for only that part, and I will incorporate it as much as possible. Also, I do not intend to have discussions about specifications in the goldmark repository. Specifications that have been incorporated into the CommonMark spec will be implemented as quickly as possible. If you want a specification to be included in the CommonMark spec, please discuss it in the CommonMark spec repository. Specifications that are not included in the CommonMark spec should basically be implemented as external extensions. |
https://github.com/tats-u/markdown-cjk-friendly/
This will reduce the necessity to use the escaped space.
Before (w/ Escaped Space Extension):
After: