datajuicer/data-juicer (public repository; 6.4k stars, 367 forks)
Issues
#971 RayBasicDeduplicator's lazy-loading strategy prevents global deduplication
Status: Open · opened by macroguo-ghy on Apr 28, 2026

#965 [Bug]: KeyError: 'text_formatter'
Labels: bug (Something isn't working)
Status: Open · opened by LRY1994 on Apr 14, 2026

#941 HDFS connector
Labels: question (Further information is requested)
Status: Open · opened by ArdalanM on Mar 17, 2026

#936 [Bug] JSONStreamDatasource locks first-batch schema and fails on later null -> concrete type evolution
Status: Open · opened by Mark-Wu2003 on Mar 11, 2026

#920 Why does datajuicer not support the groupby operator in Ray mode?
Labels: question (Further information is requested)
Status: Open · opened by lsyel on Feb 25, 2026

#915 Multi-Branch Execution (DAG feature enhancement)
Labels: dj:core (issues/PRs about the core functions of Data-Juicer), enhancement (New feature or request)
Status: Open · opened by yxdyc on Feb 13, 2026

#909 Inquiry about the ETL process
Labels: question (Further information is requested)
Status: Open · opened by SeoMP on Feb 6, 2026

#879 Deduplication operators have very low CPU utilization
Labels: question (Further information is requested)
Status: Open · opened by zjuAJW on Jan 8, 2026

#878 The perplexity operator gives very large values on Chinese datasets
Labels: question (Further information is requested)
Status: Open · opened by nuaabuaa07 on Jan 7, 2026

#872 Images cannot be loaded from their paths after process
Labels: question (Further information is requested)
Status: Open · opened by liumian576 on Dec 31, 2025

#866 Long-tail problem in the video scoring pipeline on Ray
Labels: question (Further information is requested)
Status: Open · opened by INBreezefall on Dec 23, 2025

#848 Support HDFS or Iceberg data sources
Labels: dj:core (issues/PRs about the core functions of Data-Juicer), dj:dataset (issues/PRs about the dj-dataset), good first issue (Good for newcomers), question (Further information is requested)
Status: Open · opened by gao-xiao-long on Dec 12, 2025