Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version:
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
milvus-mixcoord log:
[2025/11/03 09:30:45.142 +00:00] [INFO] [datacoord/server.go:345] ["init rootcoord client done"]
[2025/11/03 09:30:45.142 +00:00] [ERROR] [datacoord/server.go:525] ["chunk manager init failed"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/internal/datacoord.(*Server).newChunkManagerFactory\n\t/workspace/source/internal/datacoord/server.go:525\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).initDataCoord\n\t/workspace/source/internal/datacoord/server.go:350\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).Init\n\t/workspace/source/internal/datacoord/server.go:336\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).init\n\t/workspace/source/internal/distributed/datacoord/service.go:129\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).Run\n\t/workspace/source/internal/distributed/datacoord/service.go:256\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/workspace/source/cmd/components/data_coord.go:52\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
[2025/11/03 09:30:45.142 +00:00] [ERROR] [datacoord/service.go:130] ["dataCoord init error"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).init\n\t/workspace/source/internal/distributed/datacoord/service.go:130\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).Run\n\t/workspace/source/internal/distributed/datacoord/service.go:256\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/workspace/source/cmd/components/data_coord.go:52\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
[2025/11/03 09:30:45.142 +00:00] [ERROR] [components/data_coord.go:53] ["DataCoord starts error"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/workspace/source/cmd/components/data_coord.go:53\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
panic: Endpoint url cannot have fully qualified paths.
goroutine 250 [running]:
panic({0x583f9e0?, 0xc0010a1800?})
/usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc000047f70 sp=0xc000047ec0 pc=0x1e65bec
github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
/workspace/source/cmd/roles/roles.go:122 +0x108 fp=0xc000047fe0 sp=0xc000047f70 pc=0x4e8b328
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x1e9f441
created by github.com/milvus-io/milvus/cmd/roles.runComponent[...] in goroutine 1
/workspace/source/cmd/roles/roles.go:113 +0x138
root@k8s-master-01:/data/zhizixiyuan/poc-code/middleware/milvus# kubectl get pods -n mw-ecmas |grep mil
milvus-attu-69c8bcfb8d-fv6wd 1/1 Running 0 17m
milvus-datanode-6f84994c7-xtk5c 0/1 Running 5 (67s ago) 17m
milvus-etcd-0 1/1 Running 0 3m17s
milvus-etcd-1 1/1 Running 0 4m19s
milvus-etcd-2 1/1 Running 0 5m22s
milvus-indexnode-79676c658d-rvfsx 1/1 Running 1 (17m ago) 17m
milvus-minio-0 1/1 Running 0 17m
milvus-minio-1 1/1 Running 0 17m
milvus-minio-2 1/1 Running 0 17m
milvus-minio-3 1/1 Running 0 17m
milvus-mixcoord-697f5684cc-7fvd2 0/1 CrashLoopBackOff 7 (2m54s ago) 13m
milvus-proxy-5654f85d9c-rm6w8 0/1 Running 5 (35s ago) 17m
milvus-pulsar-bookie-0 1/1 Running 0 17m
milvus-pulsar-bookie-1 1/1 Running 0 17m
milvus-pulsar-bookie-2 1/1 Running 0 17m
milvus-pulsar-broker-0 1/1 Running 0 17m
milvus-pulsar-proxy-0 1/1 Running 0 17m
milvus-pulsar-recovery-0 1/1 Running 0 17m
milvus-pulsar-zookeeper-0 1/1 Running 0 17m
milvus-pulsar-zookeeper-1 1/1 Running 0 17m
milvus-pulsar-zookeeper-2 1/1 Running 0 16m
milvus-querynode-6d9d87dfdb-q4wg6 0/1 CrashLoopBackOff 8 (51s ago) 17m
Expected Behavior
milvus-mixcoord starts and runs normally.
Steps To Reproduce
Summary of Reproduction Steps
To reproduce issues in Milvus, such as metadata-storage mismatches, you need to set up a controlled environment, induce specific failures (e.g., partial data deletion), and observe the system's behavior. Below is a step-by-step guide.
Set Up a Test Milvus Cluster
- Deploy a minimal Milvus cluster using Docker Compose or Kubernetes (e.g., via Helm). For example, with Docker Compose, use the official docker-compose.yml from the Milvus documentation. Ensure dependencies like etcd (for metadata) and MinIO (for object storage) are included.
- Verify the cluster is healthy by checking component logs and running a basic status command (e.g., curl http://localhost:9091/healthz for Milvus standalone). If using Kubernetes, confirm all pods are in the "Running" state, particularly etcd, MinIO, and Milvus components like rootcoord, datacoord, and querynode.
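The pod check described above can be sketched in Python as follows. This is a minimal illustration, not part of Milvus or kubectl: the names PodStatus, parse_pods, and unhealthy are hypothetical, and the parser assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS first).

```python
from typing import List, NamedTuple


class PodStatus(NamedTuple):
    name: str
    ready: str   # e.g. "0/1"
    status: str  # e.g. "CrashLoopBackOff"


def parse_pods(kubectl_output: str) -> List[PodStatus]:
    """Parse rows of `kubectl get pods` output; a header row, if present, is skipped."""
    pods = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        pods.append(PodStatus(fields[0], fields[1], fields[2]))
    return pods


def unhealthy(pods: List[PodStatus]) -> List[PodStatus]:
    """Return pods that are not Running or whose containers are not all ready."""
    bad = []
    for pod in pods:
        ready, _, total = pod.ready.partition("/")
        if pod.status != "Running" or ready != total:
            bad.append(pod)
    return bad


if __name__ == "__main__":
    # Three rows copied from the pod listing in this report.
    sample = (
        "milvus-etcd-0 1/1 Running 0 3m17s\n"
        "milvus-mixcoord-697f5684cc-7fvd2 0/1 CrashLoopBackOff 7 (2m54s ago) 13m\n"
        "milvus-proxy-5654f85d9c-rm6w8 0/1 Running 5 (35s ago) 17m\n"
    )
    for pod in unhealthy(parse_pods(sample)):
        print(pod.name, pod.ready, pod.status)
```

A pod that reports "Running" with READY 0/1 is still flagged, which matters here because several pods in the listing are Running but not ready.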
Configure Data Operations to Induce Inconsistency
- Create a collection and insert sample data using a Milvus SDK (e.g., PyMilvus). For instance, define a schema, insert vectors, and build an index. This populates metadata in etcd and files in MinIO.
- Simulate a data inconsistency scenario:
  - Manually delete specific data files in MinIO that correspond to inserted segments (e.g., with the MinIO client's mc rm command), but avoid touching etcd. This mimics accidental deletion or GC conflicts.
  - Alternatively, corrupt etcd entries by directly modifying or deleting keys (e.g., using etcdctl del on segment metadata) while leaving the MinIO files intact. This tests metadata corruption.
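The data-population step in this section could look roughly like the PyMilvus sketch below. The collection name, dimension, row count, and URI are placeholder assumptions, and make_rows and populate_and_search are hypothetical helpers. It uses the MilvusClient quick-setup path, and the flush call assumes a recent PyMilvus release that exposes flush on MilvusClient.

```python
import random
from typing import Dict, List

COLLECTION = "repro_demo"  # hypothetical collection name
DIM = 8                    # arbitrary small dimension for a test


def make_rows(count: int, dim: int, seed: int = 0) -> List[Dict]:
    """Build deterministic sample rows with an integer id and a float vector."""
    rng = random.Random(seed)
    return [
        {"id": i, "vector": [rng.random() for _ in range(dim)]}
        for i in range(count)
    ]


def populate_and_search(uri: str = "http://localhost:19530") -> None:
    """Create a collection, insert rows, flush, and run one search."""
    # Imported here so make_rows can be used without PyMilvus installed.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri)
    if client.has_collection(COLLECTION):
        client.drop_collection(COLLECTION)
    # Quick setup: creates "id" and "vector" fields and loads the collection.
    client.create_collection(collection_name=COLLECTION, dimension=DIM)

    rows = make_rows(1000, DIM)
    client.insert(collection_name=COLLECTION, data=rows)
    # Flushing seals the data so segment files are written to object storage.
    client.flush(collection_name=COLLECTION)

    hits = client.search(
        collection_name=COLLECTION, data=[rows[0]["vector"]], limit=3
    )
    print(hits)
```

Call populate_and_search() against a test cluster only. Re-running the search after deleting files or keys as described above is one way to observe how the inconsistency surfaces.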
Trigger Operations That Expose the Issue
- Perform operations that rely on the inconsistent data, such as searching or loading collections. For example, use a PyMilvus script to execute a search query on the affected collection.
- Monitor the logs of Milvus components (e.g., querynode, datacoord) for errors like "No such key" (indicating missing MinIO files) or "segment not found" (indicating metadata issues). Use commands like kubectl logs -f [pod-name] to track errors in real time.
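As a rough aid for the log-monitoring step, the sketch below pulls ERROR entries and their error fields out of Milvus log lines. The regular expressions assume the bracketed layout seen in the logs in this report ([time] [LEVEL] [file:line] ["message"] [error="..."]); find_errors is a hypothetical helper, not a Milvus tool.

```python
import re
from typing import Dict, Iterable, List

# Leading fields of a log line: [time] [LEVEL] [file:line] ["message"]
LINE_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] '
    r'\[(?P<source>[^\]]+)\] \["(?P<message>[^"]*)"\]'
)
# Optional [error="..."] field; backslash-escaped characters are allowed inside.
ERROR_RE = re.compile(r'\[error="(?P<error>(?:[^"\\]|\\.)*)"\]')


def find_errors(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return one dict per ERROR line with its source, message, and error text."""
    found = []
    for line in lines:
        head = LINE_RE.match(line)
        if head is None or head.group("level") != "ERROR":
            continue
        err = ERROR_RE.search(line)
        found.append(
            {
                "source": head.group("source"),
                "message": head.group("message"),
                "error": err.group("error") if err else "",
            }
        )
    return found
```

The function accepts any iterable of lines, so it can read a saved log file or sys.stdin when kubectl logs output is piped into a script.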
Document and Validate the Reproduction
- Record the exact conditions, including the Milvus version, configuration (e.g., GC settings in milvus.yaml), and the sequence of operations. Tools like milvus-backup can help capture the state before and after the issue.
- Reproduce the issue multiple times to ensure consistency. Vary parameters like data size or GC intervals to test robustness. For instance, adjust dataCoord.gc.interval to see if shorter intervals increase inconsistency frequency.
Note: These steps are based on common practices for reproducing Milvus issues. Always test in a non-production environment first. If you need specifics for a particular error (e.g., version-specific commands), provide additional details for tailored steps.
Milvus Log
[2025/11/03 09:30:45.142 +00:00] [INFO] [datacoord/server.go:345] ["init rootcoord client done"]
[2025/11/03 09:30:45.142 +00:00] [ERROR] [datacoord/server.go:525] ["chunk manager init failed"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/internal/datacoord.(*Server).newChunkManagerFactory\n\t/workspace/source/internal/datacoord/server.go:525\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).initDataCoord\n\t/workspace/source/internal/datacoord/server.go:350\ngithub.com/milvus-io/milvus/internal/datacoord.(*Server).Init\n\t/workspace/source/internal/datacoord/server.go:336\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).init\n\t/workspace/source/internal/distributed/datacoord/service.go:129\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).Run\n\t/workspace/source/internal/distributed/datacoord/service.go:256\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/workspace/source/cmd/components/data_coord.go:52\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
[2025/11/03 09:30:45.142 +00:00] [ERROR] [datacoord/service.go:130] ["dataCoord init error"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).init\n\t/workspace/source/internal/distributed/datacoord/service.go:130\ngithub.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).Run\n\t/workspace/source/internal/distributed/datacoord/service.go:256\ngithub.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/workspace/source/cmd/components/data_coord.go:52\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
[2025/11/03 09:30:45.142 +00:00] [ERROR] [components/data_coord.go:53] ["DataCoord starts error"] [error="Endpoint url cannot have fully qualified paths."] [stack="github.com/milvus-io/milvus/cmd/components.(*DataCoord).Run\n\t/workspace/source/cmd/components/data_coord.go:53\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
panic: Endpoint url cannot have fully qualified paths.
goroutine 250 [running]:
panic({0x583f9e0?, 0xc0010a1800?})
/usr/local/go/src/runtime/panic.go:1017 +0x3ac fp=0xc000047f70 sp=0xc000047ec0 pc=0x1e65bec
github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1()
/workspace/source/cmd/roles/roles.go:122 +0x108 fp=0xc000047fe0 sp=0xc000047f70 pc=0x4e8b328
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000047fe8 sp=0xc000047fe0 pc=0x1e9f441
created by github.com/milvus-io/milvus/cmd/roles.runComponent[...] in goroutine 1
/workspace/source/cmd/roles/roles.go:113 +0x138
Anything else?
Milvus was deployed with Helm, and the Helm configuration file was not modified; the issue occurs with this default configuration.
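One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: the message resembles the endpoint validation done by S3-compatible clients such as the MinIO Go client, which reject an object storage endpoint whose URL carries a path component. The Python sketch below approximates that kind of check so that a candidate endpoint value can be tested. The function name and the example host names are hypothetical, and the actual Milvus code path may differ.

```python
from urllib.parse import urlparse


def endpoint_has_path(endpoint: str, secure: bool = False) -> bool:
    """Return True if the endpoint parses with a non-empty path once a scheme is prepended."""
    scheme = "https" if secure else "http"
    parsed = urlparse(f"{scheme}://{endpoint}")
    return parsed.path not in ("", "/")


if __name__ == "__main__":
    # Hypothetical endpoint values: plain host:port, one with a scheme, one with a path.
    for candidate in (
        "milvus-minio:9000",
        "http://milvus-minio:9000",
        "milvus-minio:9000/milvus",
    ):
        print(candidate, endpoint_has_path(candidate))
```

Under this assumption, an endpoint that already includes a scheme or a trailing path would be rejected, so the rendered object storage address and port in the deployed Milvus configuration are worth inspecting.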