[Performance] CUDA provider inference on dynamic ONNX models is very slow

Describe the issue

Performance Test PC

Hardware	Summary
OS	Windows 11 Pro, version 25H2
CPU	Intel Core Ultra 9 285K, 3.7 GHz
RAM	128 GB DDR5, 4400 MT/s
GPU	NVIDIA RTX 5090, 32 GB, CUDA 12.9
Storage	2 TB SSD

Performance Test Data

Images: 60 (each 1180x92)

PP-OCR models: ch_PP-OCRv5_det_mobile, ch_PP-OCRv5_rec_mobile, ch_PP-LCNet_x0_25_textline_ori_cls_mobile
The PP-OCR models were exported to ONNX format (see "Obtaining ONNX Models"); they can also be downloaded from the rapid-ocr Model List.

ONNX Runtime inference tool: RapidOCRSharpOnnx

CUDA Inference

Microsoft.ML.OnnxRuntime.Gpu.Windows Version="1.24.4"

Time: 00:01:03.3999016 (hh:mm:ss, about 63.4 s)

DirectML Inference

Microsoft.ML.OnnxRuntime.DirectML Version="1.24.4"

Time: 00:00:11.1410425 (hh:mm:ss, about 11.1 s)

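With the CUDA execution provider, inference on these dynamic ONNX models is very slow: about 63.4 s for the 60 images, versus about 11.1 s with DirectML (roughly 5.7 times slower). DirectML runs at normal speed.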
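To put the two totals on a per-image basis, the arithmetic can be sketched as below. This is only a small helper built around the figures reported above; the class and method names (`TimingSummary`, `SecondsPerImage`) are made up for illustration and are not part of RapidOCRSharpOnnx or ONNX Runtime.

```csharp
using System;
using System.Globalization;

public static class TimingSummary
{
    // Average seconds per image for a batch that took `total` to process.
    public static double SecondsPerImage(TimeSpan total, int imageCount) =>
        total.TotalSeconds / imageCount;

    public static void Main()
    {
        const int imageCount = 60; // from "Performance Test Data"

        // Elapsed times exactly as reported in this issue (hh:mm:ss.fffffff).
        TimeSpan cuda = TimeSpan.Parse("00:01:03.3999016", CultureInfo.InvariantCulture);
        TimeSpan directMl = TimeSpan.Parse("00:00:11.1410425", CultureInfo.InvariantCulture);

        Console.WriteLine($"CUDA:     {SecondsPerImage(cuda, imageCount):F3} s/image");
        Console.WriteLine($"DirectML: {SecondsPerImage(directMl, imageCount):F3} s/image");
        Console.WriteLine($"CUDA / DirectML: {cuda.TotalSeconds / directMl.TotalSeconds:F2}x");
    }
}
```

With the reported numbers this works out to a little over 1 s per image on CUDA and a little under 0.2 s per image on DirectML.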

To reproduce

dotnet add package RapidOCRSharpOnnx
dotnet add package OpenCvSharp4.runtime.win
dotnet add package Microsoft.ML.OnnxRuntime.Gpu.Windows

CUDA Inference

        // Requires: using System.Diagnostics; using System.IO; using System.Linq;
        // _deviceId and ReceiveResult are defined elsewhere in the test class.
        private static void TestParallelBatch()
        {
            string detectPath = @"C:\deeplearning\gitCode\meloht\RapidOCRSharpOnnx\RapidOCRSharpOnnx.TestCommon\Models\ch_PP-OCRv5_det_mobile.onnx";
            string recogPath = @"C:\deeplearning\gitCode\meloht\RapidOCRSharpOnnx\RapidOCRSharpOnnx.TestCommon\Models\ch_PP-OCRv5_rec_mobile.onnx";
            string clsPath = @"C:\deeplearning\gitCode\meloht\RapidOCRSharpOnnx\RapidOCRSharpOnnx.TestCommon\Models\ch_PP-LCNet_x0_25_textline_ori_cls_mobile.onnx";

            // Load the detection, recognition and classification models on the CUDA execution provider.
            using RapidOCRSharp ocr = new RapidOCRSharp(new ExecutionProviderCUDA(new OcrConfig(detectPath, recogPath, LangRec.CH, OCRVersion.PPOCRV5, clsPath), _deviceId));
            var list = Directory.GetFiles(@"C:\FtpFiles\OCRTestImages");

            // Time the whole batch of test images.
            Stopwatch sw = Stopwatch.StartNew();
            var resPath = ocr.BatchParallelAsync(list.ToList(), receiveAction: ReceiveResult);
            sw.Stop();
            // Elapsed is printed as hh:mm:ss.fffffff.
            Console.WriteLine($"BatchAsync Time: {sw.Elapsed}");

            Console.WriteLine("end");
        }

DirectML Inference

dotnet add package Microsoft.ML.OnnxRuntime.DirectML

        // Requires: using System.Diagnostics; using System.IO; using System.Linq;
        // _deviceId and ReceiveResult are defined elsewhere in the test class.
        private static void TestParallelBatch()
        {
            string detectPath = @"C:\deeplearning\gitCode\meloht\RapidOCRSharpOnnx\RapidOCRSharpOnnx.TestCommon\Models\ch_PP-OCRv5_det_mobile.onnx";
            string recogPath = @"C:\deeplearning\gitCode\meloht\RapidOCRSharpOnnx\RapidOCRSharpOnnx.TestCommon\Models\ch_PP-OCRv5_rec_mobile.onnx";
            string clsPath = @"C:\deeplearning\gitCode\meloht\RapidOCRSharpOnnx\RapidOCRSharpOnnx.TestCommon\Models\ch_PP-LCNet_x0_25_textline_ori_cls_mobile.onnx";

            // Same models and images as the CUDA test; only the execution provider differs.
            using RapidOCRSharp ocr = new RapidOCRSharp(new ExecutionProviderDirectML(new OcrConfig(detectPath, recogPath, LangRec.CH, OCRVersion.PPOCRV5, clsPath), _deviceId));
            var list = Directory.GetFiles(@"C:\FtpFiles\OCRTestImages");

            // Time the whole batch of test images.
            Stopwatch sw = Stopwatch.StartNew();
            var resPath = ocr.BatchParallelAsync(list.ToList(), receiveAction: ReceiveResult);
            sw.Stop();
            // Elapsed is printed as hh:mm:ss.fffffff.
            Console.WriteLine($"BatchAsync Time: {sw.Elapsed}");

            Console.WriteLine("end");
        }
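Both tests time a single cold run. A GPU provider's first run usually includes one-off initialization, so it may be worth measuring warm-up and steady-state time separately before comparing providers. The sketch below is one possible harness, not the measurement used in this issue: `BatchTimer` and `TimeAsync` are hypothetical names, and the workload is a stand-in delay rather than a real OCR call. If the real batch call returns a Task, it would need to be awaited in the same way so that the stopwatch covers the whole run.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class BatchTimer
{
    // Runs `work` once untimed (warm-up), then returns the mean duration
    // of `runs` awaited executions.
    public static async Task<TimeSpan> TimeAsync(Func<Task> work, int runs = 3)
    {
        await work(); // warm-up: absorbs one-off initialization costs

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
        {
            await work(); // awaited so the stopwatch covers the whole run
        }
        sw.Stop();

        return TimeSpan.FromTicks(sw.Elapsed.Ticks / runs);
    }

    public static async Task Main()
    {
        // Stand-in workload (assumption): replace with the real batch call.
        Func<Task> fakeBatch = () => Task.Delay(50);

        TimeSpan mean = await TimeAsync(fakeBatch);
        Console.WriteLine($"Mean time per run: {mean}");
    }
}
```

If the CUDA time stays far above the DirectML time even after the warm-up run, the slowdown is not just first-run initialization.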

Urgency

Platform

Windows

OS Version

Windows 11 Pro

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

Microsoft.ML.OnnxRuntime.Gpu.Windows 1.24.4

ONNX Runtime API

C#

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

Microsoft.ML.OnnxRuntime.Gpu.Windows 1.24.4

Model File

Models.zip

Is this a quantized model?

Unknown

Issue number

#28305
