Excited to announce that Ray on Anyscale is now available on CoreWeave, further expanding our multi-cloud support. This gives users access to CoreWeave’s purpose-built AI infrastructure, optimized for distributed AI at scale with low latency, fast startup times, and high-performance GPUs. It's a powerful combination for teams building and scaling production AI workloads. Read more from CoreWeave: https://lnkd.in/dN9AjA3s
I learned this the hard way while scaling an AI system: when users batch 1000+ AI tasks, your infra either scales… or burns. And trust me, it usually BURNS FIRST.

My first thought was: scaling AI = more GPUs, faster inference, better prompts. But that’s not the real bottleneck. The real problem? Task orchestration.

When multiple users trigger concurrent generations, you need to handle:
– Async execution
– Retries + failures
– Credit tracking
– Worker stability

That’s where Celery + Redis save your system:
- Redis Queue: Manages async workloads — no blocked threads, no timeouts
- Celery Workers: Scale horizontally when demand spikes
- Atomic Updates: Concurrency-safe credit + DB ops
- Caching: Stops duplicate LLM calls, saves $$ and compute

Once this pipeline clicks:
- Backend breathes
- Costs drop
- Users stay happy

Scaling AI isn’t about bigger models - it’s about smarter pipelines. You don’t need more compute. You need better architecture.
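A minimal sketch of the orchestration pattern described in the post above, assuming Celery with a Redis broker, a Redis-backed cache, and a simple Redis counter for per-user credits. The function names (run_generation, call_llm) and the credit scheme are illustrative assumptions, not a prescription for any specific production system.

```python
# tasks.py — illustrative sketch: Celery + Redis for async AI task orchestration
import hashlib

import redis
from celery import Celery

# Redis is both the Celery broker and a lightweight cache / credit store here.
app = Celery("ai_tasks", broker="redis://localhost:6379/0", backend="redis://localhost:6379/1")
cache = redis.Redis(host="localhost", port=6379, db=2)


def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call (hosted API or local inference)."""
    raise NotImplementedError


@app.task(bind=True, max_retries=3, default_retry_delay=5)
def run_generation(self, user_id: str, prompt: str) -> str:
    # Cache key derived from the prompt, so duplicate requests skip the expensive LLM call.
    key = "gen:" + hashlib.sha256(prompt.encode()).hexdigest()
    cached = cache.get(key)
    if cached:
        return cached.decode()

    # Atomic credit decrement: DECRBY is a single Redis operation, so two concurrent
    # workers cannot both spend the same credit. Assumes credits:{user_id} was
    # initialized elsewhere when the user purchased credits.
    remaining = cache.decrby(f"credits:{user_id}", 1)
    if remaining < 0:
        cache.incrby(f"credits:{user_id}", 1)  # refund and bail out
        raise ValueError("insufficient credits")

    try:
        result = call_llm(prompt)
    except Exception as exc:
        cache.incrby(f"credits:{user_id}", 1)  # refund before retrying
        raise self.retry(exc=exc)

    cache.set(key, result, ex=3600)  # cache the result for an hour
    return result
```

Workers are then started with something like `celery -A tasks worker --concurrency=8` and scaled horizontally by pointing more machines at the same Redis broker.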
𝐒𝐩𝐞𝐧𝐝𝐢𝐧𝐠 $15𝐊+/𝐦𝐨𝐧𝐭𝐡 𝐨𝐧 𝐬𝐩𝐞𝐞𝐜𝐡-𝐭𝐨-𝐭𝐞𝐱𝐭? 𝐓𝐡𝐞𝐫𝐞'𝐬 𝐚𝐧𝐨𝐭𝐡𝐞𝐫 𝐰𝐚𝐲.

As a solutions architect, I've watched customers hit cost ceilings with proprietary speech-to-text (STT) services. A contact center processing 10K hours of calls monthly can easily spend $15K+ on transcription alone.

The open-source STT landscape has matured: OpenAI Whisper, Mistral AI Voxtral, NVIDIA Parakeet-V2, Microsoft Phi-4, and NVIDIA Canary-Qwen-2.5B now rival proprietary solutions in accuracy and latency. But evaluating them for your use case is messy: dependency conflicts, inconsistent APIs, complex setup.

So I built 𝑽𝒐𝒙𝑺𝒄𝒓𝒊𝒃𝒆, a lightweight platform to test multiple open-source STT models through a single interface. #VoxScribe supports 11 STT models, including Canary-Qwen, which tops the open-source STT leaderboard on Hugging Face. Models are cached for reuse and can be tried individually or in Compare mode alongside other VoxScribe models.

𝗩𝗼𝘅𝘀𝗰𝗿𝗶𝗯𝗲:
✅ Handles dependency conflicts (transformers version hell = solved)
✅ Compares models side by side (run sequentially)
✅ FastAPI backend with clean REST endpoints
✅ Runs on an AWS g6.xlarge ($0.805/hr vs. $5K/month) and offers a fixed-cost model for large-scale transcription use cases

𝐅𝐚𝐢𝐫 𝐰𝐚𝐫𝐧𝐢𝐧𝐠: This is an MVP. There are some bugs I'm still actively fixing. But the core works, and I think it solves a real problem for teams evaluating STT solutions. If you break it, please tell me how; if you fix it, even better: PRs welcome.

Blog link in the comments. #OpenSource #AWS #SpeechToText
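For a sense of what "one interface over many STT models" can look like, here is a hedged FastAPI sketch in the same spirit. The endpoint path, the model registry, and the use of Hugging Face transformers pipelines are my own illustrative assumptions, not VoxScribe's actual code.

```python
# Illustrative FastAPI sketch of a "compare STT models behind one interface" service.
# Requires ffmpeg on the host for audio decoding by the transformers ASR pipeline.
from functools import lru_cache

from fastapi import FastAPI, File, UploadFile
from transformers import pipeline

app = FastAPI()

# Friendly name -> Hugging Face model id; extend with other open-source STT models.
MODELS = {
    "whisper-small": "openai/whisper-small",
}


@lru_cache(maxsize=4)
def load_model(name: str):
    # Cache loaded pipelines so repeated requests reuse the same weights in memory.
    return pipeline("automatic-speech-recognition", model=MODELS[name])


@app.post("/transcribe")
async def transcribe(audio: UploadFile = File(...), models: str = "whisper-small"):
    data = await audio.read()
    results = {}
    # Run the requested models one after another (side-by-side comparison, sequentially).
    for name in models.split(","):
        asr = load_model(name.strip())
        results[name.strip()] = asr(data)["text"]
    return results
```

Run it with `uvicorn app:app` and POST an audio file; on a GPU instance like the g6.xlarge mentioned above you would pass `device=0` to `pipeline()` to keep inference on the GPU.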
🆕 𝗪𝗵𝗮𝘁’𝘀 𝗻𝗲𝘄 𝘄𝗶𝘁𝗵 𝘁𝗵𝗲 𝗔𝗜 𝗛𝘆𝗽𝗲𝗿𝗰𝗼𝗺𝗽𝘂𝘁𝗲𝗿? 𝘃𝗟𝗟𝗠 𝗼𝗻 𝗧𝗣𝗨, 𝗮𝗻𝗱 𝗺𝗼𝗿𝗲 🆕

➡️ The latest quarterly update on Google Cloud’s AI Hypercomputer introduces major enhancements, including a new TPU-optimized backend for vLLM, broader model support, improved profiling tools, and tighter GPU/TPU integration.

🚀 What’s new:
• vLLM TPU: A new hardware plugin called tpu-inference brings high-performance TPU support to vLLM, unifying PyTorch and JAX workflows on TPUs with minimal code changes.
• Improved hardware/software stack: The update adds broader model coverage (Gemma, Llama, Qwen), better performance than earlier TPU backends, and an upgraded profiling library (XProf Profiler, Cloud Diagnostics XProf) for JAX and PyTorch/XLA.
• Expanded tooling and architecture flexibility: New reference recipes for disaggregated inference (e.g., NVIDIA Dynamo on Google Cloud), RL scaling workflows with NVIDIA NeMo RL, and more instrumentation for time-to-first-token (TTFT) and time-per-output-token (TPOT) metrics.

💡 Why it matters
For developers, ML engineers, and platform teams building large-scale AI systems, these updates mean you can run more open-source, multi-framework models on TPUs with less friction, get deeper insight into performance bottlenecks, and choose architectures (GPU, TPU, disaggregated inference) that better match your model and cost profile.

🔗 Read the full blog here: https://lnkd.in/dZ7vVadC

Which model-serving bottleneck is your team most focused on right now: latency, cost, hardware choice, or framework compatibility? 🤔

#GoogleCloud #AIHypercomputer #TPU #vLLM #Inference #MLInfrastructure #LargeLanguageModels
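For anyone who hasn't used vLLM: the offline Python API is just a few lines, and the point of a backend plugin like the tpu-inference package mentioned above is that this code is meant to stay the same when running on TPU hosts. Package name and install steps are per the linked blog, and the model id below is only an example.

```python
# Minimal vLLM offline-inference sketch. With a TPU backend plugin installed
# (per the AI Hypercomputer blog), the same Python API is intended to run on TPUs.
from vllm import LLM, SamplingParams

# Any supported open model works here; Gemma/Llama/Qwen are called out in the update.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain what time-to-first-token (TTFT) measures."], params)

for out in outputs:
    print(out.outputs[0].text)
```

For online serving, the same model is exposed behind an OpenAI-compatible endpoint with `vllm serve <model>`, which is where TTFT/TPOT instrumentation becomes most useful.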
Kubernetes is evolving fast for AI/ML workloads, and Google's latest GKE developments are worth noting:
• Dynamic Resource Allocation (DRA): proper GPU/TPU management in core K8s
• 65K-node clusters supporting 50K-TPU-chip training jobs
• Inference Gateway achieving 30% cost reduction and 60% lower latency
• Secondary boot disks: 29x faster container mounting for large ML images

What impresses me most: these capabilities are being driven upstream into core Kubernetes, not kept proprietary. DRA and JobSet benefit the entire ecosystem.

The infrastructure layer for AI/ML is maturing. Time to rethink how we architect these platforms.

https://lnkd.in/gCYv_u3N

#Kubernetes #MLOps #CloudArchitecture #AIInfrastructure
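The DRA item is the interesting one for context: today, GPU access in Kubernetes is usually requested as an opaque extended resource, which only supports whole-device granularity, while DRA moves toward structured, claimable device requests with sharing. Because the DRA API is still stabilizing across Kubernetes versions, the sketch below (using the official kubernetes Python client) shows only the classic extended-resource request that DRA is designed to improve on; the pod name and image are illustrative.

```python
# For contrast with Dynamic Resource Allocation (DRA): the traditional way to ask
# Kubernetes for a GPU is an opaque extended-resource limit on the container.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="gpu-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvidia/cuda:12.4.1-base-ubuntu22.04",
                command=["nvidia-smi"],
                # Whole-GPU granularity only: no sharing, no structured device selection —
                # exactly the limitations DRA is meant to address upstream.
                resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```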
Big news in AI infrastructure: Tensormesh just secured $4.5M to supercharge server efficiency with advanced KV Caching—promising up to 10x inference improvements. This could be a game-changer for enterprise AI scalability. Would these optimizations impact your AI stack? [via @TechCrunch]
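For context on the technique: KV caching stores the attention key/value tensors computed for earlier tokens so each new decoding step only computes attention for the newest query instead of re-running the whole prefix; cross-request and multi-tier reuse of that cache is where systems like the one above look for their gains. A toy PyTorch sketch of the core idea, purely illustrative and not Tensormesh's implementation:

```python
# Toy illustration of KV caching in autoregressive decoding (single attention head).
import torch

d = 64                       # head dimension
W_q = torch.randn(d, d); W_k = torch.randn(d, d); W_v = torch.randn(d, d)

k_cache = torch.empty(0, d)  # grows by one row per generated token
v_cache = torch.empty(0, d)

def attend_next(x_new: torch.Tensor) -> torch.Tensor:
    """x_new: (d,) embedding of the newest token; returns its attention output."""
    global k_cache, v_cache
    q = x_new @ W_q
    # Append the new K/V row instead of recomputing K/V for the whole prefix.
    k_cache = torch.cat([k_cache, (x_new @ W_k).unsqueeze(0)])
    v_cache = torch.cat([v_cache, (x_new @ W_v).unsqueeze(0)])
    scores = (k_cache @ q) / d ** 0.5        # (seq_len,)
    weights = torch.softmax(scores, dim=0)
    return weights @ v_cache                 # (d,)

# Each decoding step costs O(seq_len * d) with the cache, versus recomputing
# attention for every prefix position without it.
for step in range(5):
    out = attend_next(torch.randn(d))
```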
Organisations that want to run large language models (LLMs) on their own infrastructure, whether in private data centres or in the cloud, often face significant challenges related to GPU availability, capacity, and cost. Learn how to address these challenges with #RedHat #OpenShift AI. #RHAI #AI
ICYMI: Platform teams running AI and machine learning workloads will see immediate benefits from GPU sharing and dynamic allocation capabilities. By Janakiram MSV
⚙️ 𝗗𝗮𝘁𝗮𝗯𝗿𝗶𝗰𝗸𝘀 𝗷𝘂𝘀𝘁 𝘁𝗼𝗼𝗸 𝗮 𝗵𝘂𝗴𝗲 𝗹𝗲𝗮𝗽 𝗶𝗻 𝘀𝗶𝗺𝗽𝗹𝗶𝗳𝘆𝗶𝗻𝗴 𝗔𝗜 𝘄𝗼𝗿𝗸𝗹𝗼𝗮𝗱𝘀 — 𝗶𝗻𝘁𝗿𝗼𝗱𝘂𝗰𝗶𝗻𝗴 𝗦𝗲𝗿𝘃𝗲𝗿𝗹𝗲𝘀𝘀 𝗚𝗣𝗨 𝗖𝗼𝗺𝗽𝘂𝘁𝗲!

One of the biggest headaches for data and AI teams has always been managing GPU infrastructure — spinning up clusters, tuning configurations, and optimizing costs. Now, Databricks Serverless GPU Compute changes the game completely.

𝗛𝗲𝗿𝗲’𝘀 𝘄𝗵𝗮𝘁 𝗶𝘁 𝗺𝗲𝗮𝗻𝘀 𝗳𝗼𝗿 𝗱𝗮𝘁𝗮 𝗲𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝘀 𝗮𝗻𝗱 𝗠𝗟 𝘁𝗲𝗮𝗺𝘀 👇

💻 𝗡𝗼 𝗜𝗻𝗳𝗿𝗮𝘀𝘁𝗿𝘂𝗰𝘁𝘂𝗿𝗲 𝗛𝗮𝘀𝘀𝗹𝗲𝘀
You don’t need to set up or manage GPU clusters manually. Databricks automatically provisions the right GPU resources based on your workload.

⚡ 𝗜𝗻𝘀𝘁𝗮𝗻𝘁 𝗦𝗰𝗮𝗹𝗶𝗻𝗴
Workloads scale up or down automatically — whether you’re fine-tuning an LLM or running large-scale inference.

💰 𝗢𝗽𝘁𝗶𝗺𝗶𝘇𝗲𝗱 𝗖𝗼𝘀𝘁𝘀
You pay only for what you use. Serverless GPUs are automatically paused when idle, eliminating wasted spend.

🧠 𝗜𝗻𝘁𝗲𝗴𝗿𝗮𝘁𝗲𝗱 𝘄𝗶𝘁𝗵 𝗠𝗼𝘀𝗮𝗶𝗰 𝗔𝗜 & 𝗠𝗟𝗳𝗹𝗼𝘄
Train, deploy, and monitor models directly within the Databricks ecosystem — with full observability and governance through Unity Catalog.

💡 𝗪𝗵𝘆 𝘁𝗵𝗶𝘀 𝗺𝗮𝘁𝘁𝗲𝗿𝘀:
This makes AI development accessible even for teams without deep infrastructure expertise — freeing up time to focus on innovation instead of configuration.

🚀 Serverless + GPU = the best of both worlds for modern data and AI workloads.

#Databricks #Serverless #GPUCompute #AI #MLOps #DataEngineering #Lakehouse #MosaicAI
2 weeks ago, we participated in The AI Conference in SF. What a vibe!! Some takeaways:

1. AI engineers are becoming AI managers. Software engineers are becoming managers of AI agents, not just code, and are asking for stronger agent orchestration, observability, and debugging tools, not just GPUs.

2. Infra is a big budget line. Infra costs are ballooning, with more data being generated than originally budgeted for and more GPU demand than can be met. Unpredictable fees (especially egress) are getting flagged by finance teams, who want more consistent cost profiles.

3. Vendor lock-in is becoming unacceptable. More teams want to keep optionality open across compute and orchestration platforms. Platforms that offer free data movement are key to providing the flexibility needed in this growth cycle of tooling.

4. Security is top of mind. As adoption grows, so do attack vectors. Advanced teams are asking how to guarantee the integrity of LLM weights over time: how do you prove an LLM was not tampered with through data poisoning? Immutable, verifiable logs are key, and onchain attestations of LLM snapshots and weights can provide strong integrity validation.

Check out the detailed version: https://lnkd.in/eX5m-2m9
Planning your 2026 tech roadmap? WebGPU should be on your radar.📍 Browser-based GPU compute is no longer experimental. It's shipping in production browsers today. And it fundamentally changes how AI inference gets deployed. If you're planning AI features or evaluating compute infrastructure for 2026, now's the time to understand where WebGPU fits in your stack. Our latest post explores the technical architecture, real-world implementations, and when to make the transition: https://lnkd.in/gzZiSsDX #WebGPU #AI #WebGL #EnterpriseArchitecture #ClientSideAI