Avesha's Smart Scaler introduces a Reinforcement Learning-based intelligent scaling solution for AI workloads, delivering unprecedented performance gains and cost efficiencies.
BOSTON and SAN FRANCISCO, March 19, 2025 /PRNewswire-PRWeb/ -- Avesha's Smart Scaler, part of its Elastic AI Services Suite for Inference Endpoint scaling and GPU/CPU resource optimization, delivers up to 3x performance gains and reduces inference latency by 75%.
Avesha, a Gartner Cool Vendor and a leader in AI-driven GPU/CPU orchestration, today announced groundbreaking results from its latest benchmarking of Smart Scaler, which dynamically scales GPU resources in proportion to traffic, delivering up to 3x improvement in processing efficiency, 85% larger batch sizes, and 70% higher token throughput per batch for the Llama3-8B model on the HuggingFace/TGI framework. In addition, the same model showed a 2x improvement on the vLLM framework over TGI, with a further 1.5x boost from Smart Scaler alone. This enables enterprises to scale AI workloads seamlessly across multiple clusters and cloud environments without overprovisioning or wasted compute.
Smart Scaler, an advanced AI-powered predictive scaling mechanism, dynamically scales resources based on workload demand. The benchmarking results highlight key advantages for AI inferencing and training:
- Higher Instantaneous Throughput: Processed 3x more tokens in a burst, enabling faster AI inferencing with the HuggingFace/TGI framework.
- Reduced Latency: AI model inference latency dropped from 8 seconds to 2 seconds.
- Improved Throughput for Industry-Leading AI Models: Llama3-8B workloads had a 31% increase in token throughput, while DeepSeek 7B had a 13.5% boost.
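As a quick check, the latency figures in the list above (8 seconds down to 2 seconds) are consistent with the 75% latency reduction quoted in the announcement. The following is a minimal sketch of that arithmetic; the function names are illustrative only and are not part of any Avesha tooling.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the percentage drop when a value falls from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Latency figures taken from the benchmark summary: 8 s before, 2 s after.
latency_reduction = percent_reduction(8.0, 2.0)
latency_speedup = speedup(8.0, 2.0)

print(f"Latency reduction: {latency_reduction:.0f}%")
print(f"Latency speedup:   {latency_speedup:.0f}x")
```

The same helper applies to any before/after pair, so readers can plug in their own measurements when comparing against the published figures.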
Driving AI Innovation with EGS
For exciting research companies like InpharmD that combine pharmacist expertise with AI to provide state-of-the-art, evidence-based drug information, having the right tools to optimize research and reduce costs is essential.
"With Avesha's Elastic AI Services we're able to optimize our GPU workloads dynamically, ensuring we maximize performance without overpaying for underutilized resources," said Tulasee Rao Chintha, CTO, InpharmD. "This allows us to scale efficiently while keeping our research and operational costs predictable and manageable."
Benchmarking Results Validate EGS Performance
"The benchmarking results speak for themselves: Avesha is setting a new standard for AI workload efficiency for LLMs as well as scientific or specialized models," said Raj Nair, Founder and CEO at Avesha. "Avesha improves interactive performance by 85% and triples overall efficiency, making high-performance AI more accessible and cost-effective for enterprises and startups alike."
Pay-per-work-output pricing
Avesha's innovative high-performance scaling solution enables GPU Cloud Providers to offer pay-per-work-output pricing instead of traditional GPU time-based pricing, significantly reducing costs and making AI development more accessible. The performance improvement makes competitive pay-per-work-output pricing feasible through the sharing of higher-performance GPUs: with enough additional throughput, the price per unit of work output on a faster GPU falls below that of a lower-priced but slower GPU.
"With Avesha, startups no longer need to pay for idle GPU hours," added Raj Nair, Founder and CEO at Avesha. "Now, they can pay only for actual AI workloads processed, making it a game-changer for companies creating innovative AI applications while maintaining cost efficiency. We are introducing a FREE Tier for our GPU services available through OCI."
Startups can sign up for the EGS Free Tier today at the OCI Marketplace.
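The pay-per-work-output argument above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the hourly prices and token throughputs below are hypothetical numbers chosen for illustration, not Avesha, OCI, or benchmark figures, and the `GpuOffer` class is an invented helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuOffer:
    """A hypothetical GPU offering; all values are illustrative assumptions."""
    name: str
    price_per_hour: float     # USD per GPU-hour (hypothetical)
    tokens_per_second: float  # sustained token throughput (hypothetical)

    def cost_per_million_tokens(self) -> float:
        """Price per unit of work output: USD per one million tokens processed."""
        tokens_per_hour = self.tokens_per_second * 3600
        return self.price_per_hour / tokens_per_hour * 1_000_000


# A faster GPU at twice the hourly price but three times the throughput,
# compared with a cheaper, slower GPU.
fast = GpuOffer("fast-gpu", price_per_hour=4.00, tokens_per_second=3000)
slow = GpuOffer("slow-gpu", price_per_hour=2.00, tokens_per_second=1000)

for offer in (fast, slow):
    print(f"{offer.name}: ${offer.cost_per_million_tokens():.2f} per million tokens")
```

With these assumed numbers, the faster GPU works out to roughly $0.37 per million tokens against roughly $0.56 for the slower one, even though its hourly price is double. The comparison only favors the faster GPU when its throughput advantage exceeds its price premium.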
A Hybrid Pricing Model That Maximizes Value
EGS introduces a flexible pricing strategy designed to optimize costs while maintaining high-performance AI scaling:
- Value-Based Pricing – Customers pay for actual performance gains rather than static GPU time.
- On Demand/Spot Pricing – Leverages unused GPU capacity for cost savings.
- Tiered Commitments – Offers long-term cost reductions for enterprise-scale AI workloads.
- Auto-Scaling Capabilities – Dynamically adjusts GPU allocation based on real-time demand.
With this approach, GPU cloud providers also benefit by optimizing resource allocation and monetizing idle capacity efficiently.
About Avesha
Avesha is a pioneer in AI-powered GPU and CPU orchestration & scaling solutions, utilizing Kubernetes to optimize performance across diverse cloud and edge environments. As a Gartner Cool Vendor and a CNCF Sandbox project, Avesha is committed to delivering scalable, high-performance solutions that empower businesses across industries, including finance, retail, media, and healthcare.
For more information, visit www.avesha.io.
Media Contact
Olyvia Rakshit, Avesha, Inc, 1 5046122716, [email protected], www.avesha.io
SOURCE Avesha, Inc
