Step 3.7 Flash, a Chinese-developed model, has just claimed the top spot on the Artificial Analysis Output Speed Leaderboard

The latest Output Speed ranking from Artificial Analysis, a widely cited large-model evaluation platform, shows that StepFun’s newest open-source foundation model, Step 3.7 Flash, reaches an output speed of 409 tokens/s, ranking first among mainstream models. It also ranks highly on other key metrics, including end-to-end response time, intelligence versus output speed, and output speed versus price.

The models compared are Artificial Analysis's official default options

This chart compares Step 3.7 Flash with other mainstream Flash models of similar scale


As Agent applications increasingly transition from demonstration to production environments, the criteria for evaluating large models are evolving.

In the past, industry focus centered on benchmark scores and performance in specific areas such as mathematics, code generation, and reasoning. However, in Agent scenarios, a single task often involves multiple stages—including web browsing, information retrieval, document processing, multi-turn reasoning, and tool invocation—requiring models to run continuously over extended periods while frequently interacting with external systems. Under these conditions, user experience and deployment cost are determined not only by raw model capability but also by engineering factors such as response latency, inference cost, system throughput, and stability.

In other words, the era of Agents tests not just whether a model can complete tasks, but how efficiently and economically it does so.

From this perspective, the recently released Step 3.7 Flash looks like a model optimized specifically for Agent use cases. Public test data suggests that its focus is not peak capability but a balance of model performance, response speed, and inference cost. This direction closely matches current industry demand: for Agent systems that make high-frequency calls and run for long periods, per-inference cost and response latency affect real-world deployment far more than isolated benchmark scores.

In fact, this reflects a shared trend across the global large model landscape. New models launched over the past year by OpenAI, Anthropic, and Google have all emphasized inference efficiency, real-time interaction capability, and Agent execution proficiency, rather than parameter count or test-set performance alone.

Model competition is shifting from “who is smarter” toward “who can accomplish more real-world tasks at lower cost.”
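To make the latency and cost argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a simplified model in which each Agent step costs a fixed time to first token plus the output length divided by the output speed, and it ignores input processing, tool execution, and network time. Only the 409 tokens/s figure comes from the ranking cited above; the comparison model, the first-token latency, the prices, and the step and token counts are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    output_tokens_per_s: float     # sustained output (decoding) speed
    first_token_latency_s: float   # hypothetical time to first token
    usd_per_million_output: float  # hypothetical output-token price


def task_latency_s(model: ModelProfile, steps: int, tokens_per_step: int) -> float:
    """Total generation time for a task made of sequential model calls."""
    per_step = model.first_token_latency_s + tokens_per_step / model.output_tokens_per_s
    return steps * per_step


def task_cost_usd(model: ModelProfile, steps: int, tokens_per_step: int) -> float:
    """Output-token cost for the whole task (input tokens are ignored)."""
    return steps * tokens_per_step * model.usd_per_million_output / 1_000_000


# 409 tokens/s is the reported figure; every other number is an assumption.
fast = ModelProfile("fast-model", 409.0, 1.0, 1.0)
slow = ModelProfile("slow-model", 100.0, 1.0, 4.0)

STEPS = 20            # hypothetical number of sequential calls in one task
TOKENS_PER_STEP = 500  # hypothetical output tokens per call

for m in (fast, slow):
    latency = task_latency_s(m, STEPS, TOKENS_PER_STEP)
    cost = task_cost_usd(m, STEPS, TOKENS_PER_STEP)
    print(f"{m.name}: {latency:.1f} s, ${cost:.4f} per task")
```

Because the calls are sequential, per-step differences in output speed and price multiply across the whole task, which is why these engineering factors weigh more heavily in Agent workloads than in single-turn use.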

Judging by community feedback, Step 3.7 Flash has attracted significant developer attention since its release. On X, one developer commented, “This is why speed is becoming as important as intelligence for truly practical AI products. For intelligent tasks, a model that is fast, open, low-cost to serve, and slightly weaker in performance may be more useful than a highly intelligent model that is too slow or too expensive to scale.”

For China’s open-source model ecosystem, the significance of such models may lie less in any single leaderboard ranking and more in taking part in the next wave of competition over Agent infrastructure capabilities. As enterprises deploy more autonomously executing Agent systems, cost-efficiency, engineering usability, and ecosystem compatibility are emerging as critical metrics on par with raw model capability.

Ultimately, what will determine whether Agents achieve large-scale adoption may not be the most powerful model, but rather the one that strikes the optimal balance among intelligence, speed, and cost.

References:

https://x.com/ArtificialAnlys/status/2062381047212638697

Source: InfoQ

