NVIDIA CEO Jensen Huang announces full-scale production of Rubin, alongside the world's most powerful CPU

Yesterday, at NVIDIA’s GTC conference in Taipei, Taiwan, CEO Jensen Huang once again focused the discussion on the future direction of the AI industry.

Unlike two years ago, when the emphasis was on the generative AI wave, Huang now offers a new assessment:

"Generative AI has arrived; practical AI has arrived."

The Era of Practical AI Has Arrived

In his view, the most significant change in the AI industry over the past few years is not the continued growth of model parameter scale, but rather AI beginning to function as a genuine production tool that directly impacts economic activity.

To illustrate this shift, Huang first presented data from the code hosting platform GitHub. He pointed out that software development is one of the earliest domains where generative AI has been applied and also represents one of the largest global communities of knowledge workers. Currently, approximately 30 to 40 million professional software engineers worldwide rely on programming for their livelihood, with hundreds of millions more students and amateur developers participating.

In his presentation, GitHub commit volume served as a key metric measuring changes in AI-driven productivity:

2023: ~300 million commits;

2024: increased to 400 million commits;

2025: reached 500 million commits;

Early 2026 data already shows growth running at several times the previous year's rate.

Huang believes these figures reflect how AI-assisted coding tools are significantly boosting software development efficiency.
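
As a quick check on the figures quoted above, the following sketch computes year-over-year growth from the three commit counts. The numbers are taken from the article as reported; the function name `yoy_growth` is only an illustrative choice.

```python
# Commit counts in millions, as quoted in the keynote summary above.
commits_millions = {2023: 300, 2024: 400, 2025: 500}

def yoy_growth(series):
    """Return {year: percent growth over the previous year}."""
    years = sorted(series)
    return {
        cur: (series[cur] - series[prev]) / series[prev] * 100
        for prev, cur in zip(years, years[1:])
    }

growth = yoy_growth(commits_millions)
for year, pct in growth.items():
    print(f"{year}: {pct:.1f}% more commits than {year - 1}")
```

On these numbers the absolute increase is steady at about 100 million commits per year, so the relative growth rate slips from roughly 33% to 25%. Growth at several times the 2025 rate in early 2026 would therefore be a sharp break from the earlier trend.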

"Software engineers around the world generate roughly $3 trillion in wage value," he said. "And this software, in turn, underpins nearly $100 trillion in global economic activity."

Based on his estimation, if AI can multiply the productivity of software developers, the resulting economic value will far exceed the scope of the software industry itself.

In recent years, with rapid advancements in code generation tools, the debate over whether programmers will be replaced by AI has remained a central topic. Huang offered a clear response during his speech.

He argues that AI development will not reduce the number of software engineers but instead stimulate companies to hire more developers. The logic is straightforward: if an engineer can produce significantly higher output with AI assistance, companies are more likely to expand R&D investment rather than cut R&D teams.

"Saying AI will reduce employment is pure nonsense," Huang stated.

In his view, what truly determines employment levels is not unit labor cost, but unit labor’s ability to create value. When software engineers can accomplish more work with AI support, market demand for software and digital capabilities will further expand.

Huang then shifted focus to AI infrastructure. He noted that as AI moves from lab environments into real-world production settings, the industry’s focus has transitioned from model capability to token output capacity.

In the past, tokens were merely technical metrics during model execution; today, tokens have become units capable of directly generating revenue. In other words: AI companies are no longer producing traditional software products, but continuously generating tokens.

Whoever can produce more tokens at lower cost and higher efficiency holds stronger commercial competitiveness.

"Because tokens have now become profit units—tokens are now revenue-generating profit units. Since they can now generate profit, AI companies want to build more tokens, generate more tokens, and construct more AI factories. This is precisely why Taiwan’s compute demand is surging. And that’s why you’re all so busy, and your businesses are performing so well. In fact, just look at some of your stock prices," Huang said.

This explains why data center construction is intensifying globally and why AI compute demand in Taiwan is growing rapidly.

In his description, AI factories are gradually replacing traditional data centers as the core of the next wave of computing infrastructure development.

From Application Era to Agent Era

However, according to Huang, the greater transformation lies not merely in model performance improvements, but in a fundamental shift in computational paradigms.

For decades, computers followed the pattern: application → code → operating system, where users completed tasks by clicking interfaces or entering commands.

In the AI era, a new architecture is emerging: agent → large language model → tool system.

Huang displayed a typical agent system architecture diagram.

In this architecture, the large language model handles problem understanding, reasoning, and planning; the peripheral framework manages context, invokes tools, coordinates task execution, and handles both short-term and long-term memory. To complete tasks, agents can call upon: web browsers, databases, spreadsheet tools, data analytics engines, CAD design software, and various enterprise systems.

The entire process resembles a digital employee rather than traditional software. "In the past, we launched applications, clicked buttons, entered content," Huang said. "In the future, we simply explain our intent to AI, and it automatically writes code, calls tools, and completes tasks."
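
The agent → large language model → tool system pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture Huang showed: `stub_model` is a hard-coded stand-in for a real language model, and the tools `lookup_price` and `add` are hypothetical placeholders for the browsers, databases, and spreadsheets an agent might call.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical tools standing in for databases, spreadsheets, and so on.
def lookup_price(item: str) -> float:
    return {"gpu": 100.0, "cpu": 40.0}[item]

def add(a: float, b: float) -> float:
    return a + b

TOOLS: dict[str, Callable] = {"lookup_price": lookup_price, "add": add}

@dataclass
class Memory:
    # Short-term memory: one (tool, args, result) tuple per completed step.
    steps: list = field(default_factory=list)

def stub_model(goal: str, memory: Memory):
    """Stand-in for the LLM: picks the next action from the history so far.

    A real system would send the goal and memory to a model; here the plan
    is hard-coded so the example runs without any external service.
    """
    done = len(memory.steps)
    if done == 0:
        return "lookup_price", ("gpu",)
    if done == 1:
        return "lookup_price", ("cpu",)
    if done == 2:
        return "add", (memory.steps[0][2], memory.steps[1][2])
    return "finish", (memory.steps[-1][2],)

def run_agent(goal: str, max_steps: int = 10):
    """Framework loop: ask the model for an action, run the tool, remember."""
    memory = Memory()
    for _ in range(max_steps):
        tool, args = stub_model(goal, memory)
        if tool == "finish":
            return args[0]
        result = TOOLS[tool](*args)
        memory.steps.append((tool, args, result))
    raise RuntimeError("agent did not finish within max_steps")

print(run_agent("total price of one gpu and one cpu"))
```

The user states only the goal. The loop, not the user, decides which tools to call and in what order, which is the shift from clicking through applications that the quote describes.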

The rise of agents has sparked another debate: if AI can perform work, will software companies become obsolete?

Huang’s answer is the opposite.

He believes the agent era will give rise to far more software systems than exist today. The reason is that the number of digital agents is no longer limited by population size. In the future, every business process, every operational step, even every individual task, could have its own dedicated agent. And these agents require extensive use of external tools and services to complete their work.

Thus, software will not disappear—it will need to exist in an “AI-callable” form.

"This is one of the best eras for the software industry," Huang stated.

In this context, NVIDIA’s long-standing CUDA ecosystem is poised for new opportunities.

Previously, CUDA libraries primarily served developers; today, agents can invoke these capabilities directly, making them part of the toolkit agents use when executing tasks. In essence, Huang’s message is clear: in the generative AI era, the discussion revolved around what models are capable of; in the practical AI era, the focus shifts to what they actually get done.

When AI begins generating revenue, driving GDP growth, and executing complex tasks through agents calling tools, it ceases to be just a chatbot and evolves into a new computing platform.

"NVIDIA Is First and Foremost a Software Company"

After discussing the paradigm shift brought by agents, Huang reiterated a point he has repeatedly emphasized in recent years:

NVIDIA is fundamentally a software company.

Subsequently, Huang elaborated on the core architecture and operational logic of AI agents.

He explained that agents represent the ultimate decoupled and distributed computing model, requiring coordination across vast arrays of diverse compute units. A complete AI agent consists of five core components: model, framework, tools, skills, and runtime, each running on different nodes within data centers. He likened this structure to a working individual: the model acts as the agent’s "brain," responsible for thinking and decision-making; the framework is the "body," carrying the overall operation; the runtime functions like a dedicated studio, enabling various tools to execute effectively. The entire system operates at massive scale to orchestrate compute resources and execute tasks.

According to him, every workflow within an agent is broken down into discrete steps executed across different computer modules. Large language models handle the core intelligent tasks such as reasoning, context processing, environmental perception, logical inference, planning, and action execution, engaging Grace Blackwell NVLink 72 compute clusters at scale. During tool invocation, the CPU handles computation, which keeps the system compatible with C compilers, Python, JavaScript, and various accelerated computing tools.

Huang believes the tool-use capability of AI agents is still in its infancy, and that future upgrades will bring specialization and mastery. To this end, NVIDIA’s CUDA-X library suite has undergone a major upgrade: every library now ships with a dedicated AI skill manual, so that AI agents can learn and master tool usage on their own, greatly enhancing their ability to solve core industry challenges. He expects agents invoking CUDA-X tools to unlock far more computational value and application potential in the future.

Within this comprehensive agent compute ecosystem, hardware and functional modules are clearly divided. Tool computation tasks are jointly completed by CPUs, GPUs, and large models; security frameworks are deployed atop CPUs and NVIDIA’s BlueField DPU security processors, ensuring full-spectrum operational safety; overall task scheduling and orchestration are unified under CPU control, forming a hierarchically clear, distinctly segmented heterogeneous computing system.
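
The division of labor described above can be pictured as a placement table that maps each kind of workflow step to the hardware that runs it. The sketch below only paraphrases the article: the step names and unit labels are illustrative, not NVIDIA identifiers, and real scheduling is far more involved.

```python
from collections import Counter

# Placement table paraphrasing the division of labor described above.
# Step names and unit labels are illustrative assumptions.
PLACEMENT = {
    "reason": ("GPU",),                # LLM reasoning and planning
    "tool_call": ("CPU",),             # compilers, Python, JavaScript, ...
    "security_check": ("CPU", "DPU"),  # security framework on CPU + BlueField
    "orchestrate": ("CPU",),           # task scheduling and orchestration
}

def place(workflow):
    """Return (step, units) pairs for a workflow given as a list of step kinds."""
    return [(step, PLACEMENT[step]) for step in workflow]

def unit_load(workflow):
    """Count how many steps touch each kind of compute unit."""
    load = Counter()
    for _, units in place(workflow):
        load.update(units)
    return load

workflow = ["orchestrate", "reason", "tool_call", "security_check", "reason"]
for step, units in place(workflow):
    print(f"{step:15s} -> {' + '.join(units)}")
print(dict(unit_load(workflow)))
```

Even in this toy workflow the CPU is involved in more steps than the GPU, which is consistent with the article's point that agent workloads raise the importance of the CPU alongside the GPU.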

In his talk, Huang highlighted the core pain point in AI computing—the memory system. He explained that agents’ working memory relies on KV cache, encompassing memory retention, data compression, information retrieval, structured and unstructured data matching, logical relationship analysis, and ontological association—all highly complex operations with unprecedented processing difficulty. He predicts that iterative upgrades of AI-specific memory systems will trigger a disruptive revolution in the global storage ecosystem.
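
To give a sense of why the KV cache puts pressure on memory, here is a back-of-the-envelope sizing sketch. It assumes a standard transformer that caches one key and one value vector per attention head, per layer, per token. The model dimensions are hypothetical and do not describe any NVIDIA model or product.

```python
def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2, batch=1):
    """Bytes needed to cache keys and values for `tokens` positions."""
    # Factor 2: one key vector and one value vector per head, per layer, per token.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens * batch

# Hypothetical model shape, chosen only for illustration (16-bit values).
model = dict(layers=32, kv_heads=8, head_dim=128)

per_token = kv_cache_bytes(1, **model)
print(f"{per_token / 1024:.0f} KiB per token")

for tokens in (8_000, 128_000, 1_000_000):
    gib = kv_cache_bytes(tokens, **model) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:.2f} GiB")
```

Under these assumptions the cache grows linearly with context length, so a long-running agent that keeps accumulating context needs memory on the order of gigabytes per session, before any compression or offloading.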

Compared to traditional software execution modes, Huang emphasized that the new computational paradigm represented by AI agents differs fundamentally. Past software typically ran as single binary files on centralized, single-operating-system platforms, whereas agents adopt a decoupled, distributed, heterogeneous computing logic—a core driver behind NVIDIA’s intensive development of the next-generation Vera Rubin platform.

Regarding the new Vera Rubin platform, Huang clarified that it is neither a single chip nor a conventional GPU product, but a fully integrated system. With the GPU at its core, it integrates GPU, Vera CPU, and NVLink 72 hardware, uses multiple CPUs for global task orchestration, and pairs them with an upgraded storage system to form a full-stack compute foundation. The platform also incorporates ConnectX-9 (CX-9) hardware, the DOCA software stack, and built-in security processors, enabling encryption of data at rest, in transit, and in use, and protecting high-value AI model data through a confidential computing architecture.

Huang candidly stated that Vera Rubin is the most ambitious R&D project in NVIDIA’s history, involving all 40,000 of the company’s engineers along with deep collaboration with industry partners. He described it as an extremely complex system rebuilt from the ground up. He added that NVIDIA has already made the strategic transition from a GPU vendor to a full-stack system provider, and that the current Vera Rubin system is the most complex and comprehensive AI compute system ever designed in the industry.

Turning to what the industry ultimately needs and where the company is headed, Huang said customers and partners are not merely seeking raw computing hardware but want to build mature, efficient AI factories. In response, NVIDIA is initiating a new strategic transformation. Today, NVIDIA’s core technologies are deployed at the infrastructure level, and the company is working with power plant, cooling system, and grid suppliers and other industrial partners to build a complete AI industry ecosystem.

Going forward, NVIDIA will continue developing full-stack compute systems to provide core support for global clients building scalable, high-performance AI infrastructure.

Notably, in this speech, Huang provided a detailed explanation of NVIDIA’s newly defined industrial positioning, formally introducing the “New Paradigm of AI Factory Ecosystem,” explicitly stating that NVIDIA’s strategic focus has fully evolved from traditional computing ecosystems to a factory-based ecosystem serving the trillions of dollars in AI infrastructure.

Huang distinguished between NVIDIA’s old and new ecosystem models. Previously, NVIDIA centered on a computing ecosystem, deeply integrating its compute layer, software, and compute stack into various enterprise platforms and third-party libraries to serve digital compute needs across industries.

Today’s newly constructed AI factory ecosystem establishes a clear upstream-to-downstream industrial loop: industry partners serve as NVIDIA’s upstream foundational support, while NVIDIA leverages its full-stack technological capabilities to deliver a complete AI factory ecosystem downstream. The core objective is no longer simply shipping GPU chips or compute systems, but helping clients build ultra-complex, ultra-large-scale AI factory infrastructures.

He said bluntly that AI factories have entered a phase defined by extremely high investment and extremely high barriers to entry. The construction cost of a single 1 gigawatt (GW) AI factory keeps climbing: from initial estimates of $20–40 billion to $50–60 billion today, with projections soon exceeding $80 billion and even $100 billion. Investments on the order of $100 billion per project place extreme demands on deployment stability and operational reliability: the facility has to be built right the first time and be ready to operate immediately. Capital expenditure and system complexity have reached levels unprecedented in the industry.
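
The cost figures above are easier to compare when expressed per watt of capacity. The sketch below does that arithmetic using only the numbers quoted in the article; the labels and the function name `dollars_per_watt` are illustrative.

```python
GIGAWATT = 1_000_000_000  # watts

# Construction cost in US dollars for one 1 GW AI factory, as quoted above.
estimates = {
    "initial estimate": (20e9, 40e9),
    "current": (50e9, 60e9),
    "projected": (80e9, 100e9),
}

def dollars_per_watt(total_cost, capacity_watts=GIGAWATT):
    """Construction cost divided by facility power capacity."""
    return total_cost / capacity_watts

for label, (low, high) in estimates.items():
    print(f"{label:17s} ${dollars_per_watt(low):.0f} to ${dollars_per_watt(high):.0f} per watt")
```

Because the facility is sized at exactly 1 GW, each billion dollars of construction cost corresponds to one dollar per watt, so the quoted range runs from about $20 per watt in the earliest estimates to $100 per watt in the highest projection.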

To address the extreme complexity of AI factory construction, NVIDIA leverages Omniverse’s digital simulation capabilities to achieve end-to-end innovation. Unlike traditional computer development—designing chips first, then simulating system operation within devices—NVIDIA now enables the complete construction, simulation, testing, and optimization of all AI factory infrastructure in the Omniverse digital platform before physical construction begins. Through digital emulators and digital architectures, the industry can conduct full-cycle simulations of ultra-large-scale AI systems prior to groundbreaking and massive capital investment, completely eliminating deployment risks and realizing long-held technological ambitions.

Huang specifically highlighted DSX, the core system enabling the rollout of the AI factory ecosystem, establishing a complete infrastructure layout aligned with NVIDIA’s existing product matrix. Here, RTX series corresponds to GPU hardware, DGX to integrated compute systems, and the new DSX platform precisely targets full-scenario AI infrastructure. Leveraging core capabilities spanning systems, software, and the entire technology stack, NVIDIA empowers small and medium enterprises to rapidly deploy world-class AI cloud services.

He cited industry cases to demonstrate the empowerment value of the DSX ecosystem. Numerous formerly mid-sized tech firms, after joining NVIDIA’s AI factory ecosystem and upgrading via the DSX framework, achieved exponential growth—exemplified by CoreWeave, whose valuation has soared to the $50–70 billion range and continues rapid expansion, vividly illustrating the industrial empowerment potential of NVIDIA’s new AI factory paradigm.

Recently, NVIDIA’s collaboration with Nebius has shown equally remarkable growth. Each of these clouds hosts clients of staggering scale: Cursor (a software coding company), Black Forest Labs (image generation), World Labs (world foundation models), Revolut (a leading AI-powered financial services company), and Shopify.

Here’s another example: Nscale, whose clients include British Telecom and Google. Google is currently using one of their AI clouds—Thinking Machines, a cutting-edge lab company.

Then there’s Naver Cloud in South Korea, serving clients including Bank of Korea, Hyundai, and many other outstanding enterprises.

In Taiwan, there’s also GMI.

Yet all of these companies require a compute stack. Huang stated that the entire technology stack underneath them is precisely what NVIDIA is known for.

He explained: "All hardware, software, libraries, and our ability to connect to a global third-party developer ecosystem enable anyone to build an AI cloud. However, today’s AI clouds are extremely complex. This is the software version, this is the computer science version. And the funding version, the asset version—that’s what I showed earlier—an enormous factory. Simply possessing this capability isn’t enough. That’s exactly why NVIDIA has become an AI infrastructure company."

Vera Rubin Architecture Now in Full Production

When discussing next-generation AI infrastructure development, Huang announced that the Vera Rubin architecture, NVIDIA’s new-generation GPU platform, has now entered full mass production.

Huang stated that global demand for AI compute is growing at an unprecedented pace. From data center operators to cloud service providers and various enterprise clients, the entire industry chain is intensifying capacity expansion to meet market needs.

"Practical AI has arrived; profitable AI has arrived," Huang said. He believes an increasing number of enterprises now recognize that artificial intelligence is no longer just a technological showcase but a productive tool capable of creating tangible commercial value. In this context, compute power has become the key bottleneck limiting AI advancement.

To satisfy surging global demand, NVIDIA is driving the large-scale deployment of next-generation AI infrastructure and collaborating with partners worldwide to build AI factories. Huang noted that this is currently one of the most critical tasks in the entire industry ecosystem.

On supply chain development, Huang revealed that the supply chain scale corresponding to the Vera Rubin platform has doubled compared to the previous-generation Grace Blackwell platform. At the same time, production efficiency has significantly improved. Assembling a single Grace Blackwell rack previously took about two hours, but now the process has been reduced to just five minutes.

"This not only means higher capacity, but also faster delivery speed," Huang said. Facing continuous market growth, every link in the supply chain is expanding production scale and improving manufacturing efficiency.

He explained that to support mass production of the Grace Blackwell platform, the industry had already built out millions of square feet of production facilities. These partners are now expanding capacity further to prepare for the large-scale deployment of Vera Rubin.

At the conclusion of his speech, Huang expressed special gratitude to supply chain partners. He emphasized that the successful full-scale production launch of Vera Rubin would not have been possible without the collective efforts of the entire industry ecosystem.

"I want to thank all of you," Huang said. "Vera Rubin is now fully operational."

When describing the Vera Rubin architecture, Huang defined it as a computing system designed specifically for the agent era—not merely a supercomputer running AI models.

He noted that as AI progresses from model training and inference toward the agent stage, computing demands are undergoing fundamental changes. Agents frequently invoke tools, access databases, and interact in real time with external systems, thus imposing higher requirements on latency, bandwidth, and system coordination. To address this, Vera Rubin adopts a new system design, deeply integrating CPUs, GPUs, networks, storage, and security modules into a complete infrastructure platform tailored for agent workloads.

Huang particularly showcased the Vera Rubin NVLink 72 system. He explained that unlike previous products mainly optimized for pre-training and inference scenarios, Vera Rubin further enhances performance for agent inference. Through the NVLink 72 interconnect architecture and a new system design, the device no longer requires extensive cabling or complex internal connections—improving reliability and significantly reducing deployment and maintenance costs.

Vera CPU Launches with Full Force

Besides the GPU system, NVIDIA also unveiled the Vera CPU, specifically designed for the AI era. Huang believes traditional CPUs were primarily built for human users, but in the future, billions of agents will emerge, with virtually no tolerance for response delays. Thus, the new CPU architecture must undergo comprehensive optimization in single-threaded performance, bandwidth, data transfer efficiency, and energy efficiency.

According to NVIDIA’s plan, the Vera CPU will handle agent orchestration, model scheduling, tool invocation, database access, and storage management, jointly forming the core infrastructure of future AI factories alongside GPUs. Huang stated that as agents become a pivotal direction in the next phase of AI development, computing systems are shifting from being “model-centric” to “agent-centric.”

When discussing the new-generation Vera CPU, Huang noted that NVIDIA has successfully transitioned from the traditional x86 CPU architecture to the Grace architecture, laying a solid foundation for Vera’s widespread adoption.

He pointed out that major data centers, cloud service providers, and AI enterprises partnering with NVIDIA worldwide have already completed certification for the Grace platform. Related software stacks, security systems, and development environments have also been adapted accordingly. On this basis, the deployment barrier for Vera will be dramatically lowered.

Huang believes Vera is poised to become one of the most optimized CPUs for agent workloads. The reason lies in its co-development from inception with the Vera Rubin system, with targeted optimizations for novel AI tasks such as agent inference, tool invocation, database access, and real-time interaction.

To demonstrate performance gains, Huang presented multiple real-world workload test results. In scenarios widely used by enterprises—such as SQL database processing—the Vera CPU achieves approximately three times the performance of existing platforms.

In real-time stream processing, such as financial trading systems and industrial telemetry monitoring that must continuously handle massive data streams, the performance improvement reaches up to six times.

Huang noted that in the CPU domain, performance gains of single-digit percentages are typically considered significant breakthroughs. Achieving multi-fold increases in actual business workloads is uncommon. These gains primarily stem from higher single-threaded performance, larger memory and I/O bandwidth, and faster inter-core data transfer capabilities.

In his view, agents are becoming the next major computing workload following cloud computing and mobile internet. Yet, most existing CPUs are still designed around human users. Going forward, as the number of agents grows continuously, demand for low latency and high responsiveness will further increase.

"We used to build CPUs for humans; now we begin building CPUs for agents," Huang said. A new ecosystem centered around Vera is forming, with ODM manufacturers, server makers, and enterprise clients already positioning themselves in this market. NVIDIA aims to drive a new era of computing platforms—specifically, the CPU market tailored for agents.

Open-Sourcing Nemotron 3 Ultra: The World’s First Hybrid Architecture Combining SSM and MoE

During his discussion of enterprise-grade agent ecosystems, Huang announced the official release of NVIDIA’s next-generation open-source large model, Nemotron 3 Ultra, positioning it as a foundational model for enterprises to build proprietary agents.

Huang stated that in the future, enterprises will employ vast numbers of agents to assist with research and development, validation, simulation, and operations. For example, Cadence Design Systems, an EDA software vendor, is leveraging NVIDIA technology to develop a specialized super-agent for chip design workflows, accelerating chip development cycles by calling simulators, validators, and formal verification tools.

Serving as the bedrock of this ecosystem, Nemotron 3 Ultra adopts the world’s first hybrid architecture combining State Space Models (SSM) and Mixture of Experts (MoE). Huang revealed that compared to existing mainstream open-source models, Nemotron 3 Ultra delivers five times faster inference speed while reducing overall operational costs by 30%.
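
The article does not say how Nemotron 3 Ultra combines the two techniques, so the following is only a toy illustration of the two building blocks it names, not NVIDIA's design. A state space model carries a running state across the sequence, and a mixture-of-experts layer routes each value to a subset of small expert functions. The experts, router weights, and scalar dimensions are all hypothetical.

```python
import math

def ssm_scan(xs, a=0.9, b=1.0, c=1.0):
    """Scalar linear state space model: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical experts and router parameters, chosen only for illustration.
EXPERTS = [lambda v: 2.0 * v, lambda v: v + 1.0, lambda v: -v]
ROUTER = [(1.0, 0.0), (0.0, 0.5), (-1.0, 0.0)]  # (weight, bias) per expert

def moe(v, top_k=1):
    """Send v to the top_k experts by router score; mix with renormalized gates."""
    gates = softmax([w * v + bias for w, bias in ROUTER])
    chosen = sorted(range(len(EXPERTS)), key=lambda i: gates[i], reverse=True)[:top_k]
    norm = sum(gates[i] for i in chosen)
    output = sum(gates[i] / norm * EXPERTS[i](v) for i in chosen)
    return output, chosen

def hybrid_layer(xs):
    """Sequence mixing with the SSM, then per-position expert routing."""
    return [moe(y)[0] for y in ssm_scan(xs)]

print(hybrid_layer([1.0, 0.0, -3.0]))
```

The SSM step costs constant work and state per token regardless of sequence length, and the expert routing runs only `top_k` of the experts per value. Those two properties are the usual motivation for pairing the techniques when fast inference is the goal.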

Beyond the model itself, NVIDIA is also releasing training data, training scripts, and related toolchains. Huang explained that the Nemotron series is trained on large-scale long-range reasoning, complex task solving, and tool invocation datasets. Developers can not only use the model but also continue training, fine-tuning, and building proprietary agent systems on top of it.

Currently, NVIDIA has partnered with CrowdStrike, Salesforce, Palantir Technologies, SAP, ServiceNow, and other enterprises to advance agent deployment in enterprise software, cybersecurity, data analytics, and business process management scenarios.

Huang also revealed that Nemotron 3 Ultra has already been released, and the next-generation Nemotron 4 model is already under development.

Partnering with Microsoft to Redefine the PC

In the latter part of his speech, Huang announced that NVIDIA is collaborating with Microsoft to redefine the personal computer (PC).

Huang reviewed the past 40 years of PC evolution. He noted that from Windows 3.1 to Windows 95, Microsoft established the foundational architecture of the modern PC industry by opening up hardware ecosystems, driver systems, and unified software interfaces, enabling PCs to transition from enterprise devices to mass consumer products and become one of the most ubiquitous computing platforms globally.

Now, with the arrival of the AI era, NVIDIA and Microsoft aim to drive a new wave of PC architectural transformation. Huang revealed that the two companies have collaborated for three years, re-designing the fundamental workings of PCs to adapt to computing demands in the agent era.

As envisioned, future PCs will no longer be mere terminal devices running applications but will be embedded with AI agents capable of understanding users, engaging in dialogue, and autonomously completing tasks. For instance, agents could help users organize files, retrieve information, conduct research, and even proactively execute complex tasks based on user needs.

Huang believes this shift will lead to a complete reconstruction of PC software architecture. Tasks previously handled by applications will increasingly be fulfilled by agents. Meanwhile, large language models will become a core component of the next-generation PC, handling language understanding, visual recognition, audiovisual generation, and task execution—serving as the primary gateway connecting users to computing resources.

He disclosed that he and Satya Nadella will jointly present the results of the three-year collaboration and unveil the next-generation PC platform tailored for the agent era.

Source: InfoQ


