
Huawei Ascend 950PR: ByteDance & Alibaba Place Orders as CUDA Compatibility Shifts the AI Chip War

By Kit 小克 | AI Tool Observer | 2026-03-28

🇹🇼 Huawei Ascend 950PR: ByteDance and Alibaba Place Orders as CUDA Compatibility Starts to Shake NVIDIA's Position

After more than two years of US export controls, an unexpected result: a genuine rival. ByteDance (TikTok's parent company) and Alibaba have confirmed plans to order Huawei's latest AI chip, the Ascend 950PR. The reason is not just forced substitution: this chip does something no previous Chinese chip managed, namely mimicking CUDA.

What Exactly Does the Ascend 950PR Deliver?

Huawei's key breakthrough this time is the new-generation software ecosystem CANN Next (Compute Architecture for Neural Networks). The biggest pain point of Chinese AI chips was never hardware specs but the software ecosystem: engineers' code, model-training scripts, and inference frameworks are all written for NVIDIA CUDA, so switching chips meant rewriting everything at enormous cost.

CANN Next's goal is to let existing CUDA workloads migrate to the Ascend platform with almost no modification. ByteDance's and Alibaba's test results: it actually works, so they decided to place orders.

Specs and Pricing: Far Cheaper Than NVIDIA

  • Estimated price: US$6,900–$9,700 per card (roughly NTD 220,000–310,000)
  • For comparison, the NVIDIA H100 sells for about US$25,000–$30,000 on the market
  • Planned 2026 shipments: about 750,000 units
  • Primary customers: large Chinese tech companies and AI startups

Part of the low price reflects China's manufacturing cost advantage, but more importantly, this price point gives many small and mid-sized AI companies that could never afford NVIDIA a real alternative.

How Hard Is CUDA Compatibility, Really?

The depth of NVIDIA's CUDA moat is widely regarded as one of the hardest technical barriers to replicate. It is not just API compatibility but the entire ecosystem: CUDA kernels, cuDNN, cuBLAS, TensorRT, and the low-level bindings for PyTorch/JAX.

CANN Next takes a translation-layer route, similar to Rosetta 2 on Apple Silicon, converting CUDA code automatically. The approach will not be perfect, but if it turns migration cost from "three months of rewriting" into "two weeks of debugging", that gap is enough to change purchasing decisions.

Geopolitical Reality: This Chip War Has No End

US Commerce Department export controls have kept escalating since 2022, blocking the A100, the H100, and then the H20, and have instead accelerated the maturation of China's homegrown chip ecosystem. The Huawei Ascend 950PR is not a full H100 equivalent, but what it represents is this: Chinese AI companies now have a domestic option they can use and are willing to use.

From an investor's perspective, the impact on NVIDIA's China business is worth watching: China once accounted for over 20% of NVIDIA's annual revenue, and that share is being eaten away by domestic alternatives.

What Should Developers Watch?

  • Real-world CANN Next compatibility: only a few large firms have tested it so far, and community feedback is still thin
  • PyTorch/JAX support: whether mainstream frameworks run with full functionality decides production viability
  • Inference vs. training performance: the big players currently use it mainly for inference; stability for training large models remains to be proven
  • Software update cadence: hardware is only the starting point; the iteration speed of drivers and toolchains is the long-term competitive edge

The AI chip war is only now getting interesting. Whether it is any good, you only know once you try.


🇺🇸 Huawei Ascend 950PR: ByteDance & Alibaba Place Orders as CUDA Compatibility Shifts the AI Chip War

Two years of US export controls may have inadvertently created NVIDIA's most credible rival. ByteDance (TikTok's parent) and Alibaba have confirmed plans to order Huawei's new Ascend 950PR AI chip — not just because they have no other choice, but because this chip does something Chinese AI hardware has never convincingly managed before: mimic CUDA.

The Real Breakthrough: CANN Next

Huawei's key innovation here is a new software layer called CANN Next (Compute Architecture for Neural Networks). The biggest pain point for Chinese AI chips was never raw hardware specs; it was the software ecosystem lock-in. Every ML engineer's training scripts, inference pipelines, and model-optimization work are written for NVIDIA CUDA. Switching chips historically meant rewriting everything from scratch.

CANN Next aims to let existing CUDA workloads migrate to Ascend with minimal changes. ByteDance and Alibaba tested it — it worked well enough — and they placed orders.
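In practice, "minimal changes" usually comes down to device selection: if a CUDA-targeted script only needs its device string swapped, migration is cheap. A minimal sketch of that idea follows; the `cann` package name and the `"npu"` device string are hypothetical stand-ins (the real CANN Next Python interface is not documented in this article), and only standard-library probing is used so the sketch degrades gracefully to CPU:

```python
import importlib.util

def select_device() -> str:
    """Pick an accelerator backend by probing installed packages.

    'cann' is a hypothetical CANN Next binding used here purely for
    illustration; 'torch' is probed only to check CUDA availability.
    Falls back to CPU when neither accelerator is present.
    """
    if importlib.util.find_spec("cann") is not None:
        return "npu"  # hypothetical Ascend device string
    if importlib.util.find_spec("torch") is not None:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"

device = select_device()
print(device)
```

The point of the sketch is the shape of the change: the rest of the training script keeps referring to `device`, so swapping vendors touches one function instead of every file.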

Specs & Pricing: The Cost Gap Is Real

  • Estimated price: $6,900–$9,700 per card
  • Compare: NVIDIA H100 at $25,000–$30,000 market rate
  • Planned 2026 shipments: approximately 750,000 units
  • Primary customers: Chinese hyperscalers and AI startups
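Taking the article's price bands at face value, the cluster-level gap is easy to quantify. A back-of-envelope comparison, using the numbers from the bullets above and an arbitrary example cluster size:

```python
# Price bands from the article (USD per card).
ascend_low, ascend_high = 6_900, 9_700
h100_low, h100_high = 25_000, 30_000

cards = 1_024  # example cluster size, chosen only for illustration

ascend_cluster = (ascend_low * cards, ascend_high * cards)
h100_cluster = (h100_low * cards, h100_high * cards)

# Worst case for the comparison: cheapest H100 vs. priciest Ascend.
min_ratio = h100_low / ascend_high
print(ascend_cluster, h100_cluster, round(min_ratio, 2))
```

Even comparing the cheapest H100 against the priciest Ascend, the per-card gap stays above 2.5x, which is the economics behind the accessibility argument.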

The lower price point matters enormously. Mid-tier AI companies that could not afford H100 clusters now have an accessible alternative — especially for inference workloads where cost-per-query matters more than peak throughput.

How Hard Is Replicating CUDA, Really?

NVIDIA's CUDA moat is widely considered one of the deepest software lock-ins in tech. It is not just an API — it is the entire ecosystem: CUDA kernels, cuDNN, cuBLAS, TensorRT, and the underlying bindings for PyTorch/JAX.

CANN Next takes a translation-layer approach, similar conceptually to Apple's Rosetta 2. It will not be perfect at launch, but if it can reduce migration costs from "three months of rewriting" to "two weeks of debugging," that is enough to shift purchasing decisions. That is the bet Alibaba and ByteDance are making.
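To make "translation layer" concrete: one deliberately simplified reading is source-level rewriting, mapping CUDA API names to backend equivalents and flagging anything unmapped. The mapping and the `cann*` names below are invented for illustration; this is not the real CANN Next mechanism, whose internals are not public:

```python
import re

# Invented CUDA -> backend name mapping. Real coverage would span
# thousands of symbols across cuDNN, cuBLAS, TensorRT, and so on.
API_MAP = {
    "cudaMalloc": "cannMalloc",
    "cudaMemcpy": "cannMemcpy",
    "cudaFree": "cannFree",
}

def translate(source: str) -> tuple[str, list[str]]:
    """Rewrite known CUDA calls and collect those left untranslated."""
    unmapped = []

    def swap(match: re.Match) -> str:
        name = match.group(0)
        if name in API_MAP:
            return API_MAP[name]
        unmapped.append(name)  # survives untouched; needs manual work
        return name

    rewritten = re.sub(r"\bcuda[A-Za-z_]+", swap, source)
    return rewritten, unmapped

code = "cudaMalloc(&p, n); cudaMemcpyAsync(p, q, n); cudaFree(p);"
out, missing = translate(code)
print(out)
print(missing)
```

The `missing` list is where the "two weeks of debugging" lives: every symbol the layer cannot map is work that still lands on the migrating engineer.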

Geopolitical Context: This Chip War Has No End

US Commerce Department export controls have escalated continuously since 2022 — from A100, to H100, to H20 — each round pushing Chinese firms to accelerate domestic alternatives. The Ascend 950PR is not a full H100 replacement, but it represents something new: Chinese AI companies now have a domestic option they will actually deploy at scale.

For NVIDIA investors, this is worth watching. China once represented 20%+ of NVIDIA's annual revenue. That share is being systematically replaced by domestic alternatives — a trend that accelerates with every US policy escalation.

What Developers Should Watch

  • Real-world CANN Next compatibility: Only a handful of large companies have tested it — community validation is lacking
  • PyTorch/JAX full feature support: Production viability depends on complete framework compatibility, not just basic inference
  • Training vs. inference stability: Large-scale model training on Ascend hardware remains less proven than inference
  • Software update cadence: Hardware is just the start — driver and toolchain iteration speed determines long-term competitiveness
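Most of the checklist above reduces to rerunning the same small battery of checks after every driver or toolchain update. A generic harness for that, with placeholder checks standing in for real framework probes (a real check would import the framework, run a tiny matmul on the device, and compare against a CPU reference):

```python
from typing import Callable

def run_checks(checks: dict[str, Callable[[], bool]]) -> dict[str, bool]:
    """Run named compatibility checks, treating any exception as a failure."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results

# Placeholder checks for illustration; the third deliberately raises
# to show how a broken toolchain update surfaces in the report.
report = run_checks({
    "framework_imports": lambda: True,
    "matmul_matches_cpu": lambda: True,
    "training_step_runs": lambda: 1 / 0,
})
print(report)
```

Wiring a harness like this into CI is what turns "software update cadence" from a worry into a measurable signal.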

The AI chip war is just getting interesting. The real question is not whether Chinese chips can compete on paper specs — it is whether the software ecosystem can reach escape velocity. You won't know until you try it.
