Vitalik Buterin Local LLM Setup Explained: Running Qwen3.5 on NVIDIA 5090 — Daily AI Without the Cloud
By Kit 小克 | AI Tool Observer | 2026-04-12
🇺🇸 Vitalik Buterin Local LLM Setup Explained: Running Qwen3.5 on NVIDIA 5090 — Daily AI Without the Cloud
Vitalik Buterin's local LLM setup is a fully self-hosted AI system published by the Ethereum co-founder in April 2026. It runs the open-source Qwen3.5:35B model on an NVIDIA 5090 GPU with zero cloud dependency, delivering a private, secure, and sovereign AI experience.
Why Did Vitalik Build His Own LLM Stack?
In his blog post, Vitalik declared 2026 the year to reverse a decade of "backsliding toward centralized services." Citing data from security firm HiddenLayer, he noted that roughly 15% of AI agent skills contain malicious instructions, which motivated him to move AI entirely local.
The core philosophy is simple: treat the LLM as a "capable but untrusted component," applying the same security mindset Ethereum developers use for smart contracts.
What Hardware and Software Does Vitalik Use?
Here is the complete technical stack:
- GPU: NVIDIA 5090 (32 GB VRAM), achieving 90 tokens/sec inference
- Model: Qwen3.5:35B (open-weights from Alibaba)
- Inference engine: llama-server + llama-swap for multi-model management
- Operating system: NixOS, defining the entire machine in a single declarative config
- Sandboxing: bubblewrap, restricting AI to whitelisted files and network access only
- Messaging security: Custom daemon — AI can read Signal/Email but sending requires human approval
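As a rough illustration of the sandboxing step, the sketch below builds a bubblewrap (`bwrap`) command line that exposes only whitelisted paths read-only and keeps networking off unless explicitly enabled. The `bwrap` flags are real options, but the paths, model file, and helper name are hypothetical, and a working sandbox would also need to `--ro-bind` the `llama-server` binary and its libraries.

```python
import shlex

def bwrap_command(model_path, port=8080, ro_binds=(), allow_net=False):
    """Build a bubblewrap argv that confines llama-server to a
    whitelist of read-only paths, with networking off by default.
    (Illustrative sketch; a real sandbox also needs the server
    binary and its shared libraries bound into the namespace.)"""
    cmd = ["bwrap", "--unshare-all", "--die-with-parent"]
    if allow_net:
        cmd.append("--share-net")          # opt back in to the host network
    for path in (model_path, *ro_binds):
        cmd += ["--ro-bind", path, path]   # expose each whitelisted path read-only
    cmd += ["--proc", "/proc", "--dev", "/dev"]  # minimal /proc and /dev inside
    cmd += ["llama-server", "-m", model_path, "--port", str(port)]
    return cmd

# Example: model plus one notes directory, no network access.
print(shlex.join(bwrap_command("/models/qwen3.5-35b.gguf",
                               ro_binds=("/home/me/notes",))))
```

Everything outside the whitelist simply does not exist from the model's point of view, which is what blunts a prompt-injected attempt to read or exfiltrate arbitrary files.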
Performance Baseline: 50 Tokens/Sec Is the Minimum
Vitalik benchmarked several hardware configurations:
- AMD Ryzen AI Max Pro (128 GB unified memory): 51 tokens/sec
- DGX Spark: 60 tokens/sec
- NVIDIA 5090 + Qwen3.5:35B: 90 tokens/sec (best daily-use experience)
He explicitly stated that anything below 50 tokens/sec "feels too slow to be useful."
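The 50 tokens/sec cutoff is easy to turn into a concrete check. The helper below uses hypothetical names throughout: it times any generation callable, computes throughput, and compares the article's own benchmark figures against the threshold.

```python
import time

USABLE_TPS = 50  # the floor Vitalik cites for practical daily use

def tokens_per_sec(generate, prompt, n_tokens=256):
    """Measure decode throughput of any generate(prompt, n_tokens)
    callable that returns the number of tokens actually produced.
    `generate` is a stand-in, not a real llama-server client."""
    start = time.perf_counter()
    produced = generate(prompt, n_tokens)
    return produced / (time.perf_counter() - start)

def is_usable(tps):
    """Apply the 50 tokens/sec practicality cutoff."""
    return tps >= USABLE_TPS

# The article's benchmark figures against the cutoff:
benchmarks = {"Ryzen AI Max Pro": 51, "DGX Spark": 60, "RTX 5090": 90}
usable = {name: is_usable(tps) for name, tps in benchmarks.items()}
```

By this measure all three configurations clear the bar, with the Ryzen AI Max Pro only just above it.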
How Does the "Human + LLM 2-of-2" Authorization Work?
This is the most interesting design in Vitalik's local LLM setup. He built a messaging daemon that lets AI read communications (Signal, Email), but any outbound action requires explicit human approval. Think of it like a crypto multi-sig wallet — both AI and human hold a key, and neither can act alone.
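The multi-sig analogy can be sketched in a few lines. The class below is a toy model, not a reconstruction of Vitalik's actual daemon: reads pass through freely, while a send needs both the model's proposal and the human's confirmation before anything leaves the queue.

```python
from dataclasses import dataclass, field

@dataclass
class TwoOfTwoGate:
    """Toy model of the 'human + LLM 2-of-2' rule: the LLM may read
    and draft, but nothing is sent until a human co-signs. All names
    here are illustrative."""
    pending: list = field(default_factory=list)

    def read(self, mailbox):
        # Reading is unrestricted: the model only observes.
        return list(mailbox)

    def propose_send(self, recipient, body):
        # The LLM's half of the 2-of-2: it can only queue a draft.
        self.pending.append({"to": recipient, "body": body})
        return len(self.pending) - 1

    def approve(self, draft_id, human_confirmed):
        # The human's half: without confirmation the draft stays queued.
        if not human_confirmed:
            return None
        return self.pending.pop(draft_id)  # returning the draft stands in for sending
```

As in a multi-sig wallet, neither party can complete the action alone: a compromised or prompt-injected model can at worst fill the queue with drafts a human will see before anything goes out.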
Can Regular Developers Replicate This Setup?
Technically yes, but there are barriers:
- You need an NVIDIA 5090 or equivalent GPU (roughly $2,000 retail)
- Familiarity with Linux, ideally NixOS
- Willingness to configure bubblewrap sandbox rules
- Accepting that a 35B model still trails cloud models like GPT-5.4 and Claude Opus 4.6 in complex reasoning
For most people, this is more of a directional blueprint than a plug-and-play solution. But Vitalik proved that local LLMs in 2026 are "good enough" for real daily use.
FAQ
Q: How capable is the Qwen3.5:35B model Vitalik uses?
Qwen3.5:35B is an open-source model from Alibaba with a moderate parameter count that runs smoothly on consumer GPUs. It handles daily conversations, code assistance, and document summarization well, but still lags behind top cloud models on hard reasoning tasks.
Q: Is running a local LLM really safer than using ChatGPT?
From a privacy standpoint, absolutely — your data never leaves your machine. But security goes beyond privacy; you also need to guard against the model itself (e.g., prompt injection), which is exactly why Vitalik uses bubblewrap sandboxing.
Q: Can I run this without an NVIDIA 5090?
Yes, but the experience degrades. AMD Ryzen AI Max Pro chips offer 128 GB unified memory and can run larger models at slower speeds. Apple M4 Ultra is another option. The key threshold is reaching at least 50 tokens/sec for practical daily use.
Q: Why NixOS instead of Ubuntu?
NixOS's declarative configuration lets you describe and reproduce your entire machine state in a single file. For scenarios requiring precise control over AI execution environments, this is more reliable and auditable than traditional Linux distributions.
Sources
- Vitalik Buterin - My self-sovereign / local / private / secure LLM setup
- CryptoTimes - Vitalik Pushes Self-Sovereign Computing
- Bitcoin News - Vitalik Warns Against AI Agent Security Risks
Related Articles
- Google TurboQuant Explained: 3-bit KV Cache Compression Cuts Memory 6x With Zero Accuracy Loss
- Google AI Edge Gallery Hands-On: Running Gemma 4 Offline on Your Phone — AI Agents Without Internet
- Google Gemma 4 Hands-On: A 31B Open Model That Beats 400B Rivals and Runs on Your Phone
AI Tool Observer — Daily curated AI Agent & tool trends