Google Gemma 4 Hands-On: A 31B Open Model That Beats 400B Rivals and Runs on Your Phone
By Kit 小克 | AI Tool Observer | 2026-04-12
Google Gemma 4 Hands-On: A 31B Open Model That Beats 400B Rivals and Runs on Your Phone
Google officially released Gemma 4 on April 2, 2026, and it is shaping up to be the most significant open-source model launch of the year. The family spans four sizes, from a 2B edge model to a 31B dense flagship, all under the Apache 2.0 license with no usage restrictions. After a week of testing, here is what I found.
What Is Gemma 4 and Why Should You Care?
Gemma 4 is Google DeepMind's latest open model family, available in four sizes: E2B (edge 2B), E4B (edge 4B), 26B MoE (Mixture of Experts), and 31B Dense. The 31B Dense model currently ranks #3 on the Arena AI text leaderboard, trailing only two closed-source giants.
The real surprise is the 26B MoE variant — 26B total parameters, but only 3.8B active per forward pass, yet it scores 1441 on Arena AI (ranked #6). Measured in leaderboard points per active parameter, that makes it the most efficient reasoning model on the board today.
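The per-active-parameter claim is simple arithmetic; here is a minimal sketch using only the figures quoted above (the 1441 Arena score and 3.8B active parameters):

```python
def points_per_active_billion(arena_score: float, active_params_b: float) -> float:
    """Leaderboard points earned per billion parameters active at inference."""
    return arena_score / active_params_b

# Gemma 4 26B MoE: 1441 Arena points with only 3.8B parameters
# active per forward pass (a dense model would divide by its full count).
moe_efficiency = points_per_active_billion(1441, 3.8)
print(round(moe_efficiency, 1))  # 379.2
```

A 31B dense model would need an impossible Arena score north of 11,000 to match that ratio, which is why the MoE design stands out.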
Benchmark Results: Gemma 4 vs Qwen 3.5 vs Llama 4
Here are the numbers that matter:
- Math Competition (AIME 2026): Gemma 4 31B scores 89.2%, leading by a wide margin
- Competitive Programming (Codeforces ELO): Gemma 4 31B reaches 2150
- MMLU Pro: Qwen 3.5 27B edges ahead at 86.1% vs Gemma 4 at 85.2%
- GPQA Diamond: Qwen 3.5 27B takes 85.5% vs Gemma 4 at 84.3%
- Llama 4 Scout (109B total params): Generally trails both on reasoning benchmarks
The takeaway is clear: for math reasoning and code generation, Gemma 4 is currently the best open-source option. Qwen 3.5 has a slight edge in general knowledge, but the gap is narrow.
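To make the head-to-head easier to scan, here is a small sketch that tabulates the numbers quoted above and names the leader on each benchmark; benchmarks where only one model's figure is cited keep a single entry:

```python
# Benchmark figures exactly as quoted in this article (higher is better).
scores = {
    "AIME 2026 (%)":    {"Gemma 4 31B": 89.2},
    "Codeforces ELO":   {"Gemma 4 31B": 2150},
    "MMLU Pro (%)":     {"Gemma 4 31B": 85.2, "Qwen 3.5 27B": 86.1},
    "GPQA Diamond (%)": {"Gemma 4 31B": 84.3, "Qwen 3.5 27B": 85.5},
}

# Pick the top scorer per benchmark.
leaders = {bench: max(results, key=results.get) for bench, results in scores.items()}
for bench, leader in leaders.items():
    print(f"{bench}: {leader} ({scores[bench][leader]})")
```

Running this mirrors the takeaway: Gemma 4 leads where its figures are cited, Qwen 3.5 takes the two knowledge benchmarks by about a point.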
Can It Really Run on a Phone? Edge Deployment Tested
The E2B model with INT4 quantization needs less than 1.5GB of memory. Mid-range Android phones from 2023 (6GB RAM, Snapdragon 8-series) can run it. On a Raspberry Pi 5 using CPU inference, I measured 133 tokens/s prefill and 7.6 tokens/s decode.
The E4B model targets flagship phones (8-12GB RAM) or devices with NPU accelerators. Both edge models natively support image input, audio input, and function calling — meaning you can run a complete multimodal AI agent on your phone without any cloud connection.
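A back-of-envelope way to turn the Pi 5 throughput figures into response times, assuming the prefill and decode rates stay flat across sequence lengths (real runs will vary with context length and thermals):

```python
def estimated_latency_s(prompt_tokens: int, output_tokens: int,
                        prefill_tps: float = 133.0, decode_tps: float = 7.6) -> float:
    """Rough end-to-end generation time: the prompt is processed at the
    prefill rate, then each output token is generated at the decode rate."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps

# A 500-token prompt with a 100-token reply on the Pi 5:
print(f"{estimated_latency_s(500, 100):.1f} s")  # 16.9 s, dominated by decode
```

The asymmetry matters for UX design: long prompts are nearly free, but every extra output token costs about 130ms, so keeping agent replies terse pays off on edge hardware.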
Why Apache 2.0 Licensing Changes Everything
Switching from Gemma 3's custom license to Apache 2.0 is Google's smartest move. No usage caps, no monthly active user limits, no agreement to sign. This matches Qwen 3.5's licensing strategy but is more permissive than Meta's Llama 4 community license (which includes MAU restrictions).
For startups, this means you can embed Gemma 4 in commercial products without worrying about future licensing changes.
Native Agentic Capabilities: Beyond Chatbots
Every Gemma 4 model natively supports function calling, structured JSON output, and system instructions. These are not bolted-on features but trained-in capabilities. Combined with Google's MediaPipe and LiteRT deployment tools, plus community support from Ollama, vLLM, and llama.cpp, the path from prototype to production is well-defined.
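As a sketch of what structured function calling looks like in practice, here is a hypothetical tool definition in the OpenAI-style schema that runtimes such as Ollama and vLLM accept, plus the validation step you would run on the model's JSON reply before dispatching. The model tag `gemma4:31b` and the simulated reply are illustrative assumptions, not confirmed values:

```python
import json

# Hypothetical tool in the OpenAI-style "function" schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body in the common chat-completions shape; "gemma4:31b" is a
# placeholder tag, not a confirmed registry name.
request = {
    "model": "gemma4:31b",
    "messages": [{"role": "user", "content": "What's the weather in Taipei?"}],
    "tools": [weather_tool],
    "stream": False,
}

# Simulated structured reply; a real run would get this from the model.
# Always validate against the schema before calling the real function.
reply = '{"name": "get_weather", "arguments": {"city": "Taipei"}}'
call = json.loads(reply)
assert call["name"] == weather_tool["function"]["name"]
assert set(call["arguments"]) >= set(
    weather_tool["function"]["parameters"]["required"])
print(call["arguments"]["city"])  # Taipei
```

The validation step is not optional boilerplate: even models trained for structured output occasionally emit malformed arguments, and catching that before dispatch is what separates a demo from a production agent.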
Frequently Asked Questions
- Q: How much GPU memory does Gemma 4 31B require?
A: About 62GB of VRAM in FP16, which means a single A100 80GB or a multi-GPU rig (note that two RTX 4090s total only 48GB, so FP16 does not fit on them without offloading). With INT4 quantization, roughly 16GB — a single RTX 4090 works fine.
- Q: Can Gemma 4 handle Chinese-language workloads?
A: Yes. It is trained on 140+ languages, and its Chinese performance sits near the top of the open-model field, though the Qwen series still holds an edge in Chinese.
- Q: How does Gemma 4 compare to GPT-5.4 or Claude?
A: Closed-source models still lead on complex reasoning, but Gemma 4 31B is remarkably close. For most use cases like customer service, summarization, and code assistance, the gap is negligible and the cost savings are substantial.
- Q: What are the best use cases for the E2B edge model?
A: Privacy-sensitive local applications such as personal note-taking assistants, offline translation, and on-device voice command processing. No internet connection required.
- Q: Is commercial use free?
A: Yes. Apache 2.0 is completely free for commercial use, modification, and redistribution, with no usage restrictions.
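The 62GB and 16GB figures in the first answer fall out of bytes-per-parameter arithmetic; here is a minimal sketch that ignores KV cache and activation overhead (which add a few GB in practice):

```python
def weight_memory_gb(params_b: float, bits_per_param: int) -> float:
    """Memory for model weights alone: billions of params * bits / 8 = GB."""
    return params_b * bits_per_param / 8

print(f"FP16: {weight_memory_gb(31, 16):.1f} GB")  # FP16: 62.0 GB
print(f"INT4: {weight_memory_gb(31, 4):.1f} GB")   # INT4: 15.5 GB, ~16 GB with overhead
```

The same formula explains the E2B numbers: 2B parameters at 4 bits is about 1GB of weights, consistent with the under-1.5GB footprint quoted earlier once runtime overhead is included.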
Sources
- Google Blog - Gemma 4: Byte for byte, the most capable open models
- Google Developers Blog - Bring state-of-the-art agentic skills to the edge with Gemma 4
- Analytics Vidhya - Google Gemma 4: Is it the Best Open-Source Model of 2026?
Related Articles
- GLM-5.1 Hands-On: The 754B Open-Source Model That Codes for 8 Hours Straight
- MemPalace Hands-On: The Open-Source AI Memory System With 96.6% Recall — Hype or Real?
- Google NotebookLM Now Lives Inside Gemini — Hands-On With Cross-Platform AI Notebooks
AI Tool Observer — Daily curated AI Agent & tool trends