Google AI Edge Gallery Hands-On: Running Gemma 4 Offline on Your Phone — AI Agents Without Internet
By Kit 小克 | AI Tool Observer | 2026-04-12
In April 2026, Google quietly dropped something that caught the AI community off guard: the AI Edge Gallery. This is not another cloud-based AI service. It is an app that lets you run large language models completely offline on your phone. Paired with the new open-source Gemma 4 models, your phone can now handle chat, image recognition, voice transcription, and even multi-step AI agent workflows — all without an internet connection.
What Is Google AI Edge Gallery and Why Should You Care?
AI Edge Gallery is an open-source showcase app from Google, now available on both the Google Play Store and Apple App Store. Unlike most AI developer tools, this one targets everyday users who want to experience on-device AI firsthand.
The core value proposition is simple: all inference runs locally, no data leaves your phone. For privacy-sensitive contexts like healthcare, finance, and enterprise operations, this is not a nice-to-have — it is a requirement.
Which Models Does It Support? Can Your Phone Handle It?
The Gallery natively supports two edge-optimized Gemma 4 variants:
- Gemma 4 E2B (Effective 2 Billion parameters): Built for mid-range phones, requiring less than 1.5GB of memory
- Gemma 4 E4B (Effective 4 Billion parameters): For flagship devices like the Pixel 10 Pro XL, delivering higher reasoning quality
Through 2-bit and 4-bit weight quantization, the memory footprint stays remarkably low. Benchmarks from testing:
- Qualcomm Dragonwing IQ8 (NPU): 3,700 prefill / 31 decode tokens per second
- Raspberry Pi 5 (CPU): 133 prefill / 7.6 decode tokens per second
- 4,000 input tokens across 2 Agent Skills complete in under 3 seconds on GPU
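For intuition, the prefill and decode rates above combine into a rough end-to-end latency estimate: prompt-processing time plus generation time. A back-of-envelope sketch in JavaScript, using the NPU and Raspberry Pi figures from the list (the 100-token reply length is an arbitrary assumption):

```javascript
// Estimate end-to-end generation time from prefill/decode throughput.
// Rates are in tokens per second; the result is in seconds.
function estimateLatency(promptTokens, outputTokens, prefillRate, decodeRate) {
  return promptTokens / prefillRate + outputTokens / decodeRate;
}

// Qualcomm Dragonwing IQ8 NPU figures from the benchmarks above.
const npu = estimateLatency(4000, 100, 3700, 31);
console.log(npu.toFixed(2)); // 4.31 (seconds for a 4,000-token prompt + 100-token reply)

// Raspberry Pi 5 CPU figures.
const pi = estimateLatency(4000, 100, 133, 7.6);
console.log(pi.toFixed(2)); // 43.23
```

Note how prefill throughput dominates long-prompt workloads, while decode throughput dominates long replies; that is why both numbers are reported separately.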
Five Core Modules Tested
1. Prompt Lab: A Local ChatGPT
Supports freeform prompts, document summarization, tone rewriting, and code generation. You can adjust temperature, top-k, and other generation parameters. In testing, summarizing a PDF felt smooth and output quality approached cloud-model levels.
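Temperature and top-k are generic sampling controls rather than anything Gallery-specific. A minimal sketch of what they actually do to the model's next-token choice, over invented logits: top-k keeps only the k highest-scoring candidates, and temperature flattens or sharpens the distribution before sampling.

```javascript
// Minimal top-k + temperature sampling over raw logits.
function sampleTopK(logits, { temperature = 1.0, topK = 40, rand = Math.random } = {}) {
  // Pair each logit with its token index and keep only the top k.
  const ranked = logits
    .map((logit, token) => ({ token, logit }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, topK);

  // Softmax with temperature over the surviving candidates
  // (subtracting the max for numerical stability).
  const scaled = ranked.map(({ logit }) => logit / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map(v => Math.exp(v - max));
  const total = exps.reduce((s, v) => s + v, 0);

  // Sample a token from the normalized distribution.
  let r = rand() * total;
  for (let i = 0; i < ranked.length; i++) {
    r -= exps[i];
    if (r <= 0) return ranked[i].token;
  }
  return ranked[ranked.length - 1].token;
}

console.log(sampleTopK([2.0, 0.5, 1.5], { topK: 1 })); // 0 (topK = 1 is greedy argmax)
```

With `topK = 1` the call degenerates to greedy decoding; raising temperature above 1 makes low-ranked candidates progressively more likely.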
2. Agent Skills: Offline AI Agents
This is the standout feature. Agent Skills enable autonomous multi-step workflows: querying Wikipedia, generating interactive content, and integrating other AI models for text-to-speech and image generation. The architecture supports JavaScript Skills (executed in a hidden WebView) and Native App Intents.
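The article does not spell out the actual Skill interface, so the shape below (a skill object with a name and an async `run()` the host invokes, plus a `host` bridge exposing the model and network) is purely illustrative. It does, however, show the kind of multi-step flow a WebView-hosted JavaScript Skill can express: fetch material, then hand it to the on-device model.

```javascript
// Hypothetical Agent Skill: look up a topic, then summarize it locally.
// The `host` object (fetchJson, llm) is an invented stand-in for whatever
// bridge the hidden WebView actually exposes.
const wikiSummarySkill = {
  name: "wiki_summary",
  async run(topic, host) {
    // Step 1: fetch raw material (a real Skill could call fetch() directly).
    const page = await host.fetchJson(
      `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(topic)}`
    );
    // Step 2: hand the extract to the on-device model for a short summary.
    const summary = await host.llm(`Summarize in one sentence: ${page.extract}`);
    return { topic, summary };
  },
};

// Usage with a stubbed host, so the sketch runs without network or model:
const stubHost = {
  fetchJson: async () => ({ extract: "Gemma is a family of open models." }),
  llm: async prompt => `SUMMARY(${prompt.length} chars)`,
};
wikiSummarySkill.run("Gemma", stubHost).then(r => console.log(r.summary)); // prints the stubbed summary
```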
3. Mobile Actions: Voice-Controlled Phone
Using a fine-tuned FunctionGemma 270M model, you can control device settings with natural language — toggle the flashlight, adjust volume, launch apps. All fully offline.
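Under the hood, function-calling models like FunctionGemma emit a structured call (a function name plus arguments) rather than executing anything themselves; the host app parses that output and dispatches it to a real handler. A hedged sketch of that dispatch loop, where the handler names and the JSON call format are invented for illustration:

```javascript
// Map of device actions the app is willing to expose to the model.
// These handlers are illustrative stand-ins for real platform calls.
const actions = {
  set_flashlight: ({ on }) => `flashlight ${on ? "on" : "off"}`,
  set_volume: ({ level }) => `volume ${Math.min(100, Math.max(0, level))}%`,
};

// Parse the model's structured output and dispatch to a handler.
// Unknown function names are rejected instead of being executed blindly.
function dispatch(modelOutput) {
  const { name, args } = JSON.parse(modelOutput);
  const handler = actions[name];
  if (!handler) throw new Error(`unknown action: ${name}`);
  return handler(args);
}

console.log(dispatch('{"name":"set_flashlight","args":{"on":true}}')); // flashlight on
```

Keeping the action map explicit is also the safety boundary: the model can only ever request actions the app chose to register.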
4. Ask Image: Offline Visual Recognition
Take a photo and instantly identify objects, plants, and text. Supports bounding box coordinate output for structured visual analysis. No network required.
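Vision models commonly return boxes in normalized [0, 1] coordinates, which the app must scale to the image before drawing overlays. Whether Ask Image emits normalized or absolute coordinates is not stated here; assuming normalized output and an invented `{ xMin, yMin, xMax, yMax }` box format, the conversion is:

```javascript
// Convert a normalized bounding box (all values in [0, 1]) to pixel
// coordinates for a given image size.
function toPixels(box, imageWidth, imageHeight) {
  return {
    x: Math.round(box.xMin * imageWidth),
    y: Math.round(box.yMin * imageHeight),
    width: Math.round((box.xMax - box.xMin) * imageWidth),
    height: Math.round((box.yMax - box.yMin) * imageHeight),
  };
}

const px = toPixels({ xMin: 0.25, yMin: 0.1, xMax: 0.75, yMax: 0.6 }, 1920, 1080);
console.log(px); // { x: 480, y: 108, width: 960, height: 540 }
```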
5. Audio Scribe: Offline Speech-to-Text
Transcribes and translates speech entirely on-device, with no internet connection required.
How Should Developers Use This?
The entire project is Apache 2.0 open-source, supporting:
- Android (Kotlin) and iOS (Swift) native development
- Desktop: Windows, Linux, macOS (Metal)
- Web: Via WebGPU
- Edge devices: Raspberry Pi 5, Arduino VENTUNO Q
Google also released LiteRT-LM, a performance library offering constrained decoding, dynamic context management, and CLI tools. It accelerates the full 128K context window locally and fully supports structured JSON output.
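Constrained decoding means the runtime masks out, at each generation step, every candidate token that would violate the target format, so the model can only emit valid output. A toy greedy version over an invented four-token vocabulary shows the mechanism; a real implementation applies the same masking over the full vocabulary, driven by a JSON grammar instead of this toy predicate:

```javascript
// Toy constrained decoding: at each step, discard any candidate token the
// constraint rejects, then pick the best remaining one (greedy).
function constrainedGreedy(steps, vocab, isAllowed) {
  const out = [];
  for (const logits of steps) {
    let best = -1;
    for (let t = 0; t < vocab.length; t++) {
      if (isAllowed(vocab[t]) && (best < 0 || logits[t] > logits[best])) best = t;
    }
    if (best < 0) throw new Error("constraint unsatisfiable at this step");
    out.push(vocab[best]);
  }
  return out.join("");
}

// Force digit-only output even when the model prefers letters.
const vocab = ["a", "7", "x", "3"];
const steps = [
  [9.0, 1.0, 8.0, 0.5], // the model wants "a"; the constraint forces "7"
  [0.1, 0.2, 0.3, 4.0], // "3" is both allowed and preferred
];
console.log(constrainedGreedy(steps, vocab, tok => /^\d$/.test(tok))); // 73
```

Because invalid tokens never get sampled at all, the guarantee holds by construction, with no retry loop or post-hoc repair.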
Limitations Worth Noting
Offline models still lag behind cloud-based Gemini for complex knowledge queries. Occasionally the model gets facts wrong, and it cannot access real-time information. The E4B model may also throttle on non-flagship devices due to thermal limits. But for a product positioned as fully offline with zero privacy risk, this is the most practical solution available today.
FAQ
Q: Does AI Edge Gallery cost anything?
A: Completely free. Both the app and models are open-source under Apache 2.0. No Hugging Face account needed — download models directly within the app.
Q: What phones are supported? What are the minimum requirements?
A: Minimum Android 12 or iOS 17. Mid-range phones can run the E2B model, while flagship devices should use E4B for better results. The app dynamically switches models based on battery and thermal conditions.
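The battery- and thermal-based switching can be pictured as a simple policy function. The thresholds below are invented, since Google has not published the actual ones; the point is only the shape of the decision:

```javascript
// Illustrative model-selection policy. The 40 °C and 20% thresholds are
// made-up examples, not the app's real values.
function pickModel({ batteryPct, tempC, isFlagship }) {
  // Non-flagship hardware sticks to the smaller model.
  if (!isFlagship) return "gemma4-e2b";
  // Fall back to E2B when the device is hot or low on battery.
  if (tempC >= 40 || batteryPct <= 20) return "gemma4-e2b";
  return "gemma4-e4b";
}

console.log(pickModel({ batteryPct: 80, tempC: 30, isFlagship: true })); // gemma4-e4b
```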
Q: How capable are the offline AI agents?
A: They support multi-step workflows including knowledge base queries, summary generation, and device control. However, they cannot send emails or access external APIs like cloud-based agents. Best suited for privacy-first scenarios.
Q: What is the advantage over using the Gemini app directly?
A: The core difference is fully offline operation. It works without any network connection, and your data never leaves the device. For healthcare, finance, and defense applications, this is a critical requirement.
Q: Can developers create custom Agent Skills?
A: Yes. You can write custom Skills in JavaScript, executed via WebView with access to fetch(), CDN libraries, and WebAssembly. Native App Intents are also extensible.
Sources
- Google Developers Blog - Gemma 4 Agentic Skills at the Edge
- Android Authority - Google AI Edge Gallery brings local Gemma 4
- Google AI Edge Gallery - GitHub
Related Articles
- Google Gemma 4 Hands-On: A 31B Open Model That Beats 400B Rivals and Runs on Your Phone
- GLM-5.1 Hands-On: The 754B Open-Source Model That Codes for 8 Hours Straight
- MemPalace Hands-On: The Open-Source AI Memory System With 96.6% Recall — Hype or Real?
AI Tool Observer — Daily curated AI Agent & tool trends