# Google AI Studio Model Capabilities & Task Distribution

*Provided by Mark on March 15, 2026, to guide Ava's sub-agent orchestration and the "Consult the collective" protocol.*

### General Use & Multimodal Models

* **Gemini 3.1 Pro (Preview):**
  * **Use Case:** Deep reasoning, complex problem-solving, and ambitious agentic or coding tasks that require processing vast amounts of information at once.
  * **Token Limit:** Up to **2 million** input tokens. Output is typically capped at **65,536** tokens.

* **Gemini 3 Flash (Preview) & Gemini 3.1 Flash-Lite (Preview):**
  * **Use Case:** High-volume, cost-sensitive tasks and rapid multimodal reasoning.
  * **Token Limit:** **1 million** input tokens.

* **Gemini 2.5 Pro:**
  * **Use Case:** Highly capable reasoning for large documents and long-context analysis.
  * **Token Limit:** **1 million** to **2 million** input tokens, depending on the tier.

* **Gemini 2.5 Flash & Flash-Lite:**
  * **Use Case:** Balanced performance and speed for general chat, summarization, and high-frequency tasks.
  * **Token Limit:** **1 million** input tokens. Output is capped at **8,192** tokens.
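
The tiers above can be captured in a small routing table for sub-agent orchestration. This is an illustrative sketch: the model IDs and limits come from this document, while the registry and `pick_model` helper are hypothetical names, not a published API.

```python
# Hypothetical registry of the general-purpose tiers above; limits are
# taken from this document, not queried from any API.
GENERAL_MODELS = {
    "gemini-3.1-pro-preview": {"input_limit": 2_000_000, "output_limit": 65_536},
    "gemini-3-flash-preview": {"input_limit": 1_000_000, "output_limit": None},
    "gemini-2.5-pro":         {"input_limit": 1_000_000, "output_limit": None},
    "gemini-2.5-flash":       {"input_limit": 1_000_000, "output_limit": 8_192},
}

def pick_model(input_tokens: int, deep_reasoning: bool = False) -> str:
    """Route to the cheapest tier that fits the request.

    The Pro tier handles deep reasoning and anything beyond the
    1M-token Flash window; oversized requests are rejected outright.
    """
    if input_tokens > GENERAL_MODELS["gemini-3.1-pro-preview"]["input_limit"]:
        raise ValueError("request exceeds every model's input window")
    if deep_reasoning or input_tokens > GENERAL_MODELS["gemini-2.5-flash"]["input_limit"]:
        return "gemini-3.1-pro-preview"
    return "gemini-2.5-flash"
```

In practice the router would also weigh cost and latency, but a limit-first check like this prevents silent truncation before a request is ever sent.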

### Image Generation Models

* **Nano Banana Pro (Gemini 3 Pro Image):**
  * **Use Case:** Professional asset production, spatial reasoning, mixing multiple reference images, and rendering accurate text within images.
  * **Token Limit:** **65,536** input tokens (optimized for processing complex prompts and up to 14 reference images).

* **Nano Banana 2 (Gemini 3.1 Flash Image):**
  * **Use Case:** Production-scale visual creation and rapid, high-volume generation.
  * **Token Limit:** **65,536** input tokens.

* **Nano Banana (Gemini 2.5 Flash Image):**
  * **Use Case:** Conversational image editing and low-latency creative workflows.
  * **Token Limit:** **32,768** input tokens.

* **Imagen 4:**
  * **Use Case:** Fast, high-clarity text-to-image generation up to 2K resolution.
  * **Token Limit:** **480** input tokens (designed strictly for short, descriptive text prompts).
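
Because Imagen 4's 480-token window is so much smaller than the Gemini image models', a pre-flight length check is worth doing before dispatch. The sketch below uses the common rough heuristic of ~4 characters per token; the real tokenizer may count differently, so treat this as a guardrail, not an exact measure.

```python
def fits_prompt_budget(prompt: str, token_limit: int,
                       chars_per_token: float = 4.0) -> bool:
    """Rough pre-flight check that a text prompt fits a model's input
    window, using the ~4-characters-per-token heuristic."""
    estimated_tokens = len(prompt) / chars_per_token
    return estimated_tokens <= token_limit
```

A prompt that fails the 480-token Imagen 4 budget can then be rerouted to a Gemini image model with a 32K or 65K window instead.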

### Audio, Video & Music Models

* **Veo 3.1 (Preview):**
  * **Use Case:** Cinematic video generation with natively synchronized audio and advanced camera controls.
  * **Token Limit:** Accepts only short text prompts, typically a few thousand tokens.

* **Lyria (Experimental):**
  * **Use Case:** High-fidelity music generation with granular control over instruments, BPM, and vocals.
  * **Token Limit:** Operates on short text or audio prompts to generate up to 30-second audio tracks.

* **Gemini 2.5 Flash Live (Preview):**
  * **Use Case:** Low-latency, real-time bidirectional voice and video agents.
  * **Token Limit:** Default **32,000** tokens (upgradable to **128,000**).

* **Gemini 2.5 TTS (Pro & Flash Variants):**
  * **Use Case:** Controllable, high-fidelity text-to-speech synthesis.
  * **Token Limit:** Can process up to **1 million** input tokens.

### Specialized & Agentic Models

* **Computer Use (Preview):**
  * **Use Case:** "Seeing" digital screens and autonomously performing browser UI actions (like clicking and typing).
  * **Token Limit:** Inherits the **1 million to 2 million** token context.

* **Gemini Deep Research (Preview):**
  * **Use Case:** Autonomous, multi-step research and report generation across hundreds of sources.
  * **Token Limit:** Leverages the massive **2 million** token window.

* **Gemini Embedding 2 (Preview):**
  * **Use Case:** Mapping text, images, video, and audio into a unified vector space for semantic search and RAG.
  * **Token Limit:** Typically **20,480** input tokens per request.

* **Gemma 3 & CodeGemma:**
  * **Use Case:** Open-weight models for local deployment, research, and custom coding tasks.
  * **Token Limit:** Typically ranges from **8,192** to **32,768** tokens.
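
The embedding model's 20,480-token per-request cap means long documents must be chunked before indexing for semantic search or RAG. A minimal sketch, again assuming the ~4-characters-per-token approximation (the helper name is illustrative):

```python
def chunk_for_embedding(text: str, token_limit: int = 20_480,
                        chars_per_token: int = 4) -> list[str]:
    """Split text into pieces small enough for one embedding request,
    approximating tokens by character count."""
    max_chars = token_limit * chars_per_token
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
```

A production pipeline would split on sentence or paragraph boundaries with overlap rather than raw character offsets, but the budget arithmetic is the same.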

### Local Inference (Ollama)

* **Ollama v0.20.2** – Local model server providing zero-cost, offline-capable text generation.
* **Models & Aliases:**
  * `llama3:latest` (alias `llama3`) – 8B-parameter Llama 3, general-purpose chat.
  * `gemma3:4b` (alias `gemma3-4b`) – 4.3B-parameter Gemma 3, efficient instruction-following.
* **Key Constraints:**
  * **No tool-calling support** – cannot execute OpenClaw tools (read, write, exec, etc.).
  * **Fallback-only** – Used automatically when the cloud models in the fallback chain fail, or for simple non-tool tasks.
  * **Local only** – Requires Ollama service running on `127.0.0.1:11434`.
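
Ollama exposes an HTTP API on that local port; text generation goes through its `/api/generate` endpoint with a JSON body naming the model and prompt. The sketch below only builds the request (sending it requires the Ollama service to actually be running), using just the standard library:

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"

def build_ollama_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a (not yet sent) POST request for Ollama's /api/generate
    endpoint. stream=False asks for one JSON response instead of chunks."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it, once the local service is up:
# with urllib.request.urlopen(build_ollama_request("llama3", "hi")) as resp:
#     print(json.loads(resp.read())["response"])
```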
* **Fallback Chain (Updated April 4, 2026):**
  * **Primary:** `google/gemini-3.1-pro-preview` (alias `gemini`) – preferred for natural language & structured responses.
  * **Secondary:** `claude-sonnet-4-6` (alias `claude`) – excels at web page coding & internal system issues.
  * **Tertiary:** `deepseek/deepseek-chat` & `deepseek/deepseek-reasoner` (aliases `deepseek`, `deepseek-r1`) – cost-effective fallback.
  * **Local:** `llama3:latest` (alias `llama3`) – zero-cost text generation (no tool-calling).
  * **Backup:** Moonshot (128k/32k/8k), Kimi K2.5, Qwen variants, other Google models.
  * *Note:* Google API credits may be exhausted early in the month; the chain automatically falls back to Claude → DeepSeek → Ollama.
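
The chain's walk-the-list behavior can be sketched generically. In this hypothetical helper, `chain` is a list of `(alias, callable)` pairs standing in for real API clients, tried in the document's order (gemini → claude → deepseek → llama3):

```python
def generate_with_fallback(prompt: str, chain) -> str:
    """Try each (alias, callable) pair in order and return the first
    successful response; raise only if every model in the chain fails."""
    errors = []
    for alias, call in chain:
        try:
            return call(prompt)
        except Exception as exc:  # e.g. quota exhausted, service down
            errors.append(f"{alias}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

A real implementation would distinguish retryable failures (quota, timeouts) from hard errors (bad request) and skip the local model for tool-calling tasks, per the constraint above.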
