Google AI Studio: Full Guide to Google’s AI Tools

Technical writer

Infrastructure

10.09.2025

8 min read

Google AI Studio is a web platform from Google for working with neural networks. At the core of the service is the family of advanced multimodal generative models, Gemini, which can handle text, images, video, and other types of data simultaneously. The platform allows you to prototype applications, answer questions, generate code, and create images and video content. Everything runs directly in the browser—no installation is required.

The main feature of Google AI Studio is versatility. Everything you need is in one place and works in the browser: you visit the site, write a query, and within seconds get results. The service allows users to efficiently leverage the power of Google Gemini for rapid idea testing, working with code or text.

Additionally, Google AI Studio can be used not only for answering questions but also as a starting point for future projects. The platform provides all the necessary tools, and Google does not claim ownership of the generated content.

To get started with Google AI Studio, visit its website and sign in with your Google account. You can start building and experimenting with AI applications by creating a prompt, using a starter app, or tuning a model on the dashboard. The platform enables you to interact with the Gemini family of models, allowing you to write and test instructions, generate code, and integrate AI into your own projects.

You have access not only to a standard chat with generative AI but also to specialized models for generating media content, music, and applications. Let’s go through each in detail.

Chat

This is the primary workspace in Google AI Studio, where you work with prompts and configure the logic and behavior of your model.

Chat Options

At the top, there are tools for working with the chat itself.

System Instruction

The main configuration block, which defines the “personality,” role, goal, and limitations for the model. It is processed first and serves as a permanent context for the entire dialogue. The system instruction is the foundation of your chatbot.

The field accepts text input. For maximum effectiveness, follow these principles:

define the role (clearly state what the model is),
define the task (explain exactly what the model should do),
set the output format,
establish constraints (prevent the model from going beyond its role).

Example instruction: "You are a Senior developer who helps other developers understand project code. You provide advice and explain the logic of the code. I am a Junior who will ask for your help. Respond in a way I can understand, point out mistakes and gaps in the code with comments. Do not fully rewrite the code I send you—give advice instead."

Show conversation with/without markdown formatting

Displays text with or without markdown formatting.

Get SDK

Provides quick access to API code by copying chat settings into code. All model parameters from the site are automatically included.

Share prompt

Used to send a link to your dialogue with the AI. You must save the prompt before sharing.

Save prompt

Saves the prompt to your Google Drive.

Compare mode

A special interface that allows you to run the same prompt on different language models (or different versions of the same model) simultaneously and instantly see their responses side by side. It’s like parallel execution with a visual comparison.

Clear chat

Deletes all messages in the chat.

Model Parameters

In this window, you select the neural network and configure its behavior.

Model

Select the base language model. AI Studio provides the following options:

Gemini 2.5 Pro: a “thinking” model capable of reasoning about complex coding, math, and STEM problems, analyzing large datasets, codebases, and documents using long context.
Gemini 2.5 Flash: the best model in terms of price-to-performance, suitable for large-scale processing, low-latency tasks, high-volume reasoning, and agentic scenarios.
Gemini 2.5 Flash-Lite: optimized for cost-efficiency and high throughput.

Other available models include Gemini 2.0, Gemma 3, and LearnLM 2.0. More details about Gemini Pro, Flash, Flash-Lite, and others can be found in the official guide.

Temperature: Controls the degree of randomness and creativity in the model’s responses. Higher values produce more diverse and unexpected answers, usually less precise. Lower values make responses more conservative and predictable.
Media resolution: Refers to the level of detail in input media (images and video) that the model processes. Higher resolution allows Gemini to “see” and analyze more details, but requires more tokens for analysis.
Thinking mode: Switches the model into a reasoning mode. The AI decomposes tasks and formulates instructions rather than outputting a result immediately.
Set thinking budget: Limits the maximum number of tokens for the reasoning mode.
Structured output: Allows developers and users to receive AI responses in predefined formats like JSON. You can specify the desired output format manually or via a visual editor.
Grounding with Google Search: Enables Gemini to access Google Search in real-time for the most relevant and up-to-date information. Responses are based on search results rather than internal knowledge, reducing “hallucinations.”
URL Context: Enhances grounding by allowing users to direct Gemini to specific URLs for context, rather than relying on general search.
Stop sequences: Allows up to 5 sequences where the model will immediately stop generating text.

Stream

The Stream mode is an interactive interface for continuous dialogue with Gemini models. Supports microphone, webcam, and screen sharing. The AI can “see” and “hear” what you provide.

Turn coverage: Configures whether the AI continuously considers all input or only during speech, simulating natural conversation including interruptions and interjections.
Affective dialog: Enables AI to recognize emotions in your speech and respond accordingly.
Proactive audio: When enabled, AI filters out background noise and irrelevant conversations, responding only when appropriate.

Generate Media

This section on the left panel provides interfaces for generating media: speech, images, music, and video.

Gemini Speech Generator

Converts text into audio with flexible settings. Use for video voice-overs, audio guides, podcasts, or virtual character dialogues. Tools include Raw Structure (scenario definition), Script Builder, Style Instructions, Add Dialog, Mode (monologue/dialogue), Model Settings, and Voice Settings.

Main tools on the control panel:

Raw Structure: Defines the scenario—how the request to the model for speech generation will be constructed.
Script Builder: Instruction for dialogue with the ability to write lines and pronunciation style for each speaker.
Style Instructions: Set the emotional tone and speech pace (for example: friendly, formal, energetic).
Add Dialog: Add new lines and speakers.
Mode: Choice between monologue and dialogue (up to 2 participants).
Model Settings: Adjust model parameters, for example, temperature, which affects the creativity and unpredictability of speech.
Voice Settings: Select a voice, adjust speed, pauses, pitch, and other parameters for each speaker.

Image Generation

A tool for generating images from a text description (prompt).

Three models are available:

Imagen 4
Imagen 4 Ultra
Imagen 3

Imagen 4 and Imagen 4 Ultra can generate only one image at a time, while Imagen 3 can generate up to four images at once.

To generate, enter a prompt for the image and specify the aspect ratio.

Music Generation

A tool for interactive real-time music creation based on the Lyria RealTime model.

The main feature is that you define the sound you want to hear and adjust its proportion. The more you turn up the regulator, the more intense the sound will be in the final track. You can specify the musical instrument, genre, and mood. The music updates in real time.

Video Generation

A tool for video generation based on Veo 2 and Veo 3 models (API only). Video length up to 8 seconds, 720p quality, 24 frames per second. Supports two resolutions—16:9 and 9:16.

Video generation from an image: Upload a file and write a prompt. The resulting video will start from your image.
Negative prompt support: Allows specifying what should not appear in the frame. This helps fine-tune the neural network’s output.

App Generation

Google AI Studio instantly transforms high-level concepts into working prototypes. To do this, go to the Build section. Describe the desired application in the prompt field and click Run.

AI Studio will analyze this request and suggest a basic architecture, including necessary API calls, data structures, and interaction logic. This saves the developer from routine setup work on the initial project and allows focusing on unique functionality.

The app generation feature relies on an extensive template library.

Conclusion

Google AI Studio has proven itself as a versatile platform for generative AI. It combines Gemini chat, multimodal text, image, audio, video generation, and app prototyping tools in one interface. The platform is invaluable for both developers and general users. Even the free tier of Google AI Studio covers most tasks—from content generation to MVP prototyping. Recent additions include Thinking Mode, Proactive Audio, and Gemini 2.5 Flash, signaling impressive future prospects.

Infrastructure

10.09.2025

8 min read