The MiniMax MCP Server is an open-source project (MIT licensed) maintained by MiniMax. It aims to enable developers to easily call MiniMax's leading text-to-speech, voice cloning, image, and video generation APIs through the standardized Model Context Protocol (MCP), empowering various AI applications.
The MiniMax MCP Server encapsulates cutting-edge AI models into a series of standardized tool interfaces. According to the official documentation, the following capabilities are currently offered (using these tools may incur API call charges):
- Text-to-speech: converts text into natural, fluent audio. You can specify a `voiceId` and fine-tune parameters such as speed, volume, and pitch.
- List voices: returns all currently available voice IDs so you can choose one when calling `text_to_audio`.
- Voice cloning: clones a specific voice from a provided audio file (local path or URL) and assigns it a new `voiceId`.
- Text-to-image generation: generates images from a text description (`prompt`), with control over aspect ratio and quantity, plus character consistency via a reference image.
- Video generation: creates video clips from a text prompt (`prompt`), delivering high-quality text-to-video (T2V) results (see the call sketch after this list).
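Since each capability is exposed as a standard MCP tool, any MCP-capable client can invoke it programmatically. The following is a minimal sketch using the official MCP Python SDK, assuming the Python server is launched via `uvx minimax-mcp` over stdio; the tool name `text_to_audio` comes from the documentation above, while the argument names and the `MINIMAX_API_HOST` variable are illustrative and should be checked against the server's `list_tools()` schemas and the official README:

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the MiniMax MCP server (Python implementation) as a stdio subprocess.
# The current environment is inherited and extended so uvx stays on PATH.
server = StdioServerParameters(
    command="uvx",
    args=["minimax-mcp"],
    env={
        **os.environ,
        "MINIMAX_API_KEY": "<your-api-key>",
        "MINIMAX_API_HOST": "https://api.minimaxi.chat",  # must match the key's region
        "MINIMAX_MCP_BASE_PATH": "/tmp/minimax-output",   # where generated files are saved
    },
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the exposed tools (text_to_audio, voice listing, cloning, etc.).
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Illustrative call; check the tool's input schema for the real argument names.
            result = await session.call_tool(
                "text_to_audio",
                {"text": "Hello from MiniMax!", "voice_id": "<voice-id>"},
            )
            print(result)

asyncio.run(main())
```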
The project is built upon the Model Context Protocol (MCP), offering standardized interfaces and flexible deployment options for easy developer integration.
To serve a broader developer community, MiniMax officially provides implementations in two mainstream programming languages: Python (MiniMax-MCP) and JavaScript/TypeScript (MiniMax-MCP-JS).
The server supports two communication transport protocols to suit different deployment scenarios: stdio (standard input/output, the default for local subprocess integration) and SSE (Server-Sent Events, for HTTP-based deployments).
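For an SSE deployment, the same session logic shown above can run over HTTP instead of stdio. A minimal sketch, assuming the server has been started in SSE mode and is reachable at a hypothetical local URL:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # The URL below is hypothetical; point it at wherever your SSE-mode server listens.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```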
An API key must be obtained from the official MiniMax platform before use. Crucially, the API key must match the region of its corresponding API host, otherwise an "Invalid API key" error will occur:
- Keys from minimax.io (global platform) pair with the API host `https://api.minimaxi.chat` (note the "i" in the domain).
- Keys from minimaxi.com (Mainland China platform) pair with the API host `https://api.minimax.chat`.
The server supports configuration via environment variables (e.g., `MINIMAX_API_KEY`), command-line arguments, configuration files, and similar mechanisms.
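Putting the two points above together, a minimal environment setup might look like the sketch below (the `MINIMAX_API_HOST` variable name is assumed here; check the official README for the exact names):

```python
import os

# Pick the API host that matches where the key was issued, avoiding the
# "Invalid API key" error described above.
HOSTS = {
    "minimax.io": "https://api.minimaxi.chat",   # global platform (note the extra "i")
    "minimaxi.com": "https://api.minimax.chat",  # Mainland China platform
}

key_source = "minimax.io"  # the platform where your API key was created

os.environ["MINIMAX_API_KEY"] = "<your-api-key>"
os.environ["MINIMAX_API_HOST"] = HOSTS[key_source]           # assumed variable name
os.environ["MINIMAX_MCP_BASE_PATH"] = "/tmp/minimax-output"  # local output directory
```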
Following the MCP standard, it seamlessly integrates with various mainstream AI agent clients and development tools, embedding MiniMax capabilities into existing toolchains.
According to the official documentation, supported clients include, but are not limited to, Claude Desktop, Cursor, Windsurf, and OpenAI Agents.
Integration typically involves specifying how to start the MiniMax MCP server in the client configuration (e.g., via the `uvx minimax-mcp` command) along with the necessary environment variables (API key, API host, the local output path `MINIMAX_MCP_BASE_PATH`, etc.).
Dependency tip: the official Python implementation recommends using `uv` (a fast Python package manager) for installation and execution. Ensure `uv` or `uvx` is on your system PATH, or specify its absolute path in the configuration.
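As a quick sanity check, the short sketch below resolves the absolute path of `uvx` (or `uv`) so it can be pasted into a client configuration whose process does not inherit your shell's PATH:

```python
import shutil

# Locate uvx (or fall back to uv) on the current PATH; returns None if not installed.
uvx_path = shutil.which("uvx") or shutil.which("uv")

if uvx_path is None:
    raise SystemExit("uv/uvx not found on PATH; install uv before configuring the client.")

print(f"Use this absolute path as the command in your MCP client config: {uvx_path}")
```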
The powerful capabilities of the MCP server are rooted in MiniMax's self-developed, industry-leading matrix of foundational AI models. These models are core to achieving high-quality multimodal generation:
- Foundational language and vision-language models, such as MiniMax-Text-01 (a large-scale MoE language model) and MiniMax-VL-01 (a vision-language model), provide a solid base for understanding and reasoning.
- Speech models, such as the advanced Speech series (Speech-02 and others), drive high-quality, high-fidelity TTS and realistic voice cloning.
- Visual generation models, such as the Image-01 and Video-01 series (including the Director model, which emphasizes narrative control), support high-quality image generation and cinematic video creation.
The role of the MCP server is to present these powerful proprietary model capabilities to developers through simple, open, standardized MCP protocol interfaces, enabling effective technology output.
Visit the MiniMax MCP Server's GitHub repository, check out the detailed documentation and examples, integrate leading multimodal capabilities into your AI applications, and explore infinite innovation possibilities.