The MiniMax MCP Server is an open-source project (MIT licensed) maintained by MiniMax. It aims to enable developers to easily call MiniMax's leading text-to-speech, voice cloning, image, and video generation APIs through the standardized Model Context Protocol (MCP), empowering various AI applications.
The MiniMax MCP Server encapsulates cutting-edge AI models into a series of standardized tool interfaces. According to the official documentation, the following capabilities are currently offered (using these tools may incur API call charges):
- Text-to-speech: converts text into natural, fluent audio. You can specify a `voiceId` and fine-tune parameters such as speed, volume, and pitch.
- List voices: returns all currently available voice IDs so you can choose one when calling `text_to_audio`.
- Voice cloning: clones a specific voice from a provided audio file (local path or URL) and assigns it a new `voiceId`.
- Text-to-image generation: generates images from a text description (`prompt`), with control over aspect ratio and quantity, plus character consistency via a reference image.
- Video generation: creates video clips from a text prompt (`prompt`), delivering high-quality text-to-video (T2V) results (see the call sketch after this list).
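Since each capability is exposed as a standard MCP tool, any MCP-capable client can invoke it programmatically. The following is a minimal sketch using the official MCP Python SDK, assuming the Python server is launched via `uvx minimax-mcp` over stdio; the tool name `text_to_audio` comes from the documentation above, while the argument names and the `MINIMAX_API_HOST` variable are illustrative and should be checked against the server's `list_tools()` schemas and the official README:

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the MiniMax MCP server (Python implementation) as a stdio subprocess.
# The current environment is inherited and extended so uvx stays on PATH.
server = StdioServerParameters(
    command="uvx",
    args=["minimax-mcp"],
    env={
        **os.environ,
        "MINIMAX_API_KEY": "<your-api-key>",
        "MINIMAX_API_HOST": "https://api.minimaxi.chat",  # must match the key's region
        "MINIMAX_MCP_BASE_PATH": "/tmp/minimax-output",   # where generated files are saved
    },
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover the exposed tools (text_to_audio, voice listing, cloning, etc.).
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Illustrative call; check the tool's input schema for the real argument names.
            result = await session.call_tool(
                "text_to_audio",
                {"text": "Hello from MiniMax!", "voice_id": "<voice-id>"},
            )
            print(result)

asyncio.run(main())
```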
The project is built upon the Model Context Protocol (MCP), offering standardized interfaces and flexible deployment options for easy developer integration.
To serve a broader developer community, MiniMax officially provides implementations in two mainstream programming languages: Python (MiniMax-MCP) and JavaScript/TypeScript (MiniMax-MCP-JS).
The server supports two communication transport protocols to suit different deployment scenarios: stdio (standard input/output, the default for local subprocess integration) and SSE (Server-Sent Events, for HTTP-based deployments).
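For an SSE deployment, the same session logic shown above can run over HTTP instead of stdio. A minimal sketch, assuming the server has been started in SSE mode and is reachable at a hypothetical local URL:

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # The URL below is hypothetical; point it at wherever your SSE-mode server listens.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```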
An API key must be obtained from the official MiniMax platform before use. Crucially, the API key must match the region of its corresponding API host, otherwise an "Invalid API key" error will occur:
- Keys from minimax.io (global platform) pair with the API host `https://api.minimaxi.chat` (note the "i" in the domain).
- Keys from minimaxi.com (Mainland China platform) pair with the API host `https://api.minimax.chat`.
The server supports configuration via environment variables (e.g., `MINIMAX_API_KEY`), command-line arguments, configuration files, and similar mechanisms.
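Putting the two points above together, a minimal environment setup might look like the sketch below (the `MINIMAX_API_HOST` variable name is assumed here; check the official README for the exact names):

```python
import os

# Pick the API host that matches where the key was issued, avoiding the
# "Invalid API key" error described above.
HOSTS = {
    "minimax.io": "https://api.minimaxi.chat",   # global platform (note the extra "i")
    "minimaxi.com": "https://api.minimax.chat",  # Mainland China platform
}

key_source = "minimax.io"  # the platform where your API key was created

os.environ["MINIMAX_API_KEY"] = "<your-api-key>"
os.environ["MINIMAX_API_HOST"] = HOSTS[key_source]           # assumed variable name
os.environ["MINIMAX_MCP_BASE_PATH"] = "/tmp/minimax-output"  # local output directory
```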
Following the MCP standard, it seamlessly integrates with various mainstream AI agent clients and development tools, embedding MiniMax capabilities into existing toolchains.
According to the official documentation, supported clients include, but are not limited to, Claude Desktop, Cursor, Windsurf, and OpenAI Agents.
Integration typically involves specifying how to start the MiniMax MCP server in the client configuration (e.g., via the `uvx minimax-mcp` command) along with the necessary environment variables (API key, API host, the local output path `MINIMAX_MCP_BASE_PATH`, etc.).
Dependency tip: the official Python implementation recommends using `uv` (a fast Python package manager) for installation and execution. Ensure `uv` or `uvx` is on your system PATH, or specify its absolute path in the configuration.
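As a quick sanity check, the short sketch below resolves the absolute path of `uvx` (or `uv`) so it can be pasted into a client configuration whose process does not inherit your shell's PATH:

```python
import shutil

# Locate uvx (or fall back to uv) on the current PATH; returns None if not installed.
uvx_path = shutil.which("uvx") or shutil.which("uv")

if uvx_path is None:
    raise SystemExit("uv/uvx not found on PATH; install uv before configuring the client.")

print(f"Use this absolute path as the command in your MCP client config: {uvx_path}")
```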
The powerful capabilities of the MCP server are rooted in MiniMax's self-developed, industry-leading matrix of foundational AI models. These models are core to achieving high-quality multimodal generation:
- Foundational language and vision-language models, such as MiniMax-Text-01 (a large-scale MoE language model) and MiniMax-VL-01 (a vision-language model), provide a solid base for understanding and reasoning.
- Speech models, such as the advanced Speech series (Speech-02 and others), drive high-quality, high-fidelity TTS and realistic voice cloning.
- Visual generation models, such as the Image-01 and Video-01 series (including the Director model, which emphasizes narrative control), support high-quality image generation and cinematic video creation.
The role of the MCP server is to present these powerful proprietary model capabilities to developers through simple, open, standardized MCP protocol interfaces, enabling effective technology output.
Visit the MiniMax MCP Server's GitHub repository, check out the detailed documentation and examples, integrate leading multimodal capabilities into your AI applications, and explore infinite innovation possibilities.