MiniMax-M1 is an open-source large-scale hybrid attention reasoning model based on a Mixture of Experts (MoE) architecture.
Model Architecture
- Mixture of Experts (MoE): MiniMax-M1 combines a Mixture of Experts architecture with a hybrid Lightning Attention mechanism. This design enables higher efficiency and flexibility in handling complex tasks.
- Parameter Count: The model has a total of 456 billion parameters, with 45.9 billion parameters activated per token.
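To make the total-vs-active parameter distinction concrete, here is a minimal single-token MoE routing sketch in NumPy. The dimensions, expert count, and top-k value are illustrative toy numbers, not MiniMax-M1's actual configuration; the point is only that a router selects a few experts per token, so just a fraction of the expert parameters (456B total vs 45.9B active in M1's case) is exercised on each token.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only -- MiniMax-M1's real expert count
# and hidden dimensions are not specified here.
d_model, n_experts, top_k = 16, 8, 2

x = rng.standard_normal(d_model)                    # one token's hidden state
W_gate = rng.standard_normal((n_experts, d_model))  # router weights
experts = rng.standard_normal((n_experts, d_model, d_model))

# Router: score every expert, keep only the top-k for this token.
logits = W_gate @ x
top = np.argsort(logits)[-top_k:]
gate = np.exp(logits[top]) / np.exp(logits[top]).sum()  # renormalised softmax

# Only the selected experts run, so only a fraction of the
# expert parameters is active for this token.
y = sum(g * (experts[i] @ x) for g, i in zip(gate, top))

print(y.shape, top_k / n_experts)  # only 2 of 8 experts computed
```

The output has the same shape as the input hidden state, which is what lets MoE layers slot into a Transformer stack as a drop-in replacement for a dense feed-forward block.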
Context Handling Capability
- Ultra-Long Context Support: MiniMax-M1 natively supports a context length of up to 1 million tokens, eight times that of DeepSeek R1, making it well suited to very long text inputs.
Computational Efficiency
- Efficient Inference: When generating 100,000 tokens of text, MiniMax-M1 requires only about 25% of the floating-point operations (FLOPs) that DeepSeek R1 does, significantly improving inference efficiency.
Training and Optimization
- Reinforcement Learning Training: The model is trained using large-scale reinforcement learning (RL), covering a wide range of complex problems, from traditional mathematical reasoning to real-world software engineering environments.
- CISPO Algorithm: MiniMax-M1 introduces a novel RL algorithm called CISPO, which improves training efficiency by clipping importance-sampling weights rather than token updates.
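The weight-clipping idea above can be sketched with a simplified contrast. In PPO-style objectives, tokens whose importance-sampling ratio leaves the trust region effectively stop contributing a gradient; CISPO instead bounds the ratio itself, so every token keeps a (clipped) weight. The function names, clip bounds, and the zeroing simplification of the PPO baseline below are all illustrative assumptions, not MiniMax-M1's exact formulation.

```python
import numpy as np

def cispo_weights(logp_new, logp_old, eps_high=2.0):
    """Clip the importance-sampling ratio itself (CISPO-style idea):
    every token still contributes a gradient, just with a bounded
    weight. eps_high is a hypothetical clip bound."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio, eps_high)

def ppo_style_mask(logp_new, logp_old, eps=0.25):
    """Simplified contrast: PPO-style clipping can zero the update
    for tokens whose ratio leaves the trust region, dropping their
    gradient signal entirely."""
    ratio = np.exp(logp_new - logp_old)
    return np.where(np.abs(ratio - 1.0) <= eps, ratio, 0.0)

# Per-token log-probs under the old and updated policies;
# tokens 2 and 3 have shifted a lot.
logp_old = np.array([-1.0, -2.0, -0.5])
logp_new = np.array([-0.8, -1.0, -2.0])

print(cispo_weights(logp_new, logp_old))   # all tokens keep a weight
print(ppo_style_mask(logp_new, logp_old))  # out-of-region tokens zeroed
```

Keeping low-probability but pivotal reasoning tokens in the gradient, rather than masking them out, is the intuition behind the reported training-efficiency gains.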
Versions and Applicability
- Multiple Versions: MiniMax-M1 offers 40K and 80K thinking budget versions to suit different application needs.
Open-Source Features
- Openness: As an open-source model, MiniMax-M1 allows developers to customize it according to their needs, promoting technological innovation and knowledge sharing.
Application Scenarios
- Long Text Processing: With support for up to 1 million tokens of context, MiniMax-M1 is ideal for tasks that require handling long inputs, such as document analysis and legal text interpretation.
- Complex Reasoning Tasks: The model excels at mathematical reasoning, logical reasoning, and software engineering, and can handle intricate multi-step problems.
- Tool Use: MiniMax-M1 supports structured function calling, recognizing when an external function is needed and outputting its call parameters, which makes it suitable for scenarios requiring integration with other software or APIs.
- Chatbots and APIs: The model powers chatbots with online search, alongside APIs that support video generation, image creation, and speech synthesis, making it well suited to intelligent assistants and multimedia applications.
- Education and Research: In education, MiniMax-M1 can assist students with the analysis and summarization of complex assignments and offer in-depth research support.
- Creative Writing: The model can offer inspiration and editorial suggestions to writers and creatives, supporting multi-layered analysis during the writing process.
- Data Extraction and Summarization: With accurate information extraction, MiniMax-M1 is well suited to tasks such as meeting-minute and summary generation, quickly producing key insights and overviews.
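The structured function calling described under Tool Use can be illustrated with a small sketch. MiniMax-M1's exact wire format is not given here, so this assumes an OpenAI-compatible tools schema; the model name string, the `get_weather` function, and its parameters are all hypothetical placeholders.

```python
import json

# Hypothetical OpenAI-compatible request: the tool schema tells the
# model which external functions exist and what arguments they take.
request = {
    "model": "MiniMax-M1",
    "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

# A structured tool call as it might appear in the model's response:
# the caller parses the JSON arguments and invokes the real function.
tool_call = {"name": "get_weather", "arguments": json.dumps({"city": "Berlin"})}
args = json.loads(tool_call["arguments"])
print(tool_call["name"], args["city"])  # get_weather Berlin
```

Because the call arguments come back as machine-parseable JSON rather than free text, the integrating application can dispatch to real APIs without fragile string matching.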