MiniCPM is a series of on-device large language models (LLMs) jointly developed by ModelBest and the Tsinghua University Natural Language Processing Lab (THUNLP).
MiniCPM Series Model Versions
- MiniCPM-2B
- Parameters: 2.4 billion (excluding embedding parameters)
- Features: Despite having fewer parameters, it excels in Chinese, math, and coding abilities, outperforming larger models like Llama2-13B in overall performance.
- Use Cases: Suitable for tasks such as text generation, translation, and summarization.
- MiniCPM-V 2.6
- Parameters: 8 billion
- Features: The latest and most powerful model, supporting multi-image dialogue and reasoning. It handles images of any aspect ratio and performs well in OCR (optical character recognition).
- Use Cases: Multimodal understanding, including image and video descriptions, dialogue, and reasoning.
- MiniCPM-2B-128k
- Parameters: 2.4 billion (excluding embedding parameters)
- Features: Supports a 128k context length, achieving the best performance on the InfiniteBench evaluation among models below 7B parameters, although performance degrades on contexts shorter than 4K tokens.
- Use Cases: Long-text processing tasks, such as generating and analyzing lengthy articles.
- MiniCPM-1B-SFT
- Parameters: 1 billion
- Features: A more lightweight version fine-tuned on instructions, designed for text and multimodal reasoning on mobile devices.
- Use Cases: Natural language processing and multimodal tasks on mobile platforms.
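As a concrete illustration of how the text-oriented models above are prompted, the sketch below assembles a MiniCPM-2B-style chat prompt by hand. The `<用户>` (user) and `<AI>` (assistant) role markers follow the examples on the public MiniCPM-2B model card; treat the exact format as an assumption and prefer the tokenizer's built-in chat template in real use, which applies the correct markers automatically.

```python
# Hedged sketch: manually assembling a MiniCPM-2B-style chat prompt.
# The "<用户>"/"<AI>" role markers are taken from the public MiniCPM-2B
# model-card examples (an assumption); the tokenizer's chat template is
# the authoritative source for the real format.
def build_prompt(turns):
    """turns: list of (role, text) pairs, role in {"user", "assistant"}."""
    parts = []
    for role, text in turns:
        marker = "<用户>" if role == "user" else "<AI>"
        parts.append(marker + text)
    # A trailing assistant marker cues the model to generate its reply.
    return "".join(parts) + "<AI>"
```

For example, `build_prompt([("user", "Summarize this article.")])` produces `"<用户>Summarize this article.<AI>"`, which would then be tokenized and passed to the model's generate call.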
Application Scenarios
- Natural Language Processing
- Text Generation: MiniCPM can generate high-quality text content, such as news articles and creative stories.
- Translation: Supports translation between multiple languages with strong accuracy and fluency.
- Summarization: Extracts key information from long texts to generate concise summaries.
- Multimodal Understanding
- Image Recognition: MiniCPM-V 2.6 excels at OCR and can recognize text in complex scenes.
- Video Analysis: Supports understanding and analyzing multiple images and videos, useful for surveillance and content review.
- Image-Text Dialogue: Handles mixed input of images and text for multimodal dialogue and reasoning.
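The mixed image-and-text input described above can be sketched as a message structure in the style shown on the MiniCPM-V model cards: a list of `{"role", "content"}` dicts whose `content` interleaves loaded images and strings. The exact schema is an assumption taken from the public examples; check the model card for the version you use.

```python
# Hedged sketch: building a multi-image dialogue message list in the
# style of the MiniCPM-V 2.6 model-card examples. The schema (a list of
# role/content dicts, content mixing images and text) is an assumption.
def make_multimodal_msgs(images, question, history=None):
    """images: loaded image objects (e.g. PIL.Image); question: str."""
    msgs = list(history) if history else []
    msgs.append({"role": "user", "content": list(images) + [question]})
    return msgs
```

Per the model-card examples, a list like this is then passed to the model's `chat` helper together with the tokenizer to obtain the multimodal response.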
- Mobile Applications
- Smart Assistant: MiniCPM can be deployed on smartphones and tablets to provide smart dialogue, information retrieval, and more.
- Real-time Translation: Enables real-time translation on mobile devices, allowing users to communicate across different languages.
- Education
- Smart Classroom: MiniCPM helps students find study materials and get their questions answered more efficiently, improving learning outcomes.
- Intelligent Tutoring: Provides personalized learning advice and tutoring to help students better understand and master knowledge.
- Business Applications
- Invoice Recognition: MiniCPM can be used in business settings for tasks like invoice recognition and contract review, where OCR is required.
- Customer Service: Enhances customer service efficiency and satisfaction through intelligent dialogue systems.
- Cultural Heritage Preservation
- Ancient Text Recognition: MiniCPM’s strong OCR capabilities enable it to recognize and interpret ancient texts, aiding in cultural heritage preservation and research.
Open-Source Versions
- MiniCPM3-4B
- Parameters: 4 billion
- Features: The third generation of MiniCPM models, with overall performance exceeding Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125. It supports function calling and code interpretation, making it suitable for a wider range of tasks.
- Open-source: Available on GitHub.
- MiniCPM-V 2.6
- Parameters: 8 billion
- Features: The most recent and powerful model, supporting multi-image dialogue and reasoning, excelling in OCR tasks, and achieving top scores on multimodal evaluation benchmarks.
- Open-source: Available on Hugging Face and GitHub.
- MiniCPM-2B
- Parameters: 2.4 billion
- Features: Despite its small size, it excels at Chinese, mathematics, and coding, outperforming larger models such as Llama2-13B in overall performance.
- Open-source: Available on GitHub.
- MiniCPM-1B-SFT
- Parameters: 1 billion
- Features: A lightweight version fine-tuned for instruction-following and multimodal reasoning on mobile devices.
- Open-source: Available on GitHub.
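MiniCPM3-4B's function-calling support (noted above) implies the model is given declarations of callable tools. The sketch below builds an OpenAI-style tool declaration of the kind such interfaces commonly accept; the field layout is an assumption modeled on that common schema, so consult the MiniCPM3-4B documentation for the exact format it expects.

```python
# Hedged sketch: an OpenAI-style function (tool) declaration, as an
# assumed example of the schema a function-calling model consumes.
def make_tool(name, description, properties, required):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

# "get_weather" is a hypothetical tool name used only for illustration.
weather_tool = make_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"city": {"type": "string", "description": "City name"}},
    ["city"],
)
```

A list of such declarations would be supplied alongside the chat messages; the model then emits a structured call (function name plus arguments) instead of free text when a tool is appropriate.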
Closed-Source Versions
- MiniCPM-Llama3-V 2.5
- Parameters: 8 billion
- Features: Surpasses commercial closed-source models such as GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max in multimodal performance. It offers enhanced OCR and instruction-following abilities, supports more than 30 languages, and was the first to deliver GPT-4V-level multimodal capabilities on edge devices.
- Use Cases: Suitable for tasks like image recognition, video analysis, and multilingual translation.
- MiniCPM-V 2.6
- Parameters: 8 billion
- Features: The latest and most powerful model in the MiniCPM series, supporting multi-image dialogue and reasoning. It handles any image aspect ratio and performs exceptionally well in OCR tasks, achieving top scores in several multimodal evaluation benchmarks.
- Use Cases: Multimodal understanding, including image and video description, dialogue, and reasoning.
- MiniCPM-2B-128k
- Parameters: 2.4 billion
- Features: Supports a 128k context length and achieved top performance on InfiniteBench for models below 7B, though performance drops with contexts under 4k.
- Use Cases: Long-text processing tasks, such as generating and analyzing lengthy documents.
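Even a 128k context has a limit, so long-document workflows usually split input that exceeds the window. The sketch below is a minimal, MiniCPM-agnostic character-based chunker with overlap between chunks so that sentences cut at a boundary still appear whole in one chunk; chunk sizes and overlap are illustrative, not values from the MiniCPM documentation.

```python
# Illustrative sketch (not MiniCPM-specific): split a long document into
# overlapping chunks so each piece fits a model's context window.
def chunk_text(text, max_chars=4000, overlap=200):
    assert overlap < max_chars, "overlap must be smaller than chunk size"
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so boundary sentences are not lost.
        start = end - overlap
    return chunks
```

In practice the bound would be expressed in tokens (via the model's tokenizer) rather than characters, and each chunk's summary or answer would be merged in a second pass.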