MiniCPM

Introduction

MiniCPM is a series of edge-side (on-device) large language models (LLMs) jointly developed by ModelBest (面壁智能) and the Tsinghua University Natural Language Processing Lab (THUNLP).

MiniCPM Series Model Versions
  • MiniCPM-2B

    • Parameters: 2.4 billion (excluding embedding parameters)
    • Features: Despite having fewer parameters, it excels at Chinese, mathematics, and coding, surpassing larger models such as Llama2-13B in overall benchmark performance.
    • Use Cases: Suitable for tasks such as text generation, translation, and summarization.
  • MiniCPM-V 2.6

    • Parameters: 8 billion
    • Features: The latest and most powerful model, supporting multi-image dialogue and reasoning. It handles images of any aspect ratio and performs well in OCR (optical character recognition).
    • Use Cases: Multimodal understanding, including image and video descriptions, dialogue, and reasoning.
  • MiniCPM-2B-128k

    • Parameters: 2.4 billion (excluding embedding parameters)
    • Features: Supports a 128k context length, achieving the best performance on the InfiniteBench evaluation for models below 7B parameters, although performance drops for contexts under 4k.
    • Use Cases: Long-text processing tasks, such as generating and analyzing lengthy articles.
  • MiniCPM-1B-SFT

    • Parameters: 1 billion
    • Features: A more lightweight version fine-tuned on instructions, designed for text and multimodal reasoning on mobile devices.
    • Use Cases: Natural language processing and multimodal tasks on mobile platforms.
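The base MiniCPM-2B checkpoints use simple turn markers to delimit user and model messages. As a minimal sketch, assuming the `<用户>…<AI>` template published on the MiniCPM-2B model card (verify against the tokenizer's own chat template before relying on it):

```python
def build_minicpm_prompt(turns):
    """Format alternating (user, assistant) turns into one prompt string.

    MiniCPM-2B marks user turns with `<用户>` and model turns with `<AI>`;
    a trailing `<AI>` tag cues the model to generate the next reply.
    """
    prompt = ""
    for user_msg, ai_msg in turns:
        prompt += f"<用户>{user_msg}<AI>"
        if ai_msg is not None:
            prompt += ai_msg
    return prompt

# Single-turn prompt: the trailing <AI> asks the model to answer.
print(build_minicpm_prompt([("山东省最高的山是哪座山?", None)]))
```

In practice the tokenizer applies this template for you when you call the model's chat helper; the function above only makes the wire format visible.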
Application Scenarios
  1. Natural Language Processing

    • Text Generation: MiniCPM can generate high-quality text content, such as news articles and creative stories.
    • Translation: Supports multilingual translation, improving accuracy and fluency.
    • Summarization: Extracts key information from long texts to generate concise summaries.
  2. Multimodal Understanding

    • Image Recognition: MiniCPM-V 2.6 excels at OCR and can recognize text in complex scenes.
    • Video Analysis: Supports understanding and analyzing multiple images and videos, useful for surveillance and content review.
    • Image-Text Dialogue: Handles mixed input of images and text for multimodal dialogue and reasoning.
  3. Mobile Applications

    • Smart Assistant: MiniCPM can be deployed on smartphones and tablets to provide smart dialogue, information retrieval, and more.
    • Real-time Translation: Enables real-time translation on mobile devices, allowing users to communicate across different languages.
  4. Education

    • Smart Classroom: MiniCPM helps students find study materials and get answers to questions more efficiently, improving the quality of learning.
    • Intelligent Tutoring: Provides personalized learning advice and tutoring to help students better understand and master knowledge.
  5. Business Applications

    • Invoice Recognition: MiniCPM can be used in business settings for tasks like invoice recognition and contract review, where OCR is required.
    • Customer Service: Enhances customer service efficiency and satisfaction through intelligent dialogue systems.
  6. Cultural Heritage Preservation

    • Ancient Text Recognition: MiniCPM’s strong OCR capabilities enable it to recognize and interpret ancient texts, aiding in cultural heritage preservation and research.
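For the OCR-style scenarios above (invoice recognition, ancient-text recognition), the MiniCPM-V checkpoints take a messages list that mixes images and text in a single user turn. A minimal sketch of building such a request, assuming the `msgs` format shown on the MiniCPM-V 2.6 model card (the actual `model.chat(...)` call needs the downloaded checkpoint, so it is left as a comment):

```python
def build_ocr_request(images, question):
    """Build a MiniCPM-V style messages list: images first, then the text query."""
    return [{"role": "user", "content": list(images) + [question]}]

# `images` would normally be PIL.Image objects; a string stands in here.
msgs = build_ocr_request(["<PIL.Image for invoice.png>"],
                         "Extract the invoice number and total amount.")

# With a real checkpoint this would run roughly as:
#   from transformers import AutoModel, AutoTokenizer
#   model = AutoModel.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True)
#   tokenizer = AutoTokenizer.from_pretrained("openbmb/MiniCPM-V-2_6", trust_remote_code=True)
#   answer = model.chat(image=None, msgs=msgs, tokenizer=tokenizer)
print(msgs[0]["role"])  # user
```

Multi-image dialogue works the same way: pass several images in the `content` list before the question.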
Open-Source Versions
  • MiniCPM3-4B
    • Parameters: 4 billion
    • Features: The third generation of MiniCPM models, with overall performance exceeding Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125. It supports function calling and code interpretation, making it suitable for a wider range of tasks.
    • Open-source: Available on GitHub.
  • MiniCPM-V 2.6
    • Parameters: 8 billion
    • Features: The most recent and powerful model, supporting multi-image dialogue and reasoning, excelling in OCR tasks, and achieving top scores on multimodal evaluation benchmarks.
    • Open-source: Available on Hugging Face and GitHub.
  • MiniCPM-2B-128k
    • Parameters: 2.4 billion
    • Features: Supports a 128k context length and achieved top performance on InfiniteBench for models below 7B parameters, though performance degrades on contexts under 4k.
    • Open-source: Available on GitHub.
  • MiniCPM-1B-SFT
    • Parameters: 1 billion
    • Features: A lightweight version fine-tuned for instruction-following and multimodal reasoning on mobile devices.
    • Open-source: Available on GitHub.
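MiniCPM3-4B's function-calling support works the way most tool-using LLM loops do: the application describes available tools, the model emits a structured call, and the application executes it and feeds the result back. A minimal sketch of the dispatch side, with a hypothetical `get_weather` tool (the tool name, its stub result, and the exact call format are illustrative assumptions, not the model's documented schema):

```python
import json

def get_weather(city):
    """Hypothetical tool; a real application would call a weather API here."""
    return {"city": city, "forecast": "sunny"}  # stub result for illustration

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(call_json):
    """Execute a model-emitted tool call of the form
    {"name": ..., "arguments": {...}} and return the JSON result
    string that would be appended to the conversation for the model."""
    call = json.loads(call_json)
    result = TOOLS[call["name"]](**call["arguments"])
    return json.dumps(result, ensure_ascii=False)

print(dispatch_tool_call('{"name": "get_weather", "arguments": {"city": "Beijing"}}'))
```

The model never runs code itself: it only produces the call JSON, and the surrounding application decides whether and how to execute it.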
Closed-Source Versions
  • MiniCPM-Llama3-V 2.5

    • Parameters: 8 billion
    • Features: This model surpasses commercial closed-source models like GPT-4V-1106, Gemini Pro, Claude 3, and Qwen-VL-Max in multimodal performance. It has enhanced OCR and instruction-following abilities, supporting over 30 languages and delivering GPT-4V-level multimodal capabilities on edge devices for the first time.
    • Use Cases: Suitable for tasks like image recognition, video analysis, and multilingual translation.
