Zi Dong Tai Chu is a fully multimodal large model jointly developed by the Institute of Automation, Chinese Academy of Sciences, and the Wuhan Artificial Intelligence Research Institute. The first version, "Zi Dong Tai Chu 1.0," was released in 2021 as the world’s first tri-modal model: it integrates the image, text, and audio modalities and supports unified representation and mutual generation among them. The upgraded "Zi Dong Tai Chu 2.0," released in 2023, further enhances multimodal understanding and generation capabilities.
Model Versions
Zi Dong Tai Chu 1.0
- Release Date: 2021
- Key Features:
The world’s first tri-modal model integrating image, text, and audio modalities. It supports unified representation and mutual generation among these modalities.
Primarily applied to basic multimodal tasks such as image recognition, text generation, and speech recognition.
Zi Dong Tai Chu 2.0
- Release Date: June 2023
- Key Features:
Building on the 1.0 version, it adds video, sensor signals, and 3D point cloud data, further enhancing multimodal understanding and generation capabilities.
Achieves breakthroughs in key technologies such as cognition-enhanced multimodal association, giving it comprehensive abilities in understanding, generating, and associating multimodal data.
Application scenarios have expanded to fields such as healthcare, legal consultation, traffic management, smart manufacturing, and smart cities.
Zi Dong Tai Chu 3.0
- Expected Release: First half of 2024
- Expected Features:
Strengthens the model’s ability to serve various industries by letting it autonomously select and use tools, meeting the need for deeper logical interaction.
In the smart driving field, by leveraging large language models and multimodal capabilities, it significantly shortens and optimizes the training process, enhancing the efficiency of intelligent vehicles’ perception of the world.
Further optimizes the cognitive integration of speech, video, and text, as well as functions such as commonsense reasoning.
Pricing Models
Charge by Number of Calls or Data Volume
The API usage of Zi Dong Tai Chu is generally charged based on the number of API calls or the data volume processed. This pricing model is flexible and easy to understand, allowing users to pay according to their actual usage.
- Number of Calls: Each API call is counted towards the total, and users are charged based on the number of calls.
- Data Volume: Charges are based on the amount of data processed, with larger data volumes incurring higher fees.
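The two charging bases above can be sketched in Python. This is a minimal illustration only: the rate values, the megabyte unit, and the names `UsageRates`, `charge_by_calls`, and `charge_by_volume` are hypothetical and are not taken from any official price list.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageRates:
    """Hypothetical rates; real prices must come from the official provider."""
    price_per_call: Decimal
    price_per_mb: Decimal


def charge_by_calls(num_calls: int, rates: UsageRates) -> Decimal:
    """Fee when billing is based on the number of API calls."""
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return num_calls * rates.price_per_call


def charge_by_volume(bytes_processed: int, rates: UsageRates) -> Decimal:
    """Fee when billing is based on the amount of data processed."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    megabytes = Decimal(bytes_processed) / Decimal(1024 * 1024)
    return megabytes * rates.price_per_mb


# Placeholder numbers chosen only to make the example run.
example_rates = UsageRates(price_per_call=Decimal("0.01"),
                           price_per_mb=Decimal("0.05"))
calls_fee = charge_by_calls(1_000, example_rates)            # 1,000 calls
volume_fee = charge_by_volume(10 * 1024 * 1024, example_rates)  # 10 MB
```

`Decimal` is used instead of floating point so that small per-unit prices add up without rounding drift. Which of the two bases applies depends on the provider's actual terms.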
Token-Based Billing
Zi Dong Tai Chu uses the token as its unit of charge. A token is the basic unit of text processed in natural language processing. Approximate conversions are:
- Chinese Text: 1 token corresponds to around 1-2 Chinese characters.
- English Text: 1 token corresponds to around 3-4 letters.
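The character-to-token ratios above allow a rough token estimate before a request is sent. The following Python sketch is an approximation under stated assumptions, not the model's actual tokenizer: the function name `estimate_token_range` is hypothetical, only the basic CJK Unified Ideographs block is treated as Chinese, and digits, punctuation, and whitespace are ignored.

```python
from typing import Tuple


def _ceil_div(numerator: int, denominator: int) -> int:
    """Integer division rounded up, for non-negative inputs."""
    return -(-numerator // denominator)


def _is_chinese_char(ch: str) -> bool:
    # Simplification: basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def estimate_token_range(text: str) -> Tuple[int, int]:
    """Return a (low, high) token estimate from the stated ratios.

    Chinese: 1 token per 1-2 characters. English: 1 token per 3-4 letters.
    """
    chinese = sum(1 for ch in text if _is_chinese_char(ch))
    letters = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    # Fewest tokens: 2 characters or 4 letters per token.
    low = _ceil_div(chinese, 2) + _ceil_div(letters, 4)
    # Most tokens: 1 character or 3 letters per token.
    high = chinese + _ceil_div(letters, 3)
    return low, high
```

For instance, `estimate_token_range("hello")` returns `(2, 2)`, and a four-character Chinese string returns `(2, 4)`. Actual billed token counts depend on the provider's tokenizer and may fall outside this range.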
Subscription Packages
The official provider may offer several subscription packages so that users can choose one that fits their needs, for example:
- Basic Package: Suitable for small-scale use with lower costs.
- Advanced Package: Suitable for large-scale use, offering more calls and data processing at a higher cost.
Free Tier
The official provider may offer a free tier, allowing users to try the service within a certain limit before requiring payment for any usage beyond that.
Customized Services
For enterprises or developers with special needs, customized services may be provided, typically with a separate fee structure.
Application Scenarios
Healthcare
- Neurosurgery Navigation: During neurosurgery, it can integrate visual, tactile, and other multimodal information in real time to support doctors’ decisions during the operation, improving surgical accuracy and safety.
- Multimodal Differential Diagnosis: By analyzing multimodal medical data (such as images, text, and signals), it helps doctors reach more accurate diagnoses in complex cases.
Legal Consultation
- Legal Consultation Services: Through the analysis of multimodal data, it enhances the accuracy and efficiency of legal consultation, handling complex legal issues and providing high-quality legal advice.
Traffic
- Traffic Violation Image Analysis: Combines image and audio data to analyze traffic scenes, identify traffic violations, and improve the efficiency and accuracy of traffic management.
- Intelligent Driving: In the field of intelligent driving, its large language model and multimodal capabilities significantly shorten and optimize the training process, enhancing the efficiency of intelligent vehicles’ perception.
Smart Manufacturing
- Production Process Optimization: By intelligently analyzing and optimizing production processes, it improves production efficiency and product quality, applicable to all stages of smart manufacturing.
Smart Cities
- City Management: Supports city management, traffic scheduling, and public safety by analyzing and processing multimodal data, enhancing the intelligence of urban management.
Smart Tourism
- Cultural Tourism: In the smart tourism field, it combines and analyzes multimodal data to provide personalized travel recommendations and intelligent navigation services, enhancing the tourist experience.
Smart Education
- Educational Assistance: In the smart education field, it analyzes and processes multimodal data to provide personalized learning suggestions and intelligent tutoring, improving education quality and learning outcomes.
Creativity and Entertainment
- Music Understanding and Generation: Capable of understanding and generating music content, applicable to music creation and audio editing.
- Image Generation and Editing: Generates images based on text descriptions, supporting creative design and advertising production.
- Video Generation and Editing: Capable of generating and editing video content, applicable to short video production and visual effects in film and television.
Open-Source Strategy
Zi Dong Tai Chu is offered primarily in open-source versions. The open-source strategy is intended to promote technological innovation, reduce usage costs, and accelerate the development of large model technology through community collaboration.