Meta has released its latest open-weight artificial intelligence models, Llama 4, in two main versions: Scout and Maverick. Both are natively multimodal, handling text and image inputs, and use a Mixture of Experts (MoE) architecture that activates only a fraction of their parameters per token, keeping inference efficient despite large total parameter counts, as sketched below.
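To illustrate the idea behind MoE, here is a toy expert-routing layer in PyTorch. Everything in it (hidden size, expert count, top-2 routing) is an illustrative assumption, not Llama 4's actual configuration: a learned router sends each token to only a few experts, so active compute stays far below the total parameter count.

```python
# Minimal MoE sketch under assumed sizes; not Llama 4's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is what makes sparse MoE inference cheaper than dense layers.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(MoELayer()(tokens).shape)  # torch.Size([16, 512])
```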
Qwen2.5-Omni is an end-to-end multimodal AI model released by Alibaba, designed for comprehensive perception: it accepts text, image, audio, and video inputs and can respond with both text and natural speech in a streaming fashion.
Gemini 2.5 Pro is an AI model launched by Google, hailed as its "most intelligent model" yet. It is designed to handle complex tasks, excelling in reasoning capabilities, coding performance, and multimodal input processing.
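For a concrete picture of that multimodal input processing, the sketch below sends a mixed text-and-image prompt through the google-generativeai Python SDK. The model id string and the local file name are assumptions for illustration, not details confirmed by the announcement.

```python
# Hedged sketch: text + image prompt to Gemini 2.5 Pro via google-generativeai.
# The model id "gemini-2.5-pro" and the image file name are assumptions.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumes a Gemini API key is available

model = genai.GenerativeModel("gemini-2.5-pro")  # model id is an assumption
image = Image.open("circuit_diagram.png")        # hypothetical local image

# generate_content accepts a list that mixes text strings and PIL images.
response = model.generate_content(
    ["Walk through the reasoning needed to analyze this diagram.", image]
)
print(response.text)
```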
Qwen2.5-VL-32B is a multimodal vision-language model released by Alibaba, featuring 32 billion parameters. It excels in tasks such as image understanding, mathematical reasoning, and text generation.
Reka Flash 3 is a newly released multimodal language model with 21 billion parameters, designed for efficient reasoning and generation.
Mistral Small 3.1 is an open-source multimodal AI model released by the French startup Mistral AI. It features 24 billion parameters and supports both text and image processing.
ERNIE 4.5 is Baidu’s first natively multimodal large language model, capable of processing and integrating text, images, audio, and video.
Aya Vision is a set of advanced vision-language models released by Cohere For AI, designed to address multilingual performance challenges in multimodal AI systems.