Meta has released its latest open-weight artificial intelligence models, Llama 4, in two main versions: Scout and Maverick. Both are natively multimodal, handling text and image inputs, and use a Mixture of Experts (MoE) architecture that activates only a fraction of their parameters per token, keeping inference efficient despite large total parameter counts, as sketched below.
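To illustrate the idea behind MoE, here is a toy expert-routing layer in PyTorch. Everything in it (hidden size, expert count, top-2 routing) is an illustrative assumption, not Llama 4's actual configuration: a learned router sends each token to only a few experts, so active compute stays far below the total parameter count.

```python
# Minimal MoE sketch under assumed sizes; not Llama 4's real configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                         # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token; the rest stay idle,
        # which is what makes sparse MoE inference cheaper than dense layers.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

tokens = torch.randn(16, 512)
print(MoELayer()(tokens).shape)  # torch.Size([16, 512])
```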
Qwen2.5-Omni is an end-to-end multimodal AI model released by Alibaba, designed for comprehensive perception: it accepts text, image, audio, and video inputs and can respond with both text and natural speech in a streaming fashion.
Gemini 2.5 Pro is an AI model launched by Google, hailed as its "most intelligent model" yet. It is designed to handle complex tasks, excelling in reasoning capabilities, coding performance, and multimodal input processing.
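For a concrete picture of that multimodal input processing, the sketch below sends a mixed text-and-image prompt through the google-generativeai Python SDK. The model id string and the local file name are assumptions for illustration, not details confirmed by the announcement.

```python
# Hedged sketch: text + image prompt to Gemini 2.5 Pro via google-generativeai.
# The model id "gemini-2.5-pro" and the image file name are assumptions.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # assumes a Gemini API key is available

model = genai.GenerativeModel("gemini-2.5-pro")  # model id is an assumption
image = Image.open("circuit_diagram.png")        # hypothetical local image

# generate_content accepts a list that mixes text strings and PIL images.
response = model.generate_content(
    ["Walk through the reasoning needed to analyze this diagram.", image]
)
print(response.text)
```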
Qwen2.5-VL-32B is a multimodal vision-language model released by Alibaba, featuring 32 billion parameters. It excels in tasks such as image understanding, mathematical reasoning, and text generation.
Reka Flash 3 is a newly released multimodal language model with 21 billion parameters, designed for efficient reasoning and generation.
Mistral Small 3.1 is an open-source multimodal AI model released by the French startup Mistral AI. It features 24 billion parameters and supports both text and image processing.
ERNIE 4.5 is Baidu’s first natively multimodal large language model, capable of processing and integrating text, images, audio, and video.
Aya Vision is a set of advanced vision-language models released by Cohere For AI, designed to address multilingual performance challenges in multimodal AI systems.