Venice’s New Model Paradigm

Venice is renewing and simplifying our model selection with a streamlined, curated list of LLMs to reduce model sprawl and redundancy, and to highlight which models are best for which tasks.

There are three significant changes in this new model offering. First, we're co-releasing our first model developed in collaboration with the Dolphin team: Dolphin Mistral 24B Venice Edition, Venice’s most uncensored model ever. Moving forward, this model will be our default and is provided as “Venice Uncensored” in the new model list.

Read the full blog on the new Dolphin Mistral 24B Venice Edition model here

Second, Venice is adding the newly released Llama 4 Maverick mixture-of-experts model, a faster and vision-enabled upgrade with the largest context size of all our models at 256,000 tokens. We’ve also added a jailbreak prompt in an effort to mitigate the censorship restrictions inherent in Maverick; in our internal tests this produced a 3x decrease in refusals. This model is provided as “Venice Large” in the new model list.

Finally, we're adopting a simpler way for users to select the right model for their needs and introducing five distinct model categories.

Five model categories: Uncensored, Reasoning, Small, Medium, Large

Each model has a clear purpose: larger models are more intelligent, smaller models are faster. We will always include a link to the source model’s Hugging Face page for curious users who want technical details, but the goal is to make it much easier for users who are new to AI to understand which model is best for their use case. All models have been given web search capability, with the option to toggle it on or off.

Venice Uncensored (Dolphin Mistral 24B Venice Edition) - New Default

  • Best for: Creative writing, role-playing, philosophical discussions

  • Advantages: Minimal artificial content restrictions, authentic responses

  • Use when: You want maximum creative freedom and authentic interaction

Venice Reasoning (Qwen QwQ 32B)

  • Best for: Problem solving, exploring multiple angles of a question

  • Advantages: Generates a dedicated thinking response before composing its final answer, improving overall response quality

  • Use when: You have a complex, multilayered query and want to give the model extra time to produce a higher-quality answer than other models may be capable of

Venice Small (Llama 3.2 3B)

  • Best for: Quick responses, simple queries, and basic tasks

  • Advantages: Extremely fast, lightweight, ideal for mobile use

  • Use when: You need a quick answer or basic assistance

Venice Medium (Mistral Small 3.1 24B)

  • Best for: Balanced performance, more versatile tasks

  • Advantages: Vision-enabled, with a good balance of speed and capability

  • Use when: You need reliable responses for moderate complexity tasks

Venice Large (Llama 4 Maverick, 17B active parameters, 128-expert MoE)

  • Best for: Complex reasoning, detailed explanations, technical analysis

  • Advantages: Enhanced intelligence, 256K token context window, multimodal capabilities

  • Use when: You're working on complex tasks requiring deep understanding

Retiring legacy models

To keep our model selection streamlined and curated, we're retiring a number of models that are currently underused relative to the cost of providing them.

Most notably, we're retiring DeepSeek R1 in favor of the newly released Llama 4 Maverick, the faster, vision-enabled upgrade described above, now offered as “Venice Large.”

The full list of models to be retired from our chat interface is:

  • Llama 3.3 70B

  • Llama 3.1 405B

  • Qwen 2.5 VL 70B

  • DeepSeek R1 671B

  • Dolphin 72B

These models will be retired fully from our app platform on May 15, with Llama 3.1 405B retiring on May 30.

Why are we retiring DeepSeek from the Venice app?

There were two issues leading us to end DeepSeek support in the web app:

  1. Speed considerations: DeepSeek's greatest strength is also its greatest weakness. Its process of using thinking tokens gives it high intelligence but makes it an extremely slow model, adding friction to user chats, both for the user awaiting the response and for other users awaiting access to the model. Relative to the speed and quality of the new Llama 4 Maverick, continued support no longer makes sense.

  2. Usage patterns: The massive public interest since DeepSeek R1's launch has trailed off. Only around 5% of Venice chats currently use DeepSeek, even though it accounts for two-thirds of total Venice GPU/inference costs.

DeepSeek will continue to be available in the Venice API for the time being.

Try Venice's new models now

By streamlining our curated list of LLMs into five distinct categories – Venice Uncensored, Venice Reasoning, Venice Small, Venice Medium, and Venice Large – we're making it easier for users to select the best model for their specific needs.

The introduction of the Dolphin Mistral 24B Venice Edition and the Llama 4 Maverick model brings enhanced capabilities and performance, while the retirement of older models eliminates redundancy and enables greater scalability of Venice infrastructure, leading to higher throughput rates for users.

As Venice continues to evolve and improve, this consolidation will enhance the overall user experience, providing faster, more efficient, and more effective interactions with our curated open-source AI models.

For developers and power users, all current models remain available through our API. Learn more about our API capabilities at docs.venice.ai.
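For developers who want to try the new lineup programmatically, here is a minimal sketch of building a chat completion request against the Venice API. The endpoint path, the `venice-uncensored` model slug, and the `venice_parameters` web-search toggle are assumptions based on the public docs; check docs.venice.ai for the current names before relying on them.

```python
import json
import urllib.request

# Assumed endpoint; verify against docs.venice.ai.
VENICE_API_URL = "https://api.venice.ai/api/v1/chat/completions"

def build_chat_request(model: str, prompt: str, web_search: bool = False) -> dict:
    """Build an OpenAI-style chat completion payload for the Venice API."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if web_search:
        # Hypothetical field mirroring the in-app web search toggle;
        # confirm the exact parameter name in the API docs.
        payload["venice_parameters"] = {"enable_web_search": "on"}
    return payload

def send_chat_request(payload: dict, api_key: str) -> dict:
    """POST the payload to the Venice API; requires a valid API key."""
    req = urllib.request.Request(
        VENICE_API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Build a request against the new default model (slug is an assumption).
payload = build_chat_request(
    "venice-uncensored", "Summarize the new model lineup.", web_search=True
)
```

Because the API follows the familiar OpenAI chat-completions shape, existing client code should need little more than a model-name and base-URL change.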

Unlike other AI platforms, Venice never stores your conversations or monitors your interactions. Your AI experience remains completely private and uncensored.
