GPT-4V has new rivals: LLaVA 1.5 & Fuyu-8B

PLUS: Text to Gif, AI app builder by Google, Robots that can self-reward to do complex tasks

Hi folks! 👋🏻 This is The Prompt!

Here's what we have today:

  • 2 new multi-modal models

  • no-code visual AI app builder by Google

  • NVIDIA's robots can self-reward and do tasks at human speed

  • New text-to-gif model

Let's get it

FEATURED

OpenAI's GPT-4V has new competitors

OpenAI may release developer access for their 'multimodal' GPT-4V model at their upcoming DevDay event in a few weeks.

ChatGPT+ users already have access in the chat, and the first impressions have been very interesting ā€” and divided. People are impressed by certain abilities, but there are also some concerning problems.

However, this week we got two new multimodal models. Both are open-sourced and aren't licensed for commercial use (yet?).

Here's a quick breakdown of their specs/uses 👇🏻

LLaVA 1.5

LLaVA 1.5 was released by a team of researchers and, like GPT-4V, it can answer questions about images.

What's interesting about this model is that it's easy to run on consumer-level hardware (a GPU with less than 8GB of VRAM).

First impressions:

  • can easily locate an object in a photo;

  • can explain memes;

  • can't reliably recognize text.

Fuyu-8B by Adept

Fuyu-8B is an open-source multimodal model by Adept. It understands "knowledge worker" data such as charts, graphs and screens, enabling it to manipulate, and reason over, this data.

First impressions:

  • can locate very specific elements on a screen;

  • can extract details from software's UI;

  • can answer questions about charts/diagrams;

  • no moderation mechanisms or prompt injection guardrails.

🚨 What else is going on

  • Google is working on a secret project named Stubb, a no-code visual builder for AI prototypes that will potentially include multi-modal support with Gemini.

  • DeepMind released a paper proposing a framework for evaluating the societal and ethical risks of AI systems.

  • NVIDIA has unveiled Eureka, an AI agent built on GPT-4 that autonomously generates rewards, which can then be used to acquire complex skills via reinforcement learning, like the "pen spinning" skill, at human speed! 🤯

📕 Resources

  • [interesting] AI models explained with simple animations

  • [tutorial] How to build your own AI-generated image with ControlNet and Stable Diffusion

  • [API] Turn text to gif with this latest model that works alongside Stable Diffusion

  • [opportunity] The Rundown is hiring an "AI tool tester"

  • [online event] Beyond the hype: Preparing for AI in 2024 (speakers from OpenAI, Meta)

go-pro video of a polar bear diving in the ocean, 8k, HD, dslr, nature footage

🧰 Tools

  • LogoTheme: Turn your logo into seasonal logo mockups

  • DoCue: Draft sales proposals with AI

  • Arch: AI Email Copilot

  • Questgen: Generate quizzes with AI

  • Helpix: Solve customer questions on auto-pilot

✍🏼 Prompt of the Day

TOOL

DALL·E 3

PROMPT

a flat panda head origami logo, white background
