GPT-4V has new rivals: LLaVA 1.5 & Fuyu-8B
PLUS: Text to Gif, AI app builder by Google, Robots that can self-reward to do complex tasks
Hi folks! This is The Prompt!
Here's what we have today:
2 new multi-modal models
no-code visual AI app builder by Google
NVIDIA's robots can self-reward and do tasks at human speed
New text-to-gif model
Let's get it
FEATURED
OpenAI's GPT-4V has new competitors
OpenAI may release developer access for their "multimodal" GPT-4V model at their upcoming DevDay event in a few weeks.
ChatGPT+ users already have access in the chat, and the first impressions have been very interesting, and divided. People are impressed by certain abilities, but there are also some concerning problems.
However, this week, we got two new multimodal models; both of them are open-sourced & aren't licensed for commercial use (yet?).
Here's a quick breakdown of their specs and uses:
LLaVA 1.5
LLaVA 1.5 was released by a team of researchers and, like GPT-4V, can answer questions about images.
What's interesting about this model is that it's easy to get it running on consumer-level hardware (GPU with less than 8GB of VRAM).
First impressions:
it can easily locate an object in a photo;
can explain memes;
can't reliably recognize text.
Fuyu-8B by Adept
Fuyu-8B is an open-source multimodal model by Adept. It understands "knowledge worker" data such as charts, graphs and screens, enabling it to manipulate, and reason over, this data.
First impressions:
can locate very specific elements on a screen;
can extract details from software's UI;
can answer questions about charts/diagrams;
no moderation mechanisms or prompt injection guardrails.
What else is going on
Google is working on a secret project named Stubb, a no-code visual builder for AI prototypes that will potentially include multi-modal support with Gemini.
DeepMind released a paper proposing a framework for evaluating the societal and ethical risks of AI systems.
NVIDIA has unveiled Eureka, an AI agent built on GPT-4 that autonomously generates reward functions, which can then be used to acquire complex skills via reinforcement learning, like the "pen spinning" skill below, at human speed!
Resources
[interesting] AI models explained with simple animations
[tutorial] How to build your own AI-generated image with ControlNet and Stable Diffusion
[API] Turn text to gif with this latest model that works alongside Stable Diffusion
[opportunity] The Rundown is hiring an "AI tool tester"
[online event] Beyond the hype: Preparing for AI in 2024 (speakers from OpenAI, Meta)
go-pro video of a polar bear diving in the ocean, 8k, HD, dslr, nature footage
Tools
Prompt of the Day
TOOL
DALL-E 3
PROMPT
a flat panda head origami logo, white background
RESULT