AudioPaLM: AI can Speak and Listen 🔥

PLUS: SDXL 0.9 - most advanced text-to-image AI

FEATURED

AudioPaLM: A Model That Can Speak and Listen 🔥

The AudioPaLM model is designed to understand and create both text and speech.

It's a combination of two previous models by Google:

  • PaLM-2, great at understanding and creating written text.

  • AudioLM, a model specialized in speech generation.

This model can convert speech to text, text to speech, and even speech from one language to another.

Plus, AudioPaLM doesn't just deal with the words people say but also how they say them. This includes aspects like who's speaking (speaker identity) and how the voice sounds (intonation).

Multi-modality improves results

The creators found that they could make this model even better by using the lessons learned from text-based models.

There's a lot more text data out there than speech data, so this helped the model understand speech even better.

And the results are really impressive 👇🏻

AudioPaLM outperforms *every other model* in translating speech across languages, even ones it wasn't trained on. It can also mimic a speaker's voice in other languages from a short audio sample.

NEW TECH

Stability AI launched the most advanced open-source text-to-image model: SDXL 0.9 🔥

SDXL 0.9 is Stability AI's most advanced text-to-image model to date.

It's a large model: the base model has 3.5 billion parameters, and the full two-model pipeline has 6.6 billion.

The model uses two stages - the first stage creates an initial image, and the second one refines it, adding more precise details.

You can try it here. API is coming soon.

WHAT ELSE IS GOING ON

🦙 Midjourney just pushed a new version (v5.2). With this version, you can use the “zoom-out” feature. The community has created some great stuff, see link 1, link 2.

👀 100K+ ChatGPT accounts compromised and sold on the dark web. As reported by Group-IB, the largest share of the compromised credentials (40,000+) traces back to the Asia-Pacific region, though other regions were affected as well. Make sure you protect your account with 2FA.

🏋🏻‍♀️ The Last AI Boom Didn't Kill Jobs. It created jobs. Economists looked at the job market across a number of European countries, and neither high-skilled nor less-skilled workers seemed to be significantly affected by software or AI.

RESOURCES

The best resources we came across lately that will help you become better at writing prompts & building AI apps.

📚 OpenAI developer forum [ forum to ask questions ]

👋🏻 Image comparison between Midjourney v5 and v5.2 [ Twitter thread ]

🎥 Google Search's guide on AI-generated content [ useful resource ]

TOOLBOX

The latest AI tools to use or get inspiration from.

PROMPT OF THE DAY

TOOL

Midjourney Zoom feature

PROMPT

Young man with short beard, photograph, soft focus background --ar 2:3 

Custom Zoom + prompt:   On a tropical beach --ar 3:2
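
If you reuse the same description with different framings, a tiny helper can keep the `--ar` (aspect ratio, width:height) suffix consistent. This is only a sketch for assembling prompt text to paste into Midjourney: the function name `with_aspect_ratio` is made up for illustration and is not part of any Midjourney tooling.

```python
def with_aspect_ratio(description: str, width: int, height: int) -> str:
    """Append an --ar parameter (width:height) to a prompt description.

    Hypothetical helper: it only builds the text of the prompt,
    it does not talk to Midjourney in any way.
    """
    if width <= 0 or height <= 0:
        raise ValueError("aspect ratio parts must be positive integers")
    return f"{description.strip()} --ar {width}:{height}"


# The two prompts used in this issue: a portrait, then a landscape zoom-out.
portrait = with_aspect_ratio(
    "Young man with short beard, photograph, soft focus background", 2, 3
)
landscape = with_aspect_ratio("On a tropical beach", 3, 2)

print(portrait)
print(landscape)
```

Switching from 2:3 to 3:2 in the Custom Zoom step is what turns the portrait into a wide landscape shot.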

RESULT

RESULT WITH ZOOM
