
How to write a great LLM prompt

and use less-powerful models for simpler tasks

Hey everyone,

This week, I want to discuss when to use less-powerful models for simpler tasks, and why different models need model-specific instructions to produce better outputs.

I keep seeing four patterns among teams running LLMs in production right now:

  • GPT-4 can do many things very well, but it gets expensive in production.

  • Builders switch to GPT-3.5 Turbo to cut costs, but struggle to match the results.

  • They then test models from other providers, like Google and Anthropic, or an open-source model, but still don't get the same results.

  • And lastly, they start wondering whether to fine-tune their own model.

Let’s walk through this process and the strategies at each step.

How to lower costs with GPT-4

If you have a more complex LLM workflow, and you’re using GPT-4 for every step of that workflow, the costs can start to rise.

To reduce the cost, ask yourself this question:

Which part of my workflow can I do with a secondary model?

For example, GPT-3.5 will do great at summarization, but it won’t do that well at capturing intent from a customer message. You’ll definitely need GPT-4 for that.

Depending on the use case, you might still end up using GPT-3.5 for more complex tasks. To make that work, you need to write prompts that align with how the model was trained.
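One way to apply this split is a simple task-to-model routing table: cheap, well-defined tasks go to the smaller model, and everything else falls back to the strongest one. A minimal sketch; the model names and task taxonomy here are illustrative assumptions, not a prescribed setup:

```python
# Hypothetical model router: send cheap, well-defined tasks to a
# smaller model and reserve the strongest model for hard ones.
ROUTES = {
    "summarize": "gpt-3.5-turbo",       # smaller model handles summaries well
    "classify": "gpt-3.5-turbo",
    "extract_intent": "gpt-4",          # nuanced intent needs the stronger model
    "multi_step_reasoning": "gpt-4",
}

def pick_model(task: str, default: str = "gpt-4") -> str:
    """Return the cheapest model known to handle `task` reliably,
    falling back to the default (strongest) model for unknown tasks."""
    return ROUTES.get(task, default)
```

Defaulting unknown tasks to the strongest model keeps quality safe while you expand the table over time.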

In the next section we’ll share some best practices that can improve your output.

How to get better results with GPT-3.5

Below are some tips to make GPT-3.5 work more like GPT-4:

  1. Use “Do” instead of “Don’t”

  2. Separate instructions from context with ### or """

  3. Be direct: Use “You must”, or “Your task is”…

  4. Assign a role

  5. Instead of just writing “conversational style”, add a sentence written in that style and ask the model to replicate it

  6. Add info about the end-user

  7. Provide the format structure of the output

  8. Give examples (use Chain of thought prompting for more complex reasoning tasks)

  9. Use emotion prompts like “This is very important for my career”

  10. If your prompt is complex, split it into more prompts and chain them together (it will be easier for GPT-3.5 to follow multiple but simpler prompts)
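Several of the tips above can be combined into one prompt template. Here's a minimal Python sketch (the wording and the support-agent scenario are illustrative, not a tested production prompt):

```python
# Builds a GPT-3.5 prompt applying several tips above: a role,
# direct "Your task is" phrasing, "Do" instructions, end-user info,
# an output format, and ### separators between instructions and context.
def build_prompt(context: str, user_profile: str) -> str:
    return (
        "You are a senior customer-support agent.\n"         # tip 4: role
        "Your task is to summarize the message below in "    # tip 3: be direct
        "three bullet points. Write in a friendly tone.\n"   # tip 1: "Do"
        f"The reader is: {user_profile}\n"                   # tip 6: end user
        "Format: a markdown list with exactly 3 bullets.\n"  # tip 7: format
        "###\n"                                              # tip 2: separator
        f"{context}\n"
        "###"
    )

prompt = build_prompt("Order #123 arrived late and the box was damaged.",
                      "a non-technical customer")
```

Keeping the template as a function also makes tip 10 easy: each step of a chained workflow gets its own small, focused template.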

Please refer to my original post for additional information and examples.

How to prompt Claude?

It’s really interesting to me that we default to prompting Claude the same way we prompt OpenAI’s models. In reality, Claude was trained with a different set of techniques and methodologies, and it needs prompts designed around that training.

Here’s a quick rundown on how you should prompt Claude:

  1. Use XML tags like <instructions></instructions> to separate instructions from content

  2. Use “Do” instead of “Don’t”, and be direct

  3. Start the Assistant output with the first token of the expected response

  4. Assign a role, and pass it in the Assistant output as well

  5. Ask Claude to reason through the problem before providing the answer

  6. Provide Examples (few-shot, chain of thought prompting)

  7. If you’re dealing with longer documents, always ask your question at the end of the prompt

  8. Break complex prompts into multiple prompts

If this seems a bit confusing, read my prompting guide for Claude, which includes detailed examples and instructions.
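The Claude tips above can be sketched as a request payload: XML tags separate the instructions from the document, the question comes after the long document, and the Assistant turn is prefilled with the first token of the expected response. This builds plain data and makes no API call; the exact message schema and wording are assumptions for illustration:

```python
# Claude-style messages applying the tips above: XML tags (tip 1),
# question placed after the document (tip 7), reasoning before the
# answer (tip 5), and a prefilled assistant turn (tip 3).
def build_claude_messages(document: str, question: str) -> list[dict]:
    user_content = (
        "<instructions>You are a careful research assistant. "
        "Answer using only the document. Think through the problem "
        "inside <thinking> tags before answering.</instructions>\n"
        f"<document>{document}</document>\n"
        f"<question>{question}</question>"   # question goes last
    )
    return [
        {"role": "user", "content": user_content},
        # Prefill the assistant turn so the reply starts where we want.
        {"role": "assistant", "content": "<thinking>"},
    ]

messages = build_claude_messages("Q4 revenue grew 12%.",
                                 "How much did revenue grow?")
```

The prefilled `<thinking>` tag nudges Claude to reason first, and the XML structure makes the answer easy to parse out afterward.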

I’m currently working on a prompting guide for Gemini and Mixtral, and will share it soon!

Should you fine-tune your own model?

If you’ve spent enough time on prompting techniques and on grounding the model in your own data with a RAG system, but you’re still struggling with difficult cases and frequent hallucinations, it may be time to think about fine-tuning.

Also consider fine-tuning when costs climb and latency grows; a smaller fine-tuned model can be faster and cheaper to deploy.

In all other cases, you should probably invest more in prompt engineering and RAG building. If you’d like me to write more about that, just reply and let me know!

Are you building with LLMs? Facing some difficulties? Want to learn about a new tool/concept?

Reply to this email and let me know! I read every reply!

Interesting nuggets for this week:

  • Improving GPTStore: PM’s take

  • The shift from models to compound AI systems

  • Sora: The Text-to-Video model by OpenAI

  • Gemini 1.5 Pro: Google’s newest model, with a context window of up to 1M tokens

  • Groq: Runs Mistral at around 500 tokens per second (roughly 18x faster than typical model inference speeds)
