Experimenting and Putting Prompt Engineering Tactics into Practice

Liv Wong
Technical Writer
November 27, 2024
Reading Time: 7 mins

This is Part 2 of the Prompt Engineering series, which will put prompt engineering tactics into practice, including model choice and LLM parameters like temperature. Check out Part 1 on the importance of creating effective prompts.


Prompt engineering is the practice of enhancing LLM responses by writing optimized prompts and tuning parameters like temperature or topP. There are many strategies and tactics for creating effective prompts, but how do they translate into real-life situations? Which tactics are the most effective? How should you combine prompt writing and parameter tuning? Let’s explore some real examples of prompt engineering put into practice.

Engineering better LLM responses

OpenAI documentation provides six main strategies for prompt engineering:

  1. Write clear instructions
  2. Provide reference text
  3. Split complex tasks into simpler subtasks
  4. Give the model time to "think"
  5. Use external tools
  6. Test changes systematically

These strategies point to the overarching principle of prompt engineering: providing clear, specific, well-structured instructions. There are many ways to go about this, with numerous guides available online (including our previous Guide to Prompt Engineering, where we introduced the basics of creating clear and effective prompts). Here, we want to dig deeper into how model choice, prompt writing tactics, and parameter tuning interact to produce the ideal response.

In the following sections, we will explore each facet in turn and finally put it all together, so you have a concrete idea of what to focus on when implementing prompt engineering in your AI projects.

Model choice

Without any prompt engineering, your choice of LLM determines the baseline quality of the output. To illustrate, here is the response from Cohere’s command model compared to Mistral’s mistral-small model, when both were asked to summarize an article about NASA’s Europa Clipper mission:

Summarize ${text}
// Assume that ${text} is a dynamic variable that is injected with the relevant content at runtime
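
As a reference point, here is a minimal sketch of how the same prompt could be sent to both providers over their public REST chat APIs. The endpoint paths, request fields, and response fields below are based on each provider's documented chat API and may change, so double-check them against the current docs:

import os
import requests

article_text = "..."  # placeholder for the article content injected at runtime
prompt = f"Summarize {article_text}"

# Cohere's v1 chat endpoint takes a single "message" string and returns "text".
cohere_resp = requests.post(
    "https://api.cohere.com/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['COHERE_API_KEY']}"},
    json={"model": "command", "message": prompt},
)
print("Cohere:", cohere_resp.json().get("text"))

# Mistral exposes an OpenAI-style chat completions endpoint.
mistral_resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-latest",
        "messages": [{"role": "user", "content": prompt}],
    },
)
print("Mistral:", mistral_resp.json()["choices"][0]["message"]["content"])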

While both models provide a relatively accurate summary of the article, Mistral offers a well-structured response out of the box without additional prompting.

Diagram of Cohere’s command model’s output vs Mistral’s mistral-small-latest’s output, in response to the user query “Summarize text”.
LLMs differ in their default, un-engineered responses.

Depending on your use case and system requirements, Mistral’s default response could be more elaborate than you need or exactly the output you are after. Nevertheless, using prompt engineering tactics, you can get command to respond just like mistral-small.

Tactics for effective prompts

To provide clear, specific, and well-structured instructions, here are some of the tactics employed in the prompts shown below, followed by a short code sketch of how to combine them:

  • Break up the instructions for clarity.

    Instead of combining the article and the instructions in a single sentence, the prompt is now divided into distinct sections: one for the content and another for the instructions on what to do with it.

  • Demarcate your prompt with special characters or tags where appropriate.

    The article content is now enclosed in quotation marks to explicitly mark the text that should be summarized.

  • Include details about the desired output. Quantify where possible.

    In one variation, the prompt now asks for the response to be “clear and concise” and even specifies the desired format of using bullet points. In another, it explicitly states the desired length of the output.
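
To make these tactics concrete, here is a minimal Python sketch of how such a prompt could be assembled programmatically (the article variable, the wording, and the limits are illustrative rather than prescriptive):

# Assume article_text holds the article content fetched at runtime.
article_text = "..."

prompt = "\n\n".join([
    # Content section, demarcated with triple quotes so the model knows exactly what to summarize.
    f'Article text:\n"""{article_text}"""',
    # Instruction section, kept separate from the content for clarity.
    "Summarize the article above in a clear and concise manner.",
    # Details about the desired output, quantified where possible.
    "Format the summary as 3-5 bullet points of at most 20 words each.",
])

Keeping the content and the instructions in clearly delimited sections also makes it less likely that the model will treat part of the article as an instruction.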

Diagram showing three different LLM prompts, with relevant phrases highlighted to demonstrate the prompt tactics.
There are numerous ways to implement prompt engineering tactics depending on what output you need.

Check out how the responses have improved in each of these prompts.


Prompt: Here is the text from an article: "${text}". Provide a summary of this text in a 
clear and concise manner.

Response: NASA has launched the Europa Clipper mission, a $5.2 billion robotic spacecraft, 
to explore Jupiter's moon, Europa, which is believed to have a global ocean of liquid water 
beneath its icy crust and could potentially support extraterrestrial life. The mission aims to 
determine if Europa has the necessary ingredients for life, including water, organic compounds, 
and an energy source. The spacecraft, the largest and most expensive planetary science 
mission in NASA's history, will conduct 49 close encounters with Europa, studying its ocean, 
atmosphere, and surface features. Scientists are particularly interested in the possibility of 
hydrothermal vents on Europa's ocean floor, which could support simple life forms. The 
mission also aims to find potential landing sites for future lander missions. Europa Clipper's 
findings will shape future exploration of Europa and the search for life beyond Earth.

Temperature

In some cases, it is also useful to tune the temperature of the responses. This LLM parameter affects how creative or deterministic the output will be — the higher the temperature, the more random and creative the output will be.
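
Temperature is typically passed as a request parameter alongside the prompt. As a rough sketch, extending the Cohere request shown earlier (the temperature field is part of Cohere's chat API; the values to try are up to you):

import os
import requests

prompt = 'Here is the text from an article: "...". Provide a summary of this text in a clear and concise manner.'

response = requests.post(
    "https://api.cohere.com/v1/chat",
    headers={"Authorization": f"Bearer {os.environ['COHERE_API_KEY']}"},
    json={
        "model": "command",
        "message": prompt,
        "temperature": 0.1,  # try 0.1, 0.4, and 1.0 to compare the outputs discussed below
    },
)
print(response.json().get("text"))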

While temperature is associated with creative tasks like generating a poem or a story, it can also apply to factual writing, like summaries. Take a look at this response below, where the temperature is set to 0.1. The text output reuses similar words or phrasing, and this repetitiveness makes the summary rather stale and unengaging.

Diagram of Cohere’s response to the prompt “Here is the text from an article: "text". Provide a summary of this text in a clear and concise manner.” at 0.1 temperature.
A low temperature favors more frequent words or phrases, which can make the output repetitive.

In contrast, a temperature of 1 leads to more dynamic and lively text, but tends to exaggerate or include less important information. This makes the summarizer prone to producing misleading snapshots of the article.

Diagram of Cohere’s response to the prompt “Here is the text from an article: "text". Provide a summary of this text in a clear and concise manner.” at 1 temperature.
A high temperature produces rarer words or phrases, which can lead to a focus on less important information.

A more moderate temperature of 0.4 could be the sweet spot for an article summarizer, providing relevant information with livelier language.

Diagram of Cohere’s response to the prompt “Here is the text from an article: "text". Provide a summary of this text in a clear and concise manner.” at 0.4 temperature.
Tune your temperature to a more moderate value to avoid repetitiveness and steer clear of overly-exaggerated information at the same time.

Evaluating your LLM responses

We’ve seen individually how model choice, prompt writing tactics, and LLM parameters can affect your desired output. But how do these factors stack up to produce optimized results at scale? While prompt writing tactics give you the most mileage in obtaining a particular result, LLM parameter tuning and model choice can be the difference between an average response and a stellar one.

Using the following prompt with different models and temperature values:

Here is the text from an article: "${text}". Summarize the text clearly in 280 characters using this format "Article summary: <your_summary_here>".

We get the following responses:

At a small sample size, it is easy to pick which answer is more desirable. But in a production setting, it is best practice to systematically test changes on a larger sample size so that you can distinguish true improvement from randomness.
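
A simple approach is to generate responses across a grid of models and temperatures for a larger sample of articles and save them for review. Here is a minimal sketch using Mistral's chat completions API (a single provider is shown for brevity, and the sample articles are placeholders for your own test set):

import csv
import os
import requests

PROMPT_TEMPLATE = (
    'Here is the text from an article: "{text}". Summarize the text clearly in 280 '
    'characters using this format "Article summary: <your_summary_here>".'
)

def generate(model: str, temperature: float, prompt: str) -> str:
    # Swap in other providers' calls here to compare across vendors.
    resp = requests.post(
        "https://api.mistral.ai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
        json={
            "model": model,
            "temperature": temperature,
            "messages": [{"role": "user", "content": prompt}],
        },
    )
    return resp.json()["choices"][0]["message"]["content"]

# Placeholder sample set; in practice, use a larger and more varied collection.
articles = ["<article 1 text>", "<article 2 text>"]

with open("summaries.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["model", "temperature", "article", "summary"])
    for model in ["mistral-small-latest", "mistral-large-latest"]:
        for temperature in [0.1, 0.4, 1.0]:
            for i, text in enumerate(articles):
                summary = generate(model, temperature, PROMPT_TEMPLATE.format(text=text))
                writer.writerow([model, temperature, i, summary])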

One way to systematically track changes would be to evaluate the LLM responses against a benchmark — using relevant criteria or gold-standard answers. In this case, we want summaries that are concise yet comprehensive — answering the 5W1H (what, who, where, when, why, and how).

These evaluations can be manually done by humans or AI-assisted. Whichever route you choose, ensure that you can evaluate a large sample set with sufficient variety to cover all the edge cases in your use scenarios.
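
As a rough illustration of an automated check, here is a minimal sketch that scores each generated summary against simple proxies for the criteria above: the 280-character limit, the required format, and keyword coverage of the 5W1H. The keyword list is a hypothetical placeholder; in practice you would use per-article gold-standard answers, human reviewers, or an LLM-as-judge for anything beyond these surface checks.

import csv

# Hypothetical 5W1H keywords for the Europa Clipper article; in a real benchmark,
# derive these (or full gold-standard summaries) for every article in the test set.
EXPECTED_KEYWORDS = ["NASA", "Europa Clipper", "Jupiter", "ocean", "life", "mission"]

def score_summary(summary: str) -> dict:
    return {
        "within_280_chars": len(summary) <= 280,
        "follows_format": summary.startswith("Article summary:"),
        # Crude coverage proxy: fraction of expected keywords mentioned.
        "coverage": sum(k.lower() in summary.lower() for k in EXPECTED_KEYWORDS)
        / len(EXPECTED_KEYWORDS),
    }

# Score the summaries produced by the sweep above.
with open("summaries.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["model"], row["temperature"], score_summary(row["summary"]))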

Optimizing LLM responses for AI app development

In practice, an application that leverages generative AI is likely far more complex than a simple one-shot prompt. Multiple prompts can be used across different tasks in a process, or AI techniques like retrieval-augmented generation (RAG), computer vision, and other data pre-processing tasks may come into play.

An AI orchestration engine like Orkes Conductor can streamline such developmental efforts with its built-in AI tasks and feature-rich AI prompt builder, where you can test various prompt engineering strategies. With one-step integrations for a dozen LLM providers (including OpenAI, HuggingFace, and more), Orkes’ AI prompt builder unlocks the convenience of plugging, testing, and playing with prompts across models.

Testing Interface for prompts in Orkes Conductor.
Orkes Conductor AI prompt builder, as part of its AI Orchestration capabilities.

Besides testing, these prompts can be templatized, saved, and safely used in a production setting for AI-driven workflows and applications, such as document classification, RAG-based search retrieval, approval journeys, and more.

Wrap up

In this post, we have put various prompt engineering tactics into practice and explored effective ways to combine model choice, prompt writing, and parameter tuning, as well as how to systematically test these changes.

Beyond what we have explored here, there are many other ways to improve model outputs, such as model fine-tuning or retrieval-augmented generation (RAG). A strategic combination of these different techniques and methods will unleash the full potential of generative AI in automating creative, complex, or human-involved tasks.


Using Orkes Conductor, you can rapidly build, optimize, and test AI-enabled applications in a distributed environment. Conductor is an open-source orchestration platform used widely in many mission-critical applications for orchestrating microservices, AI services, event handling, and more. Try out a fully-managed and hosted Conductor service with the 14-day free trial for Orkes Cloud.
