
Real-life examples of fine-tuning success

In this section, we’ll explore a real-life example of a fine-tuning approach that OpenAI implemented, which yielded remarkable outcomes.

InstructGPT

OpenAI's InstructGPT is one of the most successful examples of a fine-tuned model and laid the foundation for ChatGPT. ChatGPT is said to be a sibling model to InstructGPT, and the methods used to fine-tune ChatGPT are similar to those used for InstructGPT. InstructGPT was created by fine-tuning pre-trained GPT-3 models with RLHF. Supervised fine-tuning is the first step in RLHF for generating responses aligned with human preferences.

GPT-3 models weren't originally designed to follow user instructions; their training focused on predicting the next word based on vast amounts of internet text data. Therefore, these models underwent fine-tuning using instructional datasets along with RLHF to enhance their ability to generate more useful and relevant responses, aligned with human values, when prompted with user instructions:

Figure 3.20 – The fine-tuning process with RLHF

This figure is a schematic representation of the InstructGPT fine-tuning process: (1) initial supervised fine-tuning, (2) training the reward model, and (3) executing RL through PPO using this established reward model. Blue arrows indicate where this data is used to train the respective models. In step 2, boxes A–D are samples from models that get ranked by labelers.

The following figure compares the response quality of models fine-tuned with RLHF, supervised fine-tuned models, and general GPT models. The Y-axis is a Likert scale showing quality ratings of model outputs on a 1–7 scale, for various model sizes (X-axis), on prompts submitted to InstructGPT models via the OpenAI API. The results reveal that InstructGPT outputs are scored significantly higher by labelers than outputs from GPT-3 models (with and without few-shot prompts), as well as models fine-tuned with supervised learning. The labelers hired for this work were independent contractors sourced from Scale AI and Upwork:

Figure 3.21 – Evaluation of InstructGPT (image credits: Open AI)

InstructGPT can be assessed across the dimensions of toxicity, truthfulness, and appropriateness. Higher scores are desirable for TruthfulQA and appropriateness, whereas lower scores are preferred for toxicity and hallucinations. Hallucinations and appropriateness are measured based on the distribution of prompts submitted to the OpenAI API. The outcomes are aggregated across various model sizes:

Figure 3.22 – Evaluation of InstructGPT

In this section, we introduced the concept of fine-tuning and discussed a success story of fine-tuning with RLHF that led to the development of InstructGPT.

Summary

Fine-tuning is a powerful technique for customizing models, but it may not always be necessary. As observed, it can be time-consuming and may involve significant upfront costs. It's advisable to start with easier and faster strategies, such as prompt engineering with few-shot examples, followed by data grounding using RAG. Only if the responses from the LLM remain suboptimal should you consider fine-tuning. We will discuss RAG and prompt engineering in the following chapters.

In this chapter, we delved into critical fine-tuning strategies tailored for specific tasks. Then, we explored an array of evaluation methods and benchmarks to assess your refined model. The RLHF process ensures your models align with human values, making them helpful, honest, and safe. In the upcoming chapter, we’ll tackle RAG methods paired with vector databases – an essential technique to ground your enterprise data and minimize hallucinations in LLM-driven applications.


Semantic Kernel

Semantic Kernel (SK) is a lightweight, open-source software development kit (SDK) and a modern AI application development framework. It enables software developers to orchestrate AI services, build agents, write code that interacts with those agents, and work with generative AI tooling and concepts, such as natural language processing (NLP), which we covered in Chapter 2.

“Kernel” is at the core of everything!

Semantic Kernel revolves around the concept of a “kernel,” which is pivotal and is equipped with the necessary services and plugins to execute both native code and AI services, making it a central element for nearly all SDK components.

Every prompt or piece of code executed within Semantic Kernel passes through this kernel, granting developers a unified platform for configuring and monitoring their AI applications.

For instance, when a prompt is invoked through the kernel, it undertakes the process of selecting the optimal AI service, constructing the prompt based on a prompt template, dispatching the prompt to the service, and processing the response before delivering it back to the application. Additionally, the kernel allows for the integration of events and middleware at various stages, facilitating tasks such as logging, user updates, and the implementation of responsible AI practices, all from a single, centralized location called “kernel.”
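To make this concrete, here is a minimal sketch of creating a kernel, registering an AI service, and invoking a prompt with the Python semantic-kernel package. Class and method names vary between SDK versions, so treat this as an illustrative outline rather than a definitive implementation; the model name and API key handling are placeholders:

import asyncio
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

# Create the kernel, the central object that routes every prompt and function call
kernel = Kernel()

# Register an AI service (chat completion) with the kernel
kernel.add_service(OpenAIChatCompletion(ai_model_id="gpt-3.5-turbo", api_key="your-api-key"))

# Invoke a prompt through the kernel; the kernel selects the service, renders the
# prompt, calls the model, and returns the response to the application
async def main():
    result = await kernel.invoke_prompt(prompt="Summarize the benefits of AI orchestration in one sentence.")
    print(result)

asyncio.run(main())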

Moreover, SK allows developers to define the syntax and semantics of natural language expressions and use them as variables, functions, or data structures in their code. SK also provides tools for parsing, analyzing, and generating natural language from code and, vice versa, generating code from natural language.

You can build sophisticated and complex agents without having to be an AI expert by using the Semantic Kernel SDK! The fundamental building blocks for building agents in Semantic Kernel are plugins, planners, and personas.

Fundamental components

Let’s dive into each one of them and understand what each one means.

  • Plugins enhance your agent’s functionality by allowing you to incorporate additional code. This enables the integration of new functions into plugins, utilizing native programming languages such as C# or Python. Additionally, plugins can facilitate interaction with LLMs through prompts or connect to external services via REST API calls. As an example, consider a plugin for a virtual assistant for a calendar application that allows it to schedule appointments, remind you of upcoming events, or cancel meetings. If you have used ChatGPT, you may be familiar with the concept of plugins, as they are integrated into it (namely, “Code Interpreter” or “Bing Search Plugin”).
  • Planners: To effectively utilize a plugin and integrate it with subsequent actions, the system must first design a plan, and this is where planners help. Planners are sophisticated instructions that enable an agent to formulate a strategy for accomplishing a given task, often encapsulated in a simple prompt that guides the agent through function calling to achieve the objective. As an example, take the development of a MeetingEventPlanner. This planner would guide the agent through the detailed process of organizing a meeting, including steps such as reviewing attendees' calendar availability, sending out confirmation emails, drafting an agenda, and, finally, scheduling the meeting. Each step is carefully outlined to ensure the agent comprehensively addresses all the actions necessary for successful meeting preparation.
  • Personas: Personas are sets of instructions that shape the behavior of agents by imbuing them with distinct personalities. Often referred to as “meta prompts,” these guidelines endow agents with characters that can range from friendly and professional to humorous, and so forth. Additionally, they direct agents on the type of response to generate, which can vary from verbose to concise. We have explored meta prompts in great detail in Chapter 5; this concept is closely related.


Recommendation System for Articles

Let’s consider a scenario where a news aggregation platform aims to recommend articles similar to what a user is currently reading, enhancing user engagement by suggesting relevant content.

How It Works:

  • Preprocessing and Indexing: Articles in the platform's database are processed to extract textual features, which are converted into high-dimensional vectors using LDA or transformer-based embeddings such as text-embedding-ada-002. These vectors are then indexed using Hierarchical Navigable Small World (HNSW), an algorithm suitable for high-dimensional spaces because its hierarchical structure facilitates efficient navigation and search.
  • Retrieval Time: When a user reads an article, the system generates a feature vector for this article and queries the HNSW index to find vectors (and thus articles) that are close in the high-dimensional space. Cosine similarity can be used to evaluate the similarity between the query article's vector and those in the index, identifying articles with similar content (a minimal similarity-search sketch follows this list).
  • Outcome: The system recommends a list of articles ranked by their relevance to the currently viewed article. Thanks to efficient indexing and similarity search, these recommendations are generated quickly, even from a vast database of articles, providing the user with a seamless experience.
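The following is a minimal sketch of this flow using the hnswlib library, with randomly generated vectors standing in for real article embeddings; the dimensionality, index parameters, and article IDs are illustrative assumptions rather than values from the platform described above:

import numpy as np
import hnswlib

dim = 1536                      # e.g., the size of text-embedding-ada-002 vectors
num_articles = 10_000

# Stand-in for precomputed article embeddings
article_vectors = np.random.rand(num_articles, dim).astype(np.float32)

# Build an HNSW index using cosine distance
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_articles, ef_construction=200, M=16)
index.add_items(article_vectors, ids=np.arange(num_articles))
index.set_ef(50)                # trade-off between recall and query speed

# At retrieval time: embed the article being read (here, a random stand-in)
query_vector = np.random.rand(dim).astype(np.float32)

# Find the 5 most similar articles; distances are cosine distances (1 - similarity)
labels, distances = index.knn_query(query_vector, k=5)
print(labels[0], 1 - distances[0])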

Now, let us walk through a scenario where Manhattan distance is preferred over cosine similarity.

Ride-Sharing App Matchmaking

Let’s consider a scenario where a ride-sharing application needs to match passengers with nearby drivers efficiently. The system must quickly find the closest available drivers to a passenger’s location to minimize wait times and optimize routes.

How It Works:

  • Preprocessing and Indexing: Drivers' current locations are constantly updated and stored as points in a 2D space representing a map. These points can be indexed using tree-based spatial indexing techniques or data structures optimized for geospatial data, such as R-trees.
  • Retrieval Time: When a passenger requests a ride, the application uses the passenger’s current location as a query point. Manhattan distance (L1 norm) is particularly suitable for urban environments, where movement is constrained by a grid-like structure of streets and avenues, mimicking the actual paths a car would take along city blocks.
  • Outcome: The system quickly identifies the nearest available drivers using the indexed data and Manhattan distance calculations, considering the urban grid's constraints. This ensures a swift matchmaking process, improving the user experience by reducing wait times (a minimal distance-calculation sketch follows this list).
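Here is a minimal sketch of the distance calculation itself, using NumPy and made-up driver coordinates; a production system would first pull candidate drivers from a spatial index such as an R-tree before ranking them:

import numpy as np

# Hypothetical driver locations as (x, y) grid coordinates on a city map
driver_locations = np.array([
    [3.0, 4.0],
    [1.0, 1.0],
    [6.0, 2.0],
])
passenger_location = np.array([2.0, 3.0])

# Manhattan (L1) distance: sum of absolute differences along each axis
manhattan_distances = np.abs(driver_locations - passenger_location).sum(axis=1)

# Rank drivers by proximity and pick the closest one
closest_driver = int(np.argmin(manhattan_distances))
print(manhattan_distances, closest_driver)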


Vector DB sample scenario – Music recommendation system using a vector database

Let’s consider a music streaming platform aiming to provide song recommendations based on a user’s current listening. Imagine a user who is listening to “Song X” on the platform.

Behind the scenes, every song in the platform's library is represented as a high-dimensional vector based on its musical features and content, using embeddings. “Song X” also has its vector representation. When the system aims to recommend songs similar to “Song X,” it doesn't look for exact matches (as traditional databases might). Instead, it leverages a vector DB to search for songs with vectors closely resembling that of “Song X.” Using an approximate nearest neighbor (ANN) search strategy, the system quickly sifts through millions of song vectors to find those that are approximately nearest to the vector of “Song X.” Once potential song vectors are identified, the system employs similarity measures, such as cosine similarity, to rank these songs based on how close their vectors are to “Song X's” vector. The top-ranked songs are then recommended to the user.

Within milliseconds, the user gets a list of songs that musically resemble “Song X,” providing a seamless and personalized listening experience. All this rapid, similarity-based recommendation magic is powered by the vector database’s specialized capabilities.

Common vector DB applications

  • Image and video similarity search: In the context of image and video similarity search, a vector DB specializes in efficiently storing and querying high-dimensional embeddings derived from multimedia content. By processing images through deep learning models, they are converted into feature vectors, also known as embeddings, that capture their essential characteristics. For videos, an additional step may be needed to extract frames before converting them into vector embeddings. Contrastive Language-Image Pre-training (CLIP) from OpenAI is a very popular choice for embedding videos and images (a brief CLIP embedding sketch appears at the end of this section). These vector embeddings are indexed in the vector DB, allowing for rapid and precise retrieval when a user submits a query. This mechanism powers applications such as reverse image and video search, content recommendations, and duplicate detection by comparing and ranking content based on the proximity of their embeddings.
  • Voice recognition: Voice recognition with vectors is akin to video vectorization. Analog audio is digitized into short frames, each representing an audio segment. These frames are processed and stored as feature vectors, with the entire audio sequence representing things such as spoken sentences or songs. For user authentication, a vectorized spoken key phrase might be compared to stored recordings. In conversational agents, these vector sequences can be inputted into neural networks to recognize and classify spoken words in speech and generate responses, similar to ChatGPT.
  • Long-term memory for chatbots: Vector database management systems (VDBMs) can be employed to enhance the long-term memory capabilities of chatbots or generative models. Many generative models can only process a limited amount of preceding text in prompt responses, which results in their inability to recall details from prolonged conversations. As these models don't have inherent memory of past interactions and can't differentiate between factual data and user-specific details, VDBMs provide a solution for storing, indexing, and referencing previous interactions to improve consistency and context-awareness in responses.

This is a very important use case and plays a key role in implementing RAG, which we will discuss in the next section.
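Before moving on, here is a brief sketch of the CLIP-based embedding step mentioned in the image and video similarity bullet above, using the Hugging Face transformers library to turn an image and a text query into comparable vectors; the checkpoint name and the image path are assumptions made for the example:

from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

# Load a publicly available CLIP checkpoint (an assumed choice for this sketch)
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("album_cover.jpg")            # hypothetical image file
inputs = processor(text=["a concert photo"], images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    image_embedding = model.get_image_features(pixel_values=inputs["pixel_values"])
    text_embedding = model.get_text_features(input_ids=inputs["input_ids"],
                                             attention_mask=inputs["attention_mask"])

# These embeddings can now be indexed in a vector DB and compared with cosine similarity
print(image_embedding.shape, text_embedding.shape)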


Chunking strategies

In our last discussion, we delved into vector DBs and RAG. Before diving into RAG, we need to efficiently house our embedded data. While we touched upon indexing methods to speed up data fetching, there’s another crucial step to take even before that: chunking.

What is chunking?

In the context of building LLM applications with embedding models, chunking involves dividing a long piece of text into smaller, manageable pieces, or “chunks,” that fit within the model's token limit. The process involves breaking text into smaller segments before sending them to the embedding models. As shown in the following image, chunking happens before the embedding process. Different documents have different structures, such as free-flowing text, code, or HTML, so different chunking strategies can be applied to attain optimal results. Tools such as LangChain provide functionality to chunk your data efficiently based on the nature of the text.

The diagram below depicts a data processing workflow, highlighting the chunking step, starting with raw “Data sources” that are converted into “Documents.” Central to this workflow is the “Chunk” stage, where a “TextSplitter” breaks the data into smaller segments. These chunks are then transformed into numerical representations using an “Embedding model” and are subsequently indexed into a “Vector DB” for efficient search and retrieval. The text associated with the retrieved chunks is then sent as context to the LLMs, which then generate a final response:

Fig 4.12 – Chunking Process

But why is it needed?

Chunking is vital for two main reasons:

  • Chunking strategically divides document text to enhance its comprehension by embedding models, and it boosts the relevance of the content retrieved from a vector DB. Essentially, it refines the accuracy and context of the results sourced from the database.
  • It tackles the token constraints of embedding models. For instance, Azure OpenAI embedding models such as text-embedding-ada-002 can handle up to 8,191 tokens, which is about 6,000 words, given that a token averages around four characters, or roughly ¾ of a word. So, for optimal embeddings, it's crucial that our text stays within this limit.

Popular chunking strategies

  • Fixed-size chunking: This is a very common approach that defines a fixed chunk size (for example, 200 words), which is enough to capture the semantic meaning of a paragraph, and incorporates an overlap of about 10–15% between chunks as an input to the vector embedding generation model. Chunking data with a slight overlap between chunks ensures context preservation; it's advisable to begin with roughly 10% overlap. Below is a snippet of code that demonstrates the use of fixed-size chunking with LangChain:

from langchain.text_splitter import TokenTextSplitter

text = ("Ladies and Gentlemen, esteemed colleagues, and honored guests. "
        "Esteemed leaders and distinguished members of the community. "
        "Esteemed judges and advisors. My fellow citizens. Last year, "
        "unprecedented challenges divided us. This year, we stand united, "
        "ready to move forward together")

text_splitter = TokenTextSplitter(chunk_size=20, chunk_overlap=5)

texts = text_splitter.split_text(text)

print(texts)

The output is the following:

['Ladies and Gentlemen, esteemed colleagues, and honored guests. Esteemed leaders and distinguished members', 'emed leaders and distinguished members of the community. Esteemed judges and advisors. My fellow citizens.', '. My fellow citizens. Last year, unprecedented challenges divided us. This year, we stand united,', ', we stand united, ready to move forward together']

  • Variable-size chunking: Variable-size chunking refers to the dynamic segmentation of data or text into varying-sized components, as opposed to fixed-size divisions. This approach accommodates the diverse structures and characteristics present in different types of data.
  • Sentence splitting: Sentence transformer models are neural architectures optimized for embedding at the sentence level. For example, BERT works best when chunked at the sentence level. Tools such as NLTK and SpaCy provide functions to split the sentences within a text.
  • Specialized chunking: Documents, such as research papers, possess a structured organization of sections, and the Markdown language, with its unique syntax, necessitates specialized chunking, resulting in the proper separation between sections/pages to yield contextually relevant chunks.
  • Code Chunking: When embedding code in your vector DB, this technique can be invaluable. LangChain supports code chunking for numerous languages. Below is a code snippet that chunks Python code:

from langchain.text_splitter import (
    RecursiveCharacterTextSplitter,
    Language,
)

PYTHON_CODE = """
class SimpleCalculator:
    def add(self, a, b):
        return a + b

    def subtract(self, a, b):
        return a - b

# Using the SimpleCalculator
calculator = SimpleCalculator()
sum_result = calculator.add(5, 3)
diff_result = calculator.subtract(5, 3)
"""

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON, chunk_size=50, chunk_overlap=0
)

python_docs = python_splitter.create_documents([PYTHON_CODE])
python_docs

The output is the following:

[Document(page_content='class SimpleCalculator:\n    def add(self, a, b):'),
 Document(page_content='return a + b'),
 Document(page_content='def subtract(self, a, b):'),
 Document(page_content='return a - b'),
 Document(page_content='# Using the SimpleCalculator'),
 Document(page_content='calculator = SimpleCalculator()'),
 Document(page_content='sum_result = calculator.add(5, 3)'),
 Document(page_content='diff_result = calculator.subtract(5, 3)')]

Chunking considerations

Chunking strategies vary based on the data type and format and the chosen embedding model. For instance, code requires a distinct chunking approach compared to unstructured text. While models such as text-embedding-ada-002 excel with 256- and 512-token-sized chunks, our understanding of chunking is ever-evolving. Moreover, preprocessing plays a crucial role before chunking; you can optimize your content by removing unnecessary text, such as stop words and special symbols, that adds noise. For the latest techniques, we suggest regularly checking the text splitters section in the LangChain documentation, ensuring you employ the best strategy for your needs (Split by tokens, from LangChain: https://python.langchain.com/docs/modules/data_connection/document_transformers/split_by_token).


Evaluation of RAG using Azure Prompt Flow

Up to this point, we have discussed the development of resilient RAG applications. However, the question arises: How can we determine whether these applications are functioning as anticipated and if the context they retrieve is pertinent? While manual validation—comparing the responses generated by LLMs against ground truth—is possible, this method proves to be labor-intensive, costly, and challenging to execute on a large scale. Consequently, it’s essential to explore methodologies that facilitate automated evaluation on a vast scale. Recent research has delved into the concept of utilizing “LLM as a judge” to assess output, a strategy that Azure Prompt Flow incorporates within its offerings.

Azure Prompt Flow has built-in, structured metaprompt templates with comprehensive guardrails to evaluate your output against ground truth. The following are four metrics that can help you evaluate your RAG solution in Prompt Flow:

  • Groundedness: Measures the alignment of the model's answers with the input source, making sure the model's generated response is not fabricated. The model must always extract information from the provided “context” while responding to the user's query.
  • Relevance: Measures the degree to which the model’s generated response is closely connected to the context and user query.
  • Retrieval score: Measures the extent to which the model’s retrieved documents are pertinent and directly related to the given questions.
  • Custom metrics: While the above three are the most important for evaluating RAG applications, Prompt Flow allows you to use custom metrics, too. Bring your own LLM as a judge and define your own metrics by modifying the existing metaprompts; this also allows you to use open source models such as Llama and to build your own metrics from code with Python functions. The above evaluations are more no-code or low-code friendly; for a more pro-code-friendly approach, the azureml-metrics SDK, with metrics such as ROUGE, BLEU, F1-score, precision, and accuracy, can be utilized as well (a minimal LLM-as-a-judge sketch follows this list).
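To illustrate the LLM-as-a-judge idea behind these metrics, the following is a minimal, hand-rolled groundedness check using the same pre-1.0 openai package as the earlier examples; the metaprompt wording and the 1–5 scale are illustrative assumptions, not Prompt Flow's built-in templates:

import openai

openai.api_key = "your-api-key"

def groundedness_score(context: str, question: str, answer: str) -> str:
    # Ask a judge model to rate how well the answer is supported by the context
    judge_prompt = (
        "You are an impartial evaluator. Rate how well the ANSWER is supported "
        "by the CONTEXT on a scale of 1 (not grounded) to 5 (fully grounded). "
        "Reply with only the number.\n\n"
        f"CONTEXT:\n{context}\n\nQUESTION:\n{question}\n\nANSWER:\n{answer}"
    )
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": judge_prompt}],
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]

print(groundedness_score("The Eiffel Tower is in Paris.", "Where is the Eiffel Tower?", "It is in Paris."))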

The field is advancing quickly, so we recommend regularly checking Azure ML Prompt Flow’s latest updates on evaluation metrics. Start with the “Manual Evaluation” feature in Prompt Flow to gain a basic understanding of LLM performance. It’s important to use a mix of metrics for a thorough evaluation that captures both semantic and syntactic essence rather than relying on just one metric to compare the responses with the actual ground truth.


The essentials of prompt engineering

Before discussing prompt engineering, it is important to first understand the foundational components of a prompt. In this section, we’ll delve into the key components of a prompt, such as ChatGPT prompts, completions, and tokens. Additionally, grasping what tokens are is pivotal to understanding the model’s constraints and managing costs.

ChatGPT prompts and completions

A prompt is an input provided to LLMs, whereas completions refer to the output of LLMs. The structure and content of a prompt can vary based on the type of LLM (e.g., the text or image generation model), specific use cases, and the desired output of the language model.

Completions refer to the response generated by ChatGPT prompts; basically, it is an answer to your questions. Check out the following example to understand the difference between prompts and completions when we prompt ChatGPT with, “What is the capital of India?”

Figure 5.2 – An image showing a sample LLM prompt and completion

Based on the use case, we can leverage one of the two ChatGPT API calls, named Completions or ChatCompletions, to interact with the model. However, OpenAI recommends using the ChatCompletions API in the majority of scenarios.

Completions API

The Completions API is designed to generate creative, free-form text. You provide a prompt, and the API generates text that continues from it. This is often used for tasks where you want the model to answer a question or generate creative text, such as for writing an article or a poem.
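For instance, a minimal Completions API call with the pre-1.0 openai Python package might look like the following; the model name and prompt are placeholders for illustration:

import openai

openai.api_key = "your-api-key"

response = openai.Completion.create(
    model="gpt-3.5-turbo-instruct",
    prompt="Write a two-line poem about the monsoon.",
    max_tokens=50,
)

print(response["choices"][0]["text"])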

ChatCompletions API

The ChatCompletions API is designed for multi-turn conversations. You send a series of messages instead of a single prompt, and the model generates a message as a response. The messages sent to the model include a role (which can be a system, user, or assistant) and the content of the message. The system role is used to set the behavior of the assistant, the user role is used to instruct the assistant, and the model’s responses are under the assistant role.

The following is an example of a sample ChatCompletions API call:

import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful sports assistant."},
        {"role": "user", "content": "Who won the cricket world cup in 2011?"},
        {"role": "assistant", "content": "India won the cricket world cup in 2011"},
        {"role": "user", "content": "Where was it played?"}
    ]
)

print(response["choices"][0]["message"]["content"])

The main difference between the Completions API and ChatCompletions API is that the Completions API is designed for single-turn tasks, while the ChatCompletions API is designed to handle multiple turns in a conversation, making it more suitable for building conversational agents. However, the ChatCompletions API format can be modified to behave as a Completions API by using a single user message.

Important note

The Completions API, launched in June 2020, initially offered a freeform text interface to OpenAI's language models. However, experience has shown that structured prompts often yield better outcomes. The chat-based approach, especially through the ChatCompletions API, excels in addressing a wide array of needs, offering enhanced flexibility and specificity and reducing prompt injection risks. Its design supports multi-turn conversations and a variety of tasks, enabling developers to create advanced conversational experiences. Hence, OpenAI announced that they would be deprecating some of the older models that use the Completions API and, moving forward, would be investing in the ChatCompletions API to optimize their use of compute capacity. While the Completions API will remain accessible, it will be labeled as “legacy” in the OpenAI developer documentation.

Tokens

Understanding the concepts of tokens is essential, as it helps us better comprehend the restrictions, such as model limitations, and the aspect of cost management when utilizing ChatGPT.

A ChatGPT token is a unit of text that ChatGPT's language model uses to understand and generate language. In ChatGPT, a token is a sequence of characters that the model uses to generate new sequences of tokens and form a coherent response to a given prompt. The models use tokens to represent words, phrases, and other language elements. Tokens are not cut exactly where words start or end; they can include trailing spaces, subwords, and punctuation, too.

As stated on the OpenAI website, tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens.

To understand tokens in terms of lengths, the following is used as a rule of thumb:

  • 1 token ~= 4 chars in English
  • 1 token ~= ¾ words
  • 100 tokens ~= 75 words
  • 1–2 sentences ~= 30 tokens
  • 1 paragraph ~= 100 tokens
  • 1,500 words ~= 2048 tokens
  • 1 US page (8 ½” x 11”) ~= 450 tokens (assuming ~1800 characters per page)

For example, this famous quote from Thomas Edison (“Genius is one percent inspiration and ninety-nine percent perspiration.”) has 14 tokens:

Figure 5.3 – Tokenization of sentence

We used the OpenAI Tokenizer tool to calculate the tokens; the tool can be found at https://platform.openai.com/tokenizer. An alternative way to tokenize text programmatically is to use the tiktoken library on GitHub; this can be found at https://github.com/openai/tiktoken.
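As a quick sketch of the programmatic route, the following counts the tokens in the Edison quote with tiktoken; the cl100k_base encoding is assumed here because it is the one used by the gpt-3.5-turbo family, and exact counts can vary by encoding:

import tiktoken

# Load the encoding used by gpt-3.5-turbo and text-embedding-ada-002 models
encoding = tiktoken.get_encoding("cl100k_base")

quote = "Genius is one percent inspiration and ninety-nine percent perspiration."
tokens = encoding.encode(quote)

print(len(tokens))       # number of tokens in the quote
print(tokens[:5])        # the first few token IDs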


Prompt parameters

ChatGPT prompt parameters are variables that you can set in the API calls. They allow users to influence the model’s output, customizing the behavior of the model to better fit specific applications or contexts. The following table shows some of the most important parameters of a ChatGPT API call:

Figure 5.6 – Essential Prompt Parameters

In this section, only the top parameters for building an effective prompt are highlighted. For a full list of parameters, refer to the OpenAI API reference (https://platform.openai.com/docs/ api-reference).
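As a brief illustration, a few of the commonly used parameters can be set directly on a ChatCompletions call, as in the following sketch with the pre-1.0 openai package; the specific values chosen here are arbitrary examples:

import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Suggest a name for a new sports podcast."}],
    temperature=0.7,        # higher values make output more creative, lower more deterministic
    max_tokens=50,          # upper bound on tokens generated in the completion
    top_p=0.95,             # nucleus sampling: consider tokens within the top 95% probability mass
    frequency_penalty=0.0,  # discourage repeating the same tokens
    presence_penalty=0.0,   # encourage introducing new topics
    n=1,                    # number of completions to generate
)

print(response["choices"][0]["message"]["content"])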

ChatGPT roles

System message

This is the part where you design your metaprompts. Metaprompts help set the initial context, theme, and behavior of the ChatGPT API to guide the model's interactions with the user, thus setting roles or response styles for the assistant.

Metaprompts are structured instructions or guidelines that dictate how the system should interpret and respond to user requests. These metaprompts are designed to ensure that the system’s outputs adhere to specific policies, ethical guidelines, or operational rules. They’re essentially “prompts about how to handle prompts,” guiding the system in generating responses, handling data, or interacting with users in a way that aligns with predefined standards.

The following table is a metaprompt framework that you can follow to design the ChatGPT system message:

Figure 5.7 – Elements of a Metaprompt

User

The messages from the user serve as prompts or remarks that the assistant is expected to react to or engage with. They establish the anticipated scope of queries that may come from the user.


Assistant

While not mandatory, assistant messages can be included to illustrate the preferred conduct or response patterns of the chatbot.

Now let’s look at an example; if you are developing an application for a fitness AI assistant, a very well-rounded system message might look as follows:

openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "system",
            "content": """You are a fitness AI assistant. Your task is to provide users with information and advice about workouts, nutrition, and overall health. You will help users by answering their questions to the best of your abilities and providing suggestions based on their input. Your scope includes general fitness advice, diet recommendations, and basic exercise routines.

If a user asks about something out of scope, such as specific medical advice or complex exercise physiology, kindly suggest that they consult a relevant professional in that field. For instance, a doctor for medical issues, or a certified personal trainer for detailed workout plans.

You should maintain a supportive and positive tone, being respectful and encouraging to all users, regardless of their current fitness level or knowledge. Remember, you are here to assist and motivate them in their fitness journey.

Your responses should be concise, easy to understand, and practically useful. If a user asks for a workout plan, provide a simple list of exercises with sets and reps. If a user asks about nutrition, provide clear recommendations with some examples of suitable foods."""
        },
        {"role": "user", "content": "I'm new to fitness, can you suggest a simple workout plan to get started?"},
        {
            "role": "assistant",
            "content": """Absolutely, I'm here to help you get started on your fitness journey. Here's a simple workout plan you can begin with:

1. Warm-up: 5-10 minutes of brisk walking or light jogging
2. Squats: 2 sets of 10 reps
3. Push-ups: 2 sets of 10 reps (you can start with knee push-ups if full push-ups are too challenging)
4. Lunges: 2 sets of 10 reps per leg
5. Plank: Hold for 30 seconds, repeat 2 times
6. Cool down: 5-10 minutes of stretching

Remember, the key to a successful workout routine is consistency. Start with what you can manage and gradually increase the intensity as you get stronger. It's always important to listen to your body and adjust the workouts according to your comfort level. Lastly, don't forget to hydrate and eat a balanced diet to fuel your workouts.

Please consult a fitness professional if you're unsure about the correct form for any of these exercises to avoid injury. Enjoy your workout!"""
        }
    ]
)

The following link provides great examples of “Act As” prompts for you to leverage in your system message:

Awesome ChatGPT Prompts (https://github.com/f/awesome-chatgpt-prompts/).


Techniques for effective prompt engineering

In the past two years, a wide array of prompt-engineering techniques have been developed. This section focuses on the essential ones, offering key strategies that you might find indispensable for daily interactions with ChatGPT and other LLM-based applications.

N-shot prompting

N-shot prompting is a term used in the context of prompting large language models, particularly for zero-shot or few-shot learning tasks. It is also called in-context learning and refers to the technique of providing the model with example prompts along with corresponding responses, within the prompt itself, to steer the model's behavior toward more accurate responses.

The “N” in “N-shot” refers to the number of example prompts provided to the model. For instance, in a one-shot learning scenario, only one example prompt and its response are given to the model. In an N-shot learning scenario, multiple example prompts and responses are provided.

While ChatGPT works great with zero-shot prompting, it may sometimes be useful to provide examples for a more accurate response. Let’s see some examples of zero-shot and few-shot prompting:

Figure 5.8 – N-shot prompting examples
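To complement the figure, here is a minimal few-shot (two-shot) prompt in the ChatCompletions format; the sentiment-classification task, labels, and examples are illustrative assumptions rather than content from the figure:

import openai

openai.api_key = "your-api-key"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "Classify the sentiment of the review as Positive or Negative."},
        # Example 1 (shot 1)
        {"role": "user", "content": "Review: The battery lasts all day and the screen is gorgeous."},
        {"role": "assistant", "content": "Positive"},
        # Example 2 (shot 2)
        {"role": "user", "content": "Review: It stopped working after a week and support never replied."},
        {"role": "assistant", "content": "Negative"},
        # New input for the model to classify
        {"role": "user", "content": "Review: Setup was painless and it works exactly as advertised."},
    ],
)

print(response["choices"][0]["message"]["content"])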

Chain-of-thought (CoT) prompting

Chain-of-thought prompting refers to a sequence of intermediate reasoning steps, significantly boosting the capability of large language models to tackle complex reasoning tasks. By presenting a few chain-of-thought demonstrations as examples in the prompts, the models can proficiently handle intricate reasoning tasks:

Figure 5.9 – Chain-of-Thought Prompting Examples

Figure sourced from https://arxiv.org/pdf/2201.11903.pdf.
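As a simple illustration of the idea (not taken from the figure), a chain-of-thought demonstration embeds the worked reasoning in the example answer so the model imitates it; the arithmetic word problems below are made-up examples:

import openai

openai.api_key = "your-api-key"

cot_messages = [
    # One-shot chain-of-thought demonstration: the example answer spells out the reasoning steps
    {"role": "user", "content": "Q: A bakery made 24 muffins and sold 3 boxes of 5. How many muffins are left?"},
    {"role": "assistant", "content": "3 boxes of 5 muffins is 3 x 5 = 15 muffins sold. 24 - 15 = 9. The answer is 9."},
    # New question: the model is expected to reason step by step in the same style
    {"role": "user", "content": "Q: A library had 58 books, lent out 4 stacks of 6, and received 10 new ones. How many books does it have now?"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=cot_messages)
print(response["choices"][0]["message"]["content"])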

Program-aided language (PAL) models

Program-aided language (PAL) models, also called program-of-thought (PoT) prompting, use a technique that incorporates additional task-specific instructions, pseudo-code, rules, or programs alongside free-form text to guide the behavior of a language model:

Figure 5.10 – Program-aided language prompting examples

Figure sourced from https://arxiv.org/abs/2211.10435.

In this section, we have not explored all prompt engineering techniques, only the most important ones; however, we want to convey to our readers that there are numerous variants of these techniques, as illustrated in the following figure from the research paper A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications (https://arxiv.org/pdf/2402.07927.pdf). This paper provides an extensive inventory of prompt engineering strategies across various application areas, showcasing the evolution and breadth of this field over the last four years:

Figure 5.11 – Taxonomy of prompt engineering techniques across multiple application domains