
Semantic Kernel – Developing and Operationalizing LLM-based Apps: Exploring Dev Frameworks and LLMOps

Semantic Kernel

Semantic Kernel (SK) is a lightweight, open-source software development kit (SDK): a modern AI application development framework that enables software developers to orchestrate AI, that is, to build agents, write code that interacts with those agents, and work with generative AI tooling and concepts such as natural language processing (NLP), which we covered in Chapter 2.

“Kernel” is at the core of everything!

Semantic Kernel revolves around the concept of a “kernel,” which is equipped with the services and plugins necessary to execute both native code and AI services, making it the central element of nearly every SDK component.

Every prompt or piece of code executed within Semantic Kernel passes through this kernel, giving developers a unified platform for configuring and monitoring their AI applications.

For instance, when a prompt is invoked through the kernel, the kernel selects the optimal AI service, constructs the prompt from a prompt template, dispatches the prompt to the service, and processes the response before delivering it back to the application. Additionally, the kernel allows for the integration of events and middleware at various stages, facilitating tasks such as logging, user updates, and the implementation of responsible AI practices, all from a single, centralized location: the kernel.
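To make this flow concrete, here is a minimal sketch of creating a kernel, registering an AI service, and invoking a prompt with the Python semantic-kernel package. The class and method names (Kernel, add_service, OpenAIChatCompletion, invoke_prompt) reflect recent 1.x releases of the SDK and may differ in other versions; the model ID and API key shown are placeholders for illustration.

```python
import asyncio
from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion

async def main():
    # The kernel is the central object: every prompt and function call flows through it.
    kernel = Kernel()

    # Register an AI service; the kernel selects it when a prompt is invoked.
    kernel.add_service(
        OpenAIChatCompletion(ai_model_id="gpt-4o-mini", api_key="sk-...")  # placeholder values
    )

    # The kernel renders the prompt, dispatches it to the service,
    # and returns the processed response to the application.
    result = await kernel.invoke_prompt(
        "Summarize the benefits of a centralized kernel in one sentence."
    )
    print(result)

asyncio.run(main())
```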

Moreover, SK allows developers to define the syntax and semantics of natural language expressions and use them as variables, functions, or data structures in their code. SK also provides tools for parsing, analyzing, and generating natural language from code and, conversely, generating code from natural language.

You can build sophisticated, complex agents without being an AI expert by using the Semantic Kernel SDK! The fundamental building blocks for agents in Semantic Kernel are plugins, planners, and personas.

Fundamental components

Let’s dive into each one of them and understand what each one means.

  • Plugins enhance your agent’s functionality by allowing you to incorporate additional code. They let you add new functions written in native programming languages such as C# or Python, interact with LLMs through prompts, or connect to external services via REST API calls. As an example, consider a plugin for a calendar virtual assistant that lets it schedule appointments, remind you of upcoming events, or cancel meetings (a minimal sketch follows this list). If you have used ChatGPT, you may be familiar with the concept of plugins, as they are integrated into it (for example, “Code Interpreter” or the “Bing Search” plugin).
  • Planners: To use a plugin effectively and integrate it with subsequent actions, the system must first design a plan, and this is where planners help. Planners are sophisticated instructions that enable an agent to formulate a strategy for accomplishing a given task, often encapsulated in a simple prompt that guides the agent through function calling to achieve the objective. As an example, take the development of a MeetingEventPlanner. This planner would guide the agent through the detailed process of organizing a meeting: reviewing the availability of attendees’ calendars, sending out confirmation emails, drafting an agenda, and, finally, scheduling the meeting. Each step is carefully outlined to ensure the agent comprehensively addresses all the actions needed for successful meeting preparation.
  • Personas: Personas are sets of instructions that shape the behavior of agents by imbuing them with distinct personalities. Often referred to as “meta prompts,” these guidelines endow agents with characters that can range from friendly and professional to humorous, and so forth. Additionally, they direct agents on the type of response to generate, which can vary from verbose to concise. We have explored meta prompts in great detail in Chapter 5; this concept is closely related.
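To illustrate the plugin concept from the list above, here is a minimal sketch of a native calendar plugin in Python. It assumes the semantic-kernel package; the kernel_function decorator and add_plugin call match recent 1.x versions of the SDK, and CalendarPlugin and its methods are hypothetical names invented for this example.

```python
from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function

class CalendarPlugin:
    """Hypothetical native plugin exposing calendar actions to an agent."""

    @kernel_function(name="schedule_appointment",
                     description="Schedules an appointment on the user's calendar.")
    def schedule_appointment(self, date: str, title: str) -> str:
        # A real plugin would call a calendar REST API here.
        return f"Scheduled '{title}' on {date}."

    @kernel_function(name="cancel_meeting",
                     description="Cancels a meeting by title.")
    def cancel_meeting(self, title: str) -> str:
        return f"Cancelled '{title}'."

kernel = Kernel()
# Registering the plugin makes its functions discoverable to planners
# and available for LLM function calling.
kernel.add_plugin(CalendarPlugin(), plugin_name="calendar")
```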

Vector search strategies – RAGs to Riches: Elevating AI with External Data

Vector search strategies

Vector search strategies are crucial because they determine how efficiently and accurately high-dimensional data (such as embeddings) can be queried and retrieved. Optimal strategies ensure that the most relevant and contextually appropriate results are returned. In vector-based searching, there are two main strategies: exact search and approximate search.

Exact search

The exact search method, as the name suggests, directly matches a query vector against every vector in the database. It takes an exhaustive approach to identify the closest neighbors, with minimal to no error.

This is what the traditional k-nearest neighbors (KNN) method employs: a brute-force search that compares the input vector with every other vector in the dataset. Although computing the similarity for each individual vector is typically quick, the process becomes time-consuming and resource-intensive over extensive datasets because of the vast number of required comparisons. For instance, if you had a dataset of one million vectors and wanted to find the nearest neighbors for a single input vector, traditional KNN would require one million distance computations. This is like looking up a friend’s phone number in a phone book by checking each entry one by one, rather than using a more efficient search strategy, which we will discuss in the next section.
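As a concrete illustration of the brute-force approach, here is a short NumPy sketch. The dataset size, dimensionality, and the choice of cosine similarity as the distance measure are assumptions made for illustration:

```python
import numpy as np

def knn_exact(query: np.ndarray, vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Brute-force exact KNN: compares the query against every stored vector."""
    # Cosine similarity between the query and all vectors at once.
    sims = (vectors @ query) / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    # One similarity computation per stored vector: N vectors -> N comparisons.
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
dataset = rng.random((100_000, 128), dtype=np.float32)  # assumed 100k 128-d embeddings
neighbors = knn_exact(dataset[0], dataset, k=5)
print(neighbors)
```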

Approximate nearest neighbors (ANNs)

In modern vector DBs, the search strategy known as ANN stands out as a powerful technique that quickly finds near-closest data points in high-dimensional spaces. Unlike KNN, ANN prioritizes search speed at the cost of a slight loss in accuracy. Additionally, for it to function effectively, a vector index must be built beforehand.

The process of vector indexing

The process of vector indexing involves the organization of embeddings in a data structure called an index, which can be traversed quickly for retrieval purposes. Many ANN algorithms aid in forming a vector index, all aiming for rapid querying by creating an efficiently traversable data structure. Typically, they compress the original vector representation to enhance the search process.

There are numerous indexing algorithms, and this is an active research area. ANN indexes can be broadly classified into tree-based, graph-based, hash-based, and quantization-based indexes. In this section, we will cover the two most popular indexing approaches. When creating an LLM application, you don’t need to dive deep into the indexing process, since many vector databases provide it as a service, but it’s important to choose the right type of index for your specific needs to ensure efficient data retrieval:

  • Hierarchical navigable small world (HNSW): This is a method for approximate similarity search in high-dimensional spaces. HNSW is a graph-based index that builds a hierarchical graph structure in which each node represents a data point and edges connect similar data points. This hierarchy allows for efficient search operations, as it narrows down the search space quickly. HNSW is well suited to similarity search use cases such as content-based recommendation systems and text search (a code sketch follows the figure below).

If you wish to dive deeper into its workings, we recommend checking out this research paper: https://arxiv.org/abs/1603.09320.

The following image is a representation of the HNSW index:

Figure 4.4 – Representation of HNSW index

The image illustrates the HNSW graph structure used for efficient similarity searches. The graph is constructed in layers, with density decreasing from bottom to top. Each layer’s characteristic radius shrinks as we ascend, creating sparser connections. The depicted search path, shown by the red dotted lines, showcases the algorithm’s strategy: it starts from the sparsest top layer, quickly navigating vast data regions, and then refines its search in the denser lower layers, minimizing overall comparisons and enhancing search efficiency.
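Here is a minimal sketch of building and querying an HNSW index with the hnswlib library. The data, dimensionality, and parameter values (M, ef_construction, ef) are assumptions chosen for illustration:

```python
import hnswlib
import numpy as np

dim, num_elements = 128, 10_000
data = np.random.rand(num_elements, dim).astype(np.float32)  # assumed embeddings

# Build the hierarchical graph index.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_elements, ef_construction=200, M=16)
index.add_items(data)

# ef controls the query-time accuracy/speed trade-off (higher = more accurate, slower).
index.set_ef(50)

# Approximate 5-nearest-neighbor search for one query vector.
labels, distances = index.knn_query(data[:1], k=5)
print(labels, distances)
```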

  • Facebook AI Similarity Search (FAISS): FAISS, developed by Facebook AI Research, is a library designed for efficient similarity search and clustering of high-dimensional vectors. A common FAISS configuration partitions the vector space into regions known as Voronoi cells, each represented by a centroid, and uses product quantization to compress the vectors during indexing, accelerating similarity searches over vast datasets. The primary purpose is to minimize storage needs and expedite searches, though this may slightly compromise accuracy. To visualize this, consider the following image. The Voronoi cells denote the regions produced by quantization, and the labeled points within these cells are the centroids, or representative vectors. When indexing a new vector, it is assigned to its closest centroid. For searches, FAISS pinpoints the Voronoi cells most likely to contain the nearest neighbors and then narrows the search within those cells, significantly cutting down distance calculations (a code sketch follows after the figure):

Figure 4.5 – Representation of FAISS index

Because of its performance optimizations and built-in GPU support, FAISS excels in applications such as image and video search, recommendation systems, and any task that involves nearest-neighbor search in high-dimensional spaces.
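The following sketch builds the kind of index described above, combining a Voronoi-cell partition (the inverted file in faiss.IndexIVFPQ) with product quantization. The dataset and parameter choices (nlist, number of subquantizers, nprobe) are assumptions made for illustration:

```python
import faiss
import numpy as np

d = 64
xb = np.random.rand(100_000, d).astype("float32")  # assumed database vectors

nlist = 100  # number of Voronoi cells to partition the space into
quantizer = faiss.IndexFlatL2(d)  # coarse quantizer that assigns vectors to cells
index = faiss.IndexIVFPQ(quantizer, d, nlist, 8, 8)  # 8 subquantizers x 8 bits each

index.train(xb)   # learn the cell centroids and product-quantization codebooks
index.add(xb)     # compress and store the vectors

index.nprobe = 10  # number of cells to visit at query time (accuracy/speed trade-off)
distances, ids = index.search(xb[:1], 5)  # approximate 5-NN for one query vector
print(ids, distances)
```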

In this section, we covered indexing and the role of ANNs in index creation. Next, we’ll explore similarity measures, how they differ from indexing, and their impact on improving data retrieval.