IAM done right for LLM-augmented applications
How IAM can help you leverage the advanced capabilities of LLMs while securing sensitive data and assets.
The role of identity and access management (IAM) in LLM-enhanced applications
As large language models (LLMs) and generative AI continue to make headlines, robust identity and access management (IAM) has become increasingly important across various contexts for businesses of all sizes. They must ensure that LLM-enhanced applications do not inadvertently expose sensitive or proprietary data to unauthorized parties.
One approach some organizations take is to develop custom LLMs, training them on curated data so that the models themselves determine and enforce access restrictions. However, this strategy faces significant challenges:
- Prohibitive costs: Developing a custom LLM capable of managing access control requires intensive computational resources and data management, leading to high costs. This approach is best pursued only when the benefits clearly outweigh the hefty resource commitment.
- Unpredictability of LLMs: The inherent unpredictability of LLMs adds another layer of complexity. Although there are techniques to promote consistent responses, ensuring absolute consistency and accuracy is challenging. False positives in the IAM space can grant unauthorized access to users, an unacceptable risk for many organizations.
Considering these challenges, how can organizations effectively leverage the advanced capabilities of LLMs while securing sensitive data and assets with well-managed access control?
Reference architecture: Augmenting traditional applications with LLM-specific components
One way to address this question is by introducing a multilayered reference architecture (Figure 1) designed to integrate LLMs into web applications. It follows the traditional three-tier web application architecture:
- Web interface/presentation layer serves as the user interface and enables interaction with the application.
- Database/data layer handles the processing, storage, management, and retrieval of data kept locally in the application through traditional databases (DBs), avoiding retrieval through HTTP calls.
- Orchestration/business logic layer remains pivotal. It implements the business logic and handles requests from start to finish, coordinating the other layers and components to fulfill them. Most importantly, it verifies identities (aka authentication) and enforces access control.
Authentication works as in any typical web application: the identity of a user is verified and confirmed using a range of credentials, including passwords or biometric characteristics. As part of this process, relevant user data are often fetched and stored as part of the session. These data commonly include the authenticated user ID, among other details, facilitating the access control required for subsequent requests. Access control is key to ensuring that a user, identified by their user ID, has appropriate access to the relevant data before their requests proceed.
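To make this concrete, the following is a minimal sketch of how an orchestration layer might capture authenticated user data in a session and consult it on later requests. The `Session` class and `require_access` helper are hypothetical names, not tied to any particular framework:

```python
# Minimal sketch: session data captured at login, consulted on later
# requests. Session and require_access are hypothetical names.
from dataclasses import dataclass, field

@dataclass
class Session:
    user_id: str                            # set once credentials are verified
    roles: set[str] = field(default_factory=set)

def require_access(session: Session, resource_owner_id: str) -> None:
    """Allow users to access their own resources, and admins to access any."""
    if session.user_id != resource_owner_id and "admin" not in session.roles:
        raise PermissionError(f"{session.user_id} may not access this resource")

session = Session(user_id="jane.doe", roles={"employee"})
require_access(session, "jane.doe")    # OK: accessing own resource
# require_access(session, "john.doe")  # would raise PermissionError
```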
The key distinction from traditional architecture is the inclusion of components that augment the application with LLM capabilities, highlighted in the orange dotted rectangle in Figure 1 above.
- Vector DB component enables efficient, vector-based searches for data stored locally within this application. It provides semantic search capability, which is well-suited for LLMs and is not supported by traditional DBs (a minimal sketch of such a search follows this list).
- Assistant/functions calling component serves a dual purpose: (1) it supports LLMs in processing requests by providing metadata for function calls (typically HTTP calls to web services) and their arguments to integrate external data, and (2) it aids the orchestration layer in executing these calls once the required HTTP calls and their parameters have been identified.
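To illustrate the semantic search a vector DB provides, here is a minimal, self-contained sketch. The toy `embed` function is a stand-in for a real embedding model; in practice, vectors would come from an embedding API and be stored in a dedicated vector DB:

```python
# Minimal sketch of the semantic search a vector DB performs.
# embed() is a toy stand-in for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy stand-in: hash characters into a fixed-size, normalized vector.
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(i + ord(ch)) % 64] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

documents = [
    "PTO policy: employees accrue 1.5 days per month",
    "Expense policy: receipts required above $25",
]
doc_vectors = np.stack([embed(d) for d in documents])

def semantic_search(query: str, top_k: int = 1) -> list[str]:
    scores = doc_vectors @ embed(query)          # cosine similarity
    return [documents[i] for i in np.argsort(scores)[::-1][:top_k]]

print(semantic_search("how much paid time off do I earn?"))
```

Note that the toy `embed` here does not capture real semantics; a production embedding model is what places "paid time off" and "PTO" near each other in vector space, which is exactly what keyword matching in a traditional DB cannot do.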
LLMs are tasked with two primary functions related to natural language. The first is to act as a language processor, understanding a user’s request by identifying the necessary information to retrieve or the actions to execute. Initially, LLMs fetch a list of available functions and their schemas from the assistant/functions calling component. Using this information along with the user's request, they create structured data, usually in JSON format, containing instructions about what functions to call with what parameters.
This data is then passed to the orchestration layer, which relies on the assistant/functions calling component to execute the necessary function calls, typically HTTP requests to web services. For instance, in answering the inquiry "How many paid time off (PTO) days does John have left?", the LLM leverages the relevant functions and their metadata provided by the assistant/functions calling component. It then crafts a structured response indicating that an HTTP call to the PTO service, "get PTO balance by user ID", is required, and determines the appropriate user ID to pass in.
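Here is a sketch of that handshake, with illustrative names throughout. The schema shape mirrors common LLM function calling APIs, though the exact format varies by provider:

```python
# Sketch of the function-calling handshake, with illustrative names.
import json

# 1. Metadata the assistant/functions calling component exposes:
get_pto_balance_schema = {
    "name": "get_pto_balance_by_user_id",
    "description": "Return the PTO balance for a given user ID.",
    "parameters": {
        "type": "object",
        "properties": {"user_id": {"type": "string"}},
        "required": ["user_id"],
    },
}

# 2. Structured output the LLM produces for
#    "How many PTO days does John have left?":
llm_output = json.loads(
    '{"function": "get_pto_balance_by_user_id",'
    ' "arguments": {"user_id": "john.doe"}}'
)

# 3. The orchestration layer dispatches the call (after access checks):
def dispatch(call: dict) -> dict:
    if call["function"] == "get_pto_balance_by_user_id":
        # In a real system this would be an HTTP call to the PTO service.
        return {"user_id": call["arguments"]["user_id"], "pto_days": 5}
    raise ValueError(f"Unknown function: {call['function']}")

print(dispatch(llm_output))  # {'user_id': 'john.doe', 'pto_days': 5}
```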
The second role of LLMs is to formulate responses to requests in human-understandable language, based on the information the orchestration layer provides as a result of making those HTTP calls. For example, after the PTO balance is obtained, the orchestration layer supplies this data to the LLM together with the original inquiry, "How many PTO days does John have left?" The LLM then generates a coherent response in natural language, such as "John has 5 days of paid time off remaining." This capability of LLMs and GenAI is commonly referred to as retrieval-augmented generation (RAG). RAG allows LLMs to generate responses with the most contextually relevant results matching the user's query by providing them with relevant data alongside the question posed.
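A minimal sketch of this RAG step might look as follows, where the hypothetical `complete` function stands in for a real LLM API call:

```python
# Minimal sketch of the RAG step: retrieved data is placed in the
# prompt alongside the user's question. complete() is a placeholder
# for a real LLM API call (all names are illustrative).
def complete(prompt: str) -> str:
    # Placeholder for a real LLM call; returns a canned answer here.
    return "John has 5 days of paid time off remaining."

def answer_with_context(question: str, retrieved: dict) -> str:
    prompt = (
        "Answer the question using only the data below.\n"
        f"Data: {retrieved}\n"
        f"Question: {question}"
    )
    return complete(prompt)

print(answer_with_context(
    "How many PTO days does John have left?",
    {"user_id": "john.doe", "pto_days": 5},
))
```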
With each component and its role now explained, the remaining question is: How does the orchestration layer continue to play its central role in enforcing IAM once these LLM-specific components are integrated? We will explore this further through a hypothetical case study, building upon the PTO balance inquiry example explored earlier.
Hypothetical use case example: Enforcing proper IAM for LLM-powered PTO balance inquiry
The sequence diagram below (Figure 2) illustrates the steps involved in retrieving and presenting a paid time off (PTO) balance using the LLM component. Proper access control occurs at Steps 5 and 8, as indicated by the orange highlights in the diagram and descriptions below.
The following are descriptions of each step illustrated in Figure 2.
1. Users log in and interact with the application through a web interface powered by LLMs, asking questions like, "How many PTO days does John have left?"
2. The web interface relays the inquiry to the application’s orchestration layer, which coordinates user request processing.
3. This layer then involves the LLM component, providing the inquiry along with authenticated user information, particularly their user ID.
4. The LLM component identifies the need for a specific call to retrieve PTO balance information, namely the HTTP call "get PTO balance by user ID" with John’s user ID as one of its parameters. (Note: We assume that the list of functions and their corresponding schemas has been pre-fetched from the assistant/functions calling component and cached locally in the LLM component. For simplicity of illustration, we use only one HTTP call for this workflow, "get PTO balance by user ID", instead of first making another call to retrieve John’s user ID.)
5. The orchestration layer conducts an access control check to confirm that the authenticated user has the right to access John’s PTO balance. For example, a possible check would be whether the requestor is John himself or his manager (see the sketch after this list).
6. If the user is authorized, the process moves to the next step; otherwise, it halts with an error message. With successful access control verification, the orchestration layer requests the PTO balance via the assistant/functions calling component, which facilitates communication with external services.
7. The assistant/functions calling component makes a RESTful API call to the PTO service to obtain the balance for the specified user ID.
8. The PTO service performs its own access control to double-check that the requesting user has rightful access to John’s PTO data, and returns the balance upon successful verification.
9. The PTO balance is sent to the assistant/functions calling component.
10. This component then conveys the balance to the orchestration layer.
11. The orchestration layer passes the balance, together with the inquiry, "How many PTO days does John have left?" to the LLM.
12. The LLM generates a response, such as "John has 5 days of paid time off remaining," and returns it to the orchestration layer.
13. This response is forwarded to the web client/presentation layer.
14. The web client/presentation layer displays the response, providing the user with the requested PTO balance in a clear and understandable manner.
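A minimal sketch of the access control at Step 5, along with the service-side recheck at Step 8, might look like the following. The "John himself or his manager" rule comes from the example above; all function and field names are hypothetical:

```python
# Sketch of the Step 5 check in the orchestration layer, plus the
# Step 8 recheck inside the PTO service (defense in depth).
# All names are hypothetical.

def is_self_or_manager(requestor_id: str, target_id: str,
                       manager_of: dict[str, str]) -> bool:
    """True if the requestor is the target user or the target's manager."""
    return requestor_id == target_id or manager_of.get(target_id) == requestor_id

# Step 5: orchestration layer gate, before any function call is executed.
def authorize_pto_lookup(requestor_id: str, target_id: str,
                         manager_of: dict[str, str]) -> None:
    if not is_self_or_manager(requestor_id, target_id, manager_of):
        raise PermissionError("Not authorized to view this PTO balance")

# Step 8: the PTO service independently repeats the check, so a bug or
# bypass in the orchestration layer cannot leak another user's balance.
def pto_service_get_balance(requestor_id: str, target_id: str,
                            manager_of: dict[str, str],
                            balances: dict[str, int]) -> int:
    authorize_pto_lookup(requestor_id, target_id, manager_of)
    return balances[target_id]

manager_of = {"john.doe": "maria.lee"}     # John's manager is Maria
balances = {"john.doe": 5}
print(pto_service_get_balance("maria.lee", "john.doe", manager_of, balances))  # 5
```

Repeating the check in the PTO service is deliberate defense in depth: even if the orchestration layer were bypassed or buggy, the LLM would only ever be handed data that has passed an authorization gate.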
As highlighted in the workflow described above, the roles of IAM persist unchanged from traditional web applications despite the integration of LLMs. These functions continue to be managed by both the orchestration layer and the corresponding web services, which work together to filter out unauthorized data before feeding it to the LLMs; the LLMs then generate human-friendly responses with their RAG capability.
The prudent approach to achieving IAM is to use LLMs only for their language capabilities
Organizations must strike a critical balance when integrating LLMs into their digital ecosystems while maintaining the indispensable role of robust IAM. Despite the transformative potential of LLMs to enhance user interactions, their application does not extend to identity and access control. A reference architecture that marries LLM-specific components with traditional web application structures demonstrates a strategic approach: it leverages LLMs’ capabilities while ensuring data security and compliance through the orchestration layer and the corresponding web services. This architecture, which positions LLMs only as sophisticated language processors and generators, not only enriches the user experience but also upholds the stringent IAM measures that are essential.