
Large Language Models (LLMs), by themselves, are powerful tools. Recently, they have been used to great effect in building custom chatbot services and creating intelligent enterprise search over all data sources in your company.

However, in order to harness that power we must build infrastructure around them to support their capabilities. In fact, this is often much of the code that gets written to support custom LLM based applications. In this article, we’ll discuss some of the more pivotal aspects of system architecture that make LLM applications feasible.

Bird’s Eye View

Modern LLM Application Flow

As you can see, the code that actually interacts with the language model is just one part of what makes an LLM application come to life. The lower branch of this diagram details the main flow of the LLM invocation, which generates and optimizes a prompt for your use case, and then inspects the output and applies any data processing and business logic relevant to your application. The upper branch of the chart details the “retrieval” process, which is present in all but the most basic LLM applications. This process supplies the language model with the ability to guide its responses by selecting relevant excerpts from your specialized knowledge base. By combining all of these pieces together, we can create a chatbot that can interact with a user in a natural way, while:

  • Referencing your company’s data to inform the chatbot’s questions and answers

  • Filtering the LLM’s responses to ensure undesired or confidential content isn’t sent to the user

  • Extracting the user’s messages and naturally filtering them for any data you’d like to collect and save to your company’s database
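The flow described above can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not a real implementation: the function names (retrieve_context, call_llm, moderate, answer) are invented, the "retrieval" is naive keyword overlap, and call_llm stands in for a real model API.

```python
def retrieve_context(query: str, knowledge_base: dict[str, str]) -> list[str]:
    """Naive retrieval: return snippets whose topic keys overlap the query."""
    words = set(query.lower().split())
    return [text for key, text in knowledge_base.items()
            if words & set(key.lower().split())]

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call."""
    return f"[model response to: {prompt[:40]}...]"

def moderate(text: str, banned: set[str]) -> str:
    """Block responses containing undesired or confidential terms."""
    if any(term in text.lower() for term in banned):
        return "I'm sorry, I can't share that."
    return text

def answer(query: str, knowledge_base: dict[str, str], banned: set[str]) -> str:
    # Upper branch: retrieval from the specialized knowledge base.
    context = retrieve_context(query, knowledge_base)
    # Lower branch: prompt construction, invocation, then output inspection.
    prompt = "Context:\n" + "\n".join(context) + f"\n\nUser: {query}"
    raw = call_llm(prompt)
    return moderate(raw, banned)
```

Each stage here corresponds to a box in the diagram; the rest of this article walks through what a production-grade version of each stage looks like.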

User Interface

The user interface is the front-end component of an LLM application where users interact with the system. It should be designed to both capture user input clearly and present system outputs in an understandable manner. For businesses, the UI could be a simple text box on a website, a chatbot integrated into a customer service platform, or even a voice-activated interface. The key is to make user interaction intuitive and seamless, reducing the complexity of what happens behind the scenes to simple commands and responses that feel natural to the user.

Our Favorite: NextJS

At PressW, NextJS is our frontend technology of choice (as well as the current industry standard). It provides a lightweight and flexible way to build highly responsive LLM interfaces with a high degree of customizability and a modern user experience.

Data Orchestration

Data orchestration in an LLM system refers to the processes and systems responsible for managing data flow and integration. This involves sourcing data from various inputs such as company databases, marketing materials, and developer documentation. We then must process that data and channel it to the right components at the right time. Efficient data orchestration ensures that the LLM has access to the necessary data in a format that it can use, which can include pre-processing steps like tokenization, anonymization, and formatting to fit the LLM's needs. This component is critical for performance and accuracy, ensuring that data is current and synchronized across the system.

For your LLM project, we pick from a variety of staple technologies to facilitate data orchestration based on the desired system functionality and constraints. Often, it’s necessary to choose a Vector Database in order to facilitate fast document retrieval based on “semantic search.” These vector databases are used in any application where it’s necessary to provide the LLM with external supporting documentation to help it make decisions and answer questions, and they are the backbone of intelligent enterprise search. We also leverage more traditional data storage solutions, such as relational databases, to power output storage, result caching, integration with existing production systems, and more.
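At its core, semantic search ranks documents by the similarity of their embedding vectors to the query's embedding. The toy sketch below hand-rolls that idea with cosine similarity over made-up three-dimensional vectors; in production the vectors come from an embedding model and live in a vector database, which handles indexing and approximate nearest-neighbor search at scale.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], docs: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """docs: (text, embedding) pairs; return the k most similar texts."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Hypothetical documents with hand-picked embeddings for illustration.
docs = [
    ("Refund policy: 30 days", [0.9, 0.1, 0.0]),
    ("Shipping times by region", [0.1, 0.9, 0.0]),
    ("Office dress code", [0.0, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.0], docs, k=1))  # nearest to the "refund" vector
```

The retrieved snippets are then injected into the LLM's prompt, which is how the model ends up "knowing" your company's data without being retrained on it.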

Prompt Building and Optimization


The prompt building and optimization process is crucial for effectively leveraging an LLM. It is a two-pronged approach, with the first prong falling in the development phase of the project, before the system is deployed. This step involves crafting prompts that clearly communicate the task at hand to the LLM and optimizing them for the best performance. This might involve iterative testing to refine prompts based on the quality of the LLM's responses, or using data about the specific application domain to guide the development of prompts. It's a blend of art and science, requiring an understanding of how LLMs interpret language and the goals of the specific application.

Live Prompt Tuning

The second prong is live prompt tuning, in which a system is developed to first understand the context of the problem and how it relates to the structure and content of the prompts. Then, the system will dynamically choose information, language, and structure that reliably elicit the best responses from the LLM we are interacting with. This is done dynamically at every individual invocation to ensure no input cases are missed.
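One simple form of live prompt tuning is selecting a prompt template per request based on features of the input. The sketch below uses hard-coded heuristics and invented templates purely for illustration; a real system might instead use a classifier, retrieval over a prompt library, or learned routing.

```python
# Hypothetical prompt templates keyed by detected input type.
TEMPLATES = {
    "question": "Answer concisely and cite the provided context.\n\n{input}",
    "complaint": "Respond empathetically, then offer a resolution.\n\n{input}",
    "default": "You are a helpful assistant.\n\n{input}",
}

def build_prompt(user_input: str) -> str:
    """Dynamically choose a template at each invocation."""
    text = user_input.lower()
    if "?" in text:
        key = "question"
    elif any(w in text for w in ("refund", "broken", "disappointed")):
        key = "complaint"
    else:
        key = "default"
    return TEMPLATES[key].format(input=user_input)
```

Because the template is chosen per invocation, every input is guaranteed to hit one of the defined cases, with "default" acting as the catch-all.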

Content Moderation

Content moderation is an essential part of the LLM system architecture, particularly for public-facing applications. It includes filters and checks to ensure that the output from the LLM is appropriate, respectful, and adheres to messaging guidelines and legal standards. This might involve automated flagging systems for potentially harmful content, human-in-the-loop review processes, or a combination of both. Content moderation protects both the users and the business, maintaining the integrity and trustworthiness of the application.

Content moderation is not just a question of specifically filtering harmful outputs, though. In this stage, we also ensure outputs align with the “brand voice” of the application. It’s also an opportunity to include additional processes for fact checking outputs and ensuring the quality of our responses to the furthest extent we can in an automated system.

Business Logic

The business logic layer is where the application's domain-specific rules and processes are implemented. It takes the output from the LLM and applies the necessary transformations and decisions to achieve the business goals. This could involve parsing the LLM output to extract entities and intent, integrating with other business systems to perform actions, or storing results in a database for future reference. This layer turns the raw capabilities of the LLM into concrete, valuable business processes.
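A common pattern for this layer is to instruct the LLM to reply in JSON, then validate and route the result. The field names (intent, entities) and the action table below are illustrative assumptions, not a fixed schema.

```python
import json

def parse_llm_output(raw: str) -> dict:
    """Extract intent and entities from a (hopefully) JSON LLM reply,
    falling back to a safe default when the model produces invalid JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"intent": "unknown", "entities": {}}
    return {
        "intent": data.get("intent", "unknown"),
        "entities": data.get("entities", {}),
    }

def route(parsed: dict) -> str:
    """Map a recognized intent onto a downstream business action."""
    actions = {"order_status": "lookup_order", "refund": "open_ticket"}
    return actions.get(parsed["intent"], "handoff_to_human")
```

The fallback branches matter as much as the happy path: LLM output is probabilistic, so the business logic layer must degrade gracefully when the model's reply doesn't match the expected structure.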


Hosting

Finally, we’ll need somewhere to host our application. Hosting refers to the infrastructure required to deploy and run an LLM application in a way that is accessible to the end user. It includes considerations of scalability, reliability, security, and compliance. Most often (especially in the case of PressW projects), the solution will be a cloud-based service that provides on-demand scalability and global reach. The chosen hosting solution must ensure that the LLM application is always available and responsive to user requests, with data security and privacy as top priorities.

How to Get Involved

If this seems a little daunting, feel free to reach out to us over here at PressW for some guidance. LLM applications are our jam, and we’d be happy to help in any capacity, from auditing your business and identifying use cases to building out a full-fledged LLM application to best suit your use case and business challenges.

We’re also putting together a database to educate anyone on all of the possibilities with custom language model solutions. For now, visit our website and check out our “What’s possible with PressW?” section.

Appendix: Supporting Technologies

LangChain & LangSmith

LangChain stands out by providing a framework specifically designed for building LLM applications, focusing on the integration of LLMs into end-to-end workflows. It offers a modular architecture, allowing developers to easily combine LLMs with other components such as databases, user interfaces, and external APIs. This modular approach significantly reduces development time and complexity, enabling developers to focus on crafting unique features and user experiences. Furthermore, LangChain supports the creation of chain-of-thought prompts, which improve the reasoning abilities of LLMs, making them more effective in understanding context and generating more accurate responses.

LangSmith, on the other hand, enhances LLM development by providing advanced editing and debugging tools designed specifically for working with natural language models. It makes it simple to refine the output of LLMs, and helps ensure that the generated text meets the desired quality and relevance standards. It also offers features for customizing the behavior of LLMs, allowing developers to tailor models to specific domains or user needs through targeted prompt engineering and user feedback loops, without extensive training data or complex machine learning pipelines.

Visualization of LangChain Ecosystem


FastAPI

Using FastAPI to make Large Language Models (LLMs) accessible in applications is an excellent choice for several reasons, offering developers a robust way to integrate advanced natural language processing capabilities into their applications.

FastAPI is built on Starlette for the web component and uses Pydantic for the data component. It's designed to be fast and efficient, capable of handling asynchronous requests out of the box. This means it can manage multiple requests concurrently, significantly improving throughput and reducing response times when interacting with LLMs, which is critical for applications requiring real-time feedback or processing large volumes of requests.

One of the standout features of FastAPI is its automatic API documentation using Swagger UI and ReDoc. This feature is incredibly beneficial for developers integrating LLMs into their applications, as it provides a clear and interactive interface for testing and understanding how to interact with the API. It simplifies the process of debugging, testing, and sharing the API with other developers or stakeholders. This, coupled with the automatic type checking and validation offered via Pydantic, makes FastAPI a front-runner for interfacing with LLM applications.

Autogenerated Swagger UI for API interaction


Docker

At PressW, we use Docker because it makes our LLM applications both platform and cloud agnostic. Depending on the client, we may need to deploy to Google Cloud, AWS, Azure, or even Azure Government. Docker allows us to easily deploy our applications on any cloud platform, ensuring flexibility and avoiding vendor lock-in.

Docker packages software in containers that simplify the deployment process and enable easy scaling of applications across different cloud platforms. It also speeds up development, especially when many developers are working on the same project. Developers can ignore the woes of traditional environment management and boot a pre-packaged Docker container to get working right away.
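A containerized Python API service often boils down to a short Dockerfile like the sketch below. The base image, module path (app.main), and port are placeholders; adjust them to your own project layout.

```dockerfile
# Illustrative Dockerfile for a Python-based LLM API service.
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so Docker can cache this layer between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code and start the server.
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8080"]
```

The same image runs unchanged on Google Cloud, AWS, or Azure, which is the cloud-agnostic property described above.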

Looking to stay connected?
Get access to the only newsletter you need to implement AI in your business.