In my previous blog post, I presented a high-level overview of how to enable ChatGPT to tap into your organization’s data, allowing users to engage in natural language conversations with the data. In this post, we will dive deeper into the process, focusing on the Python version of LangChain, a powerful framework for developing applications powered by language models. LangChain simplifies the connection of language models to various data sources while enabling these models to interact with their environment.
LangChain provides a wide range of integrations with numerous language models, vector stores, and third-party services like Google Search and Unstructured. Furthermore, it offers a structured approach to prompt management, chat memory implementation, component chaining, and the creation of agents with access to a suite of tools to augment language models. In this post, we will explore LangChain’s integration with OpenAI embeddings and language models, as well as with the Pinecone vector store.
The Core Strategy
Below are the steps I outlined in my last post:
- Tokenize documents and content in your knowledge base, dividing it into smaller, more manageable sections or chunks.
- Use OpenAI’s API to generate embeddings for each of the chunks.
- Store the embeddings in a database with support for vector searches, enabling efficient storage and retrieval of the chunks.
- For each user query, generate an embedding for the query and perform a similarity search on the database to find the most relevant chunks in the knowledge base.
- Inject the relevant chunks into the prompt to provide contextual information to the OpenAI LLM.
I encourage you to read my previous post, as it will provide valuable context as we go through the code for each of the above steps. You can find all code for this post in this notebook, along with sample outputs.
Tokenize Your Knowledge Base into Chunks
LangChain provides an extensive number of loaders and parsers for many document types. For simplicity, we will use a hardcoded markdown document taken from LangChain’s README.md file as our example knowledge base and parse it using LangChain’s MarkdownTextSplitter, which splits text along Markdown headings, code blocks, and horizontal rules. We use a maximum chunk size of 500 characters, which works well for this type of content but may not be ideal for other kinds of content.
markdown_text = """
# LangChain
## What is this?
Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not.
But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
The LangChain library is aimed at assisting in the development of those types of applications. There are six main areas that LangChain is designed to help with.
### LLMs and Prompts:
This includes prompt management, prompt optimization, generic interface for all LLMs, and common utilities for working with LLMs.
### Chains:
Chains go beyond just a single LLM call, and are sequences of calls (whether to an LLM or a different utility). LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications.
### Data Augmented Generation:
Data Augmented Generation involves specific types of chains that first interact with an external datasource to fetch data to use in the generation step. Examples of this include summarization of long pieces of text and question/answering over specific data sources.
### Agents:
Agents involve an LLM making decisions about which Actions to take, taking that Action, seeing an Observation, and repeating that until done. LangChain provides a standard interface for agents, a selection of agents to choose from, and examples of end to end agents.
### Memory:
Memory is the concept of persisting state between calls of a chain/agent. LangChain provides a standard interface for memory, a collection of memory implementations, and examples of chains/agents that use memory.
"""
from langchain.text_splitter import MarkdownTextSplitter

markdown_splitter = MarkdownTextSplitter(chunk_size=500, chunk_overlap=0)
chunks = markdown_splitter.create_documents([markdown_text])
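In a real application, you would typically load documents from files rather than from a hardcoded string. Here is a minimal sketch using one of LangChain’s document loaders; the path docs/README.md is just a placeholder:

from langchain.document_loaders import TextLoader
from langchain.text_splitter import MarkdownTextSplitter

# load a markdown file from disk (placeholder path) and split it the same way
loader = TextLoader("docs/README.md")
documents = loader.load()
chunks = MarkdownTextSplitter(chunk_size=500, chunk_overlap=0).split_documents(documents)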
Generate Embeddings for the Chunks
We will use OpenAI to generate embeddings, and LangChain’s OpenAIEmbeddings class provides a wrapper around the OpenAI embedding model. To use this class, first obtain an API key from OpenAI at platform.openai.com and paste it below. Note that API keys should not be hard-coded and should instead be loaded from the environment, but for simplicity, we are passing the API key as a parameter to the constructor.
from langchain.embeddings import OpenAIEmbeddings

OPENAI_API_KEY = ""
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
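For reference, a more production-ready setup would read the key from an environment variable instead of hardcoding it. A minimal sketch, assuming the key is exported as OPENAI_API_KEY:

import os
from langchain.embeddings import OpenAIEmbeddings

# read the key from the environment rather than hardcoding it
embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))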
Store the Embeddings in a Vector Database
We are using the Pinecone database to store embeddings, so you will also need an API key from Pinecone, which you can get at app.pinecone.io. Normally, we would set the API key in the environment and read it with os.getenv('PINECONE_API_KEY'), but for simplicity, we are hardcoding it. The Pinecone dashboard shows the environment where your API key is valid, which will be something like “us-central1-gcp” or “us-west1-gcp”. Copy and paste the key and environment below:
PINECONE_API_KEY = ""
PINECONE_ENV = ""
import pinecone

# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,
    environment=PINECONE_ENV,
)
After Pinecone is initialized, we create a new index for our chunks. We specify the number of dimensions to match the dimensions that the OpenAI embeddings API generates. Also, as of this writing, Pinecone is experiencing a surge in traffic and may return errors or timeouts, so be patient if you try this on your own, and check your Pinecone dashboard after each operation, as the index may have been created successfully even if you received an error.
index_name = "kbtest-idx"
if index_name not in pinecone.list_indexes():
    # create the index only if it does not already exist
    pinecone.create_index(index_name, dimension=1536)
pinecone.list_indexes()
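If you want to verify that the index dimension matches the embedding model, you can embed a short test string and check its length; 1536 is the dimensionality produced by OpenAI’s default text-embedding-ada-002 model:

# the length of any embedding vector should equal the index dimension (1536)
sample_vector = embeddings.embed_query("dimension check")
print(len(sample_vector))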
Now we load the new index with our chunks using the OpenAI embedding model we created earlier. The from_documents() function returns the Pinecone vector store instance that we can use in subsequent operations.
from langchain.vectorstores import Pinecone

store = Pinecone.from_documents(chunks, embeddings, index_name=index_name)
We can verify that our chunks were stored in Pinecone by querying the index stats and checking that the vector count is 7, the number of chunks generated by the MarkdownTextSplitter. The same information is shown on the Pinecone dashboard.
index = pinecone.Index(index_name=index_name)
index.describe_index_stats()
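As a quick sanity check, you can also compare the reported vector count against the number of chunks we created. The snippet below assumes the stats response exposes a total_vector_count field, which may vary across Pinecone client versions:

stats = index.describe_index_stats()
# expect one vector per chunk (field name may differ across client versions)
print(stats["total_vector_count"], len(chunks))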
Generate User Query Embeddings and Find Relevant Chunks
Now that the vector store has indexed our chunks, we can query it for the chunks most relevant to the user query. To keep things simple, we hardcode the user query we will send to the LLM. We expose the vector store as a retriever. A retriever is a generic LangChain interface that makes it easy to combine a vector store with language models; it exposes a get_relevant_documents method, which accepts a query and returns a list of relevant documents. The k argument specifies the maximum number of results to return, which in our example is 2, meaning we only want the two most relevant text chunks.
user_query = "How can you harness the real power of LLMs?"
retriever = store.as_retriever(search_kwargs={"k": 2})
relevant_chunks = retriever.get_relevant_documents(user_query)
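If you want to inspect what the retriever returned, each result is a LangChain Document whose text is available in its page_content attribute:

# print the beginning of each retrieved chunk
for doc in relevant_chunks:
    print(doc.page_content[:200])
    print("---")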
Inject the Relevant Chunks into the Prompt
The real power of the retriever interface is that we can feed it directly into a chain and leverage the power of LangChain. In this example, we create a RetrievalQA chain, which is a special-purpose chain for question-answering. This chain automatically retrieves the most relevant chunks of text from the retriever and feeds them to the language model as context. To create the chain, we specify the LLM to answer questions and the retriever (which wraps the vector store) to perform the similarity search on the user query. Here we use OpenAI as the LLM and Pinecone as the retriever.
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

qa = RetrievalQA.from_chain_type(
    llm=OpenAI(openai_api_key=OPENAI_API_KEY),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)
qa({"query": user_query})
We also supply a chain type, which defines how the context (the relevant chunks) is injected into the prompt. Here we use the stuff chain type, which means all the relevant chunks are fed to the model without regard for size. Note that stuff is the default, so it does not need to be specified, but we include it here for clarity. We use stuff in this example because we limit the number of injected chunks to 2 and each chunk has a predefined maximum length of 500 characters (set in the MarkdownTextSplitter), so we know we will not exceed the model’s token limit. A real application, however, requires more care to ensure the prompt stays within the model’s maximum length.
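One way to guard against oversized prompts is to estimate the token count of the retrieved chunks with the tiktoken library and fall back to a different chain type, such as map_reduce, when the context is too large. The sketch below is an illustration rather than a complete solution, and the token budget of 3000 is an arbitrary value chosen for this example:

import tiktoken

# rough token estimate for the retrieved context; "cl100k_base" is the encoding
# used by recent OpenAI models, but verify the right encoding for your model
encoding = tiktoken.get_encoding("cl100k_base")
context_tokens = sum(len(encoding.encode(doc.page_content)) for doc in relevant_chunks)

# 3000 is an arbitrary budget for illustration; pick one that fits your model
chain_type = "stuff" if context_tokens < 3000 else "map_reduce"
qa = RetrievalQA.from_chain_type(
    llm=OpenAI(openai_api_key=OPENAI_API_KEY),
    chain_type=chain_type,
    retriever=retriever,
)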
Lastly, note that when using the RetrievalQA chain, it is not necessary to call the retriever’s get_relevant_documents() method ourselves, as we did above; the chain does this automatically under the hood when it is executed.
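Because we passed return_source_documents=True, the chain returns a dictionary that contains both the generated answer (under the result key) and the chunks that were injected as context (under source_documents), so you can display them like this:

result = qa({"query": user_query})
print(result["result"])  # the LLM's answer
for doc in result["source_documents"]:
    print(doc.page_content[:100])  # the chunks used as context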
A Robust and Versatile Framework
LangChain stands out as a powerful framework, offering a user-friendly interface that conceals the underlying complexity. With just under 20 lines of code, we integrated a document with the OpenAI LLM, enabling users to interact with the document through natural language. The framework’s extensive ecosystem of integrations simplifies loading folders of documents, handling various file types, and tapping into external content sources. I am eager to dive deeper into LangChain’s capabilities, particularly its chains and agents, which enable the development of highly sophisticated AI-driven applications.