This is the third article in a series about enabling the ChatGPT language model to tap into your organization’s data, allowing users to engage in natural language conversations with the data. In this post, we continue our exploration of the LangChain framework for developing applications powered by language models. LangChain provides a powerful set of tools to work with language models, integrate them with various external data sources, and enable the models to interact with their environment.
We will follow the same general approach as in the previous two blog posts, which consists of four steps: tokenizing our data, generating embeddings for the tokens, storing the embeddings in a vector database, and providing context to the OpenAI model by injecting the relevant chunks of our document into the prompt. The last step will leverage agents in LangChain.
I encourage you to read the introductory post and the previous post of this series, as they will provide valuable context as we go through the code. You can find all code for this post in this notebook, along with sample outputs.
Using Unstructured and FAISS
In my last post, I used LangChain’s readme file in markdown format to represent the data, and we dove deeper into LangChain’s integration with OpenAI and the Pinecone vector store. In this post, we will use a PDF version of the same markdown file and employ Unstructured to parse the PDF into semantic chunks. The unstructured library provides open-source components for pre-processing text documents, such as PDFs, HTML, and Word documents, to build pipelines that partition, clean, and stage documents for downstream tasks, such as ML inference and data labeling.
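As a quick aside, here is a minimal sketch of calling unstructured directly, outside of LangChain, to see the semantic elements it extracts. It assumes the langchain_readme.pdf file we download later in this post is already in the working directory.
from unstructured.partition.pdf import partition_pdf

# Partition the PDF into semantic elements (Title, NarrativeText, ListItem, etc.)
elements = partition_pdf(filename="langchain_readme.pdf")
for element in elements[:5]:
    print(type(element).__name__, "-", str(element)[:80])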
We will also use Facebook AI Similarity Search (FAISS) to store the embeddings. FAISS is an open source similarity search library developed by Facebook AI and written in C++ with bindings for Python. FAISS is not a vector database that can permanently store embeddings, but rather an in-memory index of embeddings.
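To make the “in-memory index” point concrete, below is a minimal standalone FAISS sketch that uses random vectors in place of real embeddings; the dimension and vector counts are illustrative assumptions.
import numpy as np
import faiss

d = 1536  # illustrative embedding dimension
doc_vectors = np.random.random((100, d)).astype('float32')  # stand-ins for document embeddings
query_vector = np.random.random((1, d)).astype('float32')   # stand-in for a query embedding

index = faiss.IndexFlatL2(d)  # exact nearest-neighbor search on L2 distance
index.add(doc_vectors)        # the index lives entirely in memory
distances, ids = index.search(query_vector, 3)  # find the 3 closest document vectors
print(ids)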
Exploring Autonomous Agents
The most interesting aspect of this post is our exploration of LangChain agents. If you are following the space, you have likely heard of BabyAGI or Auto-GPT. These projects aim to create autonomous AIs that are able to create, prioritize, and execute tasks based on a predefined goal or objective.
LangChain’s agents are similar in nature, but less grandiose in purpose. They are based on the paper ReAct: Synergizing Reasoning and Acting in Language Models by Yao et al., published in October 2022, which describes an approach that leverages the abilities of LLMs for reasoning and acting (e.g. planning) to interact with external sources and generate human-like task-solving behaviors.
LangChain’s agents use a large language model as a reasoning engine and connect it to two components: tools and memory. Tools connect the language model to external data sources or computational resources, allowing it to access up-to-date information and perform actions, such as running code or modifying files. Memory enables the agent to recall previous interactions with other entities or tools; these memories can be short-term or long-term and help the agent make informed decisions based on past experiences.
An agent operates through a cyclical process: the user assigns a task, the agent thinks about what to do, decides on an action (selecting a tool and its input), observes the output of the tool, and repeats these steps until the agent deems the task complete. This approach allows agents to be adaptive, responsive, and effective in accomplishing tasks by leveraging both their internal knowledge and external resources.
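The toy sketch below, which is not LangChain code, illustrates that loop; the hard-coded toy_llm function stands in for a real reasoning model.
from typing import Callable, Dict

def toy_llm(scratchpad: str) -> str:
    # Stand-in reasoner: call the calculator once, then finish
    if "Observation" not in scratchpad:
        return "Action: calculator\nAction Input: 2+2"
    return "Final Answer: 2+2 equals 4"

def run_agent(task: str, tools: Dict[str, Callable[[str], str]], max_steps: int = 5) -> str:
    scratchpad = f"Task: {task}\n"
    for _ in range(max_steps):
        decision = toy_llm(scratchpad)  # think and decide on the next action
        if decision.startswith("Final Answer:"):  # the agent deems the task complete
            return decision.split("Final Answer:")[1].strip()
        tool_name = decision.split("Action: ")[1].split("\n")[0]
        tool_input = decision.split("Action Input: ")[1]
        observation = tools[tool_name](tool_input)  # act by running the chosen tool
        scratchpad += f"{decision}\nObservation: {observation}\n"  # remember the step
    return "No answer within the step limit"

print(run_agent("What is 2+2?", {"calculator": lambda expr: str(eval(expr))}))  # eval is fine for a toy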
We will create an agent with a tool that gives it access to the information in the LangChain readme file; the agent will autonomously decide whether to use the tool, depending on the question the user asks.
Tokenizing the PDF File into Chunks
After installing all dependencies, we use LangChain’s UnstructuredPDFLoader to tokenize the PDF file into semantically structured elements. The PDF file is hosted in my GitHub repo, so we use the wget utility to download the file from GitHub into our local environment for parsing.
!wget https://raw.githubusercontent.com/enrtrav/blog/main/personalize_ai/langchain_readme.pdf
from langchain.document_loaders import UnstructuredPDFLoader
# mode='elements' keeps each semantic element as its own chunk
pdf_loader = UnstructuredPDFLoader('langchain_readme.pdf', mode='elements')
chunks = pdf_loader.load()
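It is worth inspecting what the loader produced: each chunk is a LangChain Document carrying the element's text along with metadata such as its category and source file.
print(f"Loaded {len(chunks)} chunks")
print(chunks[0].page_content)  # the first element's text
print(chunks[0].metadata)      # e.g. source file and element category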
Generating Embeddings for the Chunks
Embeddings are necessary to store the chunks in a vector database. We use OpenAI to generate embeddings, and LangChain’s OpenAIEmbeddings class provides a wrapper around the OpenAI embedding model. To obtain an API key from OpenAI, visit platform.openai.com, and then paste the key below. Note that API keys should not be hardcoded and should instead be loaded from the environment; however, for simplicity in this example, we pass the API key as a parameter to the constructor.
from langchain.embeddings.openai import OpenAIEmbeddings
OPENAI_API_KEY = ""  # paste your OpenAI API key here
embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY)
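As a quick sanity check, we can embed a single string ourselves; note that this call sends the text to the OpenAI API and incurs a (tiny) charge.
query_vector = embeddings.embed_query("What is LangChain?")
print(len(query_vector))  # OpenAI's ada-002 embeddings have 1,536 dimensions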
Storing the Embeddings in a FAISS Index
We store the generated embeddings in a FAISS in-memory index. Although FAISS is not a vector database that can permanently store embeddings, we can still save the index to disk and reload it to make it persistent across sessions. To use the index in LangChain, we expose it as a retriever, a generic LangChain interface that makes it easy to combine a vector store with language models.
import os
from langchain.vectorstores import FAISS

idx_name = 'faiss_idx'
if os.path.exists(idx_name):
    # Reload a previously saved index to avoid re-embedding the chunks
    faiss_idx = FAISS.load_local(folder_path=idx_name, embeddings=embeddings)
    print("Index loaded from disk")
else:
    # Embed the chunks, build a new in-memory index, and persist it to disk
    faiss_idx = FAISS.from_documents(documents=chunks, embedding=embeddings)
    faiss_idx.save_local(folder_path=idx_name)
    print("New index created and saved to disk")
retriever = faiss_idx.as_retriever(search_kwargs={"k": 3})  # return the 3 most relevant chunks per query
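Before handing the retriever to a chain, we can verify it returns sensible chunks; the query below is just an example.
relevant_chunks = retriever.get_relevant_documents("What is LangChain?")
for chunk in relevant_chunks:
    print(chunk.page_content[:100])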
Creating a LangChain Agent
We start by creating a tool our agent can use to answer questions about LangChain. The tool uses OpenAI as its underlying language model and a LangChain RetrievalQA chain for question-answering. The QA chain automatically generates the embedding for the user query, uses the retriever interface of the FAISS index to find the most relevant chunks of text from the index, and injects the chunks into the prompt to provide context to the LLM for answering the question.
from langchain import OpenAI
from langchain.chains import RetrievalQA
from langchain.agents import Tool
openai_llm = OpenAI(
    openai_api_key=OPENAI_API_KEY,
    temperature=0  # deterministic answers
)
qa_llm_chain = RetrievalQA.from_chain_type(llm=openai_llm, chain_type="stuff", retriever=retriever)
langchain_tool = Tool(
    name='LangChain',
    func=qa_llm_chain.run,
    description='You must use this tool to answer questions about LangChain'
)
The agent also needs to answer general questions, so we create a second tool called ‘Language Model’, also backed by an OpenAI LLM. The agent will decide which tool to use based on the question.
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

prompt = PromptTemplate(
    input_variables=["query"],
    template="{query}"  # pass the user's query through to the LLM unchanged
)
llm_chain = LLMChain(llm=openai_llm, prompt=prompt)
llm_tool = Tool(
    name='Language Model',
    func=llm_chain.run,
    description='Use this tool for general purpose queries not about LangChain'
)
We can now create our agent, giving it a type, a list of tools to use, and the LLM that will be used to decide which tool to use. Our agent type is zero-shot, meaning the agent relies solely on the tool descriptions to decide what to do for the current query and retains no memory of previous interactions.
from langchain.agents import AgentType, initialize_agent

agent = initialize_agent(
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    tools=[langchain_tool, llm_tool],
    llm=openai_llm,  # the LLM the agent uses to decide which tool to call
    verbose=True,    # print the agent's reasoning steps
)
It is important to note that in this example, we use the same LLM (openai_llm) for the agent and both tools. The LangChain tool’s RetrievalQA chain uses its LLM to answer questions about LangChain, providing context from the index. The Language Model tool uses its LLM to answer general-purpose questions. And the agent uses its LLM to decide which tool to use to answer a question. However, in practice, agents often use tools backed by domain-specific LLMs or no LLMs at all.
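For example, a variation of this setup might back the general-purpose tool with a more creative model while keeping the deterministic one for retrieval; the temperature below is an illustrative assumption, not a recommendation.
creative_llm = OpenAI(openai_api_key=OPENAI_API_KEY, temperature=0.7)  # higher temperature for open-ended answers
llm_tool = Tool(
    name='Language Model',
    func=LLMChain(llm=creative_llm, prompt=prompt).run,
    description='Use this tool for general purpose queries not about LangChain'
)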
Evaluating the Agent’s Performance
Let’s test our agent by asking it a couple of questions. First, we ask the agent a general question and ensure it uses the Language Model tool to provide the answer. Because we set verbose=True when creating the agent, we can see below the detailed cyclical process the agent used to arrive at the answer.
agent.run("What is Artifical Intelligence?")
Entering new AgentExecutor chain…
Artificial Intelligence is a broad field of study.
Thought: I should use a tool to help me answer this question.
Action: Language Model
Action Input: What is Artificial Intelligence?
Observation:
Artificial Intelligence (AI) is a branch of computer science that focuses on creating intelligent machines that can think and act like humans. AI systems are designed to learn from their environment and experiences, and to use that knowledge to solve problems and make decisions. AI can be used to automate tasks, improve decision-making, and create new products and services.
Thought: I now know the final answer.
Final Answer: Artificial Intelligence is a branch of computer science that focuses on creating intelligent machines that can think and act like humans. AI systems are designed to learn from their environment and experiences, and to use that knowledge to solve problems and make decisions. AI can be used to automate tasks, improve decision-making, and create new products and services.
Finished chain.
'Artificial Intelligence is a branch of computer science that focuses on creating intelligent machines that can think and act like humans. AI systems are designed to learn from their environment and experiences, and to use that knowledge to solve problems and make decisions. AI can be used to automate tasks, improve decision-making, and create new products and services.'
Next, let’s ask it a question about LangChain. As you can see below, at first, the agent used the Language Model tool to determine what an LLM is, and although the answer it got from the tool is factually correct, it refers to the Master of Laws degree rather than large language models. Fortunately, the agent followed up with the LangChain tool, which queried the PDF index for context and was able to provide a fairly decent answer based on the limited LangChain PDF documentation it had available.
agent.run("How can LangChain harness the real power of LLMs?")
Entering new AgentExecutor chain…
I need to understand what LLMs are and how LangChain can use them.
Action: Language Model
Action Input: What is an LLM?
Observation:
LLM stands for "Master of Laws," and it is a postgraduate degree in law. It is typically a one-year program that is designed for students who already have a law degree and want to specialize in a particular area of law. LLM programs are offered by many law schools around the world.
Thought: I now understand what LLMs are and how LangChain can use them.
Action: LangChain
Action Input: How can LangChain harness the real power of LLMs?
Observation: LangChain provides a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications, allowing developers to combine LLMs with other sources of computation or knowledge to create powerful applications.
Thought: I now know the final answer.
Final Answer: LangChain can harness the real power of LLMs by providing a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications, allowing developers to combine LLMs with other sources of computation or knowledge to create powerful applications.
Finished chain.
'LangChain can harness the real power of LLMs by providing a standard interface for chains, lots of integrations with other tools, and end-to-end chains for common applications, allowing developers to combine LLMs with other sources of computation or knowledge to create powerful applications.'
Harnessing the Power of Agents
In this blog post, we explored the capabilities of LangChain agents to create powerful AI-driven applications that can understand and interact with your organization’s data. By leveraging autonomous agents, we can build more adaptive and responsive AI systems that can efficiently access external resources and accomplish tasks. Agents allow for the seamless integration of language models with unstructured documents, vector storage, and customized tools, enabling personalized AI solutions for a wide range of use cases.