In this post we look at a simple way to extract information from documents. This is often referred to as chatting with your docs: asking questions of your documents directly to obtain information that general-purpose large language models (LLMs) could not deliver. The idea is to split the documents into small chunks and compute the embedding of each chunk. Then, we look for the chunks that are closest (in some norm defined on the embedding space) to our question and prepare a prompt for the LLM based on the text of those chunks plus our question. The advantage is clear: we can use a generic LLM, that is, one that has not been trained or refined on our internal body of knowledge. This also means we can work with confidential information when running the LLM locally.
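As a rough sketch of what happens under the hood (LangChain and Chroma will take care of this for us below), retrieval amounts to ranking the chunks by cosine similarity to the question and pasting the best ones into the prompt; the embed function and the chunks list here are hypothetical placeholders:
import numpy as np

def retrieve(question, chunks, embed, k=5):
    # embed() is a hypothetical function mapping a piece of text to its embedding vector;
    # chunks is a list of (text, embedding) pairs computed beforehand
    q = np.asarray(embed(question))
    scored = []
    for text, vec in chunks:
        v = np.asarray(vec)
        # cosine similarity between the question and the chunk
        scored.append((float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))), text))
    best = [text for _, text in sorted(scored, reverse=True)[:k]]
    # the prompt is simply the retrieved context followed by the question
    return "\n\n".join(best) + "\n\nQuestion: " + question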
We will use LangChain to connect the pieces, OpenAI for the LLM, and Chroma as the vector database. The Python environment is created with the following packages:
python -m venv venv
./venv/Scripts/activate
pip install ipykernel ipywidgets nbconvert matplotlib openai \
langchain unstructured markdown chromadb tiktoken lark
For the documents to chat with, we will use the 1958 French constitution in the official English translation, as provided by the Élysée website. The content of the constitution was saved to a text file in Markdown format. The file contains the 89 articles, some of which have more than one part, and around 11,000 words, or roughly 15,000 tokens.
from IPython.display import display, Markdown
from pathlib import Path
my_display() is a small helper function to print out the LLM output using a different font color and style.
from IPython.display import display, Markdown
def my_display(text):
    display(Markdown('<div style="font-family: monospace; color:#880E4F; padding: 10px">' + text + "</div>"))
doc_path = Path('./french-constitution-en.md')
with open(doc_path, 'r') as f:
    doc = f.read()
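To double-check the figures quoted above, we can count words and tokens directly; tiktoken is already in the environment, and the encoding name below is the one used by recent OpenAI chat models (an assumption, not something the rest of the post depends on):
import tiktoken

# cl100k_base is the encoding used by the gpt-3.5/gpt-4 family
encoding = tiktoken.get_encoding("cl100k_base")
print(f"{len(doc.split()):,} words, {len(encoding.encode(doc)):,} tokens")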
As discussed above, the first step is to split the document into chunks. There are many strategies for doing so; here we use a class that is tuned for Markdown input and splits on titles and articles. This strategy works well for our case, where each article is of modest length and focused on a specific topic.
from langchain.text_splitter import MarkdownHeaderTextSplitter
headers_to_split_on = [
    ("#", "title"),
    ("##", "article"),
]
markdown_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on
)
splits = markdown_splitter.split_text(doc)
print(f"Found {len(splits)} splits.")
Found 109 splits.
The splitter enriches each chunk with the title of the section and the article number, stored in the title and article fields of the metadata. This allows us to connect each chunk back to the part of the original document from which it was taken. For example, chunk #10 has the following metadata:
s = splits[10]
s.metadata
{'title': 'Title II - The President of the Republic', 'article': 'Article 10'}
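The text of the article itself is stored in the page_content attribute of the chunk, so metadata and content can be inspected side by side (the slice to 200 characters is just to keep the output short):
print(s.metadata['article'], '-', s.metadata['title'])
print(s.page_content[:200])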
We will use OpenAI embeddings: for each chunk, we compute the embedding and store it into a vector database. The database stores the embeddings together with the chunks on disk, in the location specified by the persist_directory variable, so embeddings are not recomputed when the notebook kernel is restarted.
from langchain.embeddings.openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings()
from langchain.vectorstores import Chroma
chroma_path = Path('./chroma')
if not chroma_path.exists():
    chroma_path.mkdir()
    vectordb = Chroma.from_documents(
        documents=splits,
        embedding=embedding,
        persist_directory=str(chroma_path),
    )
else:
    vectordb = Chroma(
        embedding_function=embedding,
        persist_directory=str(chroma_path),
    )
assert vectordb._collection.count() == len(splits)
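Before building a full question-answering chain, we can sanity-check the retrieval step on its own: similarity_search returns the k chunks whose embeddings are closest to the query (the question below is just an example):
docs = vectordb.similarity_search("Can the President dissolve the National Assembly?", k=3)
for d in docs:
    print(d.metadata.get('article'), '-', d.metadata.get('title'))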
The metadata can be used in the query itself; this is done with the SelfQueryRetriever class.
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
metadata_field_info = [
    AttributeInfo(
        name="article",
        description="The article number, generally a number between 1 and 89",
        type="string",
    ),
]
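The retriever itself is built from an LLM, the vector store, a one-line description of the document contents, and the metadata description above; a minimal sketch (the document description string is our own wording):
retriever = SelfQueryRetriever.from_llm(
    llm=OpenAI(temperature=0),
    vectorstore=vectordb,
    document_contents="Articles of the 1958 French constitution",
    metadata_field_info=metadata_field_info,
    verbose=True,
)
# a question such as "What does article 49 say?" is turned into a filter on the article field
docs = retriever.get_relevant_documents("What does article 49 say?")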
The first prompt we develop is stateless, that is, each question (and answer) is independent of what was asked before. The lack of history prevents us from asking follow-up questions; we will add history in a moment.
Because we have added metadata, it is customary to use two prompts: document_prompt is the prompt template used to organize the content of each retrieved document (where each document is one of the chunks defined above), while prompt is the actual prompt with our query. We use the document prompt to give each document a specific structure, reporting the title of the section, the article number and its content; the prompt itself describes what we want to obtain.
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name='gpt-4', temperature=0)
from langchain.prompts import PromptTemplate
document_prompt = PromptTemplate(
    input_variables=["title", "article", "page_content"],
    template="""
{title}
{article}: {page_content}
""")
template = """
Use the following pieces of context (delimited by <ctx></ctx>) to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use up to ten sentences maximum; refer to the articles that are used in the answer.
<ctx>
{context}
</ctx>
Question: {question}
Helpful Answer:"""
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template=template)
from langchain.chains import RetrievalQA

def get_answer(question):
    qa_chain = RetrievalQA.from_chain_type(
        llm,
        retriever=vectordb.as_retriever(
            search_type='mmr',
            search_kwargs=dict(k=10, n_k=5),
        ),
        return_source_documents=True,
        chain_type_kwargs={"document_prompt": document_prompt, "prompt": prompt},
    )
    result = qa_chain({"query": question})
    return result
question = "Can the President testify in a trial during the mandate?"
answer = get_answer(question)
my_display(answer['result'])
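Since the chain is built with return_source_documents=True, the result also carries the chunks that were retrieved; listing their metadata shows which articles the answer is based on:
for d in answer['source_documents']:
    print(d.metadata.get('article'), '-', d.metadata.get('title'))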
question = "How is power shared between the President and the Prime Minister?"
answer = get_answer(question)
my_display(answer['result'])
question = "What is the role of the Government?"
answer = get_answer(question)
my_display(answer['result'])
question = "What is the role of the Parliament?"
answer = get_answer(question)
my_display(answer['result'])
Let’s add memory to our conversation. The underlying LLM per se has no memory: each query is stand-alone and independent of the previous ones. Memory is added by including the previous queries and answers in the prompt, so the model can “see” what was discussed before and build the new answer on top of the old ones. It is easy, yet quite tedious, to write such a memory system on our own; using the one provided by LangChain is much preferred. We need a new prompt, prompt2, which adds the previous history to the query.
template = """
Use the following pieces of context (delimited by <ctx></ctx>)
and the chat history (delimited by <hs></hs>) to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use up to ten sentences maximum; refer to the articles that are used in the answer.
<ctx>
{context}
</ctx>
<hs>
{history}
</hs>
Question: {question}
Helpful Answer:"""
prompt2 = PromptTemplate(
    input_variables=["context", "history", "question"],
    template=template)
The history input variable is provided by the ConversationBufferMemory class, which we add to our chatbot.
from langchain.memory import ConversationBufferMemory
from langchain.chains import RetrievalQA
class ChatBot:
    def __init__(self, llm, vectordb):
        # keep a reference to the memory so the conversation can be inspected later
        self.memory = ConversationBufferMemory(
            memory_key="history",
            input_key="question",
        )
        self.retriever = vectordb.as_retriever(
            search_type='mmr',
            search_kwargs=dict(k=10, n_k=5),
        )
        self.qa_chain = RetrievalQA.from_chain_type(
            llm,
            retriever=self.retriever,
            return_source_documents=True,
            chain_type_kwargs=dict(
                document_prompt=document_prompt,
                prompt=prompt2,
                verbose=False,
                memory=self.memory,
            ),
        )

    def get_answer(self, question):
        return self.qa_chain({"query": question})
chat_bot = ChatBot(llm, vectordb)
answer = chat_bot.get_answer('What are the powers of the President?')
my_display(answer['result'])
answer = chat_bot.get_answer('What are the powers of the Prime Minister?')
my_display(answer['result'])
answer = chat_bot.get_answer('What is the difference between the two?')
my_display(answer['result'])
answer = chat_bot.get_answer('What is the role of the National Assembly?')
my_display(answer['result'])
answer = chat_bot.get_answer('What is the role of the Senate?')
my_display(answer['result'])
answer = chat_bot.get_answer('What are the differences between the two?')
my_display(answer['result'])
answer = chat_bot.get_answer('Are the two chambers equal in power?')
my_display(answer['result'])
answer = chat_bot.get_answer("""
What are the specific roles of the Senate compared to that of the National Assembly?
""")
my_display(answer['result'])
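Since the memory is kept as an attribute of the chatbot, the accumulated conversation can be inspected at any time; the buffer is a plain string with the alternating questions and answers:
print(chat_bot.memory.buffer)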
To conclude, LangChain provides a nice way to chat with documents. It gives simple and clear interfaces to vector databases and provides the tools for chatting with memory.