Building a RAG Chatbot GUI with the ChatGPT API and PyMuPDF
Jamie Lemon & Harald Lieder·April 3, 2024

In this tutorial we walk you through creating your own chatbot for the web browser. We use several Python libraries, including PyMuPDF, along with your OpenAI API key, to create a graphical user interface (GUI) that answers a user’s questions about an uploaded PDF document. We demonstrate how to combine backend and frontend technology to deliver an effective solution for the web.
Getting Started
Our solution depends on 3 key libraries as follows:
- LangChain (a framework to construct LLM‑powered apps - used to manage the I/O for the chatbot)
- Gradio (used to create and serve a GUI for the chatbot in the web-browser)
- PyMuPDF (used to load and render the uploaded document for the chatbot)
Essentially, we use LangChain for our backend and Gradio for our frontend, with PyMuPDF as the interface between the two.
Install dependencies
To make sure we have everything needed for both the backend and the frontend, install these dependencies via pip as follows:
pip install -U langchain
pip install -U langchain-community
pip install -U langchain-openai
pip install -U gradio
pip install -U pymupdf
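Before running the demo, it can help to confirm that the packages above are importable. The following is a minimal sketch using only the standard library. The helper name `missing_modules` is our own, and the import names are assumptions based on each package's usual top-level module (PyMuPDF is importable as `pymupdf` in recent releases and as `fitz` in older ones).

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# Import names assumed for the packages installed above.
REQUIRED = ["langchain", "langchain_community", "langchain_openai", "gradio"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    # PyMuPDF: accept either the newer or the older import name.
    if find_spec("pymupdf") is None and find_spec("fitz") is None:
        missing.append("pymupdf")
    print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerun the corresponding `pip install` command.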
Download the source code
Clone or download the example code from https://github.com/pymupdf/RAG. Once you have a local copy, refer to the contents of the “GUI” folder for the Python source code.
Run the demo
Open up your console and from the “GUI” folder, simply run:
python browser-app.py
The demo should run on localhost and serve a GUI as follows:

GUI app in web browser
How the demo works
The demo will allow you to:
- enter an OpenAI API key to be used for the chat session *
- upload a PDF document
- submit queries against the document in an ongoing Q&A session
Try uploading a document and ask the bot “What is this?” You should receive a reply with a summary of the document’s topic. For example:

GUI app in chat session
* Note
Without an OpenAI API key you will not be able to get information from the session as you need permission to access the required services. If you don’t already have an API key, please obtain one from OpenAI.
Explaining the backend code
Let’s go through the main areas of Python code to explain how the demo backend works. This is just for better understanding - the script does not require any adjustments on your part.
Setting the API Key
Initially we provide a function to handle the input of an API key with:
def set_apikey(api_key: str):
    print("API Key set")
    app.OPENAI_API_KEY = api_key
    return disable_box
Note
We will hook this up to our Gradio GUI later.
The App class
We have a single class “my_app” which is instantiated as follows:
app = my_app()
Aside from the constructor and callable methods, the main methods in this class do the following:
process_file
- This uses the PyMuPDFLoader from LangChain to load the PDF document supplied by the user.
build_chain
- This builds the chain with LangChain for the conversational dialogue
Body methods
Within the main body of the Python code (outside of the “my_app” class) we have a few other key methods:
get_response
- This sends queries and chat history to the chain, retrieves the page number with the most relevant answer and yields responses to the front-end.
render_file
- This is called as the user submits queries. If the chatbot responds successfully and the response references a particular document page, the code uses PyMuPDF to render that page for the user.
purge_chat_and_render_first
- This is the first method called after a user uploads a document. It purges any previous chat history and then renders the first page of the document to let the user know that the document is ready. Note that these lines are critical for purging any previous session:
app.chat_history = []
app.count = 0
Without this, a chat session may get confused, as it retains “knowledge” of previous documents and may return unrelated information.
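To make the get_response description above more concrete, here is a minimal sketch of that flow using only the standard library. It is not the demo's actual code: the function name `stream_response`, the `app.page` attribute, and the `chain` interface (a callable that takes "question" and "chat_history" and returns "answer" and "source_documents") are assumptions made for illustration.

```python
from types import SimpleNamespace


def stream_response(app, chain, history, query):
    """Yield the chat turns repeatedly while the answer is revealed.

    `chain` is a stand-in: any callable that accepts a dict with
    "question" and "chat_history" and returns a dict with "answer"
    and "source_documents" (an assumed shape for this sketch).
    """
    if not query.strip():
        raise ValueError("Enter a question first")
    result = chain({"question": query, "chat_history": app.chat_history})
    answer = result["answer"]
    # The stored history is sent with every later query, which is why it
    # has to be purged when a new document is uploaded.
    app.chat_history.append((query, answer))
    # Remember the page of the best-matching source so it can be rendered.
    sources = result.get("source_documents") or []
    if sources:
        app.page = sources[0].metadata.get("page", 0)
    # Stream the answer one character at a time into the newest chat turn.
    partial = ""
    for char in answer:
        partial += char
        yield history + [(query, partial)]


if __name__ == "__main__":
    def fake_chain(inputs):
        # Hypothetical chain result pointing at page index 2.
        page_doc = SimpleNamespace(metadata={"page": 2})
        return {"answer": "A short summary.", "source_documents": [page_doc]}

    app = SimpleNamespace(chat_history=[], page=0)
    for turns in stream_response(app, fake_chain, [], "What is this?"):
        pass
    print(turns[-1], "page:", app.page)
```

Writing the function as a generator lets a front end redraw the chat window after every yielded step, which gives the impression of the answer being typed out.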
Explaining the frontend code
The frontend code is the Gradio portion of the Python code as follows:
with gr.Blocks() as demo:
    with gr.Column():
        with gr.Row():
            with gr.Column(scale=1):
                api_key = gr.Textbox(
                . . .
This code describes the grid layout for the GUI: it organizes an area for the input text fields, an area for the chatbot results, query submission, and an area for the document. Rather than turn this into a separate Gradio tutorial, we advise consulting the Gradio guide to find out more.
One critical part of the Gradio code is where we assign functions and variables to the UI controls. For example, the API key field is wired up as follows:
api_key.submit(
    fn=set_apikey,
    inputs=[api_key],
    outputs=[
        api_key,
    ],
)
This tells the control that the function to call on submission is set_apikey, and the associated inputs and outputs declare the variables to use.
Another critical function, purge_chat_and_render_first, is assigned to the upload button. This function, with its perhaps overly descriptive name, purges the previous chat session and then uses PyMuPDF to render the first page of the document for our GUI.
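Rendering a page with PyMuPDF can be sketched as follows. This is an illustration, not the demo's own code: the function names, the `dpi` default, and the choice to return PNG bytes are assumptions, and the demo may hand Gradio a different image type. PyMuPDF is imported inside the rendering function so that the small index helper can be used on its own.

```python
def clamp_page_index(index, page_count):
    """Keep a requested 0-based page index inside the document's range."""
    if page_count < 1:
        raise ValueError("document has no pages")
    return max(0, min(index, page_count - 1))


def render_page_png(pdf_path, index=0, dpi=150):
    """Render one page of a PDF to PNG bytes with PyMuPDF."""
    # Recent PyMuPDF releases import as `pymupdf`; older ones use `fitz`.
    import pymupdf

    with pymupdf.open(pdf_path) as doc:
        page = doc[clamp_page_index(index, doc.page_count)]
        pix = page.get_pixmap(dpi=dpi)  # rasterize the page
        return pix.tobytes("png")       # encode while the document is open
```

For example, `render_page_png("example.pdf")` (a hypothetical file name) would return the first page as PNG data, and passing the page index reported by the chain would render the page that best matches the answer.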
Finally we queue and launch the demo to start the web application.
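Putting the front-end pieces together, the sketch below shows one way to build, wire, and return a small Gradio app. It is a simplified variation rather than the demo's code: the names `make_set_apikey`, `build_demo`, `state`, and `locked_box` are hypothetical, the key is kept in a plain dict instead of the `app` object, and the placeholder texts are invented. Gradio is imported inside `build_demo` so the handler logic can be exercised without it.

```python
def make_set_apikey(state, locked_box):
    """Build a submit handler that stores the key and returns `locked_box`."""
    def set_apikey(api_key: str):
        state["OPENAI_API_KEY"] = api_key.strip()
        # Whatever is returned here replaces the component listed in `outputs`.
        return locked_box
    return set_apikey


def build_demo():
    """Assemble a minimal Blocks UI; requires Gradio to be installed."""
    import gradio as gr

    state = {}
    # Created outside the Blocks context so it is not drawn in the layout;
    # it only serves as the "key accepted" replacement for the input field.
    locked_box = gr.Textbox(value="OpenAI API key is set", interactive=False)

    with gr.Blocks() as demo:
        with gr.Row():
            with gr.Column(scale=1):
                api_key = gr.Textbox(
                    placeholder="Enter OpenAI API key and press Enter",
                    show_label=False,
                    interactive=True,
                )
        api_key.submit(
            fn=make_set_apikey(state, locked_box),
            inputs=[api_key],
            outputs=[api_key],
        )
    return demo
```

To start the app from this sketch, call `demo = build_demo()`, then `demo.queue()` and `demo.launch()`.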
Conclusion
We hope we’ve shown how existing Python technology makes it relatively easy to create an interactive chatbot. Please let us know on our GitHub if you encounter any bugs, have any suggestions, or think of further enhancements.