Building a RAG Chatbot GUI with the ChatGPT API and PyMuPDF

Jamie Lemon & Harald Lieder·April 3, 2024

PyMuPDF, RAG, LLM

In this article

Getting Started
How the demo works
- Explaining the backend code
- Explaining the frontend code
Conclusion
- Source Code

In this tutorial we walk you through creating your own chatbot for the web browser. We use several Python libraries, including PyMuPDF, together with your OpenAI API key to build a graphical user interface (GUI) that answers a user’s questions about an uploaded PDF document. Along the way we demonstrate how to combine backend and frontend technology to deliver an effective solution for the web.

Getting Started

Our solution depends on three key libraries:

LangChain (a framework to construct LLM‑powered apps - used to manage the I/O for the chatbot)
Gradio (used to create and serve a GUI for the chatbot in the web-browser)
PyMuPDF (used to load and render the uploaded document for the chatbot)

Essentially, we use LangChain for the backend and Gradio for the frontend, with PyMuPDF as the essential interface between the two.

Install dependencies

To make sure we have everything needed for both the backend and the frontend, install these dependencies with pip:

pip install -U langchain
pip install -U langchain-community
pip install -U langchain-openai
pip install -U gradio
pip install -U pymupdf
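
Before running the demo, it can be handy to confirm that the packages are visible to your Python interpreter. The following is a minimal sketch, not part of the example repository; the helper name missing_modules is our own, and the import names are listed on the assumption that a recent release of each package is installed.

```python
from importlib.util import find_spec

# Import names differ from pip names: "langchain-community" is imported
# as "langchain_community", and so on. Older PyMuPDF releases are
# imported as "fitz" rather than "pymupdf".
REQUIRED_MODULES = [
    "langchain",
    "langchain_community",
    "langchain_openai",
    "gradio",
    "pymupdf",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

If anything is reported as missing, rerun the matching pip command above in the same environment.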

Download the source code

Clone or download the example code from https://github.com/pymupdf/RAG. Once you have a local copy, refer to the “GUI” folder for the Python source code.

Run the demo

Open your console and, from the “GUI” folder, run:

python browser-app.py

The demo should run on localhost and serve a GUI as follows:

[Image: GUI app in web browser]

How the demo works

The demo will allow you to:

enter an OpenAI API key to be used for the chat session *
upload a PDF document
submit queries against the document in an ongoing Q&A session

Try uploading a document and ask the bot “What is this?” You should receive a reply with a summary of the document’s topic. For example:

[Image: GUI app in chat session]

* Note

Without an OpenAI API key you will not be able to get information from the session as you need permission to access the required services. If you don’t already have an API key, please obtain one from OpenAI.

Explaining the backend code

Let’s go through the main areas of Python code to explain how the demo backend works. This is just for better understanding - the script does not require any adjustments on your part.

Setting the API Key

First, we provide a function to handle the input of an API key:

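def set_apikey(api_key: str):
    print("API Key set")
    # Store the key on the shared app instance for later use.
    app.OPENAI_API_KEY = api_key
    # disable_box is defined elsewhere in the script.
    return disable_box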

Note

We will hook this up to our Gradio GUI later.

The App class

We have a single class “my_app” which is instantiated as follows:

app = my_app()
Aside from the constructor and callable methods, the main methods of this class do the following:
process_file
- This uses the PyMuPDFLoader from LangChain to load the PDF document supplied by the user.
build_chain
- This builds the chain with LangChain for the conversational dialogue
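
The class body is not reproduced in this article, so the following is only a sketch of one plausible shape for a callable class of this kind: it builds its chain lazily on the first call and reuses it afterwards. The names LazyChainApp and fake_build_chain are hypothetical, and the placeholder builder returns a string instead of a real LangChain chain.

```python
class LazyChainApp:
    """Hypothetical stand-in for the demo's app class (not the real code)."""

    def __init__(self, build_chain):
        self.build_chain = build_chain  # function that turns a file into a chain
        self.chain = None
        self.chat_history = []
        self.count = 0  # 0 means "no chain built for the current document"

    def __call__(self, file_path):
        # Build the chain only once per document; later calls reuse it.
        if self.count == 0:
            self.chain = self.build_chain(file_path)
            self.count += 1
        return self.chain


def fake_build_chain(file_path):
    # Placeholder: a real builder would load the PDF and create a chain.
    return f"chain for {file_path}"


app = LazyChainApp(fake_build_chain)
first = app("report.pdf")
second = app("report.pdf")  # reuses the chain built above
```

Under this assumption, setting the counter back to 0 is what forces a fresh chain to be built for the next document.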

Body methods

Within the main body of the Python code (outside of the “my_app” class) we have a few other key functions:

get_response
- This sends queries and chat history to the chain, retrieves the page number with the most relevant answer and yields responses to the front-end.
render_file
- This is called as the user submits queries. If the chatbot responds successfully and the response references a particular document page, the code uses PyMuPDF to render that page for the user.
purge_chat_and_render_first
- This is the first method called after a user uploads a document. It purges any previous chat history and then renders the first page of the document to let the user know that the document is ready. Note that these lines are critical for purging any previous session:

app.chat_history = []
app.count = 0

Without this, a chat session may get confused because it retains “knowledge” of previous documents and may return unrelated information.
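
To make the flow concrete, here is a small, self-contained sketch of the query and purge cycle described above. It is not the demo’s code: fake_chain is a hypothetical stand-in for the LangChain chain, the Session class and the "page" key are our own inventions, and the character-by-character yielding is just one way a generator can feed partial responses to a frontend.

```python
def fake_chain(inputs):
    """Hypothetical stand-in for the chain; returns a canned result."""
    return {"answer": f"You asked: {inputs['question']}", "page": 2}


class Session:
    def __init__(self):
        self.chat_history = []  # (question, answer) pairs passed back to the chain
        self.count = 0          # only reset here, mirroring the two lines above
        self.page = 0           # page the frontend should render next


def get_response(session, history, query):
    """Yield the visible chat history as the answer is revealed bit by bit."""
    result = fake_chain({"question": query, "chat_history": session.chat_history})
    session.chat_history.append((query, result["answer"]))
    session.page = result["page"]
    history = history + [[query, ""]]
    for char in result["answer"]:
        history[-1][1] += char
        yield history


def purge(session):
    """Forget everything tied to the previous document."""
    session.chat_history = []
    session.count = 0
    session.page = 0
```

Calling purge whenever a new document is uploaded means the next query reaches the chain with an empty history.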

Explaining the frontend code

The frontend code is the Gradio portion of the Python code as follows:

with gr.Blocks() as demo:
    with gr.Column():
        with gr.Row():
            with gr.Column(scale=1):
                api_key = gr.Textbox(
                  . . .

This code describes the grid layout for the GUI: it arranges areas for the input text fields, the chatbot results, query submission, and the document. Rather than turning this into a separate Gradio tutorial, we recommend the Gradio guide for more detail.

One critical part of the Gradio code is where we assign functions and variables to the UI controls. For example, the API key field is wired up as follows:

api_key.submit(
    fn=set_apikey,
    inputs=[api_key],
    outputs=[
        api_key,
    ],
)

This tells the control to call set_apikey when the field is submitted; the inputs and outputs lists declare which components supply the function’s arguments and which receive its return value.
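
If you want to experiment with this wiring in isolation, a stripped-down sketch might look like the following. It is not taken from the demo: remember_key, settings, build_demo and the separate status textbox are our own simplifications, and Gradio is imported inside build_demo purely so that the handler can be tried without Gradio installed.

```python
settings = {}  # hypothetical place to keep the key for the session


def remember_key(api_key: str) -> str:
    """Store the key and return the text to show in the status box."""
    settings["OPENAI_API_KEY"] = api_key.strip()
    return "API key set"


def build_demo():
    # Imported here so remember_key can be tried out without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        api_key = gr.Textbox(label="OpenAI API key")
        status = gr.Textbox(label="Status", interactive=False)
        # On Enter, call remember_key with the value of api_key (inputs)
        # and write its return value into status (outputs).
        api_key.submit(fn=remember_key, inputs=[api_key], outputs=[status])
    return demo
```

Calling build_demo().launch() would serve this two-field page locally. Unlike the demo, which writes the result back to the API key field itself, this sketch reports to a separate status box to keep the data flow easy to follow.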

Another critical function is assigned to the upload button with purge_chat_and_render_first - this, perhaps overly descriptive, function purges the previous chat session and then uses PyMuPDF to render the first page of the document for our GUI.
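
For readers curious about the rendering step, the sketch below shows one way to turn a PDF page into PNG bytes with PyMuPDF. It is an illustration under our own assumptions rather than the demo’s implementation: the function names, the 150 dpi default and the clamping helper are hypothetical, and a recent PyMuPDF release that supports "import pymupdf" is assumed.

```python
def clamp_page(page_number: int, page_count: int) -> int:
    """Keep a zero-based page number inside the document's valid range."""
    return max(0, min(page_number, page_count - 1))


def render_page_png(pdf_path: str, page_number: int = 0, dpi: int = 150) -> bytes:
    """Render one page of a PDF to PNG bytes (first page by default)."""
    # Imported here so clamp_page can be tried out without PyMuPDF installed.
    import pymupdf  # older releases expose the same API as "import fitz"

    with pymupdf.open(pdf_path) as doc:
        page = doc[clamp_page(page_number, doc.page_count)]
        pixmap = page.get_pixmap(dpi=dpi)
        return pixmap.tobytes("png")
```

Calling render_page_png("example.pdf") on a PDF of your own returns the first page as PNG data, which an image component in the GUI could then display.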

Finally we queue and launch the demo to start the web application.

Conclusion

We hope we’ve shown how existing Python technology makes it relatively easy to create an interactive chatbot. Please let us know on our GitHub if you encounter any bugs, have any suggestions, or think of further enhancements.

Source Code

See: https://github.com/pymupdf/RAG