Exercise 9

  1. Implement a simple retriever using BM25S (pip install bm25s): https://github.com/xhluca/bm25s
  2. Use PyMuPDF to implement a PDF reader.
  3. Implement a text chunker that splits text into chunks controlled by the chunk_size and chunk_overlap parameters.
  4. Create a pipeline that reads a PDF file, splits it into chunks, and loads the chunks into the retriever.
  5. Implement a Streamlit-based UI for retrieving and displaying the top_k most relevant chunks to a given query.
  6. (Optional) Add an LLM from Mistral to create a RAG chatbot experience.
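
As a starting point for step 3, here is a minimal sketch of a chunker. The exercise does not say whether chunk_size and chunk_overlap count characters, words, or tokens; this sketch assumes characters. The function name chunk_text and the default values are illustrative, not part of the exercise.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into character-based chunks.

    Assumption: sizes are measured in characters. Each chunk has at most
    chunk_size characters, and consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text, so no
        # trailing chunk is emitted that only repeats the overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2):
        print(chunk)
```

In the pipeline of step 4, the text extracted from each PDF page could be passed through a function like this, and the resulting list of chunks used as the corpus for the retriever. Splitting on word or sentence boundaries instead of raw character offsets is a reasonable refinement.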