Leveraging Retrieval-Augmented Generation in Local Library Systems: The BiblioGPT Prototype
Date
2025-12-02
Publisher
INFLIBNET Centre Gandhinagar
Abstract
In today's digital age, library users often seek concept-oriented information that keyword
searching cannot easily provide, particularly when dealing with complex classification
schemes such as the Dewey Decimal Classification (DDC) or the Library of Congress
Classification (LCC). This work introduces BiblioGPT, a locally deployed conversational
search application designed to fill this gap by supporting natural language queries over
structured library knowledge. The architecture combines the open-source Mistral language
model with a Retrieval-Augmented Generation (RAG) pipeline through the WARC-GPT
framework, which ingests and semantically processes WARC (Web ARChive) files as searchable
content using vector embeddings (Chroma) and Groq-based inference. An easy-to-use
interface enables smooth interaction, returning contextually appropriate and accurate responses.
Two case studies, a theoretical and a practical one, demonstrate the prototype's capability
to correctly interpret and answer questions, including assigning the proper DDC numbers to
book titles. Although trained on a smaller dataset than the popular cloud models, BiblioGPT
maintained stable performance while protecting user privacy through local deployment. The
results confirm BiblioGPT's promise as a privacy-protecting, scalable solution that reinvents
library system interaction, moving from inflexible keyword searching to flexible, smarter,
and natural language-supported information retrieval. This paper describes a visionary
strategy for digital library services, setting BiblioGPT as a model for future domain-specific
AI-based library software.
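
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. This is not the BiblioGPT or WARC-GPT implementation: the passages, the function names (`embed`, `retrieve`, `build_prompt`), and the bag-of-words similarity are all assumptions chosen so the example runs with the Python standard library alone. In the prototype, the embedding and storage step would be handled by a vector store such as Chroma, and the assembled prompt would be sent to the Mistral model.

```python
import math
import re
from collections import Counter

# Hypothetical passages standing in for text extracted from WARC files.
PASSAGES = [
    "DDC 004 covers computer science, data processing and computing in general.",
    "DDC 020 covers library and information sciences.",
    "DDC 510 covers mathematics, including algebra and geometry.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy 'embedding': a bag-of-words term count (assumption, not a neural model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, context):
    """Assemble the prompt that would be passed to the language model."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "Which DDC class covers mathematics and algebra?"
    print(build_prompt(question, retrieve(question, PASSAGES)))
```

Grounding the prompt in retrieved passages is what lets a small, locally hosted model answer classification questions without sending user queries to a cloud service.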
Description
14th International CALIBER 2025, Sri Venkateswara University, Tirupati, Andhra Pradesh, November 17-19, 2025
Keywords
BiblioGPT, Dewey Decimal Classification, Generative Pretrained Transformer, Large Language Model, Library of Congress Classification, Mistral, Ollama, Retrieval Augmented Generation, Web ARChive