Leveraging Retrieval-Augmented Generation in Local Library Systems: The BiblioGPT Prototype

Date

2025-12-02

Publisher

INFLIBNET Centre Gandhinagar

Abstract

Library users often seek concept-oriented information that keyword searching cannot easily provide, particularly when dealing with complex classification schemes such as the Dewey Decimal Classification (DDC) or the Library of Congress Classification (LCC). This work introduces BiblioGPT, a locally deployed conversational search application designed to fill that gap by supporting natural language queries over structured library knowledge. The architecture combines the open-source Mistral language model with a Retrieval-Augmented Generation (RAG) pipeline built on the WARC-GPT framework, which ingests WARC (Web ARChive) files and processes them semantically into searchable content using vector embeddings (Chroma) and Groq-based inference. An easy-to-use interface supports smooth interaction and returns accurate, contextually appropriate responses. Two case studies, one theoretical and one practical, demonstrate the prototype's ability to interpret and answer questions correctly, including assigning the proper DDC numbers to book titles. Although trained on a smaller dataset than popular cloud models, BiblioGPT maintained stable performance while protecting user privacy through local deployment. The results confirm BiblioGPT's promise as a privacy-protecting, scalable solution that reshapes interaction with library systems, moving from inflexible keyword searching to flexible, natural language-based information retrieval. This paper describes a forward-looking strategy for digital library services and positions BiblioGPT as a model for future domain-specific AI-based library software.
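
The retrieve-then-generate flow named in the abstract can be illustrated with a minimal sketch. It is not the BiblioGPT or WARC-GPT implementation: the passages, the function names (embed, cosine, retrieve, build_prompt), and the bag-of-words similarity are all assumptions made for illustration. A real pipeline would use a neural embedding model and a vector store such as Chroma, and would send the assembled prompt to a language model such as Mistral.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for content ingested from WARC files.
PASSAGES = [
    "DDC class 004 covers computer science and data processing.",
    "DDC class 510 covers mathematics.",
    "DDC class 780 covers music.",
]

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a neural embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, passages, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages, k=1):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(query, passages, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    print(build_prompt("Which DDC class covers mathematics?", PASSAGES))
```

The point of the sketch is the ordering of steps: the query is matched against stored passages first, and only the best matches are placed in the prompt, so the model answers from local content.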

Description

14th International CALIBER 2025, Sri Venkateswara University, Tirupati, Andhra Pradesh, November 17-19, 2025

Keywords

BiblioGPT, Dewey Decimal Classification, Generative Pretrained Transformer, Large Language Model, Library of Congress Classification, Mistral, Ollama, Retrieval Augmented Generation, Web ARChive

Citation