A Document Reconstruction System for Transferring Bengali Paper Documents into Rich Text Format

INFLIBNET's Institutional Repository

dc.contributor.author Chaudhuri, Anirban Ray en_US
dc.contributor.author Singh, Debnath en_US
dc.contributor.author Nasipuri, Mita en_US
dc.contributor.author Basu, Dipak Kumar en_US
dc.date.accessioned 2005-05-10T06:54:33Z en_US
dc.date.accessioned 2010-04-08T08:47:52Z
dc.date.available 2005-05-10T06:54:33Z en_US
dc.date.available 2010-04-08T08:47:52Z
dc.date.issued 2005-02-02 en_US
dc.identifier.isbn 81-902079-0-3 en_US
dc.identifier.uri http://hdl.handle.net/1944/495 en_US
dc.description.abstract Transforming a scanned paper document into an editable form suitable for further processing, such as desktop publishing or archiving in a digital library, is a complex process. It requires solutions to several problems: document analysis, in which a Page Layout Analyzer (PLA) acquires knowledge of the document layout, followed by document recognition, which mainly comprises text recognition by Optical Character Recognition (OCR). A third important problem is document reconstruction: transforming the content into an electronically editable format while keeping the original layout intact. Core OCR modules exist for different Indian scripts, but no such document reconstruction system is available for these scripts. The document reconstruction system reported in this paper is the first of its kind for Indian scripts, and it addresses document reconstruction for Bengali document images. The system uses both the document layout extracted by a PLA in a graphical user interface (GUI) and the results of the text recognition steps performed by OCR to transform paper documents into Rich Text Format. en_US
dc.format.extent 528075 bytes en_US
dc.format.mimetype application/pdf en_US
dc.language.iso en en_US
dc.publisher INFLIBNET Centre en_US
dc.subject Indian Scripts en_US
dc.subject Desktop Publishing en_US
dc.subject Page Layout Analysis en_US
dc.subject Optical Character Recognition en_US
dc.subject Document Reconstruction en_US
dc.subject Encoding Standard en_US
dc.subject Indian Language en_US
dc.title A Document Reconstruction System for Transferring Bengali Paper Documents into Rich Text Format en_US
dc.type Article en_US

Files in this item

File Size Format
05cali_4.pdf 528.0 Kb PDF