Implementation of a Web Based Text Extraction Tool using Open Source Object Models

dc.contributor.author: S, Rajeswari
dc.contributor.author: M, Saibaba
dc.date.accessioned: 2017-08-08T07:06:03Z
dc.date.available: 2017-08-08T07:06:03Z
dc.date.issued: 2017-08
dc.description.abstract: The library in our institute is the repository of all reports and design documents, and it preserves reports from the past three decades. Present-day scanned reports are all searchable, but scanned reports that are two decades old are not. These older reports are very important and are constantly referred to by scientists and engineers for fast breeder design purposes. It was therefore decided that all these documents would be scanned and added to the collection. During scanning, it was found that skew was being introduced into the documents; OCR tools did not deskew them directly, so a pre-processing step had to be applied. An optical character recognition (OCR) tool extracts text from images and scanned documents. It was therefore decided to develop an OCR tool using open source libraries and to add the necessary de-skewing methods to make it useful for our task. This paper discusses what an OCR tool is, the steps necessary to extract text from image files using OCR, the development and evaluation of the OCR tool, and the de-skewing method implemented in it. [en_US]
dc.identifier.isbn: 978-93-81232-07-1
dc.identifier.uri: https://ir.inflibnet.ac.in/handle/1944/2099
dc.language.iso: en [en_US]
dc.publisher: INFLIBNET Centre [en_US]
dc.subject: De-skewing [en_US]
dc.subject: Image Files [en_US]
dc.subject: OCR [en_US]
dc.title: Implementation of a Web Based Text Extraction Tool using Open Source Object Models [en_US]
dc.type: Article [en_US]

Files

Original bundle
Name: 17.pdf
Size: 604.63 KB
Format: Adobe Portable Document Format

Name: P16_S.Rajeswari.pptx
Size: 4.25 MB
Format: Microsoft Powerpoint XML

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission