This research aims to serve the needs of visually impaired people through an intelligent system that reads printed text on paper and speaks it aloud. The Indonesian Automated Document Reader (I-ADR) is operated through a voice-based user interface to scan a document page. Textual information is then extracted from the scanned page using Optical Character Recognition (OCR) techniques. The user can choose to have the system read the whole page or to listen to a summary of the information on the page. SIDoBI (Sistem Ikhtisar Dokumen untuk Bahasa Indonesia, a document summarization system for Indonesian) is integrated into the system to provide the summarization feature. The output of either whole-page reading or summarization is converted to speech by a text-to-speech synthesizer. The whole system is developed under a Free Open Source Software policy and will be distributed openly, at no cost, to all users who need it. This paper focuses on the text segmentation algorithm implemented in I-ADR to extract text from documents with complex layouts. We implemented the I-ADR text segmentation module using Enhanced CRLA and propose an improved algorithm for text extraction. Evaluation of the proposed system on various page layouts showed promising results.
Paper: “Recursive Text Segmentation for Indonesian Automated Document Reader for People with Visual Impairment”, Teresa Vania Tjahja, Anto Satriyo Nugroho, James Purnama, Nur Aziza Azis, Rose Maulidiyatul Hikmah, Oskar Riandi, Bowo Prasetyo, 3rd International Conference on Electrical Engineering and Informatics (ICEEI 2011), Institut Teknologi Bandung, Bandung, Indonesia, July 17-19, 2011 (accepted).
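
For readers unfamiliar with CRLA (the Constrained Run-Length Algorithm), the sketch below illustrates the classic run-length smoothing idea that such segmentation methods build on: short background gaps between foreground pixels are filled horizontally and vertically, and the two results are combined with a logical AND so that nearby characters merge into blocks. This is a simplified illustration only, not the Enhanced CRLA or the improved algorithm proposed in the paper; the function names, the list-of-lists image representation (1 = ink, 0 = background), and the threshold values are assumptions made for the example.

```python
def smooth_row(row, threshold):
    """Fill background runs of length <= threshold that lie between two ink pixels."""
    out = list(row)
    last_ink = None  # index of the most recent ink pixel seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_ink - 1 if last_ink is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_ink + 1, i):
                    out[j] = 1
            last_ink = i
    return out


def smooth_horizontal(image, threshold):
    """Apply run-length smoothing to every row of a binary image."""
    return [smooth_row(row, threshold) for row in image]


def smooth_vertical(image, threshold):
    """Apply run-length smoothing to every column by transposing the image."""
    transposed = [list(col) for col in zip(*image)]
    smoothed = smooth_horizontal(transposed, threshold)
    return [list(row) for row in zip(*smoothed)]


def block_mask(image, h_threshold, v_threshold):
    """Combine horizontal and vertical smoothing with a pixel-wise AND."""
    horizontal = smooth_horizontal(image, h_threshold)
    vertical = smooth_vertical(image, v_threshold)
    return [[a & b for a, b in zip(h_row, v_row)]
            for h_row, v_row in zip(horizontal, vertical)]
```

In a sketch like this, connected regions of the resulting mask would be treated as candidate blocks and later classified as text or non-text; the thresholds control how aggressively characters and lines are merged, which is the kind of parameter that complex layouts make hard to choose globally.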