dc.contributor.advisor: Huisman, H.M.
dc.contributor.advisor: Drevin, G.R.
dc.contributor.author: Van Zyl, Petrus Andries
dc.date.accessioned: 2018-04-24T09:26:09Z
dc.date.available: 2018-04-24T09:26:09Z
dc.date.issued: 2015
dc.identifier.uri: http://hdl.handle.net/10394/26820
dc.description: MSc (Computer Science), North-West University, Potchefstroom Campus, 2016 [en_US]
dc.description.abstract: The automatic extraction and handling of information contained in invoice documents holds major benefits for many businesses, as it could save resources that would otherwise be spent on manual extraction. Document Analysis and Recognition (DAR) is a process that makes use of Optical Character Recognition (OCR) for the recognition and analysis of the contents of physical documents in order to digitally extract and process the information. It consists of four steps: pre-processing, layout analysis, text recognition, and post-processing. Pre-processing improves the overall quality of a document image in preparation for the steps that follow. The techniques used for pre-processing directly influence the resulting OCR accuracy, because any small deficiencies that pass through this stage are carried through the rest of the OCR process and ultimately recognized incorrectly. Revealing which pre-processing techniques are the most effective for the analysis and recognition of invoice documents can make a significant contribution to the relevant research areas and business communities. To approach this problem, an exploratory study was first conducted, using case studies in which owners and CEOs of five DAR-related companies were interviewed. Transcription and content analysis of these semi-structured interviews allowed prevalent themes to emerge from the data. The second study was an experimental investigation. The experiments involved taking a number of invoice document images, applying various pre-processing techniques to the images, and measuring the effect of the techniques on the recognition rates. Acquiring the recognition rates of the different techniques made it possible to compare the techniques quantitatively. It was found that many businesses in the DAR industry make use of the same business process.
Much was learnt about the DAR-related software used in the industry, how Intelligent Character Recognition (ICR) should be approached, and what the best scanning practices are. It was also discovered that the use of paper-based information, and the need to process it electronically, is increasing, thereby securing the future of the industry. Regarding the efficiency of pre-processing techniques, it was shown that some techniques do perform better than others. In addition, many findings were made regarding the functioning of some of the techniques used in the experiments. [en_US]
dc.description.sponsorship: National Research Foundation (NRF) [en_US]
dc.language.iso: en [en_US]
dc.publisher: North-West University (South Africa), Potchefstroom Campus [en_US]
dc.subject: Optical character recognition [en_US]
dc.subject: Intelligent character recognition [en_US]
dc.subject: Document analysis and recognition [en_US]
dc.subject: Pre-processing [en_US]
dc.subject: Noise reduction [en_US]
dc.subject: Binarization [en_US]
dc.subject: Exploratory study [en_US]
dc.subject: Experimental investigation [en_US]
dc.subject: Ground truth text [en_US]
dc.title: Evaluation of pre-processing techniques for the analysis and recognition of invoice documents [en_US]
dc.type: Thesis [en_US]
dc.description.thesistype: Masters [en_US]
dc.contributor.researchID: 10066896 - Huisman, Hester Magrietha (Supervisor)
dc.contributor.researchID: 10063374 - Drevin, Günther Richard (Supervisor)

