OCR Data Integrity Research
Intro / Overview:
Development and Implementation of Multi-Stage Data Integrity Verification System for OCR-Extracted Receipt Data in Australian Automated Accounting
Description
While implementing this project, we found no ready-made, comprehensive solution for accounting automation that auto-corrects OCR errors on blurry receipts. Existing tools such as ABBYY FineReader or Dext offer basic OCR extraction, but they do not combine multi-stage verification with ABN API validation, arithmetic consistency checking, and automatic correction in the Australian context.
We therefore treated the work as a full R&D effort, running experiments to build a custom system. This overview omits the extensive literature review and focuses on the key gap: the absence of hybrid (rule-based + ML) approaches for real-time localization and correction of errors.
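To make "arithmetic consistency checking" concrete, here is a minimal sketch of one possible rule-based check on extracted receipt amounts. It is illustrative only and not the project's implementation: the function name `check_totals`, the one-cent tolerance, and the assumption that every line item attracts the standard 10% GST are ours (receipts with GST-free items would need a looser rule).

```python
from decimal import Decimal

# Standard Australian GST rate. Assumption: all items on the receipt are taxable.
GST_RATE = Decimal("0.10")


def check_totals(subtotal: Decimal, gst: Decimal, total: Decimal,
                 tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return a list of human-readable inconsistencies (empty if none).

    `check_totals` and `tolerance` are hypothetical names for illustration.
    """
    issues = []
    # Rule 1: the printed total should equal subtotal plus GST.
    if abs(subtotal + gst - total) > tolerance:
        issues.append("subtotal + gst != total")
    # Rule 2: the GST amount should be about 10% of the subtotal.
    if abs(subtotal * GST_RATE - gst) > tolerance:
        issues.append("gst != 10% of subtotal")
    return issues


# A consistent receipt passes both rules.
print(check_totals(Decimal("45.45"), Decimal("4.55"), Decimal("50.00")))
# A total misread by OCR as 58.00 instead of 50.00 fails the first rule.
print(check_totals(Decimal("45.45"), Decimal("4.55"), Decimal("58.00")))
```

`Decimal` is used instead of `float` so that currency comparisons are not affected by binary rounding. A check like this only detects the inconsistency; deciding which field was misread is left to the later correction stages.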
Research Objectives
- Develop a multi-stage data integrity verification system
- Implement OCR error detection and correction mechanisms
- Integrate Australian Business Number (ABN) validation
- Design hybrid rule-based and machine learning approaches
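As an illustration of the ABN validation objective, the sketch below applies the published ABN check-digit rule (subtract 1 from the first digit, apply the weights 10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, and test the weighted sum for divisibility by 89). Using it as a local pre-check before any ABN API lookup is our assumption, not a description of the system's design, and the function name `is_valid_abn_checksum` is hypothetical.

```python
# Weights defined by the ABN check-digit algorithm, one per digit.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn_checksum(raw: str) -> bool:
    """Return True if `raw` is an 11-digit string that passes the ABN checksum.

    Whitespace is ignored so that "51 824 753 556" and "51824753556" are
    treated the same. A passing checksum does not prove the ABN is registered.
    """
    compact = "".join(raw.split())
    if len(compact) != 11 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(c) for c in compact]
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    weighted_sum = sum(w * d for w, d in zip(ABN_WEIGHTS, digits))
    return weighted_sum % 89 == 0


print(is_valid_abn_checksum("51 824 753 556"))  # True
print(is_valid_abn_checksum("51 824 753 557"))  # False: last digit altered
```

Because a single misread digit usually breaks the checksum, a check like this can flag likely OCR errors in the ABN field without a network call; confirming that the number belongs to the named business still requires the registry lookup.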
Methodology
Our research methodology combines experimental validation with practical implementation, focusing on real-world scenarios encountered in Australian business environments. Each experiment addresses specific aspects of the data integrity verification pipeline.