OCR Data Integrity Research

Intro / Overview:

Development and Implementation of a Multi-Stage Data Integrity Verification System for OCR-Extracted Receipt Data in Australian Automated Accounting

Description

While implementing this project, we found no ready-made, comprehensive solution for accounting automation that auto-corrects OCR errors on blurry receipts. Existing tools such as ABBYY FineReader or Dext offer basic OCR extraction, but they do not integrate multi-stage verification with ABN API validation, arithmetic consistency checking, and automatic correction in the Australian context.

We therefore carried out the work as a full R&D effort, with experiments, to build a custom system. We omit the extensive literature review here and focus on the key gap: the absence of hybrid (rule-based + ML) approaches for real-time error localization and correction.

Research Objectives

  • Develop a multi-stage data integrity verification system
  • Implement OCR error detection and correction mechanisms
  • Integrate Australian Business Number (ABN) validation
  • Design hybrid rule-based and machine learning approaches
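
To make the ABN validation objective concrete, here is a minimal sketch of a local pre-check based on the published ABN checksum (subtract 1 from the first digit, apply fixed weights, and test divisibility by 89). The function name `is_valid_abn` is hypothetical and this is not necessarily how the project implements the step. The checksum only catches malformed or misread numbers; confirming that an ABN is actually registered would still require a lookup against the ABN register.

```python
# Weights defined by the ABN checksum algorithm, one per digit.
ABN_WEIGHTS = (10, 1, 3, 5, 7, 9, 11, 13, 15, 17, 19)


def is_valid_abn(raw: str) -> bool:
    """Return True if `raw` passes the ABN checksum (hypothetical helper)."""
    # OCR output often contains spaces between digit groups; drop whitespace.
    compact = "".join(raw.split())
    if len(compact) != 11 or not (compact.isascii() and compact.isdigit()):
        return False
    digits = [int(ch) for ch in compact]
    digits[0] -= 1  # the algorithm subtracts 1 from the leading digit
    total = sum(w * d for w, d in zip(ABN_WEIGHTS, digits))
    return total % 89 == 0


if __name__ == "__main__":
    print(is_valid_abn("51 824 753 556"))  # well-formed ABN
    print(is_valid_abn("51 824 753 557"))  # last digit misread
```

A check like this is cheap enough to run on every OCR result before any network call, so obviously misread numbers can be flagged for correction early.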

Methodology

Our research methodology combines experimental validation with practical implementation, focusing on real-world scenarios encountered in Australian business environments. Each experiment addresses specific aspects of the data integrity verification pipeline.
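
As an illustration of one rule-based stage in such a pipeline, the sketch below shows what an arithmetic consistency check on extracted receipt fields might look like. It rests on simplifying assumptions that are not taken from the project: all line items are taxable, prices are GST-inclusive, and the GST component is therefore one eleventh of the total (10% GST). The function name `check_receipt_totals` and the one-cent tolerance are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def check_receipt_totals(line_items, gst, total, tolerance=CENT):
    """Return a list of inconsistencies found (empty if the receipt adds up).

    Assumes GST-inclusive prices and that every item is taxable.
    """
    issues = []
    items_sum = sum(line_items, Decimal("0"))
    if abs(items_sum - total) > tolerance:
        issues.append(f"line items sum to {items_sum}, but total is {total}")
    # With GST-inclusive prices at 10%, the GST component is total / 11.
    expected_gst = (total / 11).quantize(CENT, rounding=ROUND_HALF_UP)
    if abs(expected_gst - gst) > tolerance:
        issues.append(f"GST is {gst}, expected about {expected_gst}")
    return issues


if __name__ == "__main__":
    items = [Decimal("11.00"), Decimal("22.00")]
    # Consistent receipt: no issues reported.
    print(check_receipt_totals(items, Decimal("3.00"), Decimal("33.00")))
    # Total misread by OCR (33.00 read as 38.00): both checks fail.
    print(check_receipt_totals(items, Decimal("3.00"), Decimal("38.00")))
```

Using `Decimal` rather than floats avoids rounding noise in currency arithmetic, and returning a list of issues rather than a boolean leaves room for a later correction stage to decide which field is most likely wrong.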