Scanned page post-processing

Project Info:

When pages of a printed book are digitized, whether with a proper scanner or (more often) with smartphone photographs, the text next to the binding appears warped and curled. OCR software can often deskew an image by applying affine or even perspective transformations (homographies), but these transformations map straight lines to straight lines and cannot straighten a curved text line. If part of the page is moderately or badly warped, OCR will fail anyway.

This simple program uses classical image processing techniques and heuristics to detect the individual text lines, then fits a mathematical model of the warping to them. The image is then resampled column by column, using the inverted model, to compensate for the warping. The method tolerates detection errors: not every line needs to be detected perfectly for the whole page to be reconstructed accurately.
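The column-by-column resampling idea can be illustrated with a minimal sketch. It is not the project's actual code: it assumes that each detected text line is already available as a set of (x, y) baseline points, that a low-degree polynomial per line is an adequate warping model, and that the vertical shift between lines can be interpolated linearly. The names `fit_baselines`, `dewarp` and `ref_x` are hypothetical, and only NumPy is used to keep the example self-contained.

```python
import numpy as np


def fit_baselines(lines, degree=3):
    """Fit one polynomial y = p(x) per detected text line.

    `lines` is a list of (xs, ys) array pairs tracing each baseline
    (an assumed input format, chosen for illustration only).
    """
    return [np.polynomial.Polynomial.fit(xs, ys, degree) for xs, ys in lines]


def dewarp(image, baselines, ref_x):
    """Resample a grayscale image column by column so that every fitted
    baseline becomes horizontal at the height it has at column `ref_x`."""
    h, w = image.shape
    rows = np.arange(h, dtype=float)

    # Target (straightened) row of each line: its height at the reference column.
    targets = np.array([p(ref_x) for p in baselines])
    order = np.argsort(targets)  # np.interp needs increasing sample points
    targets = targets[order]

    out = np.empty_like(image, dtype=float)
    for x in range(w):
        # Where each line actually sits in this column of the warped image.
        sources = np.array([p(x) for p in baselines])[order]
        # Vertical shift for every output row: interpolated between lines,
        # held constant above the first and below the last line.
        shift = np.interp(rows, targets, sources - targets)
        src_rows = np.clip(rows + shift, 0, h - 1)
        # Sample the warped column at the computed source rows.
        out[:, x] = np.interp(src_rows, rows, image[:, x])
    return out
```

Because the shift in each column is interpolated from whichever baselines were fitted, a missed line only removes one control point and does not break the reconstruction. With OpenCV available, the per-column loop could instead build a full coordinate map and pass it to `cv2.remap`.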

Project Details:

  • Kind: personal project
  • Group: me alone
  • Technologies: Python, OpenCV
  • Date: February 2021