Skew estimation and correction for handwritten Hindi documents


This work is one of my contributions to the project "Recognition of handwritten Hindi documents". The project is supervised by Dr. R.M.K. Sinha (rmk@cse.iitk.ac.in) as part of the course "Computer Vision and Document Processing (EE672)". He has divided the project into several subtasks, skew correction being one of them.


Abstract and Motivation

Standard Hindi is written in the Devanagari script. Devanagari is used for several major languages, such as Hindi, Sanskrit, Marathi and Nepali, and by more than 500 million people. As shown in the piece of a scanned document below, an important feature of this script is the horizontal line, called the "Shirorekha", on top of each word, which holds the letters of the word together.

To the best of my knowledge, skew correction for handwritten Hindi documents has not been addressed before. Handwritten documents are harder to deal with than printed documents because handwriting is inconsistent. In the document shown above, for example, the writer has introduced skew in the second line only. Skew is also introduced while scanning. The novelty and complexity of the problem were my main motivation for working on it.


Approach

Inspection of the above document reveals that the "Shirorekhas" (the horizontal lines on top of words) give the direction of skew for each word. Skew estimation thus reduces to finding the orientation of the "Shirorekhas". The skew is expected to range from -10 degrees to +10 degrees. The first step therefore searches for edges in these directions, which yields a set of candidate points for the skew estimate. The Hough Transform is then applied to these points; it reveals the line, and its orientation, along which most of the points lie. Selecting only the useful points beforehand greatly reduces the computation time of the Hough Transform.
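The two steps described above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the function names `top_edge_points` and `estimate_skew`, the use of top-edge ink pixels as candidate points, the 0.5-degree angle step, and the sign convention (image y axis pointing down, a positive angle meaning the line descends to the right) are all assumptions. The sketch also ranks lines by raw vote count, whereas the text ranks them by pixels per unit length.

```python
import math
from collections import Counter


def top_edge_points(image):
    """Collect (x, y) of ink pixels whose upper neighbour is background.

    `image` is a list of rows of 0/1 values (1 = ink). Taking top edges as
    candidate points is an assumption; the text only says that edges near
    the expected skew directions are used.
    """
    points = []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if pixel and (y == 0 or not image[y - 1][x]):
                points.append((x, y))
    return points


def estimate_skew(points, max_angle_deg=10.0, step_deg=0.5, rho_res=1.0):
    """Hough voting restricted to angles in [-max_angle_deg, +max_angle_deg].

    Returns (angle_deg, rho, votes) for the line collecting the most points.
    A line at angle theta satisfies y*cos(theta) - x*sin(theta) = rho.
    """
    best = (0.0, 0.0, 0)
    if not points:
        return best
    n_steps = int(round(2 * max_angle_deg / step_deg))
    for i in range(n_steps + 1):
        angle = -max_angle_deg + i * step_deg
        theta = math.radians(angle)
        s, c = math.sin(theta), math.cos(theta)
        # Each point votes for the rho bin of the line through it at this angle.
        votes = Counter(round((y * c - x * s) / rho_res) for x, y in points)
        rho_bin, count = votes.most_common(1)[0]
        if count > best[2]:
            best = (angle, rho_bin * rho_res, count)
    return best
```

Restricting the angle loop to the expected skew range keeps the accumulator small, which is the same saving the text attributes to selecting only useful points.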

The document shown above (left) is a test document. The first step gives the candidate points shown in the right figure. These are fed to the Hough Transform, which then estimates the skew angle. The figure also indicates the line that has the maximum number of pixels per unit length, i.e. the line along which most of the "Shirorekhas" point.

After skew estimation, the image is corrected by applying an affine transformation by the negative of the estimated skew angle.
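A minimal sketch of such a correction is given below, assuming the affine transformation is a pure rotation about the image centre. The function name `deskew`, the nearest-neighbour sampling, and the list-of-rows image representation are illustrative assumptions, not details taken from the project.

```python
import math


def deskew(image, skew_deg, background=0):
    """Rotate `image` (a list of rows) by -skew_deg about its centre.

    Uses inverse mapping: every output pixel looks up the input pixel it
    came from, so the result has no holes. Pixels that map outside the
    input are filled with `background`.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    theta = math.radians(skew_deg)
    c, s = math.cos(theta), math.sin(theta)
    out = [[background] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            dx, dy = u - cx, v - cy
            # Inverse of a rotation by -theta is a rotation by +theta.
            x = cx + c * dx - s * dy
            y = cy + s * dx + c * dy
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi < w and 0 <= yi < h:
                out[v][u] = image[yi][xi]
    return out
```

With the convention that a positive skew means the text descends to the right (y axis pointing down), calling `deskew(image, angle)` with the estimated angle maps a line at that angle onto a horizontal row.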


Tests & Results

The estimated overall skew for the document shown above (left) is -10 degrees. Again, the right figure shows the candidate points for the Hough Transform. The line shown in it is the final result of the Hough Transform; it has the maximum number of pixels per unit length and indicates the direction of the skew.
The deskewed document is shown in the last figure.

The estimated overall skew for the document shown above (left) is 8 degrees. Again, the right figure shows the candidate points for the Hough Transform. The line shown in it is the final result of the Hough Transform; it has the maximum number of pixels per unit length and indicates the direction of the skew.
The deskewed document is shown in the last figure.

Skew correction is required for the subsequent steps in the recognition of words. The input to the recognition step is a segmented, deskewed word. In the document shown in the beginning, local skew is present in the second line. In the process of word recognition, the document image is first segmented linewise. The first segmented line has no skew:



The second segmented line has skew, which has to be corrected:



Again, the figure below shows the candidate points for the Hough Transform. The line shown in it is the final result of the Hough Transform; it has the maximum number of pixels per unit length and indicates the direction of the skew.

The estimated skew is -2 degrees. The following is the deskewed line:

Similarly, all the segmented lines are checked and deskewed. The deskewed lines may or may not be merged back together, depending on the requirements.
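The text does not say how the document is segmented into lines. One common choice, shown here purely as an assumption, is a horizontal projection profile: rows that contain ink are grouped into bands separated by blank rows. The function name `segment_lines` and the `min_ink` threshold are hypothetical.

```python
def segment_lines(image, min_ink=1):
    """Split a binary image (list of rows, 1 = ink) into text-line bands.

    Returns a list of (top, bottom) row ranges with `bottom` exclusive.
    A row belongs to a line if it holds at least `min_ink` ink pixels.
    """
    lines, start = [], None
    for r, row in enumerate(image):
        has_ink = sum(1 for p in row if p) >= min_ink
        if has_ink and start is None:
            start = r                    # a new text line begins
        elif not has_ink and start is not None:
            lines.append((start, r))     # the current text line ends
            start = None
    if start is not None:                # line runs to the bottom edge
        lines.append((start, len(image)))
    return lines
```

Each band `image[top:bottom]` can then be treated as a small image of its own and passed through skew estimation and correction independently. Strongly skewed neighbouring lines may overlap in the profile, in which case a simple split like this one would merge them.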


Conclusion

As the above results show, the proposed "Shirorekha"-based skew estimation and correction works robustly on the tested documents. In fact, the presence of the "Shirorekha" in Hindi words proved crucial to this work.

A document with overall skew due to improper scanning can be corrected easily by this technique. Documents in which each line has its own skew due to inconsistent handwriting (like the document shown in the beginning) can be corrected by first segmenting the document linewise, deskewing each line, and then merging the deskewed lines to give an overall deskewed image.




mitul@robotics.stanford.edu
