Extract Text From Color Images With Ground Truth Text
At one time or another, we have all needed to extract text from an image, usually because we want that text quickly. Normally, you would read the words from the source and type them in by hand, but that becomes tedious and time-consuming when there are large paragraphs to copy. A more efficient way to get text from an image is to use OCR software. Ground Truth Text (gttext) is an open source application that can extract text from almost any image. The program lets you select the whole image or just a part of it, depending on your requirement. You can also zoom in and out if the image is too small or too big.
To start, open the image from which you want to extract text. In the file selection dialog, the program gives you a list of extension filters. Choose the extension of your image and select the file. If you want to extract text from the whole image, go to Tools –> Copy Text From and select Full Image, or just use the Ctrl + F hotkey. If you want to select only a part of the image, go to Tools –> Area Text OCR.
Once done, select your desired text by drawing a rectangle around it, and a dialog box will pop up showing the recognized text. You will have the option to Cancel, Continue or Try Again. Selecting Try Again runs the text recognition again to correct any errors that might have occurred in the first attempt. Select Continue to copy the text to the clipboard. Now open any text editor, e.g. Notepad, and paste the text there.
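For readers curious about what happens when you draw that rectangle, here is a minimal Python sketch of how a drag selection could be turned into a crop region before recognition. It is an illustration only and not gttext's actual code; the names `Box` and `selection_to_box` are hypothetical, and the pixel-coordinate convention (origin at the top-left corner) is an assumption.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical crop region in pixel coordinates (origin at top-left)."""
    left: int
    top: int
    right: int
    bottom: int


def selection_to_box(start, end, image_size):
    """Turn two drag points into a crop box clamped to the image bounds."""
    (x0, y0), (x1, y1) = start, end
    width, height = image_size
    # Sort the coordinates so the box is valid whichever way the user dragged.
    left, right = sorted((x0, x1))
    top, bottom = sorted((y0, y1))
    # Clamp to the image so a drag that overshoots an edge still works.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("selection is empty")
    return Box(left, top, right, bottom)
```

Dragging from the bottom-right to the top-left gives the same box as dragging the other way, which is why the coordinates are sorted before clamping.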
Ground Truth Text supports BMP, JPEG, GIF, TIFF and PNG image formats. During testing we encountered some problems with font recognition: the output can be unreliable for stylized fonts. The program works fine as long as the text in the image is written in a simple font without any added design, but it cannot accurately recognize text set in abstract or decorative typefaces.
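If you have a folder of mixed files and want to check in advance which ones are in a supported format, a small Python sketch like the following could help. It is not part of gttext; the function name `is_supported_image` is hypothetical, and the mapping from the five formats listed above to file extensions is an assumption.

```python
from pathlib import Path

# Assumed file extensions for the formats named in the article
# (BMP, JPEG, GIF, TIFF, PNG); the tool's own filter list may differ.
SUPPORTED_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".gif", ".tif", ".tiff", ".png"}


def is_supported_image(path):
    """Return True if the file name ends in one of the assumed extensions."""
    # Compare in lower case so "SCAN.PNG" and "scan.png" are treated alike.
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_image("scan.PNG")` is true, while `is_supported_image("notes.pdf")` is false. The check looks only at the file name, not at the file contents.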
The program runs on Windows XP, Windows Vista and Windows 7.
The program crashes on Windows XP and can't do OCR of a BMP image.