Where parallels cross

Interesting bits of life

Moldable Emacs: capturing text from open images with an OCR mold

Too long; didn't read

Extract text from images with a mold! A mold is available that uses imgclip to turn an image into the text it contains.

The problem

I have been reading The Humane Interface by Jef Raskin. Tudor Girba mentioned this book in one of his talks about GT. It is an inspiring book, highly recommended! In explaining the ideal user interface, the author takes as an example the extraction of text from images. He basically says that a good interface would be contextual: it would act according to the object at hand. For example, if you have an image with text at your pointer, it should be easy for you to get its text.

Now, how difficult would it be to make a mold for this?

And there is a solution

Not difficult at all! First I looked for an OCR tool that can extract text reliably. One that looked promising is imgclip, which you can install with npm install -g imgclip.

Now say that I take a picture of this text while I am writing it and I open it in Emacs. When I call me/mold on the buffer with the image, this is the new mold I find!

/assets/blog/2021/07/16/moldable-emacs-capturing-text-from-open-images-with-an-ocr-mold/screen-2021-06-23-19-29-07.jpg

And this shows the result I get from extracting the text.

/assets/blog/2021/07/16/moldable-emacs-capturing-text-from-open-images-with-an-ocr-mold/screen-2021-06-23-19-17-48.jpg

The text is imperfect! There are a lot of wrong words: for example "I" gets recognized as "1" (the number). Still, it is cool to get text out of an (unsearchable) image!

The mold was also simple to implement:

(me/register-mold
 :key "Image To Text"
 :docs "Extracts text from the image using `imgclip'."
 :given (lambda () (and
                    (eq major-mode 'image-mode)
                    (executable-find "imgclip")))
 :then (lambda ()
         (let* ((buffername (buffer-name))
                (self nil) ;; TODO what here?
                (buffer (get-buffer-create (format "Text from %s" buffername)))
                (_ (async-map
                    `(lambda (s)
                       (shell-command-to-string
                        (format "imgclip -p '%s' --lang eng" s)))
                    (list (or (buffer-file-name)
                              (let ((path (concat "/tmp/" buffername)))
                                (write-region (point-min) (point-max) path)
                                path)))
                    `(lambda ()
                       (with-current-buffer ,buffer
                         (erase-buffer)
                         (clipboard-yank))))))
           (with-current-buffer buffer
             (erase-buffer)
             (insert "Loading text from image..."))
           buffer)))

This mold is offered only if the buffer is in image-mode and imgclip is on your path. When you run it, it hands the image to imgclip and shows "Loading text from image..." in the result buffer while the extraction is running; once imgclip is done, the buffer is filled with the extracted text. Notice that I instructed imgclip to recognize English (--lang eng): you can redefine the mold for the language you need.
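If you switch languages often, one option is to build the imgclip command from a variable instead of hardcoding eng. This is only a minimal sketch: my/imgclip-lang and my/imgclip-command are hypothetical names of mine, not part of moldable-emacs, and the only flags used are the -p and --lang that already appear in the mold above.

```elisp
;; Hypothetical helper, not part of moldable-emacs.
(defvar my/imgclip-lang "eng"
  "Language code passed to imgclip via --lang (assumed name).")

(defun my/imgclip-command (image-path)
  "Return the imgclip shell command for IMAGE-PATH.
The language is taken from `my/imgclip-lang'."
  (format "imgclip -p %s --lang %s"
          ;; Quote the path so spaces and quotes in file names are safe.
          (shell-quote-argument image-path)
          my/imgclip-lang))
```

You could then rebind the variable around the call, for example (let ((my/imgclip-lang "ita")) (my/imgclip-command "/tmp/shot.png")), assuming "ita" is a language code your imgclip installation accepts. How you pass the resulting string to the mold depends on how async-map runs its lambda, so I would build the string before handing it over.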

And now that I think about it, a natural extension of this mold is when you open a PDF with text you cannot select. It should be easy to extract an image of the page you are viewing and compose this mold on that. The UNIX saying is just true: worse is better!

Edit: I actually tried that! If you use pdf-tools this is just too easy. You want to call the interactive function pdf-view-extract-region-image. This generates a PNG image of the current PDF page. Then you can call our OCR mold on it!
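The two steps could be glued together with a small command. Take this as a sketch under assumptions rather than something shipped with moldable-emacs: my/pdf-page-ocr is a hypothetical name, it assumes pdf-tools is installed, that me/mold is an interactive command, and that pdf-view-extract-region-image leaves its image buffer displayed in the selected window.

```elisp
(require 'pdf-view) ; provided by pdf-tools

;; Hypothetical command, not part of moldable-emacs or pdf-tools.
(defun my/pdf-page-ocr ()
  "Render the current PDF page to an image buffer, then choose a mold for it."
  (interactive)
  (unless (derived-mode-p 'pdf-view-mode)
    (user-error "Not in a pdf-view buffer"))
  ;; Creates an image-mode buffer with the page (or the active region).
  (call-interactively #'pdf-view-extract-region-image)
  ;; Assumption: the image buffer is now shown in the selected window.
  (with-current-buffer (window-buffer)
    (call-interactively #'me/mold)))
```

The image buffer is not visiting a file, so here the mold would take its fallback branch and write the image under /tmp before calling imgclip.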

Conclusion

The mold is in my package moldable-emacs: grab it and try it out (after installing imgclip)! The installation is already a bit easier because a nice user ("Tekakutli") started trying it out, which inspired me to put at least some effort into making my extension accessible to others.

So extract (imperfect) text from images with a mold if you wish!

Happy texting!
