Interactive demo | kapi/3-ocr

Performs OCR (optical character recognition) on an image.

Example::



    o = "some_image.png" | toImg() | kapi.ocr() # loads the image and runs OCR on it

    o     # run this in a separate notebook cell for an overview of where the boxes are

    o.res # see raw results received from the OCR service



``o.res`` returns something like this::



    [[[771, 5, 813, 17], 'round', 0.7996242908503107],
     [[58, 10, 100, 34], '150', 0.883547306060791],
     [[166, 8, 234, 34], '51,340', 0.9991665158446097],
     [[782, 14, 814, 38], '83', 0.9999995785315409],
     [[879, 13, 963, 33], 'UPGRADes', 0.7625563055298393],
     [[881, 53, 963, 69], 'Monkey Ace', 0.9171751588707685],
     [[933, 133, 971, 149], '5350', 0.9001984000205994],
     [[873, 203, 911, 219], '5325', 0.481669545173645],
     [[931, 203, 971, 219], '5500', 0.7656491994857788],
     [[869, 271, 913, 291], 'G800', 0.31933730840682983],
     [[925, 271, 977, 291], '64600', 0.14578145924474253],
     [[871, 341, 911, 361], '5750', 0.5966295003890991],
     [[929, 341, 971, 361], '5850', 0.9974847435951233]]



The first column is the bounding box ``(x1, y1, x2, y2)``, the second column is
the recognized text, and the third column is the confidence, from 0 to 1.
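
As a minimal sketch of working with that layout, the plain-Python snippet below
unpacks a few rows copied from the sample output, keeps only those above a
confidence cutoff, and computes each box's width and height. The helper names
``confident`` and ``box_size`` and the 0.5 threshold are illustrative
assumptions, not part of ``kapi``.

```python
# Rows copied from the sample output: [bounding box, text, confidence]
res = [
    [[58, 10, 100, 34], '150', 0.883547306060791],
    [[166, 8, 234, 34], '51,340', 0.9991665158446097],
    [[869, 271, 913, 291], 'G800', 0.31933730840682983],
    [[925, 271, 977, 291], '64600', 0.14578145924474253],
]

def confident(rows, threshold=0.5):
    """Keep only rows whose confidence is at least `threshold` (hypothetical helper)."""
    return [row for row in rows if row[2] >= threshold]

def box_size(box):
    """Return (width, height) of an (x1, y1, x2, y2) bounding box (hypothetical helper)."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1

kept = confident(res)
texts = [text for _, text, _ in kept]
sizes = [box_size(box) for box, _, _ in kept]
print(texts, sizes)
```

A fixed cutoff is only a starting point; the right value likely depends on the
images involved.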



Internally, this uses EasyOCR for recognition. However, in my experience, it
doesn't always get things right. It's particularly bad at symbols like dollar
signs (which it reads as "S" or "5"), periods, and commas. So you can refine
each bounding box like this::



    ocr = someImg | kapi.ocr()

    ocr[4] | toImg() | kapi.tess() # returns a string; uses Tesseract OCR instead of EasyOCR, which is more accurate on less complex scenes
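
One way to decide which boxes are worth refining is to pick the rows with low
confidence. The sketch below does this in plain Python on rows copied from the
sample output. The helper name ``low_confidence_indices`` and the 0.5 threshold
are assumptions for illustration, and it further assumes that ``ocr[i]`` indexes
boxes in the same order as the rows of ``ocr.res``.

```python
# Rows copied from the sample output: [bounding box, text, confidence]
rows = [
    [[933, 133, 971, 149], '5350', 0.9001984000205994],
    [[873, 203, 911, 219], '5325', 0.481669545173645],
    [[869, 271, 913, 291], 'G800', 0.31933730840682983],
    [[929, 341, 971, 361], '5850', 0.9974847435951233],
]

def low_confidence_indices(rows, threshold=0.5):
    """Return indices of rows whose confidence is below `threshold` (hypothetical helper)."""
    return [i for i, (_, _, conf) in enumerate(rows) if conf < threshold]

to_refine = low_confidence_indices(rows)
print(to_refine)

# Each index i could then be re-read with Tesseract, for example:
#     ocr[i] | toImg() | kapi.tess()
# (assumes ocr[i] corresponds to row i of ocr.res)
```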



See also: :class:`Ocr`



- VRAM: 1GB

- Throughput: depends heavily on image resolution, but should be roughly 3-4 images/s for 1000x750 images