Performs OCR (optical character recognition) on an image.
Example::

    o = "some_image.png" | toImg() | kapi.ocr() # loads the image and runs OCR on it
    o     # run this in a separate notebook cell for an overview of where the boxes are
    o.res # see the raw results received from the OCR service

That returns something like this::

    [[[771, 5, 813, 17], 'round', 0.7996242908503107],
     [[58, 10, 100, 34], '150', 0.883547306060791],
     [[166, 8, 234, 34], '51,340', 0.9991665158446097],
     [[782, 14, 814, 38], '83', 0.9999995785315409],
     [[879, 13, 963, 33], 'UPGRADes', 0.7625563055298393],
     [[881, 53, 963, 69], 'Monkey Ace', 0.9171751588707685],
     [[933, 133, 971, 149], '5350', 0.9001984000205994],
     [[873, 203, 911, 219], '5325', 0.481669545173645],
     [[931, 203, 971, 219], '5500', 0.7656491994857788],
     [[869, 271, 913, 291], 'G800', 0.31933730840682983],
     [[925, 271, 977, 291], '64600', 0.14578145924474253],
     [[871, 341, 911, 361], '5750', 0.5966295003890991],
     [[929, 341, 971, 361], '5850', 0.9974847435951233]]

The first column is the bounding box (x1, y1, x2, y2), the second column is the
recognized text, and the third column is the confidence, from 0 to 1.
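
As a rough illustration of working with this row layout, the sketch below unpacks a few of the sample rows into named fields and sorts them left to right. It uses plain Python only and does not call the OCR service; the helper name ``to_records`` and the dict keys are hypothetical, not part of the library.

```python
# A few rows copied from the sample output: [bounding box, text, confidence]
res = [
    [[771, 5, 813, 17], 'round', 0.7996242908503107],
    [[58, 10, 100, 34], '150', 0.883547306060791],
    [[166, 8, 234, 34], '51,340', 0.9991665158446097],
    [[869, 271, 913, 291], 'G800', 0.31933730840682983],
]

def to_records(rows):
    """Unpack each row into a dict with named fields (hypothetical helper)."""
    records = []
    for (x1, y1, x2, y2), text, conf in rows:
        records.append({
            "text": text,
            "conf": conf,
            "x1": x1,
            "y1": y1,
            "width": x2 - x1,   # box width in pixels
            "height": y2 - y1,  # box height in pixels
        })
    return records

records = to_records(res)

# Sort boxes by their left edge so the texts read left to right
by_x = sorted(records, key=lambda r: r["x1"])
print([r["text"] for r in by_x])  # ['150', '51,340', 'round', 'G800']
```

Sorting by ``x1`` alone is only sensible for a single row of text; a multi-line layout would need to group boxes by ``y1`` first.
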
Internally, this uses EasyOCR for recognition. In my experience, however, it
doesn't always get the text right. It is particularly bad at symbols such as
dollar signs (which it reads as "S" or "5"), periods, and commas. You can
therefore refine each bounding box individually like this::

    ocr = someImg | kapi.ocr()
    # returns a string; uses Tesseract OCR instead of EasyOCR, which is more accurate on a less complex scene
    ocr[4] | toImg() | kapi.tess()

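
One possible way to decide which boxes are worth refining is to look at the confidence column. The sketch below is plain Python over rows copied from the sample output. The helper ``low_confidence_indices`` and the 0.5 threshold are assumptions made for illustration, and it further assumes that ``ocr[i]`` indexes boxes in the same order as the rows of ``o.res``, which is not stated explicitly here.

```python
# A few rows copied from the sample output: [bounding box, text, confidence]
res = [
    [[879, 13, 963, 33], 'UPGRADes', 0.7625563055298393],
    [[873, 203, 911, 219], '5325', 0.481669545173645],
    [[869, 271, 913, 291], 'G800', 0.31933730840682983],
    [[925, 271, 977, 291], '64600', 0.14578145924474253],
    [[929, 341, 971, 361], '5850', 0.9974847435951233],
]

def low_confidence_indices(rows, threshold=0.5):
    """Return the indices of rows whose confidence is below the threshold.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    return [i for i, (_box, _text, conf) in enumerate(rows) if conf < threshold]

to_refine = low_confidence_indices(res)
print(to_refine)  # [1, 2, 3]
```

Each index in ``to_refine`` could then be passed through the refinement pipeline shown above, for example ``ocr[i] | toImg() | kapi.tess()``.
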
See also: :class:`Ocr`

- VRAM: 1GB
- Throughput: depends heavily on image resolution; for 1000x750 images, expect roughly 3-4 images/s