Gemini is a Google AI model that can recognize text in images. At the time of writing, it can be used for free with a limit of 60 queries per minute. 1D barcodes are usually accompanied by human-readable text, which can be used to verify barcode recognition results. In this article, we use the Flet Python API to build a desktop chat app that integrates both Dynamsoft Barcode Reader and Gemini. The app reads barcodes from images with Dynamsoft Barcode Reader and uses Gemini's text recognition to read the text printed in the same images.
Installation
pip install -U google-generativeai dbr flet
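Prerequisites
You will need a license key for Dynamsoft Barcode Reader and an API key for Gemini. Both are used in the code later in this article.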
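The snippets later in this article hard-code the Dynamsoft license key and the Gemini API key as placeholder strings. One way to keep real keys out of the source file is to read them from environment variables. The sketch below is only an illustration: the helper `read_key` and the variable names `DBR_LICENSE_KEY` and `GEMINI_API_KEY` are assumptions, not names required by either library.

```python
import os


def read_key(name, env=None):
    """Return the value of environment variable `name`, or raise if it is unset.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable first")
    return value


# Hypothetical variable names, demonstrated with a plain dict instead of the real environment.
demo_env = {"DBR_LICENSE_KEY": "LICENSE-KEY", "GEMINI_API_KEY": "API-KEY"}
license_key = read_key("DBR_LICENSE_KEY", demo_env)
api_key = read_key("GEMINI_API_KEY", demo_env)
```

In the app, the returned values could replace the "LICENSE-KEY" and 'API-KEY' placeholders.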
Flet Python API for Desktop Applications
Flet empowers developers to create desktop applications using Python. It offers a crash course for constructing a real-time chat application, which serves as an excellent starting point.
Our application features a list view for displaying chat messages, a text input field, a button for uploading images, a button for sending messages, and a button to clear the chat history.
- Chat messages:

    chat = ft.ListView(
        expand=True,
        spacing=10,
        auto_scroll=True,
    )
- Text input field:

    new_message = ft.TextField(
        hint_text="Write a message...",
        autofocus=True,
        shift_enter=True,
        min_lines=1,
        max_lines=5,
        filled=True,
        expand=True,
        on_submit=send_message_click,
    )
- Button to load an image:

    def pick_files_result(e: ft.FilePickerResultEvent):
        global image_path
        image_path = None
        if e.files != None:
            image_path = e.files[0].path
            # TODO

    def pick_file(e):
        pick_files_dialog.pick_files()

    pick_files_dialog = ft.FilePicker(on_result=pick_files_result)
    page.overlay.append(pick_files_dialog)

    ft.IconButton(
        icon=ft.icons.UPLOAD_FILE,
        tooltip="Pick an image",
        on_click=pick_file,
    )
- Button to send a message:

    def on_message(message: Message):
        if message.message_type == "chat_message":
            m = ChatMessage(message)
            chat.controls.append(m)
        page.update()

    page.pubsub.subscribe(on_message)

    def send_message_click(e):
        global image_path
        if new_message.value != "":
            page.pubsub.send_all(
                Message("Me", new_message.value, message_type="chat_message"))
            question = new_message.value
            new_message.value = ""
            new_message.focus()
            page.update()
            page.pubsub.send_all(
                Message("Gemini", "Thinking...", message_type="chat_message"))
            # TODO

    ft.IconButton(
        icon=ft.icons.SEND_ROUNDED,
        tooltip="Send message",
        on_click=send_message_click,
    )
PubSub facilitates asynchronous communication across page sessions. The subscribe method receives broadcast messages from other sessions, while the send_all method sends a message to all active sessions. Whenever a new message is received, the list view is updated to display it.

- Button to clear the chat history:

    def clear_message(e):
        global image_path
        image_path = None
        chat.controls.clear()
        page.update()

    ft.IconButton(
        icon=ft.icons.CLEAR_ALL,
        tooltip="Clear all messages",
        on_click=clear_message,
    )
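The snippets above use `Message` without defining it and rely on `page.pubsub` to fan messages out. To make that flow concrete, here is a minimal, Flet-free sketch of the same publish/subscribe idea. The `Message` field names and the `TinyPubSub` class are illustrative assumptions inferred from how the calls are used in this article; they are not Flet's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    # Field names are assumptions based on the calls shown in this article.
    user_name: str
    text: str
    message_type: str
    is_image: bool = False


class TinyPubSub:
    """Toy stand-in for the subscribe/send_all pattern (not Flet's code)."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Message], None]] = []

    def subscribe(self, handler: Callable[[Message], None]) -> None:
        self._handlers.append(handler)

    def send_all(self, message: Message) -> None:
        # Deliver the message to every subscriber, in subscription order.
        for handler in self._handlers:
            handler(message)


chat_log: List[str] = []  # plays the role of chat.controls


def on_message(message: Message) -> None:
    # Only chat messages are appended, mirroring the handler in the article.
    if message.message_type == "chat_message":
        chat_log.append(f"{message.user_name}: {message.text}")


pubsub = TinyPubSub()
pubsub.subscribe(on_message)
pubsub.send_all(Message("Me", "hello", message_type="chat_message"))
pubsub.send_all(Message("Gemini", "Thinking...", message_type="chat_message"))
```

After the two `send_all` calls, `chat_log` holds one entry per message, which is the role the list view plays in the real app.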
Integrating the Dynamsoft Barcode Reader
The Dynamsoft Barcode Reader is an efficient library designed for barcode scanning. To enable barcode scanning in your app, you must integrate this library. Here's how you can do it:
- Import the Dynamsoft Barcode Reader library and initialize a barcode reader instance using your license key.

    from dbr import *

    license_key = "LICENSE-KEY"
    BarcodeReader.init_license(license_key)
    reader = BarcodeReader()
- Decode the barcode from the uploaded image and send the result to the chat.

    def pick_files_result(e: ft.FilePickerResultEvent):
        global image_path, barcode_text
        barcode_text = None
        image_path = None
        if e.files != None:
            image_path = e.files[0].path
            page.pubsub.send_all(
                Message("Me", image_path, message_type="chat_message", is_image=True))
            text_results = None
            try:
                text_results = reader.decode_file(image_path)
            except BarcodeReaderError as bre:
                print(bre)
            if text_results != None:
                barcode_text = text_results[0].barcode_text
                page.pubsub.send_all(
                    Message("DBR", barcode_text, message_type="chat_message"))
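The handler above posts only the first decoded result. An image may contain several barcodes, so it can be useful to list them all. The following sketch assumes each result exposes `barcode_format_string` and `barcode_text` attributes. The helper name `describe_results` is hypothetical, and stand-in objects with made-up values are used so the snippet runs without the SDK.

```python
from types import SimpleNamespace


def describe_results(text_results):
    """Return one 'FORMAT: text' line per decoded barcode.

    `text_results` may be None when nothing was decoded, mirroring the
    None check in the handler above.
    """
    if not text_results:
        return []
    return [f"{r.barcode_format_string}: {r.barcode_text}" for r in text_results]


# Stand-in objects for illustration only; real results would come from decode_file().
fake_results = [
    SimpleNamespace(barcode_format_string="EAN_13", barcode_text="9780201379624"),
    SimpleNamespace(barcode_format_string="CODE_128", barcode_text="ABC-123"),
]
lines = describe_results(fake_results)
```

Each returned line could then be sent to the chat as its own "DBR" message.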
Utilizing Google's Gemini AI for Text Recognition
Gemini can extract text from images. Once you've decoded a barcode, you can employ Gemini to verify the accuracy of the text decoded from the barcode. Here are the steps to use Gemini:
- Set up the API key for Gemini.

    import google.generativeai as genai
    import google.ai.generativelanguage as glm

    genai.configure(api_key='API-KEY')
- Initialize the text and vision models. The vision model takes both text and images as input.

    model_text = genai.GenerativeModel('gemini-pro')
    chat_text = model_text.start_chat(history=[])

    model_vision = genai.GenerativeModel('gemini-pro-vision')
    chat_vision = model_vision.start_chat(history=[])
- Customize the command to effectively recognize text from the barcode image.

    import pathlib  # needed for reading the image bytes below

    def send_message_click(e):
        global image_path
        if new_message.value != "":
            ...
            if question == ":verify":
                # Replace the shortcut command with the actual prompt.
                question = "recognize text around the barcode"
                response = model_vision.generate_content(
                    glm.Content(
                        parts=[
                            glm.Part(text=question),
                            glm.Part(
                                inline_data=glm.Blob(
                                    mime_type='image/jpeg',
                                    data=pathlib.Path(image_path).read_bytes()
                                )
                            ),
                        ],
                    ))
                text = response.text
                page.pubsub.send_all(
                    Message("Gemini", text, message_type="chat_message"))
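The request above hard-codes `image/jpeg` as the MIME type, which may not match a PNG or other upload. A small helper built on Python's standard `mimetypes` module could pick the type from the file extension instead. This is only a sketch: `guess_image_mime` is a hypothetical name, and falling back to `image/jpeg` is an assumption carried over from the original snippet. Whether the model accepts a given image type is not checked here.

```python
import mimetypes


def guess_image_mime(path, default="image/jpeg"):
    """Guess an image MIME type from the file extension.

    Falls back to `default` when the extension is unknown or not an image.
    """
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        return default
    return mime


png_type = guess_image_mime("barcode.png")
jpg_type = guess_image_mime("barcode.jpg")
```

The result could then be passed as `mime_type=guess_image_mime(image_path)` when building the `glm.Blob`.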
Verifying the Barcode Decoding Results with the Accompanying Text
Now, we can check whether the text read from the barcode exists in the text recognized from the image. Since the text extracted by Gemini might include spaces, it's essential to eliminate these spaces prior to comparison.
    if barcode_text == None:
        return

    text = text.replace(" ", "")
    if text.find(barcode_text) != -1:
        page.pubsub.send_all(
            Message("Gemini", barcode_text + " is correct ✓", message_type="chat_message"))
    else:
        page.pubsub.send_all(
            Message("Gemini", barcode_text + " may not be correct", message_type="chat_message"))
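The check above removes only space characters. Recognized text may also contain line breaks or tabs, so a slightly more tolerant comparison might strip all whitespace from both strings first. The helper below is a sketch under that assumption; `barcode_in_text` is a hypothetical name, and the sample strings are made up for illustration.

```python
def barcode_in_text(barcode_text, recognized_text):
    """Return True if barcode_text occurs in recognized_text, ignoring whitespace."""
    if not barcode_text or not recognized_text:
        return False
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs, and line breaks alike.
    needle = "".join(barcode_text.split())
    haystack = "".join(recognized_text.split())
    if not needle:
        return False
    return needle in haystack


match = barcode_in_text("9780201379624", "ISBN 978 0201\n379624")
mismatch = barcode_in_text("9780201379624", "ISBN 978 0201 379625")
```

Here `match` is true because the digits line up once whitespace is removed, while `mismatch` is false because the last digit differs.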
Launch the desktop application and test it with some images that contain 1D barcodes:
flet run chatbot.py