If you've ever built an experience where users have to enter information as plain text, I can imagine how shocking some of the values were. This story shows how users can complicate our lives, no matter how simple the task.
The ask
We created an AI assistant that helped people with hypertension (high blood pressure) manage their condition better. It communicated with users through the Telegram messaging app (it's accessible, developer-friendly and stable, which is really important for a startup).
We needed to collect daily blood pressure and pulse values from our users. That would allow us to suggest the next best action (everything is great / values are high, try to relax / you should contact your doctor / you need to go to the emergency room). The only really accessible way of getting those values was by asking. I know, so primitive - but effective. The ask was simple: "Please send us your values (example: 120/60/60)"
The harsh reality
The users followed the guidelines for maybe one day. Maybe! In less than 24 hours my beautiful implementation, [int(value) for value in text.split('/')], fell apart.
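To make the fragility concrete, here is a minimal sketch of that one-liner wrapped in a function. The name parse_naive is hypothetical, used only for illustration. It works when the message matches the requested format exactly and raises as soon as a separator changes.

```python
from typing import List


def parse_naive(text: str) -> List[int]:
    # Hypothetical wrapper around the one-liner from the post:
    # split on "/" and convert every piece to an integer.
    return [int(value) for value in text.split('/')]


# A message in the requested format parses fine.
print(parse_naive("120/60/60"))

# A space-separated message has no "/" at all, so the whole string
# is handed to int(), which raises ValueError.
try:
    parse_naive("120 60 60")
except ValueError as error:
    print(f"Could not parse: {error}")
```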
Users started mistyping and sending things like "120.60/60" or "120 60 60". No problem, right? We can just find the three numbers in the text using a regex and everything is OK. Something like this: r"\D*(\d{1,3})\D*(\d{1,3})\D*(\d{1,3})\D*". The great thing about this approach was that it also covered messages like "systolic: 120, diastolic: 60, pulse: 60".
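A minimal sketch of how that pattern could be applied, assuming Python's standard re module. The names READING_PATTERN and parse_reading are hypothetical and not taken from the original code.

```python
import re
from typing import Optional, Tuple

# The pattern from the post: three groups of 1-3 digits,
# separated by any run of non-digit characters.
READING_PATTERN = re.compile(r"\D*(\d{1,3})\D*(\d{1,3})\D*(\d{1,3})\D*")


def parse_reading(text: str) -> Optional[Tuple[int, int, int]]:
    # Hypothetical helper: returns (systolic, diastolic, pulse),
    # or None when the message contains no digits to match.
    match = READING_PATTERN.match(text)
    if match is None:
        return None
    systolic, diastolic, pulse = (int(group) for group in match.groups())
    return systolic, diastolic, pulse


for message in [
    "120/60/60",
    "120.60/60",
    "120 60 60",
    "systolic: 120, diastolic: 60, pulse: 60",
]:
    print(message, "->", parse_reading(message))  # each gives (120, 60, 60)
```

One caveat of this sketch: because each group accepts one to three digits, a message with only two numbers, such as "120/60", still matches by splitting digits across groups (it yields 120, 6 and 0). A sanity check on the resulting values would still be needed.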
And then I lost my mind. Some users started sending: "On date 29.10.21 the pulse is: 120/60/60". Well, now we have to guess that one thing is a date and the other is the measurement. Because, as cardiologists will tell you, the values 29/10/21 are technically possible. The good thing is, they are very rare. So we would just guess: if the values are within date range, it's a date.
And from that moment, all was good. Some edge cases still appeared, but it worked for about 99.5% of the cases.
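The guess described above could be sketched roughly like this. Everything here is an assumption made for illustration: the helper names, the exact ranges (day 1-31, month 1-12, two-digit year), and the choice to drop a date only when other numbers remain after it. The original implementation may have differed.

```python
import re
from typing import Optional, Tuple


def looks_like_date(day: int, month: int, year: int) -> bool:
    # Assumed ranges: day 1-31, month 1-12, two-digit year 0-99.
    # Four-digit years are not handled in this sketch.
    return 1 <= day <= 31 and 1 <= month <= 12 and 0 <= year <= 99


def parse_reading_skipping_date(text: str) -> Optional[Tuple[int, int, int]]:
    # Collect every run of digits in the message, in order.
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    # If the first three numbers fit a date and at least three more
    # follow, assume the message starts with a date and drop it.
    if len(numbers) >= 6 and looks_like_date(*numbers[:3]):
        numbers = numbers[3:]
    if len(numbers) < 3:
        return None
    # Whatever comes first is treated as systolic / diastolic / pulse.
    return numbers[0], numbers[1], numbers[2]


print(parse_reading_skipping_date("On date 29.10.21 the pulse is: 120/60/60"))
# (120, 60, 60)
```

With this choice, a message containing only "29/10/21" is still treated as a reading, which is the rare but technically possible case mentioned above.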
Battles worth fighting
Months later we implemented an ML solution using BERT. More for fun than anything else, because the added benefit was negligible.
Being part of a startup taught me that there are a lot of battles that can be won just by investing time and effort. But a startup has very limited resources, and choosing to fight the wrong battles will get it killed.
If you find my content interesting, follow me on Twitter. We can share half-baked ideas and discuss engineering challenges.
Top comments (1)
Train BERT to turn the messages into JSON in the format you want. That way, for any message a user types in, BERT grabs only the data you need and transforms it into JSON. BERT is good at NLU, so if fine-tuned it should be able to perform that task. You even have data on the myriad ways hypertension messages can come in.