When data is ingested into Health Place, it goes through a series of computation and transformation steps, forming an "ingestion pipeline".
One of the essential steps in the pipeline is tagging the listing with concepts from our purpose-built knowledge graph.
You can think of concepts as taxonomies, a vocabulary to give semantic meaning to your data.
For quite a while now, we've had AI, specifically Natural Language Processing (NLP), in our pipeline: a Bag-of-Words (BoW) model extracts concepts from the text provided. And for the most part, this has worked exceptionally well!
It falls short, though.
The BoW model only checks whether a concept appears in the text; it ignores the surrounding context, even when that context gives the concept a different meaning. In some cases, this can become an issue.
Take the following example:
Supporting You is a seven-week programme which equips young people with the tools to help themselves to strengthen their resilience and emotional well-being.
The programme is suitable for young people who meet the following criteria:
- Aged 11-17
- Beginning to exhibit behaviours, or suggest themselves that they are starting to be affected by low mood, stress or anxiety such that it is beginning to interfere with the enjoyment of life and normal activities
- Have no current intervention or support in place from any other agency for emotional wellbeing or mental health issues
- They do not meet the criteria for a CAMHS referral
- Are able to commit to attending a 7 week Supporting You programme in their local area
Let's focus on the bolded piece of text:
They do not meet the criteria for a CAMHS referral
CAMHS is an acronym for Child and Adolescent Mental Health Services.
In this example, you can see that CAMHS is mentioned in an exclusionary context. The text mentions CAMHS but isn't relevant to CAMHS.
Our previous process couldn't consider this context and would therefore tag this listing with "CAMHS", resulting in a precision problem.
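To make the problem concrete, here's a minimal sketch of how a Bag-of-Words extractor behaves. The concept list and function names are illustrative assumptions, not Health Place's actual code; the point is that a BoW model tags a concept whenever it appears, regardless of context:

```python
# Illustrative sketch: KNOWN_CONCEPTS and extract_concepts are assumptions,
# not the actual Health Place pipeline code.
KNOWN_CONCEPTS = {"camhs", "anxiety", "low mood"}

def extract_concepts(text: str) -> set[str]:
    """Tag every known concept that appears in the text, ignoring context."""
    lowered = text.lower()
    return {concept for concept in KNOWN_CONCEPTS if concept in lowered}

# The exclusionary phrasing makes no difference to the result:
print(extract_concepts("They do not meet the criteria for a CAMHS referral"))
# {'camhs'}
```

The listing gets tagged with "CAMHS" even though the sentence explicitly excludes CAMHS-eligible young people.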
Precision and recall are like buckets in a well; by raising one, you lower the other.
Precision measures what % of returned results are relevant, whereas recall measures what % of relevant results were actually returned.
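Those two definitions translate directly into code. Using the example above (hypothetical concept sets, chosen for illustration):

```python
def precision(relevant: set, returned: set) -> float:
    """Share of returned results that are actually relevant."""
    return len(relevant & returned) / len(returned)

def recall(relevant: set, returned: set) -> float:
    """Share of relevant results that were actually returned."""
    return len(relevant & returned) / len(relevant)

# Hypothetical tagging run: one false positive ("camhs"), one miss ("resilience").
relevant = {"anxiety", "low mood", "resilience"}
returned = {"anxiety", "low mood", "camhs"}

print(precision(relevant, returned))  # 2/3
print(recall(relevant, returned))     # 2/3
```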
So we could tweak our NLP step to improve its precision, but at the expense of lowering its recall. If only there were another way...
Enter ChatGPT...
ChatGPT is perfect for understanding context, and the use case was immediately obvious to me.
The technique I had in mind was to pass GPT the source text along with the concepts we've extracted from our knowledge graph. We can then ask GPT to tell us which extracted concepts are relevant to the text and which aren't.
This allows us to engineer our previous NLP layer towards higher recall to extract as many relevant concepts as possible and, in the next layer, have GPT filter out the concepts that aren't relevant, providing us with the precision needed.
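The two-layer approach can be sketched as below. This is a hedged illustration, not the production implementation: the prompt wording, function names, and the `complete` callable (a stand-in for a real call to the OpenAI chat completions API) are all assumptions.

```python
from typing import Callable

def build_prompt(text: str, concepts: list[str]) -> str:
    """Ask GPT which extracted concepts are genuinely relevant to the text.
    Prompt wording is illustrative, not Health Place's actual prompt."""
    return (
        "Given the following text:\n\n"
        f"{text}\n\n"
        "Which of these concepts are relevant to the text, rather than "
        "mentioned only in an exclusionary context? "
        f"Concepts: {', '.join(concepts)}\n"
        "Reply with a comma-separated list of the relevant concepts only."
    )

def filter_concepts(
    text: str, concepts: list[str], complete: Callable[[str], str]
) -> list[str]:
    """Keep only the concepts GPT judges relevant.
    `complete` is any function that sends a prompt to GPT and returns its reply."""
    answer = complete(build_prompt(text, concepts))
    kept = {part.strip().lower() for part in answer.split(",")}
    return [c for c in concepts if c.lower() in kept]

# Stubbed GPT reply for illustration; a real deployment would wire `complete`
# to the OpenAI API.
fake_gpt = lambda prompt: "anxiety, low mood"
print(filter_concepts(
    "They do not meet the criteria for a CAMHS referral...",
    ["anxiety", "low mood", "CAMHS"],
    fake_gpt,
))
# ['anxiety', 'low mood']
```

The NLP layer can be tuned for high recall (over-extract), and this second layer restores precision by discarding the false positives.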
This process seemingly allows us to have our cake and eat it! Something seldom one gets to do.
Get in touch
If you have any questions, comment below, or find me on LinkedIn: linkedin.com/in/matthew-inamdar
And if you'd like to work for a tech-for-good company like Health Place, then message me for an informal chat at matt@healthplace.io – we're currently looking for a Frontend Engineer!