Introduction
Every industry has transformed significantly over the past decade, and each is looking for ever more efficient and optimized ways of working. We live in a generation where data plays a major role across domains, and everyone is aware of its importance.
Organizations receive or collect data in many different ways, but they need to keep all of it in one place to access it efficiently and swiftly. Most sectors need to extract data from the sources they receive, and they know this is a repetitive task that requires effort every time.
Therefore, organizations need a solution that reduces human error and increases efficiency when extracting data from input sources.
Quick Fix:
To automate data extraction from structured and semi-structured documents and solve the problems above, we can use Azure Form Recognizer.
Form Recognizer:
Form Recognizer is a cognitive service that uses machine learning to identify and extract the required data.
It uses deep learning models and also lets us train custom models on our own sample documents to fetch the details we require.
Get Started:
Prerequisites:
- To get started with Form Recognizer, we will need the following:
- Python 3.7 or later.
- An Azure subscription and a Cognitive Services or Form Recognizer resource.
- The Azure Form Recognizer client library for Python.
Create a Form Recognizer:
- Log in to the Azure portal, search for Form Recognizer, and create one.
- Select the subscription and resource group (or create a new one).
- Select the region closest to you.
- Provide a name for the Form Recognizer.
- Select a pricing tier.
- Review and create it.
Get Keys and Endpoints:
- Open the created Form Recognizer resource.
- Check for the keys and endpoint on the resource page.
Get ready with the input files:
- Gather the sample documents to use for this demo.
Python SDK:
- Create a virtual environment and install the Azure module.
pip install azure-ai-formrecognizer
- Save the keys and endpoint in a config file to call the API from Python.
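A config file matching the lookup code in this post might look like the following (the key names `APIKEY` and `APIENDPOINT` match the code; the values are placeholders to replace with your own):

```json
{
    "APIKEY": "<your-form-recognizer-key>",
    "APIENDPOINT": "https://<your-resource-name>.cognitiveservices.azure.com/"
}
```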
- Install and import the required modules and libraries.
import json
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential
config = json.load(open('config.json'))
api_key = config['APIKEY']
endpoint = config['APIENDPOINT']
credential = AzureKeyCredential(api_key)
- Once we get the input file, we have to read it and pass it to the API along with the credentials.
document_analysis_client = DocumentAnalysisClient(endpoint=endpoint, credential=credential)
with open("input_files/test_file.pdf", "rb") as fd:
    document = fd.read()
poller = document_analysis_client.begin_analyze_document("prebuilt-layout", document)
result = poller.result()
- We will receive the result as an azure.ai.formrecognizer AnalyzeResult object.
- We can iterate over the result and get the data we need:
for table_idx, table in enumerate(result.tables):
    print(
        "Table # {} has {} rows and {} columns".format(
            table_idx, table.row_count, table.column_count
        )
    )
    for region in table.bounding_regions:
        print(
            "Table # {} location on page: {} is {}".format(
                table_idx, region.page_number, region.polygon
            )
        )
    for cell in table.cells:
        print(
            "...Cell[{}][{}] has content '{}'".format(
                cell.row_index, cell.column_index, cell.content
            )
        )
- This will give the data of every cell in the table.
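As an illustration, the per-cell output above can be folded into a 2-D grid. This is a small sketch that relies only on the `row_count`/`column_count` and per-cell `row_index`/`column_index`/`content` attributes shown above:

```python
def table_to_grid(table):
    """Arrange Form Recognizer table cells into a 2-D list of strings."""
    # Pre-fill the grid so missing/merged cells are left as empty strings.
    grid = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
    for cell in table.cells:
        grid[cell.row_index][cell.column_index] = cell.content
    return grid
```

From here the grid can be written to CSV or loaded into a pandas DataFrame for downstream processing.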
Other Functions:
for page in result.pages:
    # each page in the file exposes its dimensions
    # syntax - page.width, page.height, page.unit
    for line in page.lines:
        # each line of text on the page
        # syntax - line.content
        pass
    for word in page.words:
        # each word along with its position and confidence score
        # syntax - word.content, word.confidence, word.polygon
        pass
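For example, the word-level confidence scores above can be used to flag text that may need manual review. This is a minimal sketch using only the `word.content` and `word.confidence` attributes:

```python
def low_confidence_words(page, threshold=0.9):
    """Return (content, confidence) pairs for words scored below the threshold."""
    return [(w.content, w.confidence) for w in page.words if w.confidence < threshold]
```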
Key Value Pair Extraction:
- Similar to table extraction, Form Recognizer helps extract key-value pair data from a document as shown below.

![Key-value pair extraction](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/n8fsqnx88qv72rw1zxuy.png)

Image from [Microsoft Azure Cognitive Services Demos](https://azure.microsoft.com/en-us/products/cognitive-services/#features)
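In the SDK, key-value pairs are returned by the "prebuilt-document" model as `result.key_value_pairs`. The helper below is a sketch that collects them into a dict; either side of a pair may be missing, so it guards against None:

```python
def extract_key_values(result):
    """Collect key/value text from a Form Recognizer analysis result.

    The "prebuilt-document" model populates result.key_value_pairs;
    either the key or the value element may be absent.
    """
    pairs = {}
    for kv in result.key_value_pairs:
        key = kv.key.content if kv.key else ""
        value = kv.value.content if kv.value else ""
        pairs[key] = value
    return pairs

# Usage against the service (requires the client created earlier):
# poller = document_analysis_client.begin_analyze_document("prebuilt-document", document)
# print(extract_key_values(poller.result()))
```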
Different Models Available:
| Model Type | Model Name |
| --- | --- |
| Document analysis models | [Read OCR model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-read?view=form-recog-3.0.0) |
| | [General document model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-general-document?view=form-recog-3.0.0) |
| | [Layout analysis model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-layout?view=form-recog-3.0.0) |
| Prebuilt models | [W-2 form model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-w2?view=form-recog-3.0.0) |
| | [Invoice model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-invoice?view=form-recog-3.0.0) |
| | [Receipt model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-receipt?view=form-recog-3.0.0) |
| | [Identity document model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-id-document?view=form-recog-3.0.0) |
| | [Business card model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-business-card?view=form-recog-3.0.0) |
| Custom models | [Custom model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-custom?view=form-recog-3.0.0) |
| | [Composed model](https://learn.microsoft.com/en-us/azure/applied-ai-services/form-recognizer/concept-model-overview?view=form-recog-3.0.0) |
How to Select a Model:
| Document type | Data to extract | Best model |
| --- | --- | --- |
| A generic document like a contract or letter. | You want to extract primarily text lines, words, locations, and detected languages. | Read OCR model |
| A document that includes structural information like a report or study. | In addition to text, you need to extract structural information like tables, selection marks, paragraphs, titles, headings, and subheadings. | Layout analysis model |
| A structured or semi-structured document that includes content formatted as fields and values, like a credit application or survey form. | You want to extract fields and values, including ones not covered by the scenario-specific prebuilt models, without having to train a custom model. | General document model |
| U.S. W-2 form | You want to extract key information such as salary, wages, and taxes withheld from US W-2 tax forms. | W-2 model |
| Invoice | You want to extract key information such as customer name, billing address, and amount due from invoices. | Invoice model |
| Receipt | You want to extract key information such as merchant name, transaction date, and transaction total from a sales or single-page hotel receipt. | Receipt model |
| Identity document (ID) like a passport or driver's license. | You want to extract key information such as first name, last name, and date of birth from US drivers' licenses or international passports. | Identity document (ID) model |
| Business card | You want to extract key information such as first name, last name, company name, email address, and phone number from business cards. | Business card model |
| Mixed-type document(s) | You want to extract key-value pairs, selection marks, tables, signature fields, and selected regions not extracted by prebuilt or general document models. | Custom model |
File/Document Formats:
PDF, image (JPEG, PNG, BMP), and TIFF files can be used with Form Recognizer.
Limitations:
- Form Recognizer doesn't have a prebuilt model for generic form extraction; if we need to get form data from a document that is not in English, we need to train a custom model.
- The total size of the training data set must be less than 500 pages.
- We can pass a file (PDF, TIFF) of up to 500 MB and 2,000 pages.
- Page dimensions can be up to 10,000 x 10,000 pixels for images and 17 x 17 inches for PDFs.
- Extraction may fail if a table contains only one column.
Programming Languages:
Form Recognizer supports the following programming languages with SDKs and libraries:
- Python
- Java
- C#
- JavaScript
Conclusion:
The data extraction process can be streamlined by bringing AI/ML into organizations, helping to reduce errors and increase the efficiency of work.
For a larger organization, this makes extraction quicker and more accurate, freeing teams to focus on the next stages of their pipelines.
Along with Azure Form Recognizer, other services/tools such as Instabase and AWS Textract are also highly effective options available in the market.
References:
- https://learn.microsoft.com/en-us/azure/?view=form-recog-3.0.0&product=popular
- https://www.maxinai.com/blog/2021/03/12/ai-document-data-extraction-financial-institutions/
- https://azure.microsoft.com/en-us/products/cognitive-services/#features
Disclaimer:
This is a personal blog. The views and opinions expressed here are only those of the author and do not represent those of any organization or any individual with whom the author may be associated, professionally or personally.