Marco Gonzalez

AWS Serverless Services integration with ChatGPT for an image-to-ChatGPT question solution

ChatGPT is based on the GPT (Generative Pre-trained Transformer) series of language models developed by OpenAI.

In this demo, I leverage the power of AWS S3, AWS Lambda, Amazon Textract, and OpenAI’s ChatGPT to seamlessly process images containing text and generate intelligent, context-aware responses. By combining cutting-edge OCR (Optical Character Recognition) technology with advanced language understanding, this solution offers a unique way to interact with visual data.

Note: This demo assumes basic knowledge of core AWS services, so details on creating S3 buckets, AWS Lambda functions, VPCs, and IAM roles are omitted.

Use cases:

  • Image-to-text processing that requires integration with ChatGPT
  • Image-to-text processing integrated with a chatbot solution

Chapters:

  1. General Topology
  2. Image to text conversion
  3. Text to ChatGPT integration
  4. Conclusion
  5. Next steps

Services involved in this demo:

  • AWS S3
  • AWS Lambda
  • Amazon Textract
  • AWS CloudWatch
  • AWS VPC
  • AWS NAT Gateway
  • AWS Internet Gateway
  • OpenAI ChatGPT API

1. General Topology

[Image: Topology]

2. Image to text conversion

Steps:

  1. Create two S3 buckets in the same region where the AWS Lambda function will be created.

[Image: S3]

chatgpt-demo-1 : S3 bucket to store images containing text
chatgpt-demo-1-text : S3 bucket to store extracted text

  2. Create an AWS Lambda function (runtime Python 3.9) with the following configuration:

  • Trigger: An S3 trigger notifies AWS Lambda to start the function when a PUT event occurs for any object whose key starts with the "raw_data" prefix. Alternatively, you can filter on a suffix such as .jpg, .png, or .pdf to restrict the source files by type.

[Image: Lambda]

  • Layer: A ZIP archive that contains libraries, a custom runtime, or other dependencies. For this demo, I added the AWS SDK for Python (Boto3) as a ZIP layer shared by all functions. For more details on the benefits of layers for AWS Lambda functions, see: https://towardsdatascience.com/introduction-to-amazon-lambda-layers-and-boto3-using-python3-39bd390add17

  • Permissions: The AWS Lambda function needs permissions for the following services: AWS S3, Amazon Textract, and AWS CloudWatch. The following image shows an example of this setup.

[Image: Lambda]

  • Environment variable: This setup is optional, but it is useful when you don't want to depend on hard-coded values and want the freedom to quickly update AWS S3, API Gateway, or other settings used by your code.

Environment variable for the destination S3 bucket (read in the code as processed_data_bucket_name):
[Image: Lambda]

  • Lambda code: The following code collects the image from the source S3 bucket, calls the Amazon Textract API to extract the text within the image, and stores the result in the target S3 bucket as a .txt file with the "_processed_data.txt" suffix.
import json
import boto3
import os
import urllib.parse

print('Loading function')

s3 = boto3.client('s3')

# Amazon Textract client
textract = boto3.client('textract')

def getTextractData(bucketName, documentKey):
    print('Loading getTextractData')
    # Call Amazon Textract
    response = textract.detect_document_text(
        Document={
            'S3Object': {
                'Bucket': bucketName,
                'Name': documentKey
            }
        })

    detectedText = ''

    # Collect the text of every LINE block, one detected line per row
    for item in response['Blocks']:
        if item['BlockType'] == 'LINE':
            detectedText += item['Text'] + '\n'

    return detectedText

def writeTextractToS3File(textractData, bucketName, createdS3Document):
    print('Loading writeTextractToS3File')
    generateFilePath = os.path.splitext(createdS3Document)[0] + '_processed_data.txt'
    s3.put_object(Body=textractData, Bucket=bucketName, Key=generateFilePath)
    print('Generated ' + generateFilePath)


def lambda_handler(event, context):
    # Get the bucket name and the URL-decoded object key from the S3 event
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')
    try:
        detectedText = getTextractData(bucket, key)
        writeTextractToS3File(detectedText, os.environ['processed_data_bucket_name'], key)

        return 'Processing Done!'

    except Exception as e:
        print(e)
        print('Error getting object {} from bucket {}. Make sure they exist and your bucket is in the same region as this function.'.format(key, bucket))
        raise e
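
To try the line-extraction logic without calling Amazon Textract, the loop in getTextractData can be exercised against a hand-written response. The sketch below is only an illustration: the helper name extract_lines and the sample response are assumptions made for this example, and the sample keeps only the two fields the loop reads (a real response carries more, such as geometry and confidence).

```python
def extract_lines(response):
    """Join the text of every LINE block, one per row (same logic as getTextractData)."""
    lines = [
        block['Text']
        for block in response.get('Blocks', [])
        if block.get('BlockType') == 'LINE'
    ]
    return ''.join(line + '\n' for line in lines)


# Hand-written sample shaped like a detect_document_text response (assumed data).
sample_response = {
    'Blocks': [
        {'BlockType': 'PAGE'},
        {'BlockType': 'LINE', 'Text': 'What is the capital of Peru?'},
        {'BlockType': 'WORD', 'Text': 'What'},
        {'BlockType': 'LINE', 'Text': 'Answer in one sentence.'},
    ]
}

if __name__ == '__main__':
    print(extract_lines(sample_response))
```

PAGE and WORD blocks are skipped, so each detected line appears once in the text that is written to the target bucket.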

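The prefix and suffix filters of the S3 trigger can also be written down as a notification configuration instead of being set in the console. This is a minimal sketch under stated assumptions: build_notification_config is a helper name invented for this example, the function ARN is hypothetical, and the boto3 call that would apply the configuration is left as a comment because it needs AWS credentials and an existing bucket.

```python
def build_notification_config(lambda_arn, prefix=None, suffix=None):
    """Build an S3 notification configuration that invokes a Lambda on PUT."""
    rules = []
    if prefix:
        rules.append({'Name': 'prefix', 'Value': prefix})
    if suffix:
        rules.append({'Name': 'suffix', 'Value': suffix})
    lambda_config = {
        'LambdaFunctionArn': lambda_arn,
        'Events': ['s3:ObjectCreated:Put'],
    }
    if rules:
        lambda_config['Filter'] = {'Key': {'FilterRules': rules}}
    return {'LambdaFunctionConfigurations': [lambda_config]}


# Hypothetical ARN, for illustration only.
EXAMPLE_ARN = 'arn:aws:lambda:us-east-1:123456789012:function:image-to-text'

config = build_notification_config(EXAMPLE_ARN, prefix='raw_data')

# Applying it would look like this (not executed here):
#   import boto3
#   boto3.client('s3').put_bucket_notification_configuration(
#       Bucket='chatgpt-demo-1', NotificationConfiguration=config)
```

When the trigger is added through the Lambda console, the permission that lets S3 invoke the function is created for you; when configuring it through the API, that permission has to be granted separately.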

3. Text to ChatGPT integration

  1. Create one additional S3 bucket in the same region where the AWS Lambda function will be created.

[Image: S3 2]

chatgpt-4073-output : S3 bucket to store ChatGPT answer in .txt format

  2. Create a VPC with two subnets (one private, one public), an Internet Gateway, and a NAT Gateway, where the AWS Lambda function will be deployed.

[Image: VPC]

  3. Create an AWS Lambda function (runtime Python 3.9) with the following configuration:

[Image: Lambda]

  • Trigger: S3 trigger will notify AWS Lambda to start function execution when the PUT event type is executed for any file with the "_processed_data.txt" suffix.

[Image: S3_2]

  • Permissions: The AWS Lambda function needs permissions for the following services: AWS S3 and AWS CloudWatch.

  • VPC: This AWS Lambda function is created inside the new VPC, in the private subnet. Access to the public ChatGPT API goes through the NAT Gateway.

[Image: VPC]

  • Environment variables: To call the public OpenAI API, the following variables were defined:

  • model_chatgpt : Selected ChatGPT model for text processing
  • openai_secret_key_env: Secret key used to authenticate the API call
  • output_bucket_name: AWS S3 bucket to store results in .txt file.
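
Because the function depends on all three variables, it can help to check them once at start-up so that a missing value fails with a clear message rather than midway through a request. A minimal sketch, assuming the three variable names listed above; load_config is a hypothetical helper and the values in the usage line are placeholders, not real settings.

```python
import os

REQUIRED_VARIABLES = ('model_chatgpt', 'openai_secret_key_env', 'output_bucket_name')


def load_config(environ=None):
    """Return the three settings as a dict, failing early if any is missing or empty."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise KeyError('Missing environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}


# Usage with placeholder values; inside Lambda, call load_config() with no arguments.
settings = load_config({
    'model_chatgpt': 'example-model',
    'openai_secret_key_env': 'sk-placeholder',
    'output_bucket_name': 'chatgpt-4073-output',
})
```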

  4. Big kudos to my friend Prakash Rao (https://www.linkedin.com/in/prakashrao40/), who created the following Python script:

import os
import json
import boto3
import http.client
import urllib.parse

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get the uploaded file's bucket and key
    bucket = event['Records'][0]['s3']['bucket']['name']
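    # S3 event keys are URL-encoded, so decode them before calling get_object
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')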

    # Read the text file from S3
    file_content = s3.get_object(Bucket=bucket, Key=key)['Body'].read().decode('utf-8')

    # Store the file content in a Python JSON
    file_json = {'text': file_content}

    # Read OpenAI API credentials from environment variables
    openai_secret_key = os.environ['openai_secret_key_env']

    # Set up HTTP connection to OpenAI API endpoint
    connection = http.client.HTTPSConnection('api.openai.com')

    # Define request parameters
    prompt = file_json['text']
    model = os.environ['model_chatgpt']
    data = {
        'prompt': prompt,
        'model': model,
        'max_tokens': 50
    }
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {openai_secret_key}'
    }

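    # Send API request and parse response
    connection.request('POST', '/v1/completions', json.dumps(data), headers)
    response = connection.getresponse()
    raw_body = response.read().decode()
    if response.status != 200:
        # Surface API errors (invalid key, unknown model, rate limit) in the logs
        raise RuntimeError(f'OpenAI API returned HTTP {response.status}: {raw_body}')
    response_data = json.loads(raw_body)
    completion_text = response_data['choices'][0]['text']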

    # Print generated text
    #print(completion_text)

    # Define the output bucket and output key (file name)
    output_bucket = os.environ['output_bucket_name']
    output_key = f"{os.path.splitext(os.path.basename(key))[0]}_chatgpt_result.txt"

    # Upload the generated text to the output S3 bucket
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=completion_text)

    # Return the generated text (useful for manual test invocations)
    return {
        'statusCode': 200,
        'body': json.dumps({'text': completion_text})
    }
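
The script above posts a prompt to /v1/completions, the endpoint for completion-style models. Chat models are served from /v1/chat/completions, which takes a list of messages instead of a prompt and returns the answer under choices[0].message.content. If model_chatgpt names a chat model, the request and parsing steps could be adapted roughly as follows. This is a sketch and not part of the original script: the helper names and the trimmed sample response are assumptions for illustration, and the model name is only an example.

```python
import json

CHAT_PATH = '/v1/chat/completions'


def build_chat_request(prompt, model, max_tokens=50):
    """Return the JSON body for a single-turn chat completion request."""
    return json.dumps({
        'model': model,
        'messages': [{'role': 'user', 'content': prompt}],
        'max_tokens': max_tokens,
    })


def parse_chat_response(raw_body):
    """Extract the assistant's reply from a chat completion response body."""
    response_data = json.loads(raw_body)
    return response_data['choices'][0]['message']['content']


# Trimmed, hand-written sample of a response body (assumed data).
sample_body = json.dumps({
    'choices': [
        {'index': 0, 'message': {'role': 'assistant', 'content': 'Lima.'}}
    ]
})

body = build_chat_request('What is the capital of Peru?', 'gpt-3.5-turbo')
reply = parse_chat_response(sample_body)
```

In the Lambda function, body would take the place of json.dumps(data) and CHAT_PATH the place of '/v1/completions' in connection.request; the rest of the handler stays the same.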

  5. Once the integration is complete, the output S3 bucket stores the results. You can also check AWS CloudWatch Logs > Log groups for the details of the API calls and results.
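
To know which object to look for in each bucket, it helps to trace how the two functions derive their output keys. The sketch below mirrors the naming logic of both scripts; the function names and the uploaded key are hypothetical, and posixpath is used in place of os.path so the result is the same on any operating system (S3 keys always use forward slashes).

```python
import posixpath


def processed_key(image_key):
    """Key written by the first function for a given image key."""
    return posixpath.splitext(image_key)[0] + '_processed_data.txt'


def result_key(text_key):
    """Key written by the second function for a given text key."""
    base = posixpath.splitext(posixpath.basename(text_key))[0]
    return base + '_chatgpt_result.txt'


image_key = 'raw_data/question.png'  # hypothetical upload
text_key = processed_key(image_key)
answer_key = result_key(text_key)

print(text_key)    # raw_data/question_processed_data.txt
print(answer_key)  # question_processed_data_chatgpt_result.txt
```

Note that the second function keeps only the file name, so any folder part of the key is dropped in the output bucket.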

4. Conclusion

Existing AWS serverless services such as Amazon Textract, Amazon Comprehend, Amazon Lex, and others are great candidates to integrate with OpenAI ChatGPT. In this demo, I wanted to show how easily this integration can be achieved without exceeding our budget.

5. Next steps

  1. Integrate AWS Lambda with AWS Secrets Manager for credentials
  2. Add Amazon SNS for multiple uploads
  3. Improve existing ChatGPT model
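
As a rough idea of what the first step could look like, the OpenAI key would be read from AWS Secrets Manager instead of an environment variable. This is a sketch under assumptions: the secret name openai/api-key is hypothetical, the key is assumed to be stored as a plain-text secret, and FakeSecretsClient is a stand-in so the example runs without AWS access. In the Lambda function, the client would be boto3.client('secretsmanager').

```python
def get_openai_key(secrets_client, secret_id):
    """Fetch the API key stored as a plain-text secret."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return response['SecretString']


class FakeSecretsClient:
    """Stand-in for boto3.client('secretsmanager') so the sketch runs offline."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret_value(self, SecretId):
        # Mimics only the fields of the real response that the sketch uses.
        return {'Name': SecretId, 'SecretString': self._secrets[SecretId]}


fake_client = FakeSecretsClient({'openai/api-key': 'sk-placeholder'})
api_key = get_openai_key(fake_client, 'openai/api-key')
```

The function's role would also need permission to read the secret (secretsmanager:GetSecretValue), and because the function runs in the private subnet, the call would leave through the NAT Gateway like the OpenAI request does.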

Happy Learning!