Around one year ago I created a blog post about the creation of a File Manager service. in this post we will use this service as our base and extend it with content moderation. We'll use GuardDuty and Rekognition to assist in this task. As usual everything will be serverless and event-driven.
Recap
To refresh everyone's memory let's start with a short recap.
The service will store files in S3 and keep a record of all the files in a DynamoDB table. The system overview looks like this, with and API exposing functionality to the user and then carry out the work in a serverless and event-driven way.
The upload flow is initiated by a client calling the API where a Lambda function will creat pre-signed S3 url that the client can use to upload the file. We don't upload file directly over the API, since Amazon API gateway has a max payload size of 10mb, and to support all kinds of files this will become a limitation.
When the client then uploads the file to S3 this will generate an event, the part below the dashed line in the image, this will invoke a StepFunction that will update the file inventory.
Extended architecture
In the extended architecture we add functionality to use GuardDuty S3 malware scanning and Rekognition for image moderation. GuardDuty will scan new files that arrive in the S3 bucket, that I call staging, a tag will will be added to the object and the scan result posted to the default event-bus. The scan result, if OK, will invoke a StepFunction next that utilize Rekognition for image moderation. I have implemented the same logic in this StepFunction and add a tag on the object and post an event onto a event-bus. Finally files are moved to either a quarantine ocr storage bucket.
Every part of the solution is decoupled and can run independently and a saga pattern is applied to move the logic to the next phase.
Now let's dig a bit deeper into each of the parts of this solution.
Malware scanning
The GuardDuty Malware scanning doesn't require much setup. This is a fully managed feature in GuardDuty and the only thing that is required is that a configuration of it. GuardDuty will then pick up new object automatically.
To achieve this flow the only thing we need to do is to create a S3MalwareProtectionPlan
and assign it appropriate permissions. One important thing to remember, if you encrypt your objects in S3 with a Customer Managed Key, don't forget to give GuardDuty permissions to decrypt using this key.
S3MalwareProtectionPlan:
Type: AWS::GuardDuty::MalwareProtectionPlan
Properties:
Actions:
Tagging:
Status: ENABLED
ProtectedResource:
S3Bucket:
BucketName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
Role: !GetAtt S3MalwareProtectionPlanRole.Arn
Image moderation
The moderation part with Rekognition involves a couple of more steps. A StepFunction is invoked by the result from the Malware scan, and call Rekognition to moderate the image. This StepFunction will then tag the object and post the scan result onto EventBridge custom service bus. One important thing to remember, if you encrypt your objects in S3 with a Customer Managed Key, don't forget to give permissions to decrypt using this key. Rekognition will give you a strange error Unsupported
and not a clear error to why it failed in this case.
To achieve this flow the only thing we need to do is to create the StateMachine
and setup the events it should be invoked on.
ModerateImageStateMachineStandard:
Type: AWS::Serverless::StateMachine
Properties:
DefinitionUri: StateMachine/moderation.asl.yaml
DefinitionSubstitutions:
EventBusName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
Tracing:
Enabled: true
Policies:
- Statement:
- Effect: Allow
Action:
- logs:*
Resource: "*"
- S3FullAccessPolicy:
BucketName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
- EventBridgePutEventsPolicy:
EventBusName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
- RekognitionDetectOnlyPolicy: {}
Events:
GuardDutyMalwareScanResult:
Type: EventBridgeRule
Properties:
InputPath: $.detail
Pattern:
source:
- aws.guardduty
detail-type:
- GuardDuty Malware Protection Object Scan Result
detail:
scanResultDetails:
scanResultStatus:
- NO_THREATS_FOUND
The StateMachine definition is rather large, and there is need for some magic. Since you can't append tags to an S3 object, we first need to fetch all existing tags, append our new tag and put the entire array of tags on the object. This would probably be easier to do in a Lambda function, but where is the fun in that. Intrinsic functions for the win....
Comment: Moderate images using Rekognition
StartAt: Debug
States:
Debug:
Type: Pass
Next: Get Object Metadata
Get Object Metadata:
Type: Task
Parameters:
Bucket.$: $.s3ObjectDetails.bucketName
Key.$: $.s3ObjectDetails.objectKey
Resource: arn:aws:states:::aws-sdk:s3:headObject
Next: Get Object Tags
ResultPath: $.S3MetaData
Get Object Tags:
Type: Task
Parameters:
Bucket.$: $.s3ObjectDetails.bucketName
Key.$: $.s3ObjectDetails.objectKey
Resource: arn:aws:states:::aws-sdk:s3:getObjectTagging
Next: Is File Supported?
ResultPath: $.s3Tags
Is File Supported?:
Type: Choice
Choices:
- Or:
- Variable: $.S3MetaData.ContentType
StringMatches: image/png
- Variable: $.S3MetaData.ContentType
StringMatches: image/jpeg
Next: Moderate Image
Default: File Not Supported
File Not Supported:
Type: Pass
Next: Add FILE_NOT_SUPPORTED to Object Tags Array
Parameters:
ContentTypes: []
ModerationLabels: []
ModerationModelVersion: "7.0"
ThreatsFound: "-1"
ResultPath: $.RekognitionModeration
Add FILE_NOT_SUPPORTED to Object Tags Array:
Type: Pass
ResultPath: $.scanResult
Parameters:
status: FILE_NOT_SUPPORTED
newTagSet.$: >-
States.StringToJson(States.Format('[{},{}]',
States.ArrayGetItem(States.StringSplit(States.JsonToString($.s3Tags.TagSet),
'[]'),0), '{"Key":"ImageModerationStatus","Value":"FILE_NOT_SUPPORTED"}'))
Next: Tag S3 Object
Moderate Image:
Type: Task
Parameters:
Image:
S3Object:
Bucket.$: $.s3ObjectDetails.bucketName
Name.$: $.s3ObjectDetails.objectKey
Resource: arn:aws:states:::aws-sdk:rekognition:detectModerationLabels
Next: File Supported
ResultPath: $.RekognitionModeration
File Supported:
Type: Pass
Parameters:
ThreatsFound.$: States.ArrayLength($.RekognitionModeration.ModerationLabels)
ContentTypes.$: $.RekognitionModeration.ContentTypes
ModerationLabels.$: $.RekognitionModeration.ModerationLabels
ModerationModelVersion.$: $.RekognitionModeration.ModerationModelVersion
ResultPath: $.RekognitionModeration
Next: Was Threats Found?
Was Threats Found?:
Type: Choice
Choices:
- Variable: $.RekognitionModeration.ThreatsFound
NumericGreaterThan: 0
Next: Add THREATS_DETECTED to Object Tags Array
Default: Add NO_THREATS to Object Tags Array
Add THREATS_DETECTED to Object Tags Array:
Type: Pass
Parameters:
status: THREATS_FOUND
newTagSet.$: >-
States.StringToJson(States.Format('[{},{}]',
States.ArrayGetItem(States.StringSplit(States.JsonToString($.s3Tags.TagSet),
'[]'),0), '{"Key":"ImageModerationStatus","Value":"THREATS_FOUND"}'))
ResultPath: $.scanResult
Next: Tag S3 Object
Add NO_THREATS to Object Tags Array:
Type: Pass
Next: Tag S3 Object
Parameters:
status: NO_THREATS_FOUND
newTagSet.$: >-
States.StringToJson(States.Format('[{},{}]',
States.ArrayGetItem(States.StringSplit(States.JsonToString($.s3Tags.TagSet),
'[]'),0), '{"Key":"ImageModerationStatus","Value":"NO_THREATS_FOUND"}'))
ResultPath: $.scanResult
Tag S3 Object:
Type: Task
Parameters:
Bucket.$: $.s3ObjectDetails.bucketName
Key.$: $.s3ObjectDetails.objectKey
Tagging:
TagSet.$: $.scanResult.newTagSet
Resource: arn:aws:states:::aws-sdk:s3:putObjectTagging
ResultPath: null
Next: Post Scan Result Event
Post Scan Result Event:
Type: Task
Resource: arn:aws:states:::events:putEvents
Parameters:
Entries:
- Detail:
metadata: {}
data:
id.$: $.s3ObjectDetails.objectKey
status.$: $.scanResult.status
scanData.$: $.RekognitionModeration
DetailType: Moderation Scan Completed
EventBusName: ${EventBusName}
Source: ImageModeration
End: true
ResultPath: null
Finalize the upload
The last part of this solution is to react to the moderation and place the content either in the quarantine bucket or the long term bucket. For this I use two different StepFunction with some difference in event that invokes them.
To achieve this flow a new StateMachine is created.
MoveFilesToPermanentStorageStateMachineStandard:
Type: AWS::Serverless::StateMachine
Properties:
DefinitionUri: StateMachine/move-to-permanent-storage.asl.yaml
DefinitionSubstitutions:
EventBusName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
StorageBucketName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:storage-bucket-name
StagingBucketName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
Tracing:
Enabled: true
Policies:
- Statement:
- Effect: Allow
Action:
- logs:*
Resource: "*"
- S3FullAccessPolicy:
BucketName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:staging-bucket-name
- S3FullAccessPolicy:
BucketName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:storage-bucket-name
- EventBridgePutEventsPolicy:
EventBusName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
Events:
NoModerationThreatsFoundEvent:
Type: EventBridgeRule
Properties:
EventBusName:
Fn::ImportValue: !Sub ${CommonInfraStackName}:event-bus-name
InputPath: $.detail
Pattern:
source:
- ImageModeration
detail-type:
- Moderation Scan Completed
detail:
data:
status:
- NO_THREATS_FOUND
With a StateMachine definition that is a bit easier to follow then the image moderation part.
Comment: Handle result and copy files to storage bucket
StartAt: Debug
States:
Debug:
Type: Pass
Next: Get Object Metadata
Get Object Metadata:
Type: Task
Parameters:
Bucket: ${StagingBucketName}
Key.$: $.data.id
Resource: arn:aws:states:::aws-sdk:s3:headObject
Next: CopyObject
ResultPath: $.S3MetaData
CopyObject:
Type: Task
Parameters:
Bucket: ${StorageBucketName}
CopySource.$: >-
States.Format('${StagingBucketName}/{}',$.data.id)
Key.$: $.data.id
Resource: arn:aws:states:::aws-sdk:s3:copyObject
ResultPath: null
Next: DeleteObject
DeleteObject:
Type: Task
Parameters:
Bucket: ${StagingBucketName}
Key.$: $.data.id
Resource: arn:aws:states:::aws-sdk:s3:deleteObject
ResultPath: null
Next: Post Event File Moved
Post Event File Moved:
Type: Task
Resource: arn:aws:states:::events:putEvents
Parameters:
Entries:
- Detail:
metadata: {}
data:
id.$: $.data.id
status: STORED
contentType.$: $.S3MetaData.ContentType
fileSize.$: $.S3MetaData.ContentLength
DetailType: Completed
EventBusName: ${EventBusName}
Source: ImageModeration
End: true
ResultPath: null
Conclusion
This was a short post on how I extended my previous built file manager with malware scanning and image moderation. Using only managed services made this a fairly easy task.
To get the full source code and deploy it your self, visit Serverless-Handbook Image Moderation
Final Words
Don't forget to follow me on LinkedIn and X for more content, and read rest of my Blogs
As Werner says! Now Go Build!
Top comments (0)