Testing strategies out there
When it comes to testing software in general, and testing AWS Serverless Microservices more specifically, there are all kinds of ways, shapes and forms to implement testing strategies:
- pyramid of testing
- ice-cream cone strategy (mmmm, yummy)
- honeycomb strategy
- black box testing strategy
- white box testing strategy
- & so many more (regression, smoke, exploratory, performance).
To the untrained eye, it might sound like throwing a party in your backyard, where a magician is gonna put on a performance with black & white boxes, smoke, honey and, of course, the well-deserved ice-cream at the end.
Now, with the risk of being a party pooper, let's put magic aside and explore some practical testing strategies that might help you test your code better.
But first, let's understand the why.
Why should you write tests?
To answer that question more deeply, let me first tell you a quick story.
Story time
It's been close to 2 years now since we started chopping a Java monolith into multiple smaller microservices and microfrontends. Back then, we decided to go forward with AWS Serverless & React for our technology stack, thinking it would take no longer than 6 months to slay the beast. Oh boy, how wrong we were! But more on that later.
When we were doing the first iterations of this big migration job, time and speed were of the essence. One of our teams took it personally and did a tremendous job laying down a strong foundation for the other teams to join the migration soon after. Time was flying by, and more & more features were being migrated blazingly fast to what soon became our new platform, epilot 360, AKA "the new world" for us engineers.
Everybody was enthusiastic about it, champagne was popping all over the place, and our clients could not have been happier with this MVP, where all their dreams and wishes came true in a matter of weeks, days and sometimes hours.
As time passed, slowly but surely this entire approach of "move fast, make clients happy" started to show its weaknesses. More and more problems appeared, and both we and our clients started experiencing a feeling of shakiness and instability: "I hope it works when I demo this feature to the clients", "I thought we fixed this problem last week. Why is it happening again?", "Client feedback: the platform is awesome, but it's not really stable right now".
Slowly, reality sank in and we came to the understanding that "we need to treat quality more seriously". That's when everybody, starting with the product team and ending with the engineers, started shifting their mindset more towards quality. Viljami, our Head of Engineering, introduced a new Engineering Principle: "Every week is quality week".
P.S.: More about the principles we follow at epilot here.
Reasons
A good visualisation of why you should write tests can be seen here:
We want to:
- test our features actually work -> make our users happy!
- be confident when releasing something, provide a good sense of stable software.
- apply changes to code without fear of regressions
How to write tests?
CI/CD
As I've said in the intro, there are many strategies for writing tests and for integrating them. Before going through them more deeply, let's first have a look at how to integrate those tests into the CI pipelines.
At epilot, we have developed & integrated a 4-layer testing strategy directly into our CI/CD pipelines. It looks like this:
Any commit to the main branch will trigger the pipeline to run. First, the unit tests & integration tests run in parallel. If they succeed, the pipeline deploys the code to the dev and staging environments. This is where the API tests (a.k.a. integrated tests) run first, trying to spot any problems on the APIs, e.g. non-versioned breaking changes. If everything goes well, the last testing layer runs: the E2E UI tests. If all tests pass, the code gets deployed to production and the pipeline ends with success.
P.S.: Please keep in mind this is not an exhaustive release model; other factors come into play (feature flags, A/B testing & other post-release testing, canary releases), but it's good enough to make the case for testing.
Now, let's have a look at the testing strategies themselves.
Unit Tests
At the most basic level, you can write unit tests. There are 2 kinds of unit tests: solitary & sociable unit tests.
Solitary unit tests -> test every module in complete isolation, by mocking all dependent modules (including your own written code).
Sociable unit tests -> test modules & module communication, by mocking only the external modules, but letting the test run through the internal code base.
Quick note: some people might prefer the terms unit & component testing here, but I think they still fall under the category of units of work in the end. Solitary & sociable sound more appealing to me in this case. Martin Fowler has a good article here about sociable & solitary unit tests.
Show me the code
Let's consider this AWS Lambda handler, which processes an API Gateway request.
import { APIGatewayProxyEventV2 } from 'aws-lambda'

import { parseToken } from './utils'
import { processRequest } from './request-service' // illustrative path for the internal processing module

export const handler = async (event: APIGatewayProxyEventV2) => {
  const { tenantId } = parseToken(event.headers['Authorization'])

  if (!tenantId) {
    return { statusCode: 400, body: JSON.stringify({ message: 'Authorization header is missing tenant id information' }) }
  }

  // do something with tenantId
  const resp = await processRequest(tenantId, event)

  return { statusCode: 200, body: JSON.stringify(resp) }
}
Solitary tests
By following the solitary unit testing strategy, we would have to mock every external dependency, including our own parseToken utility method.
import { APIGatewayProxyEventV2 } from 'aws-lambda'

import { handler } from './handler' // illustrative path for the lambda handler above
import { parseToken } from './utils'

jest.mock('./utils')

const mockParseToken = jest.mocked(parseToken)

describe('Lambda handler', () => {
  it('handles requests with correctly formatted authorization header', async () => {
    // given
    mockParseToken.mockReturnValueOnce({ tenantId: 'tenant-1' })
    // mock every other dependency...
    const request = { headers: { Authorization: 'Bearer <token>' } } as unknown as APIGatewayProxyEventV2

    // when
    const response = await handler(request)

    // then
    expect(response.statusCode).toBe(200)
  })
})
This is not only tedious work for us developers, but in reality it brings little to no value and might not catch many bugs. A good exception here is when the function under test is complex in itself.
More often than not, it's good to favour sociable tests over solitary tests, since you're interested in the correctness of the microservice implementation, including the integration between internal modules. My functions might handle inputs correctly, but what's the point if they don't collaborate well together?
Sociable tests
Let's pick a more complex testing example, using a microservice I've been working on recently: sharing data. Without going into too many details, the microservice code structure looks like this:
There are some internal modules where the business logic is written (sharing-service, grants-service), but other microservices are being invoked as well: the User, Entity & Permissions microservices. We will mock all the external APIs, including DynamoDB queries.
Pretty much, the code looks like this (excluding helper functions):
import { DynamoDBStreamHandler } from 'aws-lambda';
import { unmarshall } from '@aws-sdk/util-dynamodb'; // converts DynamoDB AttributeValue maps to plain objects
import { isEmpty } from 'lodash';
// internal helpers (Log, fromDbItem, shouldRegenerateGrants, the *-service modules, ...)
// come from the microservice's own modules and are omitted here for brevity

export const handler: DynamoDBStreamHandler = async (event, context, callback) => {
  try {
    return await _handleStreamRecord(event, context, callback);
  } catch (err) {
    Log.error('Failed to generate permission grants', err);
    throw err; // re-throw error for Lambda to retry for up to 3 times
  }
};

const _handleStreamRecord: DynamoDBStreamHandler = async (event) => {
  const oldImage = unmarshall(event.Records[0].dynamodb.OldImage ?? {});
  const newImage = unmarshall(event.Records[0].dynamodb.NewImage ?? {});

  if (isEmpty(oldImage) && isEmpty(newImage)) {
    return Log.warn('Processing stream data is invalid! No OLD & NEW items');
  }

  const oldConfig = !isEmpty(oldImage) ? fromDbItem(oldImage) : undefined;
  const newConfig = !isEmpty(newImage) ? fromDbItem(newImage) : undefined;

  const isPartnerConfigDeleted = event.Records[0].eventName === 'REMOVE' && oldConfig && !newConfig;
  if (isPartnerConfigDeleted) {
    return await deletePartnerRole(oldConfig);
  }

  const { regenerate, reason } = shouldRegenerateGrants(event.Records[0], oldConfig, newConfig);
  if (!regenerate) {
    return Log.info(`No need to regenerate grants: ${reason}`);
  }

  const newGrants = await buildSharingGrants({ sharingConfig: newConfig });
  const { exists, partnerRole } = await checkPartnerRoleExists(newConfig);

  if (!exists) {
    const partnerRole = await createPartnerRole({ sharingConfig: newConfig, grants: newGrants });
    const partnerUserIds = await getUserIds(newConfig.partner_org_id);

    await assignRoleToUsers({ role: partnerRole, userIds: partnerUserIds });

    newConfig.generated_role_id = partnerRole.id;
    await updateGeneratedRoleId(newConfig);
  } else {
    await updatePartnerRole({
      partnerRole,
      newGrants,
    });
  }
};
By following the sociable tests strategy, we will mock only the most external boundaries (external calls to other APIs), but let the code run freely through every internal module.
import * as DB from '@epilot/sharing-db';
import { assignRoleToUsers, deleteRole, loadRole, putRole } from './permissions/permissions-service';
import { getUserIds } from './user/user-service';
jest.mock('./user/user-service');
jest.mock('./permissions/permissions-service');
jest.mock('@epilot/sharing-db');
const mockedLoadRole = jest.mocked(loadRole);
const mockedPutRole = jest.mocked(putRole);
const mockedAssignRole = jest.mocked(assignRoleToUsers);
const mockedDeleteRole = jest.mocked(deleteRole);
const mockedGetUserIds = jest.mocked(getUserIds);
const mockedDb = DB as jest.Mocked<typeof DB>;
By having this kind of setup, you can test how your microservice behaves as a single unit, instead of testing smaller parts of the microservice and then hoping the collaboration between them does not break.
it('deletes generated role when config is deleted', async () => {
  // given
  const event = toStreamEvent({
    eventName: 'REMOVE',
    dynamodb: {
      OldImage: marshall({
        PK: 'ORG#1',
        SK: 'PARTNER#1',
        template_role_id: 'role',
        generated_role_id: 'generated',
      }),
      NewImage: undefined,
    },
  });

  // when
  await handler(event, defaultContext, defaultCallback);

  // then
  expect(mockedLoadRole).not.toHaveBeenCalled();
  expect(mockedPutRole).not.toHaveBeenCalled();
  expect(mockedAssignRole).not.toHaveBeenCalled();
  expect(mockedGetUserIds).not.toHaveBeenCalled();
  expect(mockedDeleteRole).toHaveBeenCalledWith(...);
});
it('shares data between tenants', async () => {
  // given
  const previous: DbItem = {
    PK: 'ORG#100',
    SK: 'PARTNER#200',
    template_role_id: 'role',
  };

  // allow access to contract-1
  const current: DbItem = {
    ...previous,
    data: [{ schema: 'contract', entity_id: 'contract-1' }],
  };

  mockedGetUserIds.mockResolvedValueOnce(['77', '88']);

  // when
  await handler(
    toStreamEvent({
      eventName: 'MODIFY',
      dynamodb: {
        OldImage: marshall(previous),
        NewImage: marshall(current),
      },
    }),
    defaultContext,
    defaultCallback,
  );

  // then
  expect(mockedPutRole).toHaveBeenCalledWith({...});
  expect(mockedAssignRole).toHaveBeenCalledWith({
    role: expect.objectContaining({
      type: 'partner_role',
      organization_id: '100',
      partner_org_id: '200',
    }),
    userIds: ['77', '88'],
  });
  expect(mockedDb.updatePartnerConfigAttr).toHaveBeenCalledWith(
    {
      sharing_org_id: '100',
      partner_org_id: '200',
    },
    { attr_name: 'generated_role_id', attr_value: '100:data_shared_with:200' },
  );
});
If the lambda is really slim, with little to no data processing (think of a lambda that just stores data into DynamoDB), you might skip unit tests & jump directly to integration tests.
Integration tests
The next layer of testing is integration testing. This is more of a high-effort, high-reward kind of strategy. It might take some effort to do the setup, but it pays off by catching a completely different set of bugs.
The main idea of integration tests is to test the microservice code, i.e. the main lambda handler & all the internal code, database queries, queue messages, published events, etc., in integration.
For instance, you might have a classical REST CRUD API which does some data validation and processing, followed by persisting the data in a DynamoDB table. Your unit tests might verify the processing behaviour is all good and polished, but when testing the API methods in integration, e.g. deleting a resource & then verifying it's deleted by calling GET, you realise the functionality is not really working: the resource is still persisted after calling delete.
Since this kind of testing requires some kind of local virtualised cloud, you can leverage Docker & LocalStack. This setup will run your microservice resources in a virtualised cloud, both on your machine and during pipeline execution.
Keep in mind, this style of testing does require mocking resources which are out of the microservice's boundary: other APIs within the platform, third party APIs, other microservices' queues, EventBuses, etc.
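To make this more concrete, here is a minimal sketch of the "delete it, then prove it's gone" scenario from above, assuming LocalStack is running on its default endpoint. The handler imports, the table name and the toApiEvent helper are illustrative placeholders, not our real code.

// Integration test sketch against LocalStack (assumed at http://localhost:4566).
// Handler paths, table name and the toApiEvent helper are placeholders.
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, GetCommand } from '@aws-sdk/lib-dynamodb';

import { createHandler, deleteHandler, getHandler } from '../src/handlers';

// LocalStack exposes all AWS services behind one local endpoint and accepts dummy credentials
const db = DynamoDBDocumentClient.from(
  new DynamoDBClient({
    region: 'eu-central-1',
    endpoint: 'http://localhost:4566',
    credentials: { accessKeyId: 'test', secretAccessKey: 'test' },
  }),
);

// builds just enough of an API Gateway event for the handlers under test
const toApiEvent = (partial: Record<string, unknown>) => partial as any;

describe('resource lifecycle (integration)', () => {
  it('actually removes the resource on DELETE', async () => {
    // given: a resource created through the real handler and persisted in LocalStack
    const created = await createHandler(toApiEvent({ body: JSON.stringify({ name: 'test-resource' }) }));
    const { id } = JSON.parse(created.body);

    // when: the delete endpoint is called
    const deleted = await deleteHandler(toApiEvent({ pathParameters: { id } }));
    expect(deleted.statusCode).toBe(204);

    // then: GET reports 404 and the item is really gone from the table itself
    const fetched = await getHandler(toApiEvent({ pathParameters: { id } }));
    expect(fetched.statusCode).toBe(404);

    const item = await db.send(new GetCommand({ TableName: 'resources', Key: { id } }));
    expect(item.Item).toBeUndefined();
  });
});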
API Tests
API tests, or integrated tests, are the phase where we test the microservice integrated into the overall system, with all the other microservices.
This setup requires testing the code against real AWS Cloud resources, so we have to deploy the microservice to a test environment. Here at epilot, we are leveraging the staging environment, where we've set up a separate account within our platform, only for testing purposes. For external third party APIs, we also use staging, or test, accounts on their side.
For writing API tests, there are multiple tools and strategies one can follow. Here at epilot, we are leveraging Datadog Synthetics, which allows for sending HTTP requests, opening WebSockets and making gRPC calls. Example of a multistep test defined with Datadog:
For more advanced tests, e.g. where we deal with Step Functions executions, we simply leverage jest in combination with axios and the AWS SDK for JavaScript. This setup allows for inspecting various AWS resources via the AWS clients. E.g.:
import AWS from 'aws-sdk'
import axios from 'axios'
import { waitFor } from 'poll-until-promise'

const LOAD_STATUS_TIMEOUT = 30 * 1000 // 30s. State machine executions are triggered asynchronously, so we should give enough time to wait for finishing the processing part
const RETRY_INTERVAL = 2 * 1000 // 2s. Retry every 2s for state machine status to see if it's finished

const stepFunc = new AWS.StepFunctions({
  region: process.env.REGION
})

export const getStateMachineExecution = async (
  executionId: string
): Promise<AWS.StepFunctions.DescribeExecutionOutput> => {
  const execution: AWS.StepFunctions.DescribeExecutionOutput = await waitFor(
    async () => {
      const execution = await stepFunc
        .describeExecution({ executionArn: executionId })
        .promise()

      if (execution.status === 'RUNNING') {
        throw new Error('Try again')
      }

      return execution
    },
    { interval: RETRY_INTERVAL, timeout: LOAD_STATUS_TIMEOUT }
  )

  return execution
}
describe('Resources are processed asynchronously', () => {
  it('creates resource async and processes it correctly', async () => {
    const payload = {...};
    const resp = await axios.post(process.env.API_URL, payload);
    expect(resp.status).toBe(202)

    const execution_id = resp.data.execution_id

    // verify state machine execution
    const execution = await getStateMachineExecution(execution_id)

    // verify state machine status
    expect(execution.status).toBe('SUCCEEDED')

    const executionOutput = JSON.parse(execution.output)

    // verify state machine output
    expect(executionOutput.processingData.customer.id).toEqual(
      expect.any(String)
    )
    expect(executionOutput.processingData.opportunity.id).toEqual(
      expect.any(String)
    )
  })
})
With this setup in place, you will also have to think about how to set up test data, and then clean it up after the tests run. Some of that might be achieved more easily via Delete API calls, but other times you have to leverage the AWS SDK clients. Furthermore, you will have to answer questions like: are my operations idempotent? Will it work if I call the Put/Delete endpoint multiple times, or when there is no data to delete?
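As a small sketch of the cleanup part, a teardown like this can double as an idempotency check. The /resources route, the accepted status codes and the way IDs are collected are assumptions for illustration, not our real API.

// Clean up test data after an API test run and, at the same time, verify DELETE is idempotent.
import axios from 'axios';

const api = axios.create({ baseURL: process.env.API_URL, validateStatus: () => true });

// tests push the IDs of resources they created into this list
const createdIds: string[] = [];

afterAll(async () => {
  for (const id of createdIds) {
    // the first delete removes the resource...
    const first = await api.delete(`/resources/${id}`);
    expect([200, 204]).toContain(first.status);

    // ...and a second delete of the same resource must not blow up
    const second = await api.delete(`/resources/${id}`);
    expect([204, 404]).toContain(second.status);
  }
});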
E2E Browser Tests
Everybody has a certain opinion about E2E browser tests. Some people find them more of an anti-pattern, due to the ice-cream cone testing strategy that often gets built around them, while others heavily rely on them.
I think both parties have their pros and cons, but I've realised that, in practice, there is a need for them too. The strategy I personally follow here is to rely on them as smoke tests: verifying the basic features still work. What's a basic feature? That's up to you and your team to decide.
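To make "smoke test" a bit more concrete, here is a minimal sketch of such a check. The article doesn't prescribe a browser testing tool; Playwright is just one option here, and the URL, credentials and selectors below are invented for illustration.

// Minimal smoke test sketch with Playwright; app URL, test user and selectors are placeholders.
import { test, expect } from '@playwright/test';

test('user can log in and see the dashboard', async ({ page }) => {
  await page.goto(process.env.APP_URL ?? 'https://staging.example.com');

  await page.getByLabel('Email').fill(process.env.TEST_USER ?? 'smoke@example.com');
  await page.getByLabel('Password').fill(process.env.TEST_PASSWORD ?? 'secret');
  await page.getByRole('button', { name: 'Log in' }).click();

  // the one assertion that matters: the core screen renders after login
  await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
});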
From experience, I'm more inclined to think that having lots of E2E tests will cause more pain in the future than the problems it solves.
Why?
There are various reasons:
- these tests are brittle by nature: they cover a wide range of features by default, so errors in any feature, whether UI or API in nature, propagate as test failures.
- they can become a blocker for releases: some basic functionality is broken, so it blocks me from releasing an unrelated fix/feature.
Thus, be wary of how much you rely on them and how many you write.
Conclusions
Having a good understanding of why tests matter, not only in terms of system quality, but also in terms of overall success and customer satisfaction, is essential for any engineer to grasp.
When it comes to strategies, there are various ones you can employ. What's important for us engineers is to grasp why we use a certain strategy: what benefits we are looking for, and what challenges there are in maintaining such a strategy.
If I were to pick a winning shape, I'd go for a diamond shape, split into 5 layers:
P.S.
If passion for software is what describes you, we are looking for you. Our promise to you.