I've been waiting to find the time to write about the shared VPC model that AWS offers. It's now been close to 3 years since we built the setup for a client, and we have been running it ever since.
Key challenges that we wanted to tackle with the solution:
- Low cost network setup
- Simple
- Secure
After a few years, I can say the low-cost aspect is there, but simple to maintain it is not - far from it. And from the security aspect, limiting traffic is a lot harder than one would expect, because by default AWS allows and routes all traffic within a VPC.
The more traditional way
We're all familiar with using a transit VPC or a Transit Gateway solution to connect VPCs from accounts in the organisation together. I won't dive into this; if you were looking for that solution, do another Google search :)
The picture below is just a reminder of how best practices would advise you to build connectivity.
What if we just use one VPC, shared to multiple accounts?
We built an organisation with several accounts. For network purposes we have account A for networking, and for simplicity let's assume we have accounts B, C and D for various environments.
Borrowed a picture to illustrate this a bit.
Sharing a subnet within the organization
Sharing a subnet is actually really simple. The only prerequisite is that resource sharing within the organization must be enabled in Organizations. After it's enabled, you just specify which subnets to share and the targets to share them with.
Here's an example in CloudFormation.
ResourceShareSubnets:
  Type: 'AWS::RAM::ResourceShare'
  Properties:
    AllowExternalPrincipals: false
    Name: subnet-share
    Principals:
      - "<target account id>"
    ResourceArns:
      - !Sub 'arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:subnet/${Subnet1}'
      - !Sub 'arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:subnet/${Subnet2}'
      - !Sub 'arn:aws:ec2:${AWS::Region}:${AWS::AccountId}:subnet/${Subnet3}'
After you share the subnets, they become visible in the target principal's account, and resources can be created in them.
The Good: Cost aspects and latency
Instead of paying the current list price for Transit Gateway attachments ($0.05 per hour per attachment) plus $0.02 per GB of data processed, network traffic between accounts costs only the $0.01 per GB that traffic costs within the same region.
So for high traffic volumes between environments, having them run in the same VPC, but in their own accounts, saves on traffic costs.
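To make the comparison concrete, here is a rough back-of-the-envelope sketch using only the list prices quoted above. The 730 hours per month, the attachment count and the traffic volume are my own illustrative assumptions, and the model ignores every other charge, so treat it as an estimate rather than a bill.

```python
# Rough monthly cost comparison based on the list prices quoted in this post.
HOURS_PER_MONTH = 730           # assumption: average number of hours in a month

TGW_ATTACHMENT_PER_HOUR = 0.05  # USD per attachment per hour (from the post)
TGW_PER_GB = 0.02               # USD per GB processed by Transit Gateway (from the post)
SHARED_VPC_PER_GB = 0.01        # USD per GB inside the shared VPC (from the post)


def transit_gateway_monthly_cost(attachments: int, gb_per_month: float) -> float:
    """Attachment hours plus data processing for one month."""
    attachment_cost = attachments * TGW_ATTACHMENT_PER_HOUR * HOURS_PER_MONTH
    return attachment_cost + gb_per_month * TGW_PER_GB


def shared_vpc_monthly_cost(gb_per_month: float) -> float:
    """Only the per-GB traffic charge applies in the shared VPC model."""
    return gb_per_month * SHARED_VPC_PER_GB


if __name__ == "__main__":
    # Hypothetical example: 4 attached VPCs and 10 TB of traffic per month.
    attachments, traffic_gb = 4, 10_000
    tgw = transit_gateway_monthly_cost(attachments, traffic_gb)
    shared = shared_vpc_monthly_cost(traffic_gb)
    print(f"Transit Gateway: ${tgw:.2f}, shared VPC: ${shared:.2f}")
```

The fixed attachment fee dominates at low volumes, while the per-GB difference is what matters once traffic between environments grows.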
Latency is also a bit lower:
rtt min/avg/max/mdev = 0.572/0.614/0.675/0.031 ms
compared to going through Transit Gateway in the same AZ:
rtt min/avg/max/mdev = 0.877/0.994/1.400/0.150 ms
If you're trying to minimise latency, take a look at sharing the VPC.
The Bad: Operating and managing the network
Segmentation and limiting traffic are a bit painful, to put it mildly.
You can use NACLs around the subnets to limit traffic, but that's about it. From a network engineer's perspective that may be enough in some cases, but in others it's not. It depends on who you talk to, and what kind of architecture and security they require.
Roughly put - we blocked traffic between environments B, C and D, and allowed them to talk only to A. Traffic goes in and out to the internet from A through the appliances running there.
The appliances also route traffic between the subnets, as the subnets themselves can't talk to each other directly.
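As an illustration of the "B, C and D may not talk to each other" rule, here is a minimal sketch that plans NACL entries for one environment's subnets. It is not our actual rule set. The CIDR ranges, the rule numbers and the final allow-all rule are assumptions of mine, and it does not model how the appliances in A forward traffic. The dictionary keys follow the parameter names of the EC2 CreateNetworkAclEntry call; `plan_nacl_entries` itself is a hypothetical helper.

```python
import ipaddress


def plan_nacl_entries(env_cidrs: dict[str, str], env: str,
                      network_env: str = "A") -> list[dict]:
    """Plan NACL entries for `env`: deny every peer environment, allow the rest.

    NACLs are stateless, so each deny is emitted for both directions.
    Lower rule numbers are evaluated first, so denies come before the allow.
    """
    entries = []
    rule_number = 100
    peers = sorted(name for name in env_cidrs if name not in (env, network_env))
    for peer in peers:
        # ip_network() raises ValueError on a malformed CIDR.
        cidr = str(ipaddress.ip_network(env_cidrs[peer]))
        for egress in (False, True):
            entries.append({
                "RuleNumber": rule_number,
                "Protocol": "-1",       # all protocols
                "RuleAction": "deny",
                "Egress": egress,
                "CidrBlock": cidr,
            })
        rule_number += 10
    # Assumption: everything else (account A and the path out through it) stays open.
    for egress in (False, True):
        entries.append({
            "RuleNumber": 1000,
            "Protocol": "-1",
            "RuleAction": "allow",
            "Egress": egress,
            "CidrBlock": "0.0.0.0/0",
        })
    return entries


if __name__ == "__main__":
    # Hypothetical CIDR ranges, one per environment.
    cidrs = {"A": "10.0.0.0/24", "B": "10.0.1.0/24",
             "C": "10.0.2.0/24", "D": "10.0.3.0/24"}
    for entry in plan_nacl_entries(cidrs, "B"):
        print(entry)
```

Planning the entries as plain data first makes it easy to review them before anything is applied to a real NACL.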
The Ugly: Tags are not copied over
1) The most annoying feature of the shared VPC model is that RAM only shares the subnet resources with the target account. It doesn't share the resource tags, which means you see the resource IDs in the target account but nothing else.
Let's say you have an environment that has multiple subnets shared to an account, for different purposes. Identifying those subnets becomes a pain. You might be running a Kubernetes cluster that wants networks to be tagged in certain ways for it to know where to place resources.
We solved this by deploying a role across the whole organisation for copying tags, and triggering a Lambda from EventBridge events associated with network tag changes.
Type: AWS::Events::Rule
Properties:
  Description: "Rule for matching Network TAG changes"
  EventPattern:
    source:
      - "aws.tag"
    detail-type:
      # Tag events from the aws.tag source use this detail-type
      - "Tag Change on Resource"
    detail:
      service:
        - "ec2"
      resource-type:
        - "vpc"
        - "subnet"
        - "route-table"
        - "network-acl"
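A Lambda triggered by a rule like this receives the tag-change event as a dictionary. The sketch below is not the client's code. It only shows one way to pull out the resource ID, the changed keys and the current tags, assuming the standard `aws.tag` event fields (`resources`, `detail.changed-tag-keys`, `detail.tags`). `parse_tag_change_event` is a hypothetical helper, and the sample event is made up.

```python
def parse_tag_change_event(event: dict) -> list[dict]:
    """Extract one record per tagged resource from a tag-change event."""
    detail = event.get("detail", {})
    parsed = []
    for arn in event.get("resources", []):
        # ARN layout: arn:aws:ec2:<region>:<account>:<resource-type>/<resource-id>
        resource = arn.split(":", 5)[5]
        resource_type, _, resource_id = resource.partition("/")
        parsed.append({
            "resource_id": resource_id,
            "resource_type": resource_type,
            "changed_keys": list(detail.get("changed-tag-keys", [])),
            "tags": dict(detail.get("tags", {})),
        })
    return parsed


if __name__ == "__main__":
    # Made-up sample event for illustration only.
    sample = {
        "source": "aws.tag",
        "detail-type": "Tag Change on Resource",
        "resources": ["arn:aws:ec2:eu-west-1:111122223333:subnet/subnet-0abc"],
        "detail": {
            "changed-tag-keys": ["Name"],
            "service": "ec2",
            "resource-type": "subnet",
            "tags": {"Name": "private-a"},
        },
    }
    print(parse_tag_change_event(sample))
```

A real handler would also need to deal with malformed ARNs; this sketch simply raises an `IndexError` on them.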
The permissions you'll need are below, for a solution that first checks which resources have been shared and how they are tagged, and then creates or deletes only those tags that need to change.
Note that the example below is not least privilege. It would allow tagging of all EC2 resources, so update yours where needed to allow only the required resources.
PolicyDocument:
  Version: "2012-10-17"
  Statement:
    - Effect: "Allow"
      Action:
        - ec2:CreateTags
        - ec2:DeleteTags
        - ec2:DescribeTags
        - ec2:DescribeSubnets
        - ec2:DescribeVpcs
        - ec2:DescribeRouteTables
        - ec2:DescribeNetworkAcls
      Resource: "*"
To limit the number of requests on each update, our flow is to check which tags have already been copied, which have changed, and which even exist in the target account.
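The core of that flow is a plain comparison of two tag sets. Here is a minimal sketch of the idea, not the client's code. It assumes the owner account's tags are the source of truth, so extra tags in the target account are removed. `plan_tag_changes` is a hypothetical helper, and tags with the reserved `aws:` prefix are skipped because they cannot be edited.

```python
def plan_tag_changes(source_tags: dict[str, str],
                     target_tags: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Return (tags_to_create_or_update, tag_keys_to_delete) for one resource.

    source_tags: tags on the resource in the owner (network) account.
    target_tags: tags currently visible on the same resource in the target account.
    """
    def manageable(key: str) -> bool:
        # Tags prefixed with "aws:" are reserved and cannot be created or deleted.
        return not key.startswith("aws:")

    # Create or update only the tags that are missing or differ in the target.
    to_create = {
        key: value
        for key, value in source_tags.items()
        if manageable(key) and target_tags.get(key) != value
    }
    # Assumption: anything not present in the source is stale and gets removed.
    to_delete = sorted(
        key for key in target_tags
        if manageable(key) and key not in source_tags
    )
    return to_create, to_delete


if __name__ == "__main__":
    # Made-up tags for illustration only.
    source = {"Name": "private-a", "kubernetes.io/role/internal-elb": "1"}
    target = {"Name": "old-name", "stale": "x"}
    print(plan_tag_changes(source, target))
```

If both results are empty, no API call is needed at all, which is what keeps the request count down.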
Wish I could share the code for that - but sadly the IPR for that is with the client. But you get the idea what you need to accomplish - so go build.
2) Another ugly duckling is GuardDuty false positives
In our setup, internet traffic is routed through a VPN to the client's datacenter and goes out from there, as everything is required to pass through the firewalls there. In the future the firewalls will be extended to AWS.
GuardDuty treats instance role credentials as a possible security issue when they are used outside the environment where they reside.
So our instances calling services to make changes pop up in GuardDuty and raise a flag.
I'm not 100% sure whether this would also happen when an instance role is used from the wrong account, or whether it is just because we appear to use the credentials from outside AWS itself.
It makes sense, though: those credentials shouldn't be used anywhere but within the account in question, so routing traffic to other accounts or out through an on-site VPN connection should raise a flag.
Afterthoughts
Would I do this again?
Short answer: no, and you shouldn't do this either - unless you're sure it fits your use case perfectly.
Would I have done this in the first place?
No, I wouldn't. However, the client insisted on building the network this way.
After doing it I do see the ingenuity in it, but I still think it is far too hard to operate, and no one really has full insight into how everything works.