After many months of development, my team just announced the general availability of our platform. That milestone seems like a perfect opportunity ...
For further actions, you may consider blocking this person and/or reporting abuse
Thanks for the insights. I wasn't aware of the process isolation, as far as I'm aware every new request starts a new lambda environment in a new container.
That's what surprised us, too - we thought every Lambda got a new environment in a new container. It turns out if you invoke a Lambda that you haven't in a while, it "cold starts", so you get a new environment. Then, that Lambda sits around "warm" waiting for more invocations. That same environment might be used several times before it gets removed from the pool.
That's usually fine, since Lambdas tend to be stateless for most use cases, but in our case state could potentially be mucked with by a user's custom code that we execute.
Did you try a custom runtime?
We didn't. What has your experience been with custom runtimes in Lambda?
I don't have much, but as far as I understood it a custom runtime is basically a HTTP API that passes event data to a "function" whatever that may be.
I'd guessed that you could have used a customer ID in the event data and have the custom runtime spin up isolated "functions" for every customer.
I wrote a thread about this a while back:
twitter.com/loujaybee/status/13463...
Interesting, wouldn't this Redis queue have solved this particular issue even within Lambda-land also? Or does it only work for ECS?
Great question - The Redis queue would have solved the size issue in Lambda, for sure. Is there a great way to invoke the next Lambda in line using a Redis queue? Looking through StackOverflow, it seems like people suggest leveraging SQS to invoke a series of Lambdas in series and pass data between them, but there may be better ways I'm not aware of related to leveraging a Redis queue.
I've heard good things about RSMQ (simple, fast queue abstractions on top of plain old redis). Their TLDR is:
Sounds like it could be a good match for your requirements. Perhaps you could setup an AWS step function to trigger the appropriate next lambda (with RSMQ ID and msg id in the payload to the subsequent lambda function). We use AWS step functions to great success for similar asynchronous lambda processing in a state machine.
With the addition of EFS for Lambda a lot of your problems can be solved with Lambda.
Latency should be vastly reduced doing network disk operations with EFS when you provision transfer rate and set to high iops.
process isolation can be an issue if you have code executed outside the handler functions, these will remain until the Lambda container is thrown away. If you require such isolation, this is where you need to cut code a lot and keep it in your execution handler.
That's a good point. EFS in Lambda is exciting.
WRT the process isolation thing, try running a test of this code in Lambda twice. The first time, you get a nice logged "Hello, world!". The second time you run it, console.log has been redefined and you get a less desirable "Your message has been hijacked".
gist.github.com/taylorreece/70ed16...
There aren't a lot of languages or runtimes where you'd want to allow endusers to hack the global scope. You can certainly use Lambda safely with process isolation by not creating globals and creating and setting any runtime variables inside your handler. Moving to ECS won't solve your problem. Polite suggestion: don't allow your customers to attach things to the global scope. NodeJS has support for isolating the vm or you can just regex the code.
Hey Matt, thanks for linking the
vm
module - it's good to know about. It seems like that should work, though the docs note:For our use case, where our platform runs customers' code which could contain anything, we've had to be a bit more heavy-handed with isolating our runtime environments. We ended up creating
chroot
jails and distinct node processes within our ECS containers to run our customers' code, so each run is guaranteed to not interact with any another.That makes sense and it's obvious that your business puts you in a position to do something that most apps would not want to do (execute untrusted enduser code). My comment was really in response to your gist above. The behavior of globals in Lambda is well documented and predictable. This didn't fit your rather unusual use case, but for most users, a quick read of the docs will arm them with what they need to understand process isolation in Lambda.
I think you did a great job of highlighting what I would say is the ideal path for AWS development nowadays. Start with Lambda, move to ECS if Lambda doesn't fit your needs, and only consider EC2-based applications as a last resort or a lift n' shift strategy for migration.
Thanks, Scott! Lambda sure make it easy to iterate quickly, without needing to sink hours into DevOps tasks, but it definitely does make sense for some use cases to sink the time needed into running on ECS/EC2/etc.
Great read. Would Amazon Step Functions have worked for your business use cases?
Hey Omri - thanks, and great question! I'll start with a disclaimer that I'm no expert in AWS Step Functions, so correct me if I get anything wrong :-)
Similar to Step Functions, Prismatic's platform allows you to build workflows (we call them integrations) where data flows through a series of steps. The integration's steps do a variety of things, like interact with third party APIs, mutate data, manage files, etc. Users can create branches and loops that run based on some conditionals, etc. - all of that is pretty similar to what you'd see in a Step Function state machine definition.
Our platform differs from Step Functions in a number of ways, too. Most notably, our users (typically B2B software companies) can develop one integration and deploy instances of the integration to their own customers. Those instances are configurable, and can be driven by customer-specific configuration variables and credentials, so one integration can handle a variety of customer-specific setups. We also offer tooling - things like logging, monitoring and alerting - so our users can easily track instance executions and can configure if/how they are notified if something goes wrong in their integrations.
It might have been possible to create some sort of shim that dynamically converted a Prismatic integration to a Step Functions state machine definition - I'd need to look more into Step Functions to figure out what difficulties we'd have there. The biggest thing keeping us from doing something like that is probably vendor lock-in. We have customers who would like to run Prismatic on other cloud platforms (or on their on-premise stacks), and implementing our integration runner as a container gives us more flexibility to migrate cloud providers as needed.
Good looking out on the vendor lock-in issue. And I was under the impression that Prismatic was only being sold as a cloud solution, but self-hosted options are great as well. Thanks for responding!
That's a very interesting read, thanks a lot!
Are you using Bull as queue library over Redis?
My experience of SQS is that it's great for decoupling services and queueing jobs when you don't have an end-user waiting for the job to complete. If you need near-real-time user experience, I'd similarly go for e.g. Redis, RabbitMQ or even Kafka. Does that sound reasonable?
Yep! We're big fans of Bull :-)
It gave me a lot of insights. Thanks for sharing.