Serverless Chats

Episode #6: Why Developers Need to Think About Cloud Costs with Erik Peterson

About Erik Peterson:

Erik Peterson is the CEO and founder of CloudZero. Previous to founding CloudZero, Erik was Director of Technology Strategy for Veracode and has nearly 20 years of software industry experience, including senior leadership and technology roles at HP, SPI Dynamics, GuardedNet and Sanctum. Erik has also held IT & InfoSec roles at Moody’s Investors Service, SunTrust Bank, U.S. Embassy Vienna, Austria and the United Nations International Atomic Energy Agency where he provided technical assistance to UN weapons inspectors.

Transcript:

Jeremy: Hi, everyone. I'm Jeremy Daly, and you're listening to Serverless Chats. This week, I'm chatting with Erik Peterson. Hey, Erik. Thanks for joining me.

Erik: Hey. Great to be here, Jeremy.

Jeremy: So you are the CEO at CloudZero in the great city of Boston. So why don't you tell the listeners a little bit about yourself and what CloudZero is up to?

Erik: Sure. So gosh, I'm a recovering AppSec person, actually, by trade. I think I spent 20 years in the security industry trying to move the needle on one thing, which was to get developers to care about security. I didn't necessarily start there, but I certainly thought a lot about application security through the years, and where I think the application security industry ended up is a good place, focused on the people who create the software that we care about. But about 10 years ago, maybe 11 now, in 2008, I got bit by the cloud bug and I started experimenting with AWS and taking that where I could take it. And I had the good fortune of bringing Veracode, the company I worked at before CloudZero, over into AWS and had a lot of fun doing that and learned a lot along the way. So, recovering AppSec person. Now true cloud connoisseur, I hope.

Jeremy: And what's CloudZero all about?

Erik: So CloudZero. It's pretty simple. It gets back to my roots. I want developers to care about cost, right? And so CloudZero, we're the first cloud optimization platform that is specifically built to tie engineering decisions directly to cloud cost. You look at a lot of cloud optimization solutions today, they're focused on the finance team or parts of the organization that are outside of the people who are actually making the decisions and writing the code. And so we want to empower DevOps teams to make smarter engineering and infrastructure decisions. And we do that by giving them a platform that allows engineers to understand in real time the cost ramifications of their actions. So it's a really powerful solution that, ultimately, is going to help the business manage costs, move faster and drive innovation. And we love developers. We're focused on that world.

Jeremy: Do you have any big features coming out that you want to share with the audience?

Erik: Yeah. So we are building a whole set of capabilities for engineering teams to get right into the details of what matters most to them, which is how much are the things that they're actually building costing them, and take out all of the noise. You know, today if you go look at Amazon's Cost Explorer; you look at another product. You see all this data related to cost. All I care about is what is the thing that I'm working on right now? What does it cost me? And how are my decisions affecting that? So we have a number of new dashboards that are coming up for that and a few other little surprises around the corner around anomaly detection coming out this summer.

Jeremy: Awesome. So I wanted to have you on to talk about an extremely exciting topic that I actually, surprisingly, am a little bit passionate about because I do see a tremendous amount of value in this. But I want to talk about cloud computing costs. And obviously, you have quite a bit of experience in this, but where I want to look at this is we now have this sort of very, very granular billing that goes well beyond what maybe a SaaS company might provide. And obviously, you have SaaS bills and that's a metric that you could use. But now that you have cost associated with every sort of cloud engineering action that you take, how do we need to think about this differently? Maybe let's start there.

Erik: So, you know, I think every cloud engineer should view their expertise in understanding the bill as something that they feel proud enough to put on their resume. You think about what was the very first Amazon service. A lot of times, it's really easy to say, "Oh, it's SQS or S3." No, it was actually Amazon Billing, right? Because...

Jeremy: That's a good point.

Erik: Amazon wasn't going to do anything if they couldn't bill you for it. And over time, they've figured out how to, like you say, get deeper into the metered billing. We have millisecond billing. EC2 used to be billed by the hour, and now can be billed even tighter than that. You go look at the reports. Everything is kind of normalized to the hour, and it's a little bit more complicated to figure out. But the key thing here is, whether we know it or not, as software architects, engineers, DevOps engineers, when we moved from on-prem to the cloud, we had a whole lot of constraints that just disappeared overnight. And you know, this decision about how much things cost used to be made for us. Somebody went and bought a bunch of servers. They put them in the basement, and that was all well and good. And then we just tried to maximize our usage of that resource. Now, somebody gave us an Amazon account, and our instincts, as engineers, are a little bit off, because our instincts are how can I get the fastest path to value for my customers, innovate quicker, build, you know, new capabilities. And your intuition is to expand to use all available resources in front of you in order to achieve that goal, right? It's certainly what your boss is telling you or the CEO is telling you. And so we go, "Oh, I have this infinite scale. Let's go nuts." The problem, the flip side of that, of course, is if I have infinite scale, I also need to have an infinite wallet, right? If I don't have an infinite wallet, then the reality is I don't actually have infinite scale. And so as engineers, I think we need to move past caring just about performance and uptime, and we need to add a third item to our list of operational metrics, and that's cost.

Jeremy: Yeah, and I don't know how you knew that I have a bunch of servers in my basement. They're all turned off now, but I literally have a bunch of old servers in my basement.

Erik: We all have some dirty, dirty little secrets.

Jeremy: Exactly. You wouldn't believe how many hard drives I have because I didn't want to throw them away when I closed down my data center. But so you mentioned⁠ - and I think this is important - you mentioned this idea of the sort of the purchasing decision, right? And in the past, it has always been okay, we need 100 servers. We need this many copies of Windows server or we need this Oracle license or whatever we need, and those things were fairly easy to plan for. And again, they were these purchasing decisions by sort of the, I guess, the C-levels or the purchasing department or something. And you would follow along with these...

Erik: The powers that be.

Jeremy: The powers that be. Yeah, and you'd have these budgets, but I think now it's more difficult to plan for. And we can get into this more in a few minutes. But I think what's really interesting and what has changed, at least from what I've seen, is now that we have this very detailed and granular billing, we could, like you said, use that cost as a KPI for our business to understand how much we're spending for every action that we're taking. And so you could actually see, you know, for a customer that uses X number of Lambda invocations, and this many SNS messages, and this many Step Functions executions, and this much data storage, you could calculate very, very closely how much each customer costs you from that cloud infrastructure standpoint. So I just think that's a really, really interesting thing that you can do now.
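As a rough illustration of the per-customer calculation Jeremy describes, the sketch below multiplies one customer's metered usage by a unit price for each service. The metric names, unit prices, and usage figures are all placeholder assumptions chosen for the example, not current AWS list prices or anything quoted in the episode.

```python
# Hypothetical unit prices in USD. These are placeholders for illustration,
# not current AWS list prices.
UNIT_PRICES = {
    "lambda_invocations": 0.20 / 1_000_000,        # per invocation
    "sns_messages": 0.50 / 1_000_000,              # per message
    "step_function_transitions": 25.00 / 1_000_000,  # per state transition
    "storage_gb_months": 0.25,                     # per GB-month
}


def customer_cost(usage: dict) -> float:
    """Sum metered quantity times unit price across one customer's usage."""
    return sum(UNIT_PRICES[metric] * quantity for metric, quantity in usage.items())


# Made-up monthly usage for a single customer.
usage = {
    "lambda_invocations": 3_000_000,
    "sns_messages": 1_000_000,
    "step_function_transitions": 40_000,
    "storage_gb_months": 20,
}

cost = customer_cost(usage)
print(f"Estimated monthly infrastructure cost for this customer: ${cost:.2f}")
```

A real calculation would also need to attribute shared resources and request duration, which this sketch ignores.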

Erik: If you're a SaaS vendor, you know your value delivery chain is built on top of cloud. That's your cost of goods. That's your gross margin. You need to understand that if you're going to deliver a profitable product to the market, and you want that conversation to be part of your entire organization because, I mean, the reality is that the buying decision is being made by your engineering team now, right? They choose: am I going to use this type of instance or that type of instance? Am I going to implement this kind of code or that kind of code? They make a buying decision every moment of every day. Essentially, with every line of code that they write, they're making a buying decision, and so you have to think about that. And then it gets even more complicated, though, because there are so many intertwined services, particularly in the serverless world. Honestly, I'm sure our listeners here will appreciate our point of view, which is that we believe serverless is the future of all computing. But, you know, it's even more powerful because you create these very interesting applications that are composed of lots of different services. It's not just Lambda compute. It's I have Lambda connected to SNS passing to SQS, DynamoDB, Kinesis, all these things flowing together. And I'm not just going to the cheat sheet on Amazon and saying, "Well, how much does it cost for one hour of compute?" to try to estimate my costs. No, I now have to think through that whole story, and I think it's kind of a shame that, for most organizations, they consider the state of the art there to be, well, let's just try it and see what happens. And a lot of times they try it in test and they go, "Oh, looks like it was gonna cost a couple bucks. Great. Let's ship it." And once it gets into production, it's a much different story. Organizations really struggle with this. It's unfortunate.

Jeremy: So speaking of organizations and struggling. So this is sort of like a cultural change, right? I mean, if we think of trying to get our developers to think about costs now. Because in the past it was, "I wrote some code. Here you go." And I think there were a lot of developers who did have, that they were cost-conscious about how much they were spending, you know, depending on how many services they were using and things like that. But I think it's a little bit different now. And you have a term for this, right? You call this FinDevOps? Is that sort of what you mean by FinDevOps? This idea of this cultural change.

Erik: Yes. So I've always viewed DevOps as the culture that comes along with a cloud-driven lifestyle and that it embodies a lot of things. And I've spoken a little bit about the relationship between the cloud and DevOps, and also SRE as being a practice that you can apply to that culture. What I thought was really missing from cloud culture was an appreciation for the spend, an appreciation for how much things cost, because there's a real tight relationship between a well-architected system and a cost-effective one. I've looked out over hundreds or maybe thousands of different systems now, and I'll tell you, every time, the first place I'll look is the bill, and it'll give me insight into the architecture and what's built, sometimes better than any other data source. And so the culture of caring about the cost of things, the financial aspects of it, was missing from the engineering discipline, I think, in a big way. And I wanted to kind of draw attention to that. And that's the whole point of FinDevOps. It's also about understanding. Over time, I want an organization to understand what's the true flow of capital through my system. If every customer transaction costs me $12, but I'm only charging my customers $3 per transaction, that capital flow is not going to work out well for me in volume over time, right? I need to find the balance. But yeah, I think FinDevOps has done a good job of kind of drawing attention to this. And I've spoken to a lot of engineers who appreciate this, but their introduction to cost has been somebody from the CFO's office coming down to their organization and then spending an hour yelling at everybody because the bill is too damn high. Meanwhile, that person leaves the room, and then the CEO walks in and says, "Why are you guys not delivering more value to the customer?" And it's a complete imbalance between the two organizations. Everybody needs to be able to have one kind of common terminology for understanding why we built it the way we built it, how it's delivering value and how much it costs. It needs to be part of that conversation.
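Erik's $12-versus-$3 example can be worked through in a few lines. This is only a sketch of the arithmetic: the price and cost come from his example, while the function names and the transaction volumes are assumptions added for illustration.

```python
def margin_per_transaction(price: float, cost: float) -> float:
    """Money kept (or lost, if negative) on a single transaction."""
    return price - cost


def cumulative_margin(price: float, cost: float, transactions: int) -> float:
    """Total margin across a number of transactions."""
    return margin_per_transaction(price, cost) * transactions


PRICE = 3.00   # what the customer is charged per transaction (from the example)
COST = 12.00   # what the transaction costs to serve (from the example)

# Hypothetical volumes, to show the loss growing with scale.
for volume in (1, 1_000, 1_000_000):
    total = cumulative_margin(PRICE, COST, volume)
    print(f"{volume:>9,} transactions: cumulative margin {total:,.2f} USD")
```

Because the per-transaction margin is negative, more volume only deepens the loss, which is the imbalance he is pointing at.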

Jeremy: Yeah, and I think that you have this issue, especially with purchasing departments, or whoever the accountants are, seeing these fluctuating bills in the cloud and not understanding. They just say, "Oh, well, it cost X amount of dollars last month. So this month, it should cost the same, right?" But maybe we had more users. Maybe we added users or whatever. Maybe we did something. We added a new feature, and suddenly new features might add thousands of dollars' worth of costs. Right? So do you see that sort of butting of heads between the developers or the engineering teams, and then, you know, as you call them, the powers that be sometimes?

Erik: There is that tension there. I mean, sometimes it's a healthy tension, but there is that tension there, and it kind of, I mean it goes like this: imagine you needed to explain, let's say, a very complicated system that you constructed, and now you're trying to explain it in French to the Germans, right? You're speaking a different language. And that's the hard part, right? You know, you can ask the question, "Well, why did we spend $20,000 more this month on EC2 than we spent last month," for example. And well, it's because the product team had a new initiative. We had to do a migration. We had to do this. We had to move data from over here. We had security requirements, so we needed to encrypt the data. So we're calling the KMS API a lot. And then that resulted in a whole bunch of new storage and processing. And you're talking, talking, talking, and then you look up and there's just a glazed-over look in the finance guy's eyes and they're going, "Yeah, no, no, why did we spend $20,000 more this month? And how much are we going to spend next month?" And they go, "What? I can't talk to you. Get out of here." Right? And ultimately you want to tie it back to, well, look, this product initiative cost this much money, and we forecast it to be X. And we have an idea, before we actually go down that path, how much it's going to cost, and cost has been a part of it. Because, I mean, for a long time in engineering there has been a notion of non-functional requirements, right? What kind of performance requirements do you have? What kind of uptime requirements do you have? And the hard question that I think organizations need to ask themselves is, "Well, what kind of cost or budget requirements do you have?" And at what point are you going to compromise the budget for the user's experience or vice versa? You know, are you gonna go, "You know what. User experience matters at all costs. Even if it's $1,000,000 in extra spend this month, our users must be absolutely happy." Okay. Make that decision consciously. Today, I don't think anybody's consciously making that decision.

Jeremy: Yeah, and I think that's a really, really good point because you're right. I mean, at some point, we have to trade off certain things. I mean, if we had unlimited scale, then like you said, you would need an unlimited wallet to do that. So I think planning around that is a good point.

Erik: Well, I'll throw in one thing, you know, like this decision. It's not like these decisions are new. But the difference is that they've always been made for us.

Jeremy: That's a good point.

Erik: We had the CFO and the CEO, or maybe the CIO decide, "oh, we need 50 servers of this class and we worked with the teams." And then that's what we put in the basement, right? So now your performance envelope and everything has been made. And when you run out of capacity, everyone kind of at the time, you know, I'd have a good joke about it. Like, "oh, server's down because we're having, everything is so successful. We have 1,000,000 users. Our product is wonderful." But, you know, today people go, "Why is it down? You don't have infinite scale." Well, right. We're so successful, we put the company out of business.

Jeremy: So I think another thing about, sort of where I see this cultural change, is the ability, and you alluded to this, about developers thinking about the actions that they take and how that affects overall costs. And you outlined a developer going back and explaining to somebody, "Okay, well, we had this new initiative. We did this. We had to access KMS more times," or whatever. But I mean, this is something where, you know, how much time should developers be spending on thinking about cost optimization? Because obviously, in a small environment, a small tweak here, a small tweak there, might save you $50 a month, right? But when you get to scale and you have an enterprise serverless application, you might be spending thousands, tens of thousands, $100,000 a month, you know, processing things. Maybe it takes an extra two seconds to do this particular job because you're calling it this way or you're not failing fast enough or whatever. So how much time do you think developers should be spending on cost optimizations? And what kind of experiments could they run to maybe affect the overall bill?

Erik: Yeah, you know, I mean, this is one where there are a lot of different conflicting ideas, because you ask any product organization, particularly a software organization, today, "What's more important to you: spending all this time on cost, or innovating faster?" And everyone will say innovating faster...

Jeremy: Until the bill comes.

Erik: Until the bill comes, right? And then suddenly it's like, "Whoa, wait a second." And then people tend to think, "Well, all right. We'll get around to fixing it when we have a problem." And that was the same situation we got ourselves into with security, application security. It was like, let's get our developers to not care about security right now because it's just going to slow them down. It's a complicated topic. They don't really understand it. Can we just get another team to manage this for us? And then when it's a real problem, they'll come back to us? And that was what the industry tried to do, and guess what was happening? Everybody, and it continues today, gets hacked, left, right and center. And they realized, "No, no, this has to be part of the process up front. It will actually cost us less money." I just heard, I forget... Moody's just downgraded, I don't want to get the name wrong, one of the companies out there got downgraded in the ratings because of their security posture. And I think we need to take a more proactive view to this. Now what made it possible to take a more proactive view in the security industry wasn't that the developers suddenly were spending more time on it. It was that the tooling and the processes and the capabilities of the systems that we're using all improved to make it possible. I'm a big fan of decision loops, OODA loops, and I think about how can I get cost into that decision loop process so that an engineer could make a quick, informed or educated buying decision when they're doing things. And that means getting the data to them as quickly as possible. And right now, most engineers get the data at the end of the month in the form of a bill or an angry email or some report that they check the following day or the following week. And that's kind of ridiculous, right? We live in a real-time world. You know, I asked this question at a conference recently. I asked everybody. I said, "How long would it be until you knew that your site was down?" Somebody yells out, "It'll be a second. I know it instantaneously." And I'm like, of course, right. "How long would it be until you knew that one of the key transactions, your credit card processing, was down on your website?" Someone said, "We'd know in seconds." I'm like, great. "How long would it be until you knew that an engineer on your team wrote a line of buggy code, and it costs $100,000?" Just dead silence.

Jeremy: It would be a while. And unless somebody was checking those bills on a regular basis, you wouldn't see that.

Erik: Well, you wouldn't even see it, even if you, in the moment, I could write a line of code that will cost my company $100,000 in a heartbeat. I can do that as an engineer. That's the power that we have. And so it was dead silence. I think there was a gasp in the back, and somebody finally yelled out, "A month." And I'm like, exactly.

Jeremy: Most likely. Yeah.

Erik: Yeah. You know, there's probably no more critical kind of thing here in terms of doing this. And here's the thing. So first, I think the tooling and the technology has to improve. So that cost can become an operational metric that fits in with the developer lifecycle. And that's the mission that CloudZero's on, obviously, and why we're doing what we're doing. We want to enable that to become part of the engineering team thought process.

Jeremy: So let's get into this discussion about costs being a first-class operational metric. So what do we mean by first-class operational metrics, first of all? In case people don't know.

Erik: Yeah, so what it means is, you know, when we're doing design, when we're building, when we do a deploy or we run integration tests, any type of testing, it is a KPI that we care about when we judge whether or not our application is ready for the world. And right now, we care about performance. We care about uptime, availability and things like that. We're not spending enough time thinking about cost as a first-class operational metric. It's important that we look at that and we ask ourselves, "Is that correct or is that wrong?" And we have an idea of what we expect before we release the application to the world. And it becomes part of the KPIs that we track. You know, you walk into an operations center for any major Internet property today and you'll see an operations dashboard that's telling you all kinds of key transactions. And I'm not a huge fan of dashboards. Dashboards are where, I don't know, a lot of things go to die, but at the end of the day, these are the things that people are caring about, and nowhere in any of these dashboards will you see how much money our current cloud infrastructure is costing us. People just simply aren't thinking about that until the end of the day. As companies trying to build innovative products, we're also trying to build profitable products. And as I was saying earlier, when we think about innovation, if I can save you hundreds of thousands of dollars, maybe millions of dollars, because I have taken a little bit more time to think about how I've constructed my application, those are dollars that I can invest back into my engineering process. You know, you speak to any engineering manager about what really helps them move innovation faster, and they'll say, "Head count. More engineers on the team." Even though sometimes you can't, two-pizza teams and all that. At the end of the day, stuff is built by people. And if I can add more people to my team or invest more in the technology they're using, then I could move faster. And right now, we take this, I think, additional innovation budget that we have, and we almost lazily ship it off to the cloud providers because we think there's no better way to do it.

Jeremy: Yeah, and I think, you know, again, the other thing about first-class metrics, and I guess we're maybe getting a little ahead of ourselves, but that ability to measure a KPI and then just determine whether it's good or bad, and whether that has some sort of impact ⁠— I think the granularity of billing is sort of the perfect KPI to tell you something is right or something is wrong.

Erik: Yeah. I mean, the vision for CloudZero, when I first started working on CloudZero, I was haunted by the fact that the systems we were building were getting way more complicated than what any one person could understand. And my point of view was that we should think of the cloud as a computer, and the cloud providers as an operating system, and there is nothing that really understands what's going on in that operating system today. We were all too focused on the agents that were telling us what was going on inside of EC2 and, you know, in Windows and Linux. And I think that's just the microkernel. We should really not care at all about that. What's going on in this big, complicated system? And so I went looking for every data source that operating system could provide with the goal of pulling it into CloudZero to build this deep understanding. And the day that I started looking at the billing data was really impactful, because there is no other data source across all of your cloud infrastructure that tells you more about literally everything that's going on. Because if it's happening, Amazon wants to bill you for it. Right? So it is all right there. The challenge with this data source is that it's got great accuracy, but it has horrible latency, and there's no way to correlate this data source with all the actions and activities that folks are taking. When I spin up a new machine, it takes a while before I know the actual cost of it. Or I build out a new system where, you know, that example I gave about KMS earlier. We were working with one customer, and they did a migration from one system to another, and they estimated out what the cost was going to be. And we generated an alert and came back and said, "Wait a sec. You have this cost spike here. Completely unpredicted." And they go, "What the heck is going on?" Well, your team, it looks like here you wrote some code that is calling the KMS API millions of times, and they're like, "Oh, that's part of the migration. Jeez, we had no idea." And we're like, "Well, what if you just zip up that data into one blob instead of writing it all in these individual components and call the KMS API significantly less?" And they're like, "Well, that would be easy." Instant, instant cost savings, right?
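To make the KMS story concrete, here is a small sketch comparing the number of API calls when every record is encrypted on its own versus when records are first bundled into larger blobs. The record count, the bundle size, and the per-request price are assumptions for illustration only; the episode gives no figures beyond "millions of times".

```python
import math

# Assumed price per 10,000 KMS requests, in USD. Treat it as a placeholder
# and check current pricing before relying on it.
PRICE_PER_10K_REQUESTS = 0.03


def kms_calls(record_count: int, records_per_blob: int) -> int:
    """Number of encrypt calls if records are grouped into blobs of a given size."""
    return math.ceil(record_count / records_per_blob)


def request_cost(calls: int) -> float:
    """Request charges for a given number of API calls."""
    return calls / 10_000 * PRICE_PER_10K_REQUESTS


RECORDS = 5_000_000  # hypothetical size of the migration

per_record = kms_calls(RECORDS, records_per_blob=1)
batched = kms_calls(RECORDS, records_per_blob=1_000)

print(f"one call per record: {per_record:,} calls, ${request_cost(per_record):.2f}")
print(f"1,000 records/blob:  {batched:,} calls, ${request_cost(batched):.4f}")
```

The sketch only counts requests. In practice, direct KMS encryption accepts small payloads only, so a large blob would more likely be encrypted locally with a data key fetched from KMS once, which reduces the call count in the same way.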

Jeremy: Yeah.

Erik: But the hard part about all that was getting that information to the developers at a time where they actually could even consciously think about the code. Because if I'd come back to that team a month later or a year later, who knows what, and said, "Oh, hey, you know, we found this thing." They'd be like, "What?" I don't even remember what code I wrote two days ago, much less a month ago. Useless advice, right? So that's kind of the last mile for cost optimization as well is getting this data to engineers when they're making the decisions in the moment.

Jeremy: How quickly do you get that data from the billing? You know, maybe not just specifically with CloudZero, but do you have access to that billing data through an API? Can you get that pretty quickly?

Erik: So quickly, and the answer today really for just about everybody is no, because the fastest that, let's just pick on AWS here for a second, is going to send that data out is maybe every eight to 12 hours. They're gonna drop a big blob of information into an S3 bucket, and you're going to be able to poke around at it, and then maybe they'll decide that they forgot to apply some credits, and so a day later, they'll apply those credits, and then maybe a week later, there'll be some additional modifications that need to be made because they have a specialized arrangement in terms of their negotiated cost and things like that. And then some one time cost will flow into there and all kinds of noise kind of mixes into the thing. And so even if you look at it as quickly as the information's coming out, these eight to 12 hours, it doesn't necessarily tell you the complete story. And so the real magic here is taking that information and combining it with all of the operational activity that's going on in the environment and being able to model out and extrapolate that and use some of these fancy fancy terms like machine learning, and I don't want to be too buzzword-y. I'll fit blockchain in here at some point.

Jeremy: Just in a serverless, machine learning, AI,  blockchain system. Yeah, sure.

Erik: The reality is there's data there and all the data sources that your cloud provider gives you, each one individually doesn't tell you the full story. It's what, the answer is, when you combine them all together, you get a much more accurate picture of what's happening. And then you start getting in the realm of where you can talk about costs in a much more relevant timeline than every eight to 12 hours.

Jeremy: Yeah, so then if I was to have a new deployment and maybe I introduced that bug, that $100,000 bug per month, or whatever. So how quickly would I know; how quickly could I find out about that?

Erik: Yeah. I mean, our objective is that you'll find out about it the moment it hits the wire. And that you'll be getting a Slack message or something, saying, "Hey, the change you've just made is going to have serious ramifications." The place that we want to be is, just like we know when we've had an adverse impact on the performance of an application, because we do performance testing after a change and you know how much additional CPU load or response time or things like that those changes are going to land on, we want to also have that information about how much the cost profile changed in my account. And it may be in test that we only see, like, a couple pennies. But if we're running in production, we'll go, "Yeah, but that couple pennies extrapolated out, we get about 5,000,000,000 transactions or something like that. That's going to result in a 20% decrease in our margin." That becomes a real logical decision. And then the product team, equipped with that information, goes back to the business and says, "Okay. We've completed this new functionality that you asked us for, but it's going to reduce the profitability of the company by 20%. Should we proceed?" And they might say "Yes." They'll go, "You know what? It's still good enough for us, but we're going to prioritize fixing that in the next release because we want to drive the profitability of the business up." Now they're making really educated decisions that are going to be very powerful for building a profitable, nimble business. And they won't be sitting here just kind of grasping at why are we doing cost optimization. Because it feels good? No, because we're trying to build a profitable business.
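The extrapolation Erik describes, from a few pennies in test to a margin hit in production, can be sketched as follows. The 5,000,000,000 transactions and the 20% figure echo his example; the test run size, revenue, and baseline cost are hypothetical numbers picked so the arithmetic lands on that 20%, and the function name is an assumption.

```python
def extrapolate_margin_impact(
    test_cost_delta: float,
    test_transactions: int,
    prod_transactions: int,
    revenue: float,
    baseline_cost: float,
) -> dict:
    """Scale a cost increase observed in test up to production volume
    and express it as a fraction of the current margin."""
    baseline_margin = revenue - baseline_cost
    if baseline_margin <= 0:
        raise ValueError("baseline margin must be positive to compute a relative decrease")
    delta_per_transaction = test_cost_delta / test_transactions
    extra_cost = delta_per_transaction * prod_transactions
    return {
        "extra_cost": extra_cost,
        "new_margin": baseline_margin - extra_cost,
        "margin_decrease": extra_cost / baseline_margin,
    }


# Hypothetical inputs: a change that cost 2 extra cents across 1,000 test transactions.
impact = extrapolate_margin_impact(
    test_cost_delta=0.02,
    test_transactions=1_000,
    prod_transactions=5_000_000_000,
    revenue=1_000_000.0,
    baseline_cost=500_000.0,
)
print(f"extra production cost: ${impact['extra_cost']:,.0f}")
print(f"margin decrease: {impact['margin_decrease']:.0%}")
```

A linear extrapolation like this assumes the test workload is representative of production traffic, which is often not the case, so the result is a prompt for a conversation rather than a forecast.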

Jeremy: So that's a really good segue I think into sort of the last thing I wanted to talk to you about. So sort of around this idea of not just total cost of ownership, right, because we know, or I should say we assume, right, and most of the anecdotal data tells us, that moving apps to serverless can reduce your total cost of ownership because you're not paying for the operations people. You're not managing those servers anymore. Obviously, the price goes up a little bit, you know, depending on which services you're using. But I think it's a really interesting approach to look at what the predicted cost would be and how you could actually fit those into your product roadmap. Like how you would build out your product roadmap thinking about cost as one of the factors, right? So it's no longer just, "Oh, we can add a new feature because we already have 50 or 100 or 1000 servers running. And we could just stick it on those. And maybe the CPU will go up a little bit, but it won't cost us any more," as opposed to saying, "Oh, well, we're adding a face detection feature or sentiment analysis feature, and all of a sudden, now we're hitting up against Amazon Rekognition or doing sentiment analysis with one of their ML/AI services." So that's really interesting to me. Maybe how do you kind of go about using these KPIs to plan for new products?

Erik: Yeah. I mean, the vast majority of things running in the cloud today were systems that were lifted and shifted. And they function and they may cost a little bit much. Corey Quinn had a thing about, he said legacy apps is just an unpopular term for applications that currently generate profit or revenue and that they're everywhere. If we think about the kind of serverless revolution right now, almost all of the serverless activity that I'm seeing is in new application development. Although, there are some notable examples I've seen mainframe applications being moved to serverless. I've seen things that aren't necessarily totally greenfield. I mean, you've probably seen a ton of this, some of this as well.

Jeremy: Sure.

Erik: But it's still an enormous amount of kind of greenfield development in the serverless space. CloudZero, fortunately, had the decision to make when we started building: are we going to do serverless or are we going to go with containers? We said, "You know what. Let's go with serverless." It was a really good decision. I think we might, because of our own product and the serverless decisions that we made, we might be the only start-up that has a cloud bill that is constantly decreasing, even as we add more customers to the platform. Because every engineering decision, we know the cost of it internally. And it's been really powerful in changing our culture, what we were talking about earlier. But, you know, thinking out into the future, the opportunity to get significant return on engineering investment in improving these legacy applications is going to come from really being able to prove out the value of that re-architecture or rewrite. And today, nobody really has the tools to do it effectively. And so most people are just happy enough to leave well enough alone. But if you have the ability to get it, to analyze that entire operating system, everything that's going on in there, and then come back with a pretty accurate understanding of how that system's working and identify the parts that could be replaced by a serverless system, not necessarily all of it, just components of it, so I call this a serverless rightsizing. I think that the activity that's been built around EC2 rightsizing is, a lot of the time, kind of a wasted effort. But think about this notion of serverless rightsizing. Take a legacy system, find the most expensive components of it and then rightsize it onto serverless with a cost-justification or cost-benefit analysis that is done for you in an automated way. That gives you real justification for why you do that. And with serverless, we've seen it. You might see 100X return on that investment.
The challenge, of course, is nobody really knows where to start, and they don't have that data in-hand to justify that investment. And so they, you know, they don't have the opportunity to do that. But when the business has that data, they can make that data-driven decision about why they might replace this component of the system. That's where I think we're going to see the real serverless revolution take off because that cost-benefit analysis is going to really drive the rationale behind why people are going serverless.
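
[Editor's sketch: one way to picture the cost-benefit analysis behind "serverless rightsizing" is to list a legacy system's components with their current cost, an estimated serverless cost, and a one-time migration cost, then rank them. The component names, the dollar figures, and the choice to rank by payback period are all assumptions made for illustration; Erik does not describe how CloudZero performs this analysis.]

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One piece of a legacy system; every figure here is a hypothetical estimate."""
    name: str
    current_monthly_cost: float
    est_serverless_monthly_cost: float
    migration_cost: float  # one-time engineering cost of the rewrite

    @property
    def monthly_savings(self) -> float:
        return self.current_monthly_cost - self.est_serverless_monthly_cost

    def payback_months(self) -> float:
        """Months until the migration cost is recovered by the savings."""
        return self.migration_cost / self.monthly_savings


def rank_candidates(components):
    """Keep only components that would save money, fastest payback first."""
    viable = [c for c in components if c.monthly_savings > 0]
    return sorted(viable, key=lambda c: c.payback_months())


system = [
    Component("report-generator", 4000.0, 400.0, 18000.0),
    Component("image-resizer", 1200.0, 100.0, 2200.0),
    Component("auth-service", 300.0, 350.0, 5000.0),  # would cost more on serverless
]

ranked = rank_candidates(system)
for c in ranked:
    print(f"{c.name}: saves ${c.monthly_savings:,.0f}/month, "
          f"payback in {c.payback_months():.1f} months")
```

[Ranking by payback period rather than by raw cost is a design choice here: it keeps the cheapest-to-justify rewrites at the top, which is the data a business would need to decide where to start.]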

Jeremy: So alright, I'm gonna ask you one more question. This is for any developer who is out there listening who has ever had to talk to the powers that be about getting a budget. So traditional operations, I guess, would kind of fall under the CAPEX where you buy a certain amount of servers, or, you know, it's not really that OPEX thing. It's not that idea that you're being metered, right? Have you seen this argument from purchasing departments? And how do you overcome that? How do you convince the purchasing department that, yes, this metered billing that we don't necessarily know how much it's going to cost is a better way to do it than for us to just have a budget that caps us at something?

Erik: I mean, this is, in a lot of ways, this is actually, I don't know if you knew to tie this back or not, if I've told you this story, but this kind of ties back to my origin story for getting Veracode onto the cloud. We had a really interesting project for a client, and we weren't in AWS yet. And we wanted to build this out in AWS because we needed the scale. We were going to need thousands of servers to do it. But I needed to convince the CFO that it was a good idea. So I went to our CFO and I said, "Hey, can we build this project for this client? We think it's going to be really amazing." He said okay, because I needed the company credit card and so you could imagine me having this conversation. He's like, "Alright, so I've only heard nightmare stories about this cloud thing. Here's the company credit card. Your budget is $3000." And about a week later, $2997. We figured out how to do it. About 1500 spot instances with like a heavily optimized, kind of homegrown, autoscaling solution. And it was a wonderful challenge, and we had a lot of fun doing it, and we built a really cost effective product, and that was kind of my origin story in terms of thinking about cost as part of engineering, because I think constraints as engineers is a really powerful thing. It helps kind of bound the problem that we're trying to solve, and so I would suggest to any engineer or anyone out there listening to this, when they think about it, is try to challenge yourself with a budget, and try to manage towards that budget. Become an educated consumer. When you go to the restaurant, I mean, we all do this naturally. When you go to a restaurant or order things off the menu, we see price tags there. Today, when we order off the menu for whichever cloud provider we're using, we don't actually see the price tags there.
It's actually kind of, I think it's strange that when you spin up infrastructure using the console of any cloud provider, it doesn't like pop back a message that just says "Oh, and what you just did is going to cost, you know, x dollars an hour, right?" Why is that missing? Well, it's probably because it's not in their best interest.

Jeremy: Maybe they don't want to tell you.

Erik: They don't want you to know that because yeah, you're going to go to the restaurant and you're going to go like, "Oh, somebody else is paying the bill. I'll have the surf and turf. The wagyu beef. That sounds delicious. Everything has been prepared for me." But so I would really strongly suggest to folks, try to manage themselves to a budget, even if no one else is holding them to a budget. Because even if they think no one's holding them to a budget, somebody, somewhere, is holding somebody to a budget. To think about that and actually try to be irrational, because most people would say a $3000 budget building anything in the cloud is nuts, but it can be done. And you can really do some very powerful things. And what you will discover when you hold yourself to that budget is this tight relationship between a well-architected system and a cost-effective one. And it will make you a better engineer through that process.

Jeremy: Awesome. All right, well, let's leave it there. So listen, I want to thank you, Erik, for joining me, and sharing all of your knowledge. Where can people find out more about you and CloudZero?

Erik: So they can certainly find us on CloudZero.com, and read all about us. We've got a great blog there. I hope everybody spends some time providing some feedback. And the usual sources on Twitter and whatnot, which I'm sure you'll put links in. But, you know, true story, my Twitter handle, which is impossible to say, was my original D&D character from when I was in high school, so that's the hidden fact that your listeners will, that I've now shared with the world.

Jeremy: Silvexis.

Erik: Silvexis. S-i-l-v-e-x-i-s.

Jeremy: And then @CloudZeroInc. And then also, if people want to email you. Can they do that?

Erik: Yeah. You know they can. They can reach out to me. It's really simple. It's erik@cloudzero.com.

Jeremy: Perfect. Alright, well, I will make sure we get all of that in the show notes and thanks again, Erik. Appreciate it.

Erik: Wonderful. Thanks, Jeremy. Had a great time.

Episode source