System Design Interview with a Google engineer

See how someone tries to solve the Design a free food app problem and more. Watch a mock interview with a Google engineer below.

Interview Summary

Problem type

Design a free food app

Interview question

Distributing 6M burgers in 10mins. People will come and click on a button on App get My Free Burger. No one should get more than 1 burger. We should not promise more than 6M people that they will get a free burger.

Interview Feedback

Feedback about Immutable Penguin (the interviewee)

Advance this person to the next round?

  Yes

How were their technical skills?

4/4

How was their problem solving ability?

4/4

What about their communication ability?

3/4

Distributing 6M burgers in 10mins. People will come and click on a button on App get My Free Burger. No one should get more than 1 burger. We should not promise more than 6M people that they will get a free burger. Functional Requirements: TC asked all good scoping questions related to the problem. TC talked about what happened when 6M burgers were done. Non-functional requirements: Availability vs Consistency: Did not talk about the this Asked the latency expectancy. Asked QPS. I asked if they can calculate from given information that their are 20M users. Estimations: TC skipped this entirely. API Design: Skipped this. High Level Architecture: TC used the API gateway and said it would take care of authentication and rate-limiting. TC identified that this system is write heavy TC talked about read and write ratio. TC talked about multiple write replicas. TC talked about consistency. TC identified that there could be concurrent requests from the same user. TC talked about applying the Unique constraint on the DB. TC also tried to cover the other requirement of not promising more than 6M burgers. TC initially suggested that they would have a common counter. They identified that there is a need for concurrency control to avoid overpromising burgers. TC said that locking with 35k qps is not practical. TC then said that they can work on the distributed counts. TC talked about version count and we can see if the counter is always incrementing. TC had the distributed counters in distributed memory. TC also talked about the approach where there would be a counter per instance of the burger giveaway service. TC talked about the issue in the second approach that there could be an issue of losing the count if a server goes down. Tc used the incremental counters. TC understood that there is additional coordination. TC could not come up with a way to avoid it. I gave a hint towards the decremental counter. TC understood the fact that this oucl be easier with a lot less coordination. This shows TC is easy to communicate and can understand the feedback and hints pretty easily. Tc understood the point that they can choose the in-memory database while storing the userIds. TC was able to identify the faults in the suggestions and design the really good system. Database: TC decided at last to have the DB in-memory. Designing for Scale: Sharding: TC talked about sharding the database and talked about service discovery(zookeeper). Load Balancing: TC used load balancing Fault Tolerance TC talked about LB being a single point of failure. Tips for Future Interview: - Add Estimation and API-Design steps after Non-Functional Requirements

Feedback about Red Maelstrom (the interviewer)

Would you want to work with this person?

  Yes

How excited would you be to work with them?

4/4

How good were the questions?

4/4

How helpful was your interviewer in guiding you to the solution(s)?

4/4

The interview was great. I have never seen the question before, and I was able to learn some interesting things from it (e.g. using a decrementing counter). The feedback at the end was actionable and clear.

Interview Transcript

Red Maelstrom: Hey, hello, how are you? Immutable Penguin: Good. How are you? Red Maelstrom: I'm good. I'm good. So yeah. So can you share your context in which level you are applying for Google? Immutable Penguin: So actually, yeah, sure, sure. So actually, my interview coming up in a few days. Monday is actually for Uber. And it's L five A, I am planning on interviewing with Google eventually. But right now, it's like hiring is frozen. But yeah, for now, I want to focus on L five A, at the Uber. Red Maelstrom: Let me know if you need a reference at Google. Yeah. Sure. Awesome. So let's start with the question. And like, I will consider like, keep my question and my feedback rounds. L5 at Google. Will that be okay? Yeah Immutable Penguin: Yeah, yeah. Red Maelstrom: Cool. Yeah. Awesome. So basically, like, we will focus 45 minutes on actual system design, like you driving the discussion. Like, I will keep 10-15 minutes for feedback. So yeah and do have any drawing tool in mind, which you want to use. Immutable Penguin: I've been using excalidraw. Red Maelstrom: Oh nice yeah. Cool. Can you share the link? Immutable Penguin: Yes, I will share a link right now. Here we go. Let me put it in the chat here. Red Maelstrom: Cool, awesome. Immutable Penguin: Okay, so distributing 6 million burgers in 10 minutes, people will come and click on a button on app, get my free burger. Now once you get more than one burger, we should not promise more than 6 million people that they will get a free burger. Okay. Okay. So I think there's some clarification that I might need here. So let me just put out like a section called functional requirements. Let me start clarifying here. So there's, I guess there's an existing app, it seems like. Okay, so there's some existing gap of some kind, that will have some sort of promotion, like maybe, is it like a screen that's going to pop up says, like, you can get a free burger, if you click here? Red Maelstrom: Yep. Immutable Penguin: Okay, so Red Maelstrom: We'll focus on the backend, we don't need to worry about the app functionality. Immutable Penguin: Got it, got it. Okay. No one should get more than one burger. So I'm assuming we're going to have some sort of identifier for the users. Is it going to be only for logged in users? Red Maelstrom: Yeah. Immutable Penguin: Okay, so only for a user with Red Maelstrom: Um can you give me just one one minute someone is at the door yeah sorry. Immutable Penguin: Okay, are you back now? Red Maelstrom: Yeah. Immutable Penguin: Okay. So when you say distributive? Red Maelstrom: Are you typing somewhere? Immutable Penguin: No, oh, I was typing on the, on the interviewing io, sure. I just for the functional requirements, and I'll go to the excalidraw for the design. But so when you say 6 million burgers in 10 minutes, do you mean that people have only 10 minutes to claim? That meaning this whole promotion will be running for 10 minutes only? Red Maelstrom: Yes, that's correct. Immutable Penguin: Okay, so people will have 10 minutes to click on this. Okay. So no one should get more than one burger and we should not promise more than 6 million that they will get a burger. Okay. So I'm assuming that first we could kind of push this push the screen to the app or open this offer to initially 6 million people. But the question is, like, how do we know Red Maelstrom: No, it won't be like that. So like then we are expecting 20 million people to open our app and try to claim it. Yeah, Immutable Penguin: We are expecting how many million people sorry Red Maelstrom: 20 million people. We are expecting 20 million people to participate in this. 20 million people would be like using the app to claim the burger. Like, we have to come up with some mechanisms so that we don't promise more than 6 million people. Immutable Penguin: So okay, so just let me so let me understand so 20 million people will be getting the screen. But once they click on Red Maelstrom: Screen is visible to everyone during that time who has an account. Yeah, so we'll be, but we are expecting like the let's say, there are 30 million people on the app. But we are expecting like 20 million to participate in this. Immutable Penguin: Okay, so that's meaning so anybody that's over 6 million we'll say okay, sorry, like all three have been claimed already? Red Maelstrom: Yes, yes. Yes. Immutable Penguin: Okay. Over 6 million will get a sorry message. Okay. That seems okay. So that seems like and I might assuming now all the user information and all that, is that already existing, like all the account information? Red Maelstrom: Yeah, you don't need to worry about logging authentication. Or you can assume whatever information you need to build this system exist in the user table. Or you can also exist there is a user service, which gives all the information. Immutable Penguin: Okay, got it. So it seems like the non functional requirements is built out from that, I guess my first question would be in terms of latency, when when they put, when they click get my free burger? How fast are they gonna have to get like a response to say like, Oh, you've got the free burger or you didnt? Red Maelstrom: 100 milliseconds. Immutable Penguin: 100 millisecond response to see if they get a burger. Okay. All right. Now in times, okay, in terms of okay, we said that. So in terms of I mean, let me just think in terms of data storage, I guess, do we need to like store historically, which users got burgers or just for just like a count? Well, one second. So since we don't want to, we don't want anybody to get more than one burger, I think we have to store which users, basically, we have to store 6 million records of like a user ID, and just the user ID really, really just the user ID. So like, we need to store 6 million user IDs. Now, in terms of how many requests we're gonna get a second. So can you tell me how many requests I get a second? I guess I can get an idea by like, 10 minutes. But Red Maelstrom: yes, yeah. So basically, we can assume like fairly distributed on traffic or 10 minutes. So from 20 million, you can come up with some way. Immutable Penguin: Right? So 20 million users divided by 10 minutes. So that would make 2 million users a minute. And divided by 60 would be 200,000 divided by six divided by six is like about just be something like, I don't know, 35, 35k requests per second about. Okay. All right. And which other non functional requirements? Oh, is there any like infrastructure that's already existing, that we could use like maybe like distributed counters or anything like that? Red Maelstrom: You can assume that, but you can talk about like assumptions about expectations from that infrastructure. Immutable Penguin: Okay. All right. I think that's kind of what we need for the non photocurrents. All right. So let me let me kind of move on to things like the high level design just in terms of like from a user perspective is pretty simple. But let's let's design it anyway. So I'm shifting to excalidraw now. So here we have, oh, and another thing is, which I don't think will really affect the design. But what type of clients are we going to? Support is like mobile and web? Red Maelstrom: Yep Mobile and web. Immutable Penguin: Okay. All right, mobile and web clients. Alright. So here we have just the client. And I think the first step that we'll have is sort of an API gateway. API gateway is going to take care of like, you know, things like first of all, like, just authentication. Like, you know, if the user has an account, like authenticating user also like SSL termination, maybe rate limiting. It's like in case there's like a malicious user that wants to spend the, you know, spend the service. So will, you know, the API gateway will take care of all that. Let's see, like, authentication. Rate limiting. Red Maelstrom: I'm sorry, I'm just typing some feedback notes. Let me know if my keystrokes are annoying you. Immutable Penguin: No, it's fine. I don't hear it. Okay. All right. Now we have, we have load balancing. And I'm gonna make a section here, just for like, come back later, things I want to discuss later. So first thing is going to be like, load balancer single point of failure, how how we don't want to make the load balancer single point of failure. But let's have here a bunch of servers. So we'll have just like the API servers here, replicated. Okay, and, and API servers will will talk to the, the service so the actual will be like a new service. Burger, burger giveaway service.So let's have this. Let's have that over here. Way, service. Okay. Now, obviously, we need some sort of storage behind this. So what's the first thing we'll do is just have a have a clue what is going on here? Let's have a database. And we'll have to talk about what type of database and the schema so DB, let's put in the come back later, to type and schema. So now, I think the challenge is so okay, we're gonna have, we're gonna have 35 requests per second, 35k requests per second. So it seems like we're not going to, for every, it seems like this will be pretty write heavy. Because basically every request unless like, they already got a burger. But we're assuming most users are not going to click more than once. So I think, I think most users will just be like a write request. So it's gonna be pretty write. Yeah. Red Maelstrom: Just to correct you. Like there could be multiple requests from single user, we should consider, like, not allowing that. Immutable Penguin: Right. Yes, no, yeah, we definitely have to consider not allowing it. I'm just I'm just thinking in terms of like, read and write. Like ratio. So you're saying that it's possible that the read ratio that that reads will be like, significantly, significantly higher than rights because users will a lot of users are going to click more than once. Okay, Red Maelstrom: Yeah, it depends on how you design it like Yeah, but I'm saying like a lot of user can reclick claim. Try to claim the burger multiple times. Like how, whether that how you do, make sure take care of this multiplex how many reads or writes? Immutable Penguin: Right? I see. I see. That makes sense. Okay, so let me so as to seems like this is a very important component of the system. So let's let's think about that right now. It would seem that like, if we have like multiple write replicas, we would want this to be strong consistency, right? We want that we shouldn't have like one user writing to the database that they claimed the burger, and then read from another database that doesn't say that they claim the burger. Yes. So we want there to be consistency. Now, I guess the question is, what happens if there are concurrent requests from one user, right, like, so like, so user clicks twice really quickly. But the they click twice really quickly before, like, the first request, like has a has time to like, execute, and be written into the database. So I'm just thinking that it seems like that if we would have if we would have, like, a unique constraint on the database in terms of like, key in terms of like user, we can sort of as long as there's strong consistency, I think we can take care of I think we can take care of like the double the double burger problem. So like, so what happens user writes to the database, and then it if, if another kind of request that comes like right after it even after, you know, even if they're concurrent, a SQL database that has acid, ACID properties should not allow the second request to, to finish to execute, to execute successfully. So the second request will get sort of like an error response saying that it wasn't successful. So only the first burger will be claimed. Okay, so that I think so. So I think I think it makes sense here, since we need strong consistency, even though it's write heavy, it's kind of write heavy. But sir, I think a sequel system could handle 35k requests per second. And even if it can, we can always shard the database. But I think it makes sense over here it says we need strong consistency to use a relational database system, like something like MySQL or Postgres, Postgres. Great. Yeah. And I think pretty much the schema will just be I don't know that we need anything except for like just the user ID or maybe time like, user ID, and then like, time of claim, maybe. That's pretty much it. Okay. Now, let's see which other which other issues you can run into. That we would need to solve for so the first thing, first thing is possibly sharding the database so that sharding would mean like, we would just like hash by user ID hash the user ID and, and route those requests to those to those to those instances. And then another thing if we end up sharding, the database, like another thing we might want to discuss is like, service discovery, or something like Zookeeper. Okay, now, I guess another thing to make the kind of to make the system efficient, I guess, let's see which other issues we have found that we have to handle. First of all, maybe adding a cache here to make reads a little bit more efficient. That would be one thing to discuss. And also another thing to discuss is like the counts, we want to know we want to know that we don't exceed 6 million burgers, right. So so here's what I guess where we could run into concurrency issues, right? Because let's say, let's say we're at 5 million, right? 999999. And, and two users tried to claim a burger. In that case, we might be able we might overcommit. So okay, so it would seem like a naive, maybe approach to a naive approach to solve this with to be have would be to have a different record in a database, or maybe like a separate table, or one instance that says, How many burgers are left, right? How much inventory do we have starts with 6 million and then as the council is up, it goes, you know, goes down. And then we could I guess, again, this is very naive. But, you know, a naive approach would be like to put a lock on this count, right? So that every user that that clicks, we lack the count, and we decrement it by one, we can't we need the reason why we need some sort of lack or some sort of concurrency man is because because we if two users try to decrement the count at the same time, we don't want it to be decremented just once we want it to be decremented multiple times. So okay, so that's what happens if we so that's the solution of putting a lack the problem of lack is, obviously it's not going to sustain all this traffic, right? Because we have one, we have one record that's trying to be written to 35 times, 35000 times a second. So that's not gonna, that's not gonna, it's not gonna hold up. Fine. Yes, I think we have to look into some sort of replication here. Or, well, not replicate. Yeah, replication, but actually like distribution of these of these counters. And then the question is how to aggregate them right? Once we once we distribute them. So let's say we have let's say we have a bunch, I'm trying to think of a solution to come up with over here, we have like maybe like a bunch of counters, a bunch of counters. Okay, but then the question is, we need to sort of aggregate all the counts, right? So like, what one one solution we can have is like have like distributed counters, and that we have like some sort of some sort of just round robin load balancer that writes every that writes every request to like a different instance. And we can have like versioning, or even we can even have locks really on these counters. Since if we have enough instances, the lack is hopefully not going to take too much time. Or there's another another approach of like having a version, the counter, so to make sure that like every write is actually an increment. So like, actually, yeah, so that would be one thing. That would be one way to do it. But the question is, we need to aggregate these counts. Now. That's not a problem to actually aggregate the counts. We can have a process doing that we can have a worker that kind of like just queries every, every counter, and then we set like, counts the total amounts of count for each instance, like we sets it, and then and goes on. The only problem is, how do we do it in a way that doesn't exceed that we make sure that we don't exceed 6 million. So since we're distributed here. Okay, so we have to think of strategy have a strategy for doing that. Okay, so, yeah, we have distributed counters, and we have a process that kind of aggregates them. And we need to make sure that we don't exceed 6 million. So I'm just wondering if we can somehow write to each counter. Like what's the total left like after each ad aggregation? We say this is Red Maelstrom: So sorry, I got confused a little bit. Now you're talking about like having distributed counters. Right? Immutable Penguin: Right. Let me let me add that actually to the system. So that Red Maelstrom: Yeah, that will help me to understand. Yeah, thank you. Immutable Penguin: Yeah, sure. So burger giveaway service is going to talk to the database, and a cache. That's also gonna have like, I don't know how to what kind of box to make for distributive counters, but distributed counters. And we're gonna have, okay, I'm gonna have a process here. Workers like aggregator aggregating workers, like, angregators. And these aggregators are going to. So what are the aggregators gonna do? They will just write to a separate, just a very simple table, you know, that's probably separate from the main database that holds the users because that one is going to be distributed. This one is going to be like a very simple, just one record. Like it's going to be like, a database with total table. What whole table? Okay, sorry about that. So there's still like a total aggregate is going to write to the total table. So this is my model. Currently, this is the design currently. So we have distributed counters, we have an aggregator. I guess the problem I'm trying to solve right now is, even after we have the aggregator, we don't want to find out after the fact that we're above 6 million. Right. So let's say Red Maelstrom: Sorry like few, logistic questions like where the distributive counters are. Is your burger get given a service is a cluster of servers, right? Immutable Penguin: Definitely. It's replicated. Red Maelstrom: And where these distributed counters are saved. Immutable Penguin: Okay, so these distributed counters would be like, and it would be like, and it would be probably like, in memory count, or like maybe something like Redis or something like that. It's like a cluster. And I guess we can have like a load balancing level, a load balancing tier that distributes them evenly. Or maybe we have, or maybe we even have it. Well, I think it's even possible to have to be honest, to make it quicker, have it. Okay, so the problem is, like, I guess a simple approach would be to have each burger giveaway service instance, you could just save it in memory. But the problem is we want some sort of persistence layer, right? We don't want to use we don't want to lose counts, right? So that's why I'm thinking to actually have it as a separate, like, maybe Redis tier that persists. That also has like a, it's a memory, but it has a persistent storage as a backup. Right? That's what I'm thinking to have like Redis is persistent. Persistent layer, don't know how to spell persistence layer, there we go. So this is all just be a cluster of, you know, of Redis servers. Does that make sense? Red Maelstrom: Yeah, I understand. So basically, you're saying like, if you had these counters in memory on the burger giveaway service? What was the pro con on that like? Immutable Penguin: Yeah no, the pro is that it's, that'll be really fast, right? You don't have to talk to a external service. The con is that if it if a instance goes down, we lose the count we lose the count? Red Maelstrom: Makes sense, I totally relate to that. Immutable Penguin: I'm just thinking, but like, what would be the are we going to be able to like maybe just do a count just on the database itself, right. So whatever the if we can do a cancel, do we need distributive counters? Or maybe we can like on this main database here. I'm wondering if we can just keep doing like an aggregator that goes over all the instances of these databases, and does accounts there. So do we actually need to distributed counters? That's my question. Red Maelstrom: So like, yeah, that that is okay. One other thing like, what is the role of total thing like count where you would use the total thing? This aggregator and then this total thing where you are using it? Immutable Penguin: You mean, where we're reading from it. Right? You mean where we're reading from the total? Red Maelstrom: Okay, I'm sorry, I did not understand. So can you help me with a so request comes to burger giveaway service. So distribute counters are like your incremental counter or decremental counters. Immutable Penguin: We can do it as incremental counters, right? Increments, and then it's resets, reset as the aggregator aggregates it, like resets every time it's aggregated. Red Maelstrom: Oh, okay. But then again, you need some coordination between these distributed counters and aggregators and total count. Right? Immutable Penguin: Right. Exactly. So then me I hit right, that is true. But even when we need some, some sort of coordination, right? Because Red Maelstrom: For this you have the decrementing counter. Immutable Penguin: I'm sorry, which which counter? Red Maelstrom: If you are decrementing counters, like let's say, if you have 10. Let's say you if you have like six distributed counters, they will initially be 1 million each. Immutable Penguin: I see. Okay, yeah, that's a good, that's a good point. Oh, that's a good point. So if we have decrement in counters. The benefit of that is that once a counter is finished, we won't use that one anymore. But we can still go to a different counter and check. Does any other counter have have capacity still? And, and since since we'll have like a load balancer, or it'll come with maybe the Redis cluster will give us like for free, like an even distribution, we can assume that it's not going to be like one count. One one counter or finishes much quicker than the other. Okay, yeah, that is a good point. Okay. Yes, that is a good point. So in that case, we don't need the aggregator. And we don't need, we don't need the total table. Right. All we need is distributed counters. Let's just write here. Decrementing. So basically, we start with as many divided by, you know, as many, 6 million divided by as many instances as we want to have. And, and, yeah, okay, so yeah, so we could, again, so we we can use the same schema as before, using like versioning or a lack on these counters to make sure that each each one is actually decremented. Okay, awesome. Cool. And, yes, I think that takes care of the counter problem. Red Maelstrom: Okay, and like where you would save what, what is being stored in db? Is this db. Immutable Penguin: Okay, so, yeah, in this db right here. We're, first of all, we're going to add a cache, but we're just going to store just the specific users that already claimed a burger. So I'll have so basically before okay, so we will have a cache, I just want to just try and trying to figure out how it would work out with the cache, because we need to write to the cache as well as we're reading from the cache. So let me try to figure out what the order of operations is. So the burger giveaway service, first thing we have to do is, I guess, write to the counter Red Maelstrom: Like do you need cache or db both? For saving 6 million user IDs Immutable Penguin: Do we need a cache. I mean, Red Maelstrom: Can you estimate the storage for 6 million users user IDs? Immutable Penguin: Yes, 6 million user IDs will be able to sit on one instance for sure. But the question is to speed up the reads, but I guess I guess you're right that that we don't know. Now I'll tell you what, I'll tell you why I'm going to reconsider the cache right now. Because generally, you'll use your cache, when you have some data that you know, was going to be, you know, more frequently accessed than others. Right, right. In this case, we don't know, which data will be more frequently accessed, we don't know if there's going to be a specific user that's going to just keep on trying or whatever. So we don't necessarily, and even if like one doesn't even have, like, one keeps on trying, it doesn't mean that they'll they'll still keep on trying, you know, in 10 minutes, maybe it'll be different users that will keep on trying, right. So I think that's why it makes sense, maybe not to have a cache. And I guess it's okay, if like subsequent, I guess it's a little more acceptable, if like, the Red Maelstrom: And your all entire data can sit in memory, right. So technically, it would be in memory database not to cache. Because like 6 million could be Immutable Penguin: Right 6 million could very easily fit in memory. So I guess we can have a Redis layer for this as well. Redis backed by persistence, as for both of these, right, so like we have Redis backed by persistence. For both of these components, that's a good point. Yes. Okay. So then, Red Maelstrom: yeah, so let's, let's go back to the flow again. Immutable Penguin: Right. So let me go back to the flow. So yeah, go ahead. Red Maelstrom: Yeah. So basically, what kind of load balancer you are having, like, between the API servers is it's it's a round robin, right? Immutable Penguin: Between the API servers in the burger giveaway service. Yeah, we'll have a round robin load balancer. Red Maelstrom: And this distributed counters would be on some, like, extra component in memory counters. Which counter to use is like, how will you know, like, which counter to use? From burger giveaway service? Immutable Penguin: Yeah, I'm not familiar with exactly what the different products but I think what we would need is in memory, an in memory counters that has a persistence layer as well. Anything that? Right, and probably just like, values? All right Red Maelstrom: Let's assume that there exists something with like, which counter any particular request goes to? How will that determine how will you determine that thing? From burgers service to counter? Which which counter? You would be? Immutable Penguin: I don't think it really matters. I think we can use a round robin, Round Robin, Red Maelstrom: You would add a load balancer inbetween? Immutable Penguin: Well, I'm assuming that that the that the database itself kind of can take care of that, like the database service, like once you spin up a cluster? Like it'll take care of like the load balancing? I'll take care of I think there are, I'm pretty sure that there are products out there that will you know, that you spin up a cluster, and it'll take care of like routing each request to like, a separate instance, based on Red Maelstrom: so you, let's say you don't have this thing. How, how would you do that? Immutable Penguin: So then you would need some sort of like service discovery, just to tell you which instances are are available. And you would use a round robin scheme to show like you would have. So you would have like a right, you would have like zookeeper over here. Red Maelstrom: Cool. Yeah. Let's not complicate this design. Yeah, fine. Understand, like somehow it works. So yeah, so just just like helping, like with the flow from any particular request, when it comes to burger giveaway service was more worried the first step, whether adding entry in db, the first step or incrementing counters. Immutable Penguin: Yeah, so I think the first the first step is going to be to decrement the counter because we need to know if we have capacity available, right? So we search for a instance that, you know, actually has capacity available. Okay. So I see, I see. Well, your question, actually, your question is actually a pretty good one, because we also need to have some logic that so maybe we can have this actually on the client on the kind of on the server level. I have some logic that says, try it, try it, try an instance. It's the counts as zero. You know, try keep try, you know, try a different instance right, try any other instance randomly, and or not, well, not randomly. Well, maybe we can have some sort of left some sort of service that I don't know if there's something that exists, but I'm just thinking that like, Red Maelstrom: How about like simple caching, simple hashing on user ID, and then like hashing on a user ID and based on hash going to one of the counters. Immutable Penguin: I don't know if that would work, because what happens if one counter kind of runs out of capacity, but there's still capacity left and other counters? Red Maelstrom: Yeah, good. Good point. Yeah. Immutable Penguin: So that's why I'm thinking that's I'm thinking we have maybe some sort of service discovery service that, that that in addition to having the health check, right, it has a health check. And but in addition to having a health check, it also checks like, what's the current count is? If the current count is zero? It just brings the it just brings it offline or it deregisters, that service? So now, you know, when we when we query, Zookeeper, again, we only have the instances that are currently still have capacity left. Right. So so that would help us like, you know, route? Yeah, we would have to have some sort of logic on the on the on the request that says like, if if it's zero, we try a different server. And if we can't find any that has been as capacity left, then just like fail the request, we're gonna have to have that logic. But anyway, but once we get a successful read from the distributed counter, meaning we found one that still has capacity available. That's when, Oh, I see what you're saying. That's when we write to the database. But I guess we also have to check the database before, right, we have to check to see if this user are ready. Before we actually decrement the counter, we have to check that this user didn't get a burger yet. So it seems like maybe we would have to have some sort of like status. I think maybe what we have to have is like maybe a status together with the user ID in the database and the main database. So like, we have a user ID and we say like pending, right? So the first check is to the to that in memory database. And if it doesn't, if it exists, even in a pending state, it's if the record exists right away, return a failed response. If it's a doesn't exist, we write one with a pending a pending status. And then we have to make sure that we have counts available. So we make use of the counter. And if we have available, we decrement and then write succeeded into the database. And we return a success response. Otherwise, we go back to the database and just write failed. And, you know, a failed failed status. So yeah, seems like we would have to have two, two checks to the main database, and then one for the counter. Red Maelstrom: Cool, awesome. Yeah, I think this will work. That's pretty good design, like you go a lot of things. So now, let's say like, this is a design, which you have, what do you think could be? What do you do in terms of making it operational in terms of operational excellence, like what alerting, monitoring you do here. Immutable Penguin: Right so anytime, like, right, so anytime a monitoring service, I guess we can just monitor basically all levels of the service like meaning we can check throughput, right, we can have like a monitoring service that checks throughput to the database to the to the main database and make sure that if it goes down, you know below what we expect, we have an alert. The same thing with the counter is basically on all levels we check. We also check response times. We check what else we can also check like memory utilisation on the distributed counter instances or on the main database as well. We can have like memory utilisation measurements and alerts. But yeah, I think those are pretty much the main. Maybe we could also we could also like another thing we can monitor is like response error responses. If let's say like an error response, error responses go up, then maybe we can, we can see. Which is there a specific user that keeps on making requests and maybe sort of block that user at the at the API gateway level or like an earlier level? Red Maelstrom: Yeah. Cool. One last question. You talked about API gateway is rate limiting, what kind of rate limiting this would be? Immutable Penguin: Um, what kind of rate limiting? That's a good question because over here? That's a good question. Actually, what I was thinking about rate limiting, I was thinking of like for the for like, this API gateway would sit in front of like, all of our different services, like I'm assuming like, this app has more than just a burger giveaway service. So but the truth is, you make a good point, that rate limiting, maybe we can have like a limit of one on on the specific, you know, endpoint on the burger giveaway endpoint. But I don't think we would want to have a rate limit of one. Because what happens if for some reason we have an error? Yeah, so but we would at some rate limiting would the I'm not sure what you mean by like, what type of rate limiting but what I'm, what I'm imagining is, like some configuration where, where, by where we are, we have a specific configuration per endpoint of like, how many requests we can have, like, per minute or per hour or something. So like, some sort of algorithm, that Red Maelstrom: The rest of the things won't be like overwhelmed yeah Immutable Penguin: Exactly. Yeah. Yeah. Red Maelstrom: So it's kind of circuit breaker. Okay. Got it. Yeah. Yeah, that's such a solid design. Let's, let's move to the feedback part of it. Like, yeah, I think like most of the things were, like, really good. Like you really did well. In functional requirement, non functional requirements. I think one thing like, do you have any framework in mind, one thing I think is like you missed quite a few details like, like you, you missed, like estimation part overall, like you did not talk about estimations at all. Yeah. Immutable Penguin: In terms of like storage? Red Maelstrom: Yeah, it could be like storage estimations. Yeah. Because like here, that storage estimation was important to make a decision, like whether your in memory database is enough, whether you need caching or not, like, so you were using cache and thing, which, which were not relevant to this design, though, you were able to correct it and understand and get that feedback. That's a good positive point. But like, it doesn't look good. Like, it's expected that a senior engineer can ask or gather all the numbers, which are important for design. And so registration is one of them. So yeah, to like, basically, like you have which structure right, so you talk about functional requirements, you talk about non functional requirements. So you can have a third section saying that estimations, and there you can do the estimation about QPS. Storage, and those things. So that like you don't skip it, like added to the format, like you already have format, right? So to add that one step to your format, and that will be good. I'm not saying like, it's not a big deal. But like, these are the small things. Like I want to feed back to be on that improvement side. I don't want to give like everything is good and just go. Like, I know, like, most of the things are good. 90% of things are good. I just want to focus on 10%. So that like, there is no one stone, any stone unturned so that you can work on it. So yeah, so there is no there is no question on your skills or your preparation. I'm just talking about like you could have. So interview, you can introduce estimation as one section, and you won't miss such critical data. And then there wouldn't be back and forth in your design decisions when you're deciding, deciding on storage, applying caching applying database, so cool and yeah so basically, I will also suggest you to add one more section in here. Like after like so, estimations and API design. So basically, here it was, here, it was a single API. But let's say like, if there isn't a question like Uber system or something, right, so there would be multiple API's user facing API's, right? And it gives you that feeling signal that you are thinking about abstraction first, and then like, basically, that's how you usually do the coding also, right? You talk about the interfaces, and then you go and implement those interfaces. So similarly, like you talk about the context of your service, and then you go and come up with a high level diagram for implementing those service. Immutable Penguin: Makes sense. Yep. Makes sense. Red Maelstrom: Yeah. I'm not saying like, miss like skipping it. wouldn't be a big issue. But like just adding it will give you the it will make you fall around sound like a very structured, experienced engineer. So yeah, that's why I think this, these two sections would be good could help you to bring some structure into your thoughts or like or it's not about like thoughts. It's about like how interviewer perceive your design. So it gives him a structure like, Okay, fine. Can we talk about estimates? Can you talk about API's? And now he is going to implement those API's using different services? Immutable Penguin: So you see, you would suggest talking about API before the high level design? Red Maelstrom: Yes, yes, yes. So just after those two missions normally just know, you can have like the section functional, non functional, then estimation section, and then API design section and talk about the API's, we can talk about, like user facing API, or let's say, in some cases, there will be some background API's, which are not user facing but like, you might want to talk about them, you can talk about them, so that you don't miss on any critical thing. And then you did, what you did was anyway, good. Like, there is no scope of improvement, at least in your design process. Only one small thing which you can additionally do is like, you do talk about you reason about your design choices, and you do talk about them and like, but the interview is one hour interview, if you have time, maybe you can talk about like you can explicitly talk about, okay, this is a trade off. And these are the two choices. This is a pro and this is a con, you can even write that down. So that like, it gives a clear signal that you identified a trade off. You talked about pros and cons, doing verbally is also fine. But like, if you do it, like if you write it down, it gives easy to note, it makes it easy to note down for interviewer also that okay candidate data trade off analysis, and he, he or she did come up with a pros and cons. Immutable Penguin: So in this design did you see like a trade off discussion, or you Red Maelstrom: You already did. Yeah, you did talk about like, the unit talk about like, pessimistic locking, optimistic locking versioning we did talk about different, different solutions. You did make a right choice. So yeah, I'm not telling you miss out. Immutable Penguin: You mean like to actually write it down? Okay, cool. Red Maelstrom: Yeah, I mean, like, if you write it down, and so it gives like clear indication that you did identify that trade off. It will be easy for interviewer also to take notes. And if he's a new interviewer, he might miss this, even if you're saying that. So, so just be on 100% from your side, you get that signal? Maybe do it for one one of the trade off not for everything. But yeah, Immutable Penguin: Right. I see. I think got it. That makes sense. Yeah. Red Maelstrom: I also like your style of like, coming up with high level problems, and then keeping them back for later. That also shows that you you just want to come complete your design first and then go into the individual components that shows like you have done that gives a signal that you have worked on such things where there are a lot of things. You go in implementation details later and firstly come up with a different components. So yeah, so yeah, keep doing that's a that's a that's a good thing you're doing. Yeah, well, I have only this feedback. I will have I will write my detailed notes into the feedback. Yeah, but Immutable Penguin: Okay, I just one question I had is about like the service discovery or like the, like Zookeeper. So I noticed you didn't want to go too much into that. Do you think that this was a good strategy for for, like, you know, how to balance the distributed counter load? Red Maelstrom: Um, yeah, that's what I, in my opinion, it would have been an overkill, you could have just come up with like, some consistent hashing algorithms which hash your user IDs into some numbers and then you can take like, and then from that number, you can go to one of the counters, even that would have been like that would because like having a additional zookeeper and I have would add one more latency hop. So here, every service can just come up with like hash that user ID know the hashing algorithms that it can hash user ID. And come up with some numbers which can on go to the counter to the problem of you like some new counters might be finished early but like statistically hashing algorithms are expected to be uniformly distributed. So happening, having that situation happening is statistically rare. You can rely on stats. So yeah, it's not like some positive or negative. Yeah. But I, I personally don't expect like one particular solution to be true. But like, I just want, I just expect handed to no reasoning behind their choices. And you are, you're doing fine. You're doing good at that. So yeah. Hashing algorithm also works. But there will be an easier solution here. Immutable Penguin: Gotcha. That makes sense. That makes sense. So you think for L5 that Google, this would be a hire decision this interview? Red Maelstrom: Yeah. This would be a higher decision. Immutable Penguin: Okay, awesome. All right. I really appreciate it. This was good feedback. And good question. So appreciate it. Red Maelstrom: Cool. Awesome. All the best. Yeah, I'm only one additional thing, which I talked to every candidate who's going for Google, is that your coding interviews, also has like coding interviewer also decides your level. Unlike meta, or something like Uber, the Coding Interview does not decide to level but Google coding interviews also decide the level. So don't go with like, go with the approach of just solving the problem. Do consider about asking questions, like, will entire graph will fit in memory, or do I need to worry about it? And once we're done with the code or something to talk about testing, validating input, making the code more modular. Don't even shy away from talking about paralysing approach. So yeah. So I'm just saying like, these additional things. So like Google coding also decide the level so do give signals that you are experienced engineer, and you're not just one who just come and write a code. Immutable Penguin: Got it. Makes sense. Okay. That's a good point. Red Maelstrom: Awesome. Yeah. All the best. I'm, I'm really hopeful you will get to at least google over on Google. So Immutable Penguin: Awesome. Thank you so much. Red Maelstrom: Yeah, I think yeah, you need to get a feedback after this. So Immutable Penguin: Yeah, I will. I'll get Yeah, I was very satisfied. So I'm gonna give the good feedback. Red Maelstrom: Awesome. Thank you. Bye Bye. Immutable Penguin: Bye Bye.

Practice the Design a free food app problem and more

Become awesome at interviewing, and get actionable feedback from engineers at top companies like Google – it’s 100% anonymous!