
An Interview with a Meta engineer

Watch someone solve the design live comments problem in an interview with a Meta engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

Design Live Comments

Interview question

Design a live comments feature. Live commenting is a feature that allows clients to publish real-time comments on live videos or pictures without the need to refresh or reload.

Interview Feedback

Feedback about Utilitarian Lemur (the interviewee)

Advance this person to the next round?
Thumbs down: No
How were their technical skills?
4/4
How was their problem solving ability?
3/4
What about their communication ability?
3/4
Overall: weak no hire as M1.

✅ What went well:
1. Good functional requirements
2. Good non-functional requirements, and why they're there
3. Good assumption on the read vs. write ratio, and calling out hot spots (celebrities)
4. Good API design, though lacking responses
5. Decent data schema; includes the "live" problem
6. Good data flow
7. Good component responsibilities
8. The candidate drove the discussion well
9. Core puzzle: talked about using push (WebSocket / long polling), but didn't talk about the alternatives
10. Deep dive: talked about the CAP theorem and how it applies in this app, and decided to go with high availability, but didn't tell us the implications for the system
11. Deep dive: talked about fault tolerance on storage

❌ What can be improved:
1. Spent 20 minutes on functional + non-functional requirements + numbers; the benchmark is 10 minutes
2. Doesn't really explain what the non-functional requirements mean, or their implications
3. Doesn't really explain what the numbers mean, or their implications
4. Lacked tradeoff discussion
5. Lacked 2-3 more deep dives (DB choices, DB sizes, connection-count numbers)
6. API lacked responses

Action items:
1. Memorize all the non-functional requirement possibilities, then just call them out (yes/no):
   1. Reliability: 99.99% system availability
   2. Scalability: can handle traffic scaling up and down
   3. Security: only one public endpoint; code is executed safely
   4. Durability: store data for 10 years
   5. Latency: p95 of 200ms
   6. High availability vs. strong consistency
2. Explain by use cases (read, end to end) rather than by sections (API, then data schema, then high-level design)
3. Don't over-clarify; just push ahead and let the interviewer interrupt you at checkpoints
4. Not all numbers are important. QPS is always important; the storage number is usually important.
5. Think about the core puzzle, and at least make the tradeoff around the core puzzle explicit

Level implications:
1. E4: you have to do things right
2. E5: you have to do tradeoffs (SQL vs. NoSQL, core puzzle, push vs. pull, REST vs. GraphQL, sync vs. async)
3. E6: you have to go deep (offline support, multi-language support, battery optimization)
4. E7: you have to impress (something that most people don't know)

Tips:
1. Easiest way to sound smart (and create opportunities for deep dives): user → LB → API gateway → service (don't forget to mention the API gateway)
   1. The API gateway lets you set up the following deep dives: authentication, security, rate limiting, throttling, transformations, analytics and monitoring
   2. Details: https://twitter.com/alexxubyte/status/1567177071725793283
2. Keep functional + non-functional requirements + quantitative analysis down to 10 minutes, no more than this.
3. For quantitative analysis, you can use quick numbers:
   1. 1 MB → 1e6
   2. 1M DAU → 1e6
   3. 1 day → 86,400 sec → close to 100,000 sec → 1e5
   4. Computing QPS: 1M DAU → 1e6 / 1e5 → 10 requests/sec
   5. Computing storage: 1M DAU * 1 MB * 5 years → 1e6 * 1e6 * 5 * 400 → 1e6 * 1e6 * 2e3 → 2e15
4. You can talk through the high-level design + API design + data schema + data flow at the same time. You can save 5-10 minutes by doing this.
5. Persist all the discussion into writing/drawing. You don't know when your interviewer will actually get a chance to write your feedback.
   1. They might go into another meeting after your interview and only get the chance to write feedback 4 hours later
   2. By that time your interviewer will have forgotten whatever you talked about
   3. So write/draw down your discussion
6. Master your drawing tool. Practice with Excalidraw (because Facebook uses Excalidraw).
7. Stop at each milestone and ask: "Am I going in the right direction, or do you want me to go deep somewhere else?"
   1. For example: stop after discussing the non-functional requirements
   2. But don't ask about every assumption you have to clarify, because this will waste a lot of time. Every minute counts!
8. Your biggest enemy is not your tech skill. It's usually time. Remember that you only have 35 minutes.
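The quick-number arithmetic in the tips above can be sanity-checked in a few lines of Python. This is a rough sketch; the constants are the rounded estimates from the tips, not real traffic figures:

```python
# Back-of-envelope estimation with "quick numbers" (all values rounded).
DAU = 1_000_000              # 1M daily active users -> 1e6
SECONDS_PER_DAY = 100_000    # 86,400 rounded to 1e5 for mental math

qps = DAU / SECONDS_PER_DAY  # -> ~10 requests/sec

BYTES_PER_USER_PER_DAY = 1_000_000  # 1 MB -> 1e6
DAYS_IN_5_YEARS = 2_000             # 5 * ~400 -> 2e3 (rounded)

storage_bytes = DAU * BYTES_PER_USER_PER_DAY * DAYS_IN_5_YEARS  # -> 2e15 ≈ 2 PB

print(qps)            # 10.0
print(storage_bytes)  # 2000000000000000
```

Rounding a day up to 1e5 seconds and 5 years to 2e3 days deliberately trades precision for speed; in an interview, the order of magnitude is what matters.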

Feedback about Digital Cactus (the interviewer)

Would you want to work with this person?
Thumbs up: Yes
How excited would you be to work with them?
2/4
How good were the questions?
3/4
How helpful was your interviewer in guiding you to the solution(s)?
3/4

Interview Transcript

Digital Cactus: Hello there. All right, cool. So just double-checking before we start: you are here for a system design interview for Meta, right?
Utilitarian Lemur: That's right. Yeah.
Digital Cactus: Okay. And you are sure that you are looking for system design and not product design?
Utilitarian Lemur: Yes. System design for an engineering manager. Engineering Manager is the role that I'm currently doing, yeah.
Digital Cactus: Okay. Engineering Manager. Okay, cool. And what level are you targeting?
Utilitarian Lemur: M one.
Digital Cactus: Okay, cool. Ah, sounds quite straightforward. So M one is going to be between L five and L six, so I am going to adjust the interview to that level. And before we start: I'm going to be conducting a normal Meta mock interview, and I'm going to give you feedback in the last couple of minutes. Is this something that aligns with what you are expecting, or do you have other things in mind?
Utilitarian Lemur: Oh, no, it's pretty much the same thing. It's like for an M one, if it's like an L five, L six, then that's perfect. That's the expectation from my end as well.
Digital Cactus: Okay, cool. So the interview at Meta is typically 45 minutes, right? But really the first and then last five minutes is just Q and A. So the real interview itself is actually just technical. Just 35 minutes.
Utilitarian Lemur: Got it.
Digital Cactus: So let's try to actually simulate that experience. So you can also reflect your timekeeping and time management correctly. So now it's two. Let's try to finish by 37 and use the remaining time for feedback. Sounds good to you?
Utilitarian Lemur: Yeah, that sounds good.
Digital Cactus: I'll give you a question that can go between those levels. Okay, actually, let's draw something on the whiteboard. So, are you okay? Can you see this here? So let's say this is like a mobile app of Facebook, and you have some content here. Some content like video, photo, whatever, right? From time to time, you will see comments below that photo. Okay, whatever. So the interesting thing happens if you are just staring at the screen, right, not doing anything, just looking at the screen, enjoying the video or the photo. And then sometimes you will see "somebody's typing", and then "somebody's typing" resolves into a new comment. So this new comment appears out of nowhere, right? You don't pull to refresh, you don't do anything. You're just enjoying your photo. But suddenly it shows up out of nowhere. We call it a live comment.
Utilitarian Lemur: Got it.
Digital Cactus: So please design live comments for me.
Utilitarian Lemur: All right, so pretty much the feature build that is being asked for is like a live comment feature.
Digital Cactus: Yeah, the live comment.
Utilitarian Lemur: All right, I'm switching back to the text just so that I can walk through what the functional and non-functional requirements are. Starting with functional requirements: a quick couple of questions from my end, just to clarify the scope. Like, when the comments show up, should we do any sort of filtering? What I mean by this is that, sorry.
Digital Cactus: Go ahead. What do you mean by filtering?
Utilitarian Lemur: So filtering here pretty much means: do we need to look at the text and censor words, or anything along those lines, or is this just, like, comment as-is? We are just posting it?
Digital Cactus: No, I don't think so. I don't think you need filtering.
Utilitarian Lemur: I would say it's basically along the lines of, like, privacy, security, anything along those lines. But the short answer seems to be.
Digital Cactus: No. Yeah, no, it's out of scope of this project.
Utilitarian Lemur: All right, perfect. And let me see: do we allow people to update the comment, delete the live comment? No, it's just create. All right. The other question that I had: is the comment just text, or do we support media, like uploading photos or video?
Digital Cactus: Yeah, I want to cover that, but I don't think that's important right now.
Utilitarian Lemur: All right, okay. Just text for now. Right. Quick time check. I'll just use one more minute to ask one more question over here, I think. All right. Next question: is there reaction to the comments? So what I mean by this is.
Digital Cactus: Pretty simple, but no.
Utilitarian Lemur: All right. I think this makes things very clear. The only thing that we have to worry about is creating and reading the comments. There's no filtering, so there's no processing involved. It's just text-based for now. There are no reactions that we have to handle for the comments feature. Apart from that, do I have any other questions for functional requirements? No, I think this makes at least the functionality very clear in terms of what we need to do.
Utilitarian Lemur: I'll just move on to the non Functional requirements part.
Digital Cactus: Wait, before we go: you basically listed what it is not. You haven't told me what it is. What is in the functional requirements? You mentioned what is not in the scope. All four of these here are basically what's not in the scope.
Utilitarian Lemur: Yes. What is in scope, this is what I've understood: first is creating and reading comments. Second is support for text-based comments. Third is basically ordering comments by time, based on when they're posted; we just ensure that those comments come in in order over here. And fourth, I think you also mentioned the "someone is typing" indicator; that is in scope as well. So these are the four main things, based on what is in scope from what you mentioned in the diagrammatic representation, and also based on the scoping questions that I asked before. Is there anything else that I'm missing?
Digital Cactus: No, I think that looks straightforward. Yeah, that sounds about right.
Utilitarian Lemur: Perfect. Right. I'll move to the Non Functional Requirements. Basically here the questions that I'm going to ask is.
Digital Cactus: Okay, based on your knowledge about Facebook, how much do you think will be the number of people that will write comments on the videos or photos?
Utilitarian Lemur: So the average number, basically what I'm getting at is, excluding celebrities and celebrity personalities, an average photo would get around, I would say, a maximum of 25 to 30 comments. That's pretty much how I'm thinking about this. And this would pretty much mean that, if I'm translating this to Facebook scale, you have 1 billion users, and even if you assume that, like, 1% of these users comment on a post, that would translate to, I would say, 10 million. And if that is the case, then you have 10 million people that are commenting, like, twice or.
Utilitarian Lemur: Say four comments per person. You're looking at, like, 40 million comments per day. That's pretty much, I mean, these are assumption numbers that I'm making. But does this on average look reasonable in terms of how many live comments we would expect to be written at any given point in time, per day? And this is globally, excluding spikes due to celebrities.
Digital Cactus: Okay, nice. Good call-out on thinking about celebrities. Sure. This seems like a good number.
Utilitarian Lemur: All right.
Digital Cactus: No, this seems like a good number assumption. Sure.
Utilitarian Lemur: I'm sorry.
Digital Cactus: This seems like a good number assumptions. All right, what else?
Utilitarian Lemur: I think after this is where we would go into, sorry, let me just quickly think about this. So you would have 40 million comments. Is there anything else that I'm missing here? Now, you can go in one of two directions, which is either high consistency or high availability. A tradeoff.
Digital Cactus: Okay.
Utilitarian Lemur: Now, with a live comment feature, one of the things that we could definitely think about is going for high availability with eventual consistency as the goal, mainly to ensure that when people are reading comments, person A might see ten while person B sees fifteen, and eventually the information becomes consistent. That product experience would be better. At least the way that I'm thinking about this is that having the ability for a person to see comments immediately, and then eventually catching up to consistency across different sets of users, would be a better experience than enforcing consistency, because comments show up as soon as someone loads a photo, as much as possible.
Digital Cactus: Let's see. Yeah, sure, sounds fine.
Utilitarian Lemur: Yeah. And pretty much what this also means is that, if you look at it, the quorum requirements would be lower. It basically means the number of machines that have to give us consistent information, like, for example, the closest machine. Within our system diagram, if we say there are four or five replicas, and just one of the replicas returns us the information, we just say, okay, this is what we are going to show. Rather than doing a quorum calculation of, like, okay, give me responses from four different servers, see if they are consistent, and if they're not consistent, don't return anything back. That's pretty much the practical translation of the tradeoff. Okay, all right, so that's it for non-functional requirements. You have the tradeoffs that I've just described and the user-number calculations. Now I want to go to the high-level design, and this is pretty much divided into three subsections. The three main things that I would cover are API patterns, data model, and capacity estimation. Now, keeping time in consideration over here, I'll just go through all of these things very quickly. So here you would have create, post ID equals something, and comment equals this text. That would be the create, and then read comment. Also, I apologize if my keyboard is too loud. I'm trying to type as softly as I can.
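The quorum tradeoff described in this exchange reduces to a single inequality: with N replicas, a write acknowledged by W of them and a read served by R of them are guaranteed to overlap only when R + W > N. A minimal sketch, with illustrative replica counts:

```python
def reads_see_latest_write(n_replicas: int, w: int, r: int) -> bool:
    """Quorum rule: a read is guaranteed to overlap the latest write
    only when R + W > N."""
    return r + w > n_replicas

# Availability-leaning setup from the discussion: answer from any 1 replica.
print(reads_see_latest_write(5, w=3, r=1))  # False -> eventually consistent
# Majority reads instead trade latency for strong consistency.
print(reads_see_latest_write(5, w=3, r=3))  # True
```

Answering from the closest single replica (R = 1), as proposed above, is exactly the configuration where the inequality fails, which is why the design is eventually consistent.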
Digital Cactus: No, it's fine. I just realized that I forgot to ask you several things here. There are two things that I want to ask. The first one is the latency requirement. And the second one is: is the comment going to be persistent? On the non-functional requirements.
Utilitarian Lemur: Okay, I missed asking these two. Okay. I think, for the latency requirement, it goes back to the availability question, in terms of how quickly you want to return the information. This should be low latency. And is the comment going to be persistent? Yes. And this goes back to one of the functional requirements that I probably should have asked about: a replay feature. Since we are doing live comments over here, do we want the ability, when someone plays the video all over again after the live session is done, to replay the comments in that same way? I would say the comments are going to be persistent; it should ideally be yes, because you would still like to see the comments. So it makes sense that we need to have persistent storage for the comments as well.
Digital Cactus: Replay feature? I don't get what you mean.
Utilitarian Lemur: So if I go to the whiteboard again: so you saw the comment show up as, like, a live comment, and if.
Utilitarian Lemur: But let me just redo this. If I load the same video again.
Utilitarian Lemur: I see the comments loading one by one as though I was watching the video all over again. That's pretty much what I meant by replaying the live comments.
Digital Cactus: Oh, I see. So it doesn't apply for photo, right? Only for video.
Utilitarian Lemur: Got it. Okay.
Digital Cactus: But no, you don't need to do the replay.
Utilitarian Lemur: Got it. Okay. All right. I think that makes it clear. So comments should persist, but no replay view, and the latency requirement is low latency. Right. Latency, consistency, and availability: all taken care of. But yes, sorry, going back to the high-level design: the API patterns are basically create and read over here. For the data model: apart from the major data models that we'd look for, which is user-level information and post-level information, comments is the new table that we would like to add. And I'm going with a traditional relational data model. It's very familiar to me, which is the reason why I'm picking a relational data model. We can, of course, go for a non-relational data model as well, but in my case, just to move the conversation along quickly, I'm just going to outline how the comments table looks. You're going to have the ID column, you're going to have the Text column, and then you're going to have Post ID, so that you can see which comment section it was posted to, and User ID, which user posted it, and all of that. Looking at this, just to ensure that we have enough unique IDs being generated, you would have 64 bits, which means the ID takes eight bytes of storage. Text, if you're allowing for, like, 50 characters, would be 50 bytes. Same thing for Post ID and User ID as well, which will be eight bytes each, plus a timestamp. Adding those up, I'm going to approximate it to 80 bytes, just so that I have a nice even number to work with. And if I multiply this by the 40 million comments that we expect globally per day, then you're looking at around 3 GB of storage per day, which pretty much covers the capacity estimation as well. All right, any questions here before I move on to the high-level design?
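The comments table and the per-row arithmetic walked through above can be sketched concretely. The column names and the exact byte widths are assumptions for illustration; the interview rounded the row to roughly 80 bytes:

```python
import sqlite3

# Hypothetical relational schema for the comments table discussed above.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE comments (
        id         INTEGER PRIMARY KEY,  -- 64-bit ID -> 8 bytes
        post_id    INTEGER NOT NULL,     -- 8 bytes
        user_id    INTEGER NOT NULL,     -- 8 bytes
        text       TEXT    NOT NULL,     -- ~50 bytes at 50 characters
        created_at INTEGER NOT NULL      -- unix timestamp, 8 bytes
    )
""")

ROW_BYTES = 8 + 8 + 8 + 50 + 8        # 82, rounded to ~80 in the interview
COMMENTS_PER_DAY = 40_000_000
daily_storage = ROW_BYTES * COMMENTS_PER_DAY
print(daily_storage / 1e9)             # ~3.3 GB of raw comment data per day
```

This counts payload bytes only; indexes, replication, and storage-engine overhead would multiply the real figure.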
Digital Cactus: This one I'm still not sure about this one. The API for this because I assume this is a Rest API, right? Is this a rest API?
Utilitarian Lemur: What is oh, so this is the Write API. So this would create the information, create a comment for a given Post ID, whereas this is reading the information for a given Post ID.
Digital Cactus: Yeah. So.
Utilitarian Lemur: A REST API is pretty much what I'm going with right now.
Digital Cactus: Okay.
Utilitarian Lemur: You can have multiple configurations, in terms of having GraphQL over here as well. But just for familiarity purposes, I'm going with a REST API right now. You can definitely have implementations in GraphQL over here, where you define a schema, and then you have a payload that is sent, something along these lines, with a request and a response over here. Sorry, just give me one second. A query of create that would have something along the lines of Post ID, and then you would have the information that is being returned, in terms of text, users, et cetera. But that's a more complicated way of approaching this that I don't think I have enough time to get into, which is the reason why I'm just going with the simple API patterns, the simple REST API.
Digital Cactus: Okay, sure. I'm actually wondering about this particular one read perspective, but I will let you explain it's up to you. How are you going to explain this?
Utilitarian Lemur: Sure. So the basic idea here is that, given a post ID, we need to get the list of comments. If that is the case, what this API is basically telling us is: go to the read path, and what I want is comments. Or you can do the other route, like read, post ID, and type equals comments. But the idea behind this is that you have a specific path, and from the path that you provided over here, you get what the post ID is and then what type of information you want, which is comments. And each one of these triggers a function on the server side.
Digital Cactus: Yes, it makes sense, but I'm just wondering from the user experience, are you going to keep calling to this API to get the newest comments? No.
Utilitarian Lemur: I wanted to tackle that in the high-level system design, in terms of how I was actually going.
Digital Cactus: That's exactly why I said I'm going to let you explain whatever you want, but I just don't think this is the right way to do it.
Utilitarian Lemur: Okay, so in this case, this would be, okay, let me clarify. This would be, in this case, either HTTP long polling or a WebSocket connection, which is pretty much what we would provide over here to get this ongoing stream of information. If your question was around what type of protocol I would be using, it would be one of these two: either long polling or a WebSocket. Now, I would prefer a WebSocket connection in this case, mainly because it's bidirectional. So that, instead of doing a separate create-comment request every time, you can use the same connection to send comments as well. Whereas if you do long polling with server-sent events, it's a perfect way to subscribe to events, in terms of pushed comments being sent to you, but if you have to send a comment, you will need to send a separate HTTP request for it.
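A toy way to see the tradeoff just described: with polling, the client pays one request per interval whether or not anything is new, while a persistent (WebSocket-style) connection costs roughly one server-sent frame per comment. The function names and numbers here are illustrative only:

```python
# Toy comparison of the two delivery approaches (all numbers illustrative).

def polling_requests(poll_interval_s: int, duration_s: int) -> int:
    """Short polling: one request per tick, whether or not anything is new."""
    return duration_s // poll_interval_s

def pushed_frames(comment_times: list[int]) -> int:
    """Persistent (WebSocket-style) connection: one frame per new comment."""
    return len(comment_times)

comments = [3, 45, 46, 47]        # seconds at which new comments arrive
print(polling_requests(2, 60))    # 30 requests to catch 4 comments
print(pushed_frames(comments))    # 4 frames over one connection
```

The asymmetry flips at very high comment rates, which is part of why the choice is a tradeoff rather than a rule.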
Digital Cactus: Okay, cool, sounds good.
Utilitarian Lemur: All right, time check. We are nearly out of time. So I'm just going to draw out the basic-level diagram over here, going to use this as the main thing. I'm just going to go silent here while I keep drawing the major block-level diagrams. This, then you would have storage.
Digital Cactus: That's it.
Utilitarian Lemur: Create service layer. Then you have it.
Utilitarian Lemur: This should be it. It's very high level. The way this workflow is going to work is that you have a client over here that is sending a read subscription to the load balancer. Now, from here you have a configuration service and a monitoring service over here. Monitoring is basically taking a look at the whole service layer and figuring out which machines are performing as expected. The configuration service acts as the source of truth to figure out which machine is closest to the user, based on what the load balancer is asking us. So you would come here, come here; one, two is basically returning, like, this machine is the one that is available. What then happens is that this now sends a request to that particular machine, a single machine, that says, okay, this is the user, and this is the video or photo that they want to subscribe to for live events. And what the subscription service now does is, there are a bunch of machines over here, and each stores a mapping of, sorry, not user ID, but rather connection ID to post ID. What these machines pretty much do is help track, for the WebSocket connection that gets created between the client machine and this service, which user the connection belongs to and which post the connection is mapped to. And this is stored locally on these machines, mainly because you don't need to persist this information anywhere outside the machine: if a machine goes down, the connection with the client also goes down, so the client needs to reestablish the connection all over again. That's why you don't need to persist it, which is the reason why you keep it locally on the machine. Now, once this is done, this machine subscribes to what we call a push service.
And what this push service basically does is, oh, I forgot to do one more thing over here.
Digital Cactus: I will let you finish.
Utilitarian Lemur: What the push service basically does is, for any message queue events that are sent from the comment creation service, it figures out, okay, this is a new comment that is coming in for this particular post ID; now I need to figure out which machine is subscribed to this particular event, comments for this particular session. So what the push service does is it looks at the in-memory cache over here, figures out which machine this comment needs to be pushed to, looks that up, sends the information back to that particular machine, and now this machine knows, okay, this is the connection that is interested in this particular post, and pushes the information down. So this bidirectional connection helps in this manner. Or at least the flow is: you have the write event from here, the write event goes to this comment creation service, this comment creation service handles all these events and publishes to this message queue over here. And once this is done, the push service acts as the consumer over here, gets events from here, and pushes them back to the right connection ID. This is a super simple way of me showing how the information flows across the board. Now, the reason why I've created a queue over here is that there is another service layer over here that pulls these comments from the message queues and then sends them to the persistent storage, so that you have the list of all the comments that have occurred for a particular post ID.
Digital Cactus: So if the message queue dies, the comment dies altogether?
Utilitarian Lemur: That's a good question. So, to that extent, at least, you would have multiple message queues. The second thing that you could also do is add a proxy service, or add a data layer in front of this, so that you have the information stored in some manner, because you want to prevent what I'm calling single points of failure. But yes, you do face the risk that if the message queue loses that information before it's sent, then you would lose it. But there are multiple ways you can avoid this; I'm just going to outline them over here. One: write to disk on the message queue machine. That is one way you can avoid this. The other way is to have an in-memory cache that sits behind, so while the comment service is publishing here, it also writes there. In that case, you have a failover over here. But again, this also raises: what happens when this fails, and what happens if this fails? You could have multiple backups of this at any given point in time, so you're not relying on one set of machines, and this gets replicated as soon as you have the information over here. You have replication built out so that you have some sort of persistence layer. If you look at systems like Kafka and other message queue systems, they do have built-in storage and built-in replication mechanisms that help you avoid these kinds of scenarios. But you can solve this problem in two ways: one, writing to in-memory caches at any given point in time and figuring out how the recovery should work. Because not only over here: you could have failures in this particular service as well, where it reads a message from here and then fails. What do you do in that scenario? There should be a retry mechanism that goes back and figures out whether a message in the queue has been completed or not, which means that you now need to worry about making the message queue's state persistent.
You have acknowledgments that are sent to the message queue. So only if the message queue receives two acknowledgments, from two of its consumers, does it mark a message as completed, so that you know which messages have been successfully completed in terms of both writing to storage and pushing to the subscribers.
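The acknowledgment scheme just described, where a message counts as done only after both consumers (the storage writer and the push service) have acked it, can be sketched in a few lines. The class and method names are invented for illustration; a real broker (e.g. Kafka) tracks this per consumer group:

```python
class AckQueue:
    """Toy queue: a message completes only after two consumer acks
    (e.g. the storage writer and the push service)."""

    REQUIRED_ACKS = 2

    def __init__(self):
        self._next_id = 0
        self.pending = {}       # message_id -> acks received so far
        self.completed = set()  # message_ids safe to drop from the queue

    def publish(self, payload) -> int:
        message_id = self._next_id
        self._next_id += 1
        self.pending[message_id] = 0
        return message_id

    def ack(self, message_id) -> None:
        if message_id not in self.pending:
            return  # already completed or unknown -> idempotent no-op
        self.pending[message_id] += 1
        if self.pending[message_id] >= self.REQUIRED_ACKS:
            del self.pending[message_id]
            self.completed.add(message_id)

    def is_completed(self, message_id) -> bool:
        return message_id in self.completed

queue = AckQueue()
mid = queue.publish("new comment on some post")
queue.ack(mid)                  # storage writer done -> still pending
queue.ack(mid)                  # push service done -> now completed
print(queue.is_completed(mid))  # True
```

Making `ack` idempotent matters because the retry mechanism mentioned above can deliver the same acknowledgment twice.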
Digital Cactus: Okay, my one last question. So I see that you are doing the read here, and then there is this mapping here. So does it mean that this read API basically makes an entry in this mapping? Or where does this mapping come from?
Utilitarian Lemur: So the mapping comes from as soon as this particular client establishes a connection with a machine in the subscription service. That is when the mapping is updated locally in a machine within the subscription service.
Digital Cactus: So is there a separate API, or is it part of the read API?
Utilitarian Lemur: Oh, it's part of the read API. So it's more or less like the handshake that is required in order to establish a WebSocket connection. So it would be: read, let's say, for example, in terms of HTTP long polling, or a connection over here, it would be a three-phase process, where it would initially say read comments, it would send an acknowledgment back, in terms of, yes, it's been completed, and then the client again re-establishes the connection. So it would be a three-phase process, in however many steps the protocol requires; that is pretty much what this read API is supposed to do.
Digital Cactus: Okay, so this mapping, connection ID to post ID: you are relying on the underlying HTTP protocol to actually build and maintain this. How do you maintain this? Are you going to maintain it manually, or are you going to rely on the HTTP protocol to do it for you?
Utilitarian Lemur: So this would be maintained, let's say there are three steps in terms of establishing the connection. One is: send intent. Now the load balancer says, okay, I've acknowledged your intent.
Utilitarian Lemur: Then, established connection. It's at the third stage when it says, okay, fine, the intent is acknowledged. The second one is an acknowledgment from our server side: yes, okay, it seems like there are enough machines, you can now establish a connection. And the third part of the handshake process is when this mapping gets created, and this mapping is created within the machine that is managing this. So, what I mean by this is, the "manually" phrasing is what is throwing me off, but it pretty much is stored locally within the machine, unless you meant manual in a very different manner.
Digital Cactus: Okay, so this means: are you going to use long polling here, or, like, a WebSocket here? The first read and the second read are going to be different then, right?
Utilitarian Lemur: Yes. So I would prefer using a WebSocket connection, mainly because, let's say on this read path over here, if the same client that established the read is also supposed to send writes, then you would use the same WebSocket connection to send the information. Now, in that case, it's up to the load balancer, or whatever abstraction we place over here, to figure out what type of information, or what payload, or what message is being sent, and based on that particular message, send it to either the subscription service or the comment create service.
Digital Cactus: Okay. And I guess one last thing. We are a bit over time, but I still don't quite get what this push service's responsibility is. Is it to push down the information to the client, or is it to kind of do the fanning out to a particular.
Utilitarian Lemur: It's doing the fanning out, sir. The push service is purely for fanning-out purposes. So as soon as the message queue sends a message out, for example when the create happens, this entire service's responsibility is to figure out which machines have subscribed to which post IDs, and for each of those post IDs, send out the push events.
Digital Cactus: Okay.
Utilitarian Lemur: It's the fan-out service. That's pretty much what the responsibility of this push service is. Now, if you zoom out, there's always the scenario where you have multiple data centers as well. So the other responsibility this push service has is, if there are no machines present that are subscribed to this particular post ID, it sends the information to the push service in another data center as well. Now, there are of course latency questions and everything else that needs to be considered. But the idea is that this service has two responsibilities: fanning out comments that currently have subscriptions in the current data center, and also sending the payload to other data centers as needed, to ensure that you're sending the comments to everyone that is subscribed to the particular post ID.
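The push service's two responsibilities as described can be sketched in a few lines. This is a hedged illustration under assumed data shapes (a comment dict, a subscription map); it is not taken from the interviewee's diagram.

```python
def fan_out(comment, local_subscriptions, peer_datacenters):
    """Sketch of the push service.
    local_subscriptions: post_id -> list of machine ids in THIS data center
    peer_datacenters: other DCs whose push services should also receive it."""
    post_id = comment["post_id"]
    # Responsibility 1: push to every local machine that holds subscribers
    # for this post ID.
    deliveries = [("push", machine, comment)
                  for machine in local_subscriptions.get(post_id, [])]
    # Responsibility 2: forward the payload to peer data centers so their
    # push services can reach their own subscribers (latency questions on
    # this hop are acknowledged but out of scope, as in the discussion).
    deliveries += [("forward", dc, comment) for dc in peer_datacenters]
    return deliveries

out = fan_out({"post_id": "p1", "text": "hi"},
              {"p1": ["m1", "m2"]}, ["dc-eu"])
assert out == [("push", "m1", {"post_id": "p1", "text": "hi"}),
               ("push", "m2", {"post_id": "p1", "text": "hi"}),
               ("forward", "dc-eu", {"post_id": "p1", "text": "hi"})]
```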
Digital Cactus: Okay, let's talk about feedback. Give me a second, let me just write this down. Okay. I feel like you are technically quite strong here, but there are some critical pieces that are missing at L5 and above, right? So I'm going to give you a weak no hire as M1 here. Let me elaborate. There are two core points here that basically qualify an L5 versus an L6. The first one is that some questions can only get you to L4, some questions get you to L6, some questions can get you to L7; basically, it depends on the complexity of the problem. This question happens to be one that can evaluate someone all the way from L5 to L8, so the degree of complexity of these questions can vary depending on who talks about it. The second thing, while we are on that, let me also give you the assessment, right? In this particular question, I think you managed to answer quite well. I don't really mind, in that sense, giving you an L6-level evaluation, because you explained it quite well: your data flow, and then the schema, and then the component responsibilities, everything makes sense. It maps really well, end to end. So I think that's well done. The second part is the evaluation criteria; let me just drop them here. Here we go. For L4, you basically just have to explain things correctly, for any question. For L5, you have to explain trade-offs: SQL versus NoSQL, the CAP theorem (the one that you mentioned), push versus pull, synchronous versus asynchronous. These things need to be explicit, right? Why something needs to use SQL versus NoSQL, why you use push versus pull, a bunch of that stuff. And this, I feel, is the one that is lacking here.
You did discuss trade-offs: you discussed the CAP theorem, you discussed briefly REST and GraphQL, but not really the pros and cons. For L5 it is very important that you not come up with just one solution. You have to come up with at least two, you have to talk about the pros and cons explicitly, and you have to decide why you choose one versus the other, right? The CAP theorem one you did is actually a good example; even though you didn't really talk about the pros and cons, you explained it quite well. But for example, things you can do here: SQL versus NoSQL, of course. You don't really need ACID properties here, you don't really need a relational database, so possibly NoSQL is a good option here, possibly, right? Another thing here: because it's okay to do eventual consistency, you want to use a message queue to reduce the load on the write server, the write comment service. That's why you are leaning toward using an asynchronous process to fan out the comments, right? But this means you have to explicitly say that there is an alternative solution, which is removing the message queue altogether and saying: hey, this can actually be synchronous, these are the pros; this can be asynchronous by adding a message queue here, these are the pros and these are the cons; I decide to use a message queue, and here is why. That's the level of L5. If you want to get to L6, these pros and cons need to go back to your numbers and non-functional requirements. For example: because I do eventual consistency, which is in my non-functional requirements, it's actually okay to do this asynchronously, right? Or, for example: because I want the latency to be less than 100 milliseconds, I cannot do pull, right?
Because if I do polling, asking the server every single 100 milliseconds, it's going to be expensive on the bandwidth; that's why I am leaning toward using push, right? So the point here is less about which one is better, which one is the correct one, and more about the thought process on why you choose push versus pull, as dictated by your requirements. Making sure your choices are dictated by a requirement, either a non-functional requirement or a number requirement, is what gets you to L6. These two things are what I feel is lacking from your discussion: explicit trade-offs, and backing them up with your requirements. Everything else I feel is fine. And here's the thing: I feel like if I gave you some hints and some direction during the interview, you could call that out, right? I think you are technically savvy enough to understand these things and say, hey, I've got to sort out the trade-off, or whatever. But you didn't manage to do that, because you spent too much time on the functional requirements, non-functional requirements, and numbers: you spent 20 minutes there instead of ten minutes, which is the benchmark. So in terms of how to get better during the interview, the first thing you have to do is manage your time better, and my first suggestion for how to do that is this one, number three: when you talk about functional and non-functional requirements, instead of asking and clarifying with the interviewer on every single point, you shoot ahead. You just talk and write down what you think is correct, and then you give the interviewer an opportunity to disrupt at a checkpoint: hey, this is what I think, what do you think? So instead of clarifying point by point, one by one, you do it in one go. Why?
Because you need to manage your time better, by giving a checkpoint where the interviewer can disrupt you and direct you in the more correct direction, versus correcting it point by point. This is actually a more efficient way to do interviews. It's probably not as effective, but it's for sure more efficient, so you have to balance between being more effective and being more efficient. But this is just one tip to make you more efficient. Because again, I feel this is the biggest problem: you spent ten minutes more than you were supposed to, and ten minutes is basically one third of the interview time, 30% of the interview time. You don't have that much time to waste. The second one is on non-functional requirements. Instead of thinking about which non-functional requirements you need here, you just memorize, okay, these are the ten possible non-functional requirements, and you go: yeah, this is needed; no, this is not needed. Just to give you an example, here are a few possibilities of non-functional requirements. This is not exhaustive, but it's good enough. There are six possibilities you can think about: okay, do I need reliability here? No, I don't need it. Scalability: do I need to handle up and down traffic? Oh yeah, I might need to handle a spike in traffic. Do I need security? Not really. Do I need the data to be stored forever? Yeah, the comments need to be stored forever. What about latency? So it's more like yes-and-no questions: okay, what does this system need? Does that make sense? Again, it is more of an efficiency play, less about whether this is more effective. This is just a way for you to come up with non-functional requirements faster.
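The bandwidth argument behind the push-versus-pull trade-off mentioned above can be made concrete with a back-of-envelope calculation. All the traffic figures below are illustrative assumptions, not numbers from the interview.

```python
# Polling: to keep staleness under 100 ms, each viewer must poll every
# 100 ms, i.e. 10 requests per second, even when nothing new arrived.
viewers = 1_000_000            # assumed concurrent viewers on one live post
polls_per_sec_per_viewer = 10  # one poll every 100 ms
comments_per_sec = 2           # assumed comment rate on that post

pull_requests_per_sec = viewers * polls_per_sec_per_viewer  # 10,000,000
# Push: the server sends one message per new comment per subscriber,
# and nothing at all when the stream is quiet.
push_messages_per_sec = viewers * comments_per_sec          # 2,000,000

assert pull_requests_per_sec == 10_000_000
assert push_messages_per_sec == 2_000_000
# Under these assumptions pull generates 5x the message volume, and most
# polls return empty; that is the requirement-driven case for push.
```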
Utilitarian Lemur: Got it.
Digital Cactus: Yeah. And for non-functional requirements, what is important is not for you to list them out; what is important is to talk about the implications. So, for example, latency of 200 milliseconds: why does it matter? Similarly, when you talk about numbers, you talk about 3.2 GB of storage on a daily basis. Okay, cool information, but what is it for? What does it mean? Does it mean I only need one instance of a database to store everything? What do 40 million comments per day mean globally? Does it mean I need to horizontally scale the system? Does it mean I have to use a specific kind of database with high throughput on writes? That's basically the reason you want to talk about the non-functional requirements: to talk about their implications, not just list them out, like, hey, I know about this term. No, that's not the point. The point is: yeah, I know this term, and I know what it means for the system. It means I have to choose a NoSQL database. It means I have to use Cassandra because I need high throughput. It means I have to use push because I need low latency. It means I need to protect my system inside a VPC because I need security. Those are what's important about non-functional requirements. And it's very lightweight, right? When you talk about security, you just say: yeah, I need security, which means I need to protect my system inside a VPC. It's super fast, under 10 seconds. But it is the difference between L5 and L6, right? And finally, just another efficiency play. The way you explained things here was: okay, this is the API, and then this is the schema, and then let's talk about high-level design. Instead of doing that, what you can do, so let's go to the whiteboard, is actually, while you talk, start with the high-level design; you start drawing it, right? You start with the flow.
And then while you are talking about one flow, you say: okay, by the way, for this flow, this is the API, right? And then when you talk all the way down to the persistence: hey, on this particular read flow, this is the data schema. Why is this more efficient? Because you can explain things end to end, and you can reuse things end to end, right? So there is a difference between talking about things section by section (okay, now I'm going to talk about the API; okay, next the schema; okay, next the high-level design) and explaining it in one flow. You talk about: okay, this is the flow for read; by the way, the API for read is this, and the schema I'm going to use is this. Okay, let's continue to write: the API I'm going to use for write is this, and I'm going to use the same database and the same schema for the write. This has been proven to be at least 20% more efficient in terms of timing compared to explaining section by section. Again, an efficiency play, not about effectiveness. I mean, some interviewers will be confused, but this is something you have to call out early, saying: hey, I'm going to explain things end to end, so I'm going to explain the API while I'm also talking about the flow, right?
Utilitarian Lemur: Got it.
Digital Cactus: So bear with me in terms of what will be discussed in the API, and bear with me about what's going to be discussed in the schema. So those are some tips for you. But overall, I think technically you are quite safe, so I don't think you need to read books or watch YouTube videos about, hey, what is a CDN, what is a load balancer. I think you are good enough. You just have to manage your time better and understand that what you explained here right now is not sufficient to get you to the M1 level, because you don't talk about trade-offs and you don't connect your discussion to the numbers.
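The "connect your discussion to the numbers" point can be illustrated with the figures mentioned in the feedback (40 million comments per day, 3.2 GB of storage per day). The 10x peak factor is an assumption added for the sketch.

```python
# Turn the raw numbers into implications, as the feedback recommends.
comments_per_day = 40_000_000
avg_write_qps = comments_per_day / 86_400    # ~463 writes/sec on average
peak_write_qps = avg_write_qps * 10          # assume a 10x spike for live events

storage_gb_per_day = 3.2
ten_year_storage_tb = storage_gb_per_day * 365 * 10 / 1024   # ~11.4 TB

assert round(avg_write_qps) == 463
assert round(ten_year_storage_tb, 1) == 11.4
# Implication: thousands of writes/sec at peak points toward a horizontally
# scaled, write-optimized store (Cassandra was named as an example), while
# ~11 TB over ten years is modest, so capacity alone does not drive the
# database choice. That translation is the L5-to-L6 step being described.
```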
Utilitarian Lemur: Got it. Yeah, it's more or less organization that I need to work on, more and more.
Digital Cactus: Yeah, but once you are aware of the problem, it should be quite feasible to fix.
Utilitarian Lemur: Okay.
Digital Cactus: I think that's mainly it. One extra tip for you; I mean, I have a lot of tips here that you can read yourself, but one minor extra tip: in every single system design, you've got to start thinking about what the core puzzle is. So for this particular question, the core puzzle is basically how you get the messages, how you get the comments. And there are actually two solutions, right? You either pull or push. And this is actually the most important trade-off that you have to talk about during the interview, which you didn't; you immediately said push, right?
Utilitarian Lemur: Yes.
Digital Cactus: You should have talked about the alternative. The push is not necessarily going to be wrong, by the way. But if you think pull is wrong, you've got to tell me why it's wrong.
Utilitarian Lemur: Got it.
Digital Cactus: Yeah. And that core puzzle is the most important trade-off that you have to make. So if you cannot come up with other trade-offs, like you don't want to talk about SQL versus NoSQL or REST versus GraphQL, if you don't have time to do that, at least make this one trade-off about the core puzzle.
Utilitarian Lemur: Got it. Yeah.
Digital Cactus: Core puzzle. I think that's all my feedback, and we are a little bit over time. Do you have any questions?
Utilitarian Lemur: Nothing much. So the only thing was, what does core puzzle mean?
Digital Cactus: Oh, the one that I just mentioned to you. Basically, if you think about system design, everything will kind of look the same: a client talking to a load balancer, talking to an API gateway, and then talking to some services, and then talking to some database. Maybe you sprinkle in a queuing system or caches here and there, right? But between one system design and another, there is going to be one core problem that we want you to solve. That's the core puzzle. For example, in this question, the core puzzle is: how do I actually get the messages? I don't really care how you store them, because it's going to be the same; it's going to be stored in either a SQL or a NoSQL database, sure, I don't care. What is interesting is how you actually get the comments live. Let me give you another example. Say I want you to build a system for storing and serving images on Instagram. Then the core puzzle is basically: hey, how do I serve the images to you in multiple resolutions? How do I make sure the image is not going to be too big? If it's too big, my low-end phone is going to crash; but if it's too small, then I'm not going to see the image well. And the trade-off here is going to be between two options, right? I can make multiple cached images up front, when I write into the database, and only fetch accordingly; or I can download on the fly and try to change the resolution on the fly. There are pros and cons between the solutions. But do you understand that this is the most important part about storing images for Instagram, for example? And this is the core puzzle, essentially.
Utilitarian Lemur: Got it. Okay. It's basically the core problem that I need to solve for.
Digital Cactus: Exactly. Using another example: matching between an Uber driver and their rider, right? What is the core problem here? Well, the core problem, the core puzzle, is: how do I query efficiently? How do I match and query efficiently for a given geographical radius? I need to know which drivers are actually in this area, and I want to map them against this user. And for the solution, there are many things you can use: a full table scan, which is not efficient; you can use a geohash; you can use a quadtree; and many other ways to do it. But that's the trade-off that we have to talk about.
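The geohash idea mentioned above can be shown with a toy index: bucket drivers by a coarse grid cell so a radius query becomes a cell lookup instead of a full table scan. This is a simplification under assumed names; a real system would use a proper geohash library and also check neighboring cells.

```python
from collections import defaultdict

def coarse_cell(lat, lon, precision=1):
    # Bucket coordinates into a ~0.1-degree grid cell; this rounding
    # stands in for a real geohash encoding.
    return (round(lat, precision), round(lon, precision))

class DriverIndex:
    """Toy spatial index: cell -> set of driver ids currently in it."""

    def __init__(self):
        self.cells = defaultdict(set)

    def update(self, driver_id, lat, lon):
        self.cells[coarse_cell(lat, lon)].add(driver_id)

    def nearby(self, lat, lon):
        # Only scan the rider's cell, not every driver in the table.
        return self.cells[coarse_cell(lat, lon)]

idx = DriverIndex()
idx.update("d1", 37.77, -122.42)   # San Francisco
idx.update("d2", 40.71, -74.01)    # New York
assert idx.nearby(37.77, -122.42) == {"d1"}
```

The trade-off the interviewer is pointing at lives in the cell scheme: finer cells mean fewer false candidates but more cells to query around the radius boundary.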
Utilitarian Lemur: Got it. Yeah. No, that makes perfect sense.
Digital Cactus: So typically you will know what the core puzzle is the moment you hear the problem; at least you will have some gut feeling about it: okay, this is what needs to be solved here. The API and the data schema, those two are not so important; those are just generic system design.
Utilitarian Lemur: But at least at like an E5, E6 level, it's about focusing on the core problem and solving it, or getting toward solving it, as quickly as possible.
Digital Cactus: That is correct.
Utilitarian Lemur: Right. I think this is super useful. Thank you so much.
Digital Cactus: Okay, any other last question before we end session?
Utilitarian Lemur: Yeah, no, this is about it. Thank you again.
Digital Cactus: All right, cool. Good luck on the interview, man. See you.
Utilitarian Lemur: Bye, all.
