Interview Transcript
The Inimitable Thunderstorm: Hello.
Nihilistic Hawk: Hello.
The Inimitable Thunderstorm: Hi, how's it going?
Nihilistic Hawk: It's going good. How are you doing?
The Inimitable Thunderstorm: Doing good. So can you tell me a little bit about your experience and what you're looking to achieve from the session?
Nihilistic Hawk: Yeah, definitely. I've been working as a software engineer for about five or six years. Initially I worked in the financial area, building infrastructure for distributing messages for currency pair rates. We used Kafka as the basis for that infrastructure and supported it in our proprietary cloud, and we put together libraries people could use to produce and consume messages on a broker. Later on, I worked on developing end-to-end testing frameworks and helping people with test automation, so we could do automated deployments to different environments, especially production, and make sure everything stayed intact as we moved quickly. Then I did some design work on a financial pipeline for ad impressions. For me, I'm just trying to get an understanding of the process, system design specifically, and see what folks are looking for in this type of interview, so that if I do system design interviews in the future, I know what to do.
The Inimitable Thunderstorm: Okay, that sounds good. So the format I usually follow is: we start, you work through the problem, and at the end I give you feedback and some hints. So let's start. The problem we want to go through today is an image processing service. The idea is that you have a photo, or any picture, and you want to apply some kind of filter to it: you could make it a cartoon, enhance it, or change it to black and white; there are a lot of filters. You open the client, which could be an application or a web portal; it's really flexible. Then you select the image you want, choose the filter, and submit. After the image is processed, you get a notification back, which could be an email, a push notification, or an SMS, it's really up to you, with a download link to the photo after the filter has been applied. That's basically what we're trying to build.
Nihilistic Hawk: Okay, so the first thing I'd like to do is write down the requirements, to make sure we're on the same page and we cover everything that's required. I can put a section in the text for requirements. You talked about developing an image processing system. For the functional requirements, maybe we can make this a separate section. You mentioned the user has access to different photos. Where do those photos originate from?
The Inimitable Thunderstorm: The user would have them. They could be on their phone; it really doesn't matter. The user basically sends a photo to our system.
Nihilistic Hawk: Will the user require an account to use the system?
The Inimitable Thunderstorm: No, it doesn't need an account. It can be anonymous.
Nihilistic Hawk: Okay, so they have photos on their phone, for example, or their laptop, whatever the client may be, and they can upload a photo to the system. As far as the interface goes, how would they be interacting with the UI? Would they be looking at a page and uploading photos there? Is there a UI involved?
The Inimitable Thunderstorm: No, the UI is basically irrelevant. It can be an application, it can be a web interface. So we basically want to focus more on the...
Nihilistic Hawk: Okay, from an API perspective. Yes. Okay. So a user can take a photo. Is it one photo at a time, or can they bulk upload photos? Or do we want to limit that?
The Inimitable Thunderstorm: One photo.
Nihilistic Hawk: Okay, so upload one photo and apply filters. I'm assuming the system will determine the different types of filters. Do we need to go into detail about what those filters are? Are they like social media filters, things like that?
The Inimitable Thunderstorm: Yeah, similar to that.
Nihilistic Hawk: Okay, so just to start off: the user can upload a photo. What file extensions will be allowed? There are different extensions, or photo types.
The Inimitable Thunderstorm: It doesn't really matter what extension; basically, whatever the supported extensions are.
Nihilistic Hawk: Okay, fair enough. Let me just put that as a note. Actually, I just won't put that at all. Then: the user can select filters. Do we want to go into what the different types of filters are? I don't really use social media that much. Maybe the filters can add light or make it darker. I guess we want to get some...
The Inimitable Thunderstorm: Yeah, it can make it darker, it can make it black and white, it can make it a cartoon, it can remove face wrinkles, and stuff like that. There are really a lot of filters, and administrators can add more, so the filter list can expand.
Nihilistic Hawk: Okay. So people who manage the system can add more filters.
The Inimitable Thunderstorm: Yes.
Nihilistic Hawk: Okay. As I remember, you mentioned the user selects the filter. Is there an amount of latency, an amount of time, that's allowed? You mentioned that we process the image and then a notification happens, whether it's an email or what have you, so there could be some latency, as far as...
The Inimitable Thunderstorm: Maybe a few minutes or something like that, depending on how complicated the filter is.
Nihilistic Hawk: Okay, but essentially, when it's done applying the filter, we'll notify the user. As for what that notification can be, I know you mentioned email. Can we add that as one and maybe add some other options?
The Inimitable Thunderstorm: It can be email, it can be a push notification; it's really up to you, whichever.
Nihilistic Hawk: Okay, I'll just add a few things. All right, I think this is okay for now, unless you want to add something: the user uploads a photo to the system; the system should have the ability to create new filters; and after processing, the user gets notified and downloads the image.
The Inimitable Thunderstorm: Yeah.
Nihilistic Hawk: Okay. And there could be some latency. We haven't pinned down exactly what that is, but from what I understand, some latency is acceptable. That leads into the metrics and how I should scale this, which we call the non-functional requirements section. How many users are we trying to build this system for? That gives me an understanding of the bandwidth and how many requests will need to be processed.
The Inimitable Thunderstorm: So the service would be worldwide, but we can start with one country.
Nihilistic Hawk: One country, so we'll start with one initial country: America, the USA?
The Inimitable Thunderstorm: Yes.
Nihilistic Hawk: Okay. So if we want to make it available there, there are about 300 million people in the US. Well, I guess this is just an estimate, okay.
The Inimitable Thunderstorm: Okay, fair enough.
Nihilistic Hawk: We'll cap it at around 300 million people to start off with the first country. We don't really know how many active users there'll be. That would equate to, for example, monthly active users, or we could just cap it at 300 million and call that the worst-case scenario.
The Inimitable Thunderstorm: Yeah, we can go with 300 million.
Nihilistic Hawk: Active? Okay, fair enough. Monthly users, per se. What about on a daily basis? Would all 300 million be actively using it every day?
The Inimitable Thunderstorm: Daily could be like 10 million or something.
Nihilistic Hawk: Okay. So we'll say about 10 million daily active users. With that being said, maybe we can estimate requests per day, or even requests per second if we want to get that granular. If 10 million people use it per day, can we say about 10 million requests per day? Oh, no, they could be making multiple requests. So it's a multiple? Yeah, go ahead.
The Inimitable Thunderstorm: Yeah, they can make multiple requests, so we can say maybe 150,000 requests per second or something like that.
Nihilistic Hawk: Okay, fair enough. That's per second. Is there a requirement to store any information? Are we storing any data? From what you've said, we're just applying the filter, but is there any data that needs to be stored in the system?
The Inimitable Thunderstorm: Yes, we need to store the images for up to three months.
Nihilistic Hawk: Okay, for the past three months? So that's the retention period?
The Inimitable Thunderstorm: Yes.
Nihilistic Hawk: Okay, so a three-month retention period for every image. Fair enough. So we're storing about three months' worth of data and getting about 150,000 requests per second. To estimate how much data we're storing, I need the size of each request. I'd guess that'll be in the range of kilobytes, a few kilobytes, depending on how many...
The Inimitable Thunderstorm: Image size could be three or four megabytes. Could be even five.
Nihilistic Hawk: Okay, five. Would that be the max or the average?
The Inimitable Thunderstorm: No, that's the max.
Nihilistic Hawk: Oh, okay, gotcha. I'd like to use the worst-case scenario, so let's say 150,000 requests per second, each at the max size; we can limit uploads to that size. That's about 750,000 MB per second.
The Inimitable Thunderstorm: Yeah.
Nihilistic Hawk: Let's convert that to the next unit, gigabytes: that's 750 gigabytes per second. From a day perspective, we multiply by 60 to get a minute, by 60 again to get an hour, and then by 24, assuming it's actively used for the whole day, so we get the worst-case scenario. 3,600 times 24 is 86,400 seconds; let's round that to roughly 80,000. Multiplying by 750 GB is hard to do off the top of my head, so let's say 800: 800 times 80,000 is 64 million. That's still in GB per day, so converting through terabytes to petabytes gives about 64 petabytes per day, approximately, if my math is correct, of data being stored every day. And we keep three months of data, so we multiply that by about 90 days; that would be the maximum amount of data we store. With 150,000 requests per second at a decent size each, what about consistency? On every write, do we need the latest read? Or can we use eventual consistency for any data that's stored in multiple partitions?
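The back-of-the-envelope estimate above can be reproduced in a few lines. This is a minimal sketch assuming decimal units (1 GB = 1,000 MB, 1 PB = 1,000,000 GB), the worst case discussed (every request at the 5 MB maximum, sustained for all 86,400 seconds of the day), and a 90-day retention window; the constant names are illustrative only. Because it uses 86,400 seconds and 750 GB/s directly rather than the rounded 80,000 and 800, its daily figure comes out slightly higher than the spoken estimate.

```python
# Worst-case inputs from the discussion; names are illustrative.
PEAK_REQUESTS_PER_SEC = 150_000
MAX_IMAGE_MB = 5
DAILY_ACTIVE_USERS = 10_000_000
RETENTION_DAYS = 90                # "up to three months"
SECONDS_PER_DAY = 60 * 60 * 24     # 86,400

MB_PER_GB = 1_000                  # decimal units, as is usual for rough estimates
GB_PER_PB = 1_000_000


def estimate() -> dict:
    ingress_gb_per_sec = PEAK_REQUESTS_PER_SEC * MAX_IMAGE_MB / MB_PER_GB
    stored_pb_per_day = ingress_gb_per_sec * SECONDS_PER_DAY / GB_PER_PB
    retained_pb = stored_pb_per_day * RETENTION_DAYS      # original uploads only
    requests_per_day = PEAK_REQUESTS_PER_SEC * SECONDS_PER_DAY
    return {
        "ingress_gb_per_sec": ingress_gb_per_sec,
        "stored_pb_per_day": stored_pb_per_day,
        "retained_pb": retained_pb,
        "requests_per_day": requests_per_day,
        # Sanity check: how many requests each daily active user would have to make.
        "requests_per_user_per_day": requests_per_day / DAILY_ACTIVE_USERS,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value:,.1f}")
```

With these inputs it reports 750 GB/s of ingress, 64.8 PB per day, and about 5,832 PB (roughly 5.8 EB) retained over 90 days, before counting the processed "after" copy of each image. It also shows that 150,000 requests per second against 10 million daily active users implies roughly 1,300 requests per user per day, a ratio worth sanity-checking with the interviewer.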
The Inimitable Thunderstorm: What do you mean?
Nihilistic Hawk: So, okay, let's look at what the system is: we're uploading images and storing them. So the data being stored is images. Whenever you want to access that data, do we always want to get the latest read, or can we use eventual consistency?
The Inimitable Thunderstorm: What do you mean by the latest read? You upload an image and apply a filter to it; it doesn't have multiple versions.
Nihilistic Hawk: Just one photo. Okay. I mean, we'll just keep it simple for now.
The Inimitable Thunderstorm: You can upload 10 images, but they're not treated as the same, even if it's the same image. It's still...
Nihilistic Hawk: Yeah, I got you. So once you upload the image, there won't be any changes to it; it's just going to be stored in the database.
The Inimitable Thunderstorm: Yeah, basically you upload an image and it's stored, and then after the operation there's the processed image. That's the full operation: it has two images, before and after. Even if the same image is used 10 times, you have 10 before and after images.
Nihilistic Hawk: Okay. Fair enough. I think we can assume we want to make this as available as possible, just for having an efficient system. And if anything fails, I want to keep a note of that, to make the system smart: if a server goes down or anything like that, keep in mind what the fault-tolerance strategy will be. But for now we can stop there and maybe look at creating an interface, so I can understand the different services and contracts I can create, if you don't mind. I think we have a decent understanding of the data being stored, the number of requests per second, and whatnot.
The Inimitable Thunderstorm: Yeah.
Nihilistic Hawk: So let's move on to defining some APIs and interfaces for the system. At the highest level, you upload a photo, so we can put that as one function of the system; I'll simply call it upload. What would you need for this transaction? Obviously, you need the photo. Maybe we'll convert that to bytes or whatever; we don't have to go into that much detail. Do I need any information about who uploaded the photo? Potentially, yes, because that can be the ID I use to send a notification later on. So maybe, as part of this API, I can have some way of identifying the user, like their email, phone number, what have you. Or we could use something generic, like a user ID, that we can map to some metadata later on. The main thing is the photo. Oh, yeah, and the filter being used: we'll want to add which filter they selected and send that to the system, so it knows what to do when it's processing. And some basic things, like a timestamp. I think that's okay for that. Then, once you upload the photo, you actually process it, or apply the filter. Let's be explicit and call that apply filter. The upload will kick it off, and the next step, autonomously, will be applying the filter. It takes a similar contract, the photo and the filter ID, which could be an ID per se. Actually, you don't even need the whole photo; you just need the photo ID to apply that logic.
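A minimal sketch of the upload and apply-filter contracts just described, assuming Python dataclasses and an in-memory store; every class, field, and method name here is hypothetical rather than an agreed API. It identifies the anonymous user by a contact email, one of the options mentioned, and shows the upload enqueuing an apply-filter job that carries only the image ID and filter ID.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class UploadRequest:
    photo: bytes            # raw image bytes from the client
    contact_email: str      # no accounts, so this is how the user is reached later
    filter_id: str          # the filter the user selected
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass(frozen=True)
class ApplyFilterRequest:
    image_id: str           # only the ID is needed; the bytes are already stored
    filter_id: str


class ImageUploadService:
    """In-memory stand-in for the upload endpoint (hypothetical)."""

    def __init__(self) -> None:
        self.images: Dict[str, bytes] = {}            # image_id -> original bytes
        self.pending: List[ApplyFilterRequest] = []   # work handed to the filter step

    def upload(self, req: UploadRequest) -> str:
        if not req.photo:
            raise ValueError("photo must not be empty")
        image_id = str(uuid.uuid4())
        self.images[image_id] = req.photo
        # Uploading kicks off "apply filter" instead of doing the filtering itself.
        self.pending.append(ApplyFilterRequest(image_id=image_id, filter_id=req.filter_id))
        return image_id
```

A client would call `upload(UploadRequest(photo=..., contact_email="user@example.com", filter_id="black_and_white"))` and get an image ID back immediately; the processed image arrives later through the notification.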
So we can have some service to encapsulate the initiation of applying a filter, a filter service or what have you, and after the upload we can initiate that. Then, when that's done, we can push a notification back to the user with the new image. We want a service to encapsulate that logic, whether it's email, push, things of that nature; we'll have a service to handle that specific implementation the way we want to do it, and we need to know who we're sending it to. I think we should stop there for now as far as the methods go. Before I get to highlighting services, maybe we can talk about the data being persisted and the entities we can define, if that's really necessary. We're not really storing user information, since they don't even have an account, but we still want to store images, so that would be an entity. One option is to store the metadata in a relational database and use something like a CDN to store the actual images. Then we could take advantage of things like geo-distribution and faster access for the client. There'll be some trade-offs to understand, like whether we push the images to the CDN or it pulls them at an interval; those are things we'll need to take into consideration. Filters were the other thing we wanted to define. We talked about how people could add more filters. They don't necessarily have to be stored in a database, but they have to be stored somewhere, so we know what options we have and can display them to the user. For the schema, we can have some type of ID and the logic.
The logic being what the filter does to the image, and that could potentially live in a microservice. We can have this as an interface with different implementations, and depending on what the user picks, we'll apply the respective logic. I feel like we should have a different name for that, but let's go with it for now. Okay, I think we stop there and define some services; we're getting to the point where we understand the services well enough to define the contracts between them. So maybe we can define some services and put together a diagram representing the system, a high-level description. Going back to the interface: we can have a service whose logic is to upload the photos, so we'll call it the image upload service. Going back to the contract, it takes the photo, some user information we can use later to send the notification, and the filter type. I think that's okay. So that's one piece of business logic that represents a single responsibility. For applying the filter, we can add the filter service. That's where the application code lives, the interfaces and implementations we talked about. It can also handle persistence, because if we want the system to understand what filters we have, that data has to be available. I'd imagine there won't be that many filters, so we can potentially just use a relational database, and the actual logic, what each filter does, can reside in the application layer. Either way, we have a filter service that handles that. The last thing will be the notification service, which, following the single responsibility principle, is just for when an image is done processing.
The job of this service is just to notify the user with the new image. Okay, going back to the scale, the non-functional requirements. We have about 10 million daily active users and 150,000 requests per second, so a load balancer makes sense to handle this amount of load. As for which algorithm to use for the incoming requests, I'd start off with round robin, if that's okay. Oh, yeah, one thing I forgot to mention earlier: in this day and age, whenever you open a session with our system, we want to use encryption to make sure we can't get hacked. That's something... What was that?
The Inimitable Thunderstorm: What are we going to encrypt here?
Nihilistic Hawk: Just any communication between the client and the server.
The Inimitable Thunderstorm: Oh, you're basically talking about having HTTPS communication?
Nihilistic Hawk: Yes.
The Inimitable Thunderstorm: Yeah. That's really a little bit low-level.
Nihilistic Hawk: That's standard, yeah; that goes without saying, I just wanted to mention it. Where was I? Oh yeah, I was talking about the load balancer. So it makes sense to have that, plus things like DNS. I've already talked about a content delivery network as a potential place for storing the images. And since it goes without saying that we use HTTPS, we can use a web server that serves as a reverse proxy. Past that point we can remove the encryption, so communication in the application layer will be a lot faster without being slowed down by HTTPS. Maybe let's go to draw mode. Okay, actually, I haven't used this much, so sorry.
The Inimitable Thunderstorm: I think you're doing very well so far. I have very few notes on your approach.
Nihilistic Hawk: Okay. We'll just start off with the things I was talking about. Obviously, the client. The text is pretty big; we'll fix that later. Standard for any system, you want your DNS, so nobody types in IP addresses. Then I talked about the CDN, so let me add that for the images, if we want to store them there and take advantage of those capabilities. I talked about the load balancer taking 150,000 requests per second; there are different algorithms we could use, which I briefly mentioned, but let's just draw this out first. Then, for HTTPS, we have a web server to handle the termination of the encryption, so it doesn't slow us down in the app layer. This is different in Google Drive, but I think you can see what's going on here. Behind this, I have the app layer.
The Inimitable Thunderstorm: Totally fine if the drawing isn't perfect. You can walk me through the flow after you complete it.
Nihilistic Hawk: Okay. So 150,000 requests per second is a nice amount of load, across 300 million users, and we're going to add more. I think a microservice approach will be okay here: the services I mentioned, we can deploy and maintain as microservices. They're easy to create; if you're using Java, you could use frameworks like Spring Boot to help you deploy them a lot faster. For visual purposes, let's talk about the image... by the way, do you know how to decrease the text size? The text is super big. I guess it's okay. Let's just keep going and talk about the image upload service. Going back to the single responsibility principle, this just uploads images: it talks to the database, persists the information there, and sends it off to the filter service. As for how it communicates with the filter service, to make good use of the resources where the image upload service is deployed, one option is to make it asynchronous: it just passes the work along, and the filter service goes and applies the filter logic, which I talked about at a low level a little bit. When that's done, the filter service can also send an asynchronous message to the notification service, which focuses on sending a notification with the new image back to the client. So let's make all those communications asynchronous for now. The last thing will be how the notification service communicates back to the client; let's talk about that last if we have time. The image upload service will be talking to the database. This one is the filter service, if you see where my mouse is. As for the notification, it depends on what type of notification we're going to do.
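Round robin, the starting algorithm proposed for the load balancer, hands each new request to the next backend in a fixed rotation. A minimal sketch, assuming a static list of hypothetical backend names and no health checks; a real load balancer would also take unhealthy backends out of the rotation.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinBalancer:
    """Cycles through backends in a fixed order; safe to call from several threads."""

    def __init__(self, backends: Iterable[str]) -> None:
        pool = list(backends)
        if not pool:
            raise ValueError("need at least one backend")
        self._cycle = itertools.cycle(pool)
        self._lock = threading.Lock()   # guards next() on the shared iterator

    def next_backend(self) -> str:
        with self._lock:
            return next(self._cycle)
```

For example, with backends `web-1`, `web-2`, `web-3`, five consecutive calls return `web-1`, `web-2`, `web-3`, `web-1`, `web-2`. Round robin assumes requests cost roughly the same; since filter complexity varies, a least-connections policy may spread the filter work more evenly.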
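The asynchronous hand-off described above (image upload service, then filter service, then notification service) can be sketched with in-process queues. This is a minimal sketch assuming `asyncio.Queue` as a stand-in for a real message broker such as Kafka; the job types, the `img-N` IDs, and the `cdn.example.com` download URL are hypothetical placeholders, and the actual filtering and email sending are stubbed out. The contact email travels with each job because no user accounts are stored.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FilterJob:
    image_id: str
    filter_id: str
    contact_email: str   # passed through: there are no accounts to look it up from


@dataclass(frozen=True)
class NotifyJob:
    contact_email: str
    download_url: str


async def image_upload_service(requests: List[Tuple[str, str]], filter_q: asyncio.Queue) -> None:
    for i, (filter_id, email) in enumerate(requests):
        image_id = f"img-{i}"   # stand-in for persisting the original image
        # Enqueue and move on: the upload does not wait for the filter to run.
        await filter_q.put(FilterJob(image_id, filter_id, email))


async def filter_service(filter_q: asyncio.Queue, notify_q: asyncio.Queue) -> None:
    while True:
        job = await filter_q.get()
        processed_id = f"{job.image_id}-{job.filter_id}"   # stand-in for applying the filter
        await notify_q.put(NotifyJob(job.contact_email, f"https://cdn.example.com/{processed_id}"))
        filter_q.task_done()


async def notification_service(notify_q: asyncio.Queue, outbox: list) -> None:
    while True:
        job = await notify_q.get()
        outbox.append((job.contact_email, job.download_url))   # stand-in for sending an email
        notify_q.task_done()


async def run_pipeline(requests: List[Tuple[str, str]]) -> list:
    filter_q: asyncio.Queue = asyncio.Queue()
    notify_q: asyncio.Queue = asyncio.Queue()
    outbox: list = []
    workers = [
        asyncio.create_task(filter_service(filter_q, notify_q)),
        asyncio.create_task(notification_service(notify_q, outbox)),
    ]
    await image_upload_service(requests, filter_q)
    await filter_q.join()   # every upload filtered, and its notification queued
    await notify_q.join()   # every notification sent
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return outbox
```

Calling `asyncio.run(run_pipeline([("black_and_white", "a@example.com")]))` returns a single `(email, download_url)` pair. Because the upload step only enqueues work, it can accept the next upload while filters are still running, which is the point of making these calls asynchronous.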
The Inimitable Thunderstorm: You can do email, since it's the simplest.
Nihilistic Hawk: Okay. If it's email, then we're not really talking to the client the way we would with a push notification; we just send an email to the user and make sure they get the image. We want to make sure we get that information on upload, or we won't know who to send this to. As for where we store the email, it could be a pass-through from upload: the upload service gets the email, passes it to the filter service, and it keeps being passed along until it gets to the notification service, because we have no requirement to create user accounts and store user metadata. Is there anything... there are a lot of things we could hone in on in this system. Is there anything you want me to focus on specifically? Going back to the requirements, one thing I briefly mentioned is how the filter service works. There's going to be some type of storage for the filters, so the system knows what filters we already have. I wouldn't imagine there'd be a lot, so we can store them in a relational database. Same for images, at least the metadata, the IDs for an image, which can work together with the CDN that stores the images themselves. As for the logic, we want to use objects: interfaces with implementations for the different filters. So what is a filter? Basically, it has logic for how you want to apply it to an image, and that could be the one abstract method we have. Then, as people want to create filters, they just create implementations in the filter service, like I said before, and we make sure we persist that information to whatever storage we have, so we know the state of the filters in the system.
The Inimitable Thunderstorm: Yeah. So one of the problems we have right now is that this is just a list of components. We don't yet have an overall picture of how those components are connected and how they scale. For example, you're talking about putting stuff in storage. What kind of storage, and how can we scale it? Is it a database? File storage?
Nihilistic Hawk: I'll start off with the entities we have. We talked about images. We're going to have potentially 300 million users uploading, so it could be a decent amount of scale, and if we add more countries, there'll be more. As for how we want to store them, the first thing for me is whether we want relational SQL versus NoSQL. The criterion I'd use: I think of NoSQL for high reads, a crazy amount of data that I'm accessing frequently at a high rate. As far as the images are concerned, for now we're just writing, right? So we could potentially use MySQL; it has options for scalability, like partitioning, if we want to do that. One thing we didn't mention is the read/write ratio. In our use cases, reading the image information, the download of the image per se, will be a result of the filter service passing it on. So let's stick with a relational database, because we're going to be write-heavy and we don't have use cases for reading images yet. A MySQL database gives us options like creating indexes and partitions if we need them, and I don't think the load will be so crazy that it can't handle it. I wouldn't say let's use a NoSQL database right off the bat for that entity. Same thing for filters: I'd imagine filters are limited. I don't use social media too often, but it's not going to be millions of filters. So we can use the same MySQL database for both images and filters for now and, like I said, we have options to scale. I'm not worried about filters; my main concern will be images.
But like I said, to reduce the load, at least from a storage perspective, we're not actually storing the images there. We're just storing metadata: an ID that references an image, and maybe some timestamp information, things of that nature. So I don't think it'll scale so much, or store so much data, that MySQL won't be able to handle it.
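The filter abstraction described above, a filter with one abstract method that applies its logic to an image, plus a place where administrators register new filters, might look like the following. It is a minimal sketch under stated assumptions: an image is modelled as rows of (r, g, b) tuples instead of a real image library, the two filters are simplified examples, and `FilterRegistry` with its `list_filters` method is a hypothetical name for the lookup a client needs to show the available filters.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (r, g, b), each 0..255
Image = List[List[Pixel]]      # rows of pixels: a stand-in for a real image type


class Filter(ABC):
    filter_id: str
    display_name: str

    @abstractmethod
    def apply(self, image: Image) -> Image:
        """Return a new image; the input is left untouched so "before" is kept."""


class BlackAndWhite(Filter):
    filter_id = "black_and_white"
    display_name = "Black & White"

    def apply(self, image: Image) -> Image:
        out = []
        for row in image:
            new_row = []
            for r, g, b in row:
                y = round(0.299 * r + 0.587 * g + 0.114 * b)   # ITU-R BT.601 luma weights
                new_row.append((y, y, y))
            out.append(new_row)
        return out


class Darken(Filter):
    filter_id = "darken"
    display_name = "Darken"

    def __init__(self, factor: float = 0.7) -> None:
        self.factor = factor

    def apply(self, image: Image) -> Image:
        return [[tuple(int(c * self.factor) for c in px) for px in row] for row in image]


class FilterRegistry:
    def __init__(self) -> None:
        self._filters: Dict[str, Filter] = {}

    def register(self, f: Filter) -> None:
        """Admin action: add a new filter implementation."""
        if f.filter_id in self._filters:
            raise ValueError(f"duplicate filter id {f.filter_id!r}")
        self._filters[f.filter_id] = f

    def list_filters(self) -> List[dict]:
        """What a client would call to populate its filter picker."""
        return [{"id": f.filter_id, "name": f.display_name} for f in self._filters.values()]

    def apply(self, filter_id: str, image: Image) -> Image:
        return self._filters[filter_id].apply(image)
```

Adding a filter then means writing one new subclass and registering it; persisting the registered IDs and names (for example in the relational database mentioned above) is left out of this sketch.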
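A minimal sketch of the metadata-only tables described above, assuming Python's built-in `sqlite3` as a runnable stand-in for MySQL; MySQL would more typically use `VARCHAR` and `DATETIME` column types. Table and column names are hypothetical. Each job row holds storage keys for the "before" and "after" images, which live in object storage or behind the CDN, plus a creation time so the three-month retention can be enforced with an indexed delete.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS filters (
    filter_id     TEXT PRIMARY KEY,
    display_name  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS image_jobs (
    job_id         TEXT PRIMARY KEY,
    filter_id      TEXT NOT NULL REFERENCES filters(filter_id),
    contact_email  TEXT NOT NULL,
    original_key   TEXT NOT NULL,    -- storage/CDN key of the "before" image
    processed_key  TEXT,             -- NULL until the filter has finished
    created_at     INTEGER NOT NULL  -- Unix epoch seconds (UTC)
);
CREATE INDEX IF NOT EXISTS idx_image_jobs_created_at ON image_jobs(created_at);
"""

RETENTION = timedelta(days=90)   # "store the images up to three months"


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_filter(conn: sqlite3.Connection, filter_id: str, display_name: str) -> None:
    conn.execute("INSERT INTO filters VALUES (?, ?)", (filter_id, display_name))
    conn.commit()


def record_upload(conn, job_id: str, filter_id: str, email: str,
                  original_key: str, created_at: datetime) -> None:
    conn.execute(
        "INSERT INTO image_jobs (job_id, filter_id, contact_email, original_key, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (job_id, filter_id, email, original_key, int(created_at.timestamp())),
    )
    conn.commit()


def mark_processed(conn, job_id: str, processed_key: str) -> None:
    conn.execute("UPDATE image_jobs SET processed_key = ? WHERE job_id = ?", (processed_key, job_id))
    conn.commit()


def purge_expired(conn, now: datetime) -> int:
    """Delete metadata rows older than the retention window; returns how many were removed."""
    cutoff = int((now - RETENTION).timestamp())
    cur = conn.execute("DELETE FROM image_jobs WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Deleting the metadata row is only half of retention: the objects referenced by `original_key` and `processed_key` also have to be removed from storage, either by the same scheduled job or by a storage-side lifecycle rule.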
The Inimitable Thunderstorm: Okay. Usually in system design I tend to leave 20 minutes at the end. In coding questions I leave about 10 minutes, but in system design I leave 20, because I want to give feedback and hints afterwards. So, overall, whether you would pass a system design interview really depends on which level you're applying for. Do you have a view on which level you're applying for?
Nihilistic Hawk: Well, I'm not actively interviewing. I just did some interviews, but I'm not actually interviewing at the moment. I have about six years of experience, so I'd probably say mid-level or senior engineer. Okay.
The Inimitable Thunderstorm: So basically, if you're applying for mid-level, I think what you presented so far would be a pass. For senior level it would be challenging, and I'll tell you what you could have done differently to make it a pass at both levels. So, the whole idea of the system design interview: you've designed a lot of services, and I can tell from your experience that you've been through a lot. There's one thing, actually two things, you'll notice about system design. First, it takes a long time. It's not really something you can do in one hour; it takes a lot of iterations, usually weeks of work depending on how big the system is. Second, if you compare system design to basic problem solving, problem solving is binary: whatever code you write either solves the problem or it doesn't. With system design, whatever you do is going to do something. If, for your system design, you put everything on a single machine as a monolith, basically a website from the pre-2000 era, it is going to work. But is it going to scale, and does it do everything the way we want? That's a different question. So what's expected in a system design interview is that you lead everything as an architect. Think of it this way: we are a company and we've brought you in as an architect. We are the client, and you, as the architect slash business developer, are leading everything. One thing most people don't know is that in system design interviews, 80% of passing depends on asking the right questions, and 20% is everything else. Of that 20%, 15% or more, so the majority, is about how you reached the solution rather than the solution itself. The solution itself is about 5% of the full picture.
So the solution is not really the top thing. Here is what was missing from this conversation. First, and this is very important if you're applying for senior, less so if you're applying for mid-level: you didn't commit to decisions. For example, when I asked how to scale the database, you gave me options. Are we going to choose relational or NoSQL? If relational, are we going to use MySQL, are we going to do sharding, stuff like that? You told me all the different options available for scaling. That's not leading. You need to treat me as a client who doesn't really understand those options, so you need to tell me which option you choose, make the decision, and tell me why you chose it. That's the first thing. Second, you missed some questions that are very important for determining the complexity of the system. For example, when it came to the image, you asked about the extension but not the format, meaning how many megapixels. In image processing, the file size of the image is not really what matters; what matters is the number of megapixels. If you compare a BMP and a JPG, the JPG could be 50 kilobytes but have many more megapixels than a BMP that is two megabytes. Image processing depends on the number of megapixels, that is, the number of pixels in the image, much more than on the file size. The file size is good for calculating network bandwidth, but the megapixel count is what you use to calculate the complexity of the processing itself. You also didn't ask whether we are going to handle videos or just images.
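To make the pixels-versus-bytes point concrete, here is a minimal sketch that sizes compute from pixel count and network transfer from file bytes. The image dimensions, the 50 KB JPEG size, `sec_per_megapixel`, and `link_bits_per_sec` are illustrative assumptions, not figures from the interview; only the 24-bit BMP layout (54-byte header, rows padded to 4 bytes) is a fixed property of the format.

```python
from dataclasses import dataclass

BMP_HEADER_BYTES = 54  # 14-byte file header + 40-byte BITMAPINFOHEADER


@dataclass
class ImageInfo:
    width: int       # pixels
    height: int      # pixels
    file_bytes: int  # size on disk / over the wire

    @property
    def megapixels(self) -> float:
        # Processing cost is driven by the pixel count, not the file size.
        return self.width * self.height / 1_000_000


def bmp_24bit_size(width: int, height: int) -> int:
    """Size of an uncompressed 24-bit BMP: each row is padded to a multiple of 4 bytes."""
    row_bytes = (width * 3 + 3) // 4 * 4
    return BMP_HEADER_BYTES + row_bytes * height


def estimate(info: ImageInfo, sec_per_megapixel: float, link_bits_per_sec: float) -> dict:
    """Split sizing into compute (driven by pixels) and network (driven by bytes)."""
    return {
        "processing_s": info.megapixels * sec_per_megapixel,
        "transfer_s": info.file_bytes * 8 / link_bits_per_sec,
    }


# Illustrative: a 1200x900 JPEG compressed to ~50 KB vs a 1000x700 uncompressed BMP.
jpeg = ImageInfo(1200, 900, 50_000)
bmp = ImageInfo(1000, 700, bmp_24bit_size(1000, 700))  # 2,100,054 bytes, about 2 MB
print(jpeg.megapixels > bmp.megapixels, jpeg.file_bytes < bmp.file_bytes)  # True True
```

The JPEG is roughly 40 times smaller on the wire yet carries more pixels, so it would cost more processor time per filter; that is why the two estimates are kept separate.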
Another very important piece of information you didn't ask about is the format of the filter. Is it a black-box binary file? Is it an equation? Is it a text file? What exactly is the filter? You also didn't ask how many times a user is allowed to upload an image per day, per hour, or whatever the period is. One of the issues here is how we prevent a distributed denial-of-service attack: what if someone uploads a million images per hour? On the API design, you designed the API to notify the user by user ID, but we said the user may not have an account, so it needed to take something like the user's email instead. That's not a big deal, because it's a very small detail. What is more important is that you missed an API to list the filters. If I open the application intending to choose a filter and upload an image, I will not be able to choose a filter unless there is an API I can call to get the list of available filters. So that's a missing API. What else? Oh yeah, the final solution itself. We had a list of elements, which is not really a system diagram but more of a logical component list: it lists what is there, but not how the components interact. One thing I always tell people is that you can think of system design as a Lego project. There is a predefined set of Lego pieces, but the way you put them together differs: you can put them together and come up with a car, or you can put them together and come up with a castle. What you listed are the Lego pieces, but they are not put together to show me how the system would flow.
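The upload-limit question can be answered with a per-client rate limit. Below is a minimal fixed-window sketch; the limit values are placeholders, keying on the user's email (since there may be no account) is an assumption, and the in-memory dictionary only works for a single process, so a real deployment would need shared state.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowRateLimiter:
    """Allow at most `limit` uploads per `window_s` seconds for each client key.

    Hypothetical sketch: the key could be the user's email or an IP address,
    and the limit/window values are placeholders, not stated requirements.
    """

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behaviour can be tested
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (window_start, count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window_s:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.limit:
            self._windows[key] = (start, count)
            return False  # reject: this client already used its quota for the window
        self._windows[key] = (start, count + 1)
        return True
```

The orchestrator would call `allow(email)` before accepting an upload and reject the request when it returns `False`.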
So what you need to do to correct this, first of all, is ask the questions in a format that lets you collect all the information. To help with that, I added some text here that you can see at the end. To ask questions in a way that guarantees you get all the pieces you missed, you need a framework for asking questions, and this is a framework that works really well for me. You split your questions into three groups: the first group is why questions, the second group is what questions, and the last group is how questions. Why questions are very useful for understanding the purpose of the system, and understanding the purpose brings you a lot of information at once. For example, if you asked me what the purpose of this system is, I would tell you we are building it to collect information from images that will allow us to build a better face recognition system, and we are targeting one country, but the goal is to go worldwide. That answers some of the questions you asked about, like scale, with just one question. The second group, what, is divided into two subgroups. The first is business requirements, which are basically functional requirements: what are the business goals? Here you need to ask about everything that defines the system, like the image format, whether we handle videos or not, what the filter format is, all of that. The second subgroup is not really relevant to an interview but is relevant in real life: logistical requirements. Are we building an MVP or a full product? What is the team size? What is the budget? Stuff like that. The last group is about how: how are we building the system? It also has two subgroups.
The first is product requirements, which are a mix of non-functional requirements and features. For example, a very good question here is about image extensions, or whether the user can apply multiple filters to the same image. These are questions that extend both your functional and your non-functional requirements. The last subgroup is technical requirements: are we allowed to use cloud services, or do we need to build on premises or on bare metal? That answers questions like: do I build my own storage system, or can I use S3? Once you've answered those questions and made sure you asked every relevant one, that's 80% of the interview. The next step is to build the design, and my advice here is to never mix high level with low level. High level is the components you had in drawing mode; low level is the definition of the APIs. Mixing high level with low level always ends with you forgetting some details. My advice is to go level by level. Let me give you an example. The first thing you need to build, after collecting the answers to all your questions, is this image. Can you see that link?
Nihilistic Hawk: Yes.
The Inimitable Thunderstorm: Can you open it?
Nihilistic Hawk: Yeah I'm here, yeah.
The Inimitable Thunderstorm: So this is not really a detailed system design, but if you look at that image, you can understand the flow. You have the user at the top. The user sends a request to a service called the orchestrator to get a list of filters. The orchestrator calls the filter manager, which loads the list of filters and pushes it back to the user. The user can then select an image, select a filter, and push that back to the orchestrator. The orchestrator first uploads the image to the photo manager, which in turn pushes it to S3 storage. Then it encapsulates the request into a package and pushes it to a queue. On the other side of the queue, we have a photo processor node, which extracts those requests. Each request has a photo ID, which identifies the photo in S3 storage, a filter ID, and user information, which could be the user's email. The photo processor fetches the filter from the filter manager and the image from the photo manager, and processes the image. After it is done, it pushes the result back to the photo manager, which stores it in image storage. Then it sends a package containing the user information and the link to the final image to another queue. On the other side of that queue, we have a notification service that extracts those packages from the queue and pushes them to the user. So you can look at that image and it tells you the user journey. Now, there is another link I pasted, which is the admin journey: how the admin can add new filters. The admin connects directly to the filter manager, through something like a web interface, and can push new filters into it. The filter manager is connected to two nodes. One of them is the filter metadata, which is a database.
The second is the filter storage, which is S3 storage for the filter files. The filter here is a binary file describing a filter that the photo processing node can understand and apply. Now, when you put it this way, first of all, you can do all of that in about 10 minutes. Second, you don't get dragged into details, because everything is on the same level, and it's very high level. Once you've completed that, you've done about 95% of the interview. The next questions your interviewer would ask are: how can you scale the filter metadata database? How can you scale the photo processor? What happens if a request goes to the orchestrator and then the node dies? If I asked you how to scale the photo processor, that's the third image. Now we are going one level deeper, defining the photo processor itself. The photo processor is a cluster of nodes. In front of those nodes we have, in this case, not really a load balancer but a job that extracts requests from the queue in the previous image and sends them to a load balancer, and the load balancer determines which node to hit. Alongside that cluster, we have a Redis cache that holds the most commonly used filters. A node looks for the filter in the cache; if it's not in the cache, it loads it from the filter manager. Then it loads the image from the photo manager, processes it, and sends the result on to the next step. So the idea is to drive the interview level by level, and then any question you get asked becomes very easy to answer. For example, how can you scale the filter metadata? Maybe we can use the CQRS pattern, with multiple read instances and a couple of write instances.
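The user journey and the photo processor's cache lookup can be sketched in-process to show how the pieces connect. This is a toy model under loud assumptions: `queue.Queue` stands in for the message queues, dictionaries stand in for S3 and Redis, a filter is a Python callable rather than the binary file described above, and every class, method, and the placeholder URL are hypothetical names chosen for illustration.

```python
import queue
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, List

Image = List[List[int]]            # hypothetical: a tiny grayscale image, values 0-255
Filter = Callable[[Image], Image]  # stands in for the binary filter file


class PhotoManager:
    """In-memory stand-in for the photo manager in front of S3 storage."""
    def __init__(self) -> None:
        self._store: Dict[str, Image] = {}

    def put(self, image: Image) -> str:
        photo_id = str(uuid.uuid4())
        self._store[photo_id] = image
        return photo_id

    def get(self, photo_id: str) -> Image:
        return self._store[photo_id]

    def link(self, photo_id: str) -> str:
        return f"https://photos.example.invalid/{photo_id}"  # placeholder host


class FilterManager:
    """Stand-in for the filter metadata database plus filter storage."""
    def __init__(self) -> None:
        self._filters: Dict[str, Filter] = {}

    def add(self, filter_id: str, fn: Filter) -> None:  # admin journey
        self._filters[filter_id] = fn

    def list_filters(self) -> List[str]:
        return sorted(self._filters)

    def load(self, filter_id: str) -> Filter:
        return self._filters[filter_id]


@dataclass
class Job:  # package pushed onto the processing queue
    photo_id: str
    filter_id: str
    email: str


@dataclass
class Notification:  # package pushed onto the notification queue
    email: str
    link: str


class Orchestrator:
    def __init__(self, photos: PhotoManager, filters: FilterManager, jobs: queue.Queue) -> None:
        self.photos, self.filters, self.jobs = photos, filters, jobs

    def list_filters(self) -> List[str]:
        return self.filters.list_filters()

    def submit(self, image: Image, filter_id: str, email: str) -> str:
        photo_id = self.photos.put(image)               # upload the original first
        self.jobs.put(Job(photo_id, filter_id, email))  # then enqueue the request
        return photo_id


class PhotoProcessor:
    def __init__(self, photos: PhotoManager, filters: FilterManager,
                 jobs: queue.Queue, notifications: queue.Queue) -> None:
        self.photos, self.filters = photos, filters
        self.jobs, self.notifications = jobs, notifications
        self.cache: Dict[str, Filter] = {}  # stands in for the Redis filter cache

    def _get_filter(self, filter_id: str) -> Filter:
        fn = self.cache.get(filter_id)
        if fn is None:  # cache miss: load from the filter manager, then cache it
            fn = self.filters.load(filter_id)
            self.cache[filter_id] = fn
        return fn

    def process_one(self) -> None:
        job = self.jobs.get()
        result = self._get_filter(job.filter_id)(self.photos.get(job.photo_id))
        result_id = self.photos.put(result)  # store the processed image
        self.notifications.put(Notification(job.email, self.photos.link(result_id)))


def notification_service(notifications: queue.Queue, send: Callable[[str, str], None]) -> None:
    note = notifications.get()
    send(note.email, note.link)  # email, SMS, or push in a real system
```

Because the orchestrator, processor, and notifier only share the queues and the two managers, each can be scaled or restarted independently, which is the property the diagram is meant to show.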
And what if the orchestrator fails? Maybe we have a job that retries, or maybe the client itself has a timeout, and if it doesn't hear back from the orchestrator, it resubmits the request. So it is very easy to go level by level rather than doing everything at once. After you complete a level, you can ask your interviewer: do you want me to list the APIs? In most interviews, people don't really get interested in APIs because they're a bit low level. If there is time at the end, you can do that, but it's usually not the most important part of the interview. Does that all make sense?
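The client-side timeout and resubmit idea might look like the sketch below. The transport function, `SubmitTimeout`, and the attempt/timeout values are hypothetical. Reusing one `request_id` across attempts, so the orchestrator can recognise a resubmission and avoid processing the same upload twice, is an added assumption that the conversation does not spell out.

```python
import uuid
from typing import Callable, Dict, Optional


class SubmitTimeout(Exception):
    """Raised by the (hypothetical) transport when the orchestrator doesn't answer in time."""


def submit_with_retry(send: Callable[[dict, float], str], request: dict,
                      timeout_s: float = 5.0, max_attempts: int = 3) -> str:
    """Client side: resubmit the request if the orchestrator doesn't reply within timeout_s."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Reuse one request id for every attempt so the server can spot a resubmission.
    request = {**request, "request_id": request.get("request_id") or str(uuid.uuid4())}
    last_error: Optional[SubmitTimeout] = None
    for _ in range(max_attempts):
        try:
            return send(request, timeout_s)
        except SubmitTimeout as err:
            last_error = err  # the orchestrator node may have died; try again
    raise last_error


class DedupingOrchestrator:
    """Server side: returns the same job id when a request_id is seen again."""

    def __init__(self) -> None:
        self._jobs_by_request: Dict[str, str] = {}

    def handle(self, request: dict, timeout_s: float) -> str:
        rid = request["request_id"]
        if rid not in self._jobs_by_request:  # first time this request arrives
            self._jobs_by_request[rid] = f"job-{len(self._jobs_by_request) + 1}"
        return self._jobs_by_request[rid]
```

Without some form of deduplication, a resubmission after a slow (rather than dead) orchestrator would enqueue the same image twice and the user would get two notifications.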
Nihilistic Hawk: Yeah, makes sense. So what I put in lines 35 to 37 wasn't necessary, is that what you're saying?
The Inimitable Thunderstorm: It is; it's a little bit low level. The whole idea of system design is how you structure the system. APIs usually come at the end of a system design: you determine what the systems or subsystems are, how they communicate, and what the flow is, and then at the end you build the public interface, which translates into your APIs. So it's usually not the first, second, or even third step.
Nihilistic Hawk: Okay, seems fine. So that goes at the end, after going over all that information.
The Inimitable Thunderstorm: Okay. Yeah, as I said, technically you have everything it takes: you understand all the components and how they are used. It's mainly how you organise yourself; that's the only missing part.
Nihilistic Hawk: Gotcha, so it's about finding a way to organise my questions and maintain a flow.
The Inimitable Thunderstorm: The main thing with system design, compared to problem solving, is that it's not really about knowledge. It's about how you prove you have that knowledge. In problem solving, it is easy to prove your knowledge: either you solve the problem or you don't. In system design, it's much harder. There is some subjectivity, and to lead through that subjectivity, you need to be very organised.
Nihilistic Hawk: Gotcha, okay.
The Inimitable Thunderstorm: Yeah. Any other questions?
Nihilistic Hawk: It seems like I have a lot to work on. This is good. A lot of these are things we discussed.
The Inimitable Thunderstorm: Yeah, it's not so much a lot to work on as just rearranging how you present things. If you look at everything I said, you had already thought of those points. It's just the way you go about it: if you go level by level, it happens naturally.
Nihilistic Hawk: Okay.
The Inimitable Thunderstorm: One very simple piece of advice that makes things a lot easier is to think about personas. Personas are a design technique: you define the users of your system, build something like a psychological profile for each of them, put yourself in their shoes, and think about how they will use the system, and everything comes naturally. This system has two personas: the user persona and the admin persona. As a user, think through the flow if you had that application. You open the application and see a combo box with a list of filters; that's your first API call. Then, after selecting the filter, you select the image and click submit; that's the second API call. So just putting yourself in your users' shoes makes a lot of those questions start jumping into your mind.
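Walking through the two personas yields roughly three calls: the admin adds a filter, the user lists filters, and the user submits an image with an email for the download link. The sketch below models them as plain methods returning `(status, body)` pairs; the route names in the comments, the status codes, and the validation rules are hypothetical choices, and returning 202 Accepted reflects the assumption that processing happens asynchronously.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Response = Tuple[int, dict]  # (HTTP-style status code, JSON-like body)


@dataclass
class FilterApi:
    """Hypothetical API surface derived from the user and admin personas."""
    filters: Dict[str, str] = field(default_factory=dict)  # filter_id -> display name
    submissions: List[dict] = field(default_factory=list)

    # Admin persona: e.g. POST /filters
    def add_filter(self, filter_id: str, name: str) -> Response:
        if filter_id in self.filters:
            return 409, {"error": "filter already exists"}
        self.filters[filter_id] = name
        return 201, {"filter_id": filter_id}

    # User persona, first call: e.g. GET /filters (fills the combo box)
    def list_filters(self) -> Response:
        items = [{"id": k, "name": v} for k, v in sorted(self.filters.items())]
        return 200, {"filters": items}

    # User persona, second call: e.g. POST /images when the user clicks submit
    def submit_image(self, image: bytes, filter_id: str, email: str) -> Response:
        if filter_id not in self.filters:
            return 404, {"error": "unknown filter"}
        if "@" not in email:  # no account, so the email is how we reach the user
            return 400, {"error": "an email is needed to send the download link"}
        job_id = f"job-{len(self.submissions) + 1}"
        self.submissions.append({"job_id": job_id, "filter_id": filter_id,
                                 "email": email, "size_bytes": len(image)})
        return 202, {"job_id": job_id}  # accepted; the result arrives by notification
```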
Nihilistic Hawk: Oh I see. Okay. That's probably the best point of view to help you think about the different scenarios that can come up. Okay, yeah, I'll note that down for sure. Cool. I think this was a lot of information, but a lot of helpful information. So I'll take some time after this call to process it and go from there.
The Inimitable Thunderstorm: Yeah, it's a good thing the session is recorded, so you can watch it as many times as you want.
Nihilistic Hawk: Yeah, that's what I plan on doing: going back to the video, looking at the notes, and going from there.
The Inimitable Thunderstorm: Awesome. Then if there are no more questions, good luck with whatever interviews you have coming up.
Nihilistic Hawk: Okay. Thank you.
The Inimitable Thunderstorm: Bye.
Nihilistic Hawk: Okay, take care. Goodbye.