
Design Yelp Recommendations

Watch someone solve the Design Yelp Recommendations problem in an interview with a Meta engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

Design Yelp Recommendations

Interview question

Design the machine learning recommendation algorithm behind Yelp's homepage that shows users personalized venue suggestions when they open the app. The system must optimize for user engagement and booking conversions while handling cold start problems, balancing exploration vs exploitation, and scaling to millions of users across different cities with varying venue availability.

Interview Feedback

Feedback about Full Meta Tetrahedron (the interviewee)

Advance this person to the next round?
Yes
How were their technical skills?
3/4
How was their problem solving ability?
3/4
What about their communication ability?
3/4
As we discussed, your ML System Design mock interview was solid and approaching the Senior/Staff level bar. With targeted improvements, particularly in time management, structuring responses, and emphasizing key evaluation criteria, you're very close. You're on the right track, and with a bit more refinement, I'm confident you'll do well. Best of luck, and feel free to keep me updated on the outcome!

Feedback about The Incredible Pillow (the interviewer)

Would you want to work with this person?
Yes
How excited would you be to work with them?
4/4
How good were the questions?
4/4
How helpful was your interviewer in guiding you to the solution(s)?
3/4

Interview Transcript

Full Meta Tetrahedron: Hello. Hello, can you hear me? Yes, I can.
The Incredible Pillow: Okay, awesome. So let me quickly introduce myself and then I will follow up with some questions that I have about yourself. So my name is [REDACTED]. I'm currently working as a staff ML engineer at Meta. I have been with Meta for almost 3 years now. And before that, I used to work for Amazon, Microsoft, and also for Samsung. So I have in total 12 years worth of experience. So before — and so far in my career, I have conducted like north of 400 interviews, like a mix of system design, ML system design, coding, and behavioral. So before we get started, may I know like, um, why you're actually doing that, uh, this ML system design? Do you have like an upcoming interview? And also, what is a little bit of your background? How many years worth of experience do you have?
Full Meta Tetrahedron: Sure, yeah, I have an upcoming ML system design on Wednesday. Um, might be something related to Reels ranking or something. Um, and working on like Reels ranking, something like that.
The Incredible Pillow: Yeah, so okay.
Full Meta Tetrahedron: And then I've worked. So I started working on like ML engineering about 4 years ago, but I was mostly focusing on like feature engineering and also creating like the feature store at a startup. And then I worked in industry on forecast training infrastructure for a year and a half, and then ads, like relevance models, like for a search term and an item, is the item relevant to the search term out of a catalog of items, and building out some of like the simulation infrastructure. So focusing more on like the, like the kind of the backend side rather than like the modeling directly, although I, you know, do have some experience with doing modeling from school. And I've gotten like involved in it in some of my jobs. And then I took a break to do some interpretability research for a few months last year to kind of get my feet wet on a more like cutting-edge problem, although it wasn't quite so applied to like industry setting. So that's why now I'm looking to move more back into like a RecSys industry applied modeling role.
The Incredible Pillow: Okay, I see. So you're looking to switch back into an MLE type of role, let's say.
Full Meta Tetrahedron: Yes.
The Incredible Pillow: Okay, I see. And you have like an upcoming interview with [REDACTED]. For what is the position that you're actually targeting?
Full Meta Tetrahedron: Um, it's like software engineer, machine learning.
The Incredible Pillow: Yeah, but what level? [REDACTED]?
Full Meta Tetrahedron: [REDACTED].
The Incredible Pillow: [REDACTED]. So for a senior position, how many years worth of experience do you have into that?
Full Meta Tetrahedron: Well, I started working in 2017. Um, yeah, but I took like a year off in the middle of it in total. So, let's say 7 years of experience.
The Incredible Pillow: Yeah. The reason why I'm asking, if you are close, like to 8 years, then you are eligible to be considered for a staff level position. That's why I was asking you that question. Okay, let me give you an overview of the loop, what you should expect on the upcoming loop with [REDACTED], and then we can switch over like to the mock ML system design. So with the interview with [REDACTED], you should expect to have like 2 coding rounds. Usually you should expect like 2 medium LeetCode type of questions on each round, and then you should have like an ML system design plus 1 behavioral. Now the ML system design is actually the most important from a technical point of view because it's actually being used to determine your level. And then on the other side, the behavioral, it also determines your level from a behavioral point of view. So those are kind of, let's say, an overview of what you should expect on the loop. Now, switching from there, what's my key point there? The coding rounds, they're usually pass or fail, so they're not being considered for determining your level. But the ML system design and the behavioral are the most important from a leveling point of view. So do prioritize the behavioral as well, because I have seen a lot of people recently get down-leveled because of the behavioral. Now, having said that, do you have any other question for me? Otherwise, we can start like with the mock ML system design.
Full Meta Tetrahedron: Um, no, I've read the prep guide from them, so I think I understand what's coming up. I'm ready to go.
The Incredible Pillow: Okay, perfect. So what we are going to do, we are going to switch over like to the whiteboard, and then what I'm expecting from you is to lead the whole design because this is for a senior position. You might ask some clarifying questions, but I will leave the time and space for you to lead the whole design. And then I will interrupt you around like the 43-minute mark in order to switch over to feedback. So also keep an eye on the time. I might just sit back and I will only jump in and interrupt you if I want to ask like a question, but the expectation is for you to lead the whole design.
Full Meta Tetrahedron: Okay.
The Incredible Pillow: Perfect. The reason being is that during like the [REDACTED] interview, you are also being evaluated on how you handle ambiguity. How you make a reasonable assumption and you move on and you unblock yourself. So what we're actually designing today is we're designing like the Yelp system, the recommendation algorithm behind the Yelp, um, system.
Full Meta Tetrahedron: Yelp recommendation system.
The Incredible Pillow: Yeah.
Full Meta Tetrahedron: Yeah. Okay. Um, so this is like on the home page, like you open up Yelp for the first time today and then you see a list of recommended venues, like they could be restaurants or bars or like any like place on the map basically.
The Incredible Pillow: Exactly. That's correct.
Full Meta Tetrahedron: Got it. Um, okay. And this is, is this just on the homepage? Uh, are we thinking about the search system as well?
The Incredible Pillow: Yeah, let's focus on the, on the landing page.
Full Meta Tetrahedron: Just on the landing page. Okay, so a user will come onto the page and given everything we know about them, um, what's in their immediate vicinity, um, what's open, what we think they're gonna— we will show them the, uh, places that we think they're most going to want to visit. So what— yeah, so what would we consider, um, to be like the thing that we're trying to optimize here? I guess it would be whether somebody actually visits the place. Um, like, how does Yelp make money?
The Incredible Pillow: Yeah, through bookings, let's say. Bookings? Yes.
Full Meta Tetrahedron: Okay, okay. So, so Yelp has a list of venues that have some— either like a restaurant with a meal or like a laser tag place where you could sign up for an hour, but there's something, uh, where a user can convert for that venue. Uh, and we want to like make sure— like make that as like likely as possible that the user will convert, whatever conversion means for the venue. Um, I guess like, do we get paid based on like, if like you make a reservation for like, I don't know, uh, like $80 at a laser tag, like we get paid proportional to how big the payment was for the reservation.
The Incredible Pillow: Yeah.
Full Meta Tetrahedron: Yeah. Okay. I'm just thinking like, if, you know, we want them to click on anything at all and convert, or we want to like optimize for, I guess. We maybe are going to let people, um, or like I should say the venues, uh, list their own prices. Um, but yeah, we probably would want people to, it's probably like a balance here between we want people to click on expensive reservations that we get a higher, um, commission for, but then we also don't want to like only show expensive things and then the user doesn't click on anything. And we should have shown them something a little cheaper. So there's like a bit of a balance there. Okay. So what else did I think about? Okay. So yeah, like we assume that all the places that they're gonna see are open and you can make a reservation that night, or I guess like sometime in the next week for the venue, right? Like it doesn't really matter when the reservation happens as long as it happens during the session. Like they make the reservation during the session, right?
The Incredible Pillow: Yes. Yeah.
Full Meta Tetrahedron: Okay. What— okay, let me think about— so, okay. So let me think about what we're trying to— okay, I think I have a sense of what we're trying to do. And we, yeah, we want to increase, I guess, the, the, the, online metric that we care about. I'll start writing things down. Um, online, online metric is, well, yeah. So we get, do we get like a, is it a fixed commission? Sorry, maybe I should be clear. Is it a fixed commission per click or do we get paid? You said we get paid proportionally. So, so I guess like, um, total, like the, uh, I hear a bing. Oh, did you disconnect? Hello?
The Incredible Pillow: No, I can still hear you.
Full Meta Tetrahedron: Okay, cool, cool. Sorry. Um, so I guess what we're gonna try to optimize for is, um, the like total revenue generated across all sessions, uh, through conversion. Like that's, that's really what we're trying to optimize for. Um, and then I'll think about what, what we're going to be tracking in training, uh, towards that goal. So, okay. So the general, I'll maybe I'll talk about first, um, like what the end-to-end flow is gonna look like. First, from when the user opens up the app and sees something, and then we can dive into different parts and explore how those work leading up to the inference. Inference time, user goes to their app and then they make a request at a certain time. They, they open up Yelp and they want to see like, given where I'm at, um, and what time it is, uh, what are all the places that I might want to look at right now? Um, and maybe either make a reservation for, for tonight or for, for somewhere down the line. Doesn't really matter as long as the session ends with a conversion, uh, where we're successful. Um, so then when Yelp gets that user request for the homepage, then we go into some sort of index of, I guess, like all the— so this is like the Yelp server. And then we have an index which contains every— these are just primarily venues, not like the events themselves, right? Hello? Yeah, yeah, okay, sorry. Right, so we have some sort of reverse index where we know like where the location is, whether it's open, some basic things like that. I'll get into more things we know about the venues later. And then we, we, we get some sort of candidate set of, of all the, the potential venues that, that are at least like within driving distance, or, or, uh, I guess, I guess it doesn't have to be open right now, right? Because if we could make a reservation for like 5 days from now, then we don't really care if it's open at the moment that the user, um, goes to it.
So I think it's more important that we just see where the user is and find nearby places. Um, and then the, the index returns some candidates which we would then rank, uh, by some means, uh, and maybe just like something like a combination of like just clicking, um, clicking or converting. I'll think about that, how to balance between the two. And then we'll serve up to the user whatever has the highest score. Maybe if we're optimizing for conversions, we could, we could rank by that. But I guess given that most people like that, that the number of, like, the number of clicks across all views is, you know, not that many. Like maybe like, I don't know what the conversion may be like. Well, if you're on the app, you're probably going to be looking at a few places. Um, but like for all the items, maybe there's like a 10% chance that a given item gets clicked in a session. And then of those, um, that you convert is an even lower funnel. Um, so I would imagine that there'd be, um, like generally more, like there would be more positive labels in the click, um, like is it clicked dataset, than is it converted? So maybe that would make more sense as, as like a primary filter. Maybe it could be like two stages depending on how large the, um, the candidate sets are of like first you could, you could rank by is it going to get clicked, um, and then within those you could re-rank by like once you, once you do a cutoff of like the top I don't know, 50 most likely items to get clicked on. You could re-rank by the most like, the likelihood that those would get a conversion, and then show the user from those 50 re-ranked. So, okay, anyway, sorry, I should finish the end-to-end first. So yes, the reverse index would then— I don't know how to represent a model. Let's just call it like some sort of, let's say, pCTR model. Oops, sorry. Right, which then, yeah, let's say for now, even though ultimately we care about revenue, we start by just predicting clicks.
So that's the overall flow at inference time of how that would work. And so yeah, let's talk about what kind of data we have available before we talk about what kind of model we would make based on that data. Yeah.
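The two-stage flow described above, ranking candidates by predicted click probability (pCTR) and then re-ranking the top 50 or so survivors by conversion likelihood, can be sketched as follows. Every venue name, score, and the cutoff k here are made-up illustrations, not values from the interview:

```python
def two_stage_rank(candidates, pctr, pcvr, k=50):
    """candidates: list of venue ids; pctr/pcvr: dicts of venue_id -> probability."""
    # Stage 1: keep only the k venues most likely to be clicked at all.
    stage1 = sorted(candidates, key=lambda v: pctr[v], reverse=True)[:k]
    # Stage 2: re-rank the survivors by conversion likelihood.
    return sorted(stage1, key=lambda v: pcvr[v], reverse=True)

# Toy example with invented scores:
pctr = {"bar": 0.30, "cafe": 0.20, "spa": 0.05}
pcvr = {"bar": 0.02, "cafe": 0.09, "spa": 0.50}
ranked = two_stage_rank(["bar", "cafe", "spa"], pctr, pcvr, k=2)
# "spa" never reaches stage 2 despite its high conversion score,
# because its low pCTR puts it outside the top-2 click cutoff.
```

Scoring the second stage by price times conversion probability instead of raw pCVR would target expected revenue directly, the cheap-versus-expensive balance discussed earlier in the conversation.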
The Incredible Pillow: Okay.
Full Meta Tetrahedron: So we have venues and users and I guess, yeah, the way that users interact. So Yelp has, right, and then venues have different types of events. So like, a user who's going to bars is gonna— yeah, I guess that's like a category of a venue. All right, let me think of a few. So there's like the free form text that describes its— or sorry, its name, excuse me, its address, the owner's description. It has associated reviews across all things that people do in it, which are left by users, obviously. And we know the price of the reservation. And then I'll just hop around a bit and I'll jump back in if I think of something else. So a user, and then like user in a session, just to make that clear that like this is what we know at inference time. So we know what time it is that the user's browsing, the location that they're at, we know the history of past reservations. We know, I guess, like reviews that they've left. And can you like, are you friends with other people on Yelp? Is there like, I guess that's not really a major feature of it, right? Is like the network of like, where did your friends go out to, right?
The Incredible Pillow: And this was the last one that I don't believe that it is part of Yelp.
Full Meta Tetrahedron: Yeah, I don't think so. Uh, okay. Um, so yeah, and then like we need some sort of— I mean, I was— yeah, maybe like just an easier, like predefined category instead of like just relying on this freeform description. We give them certain categories that, um, like a taxonomy that they can fall into. We can maintain this taxonomy ourselves. The taxonomy could itself be like, like some taxonomic classification. So you could have like, you know, at a high level nightlife, and then below that, um, like a bar versus a club versus a jazz venue. Those are nested and then you could have multiple levels of granularity of classification there. I think that's enough to get started with. We also have our sessions. Historically, what has happened on the app with regards to when a user saw a particular venue and it was at this— yeah, so a user— so a session is composed of a user looking at multiple venues at a certain, like, position in the, like, the results, and then was clicked, like, did convert by the end of that session, which we should, we should know that, like, start time. Okay, so these are— oh, sorry. Right, so that's what we know about what's happening. That's what we're trying to predict at inference time. So let me think more about, yeah, what is the— the training objective that we're trying to optimize for. I guess, um, it might be like a combination of— either we could separately consider our success in predicting clicks and conversions, or we could come up with like one overall metric that combines both of them. Um, hmm. Yeah, I guess, I guess to keep it simple for now, I would rather just focus on predicting clicks, and then I could think about how I would expand it to incorporate, uh, conversions as well later. Okay, so let me, let me think about how to build like a training pipeline to determine if someone's going to click on a given result.
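The session log just described (a user sees several venues at certain positions and clicks on some of them) flattens naturally into the pairwise click-prediction examples the candidate wants. A minimal sketch, with hypothetical field names:

```python
def sessions_to_examples(sessions):
    """Flatten session logs into (user, venue, position, clicked) rows.
    Each session dict is assumed to carry the impressions in display order
    and the subset of venues that were clicked; the keys are illustrative."""
    rows = []
    for s in sessions:
        clicked = set(s["clicked_venues"])
        for pos, venue in enumerate(s["impressions"]):
            rows.append((s["user_id"], venue, pos, venue in clicked))
    return rows

session = {"user_id": "u1",
           "impressions": ["v1", "v2", "v3"],
           "clicked_venues": ["v2"]}
rows = sessions_to_examples([session])
# → [("u1", "v1", 0, False), ("u1", "v2", 1, True), ("u1", "v3", 2, False)]
```

Keeping the display position in each row also makes it possible to correct for position bias later, since items shown higher are clicked more regardless of relevance.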
And then I'll, yeah, I'll think about, yeah, because we don't really have to focus, or maybe— should I— okay. I was thinking if I need to worry about building up the index properly because given a definition of the index, that's going to determine what's even in— sorry. Am I going to cold start this from scratch or should I assume that a system already exists and I'm iterating on it?
The Incredible Pillow: Cold start.
Full Meta Tetrahedron: Cold start. Okay. Got it. Okay. So initially we're not going to have any sessions. We're not going to, right? All we're going to have is venues and users. So initially we have to bootstrap this somehow with some heuristic metrics. So I think before we get into like modeling anything, I would want to suggest that people visit popular venues in their area. So, um, all right, so initially what kind of data? So, okay, so, so, so this is bootstrap. So this is like Yelp doesn't exist or Yelp is just a pure like database. Like you just search for things, but we don't like show anything on the homepage by default.
The Incredible Pillow: We have a system that already exists. It's not an ML system though.
Full Meta Tetrahedron: We have a pilot system.
The Incredible Pillow: Yeah, we have a simple rule-based system.
Full Meta Tetrahedron: You can assume we already have a simple rule-based system which is giving us some sessions where— so we are giving, uh, people like things to possibly click on or convert to. And we know whether they did or not. But yeah, those heuristics might be, yeah, things like, I'm assuming, is it a popular venue in the area? Does it have like a deal going on right now or like a discount? I guess what I'm trying to say is there might be like newer up-and-coming venues that aren't going, like, I don't know if our old system has the ability to balance exploring new venues versus exploiting ones. And we have to come up with a way ourselves of randomly introducing newer low-pCTR items into people's feeds just to get them out there and get information on whether people like them or not. Yeah.
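The idea of randomly introducing newer low-pCTR venues, balancing exploration against exploitation, can be sketched as a simple epsilon-greedy slate. The function name, the epsilon value, and the venue ids are all illustrative assumptions:

```python
import random

def epsilon_greedy_slate(ranked, fresh, slate_size=10, epsilon=0.1, rng=None):
    """With probability epsilon per slot, swap a ranked ("exploit") venue
    for a fresh, little-seen one so new venues still collect impressions."""
    rng = rng or random.Random()
    slate = list(ranked[:slate_size])
    pool = [v for v in fresh if v not in slate]  # unexplored candidates
    for i in range(len(slate)):
        if pool and rng.random() < epsilon:
            # Replace this exploit slot with a randomly chosen explore venue.
            slate[i] = pool.pop(rng.randrange(len(pool)))
    return slate

# With epsilon=0 the slate is just the top of the ranked list; raising
# epsilon trades ranked relevance for feedback on new venues.
slate = epsilon_greedy_slate([f"v{i}" for i in range(20)],
                             ["new1", "new2", "new3"],
                             epsilon=0.3, rng=random.Random(42))
```

In practice the exploration traffic would be logged separately so the biased impressions can be accounted for at training time.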
The Incredible Pillow: Hmm.
Full Meta Tetrahedron: Hmm.
The Incredible Pillow: Make any reasonable assumption. Like you are designing the whole system. So you have full power over any direction of the system.
Full Meta Tetrahedron: Okay. Right. So yeah. So if I was, yeah, if I had the original system, then I would have some sort of basic, uh, yeah, like popularity metric. And, and I guess, I don't know if we have like an ad system, um, or like based on like number of like good reviews, we do have that. Right, which is a list of reviews and a review itself has like a numerical rating, I guess, out of 5 and possibly photos and a text description of the whatever, maybe like the event itself, text. Star rating out of 5, photo optional. Um, and okay, so, so yeah, let me, let me think about how, how I build a model which would, um, which would predict click, given that— yeah, as I said, even with the bootstrap model, it's going to be the case that most people, like, for the most part, people aren't going to even click on a lot of the results. They're going to scroll through and then click on only a couple. So there's going to be a class imbalance that I'm going to have to address at some point. Um, maybe— yeah, and yeah, given that I care a lot more about false negatives. Okay, so a false positive would be I show somebody something I think they're gonna click on and they don't click on it. Okay, I mean, maybe they would think that I'm not really good at knowing what they like, but not a huge deal. If it's a false negative where there is something in my catalog that they could have clicked on but I didn't show it to them, then that's potentially missed revenue. So I guess I would— and yeah, and like given that clicking is a rarer thing, I would want to maybe have a, like a larger penalty for missing. Like if the model predicts that they weren't gonna click on it, but they actually would have, then that would incur a larger penalty than the reverse. Okay, so yeah, like for setting up the training, um, system I need to— I need to create like a unified view of a user at a given point in time. Um, I think it maybe should be, yeah, like pairwise.
So for, for a user, for a venue, um, predict the likelihood that the user would have clicked on the venue, um, at the point of browsing it in their search results. And then I guess, yeah, maybe some sort of offline metric that I would want to track would be if I'm just predicting like this binary label, but it's showing up in a ranked list and I'd want to use something like discounted cumulative gain to assess like how well the ranked result was given that I like ran a simulation and try predicting the, yeah, the likelihood of click for each item in the ranked list, right? So I'll just add that offline metric and DCG for some N positions at the top that we care about for clicks, I guess. Okay, so I think I'm gonna, yeah, I'm gonna build a, let me think, I guess I could start off by building a vector that describes both the user and the venue. And initially to keep it very simple, I could use some sort of linear model that predicts— yeah, that produces a value between 0 and 1 by feeding that through a sigmoid, like a logistic regression, which predicts likelihood of click for a pair. And then we could think about making that more complicated if we want to capture cross-feature relationships. But just to get something off the ground, I think logistic regression is a fine start. Um, so yeah, let me think about what that would look like. Um, I think I'll just build up each, each part of, of this, this vector at once, or on each side. So for a user, um, yeah, I guess like given the time that they're browsing, if they're like looking for something to do right there and then, which I'm guessing is a lot of the time when you go on Yelp, it's not like just plan out the rest of your week. It's like I'm currently out somewhere and I want to find a place to go. That's probably the majority, I would imagine the majority use case.
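The offline metric mentioned above, discounted cumulative gain over the top N positions with binary click labels as relevance, is a small computation:

```python
import math

def dcg_at_n(relevances, n):
    """DCG over the top-n positions of a ranked list (binary click labels).
    Position 0 is discounted by log2(0 + 2) = 1, position 1 by log2(3), etc."""
    return sum(rel / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:n]))

# Clicks at ranks 0 and 2, none at rank 1: 1/1 + 0 + 1/log2(4) = 1.5
score = dcg_at_n([1, 0, 1], 3)
```

Dividing by the ideal DCG (the same labels sorted best-first) gives NDCG, which is easier to compare across sessions with different numbers of clicks.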
So I want to have a sense of their— there's— so maybe like, yeah, so just the— well, the city is already being used to do the filtering from the index, like to retrieve things that are close. So maybe— so user features, venue features, and user venue. So I guess what I was trying to say is that distance from a user to the venue could be something I would want to use, but then that's very different in like Austin versus New York, right? Um, the, like, you can drive in Austin, but you don't drive in New York. So then you could tend to go to, you're willing to go further, relatively speaking, in Austin. Um, so I, I guess maybe I, I'm just gonna like throw something out here. Maybe like distance to venue divided by like something like city diameter. It's like a really rough ratio, like, like distance ratio. Just something to like normalize it, but I'd probably think about like whether it's like public transit or something. Um, and then, uh, for the venue, like, isOpen as a bool, which, yeah, again, we could, we could still show venues that are closed. And obviously they're like, when they're sending a— yeah, I'm not sure, like, there's like the balance between do I put this into the model or do I just like rely on the user doing the filtering themselves. So they wouldn't even like see, let's say, venues that are closed if they're looking for something right at this moment. So I'll think about whether to include— yeah, I think I'll just include everything by default, but I'll think about how to balance— maybe I should put it here. How to balance between users doing filtering versus using venue attributes in the model. Okay, and then, so we have the address and then we have maybe like average rating, um, past month. Uh, sorry, did you disconnect for a sec?
The Incredible Pillow: No, I'm still here. I'm still listening.
Full Meta Tetrahedron: I see something on the, on the Excalidraw that's like you reconnected. Anyway, so I'm thinking about, um, yeah, so I want to know if it's a good venue, but then this is going to potentially penalize new venues who don't have many ratings yet. Um, so maybe there's some way of like normalizing this by like the, the, the ratings that the venues receive and then like the number of ratings. Um, yeah, I'm not sure how to formulate that right now, but I'll think about that. Um, how to consider low rate, maybe like you have some sort of like, um, for the city, you have like, uh, a prior, like a prior probability of like what the average rating is for different venues of different categories, different, like at different points in the life cycle. And then you like use that as like the baseline that you're normalizing it by. I don't know. Um, and then the, yeah, we could have like an embedding from the text of the name plus the description, which I could describe later if we have time how to generate that. But we could use something like, uh, just concatenate them together and then put it through an off-the-shelf BERT-type model, and that would give us an embedding to compare venues. And yeah, again, even if it's a logistic regression, it doesn't matter. Like, you can just have each of these embedding dimensions as a separate coefficient. And you could have the taxonomy as a categorical one-hot, maybe like keep the category simple, the taxonomy simple and just use like the top-level taxonomy. And then for a user, you could do something like number of times clicked. We also know, like, well, we know conversions, like we know when people actually made reservations. So you have like number of times clicked on, yeah, let's say, let's say venues of different— so assuming that, like for the top-level taxonomy, we don't have that many categories.
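One concrete way to do the rating normalization being sketched here is a Bayesian (smoothed) average that shrinks each venue's mean rating toward a city- or category-level prior, so venues with few reviews are neither unfairly boosted nor buried. The prior values below are invented for illustration:

```python
def smoothed_rating(sum_ratings, n_ratings, prior_mean=3.8, prior_weight=10):
    """Shrink a venue's average rating toward a baseline prior.
    Acts like prior_weight "phantom" reviews at the baseline rating;
    prior_mean and prior_weight here are assumed, illustrative values."""
    return (sum_ratings + prior_mean * prior_weight) / (n_ratings + prior_weight)

new_venue = smoothed_rating(sum_ratings=5.0, n_ratings=1)          # one 5-star review
established = smoothed_rating(sum_ratings=4.9 * 1000, n_ratings=1000)
```

With this scheme a single 5-star review scores about 3.9, safely below an established venue averaging 4.9 over a thousand reviews, and a venue with no reviews simply gets the prior.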
We have maybe like I don't know, 15 or something, like, then it wouldn't be too bad to have, like, one-hot features that were defined along, like, within that set of values, like number of times clicked on, like, taxonomy category 1 last month, or like category X. So basically I'm saying you can have this aggregated feature which describes how often did they go to a bar, or did they go to a restaurant that was selling tacos or whatever we choose for our top-level taxonomies. And that should give us a good sense of what they're gonna do next. And then yeah, like maybe some sort of like hour bucket. So like, you know, from 0 to like 23, again, as like, as a one-hot vector, excuse me, to determine like when they're browsing, that could affect what they want to click on. Okay, I could go a little deeper, but I think this is pretty much the basics of what I would want to incorporate here, given that we don't have much. Well, so we also want to consider like people who tend to go to like, so like a venue would have a certain type of person who goes to that venue, right? Like a jazz lover. So maybe I could think about like on the user-to-user side, like, like users who went to venues, users who go to venues like the ones you go to, um, also go to venues like this one you haven't been to yet. Um, but that's maybe like a later-stage, um, interaction feature that is worth incorporating, but just for the MVP, for this initial iteration, we won't worry about it. But like, um, places that are like the ones that people similar to you go to. Um, and then yeah, maybe, um, I guess this— I'm not sure. I'm not sure this is like a model feature or this is like part of the index. Um, yeah, but going the other way basically of thinking about like of the places that the user has gone, um, in the last month, what are, um, venues that are similar to those places?
Um, yeah, but I don't want to get too complicated right now before I've gotten the end-to-end system working for this set of features. So, um, I would build up a training set of, of examples of, um, like from historical sessions, uh, hydrating the user and the venue, um, with these pieces of information, then predicting, or it would have the like, it did, it was clicked as a bool. And so how would I, and I have to think about how I would split up this dataset into train, validation, and test. I think that I'd want to probably make sure that I don't leak user-level data. So I wouldn't want to have like— I would want to make sure that any users in the test set are not in the train or validation set. So keep things separate like that. And then I probably would also want to consider that like what someone's done in the past is going to help me predict the future. So I wouldn't want to have time leakage either within a user. So I would probably want to consider like doing splits where like for some users I only consider, um, like I, I train on, uh, what, like, what their history was, or like what sessions from like 5 years ago, and then I try to predict what they did. Um, or, well, not— sorry, I— yeah, I only, um, well, sorry, I'm, I'm not speaking very coherently. Um, okay, so I guess within this user stratified dataset, yeah, I would want to—
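The user-level split being described, with all of a user's data confined to one split so nothing leaks between train and test, is often done by hashing the user id. This sketch assumes string user ids and an 80/20 split; a per-user temporal cutoff, also mentioned above, would be layered on top:

```python
import hashlib

def user_split(user_id, train_frac=0.8):
    """Deterministically assign every example from the same user to the
    same split by hashing the user id into one of 100 buckets."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    return "train" if bucket < train_frac * 100 else "test"

# Same user always lands in the same split, across runs and machines:
splits = {u: user_split(u) for u in ["alice", "bob", "carol"]}
```

Hashing rather than random sampling matters here: the assignment stays stable as new sessions arrive, so a user never drifts from test into train between dataset rebuilds.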
The Incredible Pillow: Okay, let me stop you here in order to switch over to feedback. But first, I would like to hear from you. How did you find this question? And what is your self-evaluation?
Full Meta Tetrahedron: Um, all right. So I found, well, this is a good question. Um, I found it a little difficult to know how to pace things out. I think I got bogged down with the features, which is like a common thing that people get tripped up on. Um, and I maybe wasn't— I should have like gone further into like quickly getting to having a model that predicts the basic thing and then getting it deployed and then describing how I would evaluate it online. I think I should have gone to that a lot more quickly rather than going like listing out all the possible additional nuanced features that I might want to have. So my self-evaluation is maybe mid-level but probably not yet senior.
The Incredible Pillow: Yeah, I would agree with that. The stuff that you covered was actually okay, but it needed a more holistic view of the whole system. That was the missing piece here. So let me share a link with you. If you toggle back to the main.txt file, I'm going to share a link to a Google Drive so we can look together at, let's say, a staff-level solution to this problem. Can you access that? Just let me know once you can.
Full Meta Tetrahedronn: Yes, yes.
The Incredible Pillow: Okay, once you've landed there, you're seeing the whole system. The first thing I'd clarify is: are we speaking about a new user or an existing user? Then you need to state what business metric we're actually going to track for this system. Here, we maximize for user engagement: we want to retain our users and also maximize bookings. At the end of the day, you have two metrics that you always need to look at in pairs, the click-through rate and the bounce rate, because we want to avoid recommending clickbait restaurants. At the same time, we want to maximize the long-term satisfaction of users, plus the business, plus also ads. From a user point of view, we want to minimize the bounce rate while maximizing the productive time they spend on the platform. Ideally it should end up with some conversion-type metric: users actually need to book or click on something for a success signal. Otherwise, the signal is that they're scrolling through the app and unable to find what they're looking for. From a business point of view, we can look at churn rate and the growth rate of business signups. From an ads point of view: click-through rate, dwell time, ad revenue. As a bonus, you can talk about explainability, bias, cold start and what the fallback signal is there, and also the dilemma between exploration versus exploitation. In terms of the design, we have a user as input and a list of recommendations as output. In terms of scale, we have different kinds of candidate generators, and we need to handle venues that are closed versus open.
From a machine learning point of view, we're going to treat it as pointwise learning to rank and frame it as binary classification. Then we can make it multitask: output different probabilities, like the probability of actually booking and the probability of clicking on a listing, and then a final engagement score that is the weighted average of those probabilities. In terms of labels, we can use soft labels, and we can also have hard positive and negative examples. Positives can be cases where the user actually books or leaves a review; hiding a listing is a hard negative. For negative examples, we need to be very specific: negatives are cases that were actually shown to the user but that the user didn't interact with within X hours. We also need to stratify our dataset, and the reason is to undersample the popular places; otherwise they're going to dominate the dataset and we're only going to showcase popular places. For negatives, you can even take some random negative user-item pairs. In terms of features, we're looking at an offline and online feature store, and also at data governance: how you're going to encrypt, and privacy as well. For personalization, we look at the user level: economic signals like the LTV, the lifetime value of the user, whether they like expensive or cheap places; how they engage with the platform, what reactions they usually perform, and whether they've had a good or bad user experience so far. Other important features are the average of the last 10 item embeddings, and contextual ones like the time of day: are we recommending dinner places or lunch places? Another behavioral feature can be the like rate for a particular category of restaurants.
Then for cold start: how are we going to treat it? We're going to cluster similar users together, or similar listings together. In terms of modeling, you can mention two models, logistic regression and a two-tower network, and say that the two-tower is the model of choice because it's able to capture nonlinear relations. We'll have a tower for the user and another tower for the item, then fuse them together and have separate heads per probability.
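The two-tower architecture with per-task heads and the weighted engagement score mentioned earlier could be sketched as a bare numpy forward pass. All dimensions, weights, and the 0.3/0.7 task weighting are hypothetical; a real system would train this with per-task losses in a deep learning framework:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, weights):
    """Tiny ReLU MLP tower: projects raw features to an embedding."""
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return x @ weights[-1]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical dimensions: 16 raw user features, 12 raw venue features,
# both towers projecting into an 8-d embedding space.
user_tower = [rng.normal(scale=0.1, size=(16, 32)), rng.normal(scale=0.1, size=(32, 8))]
venue_tower = [rng.normal(scale=0.1, size=(12, 32)), rng.normal(scale=0.1, size=(32, 8))]

# Separate heads on the fused representation, one per task
# (click probability, booking probability).
click_head = rng.normal(scale=0.1, size=(16, 1))
book_head = rng.normal(scale=0.1, size=(16, 1))

def predict(user_feats, venue_feats, w_click=0.3, w_book=0.7):
    u = mlp(user_feats, user_tower)          # user embedding
    v = mlp(venue_feats, venue_tower)        # venue embedding
    fused = np.concatenate([u, v], axis=-1)  # fuse the two towers
    p_click = sigmoid(fused @ click_head).squeeze(-1)
    p_book = sigmoid(fused @ book_head).squeeze(-1)
    # Final engagement score: weighted average of the task probabilities.
    return w_click * p_click + w_book * p_book
```

Note that fusing by concatenation (rather than a pure dot product) trades away cheap ANN retrieval for expressiveness, which is why this shape fits the heavyweight ranker rather than the candidate-generation stage.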
Full Meta Tetrahedronn: Like click-through rate versus conversion, you mean?
The Incredible Pillow: Yes, exactly. And because we're following a multitask approach, the shared layers update less frequently than the specialized layers. In terms of the loss, we'll have a loss per head: one for the click probability and one for the booking probability. For evaluation, it's important to mention the different metrics. Offline, we can have precision at K, recall at K, and mean average precision. Online: click-through rate, bounce rate, churn rate, revenue lift, reaction rate. From a production point of view, MLOps: an offline evaluation, followed by a shadow log-only deployment, followed by A/B testing, where we look at the success and the guardrail metrics. Even better, use interleaving A/B testing. What exactly is that? Out of the top 10, 5 come from model 1 and the other 5 from model 2, so within the same list of recommendations you just blend them together: odd positions from model 1, even positions from model 2. Then multi-armed bandits; what the rollback plan is in case something goes wrong; how we monitor and alert. In terms of system metrics, we monitor the latency, the throughput, queries per second. In terms of retraining: how do we keep an eye on data drift, and how do we follow a hybrid approach, incrementally training the model online while also retraining the full model once per week. Then, in terms of bias: positional, layout, popularity. For positional bias, you can use position as a feature during training, and then during inference you apply the same position value for all candidates, which cancels it out.
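The interleaving scheme described above, alternating positions between the two models while deduplicating items, could be sketched like this (the function name and list-of-items interface are illustrative):

```python
def interleave(ranking_a, ranking_b, k=10):
    """Blend two ranked lists for interleaved A/B testing.

    Odd positions (1st, 3rd, ...) are drawn from model A's ranking,
    even positions from model B's, skipping any item already placed
    by the other model. Stops at k items or when both lists run out.
    """
    result, seen = [], set()
    sources = [iter(ranking_a), iter(ranking_b)]
    turn, exhausted = 0, 0
    while len(result) < k and exhausted < 2:
        placed = False
        for item in sources[turn]:
            if item not in seen:
                seen.add(item)
                result.append(item)
                placed = True
                break
        exhausted = 0 if placed else exhausted + 1
        turn = 1 - turn
    return result
```

Credit for a click then goes to whichever model contributed the clicked position, giving a within-user comparison that typically needs far less traffic than a classic split-population A/B test.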
And then for popularity bias, you downweight those popular places. It's important to mention exploration versus exploitation: if you only follow exploitation, recommending what the system already recommends, you get a self-reinforcing loop, so at some point you also need to surface some random or new places. So the system overall looks like this: you have the candidate generators, optimized for recall; then maybe a distilled version of your bigger model that looks at a limited set of features and narrows the 10K businesses down to around 100; then your heavyweight ranker reprioritizes those. Finally, you follow with post-processing, where you apply business rules. For example, you may decide not to let the same cuisine, say Italian, occupy more than 5 positions in the final list, and the way to enforce that is through post-processing rules. And that is the whole system.
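The post-processing business rule mentioned, capping how often one cuisine appears in the final list, is a simple order-preserving filter. The dict schema with a `cuisine` key is a hypothetical representation of a ranked venue:

```python
def apply_cuisine_cap(ranked, max_per_cuisine=5):
    """Post-processing pass over the ranked list: drop any venue once
    its cuisine has already appeared `max_per_cuisine` times, keeping
    the ranker's order otherwise intact.
    """
    counts = {}
    kept = []
    for venue in ranked:
        cuisine = venue["cuisine"]
        if counts.get(cuisine, 0) < max_per_cuisine:
            counts[cuisine] = counts.get(cuisine, 0) + 1
            kept.append(venue)
    return kept
```

Keeping rules like this outside the model means product constraints can change without retraining, at the cost of slightly distorting the ranker's ordering.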
Full Meta Tetrahedronn: Okay. Huh. Whoa. So you think it's good to copy this skeleton of design, labels, features, modeling, and try to run through it really quickly, getting it working end to end even if I haven't fully fleshed out the features, touching on each part and then later going back to wherever it's most lacking? Or should I ask the interviewer where they want to dive in, as far as bottlenecks?
The Incredible Pillow: Yeah, I would say follow this framework, because it helps structure your thoughts. And always leave at least 5 minutes at the end for questions and for places where you can go back and do a deep dive.
Full Meta Tetrahedronn: Yeah. Okay. Yeah, because there's so much, you're only going to be able to dive into like one thing. Yeah. Make sure that like you've at least mentioned it so that you could dive into it if they want you to.
The Incredible Pillow: Exactly. Okay, that's it.
Full Meta Tetrahedronn: Sounds good. Thank you.
The Incredible Pillow: Yeah, no worries. The thing I was mentioning is: just try to follow a structured approach here.
Full Meta Tetrahedronn: Yeah.
The Incredible Pillow: And then take it from there.
Full Meta Tetrahedronn: Okay, okay. I'll try redoing it with this structure. Thank you.
The Incredible Pillow: Yeah.
Full Meta Tetrahedronn: Well, all right, so you'll send me like a kind of like a write-up of things you thought about during this interview?
The Incredible Pillow: Yeah. For me, you were missing a little bit of structure. There were a few touch points that were actually on point, but I would lay out more of a holistic picture. Remember that this is an ML engineer role, so they're looking end to end. They're not looking for a data scientist to just train a model; they're looking for a person who will hit the ground running and be able to deliver a model from start to finish.
Full Meta Tetrahedronn: Right.
The Incredible Pillow: Yeah.
Full Meta Tetrahedronn: Okay.
The Incredible Pillow: Okay. Also, follow the structure, follow the system, and yeah, and best of luck with your interview.
Full Meta Tetrahedronn: Appreciate it. Thanks so much. Have a good night.
The Incredible Pillow: You too.
Full Meta Tetrahedronn: Bye.
