
An Interview with a Google engineer

Watch someone solve the design a personalized news feed system problem in an interview with a Google engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

Design a personalized news feed system

Interview question

Case Study: A news feed is a feature of social network platforms that enables user engagement by showing friends' recent activities on their timelines. Many social networks - Facebook, Twitter, and LinkedIn - personalize their news feeds to maintain user engagement. I want you to design a personalized news feed system.

Assumptions of safety:
- We build only for an American population (i18n, internationalization, is out of scope).
- Your system should be highly scalable.
- You may leverage pre-existing ML technologies, algorithms, and architectures, but we will not discuss them in significant depth for the sake of time.
- Type out your thoughts / explain your thought process.

Interview Feedback

Feedback about Fluorescent Torch (the interviewee)

Advance this person to the next round?
Thumbs up: Yes
How were their technical skills?
4/4
How was their problem solving ability?
3/4
What about their communication ability?
4/4
FEEDBACK (TC := the candidate, TI := the interviewer)

(Overall) TC thinks they performed 3 on a 1-5 star scale.

(Positives) What does TC think they did well on?
+ TC clarified the question strongly.
+ TC did well on the sections before pipelines: feature engineering and data engineering.
+ TC discussed several models and learning strategies: two-tower neural networks (TTNN) and multi-task deep neural networks (MTDNN).

(Growth areas) Where does TC think they can grow?
- TC can review the pipelines/systems portions: data prediction and data generation.
- TC can work on proactively drawing boxes-and-arrows.
- TC can study offline metrics.

Feedback about Purple Brontosaurus (the interviewer)

Would you want to work with this person?
Thumbs up: Yes
How excited would you be to work with them?
4/4
How good were the questions?
4/4
How helpful was your interviewer in guiding you to the solution(s)?
4/4

Interview Transcript

Purple Brontosaurus: All right. I really wanted to get started on the case study question. Let me know if you have or have not seen this question before. I've taken this from Alex Xu's book. It's called Machine Learning System Design Interview: An Insider's Guide, and this is from chapter 10, on designing a personalized news feed.
Fluorescent Torch: I didn't study that one. I read the book, but I literally stopped before this chapter.
Purple Brontosaurus: Yeah, that's fine then. Let's go do this. If you haven't done it, that means you haven't seen this question, so I can quickly read it out for you. So if you're familiar with news feeds: it's a social network platform feature that enables user engagement by showing friends' recent activities on their timelines. Many social networks, Facebook, Twitter, and LinkedIn, personalize their news feeds to maintain user engagement. I want you to design a personalized news feed system. All right. Does this make sense?
Fluorescent Torch: Yes.
Purple Brontosaurus: Perfect. And, because you've been reviewing Xu's books, we're going to kind of mirror his template structure. We can deviate a bit if we do or don't need to. But I do want us to start off with clarifying questions and the framing of the machine learning problem: business objective, ML objective, and ML category. After that, we'll work across multiple different sections. Sound good to you?
Fluorescent Torch: Yeah.
Purple Brontosaurus: All right. And I want you to really type out your thoughts, explain your thought process. So I'm not just paying attention to you technically, but also in your communication skills.
Fluorescent Torch: Yeah, sure. The first thing I want to clarify: the question mentions only showing friends' recent activity. By recent, how far back is that?
Purple Brontosaurus: That's a really good question. Let's say that we get to set the gauge of what recency is. So I'm going to take recent activities to be like the past month or past seven days. So not too far.
Fluorescent Torch: Okay.
Purple Brontosaurus: Like I don't want us to think too much of pagination, if that makes sense.
Fluorescent Torch: Yeah. And in the system, with users' recent activity: by this activity, do you mean posts with only text, or videos and anything like that?
Purple Brontosaurus: That's a really good question. I'm going to let the post be multimodal. So images, videos and text.
Fluorescent Torch: Okay.
Purple Brontosaurus: Basically, if you need you can set the stage to extract as much information as you need from the timelines. Does that help?
Fluorescent Torch: Yeah. Yeah. And I assume that the business goal is to increase the user engagement, right?
Purple Brontosaurus: Yes. Increase user engagement and platform time spent.
Fluorescent Torch: Yeah. And about the size of people's recent activity: I assume people have less than 1,000 friends and each friend makes one post per day. So the retrieved result is about 10,000, let's say. Yeah.
Purple Brontosaurus: Yeah, if you can use simple numbers, I don't mind.
Fluorescent Torch: Yeah. Okay. So the total number of items we have in our database for all the users, there could be millions, right? Yes. Yeah. Okay. So let me think about: are we allowed to use friends' information for our prediction? For example, there may be some private data we cannot use, right? Yes.
Purple Brontosaurus: We can leverage user profile information and user data.
Fluorescent Torch: Okay. Okay. Only public ones.
Purple Brontosaurus: Yes, only public.
Fluorescent Torch: Okay. I think I have a good understanding about this question. Okay.
Purple Brontosaurus: Is there anything else that you want to ask like latency engagement type or interactions data?
Fluorescent Torch: Yeah. I assume that in your system you have already logged the engagement history between users and those posts, so we can filter down to the ones from his friends, right? And do we have to show new stuff to the user every time he opens the app, or is it okay to refresh periodically, such that within one day he sees the same thing?
Purple Brontosaurus: We operate more like a real time personalized news feed. So we display data closer to real time requirements.
Fluorescent Torch: Yeah. Okay. So this means that every time the user opens the app we have to retrieve new stuff, or at least run it periodically on a very short interval. Those are the two solutions we can approach. In both ways we want to reduce the serving time: we assume the user should see the result milliseconds after he opens the app.
Purple Brontosaurus: Yep, you got it. That sounds good to me.
Fluorescent Torch: Okay, I think that closes my clarification questions.
Purple Brontosaurus: Yep, let's move on to the next section.
Fluorescent Torch: So for the business objective, do we have to do the out-of-scope part?
Purple Brontosaurus: It's not that we have to do it. It's more like if you think something is out of scope.
Fluorescent Torch: Oh, okay. I don't have something for this part.
Purple Brontosaurus: Okay, that's fine. Then we can skip that.
Fluorescent Torch: Yeah, okay. The template.
Purple Brontosaurus: We can deviate from the template, but we still need to ask ourselves a couple of these questions here and there. So we do need the ML objective and the ML category. That's something that needs to be answered.
Fluorescent Torch: Yeah, of course. Yes, let's go to that part. The business objective we had talked about. The machine learning objective will be: from friends' recent activity, we find the posts that are potentially most interesting to the user and rank them based on the probability that he will click or interact with them. Yeah, that's the machine learning objective. We can do it as a classification problem, to predict whether the user will engage with that post if we show it to him, or we can do it as a learning-to-rank problem, just to rank, from all the activity from his friends, which one should be at the top.
Purple Brontosaurus: Got it. That makes sense. Okay, so I do want to ask like is learning to rank the only type of ranking to consider? Do you want to think about any other ranking strategies here?
Fluorescent Torch: Learning to rank, and the first one, a binary classification: predict whether he will engage.
Purple Brontosaurus: Yeah, okay.
Fluorescent Torch: Learning to rank. Those are the two strategies I came up with for this question. Let me think about it.
Purple Brontosaurus: That's fine. We can move on to the next sections. So let's move on to feature engineering or data engineering.
Fluorescent Torch: Yeah, I think we should cover both of them. Let's do the feature engineering part first. So the task is to predict, let's say we take the first machine learning category, binary classification, whether the user will engage with the post if we show it. For that we need features from both sides, actually three sides. First is user information: the user's age, the user's gender, the last time he engaged with this kind of topic, the topics he likes to engage with. Here we have categorical features, numerical features, and we have an embedding of the user, so we can get the similarity between the user and the posts. This embedding should be based on his engagement history. The second part will be features about the items, which are the posts in our system. We have to have embeddings of the images, the text, and the topics of those posts. And we can have how long ago it was made, which is a very important feature, and how many engagements it has received in the user's friend circle. And the post ID. Yeah, so if you.
Purple Brontosaurus: Can you kind of type out your schema and what you're thinking about? That would be super helpful. So yeah, tell me what data you want to capture.
Fluorescent Torch: Yeah, sure. Let me type: post ID, engagements, embedding. Yeah. Okay. Then the third part should be the interaction features between the user and the author of a post. Yeah. Basically, for example, what percentage of the author's posts the user engaged with, which means, among all the posts made by that friend, how many of them were engaged with by this user. So this one. Yeah, here actually I have the topic, and how close they are; if there's a way, we also want to add the relation between them, friends or classmates. Yeah. Okay.
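The three-sided feature split the candidate types out (user, item, and user-author interaction features) could be sketched as the following schema. All field names here are illustrative assumptions, not anything from the interview:

```python
from dataclasses import dataclass

# Illustrative schema only; every field name is a hypothetical stand-in.
@dataclass
class UserFeatures:
    age: int
    gender: str
    topic_affinities: list[str]     # topics the user tends to engage with
    embedding: list[float]          # learned from engagement history

@dataclass
class ItemFeatures:
    post_id: str
    author_id: str
    content_embedding: list[float]  # image/video/text embedding
    age_hours: float                # how long ago the post was made
    friend_circle_engagements: int  # engagements within the user's friend circle

@dataclass
class InteractionFeatures:
    author_engagement_rate: float   # share of this friend's posts the user engaged with
    same_city: bool
    classmates: bool
```

Keeping the three groups separate mirrors the two-tower-plus-interaction model discussed later: the user and item groups each feed a tower, while the interaction group joins at the concatenation layer.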
Purple Brontosaurus: Okay. So I want to delve into this interaction part. Let's get a bit more granular on the interactions. What type of interactions do you want to focus on, and how would you weight some over other interactions?
Fluorescent Torch: So what I'm looking for is whether the user clicked that post, reshared it, left comments. All of these we take as positive. Yeah, those are the engagements I'm talking about. So I will calculate how frequently he does it and the last time he did it. And yeah, those are the... Sorry, I didn't quite get the second question.
Purple Brontosaurus: Yeah, so we kind of have a couple of interactions, right. Like we have click, reshare, comment. Okay. I want to ask why we want to get this interaction data.
Fluorescent Torch: I want to use the data as features, and it also fits as the labels for our training system.
Purple Brontosaurus: Got it. That makes sense. Okay. And then the other thing that I want to ask is how do we weight these? Like do we weigh each of these interactions equally? Do we give different weightages to them?
Fluorescent Torch: That's actually decided based on our business requirements. Because comments and reshares are very rare in our system, if we want to say, okay, I only care about clicks now, yeah, we can take only clicks as our engagement.
Purple Brontosaurus: Okay, that makes sense. Can you think of any other engagement metrics so you have click reshare comment. Is there anything else you want to try to capture?
Fluorescent Torch: A message is also another one. Yeah, maybe he read it and just left it. Leaving comments, reshares, yeah, those were the very popular ones.
Purple Brontosaurus: Okay, I'll just add liking as an interaction to help with the problem. I have a question. So a lot of these interactions, they prioritize the active users, right? Yes, the one to five percent. But most users are passive, right? Yes, like people who just read Reddit. So how do we get information for passive users?
Fluorescent Torch: Yeah, we can have another feature, which is the time he spent on this post. That's a very good engagement; it can be another sign of engagement. Yeah, other signs, let me think about it. Maybe, if the user read it, what's his next search query? Is the next post he engaged with the same topic as this current one? I think that is also a very good indicator of engagement. Got it.
Purple Brontosaurus: Okay, that makes sense to me. Okay, so coming back to the feature and data engineering. So we have data that's text, we have data that's video, we have data that's images. Tell me how you get this into numeric form. You don't have to tell me for all pieces of data, but I'll give you example data. Let's say that we have a post with a thumbnail, like an image post with a 10 second video. Let's say that we have user age info. Let's say that we have some type of username. So how do you get all this from unstructured, non-numeric data to numeric form?
Fluorescent Torch: Yes, for the first one we have embedding. We can embed this thumbnail and the videos into a vector; we get the representation of them. The other way is, for example, for age.
Purple Brontosaurus: My question is how do you generate those embeddings for the image and video? Like what do you do there?
Fluorescent Torch: Oh yeah. Okay. So we use a two-tower model to train the embedding. The label will be whether the user engaged or not. And there are two towers: the first tower is the user tower, with user features, user information, age, all those user features we can have. And the other tower is the item features, some item title embedding or title ID or something like that. We train the model to decide the similarity, the probability of a match, between the user and this embedding.
Purple Brontosaurus: That's not what I'm asking, but that is a good discussion. I'm just asking how you generate those embeddings. Like, I'm just really testing if you know how to handle data here. It's a bit easier: if I give you a post content, which is a sentence / unstructured text, I'm asking how you handle the data. So tell me, do you use pre-trained models, or do you do something else?
Fluorescent Torch: Yeah, for text we can use a pre-trained model, for example Word2Vec. We convert it to a sequence of vectors and we calculate the median... no, the mean of them. It's a very simple way to get it. For images and videos, I think we have to train. There are pre-trained ones available, but for our case I think the distribution is quite different from ours, so it's better to retrain.
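The mean-pooling the candidate describes can be sketched in a few lines; here a tiny hand-made lookup table stands in for a real pre-trained Word2Vec model, and all tokens and vectors are made up:

```python
import numpy as np

# Hypothetical token -> vector table standing in for a pre-trained Word2Vec model.
EMBED = {
    "cat": np.array([0.2, 0.8]),
    "sat": np.array([0.4, 0.0]),
    "mat": np.array([0.0, 0.4]),
}

def post_embedding(text, table=EMBED, dim=2):
    """Mean-pool the token vectors of a post; unknown tokens are skipped."""
    vecs = [table[t] for t in text.lower().split() if t in table]
    if not vecs:
        return np.zeros(dim)          # no known tokens: fall back to a zero vector
    return np.mean(vecs, axis=0)      # the simple mean the candidate mentions

v = post_embedding("cat sat mat")     # mean of the three vectors -> [0.2, 0.4]
```

Mean pooling loses word order, which is why it is only the "very simple way"; a trained text encoder would be the heavier alternative.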
Purple Brontosaurus: Okay, got it. So what about user age and username?
Fluorescent Torch: Username, we can just convert. I would like to convert the username into binary features about this friend, for example, whether they are in the same city, whether they are classmates. Yeah, those kinds of binary features to show the relation between them. Okay, yeah, we could also use an embedding, but I doubt the helpfulness of a username embedding in this situation.
Purple Brontosaurus: Got it. That makes sense. All right, I think I'm good here. I have another question on the feature engineering, and data engineering is the same. How do you manage interaction data that you don't have a lot of? Let's suppose a low volume of, let's see, comment or reshare data, and a high volume of click data. Tell me how you manage this.
Fluorescent Torch: Yeah, this we have to handle in training. We can merge them into one label, call them all engagement, or we put them into multitask branches in a multitask model. So we train a task for each of them, to predict each of them, and at the end of the machine learning model we build human rules, using a weighted sum, to combine those predicted probabilities.
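The hand-tuned weighted sum at the end of the multitask model could look like this sketch; the task names match the interactions discussed above, but the weight values are pure assumptions:

```python
# Hypothetical per-interaction weights; rarer, stronger signals weighted higher.
WEIGHTS = {"click": 1.0, "comment": 3.0, "reshare": 5.0}

def engagement_score(probs, weights=WEIGHTS):
    """Combine per-task predicted probabilities with a hand-tuned weighted sum."""
    return sum(weights[k] * probs[k] for k in weights)

score = engagement_score({"click": 0.6, "comment": 0.1, "reshare": 0.02})
# 1.0*0.6 + 3.0*0.1 + 5.0*0.02 = 1.0
```

These weights are the "human rules" the candidate mentions: they encode a business preference, not something learned by the model.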
Purple Brontosaurus: Okay, that is correct, but that's not what's being asked here. What I'm kind of asking is how to avoid bias, how to get more data.
Fluorescent Torch: Oh, okay. Do you mean in the training process? Yes. Oh yeah. Okay, so how to get more data for engagements: we can downsample the clicks, if that's the way we want, or upsample the data points with those other engagement types.
Purple Brontosaurus: Okay, so how do you upsample? Like, how to upsample?
Fluorescent Torch: Yeah, in our database, let's say we have 1,000... 10,000 clicks, and we have 100 comments. We can randomly sample from those; just repeat the samples multiple times. Yeah, in the database.
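The repeat-sampling the candidate describes, 10,000 clicks versus 100 comments, can be sketched as follows (row format and label names are illustrative):

```python
import random

def upsample(rows, target_count, seed=0):
    """Repeat-sample a rare class with replacement until it reaches target_count rows."""
    rng = random.Random(seed)
    extra = [rng.choice(rows) for _ in range(target_count - len(rows))]
    return rows + extra

clicks = [("click", i) for i in range(10_000)]    # abundant interaction
comments = [("comment", i) for i in range(100)]   # rare interaction
balanced = clicks + upsample(comments, len(clicks))
# each comment row now appears ~100 times on average, matching the click volume
```

Downsampling clicks instead would shrink the dataset; upsampling keeps every click but duplicates the rare rows, which is the trade-off behind the candidate's choice.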
Purple Brontosaurus: Okay, that makes sense.
Fluorescent Torch: Got it.
Purple Brontosaurus: What about data augmentation, can we use that too?
Fluorescent Torch: Let me think about this data augmentation. Since, for this one, I think we already have some... do you mean use data augmentation to solve this bias problem? Yeah. Okay, let me think about it. I think I would not use data augmentation methods, because I want to keep the data as close as possible to the real distribution.
Purple Brontosaurus: Got it. Okay, that makes sense to me. All right, so far I'm happy with the feature and data engineering steps. Let's move on to the machine learning system design portion. So this is the ML model part: this is where you're going to tell me which ML model you'd use, why you'd use it, and the trade-off analysis across different ML models.
Fluorescent Torch: Yes. For this one, assuming we decided to use the binary classification task, among the many machine learning methods I will consider random forests and deep-learning-based models. Random forest is a very simple model, it's easy to implement, and it has high interpretability: after we train the model, we can inspect the model and learn why it thinks that way. That will be very helpful. The deep learning model has very high representation-learning ability, and we can use it for continual learning: every day we can retrain on our previous data points. That's a good point. Yeah, that's what I came up with while doing model selection.
Purple Brontosaurus: Well, what about cons? We listed pros. What are the cons of random forest? What are the cons of deep learning?
Fluorescent Torch: Yeah, for deep learning, the speed. For random forest, it could be too simple for this task. And the good point, another good point of it, is that it's quite fast. For deep learning it's totally the other side: it could be too slow if we want to get a result in a very short time and we have too much data to process.
Purple Brontosaurus: Okay, that makes sense.
Fluorescent Torch: Yes.
Purple Brontosaurus: So one question is knowing the form of our data, which model do you choose? Like we have random forest, we have deep learning. Which one is your preference?
Fluorescent Torch: My personal preference is to use deep learning, since it has been proven in many industrial problems, and we have so many learning materials and, yeah, many resources to do it.
Purple Brontosaurus: Okay, got it. So it's the de facto choice. Okay, that makes sense. So a question for you, coming back to the deep learning model: this is the part where I'm going to take you to the whiteboard, and I want you to start diagramming. This is just on the ML model. Let's suppose you have your input features.
Fluorescent Torch: Right.
Purple Brontosaurus: So show me how your input features turn into the probability of engagement.
Fluorescent Torch: Yeah.
Purple Brontosaurus: If you use a deep neural network.
Fluorescent Torch: Okay. Yeah. So this one, this is a feed system; we have multiple stages. The first stage is retrieval, then we have a ranking stage, and after the ranking we have a re-ranking stage. I will skip the retrieval part. Assume we have all those features: user features, and the item features. Then each feature side has its own tower. I say user tower, but it's just a dense layer, for simplicity's sake. And I have an item dense layer. After that I will use another layer to concat those features together, then push through another concat and other layers. The reason I design it this way is, since the feed is from friends only, I assume the data points we have to serve for each user are not that heavy. So we can use these layers to get the probability.
Purple Brontosaurus: Got it. Can you draw your arrows to show the flow of data from your features to your layers to your final output? So these arrows help to indicate data flow or workflows in the system.
Fluorescent Torch: I use softmax; let's say we have a multiclass problem. Yeah. Any questions about the structure of this network? Yes.
Purple Brontosaurus: So what type of network is this? Is this a two-tower neural network? Is this learning to rank?
Fluorescent Torch: It's a two-tower deep neural network. It is a binary classifier. I use the softmax to show that... okay, I can use sigmoid.
Purple Brontosaurus: Okay.
Fluorescent Torch: Yeah. To make sure it's binary problem.
Purple Brontosaurus: Now I have a question. This concat and the other dense layers: what's going on in this concatenation? Why do we need concatenation in the two-tower neural network?
Fluorescent Torch: Yeah, with this one we can better represent the interaction. We have item features, and since I have the user features and the user engagement features... yeah, I need another input here, the interaction features. The concat and the other dense layers are to make sure the representations of the user features and the item features can be better learned together by the dense layers.
Purple Brontosaurus: Got it. Okay.
Fluorescent Torch: In each tower we only learn something specific to the user or the item. But the interaction between this user and item is also learned here.
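The architecture just described, per-side towers, a concat that also takes the interaction features, shared dense layers, and a sigmoid output, can be sketched as a toy numpy forward pass. All dimensions and weights here are arbitrary placeholders, not anything tuned:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return np.maximum(0.0, x @ w + b)   # ReLU dense layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dimensions: 8-dim user features, 8-dim item features, 4-dim interaction features.
Wu, bu = rng.normal(size=(8, 16)), np.zeros(16)   # user tower
Wi, bi = rng.normal(size=(8, 16)), np.zeros(16)   # item tower
Wc, bc = rng.normal(size=(36, 8)), np.zeros(8)    # shared layer after concat (16+16+4=36)
Wo, bo = rng.normal(size=(8, 1)), np.zeros(1)     # output head

def p_engagement(user_x, item_x, inter_x):
    u = dense(user_x, Wu, bu)                           # tower: user-specific representation
    i = dense(item_x, Wi, bi)                           # tower: item-specific representation
    h = dense(np.concatenate([u, i, inter_x]), Wc, bc)  # concat learns cross-tower interactions
    return sigmoid(h @ Wo + bo)[0]                      # probability of engagement

p = p_engagement(rng.normal(size=8), rng.normal(size=8), rng.normal(size=4))
```

The concat layer is the only place where user and item signals mix, which is exactly the rationale the candidate gives for keeping it.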
Purple Brontosaurus: Got it. That makes sense to me. Okay. I want to ask, can you think of any other neural network architectures? Two-tower is one approach. I want to explore what other architectures you're familiar with, and why you would consider them.
Fluorescent Torch: If we think this model structure is too complex and we want the model to be faster, we can use a very simple one: we remove those dense layers and we calculate the similarity between the user and the item embeddings. But for the interaction features, we have to find a way to decide which side to put them on; since we have a user side and an item side, maybe we just drop them. Then we calculate the similarity, and use softmax to get the probability.
Purple Brontosaurus: I see that that makes sense.
Fluorescent Torch: Yeah. In this way, the good point is that the features and the last-layer embedding of the item can be pre-calculated and saved into the database system. While we're serving, we only need to calculate the user side and get the similarity, which is very fast. Got it.
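The serving shortcut described here, precompute item vectors offline and compute only the user side online, might look like this sketch; the store contents, post IDs, and vectors are all hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Item embeddings computed offline and stored; only the user side runs at serve time.
item_store = {                       # hypothetical post_id -> precomputed item vector
    "post_1": np.array([0.9, 0.1]),
    "post_2": np.array([0.1, 0.9]),
}

def serve(user_vec, candidate_ids, store=item_store):
    """Score candidates by dot-product similarity with the freshly computed user vector."""
    scores = {pid: sigmoid(user_vec @ store[pid]) for pid in candidate_ids}
    return sorted(scores, key=scores.get, reverse=True)

ranked = serve(np.array([1.0, 0.0]), ["post_1", "post_2"])
# post_1 aligns with the user vector, so it ranks first
```

This is the classic two-tower serving trade: the dot product is cheap and item vectors are cacheable, at the cost of losing the interaction features the dense layers could exploit.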
Purple Brontosaurus: That's good. Now I have one question. So this is the probability of engagement. But what if I came back to you as the business and said: I don't just want the engagement probability, I want the probability of each type of engagement. So I want to know if it's a like, if it's a comment, if it's a share. I want to get more granular. How would you change your model?
Fluorescent Torch: Yeah. Okay.
Purple Brontosaurus: Now you have to tell me the type and the probability of each interaction type. So it's not just oh, it's just P of engagement, it's zero to one.
Fluorescent Torch: Yeah. Okay. Then I will convert it to a multitask model here. Yeah.
Purple Brontosaurus: So we can copy paste this diagram into another section. You just have to like click all this and basically copy over. So let's go to the right.
Fluorescent Torch: Yeah.
Purple Brontosaurus: And I want you to show me how you make those changes to your architecture.
Fluorescent Torch: Okay. How do I go to the right side?
Purple Brontosaurus: Oh can you like zoom and click on the hand icon which will allow you to scroll to the right.
Fluorescent Torch: Oh okay.
Purple Brontosaurus: Okay, perfect. So you can see model two. So I want you to convert this into a multitask model.
Fluorescent Torch: Yeah. Okay. Then I have a click branch, and I have a probability for it. This should be concat and a dense layer. I have another branch for comments. Yeah, I will get the probability for it. Yeah, let me fix them. Let's just take these two as examples. So for each task there will be... so each task will have its own branch after that layer, and we calculate the probability for it separately. Perfect.
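The single-task-to-multitask change sketched on the whiteboard amounts to keeping one shared representation and adding a small head per interaction type. Task names and dimensions below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
shared_dim, tasks = 8, ["click", "comment", "reshare"]

# One small output head per interaction type, on top of the shared concat layer.
heads = {t: (rng.normal(size=(shared_dim, 1)), np.zeros(1)) for t in tasks}

def multitask_probs(shared_repr):
    """Shared representation in; one independent probability per interaction type out."""
    return {t: float(sigmoid(shared_repr @ w + b)[0]) for t, (w, b) in heads.items()}

probs = multitask_probs(rng.normal(size=shared_dim))
```

Each head uses its own sigmoid rather than a softmax across tasks, since a post can plausibly be both clicked and reshared; the probabilities need not sum to one.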
Purple Brontosaurus: And so I want to ask like since this is a multitask model. What's the benefit of still having a single concatenation layer? Like why not separate out into different layers for each engagement?
Fluorescent Torch: Sorry, sorry. Can you repeat the question? So we've.
Purple Brontosaurus: We transformed our model from single task to multitask, right?
Fluorescent Torch: Yeah.
Purple Brontosaurus: We still have our concatenation layer. So I want you to go over why we don't split the concatenation layer up. Like why do we still keep a single concatenation layer across each task?
Fluorescent Torch: Because those features can be shared between them. The concat... sorry, I don't really get the question. Why don't we... we keep... that's the way we keep the full features in the system.
Purple Brontosaurus: I'm just testing like your knowledge of multitask learning. So I want to assess like why, like why not separate out the concatenation layer? Okay. It's fine. I got the gist.
Fluorescent Torch: Okay.
Purple Brontosaurus: Okay.
Fluorescent Torch: Yeah.
Purple Brontosaurus: So, so far I like that we have a multitask model that can get us probabilities of each type. So I want to take us back to the text editor. I want to quickly ask... actually, not the text editor. I want you to come up with a diagram. So pretend you have your model. Let's go down and come up with a diagram that shows the serving. You have your end user, they go to a service, and they get their personalized feed. Show me the end-to-end flow visually.
Fluorescent Torch: You mean the structure of the feed system? Yes.
Purple Brontosaurus: So this is like the system diagram portion. You're going to show me your user, you're going to show me your services, you're going to show me your features going in and the end result being their personalized feed.
Fluorescent Torch: Okay, so we have a user here. We have a system; this is the offline system, which serves some offline features. We have the item database. Then we have the item retrieval stage. This retrieval just finds all those posts from the user's friends. If there are more than that, we just sort by the age of the post, when the post was made. Okay. After that we have a ranking stage, and in that stage we will use online features.
Purple Brontosaurus: Okay.
Fluorescent Torch: Let me. I can move it.
Purple Brontosaurus: It's okay. I'm also not seeing too much of the diagram, but I think we can come back to the text editor instead. Okay, so let's just go through the diagram. But more like textually, right?
Fluorescent Torch: Yeah.
Purple Brontosaurus: Line 105. So I got the idea: you have your end user, they hit a retrieval service, you mentioned they go through some ranking service and then a re-rank service, right?
Fluorescent Torch: Yes.
Purple Brontosaurus: Okay, so can you start editing this on line 106? Let's start showing me more system design interactions.
Fluorescent Torch: So. Sorry, can you repeat the question?
Purple Brontosaurus: Let's go to line 108 on the text editor. So let's toggle away from the whiteboard. So this is... so you have like an offline mode, right?
Fluorescent Torch: Yeah.
Purple Brontosaurus: So you have your end user. They take in offline feature data. It goes through a retrieval service and a ranking service and a re ranking service.
Fluorescent Torch: Yeah.
Purple Brontosaurus: Okay.
Fluorescent Torch: Yeah, let me put them here. Yeah. Oh, okay. So for each user, the retrieval system will interact with the item database and find all those... yeah, let me... yeah, find all those items. Then the ranking stage will have user features that kick in later, in the ranking stage. So let me put it here. Okay. And the user will also have... this one is online; let's change this to the online one. Okay, so we know the system uses both online features and offline features. And after that we have the list of items and we re-rank them, with the feature store, the online feature store, and the item database. Then we get the list of posts sorted by the probability of clicking or engaging. Yeah. Okay. Then I think that's the structure. Do you want to put some other components you think are necessary into it? Yes.
Purple Brontosaurus: So we have what looks to be a good prediction pipeline: we have the flow from the end user to the list of posts sorted by probability of engagement. Can we also make the data preparation pipeline?
Fluorescent Torch: Yes, yes.
Purple Brontosaurus: So tell me how you populate this feature store and the online feature computation.
Fluorescent Torch: You mean populating this online one? Okay. So in our database, we have some raw data; this could be saved into a relational database or an online key-value store. For some features, we don't do an embedding; we calculate the mean, let's say the mean over those values. We can also get them from other transformations. That's how we populate data into the online feature store and the feature store from the raw data; some are saved here.
Purple Brontosaurus: Okay.
Fluorescent Torch: We take the mean over the values and push them. I will push them into the key-value database, so we can read them online. There are some other ones we can just load onto the GPU; for example, if we have embeddings and we want to find the k-nearest neighbors, we have to push them onto the GPU as well. Is this detailed enough for you? Yes, that's detailed. Okay.
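A minimal sketch of the data preparation step described here: aggregate raw logged values to a mean and push the result into an online key-value store. Plain dicts stand in for the real databases, and all key names are made up:

```python
import statistics

# Stand-ins for real stores (names and keys are hypothetical).
raw_events = {"user_1": [120.0, 30.0, 60.0]}  # e.g. dwell times pulled from logs
online_kv = {}                                 # online key-value feature store

def populate(user_id, events=raw_events, store=online_kv):
    """Aggregate a user's raw event values to a mean and push it to the online store."""
    store[f"{user_id}:mean_dwell"] = statistics.mean(events[user_id])

populate("user_1")
# online_kv now holds {"user_1:mean_dwell": 70.0}
```

In a real pipeline this aggregation would run as a scheduled batch (or streaming) job, with the key-value store being whatever low-latency system serves the ranking stage.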
Purple Brontosaurus: Okay, that looks good to me. Let's go through the metrics, online and offline. So tell me how you would evaluate the effectiveness of your ML models under deployment. We can actually start with offline metrics.
Fluorescent Torch: Yes. For offline metrics, it depends on the machine learning task we decided on. We can use metrics to evaluate the ranking, for example ROC AUC, which is the area under the ROC curve; the higher it is, the better our model is. If we decided to use the other training strategy, we can use NDCG to know the ranking performance of our model. And we can also have some others from the confusion matrix: since we designed a binary classifier, F1 score, precision, recall. Yeah, we can have all those.
Purple Brontosaurus: Okay, tell me about the ROC AUC curve. What does that help with?
Fluorescent Torch: The higher this value, the better: it's the probability that, for any pair of results where the user will click one and not the other, the model ranks the one the user will click ahead of the one they will not. So that's the meaningful interpretation.
Purple Brontosaurus: I'm kind of asking trade-off wise. What does the ROC AUC show?
Fluorescent Torch: Trade off?
Purple Brontosaurus: Can you tell me what type of trade-off is shown with the ROC AUC curve? You're not wrong that the area under the curve summarizes the performance of the model, but you need to tell me what the curve shows on each axis.
Fluorescent Torch: Oh, okay. The x-axis, I think, is the true... false positive rate. The y one is the true positive rate, true.
Purple Brontosaurus: Positive rate, exactly. So are they both true positive rates?
Fluorescent Torch: The first is the false positive rate. Yeah.
Purple Brontosaurus: Okay. And why do we have and why do we want to look at the trade off between the false positive rate and true positive rate?
Fluorescent Torch: For some machine learning tasks we care about, we have some cost preference. We want the model to be more.
Purple Brontosaurus: More.
Fluorescent Torch: Let's say more sensitive, or more... active? No, not active. Yeah, for some tasks we want the model to have more recall, and for others we want the model to have more precision. So with this curve we can select between them.
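The pairwise interpretation of ROC AUC the candidate gives above (the probability that a random clicked example is scored higher than a random non-clicked one) can be checked with a small sketch. The labels and scores here are made up for illustration:

```python
def roc_auc(labels, scores):
    """AUC via the pairwise-ranking definition: the fraction of
    (positive, negative) pairs the model orders correctly, counting
    ties as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0, 1]       # 1 = clicked, 0 = not clicked
scores = [0.9, 0.3, 0.8, 0.6, 0.4]
print(roc_auc(labels, scores))  # 0.8333... (5 of 6 pairs ranked correctly)
```

A perfect ranker gets 1.0 and a random one about 0.5, which is why AUC summarizes the whole true-positive-rate vs. false-positive-rate curve in one number.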
Purple Brontosaurus: Okay, that makes sense. Now let's go to online metrics. So what would you analyze?
Fluorescent Torch: The first thing I will analyze is of course the click rate. Then the rate of comments and the rate of reshares. These are point-wise. For session-wise, we have metrics like how many times the user opens the app every day, how long they spend in the app, and how frequently they go to the referenced page to deep-dive. Yeah, those kinds of session-based metrics. And I will have other business metrics: for example, if our business goal is to grow DAU, then we can check that.
Purple Brontosaurus: What are you trying to get at here with the business metrics?
Fluorescent Torch: If we have some business-related metrics we can check them. For example, daily active users, monthly active users. Yeah, these are metrics for business purposes.
Purple Brontosaurus: Okay. Can you think of any other online metrics?
Fluorescent Torch: Beyond session time and frequency? Let me reread the question... time spent, click rate. For online metrics, yeah, I think for the system performance we have to monitor how fast our system is: for example, the average time between when the user starts the request and gets the results. Yeah, those kinds of system-side metrics.
Purple Brontosaurus: I see average latency analysis.
Fluorescent Torch: Yeah.
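The point-wise online metrics discussed in this stretch (click-through rate, reshare rate, average session time) can be sketched from a toy event log. The event names and numbers below are illustrative, not values from the interview:

```python
# Hypothetical event log for one time window.
events = [
    {"type": "impression"}, {"type": "impression"}, {"type": "impression"},
    {"type": "impression"}, {"type": "click"}, {"type": "reshare"},
]
session_seconds = [120, 300, 45]  # durations of three user sessions

impressions = sum(e["type"] == "impression" for e in events)
# Point-wise engagement rates are normalized by impressions.
ctr = sum(e["type"] == "click" for e in events) / impressions
reshare_rate = sum(e["type"] == "reshare" for e in events) / impressions
# Session-wise metric: average time spent per session.
avg_session = sum(session_seconds) / len(session_seconds)

print(ctr, reshare_rate, avg_session)  # 0.25 0.25 155.0
```

In production these would be computed over streaming logs and sliced by cohort, but the definitions are the same.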
Purple Brontosaurus: Okay, that's good. I want to stress test you a bit: are there some limitations to these metrics? These are good metrics, but if we just considered only click-through rate, what's the limitation?
Fluorescent Torch: Yeah, the limitation is, if we look only at clicks, maybe the user clicks it and then quits the app, so the time they spend on it is quite low. Click rate only represents one small part of our business goal.
Purple Brontosaurus: Okay, that makes sense. So about the time users spend on the app, that's total app time spent. Is there any other time-based engagement metric you'd collect?
Fluorescent Torch: I would use time spent per session. Yeah, and the time spent each day the user opens the app, and the time per visit among all those visits.
Purple Brontosaurus: Okay, that makes sense. That makes sense.
Fluorescent Torch: Yeah.
Purple Brontosaurus: Another question I want to ask is, what about manual surveys? What if we solicit user satisfaction? Do you think this is useful? Are there any limitations?
Fluorescent Torch: Sorry, I do not quite understand. What does "solicit user satisfaction" mean?
Purple Brontosaurus: Oh, like grab. Solicit means to grab.
Fluorescent Torch: You mean we give the user a survey and ask, "Are you satisfied with this feed?" Yes, I think this is definitely helpful. For example, if the user clicks yes, then we know our model predicted correctly, and we should give more weight to those positive data points. If the user is not satisfied, it means our model failed a little bit, so for those positive samples we are not very confident anymore.
Purple Brontosaurus: Okay, that makes sense. All right. So overall that looks good to me. I don't think we have too much time left in the session; I just want to delve back and see if there's something I could explore further. Quick question: we didn't talk too much about how you get your training data. How do you get annotated training data?
Fluorescent Torch: For this one, we can use the interaction logs in our system. Every time we show the user the feed, we log all the results, and we log which ones the user clicked and which ones the user engaged with. So the training data can be gathered from that. Or, for data sources other than logs: if possible, we can give the information about the user and the post to a human labeler and ask them to judge whether the user would be interested in it. This could be helpful for the cold-start problem.
Purple Brontosaurus: Okay, so you can use a human labeler/annotator or leverage interaction logs. I have a question: if we have something like dwell time, the time spent on a post, how do we determine if it's positive or negative? Because dwell time is numeric, right?
Fluorescent Torch: Yeah, that's a very good question. I think it really depends on our business requirements. If we choose a very long time threshold, the positive label will be very rare. If we use a very short one, a user just opening and closing the post counts, and that's not good engagement. I think we can do an A/B test for this purpose: we set a threshold, train a model that increases this metric, and we see whether the other business metrics have increased or not. Otherwise we say that this threshold, this labeling rule, is not good.
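The thresholding idea discussed here, turning a numeric dwell time into a binary engagement label, can be sketched in a few lines. The 10-second threshold is purely illustrative; as the candidate says, the real value is a business choice to be validated with A/B tests:

```python
# Hypothetical threshold (seconds); a tunable business parameter, not a
# value from the interview.
DWELL_THRESHOLD_SECONDS = 10.0

def label_from_dwell(dwell_seconds, threshold=DWELL_THRESHOLD_SECONDS):
    """Binarize dwell time: very short dwells (accidental opens) become
    negatives; dwells at or above the threshold become positives."""
    return 1 if dwell_seconds >= threshold else 0

print([label_from_dwell(t) for t in [2.0, 10.0, 45.0]])  # [0, 1, 1]
```

Raising the threshold makes positives rarer (cleaner but more imbalanced labels); lowering it admits accidental opens, which is exactly the trade-off being described.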
Purple Brontosaurus: Okay, all right, that makes sense. And another question is how to select your loss functions.
Fluorescent Torch: That really depends on the machine learning objective we have selected. For example, here we framed it as a binary classification problem, so we can use cross-entropy. If we decided to learn it with contrastive learning, we can use a contrastive loss based on the similarity score. So yeah, that's the approach. But the purpose is always to make sure the model trains toward better offline performance.
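The cross-entropy loss the candidate names for the binary click/no-click framing can be written out explicitly. This is a minimal sketch with made-up predictions, not the production loss code:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over (label, predicted-probability) pairs."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A confident correct prediction costs little; a confident wrong one
# would cost a lot.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # ~0.105
```

The contrastive alternative mentioned would instead score (user, post) pairs by similarity and penalize positives scored below negatives, but the selection principle is the same: the loss follows the chosen ML objective.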
Purple Brontosaurus: Okay, but what if we use a multitask deep neural network? How do we compute the loss there?
Fluorescent Torch: Yeah, that's very important. In that system we have a very unbalanced dataset: for clicks we will have many positive labels, but for comments and the other tasks we will have very few. If we want our system to be more accurate on those smaller tasks, we can give them higher weight. This can be done with fixed weights, or we can use some automatic weighting technique in our model, so the model learns to balance the weight between each task during training.
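The fixed-weight variant of the multitask loss described here can be sketched as a weighted sum of per-task losses (e.g. the click and comment heads of a multitask DNN). The task names, weights, and loss values below are illustrative, not values from the interview:

```python
# Hypothetical fixed task weights: rarer, more imbalanced tasks
# (comments, reshares) get higher weight than the abundant click task.
TASK_WEIGHTS = {"click": 1.0, "comment": 5.0, "reshare": 5.0}

def combined_loss(task_losses, weights=TASK_WEIGHTS):
    """Weighted sum of per-task losses for a multitask network."""
    return sum(weights[task] * loss for task, loss in task_losses.items())

print(combined_loss({"click": 0.2, "comment": 0.05, "reshare": 0.04}))  # 0.65
```

The automatic alternative the candidate mentions would replace the fixed weights with learned ones (for example, uncertainty-based weighting), letting training balance the tasks itself.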
Purple Brontosaurus: Got it. Okay, that makes sense to me. We're at the one-hour mark, so I'm going to end the session here. Yeah, sure. I just want to do a self-assessment.
Fluorescent Torch: So.
Purple Brontosaurus: Let's start off with the positives, on a high note. What do you think you did well on?
Fluorescent Torch: I think I did well on the question clarification. And I did well on the parts before the prediction pipeline and the data preparation pipeline. During those pipeline sections I panicked a little bit, so after that I don't think I did very well.
Purple Brontosaurus: Okay. So where do you think you can grow? What are areas that you think you can work towards?
Fluorescent Torch: Definitely those pipeline parts: the data prediction pipeline and the data preparation pipeline. I will learn how to draw the diagram, how to explain it, and what other components I should include in it. And I think I should study more about offline metrics, so when people ask a question I know what they want to know. Yeah, I'm not sure whether I was correct about the ROC curve definition.
Purple Brontosaurus: Is there anything else that you think you did well on?
Fluorescent Torch: I think I brought up multiple model structures and talked about several learning strategies. I think that's a good point.
Purple Brontosaurus: Got it. Okay, that's good. So overall I agree with this. How do you think you did overall?
Fluorescent Torch: I think if it's one to five I should get five. Oh, sorry. Sorry. Three.
Purple Brontosaurus: Okay. So overall I think you actually performed strongly; I would rate you closer to a four than a three on a one-to-five star scale. We covered a lot of good ground. The clarifying questions were done really well. The ML problem framing, the business objective, the ML objective, and the ML category: good. The feature engineering and the data you're capturing for your features at the level of user, post, and interactions between the user and a post is also good. You're not just thinking about items at the individual level; you're also thinking about higher-level features that come with the interactions between items. That's good.

For the interactions, I did focus us a bit here, but I really wanted to stress test how well you could think about grabbing data from multiple different sources, and we covered clicks, reshares, comments, messages, likes. All that is good. And when I asked you why weightages matter, that matters a lot, because some interactions give you more signal than others: clicks give you more data points, but a like or time spent tells you more about actual user engagement. Kudos on capturing passive user information with what we call dwell time, the time spent on posts. The other one, if you're curious, which they go over in Alex Xu's book, is skip time: the time users spend not on a post but just moving around the application.

On example data, I had to clarify a bit, and I'll take note of this, but at least I understand how you generate your embeddings for multimodal content: thumbnails, images, and post content. We talked about Word2Vec, using pre-trained models, or training two-tower neural networks. You could also have talked about ViT or R-CNN / Faster R-CNN, but all of that is good; at least I know you can convert non-numeric, unstructured data into numeric, structured, fixed-size quantities. That's what I was really trying to get at. Handling upstream bias and the low volume of comment or reshare data with downsampling and upsampling, and even justifying why not to go for data augmentation: good.

Model selection and model design: that's good. I really liked the discussion of the two-tower neural network, seeing how features go into a similarity layer and then a softmax for binary classification on engagement. And when I asked you to take us to a multitask DNN, you were able to do that, show me the different branches, and still justify using a common context, dense, and concatenation layer to learn across the tasks, because click and comment are separate tasks but they share features. That's good. Also the trade-off analysis of random forests versus deep learning, and why we go for deep learning as the de facto standard.

The diagram: I know that was a bit of a messy section, because I think there was a bug with the whiteboard, but the prediction pipeline looks really good to me, and if you diagrammed this I could see how it would come together. The data preparation also looks good; I think it could use some improvement, but overall it's good. Offline metrics: you have the metrics, but you need to tell me why the metrics. If I'm probing you a bit here, it's because I want to make sure you're not just telling me "oh, ROC AUC" because it's standard.
Fluorescent Torch: Right.
Purple Brontosaurus: You should be able to tell me why you chose it, why you want to analyze false positive rate versus true positive rate, or why you want to analyze F1 score, because again there are cases where tasks call for more focus on recall versus precision. Online metrics were really good: you're thinking about session-wise, point-wise, and business metrics. I added the manual survey question just out of curiosity, to see what else you could think of, but you were very comprehensive there. Additional talking points, how you get annotated training data and how you select your loss functions, were also good. So overall, a good performance.
Fluorescent Torch: Okay, thank you. Thank you.
Purple Brontosaurus: All right, any other questions?
Fluorescent Torch: I want to ask about the pipelines, the prediction pipeline and the data preparation pipeline. While making this kind of boxes-and-arrows graph, how important is it? Can we just type it out here, or do we have to draw the graph?
Purple Brontosaurus: I feel like people like it when you can do this visually, but if something gets buggy visually, it's okay to type it out.
Fluorescent Torch: Okay. Yeah.
Purple Brontosaurus: I just know sometimes the platform has bugs.
Fluorescent Torch: Oh yeah. I just. I just have to get familiar with it. I would try it. All right, thank you.
Purple Brontosaurus: And yes, I think there's an option to connect at the end of the session. So feel free to connect.
Fluorescent Torch: Yeah, sure. Thank you so much.
Purple Brontosaurus: All right, take care. Bye.
Fluorescent Torch: You too. Bye. Bye.