
An Interview with a FAANG engineer

Watch someone solve the detect fraudulent and scam practices problem in an interview with a FAANG engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

Detect fraudulent and scam practices

Interview question

You have been tasked with developing a classifier to detect fraudulent and scam practices at Facebook.

Interview Feedback

Feedback about Atomic Parallelogram (the interviewee)

Advance this person to the next round?

Yes

How were their technical skills?

4/4

How was their problem solving ability?

4/4

What about their communication ability?

4/4

Strengths:
• Structured Approach: The candidate laid out a clear and comprehensive outline at the beginning, addressing data, features, modeling, evaluation, and deployment.
• Feature Engineering Depth: Demonstrated good understanding of relevant features, including user behavior (reports, message velocity, interaction with known offenders), content-based features (text extraction from images, sinusoidal encoding for time), and considerations for various data types (text, image, audio, hyperlinks).
• Model Awareness: Showcased knowledge of various models (e.g., CLIP, ViT, audio models) and discussed their pros and cons. Correctly identified the potential for combined image and text to be offensive even if individual components are not.
• Handling Class Imbalance: Proactively identified class imbalance as a key challenge and suggested appropriate techniques (class weights, thresholds, Focal Loss) and evaluation metrics (AUC-PR, prevalence, segmented metrics).
• Business Objective Connection: Attempted to connect the solution to business objectives like revenue impact (reducing human review), advertiser safety, user trust, and legal considerations.
• Clarifying Questions: Asked relevant initial questions regarding scale, latency, data types, and categories of fraud.
• Deployment Considerations: Discussed different deployment strategies (shadow, canary, A/B testing) and monitoring aspects.

Areas of Improvement:
• Initial Simplicity & Iteration: The proposed solution was quite complex from the outset. Consider starting with a simpler baseline model and then iterating towards complexity.
• Order of Discussion: It would be more logical to discuss model selection and its rationale before detailing very specific features.
• Critical Trade-offs: Did not explicitly ask about or address crucial trade-offs, such as the preference between precision and recall for scam detection in this specific context.
• Exploring Baselines/Alternatives: Could have discussed the potential of an existing/old model as a baseline or the concept of negative labeling.
• Nuance in Feature Design: Some feature ideas (e.g., "interaction with known offenders" as a binary feature) could benefit from more nuanced discussion, considering that many legitimate users might be contacted by scammers.
• Questioning Problem Constraints: Could have proactively questioned the necessity of a multi-task approach if one was implied, or explored simpler alternatives.
• Probing on Business Impact: While business objectives were mentioned, further probing on specific engagement metrics or deeper business impact could have strengthened the discussion.

Attaching a template that I recommend for ML System Design:

1. Clarification
• Scope & Scale: Determine if the solution needs to handle real-time data processing or if it can be batch processed. Understand the scale in terms of data volume, velocity, and variety.
• Business Objective/Key Metrics: Clearly define the business problem and the success metrics (e.g., accuracy, latency, throughput, user engagement).

2. High-Level ML Approach
• Type of Learning: Decide between supervised, unsupervised (e.g., clustering), weakly supervised learning, or other approaches like reinforcement learning.
• Trade-offs: Discuss the pros and cons of different approaches, considering factors like interpretability, complexity, and scalability.

3. Training Data
• Data Collection & Logging: Understand the sources of data, frequency of logging, and the nature of data (structured, unstructured).
• Data Imbalance: Address any class imbalance issues and the impact on the model's performance.

4. Feature Engineering
• Feature Selection & Transformation: Discuss the process of selecting, creating, and transforming features to improve model performance.
• Domain-Specific Features: Consider the inclusion of features that are particularly relevant to the business context.

5. Model Development & Training
• Objective & Loss Function: Choose the appropriate loss function based on the problem (e.g., cross-entropy for classification, MSE for regression).
• Hyperparameter Tuning: Consider factors like learning rate, batch size, model complexity, and regularization techniques.
• Trade-offs: Balance between bias and variance, underfitting and overfitting, and address challenges like cold-start problems.
• Model Bias: Consider potential biases in the model and their impact on fairness and ethics.

6. Evaluation
• Metrics: Use appropriate online and offline metrics to evaluate the model's performance (e.g., precision, recall, AUC for classification, RMSE for regression).
• Debugging & Holdout Sets: Ensure robust validation by using holdout sets and perform thorough debugging to catch potential issues.

7. Deployment
• Real-Time vs. Batch Processing: Discuss the deployment strategy in terms of real-time predictions versus batch processing.
• Monitoring & A/B Testing: Implement monitoring to detect issues in production, and use A/B testing to validate model improvements.
• Continuous Learning: Consider the need for continuous learning to adapt to data drift and maintain model accuracy over time. Discuss potential problems if continuous learning is not implemented (e.g., model degradation due to concept drift).

Feedback about Nefarious Shadow (the interviewer)

Would you want to work with this person?

Yes

How excited would you be to work with them?

4/4

How good were the questions?

4/4

How helpful was your interviewer in guiding you to the solution(s)?

4/4

Interview Transcript

Nefarious Shadow: So this will be a machine learning system design interview. The way it generally works is that I want to know a little bit more about you: what's your background, whether you have any interviews coming up, and which companies they might be with, so I can tailor my feedback accordingly. Generally the system design part lasts around 40-45 minutes, and I can give feedback after that, or, if you want, some people like to get feedback as we go. So, yeah.

Atomic Parallelogram: Yeah, I'll give you a quick background. I have a PhD in physics. I graduated about three years ago and I have my on-site with Meta next Monday. So that's what things are looking like. As for the feedback, do you find one works better than the other, during or after?

Nefarious Shadow: So it depends. So if you feel that you are ready, then it would make sense that we just kind of take it like an actual interview and then give you feedback in the end. But if you feel like you are not really sure what type of things might be asked or you don't feel like very well prepared, then generally it's better that if we take the feedback as we go. So you don't waste time trying to just work through the problem without really feeling confident about it.

Atomic Parallelogram: Yeah, I understand. Maybe we'll try to wait for the end, and if I change my mind partway through, I'll let you know. Also, as I understand it, there are two main problems, recommender systems and harmful content. I've already done some rec sys mocks. So if we could do something with harmful content, that would be good.

Nefarious Shadow: Okay. Yeah, that's. That sounds good.

Atomic Parallelogram: What did you have in mind, by the way?

Nefarious Shadow: I had recommenders.

Atomic Parallelogram: Yeah, but.

Nefarious Shadow: Yeah, that's harmful systems, but because that's actually what I work on, so this might be. Usually people don't get that type of a question, so I don't really pick that. But if you have been seeing the trend that has been asked, I can pick up that question and we can go through it. All right.

Atomic Parallelogram: So one quick question before I forget. What's the likelihood of each problem, would you say?

Nefarious Shadow: So it really depends on, for example, who does your interview. You probably have the names of the people who are doing your system design interview. If you look at what they're working on right now, if they work with advertisement, which is probably where most of the hiring is happening, or if they work within Instagram or something, then it's more likely that you will get a recommender system interview. If you see somebody with a description like working on content moderation or something like that, or any type of content problems, then you will most likely get harmful content, because they want to ask questions that they are more familiar with.

Atomic Parallelogram: Got it. Yep, sounds good.

Nefarious Shadow: Okay, so yeah, we can start with a one-liner. So this is kind of your job: you come in as a machine learning engineer, and the first thing that you might end up doing is developing something that can pick up, let's say with some confidence, that this might be fraudulent or this might be scam content, so that the content can either be taken down, be sent for manual review, or just not be recommended to users, depending on the use case and the score that the classifier produces. All right.

Atomic Parallelogram: So should I go ahead and get started then?

Nefarious Shadow: Yeah. Okay.

Atomic Parallelogram: So we'll start off with some clarifying questions. So first, let's just think of scale. How many users are there, how many posts per day, and what's the historical rate of fraudulent posts per day?

Nefarious Shadow: So historically, from what we have seen using human-reviewed samples, let's say maybe two percent of the content falls under fraudulent or scam practices. And the total volume of content on a daily basis can be in the billions.

Atomic Parallelogram: Billions?

Nefarious Shadow: Okay. Yeah, tens of billions. So that's the total volume. Yeah. And 2% of that is fraudulent.
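To make the scale concrete, the figures quoted above can be turned into a rough daily and per-second load. This is only a back-of-the-envelope sketch: it assumes "tens of billions" means 10 billion items per day as a lower bound, and the function names are made up for illustration.

```python
# Back-of-the-envelope sizing from the numbers quoted in the interview.
# ASSUMPTION: "tens of billions" is taken as 10 billion items/day (a lower bound).
DAILY_ITEMS = 10_000_000_000
FRAUD_PERCENT = 2  # roughly 2% of content, per the interviewer
SECONDS_PER_DAY = 24 * 60 * 60


def fraud_items_per_day(daily_items: int, fraud_percent: int) -> int:
    """Expected number of fraudulent or scam items per day."""
    return daily_items * fraud_percent // 100


def average_items_per_second(daily_items: int) -> float:
    """Average scoring throughput needed if traffic were perfectly uniform."""
    return daily_items / SECONDS_PER_DAY


if __name__ == "__main__":
    print(fraud_items_per_day(DAILY_ITEMS, FRAUD_PERCENT))
    print(round(average_items_per_second(DAILY_ITEMS)))
```

Under that assumption the classifier sees on the order of 200 million bad items per day and has to score more than 100,000 items per second on average, before accounting for traffic peaks.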

Atomic Parallelogram: Got it.

Nefarious Shadow: Okay.

Atomic Parallelogram: And then for the latency, how fast does this need to perform?

Nefarious Shadow: So for the latency, we can assume, I would say, probably no more than half a minute, like 30 seconds.

Atomic Parallelogram: 30 seconds, okay.

Nefarious Shadow: Yeah.

Atomic Parallelogram: And then just a couple more quick questions regarding the data and the labels. So as far as the data is concerned, are these like, you know, posts like on Instagram or are they like something on like Facebook Marketplace? What's the context of this?

Nefarious Shadow: So you can think about it. This is on Facebook and this would be posts that the people do which includes some type of media and also the comments that people leave. Right.

Atomic Parallelogram: Yep. And so presumably there can be images, hyperlinks, videos, text. Yeah, okay. And then for the labels, I can think of at least two ways to generate them: from user reports and from human annotation. Is it fine to assume I can use these to generate labels?

Nefarious Shadow: Yes, so we do have some people who are hired to do reviews, and we do also get user-reported content. So that is also a source which can be used. Generally, user-reported content goes for a manual review.

Atomic Parallelogram: Got it.

Nefarious Shadow: All right.

Atomic Parallelogram: Well, I think I'm ready to get started then. So let's just start off with some business objectives. This one, unlike the recommender system, doesn't have as direct a connection to revenue, but it still does have some. If you have a better system that can reduce the need for human annotation, that's one way you can improve the revenue. The other is by promoting a safer environment for advertisers. They don't want to advertise on a platform where there's a lot of negative content. The other is to improve the trust and safety of the users. And finally, I can think of another reason, which is that there could be some legal considerations, like harmful content not being suggested to minors and things like that. So first, I'll just give an outline of the system as a whole. I don't know if I can draw anywhere here, like a whiteboard.

Nefarious Shadow: You can, there's a whiteboard toggle. If you want to just do some rough boxes, that's fine.

Atomic Parallelogram: Okay, yeah, maybe I'll just draw it out real fast. You can see my screen, right?

Nefarious Shadow: Yep, I see it.

Atomic Parallelogram: Okay, so you're going to have some user, and they're going to interact with the client, which is Facebook, to make a post. This is going to come down into this service, right? And it's going to store the post and the associated data, the metadata and content, in our feature store. And there will also be some trained model that it sends the features to, the classifier, which I'll talk about in a second. And then on the output end, there's going to be a few different situations. So if the post is clean, just send it back to Facebook and post it so that the user can see it. But then there's a few different situations. So you could have a situation where you have a post with low confidence that it's harmful. Here there's a variety of things you can do. You can demote it to limit its spread. You can flag it for further review by humans. And I think it might even make sense that when you retrain the model... actually, maybe this will not be for low confidence. Let me come back to that point in a moment. Okay, so that's for the low confidence ones. For the posts on which you're highly confident, you can just immediately remove them. And you also want to, in this case, tell the user why it was removed. And also, maybe I should have asked this in the clarifying section, but for the fraud and scam aspect, are there different categories of these, or can I just assume it's one kind of fraud, one kind of scam? Or are there, like, maybe crypto scams?

Nefarious Shadow: Yeah, there are multiple. We can assume there are multiple, but the type of them shouldn't be a concern. But yes, there are multiple scams and frauds. Okay, yeah.

Atomic Parallelogram: So then in that case, when you do, you know, identify that it's a scam or fraud, you want to notify the user for, you know, why that was the case. So that's kind of the high level design of what the system will look like. So let me just give a brief outline of some things I want to go over. So we're going to have the data. So, you know, features, labels, feature engineering, right? Then we're gonna have the, you know, modeling aspect. So, training it, evaluating it, and then deploying the model and monitoring the model. I feel like I'm forgetting something, but Maybe I'll figure out as I go along.
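The routing in the high-level design above (publish clean posts, demote and flag low-confidence ones for human review, remove high-confidence ones and notify the user) can be sketched as a simple threshold rule on the classifier score. This is only an illustration: the two threshold values and all names are hypothetical and would in practice be tuned against the precision and recall trade-off.

```python
from enum import Enum


class Action(Enum):
    PUBLISH = "publish"                      # post looks clean
    DEMOTE_AND_REVIEW = "demote_and_review"  # limit spread, queue for human review
    REMOVE_AND_NOTIFY = "remove_and_notify"  # take down and tell the user why


# ASSUMPTION: illustrative thresholds only; they are not from the interview.
REVIEW_THRESHOLD = 0.5
REMOVE_THRESHOLD = 0.95


def route(score: float,
          review_threshold: float = REVIEW_THRESHOLD,
          remove_threshold: float = REMOVE_THRESHOLD) -> Action:
    """Map a scam probability in [0, 1] to one of the three actions."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= remove_threshold:
        return Action.REMOVE_AND_NOTIFY
    if score >= review_threshold:
        return Action.DEMOTE_AND_REVIEW
    return Action.PUBLISH


if __name__ == "__main__":
    for s in (0.1, 0.7, 0.99):
        print(s, route(s).value)
```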

Nefarious Shadow: Sure.

Atomic Parallelogram: Data, model. Yeah, so this is, you know, what it looks like. So let's think about what the data looks like. So we'll have, of course, three sources. We'll have the user. And then this is some kind of Facebook post, right, for the item. And then we'll have user-post interactions. And we'll go through these one by one. So for the user, there's different kinds of data associated with them. There's demographics, there's contextual information. There's also social information. So let's go through those. So we'll have the age, gender, location. What else? Age, gender, location.

Nefarious Shadow: It's fine. Yeah, so we don't have to really dig into all the features.

Atomic Parallelogram: And so for the context, we'll have device, time of day, things like that. Social, so this will include things like the number of reports they've received in the past, right? A lot of bad accounts tend to spam messages out, so you could use a feature like number of messages sent in the last n seconds. N can vary. You could have one for the last 30 seconds, the last minute, the last hour, whatever. Because I might expect that bad accounts that are spreading spam, or not spam, but fraud and scams, would be sending out a lot of messages in a short amount of time. Another one would be interactions with known offenders. And let's see. Maybe I'll come back to this, but I think that should be enough to get started. For the post, we're going to have, of course, some text associated with it, like the post itself, right? We're going to have an image associated with it, or we can have an image, rather. And here it's important that the image can contain text itself. You can extract it with OCR. So this is often the case in memes. There can be video, audio, and then I'll just include one last one, some kind of hyperlink. And then for the user-post interactions, again, we could have number of reports, number of comments, number of likes, the actual comments themselves. And I think that should be a good place to start. Now let's go through and process these features before we feed them to the model. So for the age, you can handle it a lot of different ways. One option is to bucketize it and then one-hot encode it. So here you'd have 0 to 18 as a bucket, 18 to 24 and so on, and then one-hot encode it. Same with the gender. Location, usually you get this as latitude and longitude. And you definitely want to be aware of some privacy concerns here. You don't want to have their exact location. But in any case, since there's going to be so many different locations, it might be best to use an embedding layer here. For the device, whether they're on a mobile device or a laptop, this shouldn't be a high-cardinality feature. So you should be able to use one-hot encoding. Time of day, there's a few different things you could do. You could of course one-hot encode it for morning, afternoon, and night. I think a better option is to use a sinusoidal transform here, because 11:00 p.m. is going to be close to 1:00 a.m. and that takes it into account. Number of reports, I think maybe the best way here... I'm trying to think if it's going to have a skew, is what's going through my mind right now. Maybe it might be best to just bucketize and one-hot encode. So you could have, you know, zero is not a lot of reports, one is a lot of reports, and so on. You could also do something maybe like spontaneous binning here. So number of messages sent in the last n seconds, I think here probably just use some kind of standardization. And interactions with known offenders, maybe it's simplest to just have this as a one or a zero, basically. So just one-hot encode that, like one if they've interacted with a known offender and zero otherwise. Okay, so now let's look at the post itself. So I guess I should have asked previously, but is this model global or multilingual?
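Two of the transforms described above, bucketizing age into a one-hot vector and encoding time of day with a sinusoidal transform, can be sketched as follows. This is a minimal illustration, not the candidate's implementation: the bucket edges and function names are hypothetical.

```python
import math
from typing import List, Tuple

# ASSUMPTION: hypothetical bucket edges; the interview only mentions
# "0 to 18, 18 to 24 and so on".
AGE_BUCKET_EDGES = [18, 25, 35, 50, 65]


def one_hot_age(age: int) -> List[int]:
    """Bucketize age by the edges above and return a one-hot vector."""
    index = sum(age >= edge for edge in AGE_BUCKET_EDGES)
    vector = [0] * (len(AGE_BUCKET_EDGES) + 1)
    vector[index] = 1
    return vector


def cyclical_hour(hour: float) -> Tuple[float, float]:
    """Map an hour in [0, 24) to a point on the unit circle.

    Hours that are close on the clock (for example 23:00 and 01:00)
    end up close in this two-dimensional encoding.
    """
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)


if __name__ == "__main__":
    print(one_hot_age(17), one_hot_age(30))
    late, early, noon = cyclical_hour(23), cyclical_hour(1), cyclical_hour(12)
    print(math.dist(late, early) < math.dist(late, noon))
```

The point of the sine and cosine pair is that a plain integer hour would place 23 and 1 far apart, while the circular encoding keeps them two hours apart, the same as 11 and 13.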

Nefarious Shadow: Yes, so it generally works all across the world, but I think for starters, let's say we are focusing on the North American market and only on English.

Atomic Parallelogram: Got it. Okay, well in that case, since we only have to use English, and since this is a social media platform, there's probably going to be a lot of emojis. And in this case, using some kind of pre-trained model that uses byte pair encoding will probably handle the emojis better. It usually works better with out-of-vocabulary stuff. So here, use CLIP. We could probably just combine the text, the image, and also the text associated with the image, and just pass them all through CLIP. So that'll take care of these. And a few quick remarks on this. So of course we're using pre-trained models, which are general purpose, but we can fine-tune if we need to. All right, we should have enough data given the assumptions earlier. And there's also some pre-processing we need to do, right? So we need to, you know, remove leading and trailing whitespace, lemmatize, things like that. We also need to tokenize, and CLIP, I believe, uses byte pair encoding for the tokenizer, so we need to use that. And then for the image aspect of it, I don't remember, I think there's some kind of vision transformer that it uses for CLIP, but I do know, in any case, you're going to have to resize it, you have to normalize it, and you're going to have to scale it appropriately to the distribution it was trained on. Now let's address the video. You can use pre-trained models to just do the video directly, but I think it might be better to just extract every Nth image, and you could treat N as a hyperparameter. And then you could use CLIP. Maybe here you just use ResNet. And again, you have to resize, normalize, and scale the images and then aggregate them. So one option to aggregate them is to simply take the average, but there's other options as well. For the audio, there may be just a standalone audio file, but there will also often be audio associated with a video that you want to pick up on. So here we can use wav2vec. I don't think you need to do a lot of pre-processing. I think the main thing is you just need to downsample it to 16 kilohertz. And then for the hyperlinks, this one's a little bit more difficult. Just starting off simply, you could have a blacklist, right? So for sites that you know are bad, you can just use a rule-based approach here. Another option would be to use some tools like Selenium or Beautiful Soup to extract metadata and also the homepage image. So you go to the site, take a screenshot, and then use that image information. Also, another thing you could do at this point is some kind of object detection. Maybe I'll have, you know, to identify... actually, no.
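The video handling described above (take every Nth frame, embed each frame with an image model, then average) can be sketched as follows. This is a minimal sketch under stated assumptions: `embed_frame` is a hypothetical stand-in for whatever image encoder is chosen (CLIP's image tower or a ResNet), and mean pooling is just the simplest of the aggregation options mentioned.

```python
from typing import Callable, List, Sequence


def sample_every_n(frames: Sequence, n: int) -> list:
    """Keep every n-th frame; n is treated as a hyperparameter."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return list(frames[::n])


def mean_pool(embeddings: List[List[float]]) -> List[float]:
    """Average a list of equal-length embedding vectors element-wise."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


def embed_video(frames: Sequence, n: int,
                embed_frame: Callable[[object], List[float]]) -> List[float]:
    """Sample frames, embed each one, and mean-pool into one video vector.

    ASSUMPTION: embed_frame is a placeholder for a real image encoder and is
    expected to handle resizing and normalization itself.
    """
    sampled = sample_every_n(frames, n)
    return mean_pool([embed_frame(frame) for frame in sampled])


if __name__ == "__main__":
    # Toy "frames" are integers and the toy encoder is a lambda, for illustration only.
    fake_frames = list(range(10))
    print(embed_video(fake_frames, 5, lambda f: [float(f), 1.0]))
```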

Nefarious Shadow: Yeah.

Atomic Parallelogram: So anyways, and then for the user-post interactions. So here, oh, I already have them. So, yeah, for the number of reports, I think standardize. Actually, no. Let's stick with what I did earlier. So bucketize plus one-hot encoding. Actually, yeah, whatever. Number of comments. Yeah, I expect this to actually be pretty skewed. So here, maybe do something like a log transform. And actually the same for here, do a log transform. And then for the comments themselves, probably just use CLIP from earlier to process all the comments and then aggregate them. All right, so that's enough for the data. Now let's think about the actual model itself. So here we have a decision kind of immediately on whether to do early fusion versus late fusion. So late fusion is, you know, you have a...

Atomic Parallelogram: Actually, you said there's different kinds of scams, right? Yeah, multiple kinds. So, yeah, I guess if there's, like, a crypto scam versus... I'm drawing a blank on the other kinds of scams that there could be. Maybe a bank scam or something like that. You could have a model for each kind: crypto scam, maybe like a Cash App kind of scam, and so on and so forth. And so the pro of this is you can train each model independently. But the con is you need to maintain many different models, which requires a lot of infrastructure and compute. You also have to dilute the data a bit across each. Give me a second to think. Yeah, so with late fusion, you build a model to detect each category and then combine the results at the end. But yeah, maintaining all these different models is going to require a lot of compute infrastructure. The other big drawback is that you can't learn joint distributions. So it could very well be the case that, you know, maybe the text alone is fine. The hyperlink... no, no, no, no. How would that apply here? Give me a moment to think. Maybe this wouldn't apply here. I'm drawing a blank, so maybe I'll ask you to step in for a moment, actually. I changed my mind. Is there the potential for multiple things to interact to produce something harmful here?

Nefarious Shadow: What do you mean?

Atomic Parallelogram: So, say someone makes a Facebook post and you're trying to detect whether it's harmful, and you have a meme, right? So when you have a meme, the image by itself can be safe, the text associated with the image can be safe by itself, but when you combine them, it's bad, right?

Nefarious Shadow: Okay, so if you look at them by themselves, they're fine. But when you combine them, then that becomes an offending thing. Yeah.

Atomic Parallelogram: Let me give you an example. If there's text that says, I hate these parasites and the image is a bunch of parasites, well then that's fine. But if the image is a person of a certain race, then you would deem it racist, right?

Nefarious Shadow: Yeah.

Atomic Parallelogram: So is there an analogous situation here?

Nefarious Shadow: I think yes. So there will be some type of content where, just giving you an example, maybe you're a bank, right? And you're offering some loans to people, which is fine, because you're a bank and you send messages or you post about giving out loans. But then there might be a case where there's a post about offering loans, and the image just contains maybe money in it and just says something like "quick loans, all approved", and the text says "we are offering loans, click here to apply". So, I mean, there can be cases where the picture combined with the text might actually become a target, but by themselves, for example, just a picture of a lot of cash is fine.

Atomic Parallelogram: Yeah, yeah, yeah.

Nefarious Shadow: If you are just showing people throwing cash in the air, that's fine. But if you're like, "I'm going to give you a loan", and that's the picture associated with it, well, generally a bank wouldn't do that. That sounds like a scam.

Atomic Parallelogram: Yeah, great example. I think that's pretty good. All right, so I'll continue on from there. So okay, then in that case, you wouldn't want to use late fusion, for that reason: you can't learn the joint distributions. So then you have the option of early fusion. So here, instead of training a model on each kind of scam, you concatenate all of the features above immediately and then train a single model. Let me go back to the whiteboard and start to zoom in on the model itself. So, actually, let me just draw it. So, there's a variety of different options as far as the model is concerned. You know, you could just have logistic regression. There's a few different problems with this. One is that it's a binary classifier, so it can't identify the kind of scam, which may or may not be important. When there's collinearity in the features, it's known to struggle. Often it doesn't perform as strongly as other models. And lastly, it assumes the data is linearly separable, which is often not the case. It does have some pros. It's fast, scales well, and so on. You could instead consider something like a gradient boosted decision tree. So some pros here include minimal feature processing. You don't need to pre-process it like you do for the other ones. It's fast, scales well. But there are some cons. So one con in particular is no online training. So you have to retrain the entire model. You can't continuously train it, which is a big drawback. Another one is that it usually doesn't perform as well as something like a neural network when you have a lot of data, which is the case here. And then before I forget, let me make a note for class imbalance. So you probably want to use some kind of neural network approach. It kind of alleviates a lot of the problems from before. You're very free in the input layer. It is very flexible. You know, you can set it up to be multi-class, multi-label, multi-task, and so on and so forth. You get continuous learning, and of course it performs well when you have a lot of data. And so let's briefly go over what the model might look like. So yeah, you're going to have the input features. Here, it might make sense to run them through kind of a lightweight MLP to preprocess them, and then... so we'll talk about...

Nefarious Shadow: A lightweight MLP from the features? So these are the features that will be the embeddings coming out of the models that you described, like CLIP or something like that?

Atomic Parallelogram: Oh, yeah, sorry. I didn't mean for this arrow to be coming from the feature store. But sorry, what was your question again?

Nefarious Shadow: Yeah, yeah, I just wanted to know what the inputs to this lightweight MLP are.

Atomic Parallelogram: Oh, yeah, sorry, sorry. Yeah, I guess it could come from there. Sorry, I wasn't clear about that. The user features, the post features, and then the user-post interaction features will be the input to the model. And this will just learn some kind of embedding. And from here, I think maybe the best way is to do multitask, where you have some kind of head for each kind of scam, whether it's a crypto scam, a Cash App scam, bank related, whatever.

Nefarious Shadow: Right.

Atomic Parallelogram: And so you feed these features to each of these heads. And then let's talk about how to train it. So you're going to have some sigmoid activation at the end. And then for the loss function, since each one of these is a binary classifier, you could use something like binary cross entropy or, you know, normalized cross entropy, which is just binary cross entropy divided by the average. But there is a class imbalance, so there's a few different ways we can handle it here. So for the imbalance, you can use class weights. So like you said, 2% of the posts will be bad. So you could use class weights. You could tune the threshold. But I think maybe a better option at this stage would be to use the focal loss to handle the class imbalance. And you apply the same procedure across them all, and then at the end, the final loss is just the sum of the losses from before. And yeah, so the inputs will be what I described earlier, and then the outputs will be the label for each one. It'll be a vector like 1 1 0, or 1 0 1, 0 0 1, and so on and so forth. And then probably use standard things like ReLU activation, dropout regularization, things like that. So the class imbalance we addressed. Before I forget, let me write this down. Actually, maybe I'll just bring it up now. Yeah, there is going to be some kind of label bias. So from what I've seen, this is actually a really hard problem. If you have three different people label something, they might... well, actually, in the case of scams, I think maybe it wouldn't be... in any case, maybe we'll come back to that.
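The training objective described above, a focal loss per scam-type head with the head losses summed, can be sketched in plain Python as follows. This is an illustrative sketch rather than the candidate's exact formulation: `gamma=2.0` and `alpha=0.25` are commonly used focal-loss defaults, not values from the interview, and the function names are made up.

```python
import math
from typing import Sequence


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one predicted probability p and label y in {0, 1}.

    The (1 - p_t) ** gamma factor down-weights easy, well-classified examples;
    alpha weights the positive class and (1 - alpha) the negative class.
    ASSUMPTION: gamma and alpha are tunable hyperparameters, defaults are illustrative.
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)           # avoid log(0)
    p_t = p if y == 1 else 1.0 - p            # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multi_head_loss(probs: Sequence[float], labels: Sequence[int],
                    gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Sum the focal loss over the per-scam-type heads for one example."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    return sum(focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels))


if __name__ == "__main__":
    # One example with three heads, e.g. crypto, payment-app, bank (hypothetical order).
    print(multi_head_loss([0.9, 0.2, 0.05], [1, 0, 0]))
```

With `gamma=0` the modulating factor disappears and the loss reduces to an alpha-weighted binary cross entropy, which is one way to sanity-check an implementation.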

Nefarious Shadow: Actually, yeah, let's talk about that a little bit. So you're talking about label bias where there's multiple people who might label the same content differently. Is that what you're talking about?

Atomic Parallelogram: That's right.

Nefarious Shadow: Okay. So how do you think we can kind of handle that? Yeah, that's a real problem.

Atomic Parallelogram: I guess in the case of, you know, say maybe, maybe here's an idea, right? So in the case of, like, someone pushing some stock. All right. Would that even be a scam? It's hard to say. Like, say someone's saying, buy this NFT meme coin that they're gonna pump and dump.

Nefarious Shadow: Right.

Atomic Parallelogram: Like, that's a scam. Maybe that would be a.

Nefarious Shadow: There's a difference, I think. The thing that you're describing here is that you yourself are having trouble trying to identify, like, would this be a scam or not? So how do you identify that in your model? Like, how do you train a model so that it can differentiate between scam and scammy posts, ones that look scammy, but are not technically a scam? They might be down the line, but with the information that you have right now, you cannot really make a decision.

Atomic Parallelogram: I'm guessing maybe the first thing that comes to my mind is the output probability. So if the model outputs like 0.5, right, then that suggests it's not confident one way or the other on whether or not it's a scam.

Nefarious Shadow: Okay.

Atomic Parallelogram: I think that's, maybe the best I can come up with right now.

Nefarious Shadow: Is there something we can do on the labeling side that might help with it?

Atomic Parallelogram: Oh yeah, you could. You could assign like a confidence maybe to the labels. So when you're actually going to do the labels, you know, if you're very confident in the scam, you could assign it more weight in the loss function. I'm guessing that's what you had in mind.

Nefarious Shadow: Something like that, yeah. So more or less what I was thinking was having multiple people review the same content and pick the majority, you know, something like that. So we kind of try to tune out the bias there by showing the same thing to more than one person. That's one of the things. But I think in the end that turns out to be like a label weight, because the higher the percentage of people that mark this as a scam, the more the label weight goes up.
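The labeling idea discussed here (show the same post to several reviewers, take the majority, and let the share of agreeing reviewers act as a label weight) could be sketched as below. The function name and the tie-breaking rule (a tie is treated as "not a scam") are assumptions made for this example.

```python
def aggregate_votes(votes):
    """Combine independent reviewer votes (1 = scam, 0 = not a scam).

    Returns (majority_label, weight), where weight is the fraction of
    reviewers who agree with the majority label. Ties fall back to 0,
    which is an arbitrary choice for this sketch.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    scam_fraction = sum(votes) / len(votes)
    label = 1 if scam_fraction > 0.5 else 0
    weight = max(scam_fraction, 1.0 - scam_fraction)
    return label, weight
```

The returned weight can then multiply that example's term in the loss, so unanimous labels count fully and contested labels count less.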

Atomic Parallelogram: Right, right, right.

Nefarious Shadow: Okay.

Atomic Parallelogram: Should I continue?

Nefarious Shadow: Yep.

Atomic Parallelogram: All right. So, you know, now we've got this model built, let's think of some kind of offline ways to evaluate it. So, since it's a binary classifier, you can use, you know, the classics: accuracy, precision, recall. Accuracy, not good because of the imbalance. Recall, precision, those are fine. In this case, you probably want to favor recall over precision. You want to catch all of the scams, right? You don't want to let any through. And then the problem with these, though, is that they're dependent on some kind of threshold. So a metric like the area under the ROC curve is a bit better because it operates over multiple thresholds. And I think even better than that, and the one I'll go with, is the area under the precision-recall curve. So this tends to perform better when there is imbalance, and we have a very large imbalance. And then finally, I think the last step would be to just discuss the deployment and online metrics.
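To make the threshold-free metric concrete, here is a small sketch that estimates the area under the precision-recall curve as average precision: the mean of the precision values measured at the rank of each true scam. The function name is illustrative, and tied scores are not handled specially, which is a simplification.

```python
import numpy as np

def average_precision(y_true, scores):
    """Average precision, a common estimate of the area under the PR curve.

    y_true: 0/1 labels (1 = scam). scores: model scores, higher = more scam-like.
    Assumes at least one positive label.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")      # rank posts from highest score down
    ranked = y_true[order]
    true_positives = np.cumsum(ranked)
    precision_at_k = true_positives / np.arange(1, len(ranked) + 1)
    return float((precision_at_k * ranked).sum() / ranked.sum())
```

Unlike accuracy, this number does not reward a model for simply predicting "not a scam" on a dataset where almost everything is benign.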

Nefarious Shadow: So.

Atomic Parallelogram: First, I think it's best to start with coming up with how you want to monitor it before you deploy it. So here, assuming users can report these scams, maybe something you would want to log is the number of reports. And you might even want to normalize this by the posts per day. If you have more posts, you might expect more reports. Here you might want to use something like a proactive rate: how many scams you identify before they're reported. Although since we're assuming, what was our time? 30 seconds? Okay, yeah, that's fine. And then finally, you could have something like prevalence. So prevalence is how many scams per post persist on the platform. One problem with this is it doesn't take into account how popular a scam is. So if you have a bunch of scams but they get no views, no impressions, versus one scam that gets a million impressions, that matters a lot. So here you might want to use something like the harmful impressions idea, where it's how many people interact with the scam. And then, now that we have these metrics in mind, let's talk briefly about the deployment. So here there's a variety of options. You could use something like shadow deployment. So here you deploy both models, but you only serve the old one. The problem with this is a lot of infrastructure, a lot of compute. You could use a canary deployment where you deploy to a certain demographic. And also with that in mind, let me backtrack just a bit. On these evaluation metrics, it could be the case that there's a bias present over some demographic. Maybe this model performs better over, or not that, what am I thinking? Cash App scams. Maybe it's really good at identifying Cash App scams, but not the bank scams or Telegram scams for that matter.
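The online metrics mentioned in this answer (prevalence, harmful impressions, and a proactive rate) can be written down in a few lines. This is only a sketch: the `Post` record and its field names are hypothetical, and in practice the ground-truth scam flag would come from later human review.

```python
from dataclasses import dataclass

@dataclass
class Post:
    is_scam: bool               # ground truth, e.g. from later review (assumed available)
    impressions: int            # how many times the post was seen
    caught_before_report: bool  # model flagged it before any user report

def prevalence(posts):
    """Share of posts on the platform that are scams."""
    return sum(p.is_scam for p in posts) / len(posts) if posts else 0.0

def harmful_impression_rate(posts):
    """Share of all impressions that landed on scam posts."""
    total = sum(p.impressions for p in posts)
    harmful = sum(p.impressions for p in posts if p.is_scam)
    return harmful / total if total else 0.0

def proactive_rate(posts):
    """Share of scams the model caught before a user reported them."""
    scams = [p for p in posts if p.is_scam]
    return sum(p.caught_before_report for p in scams) / len(scams) if scams else 0.0
```

For example, five unseen scams among ten posts give a prevalence of 0.5 but a harmful impression rate of 0, while a single scam with a million impressions gives a prevalence of 0.1 and a harmful impression rate close to 1, which is the contrast drawn above.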

Nefarious Shadow: Right?

Atomic Parallelogram: So you'd want to segment, you know, the evaluation to see where it's performing well and where it's performing badly. So for the canary deployment, you want to deploy to certain demographics or maybe certain regions, but there's some bias, as I just mentioned. Finally, the kind of standard way is to do some kind of A/B testing. And in this case, since we have a very, you know, sensitive thing we're dealing with, it'd probably be best to gradually roll it out. You don't want to just deploy the model, have it not perform well, and then expose a bunch of people to harmful content. So you might want to start off with a 90-10% split, and then once you gain some confidence, slowly roll it out to 50-50.
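One common way to implement a gradual rollout like the 90-10 then 50-50 split described here is deterministic hash bucketing, so that a user who is in the 10% treatment group stays in it when the share grows. The sketch below assumes string user IDs; the salt value and function name are made up for this example.

```python
import hashlib

def in_treatment(user_id: str, treatment_pct: int, salt: str = "scam-model-rollout") -> bool:
    """Assign a user to the new model's traffic share.

    The user is hashed into one of 100 stable buckets; buckets below
    treatment_pct get the new model. Raising treatment_pct from 10 to 50
    only adds users, it never moves anyone back to control.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < treatment_pct
```

Changing the salt reshuffles the buckets, which is useful when a later experiment should not reuse the same treatment population.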

Nefarious Shadow: What metric are we using here to decide if we can roll out or not?

Atomic Parallelogram: Thank you. I think, of course, you want to take them all into account, but I think harmful impressions would be it if you had to pick one. You could just weight them, too. You could take some linear combination of these as well. But I think harmful impressions makes the most sense to use as our metric.

Nefarious Shadow: Okay.

Atomic Parallelogram: And I think that. I think that's it.

Nefarious Shadow: Okay, cool. I think overall, I can say you did a really good job, especially given you just finished school and you don't have a lot of experience. Generally this kind of thing I would expect from somebody who probably has at least maybe seven to eight years of experience, and who also probably has experience in this field, because the kind of things that you're talking about do seem very specific to this type of problem. So I'm just wondering, do you work on this kind of thing, or did you just happen to study this during your time?

Atomic Parallelogram: I mean, I have a background in ML and I've never had a real job in my life. I have no real formal training in ML. I did my PhD in physics, so I don't know if that answers your question.

Nefarious Shadow: No, I think that's, yeah, so I think you did really well. I think from all the things that you have said so far, you do have a really good grasp of the product area as well. So it's not just ML, because there's a lot of people who have really good ML, but it's good to see a person who can relate the ML back to the product. That shows in what kind of metrics you pick. For example, the metrics about prevalence and harmful impressions, these are things someone who hasn't worked with this type of content probably wouldn't have in their mind. So I think that's something good. You identified the biases as well that might be here, especially the label bias. That's something I really haven't heard somebody talk about before outside of work. And I think you're aware of handling the different types of content, especially the hyperlinks and the audio files. I think that was good. People usually don't mention that; they usually stick with the image and text. So I think that's a big plus. I also wanted to mention that regarding the class imbalance, the loss type that you mentioned, focal loss, I think that was very insightful as well. Yeah. So when it comes to improvement, if I were to say one thing that you might want to work on, it would be asking a few more questions and starting simple. For example, you did talk about the shadow deployment, where we have both models. I never mentioned we have an old model.

Atomic Parallelogram: You mean on the deployment aspect?

Nefarious Shadow: Yeah. So you didn't ask, do we have a model? We don't. So my mind was like, we don't have any model.

Atomic Parallelogram: Got it.

Nefarious Shadow: One of the things that I would have liked you to ask would be regarding the metrics, maybe towards the beginning as well, like the business objectives, right? So you did talk about revenue here. I would say it's useful if you try to fish this information out of the person first. So you might want to start off with, like, so what kind of objectives are you looking at here? They might have something in their mind they might talk about, or they might just throw the question back at you, like, what do you think? But it's always good to just try it out, just ask them. Sometimes you might get an answer. For example, in this type of problem, the general highest concern is user engagement, right? Besides the legal thing, you know, that's obviously there. So we've got trust and safety, that's fine. But for some tangible thing, you want to make sure that user engagement remains high, that we don't impact it in any way. Too many false positives might discourage people from posting stuff. And too many false negatives might just make the platform full of foul content, where people just don't enjoy coming back to it anymore. So try to phrase this thing so you talk about both things. Usually, even if you're, I think you were talking about the recommender system, even in that, you know, do mention both of the things, revenue and user engagement. And usually people will ask you a question like, how do you measure user engagement, or how would you measure the revenue impact? So be ready to talk about a tangible metric that you can mention at that point. Like, I guess I will measure maybe post clicks or user posts per day or how many likes are happening. You don't have to make up the exact best metric. Just make something up that sounds reasonable.

Atomic Parallelogram: Got it.

Nefarious Shadow: And I think the next thing is the metrics that we were talking about. So we talked about evaluation, like how we evaluate the model. You were talking about how we would favor recall. I think that would be a question to ask. Like, what are the goals here? Are we trying to reduce the volume of the scams? Are false positives more expensive than false negatives? There might be a case where, well, you know what, if you have too many false positives, if you're focusing solely on recall, we might be taking down quite a lot of content, and that might cause a little bit of friction with users. So, you know, we want a really high precision model. It's fine if you catch less stuff, but we want it to be very, very confident. So I think this is the thing that you could have asked that would have come out and might have influenced the way you select the metrics and also your training weights, because if you're favoring precision over recall, it changes the way that you assign weights, because now you want the model to be more confident in, like I say, having a really high precision for the negative class. It kind of shifts so that you want to make sure that it is very confident in picking the negative class at the cost of the positive one, because we want true labels to be really true. So I think this is one of the things. I would say in terms of simplicity, the model structure that we talked about, you did mention that we might do a multi-task model, but you were not sure if we need it, and you ended up doing it anyways. In the design, you kind of start to break it down, like we have a model for doing XYZ scams. Again, that might be a good question to ask, like, do we need to break it down? Or how do you think about the business case? Like, do we have cost constraints? Do we have money to be able to train all these models, or are we constrained?
And that's fine, you might be able to say, we don't know, how do you think about it? So generally I would err on the side of simple, mention how complex it could have been, and then go with the simple one. It's like, we just have a binary model which takes in all these features into one MLP classifier, and we have a binary score. The benefit of that is that it is very easy to read. So if you are supplying the score to, let's say, Facebook or Instagram or WhatsApp, it's much easier for them to be like, okay, I have this content, they gave me back a score, and based on that, I can decide if I want to demote this content, remove this content, or report this content for manual review. But if you have multiple scores for, let's say, 10 different types of scams, then that becomes a little bit of a problem. So I would say just start very simple and then mention what this could have been if we had more time, if we had more resources. But try to follow the simple path. If at the end you feel like there's a lot of time, then always just go back and talk about the things that you mentioned, like, I could have added XYZ here. That generally makes the flow a little bit easier as well, gives you a chance to finish the question right on time, and then you add stuff on top of it. So just in case, let's say, the time ran out, at least you have a basic approach ready to go. So it's more of a time optimization strategy when you're doing the interviews. So yeah, I think that's mainly it. I would probably say try to ask a few more questions, try to fish as much as you can from the interviewer, and try to go simple first and mention the things that you could have done, like, you know, pros and cons, but just follow the simple path and then go back to it if there is time. But I would say overall, this was very good. I just wanna know, what level are you applying for?
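The feedback above touches two things that are easy to show in code: choosing an operating threshold once the precision-versus-recall preference is known, and letting a product team map a single binary score to an action. The sketch below is illustrative only; the function names and the cut-off values `review_at`, `demote_at`, and `remove_at` are hypothetical, not values from the interview.

```python
import numpy as np

def threshold_for_precision(y_true, scores, target_precision):
    """Lowest score threshold whose precision on validation data meets the target.

    Picking the lowest qualifying threshold keeps recall as high as possible
    for that precision. Returns None if no threshold reaches the target.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    for t in np.unique(scores):                  # candidate thresholds, ascending
        flagged = scores >= t
        if y_true[flagged].mean() >= target_precision:
            return float(t)
    return None

def action_for_score(score, review_at=0.5, demote_at=0.8, remove_at=0.95):
    """Map one scam score to a product action (cut-offs are made-up examples)."""
    if score >= remove_at:
        return "remove"
    if score >= demote_at:
        return "demote"
    if score >= review_at:
        return "manual_review"
    return "allow"
```

A team that cares more about recall would pick a lower target precision and accept more manual review; a team that cares more about precision would do the opposite.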

Atomic Parallelogram: Yeah, thanks for asking. It wasn't specified. I have no idea what my level will be.

Nefarious Shadow: Was that it? Okay, so this is a fresh grad. So you finished your graduate school PhD and then this is your first job?

Atomic Parallelogram: That's right. I finished almost three years ago, about two and a half years ago. Meta contacted me a few months ago for the interview and yeah, this is the only interview I have. This is basically the first interview I've ever done post-graduation. I started a company of my own two and a half years ago and I've been working on that since then. So this will be my first job.

Nefarious Shadow: Okay, yeah, so I would say, I would probably think it would be maybe like an IC4, since you have a PhD and you have some experience even if it is working for yourself. And I would probably say this would be a green light for IC4 to go ahead at Meta. Definitely.

Atomic Parallelogram: Gotcha. Just a few quick questions from me. Where do you work and what country are you in? Obviously you don't have to answer, but I'm just curious.

Nefarious Shadow: I work at Meta.

Atomic Parallelogram: Fair enough.

Nefarious Shadow: Yeah, so I'm in the US.

Atomic Parallelogram: You're in the US?

Nefarious Shadow: Gotcha.

Atomic Parallelogram: And then my last question, you know, I haven't really spent a lot of time on the job hunt right now. Things don't seem quite good. This is the only interview I have, and they contacted me. I'm just curious if you have any advice on how to get referrals or land more interviews. I spent so much time preparing for this interview that I'd really like to add some companies, you know?

Nefarious Shadow: I think this is a very hard time, to be honest. I've had this question come from multiple people: how do I land jobs, because I'm not even getting contacted? And I would think that it generally comes down to, number one, the type of role that you have. So machine learning is really, I would say, in demand right now, which is good, and that's your specialty. And I think you will probably fare much better than any other person who doesn't have an expertise in ML. Besides this, the only other people I would say are not having a problem would be people who are specialized in very niche fields, ML or non-ML. And I think if you want to increase your chances of getting more companies approaching you, one of the best ways is to have experience working somewhere which is known in the industry. I have seen a big trend where people without experience, or who are just getting out of college now, are having an extremely hard time. But people with experience don't have the same kind of problems. It's because the companies are trying to hire more experienced people, especially in times when the economy is not very stable, so they don't want to take a risk with somebody who doesn't have a track record. So it's more or less trying to play a little conservative with hiring, especially at this time. So there isn't really much that you are not doing or that you are doing wrong. It's just the time that is happening right now. And I would just say apply as much as you can. The other way is if you know somebody, if you have friends, referrals do significantly boost your chances of landing an interview.
You can meet people at maybe some type of tech talks or something that might be happening in your local area. Don't pay to go to expensive conferences; it's generally better to go to a smaller tech talk where, you know, it's a small group of people, which gives you a better chance of meeting somebody, talking to them about your work, and maybe getting a referral. So I would just say probably focus more on the small meetups that are happening in your area regarding the field that you want to work in.

Atomic Parallelogram: Yeah, that seems like solid advice. To be honest, I might just wait for the market to, not cool off, but to come back, because, you know, I feel like if I could get a couple of interviews, I would pass at least one of them. It's just very hard. I don't even know why Meta contacted me in the first place, to be honest.

Nefarious Shadow: But they have a hard push, a big, big hard push. And I think they are probably the only large company right now that is hiring that aggressively. I don't think there's any other company hiring that much.

Atomic Parallelogram: Yeah, that's the impression I got as well. There's some people I've been studying with and I have seen the offers come in at a pretty high rate among my study group. So that's for Meta specifically.

Nefarious Shadow: So yeah, I think they have a very high, I would say, hiring rate going on right now. There's a lot of pressure, especially on the recruiting side, to get as many people as you can. Teams also are growing significantly. That's one of the reasons why they are really, I think that's a good time to hire, because they know that not many other companies are hiring, so they'll have a good, big talent pool in the market. They don't have to compete with other companies as much. So financially, it makes more sense for them to hire right now when nobody else is.

Atomic Parallelogram: But, yeah, that makes sense.

Nefarious Shadow: Yeah, but it's not good for the person on the other side because you don't have a lot of leverage when it comes to negotiating your offer. That's so.

Atomic Parallelogram: Yeah, that's what I was gonna say. Like, you know, if they made me an offer, I probably. I might not negotiate at all. And even if I did, it would be very lightweight, you know, I think.

Nefarious Shadow: With that said, I would say, just my advice, negotiate as much as you want to. I can almost guarantee you that they will try their best to get you in if you have the offer green light, because the targets set on recruiters are so high that losing even one candidate means they lost a lot of work. So, I'm not even joking, you can say, yeah, I want half a million dollars. You can just say it. They might not give it to you, right? But you can say it, and they will still come back to you with something. They're not gonna be like, oh, this guy's not serious, I'm not gonna move forward. So I'm just saying, in that way you have leverage, because there's a big need for hiring right now. A lot of targets are set. So with that information, you know, you probably have some chance of at least doing negotiations. And I would very strongly suggest, please do negotiate. Yeah, because of course the first offer will not be a very good offer.

Atomic Parallelogram: Yeah, I appreciate that because that was what I've always been told as well. Like once you get the offer, the power dynamics shift a little bit. Like they took a lot of work to get you there and the data that I've seen says that like once they make you an offer, like very, very rarely do they rescind it based on the negotiations.

Nefarious Shadow: No, they won't. Yeah, I mean, I went back on my own offer. I negotiated really hard and in the end they were still not there. I just said, I don't want to join. And then I got a few offers afterwards. I didn't like them. I think their offer was still the best. I went back to them after like 10 days and they were still like, yeah, fine, we're gonna take you. That's why I tell you that the recruiters have a big, big pressure to hire people. So even if you said, no, I don't want to right now, and go back to them after two weeks, they will still take you. Yeah. And that's generally true because your interview remains valid for, I think, maybe six months or something like that. So, yeah, don't worry that if you were to do a really hard negotiation, they might rescind the offer or they might just not move forward. And the thing that moves fastest, the highest gain you will get, I would say, is on sign-on bonuses. The stock and the base are very hard to move. They might move, but I don't expect more than a 10-15% change in the base. But sign-on bonus, you probably won't get any when they offer you. Just look at Levels.fyi, see what people are getting, and just ask for that much, and they will give it to you, like, instantly. They just want people to ask. Sometimes people don't even ask for anything, so they lose like 100K, just like that. Just ask them once and they will be like, fine, here's 50K.

Atomic Parallelogram: And do I need another offer to negotiate?

Nefarious Shadow: If you have a counter offer, that can get you base salary and stock a little higher.

Atomic Parallelogram: Gotcha.

Nefarious Shadow: But they have a limit that is on their website for the role, and that is the hard limit, the upper limit. That generally is the upper limit unless you do so well in the interview that they might consider you for the next level. Unless that happens, they will stick with the limit that they have, and that will be the max that they will be able to offer at that level. That's why I said sign-on bonus is the first thing that you should always talk about, always, and whatever you see online as the maximum value for your level, just ask for exactly that.

Atomic Parallelogram: Yeah, yeah.

Nefarious Shadow: And go ahead. Yeah, so basically, and then the next thing is probably stock, which they will be willing to go a little bit higher on, especially right now. I don't know. They might not, because the stock is low. So they might push back on you, saying the stock is low right now, so you might have a very big upside. But, yeah, I mean, you'll be able to. Just don't not negotiate.

Atomic Parallelogram: Yeah, I appreciate that. You know, I'm not quite desperate yet, but I'm getting there. And so it's good to know that I still have some room, you know, a lot of room to negotiate.

Nefarious Shadow: It's just the market. Yeah. I mean, no, you'll definitely have room. People generally think that they don't, but even if you don't have a counter offer, you still have very big room to negotiate.

Atomic Parallelogram: Gotcha.

Nefarious Shadow: Yeah. Yeah.

Atomic Parallelogram: Anything else you want to add?

Nefarious Shadow: No, I think that's good. I'm just gonna leave a structure that I generally share with the people that I do system design with. I think you kind of have a good structure of your own as well. But this is kind of what I advise people to talk about, how to break down the problem. So I pasted that in the chat. Yeah, you should have it.

Atomic Parallelogram: Yeah, I see that. Thank you.

Nefarious Shadow: Yeah, so I think that should be it, and all the best for the rest of your Meta interviews. I hope you get the offer.

Atomic Parallelogram: Thank you. I appreciate all the feedback. You were one of the better mock interviewers I've had.

Nefarious Shadow: Oh, thank you. Thank you very much. Yeah.

Atomic Parallelogram: Thanks for staying over. I know we went a bit over, but thanks again for everything. Take care.

Nefarious Shadow: Thank you.