
FAANG Behavioral Interview

Watch someone solve the FAANG Behavioral Interview (Engineering Manager role) problem in an interview with a FAANG engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

FAANG Behavioral Interview (Engineering Manager Role)

Interview question

The candidate managed a platform supporting customer support chatbots. When a major outage occurred, frustrated customers opened multiple chat sessions simultaneously, creating 20K requests per hour and overwhelming the downstream chat agent platform. The solution required implementing user session management with authentication, caching, and rate limiting while coordinating across multiple teams and maintaining customer experience.

Interview Feedback

Feedback about Neuro Storm (the interviewee)

Advance this person to the next round?
Yes
How were their technical skills?
4/4
How was their problem solving ability?
4/4
What about their communication ability?
4/4
The candidate demonstrated strong fluency in articulating the scope of work and conveyed their thoughts with clarity. Their communication skills are excellent, allowing them to explain complex points in a structured and engaging way. When considering complexity, it can be viewed across three dimensions:

1. Technical complexity
2. Stress and timeline complexity
3. Collaboration and stakeholder alignment

For this project, I would recommend placing greater emphasis on the second and third dimensions: navigating timelines under pressure and ensuring effective alignment with stakeholders.

Feedback about Doctor Malamute (the interviewer)

Would you want to work with this person?
Yes
How excited would you be to work with them?
4/4
How good were the questions?
4/4
How helpful was your interviewer in guiding you to the solution(s)?
4/4

Interview Transcript

Doctor Malamute: Oh, can you hear me?
Neuro Storm: Yeah, I can hear you fine.
Doctor Malamute: Nice, uh, nice to meet you, by the way. And, uh, yeah, so before we start, let's get to know you a little bit more. For example, where are you, and what interview are you preparing for?
Neuro Storm: So I, I work at [REDACTED]. I'm a principal engineering manager and I'm interviewing at [REDACTED] for an engineering manager role. Um, the interview is actually tomorrow. I've finished two rounds, uh, the design interviews, two design interviews, system design interviews, and this one is a project deep dive interview. For 1 hour where I get to talk about a complex technical project that I have worked on to the interviewer. So just practicing so that I can be prepared for tomorrow.
Doctor Malamute: Okay, okay, great. That's very— I recently actually did the same thing. I was switching to a different company for a manager role, and I also went through this exact same thing, like you did.
Neuro Storm: Perfect. Yes, I'm excited. Yes, yes.
Doctor Malamute: So, uh, go ahead. Uh, maybe I think, uh, yeah, let's just start.
Neuro Storm: Sounds good. So I'm going to talk about a project that I managed, a product that I managed. Um, so I owned a platform called Virtual Agent Platform, which basically was a platform where customers would interact with the support bot. So anything the support bot could answer questions on, common issues, the bot would handle that. And then if the bot couldn't answer the questions, it would escalate to a chat agent, and which would connect them to an actual human agent in the backend. So basically, you know, and the normal— yeah, yeah.
Doctor Malamute: So before we get started: is that the whole team, and you own all of this?
Neuro Storm: Yes, this was a whole platform that I owned. Yep, end to end. Yep.
Doctor Malamute: Sounds good.
Neuro Storm: Sounds good. So under typical, normal conditions, we would have about 300 to 400 chats an hour. Occasionally we would see duplicate sessions, but volumes stayed within a limit that we could control, and escalations would happen normally. But in June of 2022, we encountered an outage where a lot of [REDACTED] passes were canceled. You know, think of this as a chatbot that was supporting [REDACTED] and ads and other customers.
Doctor Malamute: Maybe, yeah, I guess maybe a better way you can actually draw the diagram of the whole thing in a very simple high-level way and then we can discuss more. The project you're working on.
Neuro Storm: Okay. Sounds good. Is there a design whiteboard? Okay. Whiteboard. There you go. You can think of this as the virtual agent bot, and then this was the client, which would interact through a load balancer. We had a couple of services in the backend: a virtual agent service, and then something called a dispatcher. The responsibility of this dispatcher was mainly to take the request from the virtual agent service, wrap it, and send it to an external platform. So this was the external Power Virtual Agent platform. Requests would come to the load balancer, which would send the request to the virtual agent service, which would write to (you can call this) the conversation DB and send the request to the dispatcher, and the dispatcher would call into this virtual agent platform to recognize the intent. So for example, if you typed, "I need help, because my [REDACTED] console is not working," right? It would send the intent to the—
Doctor Malamute: so the virtual agent bot is user or is a bot? Is something you own?
Neuro Storm: Yes, this is the client app. So this would be a client.
Doctor Malamute: Oh, the client part.
Neuro Storm: I said, yeah, and then there would be a user that would interact with this Okay, gotcha.
Doctor Malamute: So could you walk me through a little bit like how this like one, one conversation or one query through the system?
Neuro Storm: Yeah, so the user can type something like, "I need help with my [REDACTED] console because it would not start," in the virtual agent bot. And then this user's message would route to our virtual agent service, which would send it to the dispatcher, which in turn would call the appropriate bot that was hosted in the Power Virtual Agents platform. So there were [REDACTED] intents, there were [REDACTED] Ads intents, there were [REDACTED] intents. You can think of these as different sets of conversations that exist in the backend, but the dispatcher service was smart enough to know which intent to call based on the user query.
Doctor Malamute: So dispatcher service will have to figure out the intent first before sending to the right agent, or is it like there's an external platform that does this thing? So how does this— Oh no.
Neuro Storm: Yeah, yeah. So these bots are hosted on the [REDACTED], [REDACTED], and [REDACTED] developer platforms. And within this intent we would know which product the bot is hosted under, and the virtual agent bot would tell the virtual agent service that the intent is coming from the [REDACTED]. So this dispatcher already knows that it needs to initiate the [REDACTED] intents, because it's captured as part of the URL as a query parameter.
Doctor Malamute: Got you.
Neuro Storm: The dispatcher knows that it's the [REDACTED] product which is initiating this, and then it knows which Power Virtual Agent bot to redirect the request to. So this dispatcher had the PVA SDK integrated. The SDK gave us the functionality to host multiple dispatcher services to call into the external Power Virtual Agent platform.
Doctor Malamute: Gotcha. Now what's next? Once you, for example, for this one is [REDACTED] and it goes to the [REDACTED], uh, intent. And then what's next?
Neuro Storm: Yes, so the intent is recognized. We know it's [REDACTED]. The user says this, and there are flows created within this. This is a low-code, no-code sort of platform where you can define, whenever an intent like this is recognized, which path to take within the platform. And then this virtual agent platform comes back to the dispatcher saying, this is the intent you have to respond to. The response comes back to the dispatcher service, which has a webhook into the virtual agent service. And we had a WebSocket connection open between the virtual agent bot and the virtual agent service, which would respond back to the bot. So the response could say, okay, here is what you can do, with steps in there.
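The dispatcher routing the candidate describes could look roughly like this sketch. The mapping table and function names here are my own illustration, not the project's code; the one grounded detail is that the product arrives as a URL query parameter and decides which hosted bot the dispatcher calls.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from the product in the query string to the
# Power Virtual Agent bot that owns that product's intents.
PRODUCT_TO_BOT = {
    "console": "pva-bot-console",
    "ads": "pva-bot-ads",
    "developer": "pva-bot-developer",
}

def pick_bot(chat_url):
    """Return the bot the dispatcher should route this chat session to."""
    query = parse_qs(urlparse(chat_url).query)
    product = query.get("product", ["unknown"])[0]
    return PRODUCT_TO_BOT.get(product)  # None if the product is unrecognized

bot = pick_bot("https://support.example.com/chat?product=ads")
```

The real dispatcher presumably did this through the PVA SDK rather than a dict lookup, but the selection logic (product in the URL picks the bot) is the same idea.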
Doctor Malamute: Wait, so who generated the response?
Neuro Storm: So the response is generated by this Power Virtual Agent, which sends it back to the dispatcher service, and it flows back through the WebSocket.
Doctor Malamute: So when you say it's a WebSocket, do you mean this is an interactive session?
Neuro Storm: Yes, yes, it's a chat session. You can see these are all chat messages going back and forth.
Doctor Malamute: Then how does the conversation DB play a role here?
Neuro Storm: Um, we're just keeping track of which conversations are coming into the flow. So this intent, where I said I need help with the [REDACTED] console, is given a conversation ID.
Doctor Malamute: Oh, gotcha. So basically that will be tracking the whole flow.
Neuro Storm: This is like a tracking, yes, yeah. This is a tracking ID for all of this, you know, question and answer sort of a flow between the bot and the service.
Doctor Malamute: Gotcha, gotcha. Outside of this, like when you say external Power Virtual Agent platform, that's some other team doing this?
Neuro Storm: This is another team, yes. This is a team within [REDACTED], but not us.
Doctor Malamute: Got it. Makes sense. Now I think I know the platform. How many engineers are working on this platform?
Neuro Storm: It's about 8 engineers that work on this. There is another piece to it. This is the bot piece, right? And there is another piece where, say, the bot is now doing the best it can to respond to the customer, and these users are users of [REDACTED] and [REDACTED]. If the bot doesn't have enough to offer for self-help, we have an option within here to escalate. So the customers might type, "I want to talk to an agent," right? When they say that, the bot will send the request back to the dispatcher, which talks to the Power Virtual Agent. The Power Virtual Agent will respond saying, oh, this is an escalation; they don't have anything in this platform to escalate to. So when we get this escalate signal back from the Power Virtual Agent platform, that communication is closed, the dispatcher service is no longer talking to the Power Virtual Agent platform, and the virtual agent service says, okay, you're escalating. What the client will do: we had a lightweight integration within the bot to escalate to another platform, which was called the Dynamics Chat Agent platform. It's called Dynamics Consumer. So the client would know that it's an escalation, and then it would open a new connection with this agent platform to have a chat between the agent and the customer. This is an actual agent here, not a bot. A connection is opened between the bot and the platform so the agent and the actual customer can talk. So these are the two flows, and we had 8 engineers managing this.
Doctor Malamute: So for the virtual agent bot, which is on the client, are you owning that part of the code, or is that different team?
Neuro Storm: This which one?
Doctor Malamute: Sorry, the virtual agent bot, right? This is on client.
Neuro Storm: This is the client. So we own the client, we own the service, we own the dispatcher, we own all that.
Doctor Malamute: I see, I see.
Neuro Storm: These two, yeah, these two were other, like other teams within [REDACTED] that owns these two.
Doctor Malamute: I see. Okay, so I think that's a good, uh, like foundation. Now how about you talk about some project you're recently working on, the most challenging project you're working on recently?
Neuro Storm: Yeah, sounds good. So I'll focus more on this area where, once the customer escalated, the bot would actually open a connection with an agent to get help from an actual human agent. So in June we experienced an outage where a lot of [REDACTED] passes were canceled, and about 200K customers were affected by the cancellation. Customers were desperate to get in touch with support because they couldn't use their [REDACTED] passes to play, right? So what happened was a lot of these frustrated customers started coming into the bot and opening many, many chat sessions, hoping to get connected to an agent faster. Within about 24 hours, we had seen over 500K chat requests coming into the platform, and we saw a peak of about 20K chats an hour. The impact was that the chat agent platform that I drew here spiked, started throttling, and completely collapsed. We had to take the whole chat platform down, closing that flow so it would not take any more traffic because of the throttling, and direct everybody to phone support within the platform. So customers could no longer chat with us.
Doctor Malamute: I may have lost you here. So which part is throttling now?
Neuro Storm: So the throttling was happening between the chatbot and the agent platform.
Doctor Malamute: Uh-huh.
Neuro Storm: Yes. So the client, whenever they would say escalate or talk to an agent, this communication had stopped between the dispatcher and the Power Virtual Agent, and then the bot would have to open up a chat with the agent platform, and then we were sending over 20K calls per hour.
Doctor Malamute: Gotcha. Then what's the resolution here?
Neuro Storm: Yeah, so I can talk quickly about why it mattered for the business; if it doesn't matter, I can move on. Go ahead. Yeah, so basically customers couldn't escalate, and they were very frustrated because support was not available when they wanted it. Also, a lot of agents were stuck in a situation where they would see customers opening a chat, but nobody was there on the other end, because customers had opened multiple chats thinking they would get connected to an agent faster, and that didn't happen, right? The agents kept waiting for the customers to join, nobody joined, and agent productivity went down. So, all in all, my role as an engineering manager was to own a solution, to implement something that helped us mitigate the issue faster. A couple of things that we had considered. One was to put rate limiting on the client side, which means we could throttle the chats on the client and not allow users to open more than one chat. But, you know, a user can always bypass that, right? They could open an incognito window, they could open multiple browsers. This wouldn't solve the problem we were looking at. The other thing we can—
Doctor Malamute: So this is a— sorry, I want to clarify here. This is an incident happening right now, right? And for the one you're describing, making this limitation on the client side, you need to update the client.
Neuro Storm: Yes, we needed to. Correct. So this incident was already contained. We had mitigated the issue by rerouting chat: we were no longer accepting the chat flow in the US, China, EU, and Russia locales. We had completely cut that off and routed all the traffic away. There was another modality for phone support, which is not owned by my team: we would just show a flow within the app to enter a phone number, and we would redirect the request to the phone support system via an API. Phone support was managed by another team. So that's how we mitigated it. But after that, we had to go in, look at the root cause of what happened, and put in a long-term fix. That's where the solution came in. So, like I said, the client-side rate limiting wasn't very useful; users could easily bypass it. The second option we looked at was to put rate limiting on the backend: create a backend service and add rate limiting through the API Gateway on IP addresses. That wouldn't work either, mainly because we would be blocking some legitimate users. Sometimes people live in the same house, or they could be playing from a gaming cafe, right? They would all have similar IPs, so we didn't want to move forward with that solution either.
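The rejected per-IP alternative can be illustrated with a minimal fixed-window limiter. The class and parameter names here are mine, not the project's; the point it demonstrates is the one the candidate raises: two legitimate customers behind one gaming-cafe NAT share a counter, so the second is throttled along with the first.

```python
import time
from collections import defaultdict

class IpRateLimiter:
    """Fixed-window rate limiter keyed by client IP (illustrative only)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self._counters = defaultdict(list)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        now = time.monotonic() if now is None else now
        # Keep only timestamps still inside the window.
        recent = [t for t in self._counters[ip] if now - t < self.window]
        self._counters[ip] = recent
        if len(recent) >= self.max_requests:
            return False  # throttled, even if it's a different person behind the NAT
        recent.append(now)
        return True

# Two customers in the same cafe share one public IP:
limiter = IpRateLimiter(max_requests=1, window_seconds=900)
first = limiter.allow("203.0.113.7", now=0.0)   # first customer gets through
second = limiter.allow("203.0.113.7", now=1.0)  # second customer is collateral damage
```

This is exactly why the team moved to identity-based sessions instead: the IP is not a reliable proxy for a person.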
Doctor Malamute: Sorry, I'm not sure I follow here.
Neuro Storm: So in gaming cafes, you would have similar IP addresses coming from everyone in that cafe, or in a household there could be many people using the same connection. You could have one person use the browser to open the chat window and another person trying to reach support from another machine, or you could have family accounts. So we couldn't use IP addresses, because we would be blocking legitimate users too.
Doctor Malamute: But would there be some way to identify a unique person in the same household?
Neuro Storm: Yes.
Doctor Malamute: So you cannot tell.
Neuro Storm: So not with the IP address. The solution we introduced was: before they raised a request with the chat agent platform, we had an auth service, right? The users would say, I want to talk to an agent, and then we would pop up something like, okay, tell me which issue. If we knew the product, we wouldn't ask for the product they're coming from; we would just ask what issue. It could be an account issue, say they're locked out of their [REDACTED] account. It could be a billing issue, it could be some other issue, et cetera. They could choose from this, right? And once they chose the issue they're requesting help for, we would ask them to log in. The login would call into the auth service. The login has to be an [REDACTED] account, which is a [REDACTED] [REDACTED] account that a user would create to log into their [REDACTED] console or any [REDACTED] Ads or developer platform they're managing. That [REDACTED] account is the account we would authenticate against. Once they're authenticated, we would send the authentication details to the chat agents, so the agent knows how to greet that person. So we had this flow, and we thought, okay, the best way is to sort of leverage—
Doctor Malamute: When you say we had this flow, you mean this is the existing flow? Before your proposed new solution?
Neuro Storm: Yes, so this [REDACTED] authentication flow already existed. What didn't exist was a way to intercept it and create a session store to recognize that this individual trying to raise a request has already logged in and already has an ongoing chat. So we leaned on that approach. The user has already authenticated and already picked which product and issue they're requesting help for. Why not store that session state in a cache and use it to identify them, so the user can only open one chat? If they already have a chat open with an agent, we will not allow them to raise more than one chat per session. Because the problem we had seen before this was that users, even after they authenticated, were able to open over a dozen chats at a time. We didn't have that session store capturing that the user had an ongoing chat with an agent.
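The session store the candidate describes can be sketched minimally like this. The names (`SessionStore`, `open_chat`) are mine; the grounded details are the key (authenticated identity plus product and issue type), the 15-minute expiry mentioned later, and the rule that a live entry means no second chat is opened. A plain dict stands in for the real Redis cache so the sketch is self-contained.

```python
import time
import uuid

TTL_SECONDS = 15 * 60  # the 15-minute token expiry described in the transcript

class SessionStore:
    """In-memory stand-in for the Redis session cache (illustrative)."""

    def __init__(self):
        self._sessions = {}  # (user_id, product, issue) -> (token, expires_at)

    def open_chat(self, user_id, product, issue, now=None):
        """Return (token, is_new). Reuse the live session if one exists."""
        now = time.time() if now is None else now
        key = (user_id, product, issue)
        entry = self._sessions.get(key)
        if entry is not None and entry[1] > now:
            return entry[0], False  # already in the queue: same token, no new chat
        token = uuid.uuid4().hex    # stand-in for the auth-service token
        self._sessions[key] = (token, now + TTL_SECONDS)
        return token, True

store = SessionStore()
t1, new1 = store.open_chat("user-42", "console", "billing", now=0)
t2, new2 = store.open_chat("user-42", "console", "billing", now=60)    # duplicate tab
t3, new3 = store.open_chat("user-42", "console", "billing", now=1000)  # after expiry
```

A request for a different product or issue type would get its own key, which matches the transcript: the dedupe applies per product and issue, not globally per user.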
Doctor Malamute: Okay, so I guess I know the general idea now, but I have questions. For example, if you have an open session for this user and they start to open another one, would it make sense to actually reuse the original session? Instead of forbidding them from opening another session, would it be more natural for them to reuse the same session, so both sessions share the same conversation, something like that?
Neuro Storm: Yes. Mainly because if a customer already has this product and issue type selected, they're probably already in the queue to request help from an agent within that timeframe. And we had a way to expire the tokens that were generated through the auth service, right? Like 15 minutes. So in case they didn't respond and didn't get connected, say they didn't want help from an agent after all, we would expire that token in 15 minutes. They could come back and open a chat with an agent after 15 minutes, or say they forgot to engage with that agent and no longer need help, we would still allow them to come back. But within that 15-minute window, if they had a chat open, we wouldn't allow them to create a new chat against the same product and issue type.
Doctor Malamute: I see, I see. So basically there's a timeout.
Neuro Storm: There is a timeout, yes.
Doctor Malamute: There's a little concern. For example, I have been through an agent or chatbot system too, where I'm opening too many tabs.
Neuro Storm: Yes.
Doctor Malamute: And I forgot where the original tab is. So you're forbidding me from opening a new tab, but I cannot find the original tab.
Neuro Storm: No, it won't. We didn't stop users from opening multiple tabs. They can open multiple tabs, and for the same product and issue type, we had a way within the bot to show them that they're already in the queue to connect with an agent. We would pop up a card, to make it more friendly, saying, hey, you're in the queue, your position is 8 in line, or your position is 2 in line. So they don't lose track. Even if you end up opening multiple tabs, you would be in the same queue, you wouldn't lose your place, and you'd get connected to the agent like you had requested in your original tab.
Doctor Malamute: Okay, I see, I see. That makes sense. Okay, so to dive a little deeper on this project: what essentially are you building, what's challenging about this design, and were there any alternatives?
Neuro Storm: Yeah, so there were several challenges within this. One challenge was that the Dynamics Chat Agent platform was something we couldn't control. They didn't provide anything to control the number of requests coming into the chat platform, so we had to build our own internal solution to stop chats from throttling their systems. The second thing was we had to make sure, like you said, that we were reusing sessions whenever possible, based on the user's identity, product, and issue, so that we didn't risk routing customers into the wrong queue. And because we were leveraging their identity, we had to do it in a compliant way, with privacy and security in mind. This also required us to coordinate with the Dynamics Chat Agent platform team and the support team; there was a support team that was recruiting the actual support agents, right? So we had to stay in constant communication with them, our PM team, all of them. Bringing all of the stakeholders together and agreeing on the thresholds and rollout plans was the challenging part. On the technical side, like I said, this [REDACTED] flow existed. So what we ended up doing is, whenever a user would come into the chatbot requesting help from an agent, we would call into the auth service. The auth service would create a token, and we would store that token in a cache. This was a key-value cache, like a Redis cache, which stored the identity and the product. Are you drawing something?
Doctor Malamute: Sorry, I didn't see the cache.
Neuro Storm: Is there a lag? I am drawing something.
Doctor Malamute: Let me refresh. Oh, once the— I may drop my voice, so one second, I'm refreshing. Hello? Yes, I can hear you. Oh, nice. Hello?
Neuro Storm: Yes, hi.
Doctor Malamute: Yeah, yeah, I can see the cache now. Yes.
Neuro Storm: Okay, okay. So every time the customer authenticated, we would call the authentication service, log the identity plus the product and issue as the token, and respond back. And if it was the first time, the bot would know this is the first time they're authenticating, and then the bot would start a chat with the agent platform.
Neuro Storm: Now if the customer came in and requested another chat in a new tab, the authentication service would go check the cache in the backend and see, okay, the customer is already authenticated, and send back the same original token, saying, hey, you're already authenticated, and they can continue in the same chat flow. We wouldn't open a new chat with the Dynamics platform. The way we integrated with the Dynamics chat platform was through something called a Chat SDK. We would call through the SDK, and if a chat was already open, we would just reconnect them to the same chat through the Chat SDK.
Doctor Malamute: So in the contract between you and the Chat SDK, are you just giving them the token ID?
Neuro Storm: Yes, so we would give them like a reconnect ID, um, to the, uh, through the SDK saying, hey, this is an existing chat, here's your reconnect ID, um, connect them to the chat platform.
Doctor Malamute: I see. And another question for the auth service. Do you own the auth service?
Neuro Storm: Yes, we do. We had to create and build the auth service on our own.
Doctor Malamute: But that's my question: it seems very general, not specific to the chat platform. It seems like it should be [REDACTED]-wide, unified, right?
Neuro Storm: So, okay, when I say auth service, it's not auth for all of [REDACTED]. The auth service was integrated through [REDACTED]L, which was our [REDACTED] authentication library. It was not something that we owned fully; we were more of a layer on top that would call into the [REDACTED] authentication library, which identified the customers and confirmed that this was a valid customer.
Doctor Malamute: Gotcha, gotcha.
Neuro Storm: Okay, we used a pre-existing library that [REDACTED] had built to identify customers, but we had a layer to do that, um, back and forth.
Doctor Malamute: I see.
Neuro Storm: Okay, so you might ask, for example, "What if Redis is down?" Yeah, if Redis is down, we had a fallback strategy. We would create a new token; this is all in memory. The worst that would happen is that we would generate a new token for the bot, and the user would end up creating a new chat. The reason we picked Redis was that it could handle many, many queries at the same time, and it had native TTL support. In case the customer ended up closing the tab, or leaving the tab open and forgetting that they had engaged with an agent, or there were network issues, or the system went into hibernate mode, we had a way to just let the TTL expire within Redis. So it provided that capability. Worst case, we would just create a new token. That was the trade-off: if Redis was down, we would just create a new token for the user and send it back to them.
Doctor Malamute: Gotcha, gotcha. Okay. Uh, so, uh, I guess there definitely will be some, uh, challenge in execution alignment. Uh, do you have any?
Neuro Storm: Yeah, yeah. So in addition to this Redis flow, we also had to introduce presence detection, right? Like I said, we had to detect whether the customer was actually there or not. So we had to implement another layer, another API, to detect presence. So the auth service—
Doctor Malamute: Sorry, I may have lost this. Why do we need to detect that?
Neuro Storm: It was a constant ping to say whether the customer had the tab open or not. How would we know otherwise that the customer still had their tab open?
Doctor Malamute: But we have the 15-minute timeout, right? So we don't need to worry whether it's active or not; we'll expire the token within 15 minutes anyway.
Neuro Storm: Correct. The 15 minutes was the token expiration window. So for example, if the customer had been waiting 15 minutes to connect with an agent, say there is a long queue and a long wait to get connected, and it goes over 15 minutes, we would end up expiring that token because they had been waiting that long. So we needed a way to refresh the token after the 15-minute window, and also to detect that they still had the window open, right? There are two cases. One case is that they engaged with an agent and, in the middle of engaging, they closed the tab. We had a way to signal that: we would ping every 30 seconds from the browser to say that the customer was there, to keep the TTL valid, so we knew when we did and didn't have to refresh the TTL.
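The 30-second browser heartbeat can be modeled as a sliding liveness check. Class name and the grace period are my own illustration (the real system refreshed a Redis TTL); the grounded detail is the 30-second ping cadence, with presence lapsing once pings stop arriving.

```python
class PresenceTracker:
    """Tracks whether a customer's tab is still open via periodic pings."""

    PING_INTERVAL = 30  # browser pings every 30 seconds (from the transcript)
    GRACE = 90          # assumed: consider the tab gone after ~3 missed pings

    def __init__(self):
        self._last_ping = {}  # session_id -> timestamp of last heartbeat

    def ping(self, session_id, now):
        """Record a heartbeat from the browser for this session."""
        self._last_ping[session_id] = now

    def is_present(self, session_id, now):
        """True while heartbeats keep arriving within the grace window."""
        last = self._last_ping.get(session_id)
        return last is not None and (now - last) <= self.GRACE

tracker = PresenceTracker()
tracker.ping("sess-1", now=0)
tracker.ping("sess-1", now=30)  # regular heartbeat keeps the session live
```

In the real system, a missed run of pings would let the Redis TTL expire instead of being refreshed, which is the cleanup behavior described for closed tabs and hibernated machines.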
Doctor Malamute: Maybe my question here, what's the worst case?
Neuro Storm: Worst case, we would wait. I mean, what would happen is that the JWT token would expire in 15 minutes.
Doctor Malamute: I see, I see. So now it's probably more like an optimization to basically optimize the resource, so we don't sit idle, like you said. Okay, sounds good.
Neuro Storm: Yeah, yeah. So that required some going back and forth with the team to understand whether it was the best way to approach it as well. So that was the complexity. Basically, I had another way to send pings from the bot to the auth service for presence, and we also had a way to do a token refresh after 15 minutes, which was done by the auth service as well.
Doctor Malamute: Um, so since it's a client app, right, users may be on different releases; clients follow a release cycle. And this thing is not built into the old version of the client, right? So how do we ensure they are actually using the new updates, and not bypassing this throttling mechanism?
Neuro Storm: Correct. So, think of this: this is not a mobile app, this is a browser app.
Doctor Malamute: Oh, I see.
Neuro Storm: Yes, yes, this is a browser app. Yeah, so just like you would go to a URL, the bot would be hosted within that page. So they would get the latest version whenever we applied the changes on the client side.
Doctor Malamute: I see, I see.
Neuro Storm: Yeah, yeah. I don't know if that helped, if that clarifies it.
Doctor Malamute: Yeah, that helps. Yes, that helps. I see. Okay, to be honest, it's pretty clear to me. And maybe the final question is: if you were doing this again, how would you do it?
Neuro Storm: If I were to do this again, there were a few learnings from this flow. One thing is that load testing is something we could have done proactively, right? The load that we saw, we had never expected in the past. I would do load testing as part of my development cycle for an application that supported so many components, and stress test the system a little better beforehand to handle such large volumes. The other thing is getting everyone aligned, right, so that we don't miss the timeline. Why this was important for the business is that we had about 4 months to deliver this change, and these bots supported 32 locales; there were 32 different support channels available within this Dynmic platform. What that meant was the rollout strategy had to be developed so that we rolled this out first to a low-traffic locale and then slowly rolled it out to the other locales. What I would have done differently is have a canary environment set up for the application to test and validate the changes, rather than going live directly to the customer, to catch any issues. So those were some of the learnings and some things that I would have done differently for the app itself.
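The locale-by-locale rollout the candidate describes (low-traffic locales first, then the rest in stages) could be expressed as a small planning helper. This is purely illustrative: the `rollout_waves` function, the traffic figures, and the wave sizes are assumptions for the sketch, not details of the actual platform.

```python
def rollout_waves(locale_traffic, wave_sizes):
    """Order locales by traffic (lowest first) and split them into
    rollout waves, so low-traffic locales absorb any issues first.

    locale_traffic: dict mapping locale -> requests per hour (assumed metric)
    wave_sizes: how many locales to add in each early wave
    """
    ordered = sorted(locale_traffic, key=locale_traffic.get)
    waves, start = [], 0
    for size in wave_sizes:
        waves.append(ordered[start:start + size])
        start += size
    if start < len(ordered):
        waves.append(ordered[start:])  # remaining locales in a final wave
    return waves
```

For example, with hypothetical traffic numbers, `rollout_waves({"en-US": 9000, "fr-FR": 1200, "de-DE": 800, "ja-JP": 3000}, [1, 1])` would stage `de-DE` first, then `fr-FR`, then the two high-traffic locales together, which is the shape of the strategy described above.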
Doctor Malamute: I see. Gotcha, gotcha. Okay, I guess I'm out of questions. Maybe I can— we can jump to the feedback part.
Neuro Storm: Okay.
Doctor Malamute: Yeah, so first of all, I think this is very good. What do you think about this interview? Maybe I—
Neuro Storm: Yes, I mean, I know it's important to give interviewers context. How much time should I spend giving them the context? I don't know if I spent too much time giving context about what I did.
Doctor Malamute: No, no, no. At least from my understanding, there's no very clear guidance for this kind of session, this kind of interview. But my understanding is we first want to evaluate, assess whether you are good at communication. And especially when we're doing the deep dive, as I'm doing right now, I'm asking about both the business requirements and also the team setup and also the technical part. And yes, I think you handled it very well, in a very clear and professional way. Which is very good. To be honest, I would give you a pass in this kind of interview. However, I guess something I will be curious—
Neuro Storm: Yeah.
Doctor Malamute: It's like, this is a level-8 engineering role, and in the project you are describing there might definitely be a lot of challenges in alignment, in execution, in collaboration. But I don't feel this is a level-8 engineer's job.
Neuro Storm: Oh, I see, I see, okay.
Doctor Malamute: Yeah, so what I would recommend, if this is one of your projects, is maybe grouping some projects together under a kind of theme. Like, this is one way to improve reliability, and maybe another project improves reliability in a different way. So you're building a long-term project, setting different milestones, and each milestone improves part of the system. You know what I'm saying? So basically, grouping this into a bigger story.
Neuro Storm: Okay.
Doctor Malamute: If that's possible. So this can be one milestone: setting up this routing. And then another could be something like scaling the virtual agent service, say by sharding the conversation DB, whatever. That's another way to improve the reliability of the platform. That could be another milestone. So you set milestones in multiple phases and deliver them. That makes the story bigger. Yeah, that could be one thing to improve on. Another purpose of this interview is more background matching. I think you are applying for a more general SWE manager role, right? I'm working in the AI field, so definitely this is someone we wouldn't hire, because we need a person working in the AI field, working more on agentic or LLM stuff.
Neuro Storm: That kind of stuff.
Doctor Malamute: So I can't say if this is a yes or no for them, but usually we also use this opportunity for background matching.
Neuro Storm: Got it. Got it. Okay. Okay. Makes sense.
Doctor Malamute: But overall, so far it's very good. You demonstrated that you have a very good communication skill set, and you actually worked on this project. You know it end to end, and you know every tiny detail well. You also answered my challenging questions pretty well, took those challenges.
Neuro Storm: Yes. Okay, wonderful.
Doctor Malamute: Yeah, I have no complaints.
Neuro Storm: Thank you, that makes me feel prepared for it. And I like the idea that you mentioned. Because of this load, we also saw a lot of resiliency issues, so I had my engineers work on improving the resiliency side of the app as well, where we had to scale to other regions to support the load.
Doctor Malamute: Yeah, you can group this. I feel a few of these can be grouped together, right? Like one whole thing: you're trying to improve resilience within this year. So what milestones are you trying to achieve? And you're managing people across different sides and different projects. That also shows leadership skills.
Neuro Storm: Makes sense. Yeah, okay, okay. That is actually good feedback. Perfect. Yes.
Doctor Malamute: Okay, I guess that's it. We still have 10 minutes, but feel free to ask me questions, or I think we are good.
Neuro Storm: It's okay. I mean, I don't have anything specific. How much of a deep dive will they go into in these interviews? I mean, whatever deep dive—
Doctor Malamute: It varies by company. For my company, it's usually coupled with a mini system design. But it usually depends on how big your project is. Sometimes we spend a very long time on the deep dive into the project and a very short time on the system design. But for your case, to be honest, I feel it's a mid-size project.
Neuro Storm: Yeah.
Doctor Malamute: So we probably will spend more time on the system design.
Neuro Storm: Okay, got it. Yes.
Doctor Malamute: I see.
Neuro Storm: Typically, what kind of projects are usually discussed? In your experience, what counts as a large-scale project? If you can give me some pointers, maybe.
Doctor Malamute: Yeah, large-scale project usually comes within a year and with multiple milestones.
Neuro Storm: Okay.
Doctor Malamute: Yes, so it's not like a one-quarter project we're talking about. It's more like one year, uh, one year's timeframe.
Neuro Storm: Okay, gotcha.
Doctor Malamute: But usually people don't finish it. Even for me, we don't actually finish the full year before we talk about that year. You know what I'm saying, right? This is an interview, so you can always have next steps. It's not necessary that you accomplish all three milestones before you talk about it.
Neuro Storm: Gotcha, gotcha.
Doctor Malamute: Okay.
Neuro Storm: Okay, makes sense. That's why I wanted to check one thing: what does it mean to talk about a technically complex project? Do they determine that based on the size, or more on the complexity itself? I mean, does complex really mean it has to be a multi-quarter kind of thing, or can it be small?
Doctor Malamute: I think, so, that's actually something I want to discuss with you. What do you think the complexity in this project was?
Neuro Storm: You know, the complexity was really identifying the gaps, because we didn't have observability. That was challenge one. The second thing was that the resiliency was under a lot of scrutiny because we couldn't scale to meet customer demand. The third thing was time: we were under pressure to deliver this within that quarter, to have the system up and running before the holiday season, because we usually saw spikes during the holiday season.
Doctor Malamute: Yeah, if that's the case, I feel your case can be described like this: the complexity was delivering a mitigation plan plus a long-term plan within a short time.
Neuro Storm: Okay.
Doctor Malamute: And also this gives you both the stress and the need to quickly come up with the milestones between the mitigation and the long-term plan, right? So you have to manage resources, the engineers, blockers, dependencies, because you only have a certain amount of time. That could also be complexity. So back to your question, I think complexity has multiple dimensions, and we value each one. First of all, technical complexity: something you did that is impressive, that no one did before, or something innovative. Usually that's what we're looking for; in the AI field, that's something very important. The second one is execution, similar to what you said: executing on a very short-term plan. And the third one is something like alignment. Since you're at [REDACTED], you definitely have a lot of stakeholder alignment meetings. How do you work around conflicts of interest, work toward reaching a middle ground between two teams, right? That can also be complexity. I would suggest you focus on the other two instead of the technical part.
Neuro Storm: Oh, OK. Got it. Got it. OK. Sounds good. OK. Thank you. That helps a lot.
Doctor Malamute: OK. Yeah, yeah. And good luck. I feel you're very well prepared. So don't take it the wrong way; I think the other interviewer would be happy.
Neuro Storm: Oh, no, no, no.
Doctor Malamute: Absolutely not.
Neuro Storm: I mean, I'm here to practice. If I'm going to make mistakes, I want to make them here, in the mock interview.
Doctor Malamute: Yeah, you're doing a good job.
Neuro Storm: Yeah, okay, perfect. Thank you so much. Thanks for your time. Yeah, bye-bye. Bye.
