Indelible Raven, a Microsoft engineer, interviewed Pseudo Gyroscope in C++
K nearest points
1) Find the K nearest points to a vertex. 2) Change your solution under the condition that you have an input file of size 500 Petabytes.
Feedback about Pseudo Gyroscope (the interviewee)
Advance this person to the next round?
How were their technical skills?
How was their problem solving ability?
What about their communication ability?
It was a pleasure interviewing with you. Unfortunately I would have said no to passing you for an offer (although I would have advanced you to onsite). For the most part your technical ability is pretty good, you were able to write out clean functioning code. I wish you had cleaned it up a little bit by moving some of the re-occurring code into it's own function (i.e. distance function). When it comes to algorithm design and data structures, I was happy that you skipped straight to using a queue instead of starting with sorting or something else. Although (this may be due to you using java recently), your mapping of algorithm knowledge to the C++ STL equivalent is lacking. I recommend reviewing that if you're going to interview in C++. Communication was good, you were clarifying requirements and edge cases but did not actually check it / write the code for those edge cases. I wish I saw that. You did ask questions which was great. Basic problem solving was decent but when I got into something more advanced you kept with the old style code instead of recognizing that you'd need to use a map-reduce instead to handle the large amount of data. Overall, good work. Take the lessons here and improve for future interviews. Good luck in the future interviews.
Feedback about Indelible Raven (the interviewer)
Would you want to work with this person?
How excited would you be to work with them?
How good were the questions?
How helpful was your interviewer in guiding you to the solution(s)?
Please let me know the resource for design prep in case you remember it. Thanks!
Pseudo Gyroscope: Hello? Can you hear me? Hello? Indelible Raven: Can you hear me? Pseudo Gyroscope: Hi yes, I can hear you now. Indelible Raven: Awesome, I've been trying to figure out where my audio problems are coming in and I think I've just figured it out. Apparently it just doesn't work when you're on a VPN. I've been helping them troubleshoot that so. Pseudo Gyroscope: Okay okay. Indelible Raven: Before I forget, let me pull up their email. So, have you been on this platform before? Pseudo Gyroscope: Yeah, this is my second time, so one time before this. Indelible Raven: How do you like it? Pseudo Gyroscope: It's pretty cool, I like having an IDE now instead of in the past used um just basically like sharing text documents, so this is pretty neat. Indelible Raven: Oh yeah, it's definitely a lot better. You get kind of immediate feedback after, it's a pretty neat platform. So I guess kind of how I do it, I work in Microsoft and formerly Google, so I kind of follow the style of interview that we do here. So it's about... usually there's about a five-minute intro, thirty five minutes of technical interview, followed by five minutes of Q&A. It's anonymous, so the first part's easy. So we usually just get right into the tech interview and then I could give you verbal feedback afterwards if you want, then I'll write up the written one afterwards as well. So you ready? Pseudo Gyroscope: Yes. Indelible Raven: What's your primary language? Pseudo Gyroscope:
C++Indelible Raven: Awesome. Oh you already change it, sweet. Okay, quick question: roughly what level are you at in your career? Pseudo Gyroscope: I am at the, you know, quote unquote I guess the regular engineer, like right before a senior, so one step above a new grad, but not at senior. Indelible Raven: Okay alright, knowing that I'm gonna give you this question. I'm gonna give you a vector of let's say pairs. And I'm gonna give you... we're gonna call it points and basically what it is is a list of XY coordinates. Then I'm going to give you another pair, which will be the vertex, so it's not going to be
0 0, it's going to be another point that's given, and then a
Kand I want you to find the nearest points around the vertex, but only
Kof them. That make sense? Pseudo Gyroscope: Let me make sure I understand so, I'm given a vertex and I want to find its neighbors? Indelible Raven: Yes. Pseudo Gyroscope: Okay, what is the the criteria for a neighbor? Indelible Raven: The shortest distance. And the distance can just be the
sqrt((x2-x1)^2 +(y2-y1)^2). Pseudo Gyroscope: Okay, I see yeah. Okay, so I have a list of points, I got a pair. What does int
Kmean again? Indelible Raven: It's just a number of points to return. Pseudo Gyroscope: I see I see. So I'm returning a vector of neighbors that are within... oh wait... oh, you're saying the K closest neighbors? Indelible Raven: Yes. Pseudo Gyroscope: Okay, yeah just letting that sink in. So we've got a list of points, I got a vertex, I want to return the K closest neighbors. Okay. Do I just kind of write the input inside the main or you want like a separate function to call? Indelible Raven: I want a separate function for the nearest neighbors. Pseudo Gyroscope: Okay, so it'll be like a vector of pairs for the return value of that function. Indelible Raven: Yeah, just another vector of pairs to return. Pseudo Gyroscope: I see I see. So that'll be the signature. Okay. So I have a vertex, I will need to consider the distance to all the other points, and I guess I can put them in a priority queue to put them in shortest distance and then pop off the first K elements in that priority queue. So if K is larger than the total number of points, I just return the whole thing right? Indelible Raven: Sure. Pseudo Gyroscope: Okay, I think I have a plan to implement. I will loop through the entire vector of points and then calculate the distance and put them in our priority queue with a distance value, so I need to store the... I guess I can use the index of that point plus its distance, so could be like a separate pair and then that queue will compare using the distance. Okay so I think ready to implement, I'll go ahead and start. Okay, so for every point I need to calculate the distance. And then I need to put in a priority queue. I don't remember the syntax with this, should I just ask you or Google? Feel free to google it if you need to, just the syntax. Okay yep, okay so I'll just write some pseudocode and then fix it up later. So then I will have my answer. My return value will be a vector. Indelible Raven: Alright difference... I was going to say it's probably good to make it a double, but this will be. Pseudo Gyroscope: Yeah I need to insert a pair of my index, so my index will be
iand distance. So now I need to add a... ok so I don't remember the syntax for the priority queue to add the compare function, so I'll look that up real quick. Indelible Raven: Sure. Pseudo Gyroscope: So I'm thinking about this, looking at every point, getting the distance and then inserting it with the index as the distance. So I need to push back the actual pair of the coordinates, so let's see... just for readability here. So neighbor is equal to okay... so it'll be the coordinates... Indelible Raven: Well, you store the index right? So can't you just pull that element off? Pseudo Gyroscope: I store the index, yep. So I need to look into... so this is the index and then I need to look into the the original. Okay, so this is my other pair. Okay, I think I'm ready to run some tests. Indelible Raven: Yeah, let's try it. Pseudo Gyroscope: Okay, so let's see. I remember you saying something about something not being
(0 0), I don't remember what that was about? Indelible Raven: Yeah, the vertex doesn't have to be
(0 0). Pseudo Gyroscope: Let's take a look at that example again. I think I'm missing an arrow. Indelible Raven: Should be able to do a lambda. Pseudo Gyroscope: I'm sorry? Can you repeat that? Indelible Raven: I said you should be able to do a lambda function. Pseudo Gyroscope: Should be able to... oh yeah, I don't know what version of
C++it's using, okay. Okay, I can try the the struct way. Indelible Raven: What if we did this? Let's look that up. What I'm saying is that you can do an auto function and then declare type it. Pseudo Gyroscope: Ah, yeah. I saw an example with the declare type and then using then using a function instead of a lambda. Indelible Raven: What do you know... oh that might work. Pseudo Gyroscope: Oh wow. Indelible Raven: So do you see kind of a problem with this? Not the actual problem, but something else? Pseudo Gyroscope: Like a logic error? Indelible Raven: Yeah. Pseudo Gyroscope: Let me take a look. I don't know how it works if the K is bigger than the? Indelible Raven: That would work. Pseudo Gyroscope: I don't know how it works with if K is bigger than the size of points? Not sure what it does. Indelible Raven: What if I gave you ten billion points and K is equal to three? Pseudo Gyroscope: Oh, I see yeah. Ten billion points, K is equal to 3, I would be storing too many... okay yeah, I only need to store the first K. Let me think about which one to put in. So I need K so... put in the smallest. So while I'm putting it in, if the size is greater than K, I need to pop off. Okay, so I have the example of
1 3 5 7 9and I want to keep the first two, so I have... 1 3 6 5 then I don't need to store it. Just trying some examples. So I guess I could have another priority queue to sort by the largest and then I would always know the largest, I'm thinking that's the best way to do it. Indelible Raven: Sure. Pseudo Gyroscope: That's the method I'm thinking of right now, just trying to think of anything better than that. Okay so I keep k and then... let's say I'll keep a separate queue of k in the opposite sort order, in some kind of a max priority and then if a number is larger than that one, then I just skip it, if a distance is larger than my top, then I skipped it. And if it's smaller, then I add it in to the... Okay, so I should only need one queue, I need to make it a max... Okay, so what I can do is fill it up with K, and then... okay I see. Fill it up with K and then only add one in if it's smaller. So I reverse the order of my queue and then we add K of them in first. Indelible Raven: I'll just tell you alright? Pseudo Gyroscope: I'm looking at the index right? So if the distance is smaller than the biggest one in there, then I get rid of that biggest one and then I insert the... So now I do all of them. This should be the entire queue, should be able to leave that there. Indelible Raven: Alright, let's... I don't want to say stop here, but stop with this part of it. What if I change the input a little bit and I gave you a file with probably three... let's say like five hundred petabytes of points. How would that change it? Pseudo Gyroscope: 500 petabytes of points or do you want the return value to be that big? Indelible Raven: No, the input file is like five hundred petabytes. Pseudo Gyroscope: I see, I see. Input file is five hundred terabytes, so okay... Indelible Raven: Not terabytes, think times a thousand: petabytes. Pseudo Gyroscope: Petabytes, okay. So, I would look at if... so I want to read like a... not sure how to do this but look for a way to read parts of a file, like kind of like the the bash kind of head and tail concept, but with like the iterator or the file cursor, I would need to like... kind of like use a stream of some sort to read in from the file, depending on how it's formatted. So let's say there's like one per line, 1 point per line. Then we read from like the start cursor to a certain number. Not sure how to do that, but I'm guessing I'd look at the file stream options and then see if there is a way to do this. Indelible Raven: So you want to do this all in one machine, just one line at a time? For 500 petabytes of points? Pseudo Gyroscope: I see where you're going. So once I get the the cursor capabilities, then I can do a MapReduce, so we can farm it out to a distributed amount of machines and then it can actually be split even further and then get those... okay so each machine would look at a certain subset and then in the reduced step, they can combine all of their K neighbors into... basically put them into the points variable and then each machine during the reduction step will have a new set of points, which is the combined return value from the mapping. So okay, so yes, so do a MapReduce with this, we combine it at the end. Indelible Raven: Okay, so how do you set it up, what would your reduce be? Like how do you expect the map to work. Pseudo Gyroscope: Okay, so during the reduce, let's say if I ... Indelible Raven: Let's say just this right here. Map stuff reduce stuff. And go in the rest on ideas. Pseudo Gyroscope: Okay, so the map... I'll look at the... what's the capacity of how much I want each machine to do. So let's say I have a start and stop and then so there can be multiple levels of the MapReduce, so let's say I could do like for example like one tenth of it and then farm it off to each machine and each machine will recursively map out a tenth of its inputs until the last machine, it gets the size that is less than a threshold we set, so something like... Indelible Raven: Let's stop here. You kind of want your feedback now as well or do you want the written feedback. Pseudo Gyroscope: Oh yeah, I'd like the verbal one also. Indelible Raven: Okay, so you're a mid-level engineer from what you were talking about. How often do you work in
C++? Pseudo Gyroscope: Actually, I do Java on the job now, but most of my previous career was
C++, I feel more comfortable doing that in the interviews, so right now it's practice. Indelible Raven: So, I mean I was happy you started with a queue, it was a good idea. I kind of wish you like realized right away that, hey k could be pretty small and I don't want to create the entire queue out of the whole thing. You went over some edge cases, but you never did anything about it, so that's a bit disappointing. Like I feel like if you're going to go through the edge cases, you should write those out, like checks and exceptions and stuff for them and just say, hey if it's below zero don't do anything or K is greater, just return the list and so on. It seems like you have a good understanding of like algorithms and data structures, but you have a hard time mapping those to STL libraries, like you had to look up with a priority queue looked like and how it worked and then there's some weird things like down here when you created a.. or in main... when you created a vector that way, instead of saying like points parentheses, like this right here. So this would have been easier. Something like that. So it seems like
Javamight be taking over a little bit and you might be trying to do it a little bit different. The technical skill there is fairly decent, just some simple mistakes. That just might be
Javacoming into play versus your
C++knowledge. You've never done MapReduce, have you? Pseudo Gyroscope: I've never written the MapReduce from scratch, no I haven't. Indelible Raven: Okay, or using the library? Pseudo Gyroscope: Like a long time ago, so I don't remember too much. Indelible Raven: MapReduces don't work that way. I kind of expect someone with a mid-level experience to know how to use MapReduce, how to you say: okay I have 500 petabytes of data, clearly this isn't gonna work on one machine. So let's investigate a MapReduce or some sort of distributed system in general. You kind of ignored that until I brought it up. So I would practice that for sure, do like a local Hadoop thing or see if there's an online IDE that can help you with Hadoop or some other framework. You're in Java now, you should take a look at a Google's Flume framework for MapReduces. It's pretty easy and fun to use. So that's kind of a hit, that kind of hits your problem solving as well because you used the basic stuff, but I wanted to see if with a big change like, way out of memory, what you would do. Also, yeah I kept saying petabytes. I don't know if you just misheard or not... yeah definitely terabytes. What were you gonna say? Pseudo Gyroscope: Yeah, I was just saying that I guess this is like the first time that yeah, I guess it's kind of like a turn code into design, where usually it's just algorithmic, so it's basically expanding the problems, it's like a second part of the problem, so it was kind of unexpected. Indelible Raven: Yep, you're at a level where it's kind of expected now. I would have loved to see you clean up your code as well. There's no reason you had to redefine what distance was each time, you could have just created a function for it. Overall, you did okay. I would not have passed you, but you weren't bad either. That's kind of my feedback, do you have any questions? Pseudo Gyroscope: So yeah, definitely the... I'm just really starting to look into design stuff, do you have any suggestions ahead of how best to prepare for a designing? Indelible Raven: System design... I have a YouTube channel that I saw recently. That... let me see if I can find it real quick. There's a YouTube channel that I was looking at for... There's some YouTube channels when I was looking at like Facebook interviews, that I started looking through system designs. It's really helpful, I don't remember where it is, so it might be worth you looking for. I don't really have any book recommendations or anything like that, it's kind of more of a let's step out and let's look at the whole process on how to build the whole thing versus let's look at one project, so you're basically taking one big project and splitting it down to individual projects and how you would want those to look and how they work with microservices and clustering and stuff like that. There's some good videos on that. You can just search YouTube and find some. Pseudo Gyroscope: I didn't catch that, what was the the library you mentioned for MapReduce? Indelible Raven: Flume. That's for Java. For like C++ and other stuff is a Hadoop. Pseudo Gyroscope: So when you interview for your company, basically the bar to pass is knowing the either Hadoop or Flume and being able to write out the MapReduce part. Indelible Raven: Well that's not everyone's barrier. I do expect someone as a mid-level engineer to know how to do a MapReduce, but that's not all. I can ask another question, I would have asked... One way I could have taken it would do it in over one time. Create a function where the input type or you do pre calculations where you return it all right now or in the future. Another one would be if I give you infinite number of data. It's a way for you to... I'm echoing over there. Your level, I kind of expect you to be able to tackle those at least somewhat, to be able to start think about problem-solving skills with those. Pseudo Gyroscope: Ok, what was the
O(1)time? Indelible Raven: That, you could do some sort of pre calculation to almost return... or return probably with some sort of accuracy threshold in the 95 percent plus, the K nearest neighbors in
O(1)time. Pseudo Gyroscope: Ok. Indelible Raven: Literally you just do pre calculation and then you just return whatever it's in, so... Alright, I gotta jump off to write your feedback real quick. It was a pleasure, you should be getting in probably like five to ten minutes. Feel free to leave me feedback as well. Pseudo Gyroscope: Yup, thank you so much. Indelible Raven: Bye.