We helped write the sequel to "Cracking the Coding Interview". Read 9 chapters for free

interviewing.io

Please read our definitive guide on Microsoft's hiring process and questions

K Closest Points to Origin (Python)

Watch someone solve the k closest points Python problem in an interview with a Microsoft engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

Vertex distance order statistic

Interview question

1) Given a list of points as [x,y] pairs; a vertex in [x,y] form; and an integer k, return the kth closest points in terms of Euclidean distance 2) Assuming that the dataset is too big to store in memory, rewrite functions for distributed system

Read more about the questions

K Closest Points To Origin

Interview Feedback

Feedback about Mythic Borogove (the interviewee)

Advance this person to the next round?

Yes

How were their technical skills?

4/4

How was their problem solving ability?

4/4

What about their communication ability?

4/4

It was a pleasure interviewing with you. Tbh, I don't have any solid feedback to give you. You did phenomenal in this interview. You were really quick, clarified all edge cases in the beginning. Started with the simple solution and optimized with (with a little hint). Then when changing the requirement to use a map-reduce you nailed it right away. The only thing I would suggest is that not everyone will know the short cuts in python (like zip or list comprehension in general), so occasionally you may need to be able to write that out yourself. Overall though, thanks for a wonderful interview. I wish you the best of luck in your future interviews. On a side note, I would recommend asking for a system design question during your next interview.

Feedback about Indelible Raven (the interviewer)

Would you want to work with this person?

Yes

How excited would you be to work with them?

3/4

How good were the questions?

3/4

How helpful was your interviewer in guiding you to the solution(s)?

4/4

Problem was well explained, hints were dispensed at good intervals, and interviewer had patience to let me think through things. There was also a good progression from easy to harder, which allows the question to build complexity over time. One thing which will help is to develop more familiarity with the problem, like knowing distance equation for N-dimensional distances.

Interview Transcript

Indelible Raven: Hi.

Mythic Borogove: Hey.

Indelible Raven: How's it going?

Mythic Borogove: Pretty good, how are you?

Indelible Raven: I'm doing alright, it's the weekend finally, so.

Mythic Borogove: Yeah.

Indelible Raven: Um, okay. So have you used this platform before?

Mythic Borogove: Once, yeah.

Indelible Raven: Okay, so you kind of know how it works?

Mythic Borogove: Yup.

Indelible Raven: Alright, so for me, I generally follow the Google/Microsoft style of interviews. So usually it's 5 minutes of intro, 35 minutes of technical, 5 minutes for Q&A. Since the intro is all anonymous, we just usually skip right to the technical. But I do have a couple questions. What is your primary coding language? Or at least what you want to code in for the interview. And what level are you at roughly?

Mythic Borogove: Python, and I'm already in the loop and have interviews scheduled for Facebook and Google coming up in a couple of weeks.

Indelible Raven: Awesome, so as an intern? Or is it full time out of college?

Mythic Borogove: Full time, I have five years of experience.

Indelible Raven: Well, best of luck in those, I might be joining you at the Facebook one here soon. Let's see, do you have any quick questions before we start?

Mythic Borogove: No, I guess let's just go ahead.

Indelible Raven: Awesome, so I'm going to start you out with... Oh we switched to 2, that's why. I'm going to start you out with giving you a list of (x, y) coordinates. And then I'm going to give you... This is a 2-D plane, it could be 3-D, 4-D, so on. I'm going to give you a vertex. So let's say, Point vertex, which will be the center point. It doesn't have to be (0, 0), let's keep it like (2, 2) right here. Then I'm going to give you a k value, which will be the number of points I want you to return, so let's say 2 right? I want you to create a function that finds the closest points around the vertex that's in the point list, based on distance.

Mythic Borogove: Mhm. Okay so in this case, vertex is (2, 2), and k=2 indicates the distance, so we want to find what? Cartesian distance?

Indelible Raven: Yeah, just do like (x1 - x2) squared plus (y1 - y2) squared.

Mythic Borogove: Okay, square root of all, right?

Indelible Raven: Yeah.

Mythic Borogove: Okay, yeah sure. Okay, so I assume all the states in memory are on one machine? And we're getting integer inputs, or at least numerical inputs?

Indelible Raven: They'll be numerical, yes. As far as being on one machine, for now, yes.

Mythic Borogove: Okay, great. Sounds good. So we're getting these two arguments. We have a vertex, too. So (vertex, points, max_distance), and then we're just going to through all the points...

Indelible Raven: Not max_distance, but how many points to return, so like k.

Mythic Borogove: Oh, okay. Got it. k is points to return. Okay cool. And we want to return the 2 closest?

Indelible Raven: I'm sorry, what was your last question?

Mythic Borogove: And we're returning the 2 closest?

Indelible Raven: Yes.

Mythic Borogove: Okay, got it. Alright. So let's see. The first approach that comes to my mind is we go through each point and then we compute the distance to vertex and then we pick the two closest, which we can do in a sort of running fashion. And then if we do it this way, our time complexity is we look over every single point, so we're at like O (P), P is the number of points. Oh I did points_to_return, so I need to sort the list too, right?

Indelible Raven: Yeah, but how are you going to find the smallest. It's not always going to be two.

Mythic Borogove: For sure. So it looks like, we if we have to sort it afterwards, then we're at O (n log n) for the sorting and the points, so O (n log n) total. So that's the first path that can work. As far as trying to reduce the work we do, I think maybe there's a way to not consider certain points if our point is is sorted. Is that something that we can assume or do the points come in in random order?

Indelible Raven: It's random order.

Mythic Borogove: Okay, so even if we sorted the points, we're back to the O (n log n), and then we do some divide and conquer - take off the stuff that's clearly out of range, but we're still back at that original issue. So I think this approach might be as good as we can do.

Indelible Raven: Yeah, let's start with that. Let's see what that looks like.

Mythic Borogove: Sure. Are we assuming that it could all be in arbitrary dimensions?

Indelible Raven: Yeah, it could be in any number of dimensions.

Mythic Borogove: I'll fill that in later. I want to get the distance between vertex and p for every point in points. And then, we want to return sorted_indexes. Okay, so we need the indexes to write. So we'll get that, and the ith index i for these. And then we can sort it.

Indelible Raven: C++ needs to do that where you have a for loop like that and just have a numeric.

Mythic Borogove: Oh, I see. Were you saying C++ or something?

Indelible Raven: It needs to do that, it needs to implement that.

Mythic Borogove: Oh, yeah. Maybe in C++ 20, I don't know.

Indelible Raven: Yeah, right.

Mythic Borogove: So, sorted_distances, and then we want the first up to points to return. And we might have to do this too to get it as a list. Okay, and then for getting the distance of arbitrary dimension, we said we would take the square root of...

Indelible Raven: Take the square root of the sum of the squares.

Mythic Borogove: Is that only for two dimensions? Or does that work for higher dimensions too?

Indelible Raven: I don't know. I assume it's higher.

Mythic Borogove: Okay, cool. Either way we have a metric, as long as we keep it consistent.

Indelible Raven: Yeah.

Mythic Borogove: Okay, so. We're summing all of the squares. I guess it would be a**2 + b**2 for (a, b) in zip(p1, p2). I hope that works.

Indelible Raven: What are you trying to do with that? I don't know what zip() is.

Mythic Borogove: Oh, I see. If you have two arrays of equal length, it matches them up. Can you see the console?

Indelible Raven: Yeah, I see it.

Mythic Borogove: If you zip() like (1, 2) and (3, 4), you expect (1, 3) and (2, 4) to be together, so like it takes the first and matches it with the second. So that's what I'm doing there. So by zipping, I basically have the first pair and the second pair together. And then when I say for (a, b), I unpack the tuple. And then I'm saying a**2 + b**2, so I'm computer (a1 + b1)**2 and then (a2 + b2)**2. And then I'm going to take the square root at the end. Is that correct?

Indelible Raven: Yup. That's an interesting way of doing it.

Mythic Borogove: Oh, yeah. This is clearly not optimized for probably execution time either. So we have a list of the squares, and then we want to sum them. And then at the very end, we're going to sqrt or something?

Indelible Raven: Don't know. I don't really know Python.

Mythic Borogove: Oh, okay. I guess that's a downside to having to interview all of these different people using different languages.

Indelible Raven: Most of the time we can tell, "Yeah, that looks right." zip() is just a new one for me, I've never seen that.

Mythic Borogove: Oh, I see. It's pretty common in like a lot of functional programming languages especially.

Indelible Raven: I'm sure they have that in Haskell.

Mythic Borogove: Oh, yeah. I'm almost certain Haskell has it, although I've never used that myself. Is that like 3**2, which equals `sqrt(~9 + 25) is 34. No, that's not right. 5 and 13??

Indelible Raven: Haha, okay. Maybe it's squaring each one inside. Because 1 + 3.

Mythic Borogove: Oh, you're right! Yeah, it must be quantity squared, right.

Indelible Raven: But that makes sense right? Because (x1 - x2)**2, oh okay.

Mythic Borogove: Yeah, is that right? Is it (a + b)**2 for each (x, y) in (a, b), right?

Indelible Raven: Pretty sure it's minus, let me look it up.

Mythic Borogove: I think that's what it is in two dimensions, in three I'm not really sure. It might be like cubed and then quantity cube root maybe?

Indelible Raven: Oh, so it is for any dimension, it's the sum for (qi - pi)**2.

Mythic Borogove: Oh squared, okay. Wow. Did you say it was quantity squared, or is it inside? Like a**2 + b**2?

Indelible Raven: No, it'd be like (x1 - y1)**2 + (x2 - y2)**2.

Mythic Borogove: Oh yeah, because it's like the difference. Then you square to get rid of any negatives, yeah that makes sense. And this is something that we can test too, just that function. Okay, and then moving on to back to within_distance(). Let's just try that out. These are the points. And we are trying to find the closest one. Aw man, it did something. Wait, it doesn't look right because it's returning... Oh, okay right, because the first element is the distance, and then second element is the index. And then what we want to return is the actual point, right? Or a list of the points. So, I guess it will be points[i] for _, i in sorted_dists. Great, so we have a list containing [1, 2]. If I bump this to two, it should return [2, 3]. Okay, it looks nice. Should I try[1, -3]?

Indelible Raven: Not sure, but what I mean is check for zero elements or negative one elements.

Mythic Borogove: For sure. So first of all, this isn't defended against if the dimensions are different. Should we do that?

Indelible Raven: No, let's just leave it here for this.

Mythic Borogove: And so we should defend against this being an empty array. If it says to return two things and it can't should it throw an exception, or try to return as much as it can.

Indelible Raven: It's up to you actually, either works for me.

Mythic Borogove: I think returning as much as it can seems easier. So in this case it would return an empty array, which is as much as it could return. In this case, if I added a bunch of stuff... Cool. Let me just make sure that sort is working right.

Indelible Raven: What about negative number points?

Mythic Borogove: That's a good question. So yeah. If I were to request -3 points. So in that case, I think I should throw an exception, because that seems like a misuse of this API, right? Currently, it does this weird thing.

Indelible Raven: Sounds about right.

Mythic Borogove: Yeah, should we do that?

Indelible Raven: Yeah.

Mythic Borogove: So you can ask for 0 points to return I guess. We'll raise...

Indelible Raven: Oh, that's what it is! Someone last night when I was interviewing, just did not know how to do exceptions in Python and I figured it'd be throw, but raise is it. It's been a while for Python with me.

Mythic Borogove: Sure. I love interviewing in Python because it's really concise.

Indelible Raven: Yeah, so I have a little bit of feedback on that at the end. Alright, so let's make it better. You were right that there was something better, but you didn't figure it out right away. Let's try to think that out.

Mythic Borogove: Okay... Maybe it would be easier if I reduced it to one dimension?

Indelible Raven: Well, no. As in to figure it out? Or the whole thing?

Mythic Borogove: Just to figure out the optimization. So in one dimension, there's like a shingle of points, and then there's a bunch of dots scattered around this number range. And we want to return the n closest things. But the input is not sorted at all. And sorting it would be primitively expensive. And these aren't whole numbers that are within a range, so we can't bucket some of them. So doing better would mean not looking at some of the points, right? I'm assuming.

Indelible Raven: Yeah, sure. But what if we just store like k elements instead of n elements in something or maybe just kept the closest k elements in temporary storage somehow.

Mythic Borogove: Let me think. So you're saying instead of... I'm wondering what that means because we're getting this list of points n long, right? And then we need to return k things. And if I could make a list of k elements, or some collection of k elements, but I would still have to go through every points, right?

Indelible Raven: Right, you're still looking at every point, but is there an algorithm where you can go like n log k instead?

Mythic Borogove: I see, yeah got it. So maybe we can maintain like a min heap of k. And then we go through and then every time we insert into k, k is pretty small, especially inserts would be log k, right? The same with removals as we pop off things at the end. And then, at the end of it, we just dump out the remaining I guess? If order doesn't matter especially.

Indelible Raven: Sure. Let's do that. Leave the code above though.

Mythic Borogove: Okay, we'll leave the code. So, I'll just make a copy of this guy then. So, a lot of it stays the same, except I get the distances and instead of sorting over the entire list of distances, I'm going to iterate through the list of distances and put them into a min_heap. So was it that? It's like import heap? heapq? Yeah.

Indelible Raven: Okay, like whenever I go into Python, I've never seen that before.

Mythic Borogove: Yeah, there's a joke that Python has everything. What is it called though, heap? Mind if I look it up real quick?

Indelible Raven: Yeah sure. Maybe a priority heap?

Mythic Borogove: Let's see. Oh my God. Interesting. What is this, is this just a bunch of functions?

Indelible Raven: Maybe. Should be like push() or something.

Mythic Borogove: Yeah, actually it might just be like a bunch of functions and you give it a list, because heaps are just arrays underlying. So like, `heap.push() is a function, and you give it a list and then you give it an item like 3. Yeah, it's not super object oriented, you know? Okay, so it just maintains the list for you in heap format. Alright, that's the most annoying thing that I have seen so far in Python.

Indelible Raven: Right?

Mythic Borogove: Yeah. Alright, well this is the path that I've chosen. So we'll instantiate our heap here with this list. And it's going to be k size. And so what we'll do is for distance, idx in distances.

Indelible Raven: Do you really want to generate a whole new list of space?

Mythic Borogove: On which line, 35?

Indelible Raven: 32. Why not just insert the distances as you go?

Mythic Borogove: Oh, yeah that's true. We can do that. I mean, if we're getting to this level of optimization, it's going to be faster to just not do this in Python in the first place kind of thing.

Indelible Raven: Yeah, true. But you picked it.

Mythic Borogove: Yeah, so. I mean, maybe I should have asked what the thing we were optimizing for was? It doesn't change the time complexity. I can do it if it makes the code easier.

Indelible Raven: Well not really easier, but let's say we have 4.5 GB of data and `8 GB of memory. Calculating the distance will be a lot.

Mythic Borogove: We'll use a Lazily Evaluated Generator, so instead of actually materializing this into a list, it becomes a generator. Think of it like an iterator, like in Java. It has a next(). And then when you say for something in that, it can materialize that list.

Indelible Raven: Cool. That works then.

Mythic Borogove: Great, so we're here then. We're going to look at each distance and index. And then we're going to add it to the heap, and possible pop something off if it's too big. So should we pop() first or should we add() first? I guess it doesn't matter in this case. So, let's add first. So we say heapq.heappush(). We give h and then we give the element, which is (distance, idx). I'm hoping the heap will handle tuples and just use the acronyms in there. Okay, cool. So heappush() and then if len(heapq). Actually, I wonder if that doesn't work anymore.

Indelible Raven: I think you should type it, right?

Mythic Borogove: Oh yeah, I was just making sure heapq had a length function. if len(heapq) > k, let's go ahead and popheap(). Wait, we need to remove not the smallest one, but the biggest one.

Indelible Raven: Yup. Do you still hear me?

Mythic Borogove: I hear you, yeah.

Indelible Raven: Okay, I moved my mic a little bit. My cat decided she wanted to talk into the mic apparently.

Mythic Borogove: Oh, okay, no worries. Cats will just do that all the time. Is there a way to pop() the biggest one? Is there a way to do that...

Indelible Raven: Can you maintain some sort of max heap, where the top element is the max?

Mythic Borogove: Um, let me see. I'm not sure the order that it uses in Python. I've like literally never used this class.

Indelible Raven: Are you looking it up, or...?

Mythic Borogove: Yeah, I'm looking it up. Interesting, it's called `heapq.heappushpop(). Ooh. Okay. Is it heappushpop()? Oh, interesting. That's disappointing. So there's a method called heappushpop. So basically if it's less than the points to return, we can insert it directly like that. But if it's greater, then it becomes heappushpop.

Indelible Raven: Interesting lesson. Okay, let's stop on this part. I'm going to switch it up a little bit. Instead of doing it in memory, let's change it to 500 Petabytes of points.

Mythic Borogove: Okay, so. Let's say we have it distributed on a bunch of machines, using like an HTFS, and it looks like we need some sort of distributed computing. And it looks like in this case, the part where we say distances on line 16, that get_distance thing is massively parallelizable, like obviously parallelizable. So if we do something like mapreduce right? We compute the distances in our mapper step. And then we emit a tuple of (distance, index), and then in our reducer phase, we... Yeah, so it would end up looking like line 12 basically if we use something like mapreduce because we would run line 16 in the distributed in the mapper phase, and then hadoop would run merge sort for us on the entire data. And then we would return the smallest, like the head of the file basically in our reducer. So, how should we do our reducers? Normally in hadoop, the reducers get data by hashing the key, so it ensures something like even distribution, but we don't want that.

Indelible Raven: Sure.

Mythic Borogove: Maybe what would happen is, we could do that. But then each reducer pairs it down to k points, which is kind of the safest thing to do. But we have k points for every reducer, and then we run the process again. We iteratively pair down until we get like k points total.

Indelible Raven: Makes sense.

Mythic Borogove: I'm trying to think if there's a better way to do this. Because you know, we don't want to generate a bunch of stuff, especially if k is small. Yeah, I mean that would be one approach. I wonder if...

Indelible Raven: Well... Can you write out map and reduce functions just to see what it would look like?

Mythic Borogove: Sure. So the map function, given the vertex and some points, would compute the distances as before, except it would be like for point in points. You would be adding to this list like distance, and the point itself, so get_distance() and then point. Alright, so in our mapper stage, we simply return back, and then we get like distance is 4 and point is like (3, 2) or something like that. Put one on the entry. And then for reduce, we're going to be given the output of mapper_output, which is going to be this list of the mapper output. And then, what we want to do is... And k I guess... And then we'll just return... I think. This might be where mapreduce might be sort of limiting, because if we do this, we will get... Best case we'll get our answer, but worst case we'll get k kinds of reducers basically.

Indelible Raven: Yeah.

Mythic Borogove: Alternatively, we can do... Well, we can do a single reducer, but it will basically never complete. So, I don't know if in hadoop you can have tiered reducer stages.

Indelible Raven: I don't think it does yet. Google's version, I used to work at Google, is just getting to that point now, till where it pairs, maps, and reduces. I don't think hadoop has it yet.

Mythic Borogove: Okay, yeah. What about spark? I know spark has a more iterative computational model. Maybe we can model this easier in spark?

Indelible Raven: Let's not. Let's just stop here. It' been nearly 35minutes, and I have everything I need to do. Sound good though? I kinda want to hear your feedback of yourself first. How do you think you did?

Mythic Borogove: Let's see. I think I might have jumped in a little bit too soon, just because I produced a suboptimal algorithm initially. I probably should have stepped back and come up with the heap solution. That's sort of my big one.

Indelible Raven: So, do you want to hear my feedback now? Or do you just want to hear it later?

Mythic Borogove: Yeah, I'll hear it now.

Indelible Raven: Okay. I'm not sure I agree with that. Because, the idea is that you're supposed to come up with a fast quick solution, and then optimize on it. Before it, you clarified everything, you got all the edge cases. I wish you would have started by saying, "Hey, here are the edge cases, let's put in errors." And then if there were error points, you could have just skipped everything, and just returned null. If there was more k, just return the current point list. Or k > n, right? And then gone into it. One thing I wasn't a big fan of: Python list referencing is really cool, I really like it. And like the zip() function is really neat and I was impressed and stuff. You're going to Facebook, you're going to the other places may not get it. I don't think we get it all. So it might be easier to simply do an if statement for some of them. List coverage is fairly simple to know, probably so not something that you would need to. 8I would have definitely expanded to a for and if statement and stuff like that. Not ifs, but yeah. Definitely for loop instead. It took you a while to get the heap, but as soon as I asked about an algorithm that might be n log k, you got it right away. Your code looks good, you can tell you're fairly experienced. You follow some sort of style. Yeah, and then you definitely knew about mapreduce, in the very beginning you considered if you needed a mapreduce, or if you could do it in memory. Overall, I'm impressed, you did really well. You're very knowledgeable of everything, you took a hard problem, that actually surprisingly enough has not been answered correctly, the mapreduce part, until you now.

Mythic Borogove: I used it when I first started my career, so... I used to work on hadoop.

Indelible Raven: Yeah, I'm coming from Google, so you change your entire mentality to go from doing stuff linearly to assuming it will be done in a mapreduce. If you get the job at Facebook, you're probably going to have to do the same to where everything you write is mapreduce. Which is cool, it's a fun thing to do. Probably not the feedback you wanted to hear, because I don't really have much criticism, like you did really well. I would have definitely passed you. And probably fought for you to get the job.

Mythic Borogove: Nice, I'm super glad to hear that! Yeah, I'm feel like I'm going to keep the studying up.

Indelible Raven: In Facebook and what was the other company?

Mythic Borogove: Google actually.

Indelible Raven: Google and Facebook itself, I didn't do systems design, so I don't know how you deal with that. If you do another interview, I recommend asking to do systems design. And I'll make a recommendation to do that just to see where you stand with that. But overall, I think you'll do really well. Do you have any questions?

Mythic Borogove: I guess, just out of curiosity, what do you get out of doing Interviewing.io as an interviewer?

Indelible Raven: ... Giving back to the next class of software engineers. I kind of came from a weird background... I shouldn't be where I am. But because I am where I am, I want to help other coders and be kind of...

Mythic Borogove: Yeah, no that's super awesome. I'm really appreciative that you take the time out of your day to help me learn on the platform.

Indelible Raven: It's my pleasure. Hopefully one of these days we'll meet each other at Facebook.

Mythic Borogove: Yeah, that'd be great.

Indelible Raven: Do you have any other questions?

Mythic Borogove: Nope, I think that's it. Thanks a lot again for the interview.

Indelible Raven: I do have one more piece of feedback. I would try to apply for...

Mythic Borogove: Oh, whoops. I think I need to use headphones then.

Indelible Raven: Yeah.

Mythic Borogove: Thanks for the feedback.

Indelible Raven: Anyway, I wish you the best of luck, I'll fill out some feedback, but it's not going to be much. You did really well. I don't really have much to give you to get better, since you did phenomenal.

Mythic Borogove: Haha, the experience and practice is just tremendously helpful I think. Thanks a lot. Have a great rest of the weekend.

Indelible Raven: You too, thanks!

Mythic Borogove: Alright bye.

Indelible Raven: Bye.

Unique ID generation

Microsoft Interviewer

Unique ID generation

Invincible Cloud, a Microsoft engineer, interviewed Golden Possum

Watch interview

Order statistic of an unsorted array

Google Interviewer

Order statistic of an unsorted array

Intergalactic Avenger, a Google engineer, interviewed Supersonic Taco in Java

Watch interview

Most frequent integer and pairs of integers sum

Google Interviewer

Most frequent integer and pairs of integers sum

Paisley Wallaby, a Google engineer, interviewed Propitious Bear in Java

Watch interview

Triplet Array

Google Interviewer

Rocket Wind, a Google engineer, interviewed Whirlwind Alligator in C#

Watch interview

See more interviews

We know exactly what to do and say to get the company, title, and salary you want.

Interview prep and job hunting are chaos and pain. We can help. Really.

App screenshot