Indelible Raven, a Microsoft engineer, interviewed Massively Parallel Llama in Go

Summary

Problem type

Vertex distance order statistic

Question(s) asked

Given a list of points as [x,y] pairs; a vertex in [x,y] form; and an integer k, return the kth closest points in terms of euclidean distance

Feedback

Feedback about Massively Parallel Llama (the interviewee)

Advance this person to the next round?

Yes

How were their technical skills?

3/4

How was their problem solving ability?

3/4

What about their communication ability?

3/4

I think you did well. The two biggest things I saw were that you need to think about where your code errors out and write edge cases for those and you need to be a little bit more fluid in how you interview. That way it speeds things up a little bit more. I would have also liked to see your mindset change to a more distributed approach when we switched to using petabytes of points instead. Overall I probably would have recommended a hire decision. I wish you the best of luck in your future interviews.

Feedback about Indelible Raven (the interviewer)

Would you want to work with this person?

Yes

How excited would you be to work with them?

4/4

How good were the questions?

4/4

How helpful was your interviewer in guiding you to the solution(s)?

4/4

Transcript

Indelible Raven: Can you hear me?
Massively Parallel Llama: Hi I can hear you, I'm on the phone.
Indelible Raven: Awesome, yeah it's been having problems all day, for a while. So have you done this before?
Massively Parallel Llama: I have done one before, yes.
Indelible Raven: On here?
Massively Parallel Llama: On here, yeah.
Indelible Raven: Awesome, I do it just a little bit different. I do the same interview, like 35 minutes. But also at the end I can give you some feedback verbally as well. I do that with most people. So if you want to hear it just let me know. Other than that, let's see where you're at in your career.
Massively Parallel Llama: In my career I'm about 2 or 3 years in experience.
Indelible Raven: Alright, so I used to work at Google, but I don't know

`Go`

, and I feel bad for that.
Massively Parallel Llama: That's okay, it's a language you'll be able to understand.
Indelible Raven: No it's just like: if you work at Google, you really should know `Go`

. It's like their language. Awesome well I'll just start you out with something, and ideally we'll get into a second part of the same question, so just keep that in mind.
[2:11]
Indelible Raven: I'm going to give you a bunch of points on a 2 dimensional graph. It can either give its own class or do a pair. So it would just be `[x,y]`

. It would be a list. `Point[]`

of `points`

, where it's something like `[[1,2], [3,-1], [2,1], [2,3]]`

, right? And then I'll give you a point, the `vertex`

, so in this case it's not going to be `[0,0]`

, it's going to be `[2,2]`

or anything I give. And I'm going to give you int `k = 2`

. So what I want you to do is take the list of points on the graph and find the closest ones that are around the vertex and I want you to return the `k`

closest.
Massively Parallel Llama: Okay so let me just reiterate the problem so I have an understanding of it. You're going to give me a list of `[x,y]`

coordinates and you'll give me a vertex, some ideal `[x,y]`

coordinate and you need me to return the `k`

closest `[x,y]`

coordinate in the original `point[]`

.
Indelible Raven: Yes, to the vertex
Massively Parallel Llama: Okay, what would you define as the closet?
Indelible Raven: Distance wise, so kind of the general euclidean distance.
Massively Parallel Llama: Cool. And these points, they're not in any sorted order are they?
Indelible Raven: No.
Massively Parallel Llama: No. Okay. So, my first intuition to this problem is to go over the `[x,y]`

coordinates and for every single `[x,y]`

coordinate compute the `distance`

to the `vertex`

and store that in some sort of...via `min heap`

. And then once we have all of the elements stored in the `min heap`

, we can pop out the top `k`

from the `heap`

and that will be the answer.
Indelible Raven: Okay.
Massively Parallel Llama: Does that make sense?
Indelible Raven: Yeah that sounds reasonable.
[6:19]
Massively Parallel Llama: So let me try to code this up. At the same time, before I start, maybe we can just discuss time complexity. Before I start coding maybe I'll think of something better? So the first step is to compute all `distances`

to the `vertex`

and that is going to take `O(n)`

because we have to visit every single element and once we computed, we're pushing it onto the `heap`

. Now when you insert something into a `heap`

, we need to make sure that the `heap`

....so inserting into a `heap`

is actually constant, we don't need to worry about that yet. And then we need to perform `k`

pops from the `heap`

, and when you do a pop from the `heap`

we need to figure out the next minimum element from the `heap`

, so that's going to be `O(k)`

, because we need to pop `k`

times and the bubble up operation can be performed in `O(klog(n))`

because `heap`

has `n`

elements. Yeah so that's one solution so if we just look at the worst case it's still `O(n)`

, that's out dominant time complexity right there. And of course we'll have to use space, our space is going to be our `heap`

which is also `O(n)`

. Yeah so we could have another solution if we didn't have access to space, we just had to do it without using any extra memory, you could just sort of do a `O(n)`

....that would be `O(n^2)`

solution or….I can't think of a way where….
Indelible Raven: So you're worried about space right?
Massively Parallel Llama: Yes.
Indelible Raven: How much space are you using?
Massively Parallel Llama: `O(n)`

Indelible Raven: Right, so how many points do I want to return?
Massively Parallel Llama: Oh yes, so it could just be `k`

. We need to only keep a `heap`

of size `k`

, everything else larger than `k`

we can just throw out.
Indelible Raven: Yeah, so how does that change the complexity overall?
Massively Parallel Llama: Space complexity goes to `O(k)`

. And time complexity becomes `O(klog(k))`

for the pop.
Indelible Raven: I'm not entirely sure your complexity is right. It is for pop, but not the push. Every time you insert it needs to keep track of something which means it re-heapifies, that's `O(log(n))`

so it should be `O(nlog(k))`

.
Massively Parallel Llama: Right, so if you already have a heap and you're pushing onto a heap you have to figure out what the next minimum element is so you would have to heapify. So yes this would also be `O(klog(k))`

. Every time you're inserting or removing from the heap you have to find the next minimum so it would both be `O(klog(k))`

.
Indelible Raven: No it's actually `O(log(k))`

but's it's `n`

times that.
Massively Parallel Llama: Right, so for the push right, the push will be...we're going to do that `n`

times and it's `O(log(k))`

right. But for pop we don't actually need to do anything here because we already have the `k`

minimum elements and we can just return so this just goes away. So if that's the case our dominant time complexity will be the push.
Indelible Raven: Try it out
[12:24]
Massively Parallel Llama: Sure let's try it out. Okay so let's call this...for the `points`

I'm just going to define a structure...it's going to have 2 elements and our input is going to be `Points`

and we're also going to be given…
Indelible Raven: Wait really, it works right to left in this case? That's weird
Massively Parallel Llama: Yeah, it takes a while. And let's say we're going to return a list of a `[]points`

of of `k`

closest points. So I guess while I was checking this another obvious way of doing this was compute all distances and just sort and once you sort you have your `k`

closest element sorted already. I guess another clarification: would you like the original `point`

or do you just want the `distance`

?
Indelible Raven: I don't want the `vertex`

in the output but if there is a `point`

on the `vertex`

then that could be returned. So if that `[]point`

had `[2,2]`

in it then yes.
Massively Parallel Llama: Okay so just a special case if it's exactly the `vertex`

then you want to return the `point`

otherwise the `distances`

is fine?
Indelible Raven: No I want all the `points`

.
Massively Parallel Llama: Oh okay, you want all the `points`

.
Indelible Raven: Obviously a `point`

on the `vertex`

would be `0`

`distance`

so it would be the closest.
Massively Parallel Llama: No I was wondering if the return should just be list of `distances`

or the actual `points`

.
Indelible Raven: No, `points`

.
Massively Parallel Llama: Okay. So if that's the case I'm just going to add an extra element to our `point`

`struct`

and this could definitely be an option element that just has the `distance`

.
Indelible Raven: Actually I want you to leave the `point`

class alone.
Massively Parallel Llama: Okay let's just figure it out as we move forward then. How we can figure out which `point`

to return after we have our `heap`

. Alright let's just figure out how to compute the `distances`

first, we're going to go through all the `points`

. And for each `point`

we're going to want to calculate the `distance`

. So let's just assume we have a function that can do that for us. Once we have the `distance`

we want to push it on to some sort of `heap`

. We need some sort of `heap`

, in `Go`

, if you want to use a `heap`

you have to use a priority queue and you have to write out the interface for it, so is it okay if I assume that I have a `heap`

?
Indelible Raven: No, I want to see the priority queue. It's that way in most languages.
Massively Parallel Llama: You have to implement all of the interfaces for it.
Indelible Raven: What do you mean?
Massively Parallel Llama: The priority queue is offered as an interface and you have to implement the `pop`

and the `push`

based on the `struct`

and how you want to compare each element on the `heap`

.
Indelible Raven: That's interesting. Then assume you can, but come up with...I don't know if `Go`

has something like a comparative function, like a `lambda function`

for it.
Massively Parallel Llama: We don't really need a priority queue for this, a priority queue is more involved than a `heap`

. We could write one.
Indelible Raven: Just assume you have it.
Massively Parallel Llama: So we have to initialize it somehow, and once we have that we can actually give it a size. We only want `k`

elements in the `heap`

and over here we can just do `heap.push()`

Indelible Raven: No, I want you to handle the `k`

yourself.
Massively Parallel Llama: You want me to handle the `k`

myself?
Indelible Raven: Like I don't want it to limit the `k`

, I want you to do that.
Massively Parallel Llama: Okay. So if that's the case we need to figure out what the size of the `heap`

is, assuming we can get the size of the `heap`

by looking at the length of it. If this is the case then we need to find the element that's the largest in the `heap`

.
Indelible Raven: So is there a data structure that will let you find it in `O(1)`

?
Massively Parallel Llama: That will let you find the largest?
Indelible Raven: Data structure or algorithm.
Massively Parallel Llama: So let's assume our `heap`

is represented as an array. If it is, we can actually keep track of each index in a map.
Indelible Raven: I don't want to do it that way. Assume your `heap`

works with `push`

and `pop`

and stuff, think about what the `heap`

really is. What kind of `heap`

is this?
Massively Parallel Llama: It's a `min heap`

? Actually it doesn't have to be a `min heap`

, since we're only storing the `k`

element that we're going to return, it could actually be a `max heap`

and if it's a `max heap`

we can just find the largest element.
Indelible Raven: There you go.
Massively Parallel Llama: Yeah so if our `heap`

is a size `k`

, then what we need to do is a `heap.pop()`

and that will get rid the largest element and then we can just `push()`

….even if it is full and we turn directly to a `pop()`

. So let's assume we have some sort of `peek()`

function so we can see what the largest element is and if its larger than our current `distance`

, then only we need to do a `heap.pop()`

and we need to `push()`

our new element. So that's sort of our special case and if it's not then we don't have to worry about the `dist`

, we can just continue and we just need to have an `else`

here to take care of the case when our `heap`

isn't full yet to go ahead and just push.
Indelible Raven: So your `heap`

is just `distances`

, right?
Massively Parallel Llama: Right now yes, it's just the `distances`

.
Indelible Raven: So how are you going to get the `k`

closest out of that?
Massively Parallel Llama: Right so I think it's best to define the new object or `struct`

`heapNode`

, and in this `heapNode`

we can actually have a `Dist`

and we can have a `Point`

and for our comparative function for our heap, we don't care about the point we're just worried about the `distance`

when we do the `pop()`

and `push()`

. Which means we just directly insert the `distance`

and we should put it in this `struct`

here. Okay so now that we have our `heap`

, we should have `k`

elements and I guess how you kind of write a `while`

loop in `Go`

, there's no `while`

loops so you just write a `for`

instead. So while our length of `heap`

is not equal to `0`

, we keep `pop()`

to our `ans`

, and storing the `point`

and returning it. And you mentioned a little bit earlier in the conversation that you want the `vertex`

to be the first element in our answer?
Indelible Raven: No, I literally just want the `k`

closest.
Massively Parallel Llama: Cool so we just append this to our list, and we don't care about the `distances`

anymore we just want the `point`

?
Indelible Raven: Just the `point`

.
Massively Parallel Llama: And after this is done we'll have the `k`

element and we can return our answer.
Indelible Raven: Okay, there's more to this but I want to stop here and I want to take this to a different approach. This obviously works for `k`

if I give you a list of points. What if my list is roughly a petabyte worth of `points`

?
Massively Parallel Llama: So our list of `points`

is huge? If that's the case can we assume that the list of `points`

coming in to program or function is element by element and not just as a giant list?
Indelible Raven: You could. That would work.
Massively Parallel Llama: That's a big assumption right. But this function would potentially work.
Indelible Raven: I mean it generally would work, but it's a petabyte of points. I know `Go`

is suppose to be fast but how long would that take?
Massively Parallel Llama: So if its a petabyte of `points`

just computing of every `distance`

in our list is going to take some time. So if our `point`

array is too big, we can just throw more resources at it and divide up the `point[]`

to a more manageable chunks and send it to multiple machines I guess. And after they've all figured out their `k`

lowest, we can do a `merge`

operation to find out the `k`

lowest afterwards.
Indelible Raven: Sorry could you repeat that?
Massively Parallel Llama: Yeah so my first thought is if it's too many `points`

we divide up the `points`

to manageable chunks and we divide up the work to multiple machines that hey here's a set of `points`

, find the `k`

closest and once every machine has done its work we have a `merge`

operation to find what the actual `k`

closest is at the end. That's my initial thought process if the `points[]`

is too big.
Indelible Raven: I mean that makes sense but what is that?
Massively Parallel Llama: What is that? Like just a term for it? It kind of reminds of `MapReduce`

a little bit?
Indelible Raven: There you go, `MapReduce`

.
Massively Parallel Llama: Yeah you have a bunch of workers that are parsing various chunks and combining it afterwards once their done.
Indelible Raven: So that's kind of what I'm looking for and how you would set up the `MapReduce`

and stuff like that.
[34:30]
Indelible Raven: Let's stop here that way I can give you your feedback if you want.
Massively Parallel Llama: Sure sure.
Indelible Raven: Usually I evaluate people's ability to code, especially when it comes to style guide. But truth be told I still don't really know how `Go`

works or what style guide would look like. But based on general style guide it looks good. Still confusing but….
Massively Parallel Llama: Just to touch on `Go`

a bit. `Go`

actually forces you to follow style. If you look at every codebase it'll look identical no sort of tabs versus spaces. So for example if I had line 48 on line 49 this actually won't compile. So it has to look like this, `Go`

sort of forces the same sort of standards on everybody.
Indelible Raven: I really wish you got those semicolons, drives me nuts that there aren't semicolons.
Massively Parallel Llama: I think in Go they are optional, you can use them if you want.
Indelible Raven: I really wish you had your edge cases. Say hey, what happens when `k`

is less than `0`

or if `k`

is greater than the number of `points`

. And actually wrote out test cases and error checking in the code. That's something I look for. Your algorithm design is pretty good, you're able to come up with a semi-optimal solution and I gave you a very very subtle hint and you were able to come up with a much more optimal solution as a result. I would have liked to see you go faster, time is not your solution in an interview. Ideally when we come up with these questions we plan for us to solve it in 15 minutes and since we know these questions it's going to take longer for the person but I still would have liked to see if go a little bit faster that way we could spend a little more time talking about `MapReduce`

and what the `Map`

would look like and how you would Reduce it. So I used to work at Google and I work at Microsoft. There's a different mentally when you work with an excessive amount of data. You don't think about how it works in one program you don't think about how it works linearly. Your brain automatically switches to: how do I spread this out to thousands of machines. I wish you went there first, instead of saying: hey we're just going to use a datastream and it'll work perfectly fine. When I switch to that, when I come up with something new the obvious answer is not the right answer. It may use some of the same code but it's not going to use the exact same code, so your first thought should not have been that. I kind of wish you started talking about `MapReduce`

instead of the general concept of one. It kind of gives me the impression that you never really use one right?
Massively Parallel Llama: I honestly haven't.
Indelible Raven: The reason I bring this up is, at the two companies I worked for, it's big there. If you're interviewing for me there, I kind of expect that. If you're not interviewing there then that feedback is kind of pointless but take it as your want. Overall I think you did well, I would have liked to see you go faster and talk a little more but I think you did well. Took you a little while to figure out to use a `max heap`

for sure. It's really all I have, you did pretty good. Do you have any questions?
Massively Parallel Llama: In terms of coding fluency, I need to just solve more problems and get faster before I actually go into the real interviews.
Indelible Raven: Not even faster the speed at which you're coding is fine. You need to come up with a way to make it fluid, make one step into the next into next to the end. I know that doesn't make sense in real life coding, but it has to be a very fluid operation and if it is, you're going to go fast. You don't need to speed up how you're coding, you just need to speed up your fluidity.
Massively Parallel Llama: Yeah that definitely makes sense.
Indelible Raven: Yeah so you talked a little about the scale. Do you think, when we started off with the problem, maybe I should have asked you how big the points array could be?
Massively Parallel Llama: I would have told you it would have fit in an array. I wanted this answer first because this teaches me how well you code. `MapReduce`

and this are two different things. So you would have still ended up with this but then I would have shifted to `MapReduce`

. What I wanted to see was your mindset shift once you knew what the data was. It's always good to ask how much data there is, but I would have not told you a petabyte at the beginning.
Indelible Raven: Yeah I don't think I have anything. This was great. This was a great experience looking back.
Massively Parallel Llama: Thanks for interviewing with me. I think you did well. I would have definitely moved you on-site, I might have even said yes to a hire. So I have feedback, some is negative but that's just other stuff you can improve on.
Indelible Raven: I'm going to jump off, write your feedback. It's probably not going to be much because you did well. Just going to hit on the speed part. That's all I have so I wish you the best in the future and good luck!
Massively Parallel Llama: Yeah thanks man, thanks a lot. Bye
Indelible Raven: Bye.