Indelible Raven: Can you hear me?
Massively Parallel Llama: Hi I can hear you, I'm on the phone.
Indelible Raven: Awesome, yeah it's been having problems all day, for a while. So have you done this before?
Massively Parallel Llama: I have done one before, yes.
Indelible Raven: On here?
Massively Parallel Llama: On here, yeah.
Indelible Raven: Awesome, I do it just a little bit different. I do the same interview, like 35 minutes. But also at the end I can give you some feedback verbally as well. I do that with most people. So if you want to hear it just let me know. Other than that, let's see where you're at in your career.
Massively Parallel Llama: In my career I'm about 2 or 3 years in experience.
Indelible Raven: Alright, so I used to work at Google, but I don't know Go, and I feel bad for that.
Massively Parallel Llama: That's okay, it's a language you'll be able to understand.
Indelible Raven: No it's just like: if you work at Google, you really should know Go. It's like their language. Awesome well I'll just start you out with something, and ideally we'll get into a second part of the same question, so just keep that in mind.
Massively Parallel Llama
Indelible Raven: I'm going to give you a bunch of points on a 2 dimensional graph. It can either give its own class or do a pair. So it would just be [x,y]. It would be a list. Point of points, where it's something like [[1,2], [3,-1], [2,1], [2,3]], right? And then I'll give you a point, the vertex, so in this case it's not going to be [0,0], it's going to be [2,2] or anything I give. And I'm going to give you int k = 2. So what I want you to do is take the list of points on the graph and find the closest ones that are around the vertex and I want you to return the k closest.
Massively Parallel Llama: Okay so let me just reiterate the problem so I have an understanding of it. You're going to give me a list of [x,y] coordinates and you'll give me a vertex, some ideal [x,y] coordinate and you need me to return the k closest [x,y] coordinate in the original point.
Indelible Raven: Yes, to the vertex
Massively Parallel Llama: Okay, what would you define as the closet?
Indelible Raven: Distance wise, so kind of the general Euclidean distance.
Massively Parallel Llama: Cool. And these points, they're not in any sorted order are they?
Indelible Raven: No.
Massively Parallel Llama: No. Okay. So, my first intuition to this problem is to go over the [x,y] coordinates and for every single [x,y] coordinate compute the distance to the vertex and store that in some sort of...via min heap. And then once we have all of the elements stored in the min heap, we can pop out the top k from the heap and that will be the answer.
Indelible Raven: Okay.
Massively Parallel Llama: Does that make sense?
Indelible Raven: Yeah that sounds reasonable.
Massively Parallel Llama
Massively Parallel Llama: So let me try to code this up. At the same time, before I start, maybe we can just discuss time complexity. Before I start coding maybe I'll think of something better? So the first step is to compute all distances to the vertex and that is going to take O(n) because we have to visit every single element and once we computed, we're pushing it onto the heap. Now when you insert something into a heap, we need to make sure that the heap....so inserting into a heap is actually constant, we don't need to worry about that yet. And then we need to perform k pops from the heap, and when you do a pop from the heap we need to figure out the next minimum element from the heap, so that's going to be O(k), because we need to pop k times and the bubble up operation can be performed in O(klog(n)) because heap has n elements. Yeah so that's one solution so if we just look at the worst case it's still O(n), that's out dominant time complexity right there. And of course we'll have to use space, our space is going to be our heap which is also O(n). Yeah so we could have another solution if we didn't have access to space, we just had to do it without using any extra memory, you could just sort of do a O(n)....that would be O(n^2) solution or….I can't think of a way where….
Indelible Raven: So you're worried about space right?
Massively Parallel Llama: Yes.
Indelible Raven: How much space are you using?
Massively Parallel Llama: O(n)
Indelible Raven: Right, so how many points do I want to return?
Massively Parallel Llama: Oh yes, so it could just be k. We need to only keep a heap of size k, everything else larger than k we can just throw out.
Indelible Raven: Yeah, so how does that change the complexity overall?
Massively Parallel Llama: Space complexity goes to O(k). And time complexity becomes O(klog(k)) for the pop.
Indelible Raven: I'm not entirely sure your complexity is right. It is for pop, but not the push. Every time you insert it needs to keep track of something which means it re-heapifies, that's O(log(n)) so it should be O(nlog(k)).
Massively Parallel Llama: Right, so if you already have a heap and you're pushing onto a heap you have to figure out what the next minimum element is so you would have to heapify. So yes this would also be O(klog(k)). Every time you're inserting or removing from the heap you have to find the next minimum so it would both be O(klog(k)).
Indelible Raven: No it's actually O(log(k)) but's it's n times that.
Massively Parallel Llama: Right, so for the push right, the push will be...we're going to do that n times and it's O(log(k)) right. But for pop we don't actually need to do anything here because we already have the k minimum elements and we can just return so this just goes away. So if that's the case our dominant time complexity will be the push.
Indelible Raven: Try it out
Massively Parallel Llama
Massively Parallel Llama: Sure let's try it out. Okay so let's call this...for the points I'm just going to define a structure...it's going to have 2 elements and our input is going to be Points and we're also going to be given…
Indelible Raven: Wait really, it works right to left in this case? That's weird
Massively Parallel Llama: Yeah, it takes a while. And let's say we're going to return a list of a points of of k closest points. So I guess while I was checking this another obvious way of doing this was compute all distances and just sort and once you sort you have your k closest element sorted already. I guess another clarification: would you like the original point or do you just want the distance?
Indelible Raven: I don't want the vertex in the output but if there is a point on the vertex then that could be returned. So if that point had [2,2] in it then yes.
Massively Parallel Llama: Okay so just a special case if it's exactly the vertex then you want to return the point otherwise the distances is fine?
Indelible Raven: No I want all the points.
Massively Parallel Llama: Oh okay, you want all the points.
Indelible Raven: Obviously a point on the vertex would be 0 distance so it would be the closest.
Massively Parallel Llama: No I was wondering if the return should just be list of distances or the actual points.
Indelible Raven: No, points.
Massively Parallel Llama: Okay. So if that's the case I'm just going to add an extra element to our point struct and this could definitely be an option element that just has the distance.
Indelible Raven: Actually I want you to leave the point class alone.
Massively Parallel Llama: Okay let's just figure it out as we move forward then. How we can figure out which point to return after we have our heap. Alright let's just figure out how to compute the distances first, we're going to go through all the points. And for each point we're going to want to calculate the distance. So let's just assume we have a function that can do that for us. Once we have the distance we want to push it on to some sort of heap. We need some sort of heap, in Go, if you want to use a heap you have to use a priority queue and you have to write out the interface for it, so is it okay if I assume that I have a heap?
Indelible Raven: No, I want to see the priority queue. It's that way in most languages.
Massively Parallel Llama: You have to implement all of the interfaces for it.
Indelible Raven: What do you mean?
Massively Parallel Llama: The priority queue is offered as an interface and you have to implement the pop and the push based on the struct and how you want to compare each element on the heap.
Indelible Raven: That's interesting. Then assume you can, but come up with...I don't know if Go has something like a comparative function, like a lambda function for it.
Massively Parallel Llama: We don't really need a priority queue for this, a priority queue is more involved than a heap. We could write one.
Indelible Raven: Just assume you have it.
Massively Parallel Llama: So we have to initialize it somehow, and once we have that we can actually give it a size. We only want k elements in the heap and over here we can just do heap.push()
Indelible Raven: No, I want you to handle the k yourself.
Massively Parallel Llama: You want me to handle the k myself?
Indelible Raven: Like I don't want it to limit the k, I want you to do that.
Massively Parallel Llama: Okay. So if that's the case we need to figure out what the size of the heap is, assuming we can get the size of the heap by looking at the length of it. If this is the case then we need to find the element that's the largest in the heap.
Indelible Raven: So is there a data structure that will let you find it in O(1)?
Massively Parallel Llama: That will let you find the largest?
Indelible Raven: Data structure or algorithm.
Massively Parallel Llama: So let's assume our heap is represented as an array. If it is, we can actually keep track of each index in a map.
Indelible Raven: I don't want to do it that way. Assume your heap works with push and pop and stuff, think about what the heap really is. What kind of heap is this?
Massively Parallel Llama: It's a min heap? Actually it doesn't have to be a min heap, since we're only storing the k element that we're going to return, it could actually be a max heap and if it's a max heap we can just find the largest element.
Indelible Raven: There you go.
Massively Parallel Llama: Yeah so if our heap is a size k, then what we need to do is a heap.pop() and that will get rid the largest element and then we can just push()….even if it is full and we turn directly to a pop(). So let's assume we have some sort of peek() function so we can see what the largest element is and if its larger than our current distance, then only we need to do a heap.pop() and we need to push() our new element. So that's sort of our special case and if it's not then we don't have to worry about the dist, we can just continue and we just need to have an else here to take care of the case when our heap isn't full yet to go ahead and just push.
Indelible Raven: So your heap is just distances, right?
Massively Parallel Llama: Right now yes, it's just the distances.
Indelible Raven: So how are you going to get the k closest out of that?
Massively Parallel Llama: Right so I think it's best to define the new object or struct heapNode, and in this heapNode we can actually have a Dist and we can have a Point and for our comparative function for our heap, we don't care about the point we're just worried about the distance when we do the pop() and push(). Which means we just directly insert the distance and we should put it in this struct here. Okay so now that we have our heap, we should have k elements and I guess how you kind of write a while loop in Go, there's no while loops so you just write a for instead. So while our length of heap is not equal to 0, we keep pop() to our ans, and storing the point and returning it. And you mentioned a little bit earlier in the conversation that you want the vertex to be the first element in our answer?
Indelible Raven: No, I literally just want the k closest.
Massively Parallel Llama: Cool so we just append this to our list, and we don't care about the distances anymore we just want the point?
Indelible Raven: Just the point.
Massively Parallel Llama: And after this is done we'll have the k element and we can return our answer.
Indelible Raven: Okay, there's more to this but I want to stop here and I want to take this to a different approach. This obviously works for k if I give you a list of points. What if my list is roughly a petabyte worth of points?
Massively Parallel Llama: So our list of points is huge? If that's the case can we assume that the list of points coming in to program or function is element by element and not just as a giant list?
Indelible Raven: You could. That would work.
Massively Parallel Llama: That's a big assumption right. But this function would potentially work.
Indelible Raven: I mean it generally would work, but it's a petabyte of points. I know Go is suppose to be fast but how long would that take?
Massively Parallel Llama: So if its a petabyte of points just computing of every distance in our list is going to take some time. So if our point array is too big, we can just throw more resources at it and divide up the point to a more manageable chunks and send it to multiple machines I guess. And after they've all figured out their k lowest, we can do a merge operation to find out the k lowest afterwards.
Indelible Raven: Sorry could you repeat that?
Massively Parallel Llama: Yeah so my first thought is if it's too many points we divide up the points to manageable chunks and we divide up the work to multiple machines that hey here's a set of points, find the k closest and once every machine has done its work we have a merge operation to find what the actual k closest is at the end. That's my initial thought process if the points is too big.
Indelible Raven: I mean that makes sense but what is that?
Massively Parallel Llama: What is that? Like just a term for it? It kind of reminds of MapReduce a little bit?
Indelible Raven: There you go, MapReduce.
Massively Parallel Llama: Yeah you have a bunch of workers that are parsing various chunks and combining it afterwards once their done.
Indelible Raven: So that's kind of what I'm looking for and how you would set up the MapReduce and stuff like that.
Massively Parallel Llama
Indelible Raven: Let's stop here that way I can give you your feedback if you want.
Massively Parallel Llama: Sure sure.
Indelible Raven: Usually I evaluate people's ability to code, especially when it comes to style guide. But truth be told I still don't really know how Go works or what style guide would look like. But based on general style guide it looks good. Still confusing but….
Massively Parallel Llama: Just to touch on Go a bit. Go actually forces you to follow style. If you look at every codebase it'll look identical no sort of tabs versus spaces. So for example if I had line 48 on line 49 this actually won't compile. So it has to look like this, Go sort of forces the same sort of standards on everybody.
Indelible Raven: I really wish you got those semicolons, drives me nuts that there aren't semicolons.
Massively Parallel Llama: I think in Go they are optional, you can use them if you want.
Indelible Raven: I really wish you had your edge cases. Say hey, what happens when k is less than 0 or if k is greater than the number of points. And actually wrote out test cases and error checking in the code. That's something I look for. Your algorithm design is pretty good, you're able to come up with a semi-optimal solution and I gave you a very very subtle hint and you were able to come up with a much more optimal solution as a result. I would have liked to see you go faster, time is not your solution in an interview. Ideally when we come up with these questions we plan for us to solve it in 15 minutes and since we know these questions it's going to take longer for the person but I still would have liked to see if go a little bit faster that way we could spend a little more time talking about MapReduce and what the Map would look like and how you would Reduce it. So I used to work at Google and I work at Microsoft. There's a different mentally when you work with an excessive amount of data. You don't think about how it works in one program you don't think about how it works linearly. Your brain automatically switches to: how do I spread this out to thousands of machines. I wish you went there first, instead of saying: hey we're just going to use a datastream and it'll work perfectly fine. When I switch to that, when I come up with something new the obvious answer is not the right answer. It may use some of the same code but it's not going to use the exact same code, so your first thought should not have been that. I kind of wish you started talking about MapReduce instead of the general concept of one. It kind of gives me the impression that you never really use one right?
Massively Parallel Llama: I honestly haven't.
Indelible Raven: The reason I bring this up is, at the two companies I worked for, it's big there. If you're interviewing for me there, I kind of expect that. If you're not interviewing there then that feedback is kind of pointless but take it as your want. Overall I think you did well, I would have liked to see you go faster and talk a little more but I think you did well. Took you a little while to figure out to use a max heap for sure. It's really all I have, you did pretty good. Do you have any questions?
Massively Parallel Llama: In terms of coding fluency, I need to just solve more problems and get faster before I actually go into the real interviews.
Indelible Raven: Not even faster the speed at which you're coding is fine. You need to come up with a way to make it fluid, make one step into the next into next to the end. I know that doesn't make sense in real life coding, but it has to be a very fluid operation and if it is, you're going to go fast. You don't need to speed up how you're coding, you just need to speed up your fluidity.
Massively Parallel Llama: Yeah that definitely makes sense.
Indelible Raven: Yeah so you talked a little about the scale. Do you think, when we started off with the problem, maybe I should have asked you how big the points array could be?
Massively Parallel Llama: I would have told you it would have fit in an array. I wanted this answer first because this teaches me how well you code. MapReduce and this are two different things. So you would have still ended up with this but then I would have shifted to MapReduce. What I wanted to see was your mindset shift once you knew what the data was. It's always good to ask how much data there is, but I would have not told you a petabyte at the beginning.
Indelible Raven: Yeah I don't think I have anything. This was great. This was a great experience looking back.
Massively Parallel Llama: Thanks for interviewing with me. I think you did well. I would have definitely moved you on-site, I might have even said yes to a hire. So I have feedback, some is negative but that's just other stuff you can improve on.
Indelible Raven: I'm going to jump off, write your feedback. It's probably not going to be much because you did well. Just going to hit on the speed part. That's all I have so I wish you the best in the future and good luck!
Massively Parallel Llama: Yeah thanks man, thanks a lot. Bye
Indelible Raven: Bye.