Interview Transcript
Immutable Automaton: Hello!
Professor Squirrel: Hi how's it going?
Immutable Automaton: Hi, good. All right. So uh, we did two questions last time, right.
Professor Squirrel: Sounds about right.
Immutable Automaton: Yeah. Just to confirm you are interviewing for WeMo.
Professor Squirrel: Actually, right now Google, and now it's fuchsia at Google.
Immutable Automaton: Sorry what at Google?
Professor Squirrel: Fuchsia . It's the projects operating system.
Immutable Automaton: Oh! Dot OS.
Professor Squirrel: Yeah, yeah.
Immutable Automaton: I see. Okay, cool. All right. So I believe last time you did really well, I'm looking at the previous feedback. Yeah. So today, I have a question for you. Would you like to do something? Do you have a specific type that you want to try? We did... What did we do last time?
Professor Squirrel: I'm Afraid I don't remember.
Immutable Automaton: Yes. We did the K smallest element from two arrays. And then we did whether your parking pass to traffic cones?
Professor Squirrel: Sure. Yeah, I remember that one. Okay.
Immutable Automaton: So any type, like dynamic programming, or graphs, trees?
Professor Squirrel: Anything really, graphs are always fun, but like basically anything, just sort of mix it up, I guess.
Immutable Automaton: Okay, let's try this one. And see how it goes. Okay, so let's say you have you have a website that allows users to find the 10 nearest restaurants to your location.
Professor Squirrel: Okay.
Immutable Automaton: Okay. Today, we're going to implement a back end piece that, that powers this. So given a list of 100 million restaurants, write a function that takes a user's location and returns the 10 nearest. So for the purpose of this question, 100 million will be n and k. So 10 will be K.
Professor Squirrel: Cool. And how are we representing these restaurants today have like some just unique identifier plus their location?
Immutable Automaton: Right now they are just coordinates. Restaurants are x and y's and x and y are bounded by negative one and one.
Professor Squirrel: Okay. I see. Okay.
Immutable Automaton: What language will you be using?
Professor Squirrel: Probably rust if that's okay.
Immutable Automaton: Okay. Alright rust.
Professor Squirrel: So I guess it's really the location, this restaurant location. And the other input is just the user's location. Right, 100 million restaurants for the K nearest restaurants? Umm Sure. For this, I think... So we've kind of abstracted with the details of this being restaurants and locations and like Euclidean distance. The basic concept is we want the 10 smallest values from 100 million values. Roughly, I mean, there's really we're going to take differences between the user location and the restaurant location, but we'll call that like the restaurants like value basically, it's the thing we're ordering by and trying to find the 10 smallest. We can sort the whole thing. I mean, it's, it's a pretty big thing to sort so we could do is a naive sort. Sorts will take n log n time to sort it and then describe the first k things guessing in general though we expecting k to be rather small. So, it would be preferable to do something like a left to right scan with a heap of size or a priority queue of size k. And then we can get it down to the log k factor and the memory also Yeah, I mean, depending on if this is in place, you need some extra memory up to like, O(n) memory, whereas here you can just say O(k) memory. So all in all, I think the heap is a lot more favourable in terms of performance. And that way also you can write it streaming. So even if n was like, larger than 100 million it didn't fit in memory or something, you could sort of also do something that took like disk IO or network IO, and adapt this code to run that way.
Immutable Automaton: Do you think that it can fit in memory?
Professor Squirrel: In this case, we're fine. 100 megs is not so much memory, or, you know, some factor that, but it's a fair bit of memory. So it depends on sort of, I mean, most devices, this will fit a memory is the short answer. But it's large ish. And yeah, so this is an approach I like I get the feeling is maybe something like more? No, no, yeah, this sounds good. Because there's this like, clever, Quick Select thing, which I've never quite looked into in detail, but that I think, applies to getting some single element of the rank k, okay. And if we try to get the 10 smallest, they won't actually help us because we'd have to run it. K times. So yeah, anyway, this this key thing, I think, is pretty promising. Do you want to go this?
Immutable Automaton: Yes. So this is, this is this is a good solution N log(k). But I have like to ask you, suppose you're in Tokyo, and you're looking for 10 nearest restaurants? Would you consider? Would you bother checking the distance between you and some restaurant in London?
Professor Squirrel: Probably not, probably not. So you're saying that heuristically, we can kind of maybe do better? Like the intentions by filtering, like, we have 100 million restaurants? It's quite a lot. So I'm not trying to say basically, heuristically, we can guess at sort of a threshold. And look only Well, I guess. Interesting. So so we're sorry, that was like, if we're thinking about this, in terms of designing a system that lasts a long time, we might have some better ways to represent this list of 100 million restaurants.
Immutable Automaton: That's correct.
Professor Squirrel: And then we can take advantage of that. Nothing springs to mind as far as how to do that. But you can kind of explore that if you're interested.
Immutable Automaton: Yes because you have to understand that you have to you only get the user's location during query time. But your 100 million restaurants, they don't change between queries. Or at least they don't change much.
Professor Squirrel: Yeah, yeah. Okay, that is astute. And but inherently sort of two dimensional is the tricky thing about this data. So it's not clear like we could, it'd be nice to sort of group it, maybe literally, just by sort of quadrants. Like, I don't know what the granularity is here, but like, buy, buy sort of, like, you know, some, like tile. Because these are presumably on the globe, like on the earth and pick some reasonable size of tile that is like, you know, the, the nearest? I don't know, like, its nearest like metropolitan area or something like...
Immutable Automaton: okay, so you're talking about something like a grid?
Professor Squirrel: Yeah, yeah. That's what I'm thinking of. Yeah.
Immutable Automaton: Okay, so you are correct to say that the granularity is a problem, because regardless of what you pick, you will end up with some parts of the world with too many empty tiles, like in the Pacific Ocean. Yes. And you have some tiles that are too saturated, like, around Manhattan?
Professor Squirrel: Sure, sure. This is true. Hmm. So perhaps there's a way to do this differently. Although it's not clear what it is. I think you even mentioned this data structure to me actually, last time, although I didn't look into it, there's some way to do like, sort of this two dimensional binary tree, like binary search tree like things?
Immutable Automaton: Did I mention this to you in the past?
Professor Squirrel: I think so at the very end of our interview, when we were talking about a car driving through pylons or something, you said, well, actually, there's this data structure. And I didn't look into it. But it's, it seems like it was very relevant here.
Immutable Automaton: Ah yes the same data structure can be used for both questions that's correct. So for the for the car driving one, this is more of an optimization. For this question. That data structure is very core.
Professor Squirrel: Unfortunately, I don't actually know anything about the data structure, besides the fact that it's intuitively sort of sorting by these two dimensions and achieving, like logarithmic lookups. I can actually come to think of it this must be how databases, I know nothing about databases, but they store like composite indices over like a pair of keys, right? And they must use something like this. Does that seem reasonable?
Immutable Automaton: It's like basically a binary search tree with more than one dimension. In a binary in a binary search tree at every node, you you either go left or right, with a smaller or bigger, right? But if you have two dimensions, then you have four ways to go.
Professor Squirrel: Sure, sure. And we could say, like, left and right, and x, and like, up and down, let's say as y. But my question is do just...how is this different from storing a single binary search tree that is indexed by the X key, like, keyed by x, and then a separate binary search tree that is keyed by y? Like, where? Obviously, it's not exactly what we're doing? Or maybe it is?
Immutable Automaton: Well, if you do that, then you're going to have to end up with a cross shaped such base. So okay, I'm in drawing mode right now.
Professor Squirrel: Sure.
Immutable Automaton: If you have a, this is this is the world, right? This is the world, your, your first binary search tree? If it's bi X, you're gonna have something on the left, and then you have something on the right, right? Then it will do it have M within this, you're going to have another two sections like that. And like that. Okay, and then you have a different binary search tree, for for the y. So you're gonna have Y like that. Are you seeing what I'm drawing?
Professor Squirrel: I'm not fully understanding and honestly, but um, yeah.
Immutable Automaton: So if you have two binary search trees and you find ,and you are able to find where the user is, in one of the binary search trees, and where the user is in the other binary search tree, you end up with a cross shaped region, where you need to process. Okay, so previously, you mentioned so I'll move this aside, previously, you mentioned that you, you want to do a grid. So a grid is like this. And the most elementary grid you can have is a two by two, right?
Professor Squirrel: Sure.
Immutable Automaton: Yeah, so you have a two by two grid. And if one grid is if one cell is oversaturated, what can you do to it?
Professor Squirrel: Umm... Let me think about that. so if some of them are saturated. You can let me say like this cell is much heavier than this one. And this one, then you well, on like a binary search tree that's just like the left, right split, you would rebalance, to sort of in some sense, push some of the mass over to the right. But you actually have options. You can also sort of rebalance
Immutable Automaton: That is one way. You can you can move your line to the left right, you can move the line like this.
Professor Squirrel: Yeah, I see what you're saying. I see, yeah yeah.
Immutable Automaton: Okay. But that's not what I want to get to. You can repeat this process basically.
Professor Squirrel: Oh, I see. I see. Cut it, sure. Okay. Okay. So wow, interesting. Okay, I'll do it further. But quick question, is it self balancing in any way or not? Like, there's no relation.
Immutable Automaton: If the data is random, it will be it will not be it will not be self balancing. In fact, there are very many variants, but the one I'm describing right now, it has fixed coordinates. So you're dividing them four equal squares and you're subdividing them to four squares all the time. So partially or more density is just going to be deeper. And that's if you want to sell balance it you're going to end up with a slightly more complicated we are representing them, which may have some advantages, depending on use case was up to you to to explore, but I don't think there is enough time in order to do the kind of exploration
Professor Squirrel: Totally, yeah okay. So, this is starting to come together and make some sense to me like, in some sense, this like, this dynamically captures this like, grid granularity problem by sending Manhattan which is very dense is going to end up getting split, you know, 100 times
Immutable Automaton: Yes correct, but this data structure can be very, can be defeated by an adversarial data input, right. If P_i is = to 0.5 of P_i + i-1 then you're gonna have, you're gonna have a very heavy tail, and each cell will, you know what I mean?
Professor Squirrel: I mean, not quite. So P, these are the point coordinates, right? And like say x, y be the horizontal coordinate, and you're saying the, so you keep cutting magnitudes in half. So in particular is like, like a point here, and then a point halfway to point a quarter eight, so on and you're kind of you converging down to like, approaching zero. Say, you're doing it diagonally this way or something.
Immutable Automaton: Yes, then your, your, your, your tree,
Professor Squirrel: your linear depth is what you're saying, you're gonna pull this into a rough list.
Immutable Automaton: Okay, that's right.
Professor Squirrel: Okay, word right. Okay, but I guess in practice, the data probably won't look quite like that. For restaurants, for example, they're not going to adversarially decide where to open up, you know, pizza places in Manhattan. Cool. Okay, so I have a very high level of kind of grasp of the idea.
Immutable Automaton: Suppose your data is already in this data structure? How would you use it to find the tendons?
Professor Squirrel: Yes. So the data structure is somehow representing basically, these cells haven't yet decided how but it's going to tell you sort of mean, I guess there's some sometimes there's one top level node, which is this like big rectangle, like the largest outer square, and then there will be either I guess, there's two flavours of nodes, there's no that's like a leaf node. And on a really sparse data set with like, one element or something, maybe the whole tree is just one big leaf node. But otherwise, interior nodes have four children, as pointed out as like these sort of four quadrants. And in each of those four nodes, again, we'll have the pattern of as being either a leaf node, and or divided in four. Yep. And as long as we sort of, I mean, I'm not going to worry too much about things that are exactly on the boundary, we'll just sort of round them one way or the other. And, I mean, it might be subtle to worry about that for correctness of the implementation. But it also might not, because a lot of the times, you know, things to sort of work out. I don't know if that's gonna sink. But anyway, so I'm not going to think we're like too hard about about the lines that, like points that lie exactly on one of these. One of these lines for now, I'll have to say is that when you when you have a point you if the current like, let's say, you're finding a point, you're playing like the find routine, if the current square, I would say is a leaf, then the point should be in that current square, otherwise, you figure is the points, you know. And let's suppose we're also tracking the current ranges of x and y, which are 0101, whole axes to start with. Then yeah, you find the point x, y on the left half of the right half. So it's sort of like a binary search. And then you do the same thing for the y as well before recursing. And then you'll end up with a quadrant because there's sort of four ways that can be like x, x minus x plus and y minus y plus, say, and, or however you're done. Yeah. And right, and that'll fly to a conference. So you can recurse in that quadrant and record the fact that your new sort of frame of references the smaller box, and your point lies inside it. So you just keep recursing basically, until you find a leaf, like an inner indivisible box that contains your point. So that's like a lookup of a point.
Immutable Automaton: Okay.
Professor Squirrel: Right, so the question is, though, how, what do we actually want? We're trying to find something a little different here, which is like, in particular, pretty different this one? Yeah, this is interesting. So we could Well, I don't know this cheating. But if we augment the existing data structure to include a count of the number of things, it maintains, not just for the whole data structure, but at every node. So this this entire,
Immutable Automaton: You can definitely do that.
Professor Squirrel: Okay, nice, that'll help a lot. So this entire square will have like, let's say 100 million, is the count. And when we figure we're looking, let's say the target that we're searching for is the user's location. We want to find their exact location in the data structure for now. And that would put us squarely let's say in this top left coordinate to help us box. And as long as that top left box has at least 10 restaurants in it, we can throw everything else and just not worry about it. So we're now on a much smaller problem. And by doing that pretty quickly will converge to a very small problem. But the like around the edge cases where we start to hone in on, well honed in on, on tiles that are leafs but only have like, let's say, five elements in them. Because that's what we sort of gone too far, if that makes sense.
Immutable Automaton: That's not necessarily true.
Professor Squirrel: Well maybe if we can go back up, like if there's parent pointers with the nodes, and we can conserve. Yeah, I guess well, yeah.
Immutable Automaton: We have this problem, right?
Professor Squirrel: Well, I guess I have a meta question about the guarantees or like the invariance of the data structure. Is there like a maximum and a minimum number of elements for given like, leaf tile? Do you understand my question? Like is there guarantee?
Immutable Automaton: I see. So yes, physically, there should be a minimum. I mean, if you map this through real world, restaurants, you can't you can't conceivably have more than 100 restaurants with practically the same x&y coordinate, you can have that in a in a in a multi storey building. But yes, you're right, there is a there is a realistic limit. So let's assume that you can that there is such a limit but what are you going to do with this information.
Professor Squirrel: There is a limit well, so I'm hoping for something kind of like the sum factor, let's say B, which is like, like 100, or something. And I'm hoping that each leaf top contains between like B and B over two elements or B in two B elements, something vaguely like there's some sort of bound that.
Immutable Automaton: Oh, I see. No, you can't. Because like, like I said, the oceans will have zero and Manhattan will have a lot.
Professor Squirrel: That's, that is true. But how do I say, well, we don't subdivide the oceans like this, this bottom left quadrant, maybe it's like the whole Pacific Ocean roughly. Or at least, like, maybe we don't have quite this guarantee. But like, at least an upper bound would be nice to say that, like, the data structure was not filled in by, you know, bad code that has 400,000 in a single leaf, like that would be really awkward,
Immutable Automaton: Oh that would actually happen. So you see, so suppose you know, there are some small islands here. That's why there are two restaurants and a bigger island here. But let's say this is the tip of a continent, like, I don't know, you know Cape Town in South Africa, or something like that. And this, this would still be, this will still be like zero, and zero, and this could be 200,000.
Professor Squirrel: Right? But if it were 200,000, you'd have to divide it further until eventually,
Immutable Automaton: Yes correct. So these two, these two will be divided further. And these two will be divided further, but you still have a lot of zeros here.
Professor Squirrel: Right? Right. And I guess, okay, interestingly, one point you're getting at, so I'm starting to clue in that, that the thing I was thinking of, is too naive, I think this thing I was saying, Well, let's find, find the user's location and, and ignore, I was I was hoping that it would be safe to as we go into, we drill into this 400k. Since there's so many in this quadrant, I was hoping it would be safe for us to ignore this two and the seven, but as you I think, are alluding to, potentially, actually, that would be unsafe, because our, like their users location might be kind of on the edge here. And it's actually quite close. Right?
Immutable Automaton: That's Correct. So yes, but um, I want to, to prevent you from getting lost here.
Professor Squirrel: Okay sure.
Immutable Automaton: There are two cases, right? There's one case where you find the leaf node that your user is in, and your tenuous restaurants are in there. That's the happy case. And there's this an unhappy case where there are closer restaurants outside the square that you need to deal with. Okay, so let's focus on the happy case. And then we'll go back and think about how we deal with this unhappy case okay?
Professor Squirrel: Okay.
Immutable Automaton: Okay. So based on how you describe your data, that data structure, it sounds like any there's a guarantee that each self is square will have between zero and C number of restaurants whereas C is the point at which you decide that a cell has too many. And you're going to break it up into four.
Professor Squirrel: Yeah, sure. Sounds good.
Immutable Automaton: Right. So what do you think C Should be?
Professor Squirrel: That is a great question. I mean, maybe like 4000 just off the top of my head. That's a reasonable number.
Immutable Automaton: That's a very good guess, by the way.
Professor Squirrel: Okay.
Immutable Automaton: Yeah, it is. I don't know the answer. So I wrote the code and random random simulation. So for random distributions. 10,000 is a good number. Oh, nice. Okay, cool. Cool. And the reason is, because... why do you what do you think 10,000 is a good is a good number.
Professor Squirrel: Well I'm guessing each of the leaf towels is its own like allocation. So it's like when you're implementing a binary tree, let's say you would, you don't want to have the data like fully just scattered across memory addresses and completely like no locality at all. So making it basically nice and big, is good for memory locality to be able to like read, like, sort of sequential or closely packed pieces of memory. But also, if you make it too big, then you have to read the whole, like 10,000, or 20,000, sort of elements, left or right. And that's like, these are the two sort of contending trade offs, I think, roughly.
Immutable Automaton: That is very good observation. But more related to this problem, it's actually that there is not much performance gain, doing N log k, where n is 1000 versus N is a 10,000. And in fact, it's very expensive to do n equals 10, or N equals 100. Because there is setup time to know for function call or to do, and you're setting up your, your heap, right, your priority queue. So at 10,000 is a good balance where there is enough data and the computer can run can process 10,000 very quickly. But if you go beyond 10,000, then that is the linear part of the problem, right?
Professor Squirrel: Yeah
Immutable Automaton: And that's bad. So 10,000 is a good number. And the other thing is, what do you think's the probability of being unhappy case versus the sad case? Depending on C?
Professor Squirrel: I guess yeah, when C is large, it makes it extremely likely that you're in the happy case. So that's something I hadn't thought about. But that's probably quite relevant here. Because imagining, yeah, you're in, you're in some, like, you know, kind of tile which is aligned to these like, boundaries. It's, you have 10,000 things, and it's extremely unlikely that you just so happen to be at the very edge of the tile. And like, there's some closer things that are actually here.
Immutable Automaton: Yep.
Professor Squirrel: Yeah, I guess intuitively.
Immutable Automaton: Okay, let's, let's assume you have this data structure already done. I want you to write the function so find the 10 nearest or find k nearest. And let's work on the happy case and then we can talk about the unhappy case.
Professor Squirrel: Okay, I will try. Okay so find K nearest and you have these sort of data structure I mean, there's a name for this? What is the data structure called?
Immutable Automaton: Quad Tree.
Professor Squirrel: Quad tree okay.
Immutable Automaton: There was a quad node.
Professor Squirrel: Okay, reasonable, quad node. In the quad note, you said we had sort of like num points and oh, how do I say this? There's two types here. Yeah, it's children I guess. There's like a leaf, I'll make this first quad leaf is one type which has no children, but it does have
Immutable Automaton: Restaurants
Professor Squirrel: Restaurants
Immutable Automaton: Or content.
Professor Squirrel: Yeah, let's say location of some number of points, I guess the specific number of points. And there's a number of them like 1000. So something like this, I think. And I keep saying this, this is a weird data structure, but like, basically we want a list backed by an array. So I'm sure there's a library for this. Anyways, yeah, list of points. And sort of a quad branch or something. Other node is gonna have children, it's gonna have exactly four. I call them like, top left, or something like that. Quad node where quad node is like leaf or branch.
Immutable Automaton: Okay
Professor Squirrel: And then the whole country is like, a quad node. Okay. Seems reasonable. Yeah. Okay. So I think well, oh, so num points actually cared to know that I'm pointing a branch as well.
Immutable Automaton: Num points would just be the size of the vector in quad leaf, right?
Professor Squirrel: That is true, actually. Yeah. Yeah, that's redundant here, but not, but not here. So like, something like that? Yeah. Okay, cool. I'm pretty pleased with this. Question, how do we use it? So there's sort of two cases basically. And then basically there's actually another check as simple as this. I'm not sure how we do the texts. Run through how to do that. And then we just want to return the points that are closest to the target. So we can do that via a subroutine of nearest points. We could do something like a ref somewhere in here, I'm not sure but we take points by reference. But basically points can be... we need points we need K and we need target. And then this will return value, and we can just directly return that. So alright, the helper function. Okay so this is the heap thing we were talking about before, basically, I think, well, honestly, if it's small enough. Let's just leave it, I think that makes sense. I want the first or I can just make an empty heap, lets think about this.
Immutable Automaton: Are you making a max heap or a min heap?
Professor Squirrel: That is a good question. I've done something like this before. And remember, it's not the intuitive thing. So we want to find small elements. And to find small elements. We'd like a max heap so that we can kick out the larger candidates. They're less useful to us intuitively.
Immutable Automaton: Yep Good.
Professor Squirrel: So a max heap for us is binary heap. So you can clear the new heap. I want to at least have it capacity k + 1. And again, we'll just flatten it into a vector but the sort of main loop here. For p in points I'm going to Yeah, what I'll do is I'll push the point to the heap and then pop the largest.
Immutable Automaton: Okay
Professor Squirrel: I don't know if that's the clearest code but we'll try to push p, separate p by some... I'll come back to this. This is this is a flavour of what I'm trying to do here, because things are missing, I think the only things missing is the ordering on points. And that ordering is... it's this distance property. So the kind of entry points were actually data types we're gonna change this to be an alias to actually like real proper data type. We'll be able to do like a norm which is like the Euclidean norm, which is like, so we have. So literally just want square root of the squares so say like self x to, not really sure what this called, something like self y to...I think that's all we have to do there. And then we want subtraction for points. So we can take a difference which is sub for points. Yeah, so then now I think we have enough to be able to compute like distance between two points. So you guys can do that.
Immutable Automaton: That is a very interesting way calculating distance. I've never seen this. Yeah, this is correct.
Professor Squirrel: Okay. Yeah, maybe I'm overthinking it. It just felt natural to have subtraction. Yeah.
Immutable Automaton: Yes, I typically we would do the Pythagorean Theorem, you just take the x, take the x and y you square it. And for the for specifically for this question. You don't need the square root because you you just care about the ordering you don't care exactly.
Professor Squirrel: Ah neat, true true cool
Immutable Automaton: And you save a little bit of computation on that part.
Professor Squirrel: Right so this is yeah, okay. So cool. But now I have to somehow get this to work with my binary heaps. So think that yeah, we'll do something a little hacky and say that points are ordered. No, it shouldn't be so that I can have the answer heap be like two things be like some sort of distance and then the points. Again, this is not what you should do sort of in practice, because it doesn't make sense for points to be ordered. I think but I can do it anyway. So the question is, what is the distance we care about here? It's target distance p and then that's what you're pushing with. And they're gonna be popping. Yeah, this I guess is all we need. So wait, what else, max heap, max by distance? Yeah. Cool. Okay. Yeah, so I think I think this is what we wanted there.
Immutable Automaton: Okay, so I am not familiar with Rust. So I don't know what you mentioned about how points shouldn't be ordered. Like basically, you're not, you're not ordering points. You're ordering tuples right now?
Professor Squirrel: Yes. Yeah. The issue though, is that for this pair of values of the type of like, right to take care, which is a binary heap of F64 comma, points. And for this F64 points to have to implement the like order trade ord, you need to point implement ord. So the hack where I implemented on point was to allow me to write this, what you probably should do instead in production is define a data type like Heap elements, which has the distance and the points. And then you'd implement order directly on heap elements. And the ordering would look like ordering by distance and ignoring the point.
Immutable Automaton: Okay good. Awesome. Yeah. Thanks for explaining.
Professor Squirrel: Yeah, no problem.
Immutable Automaton: This is great. Okay. So one thing about your implementation of, you know, of, indiscriminately pushing things in and then popping them out. That means that you, this is C log K right? C is the capacity of the cell. So you have C of them. By doing this, you are forcing keep reordering for all C of them.
Professor Squirrel: So that is a good point. Yeah, that's pretty rough actually, okay, true. Constants are not that good. I agree.
Immutable Automaton: So, if you compare before you push it in, you compare it with the top the biggest element in your heap and you only push it in if its smaller than you will save on keeping the ordering.
Professor Squirrel: Yeah, yeah, in particular, the average case you save a lot because unless the input is like reverse sorted or something adversarial, the usual operations, just a peek and then decide you don't care and you don't need to heap at all, which is O of one which is much better. Okay. Cool.
Immutable Automaton: Okay, but it doesn't change the time complexity. So I'm gonna...
Professor Squirrel: Yeah, okay.
Immutable Automaton: Okay, so, now that you have done this, you have found it 10 nearest. Okay, now it's time to move on to the unhappy case.
Professor Squirrel: Okay, great question. Well, we also need to do this like in person somehow. But this is just the sort of bisecting thing we mentioned. Right.
Immutable Automaton: Like, where are you? Are you in the dry moment?
Professor Squirrel: Sorry? No I'm on line 59. The I did not implement the case to recurse on a sub node when we're not yet at a leaf.
Immutable Automaton: Ah, okay but that's pretty trivial, right?
Professor Squirrel: Yeah, I'm sure we can do it. I'm interested to think about this unhappy case, because I don't yet have any really solid ideas on how to do it. Yeah, I think like, one concept that seems reasonable, but I don't yet see it working is like going up somehow. Or at least checking, like neighbours in the state sort of sentence like, like we have. I guess we could probably detect, yeah, we can detect if we're in the happy case, in a way that like, usually, we will correctly detect, like, there might be some false positives and we'll say okay, we can't verify that we're in the happy case. So let's, let's do the heavy lifting. What I mean by that is...
Immutable Automaton: So how do you verify you are in the happy case?
Professor Squirrel: Yeah, the, the quick check we can do is just say, the, I said my the point the target within its bounding box its distance from any edge is greater than the farthest of these 10 points that I currently have. Because that means to me that like, I do not have to go inside the box, because everything outside the box is actually strictly worse than what I've already got.
Immutable Automaton: Good.
Professor Squirrel: But if we're really close to edge then we can't be so sure.
Immutable Automaton: Okay. That is absolutely right. And the other case is. So over here, I wouldn't do the assert that it has more than or equal to k. I'll just find, you know, up to k nearest and then you check if you have ten. If you don't have ten, then you also need to declare
Professor Squirrel: Also true, yes, yes. Good call. Good call. The question that is, what do we do about it? I guess intuitively, like supposing you're really close to the left edge, the bounding box, you just want to kind of find the thing to the left of us geometrically, which potentially involves, like, going up for a while, maybe because if, yeah, because if we're in this like deeply nested sort of structure, we might not actually physically go to the left until we've gone to our parent, like, I guess we can, we can go up to our parents. And then if we see that we were one of the left children of a parent who keep going up. And then until eventually we get to a node, which was the, on the right half of its parents, then that parent is the sort of shared ancestor, I should draw this but like the shared ancestor of these two, is what I'm picturing here is these two which is cut by this edge. I don't know if this makes sense and this is only to try to implement the functionality of going physically to the left eye.
Immutable Automaton: So what you're describing, I can clarify it for you. So that it's sure you're trying to do is you're trying to ask which nodes overlap with my circle where the radius is defined by this?
Professor Squirrel: Yeah, that's even better. Because we might also have to go down as well as left like it's if we're in your corner or something. So you wouldn't have to always go in all four directions, right? Like there's probably something about?
Immutable Automaton: Yeah, you don't always have to. So by using what I described, you could ask the root node, or you could start from the root node again and say, you know, which nodes overlap with the circle?
Professor Squirrel: Oh, cool. I see. And that's okay. That's cool. I like that. And is that a query?
Immutable Automaton: But the problem is, you might, you might overlap with some really dense square, and you end up with too many points. So you want to find a good radius?
Professor Squirrel: Oh, wow. That's interesting. Yeah, wow.
Immutable Automaton: So obviously binary, such a good radius, you end up with a bunch of nodes that overlap that circle with that radius, and then you do you repeat whatever you're done with tenuous, but this time with all the overlap nodes, and we'll have the answer.
Professor Squirrel: Okay, got it. Interesting. Is there any reason to instead do an approach where you're sort of physically just moving to the left from where you started? Like, is that something that...
Immutable Automaton: So you can but, but you may be in a big node, and if you move to your left, it might be a very dense node with lots of children, then it's gonna be complicated to figure out, you know, which is your left node, you could have multiple nodes that are directly next to your left edge.
Professor Squirrel: Also true as well. Yeah, actually, okay. Yes, I can. I can see this getting complex quickly. Okay. Okay. Yes, yes. Yes. Got it.
Immutable Automaton: So conceptually, it sounds easier. But implementation wise, it's going to end up being very similar to just finding nodes that overlap with.
Professor Squirrel: Gotcha. Yeah. Okay. Okay, cool. And how does one do that? Like? Yeah, how do you search for a range of points essentially, or like a show?
Immutable Automaton: That's a geometric problem. So I wouldn't candidates to know how to do that. And you can search for that online. There's a good here's a good one.
Professor Squirrel: Sure. Sure
Immutable Automaton: I pasted that in the chat.
Professor Squirrel: Oh I see let me find it, Gotcha.
Immutable Automaton: It's actually just the one line equation.
Professor Squirrel: Okay. I'll take a look after I don't want to keep you, but cool.
Immutable Automaton: But I think you did. Fantastic. Okay. And I want to say that this solution is not the best.
Professor Squirrel: Interesting,
Immutable Automaton: The better solution...do you,do you mind if I spoil it for you?
Professor Squirrel: No no, go ahead please do, I'm interested in.
Immutable Automaton: Okay, so the easiest solution is actually to have a quad. quad tree with capacity, see at maybe four or five.
Professor Squirrel: Okay.
Immutable Automaton: Okay. And what you do is you take the entire tree, you put it in a priority queue. Okay. And then at every step, you pop out the top element, and you unpack it. And the priority queue is ordered by distance. So if your priority queue can contain nodes, or restaurants, so when you pop out a restaurant, it's one of the nearest restaurants is the nearest restaurant so far? If you put up a quad node, then it may contain restaurants or nodes. And how do you order those nodes with restaurants? Well, with restaurants is just points where it's easy to just listen for a node, you would, you would order them by the maximum distance, the maximum, the furthest point in that note to your user will be...
Professor Squirrel: Interesting, I would have thought it might have been the closest because you're trying to...
Immutable Automaton: Ah, yes, yes. Sorry. Closest.
Professor Squirrel: No problem. No problem. Yeah. Got it. Yeah.
Immutable Automaton: So if you just repeat this process, and you'll pop up 10, restaurants 10, you're done.
Professor Squirrel: Fascinating. That is very conceptually clean. And it's like unlikely, I guess to do a lot of extra work, probably. I mean, yeah, intuitively, you're doing exactly what you have to because those tiles could potentially have the things that you care about.
Immutable Automaton: And because you know, based on your data, set num points, you know, the number of points in a cell, you could potentially prune. Sometimes you don't, you don't need to push nodes back into the queue. If you know that. I have enough. I have enough things in my queue that our that you know that everything is remember, because then you can you can optimise that way.
Professor Squirrel: Interesting. Okay. That's a little subtle, but very reasonable. Neat, yeah. Okay.
Immutable Automaton: This solution, no one gets to it, because I wouldn't even encourage it, it requires so much insight. In fact, I've used this question for a good half a year. Before I, before it just came to my mind that I could do that. Yeah, this question has been asked, I asked more than 200 times in Google?
Professor Squirrel: Wow, it seems like I had a difficult question like, what level of candidates do you give this to?
Immutable Automaton: I give this to level three to five. And the amount of guidance I provide would vary depending on the level. I will say that you did one of the best. So I'm surprised that you're going for an entry level position.
Professor Squirrel: Yeah, I mean, I, I am a new grad. So I think I would struggle to convince them otherwise.
Immutable Automaton: Ah I see, but you did great. I mean, make it so did I tell you the two things we care about in a coding problem?
Professor Squirrel: What are those?
Immutable Automaton: Which is coding and problem solving. Okay, so coding is basically how comfortable and how fluent you are in writing code, like how, how familiar you are with it. And based on based on what you have typed, the code you have written, it's amazing, you are, you know, some candidates, they could write the fine K nearest, and then they would, they would, they will talk about the struct, they will talk about the, you know, the helper methods, but they don't implement it, but you are able to go and implement those things really quickly, while writing the main function. Amazing, okay. So your coding is out of this world, you will get a strong rating for your, for your, for your for your coding. The other thing is problem solving. So problem solving is process is process and, you know, being familiar with data structures and algorithms, again, amazing, okay. Most of my candidates, they don't figure out this data structure on their own. You had a tiny, tiny bit of hints from me to get to this data structure, but otherwise, you have driven this solution, largely on your own. Okay, so definitely amazing on data structure and as well on problem solving as well. So I don't have I don't have much feedback to give it to you beyond this. I think you're completely ready for your interview. You'll do amazing at it.
Professor Squirrel: Okay.
Immutable Automaton: So perhaps you want to try different kinds of problems to make sure that you have the same ability to solve those kinds of problems. So actually, why don't we do this? We have five minutes. Do you want to try thinking about different problems?
Professor Squirrel: Yeah, sure. I'd love to. Okay. So, one lingering question about the queue approach is, you mentioned the towel size, like C. Should be small, like five. Why? Why make it so small?
Immutable Automaton: Because you're pushing individual restaurants into into the queue.
Professor Squirrel: Right, right. Yeah, I guess theoretically, intuitively, it seems reasonable. That is sort of. Yeah. Okay. I mean, yeah, it seems very plausible, I believe you.
Immutable Automaton: Okay, so this second question, it's requires a bit more thinking. So I'm going to paste it at the bottom. Usually, I take about 10 minutes to explore this problem with the candidate before we even try to solve the problem, just to make sure that the candidate has a good understanding of what we're trying to solve.
Professor Squirrel: Right it does seem pretty intense?
Immutable Automaton: Yes.
Professor Squirrel: It's, it sort of has a similar flavour in terms of division of geometric space. Do you do you have like a long background in this? Did you do like a PhD in these data structures or something? Or have you worked with them for a long time?
Immutable Automaton: No, I worked in data. I work in Google Maps.
Professor Squirrel: Right? Okay, that makes sense.
Immutable Automaton: Well, I worked in Google Maps, I left Google half a year ago. I was there for 10 years.
Professor Squirrel: That is quite a while. Can I ask why you left a curiosity?
Immutable Automaton: Oh, you know, Google likes to cancel products and your stuff. So I was working on a project for two years, and then they cancelled it. And I have to find a new team. So if I had to find a new team, I told myself, Okay, why not? I'll consider other companies as well. It's a new team. Right. So, so I interviewed and, and I realised how underpaid I was so.
Professor Squirrel: I think it's in Tech broadly, like people that stay in one place for a long time get underpaid. And I think that really sucks. I mean, it's a bad incentive.
Immutable Automaton: Yes, it's, it's crazy. My new my new compensation is 50%. Higher.
Professor Squirrel: Wow. And you couldn't just sort of negotiate with Google based on that, like the offer? I guess they would?
Immutable Automaton: Uh I don't know. 50%. It made me feel terrible. Like, okay, I should have left years ago.
Professor Squirrel: Yeah, yeah. Right. Okay. So sorry, I'm like halfway through reading this so that we have these divisions into sort of zones, and we can squash the largest talent zone. I think it's what I got from the first paragraphs. So we're choosing how to bifurcate. So every day we need to build one horizontal or vertical barrier. Cutting cool.
Immutable Automaton: In each zone
Professor Squirrel: Oh in each zone, ah okay, so this sort of, I see see, to those who are successful in sales blog is contained. True. Okay. The damage, and so we're trying to optimise for this damage metric is basically the number of roaches alive, like the integral or something like per day, the number of roaches alive until eventually containing everything.
Immutable Automaton: Well they don't actually kill the roaches you're just reducing the damage done.
Professor Squirrel: I see I see, you're sort of fighting them off or keeping them tied up. Okay. I see. Okay, because I was about to ask this. None of these numbers are going to zero, but I see. Okay, so eventually, it is easy. So what will only I mean, unless there's any like zero tiles in the input, we have to actually fully divide up the entire thing into this tic tac toe board it takes to reach like a stable state. Would there ever be Tiles at zero roaches?
Immutable Automaton: Let's say no, but I don't think it matters.
Professor Squirrel: Okay.
Immutable Automaton: And here's an example
Professor Squirrel: Minimum damage must be sustained. Okay. Yeah. So we're trying to optimise this and I guess intuitively. So let's suppose we just didn't build any barriers, then, like the the maximum tile is the thing that we're currently like dealing with everything else we're getting damaged by. Intuitively heuristically, what we want to try to achieve with these divisions is to, to get other, like, maximal tiles or other quite large tiles and be able to squash or at least manage those as well. So that in our first example, that kind of explains why we cut 447 from 147, just so that we're able to handle both of them. And then, it's not clear that everything will work fine. You but like, that's sort of a vague sense I have about how we, the things were opened, optimised, and then...
Immutable Automaton: And I just pasted an example to show to show that greedy will not work.
Professor Squirrel: I see I see. This is number three.
Immutable Automaton: Yes, three is the optimal, but if there are just more roaches in the 29, then approach tool will become the optimal.
Professor Squirrel: Okay, I see. You're saying so project one probably was a greedy thing. They were cutting 95 from 40. And they were cutting. Oh, interesting. So I miss No, I didn't miss, this make sense. So when we chose to cut 95, from 29 and 40, from 10. We actually had flexibility here. Like we partitioned zones individually. Right? I could have cut... Yeah, okay. Yeah, I don't know if I'm gonna think of the optimal solution here. I might have to go and think on this. But this seems really interesting. And I will definitely spend some time with it afterwards.
Immutable Automaton: All right.
Professor Squirrel: Cool. Well, yeah, again, so thank you so much for talking to us. These were really fun and interesting problems. And, and yes, thank you.
Immutable Automaton: Yes, I'm very happy and with how you did good luck, and yeah. Okay, have a good interview.
Professor Squirrel: Thanks. Thanks so much. Yeah, maybe we'll meet again someday. Yeah, cool. Cool. Okay take care.
Immutable Automaton: Yep. Email me your solution. I'll be happy to talk about it. And I can send you the write up as well.
Professor Squirrel: Oh, sure. Actually, yeah, sure. I will let you know if I if I do attempt it. And I'll let you let you know. So I'll give you my email in the thing. Cool.
Immutable Automaton: Yep. Okay. Bye Bye
Professor Squirrel: Good. All right. Ciao.
Immutable Automaton: Yeah