Interview Transcript
The Legendary Artichoke: Hello?
Supreme Werewolf: Hello, hi can you hear me?
The Legendary Artichoke: Yes, can you hear me?
Supreme Werewolf: Yeah, I can.
The Legendary Artichoke: Okay, great! Why don't you go ahead and pick a language and we can get started?
Supreme Werewolf: Sure, okay.
The Legendary Artichoke: Okay cool. So how about this. To start with let's just write a quick function that reverses a string.
Supreme Werewolf: Okay, so a quick function that reverses a string? Alright, here I'll have a string….end = inp.length()....and as long as start < end, start++ and end--. So the idea is that I'll start a pointer from the beginning and end from the end and keep swapping the whatever they're pointing to and when they meet, the program will tabulate.
The Legendary Artichoke: Okay.
Supreme Werewolf: So, it is constant right here….swap inp[start] and inp[end].....Okay, and let me test it with another input.
The Legendary Artichoke: Well, hold on. So that didn't work.
Supreme Werewolf: O wait, I see. “rab oo”, it's missing the 'f' part. Okay what happened there? I see, it should have been length() - 1 because I'm starting from the end.
The Legendary Artichoke: Yeah.
Supreme Werewolf: And I'll try some other inp which is just empty. So in that case I just okay, do this only if inp is of length() > 1. Otherwise we don't need to do any swapping. Yep, so this and yeah this should be good.
The Legendary Artichoke: Yeah, that looks good. So what would be the time and space complexity here of reversing a string?
Supreme Werewolf: It's O(n) time because I'm going through each and every character of the string so O(n) times the length of the string. And space is constant because I'm not storing anything extra.
The Legendary Artichoke: Yep, you got it. Okay cool, so let's move on to the problem. So the way this problem is going to work is I'm going to give you a function called find_missing() and it's going to take in two arrays as its arguments. And the first array is going to be some array of unique integers, let's throw some integers in here: [4, 8, 12, 9, 3]. And then the second input is going to be another array that is going to be like the first array but one element is going to be missing: [4, 8, 9, 3]. And basically what I want you to do is find the missing element, output the missing element that is missing from the second array that was in the first array.
Supreme Werewolf: I see, and there is one element that is present in the first but missing in the second and I want to find that.
The Legendary Artichoke: Yes, also there's no guarantee that they're in the same order.
Supreme Werewolf: One typically is that if I sort both arrays then I can maintain a pointer to the head of both arrays then I can walk and whenever there's a mismatch that is the element that I'm looking for. That's going to be O(nlogn) + O(mlogm). I'm thinking if I can do better than that.
The Legendary Artichoke: Sorry, you said the time would be O(nlogn) + O(mlogm)?
Supreme Werewolf: Yes. Or I could do something actually…
The Legendary Artichoke: So you put the time complexity in terms of two variables, what is the second variable?
Supreme Werewolf: Sorry sorry, so the time complexity will be O(nlogn) because sorting the first array will be O(nlogn) and sorting the second array will be O(nlogn) in them.
The Legendary Artichoke: Yeah, okay. And how much space does that take?
Supreme Werewolf: That will be constant space.
The Legendary Artichoke: How will it be constant space?
Supreme Werewolf: Because then I can sort them in place no? Sort the first array in place, second array in place and then.
The Legendary Artichoke: Okay, what sorting algorithm would you use to sort them in place?
Supreme Werewolf: To implement, I can use STD sort, that's one way.
The Legendary Artichoke: What sort does STD sort use, do you know?
Supreme Werewolf: I'm not entire sure, but in place sorting can be done through building a heap also and then what is uses is probably a tree, but if I build them in heap then I can sort in place.
The Legendary Artichoke: Yes, okay. You're all correct. Okay, yeah so that would work if you sorted then walked on the array you'd get O(nlogn) time and O(1) space. So that's good solution if we're very space constrained. What if I'm not space constrained and I want to do this faster?
Supreme Werewolf: Okay in that case I can put the first array in the map and output the second array in a map and then just take one element by element in the first array and whenever I find an element that is missing in the map that's the missing one. That will take O(n) time and O(n) space.
The Legendary Artichoke: Okay, yeah that sounds like it would work. Why don't you try implementing that?
Supreme Werewolf: Okay. const vector&<int> bigger and smaller. Okay so I can std::map, or I don't even need a map I can just create a set<int> elems. Or is there a constructor….? No actually sorry, the lookup's going to be costly if I made the search.
The Legendary Artichoke: Sorry, why would lookup be costly?
Supreme Werewolf: Well okay how is the std::sort….? I think how much std::sort look-up cost I'm not entirely sure about that.
The Legendary Artichoke: It's probably implemented as a hashset.
Supreme Werewolf: Hashset? In that case it's constant time.
The Legendary Artichoke: Yeah, I mean, do it either way, It doesn't make a huge difference. Whatever you're more comfortable with is fine.
Supreme Werewolf: Okay well sure, cool. Okay let's do that, let's go with the map. for...is there a way to change the settings on there? Right here yes, okay. elems.insert(inp). What is going on here….vector, that was not declared…
The Legendary Artichoke: You can look up documentation if you need to.
Supreme Werewolf: Google, ah sure. Alright so for (auto inp: bigger). if (elems.find(inp) == elems.end()). That means the element was not found and we can return this input element, otherwise we will return -1. Yeah, so this should work and we can find_missing() of the number you gave. vector<int>...but I guess I could just do that here.
The Legendary Artichoke: Okay, that looks good. So this solution now is linear time linear space. Is there a way now that you could do this better than linear time linear space?
Supreme Werewolf: Better in terms of both?
The Legendary Artichoke: Either one or both, whichever one you think you can do.
Supreme Werewolf: I see.
The Legendary Artichoke: Just find, if you can, some way to improve on this.
Supreme Werewolf: Okay I guess I'm going through the previous array and….push on the element that is missing. Oh, I can test by take the sum of all of the elements of the first array then subtract the sum from the second array.
The Legendary Artichoke: Yeah okay, why don't you implement that real quick? That should be a quick one. Go ahead and put in a solution.
Supreme Werewolf: Okay, find_missing2() and sumVec()...return sumVec(bigger) - sumVec(smaller).
The Legendary Artichoke: Okay great, so what is the time and space of this solution now?
Supreme Werewolf: So space is constant, I'm not storing anything. Time is O(n).
The Legendary Artichoke: Okay so, that's actually not correct.
Supreme Werewolf: That's not correct?
The Legendary Artichoke: No that's not correct.
Supreme Werewolf: How?
The Legendary Artichoke: Do you have any idea why?
Supreme Werewolf: What part is not correct, time or space?
The Legendary Artichoke: So the time is correct obviously, because all you're going is summing over the array, so that's obviously correct, but the space is not correct.
Supreme Werewolf: Space is not correct, so some work is just looping over and collecting everything so there's one variable there and...why is that not constant space?
The Legendary Artichoke: So one question is, your solution is written, is it guaranteed to be correct?
Supreme Werewolf: If all are positive integers and there is one unique element that's missing in the bigger then it's going to be correct.
The Legendary Artichoke: What if the numbers are very large?
Supreme Werewolf: I see, then it can run over the int limit.
The Legendary Artichoke: Right, so if you're using a hashmap or hashset, it doesn't matter how big integers are: your solution will always be correct, but if your sum of all the integers in this vector are too big, then you'll have an overflow, and then your solution will not be correct. So how can you make this solution guaranteed to be correct.
Supreme Werewolf: I see, then I can do, subtract each individual...so in that case it can still overflow if I like subtract first of first and then sum those differences but it can still overflow if the first two numbers of the bigger one are too big and the first two of the other one are smaller then it can overflow. Do you want me to improve the sums solution or like think of something else different?
The Legendary Artichoke: Well I think you can certainly salvage the sums solution if you can deal with this overflow problem.
Supreme Werewolf: I see, so overflow….Difference might generally work but it's not guaranteed to work.
The Legendary Artichoke: Right, it will sometimes work and sometime fail depending on the size of the vector, and of course when we're talking about asymptotic analysis we're talking about big sizes. So it makes no sense to talk it being O(1) on a large input size if it's always going to fail on a large input size.
Supreme Werewolf: So I mean one solution that I have is that instead of summing each array separately then taking the difference, I could sum the difference of each individual element. I get the element of first and that of second and I sum those differences and the answers will be smaller. But the problem can be that depending on the ordering of the array the sum can overflow.
The Legendary Artichoke: Right.
Supreme Werewolf: So that might not work, how do I prevent overflow. One other thing I could do is….I will take the sum of the differences and, depending if the differences are going positive….Okay, so if the sum is going positive then I'm going to subtract from the second array, and if the sum is going negative I'll add from the first array that way it's guaranteed to work.
The Legendary Artichoke: Yeah that makes sense, why don't you try implementing that?
Supreme Werewolf: Sure.
Supreme Werewolf
Supreme Werewolf: Okay so find_missing3(), I have two points h1 = 0 and h2 = 0. And I have int sum = 0….or what I can do is sum = bigger[bigger.size()-1]. Oh wait, am I doing this right? No, I don't need this loop….while(h1 < bigger.size() || h2 < smaller.size()) so the head is still before the end, do this: if(sum<0) then I need to add stuff from the bigger array. If there is stuff still available there, in the bigger array, then add bigger[h1], h1++. Else then if(h2 < smaller.size()) then sum += smaller[h2]. Now while h1 is less than this, and while h2 is less than this…sum -= bigger[h2], and return sum. Alright.
The Legendary Artichoke: Okay, so why don't you generate some large and small integers and see...let's try this with large integers and see what happens.
Supreme Werewolf: Okay.
The Legendary Artichoke: Well, these aren't going to be valid ints, right? Or, I don't actually know.
Supreme Werewolf: Okay, so how do I generate the large int, is it a thing called UINT_MAX, INT_MAX?
The Legendary Artichoke: Yeah, is there no constant you can reference? Yeah I believe int underscore max and int underscore min, you have to include a thing called c limits limits.h.
Supreme Werewolf: Okay so INT_MAX works so let's say it equals INT_MAX and INT_MAX-2 and let's have INT_MAX here and INT_MAX-2 and maybe is to order a bit and move the 9, 3 to the beginning. So that yeah.
The Legendary Artichoke: Okay, and see what happens when you introduce negative INT_MAXs, or I guess that would be INT_MIN
Supreme Werewolf: Okay well the solution will...so there was an assumption in this part of the solution that the numbers are positive but I can modify it to work with negative numbers too. The reason when I want to do plus I go to the h1 is because I assumed that our numbers are all positive but if that's not the case then let's see how do I fix it? Okay so, if I want to both are negative, in the operation will reverse right? So h1 is negative so, oh I see. If the first one goes negative and I'm already running negative then….if the first one is negative then the second one will also try to make it negative. If the first array is negative then the sum is all negative then it's going to try to add the first array which would be problematic if that next number is also negative but what if the number in the other array is also positive, in that case..negative here and no negative events true.
The Legendary Artichoke: Right so depending on the order of elements, if you don't assume the assumption that all of the integers are positive then you can get into these sort of pathological situations where you have no choice: there's no way to keep track of the current value because the order is not fixed right? Because the elements can be shuffled from one array to the next, you can end up with a very bad situation such that there's no way to not overflow or underflow.
Supreme Werewolf: Right, right.
The Legendary Artichoke: So it seems like the only way to get around this….so like, this kind of approach is going to work, again, for most of the time. For most data sets you could get as inputs, this would work perfectly fine but we can very easily construct pathological inputs such that this also fails. So what would be a way….so asymtotically, this solution, being just as good as this, if we were, you know, facing an adversary who wanted to sort our algorithm. So what is a way that, using this solution still, we could get around this problem?
Supreme Werewolf: Using this solution, we can get around the problem?
The Legendary Artichoke: So like a simple way to get around this problem, what's a simple way we can get around this. The problem, of course, being that we're overflowing.
Supreme Werewolf: We are overflowing then we can store the sum, but that would violate the constant space condition and we can still look like somewhere temporarily and then in a vector and then do something.
The Legendary Artichoke: It would be stored in a vector?
Supreme Werewolf: No no no that's not a good way because even that can eventually...no sorry, ignore that.
The Legendary Artichoke: Okay.
Supreme Werewolf: How would I make this solution work…
The Legendary Artichoke: So do you know of a data-type that is immune to overflowing?
Supreme Werewolf: String, I mean, that's pretty costly.
The Legendary Artichoke: Well a string is, yeah, that is a data type that does not overflow. Do you know one that's used for storing numbers?
Supreme Werewolf: BigInt, I think? Something like BigInt?
The Legendary Artichoke: Yeah, we can use a BigInt type library that can store arbitrarily large numbers so that even if we sum a bunch of ints together they would not overflow. So instead of declaring this sum here to be an integer this would be some kind of BigInt, the logic would be all exactly the same right? So I'm not going to have you write that because it's not very hard to sub in a thing there but let's analyze that as a solution. Let's say you were to turn this sum to a BigInt, or whatever BigNum library you use. What would then be the space-time of your solution here.
Supreme Werewolf: Time? Time remains O(n) but that depends on the BigInt sum and addition like...if we consider them constant time then it's O(n).
The Legendary Artichoke: Do you think that's a reasonable assumption?
Supreme Werewolf: I mean for large numbers, not really because it can depend...it can depend...it has to convert because underneath it stores your number as strings and it have to convert them or something like that.
The Legendary Artichoke: Okay, so I mean it depends on the BigInt library but it's probably a fair assumption to make that addition for a BigInt is going to be constant given that the integer you're adding is constant. Because most of the time what will happen is….usually the way a BigInt stores things is it's basically just going to store...can you see what I'm typing on line 72?
Supreme Werewolf: Mhm
The Legendary Artichoke: If you assume you have some binary, some integer right. And I guess it would be 4 bytes, okay and let's pretend that's 4 bytes. Basically what a BigInt will do is say okay this is one piece of BigInt and there will be another piece here, and another piece here and then there will be another piece here right? And then basically there will be some kind of padding element that shows when the BigInt is over. So this is how it sort of stores a BigInt this is just bytes back to back to back but they're still in binary because if they're not in binary, if they're stored as a string of some kind, then you wouldn't be able to use fast addition, you couldn't take advantage of the fact I can add ints really really quickly in registers. So basically the lowest byte of your BigInt is almost always going to be what you need to add your element. So we can say for all intents and purposes it's constant or very very close to constant. Basically, negligibly important. So let's assume that addition in a BigNum library is constant.
Supreme Werewolf: Okay so then runtime will still be O(n) and space is going to be...going to store the maximum possible sum, the maximum or minimum in terms of absolute value that is the space. But in terms of the number it should be constant.
The Legendary Artichoke: So why should it be constant?
Supreme Werewolf: Well okay I guess it can go internally wherever needed to the maximum possible sum or in terms of absolute value so...yeah I mean it's independent of the number of elements the sum and more dependent on the magnitude.
The Legendary Artichoke: But let's assume the worst case magnitude for each of the elements, that's sort of how we're going to hand the worst case right? So what would be the worst case magnitude for each of the elements in our input set?
Supreme Werewolf: I see. So worst case can be, for one element missing. So first initial elements are very large in the first array and in the second array they are very small. There are n +1 in the first and n in here...in the max I could do here is….
The Legendary Artichoke: So try...the solution was a good idea but we basically described how given pathological inputs, it would behave asymptotically the same as this solution. So it'll actually be easier for you to analyze this solution because you're going to get the same runtime characteristics as with this. This is sort of like a, given random data, this will perform great. But given pathological data, to perform only about half...basically it'll take one half the time as basically this brute for approach. So let's analyze this approach because it's going to make the analysis much easier.
Supreme Werewolf: Sorry which approach?
The Legendary Artichoke: The sum vector. Just basically adding all the numbers into one giant BigNum.
Supreme Werewolf: Okay find_missing2(). So here the space complexity is constant here, it's just storing one number.
The Legendary Artichoke: No, I'm describing if we made this into a BigInt instead of an integer.
Supreme Werewolf: I see, then the space is going to be the sum of the entire thing which is essentially n * INT_MAX is going to be the space, so O(n) space.
The Legendary Artichoke: So sorry, why O(n) space?
Supreme Werewolf: Because each number in the bigger limit can be INT_MAX so the sum will be n * INT_MAX then I need to store INT_MAX as a constant so I need to store like basically the length of the internal data structure of BigInt which is going to depend on the number of elements.
The Legendary Artichoke: Right, so you're correct in that it's going to depend on the number of elements, but does it take n * the size of that number to store that number? Another way of putting this is: So what you're saying is like I need to store the number n * INT_MAX. Your claim is: therefore I need O(n) space. But what this is assuming is that INT_MAX is a constant so we can strip that out. So the question is that I need to store a number that is of size n and therefore I need O(n) space to store that number.
Supreme Werewolf: By implementation….I can store, sorry it's O(logn) space.
The Legendary Artichoke: Why is it O(logn) space?
Supreme Werewolf: Because if I'm storing in binary then I need to have M bits so 2^M bits is equal to n and M is O(logn).
The Legendary Artichoke: Yeah, you are correct. So this solution if you use some kind of BigNum library to solve the problem you will have O(logn) space and O(n) time. Good, cool. So we're just about out of time, and I got to run, but well done that was a very strong solution.
Supreme Werewolf: Thank you.
The Legendary Artichoke: I will tell you that there actually is a way to do this more efficiently than this solution. It's kind of tricky though so if you're curious you might want to spend some time thinking about how you can improve on this but you did really well, this was very impressive.
Supreme Werewolf: Thank you so much, thank you for your time.
The Legendary Artichoke: Of course, no problem, take care.
Supreme Werewolf: Thank you, bye.
The Legendary Artichoke: Bye.