Interview Transcript
The Legendary Artichoke: Hello?
Mammoth Avenger: Hello?
The Legendary Artichoke: Hey can you hear me?
Mammoth Avenger: Yeah I can hear you pretty well.
The Legendary Artichoke: Okay perfect, so why don't you go ahead and pick whatever language you want to work in?
Mammoth Avenger: Okay it's going to be python.
The Legendary Artichoke: Okay cool. Alright so let's start with a quick little warm-up. So why don't you write me a function that reverses a string?
Mammoth Avenger: Okay, this is my first time doing interviewing.io so...So like the cheeky python answer is that you can just return a reversed() string for the input.
The Legendary Artichoke: Yeah, you could do that...although I'd prefer if you did the reversing yourself.
Mammoth Avenger: I expected. I just changed the variable name there because input is actually a built-in python function. So we could do...start with an empty string. For character in our input we would just prepend it to our output. And I think you can just add characters to a string like that. Should I go ahead and run it?
The Legendary Artichoke: Yeah go ahead and run it, let's see if it works.
Mammoth Avenger: Yep that seems to work.
The Legendary Artichoke: Yeah that looks like it works.
Mammoth Avenger: I was going to take a look at some edge cases usually as well make sure it handles non-string input like NaN stuff like that which it wouldn't do so great right now.
The Legendary Artichoke: I wouldn't worry about non-string inputs, we can assume that if it's not a string that numPy isn't blowing up. So what would be the time and space complexity here of your reverse() function?
Mammoth Avenger: Time is going to be O(n), basically looking at every character once which I don't think you could do better than...you could do it in half the passes if you did it in place and swapped characters but you're still going to be O(n). And space complexity: we're adding another variable output here of the same size, so we could reduce that to O(1) by doing it in-place in the same string. So actually you cannot do that because strings are not mutable.
The Legendary Artichoke: Correct, so there's no way to do this in O(1) space in python because you cannot mutate a string. So you are correct on that O(n) is the best you can do on space. But you're actually not correct about the time complexity.
Mammoth Avenger: Okay so...interesting. So I would have thought we looked at every character once.
The Legendary Artichoke: So you're correct: you look at every character once, but it's not the only thing you're doing.
Mammoth Avenger: Right, of course, yeah. So this isn't actually O(n) I think correctly that we cannot get better than O(n), so the concatenation right here is probably another O(n) actually because it's adding the whole string that we've created character by character to create a new copy of output.
The Legendary Artichoke: Correct: again because strings are immutable.
Mammoth Avenger: So one opportunity we could...yeah python is weird for string based interview questions but we could treat it as a array and swap each character in the correct index as we work our way either forwards or backwards. And we can get the length of the string in O(n) so then we can do it in O(n) that way? Do you want me to code it up?
The Legendary Artichoke: Yeah, why don't you go ahead an code that up?
Mammoth Avenger: Okay so treating it as a character array makes sense to you?
The Legendary Artichoke: Yeah I think that's probably a better approach.
Mammoth Avenger: So basically as we look at each character we're going to put it into the string starting from the back. So output_index = len(x) -1 and then output[output_index] = c and output_index -= 1. That should do it, I'll just run it see what that does. Okay off by one errors are usually the things that sneak in here so in the case length of x is 4, output_index is starting at 3. Right, I think we...yeah so we need to initiate this to make it like an empty array. Let's see if that works. Then we just want to join this back to a string at the end. Okay so I think that should do it, we have O(n) for the length and similar for creating the new output array and we're just doing a single character assignment per loop so I think this whole thing should be O(n) now.
The Legendary Artichoke: And what would happen if you just had an empty string?
Mammoth Avenger: I think empty string would return here.
The Legendary Artichoke: Okay cool.
Mammoth Avenger: Still this works. Yeah the whole array complication thing is a little not straightforward.
The Legendary Artichoke: Okay yeah, that was good. Let me go ahead and give you the meatier problem that we're going to work on. So I want you to write a function and you're function is going to be called find_missing() and it's going to take two arguments: the first argument is going to be an array that has some elements in it [4, 12, 9, 5, 6]. You can assume they're all unique to make the problem simpler. And it's going to take a second array that's going to contain the elements in the first array but with one element missing. And what I want this function to return is the missing element which in this case is 5.
Mammoth Avenger: And you said that we can assume they're all unique.
The Legendary Artichoke: Yes you can assume they're all unique.
Mammoth Avenger: So my first instinct here is to go to hash tables although we could also use sets in python I believe. So let's see...So I think if we, there's probably a very easy python way to do this where we could actually just make a set of the first array and then I think it's just like subtraction. We can just try that and we can also do it manually as well. Yeah so I'm going to say that's the super easy version which is…missing_items = set(full_set) - set(partial_set). And we assert(len(missing_items) == 1) and we return missing_items[0].
The Legendary Artichoke: I don't think you can index into a set? I take it python has a weird api for sets?
Mammoth Avenger: There's probably a way to pop things out of there but casting to a list is probably the easiest thing for now. So that will work pretty like just putting the work in python. So I can also do it with a hash table?
The Legendary Artichoke: No I mean this solution is good. So what would be the time and space complexities of this solution here?
Mammoth Avenger: So converting to a set assume it's going to take linear time because you're basically just populating a hash table I assume it's similar to how python implements it. So if you have m and n as the sizes of the two arrays, so they're both going to be linear and finding the difference should also be a linear operation.
The Legendary Artichoke: Why do you say that?
Mammoth Avenger: So at least to do this specific problem if we were doing it ourselves we would basically just be walking through the second array and removing things from the first when they match and because it's a hash table it's constant operation finding and removing them. So I don't know for sure what the full python set difference would take because you might have to walk through both of them but I think it would be linear.
The Legendary Artichoke: Yeah I mean let's give python the benefit of the doubt and that it's not doing this in a totally stupid way for set subtraction. So I think you can say pretty safely that this is going to be linear. Cool, and how much space would this take?
Mammoth Avenger: Right so we're going to be creating another full copy of m and n and basically just a hash table version but on that order of size and then the missing_items is just going to be one extra thing assuming that our input matches our criteria. So it would on the order of O(m + n) or it might be O(m) + O(n) I don't remember what the specific different between those two are.
The Legendary Artichoke: Well there'd be no difference between those two but what is m in this case?
Mammoth Avenger: I guess m and n are only one off so they're basically the same so we're talking linear on the size of either set really.
The Legendary Artichoke: Yeah, exactly. Yeah that sounds correct, you are correct. So is there any way you think I could do better than linear time here?
Mammoth Avenger: Let's see....so the only way to do better than linear time would be to not look at every item and the case where that could happen is if we...well missing item is by definition a number in the first set but not in the second. I'm trying to figure out if there's a way we can short circuit the check from looking in either array. So in the case the 5 is missing, so we're looking at the second array we don't know until the very end until we've looked at all the items we don't know for sure that there's no 5 in there. Actually that's not true we could sort it.
The Legendary Artichoke: If we sort it, how long would it take to sort?
Mammoth Avenger: Oh but that's O(nlogn), that's longer. So that doesn't work. So if it's not sorted then we can't know for sure until we get to the very last element that there isn't a 5 in the second one and if we go through the first one we can't know looking at any item that it's not in the second set until we've gone through so I'd say you can't do better than linear.
The Legendary Artichoke: Yep, you are correct and your argument is sound. There's no way to do better than linear time, because if there's at least one element we haven't looked at, that could be the missing element if we don't have any guarantee on order which we don't. Okay so linear time is the best we can do here, but let's say we want to trade off time for space. I'm very space constrained and I don't have enough memory to create a whole copy of my dataset. How would you lower the amount of space that this solution takes?
Mammoth Avenger: So if we don't care about time at all, one option is we sort both arrays and then you can just walk through them one by one you sort them in place so it takes the same amount of space, walk through them index by index and as soon as you find a difference in two elements you've found the missing item.
The Legendary Artichoke: Yeah, and what sorting algorithm would you use for that?
Mammoth Avenger: Just, we don't have any expectation given on the the specific kind of data that we're working with so probably just go with Quicksort or general data sanity, I think that's what python uses by default to sort data or it might be some slightly modified version of Quicksort.
The Legendary Artichoke: Yeah I'm pretty sure it uses a modified Quicksort and what would the be the space complexity if we used Quicksort?
Mammoth Avenger: My sorting algorithms are a little bit rusty but I did forget that it can add extra space complexity as well so that might be an issue. But I think that Quicksort is a variant of Mergesort I believe which picks a pivot and is just swapping items around. Oh but it's recursive, so it's probably bad for the space complexity. So if we really didn't care about the time complexity we could just do a Bubble sort or something that doesn't take any extra space but I think there are also better sorts that don't take any extra space.
The Legendary Artichoke: Yeah, you are correct. In particular there are a number of sorts that are O(nlogn) time while O(1) space such as Heapsort which you may vaguely remember.
Mammoth Avenger: Yep, okay.
The Legendary Artichoke: So if you use a sort like Heapsort, you could get it in O(1) space. Quicksort would be O(logn) for the reason you just mentioned which is because its recursive we need stack writes. But yeah okay you could go look up a sort that's O(1) space and if you needed to bite that bullet you could get O(nlogn) time and O(1) space. So that's good it gives us different ways to approach this problem. Let's say we need O(n) time, like O(nlogn) is not the trade off I'm willing to make. Do you think there's a way we can improve the space from O(n) to being less than O(n)?
Mammoth Avenger: This require the constraints to be fairly specific but if we're guaranteed that they're integers and they're all unique then we can just sum up the first list and subtract the sum of the second list and that should give us the missing item? And that's a very simply algorithm and that would be linear time and constant space.
The Legendary Artichoke: That would work. Why would you say it's constant space?
Mammoth Avenger: Because we're just doing two sums and then a subtraction?
The Legendary Artichoke: That's true but that doesn't guarantee that it's constant space. Do you know why?
Mammoth Avenger: I guess because we don't know how big the sum is going to be?
The Legendary Artichoke: Correct.
Mammoth Avenger: Gotcha, so if we're dealing with very large integers.
The Legendary Artichoke: Right if we're dealing with very large integers or just sufficiently many integers then clearly the size of the sum we're going to store is, in some way, going to scale with the size of the input right?
Mammoth Avenger: Yeah, is this one of the things where you could use an XOR?
The Legendary Artichoke: How would you use an XOR? How would you do that?
Mammoth Avenger: I think if you just XOR a series of integers and then….how does this work? I'm just going to..is it okay if I just look up what the python XOR function is and just play with it in the console for a bit?
The Legendary Artichoke: Yeah it's the ^, I'm fairly certain.
Mammoth Avenger: So we do 4 ^ 12, then I think it's like if you XOR with the same number again it gets you back to….basically if you have any number and you XOR it with an integer and you XOR with that integer again it undoes the effect. Yeah which is like topological because it's just flipping the bit. So I believe we would just XOR all of the integers in both lists. It feels kind of strange because you're treating the two lists differently but I believe the end result should end up with 5, and that one is actually constant space because it's only an integer at any point.
The Legendary Artichoke: Yeah, that sounds perfect, why don't you go ahead and write that up real quick?
Mammoth Avenger: Alright. I don't know what you would call this but XOR_sum seems close enough. That's it.
The Legendary Artichoke: Yeah let's give that a quick run and make sure that works.
Mammoth Avenger: That looks good. Don't know if that would handle negative numbers. Interesting…
The Legendary Artichoke: No it does because negative numbers, because the XOR is going to respect the negative bit.
Mammoth Avenger: You could use this on any binary datatype like find a missing object from a list of binary objects.
The Legendary Artichoke: Yeah as long as you only have one of the them right? Otherwise you would just get the XOR as the two objects. But yeah you pretty much got it, it was perfect. I don't have anything else for you so: well done.
Mammoth Avenger: Thank you. How does this work from here? Do you have a feedback form afterwards, or do we just chat about it for a bit?
The Legendary Artichoke: It's up to you we can chat about it if you want. We both get feedback forms and we send them to each other and then we can mutually read the feedback. If you have any questions I'm happy to answer them.
Mammoth Avenger: Yeah I'm just back on the job scene after a couple years of running my own startup so it's been a long time since I've done technical interviews any constructive criticism or things that I could do better or work on would be appreciated.
The Legendary Artichoke: Yeah my feedback is honestly you pretty much knocked this one out of the park and you're probably the fastest person I've seen solve this and get to the optimal solution. And you're coding is clearly strong. You're familiar with python you're pretty comfortable I can tell. The only thing that was a slight irk was calling variables x like that's just a peeve of mine, but different interviewers aren't going to care so that's totally and idiosyncrasy that doesn't matter in the long run.
Mammoth Avenger: I always struggle with variable naming in interview questions, I always seem to be vary ambiguous quantities. Any suggestions on what to call a variable for like a string input for like the reverse() function for example?
The Legendary Artichoke: Usually I just call it str for string. As long as there's nothing descriptive of it other than its datatype then I would just name it by its datatype. But again this is not nearly egregious enough for me to mark you down for it. Like I've done interviews where people are writing a fairly complicated function and they're calling things x and y and g and things like that or they call a function f() you know, that sort of thing. So honestly calling the input to a reverse function x is not enough of a violation that I care.
Mammoth Avenger: Cool
The Legendary Artichoke: Overall I'd say you solved this strongly.
Mammoth Avenger: Alright, thank you very much.
The Legendary Artichoke: Cool yeah, no problem take care.
Mammoth Avenger: You too, bye.
The Legendary Artichoke: Bye.