Intergalactic Avenger, a Google engineer, interviewed The Mighty Eel in Python
Edit distance string comparison
1) Given two strings s1 and s2, write a function that returns if the two strings are equal in a case-insensitive way 2) Expand the first function such that it returns True if the edit distance between s1 and s2 is 1 or lower 3) Expand the function to take a third parameter designating the comparision tolerance between s1 and s2 to be 'n' edit distances or lower
Feedback about The Mighty Eel (the interviewee)
Advance this person to the next round?
How were their technical skills?
How was their problem solving ability?
What about their communication ability?
Great job! You worked through the easy version of the problem very quickly and then were successfully able to build on it to get to more optimal solutions and added constraints. My only suggestion would be to be more vocal as you are thinking through solutions. It's good to think through a solution before starting to code, but it helps to let the interviewer know your thought process just to keep the conversation going and allow the interviewer to guide you at that step as well.
Feedback about Intergalactic Avenger (the interviewer)
Would you want to work with this person?
How excited would you be to work with them?
How good were the questions?
How helpful was your interviewer in guiding you to the solution(s)?
Intergalactic Avenger: Hello? The Mighty Eel: Hey how's it going Intergalactic Avenger: Going pretty good. Yourself? The Mighty Eel: Pretty good. This is my first one of these so pretty excited. Intergalactic Avenger: Okay so let's just dive on right in. So I'm going to ask you an algorithmic question if that's alright. So there's a little selector box on the top right of the coderpad. It says plaintext now and you can pick any language that you think makes sense for an algorithmic type question. The Mighty Eel: I'll probably do Python. Intergalactic Avenger: Python, okay those are usually good for these things. Okay we're going to start with the basic problem and add to it. So the first one is pretty simple and the idea is you're going to define a function that takes in two strings and returns whether they're equal in a case-insensitive way. So of course Python has this but we're going to work through a couple additions to this once we get the basic one down. The Mighty Eel: Okay cool. So I mean there are other Python built-ins that could make this pretty easy as well. Like turning a string to lowercase or uppercase and then comparing them. Intergalactic Avenger: You can use any of those. And we'll talk about the implications of using those and all that kind of stuff. The Mighty Eel: Okay so that would look something like this. And I think that would do it. I'm assuming by case-insensitive you just mean letters of the alphabet. Intergalactic Avenger: So let's write in the shell, case-insensitive compare of
"ABC". The Mighty Eel: Okay. Intergalactic Avenger: I think you're going to want to just do it inline here. The Mighty Eel: Oh got it, I see. Intergalactic Avenger: Then you'll see run. Okay, excellent. So what is the time and space complexity of this algorithm as you have implemented it? The Mighty Eel: So I don't know how
upper()exactly works but I'm guessing what it does is, it likely builds a new string character by character so that would suggest that you have a linear running time and memory is also linear. Intergalactic Avenger: As far as I know, strings in python are immutable, so if you call something like
upper()then it will have to return a new string. Okay and so that's the best-case, worst-case, always-case? The Mighty Eel: Memory is going to be always-case. Runtime would also be always-case unless there's something I'm missing. Intergalactic Avenger: Okay that makes sense. So let's say these are very long strings and we don't expect this to happen very often. Is there an optimization we can make to this so that in the best case it doesn't take
O(n)time. The Mighty Eel: So not expecting it to happen very often you mean that… Intergalactic Avenger: That they're equal, you're not expecting that. Most of the time they're not going to be equal. The Mighty Eel: Mostly they're not equal. Okay, interesting. So we could do something like. So we can go...well first you'll want to check the length of both of them and see that the length is the same. And if not, you can loop through both of them simultaneously and do a comparison character by character and compare them that way. Intergalactic Avenger: Sounds good. The Mighty Eel: Trying to figure out how to do this in the cleanest way. I wouldn't want to create one…. Intergalactic Avenger: Wouldn't want to create what? The Mighty Eel: I wouldn't want to create a new..I guess this might terminate early I guess this might, so let's just try this for now. Intergalactic Avenger: Alright, I think we should also have one failing case so like
"abcd"or something. Okay, excellent. Quick question here, this is kind of an extra credit but as you've written these are, I guess it depends on what Python you're using. If these are unicode strings or 8-bit strings. So if this is Python2…. The Mighty Eel: Which it is. Intergalactic Avenger: Which it is. Can we think of any possible thing that might go wrong with this? The Mighty Eel: Yeah, you could be comparing unicode versus 8-bit strings here and I imagine that might cause problems. Intergalactic Avenger: So let's even just think in just both 8-bit strings. So you don't necessarily have to fix this problem because the 8-bit string thing is kind of a mess, so I'm just curious how familiar you are with if this was
'ábc'and this was
'ÁBC'. So let's even see if that works. The Mighty Eel: It doesn't. Intergalactic Avenger: Which is okay. Can you think about why this isn't working? The Mighty Eel: So again I don't know how the upper conversion happens behind the scene but I'm guessing it either ignores….actually one thing I'm curious about is what that actually is for the strings. So it doesn't get converted so it probably has some whitelist of character that it only works on. Intergalactic Avenger: So interestingly what you'll notice is Python is pretty good with this if you're dealing with…oh it doesn't work. I would have guessed that would actually work. The Mighty Eel: Maybe in Python3. Let's see what happens here. Intergalactic Avenger: Oh well. I actually would have thought that calling the unicode string would have fixed your problem. But in in any event, let's not worry about that case for now. Okay so that's all good now what is the best and worst case runtime and space complexity of this one? The Mighty Eel: So best-case is that the first character is different, that would be
O(1), worst-case is still
O(n). So average-case depends on what your input is. Memory is the same as that, memory is at best
O(1)and at worst
O(n). Intergalactic Avenger: Is it
O(n)at the worst case? The Mighty Eel: I believe so because you'd have to create a new copy of each character because I'm using
upper()here. Intergalactic Avenger: But is it going to keep those characters around? Do you need all
nof those characters? The Mighty Eel: Yeah that's a great point, it's not going to use those over and over again. Yeah it's not going to create new ones each time. Yeah I believe it would be
O(1)then. Intergalactic Avenger: Alright,
O(1). Excellent so already we're improving on it. Now we're going to introduce a bit of a wrinkle and that is: what if I want these to
return Trueif they are the same or they're edit distance is
1or less. So are you familiar with edit distance? The Mighty Eel: I am, yeah. Intergalactic Avenger: So in that case. For example, like we've shown here. Like this is going to be
Trueand this is going to be
Falseand this is going to be
True. The Mighty Eel: So we want… Intergalactic Avenger: So in the case that I wrote, there was an insertion of a
"D"and you could also imagine that
"A"deletion would also…. The Mighty Eel: Yeah, let me think about that. Intergalactic Avenger: And there's definitely a number of ways of going this so whatever way you think of starting it is fine. The Mighty Eel: So I'm thinking about if I can use this iterate through each string? I'm wondering if that's constraining my thinking. So basically what we can tolerate is one insertion, one deletion, or one modified character. Intergalactic Avenger: I mean you can define that any way you want. And I think it makes things a little more straightforward if we just say insertions and deletions just to get started. The Mighty Eel: Okay. Then there's the….so there's an algorithmic approach then which would just be basically a modification of this algorithm and you keep track of both iterators and you iterate them together and if one of them is off. Or if one of them has a mismatch you iterate the next one to see if it is an insertion or deletion and actually the action you would take would be different if it is truly and insertion or deletion. The approach I'm think of still involves using the two iterators. So and if you had…Okay so I'm thinking now that I might use this method I've written as a subroutine so the actual…so this might be…this new
ins_del_compare()would call out to the
case_insensitive_compare()and…Okay so I haven't fully thought this out yet but let's see what happens. So if the lengths are not equal then we check if they're within one of each other. Intergalactic Avenger: Okay. The Mighty Eel: At this point we hypothesize that the longer one has an insertion relative to the shorter one. I just realized this problem is symmetric in that way. Then, for…maybe we don't need to use this method. Intergalactic Avenger: Yeah I think that should be what…. The Mighty Eel: Yeah I'm just working out the sense post issue right here. Let's see, at this point we push the longer one more so yeah. Is those two are unequal…Actually, yeah okay. Intergalactic Avenger: I could see a case here where….I guess if you've already checked that the two are one apart. I was thinking with this loop the longer one could just have twice as many characters as the other one, but you've already checked that the lengths are off by 1. The Mighty Eel: Yeah, off by 1. I'm not sure if I'm done or not. I think there's something I'm missing that I need to see the edge case to… Intergalactic Avenger: Right well if the lengths are equal then there might be something else going on here. The Mighty Eel: Right if…okay I'll be lazy. Intergalactic Avenger: I mean that's just the
elsestatement that we'll be using. The Mighty Eel: That's right. Intergalactic Avenger: I think it's only going to return that. The Mighty Eel: What about the
elseof the previous if statement? The one on 15? Intergalactic Avenger: Right, then they would just be… The Mighty Eel: Right looking good. Alright let's run it. This block here it makes me a tiny bit uncomfortable because it seems that you could do this multiple times but because you're doing the check up here then it's stopping it from being able to actually go multiple times. Intergalactic Avenger: What do you mean by multiple times? Oh you mean the whole thing could go multiple times. The Mighty Eel: Yeah this whole
forloop could go multiple times where this doesn't equal that so you push this one and you increment that one and you keep going. But I think because the way you've written this check up on line 15, you can't do it more than once. And as you've noticed there's that symmetry to it and this would hit either an insert or delete by checking the shorter one. Let's see if this works. Intergalactic Avenger: Alright. The Mighty Eel: I think it should. Intergalactic Avenger: Yeah I'm not sure either. The Mighty Eel: Alright! Intergalactic Avenger: Works on these cases. The Mighty Eel: Yeah can you try those as well. Look at that. Okay very good, very good. And then what's the runtime complexity and space of all this? Intergalactic Avenger: Okay in the case where the strings are equal it's the same as before. In the case where they're not equal by 1 runtime would look like….it would also be between
O(n)and memory would be the same as before too. Memory would be
O(1). The Mighty Eel: Excellent. And now, the really tough one. So this is great, it's all good so far. And now I want to add a parameter where, instead of being hardcoded to 1, I could put a parameter at the end here for how many edit distance I'm willing to tolerate. So if I did like 2 here, on the last one there's 2 deletes instead of just the 1. Intergalactic Avenger: Yeah I mean this is pretty hard because we also have..what shows the case more clearly. The Mighty Eel: Yeah because one could be a delete and the other could be an insert. Intergalactic Avenger: Yeah exactly. The Mighty Eel: So you were able to take a nice little shortcut by checking the length up here. Can you take the advantage that with only 1 edit the lengths are telling you something whereas here the lengths are not telling you quite as much because as you've shown in the last one, they could be the same length but depending on the number it could be either acceptable or unacceptable. Intergalactic Avenger: So I'll have to change my approach a little bit. So I unfortunately have to head out in about five minutes so what I can do is talk about what I would do and any approaches? The Mighty Eel: Okay, yeah that's fine. Intergalactic Avenger: Yeah sorry, I didn't know it would go for this long. The Mighty Eel: No problem. Intergalactic Avenger: So what I'm thinking is…well one kind of cheap way is implementing the Levenshtein Distance and kind of compare it that way? That would be the pretty standard dynamic programming type of exercise. The Mighty Eel: Right, and do you know the the space and time complexity of that? Intergalactic Avenger: Let me think about that. I'm thinking space would be up to
O(n^2)no space is
O(n)because you're keeping track of a parallel array essentially? Unless it's
O(n^2)if you're doing…if you have to keep track of every pair of characters. The Mighty Eel: The standard approach is where you build up a table which is
nwhich would be
O(n^2)but there's this one optimization that you could make that you never actually look more than one row behind as you're filling it. So there's this optimization that you can do it in
O(n). Intergalactic Avenger: I see. I can see how that optimization would work. So yeah that would be one way we would do it. It would take me longer than ten minutes to implement it though… The Mighty Eel: Yeah that's fine. So then we're looking at
O(n)space and what's the time complexity of that approach where you get the full Levenshtein Distance? Intergalactic Avenger: The time complexity I think is
O(n)because you...you're basically filling out this table in each step… The Mighty Eel: Each cell? Intergalactic Avenger: Yeah each cell of the table you're basically looking at, you're not looping through the entire backlog essentially. The Mighty Eel: And how long is the table? Intergalactic Avenger: Table is n. The Mighty Eel: So the table has all the characters on the first one as the rows and all the characters on the second one as columns. Intergalactic Avenger: Right so is the smaller table you build, it's the size of the diagonal of the array. Is that correct? The Mighty Eel: So it's an
n^2table so you have all the characters on the top, and the optimization is what you're saving but you still have to fill in every value in the table. Intergalactic Avenger: Got it. The Mighty Eel: So that's a good base:
O(n)memory. But then the question is: is there a way to improve that. And obviously in the worst case, like if the tolerance were the number of letters and you still had to fill in that whole table then even the worst case in any implementation is also going to be out that far. But sort of what we're trying to do here by going incrementally through and shortcutting it is trying to see if we can find a way to stop it early so we're not filling in that entire table. Intergalactic Avenger: It'd terminate early somewhere. The Mighty Eel: Right. Intergalactic Avenger: Yeah that's interesting. Maybe I'll try to see if I can get that working after this. The Mighty Eel: So another thing just to throw it out there is: if you look at this tolerance, so you need to get rid of this piece here with the length because you can't do anything with the lengths. But you can take a look at this step here where you can think of the problem as being…if the two characters you're looking at are not the same, one of two things could be happening. It could be a deletion, in which case you sort of used up one of your tolerance, and you're just repeating the same problem for the substrings afterwards. Or insertion and you're doing the same thing. So you basically call…And with iterators I'm not sure if you could make that work with iterators instead of just using indexes into the string, but it would basically be that you call
ins_del_compare()twice to possible…was it insertion or was it deletion type of thing. Intergalactic Avenger: Yeah and you would return the or of those two. The Mighty Eel: Right. Just something to think about if you have time and want to work through that. Intergalactic Avenger: Yeah cool thanks so much, that was a lot of fun. The Mighty Eel: Alright well take care and good luck with all your interviewing. Intergalactic Avenger: Thanks, you as well. The Mighty Eel: Bye. Intergalactic Avenger: Alright, have a good one. Bye bye.