String Shuffle (Python)

Watch someone solve the string shuffle and analysis problem in an interview with an engineer and see the feedback their interviewer left them. Explore this problem and others in our library of interview replays.

Interview Summary

Problem type

String shuffle and analysis

Interview question

1) Write a function that takes a string as an input and returns a shuffled version of that string (i.e. a version where each character is just as likely to be in any given position) 2) Write a function that analyzes how well your previous function randomly shuffles the string

Read more about the questions

Interview Feedback

Feedback about Stateful Armadillo (the interviewee)

Advance this person to the next round?
Thumbs upYes
How were their technical skills?
How was their problem solving ability?
What about their communication ability?

Feedback about Adequate Lobster (the interviewer)

Would you want to work with this person?
Thumbs upYes
How excited would you be to work with them?
How good were the questions?
How helpful was your interviewer in guiding you to the solution(s)?
Solid interview. The amount of hints dropped kept the interview moving a long, which is important (nothing is more frustrating to me than radio silence and me just blabbing and going down the wrong path and wasting all our time). I also felt the transparency was really nice. Often in interviews you just get a problem to "write some algorithm" without any context about constraints or outcomes (do you want to see ANY solution no matter how crappy or is a quality partial solution and an explanation of how it would be finished good enough), do you care about performance, what's the difficultly, if this is the one and only problem I'll be tackling, etc. In this case it was stated clearly there were two parts, each part was going to be challenging and there are multiple solutions and getting something to run was the most important thing, etc so it made it easier to understand what the interviewer wanted from me (with the lack of visual queues over phone screenings this is really important to me, I can't read faces of people and they have a disapproving or bewildered look for example).

Interview Transcript

Stateful Armadillo: Hello?
Adequate Lobster: Hey how's it going?
Stateful Armadillo: Good how are you?
Adequate Lobster: Good. You probably know the drill but go ahead and pick a language that you're comfortable with and we'll get started. Unless you want to do something else first?
Stateful Armadillo: No we can hop right into it?
Adequate Lobster: Great. It's a two part question and the first part is not important that you get it exactly right or anything like that. We can talk through the solution at the end and make sure you have something that makes sense but it doesn't have to be perfect.
Stateful Armadillo: Okay.
Adequate Lobster: So the first part is I want you to write me a function that takes a string as an input and returns a shuffled version of that string. As in a version where each character is just as likely to be in any position.
Stateful Armadillo: Okay. So def shuffle(). So what I'm thinking here for a simple shuffle is, I can do two random index swaps converging in the middle maybe. I can go for that approach I guess. So is it okay use random? I assume it's okay to use random.
Adequate Lobster: Yeah sure.
Stateful Armadillo: I think random is in the math library. No it's in the random library. So what we want to do here is get our str_len. And let me do a quick check on random I think it will return a float.
Adequate Lobster: You got to call it, yeah.
Stateful Armadillo: Yeah I was going to say, it would be helpful to call the function. Alright cool, as expected. Now what I'm going to do to get it into a random index is just take that random and apply it by length. Alright cool, so that gives me a random index and so then I need to swap as many elements as possible and I need this to end eventually so I need to do a case where….so I guess it depends on how random we want it. I was going to split this thing in half and just swap between the two halves of the string but that may not be sufficiently random for your purposes. I guess another option that's more memory intensive is I can keep a running tally of what has been swapped and only choose between the remaining indexes that need to be swapped. I'm fine implementing either, do you have any preference based on what I mentioned?
Adequate Lobster: No, I think let's see what we do.
Stateful Armadillo: So I guess I'm going to do the more memory intensive but maybe conceptually simpler approach. So let's allocate an array of indexes so we're going to do indices = [i in range(0, str_len)]. So now we have a list of indices and now we're going to grab our rand out instead of doing our rand slice against these indices…and I need to make sure I don't pick the same two indices and I don't want a random while true option in here. So to do that, we are going to get a random index against the length of the indices so this changes the length of indices and then we are going to peal out my swap_1. Okay so that's my swap_1 so this doesn't guarantee that it will get the same value so what we'll do here is do this and if the values are equal and, this is maybe not great but let's snake this random edge case where the indexes are equal….I guess I should check against two, so I'll deal with the edge case where there's just two in a second and the simplest case where it just stops. And now that I think about this maybe recursion is going to be simpler but let me just finish writing this though out. Okay so this is basically going through, getting some random indexes….you know this is kind of dumb now that I look at it instead what I want to do is, maybe on the fly generate an array with new indices. Maybe this array and just map it. So how do I generate an array already shuffled from 0 to n? Okay I got an idea. Let me just restart this, I have a simpler idea than where I was going so we're going to do. Sorry I'm sketching something out here I would use the draw tool but just sketching on paper real quick. Okay so we're going to start with our max_length and then we're going to set that to the length of the input string to start with. I guess I'll just call it cur_length. Then I'll generate a number between 1 and the cur_length which should be a random index into the original input. I'm going to push that newly generated index into a list, I'm going to shrink the current length. So I'm dealing with swapping what I'm getting hung up on all these solutions is I want to put together an algorithm that's not going to deal with swapping the same random index. So I guess my initial solution of splitting the string in two, dividing it maybe useful because we would then guarantee we'd never pick the same index from the two different sets. So our cur_length and we need to generate a random number that's not going to be picked or we need to subdivide the problem down to a trivial case which would be 3. 3 is the trivial case. 3 is the real trivial one, pick and index and swap that. Okay I don't know why I'm making this super complicated. New plan: so I'm just going to keep track of a single index for this solution. It's going to be a pointer into the input string position that we know we shuffled and it's going to be a sliding index style thing where I only consider swapping elements in the half of the array I haven't touched basically. What we're going to do on that is cur_index = 0 and we want to basically go from 0 to the end of the array so for i in range(0, input_str). So what we want to do here is generate a random index between i and the length. So back to our random plan so this should be random_index = random.random() * (str_length - i). So now we're just basically going to swap i and this thing. So this is my temp. That will ends, we bump up i by one, i is now doing a random between length and 0 but I need to offset for i so after the first iteration of "foo" on line 8 here this is going to be between 0 and 2 and the next iteration it will give me a number between 1 but I've already burnt index 1 so if I'm thinking about this correctly it should just plus i here.
Adequate Lobster: Pretty good.
Stateful Armadillo: I think this may work. So if this works I guess we can return input_str. Maybe this works.
Adequate Lobster: Not exactly.
Stateful Armadillo: Thing does not support item assignment. Okay so I guess the simple thing would be to turn this into an array then turn it back to a string at the end. If I recall correctly you can just cast a string to a list.
Adequate Lobster: Could you do it inside the shuffle() function? Otherwise it will not met the goal.
Stateful Armadillo: I see. Could I do it inside the shuffle() function. You want me to do it by reference or are you okay with me allocating a new object?
Adequate Lobster: I'm okay with allocating a new object as part of shuffle(), otherwise the second part of the problem is going to be kind of tough.
Stateful Armadillo: Okay no problem. So I'll just rename this to input_arr. And then I would just recast this to a string here.
Adequate Lobster: Yeah I don't think the recasting works, but there's probably some way to do it.
Stateful Armadillo: Well that didn't work, but it also didn't shuffle.
Adequate Lobster: Well it might have.
Stateful Armadillo: Might have, I'll just print the array also. I guess it just randomly shuffled back to the same thing, which is probably not great for a shuffle but I guess it happens. So let's try a super long word. I guess that appears to be reasonably shuffled. And it has the same length…so I guess if I was writing….so if I were to test thing I'd probably check if the shuffled string and input string lengths match. If I were being very thorough I might write a test a unit test that would confirm that all the letters are present in the shuffled string as well.
Adequate Lobster: I mean we can just do something like this to see it a little bit easier. Although it still needs to return a string, not an array.
Stateful Armadillo: Yeah and then I realized why casting doesn't work. It should just be a join() on empty if I recall. I think this should work. There you go.
Adequate Lobster: Okay. I would argue that is a shuffle. I can't say for sure that it's a good one, although it looks good to me which gets us to the second part of the question. Before we move on do you have any questions or do you have any feedback or should we jump right into the second part of the question?
Stateful Armadillo: I just wanted to mention that the methodology for how I would test the method if I were running some automated tests for it but that's the only thing that I would mention at this point.
Adequate Lobster: Yeah I feel like that's above the scope of this part of the question, because it happens to be the second part of the question. So it's not so much automated tests. I want you to put yourself in the mind of something like a TA and I want you to write a function that tests the shuffle but the goal of the test is to test if it's shuffling truly. So some bugs might be that it returns an invalid return, which you should definitely test for, but I wouldn't worry too much about because the second part is to make sure it's actually shuffling the string. Like it is truly random shuffling. So I don't expect you to return true or false, I don't think that would be good enough for me to be able to tell if what you're doing works. I think it would make more sense for it to print out a bunch of things and then for you to tell me: if I print this and this and this, I should be able to eyeball and tell me if it's working.
Stateful Armadillo: Okay. I can kind of dig that. So the way I would tackle this is to write a bunch of little functions to test explicit behavior because based on what you're saying we want to very fine grained behavior like: this is what went wrong so somebody can diagnose what's not working. So I'm probably going to write one little method per failure case so the obvious one is kind of the invalid inputs. So I could write those but…
Adequate Lobster: I guess for the purpose of brevity, skip the easy ones like invalid input. Like invalid return would be good. Yeah don't test that it can handle invalid input and stuff like that.
Stateful Armadillo: Okay so the first invalid return that came to mind is a simple length check so I would do a length check there. Do lengths match and I might do a unit test. Okay because I'm typing this a bunch I would go back and turn those into local variables but I'm going to go on for now.
Adequate Lobster: Fair. Great, let's see that this one works. Let's just call this.
Stateful Armadillo: So the expect output should be nothing. So let's see if we get nothing. Okay no output, so hopefully that worked.
Adequate Lobster: Okay, so in order to not spend our time formatting test cases, for all the test cases that are not trivial, just declare what the test case is and move on. Just make a comment and move on until we get to the meatier ones like: is it shuffling.
Stateful Armadillo: Okay, gotcha. So I would do length checks, does contain all letters. Don't expect crazy letters in there. I guess technically I would want to do an inverse, or that check might check that it doesn't have any extra letters. Alright let's move on to the meat: so how do we determine if something is shuffled or not.
Adequate Lobster: Right.
Stateful Armadillo: So one way to test that may be. So there's a couple of edge cases with that. So input is going to affect this test basically. If I give it a string with all letters 'a's basically it's impossible for that to be shuffled correctly because it's all going to be 'a's. So what I'm thinking is to determine if it's shuffled correctly or that's it's sufficiently shuffled, my expectation would be the delta between the position of the element in the original list is sufficiently different than the position in the new list basically. Does that make sense, or should I explain it further?
Adequate Lobster: That does make sense, I'm not sure it's the best….
Stateful Armadillo: It may not be. Before we jump in then, let me throw out some more options before we jump in. So one option is: are the indexes basically different. That's tricky because there are multiple 'a's, we don't want to search for 'a' all the time. So maybe what I want to do is create a meta or different data structure that contains basically a map, so we have a quick lookup on each individual letter. So I'm going to sketch out what I'm thinking here really quick so you can see where I'm going here. So what I kind of what to do is a map that maybe looks like this and it has maybe some positions or something. So that's where 'a' is at in the string. So you kind of the idea: I want fast look up and the lookup into the dictionary is going to be the position of those letters relative to the original string. And my expectation would be: for any given letter in the new string, it should not have the same index as the original string. Now there may be some possibility that happens but I'm going to set a threshold I guess and my expectation is that 90% of the things don't match up. Like what we saw with the really short string 'foo' it shuffled back to where it started which isn't great but it happened.
Adequate Lobster: Well I have a couple of questions about this that might help you think about this because you're just a little overcomplicating actually.
Stateful Armadillo: Okay.
Adequate Lobster: So if I was to take a deck of cards and shuffle it and I was just to look for one card, the ace of spades, and let's say it started on the top of the deck what determines if that ace of spades is shuffled?
Stateful Armadillo: If it's no longer on the top of the deck.
Adequate Lobster: Is that true? There are 52 cards in a deck right, so is that an accurate statement?
Stateful Armadillo: I guess not. If you shuffle it repeatedly it could end up back on the top.
Adequate Lobster: So if something was truly shuffled. As in it were shuffled repeatedly enough, what is a better way to determine if the ace of spaces was properly shuffled or not?
Stateful Armadillo: I see, so you're saying it should be different between iterations.
Adequate Lobster: Well no, because there's no concept of an iteration. This is a black box shuffling. You shuffled it by swapping a bunch of random spaces. Someone else's shuffling program might split it into halves and split it together. There's no concept of an iteration. It's either shuffled or not.
Stateful Armadillo: Oh I meant iterations of shuffles. If I call shuffle() once my expectation is that the ace should be. Oh you're saying internally shuffled or shuffled a million times.
Adequate Lobster: Well the other question is like: if I shuffle something and the ace is likely to be in any position. And then I shuffle it again, the second shuffle and the first shuffle should be completely independent of each other.
Stateful Armadillo: Yeah maybe I used a loaded word there. What I essentially thinking was: I would shuffle once and my expectation would be that 'a' is in a different position after that. Now I record the position of 'a' in the new list, let's say it's 5, and if I shuffle it again my expectation is that 'a' is at a position other than 5.
Adequate Lobster: Alright and I guess what I want to lead you towards is: you shouldn't be thinking of this in terms of position relative to where it is in previous shuffles. Like if you shuffle it the 170th time, and you shuffle it the 6th time they should be equally shuffled with respect to the 5th time. So whether or not they are in the same position as they were after the last shuffle is not really relevant. It's actually simpler, you don't have to worry about distance from previous iterations, you don't have to worry about the starting set up at all.
Stateful Armadillo: Interesting.
Adequate Lobster: Like think about flipping a coin, do you care if the coin was heads a minute ago or not?
Stateful Armadillo: Right. Alright so if we don't care about it's previous state and we only care about the output….I know this is something obvious I'm just not connecting the dots. And they're independent of events, they have nothing to do with each other. So going back to the coin example, if I flip a coin I all I know at any given point is what it landed on, and I know that's heads.
Adequate Lobster: So how would you determine if a coin is fair?
Stateful Armadillo: Well in the coin world I would, I would keep going back to the iteration. What jumps out at me is you flip it a hundred times and just see if 50-50 comes up or if it converges on 50-50.
Adequate Lobster: Right I agree. That's the way you would determine if a coin is fair: you'd count the number of times it was head and the number of times it was tails. You wouldn't see: it was heads last time is it now tails, it was tails last time is it now heads. That's what you're saying but you shouldn't be doing that you should just be saying: I don't care what it was heads last time I just care how many times it was head and how many times it was tails.
Stateful Armadillo: Okay point taken on the coin. So if I apply that concept to the list then the only thing I care about is….if I have 'abc' and I shuffle it a bunch of times my expectation is that there's an even distribution of where 'a' is present in the list, is that what we're getting at?
Adequate Lobster: Yeah that's exactly right.
Stateful Armadillo: Okay that seems relatively straightforward I guess.
Adequate Lobster: Yeah it's much simpler than what you were trying to do.
Stateful Armadillo: Yeah I get what you're saying for sure. So in reality I guess I would still definitely use a hash map still because that's quick. So I would initialize a map of all 0s I guess so I'm going to do that and then my expectation is I want to see an even distribution of 'a's and I want to start with an input_srt of 'abcde'. And then we're going to shuffle it a 100 time and in that result we want to find the location of 'a's.
Adequate Lobster: I want to say: before you look up the position of 'a', let me ask another question. Is a deck of cards shuffled just because the ace of spaces is shuffled?
Stateful Armadillo: Well, I guess it depends on how thorough the test is. The answer is no. So if I'm treating this as more of a fully blown test then the answer is no.
Adequate Lobster: Right and the reason I ask that is a common bug in a shuffle is you accidentally terminate after a single shuffle. So the 'a' was actually shuffled into a random place in yours, but then you messed up your for i in range check, or you forgot to check string length so you're only actually doing this whole operation here, line 9 through 12, once. And your thing would think that it's shuffled because the 'a' would be properly distributed but that would be invalid because 'b' would not be. So I'm just trying to give examples of why you have to be even more strict.
Stateful Armadillo: Gotcha, I appreciate it. I probably haven't put as much time into this as you have.
Adequate Lobster: This isn't my first rodeo. But to be clear this is not a problem with an easy answer or like nobody just solves it and I think the goal is to talk it through and think about it and see if you're like: oh that makes sense and then come up with an adjustment. I never expect anybody to just write the answer from this.
Stateful Armadillo: Well one day you find that unicorn you'll have to hire them right away. So back to the problem at hand: we're not focused on just 'a' we're focused on everything. Once we have our string shuffled we want to then get the position of all the letters in it and do something with that to keep a running tally of some sort. What is that running tally going to contain. So here's what I'm kind of thinking, and maybe I'm making it too complicated again. But, just thinking out loud, maybe it would be something like 'a', and 'a' would point to a structure and that structure is going to be another lookup of position. So maybe 'a' would be 0 in one iteration and 12 in another. And there's a count on there.
Adequate Lobster: Except for the comma instead of a colon and…yeah that's what I was imagining.
Stateful Armadillo: Yeah the syntax might be a big wonky.
Adequate Lobster: To be clear, it's a map of position with the number of times it was in each position?
Stateful Armadillo: Correct. So I would want to build that, and the way I would evaluate whether it was shuffled or not is my expectation that all of those numbers would be relatively the same, meaning those letters occupied all positions relatively evenly.
Adequate Lobster: Right, so rather than…to avoid the idea of printing true or false, or saying if false: print 'this is not good enough', I think you can just print those numbers and inspect them and we could discuss how to improve on that, but I don't expect anyone to have the statistics background to say what is the right answer to how close should they be?
Stateful Armadillo: I'm certainly not a mathematician. Okay so then basically we're going to iterate over the letter in our result. We're going to ask if the letter is not in.
Adequate Lobster: In python for blank in blank gives you the key or the value?
Stateful Armadillo: The value.
Adequate Lobster: Okay.
Stateful Armadillo: Yeah there's a bunch of ways in python to iterate over things but…
Adequate Lobster: Can you get the key from that?
Stateful Armadillo: I can get it.
Adequate Lobster: Okay, cool. Because you're going to need it, only reason I ask.
Stateful Armadillo: They key by which you mean the index?
Adequate Lobster: Yeah, the index. The position.
Stateful Armadillo: Okay so if the letter is not in our results object….Thanks by the way for taking this interview late, I know it's not ideal.
Adequate Lobster: Yeah no problem, actually one of my coworkers took an interview last night at 9 pm.
Stateful Armadillo: So results iterated over the shuffled…
Adequate Lobster: You're double using i by the way.
Stateful Armadillo: Oh yeah, thank you pair programmer.
Adequate Lobster: Just saving you some time. I want to leave some time for questions and I can't stay super late. I don't know, I just find that a bit more productive. Whether you messed up your for i and j stuff isn't as important.
Stateful Armadillo: Thank you, appreciate it. So if j is not in the results then we're going to do results[letter][j] = 0. And now that everything is initialized we can just do this. Okay, now let's see what happens.
Adequate Lobster: Oh yeah you've got your weird quotes up here, does that…
Stateful Armadillo: That's allowed, there's just a weird typo in the variable. Alright there we go.
Adequate Lobster: Okay great!
Stateful Armadillo: So based on this on an eyeballing point of view, it seemed shuffled plus or minus 5ish with the exception of this outlier of 30 for the 'c'. This output as you're saying, you could statically look at this. I'm sure there's some math where you could sum up everything and get an average or a standard deviation or something.
Adequate Lobster: If you were suspicious about that 'c', what would you do quickly to improve this?
Stateful Armadillo: If I were suspicious about the 'c' what would I do quickly to improve this? Well one method is I could just do more swaps I suppose, if I'm suspecting that basically that's happening because my random index generator there is….
Adequate Lobster: I said it too ambiguously. I didn't mean the shuffle algorithm. I mean in the does_contain_all_letters(), is there something you can do to be less suspicious? Like that 'c': 31 could be completely valid but if you want to know more how would you make this better at determining flaws?
Stateful Armadillo: So if I wanted to know more.
Adequate Lobster: So it's kind of a weird loaded question and it may be too weird.
Stateful Armadillo: I get what you're going after. My mind jumps to….the most important thing to me to solve this as a developer, if I'm a developer the thing that matters to me most is the history, what was the history of 'c'. I know you were steering away from the history idea as far as complexity, but if we are looking to get a little more complex then I could imagine where the scenario where the values of the numbers could themselves be objects a count basically plus the positions that it….
Adequate Lobster: Well there's a really simple way.
Stateful Armadillo: Seems to be the theme right now. I'm overthinking. Yeah nothing obvious jumps to mind, I guess. If I wanted more information on why I ended up with 30 on that 'c' then my expectation would be….I would want.
Adequate Lobster: Let me phrase it back in the coin flip thing. Let's say you flip a coin 5 times and 4 of the times its heads. What would you do to say I'm not sure if this coin is invalid. What would you do to test it more? You're not sure if it was fair or not.
Stateful Armadillo: For that I would get another coin and see if that one was fair or not. I thought maybe what you were getting at is we could try more input strings. But let's see: if I'm flipping a coin and I'm sure the coin is fair, I'm suspecting the coin is not fair, what would I want to evaluate.
Adequate Lobster: Well you're not sure it's statistically significant for the coin to show up heads 4 times.
Stateful Armadillo: Right. I mean one option I guess, the really simple solution is to run it a billion times. Maybe that was the obvious answer I didn't realize.
Adequate Lobster: Exactly what I was looking for. Not a billion but I think a hundred is too few to get a statistically significant sample and if you think the 'c' has a tendency to be in the 3rd position, well if you change this to a thousand and run it again, that 'c'. Now I'm confident your algorithm works. If I wasn't it because there wasn't enough iterations. That's all I'm looking for. To be more sure of your algorithm you simply need to do more. If then you're like: maybe 'b' likes to be in odd positions more than even positions, I'm not sure. Well let's run in ten thousand times. Like the more you run it, the more you can trust these numbers and the more it will converge on almost two thousand.
Stateful Armadillo: Yeah that makes total sense. It's funny, looking back on the problem it's obvious. This is literally a conversation I had with someone about writing a test for a random sequence generator we had and it was almost the same sequence of conversation: just run it more.
Adequate Lobster: Yeah that's exactly right. So the question we're basically out of time for and I want to first give you some feedback and then give you some time to ask me some questions.
Stateful Armadillo: Sure.
Adequate Lobster: That sounds good?
Stateful Armadillo: Sounds great.
Adequate Lobster: So feedback-wise. So I'll start with the overall is you did really well on this. I know it probably felt like you didn't and that's part of the question. It's kind of a gritty question, it's confusing it's a little off-putting but it's designed to not just do algorithms but to think as an engineer in the work place. Like doing tests and thinking about that validity and solving unbounded problems with unclear answers and being able to communicate a spec with someone and think of things like what kind of input string, how many iterations should I run and stuff. So overall I think you came up with an answer that worked, you implemented things super quickly and well. I think people stumble on nested arrays. Like this stuff over here, this is exactly what this should look like, lines 43 through 50, that's exactly what that should look like and you'd be surprised with how few people write that cleanly on their first try, you know? I'd say you took longer on shuffle than I'd like so I wonder if you spent too little time thinking about it before you started writing or I don't know maybe this is the ones that people have seen before and you haven't or not? But your implementation was great, your final implementation was ideal it's just you stumbled around a bit and your double swap would have been a really bad approach. I like that you stopped yourself and said: no this is stupid there's a better way. And then you basically solved it right away. So overall I think you did really well on the problem, you just had a couple stumbles along the way and none of them were egregious. I'm just nitpicking for the most part your code was clean, you were pretty quick and you were able to think and communicate about all the concepts at a high level while coding which this question is designed to mess with.
Stateful Armadillo: Cool.
Adequate Lobster: Does that make sense?
Stateful Armadillo: Yeah that makes sense. I agree, I think your observation about the shuffle is correct I would have preferred to get through that quicker as well. I don't know what threw me off about that I guess I just haven't done a shuffle in while?
Adequate Lobster: Makes sense. I mean it's a deceptively difficult question. I never understood why. I never asked it to myself. I just thought of it and asked it but it's one of those questions that, through asking it to a bunch of people, a surprising amount of people get it wrong and the people that have the least trouble with it are usually really junior in their career, and then they stumble in the next half big time.

We know exactly what to do and say to get the company, title, and salary you want.

Interview prep and job hunting are chaos and pain. We can help. Really.