Ironic Panda: Hello?
Meta Dolphin: Hi can you hear me?
Ironic Panda: Hey how's it going?
Meta Dolphin: Good how are you?
Meta Dolphin: Yeah so thank you for taking the time to join this interview. I think this will be a good practice for both of us so thanks. Yeah so just a little about myself: I'm fairly new to this platform, I've been doing this for a couple of months now. I joined this platform I thought I wanted to improve my skills both as an interviewer and as an interviewee so I thought this platform would be a good chance to...So how long have you been preparing as and interviewer, how long have you been using this platform?
Ironic Panda: On this platform for a couple of weeks. This is like my third or second interview and I'm basically just getting interviewed just for practice and maybe try to start looking for a new job so I guess this is a good chance for me to get some practice.
Meta Dolphin: Yeah yeah definitely. Yeah so we can dive right into the question and I think you'll be coding in c++? So I'll copy the question right over…
Ironic Panda: Okay so given an array of integers, return the k most frequent elements. So for the first example we have [1, 1, 1, 2, 2, 3] with 1 being the most frequent one and k is 2 so we want the second most frequent one which is 2 so you return [1, 2]. Okay straightforward enough. So I guess a kind of easy way to do this is to look through it have a dictionary to count how many times it appears for each number in that array and after that I can sort the count and just sort the top key of that. If I'm doing that the time complexity would be basically the complexities for sorting them so the worst case would be O(nlogn) and in many different case it might be lower than that but pretty much at that level. But I guess that's a little overkill since we don't really need to sort everything cause we want the k most frequent one. So instead of sorting them we can actually just grab the k most frequent one. We can do that by building a heap I think, so if we build a maximum heap using the count as the value as we build the heap and every time we pop out the most frequent one and reorganize and we do it k times. This way we can lower the time complexity to be just like….we actually don't need to build everything we only need to keep the heap as big as k so time complexity is going to be O(nlogk) or something like that.
Meta Dolphin: Yeah sounds good.
Ironic Panda: So let's try some coding then.
Meta Dolphin: Sure let's do it.
Ironic Panda: I guess the best we can get, am I right about that. I guess I just wanted to know if that's the right direction before I get started.
Meta Dolphin: Yeah that's the right direction, let's dive right into it.
Ironic Panda: Sure okay. So let's have a function that returns a list of integers. Let's just call it kMostFreq() and it will take a list of integers as an input. Something like that. So that's going to be our function we got a list of input here. First thing I'm going to do is create a dictionary and count for each number I'm seeing here.
Meta Dolphin: I think you're missing a bracket on line 29.
Ironic Panda: So I got this and I need to keep, I got a priority_queue for us to use. So I'm going to need the count and I'm also going to need the values so why don't we just make it compare. Make it as a pair. Actually we just want a min_heap that way everytime we push something into the heap we pop the less frequent one so in the end we get the most frequent one in the heap.
Meta Dolphin: Exactly.
Ironic Panda: So I just want a min heap and the behavior should be like a mean heap. So the next thing I'm going to do is loop everything through a dictionary….Okay and that seems to be it.
Meta Dolphin: Yeah that looks good. Could you reiterate the time complexity for this please?
Ironic Panda: So the time complexity is going to be. Let's start from the beginning. So for the first for loop it's going to be O(n), n being the number of values in that array. And for the second it's going to be a little less than O(n). It would be O(m), m being the number of different numbers and here the third loop, every time….on the outside we loop t times at most so there's a k and for the heap to pop out a value and reorganize it takes log(n of k) so it's going to be log(k) here. So the overall time complexity is going to be whichever is bigger either O(klogk) or O(klogk, n) or something like that.
Meta Dolphin: Yeah sounds good. So as an interviewer I would be completely fine with this implementation I think it works great. I think the purpose is great. But since now we have quite a bit of time left and we've achieved this time complexity already, let's try to make it better. Can we do this linear time?
Ironic Panda: Let me think about it. Can we do linear time? Well the only thing I can think of off the top of my head is: if the input is special like if it's not bigger than a specific number we can maybe use bucket sort and that way we can lower it to linear time.
Meta Dolphin: Yeah I think that's the right approach here because in the note it says that k is always valid so the input array will always have, at most, k unique second elements. So if we look at the example at like 18, if example array is [1, 1, 2, 2, 3, 3], k cannot be 2, because there are no two unique most frequent elements. Because if k is 2 we have 3 numbers that share the same frequency so k is not a valid number to be 2 so in this example the only valid k it can have is 3.
Ironic Panda: Okay thanks for pointing that out I didn't really notice that at the beginning. So let me think about what you just said. So there are 3 different number with the same frequency so the number k has to be at least bigger than that.
Meta Dolphin: Yes.
Ironic Panda: Okay. Is that related to what I was saying about using bucket sort? I wasn't sure how those two are related.
Meta Dolphin: Yeah sure so if we look at the example on line 18 right? [1, 1, 2, 2, 3, 3] we see that each number has a frequency of 2. If we can use the bucket to represent the frequency of numbers can we do something? And I should clarify something. And in this example the relative order of return values doesn't really matter. What that means is this is equal to this. So relative order doesn't matter. So if we can use frequency as the bucket….
Ironic Panda: That way we can easily sort everything. I got you, I got you. So if we held something number and we can allocate another array to do something like that. So let's just use the example from line 18 so I can basically allocate an array which has space of n at most.
Meta Dolphin: Okay sounds good.
Ironic Panda: And this one means with frequency at 1, this means frequency at 2, this means frequency at 3 so on and so forth so that way we can easily use the bucket sort and do this thing. Okay I got it. Shall we try some code?
Meta Dolphin: Alright let's do it.
Ironic Panda: I'll just call it kMostFreq2 for simplicity. That's bad naming convention but for the interview purpose I will just use that. Okay I still want to do the mapping stuff, which makes it a bit easier here….I guess that doesn't really matter I want to keep a variable here to count which one is the most frequent one that way we don't have to loop through the end from the very beginning…..So everything is in the bucket now so…
Meta Dolphin: That looks good…
Ironic Panda: Yeah actually that's not the right placement for the if statement and I want to put it here.
Meta Dolphin: Yeah that is correct but we can assume that k is always right. Or always valid so it will never...but that is also fine. What you have is also fine. Just one minor thing: can we check the boundaries in line 37?
Ironic Panda: Sure, let me think. Theoretically we never hit a boundary right?
Meta Dolphin: So if we have our input array of something like this, we have a bucket…
Ironic Panda: Yeah you're right you're right. So I guess when I allocate the bucket, while this actually needed to be something like that...something like that.
Meta Dolphin: That looks good, just please guide me through the time complexity and I think we're good.
Ironic Panda: Okay so like what we did before the first one is still O(n) linear time, the second one is linear time instead of O(n) it's going to be O(m) number of different number here. And at the end we got here it's still linear, it's linear the size of the bucket and the bucket is the same size as our input so it's O(n) linear time.
Meta Dolphin: Can you explain to me why this is still linear even though we have two loops inside?
Ironic Panda: So even though we have 2 for loops here an outside for loop and an inside looping through everything. But….so the second for loop. So for that piece of code to behave, the code inside the inner loop, this code can only be run at most k times. And if we are not doing it at all, we are only running the outside loop which will just loop through it now drawn into the inside loop at all but if we ever hit this if statement and go into this inside loop, we can run this as much as k time. So it's either going to...if bucket[i].size = 0 then no inner loop run. Or else inner loop runs at most k times. So you either don't run the inner loop or the inner loop runs k times so overall it's still linear.
Meta Dolphin: Exactly. Just one more thing: line 40 should be equals there right?
Ironic Panda: Yes.
Meta Dolphin: Yep I think your logic makes sense.
Ironic Panda: Okay.
Meta Dolphin: Yeah thanks, that's all I had I think you did a great great job in implementing O(klogk) and improving to linear time complexity.
Ironic Panda: Thank you, thank you.
Meta Dolphin: Yeah so that's all I had so thanks for joining in and good luck with your future interviews.
Ironic Panda: Okay thank you very much thank you for your time.
Meta Dolphin: No problem, thank you have a good night. Bye bye
Ironic Panda: You too, bye.