Recursive Cactus has been working as a full-stack engineer at a well-known tech company for the past 5 years, but he’s now considering a career move.
Over the past 6 months, Recursive Cactus (that’s his anonymous handle on interviewing.io) has been preparing himself to succeed in future interviews, dedicating as much as 20-30 hours/week plowing through LeetCode exercises, digesting algorithms textbooks, and of course, practicing interviews on our platform to benchmark his progress.
Recursive Cactus’s typical weekday schedule
|6:30am – 7:00am||Wake up|
|7:00am – 7:30am||Meditate|
|7:30am – 9:30am||Practice algorithmic questions|
|9:30am – 10:00am||Commute to work|
|10:00am – 6:30pm||Work|
|6:30pm – 7:00pm||Commute from work|
|7:00pm – 7:30pm||Hang out with wife|
|7:30pm – 8:00pm||Meditate|
|8:00pm – 10:00pm||Practice algorithmic questions|
Recursive Cactus’s typical weekend schedule
|8:00am – 10:00am||Practice algorithmic questions|
|10:00am – 12:00pm||Gym|
|12:00pm – 2:00pm||Free time|
|2:00pm – 4:00pm||Practice algorithmic questions|
|4:00pm – 7:00pm||Dinner with wife & friends|
|7:00pm – 9:00pm||Practice algorithmic questions|
But this dedication to interview prep has been taking an emotional toll on him, his friends, and his family. Study time crowds out personal time, to the point where he basically has no life beyond work and interview prep.
“It keeps me up at night: what if I get zero offers? What if I spent all this time, and it was all for naught?”
We’ve all been through the job search, and many of us have found ourselves in a similar emotional state. But why is Recursive Cactus investing so much time, and what’s the source of this frustration?
He feels he can’t meet the engineer hiring bar (aka “The Bar”), that generally accepted minimum level of competency that all engineers must exhibit to get hired.
To meet “The Bar,” he’s chosen a specific tactic: to look like the engineer that people want, rather than just be the engineer that he is.
It seems silly to purposefully pretend to be someone you’re not. But if we want to understand why Recursive Cactus acts the way he does, it helps to know what “The Bar” actually is. And when you think a little more about it, you realize there’s not such a clear definition.
What if we look at how the FAANG companies (Facebook, Amazon, Apple, Netflix, Google) define “The Bar?” After all, it seems those companies receive the most attention from pretty much everybody, job seekers included.
Few of them disclose specific details about their hiring process. Apple doesn’t publicly share any information. Facebook describes the stages of their interview process, but not their assessment criteria. Netflix and Amazon both say they hire candidates that fit their work culture/leadership principles. Neither Netflix nor Amazon describes exactly how they assess against their respective principles. However, Amazon does share how interviews get conducted as well as software topics that could be discussed for software developer positions.
The most transparent of all FAANGs, Google publicly discloses its interview process with the most detail, with Laszlo Bock’s book Work Rules! adding even more insider color about how their interview process came to be.
And speaking of tech titans and the recent past, Aline (our founder) mentioned the 2003 book How Would You Move Mount Fuji? in a prior blog post, which recounted Microsoft’s interview process when they were the pre-eminent tech behemoth of the time.
In order to get a few more data points about how companies assess candidates, I also looked at Gayle Laakmann McDowell’s “Cracking the Coding Interview”, which is effectively the Bible of interviewing for prospective candidates, as well as Joel Spolsky’s Guerilla Guide to Interviewing 3.0, written by an influential and well-known figure within tech circles over the past 20-30 years.
|Apple||Not publicly shared|
|Amazon||Assessed against Amazon’s Leadership principles|
|Not publicly shared|
|Netflix||Not publicly shared|
|1. General cognitive ability 2. Leadership 3. “Googleyness” 4. Role-related knowledge|
|Cracking the Coding Interview – Gayle Laakmann McDowell||– Analytical skills – Coding skills – Technical knowledge/computer science fundamentals – Experience – Culture fit|
|Joel Spolsky||– Be smart – Get things done|
|Microsoft (circa 2003)||– “The goal of Microsoft’s interviews is to assess a general problem-solving ability rather than a specific competency.” – “Bandwidth, inventiveness, creative problem-solving ability, outside-the-box thinking” – “Hire for what people can do rather than what they’ve done” – Motivation|
It’s not surprising that coding and technical knowledge would be part of any company’s software developer criteria. After all, that is the job.
But beyond that, the most common criteria shared among all these entities is a concept of intelligence. Though they use different words and define the terms slightly differently, all point to some notion of what psychologies call “cognitive ability.”
|Source||Definition of cognitive ability|
|“General Cognitive Ability. Not surprisingly, we want smart people who can learn and adapt to new situations. Remember that this is about understanding how candidates have solved hard problems in real life and how they learn, not checking GPAs and SATs.”|
|Microsoft (circa 2003)||“The goal of Microsoft’s interviews is to assess a general problem-solving ability rather than a specific competency… It is rarely dear what type of reasoning is required or what the precise limits of the problem are. The solver must nonetheless persist until it is possible to bring the analysis to a timely and successful conclusion.”|
|Joel Spolsky||“For some reason most people seem to be born without the part of the brain that understands pointers.”|
|Gayle Laakmann McDowell||“If you’re able to work through several hard problems (with some help, perhaps), you’re probably pretty good at developing optimal algorithms. You’re smart.”|
All these definitions of intelligence resemble early 19th-century psychologist Charles Spearman’s theory of intelligence, the most widely acknowledged framework for intelligence. After performing a series of cognitive tests on school children, Spearman found that those who did well in one type of test tended to also perform well in other tests. This insight led Spearman to theorize that there exists a single underlying general ability factor (called “g” or “g factor”) influencing all performance, separate from specific, task-specific abilities (named “s”).
If you believe in the existence of “g” (many do, some do not… there exist different theories of intelligence), finding candidates with high measures of “g” aligns neatly with the intelligence criteria companies look for.
While criteria like leadership and culture fit matter to companies, “The Bar” is not usually defined in those terms. “The Bar” is defined as having technical skills but also (and perhaps more so) having general intelligence. After all, candidates aren’t typically coming to interviewing.io to specifically practice leadership and culture fit.
The question then becomes how you measure these two things. Measuring technical skills seems tough but doable, but how do you measure “g?”
Mentioned in Bock’s book, Frank Schmidt’s and John Hunter’s 1998 paper “The Validity and Utility of Selection Methods in Personnel Psychology” attempted to answer this question by analyzing a diverse set of 19 candidate selection criteria to see which predicted future job performance the best. The authors concluded general mental ability (GMA) best predicted job performance based on a statistic called “predictive validity.”
In this study, a GMA test referred to an IQ test. But for Microsoft circa 2003, puzzle questions like “How many piano tuners are there in the world?” appear to have taken the place of IQ tests for measuring “g”. Their reasoning:
“At Microsoft, and now at many other companies, it is believed that there are parallels between the reasoning used to solve puzzles and the thought processes involved in solving the real problems of innovation and a changing marketplace. Both the solver of a puzzle and a technical innovator must be able to identify essential elements in a situation that is initially ill-defined.”
– “How Would You Move Mount Fuji?” – page 20
Fast forward to today, Google denounces this practice, concluding that “performance on these kinds of questions is at best a discrete skill that can be improved through practice, eliminating their utility for assessing candidates.”
So here we have two companies who test for general intelligence, but who also fundamentally disagree on how to measure it.
But maybe as Spolsky and McDowell have argued, the traditional algorithmic and computer science-based interview questions are themselves effective tests for general intelligence. Hunter & Schmidt’s study contains some data points that could support this line of reasoning. Among all single-criteria assessment tools, work sample tests possessed the highest predictive validity. Additionally, when observing the highest validity regression result of two-criteria assessment tool (GMA test plus work sample test), the standardized effect size on the work sample rating was larger than that of the GMA rating, suggesting a stronger relationship with future job performance.
If you believe algorithmic exercises function as work sample tests in interviews, then the study suggests traditional algorithm-based interviews could predict future job performance, maybe even more than a GMA/IQ test.
Recursive Cactus doesn’t believe there’s a connection.
There’s little overlap between the knowledge acquired on the job and knowledge about solving algorithmic questions. Most engineers rarely work with graph algorithms or dynamic programming. In application programming, lists and dictionary-like objects are the most common data structures. However, interview questions involving those are often seen as trivial, hence the focus on other categories of problems.
– Recursive Cactus
In his view, algorithms questions are similar to Microsoft’s puzzle questions: you learn how to get good at solving interview problems, which to him don’t ever show up in actual day-to-day work, which, if true, wouldn’t actually fit with the Schmidt & Hunter study.
Despite Recursive Cactus’s personal beliefs, interviewers like Spolsky still believe these skills are vital to being a productive programmer.
A lot of programmers that you might interview these days are apt to consider recursion, pointers, and even data structures to be a silly implementation detail which has been abstracted away by today’s many happy programming languages. “When was the last time you had to write a sorting algorithm?” they snicker.
Still, I don’t really care. I want my ER doctor to understand anatomy, even if all she has to do is put the computerized defibrillator nodes on my chest and push the big red button, and I want programmers to know programming down to the CPU level, even if Ruby on Rails does read your mind and build a complete Web 2.0 social collaborative networking site for you with three clicks of the mouse.
– Joel Spolsky
Spolsky seems to concede that traditional tech interview questions might not mimic actual work problems, and therefore wouldn’t act as work samples. Rather, it seems he’s testing for general computer science aptitude, which is general in a way, but specific in other ways. General intelligence within a specific domain, one might say.
That is, unless you believe intelligence in computer science is general intelligence. McDowell suggests this:
There’s another reason why data structure and algorithm knowledge comes up: because it’s hard to ask problem-solving questions that don’t involve them. It turns out that the vast majority of problem-solving questions involve some of these basics.
– Gayle Laakmann McDowell
This could be true assuming you view the world primarily through computer science lenses. Still, it seems pretty restrictive to suggest people who don’t speak the language of computer science would have more difficulty solving problems.
At this point, we’re not really talking about measuring general intelligence as Spearman originally defined it. Rather, it seems we’re talking about specific intelligence, defined or propagated by those grown or involved in traditional computer science programs, and conflating that with general intelligence (Spolsky, McDowell, Microsoft’s Bill Gates, and 4 of 5 FAANG founders studied computer science at either some Ivy League university or Stanford).
Maybe when we’re talking about “The Bar,” we’re really talking about something subjective, based on whoever is doing the measuring, and whose definition might not be consistent from person-to-person.
Looking at candidate assessment behavior from interviewers on the interviewing.io platform, you can find some evidence that supports this hypothesis.
On the interviewing.io platform, people can practice technical interviews online and anonymously, with interviewers from top companies on the other side. Interview questions on the platform tend to fall into the category of what you’d encounter at a phone screen for a back-end software engineering role, and interviewers typically come from companies like Google, Facebook, Dropbox, Airbnb, and more. Check out our interview showcase to see how this all looks and to watch people get interviewed. After every interview, interviewers rate interviewees on a few different dimensions: technical skills, communication skills, and problem-solving skills. Each dimension gets rated on a scale of 1 to 4, where 1 is “poor” and 4 is “amazing!”. You can see what our feedback form looks like below:
If you do well in practice, you can bypass applying online/getting referrals/talking to recruiters and instead immediately book real technical interviews directly with our partner companies (more on that in a moment).
When observing our most frequent practice interviewers, we noticed differences across interviewers in the percent of candidates that person would hire, which we call the passthrough rate. Passthrough rates ranged anywhere between 30% and 60%. At first glance, certain interviewers seemed to be a lot stricter than others.
Because interviewees and interviewers are anonymized and matched randomly[^1], we wouldn’t expect the quality of candidates to vary much across interviewers, and as a result, wouldn’t expect interviewee quality to explain the difference. Yet even after accounting for candidate attributes like experience level, differences in passthrough rates persist.[^2]
Maybe some interviewers choose to be strict on purpose because their bar for quality is higher. While it’s true that candidates who practiced with stricter interviewers tended to receive lower ratings, they also tended to perform better on their next practice.
This result could be interpreted in a couple of ways:
If the latter were true, you would expect that candidates who practiced with stricter interviewers would perform better in real company interviews. However, we did not find a correlation between interviewer strictness and future company interview passthrough rate, based on real company interviews conducted on our platform.[^3]
Interviewers on our platform represent the kinds of people a candidate would encounter in a real company interview, since those same people also conduct phone screens and onsites at the tech companies you’re all applying to today. And because we don’t dictate how interviewers conduct their interviews, these graphs could be describing the distribution of opinions about your interview performance once you hang up the phone or leave the building.
This suggests that, independent of your actual performance, whom you interview with could affect your chance of getting hired. In other words, “The Bar” is subjective.
This variability across interviewers led us to reconsider our own internal definition of “The Bar,” which determined which candidates were allowed to interview with our partner companies. Our definition strongly resembled Spolsky’s binary criteria (“be smart”), heavily weighing an interviewer’s Would Hire opinion way more than our other 3 criteria, leading to the bimodal, camel-humped distribution below.
While our existing scoring system correlated decently with future interview performance, we found that an interviewer’s Would Hire rating wasn’t as strongly associated with future performance as our other criteria were. We lessened the weight on the Would Hire rating, which in turn improved our predictive accuracy.[^4] Just like in “Talledega Nights” when Ricky Bobby learned there existed places other than first place and last place in a race, we learned that it was more beneficial to think beyond the binary construct of “hire” vs. “not hire,” or if you prefer, “smart” vs. “not smart.”
Of course, we didn’t get rid of all the subjectivity, since those other criteria were also chosen by the interviewer. And this is what makes assessment hard: an interviewer’s assessment is itself the measure of candidate ability.
If that measurement isn’t anchored to a standard definition (like we hope general intelligence would be), then the accuracy of any given measurement becomes less certain. It’s as if interviewers used measuring sticks of differing lengths, but all believed their own stick represented the same length, say 1 meter.
When we talked to our interviewers to understand how they assessed candidates, it became even more believable that different people might be using measuring sticks of differing lengths. Here are some example methods of how interviewers rated candidates:
Having different assessment criteria isn’t necessarily a bad thing (and actually seems totally normal). It just introduces more variance to our measurements, meaning our candidates’ assessments might not be totally accurate.
The problem is, when people talk about “The Bar,” that uncertainty around measurement usually gets ignored.
You’ll commonly see people advising you only to hire the highest quality people.
A good rule of thumb is to hire only people who are better than you. Do not compromise. Ever.
– Laszlo Bock
Don’t lower your standards no matter how hard it seems to find those great candidates.
– Joel Spolsky
In the Macintosh Division, we had a saying, “A player hire A players; B players hire C players”–meaning that great people hire great people.
– Guy Kawasaki
Every person hired should be better than 50 percent of those currently in similar roles – that’s raising the bar.
– Amazon Bar Raiser blog post
All of this is good advice, assuming “quality” could be measured reliably, which as we’ve seen so far, isn’t necessarily the case.
Even when uncertainty does get mentioned, that variance gets attributed to the candidate’s ability, rather than the measurement process or the person doing the measuring.
[I]n the middle, you have a large number of “maybes” who seem like they might just be able to contribute something. The trick is telling the difference between the superstars and the maybes, because the secret is that you don’t want to hire any of the maybes. Ever.
If you’re having trouble deciding, there’s a very simple solution. NO HIRE. Just don’t hire people that you aren’t sure about.
– Joel Spolsky
Assessing candidates isn’t a fully deterministic process, yet we talk about it like it is.
“Compromising on quality” isn’t really about compromise, it’s actually about decision-making in the face of uncertainty. And as you see from the quotes above, the conventional strategy is to only hire when certain.
No matter what kind of measuring stick you use, this leads to “The Bar” being set really high. Being really certain about a candidate means minimizing the possibility of making a bad hire (aka “false positives”). And companies will do whatever they can to avoid that.
A bad candidate will cost a lot of money and effort and waste other people’s time fixing all their bugs. Firing someone you hired by mistake can take months and be nightmarishly difficult, especially if they decide to be litigious about it.
– Joel Spolsky
Hunter and Schmidt quantified the cost of a bad hire: “The standard deviation… has been found to be at minimum 40% of the mean salary,” which in today’s terms would translate to $40,000 assuming a mean engineer salary of $100,000/year.
But if you set “The Bar” too high, chances are you’ll also miss out on some good candidates (aka “false negatives”). McDowell explains why companies don’t really mind a lot of false negatives:
“From the company’s perspective, it’s actually acceptable that some good candidates are rejected… They can accept that they miss out on some good people. They’d prefer not to, of course, as it raises their recruiting costs. It is an acceptable tradeoff, though, provided they can still hire enough good people.”
In other words, it’s worth holding out for a better candidate if the difference in their expected output is large, relative to the recruiting costs from continued searching. Additionally, the costs of HR or legal issues downstream from potentially problematic employees also tilt the calculation toward keeping “The Bar” high.
This is a very rational cost-benefit calculation. But has anyone ever done this calculation before? If you have done it, we’d love to hear from you. Otherwise, it seems difficult to do.
Given that nearly everyone is using hand-wavy math, if we do the same, maybe we can convince ourselves that “The Bar” doesn’t have to be set quite so high.
As mentioned before, the distribution of candidate ability might not be so binary, so Spolsky’s nightmare bad hire scenario wouldn’t necessarily happen with all “bad” hires, meaning the expected difference in output between “good” and “bad” employees might be lower than perceived.
Recruiting costs might be higher than perceived because finding and employing +1 standard deviation employees gets increasingly difficult. By definition, fewer of those people exist as your bar rises. Schmidt and Hunter’s “bad hire” calculation only compares candidates within an applicant pool. The study does not consider the relative cost of getting high-quality candidates into the applicant pool to begin with, which tends to be the more significant concern for many of today’s tech recruiting teams. And when you consider that other tech companies might be employing the same hiring strategy, competition would increase the average probability that offers get rejected, extending the time to fill a job opening.
Estimating the expected cost of HR involvement is also difficult. No one wants to find themselves interacting with HR. But then again, not all HR teams are as useless as Toby Flenderson.
Taken together, if the expected output between “good” and “bad” candidates were less than expected, and the recruiting costs were higher than perceived, it would make less sense to wait for a no-brainer hire, meaning “The Bar” might not have to be set so high.
Even if one does hire an underperformer, companies could adopt the tools of training and employee management to mitigate the negative effects from some disappointing hires. After all, people can and do become more productive over time as they acquire new skills and knowledge.
Employee development seems to rarely get mentioned in conjunction with hiring (Laszlo Bock makes a few connections here and there, but the topics are mostly discussed separated). But when you add employee development into the equation above, you start to see the relationship between hiring employees and developing employees. You can think of it as different methods for acquiring more company output from different kinds of people: paying to train existing employees versus paying to recruit new employees.
You can even think of it as a trade-off. Instead of developing employees in-house, why not outsource that development? Let others figure out how to develop the raw talent, and later pay recruiters to find them when they get good. Why shop the produce aisle at Whole Foods and cook at home when you can just pay Caviar to deliver pad thai to your doorstep? Why spend time managing and mentoring others when you can spend that time doing “real work” (i.e. engineering tasks)?
Perhaps “The Bar” is set high because companies don’t develop employees effectively, which puts more pressure on the hiring side of the company to yield productive employees.
Therefore, companies can lower their risk by shifting the burden of career development onto the candidates themselves. In response, candidates like Recursive Cactus have little choice but to train themselves.
Initially, I thought Recursive Cactus was a crazy outlier in terms of interview preparation. But apparently, he’s not alone.
Last year we surveyed our candidates about how many hours they spent preparing for interviews. Nearly half of the respondents reported spending 100 hours or more on interview preparation.[^5]
We wondered whether hiring managers and recruiters had the same expectations for candidates they encounter, Aline asked a similar question on Twitter, and results suggest they vastly underestimate the work and effort candidates endure prior to meeting with a company.
Are you a hiring manager or recruiter? If so, please vote on my poll! Results will be shared in an upcoming, titillating blog post.— Aline Lerner (@alinelernerLLC) July 3, 2019
How many dedicated hours of interview prep do you believe candidates should invest in their job search?
Decision makers clearly underestimate the amount of work candidates put into job hunt preparation. The discrepancy seems to reinforce the underlying and unstated message pervading all these choices around how we hire: If you’re not one of the smart ones (whatever that means), it’s not our problem. You’re on your own.
So this is what “The Bar” is. “The Bar” is a high standard set by companies in order to avoid false positives. It’s not clear whether companies have actually done the appropriate cost-benefit analysis when setting it, and it’s possible it can be explained by an aversion to investing in employee development.
“The Bar” is in large part meant to measure your general intelligence, but the actual instruments of measurement don’t necessarily follow the academic literature that underlies it. You can even quibble about the academic literature.[^6] “The Bar” does measure specific intelligence in computer science, but that measurement might vary depending on who conducts your interview.
Despite the variance that exists across many aspects of the hiring process, we talk about “The Bar” as if it were deterministic. This allows hiring managers to make clear binary choices but discourages them to think critically about whether their team’s definition of “The Bar” could be improved.
And that helps us understand why Recursive Cactus spends so much time practicing. He’s training himself partially because his current company isn’t developing his skills. He’s preparing for the universe of possible questions and interviewers he might encounter because hiring criteria varies a lot, which cover topics that won’t necessarily be used in his day-to-day work, all so he can resemble someone that’s part of the “smart” crowd.
That’s the system he’s working within. Because the system is the way it is, it’s had significant impact on his personal life.
My wife’s said on more than one occasion that she misses me. I’ve got a rich happy life, but I don’t feel I can be competitive unless I put everything else on hold for months. No single mom can be doing what I’m doing right now.
– Recursive Cactus
This impacts his current co-workers too, whom he cares about a lot.
This process is sufficiently demanding that I’m no longer operating at 100% at work. I want to do the best job at work, but I don’t feel I can do the right thing for my future by practicing algorithms 4 hours a day and do my job well.
I don’t feel comfortable being bad at my job. I like my teammates. I feel a sense of responsibility. I know I won’t get fired if I mail it in, but I know that it’s them that pick up the slack.
– Recursive Cactus
It’s helpful to remember that all the micro decisions made around false positives, interview structure, brain teasers, hiring criteria, and employee development add up to define a system that, at the end of the day, impacts people’s personal lives. Not just the lives of the job hunters themselves, but also all the people that surround them.
Hiring is nowhere near a solved problem. Even if we do solve it somehow, it’s not clear we would ever eliminate all that uncertainty. After all, projecting a person’s future work output after spending an hour or two with them in an artificial work setting seems kinda hard. While we should definitely try to minimize uncertainty, it might be helpful to accept it as a natural part of the process.
This system can be improved. Doing so requires not only coming up with new ideas, but also revisiting decades-old ideas and assumptions, and expanding upon that prior work rather than anchoring ourselves to it.
We’re confident that all of you people in the tech industry will help make tech hiring better. We know you can do it, because after all, you’re smart.