When Engineers Underperform in Interviews

March 4, 2026

How Cognitive Load and Automation Are Reshaping Engineering Competence

Introduction

This post came out of a specific sequence. Recently, I've been providing a second perspective on technical interviews conducted by colleagues (not leading them, just observing and offering input afterward). That peripheral involvement triggered something: I started mentally revisiting evaluations I had conducted more directly in the past, and also ones where I was the candidate. The accumulation made me want to articulate what I was seeing. And that, in turn, made me curious to check whether any of it held up against existing research. Hence the references scattered throughout.

But first, some context.

Over the past year, something structurally significant has changed in software engineering, and I would venture to say that the most decisive shift has occurred in just the last two months. AI systems now generate code that frequently reaches production quality with minimal correction. Engineers are not becoming obsolete; their cognitive role is shifting toward oversight and specification. Less time is spent typing every line from scratch, and the center of gravity moves from execution to supervision. Technical interviews, however, remain largely unchanged. They continue to measure real-time reasoning, live coding fluency, and verbal clarity under observation.

The mismatch is increasingly visible.

The Cognitive Cost of Being Watched

Live interviews create a very specific cognitive condition: social evaluation. The candidate is observed and assessed, with an implicit ranking forming underneath. This has measurable effects on working memory and executive control.

Research on performance pressure shows that individuals with higher working memory capacity can actually experience greater performance declines under pressure, because anxiety and self-monitoring consume the very resources needed for complex reasoning.[1] What looks externally like hesitation or lack of clarity can be cognitive resource interference rather than lack of knowledge.

Work on social-evaluative threat reaches a similar conclusion. Evaluative contexts reduce the working memory available for the task, even when underlying competence is unchanged.[2] The familiar phenomenon of "choking under pressure" reflects this mechanism: highly practiced skills can degrade when attention shifts from the task itself to self-monitoring and perceived judgment.[3]

A live technical interview amplifies this problem. Candidates are expected to solve a problem, explain their reasoning, monitor the interviewer's reactions, manage time constraints, and regulate their own internal evaluation, all at once. That layered demand does not resemble the conditions of most production engineering work.

The Shift From Execution to Supervision

Automation adds a second structural factor.

Historically, technical expertise depended on continuous engagement with procedural loops: decomposing problems, writing code, debugging, refactoring, and iterating. That repetition built procedural fluency and a kind of embodied coding choreography.

AI-assisted development has altered that loop. Increasingly, engineers specify constraints, evaluate generated implementations, detect architectural inconsistencies, refine edge cases, and integrate components. Design and review skills become more central. Direct from-scratch implementation, especially under time pressure and observation, is simply less common in day-to-day practice.

Human-automation research describes the "out-of-the-loop performance problem": when operators primarily supervise automated systems instead of directly controlling them, their ability to take over manual control degrades over time.[4] Complementary work on skill decay shows that procedural abilities weaken when they are not regularly exercised, particularly in highly automated environments.[5]

The field is now experiencing a version of this pattern. AI does not remove competence. It redistributes practice. High-level design, specification, and review become stronger. The fluency of live, manual implementation can diminish unless deliberately maintained.

Technical interviews, however, continue to focus heavily on exactly that kind of live procedural fluency under scrutiny.

The irony is clear. Now that most of the work consists of writing good specifications and reviewing code produced by others (or by AI), you still want the most knowledgeable person doing it. But daily practice atrophies exactly the skills that technical recruitment still values: producing solutions by hand. That tension is sharpest on the review side, where knowing how to generate code is a precondition for reviewing it well. As for specifying, I haven't seen (and don't expect to see anytime soon) interviews designed around generating specs on the fly so that an agent can write the code.

I'm not here to call for the end of algorithmic or system design interviews. They have their place, especially for certain specialized positions. And in a way, if everything were leetcode, things would be simpler: you memorize the hundred canonical cases, you go in, you perform. It's not like the old days, when you certified once and never had to repeat the same questions at every new job, but at least it offers a predictable format. You know what you're diving into. The deeper problem is when you're thrown into a pool without knowing its depth, or what's been poured into it. And in the meantime, the format keeps testing for the past while avoiding the skills that will matter for the next decade.

What This Looks Like From Both Sides

From the evaluator's side, the challenge is learning to distinguish actual lack of competence from temporary cognitive interference. From the candidate's side, it's not a challenge at all, just honesty: you know whether you didn't know something, or whether you knew it and couldn't get it out. This isn't about hubris or blaming the process. It assumes enough self-awareness to tell the difference.

Across evaluations, a consistent pattern appears. Capable candidates sometimes struggle to structure explanations under observation. Solutions that would be straightforward to explain in a normal working context become harder to articulate in real time. The problem is not that they cannot solve it. It is that the cognitive conditions are different.

The same pattern appears when I'm on the other side of the table, as the one being evaluated. There are sessions where articulation flows effortlessly and reasoning is easy to express. Those sessions feel like exceptions. Some people are simply good at this, whether they truly know their stuff or are just skilled at projecting that they do. More often, even relatively simple ideas become harder to put into words once evaluation pressure enters the picture.

What makes this harder to dismiss is that the interference isn't tied to immediate material pressure. It's not about needing the job or facing a penalty. It has happened in situations driven almost entirely by curiosity: I was already satisfied in my current role but drawn to an opportunity in a niche I found interesting. It has happened in processes I entered only because a recruiter insisted they had "seen something" worth exploring. And yes, there have been interviews that went smoothly, even well. But I can't identify what made them different. They feel like singularities.

I'll say it plainly: I'm not good at this format. Not out of false modesty, and not because I doubt how I work. Real pressure (production incidents, tight deadlines, debugging under constraints) I can handle. There's a problem, and the pressure is in service of solving it. Interview pressure is something else. It's examination pressure, but without a syllabus: you don't know what's on the test. Academic exams can be grueling, but you've done a thousand similar problems and you know the terrain. And even under time constraints, you're alone with the page, governing your own pace. No one is standing behind your shoulder the whole time, registering every hesitation.

At this point, I've made peace with it. These situations don't come up often, and when they land on the worse end, I don't beat myself up over it anymore. It almost feels like something that happens to someone else. Supportive interviewers can mitigate the effect, but they cannot fully remove it. Variability across days, personal context, and cognitive state remains. Some days, structured thinking is easier to surface; other days, it is noticeably harder, even when the underlying skill has not changed. And sometimes, once the first stumble happens, there's no recovering. You've seen it, on both sides of the table: someone trips, tries to regain balance, trips again, and eventually just falls. A kind of performance cascade. The interview never quite recovers from that point.

An Alternative Approach

Our software house doesn't hire often, but when we do, I'm usually involved. In the past I took a more active role; lately it's more about reviewing recordings, with consent, and giving feedback to the team afterward. I've also participated in evaluations at companies where I worked in more formal roles, but there the methodology wasn't up to me. And I'll admit something uncomfortable about those situations: I've seen candidates get rejected, sometimes by my own recommendation, not because I doubted their ability, but because I knew they wouldn't survive a second round with evaluators who didn't share this perspective. People who struggled in the interview but who I'm confident would have performed excellently once actually working.

What follows isn't about those places. It's about how we do things here.

Our default, maybe old school, is to look for reasons to give someone a chance rather than reasons to discard. What we pay attention to is the attempt: the effort, the drive, the will to get somewhere, which can shine through even when the evaluation context distorts the signal. That's not naive goodwill; the attempt has to have substance. But it stands in opposition to a common mindset: the field is crowded, we can afford to filter hard, and discarding dozens of false negatives is an acceptable cost. We don't think that way.

I understand where the preference for metrics and dashboards comes from. As a technical person, I can see the appeal: things that can be measured feel easier to handle and justify. But as a person, plain and simple, with access to intention, to the forces that drive people to do things, I also have an appreciation for the attempt in a deeper sense. The industry, I think, has largely lost that appreciation. I'm not going to shout about it. It's just something I see and can't unsee. I've watched people with fierce intent fail miserably. And I've seen the implicit logic play out: better something modest and measurable than something truly productive that resists being quantified.

I'm not here to rescue Jordan Belfort's crew (well, maybe a little). But I'll admit a certain nostalgia for that energy: a room full of people with questionable credentials and undeniable drive, getting results that nobody expected. None of this is an excuse to lower the bar or hire the clearly incapable. The point is simpler: sometimes the methodology places people below the line who don't really belong there.

Repositories and Real Work

When possible, I prefer to start from publicly available repositories. A set of well-documented repositories often provides richer signal than a compressed, artificial exercise. One repo can be enough when it's clearly the owner's work; several reveal progression over time, or breadth of delivery if they're roughly concurrent. You can see architectural evolution, documentation habits, commit structure, patterns of problem decomposition, and long-term thinking. This preference clearly reflects a bias toward sustained curiosity and externally visible work, but in practice it often yields better information about how someone actually builds software. Much of this review can happen before any contact with the candidate, which saves time on both sides and avoids the redundancy of yet another exercise when there is already plenty of material to evaluate.
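To make that concrete, here is a minimal sketch of the kind of first pass I mean, assuming nothing more than a locally cloned repository and git on the path. The repository path, the month bucketing, and the README/docs heuristic are all illustrative choices, not prescribed tooling.

```python
import subprocess
from collections import Counter
from pathlib import Path

def commit_months(repo: str) -> Counter:
    """Count commits per month via `git log` (assumes git is installed
    and `repo` points at a locally cloned working copy)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=format:%ad", "--date=format:%Y-%m"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(out.split())

def has_docs(repo: str) -> bool:
    """Crude heuristic: a README or a docs/ directory counts as documentation."""
    root = Path(repo)
    return any(root.glob("README*")) or (root / "docs").is_dir()

if __name__ == "__main__":
    repo = "./candidate-repo"  # hypothetical path to the cloned repository
    months = commit_months(repo)
    print(f"active months: {len(months)}, total commits: {sum(months.values())}")
    print(f"documentation present: {has_docs(repo)}")
```

None of these numbers decide anything on their own. They only suggest where to start reading: years of steady commits and a living docs directory invite a different conversation than a single weekend dump.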

Not all good engineers have public code, and not all professional excellence appears in open repositories. That limitation is real. Still, when repositories do exist and have substance, they are an unusually direct window into applied skill over time.

Take-Home Exercises and Timing

When repositories do not provide enough substance to evaluate technical depth, a take-home exercise becomes a reasonable alternative. It has one obvious advantage over live coding: it removes the immediate pressure of being watched. Candidates can think in their own environment, work at their own pace, consult documentation, and build a solution without someone looking over their shoulder. In that sense, a well-designed take-home exercise can resemble real engineering conditions more closely than a live session.

But take-home work has its own structural trade-offs.

First, it consumes unpaid time. For candidates already committed to demanding projects or collaborations, that time often comes at the expense of rest or personal bandwidth. This does not make the format inherently unfair, but it defines the context under which the work is produced.

Second, and more important for interpretation, is what happens later, during the discussion. Nobody schedules a deliberate gap between submission and review; it just happens. If too much time passes between submission and evaluation, your grasp of the solution naturally fades. Procedural details blur. The nuances of specific design trade-offs are no longer fresh. In the conversation, you are no longer speaking from the same reasoning state that produced the work; you are reconstructing that reasoning under evaluative pressure.

The reverse distortion also appears. An evaluation session may happen weeks after you submitted the exercise, during which time your attention has been fully absorbed by your actual job. The exercise was a one-time effort, maybe something slightly outside your usual domain. Your day-to-day is something else entirely, more demanding, more present. And now you're asked to explain decisions you made in that other mode, while your head is still half-occupied with what you were doing yesterday. Ordinary context switching implies two tasks of comparable weight. This is different: you're being pulled back to something peripheral while the real work still has a grip on you.

It also depends on how close the exercise was to your daily routine. If it's something you do all the time, you barely need to prepare. But if it required you to step into unfamiliar ground, you might have done it well at the time and still not remember the nuances a week later. Sure, you can re-study before the interview. It's not impossible. And if you're actively looking for a change, that small sacrifice is expected. But it's worth being honest: preparing while you already have a demanding job isn't the same as preparing when job hunting is your main focus, and it gets no simpler once you add family and everything else that doesn't pause just because you have an evaluation process coming up.

None of this makes the take-home format inherently invalid. In many respects, it is more aligned with real work than live coding. What it does mean is that hesitation or patchy recall during the defense of an exercise should not automatically be interpreted as shallow understanding. Temporal distance, context shift, and state-dependent recall can all make articulation harder without implying weakness in the original reasoning.

Timing and context quietly shape the signal, even when nobody intends that to happen.

What Interviews Actually Measure

Industrial-organizational psychology has shown for decades that selection methods differ in predictive validity. Work sample tests that closely resemble real tasks tend to predict job performance more reliably than abstract or improvised interviews.[6] Modern engineering work is iterative, tool-assisted, AI-augmented, and collaborative; live interviews are compressed, memory-intensive, and performance-centered. They do measure something real: fluency under evaluation. That is a real skill, and in some roles it matters a lot. But it is not the same thing as sustained competence in real production environments.

To be fair, evaluating someone under direct observation isn't inherently illegitimate. If the job itself involves performing in a kind of panopticon, then testing for that makes sense. But in practice, I've never encountered a situation remotely like it: coding or designing something in real time, watched by two people in a position of hierarchical authority over me. That's not what the work looks like. The interview format tests for a condition that the job almost never reproduces.

The danger appears when that narrow dimension is treated as a complete proxy for overall ability.

When Thinking Changes Shape

There is another dimension to this problem, less commonly discussed.

The standard interview format assumes that reasoning is verbal. That the path to a solution unfolds as a sequence of words, and that externalizing it is simply a matter of speaking what was already being thought. This assumption is not always correct.

Not everyone thinks in words. And for those who do, it is not necessarily permanent. Cognition can reorganize. What was once a stream of internal dialogue can become something else: a more direct form of processing, less mediated by language, closer to pattern recognition or structural intuition. The reasoning still happens, but it no longer takes verbal form. It emerges as something closer to spatial orientation or embodied knowledge.

This is not mystical. Cognitive science has long recognized that expert performance often involves a transition from explicit, step-by-step reasoning to implicit, pattern-based processing. The verbal scaffolding that once supported learning becomes unnecessary once the skill is internalized. In some cases, attempting to re-engage that scaffolding under pressure actually degrades performance. This is documented in research on verbal overshadowing and expertise reversal.

The problem appears when the environment still demands verbal externalization. In a technical interview, explaining your reasoning is not optional. The expectation is that thought can be serialized, narrated, made visible in real time. For someone whose cognition has reorganized around non-verbal processing, this demand creates a specific kind of friction. The answer is available, but the path to it is not stored in words. Reconstructing a verbal trace under observation becomes an act of translation rather than recall, and translation under pressure is slow, effortful, and prone to collapse.

There is an analogy in Chinese internal martial arts. In disciplines like Taijiquan, practitioners cultivate a quality called Peng: an expansive, structural force that replaces conventional muscular force, Li. Developing it requires inhibiting the instinctive tension the body defaults to when threatened. There is a transitional phase where the old pattern is deliberately suppressed, but the new capacity has not yet stabilized. The practitioner feels exposed. The reflex to contract is strong, and giving in to it provides temporary relief but blocks the integration being sought.

The parallel is not exact, but it illuminates something real. The verbal mode of thinking was the old force: protective, familiar, immediately deployable. Letting it recede made room for something more integrated. But the interview format still rewards contraction. Show your reasoning. Narrate your process. Prove you are not guessing. And in that moment, neither system responds cleanly. The old mode no longer operates automatically; the new mode does not translate on demand.

This is not a complaint, and not a call for the market to adapt. The market operates as it does. But it is worth noting, without any expectation that it will change, that a format optimized for verbal demonstration may systematically overlook capacities that do not express themselves that way. Whether that trade-off matters depends on what you think the work actually requires.

Synthesis

Three forces are converging.

First, social evaluation reduces the working memory available for the task at hand, which can impair real-time articulation even in highly capable engineers.[1][2]

Second, AI-assisted development is shifting the skill profile of experienced engineers toward supervision, specification, and review, and away from constant manual implementation. This can weaken live procedural fluency without reducing overall engineering capability.[4][5]

Third, traditional interview formats continue to emphasize exactly those skills that are most sensitive to evaluative interference and to the kind of skill displacement created by automation and changing daily practice. The result is predictable: engineers who perform strongly in sustained production contexts may underperform in interviews. Call it what it is: context sensitivity colliding with an outdated evaluation format, not a lack of seniority.

I'm not here to change how the industry hires. That's not the point, and clearly not my place. Industry practices spread through imitation anyway: someone says the big ones do it this way, smaller teams copy without much verification, and trends form around what large companies supposedly do. Then everyone else replicates, or thinks they replicate, in contexts where something else would work better. I just wanted to articulate something I kept running into, on both ends of the process, and check whether any of it held up. It did. That's all.

References

  1. Beilock, S. L., & Carr, T. H. (2005). When high-powered people fail: Working memory and "choking under pressure." Psychological Science, 16(2), 101-105.
  2. Schmader, T., Johns, M., & Forbes, C. (2008). An integrated process model of stereotype threat effects on performance. Psychological Review, 115(2), 336-356.
  3. Baumeister, R. F. (1984). Choking under pressure: Self-consciousness and paradoxical effects of incentives. Journal of Personality and Social Psychology, 46(3), 610-620.
  4. Endsley, M. R., & Kiris, E. O. (1995). The out-of-the-loop performance problem and level of control in automation. Human Factors, 37(2), 381-394.
  5. Arthur, W., Bennett, W., Stanush, P., & McNelly, T. (1998). Factors that influence skill decay and retention. Human Performance, 11(1), 57-101.
  6. Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology. Psychological Bulletin, 124(2), 262-274.