Michael Rizvi-Martel

Agentic Coding: A New Abstraction Layer in the Programming Stack

Or How I Learned to Stop Worrying About AGI and Love Agents

February 2026

TLDR: I give some thoughts on how AI has reshaped my coding setup in the years since ChatGPT's release. I argue that agents can be thought of as a new layer of abstraction in the programming stack, and speculate (with some uncertainty) that future software developers will "code" requirements in natural language. (I am not sure what form this will take, or what the "IDEs of the future" will look like.) I close with a historical look at the mistrust that greeted earlier layers of abstraction in the stack, arguing that the resistance to coding agents resembles that of other major paradigm shifts in programming.

Agents and Me

Agents have taken the world by storm. I think it is fair to say that no one in the programming world has been left indifferent by their advent. Although these systems have not made their way into every workplace and onto every developer's computer yet, they have their fair share of early adopters. Across conversations with AI researcher and developer friends, I have seen a real range of opinions. Some PhD colleagues flat out refuse to use them out of mistrust: they feel that understanding the internals of these models makes them less trusting, like aviation engineers afraid to board a plane because they know every possible thing that could go wrong. Others, like myself at this point, have leaned fully into coding agents, for better or for worse. I have been using Claude Code with the --dangerously-skip-permissions flag active all the time (meaning it no longer asks me for permission to run commands) and have fully bought into the "skills" and CLAUDE.md development philosophy. Customizing my Claude Code setup reminds me of how I would customize my IDE back when I started grad school.

To be quite honest, agentic coding has made me love building again. The barrier to entry for forking a codebase or starting a new one from scratch feels much lower. It is hard to say how much of a productivity gain coding with agents has brought me, but I am certain it has saved me a lot of time otherwise lost to "blank page syndrome": wondering how to structure a codebase, or simply procrastinating for lack of motivation. Agentic systems might not always be right (though, being honest, they are right a lot of the time), but starting from a first draft, or being spared the pain of reading through someone else's code to start a new project, is definitely freeing for me. Going back to this IDE vs. CLAUDE.md comparison, I noticed more broadly that, throughout my budding academic career, I can chart my software development paradigm shifts into four broad phases:

  1. Coding + Stack Overflow: from undergrad to early ChatGPT days, I would code manually and go to Stack Overflow to find functions or one-liners to solve specific problems. I would typically organize my code (e.g., a main or a complicated object/function) by first writing the headers and writing comments in natural language explaining what each subpart of the code would do. Then I would either code each "comment" or go look for a Stack Overflow post or such to fill it in.
  2. Coding + ChatGPT: The first "AI revolution" for me was the shift from Stack Overflow to ChatGPT. When ChatGPT got good enough at writing Python code, I started using it the same way I would use Stack Overflow. The rest of the development remained the same. Through this phase, I started using ChatGPT to generate more and more ambitious bits of code, giving it more and more freedom (from one or two lines of code to writing full functions/methods for an object).
  3. Coding + Copilot: At first this was similar to my ChatGPT use: writing comments and having Copilot autocomplete the missing bits according to the spec. The magic here was that I could write the comments and have Copilot directly add the lines of code in the IDE! At this phase I started dabbling with agents in the Copilot setup. I was not convinced, though; at that point Copilot's agentic mode moved too fast for me and made a fair number of mistakes. I felt that I needed to give more supervision to the AI I was letting code.
  4. Agents (Claude Code in my case): I finally decided to let the barbarians in during the summer of 2025. A friend of mine at ACL said "You have to try Claude Code, man, it's really good" in response to my skepticism concerning agents. I did, and I have not looked back. Depending on the nature of the project, I sometimes don't even look at the code anymore. The more I use it, the more I realize how good it is. I am not the best programmer, but I am not a terrible one either. In any case, Claude is definitely a better programmer than I am.

Agents as a New Layer of Abstraction

Going through these shifts in my software development philosophy got me thinking about the nature of software development. Which is, namely, about developing software. As a black box, it is something like "requirements in → software out." Throughout the history of programming, we have built new layers of abstraction to make this pipeline easier. At the current point of my ridiculous vibe coding follies, I think we are on the cusp of a new one, except this time the layer of abstraction is natural language itself. Andrej Karpathy tweeted a couple of years back that "The hottest new programming language is English." This could not be truer today. I code in English now. Claude handles the implementation. But this does not mean I code mindlessly. Through all the tokens I have burned, I noticed a principle which I dub "information in, information out": I can abstract away the implementation (how a tensor operation is done, which library call to use), but I have to be precise about design decisions. What loss function? What positional encodings? Batch-first or not? When I leave these unspecified, the model gets them wrong; and how could it not? Especially in research, there is often no objectively right answer to these questions.

To make this concrete, here is a rough example from my actual workflow. Say I want to train a small transformer on a sequence copying task (the kind of toy task you would use to study length generalization). If I just tell Claude Code "write me a transformer that learns to copy sequences," I will get something — but it will make a bunch of design choices that I probably disagree with. It might use learned positional encodings when I wanted NoPE or RoPE. It might set up the causal mask wrong for an encoder-decoder vs. a decoder-only setup. It might default to cross-entropy when the task structure calls for something else. The code will run, but it won't be my experiment. What I actually need to specify is closer to: "decoder-only transformer, RoPE embeddings, 4 layers, 256 hidden dim, trained on sequence copying where input is [tokens] [SEP] and target is the same tokens again, use teacher forcing, cross-entropy loss only on the target portion of the sequence, train lengths 1–20 and evaluate on lengths 21–40." Once I give it that, it gets it right almost every time. The implementation details — how to build the RoPE rotation matrices, how to set up the data loader, how to mask the loss — those it handles fine on its own. The point is: the things I still need to specify are design-level decisions, not implementation-level ones. The abstraction eats the implementation but not the design. Circling back to my pre-ChatGPT workflow in undergrad, I don't need to fill in the code after the comments; the comments are the program.
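To show what "the comments are the program" can look like on the data side, here is a minimal sketch of the copy-task spec pinned down as explicit code. Everything here is a hypothetical illustration (the names `CopyTaskSpec` and `make_copy_example` are mine, not from any real codebase); the architecture itself stays abstracted away, but the design decisions (separator, loss masked to the target portion, train/eval length split) are made unambiguous:

```python
import random
from dataclasses import dataclass

@dataclass
class CopyTaskSpec:
    # Design-level decisions from the prompt, made explicit.
    n_layers: int = 4
    hidden_dim: int = 256
    pos_encoding: str = "rope"    # vs. "learned" or "nope": a design choice
    train_lengths: tuple = (1, 20)
    eval_lengths: tuple = (21, 40)
    sep_token: int = 0            # reserved separator id
    vocab_size: int = 10          # content tokens are 1..vocab_size

def make_copy_example(spec: CopyTaskSpec, length: int, rng: random.Random):
    """Build one decoder-only training sequence and its loss mask.

    The full sequence is [tokens] [SEP] [tokens]; the mask is 1 only on
    the second copy, so cross-entropy is computed on the target portion.
    """
    tokens = [rng.randint(1, spec.vocab_size) for _ in range(length)]
    seq = tokens + [spec.sep_token] + tokens
    loss_mask = [0] * (length + 1) + [1] * length
    return seq, loss_mask

rng = random.Random(0)
seq, mask = make_copy_example(CopyTaskSpec(), 5, rng)
assert len(seq) == len(mask) == 11
assert seq[:5] == seq[6:]    # the part after SEP copies the prefix
assert sum(mask) == 5        # loss only on the target portion
```

Handing the agent something this precise (or the natural-language equivalent of it) is what makes the difference between "it runs" and "it's my experiment."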

There is, however, one fundamental difference between this abstraction layer and every one that came before it: it is inherently stochastic. When a C compiler compiles your code, it produces the same assembly every time for the same input. When the JVM runs your bytecode, the behavior is deterministic. When you write a Python script, it does the same thing on every run (modulo random seeds and such). Agents don't work this way. Ask Claude Code the same thing twice and you might get different implementations. Sometimes meaningfully different. This is, I think, the core reason the trust problem with agents feels different from the trust problem with compilers or garbage collectors. With a compiler, once you've verified it works, you can trust it on new inputs because the mapping from source to output is fixed. With an agent, every invocation is in some sense a new roll of the dice.

That being said, no two human programmers would implement exactly the same program, even given the same spec. In the case of humans, as long as the code is efficient and passes the required tests, we typically don't have a problem with this. This leads me to believe that the future of coding is going to be much more test-driven. In my own experience, as long as I scrutinize the outputs of Claude's code closely enough, the agent eventually finds and fixes the bug. For research, this means looking at plots or data and following up with Claude: "Isn't it weird that the model is training so slowly?" "How come metric X is so low when metric Y is quite high?" For frontend development, where we can literally see the results, I figure this output-driven development is also feasible. For production-level backend code, where bugs can be silent and the stakes are higher, adoption will likely lag. But if the history of programming is any guide, that is exactly how every new abstraction starts.
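As a sketch of what test-driven agent work could look like for the copy task above: instead of reading the agent's code, I would hand it interface-level tests that encode the spec and iterate until they pass. Everything here is hypothetical; `copy_model` is a trivially correct stand-in (it just echoes its input) so the sketch runs, and in practice it would be whatever the agent builds:

```python
def copy_model(tokens, sep_token=0):
    """Stand-in for the agent's trained model. A correct copy model,
    by definition, returns its input tokens unchanged; this placeholder
    does exactly that so the tests below can run."""
    return list(tokens)

def test_copies_in_distribution():
    # lengths the model would see during training (1-20)
    for tokens in ([1], [3, 1, 4], list(range(1, 21))):
        assert copy_model(tokens) == tokens

def test_length_generalization():
    # the actual research question: lengths 21-40 never seen in training
    long_seq = list(range(1, 31))
    assert copy_model(long_seq) == long_seq

test_copies_in_distribution()
test_length_generalization()
```

The point is that the tests pin down behavior, not implementation: any of the agent's many possible programs is acceptable as long as it sits behind this interface and passes.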

A Historical Outlook

As is the case with most paradigm shifts, there is resistance to change. I have seen this myself (see intro), and I figure most of this mistrust is not unfounded. An interesting way to see that we are in a paradigm shift is to look back, which I will try to do in this section. Previous major shifts in programming followed a similar resistance-to-change timeline. When compilers were introduced, for instance, many senior programmers refused to use them, insisting that they could optimize instructions better than the machine. They weren't totally wrong: early compilers did produce worse code than a skilled assembly programmer could write by hand. John Backus, the creator of Fortran, described the programmers of the 1950s as "members of a priesthood guarding skills and mysteries far too complex for ordinary people" who met the idea of compilers with "considerable hostility and derision." Even von Neumann himself, the father of the stored-program computer, reportedly dismissed the idea: "Why would you want more than machine language?" Grace Hopper built the first compiler (A-0) in 1952, and nobody would touch it for three years. She was told computers could only do arithmetic. She had a working compiler, and people still didn't believe it. In a way, there is something deeply human about this. It was not only a matter of trust ("can the machine really do what I do?") but also of identity ("if the machine can do what I do, what am I?"). There is a famous Usenet story from 1983 called "The Story of Mel," about a programmer who refused to use even an optimizing assembler because, as he put it, "you never know where it's going to put things." The thing is, Mel's hand-tuned code did run faster. The resistance was not irrational; it was empirically grounded, at the time. But by 1958, over half of all code running on IBM machines was already being generated by Fortran. Today, compiling by hand would be unthinkable. Certain software engineering programs have even stopped teaching assembly altogether!

The compiler story is probably the most directly analogous to agents, but the pattern repeats at every rung of the abstraction ladder. Object-oriented programming in the '80s and '90s is another good example. The core idea of OOP was to bundle data and the functions that operate on it into "objects," organized into inheritance hierarchies. The promise was that this would make code more modular and reusable. The criticism, broadly, was that this bundling created more problems than it solved. Inheritance hierarchies tend to get deep and tangled: you change something in a parent class and break five child classes you forgot about. State gets hidden inside objects, making it hard to reason about what a program is actually doing. And the reusability promise often didn't pan out in practice. Joe Armstrong, the creator of Erlang, put it memorably in an interview for Coders at Work (2009): "You wanted a banana but what you got was a gorilla holding the banana and the entire jungle." Armstrong also wrote an essay called "Why OO Sucks," in which he noted that at the time "criticising OOP was rather like swearing in church"; there was real social pressure to go along with the paradigm. Alexander Stepanov, the creator of C++'s Standard Template Library, called OOP "almost as much of a hoax as Artificial Intelligence." His argument was that you should start with algorithms, not with class hierarchies, and that OOP gets the decomposition backwards. Torvalds called C++ "a horrible language" and deliberately wrote Git in C, in part to keep C++ programmers away from the project. Dijkstra quipped that "object-oriented programming is an exceptionally bad idea which could only have originated in California." But OOP won anyway. Not because the critics were wrong about the problems (inheritance hierarchies are a mess), but because the abstraction made it easier for large teams to build large systems.

Looking at all of these transitions together, there is a pattern that I think is hard to ignore. First: "it can't be done" (compilers can't match hand-coded assembly). Then: "it's too slow" (Java is 20x slower, GC pauses are unacceptable). Then: "real programmers don't use it" (real programmers use Fortran, real programmers manage their own memory). Then, after the new abstraction wins anyway: "they ruined it" (see: Agile Manifesto signatories writing essays titled "Agile is Dead" a decade later). I figure we are somewhere between stages two and three with coding agents right now. The objections are partially valid (agents do hallucinate, they do sometimes write subtly wrong code), just as early compilers did produce worse assembly and early Java was genuinely slow. But if history is any indication, the trajectory seems fairly clear. In a sense, every layer of abstraction we ever built did the same thing: we distilled what we understood about programming, taking repetitive tasks and building software to do them for us. LLMs are no different. By training on large amounts of text and code, they compressed a lot of boilerplate and knowledge into a new abstraction layer. The insights might be fuzzier and probabilistic, but at the end of the day coding is creative, and when humans code there is some amount of randomness too. Throughout my coding journey, I have been told many times by programmer colleagues and friends that there is a "right way" to code. They all had a different conception of what that meant. In any case, if you are looking, you can find me vibe coding with my buddy Claude.