How to create loops for AI Research
Learnings from Schulman, Nielsen, Karpathy, Neel Nanda and Vivek.
Research is hard. You get a desk, a problem, and a vague mandate to produce something novel. So most people reverse-engineer the job from what they can see: papers, threads, announcements. What they learn is how to look like a researcher.
The actual skill is more interesting, and the people who’ve tried hardest to describe it end up circling the same handful of ideas, even though they work on completely different things.
Pick problems that generalize
Schulman splits research into two modes. In one, you read the literature and hunt for things to improve. In the other, you choose an outcome you want to exist and reason backwards to the experiments.
He argues for the second, and the quiet reason is that it manufactures originality. A goal you care about drags you into territory no survey paper covers. It gives you a perspective the community doesn’t share, which means you’ll ask questions nobody else is asking.
Vivek names the failure mode sharply. An absorbed problem is one you picked up from a famous lab or a trending thread without owning the reasoning. You hold the conclusion but not the logic behind it. You know some big lab cares about a direction, but not why, not what they expect to find, not what would make them drop it. When they pivot, you find out a year later.
There’s one constraint worth underlining. Whatever you build toward your goal should work beyond that goal. Schulman’s example: during his work on robotic locomotion, he avoided domain-specific hacks in the algorithm itself. The point wasn’t just to make a robot walk. It was to develop methods that would transfer. TRPO and PPO came out of that restraint. If he’d let himself get narrow, those algorithms probably don’t exist.
He is also blunt about where the skill actually lies:
Ideas are cheap, and there are lots of them in the air. Your skill comes in when you decide which one to work on, and how well you execute on it. — Schulman
And a practical test for when incremental work is worth it: a method that gives a 10% improvement better be 2 lines of code. A 50% improvement can justify 10 lines. The usefulness of a small advance depends entirely on its simplicity. If nobody will bother using it, including you, it wasn’t worth the paper.
Aim higher than feels reasonable
Karpathy points out a psychological bug worth internalizing:
A 10x more important problem intuitively feels 10x harder to achieve. This is a fallacy — in my experience a 10x more important problem is at most 2-3x harder to achieve. In fact, in some cases a 10x harder problem may be easier to achieve. — Karpathy
The reason is that thinking 10x forces you to confront the real limitations of your current approach. You can’t get there with incremental improvements. You have to reason from first principles and change strategy entirely.
Nielsen makes a related point: difficulty is a bad proxy for importance. People consistently overrate the value of hard problems and underrate what a piece of work enables. The connections it reveals. The questions it opens. The tools it creates. These matter more than how many hours the proof took.
But importance alone isn’t enough. Hamming’s filter (cited by both Schulman and Karpathy) is that you also need a reasonable attack. Time travel would win a Nobel Prize, but nobody works on it because there’s no path forward. The problems to work on are the ones that are important and where you can also see a way in.
Karpathy adds a test:
The ideal scenario is that by the end of the PhD you own some part of an important area, preferably one that is also easy and fast to describe. You want people to say things like “she’s the person who did X.” — Karpathy
If you can fill in that blank with something crisp, you’re on the right track. If you can’t, rethink.
Taste is a muscle
Taste is your ability to judge which problems matter, which experiments to run, which anomalies to chase. Schulman says it’s more important than raw technical skill. Karpathy describes watching his adviser Fei-Fei Li pick up four papers he’d spent hours reviewing, flip through each for ten seconds, and correctly sort the accepts from the rejects. That’s not magic. That’s a trained pattern-matching ability built over years.
The interesting claim, the one Vivek builds his whole piece around, is that taste behaves more like a muscle than a gift. You can train it.
His method is simple. Predict the result of every experiment before you run it. Cover a paper’s results section and guess the numbers from the method alone. Write down which of this month’s releases will matter in two years, then check your hit rate later.
A forecast plus a correction, repeated a few hundred times, is how every good model gets trained. Including the one in your head. — Vivek
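One way to make this loop concrete is a small prediction ledger. The sketch below is an illustration, not something taken from Vivek’s piece: the names `Prediction`, `hit_rate`, and `brier_score` are hypothetical, the ledger entries are invented, and scoring with a Brier score is my assumption about a reasonable way to grade forecasts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    claim: str                      # what you expect to happen
    confidence: float               # probability you assign to the claim, in [0, 1]
    outcome: Optional[bool] = None  # filled in once the experiment has run


def resolved(ledger: List[Prediction]) -> List[Prediction]:
    """Keep only predictions whose outcome is known."""
    return [p for p in ledger if p.outcome is not None]


def hit_rate(ledger: List[Prediction]) -> float:
    """Fraction of resolved predictions where the side you favored came true."""
    done = resolved(ledger)
    if not done:
        raise ValueError("no resolved predictions yet")
    hits = sum((p.confidence >= 0.5) == p.outcome for p in done)
    return hits / len(done)


def brier_score(ledger: List[Prediction]) -> float:
    """Mean squared gap between confidence and outcome; lower is better."""
    done = resolved(ledger)
    if not done:
        raise ValueError("no resolved predictions yet")
    return sum((p.confidence - float(p.outcome)) ** 2 for p in done) / len(done)


# Invented example entries, for illustration only
ledger = [
    Prediction("Doubling batch size leaves final loss unchanged", 0.8, outcome=True),
    Prediction("The new regularizer beats the tuned baseline", 0.7, outcome=False),
    Prediction("Removing warmup makes training diverge", 0.4, outcome=False),
    Prediction("This month's release still matters in two years", 0.6),  # unresolved
]

print(f"hit rate: {hit_rate(ledger):.2f}")
print(f"Brier score: {brier_score(ledger):.3f}")
```

The hit rate tells you how often you leaned the right way. The Brier score also punishes being confidently wrong, which is the correction signal the loop is after.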
Nanda adds a social version of the same loop. Before you ask your mentor a question (“Should I run experiment A or B?”, “Is this result interesting?”), predict their answer. When they disagree, that gap is where the learning lives. Paraphrase their reasoning back until you find the misunderstanding. He calls it the single best accelerator for developing taste.
The outer loop
Before you can train your taste, you have to understand what you’re actually doing all day. Karpathy has the best frame for this.
A PhD is simultaneously a fun and frustrating experience because you’re constantly operating on a meta problem level. You’re not just solving problems — that’s merely the simple inner loop. You spend most of your time on the outer loop, figuring out what problems are worth solving. — Karpathy
This is what makes research feel maddening compared to engineering. You’re working long hours and you’re not even sure you’re working on the right thing. You keep imagining yourself solving hypothetical problems and asking: where does that put me? Would anyone care?
That discomfort is the job. Most of the skill is in the outer loop.
Solvers and creators
Nielsen draws a distinction I haven’t seen made this cleanly anywhere else.
Problem-solvers work on well-posed technical challenges. They’re valued for the difficulty of what they crack. Problem-creators ask new questions, frame old problems in new ways, or reveal connections nobody had noticed. Their papers are sometimes technically simple, but they open entirely new territory.
Problem-solvers get more social reward. It’s easier to celebrate someone who solved a famously hard problem. But problem-creators often have larger long-term impact. The scanning tunneling microscope used an idea that had been around for years. Nobody had tried to build it. The inventors put it together on a shoestring and created one of the major tools of modern physics.
Nielsen’s advice for developing as a problem-creator: look for the messes. Areas where everything is confusing, where textbooks are hard to follow, where working through the literature feels needlessly painful. That confusion usually means the right foundational concepts haven’t been found yet. Most people see confusion and shy away. The few who lean in often find deep simplifying ideas waiting underneath.
The notebook
Schulman keeps a daily research notebook. Ideas, experiments, plots, results, all in one place. Every week or two he reviews the entries and condenses them: experimental findings, insights, code progress, next steps. Then he checks last week’s review to see if he followed through.
It sounds like overhead. It’s not. The notebook captures ideas before they evaporate. It creates a record so you can retrieve hyperparameter decisions and dead ends months later instead of re-running experiments you’ve already done. And it forces you to see where your time actually went.
Nielsen tells a story about a colleague who tracked his actual research time with a stopwatch. After factoring in email, web browsing, meetings, interruptions, and chatting, the number came back: thirty minutes of real research per day. Nielsen suspects this is typical.
Vivek cites Darwin, who took note-keeping further and made it procedural.
Any fact that cut against his theory got written down on the spot, because he’d caught his own memory deleting inconvenient evidence faster than the convenient kind. — Vivek, paraphrasing Darwin
Your memory does the same thing to your failed runs. It smooths them over. A structured log (hypothesis, setup, expectation, result, updated belief) protects you from the version of yourself that wants to forget what didn’t work.
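The five fields of that structured log map naturally onto a record type. Here is a minimal sketch, assuming an append-only JSON Lines file; the `LogEntry` class, the helper names, and the file name are hypothetical choices of mine, not a format any of the authors prescribe.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from datetime import date
from pathlib import Path
from typing import List


@dataclass
class LogEntry:
    day: str
    hypothesis: str
    setup: str
    expectation: str
    result: str
    updated_belief: str


def append_entry(path: Path, entry: LogEntry) -> None:
    """Append one entry as a JSON line, so earlier entries are never rewritten."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def load_entries(path: Path) -> List[LogEntry]:
    """Read the whole log back, for example during a weekly review."""
    with path.open(encoding="utf-8") as f:
        return [LogEntry(**json.loads(line)) for line in f if line.strip()]


# Demo in a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "research_log.jsonl"  # hypothetical file name
    append_entry(log_path, LogEntry(
        day=date.today().isoformat(),
        hypothesis="Warmup is unnecessary at this model size",
        setup="Same config as baseline, warmup_steps=0",
        expectation="Final loss within noise of baseline",
        result="Diverged after a few hundred steps",
        updated_belief="Warmup matters here; keep it",
    ))
    for entry in load_entries(log_path):
        print(entry.day, "|", entry.hypothesis, "->", entry.updated_belief)
```

Writing the expectation before the result is the part that does the work: it keeps the failed runs on the record in the same shape as the successful ones.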
Then put some of it in public. A lot of people working in interpretability today found the field through blog posts, not conference papers. Karpathy makes a similar case for his own side projects: his blog, his ImageNet human-accuracy experiments, and teaching CS231n. These probably cost him on standard metrics. They moved the field. He’d do it the same way again.
Read old things
Vivek makes a point about information homogeneity that I keep coming back to. Shared reading lists produce shared ideas. If your diet is the trending page of arXiv plus whatever survives the group chat, you’ll reach the same conclusions as everyone else, at the same time, and those conclusions are worth approximately nothing.
Old material is criminally underpriced. — Vivek
ML reruns its own past on a delay. Mixture of experts dates to 1991 and LSTMs to 1997, and backprop went mainstream in 1986. Sutton’s Bitter Lesson is about a thousand words long and predicts the shape of the field better than surveys ten times its length.
Shannon gave a talk on creative thinking in 1952. His opening move was to shrink a problem until it’s nearly trivial, solve the small version, then reintroduce the difficulty one piece at a time. That single trick will carry you through more walls than any modern productivity thread.
Schulman’s version: most students stop reading textbooks after their courses end. He thinks this is a mistake. A textbook collects decades of ideas in proper order with consistent notation. A conference paper gives you one idea plus a background section too short to learn from. PhD theses are even better — read the intro and conclusion of theses by researchers whose work interests you. Those sections contain a unifying view of the field, written by an expert, that you won’t find anywhere else.
One more thing. Read the paper itself, not the thread. The appendix is where the bodies are buried.
Speed
The stories about Alec Radford rarely involve a single flash of insight. They involve volume. More runs per day, more wrong ideas discarded per week, a model of reality that updated faster than anyone else’s.
Research speed is mostly the speed at which you discover you’re wrong. — Vivek
This makes tooling a first-class activity, not a distraction. Launching a run should be one command. Plotting it should be one more. Every experiment should be reproducible from its config, and comparing two runs should take seconds, not an afternoon of archaeology.
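As a rough illustration of what “reproducible from its config” and “comparing two runs in seconds” can look like, here is a minimal sketch. It assumes each run is described by a flat, JSON-serializable dictionary; `run_id`, `diff_configs`, and the config keys are hypothetical names, not part of any particular experiment tracker.

```python
import hashlib
import json
from typing import Any, Dict, Tuple


def run_id(config: Dict[str, Any]) -> str:
    """Derive a short, stable ID from the config so identical configs share an ID."""
    canonical = json.dumps(config, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


def diff_configs(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs."""
    missing = "<missing>"
    changed = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, missing), b.get(key, missing)
        if left != right:
            changed[key] = (left, right)
    return changed


# Hypothetical configs for two runs
baseline = {"lr": 3e-4, "batch_size": 64, "seed": 0, "warmup_steps": 500}
variant = {**baseline, "lr": 1e-3}

print(run_id(baseline), run_id(variant))
print(diff_configs(baseline, variant))
```

Naming output directories after `run_id(config)` is one simple way to make sure a result can always be traced back to the exact settings that produced it. Nested configs would need a recursive diff, which this sketch leaves out.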
Karpathy’s recipe for training neural networks includes a step that pays for itself a hundred times over: overfit a single batch before training at scale. Thirty seconds, half your bugs, gone. Shrink everything until it’s cheap, get it right, then spend the compute.
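The single-batch check can be sketched without any framework. The toy below is my own stand-in, not code from Karpathy’s recipe: it uses a linear model with hand-written gradient descent in plain Python, and a four-example batch whose targets are exactly linear, so a correct pipeline should be able to drive the loss to nearly zero. With a real network you would run the same check on one real batch.

```python
from typing import List, Tuple

Batch = List[Tuple[List[float], float]]


def predict(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w: List[float], b: float, batch: Batch) -> float:
    return sum((predict(w, b, x) - y) ** 2 for x, y in batch) / len(batch)


def overfit_one_batch(batch: Batch, steps: int = 500, lr: float = 0.1) -> float:
    """Run plain gradient descent on one fixed batch and return the final loss."""
    n_features = len(batch[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in batch:
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi / len(batch)
            grad_b += 2 * err / len(batch)
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return mse(w, b, batch)


# Toy batch: targets follow y = 2*x0 - 3*x1 + 1 exactly, so zero loss is reachable
inputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
batch = [(x, 2 * x[0] - 3 * x[1] + 1) for x in inputs]

final_loss = overfit_one_batch(batch)
# If this fails, suspect the data pipeline, the loss, or the update step
assert final_loss < 1e-6, f"could not overfit one batch (loss={final_loss:.4g})"
print(f"final loss on the single batch: {final_loss:.2e}")
```

The assertion is the point of the exercise: a model that cannot memorize a handful of examples has a bug somewhere, and it is far cheaper to learn that here than after a full-scale run.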
And retire the idea that engineering is the junior partner. At the frontier the two have fused. The researcher who can build the harness, the eval, and the data pipeline is the one whose hypotheses actually get tested. Everyone else is waiting in a queue.
Stare at the outputs
A descending loss curve is not analysis. It’s reassurance. Your experiments throw off far more information than you consume. Transcripts, failure cases, the strange tail of the distribution. Most of it dies unread in a logs folder.
Karpathy starts before any training code gets written, spending hours looking at raw data by hand. Most ML bugs live in the data and fail silently. Nothing crashes. You just get a mediocre model and a wrong theory about why.
Andrew Ng’s move is unglamorous and it beats everything. Pull a hundred failures. Read all of them. Sort them into piles. Attack the biggest pile. A benchmark you’ve never read transcripts from is a benchmark you don’t understand.
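The sorting step is manual, but the tally is easy to automate. A minimal sketch, assuming you have read each failure and written down a pile label by hand; the example IDs, the pile names, and the `rank_piles` helper are all invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def rank_piles(labeled_failures: List[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    """Return (pile, count, share of all failures), largest pile first."""
    counts = Counter(pile for _, pile in labeled_failures)
    total = sum(counts.values())
    return [(pile, n, n / total) for pile, n in counts.most_common()]


# Hypothetical (example id, hand-assigned pile) pairs written while reading transcripts
labeled_failures = [
    ("ex-003", "truncated input"),
    ("ex-011", "wrong gold label"),
    ("ex-017", "truncated input"),
    ("ex-024", "ambiguous question"),
    ("ex-031", "truncated input"),
    ("ex-040", "wrong gold label"),
    ("ex-052", "truncated input"),
    ("ex-066", "output formatting"),
]

for pile, count, share in rank_piles(labeled_failures):
    print(f"{pile:20s} {count:3d}  {share:.0%}")
```

The top row is the pile to attack first. The labels only mean something if you actually read the examples; the code just keeps the count honest.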
Vivek adds the ablation rule. Keep removing components until you know which one actually carries the result.
It’s usually one, and it’s usually not the one in the title. — Vivek
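A leave-one-out ablation is the simplest version of this rule. In the sketch below, `toy_evaluate` and its numbers are made up to stand in for a real train-and-evaluate run, and the component names are hypothetical; only the loop structure is the point.

```python
from typing import Callable, FrozenSet, List, Tuple


def leave_one_out(
    components: List[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> List[Tuple[str, float]]:
    """Score drop from removing each component alone, biggest drop first."""
    everything = frozenset(components)
    full_score = evaluate(everything)
    drops = [(c, full_score - evaluate(everything - {c})) for c in components]
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Made-up numbers standing in for real training-and-evaluation runs
BASE_SCORE = 0.60
TOY_GAINS = {"data_cleaning": 0.12, "novel_attention": 0.01, "longer_schedule": 0.03}


def toy_evaluate(enabled: FrozenSet[str]) -> float:
    """Hypothetical evaluator: each enabled component adds a fixed gain."""
    return BASE_SCORE + sum(TOY_GAINS[c] for c in enabled)


for component, drop in leave_one_out(list(TOY_GAINS), toy_evaluate):
    print(f"removing {component:16s} costs {drop:.2f}")
```

In this invented example the component that would headline the paper contributes the least. Real components interact, so a leave-one-out pass is a starting point; removing pairs, or adding components back one at a time from the bare baseline, covers cases this loop misses.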
Tune your baselines until it hurts, because the graveyard of ML is full of gains that evaporated against a properly tuned baseline. A reviewer is the worst person to learn that from.
On papers
Karpathy’s core advice on writing: a paper sells a single thing that was not obvious before. Not a collection of experiments you happened to run. The entire paper is organized around this one contribution with surgical precision.
He learned this the hard way. In an early video classification paper, he packed in two contributions. The second was minor and diluted the paper. Nobody cared about it. It was distracting.
One of his best practical tips: review bad papers, not just good ones. Reading only good papers is like training a classifier on positive examples only. Reviewing rejected work teaches you what to avoid, and at a 25% acceptance rate, most of what you review will be bad. You’ll read through a terrible paper and notice how unclear it is, how vague the intro is, how it dives into details too fast, and you’ll learn to dodge the same mistakes in your own writing.
Stay focused, wander on purpose
All five authors agree: switching problems too often is a more common failure mode than staying too long.
Schulman’s test: look back at months of work. You should see lots of small dead ends, but the majority of time should have gone to projects that yielded a deliverable. If a big chunk went to half-finished projects abandoned for the next shiny idea, you need more follow-through.
One practical strategy he suggests: allocate a fixed exploration budget, say one day per week, for divergent ideas. Epsilon-greedy exploration. The rest stays focused on your main line.
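For readers who haven’t met the term, epsilon-greedy is a rule from reinforcement learning: with probability epsilon pick an option at random, otherwise pick the one you currently rate highest. The sketch below shows the textbook rule with epsilon set to one day in five. The project names and value estimates are invented, and a fixed weekly exploration day is a deterministic cousin of this randomized version.

```python
import random
from typing import Dict


def pick_project(estimated_value: Dict[str, float], epsilon: float, rng: random.Random) -> str:
    """Explore uniformly at random with probability epsilon, else exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(sorted(estimated_value))
    return max(estimated_value, key=estimated_value.get)


rng = random.Random(0)
# Invented estimates of how promising each line of work currently looks
estimates = {"main line": 0.9, "side idea A": 0.3, "side idea B": 0.2}

days = [pick_project(estimates, epsilon=1 / 5, rng=rng) for _ in range(1000)]
main_share = days.count("main line") / len(days)
print(f"share of days on the main line: {main_share:.2f}")
```

The budget is what matters: exploration is capped in advance, so a shiny new idea can claim a day without quietly taking over the month.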
But early on, you need to wander with intent.
Your first subfield is an accident of timing, so treat it like one. Somewhere in this field is a corner where your specific weirdness is an unfair advantage, and the only way to locate it is to pay tuition in several places. Nobody waives the tuition. — Vivek
Karpathy’s PhD illustrates this working. He meandered for about two years — 3D things, video things, nothing clicking. Then one Saturday at 2am he stopped by Richard Socher’s office, they talked about images and language, and everything fell into place. Fertile ground. Easy to explain. At the boundary of what was possible. Datasets just becoming available. A good fit for his adviser’s interests. He went deep from there and it became his thesis.
Breadth is also insurance. Subfields saturate, all of them, usually right after they peak on Twitter. The people who keep producing through those transitions already know the neighboring territory.
Two tricks for hard problems
Nielsen describes two psychological strategies for taking on problems that feel too big. They solve the same underlying issue: the intimidation of working on something for years with no guarantee of a result.
The first is Kolmogorov’s approach. Don’t invest everything in a direct attack. Embed the problem in a larger effort. Announce a seminar series on related material. Write lecture notes that might become a book. This way you produce something valuable no matter what happens, and you lower the psychological stakes. If you solve the problem, great. If you don’t, you’ve still contributed something real. The process was a success either way.
The second is Feynman’s approach. Convince yourself you have some kooky insight into the problem that nobody else has. Feynman admitted this belief was usually wrong. But he found it gave him the forward momentum needed to make a real dent. Fool yourself into thinking you have the inside track.
Kolmogorov removes the downside. Feynman inflates the upside. Together they get you moving.
Generosity compounds
Hamming noticed that colleagues with closed office doors got more done in any given year, but colleagues with open doors did the work that mattered. The interruptions carried information about what the world actually needed.
Generosity compounds in research like nothing else. Replicate a result and publish what you find. Release the tool you built for yourself. Explain something hard in plain language. The returns arrive sideways, months later, as the collaboration, the reference, or the role you couldn’t have applied for. — Vivek
Float your half-formed ideas in public, because being wrong on the timeline is far cheaper than being wrong in print. And the collaborator who tells you an idea is bad before you sink three months into it is worth more than compute. That relationship can’t be bought, only earned.
Karpathy makes the same point about conferences. They matter not for the talks, which are usually old news, but for the hallway. There’s a shared understanding in every field that never gets serialized into papers. Becoming part of the community gets you direct access to it.
The long game
Vivek ends his piece with a line I keep thinking about.
Knowledge and productivity compound like interest. The daily edges look trivial in isolation. What you read, what you record, how fast your loop runs, who you argue with. Give them a few years and they produce careers that look like luck from the outside.
Nielsen says the same thing using Aristotle: we are what we repeatedly do. Excellence, then, is not an act but a habit.
References
John Schulman, “An Opinionated Guide to ML Research” (2020). http://joschu.net/blog/opinionated-guide-ml-research.html
Michael Nielsen, “Principles of Effective Research” (2004). https://michaelnielsen.org/blog/principles-of-effective-research/
Andrej Karpathy, “A Survival Guide to a PhD” (2016). https://karpathy.github.io/2016/09/07/phd/
Neel Nanda, “My Research Process: Understanding and Cultivating Research Taste” (2025). https://www.alignmentforum.org/posts/Ldrss6o3tiKT6NdMm
Vivek (@itsreallyvivek), “How to Be Good at Research” (2025).
Also referenced within these pieces: Richard Hamming’s “You and Your Research,” Rich Sutton’s “The Bitter Lesson,” Chris Olah and Shan Carter’s “Research Debt,” Claude Shannon’s 1952 talk on creative thinking, and Karpathy’s “A Recipe for Training Neural Networks.”


