Self-avoiding walks, by picture
How the counts for shorter walks help compute the next length — and the precise, known reason there's a limit to how far they can take you. A guided tour that rebuilds, with interactive pictures, structure that is well established in the study of self-avoiding walks.
01 The folding problem
Imagine a tiny protein: a chain of beads. Each bead is one of two kinds — H (hydrophobic, "oily") or P (polar, "water-loving"). The chain lives on a square grid and folds so that no two beads sit on the same square. This is the HP lattice model — a stripped-down cartoon of real protein folding called a prototein. The model and that name come from a 1998 American Scientist column by Brian Hayes, "Prototeins" (reprint) — the original introduction that started this investigation.
The chain wants to fold so that H beads clump together, hiding from water. We score a fold by counting H–H contacts: pairs of H beads that sit next to each other on the grid but are not already neighbors along the chain. The best fold is the one with the most contacts.
The obvious program just tries every fold by brute-force recursion and keeps the best. That works for short chains and slams into an exponential wall for longer ones. To go further we have to understand the structure of folds themselves, independent of the H/P labels.
02 It's really about self-avoiding walks
Strip away H and P. A fold is just a path that starts at the origin, takes unit steps up/down/left/right, and never revisits a square. That's a self-avoiding walk (SAW). The number of folds of a chain of length n is the number of SAWs of n steps.
Watch one get built. The walk grows step by step; when it paints itself into a corner it backtracks. Every distinct path that reaches full length is one fold.
The catch is how fast they multiply. The count of SAWs by length is the sequence A001411:
Each step multiplies the count by roughly 2.64 (the lattice's connective constant µ). Brute force is hopeless past a few dozen steps. We need to stop listing walks and start counting them cleverly — enumerating the distinct shapes once (exploiting their symmetry and packing them into lookup tables), separately from any scoring. This is a well-trodden path in the self-avoiding-walk literature (see §13).
03 The endpoint diamond
Here's the compression that makes everything tractable. Instead of remembering every walk, ask a smaller question: how many n-step walks end on each square? Call that grid of counts Wn(x,y). It throws away the route and keeps only the destination. This object is not ad hoc — it is the two-point function of the self-avoiding walk, the central quantity of the mathematical theory (see Madras & Slade, The Self-Avoiding Walk, §13).
Drag the slider. Brighter = more walks end there. Two things pop out immediately: the counts live inside a diamond (you can't end farther than n steps away), and they sit on a checkerboard (after n steps you're always an even/odd distance from home — half the squares are unreachable).
04 Symmetry: you only ever need one eighth
The grid has the symmetry of a square: 4 rotations × 2 mirror flips = 8 copies of everything. A walk rotated or reflected is still a valid walk, so the diamond is identical under all 8 operations. That means you only need to compute one wedge — a single 45° octant — and stamp out the other seven for free.
0 ≤ y ≤ x. This is the symmetry every SAW enumeration exploits (typically by fixing the first steps and skipping mirror images) — nothing special to here.05 The boundary is Pascal's triangle
Look at the outer rim of the diamond — the squares exactly n steps away. To get that far you can never waste a move, so you can only ever step (say) right or up, never back. A walk that only goes right/up cannot cross itself. So reaching the rim is just "how many ways to arrange k ups among n steps" — a binomial coefficient. The boundary of the diamond is literally Pascal's triangle.
06 Peeling shells: bands and "surplus"
Generalise the rim idea. Peel the diamond into concentric shells, indexed by c: shell c is every square at distance n − 2c from the origin. The rim is c = 0; c = 1 is the next ring in; and so on toward the centre.
The index c has a clean meaning: a walk ending on shell c used 2c more steps than the straight-line minimum. We call 2c the surplus — the number of "wasted" steps the walk spent doubling back. Small c (near the rim) = nearly straight walks. Large c (near the centre) = walks that coil back on themselves.
Surplus is the whole game: self-avoidance only bites when the walk has spare steps to curl back. Near the rim there's no room to collide; near the centre there's nothing but collisions to worry about.
07 What's free, and what is genuinely hard
Now the heart of it. Read each shell down the lengths — fix c, watch the counts as n grows. Some shells follow a simple closed-form rule (you can extend them with arithmetic). Others don't, and provably never will.
Near the rim: polynomials
Along the axis, shell c behaves like a polynomial of degree 2c in n, with leading coefficient the central binomial C(2c, c) = 2, 6, 20, 70, 252, … A degree-2c polynomial is pinned down by 2c+1 numbers you already have from previous lengths, so these near-axis shells extrapolate by arithmetic. (Elementary — not from the literature. I verified it numerically to length ~24; "bounded-surplus walks are eventually polynomial" is the kind of fact that is easy to believe and surely folklore, but I'm not citing a specific source.)
Near the centre / diagonal: no closed form, ever
Now read a fixed central square down the lengths — say "walks that end one step from home", Wn(1,0). That sequence is A225877, and it equals n × the number of self-avoiding polygons (A002931) — closed loops on the grid. This is a famously hard object.
08 Patching a length — the lace expansion, by hand
Suppose you have the diamonds for every length below n. Start with a free guess: pretend walks don't avoid themselves and push the previous diamond outward — every square gets the sum of its four neighbours from length n−1. Call this Un. That's a plain random-walk step: pure arithmetic on data you already own.
Un overcounts — it includes walks that secretly collide. The overcount is the correction Cn = Un − Wn, the part the random walk gets wrong because real walks avoid themselves. This split — random-walk step minus a self-avoidance correction — is exactly the decomposition the lace expansion formalizes (Brydges & Spencer 1985; Madras & Slade; §13). What we're drawing by hand is its crudest, one-step form.
|x|+|y|+1 steps. So Cn = 0 on the outermost shell: the rim patches itself, and the correction lives in the interior, on exactly (n−1)² squares. (Both the bound and the (n−1)² count are elementary bookkeeping — stated here without a citation.)Cn is small in where it lives but not in how much it costs: the colliding walks are a constant fraction of all walks, so computing Cn head-on is as hard as the whole count. The lace expansion's real content is a resummation that replaces this one-step correction with a sparse, short-range object (the lace coefficients), which is what buys an actual speedup — see §9–§12.09 Where the difficulty concentrates
Stack the easy pieces and colour the diamond by how each square gets filled — symmetry copy, binomial rim, extrapolated near-axis shell, the random-walk push, or genuinely new:
10 Back to the original question: enumerate length N
We started at "fold this protein" and detoured through endpoint diamonds. Now unwind it. Every serious SAW-counting method shares one shape: don't attack N head-on — build up by length, carrying forward the smallest state that lets the next length be cheap. The endpoint-table patch below is the easy-to-picture version of that idea; the research methods carry richer state (cut-connectivity in the finite-lattice method, the lace coefficients in the two-step method), which is what makes them fast.
Run the build-up. The bar shows, at each length, how many diamond squares were filled for free (symmetry · formula · extrapolation · the outward push) versus how many were genuinely enumerated.
11 Method 1: the finite-lattice / transfer-matrix method
This is the workhorse behind the record-length enumerations. The idea: sweep an imaginary cut line across the lattice one site at a time, extending walks as you go. The cleverness is in what you remember. You do not store the filled-in part behind the line — only how the partial walk's strands connect across it (which loose ends are joined behind the line, drawn as a non-crossing pairing), plus a running count of how many fillings realise each pairing.
Why that's enough: anything to the right of the line only cares about which strands are available to connect to and how they are already paired — never about the exact shape on the left. So two completely different left-fillings with the same crossing pattern are interchangeable, and the method collapses exponentially many configurations into one state. Slide the cut and watch the stored pairing (orange) update:
12 Method 2: the lace expansion & the two-step method
Recall §8: Wn = (random-walk push) − (correction). That correction is honest but expensive — a constant fraction of walks self-collide, so it costs about as much as the whole count, at every length. The lace expansion rewrites the same correction in a far cheaper form.
It expresses the two-point function as a random-walk line decorated with local self-avoidance corrections — an exact series, like beads on a string. Each bead (a lace coefficient, π) is a small, signed inclusion–exclusion count of one irreducible self-intersection pattern. The key property is that these beads are short-range: only small ones matter and they decay quickly. So you enumerate the small irreducible pieces once and slide them along by convolution — instead of recomputing a fresh wholesale correction at every length. Add terms to build the series:
13 Related work
Self-avoiding walks (SAWs) are one of the most-studied unsolved models in statistical mechanics, so the ideas on this page sit inside a large literature. The pieces most relevant to the hypothesis — use the shorter lengths to get the next one — are worth separating out.
The model
The HP lattice model of protein folding is due to Lau & Dill (1989) — a two-letter (hydrophobic / polar) chain on a lattice whose energy counts H–H contacts. It became a standard testbed for folding algorithms. The popular framing used here — "prototeins" — is Brian Hayes, "Prototeins," American Scientist 86(3), 1998, with a companion column on the walks themselves, "How to Avoid Yourself," 86(4), 1998.
Counting the walks
The totals (A001411) grow like µⁿ; the square-lattice connective constant µ ≈ 2.638 is not known in closed form (the honeycomb value √(2+√2) was proven only in 2012 by Duminil-Copin & Smirnov). The conjectured critical exponents γ = 43/32, ν = 3/4 come from Coulomb-gas arguments (Nienhuis, 1982) and the SLE8/3 picture (Lawler, Schramm & Werner). None of this yields an exact formula for the counts — it constrains their growth, exactly as our asymptotics did.
Building longer walks from shorter ones — the closest kin to the hypothesis (walked through visually in §11–§12)
- Finite-lattice / transfer-matrix method. Enting; Conway, Enting & Guttmann (1993); Jensen (2003); Clisby & Jensen (2012). These advance length-by-length while tracking only the connectivity of partial walks across a cut-line — the field's answer to "what is the minimal incremental state." It is the same instinct as our patch, but the carried state is cut-connectivity, not the endpoint table. It reached self-avoiding polygons to perimeter 130.
- Lace expansion + the "two-step" method (Clisby, Liang & Slade, 2007) and the length-doubling method (Clisby, 2013). These literally reuse shorter-length data — the two-point function / subset-visit counts — to reach longer walks (N = 30, then 36), a result that would take ~10³× longer by naïve enumeration. This is the established, working form of the hypothesis. The crucial caveat: they recurse on richer objects than the bare endpoint counts, because — as the next item shows — the counts alone are not enough.
Why a pure recurrence on the counts cannot exist
- Conway & Guttmann (1996) — analysis of a 51-term series gives compelling evidence the SAW generating function is not D-finite (satisfies no linear ODE with polynomial coefficients ⇔ the counts satisfy no finite polynomial-coefficient recurrence).
- Guttmann & Enting's "solvability test" — a model's anisotropic generating function showing a natural boundary signals it is not D-finite.
- Rechnitzer, "Haruspicy 2" (2006) — proves the anisotropic self-avoiding-polygon generating function is not D-finite, settling the Guttmann–Enting conjecture. This is the rigorous backbone behind the wall at the centre of the diamond (§7): the central counts (A225877 = n·A002931) inherit that hardness.
(n−1)² correction support and the degree-2c near-axis shells — flagged as such where they appear.14 What we saw — and where it comes from
Each claim in the walkthrough, with its status. Nothing here is a new result; the entries marked elementary are bookkeeping observations made for this page and stated without a citation.
| Claim (where) | Status |
|---|---|
| Endpoint diamond = the SAW two-point function (§3) | Established — Madras & Slade |
| 8-fold symmetry → one wedge (§4) | Standard practice |
| Rim = C(n,k), directed/staircase walks (§5) | Elementary / classical |
| Near-axis shell c = degree-2c polynomial, lead C(2c,c) (§7) | Elementary observation |
| Centre is non-holonomic / no finite recurrence (§7) | Established — Conway–Guttmann; Rechnitzer |
| Wn = (random-walk push) − correction (§8) | Established — lace expansion |
| Correction vanishes on rim; support = (n−1)² (§8) | Elementary observation |
| Incremental build-up carries minimal state (§10) | Established — finite-lattice / two-step |
How the pieces are computed (methods, not magic)
- Brute force — recurse over every fold, score, keep the best. Correct, exponential; the baseline §1 reacts against.
- Shape enumeration — list the distinct walk shapes once, fixing the first steps to kill mirror images and packing the local "does this step fit?" test into a lookup table; score afterwards. (A version of this was written for the PlayStation 3's Cell processor, with the shapes bit-packed into 128-bit vectors.) This is §2 + §4 in practice.
- Difference tables — lay the endpoint counts out by shell and repeatedly subtract Pascal's rule. The rim collapses to zero (it is Pascal); the near-axis shells collapse after a few rounds (polynomials); the centre never collapses. That last fact is the whole point of §7.
- Endpoint generator — a self-avoiding-walk counter that bins walks by their endpoint, producing the diamonds in §3 directly. (The exact counts shipped in
data.jswere produced this way for lengths 1–16.)
Number sequences
OEIS A001411 (all SAWs, §2) · A225877 (walks to a neighbour, the diamond's centre) = n × A002931 (self-avoiding polygons) = ¼ A010566 (§7). Full literature in §13.