a visual teaching walkthrough · grounded in the self-avoiding-walk literature

Self-avoiding walks, by picture

How the counts for shorter walks help compute the next length — and the precise, known reason there's a limit to how far they can take you. A guided tour that rebuilds, with interactive pictures, structure that is well established in the study of self-avoiding walks.

What this is — and isn't This is an explainer, not a research claim. It reconstructs by visualization things that are well established in the theory of self-avoiding walks: the two-point function, the random-walk-plus-correction decomposition (the lace expansion), incremental enumeration (the finite-lattice and two-step methods), and the proof that the generating function is not D-finite. A handful of small bookkeeping observations along the way are elementary and stated without a citation — these are flagged (elementary — not from the literature). Nothing here is offered as new. The aim is intuition for one question, and an honest account of its known answer. Full references in §13.

01 The folding problem

Imagine a tiny protein: a chain of beads. Each bead is one of two kinds — H (hydrophobic, "oily") or P (polar, "water-loving"). The chain lives on a square grid and folds so that no two beads sit on the same square. This is the HP lattice model — a stripped-down cartoon of real protein folding called a prototein. The model and that name come from a 1998 American Scientist column by Brian Hayes, "Prototeins" (reprint) — the original introduction that started this investigation.

The chain wants to fold so that H beads clump together, hiding from water. We score a fold by counting H–H contacts: pairs of H beads that sit next to each other on the grid but are not already neighbors along the chain. The best fold is the one with the most contacts.

A folded prototein · H–H contacts highlighted

The obvious program just tries every fold by brute-force recursion and keeps the best. That works for short chains and slams into an exponential wall for longer ones. To go further we have to understand the structure of folds themselves, independent of the H/P labels.

Key move Separate the shape of a fold from its score. Every fold is a self-avoiding path on the grid; the H/P labelling only matters when we score. So first: understand all the shapes.
The question we'll build intuition for Can the data for previous path lengths (< N) accelerate computing the paths of length N? The short, established answer is yes — and this is exactly what the best enumeration methods do (the finite-lattice and lace-expansion "two-step" methods, §11–§12). The rest of this page is a slow, visual unpacking of how that works — what carries over for free — and of the precise, proven reason it has a hard limit. We'll use the simplest possible version of the idea so the structure is visible; the research-grade methods are more powerful relatives of it.

02 It's really about self-avoiding walks

Strip away H and P. A fold is just a path that starts at the origin, takes unit steps up/down/left/right, and never revisits a square. That's a self-avoiding walk (SAW). The number of folds of a chain of length n is the number of SAWs of n steps.

Watch one get built. The walk grows step by step; when it paints itself into a corner it backtracks. Every distinct path that reaches full length is one fold.

The catch is how fast they multiply. The count of SAWs by length is the sequence A001411:

Each step multiplies the count by roughly 2.64 (the lattice's connective constant µ). Brute force is hopeless past a few dozen steps. We need to stop listing walks and start counting them cleverly — enumerating the distinct shapes once (exploiting their symmetry and packing them into lookup tables), separately from any scoring. This is a well-trodden path in the self-avoiding-walk literature (see §13).

03 The endpoint diamond

Here's the compression that makes everything tractable. Instead of remembering every walk, ask a smaller question: how many n-step walks end on each square? Call that grid of counts Wn(x,y). It throws away the route and keeps only the destination. This object is not ad hoc — it is the two-point function of the self-avoiding walk, the central quantity of the mathematical theory (see Madras & Slade, The Self-Avoiding Walk, §13).

Drag the slider. Brighter = more walks end there. Two things pop out immediately: the counts live inside a diamond (you can't end farther than n steps away), and they sit on a checkerboard (after n steps you're always an even/odd distance from home — half the squares are unreachable).

Hover a square to read its exact count.
Why this view The whole problem now lives in one picture per length. The rest of the tour is about reading structure off this picture: which squares are forced by the previous lengths and which carry genuinely new information. That split — and its limits — is what the literature in §13 makes precise.

04 Symmetry: you only ever need one eighth

The grid has the symmetry of a square: 4 rotations × 2 mirror flips = 8 copies of everything. A walk rotated or reflected is still a valid walk, so the diamond is identical under all 8 operations. That means you only need to compute one wedge — a single 45° octant — and stamp out the other seven for free.

click any square to see its 8 images
The highlighted wedge is all you ever compute. ×8 gives the rest.
Standard practice A provable reduction: compute only the wedge 0 ≤ y ≤ x. This is the symmetry every SAW enumeration exploits (typically by fixing the first steps and skipping mirror images) — nothing special to here.

05 The boundary is Pascal's triangle

Look at the outer rim of the diamond — the squares exactly n steps away. To get that far you can never waste a move, so you can only ever step (say) right or up, never back. A walk that only goes right/up cannot cross itself. So reaching the rim is just "how many ways to arrange k ups among n steps" — a binomial coefficient. The boundary of the diamond is literally Pascal's triangle.

Rim squares labelled with C(n,k). A monotone (right/up) walk traces the edge — it can't self-intersect, so its count is exact.
Elementary & classical The entire boundary shell equals C(n, k) — a formula, no computation. These are exactly the fully directed (staircase) walks, the simplest exactly-solved case in lattice-walk combinatorics. So the rim of the diamond is free for textbook reasons.

06 Peeling shells: bands and "surplus"

Generalise the rim idea. Peel the diamond into concentric shells, indexed by c: shell c is every square at distance n − 2c from the origin. The rim is c = 0; c = 1 is the next ring in; and so on toward the centre.

The index c has a clean meaning: a walk ending on shell c used 2c more steps than the straight-line minimum. We call 2c the surplus — the number of "wasted" steps the walk spent doubling back. Small c (near the rim) = nearly straight walks. Large c (near the centre) = walks that coil back on themselves.

Surplus is the whole game: self-avoidance only bites when the walk has spare steps to curl back. Near the rim there's no room to collide; near the centre there's nothing but collisions to worry about.

07 What's free, and what is genuinely hard

Now the heart of it. Read each shell down the lengths — fix c, watch the counts as n grows. Some shells follow a simple closed-form rule (you can extend them with arithmetic). Others don't, and provably never will.

Near the rim: polynomials

Along the axis, shell c behaves like a polynomial of degree 2c in n, with leading coefficient the central binomial C(2c, c) = 2, 6, 20, 70, 252, … A degree-2c polynomial is pinned down by 2c+1 numbers you already have from previous lengths, so these near-axis shells extrapolate by arithmetic. (Elementary — not from the literature. I verified it numerically to length ~24; "bounded-surplus walks are eventually polynomial" is the kind of fact that is easy to believe and surely folklore, but I'm not citing a specific source.)

On-axis shell sequences Wn(n−2c, 0) and their finite differences
Watch the differences collapse to a constant (the central binomial) — the signature of a closed-form polynomial.

Near the centre / diagonal: no closed form, ever

Now read a fixed central square down the lengths — say "walks that end one step from home", Wn(1,0). That sequence is A225877, and it equals n × the number of self-avoiding polygons (A002931) — closed loops on the grid. This is a famously hard object.

Does any finite recurrence with polynomial coefficients fit? (65 exact terms)
A known wall — re-seen here A quick rank test over 65 exact terms finds no finite linear recurrence with polynomial coefficients (order ≤ 6, degree ≤ 7): the central sequence is non-holonomic. This is not a discovery — it reflects the established result that the SAW/SAP generating functions are not D-finite (Conway & Guttmann 1996; proven for self-avoiding polygons by Rechnitzer 2006, §13). The takeaway for us: the endpoint counts are not a sufficient statistic for their own continuation — no fixed formula on the table can advance its centre, so genuinely new information enters there at every length.
The dividing line (observed) In this picture, closed-form structure holds for shells c = 0 and c = 1 (binomial rim, then "one-defect" walks) and breaks at c = 2 — the first shell roomy enough for a real self-avoiding detour. The hard, information-bearing part lives near the centre/diagonal of shells c ≥ 2. (Where that boundary sits is an observation from these pictures; that the centre is hard is the cited result above.)

08 Patching a length — the lace expansion, by hand

Suppose you have the diamonds for every length below n. Start with a free guess: pretend walks don't avoid themselves and push the previous diamond outward — every square gets the sum of its four neighbours from length n−1. Call this Un. That's a plain random-walk step: pure arithmetic on data you already own.

Un overcounts — it includes walks that secretly collide. The overcount is the correction Cn = Un − Wn, the part the random walk gets wrong because real walks avoid themselves. This split — random-walk step minus a self-avoidance correction — is exactly the decomposition the lace expansion formalizes (Brydges & Spencer 1985; Madras & Slade; §13). What we're drawing by hand is its crudest, one-step form.

Elementary A correction needs a walk that reaches a square and loops back to touch it — at least |x|+|y|+1 steps. So Cn = 0 on the outermost shell: the rim patches itself, and the correction lives in the interior, on exactly (n−1)² squares. (Both the bound and the (n−1)² count are elementary bookkeeping — stated here without a citation.)
The catch that the lace expansion solves Drawn this naïvely, Cn is small in where it lives but not in how much it costs: the colliding walks are a constant fraction of all walks, so computing Cn head-on is as hard as the whole count. The lace expansion's real content is a resummation that replaces this one-step correction with a sparse, short-range object (the lace coefficients), which is what buys an actual speedup — see §9–§12.

09 Where the difficulty concentrates

Stack the easy pieces and colour the diamond by how each square gets filled — symmetry copy, binomial rim, extrapolated near-axis shell, the random-walk push, or genuinely new:

symmetry copies (free ×8) rim = Pascal (formula) shell c=1 (closed form) near-axis (extrapolate) core · no closed-form rule
A map, not a faster algorithm The squares carrying genuinely new information are the near-diagonal interior of shells c ≥ 2, in one wedge. But read the count honestly: this is small in the number of squares, not in the work — each core square is a count as hard as the whole problem (§8). So this picture is a map of where the difficulty concentrates, not a speed-up by itself. The genuine acceleration of "use the shorter lengths" comes from the research methods that carry sparse extra state — the lace-expansion two-step method and length-doubling (§12), which reach lengths 30–36 where naïve enumeration cannot.

10 Back to the original question: enumerate length N

We started at "fold this protein" and detoured through endpoint diamonds. Now unwind it. Every serious SAW-counting method shares one shape: don't attack N head-on — build up by length, carrying forward the smallest state that lets the next length be cheap. The endpoint-table patch below is the easy-to-picture version of that idea; the research methods carry richer state (cut-connectivity in the finite-lattice method, the lace coefficients in the two-step method), which is what makes them fast.

Run the build-up. The bar shows, at each length, how many diamond squares were filled for free (symmetry · formula · extrapolation · the outward push) versus how many were genuinely enumerated.

Squares fillable by a rule (symmetry, formulas, extrapolation, the random-walk push) vs. squares with no closed-form rule — the core — per length. Bars count squares, not work.
The shape of the method Build lengths 1…N−1, carrying the table forward; then extend to N. At each length: fold into one wedge, write the rim as binomials, extrapolate the near-axis shells, push the bulk outward, and spend real effort only on the core of shells c ≥ 2. That last part is where the field's machinery does the heavy lifting that this hand-version only sketches — the two methods that do it are unpacked next, in §11 (finite-lattice) and §12 (lace expansion). The HP folding score from §1 is then read off the finished shapes.

11 Method 1: the finite-lattice / transfer-matrix method

This is the workhorse behind the record-length enumerations. The idea: sweep an imaginary cut line across the lattice one site at a time, extending walks as you go. The cleverness is in what you remember. You do not store the filled-in part behind the line — only how the partial walk's strands connect across it (which loose ends are joined behind the line, drawn as a non-crossing pairing), plus a running count of how many fillings realise each pairing.

Why that's enough: anything to the right of the line only cares about which strands are available to connect to and how they are already paired — never about the exact shape on the left. So two completely different left-fillings with the same crossing pattern are interchangeable, and the method collapses exponentially many configurations into one state. Slide the cut and watch the stored pairing (orange) update:

How it relates to our question This is "build up by length, carrying the minimal sufficient state" — exactly the hypothesis — but the state carried is the cut-line connectivity, not the endpoint table. That choice is forced by §7: the endpoint counts are not a sufficient statistic, so the right thing to remember is the interface. The number of interface states grows exponentially in the cut width, yet far slower than the number of walks — which is why it reaches lengths brute force cannot. Introduced by Enting; developed by Conway, Enting & Guttmann (1993), Jensen, and Clisby & Jensen, who counted self-avoiding polygons to perimeter 130 (§13).

12 Method 2: the lace expansion & the two-step method

Recall §8: Wn = (random-walk push) − (correction). That correction is honest but expensive — a constant fraction of walks self-collide, so it costs about as much as the whole count, at every length. The lace expansion rewrites the same correction in a far cheaper form.

It expresses the two-point function as a random-walk line decorated with local self-avoidance corrections — an exact series, like beads on a string. Each bead (a lace coefficient, π) is a small, signed inclusion–exclusion count of one irreducible self-intersection pattern. The key property is that these beads are short-range: only small ones matter and they decay quickly. So you enumerate the small irreducible pieces once and slide them along by convolution — instead of recomputing a fresh wholesale correction at every length. Add terms to build the series:

The genuine speed-up — with an honest caveat Because the correction becomes a small, reusable, local table, the cost collapses: the two-step method (Clisby, Liang & Slade, 2007) reached length 30, and length-doubling (Clisby, 2013) reached 36 — orders of magnitude past naïve enumeration. This is the real, working form of "use the shorter lengths to get the next one." The picture is schematic: the true lace coefficients are signed sums over diagrams, not literal single loops. But the essence is faithful — replace a wholesale correction with a small, reusable, local one. References in §13.

13 Related work

Self-avoiding walks (SAWs) are one of the most-studied unsolved models in statistical mechanics, so the ideas on this page sit inside a large literature. The pieces most relevant to the hypothesis — use the shorter lengths to get the next one — are worth separating out.

The model

The HP lattice model of protein folding is due to Lau & Dill (1989) — a two-letter (hydrophobic / polar) chain on a lattice whose energy counts H–H contacts. It became a standard testbed for folding algorithms. The popular framing used here — "prototeins" — is Brian Hayes, "Prototeins," American Scientist 86(3), 1998, with a companion column on the walks themselves, "How to Avoid Yourself," 86(4), 1998.

Counting the walks

The totals (A001411) grow like µⁿ; the square-lattice connective constant µ ≈ 2.638 is not known in closed form (the honeycomb value √(2+√2) was proven only in 2012 by Duminil-Copin & Smirnov). The conjectured critical exponents γ = 43/32, ν = 3/4 come from Coulomb-gas arguments (Nienhuis, 1982) and the SLE8/3 picture (Lawler, Schramm & Werner). None of this yields an exact formula for the counts — it constrains their growth, exactly as our asymptotics did.

Building longer walks from shorter ones — the closest kin to the hypothesis (walked through visually in §11§12)

  • Finite-lattice / transfer-matrix method. Enting; Conway, Enting & Guttmann (1993); Jensen (2003); Clisby & Jensen (2012). These advance length-by-length while tracking only the connectivity of partial walks across a cut-line — the field's answer to "what is the minimal incremental state." It is the same instinct as our patch, but the carried state is cut-connectivity, not the endpoint table. It reached self-avoiding polygons to perimeter 130.
  • Lace expansion + the "two-step" method (Clisby, Liang & Slade, 2007) and the length-doubling method (Clisby, 2013). These literally reuse shorter-length data — the two-point function / subset-visit counts — to reach longer walks (N = 30, then 36), a result that would take ~10³× longer by naïve enumeration. This is the established, working form of the hypothesis. The crucial caveat: they recurse on richer objects than the bare endpoint counts, because — as the next item shows — the counts alone are not enough.

Why a pure recurrence on the counts cannot exist

  • Conway & Guttmann (1996) — analysis of a 51-term series gives compelling evidence the SAW generating function is not D-finite (satisfies no linear ODE with polynomial coefficients ⇔ the counts satisfy no finite polynomial-coefficient recurrence).
  • Guttmann & Enting's "solvability test" — a model's anisotropic generating function showing a natural boundary signals it is not D-finite.
  • Rechnitzer, "Haruspicy 2" (2006) — proves the anisotropic self-avoiding-polygon generating function is not D-finite, settling the Guttmann–Enting conjecture. This is the rigorous backbone behind the wall at the centre of the diamond (§7): the central counts (A225877 = n·A002931) inherit that hardness.
What this walkthrough adds: exposition, not results Everything load-bearing here is cited above. The page just draws those objects in one consistent picture — the two-point function as a diamond, the lace decomposition as a random-walk push plus a correction, non-D-finiteness as a wall at the centre — so the answer to the opening question is visible: "use the shorter lengths" works because the random-walk part is free, and it has a hard limit because the counts are not a sufficient statistic for their own continuation (which is why the real methods carry extra state). The only non-cited bits are elementary bookkeeping — the (n−1)² correction support and the degree-2c near-axis shells — flagged as such where they appear.

14 What we saw — and where it comes from

Each claim in the walkthrough, with its status. Nothing here is a new result; the entries marked elementary are bookkeeping observations made for this page and stated without a citation.

Claim (where)Status
Endpoint diamond = the SAW two-point function (§3)Established — Madras & Slade
8-fold symmetry → one wedge (§4)Standard practice
Rim = C(n,k), directed/staircase walks (§5)Elementary / classical
Near-axis shell c = degree-2c polynomial, lead C(2c,c) (§7)Elementary observation
Centre is non-holonomic / no finite recurrence (§7)Established — Conway–Guttmann; Rechnitzer
Wn = (random-walk push) − correction (§8)Established — lace expansion
Correction vanishes on rim; support = (n−1)² (§8)Elementary observation
Incremental build-up carries minimal state (§10)Established — finite-lattice / two-step

How the pieces are computed (methods, not magic)

  • Brute force — recurse over every fold, score, keep the best. Correct, exponential; the baseline §1 reacts against.
  • Shape enumeration — list the distinct walk shapes once, fixing the first steps to kill mirror images and packing the local "does this step fit?" test into a lookup table; score afterwards. (A version of this was written for the PlayStation 3's Cell processor, with the shapes bit-packed into 128-bit vectors.) This is §2 + §4 in practice.
  • Difference tables — lay the endpoint counts out by shell and repeatedly subtract Pascal's rule. The rim collapses to zero (it is Pascal); the near-axis shells collapse after a few rounds (polynomials); the centre never collapses. That last fact is the whole point of §7.
  • Endpoint generator — a self-avoiding-walk counter that bins walks by their endpoint, producing the diamonds in §3 directly. (The exact counts shipped in data.js were produced this way for lengths 1–16.)

Number sequences

OEIS A001411 (all SAWs, §2) · A225877 (walks to a neighbour, the diamond's centre) = n × A002931 (self-avoiding polygons) = ¼ A010566 (§7). Full literature in §13.