Sitemap.xml emits URLs containing literal spaces (or other reserved characters) when slugs/tags are interpolated raw into template literals. ${tag} with tag === 'credit unions' produces <loc>https://finfam.app/blog/tag/credit unions</loc>. Google tolerates it but flags it in Search Console, validators reject it, and any & in a slug makes the document not well-formed XML, so parsing fails entirely. The bug stays invisible as long as every tag/slug in your fixtures happens to be hyphenated/alphanumeric. It only surfaces once a non-trivial tag (multi-word with a space, an ampersand, a quote) enters the data set.
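Because the bug hides behind tidy fixtures, one way to surface it early is a fixture that deliberately contains awkward tags. The sketch below assumes the /blog/tag/ route from the example above; NASTY_TAGS, rawLoc, encodedLoc and looksUnsafe are hypothetical names for illustration, not part of any existing codebase:

```typescript
// Hypothetical fixture: each tag trips a different failure mode.
const NASTY_TAGS = ['credit unions', 'R&D', "mom's budget", '"quoted"', 'a<b'];

const ORIGIN = 'https://finfam.app';

// The buggy pattern: the tag is interpolated raw into the path.
function rawLoc(tag: string): string {
  return `${ORIGIN}/blog/tag/${tag}`;
}

// Percent-encode only the dynamic segment so the origin's ':' and '/' stay intact.
function encodedLoc(tag: string): string {
  return `${ORIGIN}/blog/tag/${encodeURIComponent(tag)}`;
}

// True if the value contains whitespace or any of the five characters XML escaping rewrites.
function looksUnsafe(loc: string): boolean {
  return /[\s&<>"']/.test(loc);
}

for (const tag of NASTY_TAGS) {
  console.log(looksUnsafe(rawLoc(tag)) ? 'UNSAFE' : 'ok    ', rawLoc(tag));
  console.log(looksUnsafe(encodedLoc(tag)) ? 'UNSAFE' : 'ok    ', encodedLoc(tag));
}
```

Every raw URL is flagged. After percent-encoding, only "mom's budget" is still flagged, because encodeURIComponent leaves the apostrophe untouched; that leftover is why an XML-level escape is still needed on top of percent-encoding.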
Sitemap URLs need TWO independent encodings, applied in this order:
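1. Percent-encode each dynamic path segment with encodeURIComponent BEFORE interpolating it into the path. This turns spaces into %20, & into %26, and so on. Do not encode the full URL: encodeURIComponent would also escape the structural characters (/ becomes %2F, : becomes %3A, ? becomes %3F) and break the URL. Encode only the segments:

```ts
urls.push({ loc: `/blog/tag/${encodeURIComponent(tag)}` });
urls.push({ loc: `/${encodeURIComponent(username)}/views/${encodeURIComponent(viewname)}` });
```

2. XML-escape the <loc> content for the five predefined entities. Percent-encoding handles most cases, but encodeURIComponent deliberately leaves ! ' ( ) * unescaped, so an apostrophe in a slug still reaches the XML raw, and any value that bypasses step 1 can carry &, <, > or " as well. Apply this as the final step in the serializer, never in the URL-construction layer:

```ts
function xml_escape(v: string): string {
  // Replace & first so the & inside the later entities is not escaped again.
  return v.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;').replace(/'/g, '&apos;');
}

const loc_line = `<loc>${xml_escape(full_url)}</loc>`; // full_url = origin + encoded path from step 1
```

Verification: after generation, parse the sitemap with a real XML parser (e.g., xml2js's parseStringPromise) and assert that no <loc> value contains whitespace (\s), a bare & (anything matching &(?!amp;|lt;|gt;|quot;|apos;)), <, or >. Add to your E2E suite: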
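```ts
import { parseStringPromise } from 'xml2js';

const BASE_URL = process.env.BASE_URL ?? 'http://localhost:3000'; // hypothetical; Node's fetch needs an absolute URL

test('sitemap <loc> values are percent-encoded and XML-escaped', async () => {
  const body = await (await fetch(`${BASE_URL}/sitemap.xml`)).text();
  const locs = [...body.matchAll(/<loc>([^<]+)<\/loc>/g)].map((m) => m[1]);
  for (const l of locs) expect(l).not.toMatch(/\s|&(?!amp;|lt;|gt;|quot;|apos;)/); // raw space or bare &
  const parsed = await parseStringPromise(body); // throws on malformed XML
  expect(parsed.urlset.url.length).toBeGreaterThan(0);
});
```

While you're in there, also paginate the sitemap loaders: listX({}) with the default page size typically caps at 50–100 items, so a sitemap built from a single API call silently truncates as the corpus grows. Loop over pages with a MAX_PAGES safety cap (we use 50 pages of 100 items = 5,000 URLs per source, well under the 50,000-URL limit for a single sitemap.xml), and emit Sentry.captureMessage(..., 'warning') when the cap is hit so a human knows to split the output into a sitemap index.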
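A minimal sketch of that loop follows. It assumes listX accepts 1-based page and size options and returns the items plus a hasMore flag; that response shape, listAll and PAGE_SIZE are hypothetical, so adapt them to the real client. The warn callback stands in for Sentry.captureMessage so the sketch runs without the Sentry SDK:

```typescript
// Hypothetical page shape: adjust to whatever listX actually returns.
interface Page<T> {
  items: T[];
  hasMore: boolean;
}

type ListFn<T> = (opts: { page: number; size: number }) => Promise<Page<T>>;

const PAGE_SIZE = 100;
const MAX_PAGES = 50; // 50 pages of 100 items = 5,000 items per source

// Fetch every page of one source, stopping early when the API reports no more data.
async function listAll<T>(
  list: ListFn<T>,
  source: string,
  warn: (msg: string) => void,
): Promise<T[]> {
  const out: T[] = [];
  for (let page = 1; page <= MAX_PAGES; page++) {
    const res = await list({ page, size: PAGE_SIZE });
    out.push(...res.items);
    if (!res.hasMore) return out;
  }
  // The loop used up MAX_PAGES while the API still reported more data.
  warn(`sitemap: ${source} hit MAX_PAGES (${MAX_PAGES * PAGE_SIZE} items); split into a sitemap index`);
  return out;
}
```

In the real loader the callback would be something like (m) => Sentry.captureMessage(m, 'warning'). Returning the truncated list instead of throwing keeps the sitemap serving while the warning gets a human's attention.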