The Problem
Now that this site is taking shape, I wanted to cover basic SEO by adding it to Google Search Console and Bing Webmaster Tools. Both need a sitemap, which is just a list of the pages you want search crawlers to include.
Hugo builds a sitemap for you out of the box, and the sitemap section of the config TOML/YAML file lets you tune it. That should be pretty easy, if not for the fact that the Cloudflare Pages worker rewrites all URLs in the site from its default subdomain to the base domain.
Hence we need a custom sitemap, and an accompanying robots.txt, to control search engine crawling. A quick attempt didn’t work: even though we’re building the site with the -b option to specify the real base URL, that didn’t seem to propagate to Hugo’s default sitemap.
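A quick way to spot this kind of mismatch is to scan the built sitemap for loc entries that don’t start with the base URL you expect. The following is only a rough sketch in Python: the function name and the pages.dev hostname in the sample are placeholders I made up for illustration, and in practice you would read public/sitemap.xml (Hugo’s default output location) instead of the inline sample.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def non_canonical_locs(sitemap_xml: str, canonical_base: str) -> list[str]:
    """Return every <loc> value that does not start with canonical_base."""
    base = canonical_base.rstrip("/") + "/"
    # Parse from bytes so an XML declaration with an encoding is accepted.
    root = ET.fromstring(sitemap_xml.encode("utf-8"))
    locs = [
        el.text.strip()
        for el in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if el.text
    ]
    return [loc for loc in locs if not loc.startswith(base)]


# Hypothetical sample: the pages.dev hostname is a placeholder.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example-project.pages.dev/blogs/</loc></url>
  <url><loc>https://meantimecyber.com/</loc></url>
</urlset>"""

print(non_canonical_locs(SAMPLE, "https://meantimecyber.com"))
```

An empty list means every entry already points at the canonical domain.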
After some fiddling, the following worked for me.
What Worked
Config File
First, enable the sitemap and robots.txt in the config file, and add a new parameter holding the base URL that only the sitemap will use:
File:hugo.yml
# Sitemap settings. Change frequency is a hint to search engines about how often your content changes.
# Options include: always, hourly, daily, weekly, monthly, yearly, never.
sitemap:
  changeFreq: monthly
params:
  # Canonical base URL for the sitemap
  canonicalBaseURL: https://meantimecyber.com
# Robots.txt settings. If you want to disable it, set enableRobotsTXT to false or just delete the variable. By default, it allows all bots to access all content. You can customize it by adding rules under the robots section.
enableRobotsTXT: true
Sitemap Template
Then we need a sitemap template. GitHub Copilot drafted the following, with some tweaks:
File:layouts/sitemap.xml
{{ printf "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>" | safeHTML }}
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
  xmlns:xhtml="http://www.w3.org/1999/xhtml">
  {{- /* Use canonicalBaseURL from params to avoid Cloudflare Pages preview URLs appearing in the sitemap */ -}}
  {{- $sitemapBaseURL := strings.TrimSuffix "/" (or site.Params.canonicalBaseURL site.BaseURL) -}}
  {{- range .Site.Pages }}
    <!-- Exclude tag taxonomy pages. Also exclude future-dated regular pages (scheduled posts),
         but always keep section pages and the home page, since their date is derived from children. -->
    {{- if and (not (hasPrefix .RelPermalink "/tags/")) (not (hasPrefix .RelPermalink "/blogs/tags/")) (or .IsSection .IsHome (not (.Date.After now))) }}
      <!-- Default lastmod to the page's own value -->
      {{- $effectiveLastmod := .Lastmod -}}
      <!-- For section pages (e.g. /blogs/), use the most recent *published* child page's date.
           Explicitly filter to pages whose date is not in the future, so this works correctly
           even in dev builds that use -F (include future posts). -->
      {{- $publishedPages := where .Pages "Date" "<=" now -}}
      {{- if and .IsSection $publishedPages -}}
        {{- $effectiveLastmod = (index $publishedPages 0).Lastmod -}}
      {{- end -}}
      <url>
        <loc>{{ printf "%s%s" $sitemapBaseURL .RelPermalink }}</loc>{{ if not $effectiveLastmod.IsZero }}
        <lastmod>{{ $effectiveLastmod.Format "2006-01-02T15:04:05-07:00" | safeHTML }}</lastmod>{{ end }}{{ with .Sitemap.ChangeFreq }}
        <changefreq>{{ . }}</changefreq>{{ end }}{{ if ge .Sitemap.Priority 0 }}
        <priority>{{ .Sitemap.Priority }}</priority>{{ end }}
      </url>
    {{- end }}
  {{- end }}
</urlset>
This template:
- Uses our new canonicalBaseURL parameter as the base URL, or falls back to the Hugo default.
- Enumerates site pages and for each one:
  - Adds it to the list as long as its path isn’t under tags and it isn’t a future-dated (scheduled) post.
  - Adds a sensible value for the last modified date, covering parent pages like /blogs/ where we want the date of the most recently updated child.
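If the template syntax is hard to follow, the same two decisions can be sketched in plain Python. This is only an illustration of the logic, not something Hugo runs: the Page class and function names are made up, and where the template takes the first published child in Hugo’s default page order, this sketch assumes that means the child with the newest date.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Path prefixes the template skips.
EXCLUDED_PREFIXES = ("/tags/", "/blogs/tags/")


@dataclass
class Page:
    """Hypothetical stand-in for the page fields the template reads."""
    rel_permalink: str
    date: datetime
    lastmod: datetime
    is_section: bool = False
    is_home: bool = False
    children: list["Page"] = field(default_factory=list)


def include_in_sitemap(page: Page, now: datetime) -> bool:
    """Skip tag pages and future-dated regular pages; keep sections and home."""
    if page.rel_permalink.startswith(EXCLUDED_PREFIXES):
        return False
    return page.is_section or page.is_home or page.date <= now


def effective_lastmod(page: Page, now: datetime) -> datetime:
    """Sections take lastmod from their newest published child, if any."""
    published = [child for child in page.children if child.date <= now]
    if page.is_section and published:
        # Assumption: newest by date stands in for Hugo's default ordering.
        return max(published, key=lambda child: child.date).lastmod
    return page.lastmod


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up pages for illustration.
now = utc(2026, 5, 1)
old = Page("/blogs/old/", utc(2026, 1, 10), utc(2026, 1, 12))
recent = Page("/blogs/recent/", utc(2026, 4, 29), utc(2026, 4, 29))
scheduled = Page("/blogs/scheduled/", utc(2026, 6, 1), utc(2026, 6, 1))
blogs = Page("/blogs/", scheduled.date, scheduled.lastmod,
             is_section=True, children=[old, recent, scheduled])

for page in (blogs, old, recent, scheduled):
    if include_in_sitemap(page, now):
        print(page.rel_permalink, effective_lastmod(page, now).isoformat())
```

The scheduled post is left out, and /blogs/ reports the date of the most recent published child rather than the scheduled one.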
For each page, we get a url entry in the generated XML:
<url>
  <loc>https://meantimecyber.com/blogs/</loc>
  <lastmod>2026-04-29T00:00:00+00:00</lastmod>
  <changefreq>monthly</changefreq>
</url>
Robots.txt
The sitemap tells crawlers what you want indexed. The accompanying robots.txt points to the sitemap, and instructs the crawlers on what to ignore:
File:layouts/robots.txt
User-agent: *
Disallow: /tags/
Sitemap: https://meantimecyber.com/sitemap.xml
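
To sanity-check rules like these before deploying, Python’s standard urllib.robotparser can parse the file contents and answer whether a given URL is crawlable. This is a minimal sketch: the /tags/hugo/ path is a made-up example, Googlebot is just one crawler name to test with, and site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the robots.txt above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tags/
Sitemap: https://meantimecyber.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A hypothetical tag page should be blocked for any crawler.
print(parser.can_fetch("Googlebot", "https://meantimecyber.com/tags/hugo/"))
# Regular content should stay crawlable.
print(parser.can_fetch("Googlebot", "https://meantimecyber.com/blogs/"))
# Sitemap URLs declared in the file.
print(parser.site_maps())
```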