Let's be friends

Introduction

OK… let’s collect everything you need to be as discoverable as possible – and to give yourself the best shot at rich sharing previews and a good general experience for the people (and things) finding your site.

You could think of this as “Optimizing your site for Search Engines” (Search Engine Optimization / SEO) but that always sounded like you were optimizing the engine… not FOR the searches.

Today, your site is also being visited and evaluated by a growing ecosystem of crawlers, bots, recommendation engines, LLM-based assistants (like ChatGPT), voice-activated systems (like Siri and Alexa), content summarization tools, shopping AI, and even third-party knowledge graphs that feed into social platforms and apps. It’s not just about Google search anymore – and who knows, things might change drastically. So, let’s make sure we’re taking advantage of everything we can.

Robots.txt

If you block things (first off, note that they might not listen to you anyway), then your pages won’t be crawlable – so that’s the first thing to consider. What are those rules? Where are they set? In your project? Or at a higher level – like your host, or something else out of your control? Who do you want to allow? Who do you want to block? Do you have any ability to rate limit?


Deciding who can crawl your site

# in terminal: fetch the page like a crawler would, and see what comes back
curl https://example.com
curl example.com

# just the response headers (status code, redirects, content type)
curl --head example.com
curl -I example.com          # same, but with annoying shorthand


# in your robots.txt: this blocks every crawler from the whole site
User-agent: *
Disallow: /

Give these a shot in order / in different combinations
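Once you’ve decided who’s welcome, your robots.txt will usually land somewhere between “allow everything” and “block everything.” Here’s a loose sketch – the paths are placeholders, and GPTBot (OpenAI’s crawler) just stands in for whichever bots you decide to block:

# in your robots.txt: allow everyone by default, keep crawlers out of one folder,
# block one specific bot entirely, and point to your sitemap
User-agent: *
Disallow: /private/

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml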

Semantic HTML (a functional web page)

This comes first because no bot or AI can do anything useful if the core HTML is broken or meaningless. If you’ve gone through DFTW, well – you know all about that, to an expert level. But most people seem to totally brush this off. Lame. It’s not just about your eyes. It’s about everyone and every thing being able to read and explore your content (otherwise – why have it at all?). Technically this also includes a declared language and an official <title> (but we put that in the next part). So, get it right: the page will be more discoverable if it isn’t broken or incorrectly authored. The higher the quality of the markup – the more likely it is to be trusted and chosen as the canonical source.
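If you want something to check against, here’s a minimal sketch of a well-formed page – a declared language, a real <title>, and landmark elements doing their jobs. All the names and copy are placeholders:

<!-- a minimal, well-formed page skeleton (placeholder content) -->
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <title>Example Studio: Web design in Portland</title>
  </head>
  <body>
    <header>
      <nav>...</nav>
    </header>
    <main>
      <h1>Web design for small businesses</h1>
      <article>...</article>
    </main>
    <footer>...</footer>
  </body>
</html>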

Getting distracted from WORK!!! So – I fed my general outline into the LLM to make a loose list to come back to later, so I don’t get off track. Want to help flesh all this out and test it – and document it with me!?!?

Web Presence Readiness Checklist — Outline (next steps)


✅ Already covered:

1️⃣ Gatekeeping / robots.txt

2️⃣ Semantic HTML


Next steps — to fill in later:

3️⃣ Title & Description

  • Title (<title>)

  • Meta description (<meta name="description">)

  • Purpose: defines how your page shows up in AI outputs, search, chat, and previews.
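  (Rough sketch of where these live – the copy is all placeholder:)

<!-- in your <head> -->
<title>Example Studio: Web design in Portland</title>
<meta name="description" content="Custom websites for small businesses. Strategy, design, and build in Portland, OR.">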


4️⃣ Open Graph & Twitter Cards

  • og:title, og:description, og:image, og:type, og:url

  • twitter:card, twitter:title, etc.

  • Purpose: defines how your page appears when shared or cited.
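  (Sketch – every value below is a placeholder:)

<!-- in your <head> -->
<meta property="og:title" content="Example Studio: Web design in Portland">
<meta property="og:description" content="Custom websites for small businesses.">
<meta property="og:image" content="https://example.com/images/share-card.jpg">
<meta property="og:type" content="website">
<meta property="og:url" content="https://example.com/">
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Example Studio: Web design in Portland">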


5️⃣ Structured Data (Schema.org)

  • Service pages → Service, LocalBusiness, Offer, Review

  • Portfolio pages → CollectionPage, ImageGallery

  • About page → AboutPage, LocalBusiness

  • Contact page → ContactPage, LocalBusiness

  • Blog posts → BlogPosting (if you have a blog)

  • Purpose: helps AI and search engines understand what type of page this is and what it contains.
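  (Sketch of the JSON-LD for a service page – the types are real Schema.org types, everything else is placeholder:)

<!-- on a service page -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "Web design",
  "description": "Custom websites for small businesses.",
  "areaServed": "Portland, OR",
  "provider": {
    "@type": "LocalBusiness",
    "name": "Example Studio"
  }
}
</script>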


6️⃣ Sitemap

  • /sitemap.xml

  • Reference it in robots.txt

  • Purpose: gives bots an explicit list of what pages to crawl.
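  (Minimal sketch – the URLs and dates are placeholders; the Sitemap: line in robots.txt, like in the example up top, points at this file:)

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2025-01-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/services/web-design/</loc>
  </url>
</urlset>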


7️⃣ Media readiness

  • Meaningful alt text on images

  • Descriptive filenames

  • Image dimensions

  • Modern formats (webp, avif)

  • Purpose: ensures images are properly understood and surfaced by AI, search, and social.
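  (Sketch – the filenames and dimensions are placeholders:)

<!-- modern format with a fallback, real dimensions, honest alt text -->
<picture>
  <source srcset="/images/kitchen-remodel-after.avif" type="image/avif">
  <source srcset="/images/kitchen-remodel-after.webp" type="image/webp">
  <img src="/images/kitchen-remodel-after.jpg"
       alt="Remodeled kitchen with white oak cabinets and a quartz island"
       width="1200" height="800" loading="lazy">
</picture>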


8️⃣ Performance & accessibility

  • Core Web Vitals → page load speed, layout stability

  • Accessibility basics → proper alt text, contrast, keyboard nav, landmark roles

  • Purpose: makes your content usable → some AI rankings and SEO systems now take this into account.


9️⃣ NAP consistency & trust signals

  • Name / Address / Phone consistency (on page and in schema)

  • Privacy policy

  • Terms of service

  • About page with real humans or org info

  • Purpose: improves trust signals to AI and search engines.
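  (Sketch – the point is that this JSON-LD matches the name, address, and phone printed on the page, word for word; every value here is a placeholder:)

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "LocalBusiness",
  "name": "Example Studio",
  "telephone": "+1-503-555-0100",
  "url": "https://example.com/",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 SE Example St",
    "addressLocality": "Portland",
    "addressRegion": "OR",
    "postalCode": "97202"
  }
}
</script>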


10️⃣ Monitoring & verification

  • Set up Google Search Console

  • Set up Bing Webmaster Tools

  • Monitor AI surfacing when possible (Perplexity, ChatGPT plugins, etc.)

  • Purpose: track whether your optimizations are working.
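  (Both tools offer a few ways to prove you own the site; the meta-tag route looks roughly like this, with the content values being whatever codes the tools give you:)

<!-- in your <head> -->
<meta name="google-site-verification" content="code-from-search-console">
<meta name="msvalidate.01" content="code-from-bing">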


Optional “cool stuff” layer (if you want to teach it):

  • Testing bot access with curl or Cloudflare analytics

  • Adding SiteNavigationElement schema to help bots understand your nav

  • Using canonical URLs to prevent duplicate content issues
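  (For the canonical piece, it’s one line in the <head> of any page that’s reachable at more than one URL – the href is a placeholder:)

<link rel="canonical" href="https://example.com/services/web-design/">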


That’s your skeleton — ready for you to come back and fill in each part in your voice and tone.


