As a marketer, you’ve invested countless hours adding value to your website. Now imagine a regular visitor drops by to see what’s new and determine what’s worth showing in Google Search.
That visitor? It’s called Googlebot—the crawler responsible for discovering and indexing your content. It scans your pages to decide what should be included in Google Search and how frequently to return for updates.
But here’s the catch…
Googlebot doesn’t have unlimited resources. Every site gets a set crawl budget—an allowance of time and bandwidth Googlebot uses to explore your website.
The more efficiently you use your crawl budget, the easier it is for Googlebot to find and prioritise your most valuable content, helping you rank more effectively.
What is crawl budget (and why does it matter)?
Crawl budget is the limit Googlebot places on the number of pages it’s willing to crawl on your website within a specific timeframe.
Imagine Googlebot has a fixed amount of time and energy each day to explore your site. It browses your pages, deciding what to read and what to skip.
If your website has 10,000 URLs but Googlebot only has the capacity to crawl 2,000 today, it must prioritise. Without direction, it might waste time on low-value or duplicate pages—missing out on your most important updates.
Example:
Let’s say you run an eCommerce site with 6,000 pages. Half of them are just filtered versions or slight variations: colour, size, or near-duplicates.
To users, those variations are useful. But to Googlebot, they’re essentially the same.
While Googlebot is busy crawling:
- /product/red
- /product/blue
- /product/xl
…it might completely miss:
- Your updated homepage
- A new seasonal landing page
- Your latest blog post going viral on social media
Even if the content is live and valuable, it may not be crawled—or indexed—in time, all because your crawl budget was misused.
Crawlability vs crawl budget: What’s the difference?
Though they sound similar, crawlability and crawl budget are two different things, and both are vital.
Without access (crawlability) and priority (crawl budget), even your best pages could go unseen by Google.
1. Crawlability = Access
Crawlability asks a simple question: Can Googlebot access this page?
If the answer is “no”, it doesn’t matter how good the content is—Googlebot will skip it entirely.
For instance, if a page is blocked in your robots.txt, Googlebot treats it like a “Do not enter” sign.
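As a minimal illustration (the /internal-search/ path is a placeholder, not taken from this article), a robots.txt rule that keeps Googlebot out of a section looks like this:

```
# robots.txt – block crawling of a low-value section
User-agent: Googlebot
Disallow: /internal-search/
```

Any URL under /internal-search/ becomes uncrawlable for Googlebot, no matter how valuable the content on it might be.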
2. Crawl budget = Priority and choice
Once a page is crawlable, crawl budget kicks in. It’s now a question of:
“Do I have the time and resources to crawl this page soon?”
Even if the page is accessible, it might be ignored if Googlebot deems it low priority.
Example: A live event page from 2017 that’s still crawlable but outdated.
Googlebot might think, “Not important—I’ll come back later… maybe.”
So the page could remain untouched for months.
TL;DR:
- If a page isn’t crawlable, it won’t be discovered.
- If it’s crawlable but low priority, it might be skipped.
You need both access and prioritisation for Google to see (and index) your key content.
Why crawl budget matters—and when it applies
If Googlebot hasn’t crawled your page, it can’t rank it.
It might not even know the page exists—or worse, display an outdated version in search results.
This can seriously impact visibility. For instance:
- Launch a new product page that isn’t crawled? It won’t appear in search.
- Update prices across service pages but they’re not recrawled? Google might still show the old ones in SERPs.
This is when crawl budget becomes critical.
When crawl budget becomes a concern
While crawl budget affects all sites to some extent, it’s especially important for:
- Large websites (thousands or millions of URLs)
- News and media sites (high publishing frequency)
- eCommerce platforms (product filters, variations, and categories)
If Googlebot can’t keep up, your most crucial or time-sensitive pages may be the ones missed.
What if you run a smaller site?
If your website has fewer than 500–1,000 indexable pages, crawl budget likely isn’t a major issue. Google can typically crawl all areas of smaller sites with ease.
Instead, focus on what’s preventing indexing:
- Pages blocked by noindex or misused canonical tags
- Weak internal linking
- Thin, duplicate, or low-value content
Pro tip: Use the Pages report in Google Search Console to spot excluded URLs and quickly detect indexability problems.
How Google calculates crawl budget
Google uses two primary factors:
- Crawl Demand – How much Google wants to crawl from your site
- Crawl Capacity Limit – How much your server can handle without slowing or breaking
What drives crawl demand?
- Perceived inventory: If your sitemap lists 40,000 URLs but internal links only point to 3,000, Google may assume the rest aren’t important or don’t exist.
- Popularity: Pages with backlinks or engagement get crawled more frequently.
- Freshness: Frequently updated pages signal value, so Google will prioritise revisiting them.
What limits crawling?
- Crawl health: If your site is slow or unstable, Google will crawl less.
- Google’s crawl limits: Google itself sets a cap to avoid overloading your server.
Think of it as:
Crawl Demand × Site Capacity = Crawl Budget
Signals that influence crawl budget
Google doesn’t crawl everything equally. Some signals say “skip this”; others shout “important!”
Key signals include:
- Robots.txt: Tells Google what not to crawl.
- Noindex tags: Tell Google not to index a page, even if it’s crawled (see the snippet after this list).
- Canonical tags: Prevent duplicate pages from consuming budget.
- Sitemap entries: Highlight priority pages you want Google to find.
- Internal linking depth: Pages closer to the homepage are seen as more important.
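To make the tag-based signals concrete, here’s a minimal sketch of how noindex and canonical appear in a page’s <head> (the example.com URL is a placeholder):

```html
<head>
  <!-- Ask Google not to index this page, even if it gets crawled -->
  <meta name="robots" content="noindex">

  <!-- Or: point a duplicate or variant page at the preferred version -->
  <link rel="canonical" href="https://www.example.com/product/">
</head>
```

In practice a given page usually needs only one of the two: noindex to keep it out of the index altogether, or a canonical to consolidate signals onto the preferred URL.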
What wastes crawl budget—and how to fix it
Think of crawl budget like energy. The more you waste on low-value or duplicate content, the less gets spent on valuable pages.
Here are the biggest crawl budget wasters:
1. Duplicate pages
- Problem: Google crawls multiple pages with identical or very similar content.
- Fix: Use canonical tags or set unimportant duplicates to noindex.
2. Broken links and soft 404s
- Problem: Googlebot wastes time crawling non-existent or useless pages.
- Fix: Clean up internal links, redirect removed content properly (see the example below), and update your sitemap.
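As a sketch of what “redirecting properly” can look like, assuming an Apache server with .htaccess overrides enabled (the paths are placeholders):

```apache
# .htaccess – permanently redirect a removed page to its replacement
Redirect 301 /old-product/ /new-product/
```

A permanent (301) redirect tells Googlebot to stop requesting the old URL and pass its signals to the new one, rather than burning crawl budget on a dead end.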
3. Orphan pages
- Problem: Pages that no other page links to, making them hard to find.
- Fix: Link to them from relevant pages or remove them if outdated.
4. Faceted navigation
- Problem: Thousands of filter-based URL combinations dilute crawl budget.
- Fix: Block low-value filter URLs in robots.txt (see the example below) and use canonical tags to point variants at the main category page.
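A sketch of what that robots.txt rule could look like, assuming the filters are applied through query parameters such as ?colour= and ?size= (the parameter names are placeholders for whatever your platform uses):

```
# robots.txt – keep crawlers out of faceted/filtered URLs
User-agent: *
Disallow: /*?colour=
Disallow: /*?size=
```

Google’s robots.txt rules support the * wildcard, so these patterns match any path followed by those query parameters. Be careful not to block parameters that produce genuinely unique, indexable pages.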
How to monitor crawl activity
Use Google Search Console (GSC) for direct insight into Googlebot’s behaviour.
1. Crawl stats overview:
- Go to your GSC property
- Click “Settings” → “Crawl Stats”
You’ll see:
- Total Crawl Requests
- Download Size
- Average Response Time
2. Over-time charts
Visual trends of crawl activity over 90 days. Look for spikes, dips, or changes.
3. Host status
Check whether your server is healthy or struggling to handle Googlebot.
4. Crawl request breakdown
Requests are broken down by:
- Response Code (e.g. 404s, 301s)
- File Type (e.g. HTML, JavaScript, CSS)
- Purpose (Discovery vs Refresh)
- Googlebot Type (e.g. Smartphone vs Desktop)
Red flags:
- Lots of “Discovered – currently not indexed” or “Crawled – currently not indexed” statuses in the Pages report? That could mean crawl budget is being misused.
Pro tip:
Use log file analysis tools like Semrush Log File Analyzer, Botify, or OnCrawl for deeper insights.
Match crawl data against your top-performing pages: if high-converting URLs aren’t being crawled, it’s time to optimise.
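Before reaching for those tools, you can get a rough view from your own server logs. Here’s a minimal sketch, assuming a combined-format access log at access.log (the filename, field positions, and the naive user-agent check are all assumptions for illustration):

```python
from collections import Counter

googlebot_hits = Counter()

# Count how often requests identifying as Googlebot hit each URL
with open("access.log") as log:
    for line in log:
        if "Googlebot" not in line:
            continue
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()  # e.g. ['GET', '/product/red', 'HTTP/1.1']
        if len(request) >= 2:
            googlebot_hits[request[1]] += 1

# Print the 20 most-crawled URLs to compare against your priority pages
for url, hits in googlebot_hits.most_common(20):
    print(f"{hits:6d}  {url}")
```

Keep in mind that user-agent strings can be spoofed; for anything more than a quick sanity check, verify Googlebot requests (for example via reverse DNS) or rely on the dedicated log analysis tools above.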
Final thoughts: Is your crawl budget working for you?
Crawl budget plays a vital role in how your content gets discovered and ranked. When Google prioritises the right pages, your SEO efforts go further.
Start by checking what Google can already see. Use a SERP checker to spot gaps, and take control of how crawl budget gets spent across your site.