We had this problem with Tutorbase where Google stopped indexing our pages. We tried everything with sitemaps and robots.txt, but that wasn't it. Google just needed clearer signals about what was important. We started adding more links between our articles and updating the popular ones with new info. Once people started clicking around on those key pages, Google noticed and came back to check them out more often. For your site, I'd try refreshing a couple of your best articles. It's a quick test to see what gets Google's attention again.
Nothing's more annoying than watching your articles vanish from Google when everything looks fine technically. I see this happen a lot with the sites I host. Usually it's server response time: Google hates slow sites. Check your uptime logs and Core Web Vitals first. One client fixed their server lag and their indexed pages bounced right back. If that's not it, try resubmitting your best articles through Search Console to see if that wakes Google up.
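If you want a quick way to spot slow responses before digging into the Core Web Vitals reports, a rough sketch like this can flag laggy pages. The URL list and the 1.5-second threshold are placeholders, not a recommendation, and this only measures server response time, not real field data from PageSpeed Insights or Search Console.

```python
import time
import requests

# Placeholder list of article URLs to spot-check; swap in your own.
urls = [
    "https://example.com/news/article-1",
    "https://example.com/news/article-2",
]

SLOW_THRESHOLD = 1.5  # seconds; arbitrary cut-off for "worth investigating"

for url in urls:
    start = time.time()
    resp = requests.get(url, timeout=10)
    elapsed = time.time() - start
    flag = "SLOW" if elapsed > SLOW_THRESHOLD else "ok"
    print(f"{flag:4}  {resp.status_code}  {elapsed:.2f}s  {url}")
```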
Yeah, I've seen this happen after big content updates. What fixed it for us was manually checking batches of URLs in Search Console and adding structured data to every single article. When we did that at Superpower, our tech posts started showing up again and our crawl budget wasn't getting wasted. My advice is to get really specific with your schema and use the URL Inspection tool. The problem is rarely just technical; it's about whether Google thinks your content has any authority.
I've seen this de-indexing happen before. It usually comes down to missing structured data or pages that are too slow for Google to crawl. My team found that improving Core Web Vitals, like compressing images and cutting back on scripts, really helps with crawl rates. We always review the NewsArticle schema and resubmit sitemaps after making changes. I'd check your page speed and structured data first. That's your best shot at fixing it.
Adding NewsArticle schema to our healthcare website didn't do much at first. But after we made sure to include the author, date, and full article text every single time, Google started indexing way more content over a few weeks. You also have to watch your page speed. If articles load slowly, Google just won't bother crawling them. Honestly, you need to do both the schema markup and the speed fixes together. That's what actually works.
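For anyone unsure what "author, date, and full article text every single time" looks like in markup, here is a minimal sketch that builds a NewsArticle JSON-LD block in Python. The field values are placeholders and schema.org accepts many more properties (image, publisher, and so on); in a real setup the values would come from your CMS.

```python
import json

# Placeholder article data; in practice this comes from your CMS.
article = {
    "headline": "Example headline",
    "author_name": "Jane Reporter",
    "published": "2024-05-01T09:00:00+00:00",
    "modified": "2024-05-02T10:30:00+00:00",
    "body": "Full article text goes here...",
}

# Minimal NewsArticle structured data with author, dates, and body.
news_article = {
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    "headline": article["headline"],
    "author": {"@type": "Person", "name": article["author_name"]},
    "datePublished": article["published"],
    "dateModified": article["modified"],
    "articleBody": article["body"],
}

# Emit the script tag to embed in the page's <head>.
print('<script type="application/ld+json">')
print(json.dumps(news_article, indent=2))
print("</script>")
```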
From running big content sites, I've seen missing or broken schema, like NewsArticle, tank your indexation. Just tweaking our sitemap didn't do much at first. But once we added detailed structured data and made sure pages loaded fast, things picked up. Google crawls fewer articles if they're slow, so run a speed audit. You need to get both the technical signals and the content right. That's where I'd put my focus.
Based on my experience recovering from a similar issue on a news website, you may want to consider strategically no-indexing your thin content articles rather than just modifying or deleting them. This approach helped restore search rankings when dealing with mass de-indexing problems. The key is being selective about which articles to no-index based on content quality and value. This strategy successfully resolved the indexing issues and brought rankings back within a couple of months.
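If you go the selective noindex route, the directive itself is just a meta tag or an HTTP header. As a rough sketch only (the Flask app and the path list are assumptions for illustration, not how any particular news CMS works), you can serve an X-Robots-Tag header for the thin pages you want dropped from the index while keeping them live for readers:

```python
from flask import Flask, request

app = Flask(__name__)

# Hypothetical set of thin-article paths chosen for noindexing.
NOINDEX_PATHS = {
    "/news/old-press-release",
    "/news/two-line-market-update",
}

@app.after_request
def add_noindex_header(response):
    # "X-Robots-Tag: noindex" asks Google to drop the page from its index;
    # the in-page equivalent is <meta name="robots" content="noindex">.
    if request.path in NOINDEX_PATHS:
        response.headers["X-Robots-Tag"] = "noindex"
    return response

@app.route("/news/<slug>")
def article(slug):
    return f"Article body for {slug}"

if __name__ == "__main__":
    app.run()
```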
News sites often publish fast content which makes it easy for Google to mark pages as low value. When articles do not offer fresh insights the crawler may see them as less useful. Adding more depth to major stories can lift the overall quality of the site. Clear headlines and simple summaries can also help guide search engines more effectively. Google also reviews the health of the site and assesses the strength of each section. If many parts have thin stories the crawler may reduce its attention. Removing older articles that no longer hold value can help build stronger signals. A steady plan for content growth and internal linking helps crawlers find pages worth bringing back.
A news site with many pages removed from search may be sending weak quality signals. Google wants to see originality in every story, so short stories that feel too similar can slip out of the index. Adding clear insights and simple context helps each page stand out in a crowded space. Once pages stand out like that, the system starts to trust the site and keeps the content indexed for longer. Crawl budget pressure also affects indexing on large news sites. Many pages can stretch what Google wants to crawl in a day, which slows the process. A strong internal linking plan helps crawlers reach the most important stories first. Removing repeated or low-value articles lifts the overall quality and supports better indexing over time.
When you see a massive drop in indexed pages despite perfect technical configurations, you are almost certainly facing a quality classification issue rather than a technical error. In building large-scale data retrieval systems, we design filters to prioritize signal over noise to save computational cost. Google does the same. The status you are seeing essentially means their systems analyzed your content and decided it did not meet the threshold to be stored in their index. You cannot fix this by tweaking a sitemap or a robots file because the door is not locked. The system simply does not want what you are bringing inside. The mistake here is treating a content problem like a code problem. You mentioned fixing thin content, but in the era of AI-driven search, thin does not just mean short word counts. It refers to a lack of information gain. If your news articles merely aggregate facts found elsewhere without adding unique data, expert analysis, or original reporting, modern algorithms categorize them as redundant. When the ratio of low-value to high-value pages tips too far, the entire domain suffers. The system stops trusting the source, so it stops indexing pages to conserve resources. I remember a project where we were ingesting millions of documents to train a specialized language model. We spent weeks debugging the ingestion pipeline, thinking the code was broken because the model performance remained flat. Eventually, we realized the pipeline was fine, but the source data was derivative and circular. We had to purge forty percent of our dataset to get the model to actually learn. Search engines operate on similar logic. You do not need more technical fixes. You likely need to prune the dead weight and demonstrate that your remaining content offers something no one else has. Sometimes the only way to grow is to cut back until quality is undeniable.
When a news site drops from 1,500 indexed articles to around 100 in a week, it's almost never a robots.txt or sitemap issue. What I've seen in these cases is that Google is throttling indexation because it does not trust the site's overall quality signals. News sites get hit hardest because Google evaluates them as a whole, not page by page. If too many thin or low-value posts pile up, the crawler stops promoting new URLs into the main index. The real issue is sitewide quality, not technical settings. I've seen this happen when a site pushes lots of short rewrites, repeats the same topics, or has weak E-E-A-T. Google crawls, but refuses to index to protect SERP quality. You also need strong author pages, clean category structures, and internal links that connect every article to related stories. The quickest fix is to unpublish thin posts, improve your best-performing categories, and release a batch of truly original articles. That usually restores trust and brings indexation back.
In my experience, when a news site with 1,500+ articles suddenly drops to around 100 indexed pages, it's almost never your robots.txt or sitemap. What I've seen with publishers is that the 'Crawled, currently not indexed' issue usually signals a sitewide quality problem. Google slows indexing when too many URLs look thin, repetitive, or low-engagement, even if the technical setup is clean. For one client, nearly half the archive was under 250 words and reused the same templates. We deleted weak posts, merged overlapping stories, and fully rebuilt the top 200 evergreen articles with stronger headlines, authorship, and internal links. Indexing started returning within four weeks. My advice is to stop making isolated page fixes and run a full quality audit. Prune aggressively and strengthen your best categories so Google can see a consistent, high-quality signal again. That's usually what unlocks indexing.
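If you want to run that kind of word-count audit before pruning, a rough sketch like the one below can surface the shortest pages. The sitemap location and the 250-word cut-off are assumptions taken from the example above, and the text extraction is deliberately crude; a CMS export would be more reliable than scraping.

```python
import re
import requests
from xml.etree import ElementTree

SITEMAP_URL = "https://example.com/sitemap.xml"  # assumption: one flat sitemap
MIN_WORDS = 250  # the threshold mentioned in the example above

def visible_text(html: str) -> str:
    # Crude extraction: drop script/style blocks and tags, keep the rest.
    html = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)
    return re.sub(r"<[^>]+>", " ", html)

sitemap = ElementTree.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [el.text for el in sitemap.iter("{http://www.sitemaps.org/schemas/sitemap/0.9}loc")]

for url in urls:
    page = requests.get(url, timeout=10)
    words = len(visible_text(page.text).split())
    if words < MIN_WORDS:
        print(f"THIN  {words:5d} words  {url}")
```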
When you see "Crawled - Currently Not Indexed" even after optimizing content and fixing technical issues, it usually means Google's quality signals are flagging your site. At this stage, it's important to seriously evaluate your site's overall Experience, Expertise, Authoritativeness, and Trustworthiness (EEAT) as Google may see many of your articles as too similar or lacking unique value compared to competitors. Focus on making your best 100 indexed articles stand out with unique, valuable content, and strengthen internal linking to improve their authority. Also, ensure your site offers a great user experience with no annoying ads or mobile issues. This is a slow recovery process that requires proving your site's real value consistently over time. Keep refining and be patient, you'll see better indexing as your site quality shines. If needed, clear out thin, duplicate, or low-value content, use canonical tags properly, and request re-indexing for improved results. This approach aligns with how Google assesses which pages deserve to rank and appear in search results.
When a news site suddenly drops from thousands of indexed pages to only a few hundred, the real issue is almost always sitewide quality, not technical settings. Even if Google crawls your pages, it won't index them if too many articles look similar, thin, or disconnected. The fix that works best is cleaning up the structure: stronger internal links, fewer orphan pages, and a focus on your top categories. We also improve a small group of key articles with deeper insights and remove low-value tag pages. This combination usually brings indexing back, because Google sees clearer signals of quality and a healthier content structure.
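One way to spot orphan pages is to compare the URLs in your sitemap against the internal links that actually appear on your pages. This is a minimal sketch under assumptions (a single flat sitemap at a guessed location, crude href extraction with a regex), not a full crawler:

```python
import re
from urllib.parse import urljoin
import requests
from xml.etree import ElementTree

SITE = "https://example.com"          # assumption: your domain
SITEMAP_URL = f"{SITE}/sitemap.xml"   # assumption: one flat sitemap file
LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"

sitemap = ElementTree.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
sitemap_urls = {el.text.rstrip("/") for el in sitemap.iter(LOC)}

# Collect every internal link that appears on the pages we know about.
linked = set()
for url in sitemap_urls:
    html = requests.get(url, timeout=10).text
    for href in re.findall(r'href="([^"]+)"', html):
        absolute = urljoin(url, href).split("#")[0].rstrip("/")
        if absolute.startswith(SITE) and absolute != url:
            linked.add(absolute)

# Sitemap URLs that no other page links to are orphan candidates.
for orphan in sorted(sitemap_urls - linked):
    print("ORPHAN?", orphan)
```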
The misunderstanding here is treating this as an accessibility problem, when Google is actually telling you that roughly 90% of your content did not meet its E-E-A-T quality threshold. Therefore, the first thing you must do is stop modifying your website and implement a Strategic Content Consolidation and Pruning Plan. This plan entails identifying low-value, duplicate, or unoriginal content and significantly revamping those articles or removing them from the index entirely (with a 410 status or a noindex directive). Second, focus on strengthening the ranking of the remaining 100 articles by building a solid internal linking structure that directs link equity to those high-quality pages. By providing your audience with a high concentration of valuable information, you signal to Google's algorithms that your site is an authoritative resource, which raises the share of your content that clears the indexation threshold.
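To illustrate the 410 side of that plan, here is a small sketch. Flask is just a stand-in for illustration; most news CMSs and web servers have their own way to return a status code for retired URLs, and the pruned paths below are hypothetical.

```python
from flask import Flask, abort

app = Flask(__name__)

# Hypothetical list of pruned article paths that should return 410 Gone,
# signalling the content was removed deliberately and permanently.
GONE_PATHS = {
    "/2019/03/syndicated-wire-rewrite",
    "/2020/07/two-paragraph-update",
}

@app.route("/<path:slug>")
def article(slug):
    if f"/{slug}" in GONE_PATHS:
        abort(410)  # 410 Gone: a stronger removal signal than a plain 404
    return f"Article body for {slug}"

if __name__ == "__main__":
    app.run()
```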
"Crawled - currently not indexed" affecting hundreds or thousands of pages in a short time is rarely caused by a technical issue. Instead, it signals a site-wide quality or trust problem from Google's perspective, especially for news websites. News publishers are heavily evaluated on EEAT (Experience, Expertise, Authoritativeness, Trustworthiness). If Google doesn't see strong EEAT signals, large amounts of your content may stop being indexed. This often happens when articles are very similar to existing content online, rely heavily on AI generation, are published too quickly without depth or are too thin to provide unique value. You also need to consider publishing volume and speed. Publishing a high number of articles too quickly can overwhelm your crawl budget. When this happens on a site that Google already views as low-quality, indexing can drop even further. Internal linking also plays a role. Pages with weak internal links, no contextual links or that sit too deep in pagination are much more likely to remain non-indexed. If your site is losing indexed pages rapidly, the issue is almost always related to overall content quality, trust signals and discoverability.
"Crawled currently not indexed" at that scale usually points to a quality and duplication problem, not a technical one. When a news site with 1,500 articles drops to around 100 indexed, Google has likely re evaluated the overall value of the archive after a core or helpful content update. I would stop tweaking robots, sitemap, and minor on page elements and instead run a brutal audit in the style that John Mueller often hints at. Remove or noindex tag pages, thin rewrites, near duplicates, syndicated pieces, and low value archives that add nothing beyond what large publishers already cover. Next, tighten your internal linking so that your best stories form clear topical hubs. For example, create strong pillar pages around key beats and link your most original coverage into those rather than letting everything hang off date based archives. In Google Search Console, pick 20 high quality articles, improve them with unique reporting, original images, and clear bylines, request indexing, and watch whether they come back. If even your best work is not being indexed, you have a site level trust problem and need to focus on E E A T signals. That means consistent author profiles, transparent masthead, clear ownership, and visible editorial standards, not just technical tweaks. I have seen news publishers recover only after cutting a large portion of their inventory, consolidating overlapping stories, and publishing fewer but more differentiated articles. The goal is to convince both Google and users that every indexed page deserves to exist. Once the quality bar is obvious across the site, indexation begins to climb again and the "crawled currently not indexed" warnings start to fade.
Been there. A publishing client of ours experienced a sudden decline that turned out to be trust-related rather than technical. Google had updated its quality assessment system, which ended up flagging a large amount of AI-generated or low-quality archive content. Fixing robots.txt, sitemaps, and meta tags won't help if Google now sees your bulk pages as unhelpful or lacking authority. What worked for us was updating twenty essential articles through thorough human rewriting, adding genuine information, real images, and direct quotes. At the same time, the team removed over 300 pages that hadn't seen traffic in the last six months. After about three weeks, Google Search Console began reprocessing the content. The key was to reduce the site to only essential, valuable content, improve E-E-A-T on author pages, and build trust with authentic work. It took real editorial effort to restore trust; technical fixes alone weren't enough.
When a news website with hundreds of articles suddenly faces mass de-indexing, the issue often goes beyond technical fixes like robots.txt or sitemaps. The "Crawled Currently Not Indexed" status in Google Search Console usually signals a quality or trust problem rather than a crawling error. From my experience in SEO consulting, here are the key areas you may be missing:
- Content Quality & Originality: News sites are vulnerable to thin or duplicate content. If many articles are short rewrites of syndicated stories, Google may deprioritize them. Expanding coverage with unique insights, analysis, or multimedia can help.
- Site Authority & E-E-A-T: Google evaluates expertise, author transparency, and trustworthiness. Ensure author bios, clear sourcing, and consistent editorial standards are visible.
- Internal Linking & Crawl Signals: Even if your sitemap is correct, weak internal linking can make articles appear isolated. Strengthen category pages and topical hubs to signal relevance.
- Indexing Limits & Crawl Budget: Large sites can hit crawl budget constraints. Focus on pruning low-value pages (tag archives, duplicate categories) so Google prioritizes your core articles.
- Technical Health: Check for slow load times, mobile usability issues, or accidental noindex tags. Even small errors can cascade across thousands of pages.
The key takeaway: indexing is not guaranteed. Google rewards sites that demonstrate consistent value, authority, and technical health. Beyond fixes, invest in editorial depth and trust signals to regain indexation.
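For the "accidental noindex tags" point in particular, a quick spot-check script can catch directives that slipped into templates. The URL list is a placeholder (exporting affected URLs from Search Console would be the practical source), and the regex is a crude check that only looks at the robots meta tag and the X-Robots-Tag header:

```python
import re
import requests

# Placeholder list; in practice, feed in URLs exported from Search Console.
urls = [
    "https://example.com/news/article-1",
    "https://example.com/category/politics",
]

META_ROBOTS = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)

for url in urls:
    resp = requests.get(url, timeout=10)
    findings = []
    header = resp.headers.get("X-Robots-Tag", "")
    if "noindex" in header.lower():
        findings.append(f"header: {header}")
    match = META_ROBOTS.search(resp.text)
    if match and "noindex" in match.group(1).lower():
        findings.append(f"meta: {match.group(1)}")
    if findings:
        print(f"NOINDEX  {url}  ({'; '.join(findings)})")
```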
From my side, this kind of crash usually means it is not a sitemap or robots issue anymore. I read "crawled currently not indexed" as Google saying: I can see it, but I do not trust or need it. If a lot of articles feel thin, repeated, or too similar to other sites, the whole domain loses index priority. Then I look for silent bloat that drags everything down. Tag pages, author archives, pagination, and URL variants can create thousands of low-value URLs that get crawled first. That dilutes the site and makes even good stories look less important. I also check for near-duplicate posts on the same topic instead of one strong updated article. Next, I would clean and refocus instead of tweaking more tech. I would pick a few hundred of the best posts, deepen them with real value, and link them from strong category hubs. I would noindex or merge low-value archive pages and thin updates so the crawl budget goes to real articles. In my experience, that quality-and-pruning combo is what brings pages back.