Hi there, here are my takes on your questions:

1. Infrastructure Capacity
The infrastructure can handle the volume; the economics can't. HTTP was built for human-paced requests, and agents generate 100-1000x that volume with parallel requests. CDNs already process trillions of requests, so the constraint is cost, not technical capacity. Human traffic costs pennies per thousand requests; agent traffic runs into pounds.

2. Bottleneck Hierarchy
Individual websites are the primary constraint. APIs impose rate limits and search engines can absorb the volume, but a standard WordPress install on shared hosting falls over under 10 concurrent agent sessions.

3. Bot Detection Impact
Current failure rates are 40-60% for first-attempt agent scraping on major sites. Agents pivot to "authorised sources only" - hitting partner APIs rather than the open web. Real-time reasoning becomes "works with integrated platforms", not universal access.

4. Accuracy Under Restrictions
Most agents fall back on cached page versions - potentially days, weeks, or months old. They won't flag confidence scores on restricted data, so they serve outdated cached content without any indication. I encounter this constantly with my daily newsletter automation. (There's a rough sketch of the staleness-flagging I'd like to see after this list.)

5. Indexing Changes
We're seeing a split: an agent-readable web (structured, API-based, paid) versus the human web (JavaScript-heavy, ad-supported). Search engines keep separate agent indices. Google's AI Overview pre-negotiates bulk access agreements rather than crawling.

6. Technical Workarounds
Solutions include request pooling, federated caching, micropayments (Cloudflare is developing this), and granular robots.txt controls (example after this list). Most viable: proxy aggregation services that batch agent requests while maintaining compliance.

7. Power vs Standards
It's an economic negotiation, not a technical one. Structured formats exist, payments work, authentication is solved. Facebook won't give agents free access to data it sells for millions. The standards discussion masks the real issue: pricing and access control.

8. Open Web Changes
The open web becomes the showroom; the actual data transactions move to private APIs and MCP. It's a shift from public library to data marketplace. The interesting point will come when users start to distrust AI Overview results after discovering outdated information, transforming search behaviour.
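To make point 4 concrete, here's a minimal sketch of the behaviour I'd rather see: when the live fetch is blocked, serve the cached copy but attach an explicit age and a reduced confidence score instead of presenting it as fresh. This assumes a Python agent using the requests library; the function name and the decay heuristic are mine, not any existing framework's.

```python
import time
import requests  # assumed HTTP client; any equivalent works

CACHE = {}  # url -> (fetched_at_unix, html)

def fetch_with_staleness(url, max_age_hours=24):
    """Try a live fetch; on failure fall back to cache, but say so.

    Returns (content, metadata) where metadata always carries
    'source', 'age_hours' and a rough 'confidence' score so the
    caller can surface staleness instead of hiding it.
    """
    try:
        resp = requests.get(url, timeout=10)
        resp.raise_for_status()
        CACHE[url] = (time.time(), resp.text)
        return resp.text, {"source": "live", "age_hours": 0.0, "confidence": 1.0}
    except requests.RequestException:
        if url not in CACHE:
            return None, {"source": "none", "age_hours": None, "confidence": 0.0}
        fetched_at, html = CACHE[url]
        age_hours = (time.time() - fetched_at) / 3600
        # Crude heuristic: confidence decays as the cached copy ages
        # past the freshness window. The exact curve is arbitrary.
        confidence = max(0.1, 1.0 - age_hours / (max_age_hours * 4))
        return html, {"source": "cache",
                      "age_hours": round(age_hours, 1),
                      "confidence": round(confidence, 2)}
```

The point isn't the decay curve; it's that the age field exists at all, so a downstream summary can say "as of three days ago" instead of implying freshness.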
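And on point 6, granular robots.txt controls are the cheapest lever site owners already have: different rules per user agent. Crawler tokens like GPTBot, Google-Extended and CCBot are publicly documented, but treat the exact list and paths here as illustrative rather than a recommendation.

```
# robots.txt - separate rules for AI crawlers vs. ordinary bots
# (verify current crawler names before relying on them)

User-agent: GPTBot
Allow: /public/
Disallow: /premium/

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# Everyone else: normal crawling with a polite delay
User-agent: *
Crawl-delay: 10
Disallow: /admin/
```

It's voluntary compliance, of course, which is exactly why the pricing question in point 7 matters more than the syntax.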
I've seen APIs stall and websites slow down when they get heavy automated traffic, and those waves were much smaller than what autonomous agents will send out. The current web isn't designed for that kind of load, so it will have to change. More data is moving behind APIs because that helps keep servers steady, but it also chips away at the idea of an open web.

The weakest points are usually on individual websites. Most sites aren't built for constant machine requests, so they answer with captchas, throttling, or bot filters. Search engines have stronger setups, but they still limit crawling to protect bandwidth. APIs can take more load because they are controlled and metered, but that also means whoever owns the data decides who gets access.

When scraping or direct traffic gets blocked, AI agents lose reliability right away. They often fall back on cached or older data, so there's a lag between real events and what people see. Some systems mix a few live queries with stored data to smooth it out, but once access closes off, accuracy drops. That lag will only grow as more sites tighten entry.

Indexing is already changing. The old style of crawling and keeping everything open is fading. Private datasets and gated APIs are taking its place. The agents that last will be the ones built to work within those walls instead of trying to break them.

Engineering tricks like caching, batching, and routing can make traffic easier to handle (there's a rough batching sketch below), but the real issue comes down to control. Site owners want to manage their data and limit costs, while agents need access to stay useful. What is forming is a divided web, with a smaller open layer on one side and bigger private ecosystems on the other. That divide isn't far off. It is already starting.
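To put the batching point into something concrete, here's a minimal sketch of request pooling: agents asking for the same URL within a short window share one upstream fetch, so the origin site sees one request instead of fifty. The RequestPool class, the five-second window and the use of Python's requests library are all assumptions for illustration, not a description of any existing proxy service.

```python
import threading
import time
import requests  # assumed HTTP client

class RequestPool:
    """Coalesce duplicate agent requests so the origin sees fewer hits.

    If several callers ask for the same URL within `window_seconds`,
    only the first triggers a real fetch; the rest reuse the result.
    (A production version would also de-duplicate in-flight fetches.)
    """

    def __init__(self, window_seconds=5.0):
        self.window = window_seconds
        self._lock = threading.Lock()
        self._results = {}  # url -> (fetched_at, text)

    def get(self, url):
        with self._lock:
            hit = self._results.get(url)
            if hit and time.time() - hit[0] < self.window:
                return hit[1]  # reuse the recent shared fetch
        text = requests.get(url, timeout=10).text  # single upstream request
        with self._lock:
            self._results[url] = (time.time(), text)
        return text

pool = RequestPool(window_seconds=5.0)
# Fifty agents requesting the same page within five seconds produce
# one origin request; the other forty-nine are served from the pool.
```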