Why You Need to Monitor Legitimate Bot Traffic
Identifying and preventing malicious bot traffic has long been mission critical. According to Imperva’s 2025 Bad Bot Report, automated traffic accounted for 51% of web traffic in 2024. As teams continue to focus on bad bots and their impacts, they’ll also need to consider so-called “good” bots such as agents and crawlers. These bots account for more than 25% of all bot traffic, and they’re interacting with your web properties and shaping business outcomes in ways you might not even realize.
Search engine crawlers index your content while AI-powered assistants scrape your pages to train language models. Meanwhile, agentic AI is on the cusp of further transforming the internet. Some of these bots drive revenue to your business while others extract value without any return. The difference between the two can mean millions of dollars in lost revenue or missed opportunities.
The challenge is no longer just about detecting and mitigating bad bot behavior. You also need to understand what legitimate bots are actually doing on your site, how their behavior is changing over time, and whether your current bot policies are serving your business objectives. You need to improve flows and maximize cost efficiency for bot traffic that benefits your business, all while determining how to mitigate bot traffic that isn't necessarily malicious but nonetheless damages the bottom line. Comprehensive visibility into these bot traffic patterns is already a requirement, and it will become even more imperative in the age of agentic AI.
Shifting from Bot Blocking to Bot Strategy
Traditionally, bot management has been a binary decision: allow or deny. Search engine bots are allowed because they drive traffic, while spam bots are blocked because they degrade service. Malicious behavior is blocked regardless of whether it comes from humans or bots. The rules used to be fairly straightforward, even if distinguishing humans from bots that conceal their origins has long been more complex.
However, AI has complicated this equation, and not just because AI is a powerful tool for malicious actors who launch volumetric attacks while concealing their bots' origins. Crawlers collecting data for training models, including LLMs, agentic AI, and other machine learning use cases, consume massive amounts of content. For some organizations, AI bots represent pure extraction: enterprises scrape content to serve their own LLMs and users without returning value. For others, the value proposition remains unclear.
Consider a service like OpenAI's ChatGPT. Users are increasingly turning to LLMs to build awareness and evaluate products, and generative engine optimization (GEO) is quickly becoming as important as traditional SEO. But with LLMs answering questions and engaging in full conversations, users will spend less time on your site engaging with your content, and you have considerably less control over how they interact with your brand. What happens when your intellectual property is repurposed? Or when your logos and brand identity end up in image generators?
This is just the tip of the iceberg. As agentic AI services interact with your web properties on behalf of human users, they can be used to purchase your services, but they can also engage with and recommend competitors, misrepresent your content, and otherwise use your content in unintended ways.
The executives responsible for bot strategy, such as CISOs, CTOs, and Chief Revenue Officers, face a fundamental challenge. They need to move beyond reactive blocking and develop informed, organization-wide bot policies that harness the benefits of “good” bots while mitigating negative impacts. However, most bot management products aren't designed to support long-term strategic decision-making. They're focused on operations, designed to mitigate bad bot behavior, and typically include short retention periods, often thirty days or less. This makes it nearly impossible to see long-term patterns, evaluate whether your strategy is working, and prepare for a future shaped by agentic AI.
Real-World Use Case: Should You Protect or Share Content to Increase Revenue?
Let’s take a look at a prominent example: publishers and media outlets. AI bots represent an existential threat to their business model. Traditional search engines indexed content and referred users back to publishers through search results. This drove traffic, which generated advertising revenue, subscriptions, and conversions.
In contrast, AI bots scrape content to train models or generate responses, ultimately competing with publishers for audience attention. Users consume this content through an LLM like ChatGPT or Claude instead of visiting the web properties of the original outlet or publisher. Lower traffic means less lucrative advertising, fewer subscriptions, and less revenue.
Leaders responsible for generating revenue are tasked with protecting these revenue streams while maintaining beneficial relationships with traffic-driving platforms, whether that’s a traditional search engine or an LLM. They need to distinguish between “benign” bots that drive value and bots that extract it. This requires visibility into which bots are accessing their properties, what content they're consuming, how frequently they're crawling, and whether they're respecting crawl policies like robots.txt.
As an example, The New York Times and Chicago Tribune recently sued Perplexity, an AI search engine, as AI companies test the boundaries of fair use and extract as much content as possible from publishers. The challenge for leaders at these enterprises is immense: do they block this traffic, engage in costly legal battles that may not succeed, or otherwise limit how their content is accessed? Or do they sign licensing agreements and create more permissive policies for bots and agents? Both approaches could lead to lost revenue and unexpected costs, and there is a wide spectrum between these extremes. Leaders can only make these decisions if they have long-term data about bot behavior to better understand trends, build support for legal cases, and update bot management policies. Regardless of the approach, the enterprises that attempt to do business as usual without adapting will be left behind.
The Strategic Visibility Gap: Long-Term Bot Insights
It’s not just publishers and content creators. Executives across industries are grappling with AI bot traffic. Unlike bots that clearly engage in malicious behavior, AI bots often occupy an ambiguous space. Neither the harms nor the benefits are fully clear, and this ambiguity makes strategic decision-making difficult.
Consider all the stakeholders that need to be involved in bot policy decisions. Marketing teams need to worry about SEO, overall traffic, and now GEO. Revenue teams focus on monetization opportunities. Legal departments consider intellectual property rights. Product and UX teams want to optimize user experience and site performance. Security teams need to assess and mitigate risk. Meanwhile, ops and financial teams are concerned about infrastructure costs. If a quarter of your bot traffic is “good” bots hoovering up content, that can lead to major CDN and cloud provider bills.
How do you bring all these stakeholders together, meet their needs while building consensus, develop a clear yet flexible policy that must continue to evolve, and prove that your policy is working? It all starts with the data.
And this is where most bot management solutions fall short. They're designed for operators who need real-time visibility to respond to threats, not executives who need to understand historical context and long-term trends. The thirty-day lookback window common to most bot management products isn't sufficient for strategic planning. You can't understand seasonal patterns, validate long-term policy success, or track how bot behavior evolves in response to your actions with such limited data.
One business might want to identify AI bot traffic to understand the scale of the problem before introducing any major changes. Another might want to identify potential security risks associated with AI bot exposure. And some businesses are already experimenting with bot-specific responses, using edge workers to serve different content to bots than to human users. Each of these strategies requires different kinds of visibility and analysis. All require an understanding of how AI bots are interacting with your content.
Building an AI Bot Strategy: The Questions You Need to Answer
Strategic bot management means being able to answer specific questions about your traffic over an extended period of time. Here are just a few of the questions you need to be able to answer:
- What percentage of your overall traffic comes from bots (both good and bad)?
- How has that percentage changed over the past year?
- What proportion of “benign” bot traffic is AI-related versus traditional search crawlers? (GEO vs SEO)
- Are bots being served cached content efficiently, or are they creating unnecessary load on your origin servers?
- As agentic AI becomes more common, how are agents interacting with your web content?
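Several of these questions reduce to straightforward log analysis. Here's a minimal sketch in Python, assuming you can extract user-agent strings from your access logs; the substring patterns and category names below are purely illustrative, and real bot identification should also verify published IP ranges, since user agents are easily spoofed:

```python
from collections import Counter

# Illustrative substrings only -- not a definitive bot taxonomy.
BOT_CATEGORIES = {
    "search": ["Googlebot", "Bingbot", "DuckDuckBot"],
    "ai": ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot"],
    "social": ["facebookexternalhit", "Twitterbot"],
    "monitoring": ["UptimeRobot", "Pingdom"],
}

def categorize(user_agent: str) -> str:
    """Map a user-agent string to a bot category, or 'human/other'."""
    for category, patterns in BOT_CATEGORIES.items():
        if any(p.lower() in user_agent.lower() for p in patterns):
            return category
    return "human/other"

def traffic_breakdown(user_agents: list[str]) -> dict[str, float]:
    """Return each category's share of total requests as a percentage."""
    counts = Counter(categorize(ua) for ua in user_agents)
    total = sum(counts.values())
    return {cat: 100 * n / total for cat, n in counts.items()}
```

Running a breakdown like this over monthly slices of a year of logs is one way to see how the bot share of your traffic, and the AI-versus-search split within it, trends over time.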
When it comes to specific vendors, these questions quickly become even more nuanced.
- How is request volume changing over time for different AI companies?
- Are certain bots crawling more aggressively than others?
- How frequently are they re-crawling your properties: daily, weekly, or monthly?
- Are they requesting crawl policy files like `robots.txt` and respecting those policies, or are your policies being ignored?
- Are agents from other enterprises delivering or extracting value?
The ability to segment benign bot traffic by category, such as search engines, social media preview bots, AI crawlers, and monitoring services, will help you craft more nuanced policies. You might welcome some agents that provide business value while restricting others. You could share some AI-optimized content with crawlers while restricting access to content intended for human users. Or you could maximize the value of licensing agreements by allowing AI companies that have negotiated licensing agreements while blocking others. All of these distinctions require granular visibility.
The Cost of Serving Content to Bots
Bot traffic comes with a cost. Every request consumes bandwidth, processing power, and often edge resources or origin server capacity. For organizations with usage-based CDN pricing, bot traffic directly impacts costs, and legitimate bots can run up infrastructure bills and consume bandwidth. Beyond the denial-of-service attacks your security team handles, you'll also have to manage legitimate bots that simply consume too much of your capacity.
To manage costs, you need a complete understanding of the following:
- Total requests from bots (by service and type)
- Content being requested
- Cache hit rates for bots (versus human traffic)
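Given CDN logs, the cache metric above can be computed per bot. A minimal sketch, assuming each log record carries a `bot` label and a `cache_status` field of `"HIT"` or `"MISS"`; both field names are illustrative and should be adjusted to your CDN's actual log schema:

```python
from collections import defaultdict

def cache_hit_rates(records: list[dict]) -> dict[str, float]:
    """Per-bot cache hit rate (0.0-1.0) computed from log records."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for rec in records:
        totals[rec["bot"]] += 1
        if rec["cache_status"] == "HIT":
            hits[rec["bot"]] += 1
    return {bot: hits[bot] / totals[bot] for bot in totals}
```

A bot with a persistently low hit rate is hitting your origin servers, and that is where the unexpected bills come from.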
For example, if bots have a high cache miss rate and often retrieve large media files from origin servers, you'll end up with higher bills and minimal ROI. And it's not just about improving the cache hit rate to reduce your bill. It's about quantifying the ROI of that bot traffic just as you would with human users. You can't fully understand the return you're getting if you don't have a complete picture of the cost.
Building Bot Policies Designed to Evolve
The bot landscape is constantly changing. New AI companies launch while existing companies release new models. And just about every enterprise is adopting AI policies and trying to understand how they can harness the power of agentic AI. Effective bot strategy requires continuous monitoring and adaptation, not just as the landscape evolves but also as your own business priorities shift.
Once again, you need long-term data. In addition to helping you create new policies, you can use that data to understand if your updated policies are leading to the right outcomes. If you block a specific bot, you should be able to confirm that it's no longer accessing your content, just as you would with malicious bots. If you make architectural changes to your services, you should be able to verify that search engine bots can still reach and index your pages effectively. If you negotiate licensing agreements with an AI company, you should be able to track whether their crawling patterns change in response. And if you partner with enterprises using agentic AI to drive users to your business, you should be able to measure their success.
Long-term policy enforcement matters too. A bot that respects your `robots.txt` files initially might start ignoring them months later, or monitoring may reveal that rules need to be added or removed. An AI company that promised to limit crawling frequency might gradually increase it. Without continuous visibility over time, these changes go unnoticed until they become problems.
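Compliance of this kind can be spot-checked with Python's standard-library `urllib.robotparser`, replaying the paths a bot actually fetched against your own `robots.txt`. The rules, user agents, and paths below are illustrative placeholders:

```python
from urllib.robotparser import RobotFileParser

# Example policy: keep GPTBot out of a hypothetical /premium/ section.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /premium/

User-agent: *
Allow: /
"""

def violations(robots_txt: str, user_agent: str,
               fetched_paths: list[str]) -> list[str]:
    """Paths a bot fetched that your robots.txt disallows for its user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in fetched_paths if not parser.can_fetch(user_agent, p)]
```

Running this over successive months of per-bot request logs surfaces the drift described above: a bot whose violation list was empty in January and isn't in June has stopped respecting your policy.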
Evolving Your Bot Strategy
Monitoring and understanding legitimate bot traffic is a crucial facet of a larger strategy. The need for tools that detect malicious bot behavior is greater than ever. Mitigating and blocking this malicious activity will remain mission critical, especially as bot attacks increase in volume and become more sophisticated. Failure to mitigate these bots will lead to negative outcomes, such as brand damage, breaches, and lost revenue.
But you also need a deep understanding of so-called benign bot traffic. The impact of this traffic is more nuanced and ambiguous: sometimes it should be blocked, sometimes encouraged. Failure to understand this seismic shift will lead to negative outcomes and missed opportunities, because it's no longer as simple as allow or deny. The next generation of bots and agentic AI can help transform your business and lead to new revenue streams.
Organizations that master strategic bot management will have a competitive advantage. They'll protect revenue streams while maintaining beneficial partnerships. They'll optimize infrastructure costs by serving content efficiently. They'll make informed decisions about which bots to welcome, which to monetize, and which to block. And they'll adapt as the bot landscape evolves, rather than being caught off-guard by changes in bot behavior or new categories of automated traffic.
The first step in this journey isn't blocking bots or negotiating licensing deals. It's gaining comprehensive visibility into what legitimate bots are actually doing on your properties. That means not just today or the last thirty days, but over the last year and beyond. And if you don’t have that data yet, start gathering it today. A year from now, you’ll be better positioned for success.
Hydrolix’s newest solution, Bot Insights, provides security operations teams, threat hunters, marketing organizations, web teams, executives, and delivery teams with dynamic visibility into traditional, malicious, and AI-driven bot and agentic behavior.


