Security analyst monitoring SIEM dashboard in modern control room with multiple screens
Published on 17 May 2024

The constant flood of SIEM alerts isn’t a rule-tuning problem; it’s an architectural failure stemming from indiscriminate data collection.

  • Default SIEM configurations are inherently noisy, generating thousands of low-value alerts that mask real threats.
  • Treating log storage as a cost centre, not an investment, forces teams to make difficult trade-offs between detection capability and budget.

Recommendation: Shift from a ‘collect everything’ model to a ‘signal-first’ data pipeline. Proactively filter logs at the source and define the economic and security value of every data point you choose to ingest and retain.

For a Security Operations Centre (SOC) Manager in a UK firm, the promise of a Security Information and Event Management (SIEM) system is clear: a single pane of glass for centralising security logging and detecting threats. Yet the reality is often a deluge of notifications, with teams spending more time chasing ghosts than hunting attackers. You’re not alone if your new SIEM implementation has created more noise than signal, a phenomenon known as “alert fatigue”. Analysts become desensitised, and critical events get lost in the shuffle.

The standard advice revolves around reactive measures: endless rule tuning, constant baselining, and disabling noisy alerts. While these actions have their place, they are merely symptoms of a deeper, more fundamental issue. They treat the SIEM as a passive bucket for data, forcing analysts to sift through digital garbage to find threats. This approach is inefficient, costly, and ultimately unsustainable as data volumes explode.

But what if the key wasn’t better filtering at the end of the process, but a more intelligent architecture at the beginning? The most effective SIEM deployments are not built on collecting everything, but on a ‘signal-first’ philosophy. This strategy involves architecting a data pipeline where every log source is intentionally chosen, its purpose defined, and its value weighed against the cost of storage and analysis. It’s about engineering clarity from the start, not trying to find it in the chaos later.

This guide provides a technical, filter-focused framework for SOC Managers to move beyond chasing false positives. We will explore the architectural decisions, from data ingestion and storage economics to automation and reporting, that enable you to build a SIEM that serves as a high-fidelity detection engine, not a source of constant distraction.

This article provides a comprehensive roadmap for transforming your SIEM from a source of noise into a powerful security asset. The following sections break down the key strategic pillars required for this transformation.

Why is your SIEM generating 1,000 alerts a day for normal behaviour?

The primary reason your SIEM is overwhelmingly noisy is that most out-of-the-box rule sets are designed to be generic. They cast a wide net to avoid missing potential threats in any environment, but they lack the specific context of your organisation’s network, applications, and user behaviours. This “one-size-fits-all” approach inevitably flags legitimate, everyday activities as suspicious, burying your SOC team in a mountain of false positives. The scale of this issue is significant; research shows that over 59% of organisations receive more than 500 cloud security alerts per day, with many admitting that critical alerts are missed daily as a result.

This alert fatigue is not just an annoyance; it’s a critical vulnerability. When analysts spend the majority of their day investigating and closing benign alerts, their ability to respond to genuine threats is severely diminished. Further research highlights the source of the problem: a RedLegg study revealed that in many organisations, over 20% of all security alerts are false positives, primarily because teams are relying on default rule sets that were never customised. Without tuning, the SIEM doesn’t understand what “normal” looks like for your business, so it assumes the worst for every minor deviation.

The solution is not to simply disable rules, but to engage in a rigorous process of contextualisation. This involves establishing a baseline of normal behaviour over several weeks to understand your unique traffic patterns and user actions. Only then can you adjust detection thresholds and correlation rules to distinguish between truly anomalous activity and the routine operations of your business. This foundational work transforms the SIEM from a generic alarm system into a tailored detection instrument.
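
This baselining step can be sketched in a few lines of Python. The sketch below is illustrative rather than tied to any particular SIEM: it derives a per-source alert threshold as the mean of observed daily counts plus three standard deviations, using made-up failed-login data.

```python
from statistics import mean, stdev

def baseline_threshold(daily_counts, sigma=3.0):
    """Derive an alert threshold from observed daily event counts.

    daily_counts: per-day totals gathered over the baselining window
    (several weeks of data gives a more stable estimate).
    """
    mu = mean(daily_counts)
    sd = stdev(daily_counts) if len(daily_counts) > 1 else 0.0
    return mu + sigma * sd

# Four weeks of daily failed-login counts for one account (illustrative data).
observed = [12, 9, 15, 11, 8, 14, 10, 13, 9, 12, 11, 16, 10, 9,
            13, 12, 8, 11, 14, 10, 9, 12, 15, 11, 10, 13, 9, 12]

threshold = baseline_threshold(observed)
todays_count = 45
if todays_count > threshold:
    print(f"anomalous: {todays_count} failures vs threshold {threshold:.1f}")
```

The choice of three standard deviations is itself an assumption to be tuned; a quieter source may warrant a tighter bound, and a bursty one a looser bound or a percentile-based threshold instead.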

Action Plan: Implementing a Signal-First Filter Strategy

  1. Redefine “Alert”: Audit your correlation rules and reclassify them. An “alert” should only signify an event requiring immediate human action. Downgrade informational events to “observations” for later review.
  2. Inventory & Prune: Systematically disable default rules that apply to technologies, systems, or attack vectors not present in your UK-based environment. There is no value in monitoring for threats against systems you don’t own.
  3. Contextualise Thresholds: After establishing a baseline of normal activity (e.g., login failures, data transfer volumes), adjust rule thresholds to reflect your specific operational tempo, not a generic industry average.
  4. Implement Feedback Loops: Create a formal process for analysts to document false positives. Use this data during weekly or bi-weekly rule review sessions to continuously refine detection logic and prevent recurrence.
  5. Filter at the Source: Before logs even reach the SIEM, use forwarders and agents to filter out low-value, high-volume events (e.g., routine health checks, debug-level application logs). This is the cornerstone of log economy.
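
Step 5 can be sketched as a forwarder-side predicate. The Python below is an illustrative sketch, not any vendor’s forwarder syntax; the field names (`level`, `url_path`) and the drop lists are assumptions you would adapt to your own log schema.

```python
# Minimal sketch of source-side filtering: drop low-value events before
# they are forwarded to the SIEM. Field names are illustrative.
DROP_LEVELS = {"DEBUG", "TRACE"}
DROP_PATHS = {"/healthz", "/ping", "/metrics"}  # routine health checks

def should_forward(event: dict) -> bool:
    if event.get("level", "").upper() in DROP_LEVELS:
        return False
    if event.get("url_path") in DROP_PATHS:
        return False
    return True

events = [
    {"level": "DEBUG", "msg": "cache miss"},
    {"level": "INFO", "url_path": "/healthz"},
    {"level": "WARN", "msg": "5 failed logins", "user": "jsmith"},
]
forwarded = [e for e in events if should_forward(e)]
print(len(forwarded))  # only the WARN event survives
```

In practice the same logic lives in your agent or forwarder configuration rather than application code, but the principle is identical: the decision to pay for an event is made before ingestion, not after.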

Ultimately, reducing false positives is an ongoing engineering discipline, not a one-time project. It requires a commitment to continuous improvement and a deep understanding of your own operational landscape.

How to decide which logs are worth paying to store?

The challenge of managing SIEM costs is directly tied to data volume. With some surveys finding a staggering 250% year-over-year growth in log data volume, indiscriminately collecting everything is no longer economically viable. To make informed decisions, you must adopt a principle of “log economy”, where every data source is evaluated based on its potential return on investment for security. This requires classifying logs into distinct categories based on their purpose: real-time detection, compliance, or forensic investigation.

Not all logs are created equal. High-value sources like firewall denies, VPN authentication logs, and endpoint detection and response (EDR) alerts are critical for real-time threat detection and justify the cost of “hot” storage within the SIEM. These logs are actively correlated and analysed for immediate threats. Conversely, verbose application logs or full packet captures might be essential for a deep-dive forensic investigation after a breach but are too noisy and expensive for real-time analysis. These belong in cheaper, “cold” storage solutions like a data lake.

This concept of tiered storage is crucial for balancing security needs with budgetary constraints. By implementing a data lifecycle policy, you can automatically move data from expensive, high-performance SIEM storage to low-cost archival storage as its value for real-time detection diminishes over time, typically after 30-90 days.
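
A data lifecycle policy of this kind reduces to a simple age-based tiering rule. The Python sketch below is illustrative; the retention windows reuse the 30- and 90-day figures mentioned above, and the tier names are assumptions rather than any product’s terminology.

```python
from datetime import datetime, timedelta, timezone

HOT_RETENTION_DAYS = 30    # kept in the SIEM for real-time correlation
WARM_RETENTION_DAYS = 90   # searchable, lower-cost tier

def storage_tier(event_time: datetime, now: datetime) -> str:
    """Assign a storage tier based on the event's age (illustrative policy)."""
    age = now - event_time
    if age <= timedelta(days=HOT_RETENTION_DAYS):
        return "hot"    # SIEM storage, real-time detection
    if age <= timedelta(days=WARM_RETENTION_DAYS):
        return "warm"   # slower, cheaper searchable tier
    return "cold"       # data lake / object storage for compliance

now = datetime(2024, 5, 17, tzinfo=timezone.utc)
print(storage_tier(now - timedelta(days=10), now))   # hot
print(storage_tier(now - timedelta(days=60), now))   # warm
print(storage_tier(now - timedelta(days=400), now))  # cold
```

Real platforms implement this as retention settings and lifecycle rules rather than application code, but the decision logic your policy encodes is exactly this.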

As illustrated, a tiered approach allows for a flexible and cost-effective data retention strategy. This model enables you to meet long-term compliance requirements without paying a premium for data that is rarely accessed. The following table breaks down the typical trade-offs between different storage models, demonstrating that a hybrid approach often provides the optimal balance of performance and cost for most UK firms.

SIEM vs Data Lake Storage Cost Comparison

Storage Type               | Cost Model              | Retention Period             | Query Speed        | Best Use Case
SIEM Hot Storage           | $25,000/month for 105TB | 30-90 days typical           | Real-time          | Active threat detection
Data Lake (Object Storage) | $2,400/month for 105TB  | Years of data                | Minutes to hours   | Compliance & forensics
Hybrid Model               | $8,000/month combined   | Hot: 30 days, Cold: 2+ years | Tiered performance | Balanced detection & investigation

The decision of what to log should be a deliberate, risk-based calculation, not a default setting. By treating log data as a strategic asset with varying levels of importance, you can build a powerful security apparatus that is both effective and affordable.

SIEM or SOAR: Do you need automation to handle the load?

Even with a well-tuned, signal-first SIEM, a certain volume of alerts is inevitable and necessary for effective security monitoring. The critical question for a SOC Manager is how to manage this remaining workload without overwhelming the team. This is where Security Orchestration, Automation, and Response (SOAR) platforms enter the equation. While a SIEM is designed to detect and aggregate potential threats, a SOAR platform is built to act upon them through automated workflows, or “playbooks”.

SOAR is not a replacement for a SIEM but a powerful force multiplier. It integrates with your SIEM, EDR, threat intelligence feeds, and other security tools to automate the repetitive, time-consuming tasks associated with initial alert triage. For instance, when the SIEM generates an alert for a potentially malicious file download, a SOAR playbook can automatically:

  • Query a threat intelligence platform for the file hash’s reputation.
  • Check if other endpoints have the same file.
  • Detonate the file in a sandbox environment to observe its behaviour.
  • If confirmed malicious, create a ticket in your ITSM and trigger an EDR action to isolate the affected host.
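
The playbook steps above can be sketched as follows. This is a hedged Python sketch, not any real SOAR platform’s syntax: the three helper functions are hypothetical stubs standing in for the threat-intelligence, EDR, and sandbox integrations a platform like Tines would provide.

```python
# Hypothetical integration stubs -- a real SOAR platform supplies these.

def lookup_hash_reputation(sha256: str) -> str:
    return "malicious" if sha256.startswith("bad") else "unknown"  # stub

def hosts_with_file(sha256: str) -> list:
    return ["WS-0142"]  # stub: EDR query for other affected endpoints

def detonate_in_sandbox(sha256: str) -> bool:
    return "bad" in sha256  # stub: pretend the sandbox flags known-bad samples

def triage_file_alert(sha256: str, host: str) -> str:
    """Automated triage for a suspicious-file-download alert."""
    reputation = lookup_hash_reputation(sha256)
    affected = hosts_with_file(sha256)
    confirmed = reputation == "malicious" or detonate_in_sandbox(sha256)
    if confirmed:
        # In a real playbook: open an ITSM ticket, then isolate via EDR.
        return f"isolate {host}; ticket raised; {len(affected)} host(s) affected"
    return "closed as benign"

print(triage_file_alert("bad1f3a0", "WS-0142"))
```

The value is not the individual checks, which an analyst could run by hand, but that the whole chain executes in seconds for every alert, with humans only seeing the confirmed cases.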

The impact of this automation can be transformative. A case study from Elastic’s InfoSec team, which integrated the Tines SOAR platform, provides a compelling example. Their automation workflow processes over 3,000 alerts per day automatically, saving an estimated 94 full-time employees’ worth of work. The system enriches alerts and closes over 50,000 of them every 30 days by performing initial checks, such as verifying whether activity originated from a managed workstation, freeing analysts to focus only on pre-vetted, high-priority incidents.

Case Study: Elastic and Tines SOAR Implementation

Elastic’s InfoSec team faced a high volume of alerts from their User and Entity Behavior Analytics (UEBA) module. By integrating Tines SOAR, they built a playbook to automate the initial triage. The system now automatically investigates and closes tens of thousands of alerts by checking contextual data points, like whether the activity came from a corporate-managed device. Each alert is investigated in seconds, enabling detections that would be far too noisy for manual review and preventing the need to hire nearly 100 additional SOC personnel.

For a SOC Manager, the decision to invest in SOAR depends on maturity. If you are still drowning in basic false positives from poor data quality, automation will only accelerate your ability to investigate garbage. However, once you have established a high-fidelity signal from your SIEM, SOAR becomes the logical next step to scale your team’s capabilities and dramatically improve your mean time to respond (MTTR).

The configuration silence that leaves you blind during an attack

While SOC Managers are rightly focused on reducing the noise of false positives, an equally dangerous threat often goes unnoticed: configuration silence. This occurs when a misconfiguration, a broken agent, or an expired API key causes a critical log source to stop reporting to the SIEM. In this scenario, there are no noisy alerts; there is only a deceptive quiet. Your team might assume all is well, but in reality, you have developed a critical blind spot that an attacker could exploit undetected.

Detecting an absence of data is fundamentally harder than reacting to a present signal. This is why robust SIEM management goes beyond rule tuning and extends to monitoring the health and integrity of the data pipeline itself. You must have alerts configured for when a vital log source—such as your domain controller, primary firewall, or cloud platform—hasn’t sent any logs for an unusual period. Similarly, monitoring parsing error rates is crucial. A sudden spike in parsing errors can indicate a change in a log format that is rendering new data unreadable and therefore invisible to your correlation rules.
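
Detecting this kind of silence amounts to tracking a last-seen timestamp per source and alerting when the gap exceeds an allowed window. A minimal Python sketch, with illustrative source names and thresholds:

```python
from datetime import datetime, timedelta, timezone

# Maximum acceptable silence per critical source (illustrative values).
MAX_SILENCE = {
    "domain_controller": timedelta(minutes=15),
    "primary_firewall": timedelta(minutes=5),
    "cloud_audit": timedelta(hours=1),
}

def silent_sources(last_seen: dict, now: datetime) -> list:
    """Return sources whose last event is older than their allowed silence."""
    return [s for s, ts in last_seen.items()
            if now - ts > MAX_SILENCE.get(s, timedelta(minutes=30))]

now = datetime(2024, 5, 17, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "domain_controller": now - timedelta(minutes=3),
    "primary_firewall": now - timedelta(hours=2),   # gone quiet
    "cloud_audit": now - timedelta(minutes=20),
}
print(silent_sources(last_seen, now))  # ['primary_firewall']
```

The per-source windows matter: a firewall should emit constantly, while a cloud audit trail may legitimately pause for an hour, so a single global timeout would either miss outages or alert on normal quiet periods.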

This deep, contextual knowledge is what separates a mechanical SOC from an effective one. It reflects a sentiment captured perfectly in a USENIX security study on analyst perspectives. As one analyst stated:

It’s not just about what you see in the security events, or in the tools, it’s about what you know of the customer and the customer’s nature—business nature.

– SOC Analyst D6, USENIX Security Study on SOC Analyst Perspectives

This quote underscores that true detection capability comes from understanding the environment so intimately that you can spot not just what is present, but also what is conspicuously absent. A proactive approach involves regularly testing your detections using frameworks like MITRE ATT&CK to validate that your rules and log sources are working as expected. Without this continuous validation, your SIEM’s silence might not be a sign of security, but a symptom of blindness.

For a SOC Manager, building dashboards and alerts to monitor the SIEM’s own operational health is just as important as building them to monitor for external threats. This “meta-monitoring” is the only way to ensure your single pane of glass isn’t actually a one-way mirror.

How to turn technical log data into a risk report for the audit committee?

For an audit committee or the board, metrics like “10,000 alerts triaged” are meaningless. In fact, high alert volumes can be misinterpreted as a sign of a chaotic or insecure environment, especially when studies confirm that only 28% of investigated security alerts are legitimate threats. To demonstrate the value of your SIEM and the SOC’s efforts, you must master the art of risk translation. This involves converting raw, technical SIEM outputs into business-relevant language that speaks to financial risk, operational efficiency, and compliance posture.

Instead of reporting on alert counts, focus on metrics that demonstrate tangible improvements in risk mitigation. Frame your team’s work in terms of business outcomes. For example, don’t just say “we tuned 50 rules to reduce false positives.” Instead, report that “by reducing false positive alerts by 40%, we improved our Mean Time to Detect (MTTD) for potential ransomware activity by 25%, significantly lowering the potential financial impact of a breach.” This connects a technical action directly to a business-critical risk.

Effective reporting for a UK-based firm should also map SIEM capabilities directly to compliance mandates like GDPR or sector-specific regulations. Create dashboards that show:

  • Compliance Coverage: The percentage of critical assets and data sources that are being logged and monitored in accordance with regulatory requirements.
  • Breach Detection Time: Track the time from the first anomalous event to confirmed detection, showing a downward trend as your rules and processes mature.
  • Analyst Efficiency: Demonstrate how automation or improved rule fidelity has increased the number of high-value investigations per analyst, showcasing a better return on your human capital investment.
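
As a sketch of how the first two dashboard figures might be computed, assuming illustrative asset counts and incident timestamps rather than real data:

```python
from datetime import datetime

def compliance_coverage(monitored: int, critical: int) -> float:
    """Percentage of critical assets whose logs actually reach the SIEM."""
    return 100.0 * monitored / critical

def mttd_minutes(incidents: list) -> float:
    """Mean time to detect: average gap between the first anomalous event
    and confirmed detection, in minutes."""
    gaps = [(detected - first).total_seconds() / 60
            for first, detected in incidents]
    return sum(gaps) / len(gaps)

# Illustrative data: (first anomalous event, confirmed detection) pairs.
incidents = [
    (datetime(2024, 4, 1, 9, 0), datetime(2024, 4, 1, 9, 48)),
    (datetime(2024, 4, 9, 14, 0), datetime(2024, 4, 9, 15, 12)),
]
print(f"coverage: {compliance_coverage(188, 200):.0f}%")  # 94%
print(f"MTTD: {mttd_minutes(incidents):.0f} minutes")     # 60 minutes
```

Reported quarter over quarter, a rising coverage figure and a falling MTTD tell the board a risk-reduction story without exposing a single correlation rule.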

Case Study: Translating SIEM Metrics to Business Risk Language

One organisation successfully demonstrated the ROI of its SIEM tuning efforts by moving beyond technical jargon. After a project that reduced false positives by 40%, they measured the direct impact on their ability to handle real threats. The result was a 25% improvement in their mean-time-to-detect ransomware attacks. By presenting this metric, they clearly communicated the value to the audit committee: faster detection directly reduces the “blast radius” of an attack, thereby minimising potential financial and reputational damage. This made the value of the SOC’s work clear without needing to explain a single correlation rule.

By framing your SIEM’s performance in terms of reduced risk, improved efficiency, and robust compliance, you transform the perception of the SOC from a cost centre into an essential business enabler that actively protects the organisation’s bottom line.

How to detect anomalies when traffic doesn’t pass through HQ?

In a modern, distributed workforce, especially common in the UK, the traditional security model of monitoring network traffic at the corporate perimeter is obsolete. When users access cloud applications like Office 365 or Salesforce directly from their homes, their traffic never crosses the central firewall. This creates a massive visibility gap for a SIEM focused on network logs. To regain control, you must shift your detection strategy from the network to the identity and the endpoint.

The new perimeter is defined by the user’s identity. Therefore, your primary data sources for anomaly detection must become identity-based logs. This includes signals from your Single Sign-On (SSO) provider (like Azure AD or Okta), Multi-Factor Authentication (MFA) systems, and the native logs from your core cloud applications. Correlating these logs allows you to detect sophisticated identity-based attacks. For example, implementing an “impossible travel” rule can alert you when a single user account logs in from London and then, five minutes later, from a different continent—a clear indicator of a compromised account.
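
An impossible-travel check reduces to a speed calculation between two geolocated logins. A minimal Python sketch using the haversine formula, with a 900 km/h ceiling assumed as a generous bound for legitimate air travel:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km = mean Earth radius

def impossible_travel(loc1, loc2, minutes_apart, max_kmh=900):
    """Flag two logins whose implied speed exceeds plausible air travel."""
    dist = haversine_km(*loc1, *loc2)
    hours = max(minutes_apart / 60, 1e-9)  # avoid division by zero
    return dist / hours > max_kmh

# London, then New York five minutes later: clearly impossible.
print(impossible_travel((51.5, -0.12), (40.7, -74.0), minutes_apart=5))  # True
```

In production the same rule needs allowances for VPN egress points and shared accounts, which is exactly the contextualisation work described earlier; the raw geometry, however, is this simple.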

Endpoint visibility is the second critical piece. Data from an Endpoint Detection and Response (EDR) solution provides rich context about process execution, file modifications, and network connections directly from the user’s machine, regardless of its location. By integrating EDR data into your SIEM, you can detect lateral movement or data exfiltration attempts that would be completely invisible to a network-based tool. The focus moves from North-South traffic (in/out of the perimeter) to East-West traffic (between cloud resources) and Up-Down traffic (privilege escalation within a cloud platform).

This table illustrates the fundamental shift in focus required for a modern, remote-first SIEM strategy.

Traditional vs Remote-First SIEM Focus Areas

Detection Focus     | Traditional Perimeter | Remote/Zero Trust
Primary Data Source | Network traffic logs  | Identity & endpoint logs
Key Threats         | External intrusions   | Credential theft, lateral movement
Alert Volume        | High from VPN noise   | Reduced through behavioural baselines
Investigation Speed | Slow (multiple tools) | Fast (unified context)

Ultimately, securing remote teams requires you to create a “virtual perimeter” for each user, built by correlating signals from their identity, their device, and their activity within cloud applications. This approach provides far greater context and fidelity than simply monitoring their VPN connection.

Warehouse or Lake: Which is better for a single source of truth?

As you formalise your log retention strategy, a key architectural question arises: should your long-term security data reside in a traditional data warehouse or a more modern data lake? While both can serve as a “single source of truth”, they are built on fundamentally different principles that have major implications for cost, flexibility, and investigative capability. A data warehouse uses a predefined schema-on-write approach, meaning data must be structured and parsed *before* it is stored. This makes querying fast but is rigid and expensive for the diverse, unstructured nature of security logs.

A data lake, in contrast, uses a schema-on-read model. It stores raw, unstructured data in a low-cost object storage format (like Amazon S3 or Azure Blob Storage). The data is only parsed and structured when you need to query it. This offers immense flexibility to handle any type of log format and is dramatically cheaper for long-term retention. Industry analysis shows that for 105TB of security data, storage costs can drop from as high as $25,000/month in a performance-tiered SIEM to just $2,400/month in object storage. This cost differential is a game-changer for meeting multi-year compliance requirements without breaking the bank.
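
The per-terabyte arithmetic behind that differential is worth making explicit. Using the figures quoted above:

```python
VOLUME_TB = 105

siem_monthly = 25_000  # performance-tiered SIEM storage (figure from above)
lake_monthly = 2_400   # object storage for the same volume

siem_per_tb = siem_monthly / VOLUME_TB  # ~ $238 per TB per month
lake_per_tb = lake_monthly / VOLUME_TB  # ~ $23 per TB per month
print(f"SIEM: ${siem_per_tb:.0f}/TB, lake: ${lake_per_tb:.0f}/TB, "
      f"ratio: {siem_monthly / lake_monthly:.1f}x")
```

At roughly a tenfold cost difference per terabyte, every multi-year retention requirement you can satisfy from object storage instead of the SIEM tier translates directly into budget headroom.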

However, the trade-off is query speed. Searching a massive data lake can take minutes or even hours, making it unsuitable for the real-time detection performed by a SIEM. This has led to the emergence of a hybrid “lakehouse” model, which offers the best of both worlds. This approach is exemplified by platforms like Graylog, which can keep recent, “hot” data in the SIEM for immediate correlation while using a data lake for long-term “cold” storage.

Case Study: Graylog’s Hybrid Data Lake Implementation

Graylog’s platform introduced a hybrid ‘lakehouse’ model to address this exact challenge. Security teams can keep 30-90 days of data in the core SIEM for real-time alerting. For older data, the system allows analysts to preview and retrieve logs directly from a connected AWS Security Lake without leaving the SIEM interface. This provides seamless access to years of historical data for forensic investigations while keeping it in low-cost object storage. This model directly addresses the rising costs of data breaches, where long-term retention is essential for post-incident analysis and compliance.

For most UK organisations, a pure data lake is too slow for active detection, and a pure SIEM/warehouse is too expensive for long-term retention. A hybrid approach, where the SIEM handles the “now” and the lake handles the “then”, provides the most balanced and sustainable architecture for a comprehensive security data strategy.

Key Takeaways

  • The root cause of SIEM alert fatigue is an architectural failure, not a rule-tuning problem.
  • Adopt a “log economy” mindset by filtering noise at the source and justifying the security value of every log you pay to store.
  • For remote workforces, shift detection focus from the obsolete network perimeter to a “virtual perimeter” built on identity and endpoint signals.

Why are traditional perimeter defences failing UK remote teams?

The traditional “castle-and-moat” security model, which relies on a strong network perimeter (firewalls, IDS/IPS) to protect internal resources, is fundamentally broken in the era of remote work and cloud adoption. For UK firms with distributed teams, the perimeter is no longer a physical location; it has dissolved and re-formed around every single user, wherever they are. Relying on Virtual Private Networks (VPNs) to extend this perimeter is a flawed strategy that often introduces more problems than it solves.

VPNs effectively provide an “all-or-nothing” access model. Once a user is authenticated on the VPN, they are often considered “trusted” and given broad access to the internal network. This creates a hard outer shell but a soft, vulnerable interior. If an attacker compromises a remote user’s credentials, they can use the VPN tunnel as a highway into the corporate network to move laterally and escalate privileges. Furthermore, from a SIEM perspective, VPN logs are notoriously noisy, generating a high volume of connection and disconnection events that add little security value and contribute heavily to alert fatigue.

The modern solution is to adopt a Zero Trust architecture. This model operates on the principle of “never trust, always verify”, treating every access request as if it originates from an untrusted network, regardless of the user’s location. Instead of focusing on where the user is connecting from, a Zero Trust logging strategy for your SIEM should focus on:

  • Verifying Identity: Enforcing strong, multi-factor authentication for every access request.
  • Validating Device Health: Ensuring the endpoint meets security posture requirements (e.g., patched, EDR running) before granting access.
  • Enforcing Least Privilege: Granting access only to the specific application or data the user needs, not the entire network.
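
These three gates can be sketched as a single policy predicate. The Python below is a toy illustration with assumed field names, not a real policy engine; production Zero Trust decisions are made by your identity provider and access proxy.

```python
def allow_access(request: dict) -> bool:
    """Toy Zero Trust check: every request must pass all three gates.
    Field names are illustrative assumptions."""
    identity_ok = request.get("mfa_verified", False)
    device_ok = (request.get("patched", False)
                 and request.get("edr_running", False))
    scope_ok = request.get("resource") in request.get("entitlements", [])
    return identity_ok and device_ok and scope_ok

request = {
    "user": "jsmith",
    "mfa_verified": True,
    "patched": True,
    "edr_running": False,  # device fails the posture check
    "resource": "payroll-app",
    "entitlements": ["payroll-app"],
}
print(allow_access(request))  # False: healthy identity, unhealthy device
```

Note that the default for every missing field is deny: a request that cannot prove its identity, device health, and entitlement is refused, which is the “never trust, always verify” principle in miniature.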

This requires a fundamental shift in data collection for your SIEM, moving away from network-centric logs and towards a rich context built from identity providers, EDR solutions, and Cloud Access Security Brokers (CASB). By linking signals from these sources, you can build a dynamic, per-user perimeter that is far more resilient and provides higher-fidelity signals for true post-compromise detection.


To apply these principles effectively, the next logical step is to audit your existing data pipeline and build a strategic log ingestion roadmap. Evaluate your current tools and processes to identify where you can filter noise at the source and begin your journey towards a signal-first security posture.

Written by Tariq Ahmed. Tariq is a Chief Information Security Officer and certified GDPR Practitioner dedicated to protecting corporate data assets. With an MSc in Information Security from Royal Holloway and CISSP/CISM accreditations, he advises boards on risk management. He has 18 years of experience fortifying networks against cyber threats in the fintech and public sectors.