The Rise of Security Data Pipeline Platforms as Control Plane in SOC

Table of Contents


Author: Aqsa Taylor is the Chief Research Officer at SACR. She is a published author of two cybersecurity books and has a strong background in cloud security and SecOps.

Co-Author: Chi Aghaizu is Founding Engineer and Research Assistant at SACR, with experience in building AI platforms for security.

Market Map Image of Security Data Pipeline Platforms.

Quick Read:

Here are quick insights from the report –

1. Market Consolidation Accelerates with Acquisitions

The SDPP market is entering a rapid consolidation phase as major SIEM, XDR, and observability providers acquire pipeline platforms to strengthen their data architectures. Recent deals include CrowdStrike acquiring Onum for about $290 million, SentinelOne acquiring Observo AI for approximately $225 million, Panther Labs acquiring Datable for an undisclosed amount, and now Palo Alto Networks announcing its acquisition of Chronosphere for $3.3B, one of the biggest deals in the industry. These acquisitions reflect a clear industry trend. Large security and observability vendors are absorbing pipeline capabilities to overcome long-standing ingestion, normalization, and cost challenges within their own platforms. Buying is proving faster and more strategic than building, and this shift is moving the center of gravity in the SOC toward the pipeline layer.

But what does this mean for the vendor neutrality benefit that security data pipelines have long been known for? Practitioners express concerns about vendor neutrality, migration bottlenecks, and more in this report.

2. Security Data Pipelines Have Become the SOC Control Plane

Pipelines no longer simply move logs. They now govern ingestion, normalization, enrichment, routing, tiering, and data health. As a result, they have become the primary control plane of the modern SOC. Every downstream system relies on them for clean, consistent, and trustworthy telemetry. In this report, you’ll find an in-depth evaluation of core capabilities and emerging innovation in the security data pipeline market.

3. AI Is Becoming Essential for Pipeline Operations

AI adoption is practical, assistive, and explainable. Security teams want AI that handles engineering-heavy or repetitive work such as parser creation, schema drift correction, pipeline generation, baselining, and anomaly detection. Teams aren’t comfortable yet with autonomous decision-making in the SOC but strongly support AI within pipelines to reduce workload and increase consistency in pipeline operations.

4. Telemetry Health Monitoring Is Now Critical

Security teams express more fear of missing data than of noisy data. Pipelines now provide intelligent, continuous telemetry health: silent source detection, schema drift, volume anomalies, baseline deviation, noisy source spikes, and rerouting options based on destination failures. This monitoring in the data layer ensures the SOC never operates blind.

5. Shift Detections Left

Some platforms are pushing detections into the pipeline, performing lightweight IOC checks and early pattern recognition before events reach the SIEM. Practitioners value earlier context but note that response speed, not detection timing, often limits real impact. Learn more about how this trend is shaping impressions within the security community.

6. Pipelines Form the Foundation for AI Driven Security Operations

AI systems depend on high-quality, normalized, enriched, and complete data. Pipelines are becoming the preparation layer for AI copilots, LLM-based SOC assistants, advanced correlation engines, and autonomous triage. Without pipelines, AI performance degrades significantly. This makes SDPPs a strategic enabler for future SOC automation.

7. SDP PLUS Vision

In addition to core pipeline features, we are seeing a trend in which pure-play SDP platforms aim to expand horizontally across the SOC stack by taking on adjacent category capabilities beyond traditional pipeline functions. These include in-house data lake options with tiered storage, threat detection and analytics at the pipeline layer, federated search and querying across SIEMs and data lakes, observability convergence, and AI SOC-like capabilities.

Summary for Security Leaders

Security data pipelines are now the control plane of the modern SOC. They own the data layer and deliver cost efficiency, improved data quality, faster investigations, cleaner enrichment, better telemetry reliability, and vendor-neutral routing. They are also becoming the data foundation needed for next-generation AI-driven operations. As acquisitions accelerate, the market is shifting toward two branches: standalone security data pipeline platforms and SDP capabilities within broader architectures. Either direction underscores the importance of the data pipeline as the most critical layer in the security stack.


Introduction

Before diving into this report, it’s important to set some context for readers who are learning about security data pipeline platforms (abbreviated as SDPP throughout the report). In Francis’s first report, The Market Guide 2025: The Rise of Security Data Pipeline Platforms, he introduced what these platforms bring to the world of security operations. It was the first analyst report focused on this category, even though the solution had existed for years. Their rise was largely driven by practitioner concerns about legacy SIEM platforms and ongoing issues with data quality. I explained the practitioner concerns in detail in my Convergence of SIEM Platforms report, where I also highlighted how two major SIEM vendors acquired security data pipeline companies to redefine SIEM capabilities through pipeline integration. Since then, Panther has acquired the Datable.io security pipeline platform, and as of today, November 19th, Palo Alto Networks has announced its acquisition of Chronosphere. I expect we’ll continue to see more of these acquisitions soon as SIEM vendors race to outpace legacy limitations and evolving practitioner concerns.

In this version of the report, we take a deep dive into the world of security data pipeline platforms, exploring how the category has evolved over the past year and the different directions vendors are now taking. The focus is on mapping how these platforms have matured in both capability and purpose, moving from basic data routing tools to core components of modern security architectures. This report expands on the original framework with a new layer of analysis that explores several key pipeline capabilities in depth, taking into account both the breadth and the maturity of the features.

The report further captures the different paths SDP vendors are taking, from those deepening their integrations with SIEMs as data routing platforms to those moving toward in-house data lake capabilities in a vision to rise as “SDP PLUS” platforms. It draws from practitioner conversations, customer interviews, and in-depth briefings to provide a grounded view of how these platforms are being adopted and adapted within real SOC environments.

Revisiting the SDPP Report version 1: How the Data Layer Became the Heart of the Modern SOC

In our first report, we defined what Security Data Pipeline Platforms are –

Security Data Pipeline Platforms (SDPP) are purpose-built systems that ingest, normalize, enrich, filter, and route large volumes of security telemetry across hybrid and cloud environments. These platforms sit between data sources (like EDRs, cloud logs, and firewalls) and destinations (like SIEMs, data lakes, XDRs, and analytics tools). Their goal is to optimize the flow and quality of telemetry data to reduce operational complexity and cost while increasing the speed and accuracy of detection and response.
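
To make those stages concrete, here is a minimal, hypothetical sketch of an SDPP-style stage chain over a single event. The field names, mappings, enrichment lookup, and destinations are illustrative assumptions, not any vendor’s schema or implementation.

```python
# Minimal, illustrative sketch of an SDPP stage chain. Field names, the enrichment
# lookup, and the routing rule are all hypothetical.

RAW_EVENT = {"src": "10.0.0.5", "act": "login_failed", "sev": "low", "msg": "debug: retry 3"}

def normalize(event):
    # Map vendor-specific keys onto one consistent (made-up) schema.
    mapping = {"src": "source_ip", "act": "action", "sev": "severity", "msg": "message"}
    return {mapping.get(k, k): v for k, v in event.items()}

def enrich(event):
    # Add context an analyst would otherwise look up later (hypothetical asset lookup).
    event["asset_owner"] = "it-ops" if event["source_ip"].startswith("10.") else "unknown"
    return event

def filter_noise(event):
    # Drop verbose fields that add cost but little investigative value.
    event.pop("message", None)
    return event

def route(event):
    # Send higher-severity events to the SIEM, everything else to cheaper storage.
    return "siem" if event.get("severity") in ("high", "critical") else "data_lake"

event = filter_noise(enrich(normalize(RAW_EVENT)))
print(route(event), event)   # -> data_lake {...}
```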

When the Security Data Pipeline Platform (SDPP) report was first published, it drew attention to something that many security teams had quietly been feeling for years. SIEMs were reaching a breaking point, yet they were not disappearing. As organizations collected more data, the traditional model of ingesting everything was becoming impossible to sustain. The report highlighted that this shift marked a deeper transformation in how modern Security Operations Centers (SOCs) would be built with the introduction of security data pipeline platforms in the data fabric.

At the center of the report’s findings was the rise of the SDPP. These platforms were described as a new foundational layer in the SOC, sitting between data sources and destinations like SIEMs, data lakes, and XDR tools. We called them the “security refinery” of the modern era because they clean, enrich, and route raw telemetry into structured, high-quality data that analysts can actually use.

It also made an important point about why this market was growing so fast. Data growth, rising compliance demands, and tool sprawl were all putting pressure on SOCs to find a more efficient way to manage telemetry. The report highlighted that SDPPs not only reduce cost but also improve the quality of data, enabling faster threat detection.

We also highlighted how SIEM was evolving. Instead of serving as a single, monolithic system, the modern SIEM is shifting toward a modular architecture that separates storage from analytics. This new model allows data to live in cheaper cloud storage while being queried on demand, giving organizations the flexibility to scale without breaking their budgets.

The report clearly anticipated the convergence between pipelines, data lakes, and SIEM systems. It painted a picture of a security data fabric where ingestion, storage, and analysis would become part of one unified layer. For many in the industry, that idea shifted the conversation away from which SIEM to buy toward how to build the data architecture that supports it.

In short, Security Data Pipeline Platforms are becoming a must-have for modern organizations because they completely change how security data is collected, processed, and used. Here’s a deep dive into why they matter and how they have now evolved to become the control plane for SOC.

Acquisitions: A Wave of Consolidation Across SDPP Vendors

Good security depends on good data. Because of this, SIEM and XDR vendors are moving quickly to control the pipelines that clean, shape, and route telemetry. This shift marks the beginning of a new phase in the market, where data quality becomes just as important as detection or response.

Over the past two years, several major security and observability companies have acquired smaller pipeline and telemetry vendors. This trend shows that the industry now understands how important the data pipeline has become. Modern security platforms need high quality, well prepared data before they can deliver strong analytics or AI driven outcomes. This growing importance is evident in the push from large vendors to bring pipeline technology in house instead of relying on third parties.

In chronological order of announcements –

  • Tarsal: Tarsal was acquired by Monad in July 2025 to enhance its security operations and data management capabilities – one of the first acquisitions in this wave.
  • Onum: CrowdStrike acquired Onum for about US$290 million.
  • Observo AI: SentinelOne acquired Observo AI for approximately US$225 million (cash + stock) to enhance its data-pipeline/SIEM capabilities.
  • Datable: Panther Labs announced acquiring Datable, a security-data-pipeline platform. (Amount undisclosed)
  • Chronosphere: Palo Alto Networks announced its acquisition of Chronosphere, a major observability platform with pipeline capabilities, for $3.3B on November 19, 2025.
Acquisitions Trend in SDPP Market.

What Acquisitions Mean for Broader Security Platforms

These acquisitions show a clear trend: Security Data Pipelines are becoming the control plane of modern security operations. Vendors want to sit closer to the source of data because strong AI and strong analytics depend on clean, well-structured telemetry. Instead of competing on dashboards or detection content, companies now compete on data quality, consistency, and readiness.

This shift brings clear benefits for customers of the broader platforms – the SIEMs. Performance improves, noise decreases, and storage costs go down. The core message is simple. Whoever owns data quality and routing has a larger play in the modern, decoupled SOC architecture. Hence, the pipeline layer is becoming the heart of the SOC and the operational place where teams decide what data matters, how it should be shaped, and where it should go. It governs data quality, routing, enrichment, and lifecycle management, shaping how downstream tools perform. The platforms that integrate natively with Security Data Pipeline platforms will define the next generation of modern analytics platforms.

These acquisitions confirm that the future of SIEM, XDR, and AI SOC technologies will rely on a strong, unified control plane built in the pipeline layer. Whoever controls this layer ultimately controls the quality, cost, and intelligence of the entire SOC stack.

Neutrality Concerns with Acquisitions

Although these acquisitions strengthen the larger platforms, they raise concerns about neutrality for users of the security data pipeline platforms.

As more SDPP vendors are acquired by large SIEM, XDR, and observability platforms, security leaders are beginning to express clear concerns. The biggest worry is the potential loss of neutrality. Many organizations adopted independent pipeline platforms because they provided flexibility, transparent routing, and the freedom to choose or change destinations without friction. When these platforms become part of a larger ecosystem, their priorities may shift toward favoring the parent vendor’s integrations while downplaying independent capabilities. This can limit multi-destination routing, reduce portability, and recreate the very vendor lock-in that SDPPs were designed to eliminate.

There is also apprehension that innovation in the category may slow as acquired platforms are folded into broader product roadmaps. Independent SDPPs often moved quickly, responding directly to practitioner needs. Once inside a major vendor, development may be shaped by platform alignment rather than customer choice. For now, many of the acquired companies have shared that their vision is to continue supporting the standalone security data pipeline platform for its users without forcing lock-in. Whether that commitment holds as these platforms evolve remains to be determined.

The Evolution of Security Data Pipeline Platforms as a Distinct Category

Pipeline capabilities are sometimes merged into broader platforms such as SIEMs, observability tools, or XDRs. However, the focus of this report is primarily on what we refer to as “pure play” security data pipeline platforms.

Pure Play Security Data Pipeline Platforms

These platforms focus primarily on the data transformation layer between data sources and data destinations. Later in the report, we will see a trend in which these platforms aim to gradually take on adjacent capabilities, a direction we call “SDP PLUS”, but in their current state they still fall squarely under “pure play” Security Data Pipeline Platforms.

Cribl (2018) stands as the dominant Series E player, with funding above $600M and a $3.5B valuation, embodying the broader shift from log routing to a full security data pipeline platform. Cribl still stands at the center of the Security Data Pipeline Platform (SDPP) market as its most mature and influential leader, both technically and commercially. Many of the practitioners we interviewed know the SDP market by Cribl’s name.

Emerging Entrants

In addition to Cribl, we’ve done an in-depth analysis of these emerging security data pipeline platforms in this report. In alphabetical order –

  • Abstract Security: founded in 2023; raised a $15M Series A in 2024.
  • Axoflow: founded in 2023; raised a $7M seed round in January 2025.
  • Beacon Security: came out of stealth in November 2025.
  • Brava Security: currently in stealth.
  • CeTu: founded in 2024.
  • Databahn: founded in 2023; raised a $17M Series A in January 2025.
  • Datable: founded in 2023.
  • Datadog: launched Observability Pipelines in June 2022.
  • Observo AI: founded in 2022; acquired by SentinelOne.
  • Onum: founded in 2022; acquired by CrowdStrike.
  • Realm Security: founded in 2024; raised a $15M Series A.
  • Tenzir: founded in 2017; raised a $3.3M seed round.
  • VirtualMetric: founded in 2025; raised a $2.59M seed round.

While early infrastructure players consolidated around observability and log management, the platforms are now focusing on security and attacking specific SOC pain points: data quality for security, ingestion cost, AI normalization, and cross-platform routing.

Growing Investment in the Security Data Pipeline Category

Chart showing growing investment in SDPP.

Insights

  • The average jump from Seed to Series A/D rounds across these vendors ranges from 4x to 10x in valuation, signaling strong investor confidence in SDPP platforms.
  • Consolidation is accelerating: Observo AI, Datable, and Onum were acquired by major security players (SentinelOne, Panther Labs, and CrowdStrike).
  • Palo Alto Networks announced its acquisition of Chronosphere for $3.3B on November 19, 2025.

Over the next three years, we anticipate capital will continue moving toward pipeline-driven ecosystems that combine telemetry management, AI readiness, and cost efficiency, forming the backbone of the next security data economy.

Security Leaders Voice

Now that we’ve recapped what happened during our last research, it’s time to share what we learned over the past months from many practitioner calls, in-depth security vendor interviews and product briefings, and a questionnaire that covered platform capabilities in detail.

Security data pipelines began as cost-saving brokers, but they are now strategic policy engines for visibility, control, and agility. The practitioners adopting these platforms are not chasing hype around autonomous SOCs. They are building disciplined, deterministic systems supported by selective automation. The future of detection will belong to teams that control their data with the same rigor they apply to threat response, and SDPPs are becoming an important layer that makes this possible. Data control is the new detection.

Across industries, from financial institutions to managed service providers and industrial operations, the message is consistent. Security data pipelines are no longer considered back-end utilities. They are becoming the operational control plane for telemetry, cost management, and detection agility.

Practitioners entered this space to reduce log costs, but they stayed because of control. By centralizing routing, transformation, and lifecycle management, the pipeline has shifted from infrastructure to intelligence. One leader summarized it plainly: “We are not just compressing data anymore. We are deciding what matters and where it should live.”

These conversations show a shift from tool-centric thinking to outcome-centric design. Practitioners prioritize three things above all:

  • The ability to reduce data ingestion at SIEMs and automatically express transformations
  • Multi-tier intelligent routing that aligns storage cost with data purpose
  • Built-in observability that measures ingestion completeness and source health

Ease of management is becoming a key differentiator. Teams managing multiple customer environments (MSSPs) prefer centralized templates where one pipeline update can propagate across tenants. Smaller organizations prioritize alignment with their deployment style, especially infrastructure-as-code.

Expanding on these use cases, we asked practitioners to stack rank SDPP capabilities. And here’s what we found –

Budget Management as an Entry Point

The original motivation was budget pressure and it remains one of the biggest reasons. Teams set reduction goals without losing context, cutting ingestion volume while improving fidelity. One leader cited processing over three terabytes of daily data but forwarding less than half of that after filtering. The immediate benefit from SDP platforms is lower cost at destination, but the deeper change is operational freedom. In the words of one leader, “You cannot automate nonsense.” Poor data quality is still the most expensive problem in the SOC.

Normalization is the New Norm

Every practitioner began their modernization story with data normalization. They view consistent schemas as the precondition for any mature detection or analytics program. When normalization is done right at the start, vendor content and correlation logic across the stack finally work as designed.

Intelligent Pipelines

Modern designs favor data adjacency rather than consolidation. Practitioners expect and prefer pipeline platforms to adopt AI capabilities or intelligent, data-aware routing that directs data to the most cost-effective and policy-compliant storage, whether local, cloud, or cold archival, without binding analysis to a single ecosystem. These leaders want to govern the full data lifecycle, deciding what stays hot, what rolls warm, and what archives cold, with clear rehydration paths when investigations begin.

Noise is not the enemy, silence is

A recurring theme across interviews was the danger of quiet systems. Practitioners worry more about missing telemetry than excessive alerts. They described the problem of dormant integrations, acquisitions without visibility, and logs that silently stop forwarding. The emerging use case is “silence detection,” where the pipeline monitors the health of every data source and flags anomalies in activity levels or schema freshness.

AI is Becoming Familiar

The industry is becoming more comfortable with the idea of agentic AI and copilot capabilities, but it is not yet buying the “autonomous” messaging. Leaders emphasized they are not ready to hand decisions to autonomous agents. They do, however, welcome targeted automation that eliminates repetitive work. They want AI to generate parsers when formats change, to detect version drift, to cluster similar events, and to perform quality assurance on closed investigations. They want explainable automation, not invisible reasoning. The ideal is agentic assistance, not autonomous control – yet.

Shifting Detections Left, into the Stream

The idea behind this direction, which a few security data pipeline platforms are pursuing, is to detect threats based on IOCs while data is streaming through the security data pipeline. Moving detections into the stream brings detection logic closer to the source and avoids the post-index costs and latencies that occur at SIEM destinations. This results in faster threat detection and reduces MTTD with near real-time speed.

While the concept sounds impressive in theory, it received mixed feedback in our interviews with practitioners. Some welcomed earlier visibility into threats in the stream, while others said speed of detection in stream is not their priority when speed of remediation has yet to catch up. Among those who saw value, organizations are experimenting with lightweight detection logic in stream, aiming to add more context to the data routed to destinations. The goal is not to replace centralized analytics but to reduce dwell time and stage response earlier. Several teams already use the pipeline to automatically collect forensics when certain triggers appear; with detection in stream, the idea is to surface these triggers without post-index delays, saving time on detection and reducing latency.
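
As a rough sketch of what lightweight in-stream IOC matching can look like, the snippet below tags events against a small indicator set before they are routed onward. The indicator list, field names, and tagging scheme are hypothetical and not tied to any specific vendor.

```python
# Illustrative in-stream IOC tagging. Indicators and fields are hypothetical;
# the IP addresses come from documentation ranges.
KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}

def tag_iocs(event):
    """Attach early detection context before the event reaches the SIEM."""
    hits = {event.get("source_ip"), event.get("dest_ip")} & KNOWN_BAD_IPS
    if hits:
        event["ioc_match"] = sorted(hits)
        event["routing_hint"] = "siem_hot"   # surface high-value signals sooner
    return event

stream = [
    {"source_ip": "10.0.0.5", "dest_ip": "203.0.113.7", "action": "dns_query"},
    {"source_ip": "10.0.0.9", "dest_ip": "192.0.2.10", "action": "http_get"},
]
for evt in stream:
    print(tag_iocs(evt))
```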

Acquisitions and the Question of Losing Neutrality

Early adopters embraced SDPs because they sat between systems and provided architectural control, flexibility, and cost savings without locking the customer into a single platform. That neutrality was the differentiator. Now, as major SIEM and data infrastructure players acquire pipeline companies or replicate their features, the market risks returning to the very vendor dependency that SDPs were meant to eliminate.

In one practitioner’s words, “we’re just going to end up back where we started, everything re-bundled under one large platform.” The most valuable providers will integrate broadly across SIEM, observability, and data lake layers while keeping control in the hands of practitioners. The differentiator will not be who owns the data, but who enables transparent, vendor-agnostic flow across it.

If pipeline vendors continue to prioritize openness and integration, they can remain the connective tissue of modern security architectures. If they instead chase full-stack ownership, they risk becoming another feature in someone else’s platform.

The Pipeline Becomes the Control Plane

Security data pipelines are often misunderstood as ETL for security, simple brokers that route data from point A to point B. But modern platforms deliver far more. They are becoming the control plane for how security teams manage, govern, and trust their telemetry across SIEM, detection, response, AI, observability and long-term analytics.

Security leaders today face rising data volumes, constant schema changes, noisy logs, silent data dropouts, inconsistent enrichments and mounting SIEM and data lake costs. Traditional ingestion or basic filtering cannot keep up. What is emerging is a new class of platforms designed specifically for security data. These platforms reduce noise without losing security context, normalize and enrich at scale, auto generate parsers, detect schema drift, monitor data source health including silent failures and apply AI to make the pipeline self optimizing.

Across the industry, what is clear is this: Security data pipeline platforms are moving from helpful optimization to the foundational control layer of the SOC architecture. They sit at the center of the architecture and shape how every downstream tool performs.

Below is a concise breakdown of the key capabilities these platforms bring and the innovations security leaders should watch as the category evolves.

Reader’s Note: Some of the features mentioned in the section below may only be offered by more advanced security data pipeline platforms. In the vendor section, you will find a detailed description and an in-depth evaluation that makes it easier for security leaders to compare and understand what each platform provides.

Core Pipeline Capabilities.

Core Pipeline Capabilities

Core pipeline capabilities across most security data pipeline platforms include the following:

Advanced Data Reduction Beyond Simple Filtering

Data reduction is not just about shrinking data volume. In security, it means preserving investigative value while eliminating noise and unnecessary cost. This section covers how modern pipelines intelligently reduce data without weakening detection fidelity.

Early pipeline wins came from cost savings, but reduction has become much more intelligent than dropping fields.

What SDPPs can actually do

  • Context aware suppression that removes duplicates or repetitive events while preserving indicators and security context
  • Conditional reduction at both field and event level
  • Adaptive sampling that dynamically adjusts sampling rates based on peak ingestion times
  • Payload trimming that removes non security relevant metadata like verbose debug fields or oversized payloads that add cost but not value
  • Schema aware reduction that preserves detection relevant fields while trimming high volume noise
  • Summarization and metricization that convert chatty logs into compact metrics without losing investigative value
  • Priority based reduction that adjusts logic by log type
  • Real time shape correction that transforms data into proper formats before it hits downstream systems

Together, these techniques ensure data reduction is cost efficient and security aware rather than blind trimming.
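
As a simplified sketch of two of these techniques, context aware suppression and metricization, the snippet below keeps the first occurrence of a repetitive event and converts the rest into a count. The keys and field names are illustrative assumptions rather than any platform’s actual logic.

```python
# Illustrative duplicate suppression plus metricization of chatty logs.
from collections import Counter

def suppress_duplicates(events, keys=("source_ip", "action")):
    """Forward the first occurrence per key tuple; summarize the rest as a count."""
    seen, kept, dropped = set(), [], Counter()
    for e in events:
        sig = tuple(e.get(k) for k in keys)
        if sig in seen:
            dropped[sig] += 1          # preserved as a metric, not a full event
        else:
            seen.add(sig)
            kept.append(e)
    return kept, dropped

events = [{"source_ip": "10.0.0.5", "action": "heartbeat"} for _ in range(1000)]
kept, dropped = suppress_duplicates(events)
print(len(kept), "forwarded;", sum(dropped.values()), "summarized as a count")
```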

Emerging Innovations

  • Dynamic reduction tuned by threat context or incident state
  • AI assisted reduction recommendations based on historical alerting patterns
  • User configurable reduction tiers aligned to detection criticality
  • Automated validation to ensure reductions never strip fields needed for investigations

Why it matters

Security teams reduce SIEM spend while keeping the fidelity needed for investigations. Leaders repeatedly said they want tools that help them do more with less without degrading security.

Normalization and Schema Discipline

Normalization ensures that every log source speaks a consistent language. This allows detections, analytics, and investigations to work reliably across diverse destination systems. Schema discipline prevents breakage and enables large scale correlation.

Nearly every practitioner called this the top priority.

What SDPPs can provide

  • Automatic normalization into standards such as OCSF, ECS, UDM or custom schemas
  • Schema drift detection when a data source silently changes formats
  • Automatic parser creation using AI for new versions and undocumented logs
  • Consistent field naming across all data sources to unlock SIEM content and correlation

Emerging innovations

  • AI generated parsers based on sample logs and intended destination schema
  • Automated detection of unexpected new fields or missing required fields
  • Version aware normalization that adapts when vendor log formats update
  • Normalization confidence scoring to flag risky transformations

Why it matters

If data does not show up clean and consistent, SIEM, XDR, SOAR, UEBA, AI SOC and detections all suffer. Good data unlocks the entire detection library.
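
As a minimal sketch of what normalization into a consistent schema involves, the snippet below maps a hypothetical source format onto a deliberately simplified, OCSF-like nested structure; real OCSF mappings are far richer, and the field names here are assumptions.

```python
# Illustrative normalization of a vendor-specific log into a simplified, OCSF-like shape.
FIELD_MAP = {
    "SrcIP": "src_endpoint.ip",
    "User": "actor.user.name",
    "EventName": "activity_name",
}

def normalize(raw):
    out = {}
    for src_key, target in FIELD_MAP.items():
        if src_key in raw:
            node = out
            *parents, leaf = target.split(".")
            for p in parents:                 # build nested structure from dotted paths
                node = node.setdefault(p, {})
            node[leaf] = raw[src_key]
    out["unmapped"] = {k: v for k, v in raw.items() if k not in FIELD_MAP}
    return out

print(normalize({"SrcIP": "10.0.0.5", "User": "alice", "EventName": "Logon", "Extra": 1}))
```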

Contextual and Threat Intel Enrichments

Raw logs lack the context analysts need. Enrichment adds meaning, giving logs identity, asset and threat relevance so alerts and queries become more accurate and actionable.

SDP platforms act as enrichment hubs, adding rich context in stream to strengthen analysis at the destination.

Examples

  • Environmental context such as GeoIP, cloud account or region
  • Identity context such as user, department and privilege level
  • Asset context including owner, business app and criticality
  • Threat intel matches for IPs, domains and hashes

Emerging innovations

  • Pre enrichment policies that vary based on log type or threat level
  • Inline lookup optimization for high speed enrichment
  • Automated asset tagging based on behavioral patterns
  • Dynamic enrichment paths that enrich only when detection relevance is high

Why it matters

Enrichment turns raw events into signals analysts can act on. By adding context such as identity, asset and threat context in the pipeline, teams reduce triage time, improve correlation quality and make AI driven use cases more reliable without adding extra steps later.
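
A minimal enrichment sketch is shown below, using hypothetical lookup tables for asset and threat context; a production pipeline would draw on live identity, CMDB, and threat intelligence feeds instead.

```python
# Illustrative in-pipeline enrichment with hypothetical lookup tables.
ASSET_DB = {"10.0.0.5": {"owner": "finance", "criticality": "high"}}
THREAT_INTEL = {"203.0.113.7": "known_c2"}

def enrich(event):
    event["asset"] = ASSET_DB.get(event.get("source_ip"),
                                  {"owner": "unknown", "criticality": "unknown"})
    intel = THREAT_INTEL.get(event.get("dest_ip"))
    if intel:
        event["threat_intel"] = intel   # analysts see the match without a later lookup
    return event

print(enrich({"source_ip": "10.0.0.5", "dest_ip": "203.0.113.7", "action": "http_post"}))
```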

Intelligent Routing and Multi Tier Storage Control

Not all data should be treated equally. Intelligent routing ensures each log is sent to the right place at the right cost tier while maintaining flexibility across SIEMs, data lakes, and analytics tools.

This is where the pipeline becomes the control plane.

What SDP Platforms provide

  • Route hot, warm and cold based on log value
  • Split streams to multiple SIEMs, detection tools or cloud lakes
  • Apply different reduction schemas by destination

Emerging innovations

  • Price aware routing where users can split pipeline routes to choose storage based on cost differences across cloud providers
  • Vendor agnostic SIEM migration paths

Why it matters

Routing and storage decisions directly drive cost, performance and flexibility. Intelligent routing lets security teams control where data lives, keep hot paths fast for investigation and avoid being locked into a single SIEM or storage vendor.
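
A minimal sketch of value-based routing is shown below; the rules, tiers, and destination names are hypothetical and would be driven by policy in a real deployment.

```python
# Illustrative value-based routing to hot, warm, and cold destinations.
RULES = [
    (lambda e: e.get("severity") in ("high", "critical"), "siem_hot"),
    (lambda e: e.get("category") == "audit", "data_lake_warm"),
]

def route(event, default="archive_cold"):
    for predicate, destination in RULES:
        if predicate(event):
            return destination
    return default

for e in [{"severity": "high"}, {"category": "audit"}, {"category": "debug"}]:
    print(route(e), e)
```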

Intelligent Integration Health Monitoring

Noise is not the only enemy, sometimes silence is a bigger threat. Security teams need to know not just what is happening, but whether critical telemetry is flowing at all times. Monitoring for noise, errors and silent dropouts ensures visibility gaps do not turn into undetected incidents.

What SDP Platforms provide

  • Detect sources that go silent – integration health at the source level
  • Monitor ingested volume against historical baselines
  • Alert on pipeline stalls or destination issues

Innovations to note

  • Automatic discovery of newly active or inactive sources
  • Health scoring of each data source over time
  • Behavioral baselines for normal telemetry flow
  • Automated response actions when a source goes dark
  • Detect sudden drops in fields or event types

Why it matters

If critical sources go dark or degrade, SOC metrics may still look healthy while real blind spots grow. Source and pipeline health monitoring makes telemetry reliability visible so teams can trust their coverage claims and respond quickly to gaps.
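
The snippet below sketches how silent-source and volume-drop checks against a rolling baseline might work; the baseline store, thresholds, and source names are illustrative assumptions, not a specific product’s logic.

```python
# Illustrative silent-source and volume-anomaly check against hourly baselines.
import statistics

BASELINE = {"firewall": [10_500, 9_800, 11_200], "edr": [2_300, 2_250, 2_400]}

def check_health(current_counts, drop_ratio=0.5):
    alerts = []
    for source, history in BASELINE.items():
        observed = current_counts.get(source, 0)
        expected = statistics.mean(history)
        if observed == 0:
            alerts.append(f"{source}: silent (expected ~{expected:.0f} events/hour)")
        elif observed < expected * drop_ratio:
            alerts.append(f"{source}: volume drop ({observed} vs ~{expected:.0f})")
    return alerts

print(check_health({"firewall": 3_000}))   # edr is missing entirely; firewall is well below baseline
```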

AI Assisted Pipelines

AI brings speed and automation to pipeline tasks that were historically slow, manual and error prone. Practitioners are now increasingly comfortable with the idea of AI within pipeline platforms, though its high value application here should not be equated with autonomous SOC claims. Instead of replacing analysts, AI in the pipeline reduces operational burden by providing pipeline recommendations, accelerating onboarding of new data sources and strengthening data quality before detections even begin.

AI directly addresses several long standing pain points. First, onboarding new log sources is too slow, especially during platform migrations or when connecting new environments. Security leaders noted that the speed at which they can ingest and normalize data determines how quickly they can address issues at destination platforms like SIEMs. AI generated parsers and automated normalization drastically shorten the time from raw logs to usable telemetry.

Second, many leaders stressed that high alert volume is often not due to weak SOC workflows but because prerequisite work in data quality, clustering and correlation is incomplete. AI in the pipeline helps reduce this noise by automatically grouping related events, generating cleaner schemas and ensuring that logs arrive enriched and structured, which lowers the alert queue burden downstream.

Third, teams repeatedly warned about silent failures in telemetry. AI powered baselining and anomaly detection on data flow can identify when sources go dark, when formats drift or when volumes shift abnormally, addressing a critical visibility gap.

Early innovations

  • AI generated parsers
  • AI driven pipeline creation
  • Automated anomaly detection in transit
  • Semantic classification of log types

Innovations to note

  • Recommendations for pipeline optimization
  • AI analysis that validates schema drift and transformation status
  • Predictive detection of missing log integrations
  • Automated rerouting when pipeline detects destination health failures

Why it matters

AI assisted pipelines absorb repetitive engineering work and constant change in vendor formats. That frees scarce security engineers and analysts to focus on detections, investigations and architecture rather than plumbing.
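
As a minimal sketch of the schema drift checks discussed above, the snippet below compares an incoming event’s fields against a learned baseline; the baseline and output format are hypothetical simplifications.

```python
# Illustrative schema drift check against a learned field baseline.
BASELINE_FIELDS = {"timestamp", "source_ip", "user", "action"}

def detect_drift(event):
    incoming = set(event)
    return {
        "missing_fields": sorted(BASELINE_FIELDS - incoming),   # may break parsers or detections
        "new_fields": sorted(incoming - BASELINE_FIELDS),       # candidates for a parser update
    }

print(detect_drift({"timestamp": "2025-11-19T10:00:00Z", "source_ip": "10.0.0.5", "session_id": "abc"}))
```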

Unified Security Data Control Plane

As capabilities converge, security pipelines are becoming the strategic control layer that governs how telemetry is shaped, enriched and used across the entire SecOps stack.

What it provides

  • Central governance of data quality
  • One place to enforce schema, reduction and routing policy
  • A foundation for consistent AI and analytics
  • Control plane APIs for external orchestration
  • Policy as code for data governance
  • Unified dashboards showing security, cost and performance impacts
  • Automated end to end lineage tracking for every event

Why it matters

Treating the pipeline as a unified control plane gives CISOs one place to govern data quality, cost and access. This foundation makes it easier to evolve tools, adopt new analytics and AI, and respond to regulatory or business changes without constantly reworking integrations.

When reduction, normalization, enrichment, schema governance, routing, data health and AI automation converge, pipelines become the central control plane for deciding how telemetry is used in end systems.
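
A minimal sketch of the policy-as-code idea is shown below; the policy schema, field names, and destinations are hypothetical and would normally live in version control alongside the pipelines they govern.

```python
# Illustrative policy-as-code check for data governance (hypothetical policy schema).
POLICY = {
    "required_fields": ["timestamp", "source_ip"],
    "max_retention_days": {"siem_hot": 30, "data_lake_warm": 365},
    "pii_fields_to_mask": ["user_email"],
}

def apply_policy(event, destination):
    violations = [f for f in POLICY["required_fields"] if f not in event]
    for field in POLICY["pii_fields_to_mask"]:
        if field in event:
            event[field] = "***masked***"           # mask PII before it leaves the pipeline
    retention = POLICY["max_retention_days"].get(destination)
    return event, {"violations": violations, "retention_days": retention}

print(apply_policy({"timestamp": "t0", "user_email": "a@example.com"}, "siem_hot"))
```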

Recent acquisitions show that SIEM vendors understand this. The pipeline is the strategic chokepoint. Whoever controls the data layer influences the entire SOC stack.

Emerging Trends

Here are some emerging trends we see across some of these modern security data pipeline platforms –

Deployment and Distribution Flexibility

We see these platforms offering flexible deployment options, typically using a split model with a Control Plane and a Pipeline Engine. Many of these vendors also offer multi-tenancy to support MSSPs and large enterprises.

Deployment Flexibility of SDPP.

Advanced Normalization, Enrichment, and Context Fabric

Normalization and enrichment are becoming richer and more automated across vendors. Abstract normalizes into multiple schemas, enriches with identity, asset, vulnerability, and threat intel, and auto-corrects drift with ASE. Databahn transforms data into CIM, OCSF, UDM, ASIM, LEEF, and more using AI while enriching with STIX/TAXII threat intel. Axoflow auto-classifies logs, applies schema mapping, and enriches with metadata. Beacon aligns to ECS, OCSF, CIM, and UDM while combining cross-source context through Recipes. CeTu provides AI-assisted normalization with lookup enrichment and threat intel overlays. Brava enriches telemetry with attack-simulation context, relevancy scoring, and MITRE mappings. Cribl enriches data through lookups, Redis, GeoIP, and DNS, with schema drift detection forthcoming.

Intelligent Routing and Multi-Destination Control

Routing decisions are becoming value-based and policy-aware across vendors. Abstract recommends routing based on detection value and cost. Axoflow uses classification labels to drive automated routing. Beacon’s AI-guided posture directs logs to SIEM, data lakes, or cold storage based on importance. CeTu’s Zoe assistant selects routes tied to analytic relevance, cost, and detection needs. Databahn’s Cruz AI evaluates query patterns and detection impact to recommend tiering and routing paths. Brava routes high-efficacy logs forward while summarizing or filtering low-value data. Cribl Stream provides granular routing from any source to any destination, using Copilot to generate logic from plain language.

Integration Health and Coverage Insights

Vendors now offer deep insight into data coverage, stability, and silent failures. Abstract detects silent dropout, schema drift, and volume anomalies with automated parser correction. Beacon’s Logging Posture highlights missing telemetry and coverage gaps using its Collectopedia knowledge base. Databahn scores source health based on quality, completeness, drift, and destination stability. Axoflow alerts on missing sources, unexpected new sources, and message drops. Brava maps coverage gaps using attack simulation aligned to MITRE techniques. CeTu VISION analyzes SIEM coverage and highlights blind spots across environments. Cribl Insights surfaces backpressure, drops, latency, and health issues across Stream, Edge, and Lake deployments.

AI-Assisted Pipeline Management

AI is maturing into a core operational layer in nearly all vendors. Databahn emerged as a leader in their AI capabilities and maturity. Abstract’s ASE generates parsers, manages drift, builds pipelines, and enriches detections. Axoflow uses supervised AI for classification, schema mapping, and natural-language pipeline creation. Databahn’s Cruz automates parser generation, correction, routing intelligence, and ecosystem-specific model transformations. Cribl’s Copilot assists with schema mapping, routing logic, and query generation. Beacon applies agentic reasoning to Recipes, posture, schema mapping, and normalization. CeTu’s Zoe and DEPTH engines power routing decisions, drift detection, and pattern intelligence. Brava uses AI to evaluate telemetry efficacy through attack simulations and relevancy scoring tied to detection strength.

SDP PLUS Platforms

In addition to core pipeline features, we are seeing an emerging trend in which pure-play SDP platforms aim to expand beyond traditional pipeline capabilities. These include in-house data lake options with tiered storage, AI-assisted capabilities, threat detection and analytics in the pipeline layer, federated search and querying across SIEMs and data lakes, observability convergence, and AI SOC-like capabilities. Here are some of the features we saw among the vendors we analyzed.

SDP PLUS Platform Features.

Data Lakes and Tiered Storage

Vendors increasingly offer storage and replay layers that extend pipelines into long-term retention. Abstract provides Lake Villa with hot, warm, and cold tiers and real-time querying. Cribl offers Cribl Lake and Lakehouse for open-format retention and fast access to recent data. Axoflow includes AxoStore, AxoLocker, and AxoLake as part of its multi-layer storage design. Databahn offers optional tiered lake storage for customers needing centralized history. Brava supports seamless retrieval from low-cost storage directly through the SIEM. CeTu does not mandate a lake but provides a unified architecture that can route to object stores or archival platforms. Beacon avoids owning storage but enables routing to cold tiers across customer-controlled buckets.

Search, Querying, and Federated Visibility

Search and query capabilities are expanding directly from the pipeline layer. Cribl Search enables search-in-place across S3, Edge, Lake, and external object stores. CeTu offers cross-system querying that works across SIEMs and data lakes without a query language. Brava embeds natural language querying within the SIEM and retrieves cold data transparently. Databahn supports micro-indexing for rapid search across raw pipelines. Abstract allows real-time queries over normalized data through Lake Villa. Axoflow’s debugging and inspection tools show raw and parsed data side-by-side to support source comprehension. Beacon provides transformation previews and exploratory data analysis to validate pipeline accuracy.

Shifting Detections to the Stream

Threat detection in stream represents a shift toward identifying malicious activity as data flows through the pipeline rather than after it lands in a SIEM or data lake. The idea is to evaluate events in real time, applying lightweight correlation, IOC checks, and contextual signals before logs are indexed, which can reduce dwell time and provide earlier visibility into suspicious behavior. This approach also allows detections to carry enriched context downstream, improving the quality of alerts and investigations. While teams appreciate the speed and proximity to the source, most see in-stream detection not as a replacement for SIEM analytics but as a complementary layer that helps surface high-value signals, reduce noise, and begin the investigative process earlier. Vendors such as Abstract Security, VirtualMetric, and Tenzir, along with the acquired pipeline platforms, already offer some threat detection capabilities. Realm Security is another entrant that plans to add this to its roadmap in the near future.


Evaluation Framework

Security data pipeline platforms deliver lower SIEM and storage costs at the minimum, but they also provide higher detection quality, better data governance, faster investigations, safer AI adoption, more resilient telemetry and freedom from vendor lock-in.

Most importantly, they shift teams from reactive ingestion problems to proactive control of the entire data lifecycle. They are not a sidecar tool. They are becoming the backbone of modern SecOps architecture.

In order to evaluate these vendors in depth, we conducted several deep-dive platform demos, used detailed questionnaires with linked evidence and screenshots to validate responses, and interviewed their customers to confirm our findings. Here are the broader categories under which the vendors were evaluated:

Evaluation Framework Categories.

Vendors

Vendor Ranking Disclaimer.

Disclaimer: The image above does not depict exact rankings. Please see the ranking below and the spreadsheet with the technical and GTM assessment. Note that in-depth details are not contained within the sheet in order to maintain a clean format; those details are covered within each vendor’s section to show how these rankings were made.

From our in-depth analysis, we found the pipeline platforms to rank as below –

  • Overall Market and Category Leader: Cribl
  • Pipeline Leaders: Databahn, Datadog OP, Abstract Security, Observo AI, Onum
  • Emerging Leaders: CeTu, VirtualMetric, Tenzir, Axoflow, Datable, Realm Security
  • Innovators: Brava, Beacon Security

The following top vendors were evaluated with a thorough platform demo, an in-depth questionnaire (the answers of which were verified via demo and screenshots), and direct customer and practitioner feedback.

In alphabetical order and no particular ranking, all details on vendors below —

Abstract

Abstract defines itself as a modular, vendor-neutral security data platform that merges pipeline flexibility with early-stage detection. Their focus is on “shift left” threat detection, bringing analytics closer to the data source before it reaches the SIEM layer. Abstract Security’s strength and core differentiator lies in its threat detection capabilities, delivered by its in-house research group, ASTRO, which builds and maintains out-of-the-box detection content and indicators of compromise. Abstract is one of the few SDPP vendors with native detection depth in the pipeline, in addition to pipeline data routing capabilities.

Voice of the Customer

We interviewed a customer of Abstract Security to share their experience with Abstract and their vision for such a platform.

Life before Abstract

“We were in the process of moving from an on prem SIEM solution to a cloud SIEM. What we found is that the current data usage, if we lifted and shifted, would be 2x our annual budget, each month! So we had to figure out how to get the data we needed to our cloud SIEM, meet requirements, but also meet the budget. Exactly what the original use case was. Replacing on-prem SIEM to cloud SIEM, control costs, and really understand the logs. Because of Abstract we have been able to store more data, and at the same time reduce our storage, thus saving money.”

Most used capabilities within Abstract

“Abstract allowed us to understand what data was coming in, letting us determine what we needed, compress that data, and then move it into our cloud SIEM. We had some sources that we were able to reduce and compress over 90% of the original log size.”

What they’d like to see more of

The customer’s early feedback referenced local replay options, which Abstract has since delivered through Lake Villa’s tiered storage and hybrid replay capabilities. “We do have some large on-prem storage clusters. What our ideal solution would be is the ability to store logs on prem as well, and allow the data to be replayed to the cloud SIEM as needed. Basically bulk storage options to keep data around that isn’t worth the cloud storage costs.”

Architecture and Deployment Maturity

Abstract offers flexible deployment options supporting SaaS, managed SaaS and hybrid.

The architecture is split into “Console” which is delivered in SaaS and “Pipeline Engine” which can be deployed in SaaS or can be self hosted.

Console can be delivered as SaaS (from Abstract) or can be deployed within customer’s cloud environment (AWS, GCP, Azure) while being operationally managed by Abstract – on-time upgrades, health monitoring and resiliency. The core engine runs in the cloud (customer or Abstract hosted), but forwarders can also be deployed in customer environments (cloud or on prem).

  • Marketplace: Available across AWS, GCP, and Azure.
  • MSSPs: Abstract provides multi-tenancy with role-based and event-level access controls to support MSSPs. Partnerships in place, but exact list unknown. Company is currently engaged with a Big Four firm on deployment.
  • Compliance: SOC 2 Type II; FedRAMP in progress.

Pricing

Abstract uses flexible, consumption-based pricing with options for SaaS-hosted and customer-hosted deployments.

Hosted by Abstract (SaaS)

  • Ingestion-based pricing within a broad range to avoid frequent overage charges
  • Each plan includes up to 1 TB of excess data, with true-up at renewal

Customer-Hosted Deployments

  • Enterprise licensing not tied to ingestion
  • Pricing based on deployment size and infrastructure footprint for predictable spend

Pricing Assistance

  • 1 TB of leeway for SaaS plans before adjustments are needed
  • Cost planning support on preliminary calls, including custom estimates based on data volume and architecture
  • Hybrid model gives customers SaaS-style simplicity while enabling local edge processing for cost, performance, and compliance benefits

Data Collection and Integrations

Abstract supports multiple data collection methods.

Customers can run Abstract in their own cloud or use a hosted console, with optional on‑prem forwarders for local processing. Integrations are largely built in‑house and designed to be no‑code in the UI, with exportable YAML and Python for teams that prefer code. Abstract emphasizes deep SaaS and cloud integrations where APIs and schema drift make dynamic content more valuable than basic transport-only connectors.

  1. Local forwarder and OTEL collection: Lightweight forwarders run on premises or in customer VPCs to collect and pre process data close to the source. Abstract supports OpenTelemetry compatible listeners and OTEL collectors for standardized collection in hybrid environments.
  2. API based agentless collection: For SaaS, cloud platforms, and managed services, Abstract performs authenticated API pulls on a schedule. This is a core focus area, with deep coverage for complex SaaS sources and cloud services where schemas change frequently.
  3. Webhook / push based streaming: Where native push delivery exists, Abstract receives events via HTTP webhooks for near real time ingestion, reducing latency for event driven systems.
  4. Broker and stream integrations: Abstract integrates with enterprise messaging and streaming systems such as Kafka, Kinesis, Pub/Sub, Event Hubs, and notification queues like SNS and SQS to support high throughput pipelines.
  5. Syslog, HTTP, and object storage: Abstract accepts generic transports including Syslog, HTTP, and object storage buckets for bulk or batched ingestion, with parsers and normalization to OCSF, ECS, or custom schemas. The platform distinguishes complete integrations that include parsers, pipeline functions, detections, and dashboards from generic transport only feeds.

Number of integrations: Abstract currently supports 242 out-of-the-box integrations, with room for expansion given the collection methods supported.

Core Pipeline Capabilities

Diving deeper into core pipeline capabilities.

Abstract Pipeline Capabilities 1
Abstract Pipeline Capabilities 2
Abstract Pipeline Capabilities 3

Additional Pipeline Capabilities

  • Immutable raw data storage: For compliance
  • Local compression & lightweight processing by forwarders
  • Stream X-Ray Recommendations: The pipeline engine continuously analyzes telemetry in-stream and surfaces the most valuable fields.
  • Observability Convergence: Pipelines treat security and observability telemetry equally, making it possible to blend logs, metrics, and traces into a single streaming fabric.
  • Exporting/importing of pipelines as YAML for version control and portability

Pipeline Building Experience

  • Automatic Default Creation: Pipelines are auto-generated when new data sources connect, with schemas detected and flows built without manual setup.
  • Manual Configuration: Users can adjust or extend pipelines for custom filtering, enrichment, or routing as needed.
  • Drag-and-Drop Builder: A visual canvas lets analysts connect sources to destinations using a simple drag-and-drop interface, eliminating code.
  • Pipeline DSL (YAML): Advanced users can define transformations and routing declaratively for CI/CD automation and version control.
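
To show what a declarative, version-controlled pipeline definition can look like in general terms, the sketch below builds a pipeline as a Python dict and exports it as YAML. This is a generic illustration, not Abstract’s actual DSL; the stage names and keys are hypothetical, and the example assumes the PyYAML package is installed.

```python
# Generic illustration of a declarative pipeline exported as YAML for version control.
# NOT Abstract's actual DSL; stage names and keys are hypothetical.
import yaml  # pip install pyyaml

pipeline = {
    "name": "firewall-to-siem",
    "source": {"type": "syslog", "port": 514},
    "stages": [
        {"normalize": {"schema": "ocsf"}},
        {"enrich": {"lookups": ["geoip", "asset_inventory"]}},
        {"reduce": {"drop_fields": ["debug_msg"], "dedup_keys": ["source_ip", "action"]}},
    ],
    "destinations": [
        {"siem": {"when": "severity in ['high', 'critical']"}},
        {"data_lake": {"tier": "warm"}},
    ],
}

print(yaml.safe_dump(pipeline, sort_keys=False))   # commit this artifact for CI/CD review
```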

AI Maturity

ASE, the Abstract Security Engineer (an AI assistant), enables natural language pipeline creation, integration setup, and schema mapping with optional YAML or JS export for advanced users. It automates schema normalization, drift correction, and propagation across pipelines, reducing manual upkeep. ASE also extracts indicators of compromise from unstructured sources to continuously enrich the Intel Gallery.

AI correlates entities, automates data redaction, and optimizes enrichment depth to support real-time detection and compliance filtering. Stream X-Ray adds AI-driven observability by identifying high-volume or anomalous sources and recommending reduction actions. ASE also serves as the analyst interface, summarizing detections and insights in natural language, with plans to expand into full analytic narratives for enhanced explainability.

Additional AI Capabilities on Roadmap

Abstract is building Sigma-based AI rule conversion and automated correlation design under its detection engineering roadmap. Generative detection narratives and behavioral modeling are also planned.

Data Privacy with AI:

  • Access control and isolation: Multi-tenant by design with role-based and event-level access controls, allowing MSSPs to grant visibility while keeping tenants strictly separated.
  • Human oversight model: “Insights” workflow correlates noisy Findings into analyst-ready Insights where humans and AI collaborate on investigation and enrichment.

Integration Health Monitoring

Abstract Integration Health Monitoring

Additional Capabilities beyond Pipelines

This section notes capabilities that the vendor may provide beyond pure pipeline features

Data Lake – Lake Villa

Abstract’s Lake Villa is a built-in, smart data lake designed for high-performance storage and real-time querying of security telemetry. It doesn’t need rehydration delays and supports replay to other destinations or the Abstract Streaming Correlation Engine. Lake Villa can be hosted by Abstract or deployed in any major cloud or private environment, with tiered storage (hot, warm, cold) for cost-optimized retention. Fully integrated with the Abstract pipeline, it enables real-time queries on enriched, normalized data while remaining open and extensible for forwarding to external SIEMs or data lakes.

Tiering models: Hot, Warm and Cold

Threat Detection Engine

Abstract runs detections directly in the data stream with out of the box threat detection content. It measures detection effectiveness by mapping visibility gaps to real data sources and rules. The Model Service enriches events with identity, asset, vulnerability, and threat intelligence in a unified data fabric. Stream X-Ray recommends which fields to filter or aggregate, improving signal quality and reducing false positives.

Vision

Abstract’s vision is to be a decomposed SIEM for the modern era: a security data fabric that unifies observability, AI-powered detection, and response in a modular architecture. Their aim is to be the world’s first composable SIEM, challenging the traditional SIEM model and reimagining what a security operations platform should look like in an age of exploding telemetry, multi-cloud sprawl, and AI-driven adversaries.

Analyst Take

Here is what we see as the major strengths and opportunities for improvement for Abstract –

Strengths

Abstract Security stands out for its streaming-first design that detects threats as data flows, cutting response times and lowering dependence on analytics at SIEMs. Its built-in health monitoring and schema drift automation keep pipelines stable without constant tuning, a mark of operational maturity. Stream X-Ray shifts the focus from collecting more logs to collecting high quality ones, improving both fidelity and cost efficiency. Lake Villa is another area where customers can utilize in-house capabilities without needing an external dependency.

All in all, Abstract could be a good fit for organizations looking for a modular architecture that bridges security data pipeline capabilities with analytics and storage.

Areas to Watch:

Abstract’s opportunity lies in strengthening the autonomy, transparency, and breadth of its AI capabilities. While ASE assists with schema and pipeline management, evolving these capabilities toward self-healing and predictive automation would bring the platform closer to AI maturity. Adding clearer audit trails, lineage tracking, and data-handling guardrails would improve AI privacy trust and compliance readiness. Finally, Abstract’s ambition to replace the SIEM stack is bold but challenging for an SDPP vendor entering a market dominated by mature players like Splunk. The vendor reported that it recently secured a large enterprise win in which its composable SIEM was selected following a direct comparison with a leading SIEM provider. However, in an established SIEM industry with years of maturity, demonstrating consistent outcomes across a broader range of customers and environments remains important for assessing its overall competitiveness. Expanding coverage analytics and federated search would also round out the functionality comparison for such a vision. To succeed, the company will need to continue to demonstrate end-to-end depth across detection, data management, and operational scalability.

Axoflow

Axoflow is one of the promising newcomers in the Security Data Pipeline Platforms category. The company raised a $7M seed round earlier this year and has been innovating quickly. Axoflow provides an automated approach to security data ingestion, focusing on source identification, curation, and routing rather than regex-driven configuration. Its support for hybrid and air-gapped deployments, along with an agent-agnostic collection model, makes it suitable for organizations with heterogeneous environments and stricter control requirements. The use of supervised AI for classification and natural-language pipeline creation shows a clear focus on reducing manual effort in managing telemetry pipelines.

Architecture and Deployment Maturity

Axoflow offers flexible deployment options supporting SaaS, hybrid, self‑managed, and air‑gapped models. Console and Pipeline components can run in any mix to meet environment and control requirements.

  • Marketplaces & Vendor Programs: Splunk Partnerverse, Cisco partner, Sumo Logic partner, and the only security data pipeline vendor with Google Private Service Connect; AWS and additional Google integrations are in progress.
  • MSSP Partners: DTAsia, PCSS, Sekom, SOS, WavePort Security, Kyndryl, Ether Gulf Enterprise, CloudSpace, Securelytics, Stefanini.
  • Compliance: SOC 2 Type II and ISO 27001.

Pricing

Ingestion-based licensing model aligned to the volume of data or number of data sources onboarded.

Pricing Assistance: None provided.

Data Collection and Integrations

Axoflow’s platform supports multiple collection methods, gathering data through various protocols and agents while remaining compatible with existing infrastructure.

  1. Protocols: Axoflow can receive logs via syslog, HTTP, and other protocols.
  2. Agents: Axoflow supports multiple agent types for data collection, including native agents and OpenTelemetry (OTEL) collector-based agents for Windows and Linux.
  3. API Integrations: The platform uses API-based pull agents to collect data from cloud sources such as CloudWatch and Azure Event Hub.
  4. Infrastructure Compatibility: Axoflow is compatible with existing infrastructure such as syslog-ng and Splunk UF/HF, and integrates with over 150 security products as data sources.
  5. Deployment for Collection: The AxoRouter processing engine can receive logs, and the entire Axoflow console, data sources, and processing engine can be deployed locally. Installation requires only a one-line command on a Linux host or Kubernetes.

Total number of out-of-the-box integrations: 154; the effective number can be higher given the generic ingestion methods supported.

Core Pipeline Capabilities

Axoflow Pipeline Capabilities 1
Axoflow Pipeline Capabilities 2
Axoflow Pipeline Capabilities 3

Additional Pipeline Capabilities

  • Log Tapping and Side-by-Side Debugging: Axoflow allows users to inspect raw and parsed data simultaneously, with highlighting to show how each pipeline step transforms the event.

Pipeline Building Experience

  • FilterX Pipeline Syntax and Declarative Routing Model: Axoflow uses FilterX, a domain-specific language for log processing, along with declarative routing driven by automatically applied labels. This allows users to define transformations and routing logic without relying on regex-heavy configurations (a conceptual sketch of label-driven routing follows this list).
  • Visual Pipeline Builder: Axoflow provides a visual interface that allows users to create pipelines using point-and-click elements, reducing dependence on scripting and simplifying troubleshooting.
  • AI-Assisted Pipeline Creation: Axoflow supports natural language search and pipeline building, enabling users to describe goals in plain language and have the system generate corresponding pipeline logic. An integrated AI assistant provides contextual help and documentation.
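To make the label-driven routing idea concrete, here is a minimal Python sketch of how classification labels might drive destination selection. This is purely illustrative: it is not FilterX syntax, and the labels, field names, and destinations are assumptions rather than Axoflow's actual configuration model.

```python
# Conceptual sketch of classification-label-driven routing (illustrative only;
# ordinary Python, not Axoflow's FilterX syntax).
ROUTES = {
    "firewall": ["siem"],
    "flow": ["object_storage"],
    "auth": ["siem", "data_lake"],
}

def classify(event: dict) -> str:
    """Stand-in for the platform's automatic classification step."""
    if "action" in event and "src_port" in event:
        return "firewall"
    if "bytes_sent" in event:
        return "flow"
    return "auth"

def route(event: dict) -> list[str]:
    """Attach a label and return the destinations it maps to."""
    label = classify(event)
    event["label"] = label
    return ROUTES.get(label, ["siem"])  # default destination if unlabeled

print(route({"action": "deny", "src_port": 443}))  # ['siem']
```

The point of the declarative model is that routing follows labels assigned by the platform, so adding a new destination means editing a mapping rather than rewriting per-source regex logic.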

AI Maturity

Axoflow positions its platform at a more advanced level of AI maturity by taking responsibility for maintaining data pipelines rather than using AI as just an assistive layer. The company’s AI capabilities center on an AI-assisted decision tree that identifies data as it flows through the platform and applies classification labels that drive routing, curation, normalization, and reduction decisions. This classification engine uses supervised AI to adapt to changes in source data and manage schema drift without manual tuning. This capacity for autonomous change removes human-in-the-loop supervision and reflects a higher level of AI maturity.

AI also supports natural-language search and in-product pipeline building, allowing users to describe transformations or routing behavior through natural-language queries rather than hand-written logic.

An AI assistant is built into the console to help with documentation, guidance, and configuration support. Axoflow also applies automated, source-aware reduction and curation informed by its classification logic, improving downstream detection quality and reducing noise before data reaches the SIEM. Additional AI features, including stream-processing and threat intelligence enrichment, are in progress.

Data Privacy with AI: Provided through tenant-level separation.

Integration Health Monitoring

Axoflow Integration Health Monitoring

Additional capabilities

This section notes capabilities that the vendor may provide beyond pure pipeline features.

Data Lake – Multi-Layer Storage Architecture

Axoflow supports scalable ingestion and long-term analysis through a layered storage model. AxoStore provides edge storage used for local buffering and troubleshooting, Axoflow Locker serves as a standalone log-store appliance for remote operations, and AxoLake delivers a petabyte-scale tiered data lake for extended retention and analytics.

Vision

Axoflow’s vision is a decentralized SIEM model in which SIEM functions such as curation, storage, policy enforcement, and analytics are no longer performed in the SIEM but move closer to the pipelines through which data streams. AI supports the process by automating key steps, acting as an enabler rather than a shortcut.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Axoflow –

Strengths

Axoflow’s strengths lie in its automation depth and operational flexibility. The platform offers broad deployment choice across SaaS, on-prem, hybrid, and air-gapped environments, which positions it well for regulated or distributed estates. Its classification-driven engine is another differentiator, handling reduction, normalization, and routing without regex or manual tuning, and the ability to surface both raw and parsed data simplifies troubleshooting. Integration health is another strong point, with detailed metrics on drops, delays, queues, and host resources, paired with automatic source inventory to reduce blind spots. Axoflow also brings an uncommon storage footprint for a pipeline vendor, offering edge collection, a standalone log-store appliance, and a tiered lake for long-term retention. Combined with broad ingestion support, these capabilities put the platform’s ambition to minimize operational overhead and absorb SIEM-adjacent functions on a credible track.

Areas to Watch

Axoflow’s schema handling is automated, but it lacks explicit drift alerting, which may matter for teams with strict governance or change-control requirements. The company is still early in its market maturity, though the leadership team’s background is promising. Axoflow could invest in areas such as out-of-the-box content packs and AI capabilities beyond reduction and curation. Pricing support also appears less structured, with limited guided TCO tooling, which can slow enterprise evaluation. While the platform has strong foundations, like any emerging entrant in this category it will need to expand its ecosystem presence, operational guardrails, and packaged content to execute on its vision.

Beacon Security

Beacon Security is a promising newcomer to the security data pipelines market category, having come out of stealth just this month (November 2025). Beacon enters the security data pipeline space with a focus on content-aware “Recipes,” agentic AI assistance, and stable ingestion mechanics aimed at improving data quality while controlling cost. Based on early materials and briefings, the platform is positioned as a security-focused data fabric built to reduce operational overhead and support downstream detection and investigation across SIEMs, data lakes, and emerging AI workflows.

Voice of the Customer

We were able to interview a Beacon customer (a US-based financial firm) about their experience with the platform. Here is what they said –

Life before Beacon

“The main challenges our organization faced prior to adopting Beacon were related to infrastructure burden, standardization, and log volume management. In the world of endless SaaS applications, every vendor provides logs differently, or sometimes not at all. Some logs come via webhook, others might use an external syslog over different protocols, and most are polling APIs. On top of collection issues, there is no agreed upon standard for log normalization (schema). We tried to address this internally using open-source tooling, but we ran into significant limitations. It was difficult to gather logs from one-off vendors in an elegant or efficient way. This required us to build a lot of custom infrastructure internally just to support every single log source. The other major issue was the log volume problem. There is so much data flying around, and while we can’t simply not gather the data, we also don’t need absolutely everything. We needed a reliable way to consolidate logs without sacrificing meaningful information. Beacon’s capabilities exactly address these specific challenges.

The future of Beacon within our organization involves expanding its presence to cover internal data. We currently still have quite a few logs that are internal and do not yet go through the platform. While the collection of these internal logs isn’t the primary issue, we plan to ingest these logs through Beacon as well. We would utilize this expansion specifically for the purposes of normalization, log reduction, ultimately leading to better data at a lower cost.”

Most used capabilities within Beacon

“Beacon’s capabilities were the exact solution to our organizational challenges. The top capabilities we rely on are:

  • Comprehensive Log Gathering: Beacon can gather logs from essentially any system in any way we ask them to.
  • Normalization: They will normalize the logs to any schema we request using their internal AI agent.
  • Log Volume Reduction: Beacon helps us reduce log volume by analyzing the logs and combining similar logs within short periods of time.”

What they’d like to see more of

“Data volume and SIEM costs these days are astronomical – we would love to see a combination of in-pipeline detections paired with a full self service feature to send logs directly to cold storage while having the ability to rehydrate them to send to our SIEM. This would drastically reduce our SIEM spend while not lowering our security posture.”

Architecture and Deployment Maturity

Beacon provides flexible deployment options. Its architecture is split into a SaaS control plane with either a SaaS-managed engine or customer-hosted engines deployed in Bring Your Own Cloud (BYOC) or on-premises environments. Pipelines include persistent queues, replay, and exactly-once delivery to ensure durability and reliability.

  • Marketplace: Beacon is listed in AWS Marketplace
  • MSSPs: Collaborates with MSSPs in joint projects for customers (not co-sell).
  • Compliance: SOC 2 Type II, ISO/IEC 27001, HIPAA

Pricing

Ingestion-based pricing model.

Pricing Assistance: The ingestion plan takes into account the types and value of data sources.

Data Collection and Integrations

Beacon allows normalization and routing across environments without sending raw log data outside customer control. Supported sources include APIs, Syslog, Webhooks, OpenTelemetry, and major cloud storage platforms. Normalization aligns to schemas such as OCSF, ECS, CIM, UDM, and ASIM, with multi-destination routing supported through exactly-once delivery and persistent queuing.
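As a rough illustration of schema normalization, the sketch below renames raw fields into a nested, OCSF-flavored structure. The mapping and field names only loosely follow OCSF conventions and are assumptions for illustration, not Beacon's actual Recipes or the official schema.

```python
# Illustrative field mapping into an OCSF-flavored structure (the mapping and
# field names are assumptions, not Beacon's Recipes or the official OCSF schema).
RAW = {"ts": "2025-11-02T10:15:00Z", "usr": "alice", "ip": "192.0.2.4", "outcome": "failure"}

MAPPING = {
    "ts": "time",
    "usr": "actor.user.name",
    "ip": "src_endpoint.ip",
    "outcome": "status",
}

def normalize(raw: dict, mapping: dict) -> dict:
    """Rename raw fields into a target schema, nesting on dotted paths."""
    out: dict = {}
    for src_key, dst_path in mapping.items():
        if src_key not in raw:
            continue
        node = out
        *parents, leaf = dst_path.split(".")
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = raw[src_key]
    return out

print(normalize(RAW, MAPPING))
```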

Ingestion Approach

  1. OTEL Collectors: Supports OTEL collectors for standardized, vendor-neutral telemetry collection across cloud and on-prem environments.
  2. API-Based Integrations: Connects directly to platforms via REST APIs, enabling agentless collection from services like AWS, Azure, and Google Cloud. Also pulls data from cloud storage platforms including Amazon S3, Azure Blob Storage, and Google Cloud Storage.
  3. Syslog and Raw TCP: Ingests data via traditional syslog (UDP/TCP) and raw TCP sockets, supporting legacy infrastructure and security appliances.
  4. WebSocket and Webhook: Handles event-based ingestion via WebSocket streams and HTTP webhooks for modern, real-time data sources.
  5. Filesystem and STDIN: Supports file-based log ingestion and standard input streams, enabling local testing or lightweight deployment scenarios.
  6. Forwarders and Event Streams: Integrates with log forwarders like Vector and streaming platforms such as Azure Event Hub, AWS Kinesis, and Kafka for high-throughput environments.

Integrations Count: Over 100 when grouped by types of events within each integration and the collection method supported.

Core Pipeline Capabilities

Diving deeper into core pipeline capabilities

Beacon Pipeline Capabilities 1
Beacon Pipeline Capabilities 2
Beacon Pipeline Capabilities 3

Additional Pipeline Capabilities

  • AI-Guided Routing and Logging Posture: Beacon’s AI-driven Logging Posture capability identifies important telemetry gaps and recommends routing missing but relevant data to appropriate analytics destinations.
  • Late Arrival and Exactly-Once Processing: Supports lossless data delivery with stream correlation, replay handling, and tolerance for out-of-order events to ensure reliable and complete ingestion.
  • Live Recipe Validation and Decision-Supporting EDA: Enables real-time transformation-level preview, exploratory data analysis (EDA), optimization metrics, and testing of transformations. Users can verify schema accuracy, understand Recipe logic, and customize transformations based on statistics and context.
  • Modular Cloning: Allows users to duplicate and reuse existing pipelines or Recipes across environments for faster scaling.
  • Regex-Based Data Shaping: Provides flexible parsing and field extraction for complex or unstructured log formats.
  • Governance and Masking: Applies field-level classification and masking for sensitive fields to maintain compliance.
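As a small illustration of the field-level masking idea in the last bullet, the sketch below hashes values in a configurable list of sensitive fields. The field list and masking style are assumptions, not Beacon's defaults.

```python
# Minimal field-level masking sketch (illustrative; the sensitive-field list
# and hashing approach are assumptions, not Beacon's implementation).
import hashlib

SENSITIVE_FIELDS = {"ssn", "credit_card", "email"}

def mask_event(event: dict) -> dict:
    """Replace sensitive values with a short, stable hash so events stay correlatable."""
    masked = {}
    for key, value in event.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256(str(value).encode()).hexdigest()[:12]
            masked[key] = f"masked:{digest}"
        else:
            masked[key] = value
    return masked

print(mask_event({"email": "alice@example.com", "action": "login"}))
```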

Pipeline Building Experience

  • AI Chatbot Interface: AI assistant for guided pipeline creation.
  • Visual Workspace: Build and manage pipelines using a visual editor.
  • JSON/YAML Upload: Import configurations directly by uploading structured JSON or YAML files.
  • Point-and-Click Recipes: Use prebuilt Beacon Recipes with a simple interface to apply transformations without writing code.

AI Maturity

Beacon incorporates multiple layers of agentic and AI-driven intelligence across its platform. This includes schema-aware mapping and validation for target formats like ECS and OCSF, as well as Recipe recommendations that guide enrichment and pipeline optimization. Its logging posture and data discovery features help identify critical telemetry that may be missing from current collection, supporting improved detection coverage.

The platform’s agentic fabric intelligence is also oriented toward building entity and context graphs to support future AI-SOC workflows. BeaconBot, currently integrated with Slack, allows users to query platform state via API. The roadmap includes expanding this capability to support controlled write operations on data streams, as well as access via an MCP server, further embedding automation into pipeline operations.

Data Privacy with AI: Beacon’s BYOC and on-premises models keep customer data within their environment.

Integration Health Monitoring

Beacon Integration Health Monitoring

Additional Capabilities

This section notes capabilities that the vendor may provide beyond pure pipeline features.

Collectopedia

Beacon maintains Collectopedia, a structured knowledge base that maps security data sources, schemas, and field semantics to use cases, MITRE ATT&CK tactics, and investigative value. It supports Beacon’s normalization, enrichment, and discovery features by helping both AI agents and users understand the purpose and relevance of each field. End users interact with Collectopedia through capabilities like telemetry discovery, compliance mapping, and MITRE tactic-level visualizations.

Vision

Beacon’s goal is to become the security data and context layer, using pipelines and the Beacon Fabric to unify collection, optimization, normalization, enrichment, and governance across environments. It is evolving toward enabling both analysts and AI systems to reason over high-quality, contextualized security data.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Beacon –

Strengths:

Beacon is strongest at the intersection of security context and pipeline engineering. Its practitioner-defined Recipes, schema mapping to ECS and other standards, and support for exactly-once delivery and late-arrival handling offer reliability and flexibility. Logging Posture and Data Discovery help teams quickly identify what telemetry to collect and where to route it, reducing setup time and cost. Beacon avoids native storage and indexing by design. Data stays in the customer’s existing SIEM, object store, or data lake, while Beacon handles routing, normalization, and enrichment. This supports portability and aligns with federated analytics models, though it requires integration with downstream tools for search. For teams prioritizing AI-assisted posture, schema alignment, and operational control across open telemetry environments, Beacon offers a differentiated approach.

Areas to Watch:

Coverage signals are available through Logging Posture and Data Discovery, though a dedicated coverage dashboard is not yet available; it is planned on the near-term roadmap and has been confirmed in preview. Programs with reporting or governance needs should assess the current level of depth. Detection content is currently destination- and partner-led, which keeps the focus on data fabric and context, but buyers expecting bundled detections should evaluate how well this aligns with their content strategy. Beacon’s approach to schema drift is to absorb changes through Recipe updates rather than surfacing them to users by default. This reduces noise and day-to-day toil for most teams, but those who prefer more visibility will need to adjust the defaults. AI capabilities are a growing area of interest within the pipeline industry, and we look forward to seeing continued investment from Beacon in this area as it grows.

Brava

Brava is currently in stealth and takes an interesting approach to telemetry routing. While the company doesn’t position itself as an SDPP (security data pipeline platform), it addresses similar use cases, extending beyond routing and cost reduction to improving threat detection and overall telemetry efficacy. The platform can operate alongside an SDPP or as a standalone solution.

Brava positions itself as a telemetry efficacy layer that sits just in front of security data pipelines and SIEMs. Its platform uses AI-driven attack simulation, a continuously updated knowledge base of log signatures, and even undocumented APIs to uncover how attacks expose themselves across telemetry. Brava enriches, filters, aggregates, and tunes detection to optimize the threat detection process while reducing costs.

Architecture and Deployment Maturity

Brava’s management console is delivered as a multi-tenant SaaS platform hosted and operated by Brava, providing the user interface, API access, TTP coverage mapping, and configuration management. Users connect securely through a browser or API, with all configurations and metadata stored in Brava’s cloud environment.

The data pipeline engine can run as a managed SaaS service or be deployed within a customer’s environment, such as a cloud account, VPC, or other infrastructure. It handles log collection and forwards data to the Brava platform for normalization, reduction, and enrichment, where Brava maintains read access.

A fully on-premises version of the pipeline engine is on the roadmap.

  • Marketplace: AWS, also highly integrated with Cribl (Cribl Packs).
  • MSSPs: Mobia.
  • Compliance: SOC 2, ISO 27001.

Pricing

Environment-size-based model.

Pricing is based on the size of the environment (e.g., number of cloud resources, number of firewalls), with the goal of keeping costs predictable.

Pricing Assistance: Brava provides an estimated pricing calculator.

Data Collection and Integrations

Brava supports multiple data collection methods. Brava integrates with major cloud and identity sources such as AWS, Azure, GCP, Okta, Entra ID, and custom HTTP feeds for unsupported data like syslog. Data can be routed to destinations like Splunk, Microsoft Sentinel, Cribl, Snowflake, and major cloud storage services. Automation and ticketing integrations include Tines, ServiceNow, Slack, Teams, and Opsgenie.

  1. API-Based (Agentless) Integrations: Brava connects directly to major cloud and identity platforms such as AWS, Azure, GCP, Okta, Entra ID/Active Directory, and O365 through API-based ingestion. This method removes the need for local agents and provides secure, low-maintenance data collection.
  2. OpenTelemetry (OTEL) and Streaming: Supports ingestion via OpenTelemetry collectors and Kafka streams, allowing standardized telemetry capture from modern applications and infrastructure. This approach ensures compatibility with existing observability and security data pipelines.
  3. Custom HTTP Ingest: Offers flexible HTTP endpoints for proprietary, legacy, or unsupported data sources, including syslog, netflow, and custom application logs.
  4. Push Models: Data can be pushed or streamed to Brava.

Total number of integrations out of the box: 29, though this can be expanded based on the collection methods supported.

Core Pipeline Capabilities

Brava Pipeline Capabilities 1
Brava Pipeline Capabilities 2
Brava Pipeline Capabilities 3

Additional Pipeline Capabilities

  • Seamless Retrieval from Low-Cost Storage: Enables users to query logs stored in lower-cost tiers directly from their SIEM, maintaining the same interface and experience without additional tools or knowledge of data location.
  • Natural Language Querying: Allows users to run natural language searches within the SIEM, eliminating the need to specify indexes or write query syntax.

Pipeline Building Experience

  • YAML Files: Brava represents and builds its pipelines as YAML files, though the company notes that this is not its primary focus. The platform is typically deployed alongside other pipeline tools, adding an intelligence layer that identifies gaps and enhances pipeline efficiency (a hypothetical sketch of such a pipeline definition follows).
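For readers unfamiliar with the approach, the sketch below shows what a pipeline-as-YAML definition and a minimal loader might look like. The structure, keys, and step names are hypothetical rather than Brava's actual format, and the loader assumes the PyYAML package is available.

```python
# Hypothetical pipeline-as-YAML definition and loader (illustrative only; the
# keys and step names are assumptions, not Brava's configuration schema).
import yaml  # requires the PyYAML package

PIPELINE_YAML = """
name: firewall-to-siem
source:
  type: http
  path: /ingest/firewall
steps:
  - filter: "action == 'deny'"
  - enrich: geoip
destination:
  type: splunk_hec
"""

def load_pipeline(text: str) -> dict:
    """Parse the YAML and check that the basic sections are present."""
    pipeline = yaml.safe_load(text)
    missing = {"name", "source", "steps", "destination"} - pipeline.keys()
    if missing:
        raise ValueError(f"pipeline definition is missing sections: {missing}")
    return pipeline

print(load_pipeline(PIPELINE_YAML)["name"])  # firewall-to-siem
```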

AI Maturity

Brava uses an AI-driven efficacy engine that evaluates logs at the field level, combining insights from active attack simulations with a continuously updated knowledge base. These simulations run both broad and deep attack paths, mapping telemetry to the MITRE ATT&CK framework to assess whether each log contributes meaningfully to detection or investigation.

Brava also monitors SIEM analytics to identify whether required logs are being ingested, helping teams surface gaps that impact detection coverage. This continuous feedback loop supports intelligent routing decisions and allows Brava to deliver a pipeline’s data reduction value.

Data Privacy with AI: Tenant isolation is built into the SaaS architecture. The platform supports RBAC and integrates with enterprise identity providers. Access is provided through a secured UI and API.

Integration Health Monitoring

Brava Integration Health Monitoring

Additional Capabilities beyond Pipelines

Brava provides these additional capabilities beyond core pipeline features.

Attack Simulation for Coverage Testing

Brava runs large-scale attack simulations across both sandboxed environments and the customer’s own infrastructure. These simulations test whether the current telemetry and pipeline configuration can support detection and investigation at scale.

Seamless Search and Retrieval from Storage platforms

Brava enables natural language queries from within the SIEM, removing the need for users to specify index names or write complex query syntax. Users can query archived logs stored in low-cost tiers directly from their SIEM interface. The retrieval is transparent; there is no need to manage storage locations or switch tools.

Vision

Brava’s vision is to build an AI-driven security data fabric that improves the way teams manage telemetry across their pipeline and SIEM. Rather than replacing existing tools, the goal is to evaluate each log at the field level, map it to MITRE, and use continuous attack simulation to inform routing, enrichment, and storage decisions. The longer-term direction is toward agentic workflows, where the platform actively identifies gaps, reduces noise, and shapes what gets sent to the SIEM based on actual detection value.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Brava –

Strengths

Brava stands out for its focus on telemetry efficacy, using continuous attack simulation and the living knowledge base it produces to find gaps, add security context, and guide routing and reduction. This lets teams lower ingest costs without sacrificing detection coverage. Designed to run alongside existing SIEMs and pipelines, Brava reduces adoption friction and fits into environments using tools like Splunk, Cribl, or Sentinel. It supports a wide range of integrations across cloud, identity, analytics, and automation platforms.

Features like SIEM-native log retrieval from low-cost storage and natural language querying help analyst workflows while optimizing data flow. Operationally, Brava offers tenant isolation, RBAC, SSO, and early marketplace and MSSP signals, making it a fit for teams prioritizing visibility, efficiency, and control.

Areas to Watch

Brava’s normalization is currently centered on JSON, with support for OCSF, ECS, and CIM still in progress. Integration coverage is growing but remains an area to improve, and parser updates are still handled manually despite automatic drift detection. The YAML-based builder works but may limit accessibility as buyers increasingly look for natural-language and validation tooling. Overall, we believe that traditional SDPPs (such as those covered in this report) work much better as pure routers when a customer has many destinations.

CeTu

CeTu differentiates itself through context-aware SIEM analytics at scale, a low-latency data engine designed to operate at low cost, and a proprietary pattern-intelligence layer that improves with each deployment. The platform provides an AI-driven security data pipeline aimed at optimizing telemetry flow and reducing SIEM ingestion volume. The company’s broader vision is to develop a unified, AI-driven security data fabric that simplifies data management, enhances analytical context, and strengthens both security and observability workflows. This direction is intended to support organizations in maintaining a proactive security posture.

Voice of the Customer

Life Before CeTu

The customer described a growing operational burden associated with escalating log volumes and Microsoft-based SIEM ingestion costs. Prior to CeTu, the security team faced significant pressure to manage expanding telemetry from both IT and industrial environments without a scalable cost-control mechanism.

Most Used Capabilities Within CeTu

The customer reported that CeTu’s main value was its ability to improve detection efficiency through automated processing of high-volume log streams. In their case, the platform filtered out noise from overly verbose applications and produced a clearer and more actionable security signal. According to the customer, this contributed to a stronger detection posture.

The customer also observed a secondary outcome: a 60 to 70 percent reduction in log volume. This decrease lowered ingestion costs and made it possible to onboard additional systems without exceeding capacity, a practical need in data-heavy IT environments.

While the economic effects were notable, the customer emphasized that these results were a direct consequence of CeTu’s core function of increasing visibility, speeding detection, and reducing noise before it reached downstream tools.

What They Would Like to See More Of

While satisfied with cost reduction and noise control, the customer expressed interest in deeper detection-focused capabilities, particularly around gap monitoring and AI-driven rule optimization, both of which, according to CeTu, the team has begun adopting. The customer also noted that they are looking forward to upcoming enhancements that will allow analytical AI workflows to run on lower-cost archival storage, confirming alignment with CeTu’s roadmap and long-term strategy.

Architecture and Deployment Maturity

CeTu offers flexible deployment options supporting SaaS, hybrid, and on-premise operation.

Their architecture is divided into a Console (control plane) and a Pipeline Engine (CeTu Flow). CeTu Flow is the platform’s pipeline engine, designed to provide low latency and high throughput for large-scale data processing. It was developed by the company’s CTO, who applied experience from carrier-grade switching systems to create an architecture focused on efficiency. According to the company, the system delivers high performance while requiring comparatively limited infrastructure resources, resulting in lower operational costs. CeTu also offers its own proprietary intelligence layer, CeTu Depth.

  • Marketplace: CeTu is listed in AWS Marketplace, Azure Marketplace, and CrowdStrike.
  • MSSPs: GuidePoint, Trace3.
  • Compliance: SOC 2 certified.

Pricing

Ingestion based model.

Pricing Assistance: CeTu provides pricing calculators, but these are not publicly exposed today (used via sales / SE engagement rather than self-serve).

Data Collection and Integrations

CeTu unifies telemetry ingestion across cloud platforms (AWS, Azure, GCP), SaaS, IoT, and on-premises environments, providing a single framework for collecting and routing security data. Supported inputs include OpenTelemetry (OTEL), Syslog, CloudTrail, APIs, and custom HTTP endpoints.

CeTu’s ingestion approach can be characterized across these modes:

  1. Pre-Built Connectors: More than 400 connectors for SaaS, cloud, and infrastructure sources deliver parsed, normalized data out-of-the-box.
  2. Custom Connector Builder (HTTP Sources): A low-code tool for creating connectors to proprietary or uncommon APIs, extending coverage to internal systems.
  3. AI-Assisted Parsing: CeTu’s AI engine automatically structures unformatted or custom logs without regex or manual parsing, accelerating integration.
  4. Proprietary Forwarder Architecture: A high-performance forwarder designed for high throughput, low latency, and reduced infrastructure overhead.
  5. Agent-Based and Agentless Collection: Supports both agent and agentless data collection through OTEL collectors, APIs, and forwarders to match customer deployment models.

Core Pipeline Capabilities

Diving deeper into core pipeline capabilities

CeTu Pipeline Capabilities 1
CeTu Pipeline Capabilities 2
CeTu Pipeline Capabilities 3

Additional Pipeline Capabilities

UI-Based Transformation Preview

The platform allows users to preview pipeline transformations within the interface, validating parsing, reduction, and normalization steps before deploying them into production.

Pipeline Building Experience

  • Template-Driven Manual Authoring: Users can build pipelines by starting from ready-made templates that reflect best-practice reduction, normalization, and enrichment patterns.
  • AI-Generated Pipelines: Zoe can automatically generate complete pipelines from a single user instruction, identifying gaps or inefficiencies and producing an optimized design.

AI Maturity

CeTu’s platform differentiator is the deep integration of its security-specific GenAI model across the pipeline engine, driving features like the Zoe AI assistant for routing and cost optimization, AI-assisted parsing to structure unstructured logs without reliance on complex regular expressions, and a Natural Language Builder for pipeline creation. Beyond user-facing features, CeTu VISION uses AI to perform large-scale, context-aware SIEM analysis, continuously assessing detection posture, identifying gaps, and guiding routing decisions with security context. In parallel, CeTu DEPTH applies proprietary AI-driven pattern intelligence to monitor environments over time, surface emerging issues, and recommend remediation automatically.

DEPTH differs from traditional static analytics tools by maintaining a continually expanding set of patterns based on real customer environments. These patterns include detection blind spots, event-quality gaps, and opportunities to reduce unnecessary data. This reference base allows the system to apply previously learned insights immediately, adjust optimization to each customer’s environment, and improve over time through shared learning across deployments. Because DEPTH does not need to rediscover patterns it has already identified, it lowers the cost of AI-driven analysis while improving accuracy. Each new deployment both benefits from and contributes to this shared knowledge base, creating a cumulative effect. These foundational AI systems operate continuously beneath the surface and are integral to how CeTu improves detection efficiency and visibility at scale, not just how users interact with the product.

User feedback confirms the practical utility of these AI functions, suggesting mid-level maturity, and the company reports positive feedback from its broader customer base. CeTu additionally makes all AI features operationally available in strictly offline or on-premise deployment models, beyond SaaS deployments.

Integration Health Monitoring

CeTu Integration Health Monitoring

Additional Capabilities

This section notes capabilities that the vendor may provide beyond pure pipeline features.

Cross-System Querying

The platform provides search across SIEMs and data lakes without requiring users to write query language. Instead, it relies on structured interface actions such as breakdown, filter, and pivot. These actions automatically generate the underlying logic needed to examine patterns, isolate events, and review activity across different data sources.

Search operates directly on the connected SIEM platforms, so users can examine data without moving or duplicating it. This allows analysts to review specific event classes, compare activity across systems, and validate whether data sources are producing the expected telemetry. The platform also highlights areas where volume changes, anomalies, or gaps may require follow up.

Vision

CeTu’s stated vision is to develop an AI-driven security data framework that makes security data immediately usable and contextually meaningful. The aim is to support faster and more accurate detection and response by unifying and preparing data across systems. The company describes its approach as focusing on three areas: incorporating business context into all data streams to support informed security decisions, reducing operational effort through automation and low-code or natural language workflows, and ensuring that data is prepared to a level where security tools and language models can operate on high-fidelity, decision-ready information.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for CeTu –

Strengths:

CeTu demonstrates materially strong cost-optimization and log reduction performance, with claims supported by customer-reported results. Its deployment flexibility spans SaaS, hybrid and fully on-premises models, complemented by SOC 2 alignment. The platform integrates AI natively through the Zoe assistant and Natural Language Builder, lowering the threshold for pipeline design and iterative refinement. CeTu’s commitment to destination independence enables organizations to avoid vendor lock-in and route data across multiple analytical and storage systems. In addition, its operational telemetry provides real-time visibility, including coverage-gap insights and schema-drift detection, supporting continuous data quality assurance.

Areas to Watch:

CeTu has introduced AI capabilities as part of its architecture, centered on two systems: CeTu VISION, which evaluates SIEM telemetry to identify security-relevant coverage gaps, and CeTu DEPTH, which applies pattern intelligence across customer environments to surface emerging issues and recommend remediation. These systems reflect a lifecycle-oriented approach extending from data ingestion to contextual analysis and pipeline optimization. However, several gaps remain in user-facing AI capabilities. Security teams need clearer visibility into how the platform can influence routing decisions, data prioritization, and pipeline outcomes. Practitioner expectations for SDPP platforms have also shifted toward predictive capabilities such as anticipating schema drift and automatically rerouting based on destination health. CeTu could broaden its AI-driven insights beyond pipeline efficiency to include recommendations on data source prioritization, automated quality-assurance workflows, and adaptive lifecycle management that adjusts to organizational change.

Cribl

Cribl pioneered the Data Pipeline Platforms category back in 2018, scaling from a $4 million initial round to a Series E of over $319 million. That trajectory is a testament to Cribl’s growth and success in the telemetry pipelines market. Today, Cribl continues to stand as a market leader in this category, not only for its broad customer base but also for its depth of maturity, visible below in both core pipeline capabilities and a diverse feature set beyond them. Among the practitioners we interviewed, Cribl was still the most widely known and used security data pipeline platform.

Voice of the Customer

We were able to meet customers of Cribl at CriblCon and interview them about their experience with Cribl. Here is what they said –

Life before Cribl

Before Cribl, teams sent almost everything straight into Splunk or bespoke collectors, with slow searches (up to 90 minutes), brittle Java-style parsers, and little visibility into data as it flowed, driving up index and storage costs and making any change slow and painful. Longer term, they see Cribl as a central telemetry fabric: everything flowing through Stream/Edge at scale, Lake as easy archival, Search for in-place query, and SIEMs, EDR, data lakes, and BI tools treated as interchangeable consumers on top of that backbone rather than primary integration points.

Most used capabilities within Cribl

Customers lean hardest on Stream for filtering, suppression, and routing high value data to Splunk while offloading the rest to cheaper storage, plus using Packs (for things like Palo Alto) and starting to adopt Edge to replace Universal Forwarders and legacy collectors, with centrally managed, easier to change agents.

What they’d like to see more of

They want clearer operational guardrails and guidance (worker sizing, timeouts, tuning), faster resolution of “logs missing” and socket errors to reduce internal resistance, and more help with data tiering and classification so critical data goes to Splunk while lower value streams default to Lake or object storage instead of the SIEM.

Architecture and Deployment Maturity

Cribl offers flexible deployment options supporting SaaS, hybrid, and self-managed models. Each Cribl Stream instance runs in a defined mode: either as a Leader Node, governing the whole deployment, or as a Worker Node managed by the Leader.

  • Cribl.Cloud (SaaS): Fully managed by Cribl.
  • Hybrid: Combines Cribl-managed Leaders in the Cloud with customer-managed Workers across VPCs or data centers, maintaining centralized orchestration and RBAC.
  • Self-managed (on-prem/private cloud): Stream can run as a single instance or in distributed mode for HA and scalability.

Marketplace: Cribl is available in AWS, Azure, and GCP marketplaces

MSSPs: Over 20 MSSP partnerships (including Optiv, Deloitte, Deepwatch, and Reliaquest).

Compliance: ISO 27001, SOC 2 Type 2, SOC 3, and FedRAMP “In Process,” with Authorization to Operate (ATO) expected by the end of 2025.

Pricing

Consumption-based model

Cribl offers two main options:

Cloud: Credit-based pricing ($1 per credit) usable across all products. Cribl Search subscriptions are also available with flat-rate pricing.

Self-managed: Licensed by peak daily ingest for Stream and Edge, with an optional Universal Subscription that converts ingest capacity into cloud credits.

Pricing Assistance: Cribl provides:

  • FinOps Center for usage and cost forecasting
  • Transparent credit-based pricing by product
  • Self-service tools including a pricing page and ROI calculator

Data Collection and Integrations

Cribl supports multiple data collection methods. It can also work with existing protocols and tooling so customers can plug Stream (their core pipeline capability) into current architectures with minimal change. Collection methods fall into four main categories: push listeners, pull collectors (including flexible REST collectors), agent and forwarder integrations, and system or internal inputs, plus Cribl to Cribl flows for optimizing data egress.

  1. Local Forwarder & OTEL Collection: Cribl Edge and Stream collect telemetry from local agents and standard protocols including OTLP, Syslog, Splunk HEC, Splunk S2S, NetFlow/IPFIX, SNMP, and Windows Event Forwarding. This enables consistent, near-source data capture across hybrid environments.
  2. API-Based Agentless Collection: REST/API collectors pull data from SaaS and cloud platforms on a schedule, supporting authentication, discovery, filtering, and event parsing. Sources include S3, Azure Blob, GCS, databases, and Cribl Lake for batch and replay ingestion.
  3. Webhook / Push Streaming: Stream supports native HTTP/HTTPS endpoints such as Splunk HEC for real-time event delivery from agents and third-party services (an illustrative push example follows this list).
  4. Broker & Stream Integrations: Stream integrates with Kafka (source and destination), Amazon Kinesis, Google Cloud Pub/Sub, Azure Event Hubs, and Amazon SQS for high-throughput, decoupled ingestion with built-in state management and scheduling.
  5. Syslog, HTTP & Object Storage: Handles generic transports like Syslog, HTTP/HTTPS, and object stores (S3, Azure Blob, GCS, MinIO) for bulk ingestion or replay, routing to SIEM, observability, or data lake destinations (Snowflake, Databricks) with built-in normalization and enrichment.
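As a rough illustration of the push/webhook pattern in item 3 above, a client can POST JSON events to an HEC-style HTTP listener. The URL and token below are placeholders, and the snippet is a generic sketch rather than Cribl-specific configuration.

```python
# Illustrative push of a JSON event to a Splunk-HEC-style HTTP endpoint
# (URL and token are placeholders; this is a generic sketch, not Cribl config).
import json
import urllib.request

HEC_URL = "https://stream.example.com:8088/services/collector/event"  # placeholder
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"                     # placeholder

def send_event(event: dict) -> int:
    """POST one event in HEC JSON format and return the HTTP status code."""
    body = json.dumps({"event": event, "sourcetype": "_json"}).encode()
    request = urllib.request.Request(
        HEC_URL,
        data=body,
        headers={"Authorization": f"Splunk {HEC_TOKEN}", "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status

# Example (commented out to avoid a live network call):
# send_event({"action": "allow", "src_ip": "192.0.2.10"})
```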

Total number of out-of-the-box integrations: 114.

Core Pipeline Capabilities

1. Data Reduction Capabilities

  • Dropping low-value events
  • Sampling and aggregation
  • De-duplication and burst caps to prevent system overload
  • Converting logs to metrics (switching verbose raw logs to metadata when needed)
  • Reducing metric cardinality and points
  • Event compression by translating to more compact formats, such as XML to JSON
  • Source-specific packs for targeted optimization
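To ground a couple of the techniques listed above, the sketch below shows severity-aware sampling and a simple log-to-metrics rollup. These are generic illustrations in Python, not Cribl functions or pipeline syntax.

```python
# Generic sketches of two reduction techniques: severity-aware sampling and
# rolling verbose logs up into per-source counts (not Cribl functions).
import random
from collections import Counter

def sample(events, keep_ratio=0.1, always_keep=("error", "critical")):
    """Keep every high-severity event; forward only a fraction of the rest."""
    for event in events:
        if event.get("severity") in always_keep or random.random() < keep_ratio:
            yield event

def logs_to_metrics(events):
    """Collapse raw events into per-source counts instead of forwarding each one."""
    return Counter(event.get("source", "unknown") for event in events)

events = [{"source": "fw1", "severity": "info"}] * 1000 + [{"source": "fw1", "severity": "error"}]
print(len(list(sample(events))), logs_to_metrics(events))
```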

2. Out-of-the-Box Content Packs

Cribl provides ready-made content packs to support data reduction and transformation.

Source-Specific Packs: Use awareness of source formats to reduce and normalize data efficiently. For example, the Syslog Pack removes redundant headers and timezone fields, often achieving double-digit percentage savings per event.

3. Data Normalization

Supports OCSF, UDM, ECS, and OTEL formats. Users can access and analyze both raw and parsed data versions for flexibility.

4. Enrichment

External Context Enrichment:

  • Key Matches: Join lookup tables (.csv, .csv.gz) or IP databases (.mmdb) inline based on key matches.
  • GeoIP context and DNS lookups.
  • Redis integration for distributed, high-performance lookups.
  • File-based lookups

Derived and Computed Enrichment: Add calculated fields, normalize values, and define classifications for routing, masking, and analytics.

Threat Intel Enrichment: Not native, but supported through REST Collector or Search to gather IOCs, normalize to CSV, store in Lookups Library or Redis, and enrich via Lookup or Redis Functions in pipelines.
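The key-match lookup pattern described above can be pictured with a small sketch: load a CSV lookup keyed on IP, then join matching rows into events inline. The CSV contents and field names here are hypothetical, and a real deployment would use Cribl's Lookup or Redis Functions rather than hand-written code.

```python
# Illustrative inline CSV lookup enrichment (hypothetical lookup contents and
# field names; shown only to picture the key-match join, not Cribl's functions).
import csv
import io

ASSET_CSV = """ip,owner,criticality
192.0.2.4,payments-team,high
192.0.2.9,it-ops,low
"""

def load_lookup(text: str, key: str) -> dict:
    """Index lookup rows by the chosen key column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

ASSET_LOOKUP = load_lookup(ASSET_CSV, "ip")

def enrich(event: dict) -> dict:
    """Join asset context onto the event when the source IP matches a lookup key."""
    context = ASSET_LOOKUP.get(event.get("src_ip"), {})
    event["asset_owner"] = context.get("owner", "unknown")
    event["asset_criticality"] = context.get("criticality", "unknown")
    return event

print(enrich({"src_ip": "192.0.2.4", "action": "login"}))
```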

5. Schema Drift Detection

Currently in development. Will be generally available in the first half of 2026.

6. Threat Detection Content

Not developed in-house. Available through partners such as AlphaSOC and others.

7. Intelligent Data Routing

The in-product Copilot Editor turns plain-language prompts into draft filtering and routing logic, then lets users review and edit it before deployment.

Additional Pipeline Capabilities

  • Authoring and Testing: Build and validate pipelines using Data Preview and Metrics View for real-time feedback.
  • Branching and Reusing Data and Pipelines: Use Clone, and Chain to duplicate, split, or connect pipelines modularly. Stateful functions can rely on Redis for coordination.
  • Data Shaping and Normalization: Parse and structure data using Regex, Grok, JSON/XML Unroll, and other functions. Copilot helps map schemas and output metrics or serialized data.
  • Metrics and Governance: Aggregation and rollups manage data volume while retaining accuracy. Cribl Guard protects sensitive data, Event Breaker and Auto Timestamp ensure event integrity, and Cribl Expressions enable calculated fields and routing logic.

Pipeline Building Experience

  • Manual Creation: Pipelines can be defined manually as a sequence of functions in a JavaScript-style language, with Data Preview for validation.
  • Copilot Assistance: The AI-powered Copilot generates pipeline logic from natural language prompts, helping users create dynamic filtering and routing without manual coding.
  • Start from Prebuilt Packs: Users can import ready-made packs containing preconfigured pipelines for standard use cases.
  • QuickConnect (Drag and Drop): Provides a visual interface to connect sources to destinations, with options to include pipelines or packs for streamlined setup.

AI Maturity

Cribl embeds AI, with a focus on keeping humans in the loop, across its data pipeline to make engineering, enrichment, and observability faster and more intuitive. The Copilot Editor lets users build or edit pipelines in plain language, automatically mapping schemas to OCSF, UDM, ECS, or custom formats while validating outputs in real time. AI schema intelligence supports structure-at-ingest design and will introduce schema drift detection through Insights in 2025.

For data protection and quality, Cribl Guard applies AI-assisted masking to find previously unknown shapes of data and to detect and block sensitive data in motion, while enrichment layers add context through GeoIP, DNS, and lookup integrations. On the analytics side, KQL Assistant converts natural language into structured queries, and Cribl Insights delivers AI-driven observability with alerts on drops, latency, and consumption anomalies. There’s no autonomous cross-source correlation today, though the platform is designed to feed AI-driven SOC tools via normalized data.

Cribl also includes an AI chatbot for product support, configuration help, regex guidance, and KQL generation without accessing customer data. AI-driven recommendations guide setup and troubleshooting across sources, routes, pipelines, datasets, and dashboards through natural language interactions.

On prem availability: Copilot, Copilot Editor, function assistance, and AI generated commit messages are available in self-managed deployments, as well as in Cribl.Cloud.

Data Privacy with AI: Governance remains central, with granular AI controls, human-in-the-loop workflows, explainable assistants, and strict data privacy ensuring no customer data is used for model training.

  • Granular Access controls: Global AI Settings for org wide enable or disable plus feature level controls in each product.
  • Data handling and providers: Models are not trained on customer data and only use purpose bound inputs such as user selected samples or Dataset metadata.
  • Human in the loop: Users review and approve AI generated changes before anything can affect production.

Integration Health Monitoring

Integration Health Monitoring: Cribl Stream provides live visibility into pipeline health, throughput, and system performance. Users can track data flow from source to destination, view per-integration metrics (bytes, events, drops, failures, RED metrics), and poll health endpoints for liveness checks.

Coverage Gap Analysis: Not currently available.

Schema Drift Detection: Planned for near-term release.

Additional – Unified Monitoring (Insights): Cribl Insights extends monitoring across Stream, Edge, Lake, and Search with proactive alerts for low/no data, backpressure, and queue utilization, integrated with Slack and similar tools.

Additional Capabilities beyond Pipelines

This section notes capabilities that the vendor may provide beyond pure pipeline features.

Data Lake

  • Cribl Lake: Cribl Lake delivers managed, cloud-based storage designed for full-fidelity telemetry in open formats like gzip JSON and Parquet. It organizes data into searchable Datasets within Cribl.Cloud and integrates directly with Cribl Search. Analysts can replay data from the lake to any destination, creating a central foundation for long-term, open, and cost-efficient telemetry retention.
  • Cribl Lakehouse: Cribl Lakehouse extends Cribl Lake with an acceleration layer for recent data. By caching recent datasets, it enables low-latency querying without compromising durability. Available in Cribl.Cloud, it’s meant to complement the lake’s storage tier, bridging the gap between cold data economics and real-time analytical speed.
  • Tiering models: Hot, Warm and Cold are supported.

Search

Cribl Search offers search-in-place functionality that allows users to query data within its existing storage locations rather than re-ingesting or indexing it into a proprietary system. It supports direct querying across Cribl Edge, Cribl Lake, Amazon S3, and other external sources, which helps minimize data movement and associated costs.

Vision

Cribl’s product direction and vision center on what the company calls agentic telemetry: an AI-first architecture that fuses machine-generated telemetry with human-generated context into a unified, open, federated data layer so agents and humans can reason, act, and learn across observability and security.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Cribl –

Strengths:

Cribl’s core strength is the depth and maturity of its data infrastructure, with Stream and Edge giving very fine grained control over ingest, routing, shaping, reduction, and enrichment across a huge range of sources, while Lake, Lakehouse, and Search extend that into cost efficient storage and federated analytics rather than “just pipelines.” On top of that, FinOps Center, detailed integration health metrics, and Cribl Insights show a mature operational story for monitoring, troubleshooting, and cost management at scale. The platform’s breadth is also notable: it spans observability and security use cases, supports SIEM replacement and security data fabric patterns, and layers in governance and sensitive data protection through Guard. Copilot’s AI assistants are not just bolted on but embedded across configuration, querying, visualization, and PII protection, which reinforces that Cribl is evolving from a pipeline engine into a broader telemetry control plane for IT and security teams.

Areas to Watch:

Cribl’s agentic telemetry vision is directionally aligned with its current AI capabilities, but it’s clearly a step ahead of where the product is today. Copilot, KQL Assistant, visualization helpers, guided configuration, schema conversion, SPL-to-KQL translation, code and pipeline generation, Guard, and Insights all deliver strong AI-assisted workflows on top of structured ingest, federated search, and lakehouse-style storage. That maps well to “structure at ingest,” schema agnosticism, and human-in-the-loop control. What’s mostly aspirational for now is the fully agentic side of the story: AI analysts issuing thousands of correlated queries, a truly AI-driven autonomous pipeline experience, an AI SOC, and deep integration of intelligent telemetry routing with tickets, PRs, CI/CD, runbooks, Slack, and wikis. Today’s features accelerate humans working in Cribl; they do not yet constitute the autonomous, cross-system agents described in the vision.

Databahn

Databahn’s strength lies in its AI-driven pipeline capabilities. Its focus on intelligent integration health, semi-autonomous pipelines, and coverage gap analysis makes it stand out as an AI-native pipeline platform.

Architecture and Deployment Maturity

Databahn offers flexible deployment options supporting SaaS, hybrid, and on-premise models.

Their architecture is split into a Control Plane (“Console”) and a Data Plane (“Pipeline Engine”). The data plane can be deployed as SaaS or in customer-managed environments across cloud or on-prem. Databahn uses a service-mesh design with a master–worker topology and automatic failover routing to ensure lossless data delivery and high availability across hybrid environments.

  • Marketplace: Databahn is available in AWS Marketplace and the Microsoft Commercial Marketplace (MACC eligible). Databahn mentioned that they will soon be featured in the Google Marketplace.
  • MSSPs: Global MSSPs including EY, Lumifi, SolCyber, Inspira, Wipro, PWC, ISA, ESI (Virtual Guardian), BeyonCyber.
  • Compliance: ISO 27001, SOC 2, and PCI compliant.

Pricing

Ingestion based.

Databahn’s pricing is calibrated by data volume and deployment model: fully SaaS, hybrid (Databahn-hosted control plane with customer-managed data planes), or fully on-premise.

Pricing Assistance: For enterprise customers, Databahn provides a cost estimation framework that models expected savings based on data reduction, enrichment, and tiering efficiency achieved through the platform.

Data Collection and Integrations

Databahn supports multiple data collection methods. Its collection layer is centered on “Smart Edge”, a distributed edge component that connects data sources such as servers, endpoints, applications, cloud services, and security tools to “Highway”, Databahn’s processing and enrichment pipeline engine. Smart Edge can run as an agent or a remote collector and supports multiple native connectors so that data is ingested using the protocol, data model, and authentication approach natural to the source system.

Databahn’s ingestion approach can be characterized across five modes:

  1. Local Agent Collection: Smart Edge can be deployed directly on servers or endpoints, collecting logs and metrics at the source before forwarding. This reduces loss risk and supports environments where local visibility is essential.
  2. API-Based Agentless Collection: For SaaS, cloud platforms, and managed services, Smart Edge performs scheduled, authenticated API polling. This enables telemetry acquisition without requiring installation on vendor-managed systems.
  3. Webhook / Push-Based Streaming: Where native push or webhook delivery exists, Smart Edge receives data in near real time. This is beneficial for event-driven architectures and systems that emit telemetry continuously.
  4. Broker or Stream Integration: Databahn integrates directly with high-throughput messaging and streaming platforms such as Kafka, Kinesis, and Pub/Sub.
  5. Agentless Endpoint Data Collection: Databahn offers an agentless endpoint data collection capability, enabling direct data acquisition from endpoints without requiring additional software installations. The vendor claims that several of its Fortune 500 customers leverage this architecture to achieve full endpoint visibility while avoiding agent complexity.

Number of integrations: stated as over 500 by the vendor

Core Pipeline Capabilities

Diving deeper into Databahn’s core pipeline capabilities –

1. Data Reduction Capabilities

  • Filtering, suppression and sampling to remove low-value events
  • Aggregation by configurable intervals
  • Field-level suppression
  • De-duplication on event and field level
  • Forking and tiering to route non-critical data to lower-cost storage destinations
  • AI-driven recommendations to optimize data reduction policies based on observed usage and detection impact.

2. Out-of-the-Box Content Packs

Databahn provides a volume reduction library with pre-mapped rules for common log sources.

Source-Specific Packs: Use awareness of source formats to reduce and normalize data efficiently. For example, the Syslog Pack removes redundant headers and timezone fields, often achieving double-digit percentage savings per event.

This content is continuously updated by Databahn’s research team within their content studio.

3. Data Normalization

Databahn supports JSON, CSV, CEF, LEEF, OCSF, Sentinel Object, UDM, CPS, and CIM, with access to both raw and parsed data views.

4. Enrichment

External Context Enrichment:

  • Static and dynamic enrichment through key-value lookups and external feed integrations.
  • Add business, asset, and GeoIP context.
  • Trigger automated tagging, correlation, and routing actions downstream without custom code or middleware.

Threat Intel Enrichment: Enrichment can draw from STIX/TAXII feeds, commercial threat intelligence databases, or customer-specific repositories. Events are enriched with indicators of compromise (IOCs), tactics, techniques, and procedures (TTPs), as well as adversary context before reaching downstream tools.

5. Schema Drift Detection

Databahn automatically detects and corrects schema drift across sources and pipelines. When deviations occur, it alerts users and rewrites field mappings to maintain stable, lossless downstream delivery. Its data health tracking continuously monitors consistency, quality, and volume trends to identify and address changes proactively.
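To make the concept concrete, the sketch below shows a minimal, generic schema-drift check of the kind described above: incoming events are compared against a recorded baseline schema, and missing, new, or retyped fields are flagged for review. It is not Databahn’s implementation; the field names, types, and remediation step are hypothetical.

```python
# Illustrative schema-drift check; not Databahn's implementation.
# Baseline fields, types, and the example event are hypothetical.
from typing import Dict, List

# Baseline schema recorded when the source was onboarded: field -> expected type.
BASELINE_SCHEMA: Dict[str, type] = {
    "timestamp": str,
    "src_ip": str,
    "user": str,
    "action": str,
}

def detect_drift(event: Dict[str, object]) -> List[str]:
    """Return human-readable drift findings for a single event."""
    findings = []
    for field, expected_type in BASELINE_SCHEMA.items():
        if field not in event:
            findings.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            findings.append(
                f"type change: {field} is {type(event[field]).__name__}, "
                f"expected {expected_type.__name__}"
            )
    for field in event:
        if field not in BASELINE_SCHEMA:
            findings.append(f"new field: {field}")
    return findings

# Example: the source renamed 'user' to 'user_name' and added 'session_id'.
drifted_event = {"timestamp": "2025-11-01T10:00:00Z", "src_ip": "10.0.0.5",
                 "user_name": "alice", "action": "login", "session_id": "abc123"}
for finding in detect_drift(drifted_event):
    print(finding)  # a real pipeline would alert and propose a field remapping
```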

6. Threat Detection Content

Databahn does not include native threat detection logic, by design.

Their focus is on enabling and enhancing detection accuracy by delivering clean, context-rich, and normalized telemetry. Additionally, Databahn provides a Quick Signal capability to surface high-severity patterns or anomalies directly at the pipeline layer, allowing customers to flag or route critical signals upstream without converting the pipeline into a detection engine.

7. Intelligent Data Routing

Cruz, Databahn’s agentic AI assistant, analyzes usage patterns and query frequency to optimize data routing, learning which event types impact detections and automatically recommending storage tiers to improve performance and cost efficiency.
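As a rough illustration of this kind of usage-driven tiering, the sketch below applies a simple heuristic that maps query frequency and detection relevance to a storage tier. The thresholds and source names are invented for the example and do not reflect Cruz’s actual model.

```python
# Conceptual usage-based tier recommendation; thresholds and sources are
# hypothetical and do not reflect Cruz's actual logic.
def recommend_tier(queries_per_day: float, used_in_detections: bool) -> str:
    """Suggest a storage tier from simple usage signals."""
    if used_in_detections or queries_per_day >= 10:
        return "hot"     # actively queried or detection-relevant data
    if queries_per_day >= 1:
        return "warm"    # occasionally queried, keep reasonably accessible
    return "cold"        # rarely touched, candidate for low-cost storage

usage_stats = {
    "windows_security": {"queries_per_day": 40.0, "used_in_detections": True},
    "vpc_flow_logs": {"queries_per_day": 2.5, "used_in_detections": False},
    "debug_traces": {"queries_per_day": 0.1, "used_in_detections": False},
}
for source, stats in usage_stats.items():
    print(source, "->", recommend_tier(**stats))
```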

Additional Pipeline Capabilities

  • Lineage and Governance: End-to-end lineage tracks data from ingestion to delivery, providing visibility for traceability, compliance, and operational accountability.
  • Sensitive Data Handling: Built-in PII detection, masking, and quarantine controls safeguard sensitive data before export, aligning with PCI, HIPAA, and GDPR compliance standards.
  • Log Discovery and Micro-Indexing: Databahn automatically identifies new log sources and generates lightweight micro-indexes for rapid search and schema inference without full re-indexing.
  • Data Health and Anomaly Detection: Continuous telemetry monitoring detects silent or noisy sources and dynamically adjusts routing and enrichment to preserve data quality and performance.

Pipeline Building Experience

  • Manual Creation: Using the user interface and an onboarding flow.
  • Code-Based Development: For engineering teams, pipelines can also be managed as code, supporting CI/CD workflows for versioning, testing, and deployment.
  • AI Assistance and Automation: The platform auto-generates parsers, recommends filters and mappings, and defines routing logic using natural language queries.
  • Template Library: A built-in library of preconfigured templates for common sources accelerates onboarding and standardizes pipeline setup.

AI Maturity

Databahn’s Cruz AI, an agentic assistant, lies at the center of its AI capabilities. Cruz’s predictive optimization engine continuously monitors data flow, schema quality, and source health, detecting anomalies and dynamically adjusting routing to preserve performance and data integrity. It dynamically updates data reduction rules and assists with routing and mapping logic, while still operating under human approval and review.

A key capability of Cruz AI is its ability to transform telemetry into multiple ecosystem-specific data models, including CIM, OCSF, ASIM, UDM, and CPS. This allows enterprises to route the same normalized dataset to different SIEMs or data lakes without manual re-engineering. The goal is faster onboarding, simplified migration, and consistent analytics fidelity across diverse environments.

The platform automatically detects and corrects schema drift across data sources and formats, ensuring stable, lossless downstream delivery. AI-generated parsers handle new or unfamiliar log types with minimal user input, while schema adjustments are guided and fully auditable.

AI also powers contextual enrichment and intelligence correlation. Cruz maps telemetry attributes to frameworks like MITRE ATT&CK, enhancing detection context without performing detection itself. Its counterpart, Reef, correlates telemetry across domains to identify visibility gaps, redundancy, and coverage overlap.

For data hygiene and compliance, Databahn applies AI-driven PII detection, field-level masking, and data routing policies that automatically enforce privacy controls before routing to destinations such as SIEMs, data lakes, or AI systems.

Optimization remains predictive and guided, not autonomous. This maintains human oversight, auditability, and trust at enterprise scale. Governance features include lineage tracking, compliance tagging, and health scoring.

Data Privacy with AI:

  • Governed and Explainable AI Changes: Cruz operates under a governed, auditable AI framework. All AI-driven transformations such as parser generation, schema adjustments, or enrichment mappings are fully traceable through lineage tracking, compliance tagging, and metadata versioning.
  • AI Access and Hosting Controls: Databahn provides granular AI access controls and role-based permissions that define what AI agents can execute within each workspace. All AI operations and data contexts are isolated per customer tenant, ensuring strict separation of data, models, and telemetry across environments.

Integration Health Monitoring

Databahn provides health scoring for all connected data sources by taking into account schema drift, data format, data quality, integrity, completeness, timeliness, and consistency. These metrics are continuously assessed using in-band telemetry to identify silent, low-signal, or noisy sources, and correlated across the pipeline to detect early degradation patterns for proactive remediation.

The platform also monitors destination health and delivery reliability across all connected targets, verifying each message through acknowledgments and replay-safe queuing to maintain guaranteed delivery. In case of destination failures, its adaptive pipeline architecture adjusts automatically by queuing, buffering, or rerouting data through alternate paths or backup targets, preserving continuity and preventing data loss.
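The general pattern behind this kind of guaranteed delivery, acknowledge, fall back, and buffer for replay, can be sketched as follows. This is a conceptual illustration only; the destination names, API shape, and retry behavior are assumptions, not Databahn’s design.

```python
# Minimal acknowledge / fall back / buffer-for-replay sketch.
# Illustrative only; destination names and the delivery API are hypothetical.
from collections import deque

class Destination:
    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    def send(self, event: dict) -> bool:
        """Return True if the destination acknowledges the event."""
        return self.healthy

def deliver(event: dict, primary: Destination, backup: Destination,
            buffer: deque) -> None:
    """Try the primary target, fall back to the backup, otherwise buffer."""
    if primary.send(event):
        return
    if backup.send(event):
        return
    buffer.append(event)  # replay-safe queue, drained when a target recovers

buffer: deque = deque()
siem = Destination("siem", healthy=False)        # simulated outage
cold_storage = Destination("cold_storage")
deliver({"msg": "failed login", "src_ip": "10.0.0.5"}, siem, cold_storage, buffer)
print("buffered events:", len(buffer))  # 0 here: the backup accepted the event
```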

Coverage Gap Analysis: Databahn maps normalized telemetry against frameworks such as MITRE ATT&CK to identify visibility and detection coverage gaps.

Schema Drift Detection: Databahn continuously monitors for schema drift and format deviations across all ingestion and transformation pipelines. When changes are detected, Cruz AI automatically proposes schema corrections and updates field mappings to maintain consistent downstream delivery. All AI-driven adjustments are explainable, version-controlled, and reflected within the Source Health scoring model, ensuring that schema changes are auditable and that drift never silently degrades analytic visibility.

Additional Capabilities

This section notes capabilities that the vendor may provide beyond pure pipeline features –

Data Lake: Databahn does not promote or mandate the use of its in-house data lake. However, for customers who choose to use this option, the platform provides a three-tier storage model – hot, warm, and cold.

Vision

Databahn’s product direction reflects its mission to remove the manual, repetitive burdens of data engineering and give enterprises the ability to treat data as an asset rather than an operational constraint. This aligns with its broader vision of a Security Data Fabric, an AI-native, vendor-neutral layer that connects and empowers every component of the modern SOC. As organizations shift toward AI-driven security operations, Databahn aims to provide the trusted, enriched training data, consistent schemas, and unified pipelines required to support both traditional analytics and emerging AI copilots.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Databahn –

Strengths

Databahn has strong data pipeline platform capabilities and validated adoption across Fortune 100 and Fortune 500 enterprises. Its AI capabilities are also an area of strength, with Cruz automating core data engineering tasks such as parser creation, schema drift correction, and model transformations, and Reef providing federated search, coverage analysis, and continuous insight generation. Another advantage of a standalone pipeline platform like Databahn is its vendor-neutral, ecosystem-agnostic design: this neutrality eliminates platform lock-in and ensures consistent data portability across diverse security and observability ecosystems. Together with its end-to-end observability and resilient delivery model, Databahn positions itself as an independent data intelligence layer within modern multi-SIEM and SIEM-plus-data-lake environments. Its vision intentionally focuses on an AI-native data fabric serving as that independent layer, without targeting SIEM-adjacent analytics or threat detection features, which allows for depth within its core capabilities. Beyond AI, Databahn’s self-adapting pipeline architecture ensures guaranteed delivery and destination health management, with dynamic queuing, rerouting, and failover for uninterrupted operations, a key requirement for mission-critical environments. Its source and destination health scoring, schema drift detection, and predictive routing collectively make it one of the most observable and resilient data pipelines on the market. The company’s commitment to vendor neutrality and open integration continues to resonate with large enterprises adopting non-platformized, multi-SIEM and SIEM-plus-data-lake strategies.

As enterprises accelerate toward AI-driven SOC architectures, Databahn’s progress in unifying governed data models, strengthening end-to-end data integrity, and demonstrating explainable automation across Cruz and Reef will be important markers of its evolution and long-term trajectory in the SDPP market.

Areas to watch

Databahn’s rapid pace of innovation continues to expand its AI-driven portfolio, making execution alignment an important area to watch as Cruz and Reef take on more autonomous and context-aware workflows. Enterprises will look for sustained focus, without compromise, on privacy, AI explainability, fine-grained access controls, and model governance, capabilities Databahn says it is advancing through its Foundry governance layer.

Databahn’s pipeline-centric strategy and neutral positioning remain core differentiators in a market where many vendors tie pipelines to their own analytics ecosystems. The company’s ability to scale this vendor-agnostic approach, investing in additional integrations while continuing to deliver AI-enabled enhancements such as predictive routing, enrichment intelligence, and SOC-aligned copilots, will play a key role in how it matures as an AI-ready Security Data Pipeline Platform.

Panther (Datable)

Disclaimer:

Datable was acquired by Panther in October 2025. The content in this report reflects Datable’s current capabilities prior to full integration into Panther’s SIEM platform. Many aspects of the offering are expected to change as Datable becomes fully integrated into the broader Panther ecosystem. The goal of this section is to provide visibility into Datable’s differentiators within the security data pipeline platform landscape.

Introduction

Panther (Datable) is a centralized, single-tenant SaaS security data pipeline focused on transformation-as-code with a no-code layer for common tasks. Its differentiation centers on open standards normalization (OCSF and OpenTelemetry), fast time to value via guided onboarding, and an accessible UX that provides drag-and-drop and GitOps. While early in marketplace presence and MSSP programs, it supports a practical span of collection methods and destination types with transparent, code-driven control for routing and transformation.

Architecture and Deployment Maturity

Datable offers a single-tenant SaaS architecture as its exclusive deployment model, covering both the console and the pipeline engine. The environment is fully isolated per customer.

Compliance: SOC 2 and HIPAA.

Pricing

Custom flat pricing using GB/day buckets.

Pricing Assistance: Not specified. No calculators or estimate tooling referenced.

Data Collection and Integrations

Datable supports flexible ingestion and routing options designed to plug into existing architectures with minimal operational change. Its collection methods include agents and forwarders, native protocol listeners, API-based pull collection, and SaaS/cloud integrations.

  1. Agent and Forwarder Collection: Datable ingests telemetry through established open-source agents and collectors, including Vector, OTEL Collector, Fluent Bit, FluentD, Logstash, and OTEL SDKs. This enables consistent data capture from endpoints, servers, and containerized environments using widely adopted tooling.
  2. Native Protocol Ingestion: The platform supports direct ingestion over OTLP and Syslog, allowing customers to stream high-volume infrastructure and application logs without custom adapters or proprietary formats.
  3. API-Based SaaS & Cloud Collection: Datable pulls logs and events from SaaS and cloud applications through scheduled API collection. This includes authentication handling, pagination, filtering, and parsing for common services such as GitHub, Okta, and major cloud providers.

Core Pipeline Capabilities

1. Data Reduction Capabilities

  • Deduping by timestamp (within 5-second windows plus correlation points; see the sketch after this list).
  • Attribute-based filtering.
  • Auto-drop empty events.
  • Auto-purge empty attributes.
  • Tail-based sampling for traces (trace_id).
  • Tail-based sampling for logs (by received batch).
  • Transformation-as-code via JavaScript for advanced reduction logic.
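The sketch below illustrates the time-window deduplication idea from the first bullet: keep the first event per correlation key within a 5-second window. Datable’s actual transformation-as-code layer is JavaScript; this Python version is conceptual only, and the field names are hypothetical.

```python
# Conceptual 5-second-window deduplication keyed on correlation fields.
# Datable's transformation-as-code uses JavaScript; this Python sketch only
# demonstrates the idea, and the field names are hypothetical.
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

WINDOW = timedelta(seconds=5)

def dedupe(events: List[dict], correlation_fields: Tuple[str, ...]) -> List[dict]:
    """Keep the first event per correlation key within each 5-second window."""
    last_seen: Dict[tuple, datetime] = {}
    kept = []
    for event in events:
        ts = datetime.fromisoformat(event["timestamp"])
        key = tuple(event.get(f) for f in correlation_fields)
        previous = last_seen.get(key)
        if previous is None or ts - previous > WINDOW:
            kept.append(event)
            last_seen[key] = ts
    return kept

events = [
    {"timestamp": "2025-11-01T10:00:00", "host": "web-1", "msg": "disk warning"},
    {"timestamp": "2025-11-01T10:00:03", "host": "web-1", "msg": "disk warning"},  # dropped
    {"timestamp": "2025-11-01T10:00:09", "host": "web-1", "msg": "disk warning"},  # kept
]
print(len(dedupe(events, ("host", "msg"))))  # 2
```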

2. Out-of-the-Box Content Packs

Not specified as packs. Reduction and transformation provided via no‑code options and JavaScript.

3. Data Normalization

Open standards oriented.

  • Logs and traces normalize to JSON OTEL.
  • Security sources (e.g., GitHub Audit Logs) normalize to OCSF. (Option to leave data unnormalized as raw JSON.)

4. Enrichment

Built-In GeoIP Enrichment: The platform provides native GeoIP enrichment, automatically deriving geographic and network attributes including country, region, city, ASN, and latitude and longitude whenever an event contains an IP address.

Custom Lookup Table Support: In addition to default GeoIP capabilities, the platform allows the use of customer-defined lookup tables to add business or contextual metadata. Lookup tables can be uploaded through the UI or programmatically via API.

Lookup Management and Performance: Uploaded lookups are validated, versioned, and indexed to ensure efficient runtime performance.

Threat Intel Enrichment: None

5. Schema Drift Detection

Not out of the box; can be implemented in JavaScript.

6. Threat Detection Content

None

7. Intelligent Data Routing

No automatic destination recommendations. Dynamic routing via JavaScript business logic evaluated per event.
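Conceptually, per-event routing logic of this kind looks like the sketch below, which picks a destination name from event attributes. Datable evaluates such business logic in JavaScript; this Python version and its destination names are purely illustrative.

```python
# Illustrative per-event routing logic. Datable expresses this in JavaScript;
# the destination names and attribute values below are hypothetical.
def route(event: dict) -> str:
    """Pick a destination name based on event attributes."""
    if event.get("class_uid") == 3002 and event.get("severity_id", 0) >= 4:
        return "siem"            # high-severity authentication findings
    if event.get("source") == "github_audit":
        return "security_lake"   # normalized audit records
    return "archive"             # everything else goes to low-cost storage

print(route({"class_uid": 3002, "severity_id": 5}))         # siem
print(route({"source": "github_audit", "severity_id": 1}))  # security_lake
print(route({"source": "nginx_access"}))                    # archive
```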

Additional Pipeline Capabilities

  • Versioned Pipelines with Rollback: Pipelines are fully versioned, allowing teams to track changes and restore previous versions as needed. This supports safe experimentation and rapid recovery of production data flows.
  • GitOps Integration: Pipeline configurations can be managed through Git, enabling pull requests, reviews, and automated deployments. This ensures consistent and controlled releases aligned with standard engineering workflows.
  • Stage-by-Stage Exploration and Real-Time Feedback: Allows you to move through each stage of the pipeline to see how data evolves at every step, helping you understand transformations in context.
  • Stream Search: Provides a quick way to search within the pipeline flow so you can inspect behavior, validate assumptions, and spot issues without digging into every detail manually.

Pipeline Building Experience

  • Drag and Drop: A drag-and-drop interface for quickly assembling pipelines without scripting.
  • GitOps Pipeline Management: Pipelines can be defined as code in a Git repo with full versioning and rollback support.
  • Copilot Assistance: Prompt-based guidance that helps create transformations and answers questions about data or pipeline logic.

AI Maturity

Datable’s copilot provides guided assistance for pipeline authors by helping build transformations and answering questions about the underlying data or code. It supports AI-assisted prompts that are aware of subsets of the data context, making it easier to craft field mappings, reductions, or normalization steps without writing everything from scratch. The feature is positioned as an assistive layer rather than an autonomous one; it does not generate full pipeline recommendations on its own, and JavaScript remains the primary control plane for any advanced or deterministic logic.

Data Privacy with AI: Not specified beyond general copilot behavior

Integration Health Monitoring

Throughput watermark alerts, error dashboards, error alerting.

Coverage Gap Analysis: Not provided.

Vision

Datable originated in the observability domain, where large enterprises already operated high-throughput telemetry pipelines but showed limited appetite for capabilities beyond scale. The team soon found product-market fit instead with smaller security teams seeking self-service data onboarding without reliance on engineering. Datable’s current strategy centers on centralized ETL as the foundation for better telemetry decisions, followed by a blob-storage-backed query layer that supports multiple personas and enables enrichment at query time.

Analyst Take – Panther + Datable

I covered Panther’s capabilities in depth in our “[The Convergence of SIEMs and Data Lakes: Market Evolution, Key Players and What’s Next](https://softwareanalyst.substack.com/p/the-convergence-of-siems-and-data)” report. Now, with Panther’s acquisition of Datable, we believe its telemetry pipeline capabilities will be strengthened from a standalone workflow to a core pillar of Panther’s security data fabric. This acquisition reinforces our belief that effective security starts with reliable, well-structured data, something that security data pipeline platforms excel at as their core foundation. For any forward-looking SIEM platform, such capabilities will soon become a must-have.

Datable’s key emphasis is on open-standards normalization such as OCSF and OTEL, fast time to value through guided onboarding and turnkey transformations, and a user experience that spans no-code workflows to code-driven stream processing aligned with detection-as-code practices.

By combining Panther’s large-scale detection and analytics engine with Datable’s strengths in data normalization, guided onboarding, and pipeline management, the joint platform aims to give teams a stronger and more efficient security data foundation. The strategy centers on three areas. First, using AI to automate parsing, transformations, and schema consistency. Second, building an open security data lake that promotes interoperability across tools. Third, improving usability so teams can onboard data sources and manage pipelines without heavy engineering support. Together, Panther and Datable are positioning themselves to deliver a more complete AI-driven SOC experience built on high-quality, well-managed telemetry.

However, the practitioner question remains on whether Datable will continue to maintain its neutrality of destinations (a core benefit of pure play data pipeline platforms) or be fully integrated into Panther’s security-focused SIEM ecosystem.

Datadog

Datadog’s Observability Pipelines (OP) solution offers the advantages of a vendor-agnostic data pipeline while benefiting from the strengths of a broader observability and security platform. Its deep integration with Datadog CloudPrem (the company’s security data lake) enables expanded capabilities beyond traditional SDP platforms. These strengths make OP an important offering to highlight in this report.

Datadog, founded in 2010 and publicly traded on NASDAQ as DDOG, offers an Observability Pipelines platform that can be used to route high-quality data within the Datadog ecosystem or independently to other destinations. Its differentiation includes a vCPU-based processing model, a wide integration ecosystem, and expanding AI features such as in-stream pattern recognition and AI-assisted pipeline configuration. The company’s long-term goal is to unify routing, transformation, enrichment, and detection within an adaptive, AI-supported pipeline framework to reduce operational overhead and improve security and observability workflows.

Architecture and Deployment Maturity

Datadog Observability Pipelines (OP) offer flexible deployment options. The platform is divided into a SaaS-based control plane (Console) and a fully customer-hosted data plane (Pipeline Engine).

  • Marketplace: Datadog OP is available across major cloud marketplaces, as well as CrowdStrike Marketplace and Azure ISV listings.
  • MSSPs: Partnerships include SecurityHQ, alongside its full partner ecosystem.
  • Compliance: SOC 2, SOC 2 Type 1, CCPA, CSA STAR, DORA, EU-US DPF, FedRAMP Moderate, GDPR, HIPAA, ISO/IEC 27001, ISO/IEC 27017, ISO/IEC 27018, ISO/IEC 27701, PCI DSS, TISAX, and VPAT.

Pricing

The primary pricing model is based on the vCPU consumption of OP Workers rather than the volume of logs processed. The goal with this approach is to provide more predictable scaling during periods of organic log growth or short-term spikes. A per-gigabyte option is also available for organizations with low or stable data volumes.

Pricing Assistance: Datadog provides benchmark guidance on Worker throughput (for example, roughly 1 to 1.5 TB of processing per vCPU per day) to help estimate capacity.
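Using that benchmark, a quick back-of-envelope estimate looks like the following. The daily volume and Worker size are made-up example values; only the throughput range comes from Datadog’s guidance.

```python
# Back-of-envelope capacity estimate using the benchmark quoted above
# (roughly 1 to 1.5 TB of processing per vCPU per day). The daily volume and
# worker size below are made-up example values.
daily_volume_tb = 12.0          # telemetry processed per day
tb_per_vcpu_low, tb_per_vcpu_high = 1.0, 1.5

vcpus_needed_high = daily_volume_tb / tb_per_vcpu_low    # conservative estimate
vcpus_needed_low = daily_volume_tb / tb_per_vcpu_high    # optimistic estimate
print(f"Estimated vCPUs: {vcpus_needed_low:.0f} to {vcpus_needed_high:.0f}")
# With 4-vCPU Workers, that is roughly 2 to 3 Workers before headroom.
```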

Data Collection and Integrations

Datadog OP consolidates telemetry ingestion across on-premises, cloud, and hybrid environments through a worker-based architecture in which all parsing, transformation, enrichment, and routing execute locally on customer-managed infrastructure. Datadog’s ingestion model supports the following methods –

  1. Agent and Collector Based Ingestion: OP receives logs directly from host and application agents such as the Datadog Agent, OpenTelemetry collectors, and fluentd. These collectors forward data in real time, allowing OP to process and route events as they are generated.
  2. Direct Platform to Platform Ingestion: OP can ingest data from existing logging platforms, including Splunk and Sumo Logic, by connecting directly to their export or forwarding interfaces. This approach centralizes data flows without requiring changes to established logging deployments.
  3. Cloud Event and Service Triggered Ingestion: OP accepts logs through cloud delivery services such as Amazon Kinesis Firehose, AWS Lambda, and Amazon S3. These methods support both high-volume streaming ingestion and batch ingestion from cloud storage, making them suitable for dynamic or distributed workloads.
  4. Protocol Based Ingestion: OP also supports standard protocols including HTTP, Syslog, TCP, and Kafka. This enables ingestion from custom applications, legacy systems, and infrastructure that relies on generic network transport rather than dedicated agents or cloud services.

The platform offers 40+ integrations and 150+ out-of-the-box grok rules for popular log types, and effective coverage can be higher given the range of ingestion methods supported.

Core Pipeline Capabilities

1. Data Reduction Capabilities

  • Field and Event Filtering
  • Aggregation through metrics generation
  • Sampling (advanced sampling and throttling that can be customized)
  • Quota-based overflow routing
  • Field editing and transformation

2. Out-of-the-Box Content Packs

Datadog provides prebuilt “Packs” that supply ready-made pipeline rules focused on routing and filtering for common log and security data sources. These Packs deliver source-aware logic that prioritizes important events and routes them to the appropriate destinations.

Source-Specific Packs:

Packs are out-of-the-box pipeline rules primarily focused on routing important logs to appropriate security destinations; they work alongside 150+ grok parsers and other OP processors that provide parsing and transformation.

3. Data Normalization

Supported formats include OCSF, CEF, UDM, syslog, JSON, raw bytes, and OTEL, providing a consistent structure across heterogeneous sources.

Datadog OP includes more than 150 built-in grok parsers and packaged rules, allowing customers to transform, normalize, and route logs to any required destination with minimal configuration effort.

4. Enrichment

External Contextual Enrichment: Adds metadata such as GeoIP details, threat intelligence, and CMDB records (including ServiceNow) using the Enrichment Table processor.

Lookups: Supports both local file–based lookups and cloud-hosted reference tables.

Custom Enrichment: OP can also perform custom data transformations in VRL using its Custom Processor. This supports SOC operations such as encryption, base64 decoding, CIDR-based filtering, and similar in-stream logic.
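To illustrate the kinds of operations named above, the sketch below performs base64 decoding and CIDR-based filtering on a single event. Datadog’s Custom Processor uses VRL, not Python, so this is a conceptual stand-in with hypothetical field names.

```python
# Conceptual stand-in for the in-stream operations named above (base64
# decoding, CIDR-based filtering). Datadog's Custom Processor uses VRL;
# field names and network ranges here are hypothetical.
import base64
import ipaddress
from typing import Optional

INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8"),
                 ipaddress.ip_network("192.168.0.0/16")]

def transform(event: dict) -> Optional[dict]:
    """Decode a base64 payload field and drop events from internal CIDR ranges."""
    src = ipaddress.ip_address(event["src_ip"])
    if any(src in net for net in INTERNAL_NETS):
        return None  # CIDR-based filtering: drop internal-only traffic
    if "payload_b64" in event:
        event["payload"] = base64.b64decode(event["payload_b64"]).decode("utf-8")
    return event

print(transform({"src_ip": "203.0.113.7", "payload_b64": "aGVsbG8="}))  # decoded
print(transform({"src_ip": "10.1.2.3"}))                                # None
```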

Threat Intel Enrichment: Using the Enrichment Table processor, customers can integrate with cloud based Reference Tables or use local csv files to enrich logs with Threat Detection information.

5. Schema Drift Detection

Datadog OP offers basic schema drift detection by showing Worker-level notifications when incoming logs fail to match parsing rules.

If logs fail to match the defined parsing rules, the OP Worker generates messages indicating the issue. Customers can set monitors on these messages to receive alerts.

6. Threat Detection Content

No native detection content, but Datadog OP enables bringing your own threat intel feeds.

On the roadmap – developing integrations with automation tools such as Datadog Workflow and Tines to support early detection and remediation by SRE teams and SOC operations.

7. Intelligent Data Routing

While not fully dynamic, Datadog supports routing based on Packs, log attributes, quotas, and pipeline rules. Events can be directed to different indexes, tables, or destinations. Customers can additionally enforce global quotas and route any overflow directly to cold storage.

The roadmap includes introducing in-stream pattern recognition that will group similar logs and offer AI-driven recommendations on filtering or routing data, including sending less critical logs to cold storage.

Additional Pipeline Capabilities

  • Sensitive data redaction
  • Live Capture for Raw/Parsed Inspection: Pipelines support real-time inspection of both original and transformed events, enabling users to validate parsing and transformations directly within the platform.
  • Multi-Destination Data Lake Delivery: OP integrates with Datadog Cloudprem (petabyte-scale, on-prem security data lake) and supports routing to cold storage and data lakes such as Snowflake and Databricks via Kafka. Cloudprem offers Standard Object storage, while Datadog Log Management supports Hot (Standard indexing), Warm (Flex Logs), and Cold (Flex Frozen/Archives) tiers.
  • Archive-Ready Output for Search and Investigation: Pipeline output can be written in a format compatible with Datadog Archive Search, enabling efficient retrieval, investigation, and selective rehydration from large object-store archives.
  • Performance and Reliability:
    • High throughput per worker, resulting in a smaller infrastructure footprint.
    • Support for in-memory and on-disk buffers to reduce or eliminate data loss.
    • Out-of-the-box monitors and alerts to raise alarms, plus the ability to autoscale.

Pipeline Building Experience

  • Drag and Drop: Users can manually create pipelines in the Console via drag and drop experience.
  • Manual Pipeline Authoring and Declarative Configuration: Pipelines can be created and managed as versioned code via API and Terraform.
  • AI-Assisted Pipeline Configuration: On the roadmap – Bits AI-enabled pipeline creation.

AI Maturity

Datadog frames AI as an emerging accelerator for OP, with active investment in Bits AI and pattern-recognition capabilities. Datadog is currently building Bits AI-based pipeline configuration and AI-driven pattern recognition that will group similar logs on the stream and provide recommendations to simplify routing, filtering, tiering, and data hygiene decisions. The roadmap also includes automatic threat and anomaly detection on the stream and agentic workflows to reduce the burden of manual decision-making. These AI capabilities are directional and focused on assisting configuration and operational insight, not delivering fully autonomous or self-optimizing pipelines today.

Data Privacy with AI: Tenant level isolation via customer hosted architecture.

Integration Health Monitoring

Integration Health Monitoring: The platform includes built-in dashboards, metrics, and monitors that provide visibility into worker health, data ingestion, and data processing on a per-source and per-destination basis. Customers can review diagnostic logs directly within the Datadog platform. Datadog also provides infrastructure monitoring for the hosts running Pipelines at no additional cost, which can reduce expenses for many users. The system further offers visibility into customer data usage, including quota status, sensitive data redaction activity, and the volume of logs sampled or filtered.

Coverage Gap Analysis: OP allows customers to generate metrics on logs of any type or source. Customers can then add minimum or maximum threshold monitors on those metrics at no extra cost, and can enable such monitors on each component or source in OP to detect gaps in coverage.
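The underlying idea is a simple floor check on per-source volume, as in the sketch below. In Datadog this would be expressed as metrics plus monitors; the sources, counts, and thresholds here are invented examples.

```python
# Illustrative floor check on per-source event volume. In Datadog this is
# configured as metrics plus monitors; these counts and thresholds are made up.
hourly_counts = {"okta": 1_250, "cloudtrail": 0, "pan_firewall": 98_000}
expected_floor = {"okta": 100, "cloudtrail": 500, "pan_firewall": 10_000}

for source, count in hourly_counts.items():
    if count < expected_floor[source]:
        print(f"ALERT: {source} sent {count} events this hour "
              f"(expected at least {expected_floor[source]})")
```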

Additional Capabilities

This section notes capabilities that extend beyond core pipeline functionality –

Datadog Data Lake Native Integration

OP integrates directly with Datadog Cloudprem, the company’s on-premises, petabyte-scale security data lake. It also supports routing telemetry to cold storage and to external data lakes such as Snowflake and Databricks through a Kafka endpoint. In addition, OP can send logs to Datadog Log Management, enabling customers to correlate logs, metrics, and traces within a single platform.

Search

OP routes data to Archives in a format compatible with Datadog Archive Search. Through Archive Search, users can pull data from multiple sources into a Notebook for investigation and analysis. This allows customers to isolate specific log subsets from large S3 datasets using the Datadog Log Explorer interface, reducing rehydration time from hours or days to minutes.

Vision

Datadog’s vision for OP is to eliminate data silos and provide a reliable, scalable pipeline that helps organizations address cost and compliance requirements. The team is investing in Datadog’s Bits AI to deliver data-driven insights for SecOps and DevOps users. Planned capabilities include automated threat and anomaly detection on the stream and support for agent-driven workflows triggered by signals generated within OP. The objective is for customers to use these recommendations and insights to reduce mean time to remediation (MTTR).

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Datadog –

Strengths

Datadog Observability Pipelines (OP) offers several notable strengths for both SecOps and DevOps workflows. OP’s Sensitive Data Scanner is a core strength, offering a mature capability with approximately 200 out-of-the-box rules suited for high-risk workloads. Datadog’s investment in CloudPrem, its scalable self-hosted and hybrid security data lake, extends OP into a full end-to-end security telemetry solution for regulated and high-volume environments. Support for cloud-based and local CSV reference tables enables real-time enrichment and threat detection directly in the pipeline. Built for scale, OP can process petabytes of data with high throughput per worker, allowing organizations to achieve strong performance with a smaller infrastructure footprint. Operational simplicity is another advantage, supported by intuitive workflows, integrated health metrics, and built-in infrastructure monitoring for pipeline hosts. More than 150 built-in grok parsers, deep support for OCSF and OTEL, and source-specific routing Packs streamline onboarding and reduce manual configuration, providing efficient transformation and routing capabilities for common log sources. Finally, its vCPU-based pricing model adds flexibility by decoupling cost from log volume, helping teams manage unpredictable growth and seasonal telemetry spikes while maintaining predictable budgets.

Areas to Watch

Datadog continues to invest in its AI roadmap, incorporating its Bits AI agent, natural-language configuration, and emerging pattern recognition capabilities. Schema drift detection is available but has room for further automation over time. Because Datadog’s strategy centers on a unified observability and security platform, organizations with deeply specialized security operations may want to confirm that this broad approach aligns with their requirements.

Falcon Onum (by CrowdStrike)

Disclaimer:

Onum was acquired by CrowdStrike in August 2025 for about $290 million. The content in this report reflects Onum’s current capabilities prior to full integration into CrowdStrike’s Falcon platform. Many aspects of the offering can be expected to change as Onum becomes fully integrated into the broader CrowdStrike ecosystem. The goal of this section is to provide visibility into Onum’s differentiators within the security data pipeline platform landscape.

Introduction

Falcon Onum is a real-time, security-focused data pipeline delivered through a hybrid control plane, where the console runs in Falcon Cloud and data planes run on-prem or in customer clouds. It combines a no-code, drag-and-drop experience with assisted configuration for filtering, parsing, masking, enriching, sampling, and routing data in-stream across environments. Its differentiation centers on AI-assisted pipeline creation, policy-driven tiering across storage classes, and bring-your-own Python for inline logic, supported by CrowdStrike’s extensive MSSP distribution. CrowdStrike has confirmed that while rooted in the broader Falcon ecosystem, Falcon Onum will remain fully agnostic in source and destination coverage, making it a practical upstream layer for cross-stack routing and governance.

Architecture and Deployment Maturity

Falcon Onum provides flexible deployment options with a hybrid control-plane. The console always lives in Falcon Cloud, while data planes (pipeline engine) can sit on-prem, in cloud VPCs, or in mixed deployments. Multi-dataplane management gives teams locality and scale without creating separate operational islands.

  • Marketplaces: AWS Marketplace and Google Cloud Marketplace, with private offers and CrowdStrike channel routes for enterprises that prefer procurement via existing vendor relationships.
  • MSSPs: Distributed through CrowdStrike’s mature MSSP network including Accenture, Deloitte, Optiv, Orange Cyberdefense, and NTT.
  • Compliance: SOC 2 Type I/II, ISO 27001, and ISO 27701. It benefits from CrowdStrike’s cloud governance model and inherits operational and privacy controls already vetted by enterprise buyers.

Pricing

Event-ingestion pricing based on events per second or data-volume tiers, with predictable scaling at sustained throughput.

Pricing Assistance: Today’s model relies on the self-service estimator. CrowdStrike is exploring additional pricing programs aimed at faster adoption in large estates.

Data Collection and Integrations

Falcon Onum’s ingestion model is built around agentless collection, using APIs, cloud-native streams, and open protocols to ingest from diverse environments without adding operational friction. The platform adapts to cloud-first and legacy estates alike. Some core collection methods include (not an exhaustive list):

  1. API-Based Collection: Pulls data directly from cloud and SaaS APIs for high-fidelity, agentless ingestion. Examples: Microsoft 365, cloud storage APIs (S3, GCS), message services (SQS, Pub/Sub).
  2. Cloud-Native Event Listeners: Subscribes to cloud provider streams without agents or intermediate brokers. Examples: AWS Kinesis, AWS SQS, Azure Event Hubs, Google Pub/Sub topics.
  3. Protocol Listeners: Captures data over standard network and telemetry protocols for mixed or legacy environments. Examples: HTTP/S, TCP, Syslog, NetFlow, SNMP, OpenTelemetry.
  4. Database and File Connectors: Reads structured and semi-structured data directly from relational databases or storage layers. Examples: JDBC/relational DBs, object-store file ingestion.

Core Pipeline Capabilities

1. Data Reduction Capabilities

  • Aggregation
  • Conditional filtering
  • Field-level suppression to remove nonessential attributes
  • Sampling
  • Deduplication
  • Compact output formats to reduce payload size
  • Inline logic checks (for example, IOC matches) to route high-value events to SIEM and send low-value data to cheaper storage

2. Out-of-the-Box Content Packs

Instead of “packs,” Onum provides routing, alerting, detection, and enrichment templates and blueprints through the Onum Marketplace for common pipelines and best practices.

3. Data Normalization

Native JSON model in-stream. Option to export normalized Parquet to S3.

OCSF mappings exist for select event types, with schema alignment and field-type consistency handled inside the pipeline.

4. Enrichment

  • Geo and Network Context: Real-time GeoIP tagging to add location and network context to events.
  • Asset and Configuration Context: CMDB-driven enrichment to attach asset details, ownership, and configuration metadata.
  • Identity and Access Context: Identity lookups to add user, group, and account attributes directly in-stream.

Threat Intelligence Enrichment: Integrates with external TI sources such as Zynap feeds and SOC Prime to enhance event context and detection quality.

Custom Lookups: The customer can upload their own lookups to enrich their data or make decisions based on the data enriched values.

5. Schema Drift Detection

Automatically surfaces unexpected schema changes and can divert logs to alternate parsing paths or quarantine pipelines.

6. Threat Detection Content

Supports prebuilt Sigma-based rules, with additional out-of-the-box detections in development. Customers can write or import detections as code using the Bring Your Own Code (BYOCode) framework for Python, or create their own no-code detections based on the conditional and enrichment capabilities in the pipeline.
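As an illustration of detection-as-code at the pipeline layer, the sketch below shows a simple IOC check. The BYOCode interface itself is not documented in this report, so the function signature, field names, and indicator list are assumptions rather than Onum’s actual API.

```python
# Hypothetical shape of a detection-as-code function for an in-stream IOC check.
# Onum's BYOCode framework accepts Python, but its exact interface is not
# documented here, so the signature, field names, and indicators are assumed.
from typing import Optional

KNOWN_BAD_IPS = {"203.0.113.7", "198.51.100.23"}

def detect(event: dict) -> Optional[dict]:
    """Return an alert payload if the event matches a simple IOC condition."""
    if event.get("dst_ip") in KNOWN_BAD_IPS:
        return {"rule": "outbound-connection-to-known-bad-ip",
                "severity": "high",
                "event": event}
    return None

alert = detect({"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "action": "allow"})
print(alert["rule"] if alert else "no match")
```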

7. Intelligent Data Routing

Does not offer autonomous routing optimization today.

Instead, it leans on guided AI actions. Teams can also pull from routing blueprints and best-practice templates in the Onum Marketplace to accelerate setup. It provides assisted configuration, not autonomous decisioning.

Additional Pipeline Capabilities

Onum’s roadmap continues to deepen in-stream capabilities, positioning the pipeline as an active processing layer rather than a simple router.

Pipeline Building Experience

  • Drag and Drop: Onum provides a drag and drop model, where users can create pipelines by dragging Listeners (sources), Actions (pipeline capabilities), and Data Sinks (destinations) into end-to-end flows. The workspace supports full versioning, multi-cluster deployment, and real-time metrics such as EPS, bytes processed, and latency.
  • GitOps Integration: For teams running GitOps or automated deployments, all pipeline components can be defined and managed through APIs instead of the UI.
  • AI Pipeline Assistant: Operates at both the pipeline and action levels. At the pipeline level, users can generate complete pipelines or configure individual steps through natural-language prompts. A single instruction like “enrich failed logins with GeoIP and route to S3” produces the full set of components wired together correctly. At the action level, the assistant provides chat-style guidance to fine-tune specific logic elements such as Conditional, Group By, Math Expression, and Message Builder, reducing trial-and-error and accelerating precision configuration.

AI Maturity

Falcon Onum’s current AI capabilities focus on accelerating pipeline design rather than making autonomous decisions. The AI Pipeline Assistant can generate full pipelines or subflows from natural-language prompts, turning instructions like “enrich login failures with GeoIP and send to S3” into fully wired listeners, actions, and sinks. The AI Action Assistant applies the same approach at the micro level, translating plain intent into validated logic blocks such as Conditional, Group By, Math Expression, and Message Builder. Together, they cut build time, reduce human error, and keep everything transparent and reviewable. This is guided AI, not autonomous optimization, and operators remain fully in control of routing logic, thresholds, and policy changes.

At run time, Onum extends AI and ML through Bring Your Own Code, allowing teams to run Python-based models inline for anomaly detection, scoring, enrichment, or classification on data in motion. These functions are said to execute in milliseconds (unverified by us). Schema drift detection surfaces unexpected changes and is capable of routing them to alternative parsers or quarantine paths.

Data Privacy with AI: No separate AI privacy model described; inherits CrowdStrike’s broader data protection framework.

Integration Health Monitoring

Integration Health Monitoring: Centralized visibility into EPS, latency, throughput, and error trends with alerting for pipeline health.

Coverage Gap Analysis: Not available today. Roadmap points toward automated mapping across sources, MITRE ATT&CK, and detection coverage.

Additional Capabilities beyond Pipelines

This section notes capabilities that the vendor may provide beyond pure pipeline features –

  • External Interaction and Enrichment: Onum can call out to external systems in real time for enrichment or filtering, including Redis, databases, and SIEM APIs, while staying upstream of search and analytics layers.
  • Data Lake and Storage Connectivity: Supports a broad range of storage targets through Data Sinks. Object stores such as S3, Azure Blob, and GCS handle raw or Parquet outputs. Analytics platforms such as LogScale, BigQuery, and Devo serve downstream querying. Databases and queues including Redis, MongoDB, and Kafka support intermediate storage and replay use cases.
  • Tiered Storage Routing: Provides policy-based Hot, Warm, and Cold routing aligned to access patterns. Hot targets such as LogScale, Redis, and BigQuery support real-time analysis. Warm tiers such as S3 Standard, Azure Blob Hot, and GCS Standard support replay and mid-term investigations. Cold tiers such as S3 Glacier, Azure Blob Archive, and GCS Coldline provide long-term, low-cost retention.

Vision

Falcon Onum strengthens CrowdStrike’s vision for the Agentic SOC by serving as the real-time data layer that prepares, enriches, and routes telemetry at the speed required for human and AI collaboration. Today it functions as a high-fidelity pipeline across security, IT, and observability domains, and over time it is set to become tightly integrated with Falcon’s unified data fabric, ensuring the right data lands in the right place and format for AI-driven operations. Early integration with Falcon Next-Gen SIEM is progressing quickly, laying the groundwork for real-time data intelligence that reduces latency, improves data quality, and fuels agentic decision-making across the platform. The standalone roadmap continues to emphasize multi-cloud data control, AI-ready transformation, and a unified collection-to-delivery layer that supports Falcon and broader enterprise ecosystems.

Analyst Take – CrowdStrike + Onum

This acquisition underscores what we predicted in our first Security Data Pipelines Report: security data pipeline platforms will become a crucial part of modern SIEM and SecOps architectures. The platforms that can deliver seamless integration without losing the neutrality that SDPs bring will address practitioners’ concerns while delivering the benefits of a combined model.

CrowdStrike’s acquisition of Onum exponentially strengthens its push toward an agentic SOC by closing a long-standing gap in real-time data readiness for Falcon Next-Gen SIEM. Onum gives CrowdStrike a high-performance telemetry pipeline that filters, enriches, detects, and routes data in motion, reducing ingestion overhead while improving the fidelity of what reaches Falcon’s AI-driven detection and response layers. The integration should accelerate SIEM migrations, streamline multi-source onboarding, and lower data-retention costs, while positioning Falcon as one of the few SIEM architectures with a native, real-time enrichment layer rather than batch or post-storage normalization. Their confirmation that they will maintain multi-destination support (as of this briefing) also aligns with a market that increasingly prioritizes SIEM migration flexibility. The integration is still early, but strategically it gives CrowdStrike tighter control over the data foundation required for agentic workflows, faster autonomous outcomes, and a clearer differentiation against legacy SIEMs and pipeline vendors that lack deep platform alignment.

Observo AI, a SentinelOne Company

Disclaimer:

SentinelOne announced its intent to acquire Observo AI in September 2025 for about $225 million, and the deal has since closed in Q3. The content in this report reflects the current capabilities of Observo AI, a SentinelOne Company, prior to full integration into the SentinelOne Singularity Platform. Many aspects of the offering can be expected to change as Observo AI becomes fully integrated into the broader SentinelOne ecosystem. The goal of this section is to provide visibility into Observo AI’s differentiators within the security data pipeline platform landscape.

Introduction

Observo AI is an AI-native security data pipeline platform that uses machine learning, large language models, and agentic AI to automate data optimization across security and DevOps workflows. Alongside the core data pipeline product, Orion acts as an AI-powered data engineering assistant that allows teams to build and manage complex pipelines through natural language rather than specialized engineering skills.

Architecture and Deployment Maturity

Observo AI offers flexible deployment options such as on-premises, SaaS, and hybrid deployment models. The most common production pattern is hybrid: the Manager (control plane) runs as a SaaS service hosted by Observo AI, while customers choose whether to use the provided SaaS data plane or deploy their own Sites across cloud, data-center, or colo environments. Fully air-gapped, self-hosted deployments are supported, with both Manager and Sites deployed entirely inside the customer’s network.

Marketplace, MSSPs and Compliance: Unverified

Pricing

Not publicly available.

Data Collection and Integrations

Observo’s ingestion approach can be characterized across these modes:

  1. API Driven & Agentless Ingestion: Observo AI pulls or receives data via endpoints provided by SaaS tools or custom apps, using REST APIs. Agentless pull ingestion allows Observo AI to fetch data directly from APIs, cloud storage buckets, or streaming platforms like Kafka and Event Hubs without requiring deployed agents.
  2. Cloud Storage Ingestion: Cloud storage ingestion works by retrieving exported log files from cloud object stores, allowing Observo AI to process batch style exports from AWS S3, Azure Blob Storage, and Google Cloud Storage.
  3. Streaming and Queue Ingestion: Streaming ingestion reads high throughput real time events from messaging systems like Kafka, Azure Event Hubs, and Google PubSub, enabling continuous processing of fast moving telemetry.
  4. Push Based / Protocol based Ingestion: Push based ingestion is used when systems directly forward logs to Observo AI through mechanisms such as Syslog, HTTPS based forwarding, or the Observo Edge gateway. Protocol based ingestion involves standardized transport formats such as Syslog, HTTP socket connections, platform level protocols, and Sentinel 1 connectors used by network and security systems.
  5. Agent and Collector Based Ingestion: Agent and collector based ingestion uses tools like Fluent Bit, Fluentd, Splunk Forwarder, and the OpenTelemetry collector to read logs locally on hosts or containers and send them to Observo AI.
  6. Advanced Collector Extensibility via Lua: Observo AI enables deep customization of ingestion behavior through Lua scripting. This allows teams to implement advanced logic for difficult or nonstandard sources, build custom polling mechanisms, and extend the collector framework beyond prebuilt patterns.

Core Pipeline Capabilities

1. Data Reduction Capabilities

  • Field and Event Filtering
  • Lookup-Driven Tagging and Classification
  • Built-in Optimizers (e.g., Windows Events, VPC Flow Logs)
  • Field Removal and Renaming
  • Cardinality-Aware Optimization via Insights Engine
  • Event Code–Based Reduction
  • Lua-Based Custom Reductions
  • ML-Driven Data Summarization
  • De-Duplication
  • Dynamic Pattern Detection

2. Out-of-the-Box Content Packs

Observo AI provides several out-of-the-box components, including built-in parsers for common log formats, predefined optimization routines for sources such as Windows events and cloud flow logs, PII-masking templates, enrichment lookups, and a robust library of transformation functions. The Data Insights Engine further contributes to OOTB value by automatically surfacing event distributions, high-cardinality fields, and optimization targets without requiring manual setup.
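The sketch below shows the kind of field-cardinality analysis such an insights engine automates: count distinct values per field and flag high-cardinality candidates for reduction. It is illustrative only; the sample events and cutoff are not Observo’s.

```python
# Illustrative field-cardinality analysis, not Observo's implementation.
# The sample events and the 0.5 cutoff are made-up examples.
from collections import defaultdict

events = [
    {"status": "200", "trace_id": "a1"}, {"status": "200", "trace_id": "b2"},
    {"status": "500", "trace_id": "c3"}, {"status": "200", "trace_id": "d4"},
]

distinct_values = defaultdict(set)
for event in events:
    for field, value in event.items():
        distinct_values[field].add(value)

for field, values in distinct_values.items():
    ratio = len(values) / len(events)
    label = "high-cardinality" if ratio > 0.5 else "low-cardinality"
    print(f"{field}: {len(values)} distinct values ({label})")
```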

Source-Specific Transform Groups:

Observo AI includes source-aware parsing and optimization behavior for vendor formats such as Palo Alto CSV telemetry, where raw value-only CSV logs are transformed into structured field–value events and can be further optimized via built-in functions and Lua scripting.

3. Data Normalization

Observo AI normalizes to standard formats using AI-generated adjustments and advanced scripting. The Data Insights Engine guides normalization efforts by surfacing patterns, field distributions, and optimization opportunities, enabling analysts to quickly shape data into consistent, usable formats.

Source-Aware Parsing and Transformation

Observo AI supports parsing of structured and semi-structured sources, including JSON and vendor-specific formats such as Palo Alto CSV telemetry. The exact set of normalization schemas supported is unverified.

Observo AI also uses AI-generated Grok patterns to automatically parse and normalize custom application logs and other unstructured or previously unknown formats. This allows the platform to infer schemas on the fly and convert irregular telemetry into consistent, structured records without manual field mapping.

4. Enrichment

Context and Lookup Enrichment: Observo AI supports enrichment via static lookup tables and dynamic lookups that run on a schedule (via CRON expressions) to keep CSV-based reference data up to date. These lookups can be used to tag events with additional attributes, such as threat-intel indicators or other customer-defined context, as data flows through the pipeline.
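A minimal sketch of this lookup-driven enrichment pattern appears below: a CSV reference table is parsed into a mapping and used to tag matching events. The CSV columns, field names, and refresh handling are hypothetical; in the product the refresh would run on the configured CRON schedule.

```python
# Conceptual sketch of scheduled-lookup enrichment, not Observo's implementation.
# The CSV columns, field names, and indicator values are hypothetical.
import csv
import io

THREAT_INTEL_CSV = """indicator,label
203.0.113.7,known_c2
198.51.100.23,scanner
"""

def load_lookup(csv_text: str) -> dict:
    """Parse the reference CSV into an indicator -> label mapping."""
    return {row["indicator"]: row["label"]
            for row in csv.DictReader(io.StringIO(csv_text))}

def enrich(event: dict, lookup: dict) -> dict:
    """Tag the event if its destination IP appears in the lookup table."""
    label = lookup.get(event.get("dst_ip", ""))
    if label:
        event["threat_label"] = label
    return event

lookup = load_lookup(THREAT_INTEL_CSV)  # in practice refreshed on a CRON schedule
print(enrich({"dst_ip": "203.0.113.7", "action": "allow"}, lookup))
```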

Threat Intel Enrichment:

Observo AI applies threat-intelligence context using lookup-based enrichment. Events can be matched against externally maintained IOC lists or dynamically updated threat-intel files, allowing pipelines to tag observables such as IPs, domains, or hashes.

The platform enriches telemetry but does not provide native threat detection or correlation logic, keeping enrichment focused on adding IOC context and sentiment analysis rather than specific threat alerts.

5. Schema Drift Detection

Observo AI does not currently support automated schema drift detection.

While underlying components such as data insights and parser analysis lay groundwork for future capabilities, schema drift awareness remains a roadmap item.

6. Threat Detection Content

Observo AI does not provide native streaming threat-detection rules or detection content. The platform offers enrichment-based threat intelligence through lookup tables and dynamic lookups but does not execute threat analytics, rule engines, or alerting within the pipeline.

Observo AI uses sentiment scoring to prioritize events based on patterns and text strings commonly associated with known attack behaviors. This helps downstream tools and analysts focus their investigation on higher-intent events.

7. Intelligent Data Routing

Routing decisions are transformation-driven and manually configurable, though the platform’s Data Insights Engine surfaces context that can guide those choices.

Additional Pipeline Capabilities

  • Sensitive Data Masking: Observo AI can automatically detect sensitive data based on an extensive internal library and understanding of PII, and mask it without requiring manual configuration. Custom configuration is available as needed.
  • Archive Rehydration from Object Storage: Rehydrates archived telemetry from S3-compatible storage into active pipelines for replay, transformation, or migration purposes.
  • Side-by-Side Output Validation for Migration Workflows: Sends the same data stream to multiple destinations, optimized and unoptimized, to validate changes and support SIEM migration efforts.
  • Pipeline-Level CPU and Transformation Utilization Metrics: Provides operational metrics showing CPU load and transformation processing impact to monitor pipeline performance and stability.
  • Controlled Delivery & Backpressure Protection: Manages ingestion bursts and destination slowdowns by buffering, throttling, or rerouting data to maintain stability in high-volume environments.
  • Fine-Grained Role-Based Access Control: Provides granular control over who can view, modify, or publish pipelines and transformations, ensuring separation of duties and secure operations across data engineering, SecOps, and platform teams.

Pipeline Building Experience

  • Visual Pipeline Construction and Transformation Mapping: Observo AI enables users to easily build pipelines through a visual configuration model rather than code. Sources, destinations, and transformation functions are added directly within the UI, giving analysts a clear, interactive view of how each stage shapes the data.
  • Insight-Driven Pipeline Optimization: The Data Insights Engine provides detailed analysis of event patterns, cardinality, and field characteristics to guide normalization and reduction decisions. These insights help users quickly identify inefficiencies and optimize data before sending it to downstream systems.
  • AI-Assisted Pipeline Adjustments with Orion: Orion enhances pipeline creation by analyzing context from the Data Insights Engine and generating transformation logic such as PII masking, field adjustments, and optimization steps. Users can preview all AI-generated changes before deploying them, enabling safe, guided refinement without scripting.

AI Maturity

Observo AI’s data pipeline is AI-native, using machine learning to identify repetitive or low-value data patterns and summarize them for significant volume reduction. AI also supports automatic detection of schema structures, field anomalies, and optimization opportunities within the Data Insights Engine. Building on this foundation, Orion extends these capabilities as an agentic system that interprets pipeline context, analyzes field structures, and generates deployable transformations. Orion assists with tasks such as PII masking, data cleanup, optimization recommendations, and pipeline-level adjustments, enhancing the core AI-driven workflow.

The roadmap extends these capabilities toward automated detection of schema drift, parser pattern recognition, and conversational orchestration of complex pipeline operations. In this model, AI accelerates pipeline creation, refinement, and operational alignment, positioning it as a differentiating component for transformation logic and interactive workflow assistance rather than for autonomous detection or analytics.

Data Privacy with AI:

  • AI operates with outbound interaction: Orion relies on an external LLM service, requiring tokenized data to be sent outside the customer’s data plane when AI-assisted features are used. Customers must opt into this model based on their data-handling requirements.
  • Customer-controlled AI models (BYO-LLM): A bring-your-own-LLM option is planned, allowing organizations to supply their preferred model and maintain control over how AI processes pipeline-related information.
  • AI interaction limited to pipeline generation and orchestration: AI assistance is scoped to pipeline transformations, data inspection, and configuration workflows. It does not execute threat analytics, detection logic, or out-of-band inference on customer telemetry.

Integration Health Monitoring

Integration Health Monitoring: Observo AI delivers real-time visibility into pipeline health, monitoring data volumes, routing behavior, source degradation, destination errors, and resource utilization. The system detects dropped data, identifies unhealthy or slowing sources, and alerts when destinations stop receiving events. Integration health is treated as a continuous diagnostic signal, allowing operators to monitor ingestion stability and pipeline performance across distributed environments.
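
Two of the health checks described above, silent-source detection and volume-anomaly detection, can be approximated in a few lines; the thresholds and window sizes here are illustrative, not Observo AI defaults.

```python
# Simplified sketch of two telemetry-health checks: silent-source detection
# and volume-anomaly detection against a rolling baseline. Thresholds and
# window sizes are illustrative only.
import statistics
import time

def silent_sources(last_seen: dict[str, float], max_gap_s: int = 900) -> list[str]:
    """Sources that have not emitted events within the allowed gap."""
    now = time.time()
    return [src for src, ts in last_seen.items() if now - ts > max_gap_s]

def volume_anomaly(history: list[int], current: int, z_threshold: float = 3.0) -> bool:
    """Flag the current interval if it deviates strongly from the baseline."""
    if len(history) < 5:
        return False
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0
    return abs(current - mean) / stdev > z_threshold

print(silent_sources({"firewall": time.time() - 1800, "okta": time.time() - 60}))
print(volume_anomaly([1000, 1100, 950, 1050, 990], current=4800))
```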

Coverage Gap Analysis: Unverified

Additional Capabilities

SIEM Data Extraction: Observo AI can retrieve data from existing analytics systems such as Splunk or security data lakes. This supports migration use cases, reprocessing of historical data, and pipeline consolidation across heterogeneous environments.

Vision

Observo AI is moving toward an AI-assisted pipeline platform where data flows can be dynamically refined through continuous understanding of pipeline behavior. The Data Insights Engine provides the contextual foundation, while Orion enables conversational interaction and deployable, AI-generated changes.

In the near term, integration work is focused on embedding pipeline capabilities directly into the broader SentinelOne platform to streamline data onboarding and strengthen product positioning. Key advantages include moving parsing earlier in the data flow for more effective optimization before ingestion, and using the platform’s data insights engine to provide out-of-band visibility during processing. Future development aims to expand autonomous pipeline behavior through conversational configuration, on-pipeline threat detection and alerting, and AI-based pattern recognition designed to improve data quality and reduce noise.

Analyst Take – SentinelOne + Observo AI

This acquisition strengthens the SentinelOne Singularity AI SIEM strategy by incorporating native pipeline capabilities directly into the platform. It also reinforces a key conclusion from our first Security Data Pipelines Report: pipeline technology has become a critical layer in modern SIEM and SecOps architectures. SIEM vendors that integrate this capability inline and address ingestion, onboarding, and data quality challenges at the source pave the way for the next generation of SIEM evolution. The company has confirmed that Observo AI will continue as a standalone offering. This decision reflects a recognition that security data pipelines operate as infrastructure-layer components that must remain vendor-neutral. Security teams need the flexibility to ingest, transform, and route telemetry independently of their analytics platform choice. The integration path with SentinelOne AI SIEM is being developed in parallel, though the commercial model remains under discussion. This flexible approach supports a vendor-neutral posture for now, and continued investment will be important as new features are added to the roadmap.

Although both parties state that the product will remain independent, sustaining that independence can become challenging once deeper integration is prioritized. The detailed evaluation process and the focus on strengthening the SentinelOne AI SIEM will improve the SIEM platform but may also introduce concerns about maintaining the vendor-neutral positioning that is central to Security Data Pipeline offerings.

Realm Security

Realm is a promising newcomer to the security data pipeline platform market. It raised a $15M Series A on October 8, 2025. Realm positions itself as a tightly integrated, multi-tenant cloud control plane built around single-tenant data pipelines, persistent queues, and ML- and LLM-driven relevance and normalization.

Voice of the Customer

We were able to interact with a customer of Realm to understand their use cases and experience with Realm. Here is what they said –

Life before Realm

“We needed a vendor agnostic platform to allow us to have control over the data that we send to our SIEM platform. We needed this to make sure that we could evaluate other SIEM platforms and quickly/easily transition to those platforms once a new one was selected. We needed to migrate/duplicate some of our data for long term retention purposes. We also have critical log sources that contain PHI/PII that we were unable to ingest into our SIEM as our SIEM is hosted in a third party cloud platform (defacto HIPAA violation) – Realm allows us to clean those logs of PHI/PII so that they can be ingested and we do get the alerting we need. We also have a small SIEM team, so the AI capabilities within Realm to intelligently trim log data to save on ingestion costs is hugely beneficial. Short term it will enable us to drive down SIEM costs, migrate SIEM platforms and keep PII/PHI out of our SIEM. Longer term the SIEM is likely going to be less of a focus for the industry as AI bots will be able to do most of that work, so Realm will still play a pivotal role in serving as the pipeline of the data that would once go explicitly to a SIEM to instead move to alternative data lakes, while making sure that those logs are continuing to be trimmed and maintained.”

Most used capabilities within Realm

“Data migration, data duplication, data scrubbing and intelligent log cleanup.”

What they’d like to see more of

“Going beyond pipeline management and a SIEM focus and potentially adding functionality where it can send AI bots into our various sources and extracting data without needing to explicitly send logs to a SIEM.”

Architecture and Deployment Maturity

Realm offers flexible deployment options. The platform separates its multi-tenant control plane (Console) from single-tenant data pipelines (Pipeline Engine). The control plane handles configuration, policy, and observability, while each customer’s dedicated pipeline processes and stores data within its own secure tenancy.

  • Marketplace: Hyperscaler partnerships; marketplace listings planned.
  • MSSPs: Private, undisclosed partnerships.
  • Compliance: SOC 2 certified

Pricing

Consumption-based model priced on daily ingestion.

Pricing Assistance: Realm provides a pricing calculator for pricing assistance.

Data Collection and Integrations

  1. OTEL Collectors: Realm supports OpenTelemetry (OTEL) collectors for on-premises environments. These collectors standardize telemetry before forwarding to Realm pipelines.
  2. Agents and Forwarders: Lightweight agents and forwarders handle continuous data streaming from hosts, applications, and network sources.
  3. API-Based (Agentless) Integration: Realm enables agentless data ingestion through REST and streaming APIs. This method suits SaaS and cloud-native tools where direct API connectivity is preferred over deploying local agents.
  4. Cloud Syslog Streams: Cloud syslog integration allows ingestion from cloud-based security and infrastructure services that expose syslog endpoints.
  5. Push Integrations: Realm supports both push and pull data models for cloud services. Sources can push telemetry directly to Realm endpoints.

Number of integrations: 15 out of the box, with broader coverage possible through the supported collection methods.

Core Pipeline Capabilities

1. Data Reduction Capabilities

  • Data Reduction and Intelligent Filtering
  • Aggregation
  • Deduplication
  • Event Discards
  • Field Filtering (deduplication and field filtering are illustrated in the sketch after this list)
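
The sketch below shows deduplication and field filtering in minimal form; the key fields and drop list are hypothetical examples rather than Realm’s configuration.

```python
# Minimal sketch of two reduction steps: deduplication and field filtering.
# The key fields and drop list are hypothetical examples.
import hashlib
import json

DROP_FIELDS = {"debug_info", "raw_payload"}       # fields not needed downstream
DEDUP_KEYS = ("source", "event_id", "message")    # fields that define "same event"

def reduce_stream(events: list[dict]) -> list[dict]:
    seen: set[str] = set()
    output = []
    for event in events:
        fingerprint = hashlib.sha256(
            json.dumps([event.get(k) for k in DEDUP_KEYS]).encode()
        ).hexdigest()
        if fingerprint in seen:
            continue                                # discard duplicate event
        seen.add(fingerprint)
        output.append({k: v for k, v in event.items() if k not in DROP_FIELDS})
    return output

events = [
    {"source": "fw", "event_id": 1, "message": "deny", "debug_info": "x" * 512},
    {"source": "fw", "event_id": 1, "message": "deny", "debug_info": "y" * 512},
]
print(reduce_stream(events))   # one event remains, without the debug_info field
```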

2. Out-of-the-Box Content Packs

While Realm does not ship formal “content packs”, each customer receives rule recommendations within the data fabric for every integration configured.

These recommendations serve as the default mechanism for managing data transformations, providing out-of-the-box guidance without additional setup.

3. Data Normalization

Realm currently normalizes data to JSON and plans to add support for OCSF, ECS, and CIM in the first half of next year.
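
As a hedged illustration of what normalization toward a schema such as OCSF involves, the sketch below maps a raw login event into an OCSF-style shape; the field names are simplified, and the real OCSF specification defines the authoritative classes and attributes.

```python
# Hedged sketch of normalizing a raw vendor event into an OCSF-style shape.
# Field names here are simplified and OCSF-inspired; the real OCSF schema
# defines the authoritative classes and attributes.
from datetime import datetime, timezone

def normalize_login_event(raw: dict) -> dict:
    return {
        "class_name": "Authentication",          # OCSF-style event class
        "time": int(datetime.fromisoformat(raw["timestamp"])
                    .replace(tzinfo=timezone.utc).timestamp() * 1000),
        "status": "Failure" if raw.get("result") == "FAILED" else "Success",
        "actor": {"user": {"name": raw.get("user")}},
        "src_endpoint": {"ip": raw.get("client_ip")},
        "metadata": {"product": raw.get("product", "unknown"), "original": raw},
    }

raw = {"timestamp": "2025-10-08T12:30:00", "user": "jdoe",
       "client_ip": "203.0.113.7", "result": "FAILED", "product": "vpn"}
print(normalize_login_event(raw))
```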

4. Enrichment

Security context and business context

Threat Intel Enrichment: Yes. Used for log reduction and filtering rules.

5. Schema Drift Detection

Realm detects schema drift, and required parser updates are applied for customers with help from Realm’s support team.
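
Schema drift detection generally amounts to comparing incoming events against a stored baseline; the sketch below shows the idea with a hypothetical baseline and field names, not Realm’s implementation.

```python
# Generic illustration of schema drift detection: compare incoming event keys
# and types against a stored baseline and report added, missing, or retyped
# fields. The baseline and field names are hypothetical.
BASELINE = {"timestamp": str, "user": str, "src_ip": str, "bytes": int}

def detect_drift(event: dict) -> dict:
    added = {k for k in event if k not in BASELINE}
    missing = {k for k in BASELINE if k not in event}
    retyped = {k for k, t in BASELINE.items()
               if k in event and not isinstance(event[k], t)}
    return {"added": added, "missing": missing, "retyped": retyped}

# A vendor update renamed src_ip to source_ip and started sending bytes as a string.
drifted = {"timestamp": "2025-10-08T12:30:00", "user": "jdoe",
           "source_ip": "203.0.113.7", "bytes": "1024"}
print(detect_drift(drifted))
# {'added': {'source_ip'}, 'missing': {'src_ip'}, 'retyped': {'bytes'}}
```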

6. Threat Detection Content

Currently under development.

7. Intelligent Data Routing

Realm automates routing recommendations based on data type and destination use.

Additional Pipeline Capabilities

  • Intelligent Log Detection: Automatically identifies log types and routes them to the correct pipelines.
  • Secure Filtering and Testing: Implements a Dev-Test-Run framework with event capture to visualize transformation impact.
  • Persistent Queuing: Ensures no data loss during downstream outages and resumes delivery when systems recover.

Pipeline Building Experience

  • Drag and Drop interface: Provides drag-and-drop source and destination objects for faster pipeline configuration.

AI Maturity

Realm applies ML and LLM models across ingestion, filtering, normalization, and configuration to reduce reliance on expert scripting and support cost-aware routing. The platform’s intelligence layer incorporates inference mapping, combining statistical machine learning, security context, and incident response best practices to improve data quality and minimize alert fatigue before forwarding to SIEM or XDR systems.

Realm’s knowledge graph is used in production to generate personalized transformation rule recommendations, independent of its developing agentic workflows. The intelligence layer is designed to feed future agentic capabilities to support autonomous cross-source correlation.

Realm can autonomously parse datasets ingested into the platform, and the team is currently evaluating whether to expose this configuration for direct customer interaction.

Data Privacy with AI: Tenant level isolation based on architecture.

Identity and Access Management: Includes SSO, SAML, custom OIDC, MFA, and granular RBAC controls.

Integration Health Monitoring

Integration Health Monitoring: Provides real-time integration active / error health status with alerts that identify likely causes and offer remediation recommendations to accelerate resolution.

Coverage Gap Analysis: Not available.

Schema Drift Detection: Automatic detection of schema drift. Parser updates to fix the schema require manual intervention, which Realm’s support team can perform on behalf of customers.

Additional Capabilities beyond Pipelines

Data Lake

Realm provides a data storage capability known as Data Haven, designed as a raw archival layer with retrieval and rehydration functionality. The system was built to support machine querying from the outset, enabling efficient access to historical data for analytics and AI-driven workflows. Realm positions Data Haven as both a cost-optimized extension of SIEM storage and a future-ready layer for agentic or AI SOC integrations, allowing customers to retain more data within budget while preparing for emerging automation and AI use cases.

Vision

Realm frames its direction around affordable ingest expansion, AI‑assisted pipeline configuration, and a haven‑plus‑replay (data lake) architecture that will eventually support AI SOC enablement. Leadership is direct about the roadmap being guided by customer needs and transparent about areas of the product still maturing.

Analyst Take

Here’s what we see as major strengths and opportunities for improvement for Realm –

Strengths:

In conversations with leadership, the team was clear about where it wants to go and where it does not. Realm’s vision aligns closely with what buyers care about most from pipeline platforms: cost, speed, and reliability. Its focus on reducing ingest spend and simplifying deployment addresses a clear market pain point. The use of ML and LLMs for filtering and normalization targets an area many teams still struggle to automate, and matches the direction leading pipeline platforms are heading with AI capabilities. The combination of a multi-tenant control plane with single-tenant pipelines, supported by year-long pipeline reporting, gives early teams the maintenance flexibility and the centralized visibility across distributed organizations that they need as they scale.

Areas to Watch:

Realm remains in the early stage of its buildout, with a smaller integration footprint, and will need time to reach broad source coverage. Marketplace listings and MSSP partnerships are still developing, which may create short-term friction for cloud-based procurement. Realm will need to accelerate supported normalization formats and invest in advanced AI capabilities such as automatic coverage gap analysis and parser updates. Progress will hinge on how quickly Realm expands integrations and delivers marketplace availability while maturing its AI and pipeline feature set. Practitioners should watch near-term progress on these fronts.

Tenzir

Tenzir, founded in 2017, raised about $3.3M in seed funding. Tenzir delivers an AI-enabled streaming data pipeline platform designed to process all telemetry across an organization’s environment. Its differentiation centers on TQL, a deterministic pipeline language, and an MCP server that converts natural-language instructions into fully generated pipelines and deployment packages. Their vision is to enable an AI-orchestrated Data Streaming Fabric that dynamically forms and adapts pipelines on demand, empowering organizations to move, transform, and validate security data in real time.

Voice of the Customer

Life Before Tenzir

The customer described a legacy log-management setup that had become increasingly complicated both in day-to-day operations and in overall cost. Their attempt to rebuild the system using open-source tooling ultimately proved unsustainable:

“We already had a solution which was quite legacy, had some problems, and wasn’t state-of-the-art. So we tried to replace it ourselves with open-source tools and came to the conclusion that it would not work, and that we should invest more in the problems of our customers and less in building the tooling to integrate system X and Y.”

The limitations of this environment pushed them toward commercial Streaming Data Pipeline solutions. Rising SIEM ingestion costs and the need to maintain consistent workflows across multiple SIEM platforms were central drivers in reassessing their architecture. As the customer put it:

“We started looking around at the solutions out there—Tenzir, Cribl, and others in that space—and in the end we came to the conclusion that Tenzir was the best match for our case.”

Most Used Capabilities Within Tenzir

The customer reported that Tenzir’s primary value lies in its ability to reduce log volume while unifying a wide range of integrations. This helped them regain control over SIEM-related costs and maintain a consistent pipeline across multiple SIEM platforms.

“It just did what we needed it to do, starting with the many integrations we required at the beginning. We are already transforming a lot of data into specific events, aggregating them, and then sending them to different SIEM solutions because we don’t have just one type. We don’t use only Elastic or Splunk or Sentinel or Google Cloud — we have different customers with different solutions.”

They emphasized that Tenzir’s normalization workflows and on-node storage materially improved investigation speed and consistency. Common enrichment and preparation steps no longer had to be recreated across different tools, and short-term storage on the node made it easier for analysts to review recent activity:

“We have storage on the node which is pretty helpful… if they [security engineers] see an incident and they want to research what happened or what this user has done in the last two days, you can — with OCSF — write some pretty nice queries which only show what has happened with that user or from that IP.”

Tenzir’s standardized enrichment flows, localized storage, and OCSF normalization streamline investigations by enabling fast, focused queries on recent user or IP activity. The customer also noted the resilience benefits of Tenzir’s architecture: keeping data and configuration on-node ensures that operations continue even if external connectivity is disrupted.

What They Would Like to See More Of

The most significant request centers on more granular role-based access control to safeguard sensitive fields within the data.

“One thing that would be really helpful for us is deeper role-based access control. Right now, there isn’t enough fine-grained control over who can access what data. We have sensitive information that shouldn’t be visible to all engineers, and some fields shouldn’t be viewable at all because they contain personal information. I’ve heard they’re already working on improvements in that area, and that would be very important from our side.”

They noted a handful of smaller operational enhancements, but access control remains the primary need moving forward:

“Beyond that, there are a few smaller operational things—customer requirements around how storage works, archiving data, optimizing compaction—that they’ve already started addressing. But those are minor compared to access control. The biggest thing we need from them is the access control.”

Architecture and Deployment Maturity

Tenzir provides a flexible deployment model that mirrors a clear separation between its cloud-based Console and on-premises Pipeline Engine. The Console functions as a SaaS control plane, while all streaming and transformation activity runs locally within the customer’s environment. For regulated or isolated settings, Tenzir can operate fully air-gapped, with both Console and Nodes deployed entirely inside the customer’s network. The architecture emphasizes deterministic, real-time processing through TQL, ensuring strict schema enforcement and data validation before telemetry is forwarded to downstream systems.

  • Marketplace: AWS Marketplace is the primary distribution channel.
  • MSSPs: Partnerships include DCSO, fernao magellan, and additional regional and technology partners.
  • Compliance: No formal certifications, though Tenzir maintains internal security policies aligned with industry-standard controls.

Pricing

Ingestion-based model with fixed annual pricing for predictable budgeting.

Pricing Assistance: Tenzir does not provide a pricing calculator or automated estimator.

Data Collection and Integrations

Tenzir unifies telemetry ingestion across on-premises, cloud, and hybrid environments through a connector-driven architecture in which all processing executes locally on customer-managed nodes. The platform consolidates data collection from heterogeneous enterprise logging ecosystems and accommodates a wide spectrum of security and operational data formats.

Tenzir’s ingestion approach can be characterized across these modes:

  1. Pre-Built Integrations and Packages: Tenzir provides a catalog of integrations and application-specific packages that deliver ready-made TQL pipelines for sources such as AWS CloudTrail, Microsoft 365, Okta, Palo Alto, CrowdStrike Falcon, GitHub audit logs, and others. These are shipped as modular, version-controlled units that can be composed to fit customer workflows.
  2. Universal Connector Framework: Nodes can ingest from virtually any source using receivers for syslog, Kafka, cloud object storage (S3, GCS, Azure Blob), HTTP endpoints, databases, and low-level transport protocols including TCP/UDP, AMQP, ZeroMQ, SQS, and Pub/Sub.
  3. Existing Agent and Forwarder Compatibility: Tenzir is designed to integrate with customers’ existing data collectors rather than require proprietary agents. It supports Splunk UF, Elastic Beats, Fluentd, Vector, OTEL collectors, and a variety of vendor-specific forwarders.
  4. HTTP and API-Based Collection: The platform ingests directly from REST APIs and custom HTTP endpoints, extending coverage to SaaS services and internal applications.

  • Marketplace: AWS Marketplace
  • MSSPs: DCSO (a leading European MDR provider using Tenzir for its security operations infrastructure) and fernao magellan (24/7 monitoring and incident response services)
  • Additional partners: ticura (AI-powered threat intelligence platform), HOOP Cyber (cyber data engineering consultancy), and trustunit (Turkish cybersecurity consultancy)
  • Compliance: No formal certifications, though equivalent internal policies are in place

Core Pipeline Capabilities

Diving deeper into core pipeline capabilities

Tenzir Pipeline Capabilities 1
Tenzir Pipeline Capabilities 2
Tenzir Pipeline Capabilities 3

Additional Pipeline Capabilities

  • Symmetric Read/Write Format Handling: Pipelines can read and write formats such as JSON, CSV, Parquet, PCAP, Zeek, CEF, LEEF, and OCSF, and can re-encode optimized data back into its original format for compatibility with downstream tools.
  • Metrics and Diagnostics as Pipelines: Tenzir treats metrics and diagnostic signals as data streams, allowing users to filter, enrich, and route operational events using the same pipeline mechanics applied to production telemetry.
  • Streaming OCSF Validation: Pipelines perform real-time OCSF schema validation, serving as a quality gate to ensure downstream systems receive consistently structured data (a conceptual sketch follows this list).
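
The conceptual sketch below shows what a streaming quality gate of this kind does: conforming events pass through, while non-conforming events are tagged for review. The required-field list is a simplified stand-in for real OCSF class definitions, and this is not Tenzir’s implementation.

```python
# Conceptual sketch of a streaming schema quality gate; the required-field
# list is a simplified stand-in for real OCSF class definitions.
from typing import Iterable, Iterator

REQUIRED_FIELDS = {"class_uid": int, "time": int, "severity_id": int}

def validate_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Forward conforming events; tag and divert the rest for review."""
    for event in events:
        problems = [f for f, t in REQUIRED_FIELDS.items()
                    if f not in event or not isinstance(event[f], t)]
        if problems:
            yield {"_invalid": True, "_problems": problems, "event": event}
        else:
            yield event

stream = [{"class_uid": 3002, "time": 1733665800000, "severity_id": 1},
          {"class_uid": 3002, "time": "not-a-timestamp"}]
for out in validate_stream(stream):
    print(out)
```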

Pipeline Building Experience

  • Pipelines as Code and Operational Building Blocks: Tenzir treats pipelines as the core construct of the platform, with metrics, diagnostics, routing, enrichment, and detection all implemented as pipelines. They can be stored as code, versioned in Git, deployed through CI/CD, and forked to process raw and parsed data in parallel.
  • Manual Authoring with TQL and Reusable Packages: Pipelines are built using TQL, a composable, pipe-based language inspired by Splunk’s SPL and Microsoft’s KQL. Users can assemble pipelines by chaining operators and incorporating application-specific packages that ship as independently deployable, version-controlled TQL units.
  • AI-Generated Pipelines and Deployment Packages: Through the MCP server, users can describe objectives in natural language and have Tenzir generate full TQL pipelines. The AI can also produce complete deployment packages, including ingestion flows, normalization logic, enrichment contexts, threat-intelligence feeds, and detection rules.

AI Maturity

Tenzir positions AI as a foundational component of its streaming data fabric, with the MCP server enabling natural-language generation of complete TQL pipelines aligned to user intent. AI-driven schema mapping automatically translates telemetry into OCSF, Splunk CIM, Elastic ECS, and proprietary models, removing the need for manual normalization. The platform extends this automation to produce full deployment packages—spanning ingestion pipelines, enrichment contexts, threat-intelligence feeds, and detection logic.

These AI capabilities accelerate time-to-value by converting raw log samples into production-ready pipelines and enabling on-demand assembly of large, interdependent pipeline fabrics. This establishes AI as a strategic differentiator for pipeline creation, schema transformation, and dynamic fabric orchestration.

Data Privacy with AI:

  • AI operates without accessing customer telemetry: Tenzir states that “sensitive security data never leaves [the] customer environment,” and this applies equally to AI-assisted functions. All ingestion, transformation, enrichment, and detection run locally on Nodes, with only metadata, metrics, and control signals exchanged with the cloud Console.
  • Customer-controlled AI models (BYO-LLM): The MCP server enables AI to generate TQL pipelines through local pipeline execution, while customers supply their preferred AI model. Customers decide to what extent they share sample data with the AI provider in order to create TQL transformations.
  • AI interaction limited to pipeline generation and orchestration: MCP-driven AI assists with creating pipelines, deployment packages, and troubleshooting workflows, but does not perform inference on customer telemetry or move data into external environments.

Integration Health Monitoring

Tenzir Integration Health Monitoring

Additional Capabilities

This section notes capabilities that extend beyond core pipeline functionality.

Searching

Tenzir can query data stored in cloud object storage (S3, GCS, Azure Blob) with predicate pushdown, reducing data transfer and accelerating federated search workflows.
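
Predicate pushdown means the filter is evaluated during the scan, so only matching data is materialized. The sketch below shows the concept with PyArrow on a local Parquet file (assuming `pyarrow` is installed); it illustrates the general technique, not Tenzir’s query engine.

```python
# Generic illustration of predicate pushdown when querying columnar data:
# the filter is applied during the scan so only matching row groups/rows are
# materialized. Shown here with PyArrow on a local file for simplicity.
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({
    "src_ip": ["10.0.0.1", "203.0.113.7", "10.0.0.2"],
    "action": ["allow", "deny", "allow"],
})
pq.write_table(table, "events.parquet")

# The filter is pushed into the scan instead of loading the full file first.
hits = pq.read_table("events.parquet", filters=[("src_ip", "==", "203.0.113.7")])
print(hits.to_pylist())   # [{'src_ip': '203.0.113.7', 'action': 'deny'}]
```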

Vision

Tenzir aims to deliver an AI-orchestrated Data Streaming Fabric in which pipelines dynamically form, optimize, and retire based on real-time needs. The Security Data Fabric represents the first expression of this model, enabling analysts to use natural language to assemble streaming architectures that manage ingestion, normalization, enrichment, and detection across distributed environments.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for Tenzir –

Strengths

Tenzir’s strengths center on deterministic, standards-aligned pipelines and a design philosophy that treats pipeline reliability as a core engineering requirement. TQL enforces structure and validation in stream, which reduces downstream drift and noisy normalization issues. This is especially valuable for teams standardizing on OCSF or operating in regulated environments. The platform’s AI focus is on pipeline creation rather than inference. MCP turns natural language goals into complete TQL pipelines and deployment bundles, closing the long-standing gap between messy sample logs and production-ready pipelines. Tenzir also adopts a local-first execution model with an air-gapped option that keeps processing and control entirely inside the customer perimeter. The platform supports broad integrations without requiring proprietary agents, which lowers onboarding friction.

Areas to Watch

Key areas to watch include the depth of access control, where customers want more granular RBAC and field-level protections for sensitive data. Schema drift detection is another gap. Fast-moving SaaS sources require proactive alerts and assisted remediation to keep pipelines stable as formats evolve. Compliance maturity will matter for enterprise buyers because the absence of formal compliance certifications extends vendor review cycles and slows procurement. Expectations for AI pipeline capabilities are increasing as well. Buyers in regulated sectors will expect audit trails, prompt-to-code traceability, and structured change control for AI-generated pipelines. Package quality and lifecycle management will be evaluated closely, especially the speed at which new fields and vendor API changes are supported. Marketplace and MSSP motions are still developing, and large customers often require multi-cloud purchasing paths and clear shared responsibility models.

VirtualMetric

The core strength of VirtualMetric is its deep integration with Microsoft services. VirtualMetric positions itself as a security telemetry pipeline built for enterprises and MSSPs that need centralized, cloud-managed control with data processing kept entirely within the customer’s environment. The platform was built with MSSPs in mind, with centralized multi-tenancy support.

Voice of the Customer

We interviewed a customer of VirtualMetric about their experience with the platform. Here is what they said –

Life before VirtualMetric

Before VirtualMetric, the security team struggled with costly and rigid log ingestion into Microsoft Sentinel. Third-party logs, especially from high-volume sources like Palo Alto, drove up costs due to flat ingest pricing. “You paid five dollars per gigabyte you ingest,” the security leader noted. Limited control over routing also made it hard to manage noisy data: “We decided to ignore network logs for a long while because it’s just too noisy, it’s too expensive.” Deploying Linux-based forwarding also placed an operational burden on smaller customers, many of whom lacked the internal expertise to manage it.

Most used capabilities within VirtualMetric

Routing flexibility and centralized pipeline management were top priorities for this organization. “We really value the content packs… we can also reuse filtering. So if we decide to do Palo Alto filtering for one customer, all the other customers also immediately benefit.” While cost reduction is a clear value, the leader emphasized that “I would value that even higher than the actual cost reduction.” Features like filtering out null-value fields helped lower ingest without sacrificing fidelity: “From a cost perspective, I think it’s almost a no-brainer.”

What they’d like to see more of

The organization adopted VirtualMetric not just for features, but for the ability to influence roadmap direction. “We liked the fact that they were still a new company… we already saw stuff we proposed come into the product.” Key asks include CI/CD integration (“We run SIEM-as-code quite seriously”) and stronger multi-tenancy controls for MSSPs. The leader also pointed to real-time threat intel matching as a valuable addition. Looking forward, they see VirtualMetric remaining focused on telemetry infrastructure rather than competing with the SIEM: “The SIEM is also a single place of truth.” Instead, he sees VirtualMetric filling critical gaps: “Microsoft has their hands full developing AI features… they are neglecting the logging pipeline a little bit. That’s why VirtualMetric came at the right time.”

Architecture and Deployment Maturity

VirtualMetric separates the control plane (Console) and data plane (Pipeline Engine) to support flexible deployment models. The management console is delivered as SaaS for centralized fleet oversight, while the pipeline engine runs entirely in the customer’s environment, whether on premises or in their own cloud.

For air-gapped or isolated environments, a Self-Managed mode for the Console is available in which no data is sent to the vendor’s cloud.

One of VirtualMetric’s core differentiators is its architecture. The pipeline engine is built on a write-ahead log (WAL) and a vectorized processing model: logs are durably written to disk before processing begins, minimizing the risk of data loss, and once written, the data is processed in parallel across all CPU cores, enabling high throughput and low latency even under heavy workloads.
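
A write-ahead log in its simplest form appends each event durably to disk before any transformation runs, so a crash mid-processing cannot lose buffered data. The sketch below illustrates the pattern, not VirtualMetric’s engine.

```python
# Simplified sketch of the write-ahead-log pattern: events are appended
# durably to disk before processing, so a crash cannot lose buffered data.
# This illustrates the idea only, not VirtualMetric's implementation.
import json
import os

WAL_PATH = "pipeline.wal"

def append_to_wal(event: dict) -> None:
    """Durably persist the raw event before processing begins."""
    with open(WAL_PATH, "a", encoding="utf-8") as wal:
        wal.write(json.dumps(event) + "\n")
        wal.flush()
        os.fsync(wal.fileno())   # force the write to disk, not just the OS cache

def replay_wal() -> list[dict]:
    """On restart, re-read unprocessed events from the log."""
    if not os.path.exists(WAL_PATH):
        return []
    with open(WAL_PATH, encoding="utf-8") as wal:
        return [json.loads(line) for line in wal if line.strip()]

append_to_wal({"source": "firewall", "action": "deny", "src_ip": "203.0.113.7"})
print(replay_wal())
```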

  • Marketplace: Azure Marketplace and Microsoft Security Store, with AWS and Google Cloud support in progress.
  • MSSPs: The company is exploring partnerships with OVHCloud and Scaleway to meet European sovereignty requirements.
  • Compliance: SOC 2 Type II and ISO 27001.

Pricing

Ingestion-based licensing model.

Pricing Assistance:

  • Public Licensing: Pricing details publicly available on its website.
  • Average: Usage is calculated based on a 7-day daily average to prevent short-term data spikes from triggering unexpected costs (see the sketch after this list).
  • ROI Calculator: The vendor also offers an ROI calculator to help prospective customers estimate potential time and cost savings.
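
The 7-day averaging works out as simple arithmetic: billing follows the trailing average rather than any single day’s spike. The sample volumes below are made up.

```python
# Small sketch of 7-day daily-average metering: billing is driven by the
# trailing average rather than any single day's spike. Sample volumes are
# made up for illustration.
daily_gb = [180, 175, 190, 185, 900, 182, 178]   # one day contains a burst

seven_day_average = sum(daily_gb[-7:]) / 7
peak_day = max(daily_gb[-7:])
print(f"billed on ~{seven_day_average:.0f} GB/day, not the {peak_day} GB spike")
```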

Data Collection and Integrations

VirtualMetric offers flexible data collection methods.

  1. Smart Agent (Remote and Direct Deployment): VirtualMetric offers a proprietary Smart Agent that supports remote, no-install collection from Windows, Linux, Solaris, and AIX systems. Enterprises can also opt to deploy the agent locally where needed.
  2. Syslog and Network Protocols: Ingests data from standard protocols including Syslog, NetFlow/IPFIX, Cisco eStreamer, TCP, UDP, and HTTP, enabling integration with a wide range of network and security appliances.
  3. Forwarders and Event Streams: Supports ingestion from streaming platforms like Kafka, along with common log forwarders.
  4. OpenTelemetry Collectors: Compatible with OTEL collectors to support standardized, vendor-neutral telemetry collection across environments.
  5. API-Based and Managed Identity Integration: Includes support for agentless API-based integrations. For Microsoft Sentinel, VirtualMetric uses Managed Identity to avoid token rotation issues during connector operation.

Total number of integrations out of box: 45+ direct vendor integrations but a larger number when supported collection methods are taken into account.

Core Pipeline Capabilities

VirtualMetric Pipeline Capabilities 1
VirtualMetric Pipeline Capabilities 2
VirtualMetric Pipeline Capabilities 3
VirtualMetric Pipeline Capabilities 4

Additional Pipeline Capabilities

  • Advanced Data Deduplication: A new staged-routing and commit-processor pattern that simplifies multi-tier data pipelines by automatically selecting and committing only the highest-quality normalized log format, eliminating duplication, complexity, and conditional logic.
  • Extensive Drag and Drop Library: Vendor claims they offer over 150 purpose-built processors, enabling users to build complex pipelines through a drag-and-drop interface without writing code.
  • Transformation and Optimization: Processors like Compact remove empty or placeholder fields, while Enforce-Schema ensures incoming data aligns with the expected structure (see the sketch after this list).
  • AI and Text Processing: AI-based and text processing modules classify and extract insights from unstructured logs, improving visibility and enabling downstream analytics.
  • Codeless Pipeline Design: The processor framework is meant to allow teams to design, test, and deploy pipelines quickly.
  • Observability Use Case: VirtualMetric DataStream is designed as a telemetry pipeline for both observability and security use cases, with a focus on data sovereignty and operational scale.
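
As a minimal sketch of what a Compact-style processor does (the placeholder values are illustrative, not VirtualMetric’s exact logic):

```python
# Minimal sketch of a Compact-style transformation that strips empty or
# placeholder fields before forwarding; placeholder values are illustrative.
def is_empty(value) -> bool:
    """Treat None, empty strings/collections, and common placeholders as empty."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() in {"", "-", "N/A", "null"}
    if isinstance(value, (list, dict, set)):
        return len(value) == 0
    return False

def compact(event: dict) -> dict:
    """Drop fields whose values carry no information before forwarding."""
    return {k: v for k, v in event.items() if not is_empty(v)}

print(compact({"host": "web-1", "error": "", "tags": [], "bytes": 0, "user": "N/A"}))
# {'host': 'web-1', 'bytes': 0}
```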

Pipeline Building Experience

  • Elastic-Compatible Pipeline Syntax and Prebuilt Pipelines: VirtualMetric DataStream supports Elastic Ingest Pipeline syntax, allowing teams to reuse existing Elastic pipelines without modification. It also provides access to over 300 prebuilt pipelines for common sources and use cases.
  • Drag-and-Drop Visual Pipeline Builder (Coming soon): A new visual debugger and builder is set to launch soon. This feature will allow users to design, test, and troubleshoot pipelines using an intuitive drag-and-drop interface, reducing reliance on manual configuration.
  • AI-Assisted Pipeline Creation (Coming Soon): VirtualMetric plans to release an AI Copilot feature that lets users describe pipeline goals in natural language. The system will translate those inputs into processors and flow logic, streamlining the creation process for teams that want to move faster without deep technical setup.

AI Maturity

VirtualMetric DataStream uses AI processors that integrate with OpenAI, Azure OpenAI, and Anthropic Claude. These processors allow users to classify logs, extract entities, and generate summaries by applying user-defined prompts directly within the data stream. These functions also help process unstructured log data and provide structure and context before routing.

The platform also uses AI to assist with data quality by identifying misconfigurations, classifying event types, and detecting anomalies. This supports more accurate filtering and helps reduce alert volume. VirtualMetric plans to introduce an AI Copilot feature that will let users build pipelines using natural language input. This capability is intended to simplify pipeline creation by translating user intent into configured processors and flows. AI is also applied to improve routing decisions and threat context enrichment, supporting earlier detection and more efficient data distribution across SIEM and data lake environments.
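
The prompt-driven classification described above can be illustrated with the OpenAI Python client; the model name and prompt are example choices, an `OPENAI_API_KEY` is assumed to be configured, and this is a generic sketch rather than VirtualMetric’s processor implementation.

```python
# Generic illustration of prompt-driven log classification using the OpenAI
# Python client; model name and prompt are example choices, and this is not
# VirtualMetric's processor implementation. Requires OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Classify this log line into one of: authentication, network, malware, other. "
    "Reply with the single category only.\n\nLog: {log}"
)

def classify(log_line: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",                      # example model choice
        messages=[{"role": "user", "content": PROMPT.format(log=log_line)}],
    )
    return response.choices[0].message.content.strip()

print(classify("Failed password for root from 198.51.100.23 port 22 ssh2"))
```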

Data Privacy with AI:

  • No data collection: No customer data is sent to VirtualMetric’s cloud. All data processing happens entirely within the customer’s environment (on-prem or private cloud), ensuring full data ownership and control.
  • AI processors run within the customer environment: Ensuring that data stays local even when using third-party AI services for classification or summarization.

Integration Health Monitoring

VirtualMetric Integration Health Monitoring

Vision

VirtualMetric’s stated vision is to serve as the central telemetry and data management layer for both security and observability use cases. Rather than replacing SIEMs or data lakes, the platform aims to decouple data collection, normalization, and optimization from the analytics layer. They are also investing in real-time detection and edge intelligence to support context-aware routing, enabling teams to process all telemetry types and forward only relevant data. The long-term goal is to become the foundation for an open, vendor-agnostic Security and Observability Data Fabric.

Analyst Take

Here’s what we see as the major strengths and opportunities for improvement for VirtualMetric –

Strengths

VirtualMetric offers a deterministic, rule-based reduction model that helps reduce ingestion costs without risking detection quality. Its deep Microsoft integration, covering Sentinel, the Sentinel data lake, ADX, Blob Storage, DCR autodiscovery, and ASIM schema alignment, removes manual routing overhead and suits MSSPs managing multiple tenants. Processing happens entirely in the customer’s environment, with support for SaaS, self-managed, and air-gapped setups. Backed by a write-ahead log and a vectorized engine, the platform delivers high-throughput, resilient performance without relying on vendor cloud infrastructure. Overall, VirtualMetric offers a strong operational foundation and a clear roadmap; teams prioritizing control, Microsoft-native alignment, and operational independence will find the platform compelling.

Areas to Watch

VirtualMetric currently lacks full integration health visibility across all data sources, creating potential blind spots for security teams until broader monitoring arrives. Certain advanced pipeline features, such as integration health monitoring beyond VirtualMetric’s own Smart Agent and automated coverage gap detection, are still in development. The platform does not provide formalized coverage gap analysis to show missing telemetry across data sources or ATT&CK techniques, which limits visibility into detection blind spots. Looking ahead, areas worth watching include the planned Sigma- and YARA-based edge detection capabilities in Q1 2026, the evolution of the multi-cloud data lake strategy as major providers rapidly expand their native lakes, and VirtualMetric’s positioning in air-gapped and sovereignty-focused markets, where self-managed deployments and European cloud partnerships could be differentiators.
