An In-Depth Look at Snowflake’s Detection Development Lifecycle
This article is a collaborative effort with Tammy Truong and Michele Freschi.
TL;DR
This article outlines Snowflake's approach to the Detection Development Lifecycle, a structured methodology for creating and maintaining threat detections. The lifecycle encompasses six key phases: Requirements Gathering, Design, Development, Testing and Deployment, Monitoring, and Continuous Testing. A well-structured Detection Development Lifecycle not only enhances the quality of detections but also fosters comprehensive documentation, aids in team scalability, and lays the groundwork for performance metrics.
What is the Detection Development Lifecycle?
While software engineers adhere to the Software Development Lifecycle (SDLC) to create effective applications, detection engineers must also devise a systematic method for developing and managing detection logic. Without a solid framework, detections may suffer from low accuracy, leading to alert fatigue from false positives and an increased risk of attacks going unnoticed until they escalate.
To combat this, detection engineers at top security teams are embracing Detection-as-Code within the Detection Development Lifecycle. This essential element of any threat detection program ensures a structured approach to creating and maintaining detections. In a previous discussion about the Threat Detection Maturity Framework, we emphasized the significance of the Detection Development Lifecycle, and now we are sharing our methodology at Snowflake.
Given that threat actors are more sophisticated and well-funded than ever, detection teams cannot realistically anticipate every possible attack technique. Therefore, effective defenders must adopt a repeatable process for detection creation based on risk and intelligence prioritization, alongside continuous monitoring and testing to maintain high accuracy. Broadly, these processes can be categorized into Detection Creation and Detection Maintenance. The following diagram illustrates the Detection Development Lifecycle utilized at Snowflake:
The Detection Development Lifecycle comprises six distinct phases:
- Requirements Gathering
- Design
- Development
- Testing and Deployment
- Monitoring
- Continuous Testing
Let’s explore each phase in detail:
Requirements Gathering
This initial phase serves as a centralized channel for the Threat Detection Team to gather and prioritize all detection requests. It is crucial for any team responsible for safeguarding the organization to understand how to submit requests to the Detection team. At Snowflake, detection requests originate from several teams:
- Product and Corporate Security
- Threat Intelligence
- Incident Response
- Audit and Compliance
- Red Team
- Internal Threat Detection
During this phase, technical specifics are gathered from relevant parties to facilitate the construction of requested detections. Information collected during Snowflake's intake process includes the detection objective, target system and its function, risk factors, vulnerabilities, and preferred alerting methods (e.g., Slack, Jira). Once a detection request is submitted, it is prioritized using a risk and intelligence-based framework. This systematic approach aids in strategic planning, resource distribution, and effective task assignment. Additionally, metrics related to detection coverage are gathered to refine the prioritization process.
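The intake fields described above can be captured in a structured request record. A minimal sketch in Python — the field names are our own illustrative choices mirroring the information listed, not an actual internal schema:

```python
from dataclasses import dataclass, field

@dataclass
class DetectionRequest:
    """One detection request submitted to the Threat Detection team.

    Field names are illustrative; they mirror the intake information
    described above, not a real intake form.
    """
    objective: str                 # what the detection should catch
    target_system: str             # system to be monitored
    system_function: str           # what the system does / why it matters
    risk_factors: list = field(default_factory=list)
    vulnerabilities: list = field(default_factory=list)
    alert_destinations: list = field(default_factory=list)  # e.g. "slack", "jira"
    requesting_team: str = "unknown"  # e.g. "Red Team", "Threat Intelligence"

# Example request
req = DetectionRequest(
    objective="Detect console logins without MFA",
    target_system="cloud-admin-console",
    system_function="administrative access to production",
    risk_factors=["privileged access"],
    alert_destinations=["slack"],
    requesting_team="Incident Response",
)
```

Capturing requests as structured records (rather than free-form tickets) is what makes the later prioritization and coverage metrics queryable.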
At Snowflake, we recognize the importance of involving Product and Corporate Security teams early in this phase to identify requirements for effectively monitoring new features and systems. Collaborative efforts focus on understanding what is being developed to support risk identification and mitigation, including logging needs, validating mitigations, assessing threat models and attack vectors, and identifying opportunities for detection.
Design
Each detection must have a clearly articulated objective. Once work commences, this objective transforms into a detection strategy. We utilize Palantir’s Alerting and Detection Strategy (ADS) Framework, which offers a comprehensive and standardized documentation system for all detections. Given that Threat Detection and Incident Response (IR) operate as distinct teams at Snowflake, it is crucial to solicit reviews from the IR team during this phase, as they are the recipients of the alerts. This collaboration fosters IR ownership and awareness of the built detections, enabling them to influence the detection process. Furthermore, involving IR during the ADS design allows for thorough documentation review and playbook development for triage.
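Palantir's ADS framework prescribes a fixed set of sections for every detection's documentation. A sketch of that skeleton as a Python dict — the section names come from the public ADS framework, while the placeholder text and the `missing_sections` helper are our own:

```python
# Section names come from Palantir's public Alerting and Detection
# Strategy (ADS) framework; placeholder descriptions are illustrative.
ADS_TEMPLATE = {
    "Goal": "What attacker behavior does this detection aim to catch?",
    "Categorization": "MITRE ATT&CK mapping (tactic / technique)",
    "Strategy Abstract": "High-level description of how the detection works",
    "Technical Context": "Log sources, systems, and background a responder needs",
    "Blind Spots and Assumptions": "Known gaps and assumptions in the logic",
    "False Positives": "Benign activity that may trigger the alert",
    "Validation": "How to generate a true positive to test the detection",
    "Priority": "Alert severity and why",
    "Response": "Triage steps for the IR team (playbook)",
    "Additional Resources": "Links, references, related detections",
}

def missing_sections(ads_doc: dict) -> list:
    """Return ADS sections a draft is missing — a simple review gate."""
    return [s for s in ADS_TEMPLATE if s not in ads_doc]
```

A check like `missing_sections` can gate the IR review mentioned above, ensuring no detection ships without a documented Response playbook.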
Development
Once the design for a new detection is finalized, it is translated into code. Standardization is essential for maintaining quality and effectiveness, which is why we have developed a template that ensures every detection includes a set of common fields and links to the ADS, clearly defining the detection's goal within the code. At Snowflake, we employ Panther as our detection platform, which operates on our security data lake and allows for both scheduled queries and stream-based detections. A critical aspect of developing detections involves tagging them with metadata, such as the MITRE tactic, technique, and sub-technique, system/platform, data source, and audit findings. This tagging not only assists during audits and other inquiries but is also vital for measuring detection coverage.
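At its core, a Panther Python detection is a `rule(event)` function that returns `True` when an alert should fire, with metadata such as severity and MITRE tags attached alongside (in Panther, typically via a YAML companion file). A minimal sketch, using hypothetical event field names rather than any real log schema:

```python
# Hypothetical Panther-style rule: alert on console logins without MFA.
# The event keys below are illustrative, not an actual log schema; in
# Panther, metadata such as Severity and MITRE tags (e.g. T1078) lives
# alongside this Python logic.

def rule(event: dict) -> bool:
    # Fire only on successful interactive logins that skipped MFA
    return (
        event.get("event_type") == "console_login"
        and event.get("status") == "success"
        and not event.get("mfa_used", False)
    )

def title(event: dict) -> str:
    # Human-readable alert title shown to the IR team
    return f"Console login without MFA: {event.get('actor', 'unknown')}"
```

Keeping the detection's goal and a link to its ADS in the rule's metadata is what lets later tooling aggregate coverage by MITRE technique.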
Testing and Deployment
After coding a detection, it undergoes testing for accuracy, precision, and alert volume. At Snowflake, we conduct both historical and real-time testing. Historical testing involves assessing the detection against past data, while real-time testing entails enabling the detection in a test queue to verify it meets established acceptance criteria. These criteria are determined collaboratively by the Threat Detection (TD) and IR teams and define acceptable alert volume and false positive rates. If necessary logs are unavailable for validation, tools like Red Canary's Atomic Red Team or the Red Team can assist in testing. Upon completion of testing, detections are subject to peer review and managed within a version control system, ensuring adherence to Detection as Code principles.
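Historical testing as described above can be approximated by replaying past events through a rule and comparing the resulting alert volume against the acceptance criteria agreed with IR. A self-contained sketch — the toy rule, events, and threshold are all illustrative:

```python
# Toy backtest: replay historical events through a detection and check
# alert volume against an acceptance criterion. Everything here is
# illustrative, not a real detection or log sample.

def rule(event: dict) -> bool:
    return not event.get("mfa_used", False)

historical_events = [
    {"user": "alice", "mfa_used": True},
    {"user": "bob",   "mfa_used": False},
    {"user": "carol", "mfa_used": True},
    {"user": "dave",  "mfa_used": False},
]

MAX_ALERTS_PER_WINDOW = 10  # example criterion set jointly with IR

alerts = [e for e in historical_events if rule(e)]
volume_ok = len(alerts) <= MAX_ALERTS_PER_WINDOW
print(f"{len(alerts)} alerts over the replay window; within budget: {volume_ok}")
```

In practice the replay would run as a scheduled query over the security data lake, but the pass/fail logic against the agreed criteria is the same.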
Post-deployment, detections require ongoing maintenance. As with any production code, updates are often necessary to address bugs or issues that arise. The Threat Detection Team must work closely with all relevant stakeholders to establish a robust feedback loop.
Monitoring
The goal of this phase is to consistently evaluate the performance of deployed detections, analyze assumptions and gaps, and decommission detections when necessary.
Monitoring is supported by several core processes:
- Detection Improvement Requests (DIR): All code can harbor bugs, necessitating a process for identification, tracking, and resolution. At Snowflake, we refer to these as Detection Improvement Requests, which are submitted to resolve bugs, refine false positives, enhance detections, update the ADS, or even reconstruct detection logic. Furthermore, collecting metrics on DIRs provides insights into the quality and performance of our detections. For instance, recurring DIRs may suggest low fidelity detections warranting review.
- Detection Decommission Requests: Disabling a detection can feel counterintuitive for Threat Detection Engineers, as they may worry about missing potential attacks. However, alert fatigue poses its own risks, making it sometimes necessary to disable noisy detections. To address this, we have established a Temporary and Permanent Detection Decommission process that requires documentation of each request with justification and supporting information. A well-defined process here empowers the team to make informed decisions about disabling detections.
- Detection Reviews: Deployed detections are regularly evaluated to ensure their relevance to the organization. This is a vital process in the Monitoring Phase, as it prevents the accumulation of irrelevant or noisy detections as the Threat Detection Team evolves and the number of detections increases. This process also creates learning opportunities, allowing engineers to review and learn from their colleagues' code. We conduct annual reviews to assess various components of a detection, including its goal, scope, relevance, assumptions, and detection logic. The output of these reviews may result in no action, a Detection Improvement Request (DIR), or a Detection Decommission Request.
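The DIR metric mentioned above — recurring improvement requests flagging low-fidelity detections — can be sketched as a simple aggregation. The DIR records and the review threshold here are invented for illustration:

```python
from collections import Counter

# Sketch: flag detections with recurring Detection Improvement Requests
# (DIRs), which may indicate low fidelity. Records are illustrative.
dirs = [
    {"detection": "console_login_no_mfa", "reason": "false positive"},
    {"detection": "console_login_no_mfa", "reason": "false positive"},
    {"detection": "s3_public_bucket",     "reason": "enhancement"},
    {"detection": "console_login_no_mfa", "reason": "logic rebuild"},
]

RECURRENCE_THRESHOLD = 3  # example trigger for a detection review

dir_counts = Counter(d["detection"] for d in dirs)
needs_review = [name for name, n in dir_counts.items() if n >= RECURRENCE_THRESHOLD]
print(needs_review)  # detections whose DIR count warrants a review
```

The output of such a query feeds directly into the Detection Reviews process: a detection crossing the threshold becomes a candidate for improvement or decommission.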
Continuous Testing
Continuous testing is essential for mature threat detection teams to ensure that each detection achieves its intended objective. While numerous tools can assist with this, we partner with our Red Team to conduct Purple Teaming exercises. These exercises not only strengthen the relationship between Red and Blue Teams but also provide valuable learning experiences. The outcomes of Continuous Testing may indicate no action if the detection performs correctly, a Detection Improvement Request, or a call for a new detection.
Benefits and Adoption
Establishing a Detection Development Lifecycle offers numerous advantages for the Threat Detection Team, including:
- Quality: Adhering to a robust Detection Development Lifecycle results in the creation of high-quality detections.
- Metrics: Collecting metrics such as detection coverage and quality indicators, as well as defining key performance metrics, helps identify areas for improvement and informs program planning.
- Documentation: Relying solely on "code as documentation" is inadequate, as non-technical stakeholders also engage with internal processes. Assembling documentation on short notice about how a detection was built or why one was disabled is often cumbersome. Strong documentation enables the SOC or IR Team to understand and triage detections more effectively, reduces the time a Detection Engineer spends reviewing and updating detections, and supports compliance and audit requirements.
- Scale: This document serves as an essential resource for new Detection Engineers, providing a comprehensive understanding of the Threat Detection process from start to finish. It also ensures accountability and repeatability within the team.
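The detection-coverage metric mentioned above falls out of the metadata tagging done during development: aggregate each detection's MITRE technique tags and compare them against the techniques the team has prioritized. A sketch with invented detection names, tags, and priority list:

```python
# Sketch: compute MITRE ATT&CK technique coverage from detection tags.
# Detection names, tags, and the prioritized set are invented.
detections = [
    {"name": "console_login_no_mfa", "mitre": ["T1078"]},           # Valid Accounts
    {"name": "s3_public_bucket",     "mitre": ["T1530"]},           # Data from Cloud Storage
    {"name": "new_iam_admin",        "mitre": ["T1098", "T1078"]},  # Account Manipulation
]

prioritized = {"T1078", "T1098", "T1530", "T1110"}  # e.g. from threat intel

covered = {t for d in detections for t in d["mitre"]}
gaps = sorted(prioritized - covered)
coverage = len(prioritized & covered) / len(prioritized)
print(f"coverage: {coverage:.0%}, gaps: {gaps}")  # → coverage: 75%, gaps: ['T1110']
```

Feeding `gaps` back into the Requirements Gathering phase closes the loop: uncovered prioritized techniques become the next detection requests.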
The Detection Development Lifecycle detailed in this article reflects the approach taken by the Threat Detection Team at Snowflake. Each organization is unique, and implementations will differ accordingly. Within our organization, this lifecycle is a dynamic document that we continually update and refine in response to changes and improvements. Developing a robust lifecycle presents challenges, requiring strong partnerships with stakeholders, clear role definitions, and the necessary resources to support these processes. We welcome connections with peers to exchange ideas and learn how others address similar challenges; please feel free to reach out to Haider Dost, Tammy Truong, and Michele Freschi.