Overview
The Workflow Package lets you define all data-stack pipeline processing using configuration files. You can export and import packages per organization to configure different use cases.
Workflow Package Structure
workflow_package.tar.gz
├── kafka
│   ├── topics_group_1.json
│   ├── topics_group_2.json
│   └── topics_group_3.json
├── druid
│   ├── datasource_name_1.json
│   ├── datasource_name_2.json
│   ├── datasource_name_3.json
│   └── datasource_name_4.json
└── streams
    ├── enricher.json
    ├── zz-cep.json
    └── normalizer.json
Kafka
In this folder, create one file per group of Kafka topics. The content of each file looks like this:
[
  {
    "name": "topic_name_1",
    "partitions": 1,
    "replicas": 1,
    "config": {}
  },
  {
    "name": "topic_name_2",
    "partitions": 1,
    "replicas": 1,
    "config": {}
  }
]
You can check the Kafka Topic Spec for details on these fields.
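The config object is empty in the example above; a reasonable assumption is that it carries per-topic overrides passed through as standard Kafka topic-level properties. A minimal sketch under that assumption (all names and values are illustrative):
[
  {
    "name": "topic_name_1",
    "partitions": 3,
    "replicas": 2,
    "config": {
      "retention.ms": "86400000",
      "cleanup.policy": "delete"
    }
  }
]
Both retention.ms and cleanup.policy are standard Kafka topic configuration keys; whether other keys are accepted depends on how the platform applies the config block.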
Druid
In this folder, create one file per streaming indexer, using the data source name as the filename.
Format: ${datasource_name}.json
Each file contains a Streaming Indexer spec, which defines what data is indexed and how.
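As a minimal sketch, assuming the Streaming Indexer spec follows Druid's Kafka supervisor format (the data source, topic, column, and broker names below are illustrative, not taken from this platform):
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "datasource_name_1",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["sensor_name", "status"] },
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "HOUR",
        "queryGranularity": "MINUTE"
      },
      "metricsSpec": [
        { "type": "count", "name": "events" }
      ]
    },
    "ioConfig": {
      "topic": "topic_name_1",
      "inputFormat": { "type": "json" },
      "consumerProperties": { "bootstrap.servers": "kafka:9092" },
      "taskCount": 1
    },
    "tuningConfig": { "type": "kafka" }
  }
}
The dataSchema block controls what is indexed (dimensions, metrics, granularity), while ioConfig controls where the data comes from.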
Streams
This folder contains the three main stream plans (normalizer, enricher, zz-cep). In each file you define the stream-plan configuration for the corresponding service, following that service's own definition.
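As a purely illustrative sketch of what a stream plan might look like, assuming a normalizer-style format that maps input topics to streams of processing functions and sinks (every topic, function, and class name here is hypothetical; check each service's definition for the actual schema):
{
  "inputs": {
    "raw_topic": ["stream1"]
  },
  "streams": {
    "stream1": {
      "funcs": [
        {
          "name": "mapFields",
          "className": "io.wizzie.normalizer.funcs.impl.SimpleMapper",
          "properties": {}
        }
      ],
      "sinks": [
        { "topic": "normalized_topic", "type": "kafka" }
      ]
    }
  }
}
Under this assumed layout, inputs routes Kafka topics into named streams, each stream applies its funcs in order, and sinks writes the result back to Kafka.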
Working with Workflow Packages
There are two main ways to work with Workflow Packages: