Base Tutorial
On this page we walk through a stream example using a real Kafka cluster and the normalizer jar artifact. We assume that you have already built the normalizer distribution as explained in the Building section.
Explanation
First of all, we need to define a stream configuration to launch a normalizer application.
Stream config JSON (my-stream-tutorial.json)
{
  "inputs":{
    "mytopic":["mystream"]
  },
  "streams":{
    "mystream":{
      "funcs":[
        {
          "name":"firstMapper",
          "className":"io.wizzie.normalizer.funcs.impl.SimpleMapper",
          "properties": {
            "maps": [
              {"dimPath":["body", "messages"]},
              {"dimPath":["header", "mac"], "as":"id"}
            ]
          }
        }
      ],
      "sinks":[
        {"topic":"partitionedStream", "type":"stream", "partitionBy":"id"}
      ]
    },
    "partitionedStream":{
      "funcs":[
        {
          "name":"flattenMessages",
          "className":"io.wizzie.normalizer.funcs.impl.ArrayFlattenMapper",
          "properties": {
            "flat_dimension": "messages"
          }
        }
      ],
      "sinks":[
        {"topic":"output", "type":"kafka"}
      ]
    }
  }
}
At this point we have defined the stream into which we will inject messages.
Phase 0: Input messages
{
  "header": {
    "mac": "00:00:00:00:00",
    "version": 2
  },
  "body": {
    "messages": [
      {
        "type": "rssi",
        "value": -78
      },
      {
        "type": "cpu",
        "value": 80
      }
    ],
    "someData": "otherData"
  }
}
In this example we read the Kafka topic mytopic and map it to the stream mystream. On the stream mystream we use one function called firstMapper; this function selects specific fields from the messages and renames them. We select the field messages that is inside the field body, and we also select the field mac that is inside header and rename it to id. The processed stream is partitioned by the field id and sent to the stream partitionedStream, which is created at runtime.
Phase 1: Partitioned stream messages
{
  "id": "00:00:00:00:00",
  "messages": [
    {
      "type": "rssi",
      "value": -78
    },
    {
      "type": "cpu",
      "value": 80
    }
  ]
}
Later, we process the partitionedStream stream using another function called flattenMessages; in this case it flattens the field messages, which is a JSON array.
Phase 2: Output messages
{"id": "00:00:00:00:00", "type": "rssi", "value": -78}
{"id": "00:00:00:00:00", "type": "cpu", "value": 80}
Finally, the result is sent back to Kafka into a topic called output.
Execution
First of all, we need a running Kafka cluster and the decompressed normalizer distribution.
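If you have not decompressed the distribution yet, you can do it now. A minimal sketch, assuming the build produced a tar.gz artifact (the exact file name depends on your build and version):
# hypothetical artifact name; use the one produced by your build
tar -xzf normalizer-dist.tar.gz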
Config file
We need to modify the config file config/sample_config.json that is inside the distribution folder; we can either edit it or delete it and create a new one with the following content.
{
  "application.id": "my-first-normalizer-app",
  "bootstrap.servers": "localhost:9092",
  "num.stream.threads": 1,
  "bootstrapper.classname": "io.wizzie.bootstrapper.bootstrappers.impl.FileBootstrapper",
  "file.bootstrapper.path": "/etc/normalizer/my-stream-tutorial.json",
  "metric.enable": true,
  "metric.listeners": ["io.wizzie.metrics.ConsoleMetricListener"],
  "metric.interval": 60000
}
In this config file we set the application.id, which identifies our instance group, and the Kafka brokers to connect to. In this example we use the FileBootstrapper, so the stream config is read from a local file. We also need to set the property file.bootstrapper.path to the path where we store the stream config file.
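For example, assuming you saved the stream config shown above as my-stream-tutorial.json in your current directory, you can copy it to the path referenced by file.bootstrapper.path:
sudo mkdir -p /etc/normalizer
sudo cp my-stream-tutorial.json /etc/normalizer/my-stream-tutorial.json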
Now we can start the normalizer service. To do that we can use the init script inside the bin folder:
normalizer/bin/normalizer-start.sh normalizer/config/sample_config.json
When the normalizer is running you can check it in the log file, which by default is /var/log/ks-normalizer/normalizer.log. If it started correctly you should see something like this:
2016-10-26 13:18:27 StreamTask [INFO] task [0_0] Initializing state stores
2016-10-26 13:18:27 StreamTask [INFO] task [0_0] Initializing processor nodes of the topology
2016-10-26 13:18:27 StreamThread [INFO] stream-thread [StreamThread-1] Creating active task 1_0 with assigned partitions [[__my-first-normalizer-app_normalizer_mystream_to_partitionedStream-0]]
2016-10-26 13:18:27 StreamTask [INFO] task [1_0] Initializing state stores
2016-10-26 13:18:27 StreamTask [INFO] task [1_0] Initializing processor nodes of the topology
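If you want to follow the log in real time while testing, you can tail it (assuming the default log path mentioned above):
tail -f /var/log/ks-normalizer/normalizer.log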
Now you can produce some input messages into the input Kafka topic mytopic, but first you should open a Kafka consumer to check the output messages.
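If your Kafka broker does not auto-create topics, you may need to create the input and output topics before producing. A minimal sketch using the standard Kafka tools (on recent Kafka versions replace --zookeeper localhost:2181 with --bootstrap-server localhost:9092):
# assumes the same kafka_dist layout used in the commands below
kafka_dist/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic mytopic
kafka_dist/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic output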
- Consumer
kafka_dist/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --property print.key=true --topic output --new-consumer
- Producer
kafka_dist/bin/kafka-console-producer.sh --broker-list localhost:9092 --property parse.key=true --property key.separator=, --topic mytopic
You can write some messages into the console-producer:
key1,{"header":{"mac":"00:00:00:00:00","version":2},"body":{"messages":[{"type":"rssi","value":-78},{"type":"cpu","value":80}],"someData":"otherData"}}
and you should see the output messages on the console-consumer (the value printed before each JSON message is the Kafka key, i.e. the id used to partition the stream):
00:00:00:00:00 {"type":"rssi","value":-78,"id":"00:00:00:00:00"}
00:00:00:00:00 {"type":"cpu","value":80,"id":"00:00:00:00:00"}
This is the end of the tutorial!! Congratulations! ;)