Understanding the descriptor files

As a descriptive tool the KTB uses a file (or set of files) to record how the cluster should look like, this files are call descriptors.

Because users might have complex deployments, the structure is flexible enough to accommodate them, we will cover them in this section.

The File format

Currently the tool supports reading YAML and JSON files. Note that all topologies will need to be using the same file format, a mixture is not supported.

In this guide we will use yaml as the core format, for JSON examples please refer to the examples directory.

The file type is configured using the topology.file.type configuration variable.

` topology.file.type=JSON `

Getting started

The KTB files use a common structure for describing the cluster entities, this attributes are setup around:

  • context : It is commonly used to describe where a collection of entities are coming from. This value could be for example a team name, a line of business or simply the origin of the topics (the data center).

This attribute is used as a primary key to group all the entities.

  • project: In each context there could be one or more projects, this attribute is the next level of personalisation as it is usually used to group all final entities such as permissions (acls/rbac), topics and schemas. Each Topology can have multiple of them.

In between the context and the project attribute the user can define a free list of attributes in form of a key and value. This list of attributes is going to be listed, in order, by default during the topic name composition in between the context and the project attribute.

A descriptor header like:

---
  context: "context"
  source: "source"
  projects:
    - name: "foo"
      topics:
        - name: "foo"
          config:
            replication.factor: "1"
            num.partitions: "1"
        - dataType: "avro"
          name: "bar"
          config:
            replication.factor: "1"
            num.partitions: "1"

will by default create topic names with the prefix context.source.foo, in detail this topology will create two topics.

  • context.source.foo.foo with one partition and a replication factor of one.
  • context.source.foo.bar.avro with a single partition and a single replica.

The separator and order of attribute can be personalised using the KTB configuration file. The relevant properties are:

  • Property: topology.topic.prefix.format, to set the full topic naming format.
  • Property: topology.project.prefix.format, to set the project level name format, it should be a subset of the previous one.
  • Property: topology.topic.prefix.separator, to select a custom separator between attributes.

more details can be found in the Important configuration values section where the most important configuration details are explained.

Manage only topics, the optional files

Not all the attributes are mandatory in the descriptor file, it is currently possible to:

  • Have a file with only topics, so no acls are defined using the abstractions provided by the consumers, producers, streams, etc attributes.
  • Build a topology with partial acls, if you are not using any stream application, there is no need to define it, same for other access control properties.
  • When defining a topic it is possible to use: * dataType when as a user it is aimed to specify the data type of the topic. * schemas if the reader is interested to register schemas for the topic.