System overview and server services

As an introduction to the system, it helps to first understand which server services are available to it. This also gives a better overview of the options offered at the application level.

The following services are used and fulfill these roles in synQup:

  • RabbitMQ (message queue server): distributes module and core tasks to worker processes (workers).
  • MongoDB (schema-less NoSQL database): holds the so-called transfer data (e.g. products, orders, ...) and, in a separate database, the system's own data (logs, time-series data).
  • MySQL (RDBMS): holds the system configuration (users, flows, executions, ...).
  • Redis (RAM-backed key-value store): provides modules and the core with simple caching, a means of thread synchronization, and much more.
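How these services are wired into the application depends on the deployment. As a rough illustration, a Symfony application typically receives them as DSNs via environment variables; the variable names, hosts, and credentials below are assumptions for illustration, not synQup's actual configuration:

    # Hypothetical .env excerpt; names, hosts and credentials are illustrative only
    MESSENGER_TRANSPORT_DSN=amqp://guest:guest@rabbitmq:5672/%2f/messages  # RabbitMQ work queue
    MONGODB_URL=mongodb://mongodb:27017                                    # transfer data
    DATABASE_URL=mysql://synqup:secret@mysql:3306/synqup                   # system configuration
    REDIS_URL=redis://redis:6379                                           # caching, synchronization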

synQup itself consists of modules and core (the separation is described in more detail in the following section) and is a PHP application based on Symfony.

Module & Core

Data processing within synQup always involves an interaction between the main application (the so-called core) and at least one module. The following gives an exemplary delimitation of the tasks of each:

Tasks of the core:

  • Process Modeling (Which modules interact and how?)
  • Sequence Control (Starting the modules)
  • Monitoring (Monitoring the progress of the modules)
  • Alerting
  • File System Abstraction
  • Authentication & Authorization (Which user is allowed to do what and how is their identity to be confirmed?)
  • Centralized Logging
  • Provision of the user interface

The modules, in contrast, handle specific integration tasks: for example, connecting a particular ERP system or importing a particular data format. Modules only become truly powerful when they are designed to be reusable and extensible.
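To make this division of labor concrete, the following sketch shows the rough shape a minimal input module might take. The interface and class names are hypothetical and do not reflect the actual synQup module API; the point is that a module contains only integration logic, while scheduling, monitoring, and logging remain with the core:

    <?php
    // Hypothetical sketch; interface and names are illustrative, not the synQup API.
    interface InputModuleInterface
    {
        /** Called by the core once the flow reaches the input step. */
        public function execute(array $configuration): void;
    }

    final class CsvProductImportModule implements InputModuleInterface
    {
        public function execute(array $configuration): void
        {
            // Integration logic only: read the configured CSV file and
            // write transfer products. Sequence control, progress
            // monitoring and logging are the core's responsibility.
        }
    }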

Flows & Executions

A flow is an abstract data flow description of all substeps that data must pass through. The instance of a flow, i.e. a concrete execution, is called Execution or FlowExecution. A simple example of a flow could look like this:

Flow A:

  • Download the file ~/DATEN.zip from the FTP server ftp.example.com with the credentials example / password to the local file system demoFs. (I)
  • Extract the zip file DATEN.zip in the root folder of the local file system demoFs. (I)
  • Read the CSV file data.csv in the demoFs file system with the column description (...). (I)
  • … (T)
  • Export all products that have changed since the last update to the Shopware 6 webshop shop.example.com with the API credentials (…) (O)

Let's assume that this flow is controlled with a cronjob and is started every day at 10:00. A FlowExecution is therefore, for example, the concrete execution of Flow A on 16.09.2021 at 10:00.

This flow example also shows the separation of configuration and module code: here, the FTP module is configured for our example customer to use the FTP server ftp.example.com. For the next customer, this host could be example.org. Configurable, reusable modules are thus brought into interaction in a flow.
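What such per-customer configuration might look like is sketched below; the structure and keys are invented for illustration and do not reflect synQup's actual configuration format:

    <?php
    // Illustrative only: the same FTP module, configured differently per customer.
    $customerA = ['host' => 'ftp.example.com', 'user' => 'example', 'target' => 'demoFs'];
    $customerB = ['host' => 'example.org',     'user' => 'example', 'target' => 'demoFs'];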

It is also clear here that synQup handles abstraction at the file system level. The storage backend in use is therefore a black box from the modules' point of view. This allows even complex requirements to be implemented and simplifies integration into existing system landscapes: behind demoFs in the above example could be a local file system, an (S)FTP server, or an AWS S3 bucket.
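synQup's own abstraction layer is not shown here, but the widely used league/flysystem package illustrates the principle: module code is written against a single Filesystem API, and only the adapter wiring decides what actually sits behind demoFs. The adapter choice and paths below are assumptions for illustration:

    <?php
    use League\Flysystem\Filesystem;
    use League\Flysystem\Local\LocalFilesystemAdapter;

    // "demoFs" could equally be backed by SFTP or S3 by swapping the adapter;
    // module code only ever sees the Filesystem interface.
    $demoFs = new Filesystem(new LocalFilesystemAdapter('/var/synqup/demoFs'));

    $zipContents = '...';                       // bytes downloaded from the FTP server
    $demoFs->write('DATEN.zip', $zipContents);  // e.g. the file downloaded in step (I)
    $csv = $demoFs->read('data.csv');           // read back for the CSV import step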

All steps in a flow can be sorted into one of three namespaces: Input, Processing, and Output (in the core, the terms input, transformations, and output are used). These steps run sequentially and are self-contained: the steps in the output namespace can only run after all transformation steps have finished running.

To keep the organization simple, it is best practice to decide the namespace of a (sub)module by its relationship to the transfer database: a module that exports data from the transfer database to a target system is an output module, a module that adds data into the transfer database is an input module, and a module that operates on data within the transfer database is a processing module.
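Expressed as code, this rule of thumb might look as follows (the enum is purely illustrative and not part of synQup):

    <?php
    // Illustrative classification, keyed on the transfer database:
    enum StepNamespace: string
    {
        case Input = 'input';            // adds data INTO the transfer database
        case Processing = 'processing';  // operates on data WITHIN the transfer database
        case Output = 'output';          // exports data OUT OF it, to a target system
    }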

Flow control

So a flow always consists of different steps: certain modules (in the groups mentioned above) that run one after the other. This "one after the other" is flexible in synQup: in addition to the above-mentioned process steps of input, processing, and output, the flow within a single process step (e.g. the interaction of all input modules of a specific flow) can also be configured. For this, so-called DispatchConditions are used, i.e. conditions that must be fulfilled before a module may be started.

Our example above already implies such dispatch conditions: without DispatchConditions, the modules in the input step (the steps marked (I) above) would all be started simultaneously at the start of the input step (concurrency within the process steps, sequential flow between the process steps). What would such a DispatchCondition have to look like? And what would it refer to in the first place?

Progress handling

All startable (sub)modules in synQup must provide a so-called ProgressDefinition: this ProgressDefinition is a blueprint of the intended processing of a particular task. A simple example for a CSV import (in practice, these ProgressDefinitions can be complex tree structures; a code sketch follows the list):

  • Module start: check the existence of the file to be imported, format check.
  • Split the file into batches
  • Processing of these batches into transfer products
  • Clean up: delete/move the now imported file
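A hedged sketch of how such a blueprint might be declared in code; the class and its constructor are invented for illustration, and the actual synQup API may differ:

    <?php
    // Hypothetical sketch; the real ProgressDefinition API may differ.
    final class ProgressDefinition
    {
        /** @param ProgressDefinition[] $children */
        public function __construct(
            public readonly string $name,
            public readonly array $children = [],
        ) {}
    }

    // The CSV-import blueprint above as a (here flat, in practice possibly
    // deeper) tree:
    $definition = new ProgressDefinition('csv-import', [
        new ProgressDefinition('validate'),  // existence and format check
        new ProgressDefinition('split'),     // split the file into batches
        new ProgressDefinition('process'),   // batches -> transfer products
        new ProgressDefinition('cleanup'),   // delete/move the imported file
    ]);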

From this blueprint, an entry is created in a table every time a flow starts. The respective module is responsible for reporting the total number of processing steps (total count) to the core when processing starts, and the number of completed substeps (processed count) as it works. This allows the core to detect when a module has crashed (no more changes to the progress) as well as to log execution times, throughput, and so on. This also answers the question from the previous section: modules within the same process step can be started depending on the progress of their sibling modules (so that, as in the example above, the CSV file is only processed after it has been extracted from the zip file).
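Sketched in code, the reporting side of this mechanism might look roughly like this; the interface and function are hypothetical, invented here to illustrate the total/processed counting that a sibling's DispatchCondition can evaluate:

    <?php
    // Hypothetical sketch; the real synQup progress API may differ.
    interface ModuleProgress
    {
        public function setTotalCount(int $total): void;
        public function incrementProcessedCount(): void;
    }

    function processBatches(ModuleProgress $progress, array $batches): void
    {
        $progress->setTotalCount(count($batches)); // announce the workload up front

        foreach ($batches as $batch) {
            // ... module-specific work: turn the batch into transfer products ...
            $progress->incrementProcessedCount();  // heartbeat for the core
        }
        // No further increments would let the core flag a crashed module;
        // processedCount == totalCount lets a sibling's DispatchCondition fire.
    }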