Introduction

This page explains the concept of "batches" and how they are used in the output-module.

You can configure several batch sizes in the configuration:

{
  "batchSizes": {
    "cache": ...,
    "validate": ...,
    "upsert": ...,
    "delete": ...,
    "deltaCount": ...
  }
}

The following sections explain in detail what each of these values does, because the configured values can greatly affect the performance of the module.

Batches

In general, batches are used to split the data to be transferred into chunks. This makes it possible to import chunks of data in parallel, which increases performance for huge data sets.

The output-module uses batches for different purposes such as caching, validating and upserting your data. This is why you can find several combinations of "start" and "run" messages throughout the module. A start handler slices your data into chunks. For each chunk there is one RunMessage, which is handled by a RunMessageHandler. The size of each batch/chunk is determined by the batch size that you can set via configuration.

In short, the batch sizes determine how many entities are processed in a single message and how many run messages are dispatched for your data.

Batch Sizes

This section explains how the different batch sizes affect the module in detail.

Cache

The module searches for and caches the corresponding Shopware entities for every mapped document. To reduce the number of requests, the entities are searched in batches. The cache batch size determines the maximum number of entities requested from Shopware in a single request.

Example: If you have 125 documents and configure a cache batch size of 50, the module will generate 3 search requests. The search requests are performed in parallel by run handlers.
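
The exact lookup is handled internally by the module, but to illustrate what such a batched search could look like, here is a hypothetical Admin API search request for a cache batch of 50 products, assuming the entities are matched by their product number:

POST /api/search/product
{
  "filter": [
    {
      "type": "equalsAny",
      "field": "productNumber",
      "value": ["SW-1001", "SW-1002", "..."]
    }
  ],
  "limit": 50
}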

Validate

Each document is validated to ensure it can be transformed into a valid Shopware entity. The validate batch size determines the number of documents that are validated per message in the validate subsection. There are no batched requests performed during this process, so the batch size mainly affects the memory consumption during document iteration.

Example: If you have 125 documents and configure a validate batch size of 50, the module will generate 3 run messages: two that validate 50 documents each and one that validates the remaining 25. The validation messages are processed in parallel.

You can find more information about the validation process in the validation documentation.

Upsert

The upsert batch size determines how many documents are transformed and sent to Shopware in a single Sync API request. The Sync API is used to reduce the number of requests that are necessary to create or update entities in the target shop.

Example: If you have 125 documents and configure an upsert batch size of 25, the module will generate 5 Sync API requests that contain 25 transformed documents each. Each request is generated by its own run handler, so the requests are executed in parallel.
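
To illustrate, a single Sync API request for one upsert batch could look roughly like the following; the operation key and the payload entries are placeholders, and the actual payload depends on your mapping:

POST /api/_action/sync
{
  "write-product": {
    "entity": "product",
    "action": "upsert",
    "payload": [
      { "id": "...", "name": "...", "...": "..." },
      { "id": "...", "name": "...", "...": "..." },
      "..."
    ]
  }
}

With an upsert batch size of 25, each of these requests would contain 25 entries in the payload array.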

You can find more information about the upsert-process in the upserts documentation.

Delete

Analogous to the upsert process, the delete batch size determines how many entities are deleted in a single Sync API request.
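
Such a request could, for example, look roughly like this (simplified placeholders, assuming products are deleted by id):

POST /api/_action/sync
{
  "delete-product": {
    "entity": "product",
    "action": "delete",
    "payload": [
      { "id": "..." },
      { "id": "..." },
      "..."
    ]
  }
}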

Delta Count

This batch size is part of the anomaly detection process. For every deleted document the module checks whether the corresponding Shopware entity is still present in Shopware. This is done to calculate how many documents are actually going to be removed from Shopware. It is a very fast process because the documents are only checked against the cache, which means you can use large batch sizes.

Example: If you set a deltaCount batch size of 200, each RunDeltaCountHandler will check for 200 documents whether the corresponding Shopware entity is still present in Shopware. These checks are performed in parallel.

Extract

As mentioned in the flow overview, the module extracts certain EmbeddedDocuments from their BaseDocuments. The extract batch size determines the number of documents from which embedded entities are extracted in a single message.

Example: If you set an extract batch size of 250, a run handler will analyze 250 BaseDocuments and extract their embedded documents.

This is a very fast process that is performed on non-hydrated documents, which increases performance and reduces memory usage. The performance of this subsection of course depends on the amount of data that is extracted, but in general large batch sizes of up to 1000 can improve performance for big data sets.

Configuration

You can specify global default batch sizes and/or set them manually per subsection. If an individual value has been set in a subsection, it is preferred over the global one.

Setting Global Values

The following example sets global default batch sizes as well as an individual value for the extractEmbedded subsection. If you remove the global values from the configuration, a fallback batch size of 25 is used (if no individual subsection value is present).

{
  "batchSizes": {
    "cache": 50,
    "validate": 50,
    "upsert": 50,
    "delete": 50,
    "deltaCount": 50
  },
  "subsections": {
    "extractEmbedded": {
      "batchSize": 100
    }
  }
}  
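
For comparison, a configuration without global values could look like this; in that case every subsection except extractEmbedded falls back to a batch size of 25:

{
  "subsections": {
    "extractEmbedded": {
      "batchSize": 100
    }
  }
}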

Setting Values by Subsection

It is possible to override global values per subsection. The following example shows how to set custom values for the product subsection:

{
  "...": "...",
  "batchSizes": {
    "cache": 50,
    "validate": 50,
    "upsert": 50,
    "delete": 50,
    "deltaCount": 250
  },
  "subsections": {
    "...": "...",
    "product": {
      "...": "...",
      "batchSizes": {
        "cache": 25,
        "validate": 75,
        "upsert": 15,
        "delete": 10,
        "deltaCount": 30
      }
    },
    "...": "..."
  }
}  

Batch Size Theory & Recommendations

If you choose very low batch sizes, the general overhead caused by ...

  • ... handling messages
  • ... sending API requests
  • ... handling progress
  • ... MongoDB communication
  • ... subsection management by subscribers
  • ...

... can significantly slow the module down.

However, if you choose very large batch sizes, this can lead to other problems:

  • The memory consumption while handling lots of complex documents can become a problem
  • Sync API requests can contain a lot of entities and take a very long time or even run into timeouts
  • Huge Sync API requests sent to Shopware in parallel can cause performance issues in the target shop
  • The whole data set may no longer be split into multiple chunks and is therefore not handled by multiple workers at the same time
  • ...

This is just a general overview of the things to consider while setting your batch sizes. These factors can even differ from one subsection to another. Whereas products and orders are very complex entities and should therefore be handled with smaller batch sizes, properties can be handled with much bigger batch sizes without running into the problems mentioned above.

So the batch sizes to be used differ for every type of document. Even for the same type of document, it may need to be treated differently in different projects. If project A uses a batch size of 100 for its customers without any problems, this can still cause issues for project B, e.g. due to many customer extensions added to the documents and/or a module extension that extends the customer handling.

Another thing to consider is the target shop itself. Depending on the hardware, the installed plugins, or even the number of sales channels, the time it takes Shopware to process a Sync API request can vary greatly. Due to the differing complexity of indexing entities in Shopware, the same batch sizes can behave very differently.

As you can see, it is very hard to recommend "the best" batch size. In reality there are far too many factors that influence the performance of the whole mapping process. Nevertheless, the next section tries to give you a good starting point.

Recommended Values

Because the batch sizes have a big impact on the performance of the module, we are often asked for recommended values. This question is very difficult to answer because it is influenced by various factors. Nevertheless, this section is intended to provide some insight into how to determine good batch sizes and gives you some recommendations.

In short: the more complex a document and/or Shopware entity, the smaller the batch sizes should be. Use the batch sizes below as a starting point; in many cases these values will work for you. However, if you feel the module is too slow for huge data sets, feel free to experiment with larger batch sizes. But before doing so, please read the "Batch Size Theory & Recommendations" section above.

{
  "batchSizes": {
    "cache": 50,
    "delete": 50,
    "upsert": 50,
    "validate": 50,
    "deltaCount": 250
  },
  "subsections": {
    "extractEmbedded": {
      "batchSize": 500
    },
    "product": {
      "batchSizes": {
        "upsert": 25,
        "cache": 25
      }
    },
    "propertyGroupOption": {
      "batchSizes": {
        "cache": 100,
        "upsert": 100,
        "validate": 200,
        "delete": 100
      }
    },
    "order": {
      "batchSizes": {
        "cache": 25,
        "upsert": 10,
        "validate": 25
      }
    }
  }
}

As this configuration example shows, many subsections do not need custom batch sizes. The following subsections use the global default values:

  • customer (you could consider reducing this to 25 as well)
  • customer-group
  • shipping-method
  • payment-method
  • tax
  • unit
  • manufacturer

The following subsections have been adjusted and do not use the global default values:

  • property-group-option: The batch sizes can be increased because these entities are very minimalistic. Often there are thousands of properties, so bigger batch sizes can greatly improve performance.
  • product: You should reduce the batch sizes due to the complex entities. This can help to reduce internal server errors and high server load during Sync API requests in Shopware. However, you can also try using the global batch size of 50, as this issue has been seen less frequently in newer Shopware versions.
  • order: A request to upsert an order contains up to 6 embedded entities, so the batch sizes should be reduced because these entities are very complex.

In general, the more complex a document and/or Shopware entity, the smaller the batch sizes should be. But in fact it is very hard to recommend which batch size you should use. The reasons for that are explained in the "Batch Size Theory & Recommendations" section above.