Processing large collections in actions

Find out how to work with large collections of data in the action builder with the help of batch processing.

After reading this article you’ll know:

  • How to deal with large collections of records in the action builder

  • The purpose of batch processing for handling large collections

  • Which action steps are used when working with large collections

Processing large amounts of data in an application can be challenging, so we always need to think ahead and choose the right solution. Fortunately, there are some best practices to choose from within the Betty Blocks platform:

  • Manually setting up a batch action flow, which allows customisation in each of your steps (max 200 records per set)

  • Using the auto-batched collection, where batching is handled automatically by your application (max 5000 records per set)

More about working with the Loop step and the auto-batched collection feature can be found here.

In this article, we’ll explore how to create your own action flow to handle large collections of data in the action builder.

Getting started

By splitting the data into manageable batches of up to 200 records at a time and combining different action steps (data API request, loop times, expression, and so on), we can work with large data collections effectively.
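
If it helps to see the pattern in code first, here is a minimal sketch of what the flow will do, written in plain TypeScript purely for illustration - in the action builder everything is configured with steps, and processBatch is just a hypothetical placeholder for whatever work a batch needs:

// Illustrative pattern only - not platform code.
function processBatch(skip: number, take: number): void {
  // hypothetical placeholder: handle 'take' records, starting after the first 'skip' records
}

const totalCount = 800;                             // total records in the collection
const batchSize = 200;                              // up to 200 records per batch
const batches = Math.ceil(totalCount / batchSize);  // 4 batches in this example
for (let index = 0; index < batches; index++) {
  processBatch(index * batchSize, batchSize);       // skip what has been handled, process the next chunk
}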

Use case

First, let’s briefly define the use case and data collection we’ll be working with. It will be an employee management application based on the Employee model, which stores the data of all employees assigned to a specific project. Besides Name and Email, each employee record has a status - an Active checkbox property that indicates whether the employee is currently working on the project.

Our employee management page is created with the back office page template and is based on the Employee model. Some mock employee data has been added to this model, and as a result, we have about 800 unique records.

Looking at the table, we can see that some of the employees are marked as ‘Active’. But let’s say we are closing the project and want to unassign every employee from it. 800 records isn’t that bad, but imagine a number a hundred times bigger. That’s where batches come into play.

Batch size explained

Before we dive into building the action flow, we need to define the batch size for our action. In simple terms, batch size refers to the number of data points (records in our case) that are processed at a time. It’s like dividing a large group of data into smaller, manageable chunks to work with. At Betty Blocks, you can use batch sizes of up to 200 records each.

Using a larger batch size can speed up the processing of large collections of data, but it may require more memory. On the other hand, a smaller batch size may take longer to process the data, but it can be more memory efficient. Adjusting the batch size affects both the efficiency and the speed of processing, which makes it an important consideration when working with big datasets.
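
As a quick illustration of this trade-off, using our 800 mock records and standard JavaScript arithmetic: a larger batch size means fewer, heavier iterations, while a smaller one means more, lighter iterations.

Math.ceil(800 / 200); // 4 batches of 200 records - fewer iterations, more records held per batch
Math.ceil(800 / 50);  // 16 batches of 50 records - more iterations, fewer records per batch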

Blocks used

Only some of the action steps we are going to use come out of the box, so you have to pre-install the rest from the block store. Go there and install any of these blocks that you don’t have yet: loop times, data API request, expression, and update many records. We will use them in our further setup.

After everything is prepared, we can proceed to the next step - building our action flow.

Building the action flow

Go to the action builder and create a new action. Call it something like Update collections.

1. Open the Start step and create a new action variable of the number kind called batch_size. Give it a value that defines the number of records to be processed in one go. Let’s set it to 100 and save.

Note: You have to find your own ‘sweet spot’ when setting the number of records in a batch - the exact number that leads to quick processing of data in your application.

2. Drop the data API request action step onto the canvas and set it up like this:

  • Select the Employee model

  • Define the Collection type as our query starting point

  • Set the Output type to Total count, which returns the number of records in the collection

  • Type in total_count as the result variable to be used further in the flow. Save the step.

3. Place the expression step after the data API request step. We’ll use the expression to calculate the number of batches we need (a worked example follows below).

  • Paste or type in the following expression:

Math.ceil( / )

  • Then add the variables we’ve already created in the previous steps: total_count before the slash and batch_size after it

  • Finish by typing in the result variable, something like batches_number. Leave the result type as Text. Don’t forget to save your configuration.

Note: Normally, the result type should be set to Number, but unfortunately the current version of the loop times step only supports the Text format.
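
As a worked example of why Math.ceil is used (the 850 figure below is made up just to show the rounding): any leftover records after the last full batch still need one extra loop iteration.

Math.ceil(800 / 100); // 8 - our 800 records fit into exactly 8 full batches
Math.ceil(850 / 100); // 9 - 8 full batches plus one final batch of 50 records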

4. Let’s now check whether we are getting the right total number of records and batches by dropping the log message step after the expression step:

  • Select the info severity

  • Add the variables total_count and batches_number

Save your configuration and do a test run of your action (explained in the Log message step article). Then open the logs to check that the total record count is correct and that the batches are split as expected.

As you can see, the output is just as we expected:

  • Batches number: 8

  • Total count: 800

 

5. Once you’re sure that the first steps are working, take the loop times action step and drop it after the expression and log message steps.

  • Choose the batches_number variable as the loop times value - this way, if the number of batches changes, you won’t have to set it manually again

  • Come up with the iterator name under which each object of iteration will be available

  • In the index, set the name of the variable under which the current iteration index will be available, and save this step.

6. Next, we are going to add another expression step. Working together with the update many records step, it lets us skip the records that have already been processed, based on the batch size we’ve defined. In our case, each of the 8 loop iterations skips another 100 records, until all 800 have been covered (see the short sketch after this step).

  • Use the expression:

*

  • Add the index variable from the loop step and the batch_size variable as the two operands of the multiplication

  • Set the name for the result of this expression as ‘skip’ and choose the type Number. Save the action step.
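
Here is a minimal sketch of how the skip value progresses, assuming the expression multiplies the loop’s index variable by batch_size (plain TypeScript, for illustration only):

// Illustrative only: skip advances by one batch per loop iteration.
const batchSize = 100;
for (let index = 0; index < 8; index++) {
  const skip = index * batchSize; // 0, 100, 200, ..., 700
  console.log(`iteration ${index}: skip ${skip} records, update the next ${batchSize}`);
}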

7. Drop the update many records step into the loop and configure it:

  • Add a new collection variable and type in a name for it. Then select your model and the ‘skip’ variable. As agreed earlier, we’ll use the batch_size variable as the size of one increment (100 records) to be updated at a time.

Note: You can optionally add indexation to your collection, create a filter rule, and select the order in which the records in your collection are processed.

  • In the options:

    • Pick the employee variable as the collection, and map the property we’ll be updating by choosing the Active property.

    • As we want to unassign every employee from the project, we’ll leave the value as False.

    • Type in the name for the updated collection and save the step.
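
Putting the whole flow together, here is a rough TypeScript sketch of what the finished action does. It is purely illustrative: countEmployees and updateEmployees are hypothetical stand-ins for the data API request and update many records steps (not real platform functions), and the skip/take pair mirrors the increment of batch_size records described above.

// Illustrative recap of the flow - not actual platform code.
declare function countEmployees(): Promise<number>;
declare function updateEmployees(range: { skip: number; take: number }, values: { active: boolean }): Promise<void>;

async function processEmployeesInBatches(): Promise<void> {
  const batchSize = 100;                                   // Start step: batch_size variable
  const totalCount = await countEmployees();               // data API request: total count
  const batchesNumber = Math.ceil(totalCount / batchSize); // expression step: batches_number
  for (let index = 0; index < batchesNumber; index++) {    // loop times step
    const skip = index * batchSize;                        // expression step: skip
    await updateEmployees({ skip, take: batchSize }, { active: false }); // update many records step
  }
}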

Final check

Finally, we can add another log message step to the loop and watch the process in front of us. As you can see, the records stored in our data model have been updated in batches of 100 records at a time.

If we refresh our front-end page, all the employee records will be marked as inactive.

Overall, the bottom line of this example is simple: if you deal with large amounts of data, try splitting it into batches. Do it wisely, balancing processing efficiency against resource strain.