Datapool

Datapools let you manage batch processing of items efficiently.

A Datapool can be thought of as a queue manager, giving you fine-grained control over the items that need to be processed.

In the following sections, you will find details about how a Datapool works and how to use this functionality in your automation processes.

[Screenshot: Datapools list]

Creating a Datapool

To create a new Datapool, click on + New Datapool and fill in the following fields:

[Screenshot: new Datapool form]

  • Label: The unique identifier that will be used to access the Datapool.
  • Active: If ACTIVE, the Datapool will be available to be accessed and consumed.
  • Consumption policy: You can choose between two consumption policies (illustrated by the sketch after this list):
    • FIFO: The first item to be added to the Datapool will also be the first item to be processed.
    • LIFO: The last item to be added to the Datapool will be the first item to be processed.
  • Auto retry: If enabled, an item can be automatically reprocessed in the event of an error.
    • Max auto retry: The maximum number of attempts for an item to be processed successfully.
  • Abort on error: If enabled, the Datapool becomes inactive and is no longer consumed in the event of consecutive errors.
    • Max errors before inactive: The maximum number of consecutive items finishing with an error that will be tolerated before the Datapool becomes INACTIVE.
  • Item max processing time (Minutes): Expected time for a Datapool item to be processed under normal conditions.
  • Trigger: You can define whether the created Datapool will also be responsible for triggering new tasks:
    • ALWAYS: Whenever a new item is added to the Datapool, a new task for a given automation process will be created.
    • NEVER: Datapool will never be responsible for triggering tasks from an automation process.
    • NO TASK ACTIVE: Whenever a new item is added, Datapool will trigger a new task from an automation process only if there are no tasks from that process being executed or pending.
  • Default automation: The automation process that the Datapool will use to trigger new tasks when the trigger policy is ALWAYS or NO TASK ACTIVE.
  • Schema: The fields that will make up the structure of a Datapool item. You can add new fields to the schema by clicking +Add and defining a label and the expected data type.
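
To make the two consumption policies concrete, here is a plain-Python illustration (not SDK code) of the order in which items are consumed:

from collections import deque

queue = deque(["A", "B", "C"])  # "A" was added first, "C" last

first = queue.popleft()  # FIFO consumption: "A", the first item added
last = queue.pop()       # LIFO consumption: "C", the last item added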

Adding new items to the Datapool

We can add new items to the Datapool in two different ways.

[Screenshot: Datapool view]

Adding each item manually

By clicking on + Add entry, we can add a new item to the Datapool, setting the item's priority and the values it will hold.

In addition to filling in the fields defined in the Schema, we can also add new fields containing extra data for this item.

By clicking +Add within the item window, you can include as many additional fields as necessary for that specific item.

[Screenshot: adding a Datapool item]
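
Items can also be added programmatically with the Maestro SDK. Below is a minimal sketch, assuming your SDK version exposes the DataPoolEntry class and the create_entry() method, and that the Schema contains a hypothetical field labeled data-label:

from botcity.maestro import *

maestro = BotMaestroSDK.from_sys_args()

# Reference to an existing Datapool
datapool = maestro.get_datapool(label="Items-To-Process")

# "data-label" is a hypothetical Schema field used for illustration
new_entry = DataPoolEntry(
    priority=5,
    values={"data-label": "some value"}
)
datapool.create_entry(new_entry)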

Adding items using a CSV file

In addition to adding items manually, we can add multiple items simultaneously through a .csv file.

By selecting the Import CSV option, we can download an example file and fill it with information about the items that will be added to the Datapool.

Once this is done, select the file and click Upload to add the items automatically.

[Screenshot: importing items via CSV]
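
As an illustration, for a hypothetical Schema with fields customer-id and amount, the file content could look like the snippet below. The example file downloaded from Maestro is the authoritative template for the expected columns.

customer-id,amount
C-1001,150.75
C-1002,89.90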

Managing Datapool items

For each item added to the Datapool, we can view the following information:

[Screenshot: Datapool items list]

  • Id: The item's unique identifier.
  • Priority: Priority set for the item.
  • State: The current state of the item in the Datapool.
  • Created at: The date the item was added to the Datapool.
  • Processing time: Time spent processing the item.
  • Lifecycle: The time elapsed from the creation of the item in the Datapool to the completion of processing.

When expanding an item's details by clicking +, we can view the following additional information:

[Screenshot: Datapool item details]

  • Task Id: The identifier of the task responsible for accessing and consuming that Datapool item.
  • User: The user responsible for adding the item to the Datapool.
  • Date creation: The date the item was added to the Datapool.
  • Date start processing: The date the item was consumed to be processed.
  • Date finished: The date that processing of the item was completed.
  • Parent: The parent item that this child item originated from. This information will be displayed in cases where an item is reprocessed (retry) or when processing an item is restarted (restart).
  • Child: The child item that originated from this item. This information will be displayed in cases where an item is reprocessed (retry) or when processing of an item is restarted (restart).
  • Priority: The priority set for the item.
  • Values: The key/value sets that make up the Datapool item. You will be able to view the default fields that were defined through Datapool's Schema and also add new fields by clicking +Add.

In addition to viewing the information for each item, we can perform some operations by accessing the item's menu. You can Restart an item that has already been processed and also Delete an item that is still pending.

Important

When you restart or perform automatic reprocessing, a new item will be created in the Datapool.

These operations are never performed on the same item; instead, a "copy" of that item is created in the Datapool, referencing the original item through the Parent property mentioned previously.
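
In code terms, reporting an error on an item (with Auto retry enabled) makes Maestro queue a new child item instead of re-queueing the original. A minimal sketch, where process_item is a hypothetical helper standing in for your business logic:

item = datapool.next(task_id=execution.task_id)

try:
    process_item(item)  # hypothetical helper with your business logic
    item.report_done()
except Exception:
    # With Auto retry enabled, Maestro creates a new child entry that
    # references this one through the Parent property; the original
    # item itself is never reprocessed
    item.report_error()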

Viewing item processing states

When a new item is added to the Datapool, it will initially be in the PENDING state. The states that an item can assume during its lifecycle are as follows:

PENDING: The item is waiting to be processed; at this point, it will be available to be accessed and consumed.

PROCESSING: The item has been accessed for execution and is in the processing phase.

DONE: Item processing has been completed successfully.

ERROR: Item processing was completed with an error.

TIMEOUT: Item processing has exceeded the expected processing time (this can occur when the item's finish state is not reported via code).
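
The transitions between these states (including the TIMEOUT behavior detailed in the next sections) can be summarized as a plain mapping. This is illustrative only, not part of the SDK:

# Illustrative summary of the state transitions described in this section
ALLOWED_TRANSITIONS = {
    "PENDING": ["PROCESSING"],                   # the item is consumed by a task
    "PROCESSING": ["DONE", "ERROR", "TIMEOUT"],  # reported via code, or max time exceeded
    "TIMEOUT": ["DONE", "ERROR"],                # late report, or ERROR after 24 hours
    "DONE": [],                                  # terminal: a restart creates a new item
    "ERROR": [],                                 # terminal: a retry creates a new item
}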

Reporting the state of an item

Warning

For the states to be updated in the Datapool, the processing state of each item (DONE or ERROR) must be reported via code.

If the processing state of the item is not reported via the robot code, Datapool will automatically consider that item to be in a TIMEOUT state.

In the following sections, we will better understand how the state of an item can be reported via code.

Understanding the TIMEOUT state

The TIMEOUT state is based on the time that was defined in the Item max processing time (Minutes) property when creating the Datapool.

If the processing of an item exceeds the defined maximum time, whether because no report indicating the item's state was made or because a problem during execution prevented the report, Datapool will automatically mark the item as TIMEOUT.

This does not necessarily mean an error, as an item can still go from a TIMEOUT state to a DONE or ERROR state.

However, if the process does not recover (in case of possible crashes) and the item state is not reported, Datapool will automatically consider the state of that item as ERROR after a period of 24 hours.
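
A contrived sketch of this recovery path, assuming a Datapool whose Item max processing time is set to 1 minute:

import time

item = datapool.next(task_id=execution.task_id)

# Simulate work that exceeds the configured limit; while the robot is
# still busy, Maestro marks the item as TIMEOUT
time.sleep(90)

# Reporting is still possible: the item moves from TIMEOUT to DONE
item.report_done()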

How to use Datapools with the Maestro SDK

You can easily consume and report the state of items from a Datapool using the Maestro SDK in your automation code.

Installation

If you don't have the dependency installed yet, just follow these instructions:

pip install botcity-maestro-sdk

Important

In addition to installing, remember to include the dependency in the bot's requirements.txt file.

Importing the SDK

After installation, import the dependency and instantiate the Maestro SDK:

# Import for integration with BotCity Maestro SDK
from botcity.maestro import *

# Disable errors if we are not connected to Maestro
BotMaestroSDK.RAISE_NOT_CONNECTED = False

# Instantiating the Maestro SDK
maestro = BotMaestroSDK.from_sys_args()
# Fetching the details of the current task being executed
execution = maestro.get_execution()

Processing Datapool items

# Consuming the next available item and reporting the finishing state at the end
datapool = maestro.get_datapool(label="Items-To-Process")

while datapool.has_next():
    # Fetch the next Datapool item
    item = datapool.next(task_id=execution.task_id)
    if item is None:
        # Item could be None if another process consumed it before
        break

    # Processing item...

    item.report_done()

Tip

To obtain the value of a specific field that was defined in the Schema of the item, you can use the get_value() method or access the field label with square brackets ([]) directly on the item reference.

item = datapool.next(task_id=execution.task_id)

data = item.get_value("data-label")
# or
data = item["data-label"]

Complete code

from botcity.core import DesktopBot
from botcity.maestro import *

# Disable errors if we are not connected to Maestro
BotMaestroSDK.RAISE_NOT_CONNECTED = False

def main():
    maestro = BotMaestroSDK.from_sys_args()
    execution = maestro.get_execution()

    bot = DesktopBot()
    # Implement your logic here...

    # Getting the Datapool reference
    datapool = maestro.get_datapool(label="Items-To-Process")

    while datapool.has_next():
        # Fetch the next Datapool item
        item = datapool.next(task_id=execution.task_id)

        if item is None:
            # Item could be None if another process consumed it before
            break

        # Getting the value of some specific field of the item
        item_data = item["data-label"]

        try:
            # Processing item...

            # Finishing as 'DONE' after processing
            item.report_done()

        except Exception:
            # Finishing item processing as 'ERROR'
            item.report_error()

# Helper from the standard BotCity bot template; not used directly in this example
def not_found(label):
    print(f"Element not found: {label}")

if __name__ == '__main__':
    main()

Tip

See the other operations you can perform with Datapools using the BotCity Maestro SDK and the BotCity Maestro API.