Custom Models

Build a custom model class in IntegrateFL

If a user wants to create a federated model that is not a standard integrate.ai model (a Feed Forward Neural Net or a Generalized Linear Model), there is the option to create a custom model package.

A user can test and upload a custom model class into integrate.ai using the integrate.ai custom model SDK, which is accessible in our Docker image via the hfl upload command. Once a custom model is uploaded, it can be used by any other user in that workspace.

integrate.ai supports any custom model built with PyTorch, for example CNNs, LSTMs, Transformers, and GLMs. Customization is restricted in the following ways:

  • Models must use one of the following loss functions:

    • Classification, regression, logistic, normal, poisson, gamma, and inverse gaussian

  • We support only a single-output train function

  • We do not support data augmentation

Step 1: Download template and examples

Start by downloading the sample packages below. They contain everything you need to learn about and create your own custom model package. We suggest reviewing these examples and the API documentation before getting started.

Contents of sample_packages:

Step 2: Create model

The next step is to draft the custom model package. Use the template files provided in Step 1 as a guide, and refer to the API Documentation for more information.

Every custom model package folder is required to have two files:

  • model.py - the custom model definition. Requires the user to import the IAIBaseModule.

  • <model-class-name>.json - default model inputs for this model. It must have the same name as the model class name, which is defined in the model.py file.
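To make the structure concrete, here is a minimal sketch of what model.py and its matching defaults file could look like. This is an illustration only: the class name myffnet, the layer sizes, and the IAIBaseModule import path are assumptions, so check the template files from Step 1 and the API documentation for the actual interface. The class name is lowercase so that the matching .json file follows the naming convention described later in this step.

# model.py - minimal sketch of a custom model definition
# The IAIBaseModule import path and constructor signature are assumptions
# for illustration; use the template package for the real interface.
import torch
import torch.nn as nn
from integrate_ai_sdk.base_class import IAIBaseModule  # assumed import path


class myffnet(IAIBaseModule):
    """A small feed-forward network, shown only to illustrate the package layout."""

    def __init__(self, input_size: int = 16, hidden_size: int = 8, output_size: int = 1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A single output, as required by the customization restrictions above.
        return self.layers(x)

The matching defaults file would then be named myffnet.json and, assuming the default model inputs mirror the constructor arguments, could contain:

{
  "input_size": 16,
  "hidden_size": 8,
  "output_size": 1
}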

Optionally, the user can create a custom data loader to extend the IAI base dataset, which requires two additional files:

  • dataset.py - the custom data loading function. Requires the user to import the IAIBaseDataset, torch, and Tuple.

  • <custom-class-name>.json - default inputs for the data loader for this model. It must have the same name as the data loader class name, which is defined in the dataset.py file.
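A corresponding custom data loader could look roughly like the sketch below, paired with a mycsvdataset.json defaults file. The IAIBaseDataset import path, its constructor signature, the class name mycsvdataset, and the use of pandas here are assumptions for illustration; the template package shows the real interface.

# dataset.py - minimal sketch of a custom data loader
# The IAIBaseDataset import path and constructor signature are assumptions.
from typing import Tuple

import pandas as pd
import torch
from integrate_ai_sdk.base_class import IAIBaseDataset  # assumed import path


class mycsvdataset(IAIBaseDataset):
    """Loads a CSV file and returns one (predictors, target) pair per row."""

    def __init__(self, path: str, predictors=None, target: str = "y"):
        super().__init__(path)  # assumed base-class signature
        self.predictors = predictors or ["x1", "x2"]
        self.target = target
        self.data = pd.read_csv(path)

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> Tuple[torch.Tensor, torch.Tensor]:
        row = self.data.iloc[idx]
        x = torch.tensor(row[self.predictors].values.astype("float32"))
        y = torch.tensor(float(row[self.target]))
        return x, y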

If there is no custom dataset file, PowerFlow defaults to the TabularDataset loader, which loads .parquet and .csv files and requires predictors: ["x1", "x2"], target: y as input in the data configuration. This is what the standard models use, and you can see an example of that here.
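For example, a data configuration for the default TabularDataset loader would name the predictor columns and the target column, along these lines (the keys are taken from the description above; treat the exact file layout as an illustration rather than a definitive schema):

{
  "predictors": ["x1", "x2"],
  "target": "y"
}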

Use the following naming convention for these file names: no spaces, no special characters, no hyphens, and all lowercase characters.
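For instance, a package folder that follows these rules (all names here are hypothetical) might look like:

mycustommodel/
    model.py            defines the model class myffnet
    myffnet.json        default model inputs
    dataset.py          (optional) defines the data loader class mycsvdataset
    mycsvdataset.json   (optional) default data loader inputs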

Step 3: Test and upload with hfl upload function

The last step is to test and upload! The hfl upload command has two modes: (1) test the model using the default inputs provided, and (2) test and upload the model into PowerFlow. If the model fails the test, it will not be uploaded.

Open a terminal window, and use this command:

hfl upload [-h] [--test-only] --token TOKEN --package-path PACKAGE_PATH --dataset-path DATASET_PATH --task {classification, regression, logistic, normal, poisson, gamma, inverseGaussian} --batch-size BATCH_SIZE [--package-name PACKAGE_NAME] [--model-config-path MODEL_CONFIG_PATH] [--data-config-path DATA_CONFIG_PATH] [--description DESCRIPTION]

Use hfl upload --help to access the definitions of each element of the command:

hfl upload --help

optional arguments:
  -h, --help            show this help message and exit
  --test-only           include this to test the model before uploading
  --token TOKEN         Authentication token, from the PowerFlow UI
  --package-path PACKAGE_PATH
                        Package folder path.
  --dataset-path DATASET_PATH
                        Dataset path.
  --task {classification, regression, logistic, normal, poisson, gamma, inverseGaussian}
                        specify the machine learning task
  --batch-size BATCH_SIZE
                        Batch size to load the data with.
  --package-name PACKAGE_NAME
                        Optional package name, default to the package directory name
  --model-config-path MODEL_CONFIG_PATH
                        Optional path to the model config file. 
                        default to <package_path>/<model_class_name>.json
  --data-config-path DATA_CONFIG_PATH
                        Optional path to the data config file. 
                        default to <package_path>/<dataset_class_name>.json, 
                        when no dataset class is defined in the package, the
                        TabularDataset class will be used, and the data config path 
                        of a tabular dataset must be provided.
  --description DESCRIPTION
                        Package description, maximum 1024 characters.

The following message will appear if you have successfully uploaded the model:

2022-02-14 17:06:52,314 FLOUR MainThread INFO | orchestration.py:315 | Successfully uploaded model definition: <model name>

Once uploaded, the custom model appears in the Model Library, and any user in the workspace can log in and create a session using that model.
