Building a Custom Model

If the standard integrate.ai models (Feedforward Neural Network or Generalized Linear Model) do not suit your needs, you can create a custom model. If you are working with non-tabular data, you can also create a custom dataloader to use with your custom model.

integrate.ai supports all custom models built with PyTorch (for example, CNNs, LSTMs, Transformers, and GLMs).

Customization is restricted in the following ways:

Models must use one of the following loss functions:
- Classification
- Regression
- Logistic
- Normal
- Poisson
- Gamma
- Inverse Gaussian
Only a single-output train function is supported.
Data augmentation is not supported.

Download template and examples

Start by downloading the sample package below. This package contains everything you need to learn about and create your own custom model package. Review these examples and the API documentation before getting started.

sample_packages.zip

Contents of sample_packages.zip:

template_package folder that contains template_model.py and template_dataset.py.
- Use these files as starting points when creating a custom model and data loader.
Two example custom model packages, each including a readme file that explains how to test and upload the custom model.
- cifar10_vgg16 folder - contains a VGG net model that is well suited to large-scale image recognition.
- lstmTagger folder - contains an LSTM (long short-term memory) RNN model, which is well suited to processing sequences of data.

Create a custom model package

Using the template files provided, create a custom model package.

Follow the naming convention for files in the custom package: no spaces, no special characters, no hyphens, all lowercase characters.
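
The naming convention can be checked mechanically. The sketch below is one possible reading of the rule: it assumes that digits and underscores are permitted (the template files use underscores) and that a single lowercase extension is allowed. The helper name `follows_naming_convention` is hypothetical and is not part of the integrate.ai SDK.

```python
import re

# Assumption: lowercase letters, digits, and underscores, plus one optional extension.
_NAME_PATTERN = re.compile(r"[a-z0-9_]+(\.[a-z0-9]+)?")


def follows_naming_convention(filename: str) -> bool:
    """Return True if the name has no spaces, hyphens, special or uppercase characters."""
    return _NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    for name in ["model.py", "templatemodel.json", "My-Model.py", "data set.py"]:
        print(name, follows_naming_convention(name))
```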

Create a folder to contain your custom model package. For this tutorial, this folder is named myCustomModel, and is located in the same parent folder as the template folder. Example path: C:\<workspace>\integrate_ai_sdk\sample_packages\myCustomModel
Create two files in the custom model package folder:
1. model.py - the custom model definition. You can copy template_model.py and rename it as a starting point for this file.
2. <model-class-name>.json - default model inputs for this model. It must have the same name as the model class name that is defined in the model.py file. If you are using the template files, the default name is templatemodel.json.
Optional: To use a custom dataloader, you must also create a dataset.py and a dataset configuration JSON file in the same folder. For more information, see Create a Custom Dataloader. If there is no custom dataset file, the default TabularDataset loader is used. It loads .parquet and .csv files, and requires predictors: ["x1", "x2"], target: y as input for the data configuration. This is what is used for the standard models.
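
As a quick local sanity check of the package layout described above, the following sketch lists the expected files that are missing from a package folder. It is not part of the SDK: the function names are hypothetical, and it assumes that model.py defines a single model class and that the JSON file name is the lowercased class name (as with templatemodel.json).

```python
import ast
from pathlib import Path
from typing import List


def model_class_names(model_source: str) -> List[str]:
    """Return the names of the top-level classes defined in Python source code."""
    tree = ast.parse(model_source)  # parses only; nothing is imported or executed
    return [node.name for node in tree.body if isinstance(node, ast.ClassDef)]


def missing_package_files(package_dir: Path) -> List[str]:
    """List the expected files that are absent from a custom model package folder."""
    model_file = package_dir / "model.py"
    if not model_file.is_file():
        return ["model.py"]
    missing = []
    for class_name in model_class_names(model_file.read_text(encoding="utf-8")):
        # Assumption: the configuration file is named after the lowercased class name.
        config_name = f"{class_name.lower()}.json"
        if not (package_dir / config_name).is_file():
            missing.append(config_name)
    return missing
```

For example, `missing_package_files(Path("myCustomModel"))` returns an empty list when both model.py and the matching JSON file are present.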

Example data configuration for the TabularDataset loader

{
    "predictors": ["sbp", "tobacco", "ldl", "adiposity", "typea", "obesity", "alcohol", "age"],
    "target": "chd"
}
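
Because the data configuration refers to columns by name, a mismatch with the actual file is an easy mistake to make. The sketch below checks a configuration against the header row of a .csv file. The helper name `missing_columns` is hypothetical, and the sample rows in the usage section are made up for illustration.

```python
import csv
import io
import json
from typing import List, TextIO


def missing_columns(data_config: dict, csv_file: TextIO) -> List[str]:
    """Return the configured predictor and target columns that the CSV header lacks."""
    header = next(csv.reader(csv_file), [])  # an empty file yields an empty header
    wanted = list(data_config["predictors"]) + [data_config["target"]]
    return [column for column in wanted if column not in header]


if __name__ == "__main__":
    config = json.loads('{"predictors": ["sbp", "age"], "target": "chd"}')
    sample = io.StringIO("sbp,age,chd\n160,52,1\n")  # made-up sample data
    print(missing_columns(config, sample))
```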

Custom model definition

The API class IaiBaseModule must be implemented for all custom models. This class is the super class of all models.

class IaiBaseModule(abc.ABC, torch.nn.modules.module.Module)

For detailed information, see the API Documentation.

The example below provides the boilerplate for your custom model definition. Fill in the code required to define your model. Refer to the model.py files provided for the lstmTagger and cifar10_vgg16 examples if needed.

Example: template_model.py

from integrate_ai_sdk.base_class import IaiBaseModule

class TemplateModel(IaiBaseModule):
    def __init__(self):
        """
        Here you should instantiate your model layers based on the configs.
        """
        super(TemplateModel, self).__init__()

    def forward(self):
        """
        The forward path of a model. Can take an input tensor and return a prediction tensor
        """
        pass


if __name__ == "__main__":
    template_model = TemplateModel()

Custom model configuration inputs

Create a JSON file that defines the model inputs for your model.
It must have the same name as the model class name that is defined in the model.py file (e.g. templatemodel.json).

The content of this file is dictated by your model. The following parameters are required:

Parameter	Description
`experiment_name`	(string) The name of your experiment.
`experiment_description`	(string) A description of your experiment.
`strategy`	(object) The federated learning strategy to use and any required parameters. Supported strategies are: `FedAvg, FedAvgM, FedAdam, FedAdagrad, FedOpt, FedYogi`. See Strategy Library for details.
`model`	(object) The model type and any required parameters.
`ml_task`	(object) The machine learning task type and any required parameters. Supported types are: `regression, classification, logistic, normal, poisson, gamma`, and `inverseGaussian`.
`optimizer`	(object) The optimizer and any required parameters. See the torch.optim package description for details.
`differential_privacy_params`	(object) The differential privacy settings. `epsilon` is the privacy budget; larger values correspond to less privacy protection and potentially better model performance. `max_grad_norm` is the upper bound for clipping gradients.

Example: FFNet model with FedAvg strategy

{
	"experiment_name": "Test session",
	"experiment_description": "This is a test session",
	"job_type": "training",
	"strategy": {
		"name": "FedAvg",
		"params": {}
	},
	"model": {
		"type": "FFNet",
		"params": {
			"input_size": 200,
			"hidden_layer_sizes": [80,40,8],
			"output_size": 2,
			"hidden_activation": "relu"
		}
	},
	"ml_task": {
		"type": "classification",
		"params": {}
	},
	"optimizer": {
		"name": "SGD",
		"params": {
			"learning_rate": 0.03,
			"momentum": 0
		}
	},
	"differential_privacy_params": {
		"epsilon": 1,
		"max_grad_norm": 7,
		"delta": 0.000001
	},
	"eval_metrics": [
		"accuracy",
		"loss",
		"roc_auc"
	]
}
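
Before uploading, it can help to confirm that a configuration file contains every required parameter from the table above. The following is a minimal sketch using only the Python standard library; the helper name `missing_required_keys` is hypothetical and is not part of the SDK.

```python
import json
from typing import List

# Required top-level parameters, as listed in the parameter table.
REQUIRED_KEYS = (
    "experiment_name",
    "experiment_description",
    "strategy",
    "model",
    "ml_task",
    "optimizer",
    "differential_privacy_params",
)


def missing_required_keys(config_text: str) -> List[str]:
    """Return the required parameters that are absent from a JSON configuration."""
    config = json.loads(config_text)
    return [key for key in REQUIRED_KEYS if key not in config]
```

Calling `missing_required_keys(Path("templatemodel.json").read_text())` (with `pathlib.Path` imported) returns an empty list for a complete configuration.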

Below is the outline for the full schema used to validate the model configuration inputs for GLM and FFNet models. This schema is provided for reference.

Model validation schema

{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "title": "FL Model Config",
  "description": "The model config for an FL model",
  "type": "object",
  "properties": {
    "experiment_name": {
      "type": "string",
      "description": "Experiment Name"
    },
    "experiment_description": {
      "type": "string",
      "description": "Experiment Description"
    },
    "strategy": {
      "type": "object",
      "properties": {
        "name": {
          "enum": [
            "FedAvg"
          ],
          "description": "Name of the FL strategy"
        },
        "params": {
          "type": "object",
          "properties": {
            "fraction_fit": {
              "type": "number",
              "minimum": 0,
              "maximum": 1,
              "description": "Proportion of clients to use for training. If fraction * total_num_users is smaller than min_num_clients set in the session config, then min_num_clients will be used."
            },
            "fraction_eval": {
              "type": "number",
              "minimum": 0,
              "maximum": 1,
              "description": "Proportion of clients to use for evaluation. If fraction * total_num_users is smaller than min_num_clients set in the session config, then min_num_clients will be used."
            },
            "accept_failures": {
              "type": "boolean",
              "description": "Whether to accept failures during training and evaluation. If False, the training process will be stopped when a client fails, otherwise, the failed client will be ignored."
            }
          },
          "additionalProperties": false
        }
      },
      "required": [
        "name",
        "params"
      ]
    },
    "model": {
      "type": "object",
      "description": "Model type and parameters",
      "properties": {
        "params": {
          "type": "object",
          "description": "Model parameters"
        }
      },
      "required": [
        "params"
      ]
    },
    "ml_task": {
      "type": "object",
      "description": "Type of ML task",
      "properties": {
        "type": {
          "enum": [
            "regression",
            "classification",
            "logistic",
            "normal",
            "poisson",
            "gamma",
            "inverseGaussian"
          ]
        },
        "params": {
          "type": "object"
        }
      },
      "required": ["type", "params"],
      "allOf": [
        {
          "if": {
            "properties": { "type": { "enum": ["regression", "classification"] } }
          },
          "then": {
            "properties": { "params": { "type": "object" } }
          }
        },
        {
          "if": {
            "properties": { "type": { "enum": [
              "logistic",
              "normal",
              "poisson",
              "gamma",
              "inverseGaussian"
            ] } }
          },
          "then": {
            "properties": { "params": {
              "type": "object",
              "properties": {
                "alpha": {
                  "type": "number",
                  "minimum": 0
                },
                "l1_ratio": {
                  "type": "number",
                  "minimum": 0,
                  "maximum": 1
                }
              },
              "required": ["alpha", "l1_ratio"]
            } }
          }
        }
      ]
    },
    "optimizer": {
      "type": "object",
      "properties": {
        "name": {
          "enum": [
            "SGD"
          ]
        },
        "params": {
          "type": "object",
          "properties": {
            "learning_rate": {
              "type": "number",
              "minimum": 0,
              "description": "See https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD for details"
            },
            "momentum": {
              "type": "number",
              "minimum": 0,
              "description": "See https://pytorch.org/docs/stable/generated/torch.optim.SGD.html#torch.optim.SGD for details"
            }
          },
          "required": [
            "learning_rate"
          ],
          "additionalProperties": false
        }
      },
      "required": [
        "name",
        "params"
      ],
      "additionalProperties": false
    },
    "differential_privacy_params": {
      "type": "object",
      "properties": {
        "epsilon": {
          "type": "number",
          "minimum": 0,
          "description": "Privacy budget. Larger values correspond to less privacy protection, and potentially better model performance."
        },
        "max_grad_norm": {
          "type": "number",
          "minimum": 0,
          "description": "The upper bound for clipping gradients. A hyper-parameter to tune."
        }
      },
      "required": [
        "epsilon",
        "max_grad_norm"
      ]
    },
    "eval_metrics": {
      "description": "A list of metrics to use for evaluation",
      "type": "array",
      "minItems": 1,
      "items": {}
    }
  },
  "required": [
    "experiment_name",
    "experiment_description",
    "strategy",
    "model",
    "ml_task",
    "optimizer",
    "differential_privacy_params"
  ]
}
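
The conditional part of the schema is the easiest to misread: the GLM task types (logistic, normal, poisson, gamma, and inverseGaussian) require `alpha` and `l1_ratio` in `params`, while regression and classification do not. The sketch below restates that rule in plain Python for illustration. It is a hand-written approximation of the `ml_task` portion only, not the validator that integrate.ai runs, and the function names are hypothetical.

```python
from typing import List

GLM_TASKS = {"logistic", "normal", "poisson", "gamma", "inverseGaussian"}
ALL_TASKS = GLM_TASKS | {"regression", "classification"}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, but a JSON boolean is not a number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def ml_task_errors(ml_task: dict) -> List[str]:
    """Return a list of problems found in an ml_task object (empty if none)."""
    errors = []
    task_type = ml_task.get("type")
    params = ml_task.get("params")
    if task_type not in ALL_TASKS:
        errors.append(f"unsupported type: {task_type!r}")
    if not isinstance(params, dict):
        errors.append("params must be an object")
        return errors
    if task_type in GLM_TASKS:
        alpha = params.get("alpha")
        l1_ratio = params.get("l1_ratio")
        if not _is_number(alpha) or alpha < 0:
            errors.append("alpha must be a number >= 0")
        if not _is_number(l1_ratio) or not 0 <= l1_ratio <= 1:
            errors.append("l1_ratio must be a number between 0 and 1")
    return errors
```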

Create a Custom Dataloader/Dataset

The default dataloader is the TabularDataset loader, which is useful for the standard FFNet and GLM models. If your data is not in a tabular format (for example, if it contains images or sound files, or is organized in a folder hierarchy), you can create a custom dataloader.

A custom dataloader requires two additional files:

dataset.py - the custom data loading function. You must import IaiBaseDataset, torch, and Tuple in this file.
<custom-class-name>.json - specifies the default inputs for the dataloader for this model. It must have the same name as the dataloader class name, which is defined in the dataset.py file.

The API class IaiBaseDataset must be implemented for all custom datasets. This class is the super class for all datasets.

class IaiBaseDataset(typing.Generic[+T_co]):

For detailed information, see the API Documentation.

The example below provides the boilerplate for your custom dataloader definition. Fill in the code needed to define your function. You can refer to the dataset.py files provided for the lstmTagger and cifar10_vgg16 examples if needed.

Example: template_dataset.py

from typing import Tuple
import torch
from integrate_ai_sdk.base_class import IaiBaseDataset

class TemplateDataset(IaiBaseDataset):
    def __init__(self, path: str) -> None:
        """
        In this class you can load and pre-process the data.
        You can add parameters to do pre-processing or transformations on your dataset
        @param path: path of your dataset - REQUIRED
        """
        super(TemplateDataset, self).__init__(path)

    def __getitem__(self, item: int) -> Tuple[torch.Tensor]:
        """
        This method is responsible for producing each data point tensor.
        :param item:
        :return:
        """
        pass

    def __len__(self) -> int:
        """
        Returns the size of the dataset
        :return: dataset_size
        """
        pass


if __name__ == "__main__":

    dataset = TemplateDataset("path_to_your_sample_data")

Custom dataset configuration

Create a JSON file that defines the inputs for your dataloader. It must have the same name as the dataloader class name that is defined in the dataset.py file.

The content of this file is dictated by the inputs that your dataloader requires.

Example: dataset configuration

{
    "predictors": ["sbp", "tobacco", "ldl", "adiposity", "typea", "obesity", "alcohol", "age"],
    "target": "chd"
}

Test and upload the custom model

Before you start training your custom model, you should test it and upload it to your workspace. The method for uploading also tests the model by training a single epoch locally. After the model has been successfully uploaded, you or any user with access to the model can train it in a session.

To test and upload a custom model, use the upload_model method:

def upload_model(
	self,
	package_path: str,
	dataset_path: str,
	package_name: str,
	sample_model_config_path: str,
	sample_data_config_path: str,
	batch_size: int,
	task: str,
	test_only: bool,
	description: str
):

where:

Argument (type)	Description
`package_path` (str)	Path to your custom model folder
`dataset_path` (str)	Path to the dataset(s)
`package_name` (str)	Name of the custom model package. It must differ from the names of all previously uploaded packages.
`sample_model_config_path` (str)	Path to the model configuration JSON file
`sample_data_config_path` (str)	Path to the dataset configuration JSON file
`batch_size` (int)	Number of samples to propagate through the network at a time
`task` (str)	Either 'classification' or 'regression'. Set it to 'regression' for a numeric target and 'classification' for a categorical target.
`test_only` (bool)	If set to `True`, trains the model for one epoch to test it, without uploading it. If set to `False`, tests the model and uploads it if the test passes.
`description` (str)	A description of the model, up to 1024 characters. This description also appears in the integrate.ai web portal.

This method tests a custom model by creating the model based on the custom model configuration (JSON file) and then training it for one epoch locally. If the model fails the test, it cannot be uploaded.
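
One way to keep calls to upload_model tidy is to assemble and check the arguments first. The sketch below is illustrative only: `build_upload_arguments` is a hypothetical helper, the default of `test_only=True` (a dry run before a real upload) is a choice made for this example rather than SDK behavior, and the `batch_size` check is an assumption. The task and description checks mirror the argument descriptions above.

```python
from typing import Any, Dict

MAX_DESCRIPTION_LENGTH = 1024
ALLOWED_TASKS = ("classification", "regression")


def build_upload_arguments(
    package_path: str,
    dataset_path: str,
    package_name: str,
    sample_model_config_path: str,
    sample_data_config_path: str,
    batch_size: int,
    task: str,
    test_only: bool = True,  # example default: test without uploading
    description: str = "",
) -> Dict[str, Any]:
    """Check the documented constraints and return keyword arguments for upload_model."""
    if task not in ALLOWED_TASKS:
        raise ValueError(f"task must be one of {ALLOWED_TASKS}, got {task!r}")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError("description must be at most 1024 characters")
    if batch_size < 1:  # assumption: a batch needs at least one sample
        raise ValueError("batch_size must be a positive integer")
    return {
        "package_path": package_path,
        "dataset_path": dataset_path,
        "package_name": package_name,
        "sample_model_config_path": sample_model_config_path,
        "sample_data_config_path": sample_data_config_path,
        "batch_size": batch_size,
        "task": task,
        "test_only": test_only,
        "description": description,
    }


# Hypothetical usage, assuming `client` is an authenticated SDK client object
# that exposes the upload_model method shown above:
#     client.upload_model(**build_upload_arguments(...))
```

Running with `test_only=True` first and switching to `False` only after the test passes avoids uploading a package that would fail anyway.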

When starting a session with your custom model, make sure you specify the correct package_name, model_config, and data_config file names. For details, see create_fl_session in the API documentation.
