Building a Custom Model
If the standard integrate.ai models (Feedforward Neural Network or Generalized Linear Model) do not suit your needs, you can create a custom model. If you are working with non-tabular data, you can also create a custom dataloader to use with your custom model.
integrate.ai supports any custom model built with PyTorch (for example, CNNs, LSTMs, Transformers, and GLMs).
Customization is restricted in the following ways:
Models must use one of the following loss functions:
Classification
Regression
Logistic
Normal
Poisson
Gamma
Inverse Gaussian
Only a single output train function is supported
Data augmentation is not supported
Download template and examples
Start by downloading the sample package below. This package contains everything you need to learn about and create your own custom model package. Review these examples and the API documentation before getting started.
Contents of sample_packages.zip:
template_package folder that contains template_model.py and template_dataset.py. Use these files as starting points when creating a custom model and data loader.
Two example custom model packages, each including a readme file that explains how to test and upload the custom model:
cifar10_vgg16 folder - contains a VGG net model that is ideal for large-scale image recognition.
lstmTagger folder - contains an LSTM (long short-term memory) RNN model, which is ideal for processing sequences of data.
Create a custom model package
Using the template files provided, create a custom model package.
Follow the naming convention for files in the custom package: no spaces, no special characters, no hyphens, all lowercase characters.
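The naming convention above can be checked with a small helper before you build the package. This is a minimal sketch, not part of the integrate.ai SDK: the function name `follows_naming_convention` is hypothetical, and it assumes that underscores and digits are allowed (as in template_model.py) and that each file has a single lowercase extension.

```python
import re

# Assumption: lowercase letters, digits, and underscores are allowed,
# followed by one lowercase extension such as .py or .json.
_VALID_NAME = re.compile(r"[a-z0-9_]+\.[a-z0-9]+")


def follows_naming_convention(file_name):
    """Return True if file_name has no spaces, hyphens, uppercase letters,
    or other special characters (hypothetical helper for illustration)."""
    return _VALID_NAME.fullmatch(file_name) is not None
```

For example, `follows_naming_convention("model.py")` is true, while names such as `My-Model.py` or `my model.py` are rejected.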
Create a folder to contain your custom model package. For this tutorial, this folder is named myCustomModel, and is located in the same parent folder as the template folder. Example path: C:\<workspace>\integrate_ai_sdk\sample_packages\myCustomModel
Create two files in the custom model package folder:
model.py - the custom model definition. You can rename template_model.py as a starting point for this file.
<model-class-name>.json - default model inputs for this model. It must have the same name as the model class name that is defined in the model.py file. If you are using the template files, the default name is templatemodel.json.
Optional: To use a custom dataloader, you must also create a dataset.py and a dataset configuration JSON file in the same folder. For more information, see Create a Custom Dataloader. If there is no custom dataset file, the default TabularDataset loader is used. It loads .parquet and .csv files, and requires predictors: ["x1", "x2"], target: y as input for the data configuration. This is what is used for the standard models.
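As a rough illustration of the default data configuration, the sketch below writes the predictors and target inputs to a JSON file. Only the `predictors` and `target` keys and the column names `x1`, `x2`, and `y` come from the description above; the helper `write_tabular_config` and the file name used in the usage note are hypothetical and not part of the integrate.ai SDK.

```python
import json
from pathlib import Path


def write_tabular_config(path, predictors, target):
    """Write a data configuration for the default TabularDataset loader.

    Hypothetical helper: only the "predictors" and "target" keys are taken
    from the documentation; the rest is illustrative.
    """
    config = {"predictors": list(predictors), "target": target}
    # Store the configuration as human-readable JSON.
    Path(path).write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

A call such as `write_tabular_config("tabular_data_config.json", ["x1", "x2"], "y")` would produce a file whose column names must match the columns in your .parquet or .csv data.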
Custom model definition
The API class IaiBaseModule must be implemented for all custom models. This class is the super class of all models.
class IaiBaseModule(abc.ABC, torch.nn.modules.module.Module)
For detailed information, see the API Documentation.
The example below provides the boilerplate for your custom model definition. Fill in the code required to define your model. Refer to the model.py files provided for the lstmTagger and cifar10_vgg16 examples if needed.
Custom model configuration inputs
Create a JSON file that defines the model inputs for your model.
It must have the same name as the model class name that is defined in the model.py file (e.g. templatemodel.json).
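One way to catch a naming mismatch before uploading is to compare the model class in model.py with the JSON files in the package folder. The sketch below is a hypothetical helper built on Python's standard `ast` module, not part of the integrate.ai SDK. It assumes the JSON file name is the lowercased class name, as the templatemodel.json example suggests.

```python
import ast
from pathlib import Path


def check_model_config(package_dir, class_name, module_name="model.py"):
    """Return a list of naming problems found in a custom model package.

    Assumption: the configuration file is <lowercased-class-name>.json and
    sits next to the module that defines the class.
    """
    package = Path(package_dir)
    problems = []
    # Parse the module without importing it, so torch is not required here.
    tree = ast.parse((package / module_name).read_text(encoding="utf-8"))
    defined = {node.name for node in tree.body if isinstance(node, ast.ClassDef)}
    if class_name not in defined:
        problems.append(f"class {class_name} is not defined in {module_name}")
    config_name = f"{class_name.lower()}.json"
    if not (package / config_name).is_file():
        problems.append(f"missing configuration file {config_name}")
    return problems
```

The same check could be pointed at a custom dataloader by passing `module_name="dataset.py"` and the dataloader class name.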
The content of this file is dictated by your model. The following parameters are required:
Parameter | Description |
---|---|
| (string) The name of your experiment. |
| (string) A description of your experiment. |
| (object) The federated learning strategy to use and any required parameters. Supported strategies are: See Strategy Library for details. |
| (object) The model type and any required parameters. |
| The machine learning task type and any required parameters. Supported types are: |
| (object) The optimizer and any required parameters. See the Torch.Optim package description for details. |
| (number) The privacy budget. Larger values correspond to less privacy protection and potentially better model performance. |
Below is the outline for the full schema used to validate the model configuration inputs for GLM and FFNet models. This schema is provided for reference.
Create a Custom Dataloader/Dataset
The default dataloader is a Tabular dataset loader that is useful for standard FFN and GLM models. If your data is not in a tabular format (for example, if it contains images or sound files, or is organized in a folder hierarchy) you can create a custom dataloader.
A custom dataloader requires two additional files:
dataset.py - the custom data loading function. You must import IaiBaseDataset, torch, and Tuple in this file.
<custom-class-name>.json - specifies the default inputs for the dataloader for this model. It must have the same name as the dataloader class name, which is defined in the dataset.py file.
The API class IaiBaseDataset must be implemented for all custom datasets. This class is the super class for all datasets.
class IaiBaseDataset(typing.Generic[+T_co]):
For detailed information, see the API Documentation.
The example below provides the boilerplate for your custom dataloader definition. Fill in the code needed to define your function. You can refer to the dataset.py files provided for the lstmTagger and cifar10_vgg16 examples if needed.
Custom dataset configuration
Create a JSON file that defines the inputs for your dataloader. It must have the same name as the dataloader class name that is defined in the dataset.py file.
The content of this file is dictated by your model.
Test and upload the custom model
Before you start training your custom model, you should test it and upload it to your workspace. The method for uploading also tests the model by training a single epoch locally. After the model has been successfully uploaded, you or any user with access to the model can train it in a session.
To test and upload a custom model, use the upload_model method:
where:
Argument (type) | Description |
---|---|
| Path to your custom model folder |
| Path to the dataset(s) |
| Name of the custom model package. It must be different from all previously uploaded package names. |
| Path to the model configuration JSON file |
| Path to the dataset configuration JSON file |
| Number of samples to propagate through the network at a time |
| Either 'classification' or 'regression'. Set it to 'regression' for a numeric target and 'classification' for a categorical target. |
| If set to If set to |
| A description of the model, up to a maximum of 1024 characters. This description also appears in the integrate.ai web portal. |
This method tests a custom model by creating the model based on the custom model configuration (JSON file) and then training it with one epoch locally. If the model fails the test, it cannot be uploaded.
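Some of the constraints in the table above can be checked locally before calling upload_model. The following sketch is a hypothetical pre-flight check and not part of the integrate.ai SDK. The function name and all parameter names are placeholders chosen for illustration and may not match the actual upload_model argument names.

```python
from pathlib import Path

ALLOWED_TASKS = {"classification", "regression"}
MAX_DESCRIPTION_LENGTH = 1024  # limit stated in the argument table


def preflight_problems(package_path, model_config_path, task, description=""):
    """Return a list of conflicts with the documented upload constraints.

    All parameter names are illustrative placeholders.
    """
    problems = []
    if not Path(package_path).is_dir():
        problems.append(f"package folder not found: {package_path}")
    if not Path(model_config_path).is_file():
        problems.append(f"model configuration not found: {model_config_path}")
    if task not in ALLOWED_TASKS:
        problems.append(f"task must be one of {sorted(ALLOWED_TASKS)}, got {task!r}")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description exceeds 1024 characters")
    return problems
```

An empty list means none of these particular checks failed; the local one-epoch test performed by upload_model still decides whether the model can be uploaded.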
When starting a session with your custom model, make sure you specify the correct package_name, model_config, and data_config file names. For details, see create_fl_session in the API documentation.