Glossary

This page lists terms and definitions commonly used in this documentation.

Active Party - In VFL, the party that owns the labels. The active party might also contribute data.

Aggregator - the integrate.ai cloud server that collects and aggregates model parameters from clients. Also known as the central server.

Central server - the integrate.ai cloud server. Also known as the aggregator. The central server does not collect or host datasets.

Client - the integrate.ai client software package.

Data custodian - the user in charge of the dataset or data silo. May or may not also be a machine learning scientist.

Dataset or data silo - a single unique collection of data.

Differential privacy - a technique that adds noise to the model during local training to reduce the possibility that the model can be used to re-identify individual data points.
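As an illustration of the idea, the sketch below clips a gradient vector and adds Gaussian noise to it before the update is applied. This is a generic pattern and not necessarily how integrate.ai implements differential privacy; the function name clip_and_noise and the parameters clip_norm and noise_multiplier are assumptions made for this example.

```python
import math
import random

def clip_and_noise(gradient, clip_norm, noise_multiplier, rng):
    """Clip a gradient vector to clip_norm, then add Gaussian noise.

    Hypothetical helper for illustration only.
    """
    norm = math.sqrt(sum(g * g for g in gradient))
    # Scale the vector down only when its norm exceeds clip_norm.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [g * scale for g in gradient]
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    return [g + rng.gauss(0.0, sigma) for g in clipped]

rng = random.Random(0)  # seeded only to make the example repeatable
noisy = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_multiplier=0.5, rng=rng)
```

A larger noise_multiplier gives stronger protection against re-identification, usually at some cost in model accuracy.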

Epoch - one cycle or iteration of training on a complete dataset. In integrate.ai terms, one epoch is one round.

Federated machine learning - a machine learning technique in which each client trains the model locally on its own dataset, and only the resulting model parameters are aggregated on a central server.
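The aggregation step can be sketched as a weighted average of the parameters reported by each client, with weights proportional to local dataset size. This glossary does not state which aggregation rule the server uses, so the weighted average here is an assumption (it is the rule commonly known as federated averaging), and federated_average is a hypothetical name.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local dataset size.

    Assumed aggregation rule for illustration; only parameters are passed
    in, never the underlying datasets.
    """
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

# Two clients with 100 and 300 local samples respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [100, 300])
```

The client with more samples pulls the aggregated parameters closer to its own values.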

Feedforward Neural Network (FFN) - a type of neural network in which information flows through the nodes in a single direction (forward).

Examples of use cases include:

  • Classification tasks like image recognition or churn conversion prediction.

  • Regression tasks like forecasting revenues and expenses, or determining the relationship between drug dosage and blood pressure.
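The single-direction flow described for a feedforward network can be sketched as a forward pass through a list of layers. The layer sizes, weights, and the choice of ReLU activation below are illustrative assumptions, not values taken from the product.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass inputs through each layer in order; nothing flows backward."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        # Apply the non-linearity on hidden layers only, not on the output.
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations

layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # hidden layer: 2 inputs, 2 units
    ([[1.0, 2.0]], [0.1]),                    # output layer: 2 inputs, 1 unit
]
output = forward([2.0, 1.0], layers)
```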

Gradient Boosted Models (GBM) - a model class that builds predictive models by using three elements: a loss function, weak learners, and an additive model where trees are added one at a time, and existing trees in the model are not changed.
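The three elements named above can be seen in a minimal sketch that assumes a squared-error loss, one-feature decision stumps as the weak learners, and a fixed learning rate. All function names are hypothetical, and production GBM implementations are considerably more elaborate.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split with the lowest squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict(base, stumps, learning_rate, x):
    """Additive model: the base value plus the scaled output of every stump."""
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total

def fit_gbm(xs, ys, n_trees, learning_rate):
    """Add stumps one at a time; stumps already in the model are not changed."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - predict(base, stumps, learning_rate, x)
                     for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return base, stumps

xs, ys = [1.0, 2.0, 3.0, 4.0], [1.0, 1.2, 3.0, 3.2]
base, stumps = fit_gbm(xs, ys, n_trees=20, learning_rate=0.5)
```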

Generalized Linear Models (GLM) - a model class that supports a variety of regression models. Examples include linear regression, logistic regression, Poisson regression, gamma regression, and inverse Gaussian regression models. We also support regularizing the model coefficients with the elastic net penalty.

Examples of use cases include:

  • Agriculture / weather modeling: number of rain events per year, amount of rainfall per event, total rainfall per year

  • Risk modeling / insurance policy pricing: number of claim events / policyholder per year, cost per event, total cost per policyholder per year

  • Predictive maintenance: number of production interruption events per year, duration of interruption, total interruption time per year
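To make the GLM terms concrete, the sketch below shows how a linear predictor is mapped to a predicted mean through an inverse link function, and how an elastic net penalty is computed from the coefficients. The link choices and the penalty parameterization (lam for overall strength, alpha for the L1/L2 mix) follow a common convention and are assumptions here, not a statement of the product's exact formulation.

```python
import math

def linear_predictor(coefficients, intercept, features):
    """eta = intercept + sum of coefficient * feature."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

# Inverse link functions map the linear predictor to the predicted mean.
INVERSE_LINKS = {
    "identity": lambda eta: eta,                        # linear regression
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),  # logistic regression
    "log": lambda eta: math.exp(eta),                   # Poisson regression
}

def elastic_net_penalty(coefficients, lam, alpha):
    """Mix of L1 and L2 penalties; alpha=1 is pure L1, alpha=0 is pure L2."""
    l1 = sum(abs(c) for c in coefficients)
    l2 = sum(c * c for c in coefficients)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

eta = linear_predictor([0.2, -0.1], intercept=0.5, features=[1.0, 3.0])
expected_count = INVERSE_LINKS["log"](eta)  # e.g. claim events per year
penalty = elastic_net_penalty([0.2, -0.1], lam=0.1, alpha=0.5)
```

In the use cases listed above, event counts per year are the kind of target typically modeled with Poisson regression, while positive continuous amounts such as cost per event are typical targets for gamma regression.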

HFL - Horizontal federated learning. Also known as sample-based federated learning.

Machine Learning (Data) Scientist - the user most often responsible for training the model. May or may not be a custodian for one or more datasets.

Model - a file that has been trained to recognize certain types of patterns.

Node - integrate.ai term for a single dataset associated with a single data collector or machine. A dataset and its machine together form a node.

Passive Party - In VFL, a party that contributes data only and does not own the labels.

PET - Privacy Enhancing Technology

Private Record Linkage (PRL) -

Round - integrate.ai term for one epoch of training with a complete dataset.

Session - integrate.ai term for the time period in which rounds of model training are being performed.

Training - the process of generating a model (file) that can recognize patterns in datasets.

VFL - Vertical federated learning. A federated learning setting in which multiple parties, each holding different features for the same set of users, jointly train machine learning models without sharing their data or model parameters.
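The difference between HFL and VFL comes down to how the data is partitioned across parties, which the toy sketch below illustrates. The records, column names, and helper functions are invented for this example.

```python
records = [
    {"id": 1, "age": 34, "income": 52000, "label": 0},
    {"id": 2, "age": 51, "income": 87000, "label": 1},
    {"id": 3, "age": 27, "income": 43000, "label": 0},
    {"id": 4, "age": 45, "income": 66000, "label": 1},
]

def horizontal_split(rows, n_parties):
    """HFL: each party holds different rows (samples) with the same columns."""
    return [rows[i::n_parties] for i in range(n_parties)]

def vertical_split(rows, column_groups, key="id"):
    """VFL: each party holds the same rows with different columns (features)."""
    return [
        [{key: row[key], **{c: row[c] for c in columns}} for row in rows]
        for columns in column_groups
    ]

hfl_parties = horizontal_split(records, 2)
# The active party owns the labels; the passive party contributes features only.
active, passive = vertical_split(records, [["age", "label"], ["income"]])
```

In the vertical case, both parties need a shared key (here the hypothetical id column) so that their rows can be aligned to the same users.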
