Error Handling

Understand and address common errors that occur while using integrate.ai

When joining a Session

  1. Authentication is denied in the Client App and you cannot join ("401 Client Error: UNAUTHORIZED for url:...")

    Potential Cause: The token is invalid or has expired (tokens expire after 30 days)

    Resolution: Download a new token from the Management Console; see Step 1: Generate token

  2. Dataset validation error in the Client App and you cannot join ("Columns in data set: <dataset name> for session_id: <session id>, does not match the data schema [<predictor/target column(s)>, ... ]")

    Potential Cause: The dataset does not contain the target or predictor column(s)

    Resolution: Make sure that the predictor and target column(s) defined in the session's dataset configuration are present in your dataset(s), as in the sketch below
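
    To verify this locally before joining, a minimal sketch like the following (using pandas) can confirm that the expected columns are present. The file path and column names are hypothetical placeholders; substitute the values from your session's dataset configuration.

      import pandas as pd

      # Hypothetical path and schema; replace with the values from
      # your session's dataset configuration.
      DATASET_PATH = "my_dataset.csv"
      REQUIRED_COLUMNS = ["x1", "x2", "y"]  # predictor and target column(s)

      df = pd.read_csv(DATASET_PATH)
      missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
      if missing:
          print(f"Missing columns: {missing}")
      else:
          print("All required columns are present.")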

  3. Error message "CLI0001: Cannot join session '850e98de77'. Current session status: 'created'. Expected status: 'started'."

    Cause: The session has been created but not started, so it cannot be joined yet

    Resolution: Go back to the session details page, click the "Start" button, then repeat the steps to join the session

When Session training is in progress

  1. Authentication is denied and the client is excluded from the running training session, potentially stopping the session (if the number of remaining clients falls below the minimum)

    Potential Cause: The token has expired (tokens expire after 30 days)

    Resolution: Generate a new token from the UI and rejoin the session (with a new client ID); if the session stopped because the number of clients fell below the minimum, it should resume training

  2. Client App exits training with error "FLTrainer ERROR: Current run is terminating due to exception: default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object"

    Potential Cause: Predictor or target column(s) contain non-numerical values

    Resolution: Make sure that the predictor and target column(s) contain only numerical values; the sketch below shows one way to find offending entries
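
    To locate non-numeric entries before training, a hedged sketch along these lines can help; the path and column names are placeholders.

      import pandas as pd

      df = pd.read_csv("my_dataset.csv")   # placeholder path
      for col in ["x1", "x2", "y"]:        # your predictor/target columns
          # Coerce to numeric; entries that cannot be parsed become NaN.
          coerced = pd.to_numeric(df[col], errors="coerce")
          bad_rows = df.index[coerced.isna() & df[col].notna()]
          if len(bad_rows) > 0:
              print(f"Column '{col}' has non-numeric values at rows: {list(bad_rows)}")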

  3. Client App is terminating due to exception "Engine run is terminating due to exception: Input contains NaN, infinity or a value too large for dtype('float32')."

    Potential Cause: Predictor or target column(s) contain NULL or unsupported values

    Resolution: Make sure that the column(s) contain no NULL, infinite, or out-of-range values; see the sketch below
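
    To find NULL, infinite, or out-of-range values, a sketch like this can report them per column. It assumes the columns are already numeric (see the previous check); the path and column names are placeholders.

      import numpy as np
      import pandas as pd

      df = pd.read_csv("my_dataset.csv")   # placeholder path
      for col in ["x1", "x2", "y"]:        # your predictor/target columns
          values = df[col].astype("float64")
          n_null = int(values.isna().sum())
          n_inf = int(np.isinf(values).sum())
          # Values beyond the float32 range trigger the same error.
          n_big = int((values.abs() > np.finfo(np.float32).max).sum())
          if n_null or n_inf or n_big:
              print(f"{col}: {n_null} NULL, {n_inf} infinite, {n_big} too large for float32")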

  4. Client App crashes with a stack dump; the IntegrateFL API continues waiting for the client to (re)join

    Potential Cause: The Docker environment has insufficient memory to handle the data

    Resolution: Provision the Docker environment with sufficient memory (for example, by raising the memory limit under Docker Desktop's Settings > Resources, or with the --memory flag of docker run)

  5. Client Apps keep receiving 'Waiting for message from server' from the IntegrateFL API even though the minimum number of clients has been met

    Potential Causes: The IntegrateFL API timed out (currently set to 8 hours) or failed due to an unexpected error

    Resolution: If you are sure the minimum number of clients has successfully joined and the server run time has been less than 8 hours, contact IAI for support

When making predictions

  1. Dataset validation error in the Client App and you cannot predict ("Columns in data set: <dataset name> for session_id: <session id>, does not match the data schema [<predictor/target column(s)>, ... ]")

    Potential Cause: The dataset does not contain the required predictor column(s) specified in the dataset configuration

    Resolution: Make sure that the predictor column(s) required by the session's dataset configuration are present in your dataset(s); the pre-flight check after this list covers this case

  2. Client App terminates prediction with message "default_collate: batch must contain tensors, numpy arrays, numbers, dicts or lists; found object"

    Potential Cause: Predictor column(s) contain non-numerical values

    Resolution: Make sure that the predictor column(s) contain only numerical values; the pre-flight sketch below checks both columns and values
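
    Because prediction runs can hit both of the data issues above, a combined pre-flight check such as this hypothetical sketch can catch them before you submit the run. The path and column names are placeholders.

      import numpy as np
      import pandas as pd

      DATASET_PATH = "my_dataset.csv"    # placeholder path
      PREDICTOR_COLUMNS = ["x1", "x2"]   # from your dataset configuration

      df = pd.read_csv(DATASET_PATH)
      missing = [c for c in PREDICTOR_COLUMNS if c not in df.columns]
      assert not missing, f"Missing columns: {missing}"

      for col in PREDICTOR_COLUMNS:
          coerced = pd.to_numeric(df[col], errors="coerce")
          assert coerced.notna().all(), f"{col}: non-numeric or NULL values"
          assert np.isfinite(coerced).all(), f"{col}: infinite values"
      print("Dataset passed pre-flight checks.")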

When navigating the Management Console

  1. All or parts of the Management Console fail to load, or error messages are shown

    Potential Causes: Temporary connection error; IntegrateFL API errors

    Resolution: Reload the page; contact IAI if the error persists

Running a Custom Model

  1. Docker kills a training session after running an hfl --train command.

    Potential Cause: Docker memory settings are too low for the model to run; this is most likely to occur when running a model on image-based data

    Resolution: Go to the Docker settings and increase your memory limit, for example from the default 2 GB to 4 GB
