Solution Development
Configuration > Model Parameters

Intended audience: Data scientists, developers, administrators

AO Platform: 4.3

Overview

[Image: image-20230421-091625.png]

Properties

Label

Description

Training

The percentage of the Training Data used to train the Models. The reported results include metrics based on this percentage.

Testing

This percentage of the Training Data will be used for testing the Models.

Validation

This percentage of the Training Data will be used for validating the Models.
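
As an illustration of the three percentages above, here is a minimal sketch in plain Python. It assumes the Training, Testing, and Validation percentages sum to 100 and partition the data without overlap; this page does not state how the platform performs the split. The function name `split_by_percent` and its parameters are hypothetical.

```python
import random


def split_by_percent(rows, train_pct, test_pct, validation_pct, seed=42):
    """Shuffle rows and cut them into train/test/validation lists.

    Assumption: the three percentages sum to 100.
    """
    if train_pct + test_pct + validation_pct != 100:
        raise ValueError("percentages are assumed to sum to 100")

    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable

    n_train = len(shuffled) * train_pct // 100
    n_test = len(shuffled) * test_pct // 100

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation


train, test, validation = split_by_percent(range(100), 70, 20, 10)
print(len(train), len(test), len(validation))  # 70 20 10
```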

Target

Use this dropdown to select the target field from the data source for the model.

Feature Importance Type

The technique used to assess how strongly each feature in the data relates to the Target. The dropdown includes the following options:

  • Information Value

  • Permutations Importance

  • Tree

  • Weight of Evidence
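
To make two of these options concrete, the sketch below computes Weight of Evidence (WoE) and Information Value (IV) for one categorical feature against a binary target, using the common definitions WoE = ln(share of non-events / share of events) and IV = sum of (share of non-events - share of events) × WoE. Sign conventions for WoE vary between tools, and the platform's exact calculation is not documented here, so treat this as an assumption. The function name `woe_and_iv` is hypothetical.

```python
import math
from collections import Counter


def woe_and_iv(feature_values, target_values):
    """Return ({category: WoE}, IV) for a categorical feature and a 0/1 target."""
    pairs = list(zip(feature_values, target_values))
    events = Counter(f for f, t in pairs if t == 1)
    non_events = Counter(f for f, t in pairs if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    woe = {}
    iv = 0.0
    for category in sorted(set(feature_values)):
        if events[category] == 0 or non_events[category] == 0:
            # Real tools usually smooth or merge such categories; this sketch does not.
            raise ValueError(f"category {category!r} lacks events or non-events")
        share_events = events[category] / total_events
        share_non_events = non_events[category] / total_non_events
        woe[category] = math.log(share_non_events / share_events)
        iv += (share_non_events - share_events) * woe[category]
    return woe, iv


feature = ["a", "a", "a", "a", "b", "b", "b", "b"]
target = [1, 1, 1, 0, 1, 0, 0, 0]
woe, iv = woe_and_iv(feature, target)
print(woe, iv)
```

A higher IV indicates that the feature separates the two target classes more strongly.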

Stratified Shuffle

Provides train/test indices for splitting the data into train and test sets. This cross-validation object combines StratifiedKFold and ShuffleSplit and returns stratified, randomized folds. The folds preserve the percentage of samples for each class.
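
The class names in this description match scikit-learn, which offers this behavior as `StratifiedShuffleSplit`; whether the platform calls that class directly is an assumption. The idea itself can be sketched with the standard library alone: shuffle the indices of each class separately, then take the same fraction of each class for the test set. The function name `stratified_shuffle_split` and the `test_fraction` parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_shuffle_split(labels, test_fraction=0.25, seed=0):
    """Return (train_indices, test_indices) that preserve class proportions."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # randomize within the class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


labels = ["yes"] * 20 + ["no"] * 80
train_idx, test_idx = stratified_shuffle_split(labels, test_fraction=0.25)
yes_in_test = sum(labels[i] == "yes" for i in test_idx)
print(len(train_idx), len(test_idx), yes_in_test)  # 75 25 5
```

The 20/80 class ratio of the full data set is kept in both the train and the test indices.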

Set Random Seed

The seed is the base value that a pseudo-random number generator (PRNG) uses to produce its sequence of numbers. Numbers generated by Python’s random module are not truly random; they are pseudo-random, i.e., deterministic: the same seed always produces the same sequence. Setting a fixed seed makes results that depend on random numbers repeatable across runs.
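
A short illustration of this determinism with Python's `random` module: two generators created with the same seed yield the same values. The helper `draw` is a hypothetical name used only for this example.

```python
import random


def draw(seed, count=3):
    """Return `count` pseudo-random integers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator with its own seed
    return [rng.randint(1, 100) for _ in range(count)]


first = draw(seed=42)
second = draw(seed=42)
print(first == second)  # True: same seed, same sequence
```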

nFolds / CV

K-fold cross-validation validates a model internally: it estimates model performance without sacrificing a separate validation split. It also avoids the statistical risk of relying on a single validation split, which might be a “lucky” one, especially for imbalanced data. Good values for nFolds are generally from 5 to 10, but keep in mind that higher values result in higher computational cost.
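
The sketch below shows how K-fold index generation works in principle: the samples are cut into nFolds contiguous blocks, and each block serves once as the validation set while the rest is used for training. It does not shuffle or stratify, and it is not the platform's implementation; the name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, n_folds=5))
for train_part, validation_part in folds:
    print(validation_part, len(train_part))
```

Every sample appears in exactly one validation block, so the model is trained and scored nFolds times, which is why larger values cost more compute.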

Data Sampling

An imbalanced classification problem arises when the class distribution of the training data is skewed. One approach to this challenge is random resampling. There are two main ways to perform random resampling, both of which have pros and cons:

  • Over Sampling: duplicating samples from the minority class.

  • Under Sampling: deleting samples from the majority class.

Both over sampling and under sampling deliberately introduce a bias that selects more samples from one class than from another, to compensate for an imbalance that is either already present in the data or likely to develop if a purely random sample were taken.
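
Both resampling strategies can be sketched in a few lines of standard-library Python. This is an illustration under the assumption that classes are balanced to the size of the largest class (over sampling) or the smallest class (under sampling); the platform's actual target ratio is not stated on this page. The function names and the `label_of` parameter are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(rows, label_of):
    """Group rows into a dict keyed by class label."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups


def random_oversample(rows, label_of, seed=0):
    """Duplicate random rows of smaller classes until all match the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, label_of)
    target = max(len(group) for group in groups.values())
    result = []
    for group in groups.values():
        result.extend(group)
        result.extend(rng.choices(group, k=target - len(group)))  # with replacement
    return result


def random_undersample(rows, label_of, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, label_of)
    target = min(len(group) for group in groups.values())
    result = []
    for group in groups.values():
        result.extend(rng.sample(group, target))  # without replacement
    return result


rows = [(i, "majority") for i in range(90)] + [(i, "minority") for i in range(90, 100)]
label_of = lambda row: row[1]

over = Counter(label_of(r) for r in random_oversample(rows, label_of))
under = Counter(label_of(r) for r in random_undersample(rows, label_of))
print(over["majority"], over["minority"])    # 90 90
print(under["majority"], under["minority"])  # 10 10
```

Over sampling keeps all original rows but repeats minority rows, which can encourage overfitting; under sampling discards majority rows and therefore some information.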

Data Sampling Field

Select the field for which Data Sampling will be executed.





