Curate and Manage Data

Curate datasets carefully so that outliers do not introduce noise into the model. Curation is done after a dataset has been created.

For more information, see Best Practices for PhysicsAI.

From the PhysicsAI ribbon, select the Manage Dataset tool.
Figure 1.


The Datasets dialog opens.

Identify Outliers

PhysicsAI automatically detects outliers in a dataset based on one or more responses or model properties.

Outliers are identified from the Z-score distribution: samples that fall in the 3-sigma tails (more than three standard deviations from the mean) are highlighted.

  1. Select a dataset and properties.
    Figure 2.


  2. Switch between the Table and Plot tabs to identify outliers in the tabular and graphical views.
  3. Compare data samples on multiple criteria by using side-by-side plots.
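To illustrate the 3-sigma rule described above, the following is a minimal sketch, not PhysicsAI's implementation. It assumes each sample is a dictionary of property values and uses the population standard deviation; the function names `zscore_outliers` and `flag_samples` and the property names are hypothetical. A sample is flagged if it lies in the tails for any of the selected properties.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose |z-score| exceeds the threshold.

    Assumption: population standard deviation; PhysicsAI may compute it differently.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so no sample can lie in the tails.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

def flag_samples(samples, properties, threshold=3.0):
    """Flag a sample if it is an outlier for at least one selected property."""
    flagged = set()
    for prop in properties:
        column = [sample[prop] for sample in samples]
        flagged.update(zscore_outliers(column, threshold))
    return sorted(flagged)

# Hypothetical dataset: 20 similar samples and one with an extreme response.
samples = [{"mass": 1.0, "max_disp": 2.0} for _ in range(20)]
samples.append({"mass": 1.0, "max_disp": 50.0})
print(flag_samples(samples, ["mass", "max_disp"]))  # [20]
```

With the population standard deviation, no value can have a Z-score larger than the square root of (n - 1). A dataset with ten or fewer samples therefore never produces a strict 3-sigma outlier under this definition.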

Edit Datasets

  1. In the tabular view, click to add files to the dataset.
    The Add files to Dataset dialog opens, where you can browse to and select the files to add to the dataset.
  2. Select files and click to remove data samples.
  3. Select files and click Move to to move them to other datasets.
    Note: Rotational or translational misalignment of the data samples is a potential source of noise. Translational misalignment can be corrected (see Train Models), but rotational misalignment cannot. To detect rotational outliers, use the dimensions of the bounding box and the coordinates of the center of gravity.
    Figure 3.


    Figure 4.
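The bounding-box and center-of-gravity check from the note above can be sketched as follows. This is a hypothetical illustration, not a PhysicsAI feature or API. It assumes each sample is available as a list of (x, y, z) node coordinates, uses an axis-aligned bounding box, and approximates the center of gravity with the unweighted mean of the node coordinates (a true center of gravity would weight by mass or volume). Each of the six features is then screened with the same 3-sigma rule. A rotated part changes the axis-aligned box extents, while a translated part keeps the extents and shifts only the center.

```python
from statistics import mean, pstdev

def bbox_dims(nodes):
    """Axis-aligned bounding-box extents (dx, dy, dz) of (x, y, z) nodes."""
    xs, ys, zs = zip(*nodes)
    return (max(xs) - min(xs), max(ys) - min(ys), max(zs) - min(zs))

def centroid(nodes):
    """Unweighted mean of node coordinates, a stand-in for the center of gravity."""
    xs, ys, zs = zip(*nodes)
    return (mean(xs), mean(ys), mean(zs))

def alignment_outliers(samples, threshold=3.0):
    """Indices of samples in the 3-sigma tails of any bbox or centroid feature."""
    features = [bbox_dims(nodes) + centroid(nodes) for nodes in samples]
    flagged = set()
    for column in zip(*features):  # one column per feature
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # feature is constant across samples
        flagged.update(i for i, v in enumerate(column) if abs(v - mu) / sigma > threshold)
    return sorted(flagged)

# Hypothetical data: 20 aligned 4 x 1 plates and one rotated 90 degrees about z.
plate = [(0, 0, 0), (4, 0, 0), (4, 1, 0), (0, 1, 0)]
rotated = [(0, 0, 0), (0, 4, 0), (-1, 4, 0), (-1, 0, 0)]
print(alignment_outliers([plate] * 20 + [rotated]))  # [20]
```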


Clone Datasets

Copy an entire dataset to preserve the original without repeating the dataset creation process.

  1. Click .
    The Copy Dataset dialog opens.
  2. Enter a name for the new dataset and click OK.
The new dataset is added to the Datasets dialog.

Import Datasets

  1. Click Import Dataset.
    The Import Dataset dialog opens.
  2. Enter a name for the dataset.
  3. Click and then browse to and select the .psdata file.
    Note: A .log file is also required.
  4. Click OK.
The dataset is added to the Datasets dialog.