Imputing noisy or missing feature values is still an open research question. However, several established methods can be used to fill in missing values. The scikit-learn library (sklearn) provides both univariate and multivariate imputation in its sklearn.impute module.
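For the multivariate case, a minimal sketch is shown here using KNNImputer, which is one of the multivariate imputers in sklearn.impute. The toy matrix X and the choice of n_neighbors=2 are assumptions made for illustration only. Each missing entry is filled with the average of that feature over the nearest rows that have it.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data (assumed for illustration); np.nan marks the missing entries.
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# Fill each missing value with the mean of that feature over the
# 2 nearest rows, where distance ignores missing coordinates.
knn_imp = KNNImputer(n_neighbors=2)
X_filled = knn_imp.fit_transform(X)

print(X_filled)
```

Unlike the univariate approach, the value imputed for a row depends on the other features of that row, so two rows missing the same column can receive different values.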

Here is an example using the univariate feature imputation method. Missing values can be imputed with a provided constant value, or using the statistics (mean, median, or most frequent) of each column in which the missing values are located.

>>> import numpy as np
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values=np.nan, strategy='mean')
>>> X_train = np.array([[4, 2, 3], [6, 1, 1], [7, 6, 5], [4, 9, 10]])
>>> X_train
array([[ 4,  2,  3],
       [ 6,  1,  1],
       [ 7,  6,  5],
       [ 4,  9, 10]])
>>> X_test = np.array([[np.nan, 2, 3], [6, np.nan, 1], [7, 6, 5], [4, 9, np.nan]])
>>> X_test
array([[nan,  2.,  3.],
       [ 6., nan,  1.],
       [ 7.,  6.,  5.],
       [ 4.,  9., nan]])
>>> imp.fit(X_train)
SimpleImputer()
>>> imp.transform(X_test)
array([[5.25, 2.  , 3.  ],
       [6.  , 4.5 , 1.  ],
       [7.  , 6.  , 5.  ],
       [4.  , 9.  , 4.75]])
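
The other univariate strategies follow the same fit/transform pattern. Here is a short sketch that reuses the X_train and X_test arrays from the example and swaps in the median, most_frequent, and constant strategies. The fill value of 0 for the constant strategy is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[4, 2, 3], [6, 1, 1], [7, 6, 5], [4, 9, 10]])
X_test = np.array([[np.nan, 2, 3], [6, np.nan, 1], [7, 6, 5], [4, 9, np.nan]])

# Median of each training column replaces the missing entries.
median_imp = SimpleImputer(strategy="median").fit(X_train)
X_median = median_imp.transform(X_test)

# Most frequent training value of each column replaces the missing entries.
mode_imp = SimpleImputer(strategy="most_frequent").fit(X_train)
X_mode = mode_imp.transform(X_test)

# A fixed constant (0 here, chosen arbitrarily) replaces the missing entries.
const_imp = SimpleImputer(strategy="constant", fill_value=0).fit(X_train)
X_const = const_imp.transform(X_test)

# The per-column fill values learned during fit are stored in statistics_.
print(median_imp.statistics_)
print(X_median)
print(X_mode)
print(X_const)
```

In each case the fill values are computed from X_train only and then applied to X_test, so no information from the test data leaks into the imputer.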