You can use the make_classification() function from sklearn.datasets to generate sample data. It lets you specify the number of samples, features, classes, and so on. It can also generate imbalanced data (via the `weights` parameter) and data with noisy labels (via the `flip_y` parameter).
Here is an example showing how to use this function. See the function's page in the scikit-learn documentation for the full list of parameters.
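```python
from sklearn.datasets import make_classification


def generate_sample_data(sc, fc, nf):
    """Generate sample data using sklearn.

    sc: number of samples, fc: number of features, nf: fraction of noisy labels.
    """
    print("Generate sample ML data")
    X, y = make_classification(
        n_samples=sc,            # number of rows
        n_features=fc,           # total number of features
        n_informative=2,         # features that carry the class signal
        n_redundant=0,           # no linear combinations of informative features
        n_classes=2,             # binary classification
        flip_y=nf,               # fraction of labels assigned at random (noise)
        class_sep=0.5,           # smaller values make classes harder to separate
        n_clusters_per_class=1,
        random_state=4,          # fixed seed for reproducible output
    )
    return X, y


if __name__ == '__main__':
    noise_fraction_in_data = 0
    sample_count = 100000
    feature_count = 1000
    X_all, y_all = generate_sample_data(sample_count, feature_count, noise_fraction_in_data)
```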
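To illustrate the imbalanced and noisy options mentioned above, here is a small sketch using `weights` and `flip_y` together. The sample size, feature count, and the 90/10 split are arbitrary values chosen for illustration, not recommendations. Note that `flip_y` reassigns labels at random (a reassigned label can land on its original class), so the class counts will not match `weights` exactly.

```python
from collections import Counter

from sklearn.datasets import make_classification

# weights: proportion of samples assigned to each class (about 90% vs 10% here).
# flip_y: fraction of samples whose label is assigned at random (label noise).
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=2,
    n_redundant=0,
    n_classes=2,
    n_clusters_per_class=1,
    weights=[0.9, 0.1],
    flip_y=0.05,
    random_state=4,
)

print(X.shape)     # (1000, 20)
print(Counter(y))  # class counts, heavily skewed toward class 0
```

Checking the class counts with `Counter` is a quick way to confirm the imbalance before training a model.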