The short answer to this question is "it depends on the data": no single value of scale_pos_weight will suit every dataset.
According to XGBoost's documentation, a typical value for a binary classification problem is
scale_pos_weight = number of negative class records / number of positive class records.
In your case, scale_pos_weight = number of class 0 records / number of class 1 records.
However, if your data is highly imbalanced, the above formula might not give you the best results. Sometimes sqrt(number of class 0 records / number of class 1 records) works better.
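For concreteness, here is a minimal sketch of both candidate values, assuming the labels live in a NumPy array y with classes 0 and 1 (the array and the 9:1 imbalance below are made up for illustration):

import numpy as np

# hypothetical labels: 900 negative (class 0) and 100 positive (class 1) records
y = np.array([0] * 900 + [1] * 100)

n_neg = np.sum(y == 0)
n_pos = np.sum(y == 1)

spw_ratio = n_neg / n_pos          # plain ratio -> 9.0
spw_sqrt = np.sqrt(n_neg / n_pos)  # square-root variant -> 3.0

print(spw_ratio, spw_sqrt)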
In my opinion, one should run GridSearchCV to find the optimal value of scale_pos_weight. Without it, when the number of class 0 records is much higher than the number of class 1 records, you tend to get poor recall [tp / (tp + fn), where tp is the number of true positives and fn the number of false negatives]. So, use recall as the scoring parameter in the grid search; it will then find the value of scale_pos_weight that returns the best recall.
Here is a template for the GridSearch code:
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# data: your feature matrix; label: your binary target vector (0/1)
# upper bound for the search: the plain class ratio
max_spw = int(np.sum(label == 0) / np.sum(label == 1))

model = xgb.XGBClassifier()
xgb_grid_params = {
    'scale_pos_weight': list(range(1, max_spw + 1, 5))
}

gs = GridSearchCV(model, param_grid=xgb_grid_params, scoring="recall", cv=5, verbose=1)
gs.fit(data, label)
print(gs.best_params_)
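If you want to sanity-check the tuned model afterwards, here is a minimal sketch; the train/test split and the X_train/X_test names are assumptions for illustration, and in practice you would tune on the training portion only:

from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# hypothetical hold-out split, stratified to preserve the class imbalance
X_train, X_test, y_train, y_test = train_test_split(
    data, label, stratify=label, random_state=42
)

gs.fit(X_train, y_train)  # GridSearchCV refits the best model on the full training set
y_pred = gs.best_estimator_.predict(X_test)
print(classification_report(y_test, y_pred))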