XGBoost fails when running on a large dataframe
$30-250 USD
Paid on delivery
I have a brand-new Dell XPS 8940 with a GPU, and I installed Anaconda on it. I was able to run XGBoost (using tree_method='gpu_hist') on a large dataframe. I then installed bayesian-optimization to help me tune the XGBoost hyperparameters. I was able to optimize the hyperparameters on a small version of my dataframe, but I received the following error when I tried to optimize on the full dataframe:
OSError: [WinError -529697949] Windows Error 0xe06d7363
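For context: 0xE06D7363 is the Windows status code for an unhandled Microsoft C++ exception, and when it surfaces from tree_method='gpu_hist' it is frequently a GPU out-of-memory condition thrown inside XGBoost's CUDA code rather than a hardware fault. One low-effort mitigation worth trying (a sketch on made-up data, not a guaranteed fix) is downcasting the features to float32 before building the DMatrix, which roughly halves the memory both the host and GPU copies need:

```python
import numpy as np
import pandas as pd

# Stand-in for the real training dataframe (hypothetical data)
df = pd.DataFrame(np.random.rand(1_000, 5), columns=list('abcde'))

# pandas parses floats as float64 by default; float32 halves the
# footprint of the DMatrix and of its copy on the GPU
float_cols = df.select_dtypes(include='float64').columns
df[float_cols] = df[float_cols].astype(np.float32)

print(df.dtypes.unique())  # every column is float32 now
```

If the job still dies after the downcast, the next lever is shrinking nrows until it fits, which at least confirms memory is the culprit.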
I need somebody that:
1) uses Anaconda,
2) uses XGBoost to run regression trees (i.e., regression, not classification),
3) uses bayesian-optimization to optimize hyperparameters for xgboost,
4) runs this in Windows 10, and
5) uses tree_method='gpu_hist'
to show me how to do it.
I can uninstall Anaconda, reinstall Anaconda, install all necessary libraries, and demonstrate that my (Python) code to optimize hyperparameters works on a small dataframe but fails on a large dataframe, all in less than one hour. I assume that somebody who has done this exact thing before could show me how to do it in less than 3 hours. My code is below.
I've tested my RAM and my GPU and the hardware seems perfect.
I am posting this project at about 12:30 PM Central Daylight Time (i.e., Chicago time) on Monday and will check for replies sometime before 6:00 PM CDT. With any luck at all we can finish this project tonight or tomorrow. I hope that this will be a quick project for somebody that has done this before.
import pandas as pd
import numpy as np
number_of_rows = 100000000
print('reading data')
train_file = "D:\\reading_data/local_formatted/[login to view URL]"
df = pd.read_csv(train_file, sep=',', nrows=number_of_rows)
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
# print(df)
print('done reading data')
import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df.drop(['Symbol', 'new_date', 'old_dep_var'], axis=1),
                                                    df['old_dep_var'], test_size=0.25)
del(df)
dtrain = xgb.DMatrix(X_train, label=y_train)
del(X_train)
dtest = xgb.DMatrix(X_test)
del(X_test)
def xgb_evaluate(max_depth, gamma, colsample_bytree, subsample, eta, min_child_weight):
    params = {'objective': 'reg:linear',
              'eval_metric': 'rmse',
              'max_depth': int(max_depth),
              'subsample': subsample,
              'eta': eta,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree,
              'tree_method': 'gpu_hist',
              'min_child_weight': int(min_child_weight)
              }
    # Used around 1000 boosting rounds in the full model
    cv_result = xgb.cv(params, dtrain, num_boost_round=100, nfold=3)
    # Bayesian optimization only knows how to maximize, not minimize, so return the negative RMSE
    return -1.0 * cv_result['test-rmse-mean'].iloc[-1]
xgb_bo = BayesianOptimization(xgb_evaluate, {'max_depth': (2, 10),
'gamma': (0, 10),
'colsample_bytree': (0.1, 0.9),
'subsample': (0.2, 0.95),
'eta': (0.05, 0.5),
'min_child_weight': (10, 10000)
})
xgb_bo.maximize(init_points=5, n_iter=10, acq='ei')
Project ID: #27274631
Project info
2 freelancers bid an average of $170 for this job
Hi, I have more than 15 years of experience in this field, so I think I will be the best fit for your project. Please let us chat sometime to narrow down the understanding and finally proceed with the work. Best Regards