XGBoost fails when running on a large dataframe

Closed · Posted 3 years ago · Paid on delivery

I have a brand-new Dell XPS 8940 with a GPU, and I installed Anaconda on it. I was able to run XGBoost (using tree_method='gpu_hist') on a large dataframe. I then installed bayesian-optimization to help me optimize the XGBoost hyperparameters. I was able to optimize the hyperparameters on a small version of my dataframe, but I received the following error when I tried to optimize on my full dataframe.

OSError: [WinError -529697949] Windows Error 0xe06d7363
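For context: 0xE06D7363 is the code Windows uses for an unhandled C++ exception crossing a DLL boundary, and with tree_method='gpu_hist' on a dataframe of this size it often indicates the GPU (or host) running out of memory rather than a hardware fault. One common way to shrink the footprint before building the DMatrix is to downcast float64 columns to float32, which halves the memory of every float column. A minimal sketch, using a tiny illustrative dataframe (the column names are made up, not from the original code):

```python
import numpy as np
import pandas as pd

# Illustrative dataframe standing in for the real one (names are made up)
df = pd.DataFrame({'x1': np.random.rand(1000),
                   'x2': np.random.rand(1000),
                   'old_dep_var': np.random.rand(1000)})

before = df.memory_usage(deep=True).sum()

# Downcast float64 -> float32: halves the memory of every float column
float_cols = df.select_dtypes(include='float64').columns
df[float_cols] = df[float_cols].astype(np.float32)

after = df.memory_usage(deep=True).sum()
print(before, after)  # after should be roughly half of before (plus index overhead)
```

XGBoost stores its own float32 copy internally either way, but downcasting before the DMatrix is built avoids holding a float64 copy and a float32 copy in RAM at the same time.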

I need somebody who:

1) uses Anaconda,

2) uses XGBoost to run regression trees (i.e., regression, not classification),

3) uses bayesian-optimization to optimize hyperparameters for xgboost,

4) runs this in Windows 10, and

5) uses tree_method='gpu_hist'

to show me how to do it.

In less than one hour I can uninstall Anaconda, reinstall Anaconda, install all the necessary libraries, and demonstrate that my Python code to optimize hyperparameters works on a small dataframe but fails on a large one. I assume that somebody who has done this exact thing before could show me how to do it in less than 3 hours. My code is below.
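For reference, a clean environment for this stack can be sketched as follows. The package names are the real conda/PyPI names, but the Python pin and the choice of pip for the last two packages are assumptions, not tested versions:

```shell
# Create and activate a fresh conda environment (Anaconda Prompt, Windows 10)
conda create -n xgbopt python=3.8 pandas numpy scikit-learn -y
conda activate xgbopt

# The PyPI xgboost wheels for 64-bit Windows ship with GPU (gpu_hist) support;
# the bayesian-optimization package provides the bayes_opt module
pip install xgboost bayesian-optimization
```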

I've tested my RAM and my GPU and the hardware seems perfect.

I am posting this project at about 12:30 PM Central Daylight Time (i.e., Chicago time) on Monday and will check for replies sometime before 6:00 PM CDT. With any luck at all we can finish this project tonight or tomorrow. I hope that this will be a quick project for somebody that has done this before.

import pandas as pd
import numpy as np

number_of_rows = 100000000

print('reading data')
train_file = "D:\\reading_data/local_formatted/[login to view URL]"  # filename obscured in the posting
df = pd.read_csv(train_file, sep=',', nrows=number_of_rows)
# Replace +/- infinity with NaN, then drop any row containing NaN
df = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
# print(df)
print('done reading data')

import xgboost as xgb
from bayes_opt import BayesianOptimization
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df.drop(['Symbol', 'new_date', 'old_dep_var'], axis=1),
    df['old_dep_var'], test_size=0.25)
del df

dtrain = xgb.DMatrix(X_train, label=y_train)
del X_train
dtest = xgb.DMatrix(X_test)
del X_test

def xgb_evaluate(max_depth, gamma, colsample_bytree, subsample, eta, min_child_weight):
    params = {'objective': 'reg:linear',  # deprecated alias of 'reg:squarederror' in newer XGBoost
              'eval_metric': 'rmse',
              'max_depth': int(max_depth),
              'subsample': subsample,
              'eta': eta,
              'gamma': gamma,
              'colsample_bytree': colsample_bytree,
              'tree_method': 'gpu_hist',
              'min_child_weight': min_child_weight
              }
    # Used around 1000 boosting rounds in the full model
    cv_result = xgb.cv(params, dtrain, num_boost_round=100, nfold=3)
    # Bayesian optimization only knows how to maximize, not minimize,
    # so return the negative RMSE
    return -1.0 * cv_result['test-rmse-mean'].iloc[-1]

xgb_bo = BayesianOptimization(xgb_evaluate, {'max_depth': (2, 10),
                                             'gamma': (0, 10),
                                             'colsample_bytree': (0.1, 0.9),
                                             'subsample': (0.2, 0.95),
                                             'eta': (0.05, 0.5),
                                             'min_child_weight': (10, 10000)})

xgb_bo.maximize(init_points=5, n_iter=10, acq='ei')
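Since the code above works on a small dataframe and fails on the full one, a quick way to localize the failure is to binary-search over nrows for the first size that crashes; that threshold usually points straight at a memory limit. A generic sketch, with the real pipeline stubbed out by a hypothetical runs_ok() predicate (in practice it would call read_csv(..., nrows=n) and xgb.cv inside try/except and return False on the OSError):

```python
def find_failure_threshold(runs_ok, lo, hi):
    """Binary-search the largest row count n for which runs_ok(n) is True.

    Assumes runs_ok(lo) is True and runs_ok(hi) is False.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            lo = mid      # still fits: search the upper half
        else:
            hi = mid      # crashes: search the lower half
    return lo             # largest passing size; hi is the smallest failing one

# Hypothetical stand-in: pretend the pipeline crashes above 37,000,000 rows
limit = 37_000_000
print(find_failure_threshold(lambda n: n <= limit, 1, 100_000_000))  # → 37000000
```

At 100,000,000 rows the pipeline needs only about 12 tries to pin down the threshold, so the whole search costs a dozen runs of the small-scale test.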

Python Linux Ubuntu Software Architecture C++ Programming

Project ID: #27274631

About the project

2 proposals · Remote project · Active 3 years ago

2 freelancers bid an average of $170 for this job

parthabindu

Hi, I have more than 15 years of experience in this field, so I think I will be the best fit for your project. Please let us chat sometime to narrow down the understanding and finally proceed with the work. Best Regards

$240 USD in 7 days
(0 reviews)
0.0