This repository has been archived by the owner on Nov 8, 2018. It is now read-only.

Keras and dist-keras results differ #81

Open
pooja9410 opened this issue Aug 29, 2018 · 0 comments

I am trying to build an LSTM model on time-series data. I use MinMaxScaler to rescale the features and the target variable, then reshape the data into 3D: [samples, timesteps, dimensions].
I then created a neural network model with three LSTM layers. After training, I calculate the R2 score on the test data.
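
For reference, the three steps described above (min-max scaling, reshaping to [samples, timesteps, dimensions], and the R2 score) can be sketched without any dependencies. This is only an illustrative sketch: the helper names `min_max_scale`, `to_3d`, and `r2_score` are hypothetical and are not part of Keras, dist-keras, or Spark. Running both pipelines' inputs and predictions through the same small helpers may help narrow down where the results start to differ.

```python
# Dependency-free sketch of the preprocessing and metric described above.
# All function names are hypothetical helpers, not Keras/dist-keras/Spark APIs.

def min_max_scale(values):
    """Scale a list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column has no range; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def to_3d(rows, timesteps):
    """Reshape flat feature rows into [samples, timesteps, dimensions]."""
    out = []
    for row in rows:
        if len(row) % timesteps != 0:
            raise ValueError("row length must be divisible by timesteps")
        dim = len(row) // timesteps
        # Cut each row into `timesteps` consecutive slices of length `dim`.
        out.append([row[t * dim:(t + 1) * dim] for t in range(timesteps)])
    return out


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot
```

With timesteps = 1, as in the code further down, `to_3d` simply wraps each feature row in one extra list level, so each sample has shape (1, nb_features).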

I did the same thing using dist-keras, but I am getting different results. Here is the dist-keras code:

mxscaler_f = MinMaxScaler(inputCol='features', outputCol="features_normalized")
mxscaler_model_f = mxscaler_f.fit(dataset)
dataset = mxscaler_model_f.transform(dataset)

mxscaler = MinMaxScaler(inputCol='target', outputCol="adjclose_min")
mxscaler_model = mxscaler.fit(dataset)
dataset = mxscaler_model.transform(dataset)

dataset = dataset.select("features_normalized", "label_index", "adjclose_min")
dataset.cache()

raw_dataset = dataset
nb_features = len(raw_dataset.select("features_normalized").take(1)[0]["features_normalized"])

timesteps = 1
dimension = nb_features
reshape_transformer = ReshapeTransformer("features_normalized", "matrix", (timesteps, dimension))
raw_dataset = reshape_transformer.transform(raw_dataset)

train_len = int(0.7 * raw_dataset.count())
training_set = sqlContext.createDataFrame(raw_dataset.head(train_len), raw_dataset.schema)
test_set = raw_dataset.subtract(training_set)

optimizer = 'adagrad'
loss = 'mse'
model = Sequential()
model.add(LSTM(80, input_shape=(1, nb_features), return_sequences=True))
model.add(LSTM(70, return_sequences=True))
model.add(LSTM(50, return_sequences=False))
model.add(Dense(1, kernel_initializer='uniform', activation='relu'))
trainer = SingleTrainer(keras_model=model, loss=loss, worker_optimizer=optimizer,
                        features_col="features_normalized", label_col="adjclose_min", num_epoch=20, batch_size=512)
trained_model = trainer.train(training_set)
test_set = test_set.select("matrix", "adjclose_min", "label_index")
predictor = ModelPredictor(keras_model=trained_model, features_col="matrix")
test_set = predictor.predict(test_set)
newone = test_set.rdd.map(extract).toDF(["adjclose_min","label_index","pred"])
evaluator = RegressionEvaluator(metricName='r2', predictionCol="pred", labelCol="adjclose_min")
score = evaluator.evaluate(newone)

What am I doing wrong?
