[FEATURE REQUEST]: Issues with Customizing Data Classes #2759
Comments
Hi @FriendLey, thanks for the feedback. I believe this may be simpler than you think. The base class Metric has a data_constructor attribute that a subclass can override:
MyData: Type[Data] = custom_data_class(
    column_data_types={"my_column": str}
)
class MyMetric(Metric):
    data_constructor: Type[Data] = MyData
    # you'll have to write one of these anyway
    def fetch_trial_data(
        self, trial: core.base_trial.BaseTrial, **kwargs: Any
    ) -> MetricFetchResult:
        # construct a df `my_df` with "my_column"
        return Ok(
            value=MyData(df=my_df)
        )
I'm a little less sure about saving and loading an experiment with a custom data type. Is that a concern of yours? If so, I can investigate. There's also the issue that our models would not be using your custom column.
It seems that the approach you suggested doesn't work. Code to reproduce:
from ax import (
    ChoiceParameter,
    ComparisonOp,
    Experiment,
    FixedParameter,
    Metric,
    Objective,
    OptimizationConfig,
    OrderConstraint,
    OutcomeConstraint,
    ParameterType,
    RangeParameter,
    SearchSpace,
    SumConstraint,
)
from ax.modelbridge.registry import Models
from ax.utils.notebook.plotting import init_notebook_plotting, render
init_notebook_plotting()
import pandas as pd
import numpy as np
from typing import Type
from ax import Data
from ax.core.data import BaseData, custom_data_class
from ax.utils.common.result import Err, Ok
MyData: Type[Data] = custom_data_class(
    column_data_types={
        **BaseData.COLUMN_DATA_TYPES,
        "p_value": float,
        "power": float,
    }
)
class BoothMetric(Metric):
    def fetch_trial_data(self, trial):
        records = []
        for arm_name, arm in trial.arms_by_name.items():
            params = arm.parameters
            records.append(
                {
                    "arm_name": arm_name,
                    "metric_name": self.name,
                    "trial_index": trial.index,
                    # in practice, the mean and sem will be looked up based on trial metadata
                    # but for this tutorial we will calculate them
                    "mean": (params["x1"] + 2 * params["x2"] - 7) ** 2
                    + (2 * params["x1"] + params["x2"] - 5) ** 2,
                    "sem": 0.0,
                    "p_value": 0.01,
                    "power": 0.8,
                }
            )
        return Ok(value=MyData(df=pd.DataFrame.from_records(records)))
    def is_available_while_running(self) -> bool:
        return True
search_space = SearchSpace(
    parameters=[
        RangeParameter(
            name=f"x{i}", parameter_type=ParameterType.FLOAT, lower=0.0, upper=1.0
        )
        for i in range(1, 3)
    ]
)
param_names = [f"x{i}" for i in range(1, 3)]
optimization_config = OptimizationConfig(
    objective=Objective(
        metric=BoothMetric(name="BoothMetric", lower_is_better=True),
        minimize=True,
    ),
)
from ax import Runner
class MyRunner(Runner):
    def run(self, trial):
        trial_metadata = {"name": str(trial.index)}
        return trial_metadata
exp = Experiment(
    name="test_hartmann",
    search_space=search_space,
    optimization_config=optimization_config,
    runner=MyRunner(),
)
from ax.modelbridge.registry import Models
NUM_SOBOL_TRIALS = 5
NUM_BOTORCH_TRIALS = 2
print(f"Running Sobol initialization trials...")
sobol = Models.SOBOL(search_space=exp.search_space)
for i in range(NUM_SOBOL_TRIALS):
    # Produce a GeneratorRun from the model, which contains proposed arm(s) and other metadata
    generator_run = sobol.gen(n=1)
    # Add generator run to a trial to make it part of the experiment and evaluate arm(s) in it
    trial = exp.new_trial(generator_run=generator_run)
    # Start trial run to evaluate arm(s) in the trial
    trial.run()
    # Mark trial as completed to record when a trial run is completed
    # and enable fetching of data for metrics on the experiment
    # (by default, trials must be completed before metrics can fetch their data,
    # unless a metric is explicitly configured otherwise)
    trial.mark_completed()
for i in range(NUM_BOTORCH_TRIALS):
    print(
        f"Running BO trial {i + NUM_SOBOL_TRIALS + 1}/{NUM_SOBOL_TRIALS + NUM_BOTORCH_TRIALS}..."
    )
    # Reinitialize GP+EI model at each step with updated data.
    gpei = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data())
    generator_run = gpei.gen(n=1)
    trial = exp.new_trial(generator_run=generator_run)
    trial.run()
    trial.mark_completed()
print("Done!")
Error details:
[ERROR 09-13 11:35:35] ax.core.experiment: Encountered ValueError Columns ['p_value', 'power'] are not supported. while attaching results. Proceeding and returning Results fetched without attaching.
Running Sobol initialization trials...
Running BO trial 6/7...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[44], line 27
23 print(
24 f"Running BO trial {i + NUM_SOBOL_TRIALS + 1}/{NUM_SOBOL_TRIALS + NUM_BOTORCH_TRIALS}..."
25 )
26 # Reinitialize GP+EI model at each step with updated data.
---> 27 gpei = Models.BOTORCH_MODULAR(experiment=exp, data=exp.fetch_data())
28 generator_run = gpei.gen(n=1)
29 trial = exp.new_trial(generator_run=generator_run)
File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/experiment.py:572, in Experiment.fetch_data(self, metrics, combine_with_last_data, overwrite_existing_data, **kwargs)
560 results = self._lookup_or_fetch_trials_results(
561 trials=list(self.trials.values()),
562 metrics=metrics,
(...)
565 **kwargs,
566 )
568 base_metric_cls = (
569 MapMetric if self.default_data_constructor == MapData else Metric
570 )
--> 572 return base_metric_cls._unwrap_experiment_data_multi(
573 results=results,
574 )
File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/metric.py:586, in Metric._unwrap_experiment_data_multi(cls, results)
580 raise UnwrapError(errs) from (
581 exceptions[0] if len(exceptions) == 1 else Exception(exceptions)
582 )
584 data = [ok.ok for ok in oks]
585 return (
--> 586 cls.data_constructor.from_multiple_data(data=data)
587 if len(data) > 0
588 else cls.data_constructor()
589 )
File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/data.py:529, in Data.from_multiple_data(data, subset_metrics)
516 @staticmethod
517 def from_multiple_data(
518 data: Iterable[Data], subset_metrics: Optional[Iterable[str]] = None
519 ) -> Data:
520 """Combines multiple objects into one (with the concatenated
521 underlying dataframe).
522
(...)
527 in the underlying dataframe.
528 """
--> 529 data_out = Data.from_multiple(data=data)
530 if len(data_out.df.index) == 0:
531 return data_out
File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/data.py:284, in BaseData.from_multiple(cls, data)
281 if len(dfs) == 0:
282 return cls()
--> 284 return cls(df=pd.concat(dfs, axis=0, sort=True))
File /home/pengleizhao/miniconda3/envs/adaptive-py39/lib/python3.9/site-packages/ax/core/data.py:92, in BaseData.__init__(self, df, description)
90 extra_columns = columns - self.supported_columns()
91 if extra_columns:
---> 92 raise ValueError(f"Columns {list(extra_columns)} are not supported.")
93 df = df.dropna(axis=0, how="all").reset_index(drop=True)
94 df = self._safecast_df(df=df)
ValueError: Columns ['p_value', 'power'] are not supported.
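The traceback suggests why the custom class does not help here: the per-trial results are merged through Data.from_multiple_data, a static method that calls Data.from_multiple, so the merged frame is validated against the base class's columns rather than MyData's. The following is a minimal, library-free sketch of that mechanism. The classes below are toy stand-ins that only borrow Ax's names (rows are plain dicts instead of a DataFrame), and this custom_data_class is a hypothetical factory, not Ax's implementation.

```python
from typing import Dict, Iterable, List, Type


class BaseData:
    """Toy stand-in for a data container (NOT Ax's real class)."""

    COLUMN_DATA_TYPES: Dict[str, type] = {
        "arm_name": str,
        "metric_name": str,
        "mean": float,
        "sem": float,
        "trial_index": int,
    }

    def __init__(self, rows: Iterable[dict] = ()) -> None:
        self.rows: List[dict] = [dict(row) for row in rows]
        columns = {key for row in self.rows for key in row}
        extra = columns - set(self.COLUMN_DATA_TYPES)
        if extra:
            raise ValueError(f"Columns {sorted(extra)} are not supported.")

    @classmethod
    def from_multiple(cls, data: Iterable["BaseData"]) -> "BaseData":
        # Builds an instance of `cls`, so validation uses the columns
        # of whichever class the method is called on.
        return cls(rows=[row for d in data for row in d.rows])


class Data(BaseData):
    @staticmethod
    def from_multiple_data(data: Iterable[BaseData]) -> BaseData:
        # Hard-codes `Data`, mirroring the `Data.from_multiple(data=data)`
        # frame in the traceback above.
        return Data.from_multiple(data)


def custom_data_class(column_data_types: Dict[str, type]) -> Type[Data]:
    """Hypothetical factory: a Data subclass with a wider column set."""
    return type("CustomData", (Data,), {"COLUMN_DATA_TYPES": column_data_types})


MyData = custom_data_class(
    {**BaseData.COLUMN_DATA_TYPES, "p_value": float, "power": float}
)

row = {
    "arm_name": "0_0",
    "metric_name": "BoothMetric",
    "trial_index": 0,
    "mean": 1.0,
    "sem": 0.0,
    "p_value": 0.01,
    "power": 0.8,
}
per_trial = [MyData(rows=[row])]  # building the custom class itself succeeds

try:
    # Merging goes through the base class, so the extra columns are rejected.
    Data.from_multiple_data(per_trial)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Merging through the custom class keeps the extra columns.
merged = MyData.from_multiple(per_trial)
```

In this sketch the per-trial objects are built without error and only the merge step fails, which matches the point in the traceback where the ValueError is raised.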
@FriendLey I see what you're saying. We encode the data type as an int on the experiment (https://github.com/facebook/Ax/blob/main/ax/core/experiment.py#L124) so it's loadable; we try not to encode classes directly in the db. Then we use that enum to look up which data type to use (https://github.com/facebook/Ax/blob/main/ax/core/experiment.py#L589). Changing that would take a bit of a refactor. Alternatively, we might not need to raise if there are extra columns in data (https://github.com/facebook/Ax/blob/main/ax/core/data.py#L96).
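To illustrate the int-encoding scheme described above, here is a rough sketch of an enum-backed lookup and why a class created at runtime has no slot in it. All names and enum values below (DataType, DATA_TYPE_LOOKUP, encode, decode) are assumptions made for illustration and may not match Ax's actual code.

```python
from enum import Enum
from typing import Dict, Type


class Data:
    """Toy stand-in for the default data class."""


class MapData(Data):
    """Toy stand-in for a second built-in data class."""


class DataType(Enum):
    # Illustrative values only; not necessarily Ax's real encoding.
    DATA = 1
    MAP_DATA = 2


# Fixed registry: only classes listed here can be stored and reloaded.
DATA_TYPE_LOOKUP: Dict[DataType, Type[Data]] = {
    DataType.DATA: Data,
    DataType.MAP_DATA: MapData,
}


def encode(data_cls: Type[Data]) -> int:
    """Return the int persisted alongside the experiment for `data_cls`."""
    for data_type, registered_cls in DATA_TYPE_LOOKUP.items():
        if registered_cls is data_cls:
            return data_type.value
    raise KeyError(f"{data_cls.__name__} has no registered DataType")


def decode(value: int) -> Type[Data]:
    """Recover the data class from the persisted int."""
    return DATA_TYPE_LOOKUP[DataType(value)]


# A built-in class round-trips through its int code.
round_tripped = decode(encode(MapData))

# A class created at runtime has no entry in the registry.
CustomData = type("CustomData", (Data,), {})
try:
    encode(CustomData)
    custom_is_encodable = True
except KeyError:
    custom_is_encodable = False
```

Under this design, supporting user-defined classes would mean extending both the enum and the registry, which is one way to read the refactor mentioned above.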
If you wanted to implement this, I would recommend the path of just being more permissive with extra fields in Data and making sure they don't disappear when saved and reloaded.
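As a rough sketch of what "more permissive" could look like, the container below casts the columns it knows about, passes any other column through untouched, and round-trips through JSON to check that the extras survive a save and reload. PermissiveData, KNOWN_COLUMNS, and the JSON storage format are all hypothetical; this is not how Ax stores data, and rows are plain dicts rather than a DataFrame.

```python
import json
from typing import Dict, Iterable, List

# Columns the container understands; anything else is treated as an extra.
KNOWN_COLUMNS: Dict[str, type] = {
    "arm_name": str,
    "metric_name": str,
    "mean": float,
    "sem": float,
    "trial_index": int,
}


class PermissiveData:
    """Hypothetical container that tolerates unknown columns."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self.rows: List[dict] = []
        for row in rows:
            # Cast known columns to their declared type; keep extras as-is
            # instead of raising.
            self.rows.append(
                {
                    key: KNOWN_COLUMNS[key](value) if key in KNOWN_COLUMNS else value
                    for key, value in row.items()
                }
            )

    def extra_columns(self) -> List[str]:
        columns = {key for row in self.rows for key in row}
        return sorted(columns - set(KNOWN_COLUMNS))

    def to_json(self) -> str:
        # Serialize every column, including the extras.
        return json.dumps(self.rows)

    @classmethod
    def from_json(cls, payload: str) -> "PermissiveData":
        return cls(json.loads(payload))


original = PermissiveData(
    [
        {
            "arm_name": "0_0",
            "metric_name": "BoothMetric",
            "trial_index": 0,
            "mean": 1.0,
            "sem": 0.0,
            "p_value": 0.01,
            "power": 0.8,
        }
    ]
)
reloaded = PermissiveData.from_json(original.to_json())
```

The design choice here is that validation only governs the known columns, so the class to reload does not need to be encoded anywhere: the extras travel with the rows themselves.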
Motivation
In our current business scenario, the data df contains not only the columns specified by COLUMN_DATA_TYPES in the BaseData class but also many columns specific to our business scenarios. Although Ax provides a custom_data_class function for defining custom data classes, the Experiment class in ax.core.experiment and the Metric class in ax.core.metric do not support fetching data of a custom data type.
As a result, a user who customizes the data type has to rewrite the fetch_data function in Experiment and various related functions, solely to support returning some custom data columns.
Describe the solution you'd like to see implemented in Ax.
Is there currently a plan for refactoring this? If not, do you think it's necessary? I'm interested in working on this implementation.
Describe any alternatives you've considered to the above solution.
No response
Is this related to an existing issue in Ax or another repository? If so please include links to those Issues here.
No response