Merge pull request #65 from sgbaird/tutorial-updates
Tutorial updates
sgbaird authored Sep 5, 2024
2 parents cc86829 + 122e50a commit a00e987
Showing 810 changed files with 4,577 additions and 6,405 deletions.
File renamed without changes
Binary file added docs/_static/thumbnails/concepts/multitask.png
File renamed without changes
Binary file added docs/_static/thumbnails/tutorials/multitask.png
File renamed without changes
1 change: 1 addition & 0 deletions docs/concepts.md
@@ -6,4 +6,5 @@
curriculum/concepts/sobo-vs-mobo/sobo-vs-mobo.md
curriculum/concepts/freq-vs-bayes/freq-vs-bayes.md
curriculum/concepts/batch/single-vs-batch.md
curriculum/concepts/multitask/multitask.md
```
16 changes: 10 additions & 6 deletions docs/conf.py
@@ -218,12 +218,16 @@
html_static_path = ["_static"]

nbsphinx_thumbnails = {
"curriculum/tutorials/sobo/sobo-tutorial": "_static/thumbnails/sobo-tutorial-thumbnail.jpg",
"curriculum/tutorials/mobo/mobo-tutorial": "_static/thumbnails/mobo-tutorial-thumbnail.jpg",
"curriculum/tutorials/batch/batch-bo-tutorial": "_static/thumbnails/batch_tutorial_thumbnail.png",
"curriculum/concepts/sobo-vs-mobo/sobo-vs-mobo": "_static/thumbnails/SOBOMOBO_concept_thumbnail.png",
"curriculum/concepts/freq-vs-bayes/freq-vs-bayes": "_static/thumbnails/FullyBayesian_concept_thumbnail.png",
"curriculum/concepts/batch/single-vs-batch": "_static/thumbnails/BatchBO_concept_thumbnail.png",
"curriculum/tutorials/sobo/sobo": "_static/thumbnails/tutorials/sobo.jpg",
"curriculum/tutorials/mobo/mobo": "_static/thumbnails/tutorials/mobo.jpg",
"curriculum/tutorials/batch/batch-fullybayesian": "_static/thumbnails/tutorials/batch.png",
"curriculum/tutorials/featurization/featurization": "_static/thumbnails/tutorials/featurization.png",
"curriculum/tutorials/multitask/multitask": "_static/thumbnails/tutorials/multitask.png",
"curriculum/tutorials/benchmarking/benchmarking": "_static/thumbnails/tutorials/benchmarking.png",
"curriculum/concepts/sobo-vs-mobo/sobo-vs-mobo": "_static/thumbnails/concepts/sobo-mobo.png",
"curriculum/concepts/freq-vs-bayes/freq-vs-bayes": "_static/thumbnails/concepts/fully-bayesian.png",
"curriculum/concepts/batch/single-vs-batch": "_static/thumbnails/concepts/batch.png",
"curriculum/concepts/multitask/multitask": "_static/thumbnails/concepts/multitask.png",
}

# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
19 changes: 17 additions & 2 deletions docs/curriculum/concepts/batch/single-vs-batch.md
@@ -4,9 +4,15 @@ Many optimization tasks permit experiments to be run in parallel such that obser

![](batch-choices.png)

Ideally, a set of *q* points is selected such that their joint expected improvement is maximized. This is denoted mathematically in the equation below:
Ideally, a set of *q* points is selected such that their joint expected improvement is maximized. Recall that the expected value of a distribution (for GPs, a Gaussian) is its average value, and that the expected improvement of a given point is the expected value of the portion of the distribution that lies beyond the best observed value with respect to the optimization objective. This is shown graphically in the figure below.

$$qEI(X) = E\left[\textrm{max}(\textrm{max}(f(x_1), f(x_2),..., f(x_q)) - f(x^*), 0)\right]$$
![](ExpectedImprovement.png)

The batch formulation is then an extension of this to several points, which form a multivariate distribution from which the expected improvement is maximized. This is denoted mathematically in the equation below:

$qEI(X) = E\left[\textrm{max}(\textrm{max}(f(x_1), f(x_2),..., f(x_q)) - f(x^*), 0)\right]$

Here $E$ denotes the expectation, $f(x_n)$ denotes the Gaussian process function value at each point $x_n$ in the batch, and $x^*$ denotes the best value found to date.

Finding the optimal joint expected improvement is computationally difficult and typically requires the use of Monte Carlo estimation methods. This estimation has become easier through the development of several notable algorithms, and trivial to utilize thanks to the inclusion of efficient versions of these algorithms in state-of-the-art libraries like `Ax` and `BoTorch`. That said, a variety of alternative approaches have emerged within the literature that are less computationally demanding. These typically rely on "*fantasy models*," which utilize simulated outcomes derived from the current surrogate model predictions to preemptively update and refine the model at each selection of a batch point. Put another way, for each requested point in the batch, the model assumes an observation value at the optimal acquisition function value and refits the model before selecting the next point. Common assumption strategies include the 'Kriging believer,' which takes an optimistic view by assuming the function's mean at the point of interest, and the 'constant liar,' which assumes values pessimistically to safeguard against overestimation in pursuit of the optimization goal. Other approaches propose seeking iteratively lower modes of the acquisition function, penalize the acquisition function near already observed points, or maximize exploration beyond the optimal point. While more computationally efficient, these approaches show weaker empirical performance relative to joint expected improvement estimation.
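
To make the estimation concrete, below is a minimal NumPy sketch of the Monte Carlo approach, not the `Ax`/`BoTorch` implementation: given the GP posterior mean and covariance at a candidate batch of *q* points, joint samples are drawn from the resulting multivariate normal and the improvement over the incumbent is averaged. The function name and the toy inputs are illustrative.

```python
import numpy as np

def monte_carlo_qei(mu, cov, best_f, n_samples=100_000, seed=0):
    """Monte Carlo estimate of joint (batch) expected improvement.

    mu:      GP posterior mean at the q batch points, shape (q,)
    cov:     GP posterior covariance between the q points, shape (q, q)
    best_f:  best objective value observed so far (maximization)
    """
    rng = np.random.default_rng(seed)
    # Draw joint samples from the multivariate normal posterior over the batch
    samples = rng.multivariate_normal(mu, cov, size=n_samples)  # (n_samples, q)
    # qEI(X) = E[max(max_i f(x_i) - f*, 0)], estimated by averaging over samples
    improvement = np.maximum(samples.max(axis=1) - best_f, 0.0)
    return improvement.mean()
```

For *q* = 1 this reduces to ordinary expected improvement; adding a second, independent candidate can only increase the joint improvement, which is why a well-chosen batch dominates any of its individual members.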

@@ -21,3 +27,12 @@ This is both sensible and likely what many practitioners would apply under naive
## Which approach is right for your problem?

For tasks that allow experiments to be performed in parallel, batch optimization is generally preferred as it is more time and resource efficient compared with sequential optimization. That said, in the absence of per-observation model updates, it is likely some batch points will show relatively poor objective function performance. Poor performing observations may, however, improve model representation, resulting in better subsequent predictions. Sequential optimization allows the model to be updated with each observation, which potentially allows greater per trial improvement in model predictions. These advantages and disadvantages are often situation-dependent, and the parallelizability of a given task is often a better selection criterion for single vs. batch optimization.

> **Want to see it in action?**\
Check out our batch optimization tutorial where we apply it to optimizing corrosion resistant coatings.

## Additional Resources

Hunt N. Batch Bayesian optimization (Doctoral dissertation, Massachusetts Institute of Technology). [🔗](https://dspace.mit.edu/bitstream/handle/1721.1/128591/1220836868-MIT.pdf?sequence=1#page=32.20)

BoTorch Batch Optimization Tutorial [🔗](https://botorch.org/docs/batching)
9 changes: 5 additions & 4 deletions docs/curriculum/concepts/freq-vs-bayes/freq-vs-bayes.md
@@ -41,10 +41,11 @@ Looking at the above results, it might be tempting to choose the fully Bayesian

In the above examples it should be clear that the key advantage of going fully Bayesian is that it can provide more robust models when knowledge about the domain is extremely limited, data is scarce relative to the number of input dimensions, and observation noise is difficult to measure. For many problems, a standard "frequentist" GP will give equivalent optimization performance and should be the default unless a fully Bayesian approach can be justified.
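
The conceptual difference can be sketched in a few lines of NumPy, under simplifying assumptions: a 1-D GP with an RBF kernel, where the "frequentist" model predicts with a single point-estimate lengthscale while the fully Bayesian model averages its predictions over several lengthscale samples. The lengthscale values below are illustrative stand-ins for actual MCMC draws, and the helper names are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale):
    # Squared-exponential kernel between 1-D point sets a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)

def gp_posterior_mean(x_train, y_train, x_test, lengthscale, noise=1e-4):
    K = rbf_kernel(x_train, x_train, lengthscale) + noise * np.eye(len(x_train))
    return rbf_kernel(x_test, x_train, lengthscale) @ np.linalg.solve(K, y_train)

x_train = np.array([0.0, 0.3, 0.6, 1.0])
y_train = np.sin(2 * np.pi * x_train)
x_test = np.linspace(0.0, 1.0, 11)

# "Frequentist" GP: a single point estimate of the lengthscale (MLE/MAP)
mean_point = gp_posterior_mean(x_train, y_train, x_test, lengthscale=0.25)

# Fully Bayesian GP: average the prediction over hyperparameter samples
ls_samples = [0.15, 0.25, 0.4]
mean_bayes = np.mean(
    [gp_posterior_mean(x_train, y_train, x_test, ls) for ls in ls_samples], axis=0
)
```

With scarce data the averaged prediction is less sensitive to any single (possibly wrong) hyperparameter choice, which is the robustness argument made above; with ample data the samples concentrate and the two models converge.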

> **Want to see it in action?**\
Check out our batch optimization tutorial where we apply a fully Bayesian GP model to optimizing corrosion resistant coatings.

## Additional Resources

ML Tutorial: Gaussian Processes (Richard Turner)
- https://www.youtube.com/watch?v=92-98SYOdlY
ML Tutorial: Gaussian Processes (Richard Turner) [🔗](https://www.youtube.com/watch?v=92-98SYOdlY)

High-Dimensional Bayesian Optimization with SAASBO
- https://ax.dev/tutorials/saasbo.html
High-Dimensional Bayesian Optimization with SAASBO [🔗](https://ax.dev/tutorials/saasbo.html)
46 changes: 46 additions & 0 deletions docs/curriculum/concepts/multitask/multitask.md
@@ -0,0 +1,46 @@
# Multitask Bayesian Optimization

Optimization tasks occasionally overlap in their design spaces and outputs. Consider a scenario where a company has constructed twin chemical reactors at two facilities in different parts of the country. While the reactors are theoretically identical, they are likely to differ slightly in output due to inherent variations in construction and regional climate. Despite these differences, these devices are more alike than different, and learning the optimal set of parameters for one tells us something about the other. Multitask Bayesian optimization (MTBO) allows us to exploit these similarities and optimize both reactors in tandem, which proves much more efficient than optimizing each reactor independently.

The sharing of information between tasks is achieved through a unique covariance kernel design that models covariance between design points and between tasks. The mathematical structure of the kernel is provided below:

$$K((x,t),(x',t')) = K_t(t,t') \circ K_x(x,x')$$

> Note: The literature typically uses the Kronecker product in defining this kernel, but as the Ax implementation relies on the Hadamard product, it is expressed that way here.

Here $K$ is the covariance function that models the relations between points, $x$, in the design space and the relations between tasks, $t$. The symbol $\circ$ denotes the Hadamard product, which indicates element-wise multiplication of the covariance matrices. Practically, this kernel design allows the observations and model of one task to inform another via a multiplicative relationship.
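
The Hadamard construction can be sketched in a few lines of NumPy; this is a simplified illustration rather than the Ax kernel, and the function names are hypothetical. Each observation carries a design-space location and an integer task label; the task covariance matrix $K_t$ is indexed per observation pair and multiplied element-wise with the design-space kernel $K_x$.

```python
import numpy as np

def rbf_kernel(x, x2, lengthscale=0.3):
    # Design-space covariance K_x between 1-D point sets
    return np.exp(-0.5 * (x[:, None] - x2[None, :]) ** 2 / lengthscale**2)

def multitask_covariance(x, task_ids, K_t):
    """Hadamard-product multitask covariance over a set of observations.

    x:        design-space locations, shape (n,)
    task_ids: integer task label for each observation, shape (n,)
    K_t:      task covariance matrix, shape (n_tasks, n_tasks)
    """
    K_x = rbf_kernel(x, x)                     # covariance between design points
    K_task = K_t[np.ix_(task_ids, task_ids)]   # task covariance per observation pair
    return K_task * K_x                        # element-wise (Hadamard) product
```

With `K_t = [[1, 0.9], [0.9, 1]]`, cross-task entries are scaled by 0.9, so observations of one task strongly inform the other; a negative off-diagonal entry captures the inverted-task scenario discussed below.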

An example of this shared information is shown in the figure below. Here, two functions, A and B, are modeled with slight offsets such that their peaks are close to each other but not exactly aligned. Ten noisy measurements have been made of Task A, whereas only three noisy observations have been made of Task B. The multitask kernel allows the information observed in Task A to inform the Task B model, which allows Task B to extrapolate more accurately outside of its observed region.

![](simple_comparison.png)

As the kernel governing the relationship between tasks models the covariance between them, it can also account for situations where one task is the inverse of another. In the figure below, the Task B function is the exact inverse of Task A such that the peaks of Task A are Task B's troughs. Despite this inversion, the multitask kernel is able to model the relationship accurately.

![](inverse_comparison.png)

The above scenarios have been structured such that a task with many observations informs one with few; however, this isn't the only approach: tasks can also be explored simultaneously, with the insights from each task informing the other.

## Where Multitask Models go Wrong

While multitask kernels are powerful, care should be taken when applying them to tasks that differ in the design and/or output space. As the kernel is solely multiplicative, there is no mathematical mechanism to account for differences in the mean function value of each task. As such, the predicted values of the task models will converge to a common mean value under extrapolation. In the figure below, two tasks are modeled, with Task B having a higher mean output value. Within the observation region, the Task B model captures this shift, but it quickly collapses to the Task A observation mean farther from its observations. Such behavior can derail multitask optimization campaigns and reduce their efficiency.

![](mean_collapse.png)

It is worth noting that this issue can be fixed by modifying the kernel structure to model differences in task intercepts. However, this modification is beyond the scope of this concept document and the aims of Honegumi.

Multitask models also suffer when tasks differ significantly in their design spaces. Applying multitask models to design spaces with different topologies can point researchers in the wrong direction, especially when experimental observations are limited. The figure below shows the predictions of a multitask model for two completely different tasks. Notice that the information from Task A contributes little to the estimates for Task B and creates an inaccurate representation.

![](divergence.png)

## Is a Multitask GP Right For Your Problem?

There is increasing interest in leveraging historical data and shared information between tasks. However, the commonality between tasks isn't always clear. The optimal parameters of similar fabrication tools are likely similar, but a polymer formulation from the literature may differ significantly from what is optimal for your task of interest. Consider the extent to which two problems are correlated and whether that correlation is simple enough to be modeled with a multiplicative covariance function.

> **Want to see it in action?**\
Check out our multitask tutorial where we apply multitask optimization to the joint optimization of two ceramic slip systems.

## Additional Resources

Swersky K, Snoek J, Adams RP. Multi-task bayesian optimization. Advances in neural information processing systems. 2013. [🔗](https://proceedings.neurips.cc/paper/2013/hash/f33ba15effa5c10e873bf3842afb46a6-Abstract.html)

Bonilla EV, Chai K, Williams C. Multi-task Gaussian process prediction. Advances in neural information processing systems. 2007. [🔗](https://proceedings.neurips.cc/paper_files/paper/2007/hash/66368270ffd51418ec58bd793f2d9b1b-Abstract.html)
12 changes: 5 additions & 7 deletions docs/curriculum/concepts/sobo-vs-mobo/sobo-vs-mobo.md
@@ -28,15 +28,13 @@ These Pareto optimal solutions are identified by optimizing a metric referred to

In deciding between single or multi-objective optimization, it's important to consider the complexity of your goals and the trade-offs you're willing to navigate. Single-objective optimization is straightforward, which makes the optimization process simpler, more interpretable, and often faster. However, in omitting competing objectives you may oversimplify complex problems where multiple, often conflicting objectives must be balanced. Multi-objective optimization allows for the simultaneous consideration of several goals and the ability to learn the bounds on the tradeoffs between them. This, however, comes at a higher computational cost and requires more sophisticated decision-making processes for selecting the optimal solution. When other objectives, such as cost, can be directly computed, consider representing them in the form of a constraint rather than a separate objective.
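
Two of the building blocks above can be sketched compactly in NumPy: extracting the non-dominated (Pareto optimal) set from a collection of observations, and the 2-D hypervolume indicator, a common scalar measure of Pareto front quality. This is an illustrative sketch assuming maximization of both objectives, not the algorithm used by `Ax`, and the function names are hypothetical.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated points (maximizing every objective)."""
    pts = np.asarray(points, dtype=float)
    front = []
    for p in pts:
        # p is dominated if another point is >= in all objectives and > in one
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            front.append(p)
    return np.array(front)

def hypervolume_2d(front, ref):
    """Area dominated by a non-dominated 2-D front relative to a reference
    point that lies below and to the left of every front point."""
    f = front[np.argsort(-front[:, 0])]  # sort by objective 0, descending
    hv, prev_y = 0.0, ref[1]
    for x, y in f:  # sweep: each point adds a rectangle of newly dominated area
        hv += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return hv
```

For example, the points (2, 1) and (1, 2) dominate (0.5, 0.5); with reference point (0, 0), the two surviving rectangles overlap in the unit square, giving a dominated area of 2 + 2 - 1 = 3.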

> **Want to see them in action?**\
Check out our single and multi-objective optimization tutorials.

## Additional Resources

P. Frazier, A Tutorial on Bayesian Optimization
- https://arxiv.org/abs/1807.02811
A. Agnihotri, Exploring Bayesian Optimization [🔗](https://distill.pub/2020/bayesian-optimization/)

Ax Multi-Objective Optimization Tutorial
- https://ax.dev/tutorials/multiobjective_optimization.html
P. Frazier, A Tutorial on Bayesian Optimization [🔗](https://arxiv.org/abs/1807.02811)


Emmerich et al. A tutorial on multiobjective optimization: fundamentals and evolutionary methods
- https://link.springer.com/article/10.1007/s11047-018-9685-y
Ax Multi-Objective Optimization Tutorial [🔗](https://ax.dev/tutorials/multiobjective_optimization.html)