Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue #2841] Make the robots.txt dynamic so that it can mutate per environment #3379

Merged
merged 5 commits into from
Jan 6, 2025

Conversation

mdragon
Copy link
Collaborator

@mdragon mdragon commented Jan 2, 2025

Summary

Fixes #2841

Time to review: 5 mins

Changes proposed

We need to make the robots.txt dynamic per environment so that we can ban crawling of the lower environments but still support nuanced rules for upper environments.

Context for reviewers

As a follow up change we will also make the Sitemap dynamic as well since we don't want to point to Sitemaps outside of Production. Our Sitemap isn't currently ready for being utilized so not including it for now.

Additional information

Tested this locally and confirmed that we do have a dynamic robots.txt that reacts to changes to the environment variable at runtime, not build time. Not sure if this will all hook together as expected when it tries to run in AWS, so having this test for "dev" temporarily until we prove it is all likely to work in Prod.

@mdragon mdragon force-pushed the mdragon/2841-programatic-robots-txt branch from add7af3 to 917576d Compare January 2, 2025 17:14
acouch
acouch previously approved these changes Jan 2, 2025
Copy link
Collaborator

@acouch acouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed this builds dynamic locally and has the right cache control headers so that it should be stored in the CDN (isn't dynamic for each bot).

export default function robots(): MetadataRoute.Robots {
return {
rules: [
process.env.AWS_ENVIRONMENT === "dev" // switching this temporarily to ensure that the variable is being set at AWS runtime as expected, will make it "prod" after confirming in Dev
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but all the other env vars being used by the Next app are sourced out of https://github.com/HHS/simpler-grants-gov/blob/main/frontend/src/constants/environments.ts. In some way this is to avoid the potentially expensive access to process.env any more than necessary, but Next seems to not have much of a problem with that, so this is more of an organization thing than anything else

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@acouch acouch Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: made this comment not seeing the update

process.env is defined on the C side an exposed as a getter to the runtime. Every access of process.env needs to drop down to C to get the value out - that is the performance penalty.

Yeah, not sure what the penalty is here in real life but should use the same pattern of the cached versions in environments.ts.

@mdragon mdragon requested review from doug-s-nava and acouch January 3, 2025 15:26
@@ -7,6 +7,8 @@ locals {
NEW_RELIC_ENABLED = "true"
# see https://github.com/newrelic/node-newrelic?tab=readme-ov-file#setup
NODE_OPTIONS = "-r newrelic"
# expose the current AWS Env to the FE Next Node Server at Runtime
AWS_ENVIRONMENT = var.environment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to call this AWS_ENVIRONMENT vs ENVIRONMENT? We would want to be able to test this locally, so the AWS_ prefix feels odd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought, though not strongly held, was that it really only matters in AWS, you don't have to specify it locally cause no robots will ever hit us there, as opposed to other Env Vars where stuff wouldn't work without specifying it even when running locally. I was also trying to sus-out a name that wouldn't get confused with other reasons we might specify Environment, like NODE_ENV, other concepts of Production like our "Production" build we run in every AWS environment but not laptop environments, etc. But I'm happy to drop the AWS_

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -31,4 +32,5 @@ export const environment: { [key: string]: string } = {
FEATURE_OPPORTUNITY_OFF,
FEATURE_SEARCH_OFF,
NEXT_BUILD,
AWS_ENVIRONMENT,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Above we are using NODE_ENV to determine the legacy host, could use the actual env instead of NODE_ENV for consistency since we are introducing that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@mdragon mdragon requested a review from acouch January 3, 2025 18:42
@mdragon mdragon force-pushed the mdragon/2841-programatic-robots-txt branch from 1f2d3c3 to 3a279df Compare January 3, 2025 20:33
Copy link
Collaborator

@acouch acouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@mdragon mdragon merged commit 6875896 into main Jan 6, 2025
14 checks passed
@mdragon mdragon deleted the mdragon/2841-programatic-robots-txt branch January 6, 2025 19:28
@mdragon
Copy link
Collaborator Author

mdragon commented Jan 6, 2025

Testing in Stage:
image

Testing in Dev (which is what Prod will work like):
image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a robots.txt file
3 participants