Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Sourcegraph Private Connect doc page #755

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
124 changes: 75 additions & 49 deletions docs/cloud/private_connectivity_sourcegraph_connect.mdx
Original file line number Diff line number Diff line change
@@ -1,115 +1,141 @@
# Private Resources on on-prem data center via Sourcegraph Connect agent
# Private Resources in On-Prem Data Centers via Sourcegraph Connect Agent

<Callout type="note">This feature is in the Experimental stage. Please contact Sourcegraph directly via [preferred contact method](https://about.sourcegraph.com/contact) for more information.</Callout>
<Callout type="note">This feature is in the Experimental stage. [Contact us](https://about.sourcegraph.com/contact) for more information.</Callout>

As part of the [Enterprise tier](https://sourcegraph.com/pricing), Sourcegraph Cloud supports connecting private resources on any on-prem private network by running Sourcegraph Connect tunnel agent in customer infrastructure.
As part of the [Enterprise tier](https://sourcegraph.com/pricing), Sourcegraph Cloud supports connecting to private code hosts and artifact registries in the customer's network by deploying the Sourcegraph Connect tunnel agent in the customer's network.

## How it works

Sourcegraph will set up a tunnel server in a customer dedicated GCP project. Customer will start the tunnel agent provided by Sourcegraph with the provided credential. After start, the agent will authenticate and establish a secure connection with Sourcegraph tunnel server.
Sourcegraph Connect consists of three components:

Sourcegraph Connect consists of three major components:
### Tunnel Clients

Tunnel agent: deployed inside the customer network, which uses its own identity and encrypts traffic between the customer code host and client. Agent can only communicate with permitted customer code hosts inside the customer network. Only agents are allowed to establish secure connections with tunnel server, the server can only accept connections if agent identity is approved.
Forward proxy clients for the Sourcegraph Cloud instance's containers to reach the customer's private code hosts and artifact registries, through the tunnel server.

Tunnel server: a centralized broker between client and agent managed by Sourcegraph. Its purpose is to set up mTLS, proxy encrypted traffic between clients and agents and enforce ACL.
Managed by Sourcegraph, and deployed in the customer's Sourcegraph Cloud instance's VPC.

Tunnel client: forward proxy clients managed by sourcegraph. Every client has its own identity and it cannot establish a direct connection with the customer agent, and has to go through tunnel server.
### Tunnel Server

[link](https://link.excalidraw.com/readonly/453uvY8infI8wskSecGJ)
The broker between agents and clients, it authenticates agents and clients, enforces ACLs, sets up mTLS, and proxies encrypted traffic between agents and clients.

Managed by Sourcegraph, and deployed in the customer's Sourcegraph Cloud instance's VPC.

### Tunnel Agents

Deployed by the customer inside their network, agents proxy and encrypt traffic between the customer's private resources and the Sourcegraph Cloud tunnel clients.

The agent has its own identity, and using credentials provided to the customer during deployment, the agent authenticates and establishes a secure connection with the tunnel server. Only agents are allowed to establish secure connections with the tunnel server, and the server only accepts a connection if the agent's identity is approved.

Agents can only communicate with permitted code hosts and artifact registries.

<iframe
src="https://link.excalidraw.com/readonly/453uvY8infI8wskSecGJ"
width="100%"
height="100%"
style={{ border: "none" }}
/>
[Diagram link](https://link.excalidraw.com/readonly/453uvY8infI8wskSecGJ)

## Steps

### Initiate the process

Customer should reach out to their account manager to initiate the process. The account manager will work with the customer to collect the required information and initiate the process, including but not limited to:
The customer reaches out to their account manager to request this feature be enabled on their Sourcegraph Cloud instance.

- The DNS name of the private code host, e.g. `gitlab.internal.company.net` or private artifact registry, e.g. `artifactory.internal.company.net`.
- The port of the private code host, e.g., `443`, `80`, `22`.
- The type of TLS certificate used by the private resource: either self-signed by an internal private CA or issued by a public CA.
The account manager collects the required information from the customer, including but not limited to:

Finally, Sourcegraph will provide the following:
- The DNS names of the needed private resources (e.g. `gitlab.internal.company.net`, `artifactory.internal.company.net`)
- The ports of the private resources (e.g. `443`, `80`, `22`)
- The type of TLS certificates used by the private resources (e.g. self-signed, internal PKI, or issued by a public CA)

- Instruction to run the agent along with credentials, and endpoint to allowlist egress traffic if needed.
Sourcegraph provides:
- The instructions, config file, and credentials to run the agent
- The tunnel server's static public IPs and ports

### Create the connection

Customer can follow the provided instructions and install the tunnel agent in the private network. At a high level:
The customer installs the agent in their private network, following the instructions provided. At a high level:

- Permit egress to the internet to a set of static IP addresses and corresponding ports to be provided by Sourcegraph.
- Permit egress to the private resources at the given port.
- Run the tunnel agent binary or docker images with the provided config files and credentials.
- Configure internet egress to the provided tunnel server's static public IPs and ports
- Configure intranet egress to the needed private resources
- Deploy the tunnel agent via Docker container or binary, with the provided config file and credentials

### Create the code host connection
### Configure the code host connection

Once the connection to private code host is established, the customer can create the [code host connection](/admin/code_hosts/) on their Sourcegraph Cloud instance.
Once the tunnel is established between the agent and server, the customer can configure the [code host connection](/admin/code_hosts/) on their Sourcegraph Cloud instance.

## FAQ

### Why TCP over gRPC?

The tunnel between the client and agent is built using TCP over gRPC. gRPC is a high-performant and battle-tested framework, with built-in support for mTLS for a trusted secure connection. TCP and HTTP/2 are widely supported in the majority of customer environments. Compared to traditional VPN solutions, such as OpenVPN, IPSec, and SSH over bastion hosts, gRPC allows us to design our own protocol, and the programmable interface allows us to implement advanced features, such as fine-grained access control at a per-connection level, audit logging with effective metadata, etc.

### How are connections encrypted? Can anyone else inspect the traffic?

Connections between the tunnel agent inside customer network and a tunnel server inside customer dedicated Sourcegraph GCP VPC use mTLS. Both agents, server and Sourcegraph clients have their own certificates and encrypt/decrypt traffic over TCP. mTLS enforce that both the client and the agent has to have a private key and present valid signed certificate from a trusted CA, which is not shared and this protects from [on-paths and spoofing attacks](https://www.cloudflare.com/en-gb/learning/access-management/what-is-mutual-tls/).
Tunnel connections use mTLS between the agent in the customer's network and the clients in the customer's Sourcegraph Cloud instance's VPC. Agents, clients, and the server all have their own certificates and encrypt / decrypt traffic over TCP. mTLS requires agents and clients to have a private key, and present a valid signed certificate from a trusted CA, which is not shared. This protects customers and Sourcegraph from [on-path and spoofing attacks](https://www.cloudflare.com/en-gb/learning/access-management/what-is-mutual-tls/).

### How do you authenticate requests?

Both tunnel clients and agents are assigned an identity corresponding to a GCP Service Account, and they are provided credentials to prove such identity. For tunnel agents, a Service Account key is distributed to the customer. For tunnel clients, it will utilize Workload Identity to prove its identity. They use them to authenticate against tunnel server by sending signed JWT tokens and public key. JWT token contains information about GCP service account credential public key required to validate signature and confirm identity of requestor. The server will then sign the requestor public key and respond with a signed certificate containing GCP Service Account email as a Subject Alternative Name (SAN).
Both tunnel agents and clients are assigned an identity corresponding to a GCP Service Account, and they are provided credentials to prove this identity. Agents use the Service Account key provided to the customer during deployment, and clients use Workload Identity to prove their identity. They use these credentials to authenticate to the tunnel server, by sending signed JWT tokens and public keys. JWT tokens contain details to specify the GCP Service Account credential public key required to validate their signature to confirm the identity of the requestor. The server then signs the requestor's public key and responds with a signed certificate containing the GCP Service Account email as a Subject Alternative Name (SAN).

Finally, if the customer NAT Gateway/Exit Gateway has stable CIDRs, we can provision firewall rules to restrict access to the tunnel server from the provided IP ranges only for an added layer of security.
For an added layer of security, if the customer network's NAT / internet gateway uses public IPs in a stable CIDR range, Sourcegraph can provision firewall rules to restrict access to the tunnel server from the provided IP ranges.

### How do you enforce authorization to restrict what requests can reach the private code host?
### How do you enforce authorization to restrict which requests can reach private resources?

The tunnel server is configured with ACLs. With mTLS every entity in the network has its own identity. The client's identity is used as a source for accessing customer private code hosts, while the agent's identity is used for destination. Tunnel server ensures that only clients with proven identity can communicate with customer tunnel agents.
With mTLS, every entity in the network has its own identity. The tunnel server is configured with ACLs, using the client's identity as the source, and the agent's identity as the destination. This ensures only clients with a proven identity can communicate with agents.

### Do you rotate the encryption keys?
### How do you manage keys and certificates?

Encryption keys are short-lived and both tunnel agents and clients have to refresh certificates every 24h. The customer may also manually rotate it by restarting the tunnel agent.
We utilize GCP Certificate Authority Service (CAS), a managed Public Key Infrastructure (PKI) service. It is responsible for the storage of root and intermediate CA signing keys, and the signing of client certificates. Access to GCP CAS is governed by GCP IAM, and only necessary individuals and services can access CAS, with audit trails in GCP Logging.

### How do you manage keys or certificates?
The TLS private keys in the agents and clients only exist in memory, and are never transmitted or shared. Only the public key is sent to the tunnel server, to issue a signed certificate, to establish the mTLS connection.

We utilize GCP Certificate Authority Service (CAS), a managed Public Key Infrastructure (PKI) service. It is responsible for the storage of all signing keys (e.g., root CAs, immediate CAs), and the signing of client certificates. Access to GCP CAS is governed by GCP IAM service and only necessary services or individuals will have access to the service with audit trails in GCP Logging.
### How often do you rotate the encryption keys?

The TLS private key on the tunnel agent or tunnel clients only exist in memory, and are never shared with other parties. Only the public key is sent to the tunnel server to issue a signed certificate to establish mTLS connection.
Encryption keys are short-lived, and both tunnel agents and clients refresh their certificates every 24h. The customer may also manually rotate the agent's certificate by restarting the agent.

### How do you audit access?

Tunnel server will log all critical operations with sufficient metadata to identify the requester to GCP Logging with a default 30-day retention policy. We will also be monitoring unauthorized access events to watch out for potential attackers.
Tunnel server logs operations to GCP Logging, with sufficient metadata to identify the requester, and a 30-day retention policy. We also monitor unauthorized access events to watch for potential attacks.

### Why TCP over gRPC?
### What if an attacker gains access to the Sourcegraph Cloud instance?

If an attacker gains access to the Sourcegraph Cloud instance's containers, this would be a security breach, and trigger our Incident Response process. However, we have many controls in place to prevent this from happening, where Cloud infrastructure access always requires approval, and the Security team is on-call for unexpected usage patterns. Learn more in our [security portal](https://security.sourcegraph.com/).

Please reach out to us if you have any specific questions regarding our Cloud security posture, we are happy to provide more detail to address your concerns.

### How do I need to configure my network for the agent to work?

The tunnel agent needs to connect to the tunnel server, and your private resources. Sourcegraph provides dedicated static IPs and ports to connect to the tunnel server. The customer must configure network egress to allow TCP (HTTP/2) traffic access to these IPs and ports.

The tunnel is built using TCP over gRPC. gRPC is a high-performant and battle-tested framework, e.g., built-in support for mTLS for a trusted secure connection. We believe TCP and HTTP/2 are widely supported in majority of environments. Moreover, the simplicity of having a single endpoint for connection between customer environment and their Cloud instance greatly simplifies the work required for customer IT admin. Compared to traditional VPN solutions, such as OpenVPN, IPSec, and SSH over bastion hosts, gRPC allows us to design our own protocol, and the programmable interface allows us to implement advanced features, e.g., fine-grained access control at a per connection level, audit logging with rich metadata.
### How can I restrict access to my private resources?

### How many agents can a customer start?
The customer has full control of their network where they deploy the tunnel agent, and can configure, monitor, and terminate connections at will.

To obtain high availability, customers can start multiple tunnel agents. Each of the agents will use the same GCP Service Account credentials, authenticate with the tunnel server and establish connection to it. Tunnel client will randomly select an available agent to forward the traffic.
We recommend implementing an allowlist to restrict the egress traffic of the agent to the IP addresses provided by Sourcegraph and to the specific private resources your Sourcegraph Cloud instance needs to access, and configuring your firewall to alert you if this ACL is hit.

### How does the customer configure the network to make the agent work?
If your code hosts or registries use DNS names, the agent needs access to the DNS server configured on its host.

The customer tunnel agent has to authenticate and establish connection with the tunnel server. Sourcegraph will provide a single dedicated static IP from customer dedicated GCP VPC which is used to connect with the tunnel server. Customer has to configure network egress to allow TCP (HTTP/2) traffic access to this static IP.
### How can I harden the tunnel agent deployment?

### How can I restrict access to my private code host connection?
The tunnel agent is designed and built with a minimal footprint and attack surface, and is scanned for vulnerabilities.

The customer has full control over the tunnel agent configuration and they can terminate the connection at any time.
What if the attacker gains access to the frontend?
You can:

In the event of an attacker gaining access to the Sourcegraph containers, we consider this to be a security breach and we have Incident response processes in place that we will follow. However, we have many controls in place to prevent this from happening where Cloud infrastructure access always requires approval and the Security team is on-call for unexpected usage patterns. You may learn more from our [Security Portal](https://security.sourcegraph.com/).
- Deploy the agent on a hardened container platform
- Store the agent credential and config content in a secrets management system and mount these secrets to the container
- Forward the agent's logs to your log management system

Please reach out to us if you have any specific questions regarding our Cloud security posture, and we are happy to provide more detail to address your concerns.
### How can I inspect the agent's traffic, and audit the data the agent is accessing in my environment?

### How to harden the tunnel agent deployment?
If a customer needs to inspect and audit traffic, such as performing TLS inspection on the connection between the private resources and Sourcegraph Cloud, we recommend inspecting traffic on the connections between the tunnel agent and internal resources, as this traffic uses the protocols and encryption of the internal resources.

We recommend using an allowlist to limit the egress traffic of the agent to IP addresses provided by Sourcegraph and specific private resources you would like to permit access. This will prevent the agent to talk to arbitrary services, and reduce the blast radius in the event of a security event.
The tunnel from the agent to the server is encrypted and authenticated by mTLS over gRPC, and uses a custom protocol, so the decrypted payload isn't usable for traffic inspection.

### How can I audit the data Sourcegraph has access to in my environment?
### Can I use Internal PKI or self-signed TLS certificates for my private resources?

The tunnel is secured and authenticated by mTLS over gRPC, and everything is encrypted over transit. If a customer is looking to perform an audit, such as TLS inspection, on the connection between the private resources and Sourcegraph Cloud, we recommend only intercepting and inspecting traffic between the tunnel agent and private resources. The connection between the tunnel agent and Sourcegraph Cloud is using a custom protocol, and the decrypted payload has very little value.
Yes. Please work with your account team to add the public certificate chain of your internal CAs, and / or your private resources' self-signed certs, under `experimentalFeatures.tls.external.certificates` in your instance's [site configuration](/admin/config/site_config#experimentalFeatures).

### Can I use self-signed TLS certificate for my private resources?
### Is this connection highly available?

Yes. Please work with your account team to add the certificate chain of your internal CA to [site configuration](/admin/config/site_config#experimentalFeatures) at `experimentalFeatures.tls.external.certificates`
To achieve high availability, customers can run multiple tunnel agents across their network. Each agent uses the same GCP Service Account credentials, authenticates with the tunnel server, and establishes their own connection to it. Tunnel clients randomly select an available agent to forward traffic through.
Loading