set up SIG-etcd #7372

Merged 11 commits on Sep 12, 2023
5 changes: 5 additions & 0 deletions OWNERS_ALIASES
@@ -54,6 +54,11 @@ aliases:
    - reylejano
    - sftim
    - tengqm
  sig-etcd-leads:
    - ahrtr
    - jmhbnz
    - serathius
    - wenjiaswe
  sig-instrumentation-leads:
    - dashpole
    - dgrisonnet
1 change: 1 addition & 0 deletions liaisons.md
@@ -40,6 +40,7 @@ members will assume one of the departing members groups.
| [SIG Cluster Lifecycle](sig-cluster-lifecycle/README.md) | Nabarun Pal (**[@palnabarun](https://github.com/palnabarun)**) |
| [SIG Contributor Experience](sig-contributor-experience/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
| [SIG Docs](sig-docs/README.md) | Carlos Tadeu Panato Jr. (**[@cpanato](https://github.com/cpanato)**) |
| [SIG etcd](sig-etcd/README.md) | TBD (**[@TBD](https://github.com/TBD)**) |
| [SIG Instrumentation](sig-instrumentation/README.md) | Christoph Blecker (**[@cblecker](https://github.com/cblecker)**) |
| [SIG K8s Infra](sig-k8s-infra/README.md) | Stephen Augustus (**[@justaugustus](https://github.com/justaugustus)**) |
| [SIG Multicluster](sig-multicluster/README.md) | Bob Killen (**[@mrbobbytables](https://github.com/mrbobbytables)**) |
8 changes: 8 additions & 0 deletions sig-etcd/OWNERS
@@ -0,0 +1,8 @@
# See the OWNERS docs at https://go.k8s.io/owners

reviewers:
  - sig-etcd-leads
approvers:
  - sig-etcd-leads
labels:
  - sig/etcd
Member

@pacoxu pacoxu Sep 12, 2023

Is this label ready?

@pacoxu: The label(s) sig/etcd cannot be applied, because the repository doesn't have them.

kubernetes/kubernetes#118077 (comment)

Currently, we have the area/etcd label.

Contributor

Thanks for the reminder; this is done now: kubernetes/test-infra#30948
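
For readers less familiar with the OWNERS mechanism: the `sig-etcd/OWNERS` file in this PR names the alias `sig-etcd-leads` rather than individual users, and the alias is defined in `OWNERS_ALIASES`. The following is a minimal sketch of how such an alias could be expanded. The data is copied from the diff, but the `expand` helper is hypothetical and is not the code the Kubernetes tooling actually runs.

```python
# Hypothetical, simplified model of OWNERS alias expansion.
# The values mirror this PR's diff; real tooling reads the YAML files.
OWNERS_ALIASES = {
    "sig-etcd-leads": ["ahrtr", "jmhbnz", "serathius", "wenjiaswe"],
}

OWNERS = {
    "reviewers": ["sig-etcd-leads"],
    "approvers": ["sig-etcd-leads"],
    "labels": ["sig/etcd"],
}


def expand(entries, aliases):
    """Replace each alias with its members; keep plain usernames as-is."""
    expanded = []
    for entry in entries:
        for user in aliases.get(entry, [entry]):
            if user not in expanded:
                expanded.append(user)
    return expanded


approvers = expand(OWNERS["approvers"], OWNERS_ALIASES)
print(approvers)
```

Under this model, adding or removing a lead only requires touching `OWNERS_ALIASES`; the `OWNERS` file stays unchanged.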

118 changes: 118 additions & 0 deletions sig-etcd/README.md
@@ -0,0 +1,118 @@
<!---
This is an autogenerated file!

Please do not edit this file directly, but instead make changes to the
sigs.yaml file in the project root.

To understand how this file is generated, see https://git.k8s.io/community/generator/README.md
--->
# etcd Special Interest Group

etcd is a production-ready store for building cloud-native distributed systems and managing cloud-native infrastructure via orchestrators like Kubernetes.
Etcd should provide distributed system primitives (such as distributed locking and leader election) that allow users to create scalable, highly available and fault-tolerant systems.
Etcd is the place to store the infrastructure configuration, not only as part of Kubernetes, but also as a standalone solution.

The [charter](charter.md) defines the scope and governance of the etcd Special Interest Group.

## Meetings
*Joining the [mailing list](https://groups.google.com/g/etcd-dev) for the group will typically add invites for the following meetings to your calendar.*
* Regular SIG Meeting: [Thursdays at 11:00 PT (Pacific Time)](https://zoom.us/my/cncfetcdproject) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=11:00&tz=PT%20%28Pacific%20Time%29).
* [Meeting notes and Agenda](https://docs.google.com/document/d/16XEGyPBisZvmmoIHSZzv__LoyOeluC5a4x353CX0SIM/edit?usp=sharing).
* [Meeting recordings](https://www.youtube.com/playlist?list=PLRGL688DpO9rtufHbiunuCHddYY6MGkwW).

## Leadership

### Chairs
The Chairs of the SIG run operations and processes governing the SIG.

* James Blair (**[@jmhbnz](https://github.com/jmhbnz)**), Red Hat
* Wenjia Zhang (**[@wenjiaswe](https://github.com/wenjiaswe)**), Google

### Technical Leads
The Technical Leads of the SIG establish new subprojects, decommission existing
subprojects, and resolve cross-subproject technical issues and decisions.

* Benjamin Wang (**[@ahrtr](https://github.com/ahrtr)**), VMware
* Marek Siarkowicz (**[@serathius](https://github.com/serathius)**), Google

## Contact
- Slack: [#etcd](https://kubernetes.slack.com/messages/etcd)
- [Mailing list](https://groups.google.com/g/etcd-dev)
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fetcd)
- GitHub Teams:
- [@kubernetes/sig-etcd-leads](https://github.com/orgs/kubernetes/teams/sig-etcd-leads) - SIG Chairs and Tech Leads
- Steering Committee Liaison: TBD (**[@TBD](https://github.com/TBD)**)

## Subprojects

The following [subprojects][subproject-definition] are owned by sig-etcd:
### bbolt
An embedded key/value database for Go.
- **Owners:**
- [etcd-io/bbolt/MAINTAINERS](https://github.com/etcd-io/bbolt/blob/master/MAINTAINERS)
### cetcd
Serve Consul with etcd
- **Owners:**
- [etcd-io/cetcd/MAINTAINERS](https://github.com/etcd-io/cetcd/blob/master/MAINTAINERS)
### dbtester
Distributed database benchmark tester
- **Owners:**
- [etcd-io/dbtester/MAINTAINERS](https://github.com/etcd-io/dbtester/blob/master/MAINTAINERS)
### discovery.etcd.io
Kubernetes manifests powering discovery.etcd.io
- **Owners:**
- [etcd-io/discovery.etcd.io/MAINTAINERS](https://github.com/etcd-io/discovery.etcd.io/blob/master/MAINTAINERS)
### discoveryserver
Public etcd Discovery Service
- **Owners:**
- [etcd-io/discoveryserver/MAINTAINERS](https://github.com/etcd-io/discoveryserver/blob/master/MAINTAINERS)
### etcd
Distributed reliable key-value store for the most critical data of a distributed system
- **Owners:**
- [etcd-io/etcd/MAINTAINERS](https://github.com/etcd-io/etcd/blob/master/MAINTAINERS)
### etcd-play
etcd playground
- **Owners:**
- [etcd-io/etcd-play/MAINTAINERS](https://github.com/etcd-io/etcd-play/blob/master/MAINTAINERS)
### etcdlabs
etcd playground
- **Owners:**
- [etcd-io/etcdlabs/MAINTAINERS](https://github.com/etcd-io/etcdlabs/blob/master/MAINTAINERS)
### gofail
failpoints for go
- **Owners:**
- [etcd-io/gofail/MAINTAINERS](https://github.com/etcd-io/gofail/blob/master/MAINTAINERS)
### govanityurls
Use a custom domain in your Go import path
- **Owners:**
- [etcd-io/govanityurls/MAINTAINERS](https://github.com/etcd-io/govanityurls/blob/master/MAINTAINERS)
### jetcd
etcd java client
- **Owners:**
- [etcd-io/jetcd/MAINTAINERS](https://github.com/etcd-io/jetcd/blob/master/MAINTAINERS)
### maintainers
issue tracking for project wide non-code concerns
- **Owners:**
- [etcd-io/maintainers/MAINTAINERS](https://github.com/etcd-io/maintainers/blob/master/MAINTAINERS)
### protodoc
protodoc generates Protocol Buffer documentation.
- **Owners:**
- [etcd-io/protodoc/MAINTAINERS](https://github.com/etcd-io/protodoc/blob/master/MAINTAINERS)
### raft
Raft library for maintaining a replicated state machine
- **Owners:**
- [etcd-io/raft/MAINTAINERS](https://github.com/etcd-io/raft/blob/master/MAINTAINERS)
### website
etcd-io
- **Owners:**
- [etcd-io/website/MAINTAINERS](https://github.com/etcd-io/website/blob/master/MAINTAINERS)
### zetcd
Serve the Apache Zookeeper API but back it with an etcd cluster
- **Owners:**
- [etcd-io/zetcd/MAINTAINERS](https://github.com/etcd-io/zetcd/blob/master/MAINTAINERS)

[subproject-definition]: https://github.com/kubernetes/community/blob/master/governance.md#subprojects
[working-group-definition]: https://github.com/kubernetes/community/blob/master/governance.md#working-groups
<!-- BEGIN CUSTOM CONTENT -->

<!-- END CUSTOM CONTENT -->
63 changes: 63 additions & 0 deletions sig-etcd/charter.md
@@ -0,0 +1,63 @@
# SIG etcd Charter

This charter adheres to the conventions described in the [Kubernetes Charter README] and uses
the Roles and Organization Management outlined in [sig-governance].

[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md
[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md

## Scope

Owns the etcd project and how it is used by Kubernetes.
Member

@neolit123 neolit123 Sep 8, 2023

note that you gain ownership of the current, very cumbersome and nondeterministic process of updating the etcd server/client in k8s.

this is currently best effort from community members, and issues and PRs just go stale:
kubernetes/kubernetes#117648
https://github.com/kubernetes/kubernetes/issues?q=is%3Aopen+label%3Aarea%2Fkubeadm+etcd

various product tooling allows some form of etcd version customization, custom images, FIPS builds, etc., so the public updates are not a p0-1.

Contributor

Historically, it was etcd maintainers who bumped the etcd image in Kubernetes. The reason was that Kubernetes scalability tests are the major signal in etcd qualification, so minor etcd version bumps were done immediately.

What's more cumbersome is security patching, which is a problem because the etcd k8s image is also used by kubeadm. With the SIG in place, I think we can discuss changing/improving the release process.

Member

@neolit123 neolit123 Sep 8, 2023

> Historically, it was etcd maintainers who bumped the etcd image in Kubernetes. The reason was that Kubernetes scalability tests are the major signal in etcd qualification, so minor etcd version bumps were done immediately.

i think i vaguely recall this period.

> What's more cumbersome is security patching, which is a problem because the etcd k8s image is also used by kubeadm. With the SIG in place,

the biggest current pain points for etcd updates in k8s are:

  • undocumented, manual image promotion process.
  • no dedicated top level approver to own etcd k/k updates as it touches a lot of things.
  • too many steps, different repos, client vs server updates

kubeadm updates are simple - changing a small version map / constant. by updating kubeadm, k8s+etcd gains an upgrade signal from the etcd version at k8s N-1 to the etcd version at k8s N. the kube-up upgrade suite is still not working, IIRC.

> I think we can discuss changing/improving the release process.

happy to participate.


### In scope

#### Code, Binaries and Services

- Development of [etcd] and other repositories under [etcd-io organization]
- Maintenance of [etcd image] packaged with Kubernetes

[etcd]: https://github.com/etcd-io/etcd
[etcd-io organization]: https://github.com/etcd-io
[etcd image]: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/etcd

#### Cross-cutting and Externally Facing Processes

- Specifying, testing and improving the implicit Kubernetes-ETCD Contract, which includes storage requirements, write and delete requirements, read requirements and watch requirements.
- Release process of etcd and other binaries belonging to [etcd-io organization]

### Out of scope

- Structure of data stored in etcd by Kubernetes components is owned by SIG API Machinery

## Roles and Organization Management

This SIG follows the Roles and Organization Management outlined in [sig-governance]
and opts in to updates and modifications to [sig-governance].

### Additional responsibilities of Tech Leads

- Release of etcd and other binaries belonging to [etcd-io organization]

### Deviations from [sig-governance]

- SIG etcd's participation in the Kubernetes release cycle is limited by etcd having a different schedule for its releases.
- SIG etcd communication utilizes pre-existing forums for communication:
- Email: [etcd-dev](https://groups.google.com/forum/?hl=en#!forum/etcd-dev).
- Slack: [#etcd](https://kubernetes.slack.com/messages/C3HD8ARJ5/details/) channel on Kubernetes.
- SIG etcd contributing instructions ([CONTRIBUTING.md]) are defined in the etcd project.

[CONTRIBUTING.md]: https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md

### Deviations from [kubernetes-repositories]

- SIG etcd repositories live in github.com/etcd-io
- SIG etcd repositories should (but are not required to) adopt the merge bot and Kubernetes PR commands/bot.
- SIG etcd repositories will follow [rules for donated repositories].
Member

posting here for the convenience of other reviewers:

Rules for new repositories

  • For now all repos will live in github.com/kubernetes-sigs/\<project-name\>.
  • Must contain the topic for the sponsoring SIG - e.g.
    k8s-sig-api-machinery. (Added through the Manage topics link on the
    repo page.)
  • Must adopt the Kubernetes Code of Conduct
  • All code projects use the Apache License version 2.0. Documentation
    repositories must use the Creative Commons License version 4.0.
  • Must adopt the CNCF CLA bot, merge bot and Kubernetes PR commands/bots.
  • All OWNERS of the project must also be active SIG members.
  • Must be approved by the process spelled out in the SIG's charter and a
    publicly linkable written decision should be available for the same.
  • SIG must already have identified all of their existing subprojects and
    code, with valid OWNERS files, in
    sigs.yaml

Rules for donated repositories

The kubernetes-sigs organization is primarily intended to house net-new
projects originally created in that organization. However, projects that a SIG
adopts may also be donated.

In addition to the requirements for new repositories, donated repositories must
demonstrate that:

  • All contributors must have signed the CNCF Individual CLA or CNCF
    Corporate CLA
  • If (a) contributor(s) have not signed the CLA and could not be reached, a
    NOTICE file should be added referencing section 7 of the CLA with a list of
    the developers who could not be reached
  • Licenses of dependencies are acceptable; project owners can ping
    caniszczyk for review of third party deps
  • Boilerplate text across all files should attribute copyright as follows:
    "Copyright <Project Authors>" if no CLA was in place prior to donation
  • Additions of the standard Kubernetes header
    to code created by the contributors can occur post-transfer, but should
    ideally occur shortly thereafter.
  • Should contain template files as per the
    kubernetes-template-project.

Note that copyright notices should only be modified or removed by the people or
organizations named in the notice. See the FAQ below for more information
regarding copyrights and copyright notices.


[kubernetes-repositories]: https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#sig-repositories
[rules for donated repositories]: https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#rules-for-donated-repositories

### Subproject Creation

By SIG Technical Leads
100 changes: 100 additions & 0 deletions sig-etcd/vision.md
@@ -0,0 +1,100 @@
# SIG etcd Vision

The long-term success of the etcd project depends on the following:
- Etcd is a reliable key-value storage service
- Etcd is simple to operate
- Etcd is a standalone solution for managing infrastructure configuration
- Etcd scales beyond Kubernetes dimensions

The goals and milestones listed here are for future releases.
The scope of release v3.6 has already been defined and is unlikely to change.

## Etcd is a reliable key-value storage service

Reliability remains the most important property of etcd.
The project cannot allow for another [data inconsistency incident].
If we could only pick one thing from the list of goals above, this would be it.
No matter what features we add in the future,
they must not diminish etcd's reliability.
We must establish processes and safeguards to prevent future incidents.

How?
- Etcd API guarantees are well understood, documented and tested.
- Etcd adopts a production readiness review process for new features, similar to the one used by Kubernetes.
- Robustness tests should cover most of the API and most common failures.
- New features must have accompanying e2e tests and be covered by robustness tests.
- Etcd must be able to immediately detect corruption.
- Etcd must be able to automatically recover from data corruption.
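
To make the corruption-detection goal more concrete, here is a minimal sketch of one possible approach: each member hashes its key-value data, and members whose hash disagrees with the majority are flagged. This is an illustration only. The helpers `kv_hash` and `find_divergent_members` are hypothetical, the sketch assumes every member is hashed at the same revision, and it does not describe how etcd itself checks for corruption.

```python
import hashlib


def kv_hash(store):
    """Hash a member's key-value pairs in sorted key order."""
    digest = hashlib.sha256()
    for key in sorted(store):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        for field in (key, store[key]):
            data = field.encode()
            digest.update(len(data).to_bytes(8, "big"))
            digest.update(data)
    return digest.hexdigest()


def find_divergent_members(members):
    """Return the names of members whose hash differs from the majority hash."""
    hashes = {name: kv_hash(store) for name, store in members.items()}
    counts = {}
    for h in hashes.values():
        counts[h] = counts.get(h, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(name for name, h in hashes.items() if h != majority)


# Assumption: all three snapshots were taken at the same revision.
members = {
    "m1": {"/config/a": "1", "/config/b": "2"},
    "m2": {"/config/a": "1", "/config/b": "2"},
    "m3": {"/config/a": "1", "/config/b": "3"},  # diverged value
}
print(find_divergent_members(members))
```

A majority vote is a simplification: it identifies which member looks wrong, but recovery (the last bullet above) would still need a trusted source to restore from.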

[data inconsistency incident]: https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md

## Etcd is simple to operate

Etcd should be easy to operate.
Currently, operating etcd involves many steps,
and some of these steps require external tools.
For example, Kubernetes provides tools to [downgrade/upgrade etcd].
These tools are not part of etcd itself;
they ship only with the Kubernetes distribution of etcd.

How?
- Etcd should not require users to run periodic defrag
- Etcd officially supports live upgrades and downgrades
- Disaster recovery for Etcd & Kubernetes
- Reliable cluster membership changes via learners with automated promotion
- Two-node etcd clusters

## Etcd is a standalone solution for managing infrastructure configuration

Kubernetes is not the only way to manage infrastructure.
It was the first to introduce many concepts that have now become the standard,
but they are not unique to Kubernetes.
The most important design principle of Kubernetes,
the reconciliation protocol, is not something unique to it.

Reconciliation can be implemented solely on etcd,
as has been shown by projects such as Cilium and
Calico Typha, which support etcd-based control planes.
The reason why this idea has not propagated further is
the amount of work that was put into making
the reconciliation protocol scale in Kubernetes.
The watch cache is a key part of this scaling,
and it is not part of the etcd project.

If etcd provided a Kubernetes-like storage interface
and primitives for the reconciliation protocol,
it would be a more viable solution for managing infrastructure.
This would allow users to build etcd-based control planes that
could scale to meet the needs of large and complex deployments.
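
As a rough illustration of what a reconciliation step looks like, the sketch below compares a desired state with an observed state and derives the actions needed to converge. Both states are plain dictionaries here; in an etcd-based control plane they would be read from etcd and from the managed system. The function names `reconcile` and `apply_actions` are hypothetical and are not part of any etcd or Kubernetes API.

```python
def reconcile(desired, actual):
    """Return the actions that move `actual` toward `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in sorted(actual):
        if name not in desired:
            actions.append(("delete", name, None))
    return actions


def apply_actions(actions, actual):
    """Apply the actions to the observed state in place."""
    for verb, name, spec in actions:
        if verb == "delete":
            actual.pop(name, None)
        else:
            actual[name] = spec
    return actual


desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
actual = {"web": {"replicas": 2}, "cache": {"replicas": 1}}

actions = reconcile(desired, actual)
print(actions)
apply_actions(actions, actual)
print(reconcile(desired, actual))  # nothing left to do once converged
```

The loop is level-triggered: it only looks at the current difference between the two states, so running it again after a missed or repeated event still converges.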

How?
- Introduce a Kubernetes-like storage interface into the etcd client
- Provide etcd primitives for the reconciliation protocol
- Strip out the Kubernetes watch cache and make it part of the etcd client.
- Use the watch cache in the client to build an eventually consistent etcd proxy.
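
The watch cache idea in the last two bullets can be sketched as a local map that is kept up to date by applying watch events in revision order and that serves reads without contacting the server. The `WatchCache` class below is a hypothetical toy, not the Kubernetes watch cache or an etcd client API, and it assumes each event carries its own revision.

```python
class WatchCache:
    """Client-side cache kept current by applying watch events in order."""

    def __init__(self):
        self._data = {}
        self.revision = 0  # revision of the last applied event

    def apply(self, kind, key, value, revision):
        """Apply one event; return False if it is stale and was skipped."""
        if kind not in ("PUT", "DELETE"):
            raise ValueError(f"unknown event kind: {kind}")
        # Assumption for this sketch: each event has its own revision,
        # so anything at or below the current revision is a replay.
        if revision <= self.revision:
            return False
        if kind == "PUT":
            self._data[key] = value
        else:
            self._data.pop(key, None)
        self.revision = revision
        return True

    def get(self, key):
        """Serve a read from local state; may lag behind the server."""
        return self._data.get(key)

    def list(self, prefix):
        """Return all cached entries whose key starts with `prefix`."""
        return {k: v for k, v in sorted(self._data.items()) if k.startswith(prefix)}


cache = WatchCache()
cache.apply("PUT", "/pods/a", "running", 1)
cache.apply("PUT", "/pods/b", "pending", 2)
cache.apply("DELETE", "/pods/a", None, 3)
cache.apply("PUT", "/pods/a", "running", 1)  # replayed event, ignored
print(cache.list("/pods/"), cache.revision)
```

Because reads come from local state that trails the server by however many events are still in flight, a proxy built this way is eventually consistent rather than linearizable.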

[downgrade/upgrade etcd]: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/etcd

## Etcd scales beyond Kubernetes dimensions

Etcd has proven its scalability by enabling Kubernetes clusters of up to 5,000 nodes.
However, as the cloud native ecosystem has evolved, new projects have been built on top of Kubernetes.
These projects, such as [KCP] (a multi-cluster control plane) and [Kueue] (a batch job queuing system),
have different scalability requirements than pure Kubernetes.
For example, they need support for larger storage sizes and higher throughput.

Etcd's strong points are its reliable raft and efficient watch implementation.
However, its storage capabilities are not as strong.
To address this, we should look into growing our storage capabilities and making them more flexible depending on the use case.

How?
- Well-defined and tested scalability dimensions
- Increase raft throughput (async and batch proposal handling)
- Increase the storage size supported by bbolt
- Pluggable storage layer
- Hybrid clusters with write and read optimized members


[KCP]: https://cloud.redhat.com/blog/an-introduction-to-kcp
[Kueue]: https://github.com/kubernetes-sigs/kueue
