set up SIG-etcd #7372
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
# See the OWNERS docs at https://go.k8s.io/owners | ||
|
||
reviewers: | ||
- sig-etcd-leads | ||
approvers: | ||
- sig-etcd-leads | ||
labels: | ||
- sig/etcd | ||
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
<!--- | ||
This is an autogenerated file! | ||
|
||
Please do not edit this file directly, but instead make changes to the | ||
sigs.yaml file in the project root. | ||
|
||
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md | ||
---> | ||
# etcd Special Interest Group | ||
|
||
etcd is a production-ready store for building cloud-native distributed systems and managing cloud-native infrastructure via orchestrators like Kubernetes. | ||
etcd should provide distributed system primitives (such as distributed locking and leader election) that allow users to create scalable, highly available and fault-tolerant systems. | ||
Etcd is the place to store the infrastructure configuration, not only as part of Kubernetes, but also as a standalone solution. | ||
|
||
The [charter](charter.md) defines the scope and governance of the etcd Special Interest Group. | ||
|
||
## Meetings | ||
*Joining the [mailing list](https://groups.google.com/g/etcd-dev) for the group will typically add invites for the following meetings to your calendar.* | ||
* Regular SIG Meeting: [Thursdays at 11:00 PT (Pacific Time)](https://zoom.us/my/cncfetcdproject) (biweekly). [Convert to your timezone](http://www.thetimezoneconverter.com/?t=11:00&tz=PT%20%28Pacific%20Time%29). | ||
* [Meeting notes and Agenda](https://docs.google.com/document/d/16XEGyPBisZvmmoIHSZzv__LoyOeluC5a4x353CX0SIM/edit?usp=sharing). | ||
* [Meeting recordings](https://www.youtube.com/playlist?list=PLRGL688DpO9rtufHbiunuCHddYY6MGkwW). | ||
|
||
## Leadership | ||
|
||
### Chairs | ||
|
||
The Chairs of the SIG run operations and processes governing the SIG. | ||
|
||
* James Blair (**[@jmhbnz](https://github.com/jmhbnz)**), Red Hat | ||
* Wenjia Zhang (**[@wenjiaswe](https://github.com/wenjiaswe)**), Google | ||
|
||
### Technical Leads | ||
The Technical Leads of the SIG establish new subprojects, decommission existing | ||
subprojects, and resolve cross-subproject technical issues and decisions. | ||
|
||
* Benjamin Wang (**[@ahrtr](https://github.com/ahrtr)**), VMware | ||
* Marek Siarkowicz (**[@serathius](https://github.com/serathius)**), Google | ||
|
||
## Contact | ||
- Slack: [#etcd](https://kubernetes.slack.com/messages/etcd) | ||
- [Mailing list](https://groups.google.com/g/etcd-dev) | ||
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fetcd) | ||
- GitHub Teams: | ||
- [@kubernetes/sig-etcd-leads](https://github.com/orgs/kubernetes/teams/sig-etcd-leads) - SIG Chairs and Tech Leads | ||
- Steering Committee Liaison: TBD (**[@TBD](https://github.com/TBD)**) | ||
|
||
## Subprojects | ||
|
||
|
||
The following [subprojects][subproject-definition] are owned by sig-etcd: | ||
### bbolt | ||
An embedded key/value database for Go. | ||
- **Owners:** | ||
- [etcd-io/bbolt/MAINTAINERS](https://github.com/etcd-io/bbolt/blob/master/MAINTAINERS) | ||
### cetcd | ||
Serve Consul with etcd | ||
- **Owners:** | ||
- [etcd-io/cetcd/MAINTAINERS](https://github.com/etcd-io/cetcd/blob/master/MAINTAINERS) | ||
### dbtester | ||
Distributed database benchmark tester | ||
- **Owners:** | ||
- [etcd-io/dbtester/MAINTAINERS](https://github.com/etcd-io/dbtester/blob/master/MAINTAINERS) | ||
### discovery.etcd.io | ||
Kubernetes manifests powering discovery.etcd.io | ||
- **Owners:** | ||
- [etcd-io/discovery.etcd.io/MAINTAINERS](https://github.com/etcd-io/discovery.etcd.io/blob/master/MAINTAINERS) | ||
### discoveryserver | ||
Public etcd Discovery Service | ||
- **Owners:** | ||
- [etcd-io/discoveryserver/MAINTAINERS](https://github.com/etcd-io/discoveryserver/blob/master/MAINTAINERS) | ||
### etcd | ||
Distributed reliable key-value store for the most critical data of a distributed system | ||
- **Owners:** | ||
- [etcd-io/etcd/MAINTAINERS](https://github.com/etcd-io/etcd/blob/master/MAINTAINERS) | ||
### etcd-play | ||
etcd playground | ||
- **Owners:** | ||
- [etcd-io/etcd-play/MAINTAINERS](https://github.com/etcd-io/etcd-play/blob/master/MAINTAINERS) | ||
### etcdlabs | ||
etcd playground | ||
- **Owners:** | ||
- [etcd-io/etcdlabs/MAINTAINERS](https://github.com/etcd-io/etcdlabs/blob/master/MAINTAINERS) | ||
### gofail | ||
failpoints for go | ||
- **Owners:** | ||
- [etcd-io/gofail/MAINTAINERS](https://github.com/etcd-io/gofail/blob/master/MAINTAINERS) | ||
### govanityurls | ||
Use a custom domain in your Go import path | ||
- **Owners:** | ||
- [etcd-io/govanityurls/MAINTAINERS](https://github.com/etcd-io/govanityurls/blob/master/MAINTAINERS) | ||
### jetcd | ||
etcd java client | ||
- **Owners:** | ||
- [etcd-io/jetcd/MAINTAINERS](https://github.com/etcd-io/jetcd/blob/master/MAINTAINERS) | ||
### maintainers | ||
issue tracking for project wide non-code concerns | ||
- **Owners:** | ||
- [etcd-io/maintainers/MAINTAINERS](https://github.com/etcd-io/maintainers/blob/master/MAINTAINERS) | ||
### protodoc | ||
protodoc generates Protocol Buffer documentation. | ||
- **Owners:** | ||
- [etcd-io/protodoc/MAINTAINERS](https://github.com/etcd-io/protodoc/blob/master/MAINTAINERS) | ||
### raft | ||
Raft library for maintaining a replicated state machine | ||
- **Owners:** | ||
- [etcd-io/raft/MAINTAINERS](https://github.com/etcd-io/raft/blob/master/MAINTAINERS) | ||
### website | ||
etcd-io | ||
- **Owners:** | ||
- [etcd-io/website/MAINTAINERS](https://github.com/etcd-io/website/blob/master/MAINTAINERS) | ||
### zetcd | ||
Serve the Apache Zookeeper API but back it with an etcd cluster | ||
- **Owners:** | ||
- [etcd-io/zetcd/MAINTAINERS](https://github.com/etcd-io/zetcd/blob/master/MAINTAINERS) | ||
|
||
[subproject-definition]: https://github.com/kubernetes/community/blob/master/governance.md#subprojects | ||
[working-group-definition]: https://github.com/kubernetes/community/blob/master/governance.md#working-groups | ||
<!-- BEGIN CUSTOM CONTENT --> | ||
|
||
<!-- END CUSTOM CONTENT --> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
# SIG etcd Charter | ||
|
||
This charter adheres to the conventions described in the [Kubernetes Charter README] and uses | ||
the Roles and Organization Management outlined in [sig-governance]. | ||
|
||
[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md | ||
[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md | ||
|
||
## Scope | ||
|
||
Owns the etcd project and how it is used by Kubernetes. | ||
Review comment: Note that you gain ownership of the current, very cumbersome and nondeterministic process of updating the etcd server/client in k8s. This is currently a best effort from community members, and issues and PRs just go stale: various product tooling allows some form of etcd version customization, custom images, FIPS builds, etc., so the public updates are not a p0-1.
Reply: Historically it was the etcd maintainers who bumped the etcd image in Kubernetes. The reason was that Kubernetes scalability tests are the major signal in etcd qualification, so minor bumps of the etcd version were done immediately. What's more cumbersome is security patching, which is a problem because the etcd k8s image is also used by kubeadm. With the SIG in place, I think we can discuss changing/improving the release process.
Reply: I think I vaguely recall this period.
The biggest current pain points for etcd updates in k8s are:
kubeadm updates are simple: changing a small version map / constant. By updating kubeadm, k8s+etcd gains upgrade signal from the etcd version at k8s N-1 to the etcd version at k8s N. The kube-up upgrade suite is still not working, IIRC.
Happy to participate. |
||
|
||
### In scope | ||
|
||
#### Code, Binaries and Services | ||
|
||
- Development of [etcd] and other repositories under [etcd-io organization] | ||
- Maintenance of [etcd image] packaged with Kubernetes | ||
|
||
[etcd]: https://github.com/etcd-io/etcd | ||
[etcd-io organization]: https://github.com/etcd-io | ||
[etcd image]: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/etcd | ||
|
||
#### Cross-cutting and Externally Facing Processes | ||
|
||
- Specifying, testing and improving the implicit Kubernetes-etcd contract, which includes storage, write and delete, read, and watch requirements. | ||
- Release process of etcd and other binaries belonging to [etcd-io organization] | ||
|
||
### Out of scope | ||
|
||
- Structure of data stored in etcd by Kubernetes components is owned by SIG API Machinery | ||
|
||
## Roles and Organization Management | ||
|
||
This SIG follows the Roles and Organization Management outlined in [sig-governance] | ||
and opts in to updates and modifications to [sig-governance]. | ||
|
||
### Additional responsibilities of Tech Leads | ||
|
||
- Release of etcd and other binaries belonging to [etcd-io organization] | ||
|
||
### Deviations from [sig-governance] | ||
|
||
- SIG etcd's participation in the Kubernetes release cycle is limited by etcd having a different schedule for its releases. | ||
- SIG etcd communication utilizes pre-existing forums for communication: | ||
|
||
- Email: [etcd-dev](https://groups.google.com/forum/?hl=en#!forum/etcd-dev). | ||
- Slack: [#etcd](https://kubernetes.slack.com/messages/C3HD8ARJ5/details/) channel on Kubernetes. | ||
- SIG etcd contributing instructions ([CONTRIBUTING.md]) are defined in the etcd project. | ||
|
||
[CONTRIBUTING.md]: https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md | ||
|
||
### Deviations from [kubernetes-repositories] | ||
|
||
- SIG etcd repositories live in github.com/etcd-io | ||
- SIG etcd repositories should (but are not required to) adopt the merge bot and the Kubernetes PR commands/bot. | ||
- SIG etcd repositories will follow [rules for donated repositories]. | ||
Review comment: posting here for convenience for other reviewers:
Rules for new repositories
Rules for donated repositories: In addition to the requirements for new repositories, donated repositories must
Note that copyright notices should only be modified or removed by the people or |
||
|
||
[kubernetes-repositories]: https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#sig-repositories | ||
[rules for donated repositories]: https://github.com/kubernetes/community/blob/master/github-management/kubernetes-repositories.md#rules-for-donated-repositories | ||
|
||
### Subproject Creation | ||
|
||
By SIG Technical Leads |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
# SIG etcd Vision | ||
|
||
The long-term success of the etcd project depends on the following: | ||
- Etcd is a reliable key-value storage service | ||
- Etcd is simple to operate | ||
- Etcd is a standalone solution for managing infrastructure | ||
- Etcd scales beyond Kubernetes dimensions | ||
|
||
The goals and milestones listed here are for future releases. | ||
The scope of release v3.6 has already been defined and is unlikely to change. | ||
|
||
## Etcd is a reliable key-value storage service | ||
|
||
Reliability remains the most important property of etcd. | ||
The project cannot allow for another [data inconsistency incident]. | ||
If we could only pick one thing from the list of goals above, this would be it. | ||
No matter what features we add in the future, | ||
they must not diminish etcd's reliability. | ||
We must establish processes and safeguards to prevent future incidents. | ||
|
||
How? | ||
- Etcd API guarantees are well understood, documented and tested. | ||
- Etcd adopts a production readiness review process for new features, similar to the one used by Kubernetes. | ||
- Robustness tests should cover most of the API and most common failures. | ||
- New features must have accompanying e2e tests and be covered by robustness tests. | ||
- Etcd must be able to immediately detect corruption. | ||
- Etcd must be able to automatically recover from data corruption. | ||
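
As a rough illustration of the corruption-detection goal above, one common approach is to hash each member's key-value state at the same revision and flag members that disagree with the majority. The sketch below is a toy model under that assumption, not etcd's actual implementation; `state_hash`, `find_divergent_members`, and the in-memory snapshots are hypothetical names introduced only for this example.

```python
import hashlib
from typing import Dict, List


def state_hash(kv: Dict[str, str]) -> str:
    """Hash a key-value snapshot deterministically (keys are sorted)."""
    h = hashlib.sha256()
    for key in sorted(kv):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        for field in (key, kv[key]):
            data = field.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()


def find_divergent_members(snapshots: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the members whose snapshot hash differs from the majority hash.

    Assumption: all snapshots were taken at the same revision. On a tie,
    the hash seen first is treated as the majority.
    """
    hashes = {member: state_hash(kv) for member, kv in snapshots.items()}
    counts: Dict[str, int] = {}
    for digest in hashes.values():
        counts[digest] = counts.get(digest, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(m for m, d in hashes.items() if d != majority)
```

Comparing hashes only detects divergence; deciding which copy is correct and repairing it (the automatic recovery goal) needs additional machinery that this sketch does not attempt.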
|
||
[data inconsistency incident]: https://github.com/etcd-io/etcd/blob/main/Documentation/postmortems/v3.5-data-inconsistency.md | ||
|
||
## Etcd is simple to operate | ||
|
||
Etcd should be easy to operate. | ||
Currently, there are many steps involved in operating etcd, | ||
and some of these steps require external tools. | ||
For example, Kubernetes provides tools to [downgrade/upgrade etcd]. | ||
These tools are not part of etcd itself, | ||
but they are available as part of the Kubernetes distribution of etcd. | ||
|
||
How? | ||
- Etcd should not require users to run periodic defrag | ||
- Etcd officially supports live upgrades and downgrades | ||
- Disaster recovery for Etcd & Kubernetes | ||
- Reliable cluster membership changes via learners with automated promotion | ||
- Two node etcd clusters | ||
|
||
## Etcd is a standalone solution for managing infrastructure configuration | ||
|
||
Kubernetes is not the only way to manage infrastructure. | ||
It was the first to introduce many concepts that have now become the standard, | ||
but they are not unique to Kubernetes. | ||
The most important design principle of Kubernetes, | ||
the reconciliation protocol, is not something unique to it. | ||
|
||
Reconciliation can be implemented solely on etcd, | ||
as has been shown by projects like Cilium and | ||
Calico Typha that support etcd-based control planes. | ||
The reason why this idea has not propagated further is | ||
the amount of work that was put into making | ||
the reconciliation protocol scale in Kubernetes. | ||
The watch cache is a key part of this scaling, | ||
and it is not part of the etcd project. | ||
|
||
If etcd provided a Kubernetes-like storage interface | ||
and primitives for the reconciliation protocol, | ||
it would be a more viable solution for managing infrastructure. | ||
This would allow users to build etcd-based control planes that | ||
could scale to meet the needs of large and complex deployments. | ||
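
To make the reconciliation idea concrete, here is a minimal sketch of one reconciliation step: compare the desired state (what would be stored in etcd) with the observed state, and compute the actions that close the gap. Plain dictionaries stand in for both states, and `reconcile` and `apply_actions` are hypothetical helpers for illustration only; this is not an etcd or Kubernetes API.

```python
from typing import Any, Dict, List, Tuple

Action = Tuple[str, str, Any]


def reconcile(desired: Dict[str, Any], actual: Dict[str, Any]) -> List[Action]:
    """Compute the actions that move `actual` toward `desired`."""
    actions: List[Action] = []
    for key, value in sorted(desired.items()):
        if key not in actual:
            actions.append(("create", key, value))
        elif actual[key] != value:
            actions.append(("update", key, value))
    for key in sorted(actual):
        if key not in desired:
            actions.append(("delete", key, None))
    return actions


def apply_actions(actions: List[Action], actual: Dict[str, Any]) -> None:
    """Apply the computed actions to the observed state in place."""
    for op, key, value in actions:
        if op == "delete":
            actual.pop(key, None)
        else:
            actual[key] = value


desired = {"a": 1, "b": 2}
actual = {"b": 3, "c": 4}
apply_actions(reconcile(desired, actual), actual)
# After one pass the states match, so a second pass has nothing to do.
```

In a real controller the loop would be re-triggered by watch events rather than run once, which is where the scaling work described above comes in.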
|
||
How? | ||
- Introduce a Kubernetes-like storage interface into the etcd client | ||
- Provide etcd primitives for the reconciliation protocol | ||
- Strip out the Kubernetes watch cache and make it part of the etcd client. | ||
- Use the watch cache in the client to build an eventually consistent etcd proxy. | ||
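
The watch cache items above can be pictured as a client-side map that is kept up to date by a stream of watch events and serves reads locally, which makes those reads eventually consistent. The class below is a simplified sketch under stated assumptions: `WatchCache` and its event shape are hypothetical, and it assumes one event per revision, whereas a real etcd transaction can emit several events at the same revision.

```python
from typing import Dict, Optional


class WatchCache:
    """Toy client-side cache fed by ordered watch events."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.revision = 0  # highest revision applied so far

    def apply_event(self, revision: int, op: str, key: str,
                    value: Optional[str] = None) -> bool:
        """Apply one watch event; return False if it was stale or a duplicate."""
        if op not in ("put", "delete"):
            raise ValueError("unknown operation: " + op)
        # Simplification: exactly one event per revision, so anything at or
        # below the current revision has already been applied.
        if revision <= self.revision:
            return False
        if op == "put":
            self._data[key] = value
        else:
            self._data.pop(key, None)
        self.revision = revision
        return True

    def get(self, key: str) -> Optional[str]:
        """Serve a read from the local cache (may lag behind the server)."""
        return self._data.get(key)
```

A reader that needs stronger guarantees could compare `revision` against a revision it has already observed and wait until the cache catches up; that policy is left out here.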
|
||
[downgrade/upgrade etcd]: https://github.com/kubernetes/kubernetes/tree/master/cluster/images/etcd | ||
|
||
## Etcd scales beyond Kubernetes dimensions | ||
|
||
Etcd has proven its scalability by enabling Kubernetes clusters of up to 5,000 nodes. | ||
However, as the cloud native ecosystem has evolved, new projects have been built on top of Kubernetes. | ||
These projects, such as [KCP] (a multi-cluster control plane) and [Kueue] (a batch job queuing system), | ||
have different scalability requirements than pure Kubernetes. | ||
For example, they need support for larger storage sizes and higher throughput. | ||
|
||
Etcd's strong points are its reliable raft and efficient watch implementation. | ||
However, its storage capabilities are not as strong. | ||
To address this, we should look into growing our storage capabilities and making them more flexible depending on the use case. | ||
|
||
How? | ||
- Well-defined and tested scalability dimensions | ||
- Increase raft throughput (async and batch proposal handling) | ||
- Increasing bbolt supported storage size | ||
- Pluggable storage layer | ||
- Hybrid clusters with write and read optimized members | ||
|
||
|
||
[KCP]: https://cloud.redhat.com/blog/an-introduction-to-kcp | ||
[Kueue]: https://github.com/kubernetes-sigs/kueue | ||
|
Review comment: Is this label ready? kubernetes/kubernetes#118077 (comment) Currently, we have the area/etcd label.
Reply: Thanks for the reminder, this is done now: kubernetes/test-infra#30948