\ No newline at end of file
+404: This page could not be found
404
This page could not be found.
\ No newline at end of file
diff --git a/_next/data/0-nV1MzsWdmIJ2GMWdViW/index.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/index.json
new file mode 100644
index 0000000..cfb6815
--- /dev/null
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/index.json
@@ -0,0 +1 @@
+{"pageProps":{"posts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26","preview":"Recently I've been reading about the Chord distributed hash table network for a class. The goal of Chord is to map any key (e.g., a file nam..."},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26","preview":"This is a follow up to my post on Consistent Hashing -- see that first\nConsistent hashing is a technique used for distributed hash tables th..."},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11","tags":"async,asyncio,python,coroutine,generator","preview":"Python supports generators which allow you to .send() and recieve (via next(...)) values. They are kind of like channels since they don't bl..."},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01","preview":"When I started as a product manager at Roblox, I didn't really even know what a product manager does. After doing it, I can confidently say ..."},{"title":"Einsum","slug":"einsum","date":"2022-03-14","tags":"math","preview":"The basic idea for einstein notation is to drop the ∑\\sum∑ from summations in some cases (reducing notational complexity). For instance, you..."},{"title":"Entropy","slug":"entropy","date":"2021-12-28","tags":"Information Theory","preview":"Inspired by this video on compression, I wanted to understand what carrying information actually means, from a few interesting examples rela..."},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11","tags":"sudo, su, bash, sh, shell","preview":"Sometimes you want to run without sudo if you can... and run with sudo if you can't.\n_sudo.sh\n$@ || sudo $@\n..."},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11","tags":"Competition, Focus","preview":"Roku feels like a business being cannabalized by competitors with deeper pockets and tall vertical integration. 
For example, Amazon or Apple..."}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/PMing.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/PMing.json
new file mode 100644
index 0000000..2261428
--- /dev/null
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/PMing.json
@@ -0,0 +1 @@
+{"pageProps":{"post":{"title":"Lessons from Product Managing","date":"2023-06-01","slug":"PMing","content":"
When I started as a product manager at Roblox, I didn't really even know what a product manager does. After doing it, I can confidently say that I still don't know. Every PM seems to do it differently.
\n
There are some general trends, though. All the good PMs seem to be constantly thinking about the following things:
\n\n
The Customer - There's a trap of letting the metrics you measure replace the customer. Adoption and retention don't tell you how the customer gets value from the product. It happens so gradually, though, that you don't realize you've stopped thinking about the customer. And this is the most obvious part of your job as a PM! So it's important not to lose sight of them.
\n
Getting Stuff Done - A good PM is effective not when they issue product directives from on high, but when they work really closely with engineers, designers, and data scientists to fill in the gaps left by the experts. That might mean making the first version of the design to reduce the load on your designer, or filling out paperwork for your engineers. The basic function of the PM is to reduce the mental load to allow builders to build.
\n
Simplify - There are two parts: strong opinions held loosely and clear communication. Strong opinions held loosely, with justification, allow your team to push back on you and converge sooner. Clear, simple communication uses bullet points to make it obvious who needs to work on what and where the open questions lie. Many PMs don't dive deep into technical details because it impedes this function.
\n
Push, but not too hard - The PMs I've seen ask lots of why questions. But they never push too hard on any one thing or undermine the more technical members of their team -- instead, they try to find a way to deliver value to customers around technical constraints.
\n
Constantly Communicate - This is the advice I think is most applicable to a start-up. Having founders who constantly communicate, even around seemingly tiny milestones, keeps the entire team grounded in the product and generates forward progress.
\n
Seek Contradiction - This is hard to stomach when you near a deadline, but it's an incredible way to discover your underlying assumptions. Talk to other PMs, engineers, marketing people, privacy and safety people -- even (especially!) if you don't like what they have to say.
\n
Global Tradeoffs - Another trap in PMing is optimizing for your own product's success at the expense of everything else. Great PMs \"take the long view\" and optimize for a great user experience. Sometimes this means forgoing the low-hanging fruit for a better customer experience.
\n"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/_sudo.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/_sudo.json
new file mode 100644
index 0000000..0578d6b
--- /dev/null
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/_sudo.json
@@ -0,0 +1 @@
+{"pageProps":{"post":{"title":"Try again with sudo: _sudo","date":"2021-11-11","slug":"_sudo","content":"
Sometimes you want to run without sudo if you can... and run with sudo if you can't.
\n
_sudo.sh
\n
\"$@\" || sudo \"$@\"\n
"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/competition-focus.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/competition-focus.json
similarity index 68%
rename from _next/data/nOPRxLI7Ag7g6KN_vyVyX/p/competition-focus.json
rename to _next/data/0-nV1MzsWdmIJ2GMWdViW/p/competition-focus.json
index 00f4af1..f07d5b6 100644
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/competition-focus.json
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/competition-focus.json
@@ -1 +1 @@
-{"pageProps":{"post":{"title":"Roku & Competition","date":"2021-11-11","slug":"competition-focus","content":"
Roku feels like a business being cannabalized by competitors with deeper pockets and tall vertical integration. For example, Amazon or Apple produce hardware and the software running on it and the media running on the software. While Roku does produce it's originals, it is not as keenly invested in pushing it to gain mindshare except to allow more space for their ads.
\n
But Roku also leverages the fact that it is not as vertically integrated to create values where its competitors can't in a couple of ways:
\n\n
allowing conflicting ads. Where Amazon would be reluctant to advertise the new Netflix show since that would cannabalize Prime Video, Roku has less qualms about this.
\n
self-incentivizing building a better TV experience across the board. Google or Apple may want to push their brand of TV, which is less the case with Roku.
"},"morePosts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
+{"pageProps":{"post":{"title":"Roku & Competition","date":"2021-11-11","slug":"competition-focus","content":"
Roku feels like a business being cannibalized by competitors with deeper pockets and tall vertical integration. For example, Amazon and Apple produce the hardware, the software running on it, and the media running on the software. While Roku does produce its own originals, it is not as keenly invested in pushing them to gain mindshare, except to create more space for its ads.
\n
But Roku also leverages the fact that it is not as vertically integrated to create value where its competitors can't, in a couple of ways:
\n\n
allowing conflicting ads. Where Amazon would be reluctant to advertise a new Netflix show, since that would cannibalize Prime Video, Roku has fewer qualms about this.
\n
being incentivized to build a better TV experience across the board. Google or Apple may want to push their own brand of TV, which is less the case with Roku.
"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/consistent-hashing.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/consistent-hashing.json
new file mode 100644
index 0000000..6fa8116
--- /dev/null
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/consistent-hashing.json
@@ -0,0 +1 @@
+{"pageProps":{"post":{"title":"Consistent Hashing","date":"2023-10-26","slug":"consistent-hashing","content":"
Recently I've been reading about the Chord distributed hash table network for a class. The goal of Chord is to map any key (e.g., a file name) to the server that is responsible. This mapping has to be maintained without any centralization: you can ask any server in the network for which other server is responsible for a key, and get an answer, even as servers enter and leave the network as they please. At the core of Chord and most other DHTs is consistent hashing, a beautifully simple idea.
\n
In order for the system to be distributed in an appreciable way, the mapping from keys to servers needs to be somewhat even; in other words, any arbitrary set of keys is likely to be evenly distributed among the servers. At the same time, this mapping needs to be consistent: all servers need to agree on which server is responsible for a key, even as servers come and go and key responsibility is reassigned.
\n
These two requirements are the problem that consistent hashing solves. It provides a way of taking hash values, which are uniformly distributed, and mapping them to servers in a way that is consistent no matter which server is doing the mapping. This is in contrast to the way we do hashing in a hashtable, which requires us to mod the hash by the size of the table and thereby introduces a dependence on the number of servers in the network at any time. We have no guarantees about the size or composition of the network in the distributed setting, so consistent hashing solves the problem in a way that traditional hashtables can't.
\n
Conceptually, there is no big jump here. We merely compute the hash for each key and then find some consistent way of mapping that to a node. In the case of Chord, we create this assignment by hashing the server identifiers as well, and then mapping each key hash to the closest succeeding server hash. In this way, each node that knows the other nodes in the network can map a key to a node. No matter whether nodes leave or enter, each node will continue to be able to map the key. And no matter how nodes enter and leave, the mapping will stay close to uniform. These are powerful properties.
"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
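The ring assignment described in the post can be sketched in a few lines of Python. This is a toy, not Chord itself: the use of md5 and the server names are illustrative choices, and a real deployment would also place multiple virtual points per server.

```python
import bisect
import hashlib

def h(key: str) -> int:
    # Map a string uniformly onto a large integer hash space.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        # Place each server on the ring at the hash of its identifier.
        self.points = sorted((h(s), s) for s in servers)

    def owner(self, key: str) -> str:
        # A key belongs to the closest succeeding server hash,
        # wrapping around past the top of the ring to the first server.
        i = bisect.bisect(self.points, (h(key), ""))
        return self.points[i % len(self.points)][1]

ring = Ring(["server-a", "server-b", "server-c"])
print(ring.owner("some-file.txt"))
```

Note the consistency property: removing a server only remaps the keys that server owned; every other key keeps its owner, which is exactly what the mod-by-table-size scheme fails to provide.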
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/einsum.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/einsum.json
similarity index 99%
rename from _next/data/nOPRxLI7Ag7g6KN_vyVyX/p/einsum.json
rename to _next/data/0-nV1MzsWdmIJ2GMWdViW/p/einsum.json
index baf6584..d7748f5 100644
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/einsum.json
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/einsum.json
@@ -1 +1 @@
-{"pageProps":{"post":{"title":"Einsum","date":"2022-03-14","slug":"einsum","content":"
The basic idea for einstein notation is to drop the ∑ from summations in some cases (reducing notational complexity). For instance, you might want to compute the matrix multiplication between a row vector and a column vector: ∑ixi∗yi is xiyi in einsum.
The index is just i or j or the like. Repeated indices mean they appear more than once in a single term.
\n
Examples:
\n
\n
xiyi=∑ixiyi ☑️
\n
\n
\n\n
Each index can appear at most twice in any term.
\n\n
\n
This means that even the indices you intend to sum over should be repeated at most twice (obviously indices you don't want to sum over should be repeated at most once)
\n
Examples:
\n\n
xiyizi ✖️
\n
xiyjzi ☑️
\n\n
\n\n
Each term must contain identical non-repeated indices.
\n\n
\n
This is a rule that only applies when you have more than one term. It specifies that if you use a term that only appears once in the first term (a non-repeating term), then you should make an effort to use the same index where there are non-repeating indices. To my mind, this does not always make things intuitive (see the examples below), and it is not emphasized among many Einstein notation users.
\n
Examples:
\n\n
xiyizj+xicp ✖️
\n
xiyizj+xicj ☑️
\n\n
Does this imply that the index j in zj equals the index j in cj? It's a little unclear to my eye.
\n
Applications of Einstein Notation
\n
Kronecker Delta
\n
The Kronecker delta is a pretty simple idea that says the following (written as pseudocode)
\n
δij:
\n
if i == j:\n 1\nelse: \n 0\n
\n
It allows a neat rewriting of the dot product as δijxiyj, or the trace as δijAij. It also allows a nice expression for matrix multiplication δjkAijBki.
\n
Levi-Civita Permutation Tensor
\n
The ϵ tensor (called so because it can be described as N-D array) helps compute cyclic things, which arise in cross-products and determinants.
\n
in 3d we can describe the tensor as follows:
\n
ϵijk
\n
if i == j or j == k or i == k:\n 0\nelse if you can shift ijk s.t. they are in order decreasing order:\n 1\nelse:\n -1\n
\n
By shift we mean going from a sequence abc -> bca -> cab. Case 2 is called the cyclic or even case, and case 3 is the acyclic/odd case.
\n
This allows us to rewrite the cross product where Ci is hte ith component of the cross product i,j,k vary from 1 to 3):\nu×v=(uy∗vz−uz∗vy)x^+(uy∗vz−uz∗vy)y^→Ci=ϵijkujvk
\n
The Levi-Civita tensor allows for a few nice things:
\n\n
It encodes that i=j and i=k since terms go to 0 when this is violated.
\n
The signs of the 2 products in each component Ci are flipped since we have ijk and ikj, one of which must necessarily be odd and one even.
\n
It encodes the flip in sign that occurs for the y^ term (ie when i=2). This is since 213 (ie where i<j so that the expression is u1∗v3) is odd; and 231 (ie where i>j) is even.
\n"},"morePosts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
+{"pageProps":{"post":{"title":"Einsum","date":"2022-03-14","slug":"einsum","content":"
The basic idea of Einstein notation is to drop the ∑ from summations in some cases (reducing notational complexity). For instance, you might want to compute the product of a row vector and a column vector: ∑_i x_i∗y_i is just x_i y_i in einsum.
The index is just i or j or the like. Repeated indices are those that appear more than once in a single term.
\n
Examples:
\n
\n
x_i y_i = ∑_i x_i y_i ☑️
\n
\n
\n\n
Each index can appear at most twice in any term.
\n\n
\n
This means that even the indices you intend to sum over should appear at most twice (and indices you don't want to sum over should appear at most once).
\n
Examples:
\n\n
x_i y_i z_i ✖️
\n
x_i y_j z_i ☑️
\n\n
\n\n
Each term must contain identical non-repeated indices.
\n\n
\n
This is a rule that only applies when you have more than one term. It specifies that if an index appears only once in the first term (a non-repeating index), then you should use the same index wherever the other terms have non-repeating indices. To my mind, this does not always make things intuitive (see the examples below), and it is not emphasized by many Einstein notation users.
\n
Examples:
\n\n
x_i y_i z_j + x_i c_p ✖️
\n
x_i y_i z_j + x_i c_j ☑️
\n\n
Does this imply that the index j in z_j equals the index j in c_j? It's a little unclear to my eye.
\n
Applications of Einstein Notation
\n
Kronecker Delta
\n
The Kronecker delta is a pretty simple idea that says the following (written as pseudocode)
\n
δ_ij:
\n
if i == j:\n 1\nelse: \n 0\n
\n
It allows a neat rewriting of the dot product as δ_ij x_i y_j, or the trace as δ_ij A_ij. It also allows a nice expression for the trace of a product: δ_jk A_ij B_ki = tr(AB).
\n
Levi-Civita Permutation Tensor
\n
The ϵ tensor (so called because it can be described as an N-D array) helps compute cyclic things, which arise in cross products and determinants.
\n
In 3D we can describe the tensor as follows:
\n
ϵ_ijk:
\n
if i == j or j == k or i == k:\n 0\nelse if you can shift ijk s.t. they are in increasing order:\n 1\nelse:\n -1\n
\n
By shift we mean going from a sequence abc -> bca -> cab. Case 2 is called the cyclic or even case, and case 3 is the acyclic/odd case.
\n
This allows us to rewrite the cross product, where C_i is the ith component of the cross product (i, j, k vary from 1 to 3):\nu×v = (u_y∗v_z − u_z∗v_y)x̂ + (u_z∗v_x − u_x∗v_z)ŷ + (u_x∗v_y − u_y∗v_x)ẑ → C_i = ϵ_ijk u_j v_k
\n
The Levi-Civita tensor allows for a few nice things:
\n\n
It encodes that i≠j and i≠k, since terms go to 0 when this is violated.
\n
The signs of the 2 products in each component C_i are flipped since we have ijk and ikj, one of which must necessarily be odd and the other even.
\n
It encodes the flip in sign that occurs for the ŷ term (i.e., when i=2). This is since 213 (i.e., where j<k, so that the expression is u_1∗v_3) is odd; and 231 (i.e., where j>k) is even.
\n"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
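The rules above map directly onto numpy.einsum, which is a handy way to sanity-check them; a small sketch (the arrays here are arbitrary examples, not from the post):

```python
import numpy as np

x = np.arange(3.0)
y = np.array([2.0, 3.0, 4.0])
A = np.arange(9.0).reshape(3, 3)
B = A + 1.0

# x_i y_i : repeated index i is summed, giving the dot product
assert np.einsum("i,i->", x, y) == x @ y

# delta_ij A_ij : forcing i = j and summing gives the trace
assert np.einsum("ii->", A) == np.trace(A)

# A_ij B_jk : j summed, free indices i and k survive -> matrix product
assert np.allclose(np.einsum("ij,jk->ik", A, B), A @ B)

# eps_ijk u_j v_k : the cross product via the Levi-Civita tensor
eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1    # even (cyclic) permutations
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1   # odd permutations
u, v = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
assert np.allclose(np.einsum("ijk,j,k->i", eps, u, v), np.cross(u, v))
```

The subscript string plays the role of the index bookkeeping: indices repeated across operands are summed, and indices after `->` are the free ones.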
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/entropy.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/entropy.json
similarity index 96%
rename from _next/data/nOPRxLI7Ag7g6KN_vyVyX/p/entropy.json
rename to _next/data/0-nV1MzsWdmIJ2GMWdViW/p/entropy.json
index 0c5c384..94f1014 100644
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/entropy.json
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/entropy.json
@@ -1 +1 @@
-{"pageProps":{"post":{"title":"Entropy","date":"2021-12-28","slug":"entropy","content":"
Inspired by this video on compression, I wanted to understand what carrying information actually means, from a few interesting examples relating to repeated random events (like how much information is required to encode flipping a coin 100, 1000, or more times).
\n
Intuition for Entropy
\n
Shannon entropy is the expectation of the number of bits required to encode a particular symbol.
\n
Imagine if you took a character out of a string like \"aaaaaaaaaaaa...\". Since you know that every character is an a, there is actually very little information encoded (no information, if the length of the string has already been given). Similarly if you had a 99.9% chance of an a, there is still very little information encoded.
\n
Therefore we can see that the entropy is inverse of the probability -- kind of. If the probability of an a gets really small, then you similarly dont need to encode it very often so its entropy becomes low. Really, we want a parabolic-looking function for the probability:
\n
∑in−pilogpi s.t. ∑inpi=1
\n
Here's a look at what the function inside the sigma looks like (using base 2). As p→0 the linear p term dominates, while in the middle, the logarithmic term plays a big role:\n
\n
Here's maybe a more useful view. Red is function we are examining, green is a linear function and blue is a negative log:\n
\n
One thing that might be confusing is how exactly the log of base 2 related to the number of bits of information. For a second imagine that we use base 10 instead of base 2. Now imagine the probability was .1, then .01, .001 etc. If we wanted to encode that these were the probabilities, then we would need to use −log10(p) digits. It's a similar thing for base 2, if we now consider numbers encoded in binary. Note that this isn't so much about the precision of the probability as it is about the magnitude; if we literally wanted to encode the exact probability and it were .100000000000000001 then we might need to use many more digits than the entropy predicts, even though it makes little difference.
\n
Note that non-power of 2 probabilities, this explanation implies fractional binary digits, but we can accept this as the nature of such a metric.
\n
Applying entropy
\n
It's useful to think of where we can find entropy both in the natural sciences and in statistics.
\n
The link to entropy in science
\n
Entropy in physics, for example, is often stated as the number of microstates a system can achieve, ie all the ways the particles can be arranged etc. This is true, but there is also a probability component -- as probability diminishes to 0, the entropy a microstate contributes is small (it's so unlikely to occur). This mirrors the intuition for information theoretical entropy.
\n
Entropy of the number of successes with the number of trials? (Binomial)
\n
Let X be the binomial random variable that denotes the number of successes of n bernouli trials. The probability mass function is given by Pr[X=i]=pi(1−p)n−i. We aim to find ∑i=1npi(1−p)n−ilog(pi(1−p)n−i).
We see some interesting behavior. For a small number of trials, the entropy is small, but increases then rapidly drops to 0. The fact that the entropy rises for the first couple trials illustrates that entropy rises as more cases are possible (you can have a greater range of values in a binomial random variable of 2 trials than on 1 trial). The fact that it eventually limits to 0 shows us the value of repeated trials in increasing our certainty and thereby reducing entropy.
"},"morePosts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
+{"pageProps":{"post":{"title":"Entropy","date":"2021-12-28","slug":"entropy","content":"
Inspired by this video on compression, I wanted to understand what carrying information actually means, from a few interesting examples relating to repeated random events (like how much information is required to encode flipping a coin 100, 1000, or more times).
\n
Intuition for Entropy
\n
Shannon entropy is the expectation of the number of bits required to encode a particular symbol.
\n
Imagine if you took a character out of a string like \"aaaaaaaaaaaa...\". Since you know that every character is an a, there is actually very little information encoded (no information, if the length of the string has already been given). Similarly if you had a 99.9% chance of an a, there is still very little information encoded.
\n
Therefore we can see that the entropy is inversely related to the probability -- kind of. If the probability of an a gets really small, then you similarly don't need to encode it very often, so its entropy contribution becomes low. Really, we want a parabolic-looking function of the probability:
\n
∑_i^n −p_i log p_i  s.t.  ∑_i^n p_i = 1
\n
Here's a look at what the function inside the sigma looks like (using base 2). As p→0 the linear p term dominates, while in the middle, the logarithmic term plays a big role:\n
\n
Here's maybe a more useful view. Red is the function we are examining, green is a linear function, and blue is a negative log:\n
\n
One thing that might be confusing is how exactly the base-2 log relates to the number of bits of information. For a second, imagine that we use base 10 instead of base 2. Now imagine the probability was .1, then .01, .001, etc. If we wanted to encode an event with one of these probabilities, then we would need to use −log10(p) digits. It's a similar thing for base 2, if we now consider numbers encoded in binary. Note that this isn't so much about the precision of the probability as it is about the magnitude; if we literally wanted to encode the exact probability and it were .100000000000000001, then we might need many more digits than the entropy predicts, even though it makes little difference.
\n
Note that for non-power-of-2 probabilities, this explanation implies fractional binary digits, but we can accept this as the nature of such a metric.
\n
Applying entropy
\n
It's useful to think of where we can find entropy both in the natural sciences and in statistics.
\n
The link to entropy in science
\n
Entropy in physics, for example, is often stated as the number of microstates a system can achieve, ie all the ways the particles can be arranged etc. This is true, but there is also a probability component -- as probability diminishes to 0, the entropy a microstate contributes is small (it's so unlikely to occur). This mirrors the intuition for information theoretical entropy.
\n
Entropy of the number of successes with the number of trials? (Binomial)
\n
Let X be the binomial random variable that denotes the number of successes in n Bernoulli trials. The probability mass function is given by Pr[X=i] = C(n,i) p^i (1−p)^(n−i). We aim to find the entropy −∑_{i=0}^n Pr[X=i] log(Pr[X=i]).
We see some interesting behavior. For a small number of trials, the entropy is small, but increases then rapidly drops to 0. The fact that the entropy rises for the first couple trials illustrates that entropy rises as more cases are possible (you can have a greater range of values in a binomial random variable of 2 trials than on 1 trial). The fact that it eventually limits to 0 shows us the value of repeated trials in increasing our certainty and thereby reducing entropy.
"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
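The curves discussed above (the −p log₂ p term inside the sigma, and the entropy of a two-outcome source) are easy to evaluate directly; a minimal sketch:

```python
import math

def term(p: float) -> float:
    # The function inside the sigma: -p * log2(p); 0 by convention at p = 0.
    return 0.0 if p == 0 else -p * math.log2(p)

def binary_entropy(p: float) -> float:
    # Entropy of a source with outcome probabilities p and 1 - p:
    # the sum of the term for each outcome.
    return term(p) + term(1 - p)

# A fair coin carries exactly one bit per flip...
print(binary_entropy(0.5))    # 1.0
# ...while the 99.9%-chance-of-an-a source carries very little information.
print(binary_entropy(0.999))  # close to 0
```

Plotting `term(p)` over [0, 1] reproduces the parabolic-looking shape described above: near p = 0 the linear p factor pulls it down, and near p = 1 the log factor does.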
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/py-generator-couroutines.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/py-generator-couroutines.json
similarity index 72%
rename from _next/data/nOPRxLI7Ag7g6KN_vyVyX/p/py-generator-couroutines.json
rename to _next/data/0-nV1MzsWdmIJ2GMWdViW/p/py-generator-couroutines.json
index 76118aa..f9a5317 100644
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/py-generator-couroutines.json
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/py-generator-couroutines.json
@@ -1 +1 @@
-{"pageProps":{"post":{"title":"Python Coroutines!?!?","date":"2023-06-11","slug":"py-generator-couroutines","content":"
Python supports generators which allow you to .send() and recieve (via next(...)) values. They are kind of like channels since they don't block until you send or recieve.
\n
In the code below, we use callbacks (called aperiodically in a separate thread) to send values to our channel. Simultaneously, we try to consume those values, which should be allowed because generators and our coroutine are non-blocking.
\n
import time, threading\n\ndef channel(x=\"Hello\"):\n while True:\n x = yield x\n\ndef make_coroutine(callback):\n def coroutine():\n callback()\n threading.Timer(1, coroutine).start()\n return coroutine\n\ndef make_callback():\n chan = channel()\n # prime the channel\n next(chan)\n\n def callback():\n print(\"Calling!\")\n chan.send(\"Hello world!\")\n \n return chan, callback\n\ndef main():\n chan, callback = make_callback()\n coroutine = make_coroutine(callback)\n coroutine()\n for i in chan:\n if i is not None:\n print(i)\n\nif __name__ == \"__main__\":\n main()\n
\n
And yet, this approach doesn't work as expected! One of three things happens:
\n\n
ValueError: generator already executing
\n
only None values are output from the channel
\n\n
Why doesn't this work? email me if you have an answer.
"},"morePosts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
+{"pageProps":{"post":{"title":"Python Coroutines!?!?","date":"2023-06-11","slug":"py-generator-couroutines","content":"
Python supports generators which allow you to .send() and receive (via next(...)) values. They are kind of like channels since they don't block until you send or receive.
\n
In the code below, we use callbacks (called periodically in a separate thread) to send values to our channel. Simultaneously, we try to consume those values, which should be allowed because generators and our coroutine are non-blocking.
\n
import time, threading\n\ndef channel(x=\"Hello\"):\n while True:\n x = yield x\n\ndef make_coroutine(callback):\n def coroutine():\n callback()\n threading.Timer(1, coroutine).start()\n return coroutine\n\ndef make_callback():\n chan = channel()\n # prime the channel\n next(chan)\n\n def callback():\n print(\"Calling!\")\n chan.send(\"Hello world!\")\n \n return chan, callback\n\ndef main():\n chan, callback = make_callback()\n coroutine = make_coroutine(callback)\n coroutine()\n for i in chan:\n if i is not None:\n print(i)\n\nif __name__ == \"__main__\":\n main()\n
\n
And yet, this approach doesn't work as expected! One of three things happens:
\n\n
ValueError: generator already executing
\n
only None values are output from the channel
\n\n
Why doesn't this work? email me if you have an answer.
"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/rendevous-hashing.json b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/rendevous-hashing.json
new file mode 100644
index 0000000..ed2f33d
--- /dev/null
+++ b/_next/data/0-nV1MzsWdmIJ2GMWdViW/p/rendevous-hashing.json
@@ -0,0 +1 @@
+{"pageProps":{"post":{"title":"Rendevous Hashing","date":"2023-10-26","slug":"rendevous-hashing","content":"
Consistent hashing is a technique used for distributed hash tables that assigns each key to a server in the network. It does this by hashing everything and assigning each key to the closest succeeding server hash value. Rendevous hashing is the generalization of this concept to assign a key to k nodes. You can think about wanting to replicate a file (where its name is the key) across k servers -- you need something more than consistent hashing. Another situation is where you expect that only a small number of nodes will not be part of the network at any given time (e.g., because failure probability is low but non-zero, and nodes come back over time); in this case consistent hashing would require you to rapidly reconstruct each server's mapping from key ranges to servers so that you can still query for the key from any node. Rendevous hashing solves this inefficiency by storing more keys while preserving the nice properties of load-balancing and consistency that consistent hashing provides.
\n
The idea behind rendevous hashing is that each server assigns scores to all servers (including itself) for a given key. Then they route the key to the top scorer (or to the top k, if you are rendevous hashing for replication). In particular, this scoring setup needs to be consistent in that servers need to be able to agree on their score. The scores should also be evenly distributed for load-balancing purposes. The scoring mechanism that fulfills both these things is simply the hash of the server's id concatenated to the key in question.
\n
Read more on Randorithms.com, which has a much better write-up.
"},"morePosts":[{"title":"Consistent Hashing","slug":"consistent-hashing","date":"2023-10-26"},{"title":"Rendevous Hashing","slug":"rendevous-hashing","date":"2023-10-26"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-06-01"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/index.json b/_next/data/nOPRxLI7Ag7g6KN_vyVyX/index.json
deleted file mode 100644
index f1687e5..0000000
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/index.json
+++ /dev/null
@@ -1 +0,0 @@
-{"pageProps":{"posts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23","preview":"When I started interning as a product manager at Roblox, I didn't really even know what a product manager does. 10 weeks in, I can confident..."},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11","tags":"async,asyncio,python,coroutine,generator","preview":"Python supports generators which allow you to .send() and recieve (via next(...)) values. They are kind of like channels since they don't bl..."},{"title":"Einsum","slug":"einsum","date":"2022-03-14","tags":"math","preview":"The basic idea for einstein notation is to drop the ∑\\sum∑ from summations in some cases (reducing notational complexity). For instance, you..."},{"title":"Entropy","slug":"entropy","date":"2021-12-28","tags":"Information Theory","preview":"Inspired by this video on compression, I wanted to understand what carrying information actually means, from a few interesting examples rela..."},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11","tags":"sudo, su, bash, sh, shell","preview":"Sometimes you want to run without sudo if you can... and run with sudo if you can't.\n_sudo.sh\n$@ || sudo $@\n..."},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11","tags":"Competition, Focus","preview":"Roku feels like a business being cannabalized by competitors with deeper pockets and tall vertical integration. For example, Amazon or Apple..."}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/PMing.json b/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/PMing.json
deleted file mode 100644
index 4371b41..0000000
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/PMing.json
+++ /dev/null
@@ -1 +0,0 @@
-{"pageProps":{"post":{"title":"Lessons from Product Managing","date":"2023-07-23","slug":"PMing","content":"
When I started interning as a product manager at Roblox, I didn't really even know what a product manager does. 10 weeks in, I can confidently say that I still don't know. Every PM seems to do it differently.
\n
There are some general trends, though. All the good PMs seem to be constantly thinking about the following things:
\n\n
The Customer - There's a trap of using the metrics you measure to replace the customer. Adoption and retention don't tell you how the customer gets value from the product. It happens so gradually, though, that you don't realize that you've stopped thinking about the customer. And this is the most obvious part of your job as a PM! So it's important not to forget about the customer.
\n
Getting Stuff Done - A good PM is effective not when they issue product directives from on high, but when they work really closely with engineers, designers, and data scientists to fill in the gaps left by the experts. That might mean making the first version of the design to reduce the load on your designer, or filling out paperwork for your engineers. The basic function of the PM is to reduce the mental load to allow builders to build.
\n
Simplify - There are two parts: strong opinions held loosely and clear communication. Strong opinions held loosely, with justification, allow your team to push back on you and converge sooner. Clear, simple communication uses bullet points to make it obvious who needs to work on what and where the open questions lie. Many PMs don't dive deep into technical details because it impedes this function.
\n
Push, but not too hard - The PMs I've seen asks lots of why questions. But they never push too hard on any one thing or undermine the more technical members of their team -- instead, they try to find a way to deliver value to customers around technical constraints.
\n
Constantly Communicate - This is the advice I think is most applicable to a start-up. Having founders who constantly communicate, even around seemingly tiny milestones keeps the entire team grounded in the product and generates forward progress.
\n
Seek Contradiction - this is hard to stomach when you near a deadline, but an incredible way to discover your underlying assumptions. Talk to other PMs, engineers, marketing people, privacy and safety people even (especially!) if you don't like what they will have to say.
\n
Global Tradeoffs - Another trap in PMing is optimizing for your own product's success at the expense of everything else. Great PMs \"take the long view\" and optimize for a great user experience. Sometimes this means forgoing the low-hanging fruit for a better customer experience.
\n"},"morePosts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/_sudo.json b/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/_sudo.json
deleted file mode 100644
index b05278e..0000000
--- a/_next/data/nOPRxLI7Ag7g6KN_vyVyX/p/_sudo.json
+++ /dev/null
@@ -1 +0,0 @@
-{"pageProps":{"post":{"title":"Try again with sudo: _sudo","date":"2021-11-11","slug":"_sudo","content":"
Sometimes you want to run without sudo if you can... and run with sudo if you can't.
\n
_sudo.sh
\n
$@ || sudo $@\n
"},"morePosts":[{"title":"Lessons from Product Managing","slug":"PMing","date":"2023-07-23"},{"title":"Python Coroutines!?!?","slug":"py-generator-couroutines","date":"2023-06-11"},{"title":"Einsum","slug":"einsum","date":"2022-03-14"},{"title":"Entropy","slug":"entropy","date":"2021-12-28"},{"title":"Try again with sudo: _sudo","slug":"_sudo","date":"2021-11-11"},{"title":"Roku & Competition","slug":"competition-focus","date":"2021-11-11"}]},"__N_SSG":true}
\ No newline at end of file
diff --git a/_next/static/nOPRxLI7Ag7g6KN_vyVyX/_buildManifest.js b/_next/static/0-nV1MzsWdmIJ2GMWdViW/_buildManifest.js
similarity index 100%
rename from _next/static/nOPRxLI7Ag7g6KN_vyVyX/_buildManifest.js
rename to _next/static/0-nV1MzsWdmIJ2GMWdViW/_buildManifest.js
diff --git a/_next/static/nOPRxLI7Ag7g6KN_vyVyX/_ssgManifest.js b/_next/static/0-nV1MzsWdmIJ2GMWdViW/_ssgManifest.js
similarity index 100%
rename from _next/static/nOPRxLI7Ag7g6KN_vyVyX/_ssgManifest.js
rename to _next/static/0-nV1MzsWdmIJ2GMWdViW/_ssgManifest.js
diff --git a/error.html b/error.html
index 3a50534..06c0d04 100644
--- a/error.html
+++ b/error.html
@@ -1 +1 @@
-404 | aadalal
\ No newline at end of file
diff --git a/index.html b/index.html
index 559f46a..49353c9 100644
--- a/index.html
+++ b/index.html
@@ -1,4 +1,5 @@
-aagam's blog
Roku feels like a business being cannibalized by competitors with deeper pockets and tall vertical integration. For example, Amazon or Apple...
\ No newline at end of file
diff --git a/p/PMing.html b/p/PMing.html
index ecaeb12..676dd87 100644
--- a/p/PMing.html
+++ b/p/PMing.html
@@ -1,4 +1,4 @@
-Lessons from Product Managing | aadalal
When I started interning as a product manager at Roblox, I didn't really even know what a product manager does. 10 weeks in, I can confidently say that I still don't know. Every PM seems to do it differently.
When I started as a product manager at Roblox, I didn't really even know what a product manager does. After doing it, I can confidently say that I still don't know. Every PM seems to do it differently.
There are some general trends, though. All the good PMs seem to be constantly thinking about the following things:
The Customer - There's a trap of using the metrics you measure to replace the customer. Adoption and retention don't tell you how the customer gets value from the product. It happens so gradually, though, that you don't realize that you've stopped thinking about the customer. And this is the most obvious part of your job as a PM! So it's important not to forget about the customer.
@@ -8,4 +8,4 @@
Constantly Communicate - This is the advice I think is most applicable to a start-up. Having founders who constantly communicate, even around seemingly tiny milestones keeps the entire team grounded in the product and generates forward progress.
Seek Contradiction - This is hard to stomach when you near a deadline, but it's an incredible way to discover your underlying assumptions. Talk to other PMs, engineers, marketing people, privacy and safety people even (especially!) if you don't like what they will have to say.
Global Tradeoffs - Another trap in PMing is optimizing for your own product's success at the expense of everything else. Great PMs "take the long view" and optimize for a great user experience. Sometimes this means forgoing the low-hanging fruit for a better customer experience.
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/p/_sudo.html b/p/_sudo.html
index d235c04..0234987 100644
--- a/p/_sudo.html
+++ b/p/_sudo.html
@@ -1,4 +1,4 @@
-Try again with sudo: _sudo | aadalal
Roku feels like a business being cannabalized by competitors with deeper pockets and tall vertical integration. For example, Amazon or Apple produce hardware and the software running on it and the media running on the software. While Roku does produce it's originals, it is not as keenly invested in pushing it to gain mindshare except to allow more space for their ads.
Roku feels like a business being cannibalized by competitors with deeper pockets and tall vertical integration. For example, Amazon or Apple produce hardware and the software running on it and the media running on the software. While Roku does produce its originals, it is not as keenly invested in pushing them to gain mindshare except to allow more space for their ads.
But Roku also leverages the fact that it is not as vertically integrated to create values where its competitors can't in a couple of ways:
allowing conflicting ads. Where Amazon would be reluctant to advertise the new Netflix show since that would cannibalize Prime Video, Roku has fewer qualms about this.
self-incentivizing building a better TV experience across the board. Google or Apple may want to push their brand of TV, which is less the case with Roku.
\ No newline at end of file
diff --git a/p/consistent-hashing.html b/p/consistent-hashing.html
new file mode 100644
index 0000000..cebb22a
--- /dev/null
+++ b/p/consistent-hashing.html
@@ -0,0 +1,4 @@
+Consistent Hashing | aadalal
Recently I've been reading about the Chord distributed hash table network for a class. The goal of Chord is to map any key (e.g., a file name) to the server that is responsible for it. This mapping has to be maintained without any centralization: you can ask any server in the network which other server is responsible for a key, and get an answer, even as servers enter and leave the network as they please. At the core of Chord and most other DHTs is consistent hashing, a beautifully simple idea.
+
In order for the system to be distributed in an appreciable way, the mapping from keys to servers needs to be somewhat even; in other words, any arbitrary set of keys is likely to be evenly distributed among the servers. At the same time, this mapping needs to be consistent: all servers need to agree on which server is responsible for a key, even as servers come and go and key responsibility is reassigned.
+
These two requirements are the problem that consistent hashing solves. It provides a way of taking hash values, which are uniformly distributed, and mapping them to servers in a way that is consistent no matter which server is doing the mapping. This is in contrast to the way we do hashing in a hashtable, which requires us to mod the hash by the size of the table and thereby introduces a dependence on the number of servers in the network at any time. We have no guarantees about the size or composition of the network in the distributed situation, so consistent hashing solves the problem in a way that traditional hashtables can't.
+
Conceptually, there is no big jump here. We merely compute the hash for each key and then find some consistent way of mapping that to a node. In the case of Chord, we create this assignment by hashing the server identifiers and then mapping each key hash to the closest succeeding server hash. In this way, each node that knows the other nodes in the network can map a key to a node. Even as nodes leave and enter, each node will continue to be able to map the key, and the mapping will stay close to uniform. These are powerful properties.
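The succeeding-hash lookup can be sketched in a few lines of Python (a toy illustration, not Chord's actual protocol -- md5 and the node names here are just stand-ins):

```python
import bisect
import hashlib

def h(s: str) -> int:
    # Any uniform hash would do; md5 is just a convenient stand-in.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes):
        # Place each node on the ring at its hash value.
        self._ring = sorted((h(n), n) for n in nodes)
        self._hashes = [nh for nh, _ in self._ring]

    def lookup(self, key: str) -> str:
        # A key belongs to the closest succeeding node hash,
        # wrapping around the ring at the top.
        i = bisect.bisect_right(self._hashes, h(key)) % len(self._ring)
        return self._ring[i][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("some-file.txt"))  # one of the three node names
```

The consistency property shows up when a node leaves: only the keys that mapped to it move, and every other key keeps its owner.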
\ No newline at end of file
diff --git a/p/einsum.html b/p/einsum.html
index bef3ecb..066b85b 100644
--- a/p/einsum.html
+++ b/p/einsum.html
@@ -1,4 +1,4 @@
-Einsum | aadalal
The basic idea for einstein notation is to drop the ∑ from summations in some cases (reducing notational complexity). For instance, you might want to compute the matrix multiplication between a row vector and a column vector: ∑ixi∗yi is xiyi in einsum.
The basic idea for Einstein notation is to drop the ∑ from summations in some cases (reducing notational complexity). For instance, you might want to compute the matrix multiplication between a row vector and a column vector: ∑_i x_i * y_i is x_i y_i in einsum.
It encodes that i=j and i=k since terms go to 0 when this is violated.
The signs of the 2 products in each component Ci are flipped since we have ijk and ikj, one of which must necessarily be odd and one even.
It encodes the flip in sign that occurs for the y^ term (ie when i=2). This is since 213 (ie where i<j so that the expression is u1∗v3) is odd; and 231 (ie where i>j) is even.
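The notation maps directly onto NumPy's einsum; a quick sketch (the arrays are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# A repeated index is summed over; empty output subscripts mean a scalar.
dot = np.einsum("i,i->", x, y)  # same as (x * y).sum()

# Matrix multiply: the shared index k is contracted away.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = np.einsum("ik,kj->ij", A, B)  # same as A @ B
```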
-
\ No newline at end of file
+
\ No newline at end of file
diff --git a/p/entropy.html b/p/entropy.html
index 3883e9d..7b29aea 100644
--- a/p/entropy.html
+++ b/p/entropy.html
@@ -1,4 +1,4 @@
-Entropy | aadalal
Inspired by this video on compression, I wanted to understand what carrying information actually means, from a few interesting examples relating to repeated random events (like how much information is required to encode flipping a coin 100, 1000, or more times).
Inspired by this video on compression, I wanted to understand what carrying information actually means, from a few interesting examples relating to repeated random events (like how much information is required to encode flipping a coin 100, 1000, or more times).
Intuition for Entropy
Shannon entropy is the expectation of the number of bits required to encode a particular symbol.
Imagine if you took a character out of a string like "aaaaaaaaaaaa...". Since you know that every character is an a, there is actually very little information encoded (no information, if the length of the string has already been given). Similarly if you had a 99.9% chance of an a, there is still very little information encoded.
@@ -17,4 +17,4 @@
The link to entropy in science
Entropy of the number of successes with the number of trials? (Binomial)
Let X be the binomial random variable that denotes the number of successes of n Bernoulli trials. The probability mass function is given by Pr[X=i] = (n choose i) p^i (1−p)^(n−i). We aim to find the entropy −∑_{i=0}^{n} Pr[X=i] log(Pr[X=i]).
We see some interesting behavior. For a small number of trials, the entropy is small, but increases then rapidly drops to 0. The fact that the entropy rises for the first couple trials illustrates that entropy rises as more cases are possible (you can have a greater range of values in a binomial random variable of 2 trials than on 1 trial). The fact that it eventually limits to 0 shows us the value of repeated trials in increasing our certainty and thereby reducing entropy.
\ No newline at end of file
+
We see some interesting behavior. For a small number of trials, the entropy is small, but increases then rapidly drops to 0. The fact that the entropy rises for the first couple trials illustrates that entropy rises as more cases are possible (you can have a greater range of values in a binomial random variable of 2 trials than on 1 trial). The fact that it eventually limits to 0 shows us the value of repeated trials in increasing our certainty and thereby reducing entropy.
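That entropy is easy to compute numerically (a small sketch; it evaluates H(X) directly from the binomial mass function, binomial coefficient included):

```python
from math import comb, log2

def binomial_entropy(n: int, p: float) -> float:
    # H(X) = -sum_i Pr[X=i] * log2(Pr[X=i]),
    # with Pr[X=i] = C(n, i) p^i (1-p)^(n-i).
    total = 0.0
    for i in range(n + 1):
        pi = comb(n, i) * p**i * (1 - p) ** (n - i)
        if pi > 0:
            total -= pi * log2(pi)
    return total

h1 = binomial_entropy(1, 0.5)  # 1.0 bit: a single fair coin flip
h2 = binomial_entropy(2, 0.5)  # 1.5 bits over {0, 1, 2} successes
```

One way to read the closing point about repeated trials: the entropy per trial, binomial_entropy(n, 0.5) / n, falls steadily as n grows.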
\ No newline at end of file
diff --git a/p/py-generator-couroutines.html b/p/py-generator-couroutines.html
index cc45660..a309957 100644
--- a/p/py-generator-couroutines.html
+++ b/p/py-generator-couroutines.html
@@ -1,4 +1,4 @@
-Python Coroutines!?!? | aadalal
Python supports generators which allow you to .send() and recieve (via next(...)) values. They are kind of like channels since they don't block until you send or recieve.
Python supports generators which allow you to .send() and receive (via next(...)) values. They are kind of like channels since they don't block until you send or receive.
In the code below, we use callbacks (called periodically in a separate thread) to send values to our channel. Simultaneously, we try to consume those values, which should be allowed because generators and our coroutine are non-blocking.
import time, threading
@@ -39,4 +39,4 @@
ValueError: generator already executing
only None values are output from the channel
-
Why doesn't this work? email me if you have an answer.
\ No newline at end of file
+
Why doesn't this work? email me if you have an answer.
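One plausible answer (my reading, not a definitive diagnosis): a generator is a single resumable stack frame, not a thread-safe channel. The Timer thread's chan.send() can fire while the main loop is inside next(chan), which raises ValueError: generator already executing. And when they don't collide, the "Hello world!" yielded by x = yield x is returned to the sender's own .send() call, while the main loop's next() sends None and so mostly sees None. For a channel shared across threads, queue.Queue does what the generator can't; a minimal sketch:

```python
import queue
import threading

chan = queue.Queue()  # a real thread-safe channel

def make_coroutine(callback, repeats=3):
    # Same Timer-driven shape as the version above, but re-armed a
    # fixed number of times so the example terminates.
    def coroutine(n=repeats):
        callback()
        if n > 1:
            threading.Timer(0.1, coroutine, args=(n - 1,)).start()
    return coroutine

def callback():
    chan.put("Hello world!")  # safe to call from the Timer thread

make_coroutine(callback)()
for _ in range(3):
    print(chan.get())  # blocks until the next value arrives
```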
\ No newline at end of file
diff --git a/p/rendevous-hashing.html b/p/rendevous-hashing.html
new file mode 100644
index 0000000..f55ee3a
--- /dev/null
+++ b/p/rendevous-hashing.html
@@ -0,0 +1,4 @@
+Rendevous Hashing | aadalal
Consistent hashing is a technique used for distributed hash tables that assigns each key to a server in the network. It does this by hashing everything and assigning each key to the closest succeeding server hash value. Rendevous hashing is the generalization of this concept to assign a key to k nodes. You can think about wanting to replicate a file (where its name is the key) across k servers -- you need something more than consistent hashing. Another situation is where you expect that only a small number of nodes will not be part of the network at any given time (e.g., because failure probability is low but non-zero, and nodes come back over time); in this case consistent hashing would require you to rapidly reconstruct each server's mapping from key ranges to servers so that you can still query for the key from any node. Rendevous hashing solves this inefficiency by storing more keys while preserving the nice properties of load-balancing and consistency that consistent hashing provides.
+
The idea behind rendevous hashing is that each server assigns scores to all servers (including itself) for a given key. Then they route the key to the top scorer (or to the top k, if you are rendevous hashing for replication). In particular, this scoring setup needs to be consistent in that servers need to be able to agree on their score. The scores should also be evenly distributed for load-balancing purposes. The scoring mechanism that fulfills both these things is simply the hash of the server's id concatenated to the key in question.
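That scoring rule fits in a few lines (a toy highest-random-weight sketch -- md5 and the node names are illustrative stand-ins, not anything a real system prescribes):

```python
import hashlib

def score(server_id: str, key: str) -> int:
    # Every server computes the same score for (server, key), so they
    # all agree on the ranking without coordinating.
    return int(hashlib.md5((server_id + key).encode()).hexdigest(), 16)

def owners(servers, key, k=1):
    # Highest-random-weight: the key lives on the top-k scorers.
    return sorted(servers, key=lambda s: score(s, key), reverse=True)[:k]

servers = ["node-a", "node-b", "node-c", "node-d"]
print(owners(servers, "some-file.txt", k=2))  # the two replica holders
```

A nice property: if a server outside the top k disappears, the key's owners don't change at all, which is exactly the graceful behavior under churn described above.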
+
Read more on Randorithms.com, which has a much better write-up.