Add postgres dev #71
base: develop
Conversation
sql/snapshot.sql
Outdated
subheight int8 NOT NULL,
last_snapshot_hash text NULL,
created_at timestamp NULL,
updated_at timestamp NULL,
why do we need the updated_at field? afaik all data here is immutable and, once stored, never updated 🤔
That's not necessarily true. OpenSearch indexes are edited sometimes in unusual cases, for example. The only overhead for updated_at is the trigger when it's actually updated, so it doesn't really impact performance significantly. It makes the database a lot more human-readable in case of any issues, though.
I prefer to have them on all tables, but it's up for debate if there's a good reason to remove those columns.
OpenSearch indexes are edited sometimes in unusual cases, for example.
@AlexBrandes could you elaborate on that? what is an example case?
You would know better than I would but haven't you guys had to edit or reindex in certain cases? There have been the sporadic issues with balances (I know some of those were issues between 2 indexes). There was also an older case I remember that had a broken chain that had to be fixed.
I can think of other cases when we might update a row instead of deleting it and re-inserting also. For example, we might decide that we really need all historical state proofs from all snapshots to be returned through the API. Updating the existing rows would be a better strategy than deleting/re-inserting everything.
I guess the important questions are:
- Will this field impact performance enough that we should avoid it? imo no
- Will this field be useful enough for working with the DB that it's worth including? imo yes
AFAIK all those cases were situations where the data simply hadn't been indexed. Since we ingest blockchain data, which is immutable, we should never update data indexed in the database. (For example, an exchange confirmed a transaction and then you update its balance or parent hash, or whatever.) AFAIK the block explorer, since it's centralized, works as a source of truth for the whole network. Cluster nodes fetch data from the block explorer when performing a rollback. Imo this data should never be modified.
So it would be a rare occurrence that a record would be updated - in general I agree.
What's the argument against including the column though? Performance?
Both arguments make sense. Although adding the updated_at column later is not free, if we're careful we can minimize the impact, so I don't think we need to commit right now :)
(since updates are rare, maybe we can consider a separate table also)
sql/snapshot.sql
Outdated
amount int8 NOT NULL,
fee int8 NOT NULL,
block_hash text NOT NULL,
transaction_original_id int4 NULL,
what is this field? 🤔
object Configuration {
  import pureconfig.ConfigSource
  import pureconfig.generic.auto._

  case class DbConfig(host: String, port: Int, user: String, password: Option[String], database: String, maxSessions: Int)

  lazy val dbConfig: DbConfig = ConfigSource
    .default
    .at("snapshotStreaming.db")
    .loadOrThrow[DbConfig]
can we use the pattern that we already have configured in the project? it would be great not to end up with two different ways of reading config, so either use the previous one or refactor the previous one to use pureconfig instead (but as a separate PR) 🤔
This is a WIP and the main goal is to validate the approach and library. Once it reaches a mature state, I'll unify it with the rest of the project. I favor PureConfig because it's less boilerplate.
Ok. Me too, but at the end of the day I'd opt for consistency, so either this or that.
Overall Notes:
- Table names should be plural (global_snapshots, dag_balance_changes, etc.)
- Think about what indexes will be needed based on the existing endpoint definitions.
- Tables that still need to be added:
  - metagraph balances
  - blocks
  - DAG rewards + metagraph rewards
  - FeeTransactions
  - Metagraphs
-- DROP TABLE global_snapshot;

CREATE TABLE global_snapshot (
Notes:
- hash can be VARCHAR
- subheight can be int4
- last_snapshot_hash can be VARCHAR
- We need to include fields from the current snapshot schema here as nullable fields: epochProgress, metagraphSnapshotCount
- We should have a unique index/constraint on hash
- last_snapshot_hash should FK to hash on this table
- created_at/updated_at need DEFAULT NOW() to set the initial value, and then I think updated_at needs a trigger to update whenever it's changed. We should consider timestamptz instead of timestamp here, but maybe it doesn't matter.
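Pulling these notes together, the revised table could look roughly like this. This is a sketch only, not the actual diff; epoch_progress and metagraph_snapshot_count are assumed snake_case renderings of epochProgress and metagraphSnapshotCount:

```sql
CREATE TABLE global_snapshot (
    hash varchar NOT NULL,
    ordinal int8 NOT NULL,
    subheight int4 NOT NULL,
    last_snapshot_hash varchar NULL,
    -- nullable fields carried over from the current snapshot schema
    epoch_progress int8 NULL,
    metagraph_snapshot_count int4 NULL,
    created_at timestamp DEFAULT now() NOT NULL,
    updated_at timestamp DEFAULT now() NOT NULL,
    CONSTRAINT global_snapshot_pk PRIMARY KEY (hash),
    CONSTRAINT global_snapshot_ordinal_key UNIQUE (ordinal),
    -- last_snapshot_hash points at the previous snapshot in this same table
    CONSTRAINT global_snapshot_last_snapshot_hash_fk
        FOREIGN KEY (last_snapshot_hash) REFERENCES global_snapshot (hash)
);
```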
Still need to add epochProgress. Also add version, and proofs as a separate relation.
-- DROP TABLE dag_transaction;

CREATE TABLE dag_transaction (
Notes:
- hash can be VARCHAR
- global_ordinal - nice, let's use this format throughout. Need an index on this column to fetch transactions per snapshot.
- block - is this supposed to be block_hash? Can be VARCHAR.
- source - this is a wallet address. Should be VARCHAR.
- destination - also a wallet address
- block_hash - what's the difference between this and block?
- transaction_original_id - does this FK to a separate table? I'm not really sure why we return this from the API, but we'll need to continue returning it for backwards compatibility. We could store it as JSON here, since it's never queried separately from this record, and it's a nested record, so storing it separately would mean multiple tables.
- created_at/updated_at - same as above
There are a few other missing columns here if you check against the existing BE API output: salt, parent, etc.
https://be-mainnet.constellationnetwork.io/transactions?limit=10
Still missing the columns in the above comment.
sql/snapshot.sql
Outdated
-- DROP TABLE metagraph_snapshot;

CREATE TABLE metagraph_snapshot (
Notes:
- metagraph_id should be VARCHAR
- ordinal - should we make this metagraph_ordinal? Or just ordinal for metagraph snapshots and global_ordinal for global snapshots? We should be consistent on all tables.
- global_ordinal - I think this should be NOT NULL. Is there a case this could be NULL?
- hash - VARCHAR
- subheight - probably int4 since this doesn't get big
- last_snapshot_hash - VARCHAR
- created_at/updated_at - same as above
- global_snapshot_fk constraint references the metagraph ordinal, but we need 2 FKs here anyway: global snapshot and metagraph snapshot.
- Need epochProgress here also, and probably other CurrencySnapshot fields (fee definitely). Take a look at the output in an actual snapshot.
regarding varchar vs text, I don't have a preference; the underlying datatype is the same. Since the size is fixed, should we consider char(SIZE)?
Ah, I didn't know this. That's not the case in MySQL, for example. I think TEXT is fine then. I feel like we will run into edge cases we can't support if we force an exact length with char(SIZE).
val SplitSnapshot(snapshot, blocks,
  transactions,
  balances,
  currSnapshot,
  currIncrementalSnapshots,
  currBlocks,
  currTransactions,
  currFeeTransactions,
  currBalances) = splitSnapshot
val parallelRequests = updateParallelRequests(
  blocks,
  transactions,
  balances,
  currSnapshot,
  currIncrementalSnapshots,
  currBlocks,
  currTransactions,
  currFeeTransactions,
  currBalances
).grouped(config.bulkSize).toList
maybe you can put the whole split snapshot into the updateParallelRequests function?
(currSnapshot, currIncrementalSnapshots, currBlocks, currTransactions, currFeeTransactions, currBalances) =
  mappedCurrencyData
} yield {
  val (snapshot, blocks, transactions, balances) = mappedGlobalData
why is the currency data in the for comprehension while the global data is in the yield? imo only the SplitSnapshot should be in the yield. or even better, create a fn in the SplitSnapshot companion object that receives mapped global data and mapped currency data and produces the split snapshot. then the whole for comprehension may look like:
(globalMapper.mapGlobalSnapshot(...), currencyMapper.mapCurrencySnapshots(...))
  .mapN(SplitSnapshot(_, _))
case class UpdateRequests(
  sequentialRequests: Seq[Seq[UpdateRequest]],
  parallelRequests: List[List[UpdateRequest]]
)

case class SplitSnapshot(snapshot: Snapshot, blocks: Seq[Block], txs: List[Transaction], balances: Seq[AddressBalance], cdSnapshots: Seq[CurrencyData[Snapshot]], cdCcySnapshots: Seq[CurrencyData[CurrencySnapshot]], cdBlocks: Seq[CurrencyData[Block]], cdTxs: Seq[CurrencyData[Transaction]], cdFeeTxs: Seq[CurrencyData[FeeTransaction]], cdBalances: Seq[CurrencyData[AddressBalance]])
what is the use case for SplitSnapshot? 🤔 I see you create the class and then destructure it immediately after 🤔
- I think we need an addresses table that all the different transaction tables FK to (source/destination).
-- DROP TABLE global_snapshot;

CREATE TABLE global_snapshot (
Still need to add epochProgress. Also add version, and proofs as a separate relation.
snapshot_hash varchar NOT NULL,
snapshot_ordinal int8 NOT NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
I'll just comment here, but it applies to all the tables. You'll need an auto-update trigger to support updated_at.
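A minimal sketch of such a trigger, assuming Postgres 11+; the function and trigger names here are illustrative, not from the PR:

```sql
-- Generic helper: stamp updated_at on every UPDATE.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- One trigger per table that has an updated_at column.
CREATE TRIGGER global_snapshot_set_updated_at
    BEFORE UPDATE ON global_snapshot
    FOR EACH ROW
    EXECUTE FUNCTION set_updated_at();
```

On Postgres versions before 11, EXECUTE PROCEDURE would be used in place of EXECUTE FUNCTION.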
CONSTRAINT global_snapshot_pkey PRIMARY KEY (ordinal)
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT global_snapshot_pk PRIMARY KEY (hash)
ordinal needs to be a unique constraint
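If hash stays the primary key, that extra constraint could be added along these lines (a sketch using the names in this diff):

```sql
-- hash remains the PK; ordinal gets its own uniqueness guarantee.
ALTER TABLE global_snapshot
    ADD CONSTRAINT global_snapshot_ordinal_key UNIQUE (ordinal);
```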
staking_address varchar NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT metagraph_snapshot_pk PRIMARY KEY (metagraph_id, hash)
(metagraph_id, ordinal) should be a unique constraint here also.
@@ -42,13 +84,13 @@ CREATE INDEX global_snapshot_hash_idx ON public.global_snapshot USING btree (has
-- DROP TABLE dag_balance_change;

CREATE TABLE dag_balance_change (
hash varchar NOT NULL,
ordinal int8 NOT NULL,
snapshot_hash varchar NOT NULL,
Even if snapshot_hash is the FK to snapshots here, we need global_ordinal because we need to be able to query this by address and ordinal, like this:
SELECT balance
FROM dag_balance_changes
WHERE global_ordinal < :snapshot_ordinal AND address = :address
LIMIT 1
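For that lookup to be efficient, a supporting index along these lines would presumably be needed; the query would likely also want an ORDER BY global_ordinal DESC before the LIMIT 1 to pick the latest matching row. A sketch, with the plural table name following the naming suggested earlier in this review:

```sql
-- Serves WHERE address = ? AND global_ordinal < ? ORDER BY global_ordinal DESC
CREATE INDEX dag_balance_changes_address_global_ordinal_idx
    ON dag_balance_changes (address, global_ordinal DESC);
```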
CONSTRAINT dag_balance_change_global_snapshot_fk FOREIGN KEY (ordinal) REFERENCES global_snapshot(ordinal)
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT dag_balance_change_pk PRIMARY KEY (snapshot_hash, address),
Suggested change:
- CONSTRAINT dag_balance_change_pk PRIMARY KEY (snapshot_hash, address),
+ CONSTRAINT dag_balance_change_pk PRIMARY KEY (address, global_ordinal),
snapshot_hash varchar NOT NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT dag_block_pk PRIMARY KEY (hash),
CONSTRAINT dag_block_block_fk FOREIGN KEY (hash) REFERENCES block(hash),
Is this an FK to the parent abstract_table? I think we can probably remove.
hash varchar NOT NULL,
block varchar NULL,
source_addr varchar NOT NULL,
CREATE TABLE dag_reward_transaction (
destination_addr varchar NOT NULL,
I would call this one destination to match the transaction schema.
global_snapshot_hash varchar NOT NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT reward_transaction_global_snapshot_fk FOREIGN KEY (global_snapshot_hash) REFERENCES global_snapshot(hash)
This table has no PK.
-- DROP TABLE dag_transaction;

CREATE TABLE dag_transaction (
Still missing the columns in the above comment.
Sorry - hit submit on the last review too quick. Here are a few more comments.
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT dag_transaction_pk PRIMARY KEY (hash),
CONSTRAINT dag_transaction_abstract_transaction_fk FOREIGN KEY (hash) REFERENCES abstract_transaction(hash),
Does it need an FK to the abstract table? I'm not familiar with how that works.
metagraph_id varchar NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT fee_transaction_pkey PRIMARY KEY (hash),
I think this needs to be at least (metagraph_id, hash). We should double-check that the hash is unique to the metagraph also. It's generated from the data, but I don't remember how it's made unique.
metagraph_snapshot_hash varchar NULL,
metagraph_id varchar NULL,
Suggested change:
- metagraph_snapshot_hash varchar NULL,
- metagraph_id varchar NULL,
+ metagraph_snapshot_hash varchar NOT NULL,
+ metagraph_id varchar NOT NULL,
CREATE TABLE metagraph_balance_change (
metagraph_id varchar NOT NULL,
metagraph_hash varchar NOT NULL,
Same issue as the dag_balance_change table. We need ordinal here to be queried, unless the intent is to join to the metagraph_snapshots table. That seems like it might be too inefficient to search against, though.
balance int8 NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT metagraph_balance_change_metagraph_snapshot_fk FOREIGN KEY (metagraph_id,metagraph_hash) REFERENCES metagraph_snapshot(metagraph_id,hash)
This table has no PK
-- DROP TABLE metagraph_block;

CREATE TABLE metagraph_block (
This table has no PK.
-- DROP TABLE metagraph_reward_transaction;

CREATE TABLE metagraph_reward_transaction (
This table has no PK.
-- DROP TABLE metagraph_snapshots_to_global_snapshot;

CREATE TABLE metagraph_snapshots_to_global_snapshot (
I think this works - metagraph_id and metagraph_snapshot_hash have to be NOT NULL, and the FK to metagraph_snapshot would need to be removed because these rows would be inserted before the snapshot records exist. This table does feel a little bit messy, though, if it doesn't have FK constraints.
It might be easier to just use the metagraph_snapshot_count or a JSONB field with an array of metagraph snapshot records on the global snapshot. They would only be used while upload of the metagraph snapshots is pending.
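For illustration, the JSONB alternative could be sketched like this; the column name and the keys inside the array are hypothetical, not from the diff:

```sql
-- Track metagraph snapshots still awaiting upload directly on the global snapshot row.
ALTER TABLE global_snapshot
    ADD COLUMN pending_metagraph_snapshots jsonb NOT NULL DEFAULT '[]'::jsonb;

-- Example shape of the pending array while uploads are in flight.
UPDATE global_snapshot
SET pending_metagraph_snapshots =
    '[{"metagraph_id": "abc", "metagraph_snapshot_hash": "def"}]'::jsonb
WHERE hash = :global_snapshot_hash;
```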
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT metagraph_snapshots_to_global_snapshot_global_snapshot_fk FOREIGN KEY (global_snapshot_hash) REFERENCES global_snapshot(hash),
CONSTRAINT metagraph_snapshots_to_global_snapshot_metagraph_snapshot_fk FOREIGN KEY (metagraph_id,metagraph_snapshot_hash) REFERENCES metagraph_snapshot(metagraph_id,hash)
This table has no PK.