
Add postgres dev #71

Draft: gclaramunt wants to merge 18 commits into develop

Conversation

gclaramunt (Contributor)

No description provided.

gclaramunt marked this pull request as draft on December 17, 2024 at 23:25
sql/snapshot.sql Outdated
subheight int8 NOT NULL,
last_snapshot_hash text NULL,
created_at timestamp NULL,
updated_at timestamp NULL,
Member

Why do we need the updated_at field? AFAIK all data here are immutable and, once stored, never updated 🤔

Member

That's not necessarily true. OpenSearch indexes are sometimes edited in unusual cases, for example. The only overhead for updated_at is the trigger when a row is actually updated, so it doesn't really impact performance significantly. It makes the database a lot more human-readable in case of any issues, though.

I prefer to have them on all tables, but it's up for debate if there's a good reason to remove those columns.

Member

> OpenSearch indexes are sometimes edited in unusual cases, for example.

@AlexBrandes could you elaborate on that? What is an example case?

Member

You would know better than I would, but haven't you guys had to edit or reindex in certain cases? There have been sporadic issues with balances (I know some of those were issues between two indexes). There was also an older case I remember with a broken chain that had to be fixed.

I can also think of other cases where we might update a row instead of deleting and re-inserting it. For example, we might decide that we really need all historical state proofs from all snapshots to be returned through the API. Updating the existing rows would be a better strategy than deleting/re-inserting everything.

I guess the important questions are:

  • Will this field impact performance enough that we should avoid it? imo no
  • Will this field be useful enough for working with the DB that it's worth including? imo yes

@marcinwadon (Member), Dec 18, 2024

AFAIK all those cases were about data that had not been indexed. Since we ingest blockchain data, which is immutable, we should never update data already indexed in the database (for example, an exchange has confirmed a transaction and then you update its balance or parent hash, or whatever). AFAIK the block explorer, since it's centralized, works as the source of truth for the whole network: cluster nodes fetch data from the block explorer when performing a rollback. IMO this data should never be modified.

Member

So it would be a rare occurrence for a record to be updated; in general, I agree.

What's the argument against including the column, though? Performance?

gclaramunt (Contributor, Author)

Both arguments make sense. Adding the updated_at column later is not free, but if we're careful we can minimize the impact, so I don't think we need to commit right now :)
(Since updates are rare, maybe we can also consider a separate table.)
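A minimal sketch of that separate-table idea (table and column names here are illustrative, not from this PR):

CREATE TABLE row_updates (
    table_name text        NOT NULL,  -- which table the updated row lives in
    row_key    text        NOT NULL,  -- e.g. the hash / PK of the updated row
    updated_at timestamptz NOT NULL DEFAULT now()
);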

sql/snapshot.sql Outdated
amount int8 NOT NULL,
fee int8 NOT NULL,
block_hash text NOT NULL,
transaction_original_id int4 NULL,
Member

What is this field? 🤔

Comment on lines 92 to 102
object Configuration {
  import pureconfig.ConfigSource
  import pureconfig.generic.auto._

  case class DbConfig(host: String, port: Int, user: String, password: Option[String], database: String, maxSessions: Int)

  lazy val dbConfig: DbConfig = ConfigSource
    .default
    .at("snapshotStreaming.db")
    .loadOrThrow[DbConfig]

Member

Can we use the pattern that we already have configured in the project? It would be great not to have two different ways of reading config, so either use the previous one or refactor the previous one to use pureconfig instead (but as a separate PR) 🤔

gclaramunt (Contributor, Author)

This is a WIP, and the main goal is to validate the approach and the library. Once it reaches a mature state, I'll unify it with the rest of the project. I favor PureConfig because it's less boilerplate.

Member

OK, me too, but at the end of the day I'd opt for consistency, so either this or that.

build.sbt (resolved)
@AlexBrandes (Member) left a comment

Overall Notes:

  • Table names should be plural (global_snapshots, dag_balance_changes, etc.)
  • Think about what indexes will be needed based on the existing endpoint definitions.
  • Tables that still need to be added:
    • metagraph balances
    • blocks
    • DAG rewards + metagraph rewards
    • FeeTransactions
    • Metagraphs


-- DROP TABLE global_snapshot;

CREATE TABLE global_snapshot (
Member

Notes:

  • hash can be VARCHAR
  • subheight can be int4
  • last_snapshot_hash can be VARCHAR
  • We need to include fields from the current snapshot schema here as nullable fields:
    • epochProgress
    • metagraphSnapshotCount
  • We should have a unique index/constraint on hash
  • last_snapshot_hash should FK to hash on this table
  • created_at / updated_at - need DEFAULT NOW() to set the initial value, and then I think updated_at needs a trigger to update it whenever the row changes. We should consider timestamptz instead of timestamp here, but maybe it doesn't matter. (A DDL sketch follows below.)
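A minimal DDL sketch of what these notes imply; the epoch_progress and metagraph_snapshot_count names and types are assumptions, and the column list is not complete:

CREATE TABLE global_snapshot (
    ordinal                  int8        NOT NULL,
    hash                     varchar     NOT NULL,
    subheight                int4        NOT NULL,
    last_snapshot_hash       varchar     NULL,
    epoch_progress           int8        NULL, -- assumed mapping of epochProgress
    metagraph_snapshot_count int4        NULL, -- assumed mapping of metagraphSnapshotCount
    created_at               timestamptz NOT NULL DEFAULT now(),
    updated_at               timestamptz NOT NULL DEFAULT now(),
    CONSTRAINT global_snapshot_pkey PRIMARY KEY (ordinal),
    CONSTRAINT global_snapshot_hash_key UNIQUE (hash),
    CONSTRAINT global_snapshot_last_snapshot_hash_fk
        FOREIGN KEY (last_snapshot_hash) REFERENCES global_snapshot (hash)
);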

Member

Still need to add epochProgress - also add version and proofs as a separate relation.

sql/snapshot.sql (resolved)

-- DROP TABLE dag_transaction;

CREATE TABLE dag_transaction (
Member

Notes:

  • hash can be VARCHAR
  • global_ordinal - nice, let's use this format throughout. We need an index on this column to fetch transactions per snapshot.
  • block - is this supposed to be block_hash? Can be VARCHAR.
  • source - this is a wallet address. Should be VARCHAR.
  • destination - also a wallet address
  • block_hash - what's the difference between this and block?
  • transaction_original_id - does this FK to a separate table? I'm not really sure why we return this from the API, but we'll need to continue returning it for backwards compatibility. We could store it as JSON here (sketched below), since it's never queried separately from this record, and it's a nested record, so storing it separately would mean multiple tables.
  • created_at / updated_at - same as above
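That store-as-JSON idea could look like this; the column name is illustrative:

ALTER TABLE dag_transaction
    ADD COLUMN transaction_original jsonb NULL;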

Member

There are a few other missing columns here if you check against the existing BE API output: salt, parent, etc.

https://be-mainnet.constellationnetwork.io/transactions?limit=10

Member

Still missing the columns in the above comment.

sql/snapshot.sql Outdated

-- DROP TABLE metagraph_snapshot;

CREATE TABLE metagraph_snapshot (
Member

Notes:

  • metagraph_id should be VARCHAR
  • ordinal - should we make this metagraph_ordinal? Or just ordinal for metagraph snapshots and global_ordinal for global snapshots? We should be consistent across all tables.
  • global_ordinal - I think this should be NOT NULL. Is there a case where this could be NULL?
  • hash - VARCHAR
  • subheight - probably int4, since this doesn't get big
  • last_snapshot_hash - VARCHAR
  • created_at / updated_at - same as above
  • The global_snapshot_fk constraint references the metagraph ordinal, but we need 2 FKs here anyway: global snapshot and metagraph snapshot.
  • Need epochProgress here also, and probably other CurrencySnapshot fields (fee, definitely). Take a look at the output in an actual snapshot.

sql/snapshot.sql (resolved)
@gclaramunt (Contributor, Author)

Regarding varchar vs. text, I don't have a preference; the underlying datatype is the same. Since the size is fixed, should we consider char(SIZE)?

@AlexBrandes (Member)

> Regarding varchar vs. text, I don't have a preference; the underlying datatype is the same. Since the size is fixed, should we consider char(SIZE)?

Ah, I didn't know this. That's not the case in MySQL, for example.

I think TEXT is fine then. I feel like we will run into edge cases we can't support if we force an exact length with char(SIZE).

Comment on lines 89 to 108
val SplitSnapshot(snapshot, blocks,
  transactions,
  balances,
  currSnapshot,
  currIncrementalSnapshots,
  currBlocks,
  currTransactions,
  currFeeTransactions,
  currBalances) = splitSnapshot
val parallelRequests = updateParallelRequests(
  blocks,
  transactions,
  balances,
  currSnapshot,
  currIncrementalSnapshots,
  currBlocks,
  currTransactions,
  currFeeTransactions,
  currBalances
).grouped(config.bulkSize).toList
Member

Maybe you can move the whole split snapshot into the updateParallelRequests function?

Comment on lines 53 to 56
(currSnapshot, currIncrementalSnapshots, currBlocks, currTransactions, currFeeTransactions, currBalances) =
mappedCurrencyData
} yield {
val (snapshot, blocks, transactions, balances) = mappedGlobalData
Member

Why is the currency data in the for comprehension while the global data is in the yield? IMO only the SplitSnapshot should be in the yield. Or, even better, create a function in the SplitSnapshot companion object that receives the mapped global data and the mapped currency data and produces the split snapshot. Then the whole for comprehension could look like:

(globalMapper.mapGlobalSnapshot(...), currencyMapper.mapCurrencySnapshots(...))
  .mapN(SplitSnapshot(_, _))


case class UpdateRequests(
  sequentialRequests: Seq[Seq[UpdateRequest]],
  parallelRequests: List[List[UpdateRequest]]
)

case class SplitSnapshot(
  snapshot: Snapshot,
  blocks: Seq[Block],
  txs: List[Transaction],
  balances: Seq[AddressBalance],
  cdSnapshots: Seq[CurrencyData[Snapshot]],
  cdCcySnapshots: Seq[CurrencyData[CurrencySnapshot]],
  cdBlocks: Seq[CurrencyData[Block]],
  cdTxs: Seq[CurrencyData[Transaction]],
  cdFeeTxs: Seq[CurrencyData[FeeTransaction]],
  cdBalances: Seq[CurrencyData[AddressBalance]]
)
Member

What is the use case for SplitSnapshot? 🤔 I see you create the class and then destructure it immediately after 🤔

@AlexBrandes (Member) left a comment

  • I think we need an addresses table that all the different transaction tables FK to (source/destination).


-- DROP TABLE global_snapshot;

CREATE TABLE global_snapshot (
Member

Still need to add epochProgress - also add version and proofs as a separate relation.

snapshot_hash varchar NOT NULL,
snapshot_ordinal int8 NOT NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
Member

I'll just comment here but it applies to all the tables. You'll need an auto-update trigger to support updated_at.
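A common Postgres sketch of such a trigger; the function and trigger names are illustrative, and EXECUTE FUNCTION assumes Postgres 11+ (older versions use EXECUTE PROCEDURE):

CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    -- Stamp the row on every UPDATE, regardless of which columns changed.
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- One trigger per table, e.g. for global_snapshot:
CREATE TRIGGER global_snapshot_set_updated_at
    BEFORE UPDATE ON global_snapshot
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();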

CONSTRAINT global_snapshot_pkey PRIMARY KEY (ordinal)
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT global_snapshot_pk PRIMARY KEY (hash)
Member

ordinal needs to be a unique constraint
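For example (the constraint name is illustrative):

ALTER TABLE global_snapshot
    ADD CONSTRAINT global_snapshot_ordinal_key UNIQUE (ordinal);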

staking_address varchar NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT metagraph_snapshot_pk PRIMARY KEY (metagraph_id, hash)
Member

(metagraph_id, ordinal) should be a unique constraint here also.

@@ -42,13 +84,13 @@ CREATE INDEX global_snapshot_hash_idx ON public.global_snapshot USING btree (hash);
-- DROP TABLE dag_balance_change;

CREATE TABLE dag_balance_change (
hash varchar NOT NULL,
ordinal int8 NOT NULL,
snapshot_hash varchar NOT NULL,
Member

Even if snapshot_hash is the FK to snapshots here, we need global_ordinal because we need to be able to query this by address and ordinal, like this:

SELECT balance
FROM dag_balance_changes
WHERE global_ordinal < :snapshot_ordinal AND address = :address
ORDER BY global_ordinal DESC
LIMIT 1
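A composite index sketch that would make this lookup efficient; the index name is illustrative, and the table name matches the singular form used in the PR's DDL:

CREATE INDEX dag_balance_change_address_global_ordinal_idx
    ON dag_balance_change (address, global_ordinal DESC);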

CONSTRAINT dag_balance_change_global_snapshot_fk FOREIGN KEY (ordinal) REFERENCES global_snapshot(ordinal)
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT dag_balance_change_pk PRIMARY KEY (snapshot_hash, address),
Member

Suggested change
CONSTRAINT dag_balance_change_pk PRIMARY KEY (snapshot_hash, address),
CONSTRAINT dag_balance_change_pk PRIMARY KEY (address, global_ordinal),

snapshot_hash varchar NOT NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT dag_block_pk PRIMARY KEY (hash),
CONSTRAINT dag_block_block_fk FOREIGN KEY (hash) REFERENCES block(hash),
Member

Is this an FK to the parent abstract_table? I think we can probably remove it.

hash varchar NOT NULL,
block varchar NULL,
source_addr varchar NOT NULL,
CREATE TABLE dag_reward_transaction (
destination_addr varchar NOT NULL,
Member

I would call this one destination, to match the transaction schema.

global_snapshot_hash varchar NOT NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT reward_transaction_global_snapshot_fk FOREIGN KEY (global_snapshot_hash) REFERENCES global_snapshot(hash)
Member

This table has no PK.
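If no natural key fits, one hypothetical option is a surrogate identity column (assumes Postgres 10+):

ALTER TABLE dag_reward_transaction
    ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;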


-- DROP TABLE dag_transaction;

CREATE TABLE dag_transaction (
Member

Still missing the columns in the above comment.

@AlexBrandes (Member) left a comment

Sorry, hit submit on the last review too quickly. Here are a few more comments.

created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT dag_transaction_pk PRIMARY KEY (hash),
CONSTRAINT dag_transaction_abstract_transaction_fk FOREIGN KEY (hash) REFERENCES abstract_transaction(hash),
Member

Does it need an FK to the abstract table? I'm not familiar with how that works.

metagraph_id varchar NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT fee_transaction_pkey PRIMARY KEY (hash),
Member

I think this needs to be at least (metagraph_id, hash). We should double-check that the hash is unique to the metagraph also. It's generated from the data, but I don't remember how it's made unique.

Comment on lines +159 to +160
metagraph_snapshot_hash varchar NULL,
metagraph_id varchar NULL,
Member

Suggested change
metagraph_snapshot_hash varchar NULL,
metagraph_id varchar NULL,
metagraph_snapshot_hash varchar NOT NULL,
metagraph_id varchar NOT NULL,


CREATE TABLE metagraph_balance_change (
metagraph_id varchar NOT NULL,
metagraph_hash varchar NOT NULL,
Member

Same issue as the dag_balance_change table. We need ordinal here so it can be queried, unless the intent is to join to the metagraph_snapshots table. That seems like it might be too inefficient to search against, though.

balance int8 NULL,
created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT metagraph_balance_change_metagraph_snapshot_fk FOREIGN KEY (metagraph_id,metagraph_hash) REFERENCES metagraph_snapshot(metagraph_id,hash)
Member

This table has no PK


-- DROP TABLE metagraph_block;

CREATE TABLE metagraph_block (
Member

This table has no PK.


-- DROP TABLE metagraph_reward_transaction;

CREATE TABLE metagraph_reward_transaction (
Member

This table has no PK.


-- DROP TABLE metagraph_snapshots_to_global_snapshot;

CREATE TABLE metagraph_snapshots_to_global_snapshot (
Member

I think this works - metagraph_id and metagraph_snapshot_hash have to be NOT NULL, and the FK to metagraph_snapshot would need to be removed, because these rows would be inserted before the snapshot records exist. This table does feel a little bit messy, though, if it doesn't have FK constraints.

It might be easier to just store the metagraph_snapshot_count, or a JSONB field with an array of metagraph snapshot records, on the global snapshot. They would only be used while upload of the metagraph snapshots is pending.
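A sketch of that JSONB alternative; the column name is illustrative:

ALTER TABLE global_snapshot
    ADD COLUMN pending_metagraph_snapshots jsonb NULL;
-- e.g. [{"metagraph_id": "...", "hash": "..."}], cleared once the
-- corresponding metagraph_snapshot rows have been ingested.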

created_at timestamp DEFAULT now() NOT NULL,
updated_at timestamp DEFAULT now() NOT NULL,
CONSTRAINT metagraph_snapshots_to_global_snapshot_global_snapshot_fk FOREIGN KEY (global_snapshot_hash) REFERENCES global_snapshot(hash),
CONSTRAINT metagraph_snapshots_to_global_snapshot_metagraph_snapshot_fk FOREIGN KEY (metagraph_id,metagraph_snapshot_hash) REFERENCES metagraph_snapshot(metagraph_id,hash)
Member

This table has no PK.
