docs: Flesh out BigQuery Data Transfer asset guide (#1737)
* Also fix some broken links
ryscheng authored Jul 1, 2024
1 parent dd51851 commit 3ad9ecd
Showing 2 changed files with 66 additions and 12 deletions.
68 changes: 61 additions & 7 deletions apps/docs/docs/contribute/connect-data/bigquery/replication.md
@@ -1,5 +1,5 @@
---
-title: 🏗️ Using a BigQuery Data Transfer Service
+title: Using BigQuery Data Transfer Service
sidebar_position: 2
---

@@ -15,12 +15,66 @@ If you already maintain a public dataset in
the US multi-region, you should simply make a dbt source
as shown in [this guide](./index.md).

-## OSO Dataset Replication
+## Define the Dagster asset

-:::warning
-Coming soon... This section is a work in progress.
-To track progress, see this
-[GitHub issue](https://github.com/opensource-observer/oso/issues/1311).
-:::
Create a new asset file in
`warehouse/oso_dagster/assets/`.
This file should invoke the BigQuery Data Transfer asset factory.
For example, you can see this in action for
[Lens data](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/lens.py).
We make a copy of this data because the source dataset is not
in the US multi-region, which is required by our dbt pipeline.

```python
# warehouse/oso_dagster/assets/lens.py
from ..factories import (
    create_bq_dts_asset,
    BigQuerySourceConfig,
    BqDtsAssetConfig,
    SourceMode,
    TimeInterval,
)

lens_data = create_bq_dts_asset(
    BqDtsAssetConfig(
        name="lens",
        destination_project_id="opensource-observer",
        destination_dataset_name="lens_v2_polygon",
        source_config=BigQuerySourceConfig(
            source_project_id="lens-public-data",
            source_dataset_name="v2_polygon",
            service_account=None
        ),
        copy_interval=TimeInterval.Weekly,
        copy_mode=SourceMode.Overwrite,
    ),
)
```

For the latest documentation on configuration parameters,
check out the comments in the
[BigQuery Data Transfer factory](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/factories/bq_dts.py).
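
As a rough illustration of how the nested configuration fits together, the sketch below rebuilds the Lens configuration with stand-in dataclasses instead of the real factory types. The field names mirror the Lens example above. The stand-in class bodies, the enum members other than `Weekly` and `Overwrite`, and the `describe_transfer` helper are assumptions made for illustration only, so treat the factory source as the authority on the actual definitions.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


# Stand-ins for the types exported by oso_dagster.factories.
# Their shapes are assumed here; they are NOT the real definitions.
class TimeInterval(Enum):
    Daily = auto()
    Weekly = auto()


class SourceMode(Enum):
    Overwrite = auto()


@dataclass(frozen=True)
class BigQuerySourceConfig:
    source_project_id: str
    source_dataset_name: str
    service_account: Optional[str] = None


@dataclass(frozen=True)
class BqDtsAssetConfig:
    name: str
    destination_project_id: str
    destination_dataset_name: str
    source_config: BigQuerySourceConfig
    copy_interval: TimeInterval
    copy_mode: SourceMode


def describe_transfer(config: BqDtsAssetConfig) -> str:
    """Hypothetical helper: summarize which dataset is copied where."""
    src = config.source_config
    return (
        f"{src.source_project_id}.{src.source_dataset_name} -> "
        f"{config.destination_project_id}.{config.destination_dataset_name} "
        f"({config.copy_interval.name}, {config.copy_mode.name})"
    )


# Same values as the Lens example, expressed with the stand-in types.
lens_config = BqDtsAssetConfig(
    name="lens",
    destination_project_id="opensource-observer",
    destination_dataset_name="lens_v2_polygon",
    source_config=BigQuerySourceConfig(
        source_project_id="lens-public-data",
        source_dataset_name="v2_polygon",
    ),
    copy_interval=TimeInterval.Weekly,
    copy_mode=SourceMode.Overwrite,
)

print(describe_transfer(lens_config))
```

Reading the configuration this way makes the direction of the copy explicit: the source project and dataset live in `source_config`, while the destination fields sit on the outer config.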

In order for our Dagster deployment to recognize this asset,
you need to import it in
`warehouse/oso_dagster/assets/__init__.py`.

```python
...
from .lens import *
...
```
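
To see why the wildcard import matters, the following sketch demonstrates plain Python semantics only: `from module import *` copies a module's public top-level names into the importing namespace. The `fake_lens` module and its attributes are hypothetical stand-ins, and how the Dagster deployment then collects names from the `assets` package is not shown here.

```python
import sys
import types

# Throwaway module standing in for warehouse/oso_dagster/assets/lens.py.
fake_lens = types.ModuleType("fake_lens")
fake_lens.lens_data = "placeholder for the asset returned by the factory"
fake_lens._private_helper = "underscore-prefixed, so not exported"
sys.modules["fake_lens"] = fake_lens

# Simulate the effect of `from .lens import *` inside assets/__init__.py.
package_namespace: dict = {}
exec("from fake_lens import *", package_namespace)

# Ignore dunder entries such as __builtins__ that exec() adds.
exported = sorted(
    name for name in package_namespace if not name.startswith("__")
)
print(exported)
```

Because the module defines no `__all__`, only names without a leading underscore are copied, so an asset assigned to a top-level variable such as `lens_data` becomes visible from the package.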

For more details on defining Dagster assets,
see the [Dagster tutorial](https://docs.dagster.io/tutorial).

### BigQuery Data Transfer examples in OSO

In the
[OSO monorepo](https://github.com/opensource-observer/oso),
you will find a few examples of using the BigQuery Data Transfer asset factory:

- [Farcaster data](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/farcaster.py)
- [Lens data](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/lens.py)

<NextSteps components={props.components}/>
10 changes: 5 additions & 5 deletions apps/docs/docs/contribute/connect-data/gcs.md
@@ -42,7 +42,7 @@ For example, you can see this in action for
from ..factories import (
    interval_gcs_import_asset,
    SourceMode,
-    Interval,
+    TimeInterval,
    IntervalGCSAsset,
)

@@ -57,7 +57,7 @@ gitcoin_passport_scores = interval_gcs_import_asset(
destination_table="passport_scores",
raw_dataset_name="oso_raw_sources",
clean_dataset_name="gitcoin",
-interval=Interval.Daily,
+interval=TimeInterval.Daily,
mode=SourceMode.Overwrite,
retention_days=10,
format="PARQUET",
@@ -88,8 +88,8 @@ In the
[OSO monorepo](https://github.com/opensource-observer/oso),
you will find a few examples of using the GCS asset factory:

-- [Superchain data](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets.py)
-- [Gitcoin Passport scores](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets.py)
-- [OpenRank reputations on Farcaster](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets.py)
+- [Superchain data](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/__init__.py)
+- [Gitcoin Passport scores](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/gitcoin.py)
+- [OpenRank reputations on Farcaster](https://github.com/opensource-observer/oso/blob/main/warehouse/oso_dagster/assets/karma3.py)

<NextSteps components={props.components}/>
