Redshift configurations
Incremental materialization strategies
In dbt-redshift, the following incremental materialization strategies are supported:
append
(default whenunique_key
is not defined)merge
delete+insert
(default whenunique_key
is defined)microbatch
All of these strategies are inherited from dbt-postgres.
Performance optimizations
Using sortkey and distkey
Tables in Amazon Redshift have two powerful optimizations to improve query performance: distkeys and sortkeys. Supplying these values as model-level configurations apply the corresponding settings in the generated CREATE TABLE
DDL. Note that these settings will have no effect on models set to view
or ephemeral
models.
dist
can have a setting ofall
,even
,auto
, or the name of a key.sort
accepts a list of sort keys, for example:['reporting_day', 'category']
. dbt will build the sort key in the same order the fields are supplied.sort_type
can have a setting ofinterleaved
orcompound
. if no setting is specified, sort_type defaults tocompound
.
When working with sort keys, it's highly recommended you follow Redshift's best practices on sort key effectiveness and cardinality.
Sort and dist keys should be added to the {{ config(...) }}
block in model .sql
files, eg:
-- Example with one sort key
{{ config(materialized='table', sort='reporting_day', dist='unique_id') }}
select ...
-- Example with multiple sort keys
{{ config(materialized='table', sort=['category', 'region', 'reporting_day'], dist='received_at') }}
select ...
-- Example with interleaved sort keys
{{ config(materialized='table',
sort_type='interleaved'
sort=['category', 'region', 'reporting_day'],
dist='unique_id')
}}
select ...
For more information on distkeys and sortkeys, view Amazon's docs:
- AWS Documentation » Amazon Redshift » Database Developer Guide » Designing Tables » Choosing a Data Distribution Style
- AWS Documentation » Amazon Redshift » Database Developer Guide » Designing Tables » Choosing Sort Keys
Late binding views
Redshift supports views unbound from their dependencies, or late binding views. This DDL option "unbinds" a view from the data it selects from. In practice, this means that if upstream views or tables are dropped with a cascade qualifier, the late-binding view does not get dropped as well.
Using late-binding views in a production deployment of dbt can vastly improve the availability of data in the warehouse, especially for models that are materialized as late-binding views and are queried by end-users, since they won’t be dropped when upstream models are updated. Additionally, late binding views can be used with external tables via Redshift Spectrum.
To materialize a dbt model as a late binding view, use the bind: false
configuration option:
{{ config(materialized='view', bind=False) }}
select *
from source.data
To make all views late-binding, configure your dbt_project.yml
file like this:
models:
+bind: false # Materialize all views as late-binding
project_name:
....
Materialized views
The Redshift adapter supports materialized views with the following configuration parameters:
Parameter | Type | Required | Default | Change Monitoring Support |
---|---|---|---|---|
on_configuration_change | <string> | no | apply | n/a |
dist | <string> | no | even | drop/create |
sort | [<string>] | no | none | drop/create |
sort_type | <string> | no | auto if no sort compound if sort | drop/create |
auto_refresh | <boolean> | no | false | alter |
backup | <string> | no | true | n/a |
- Project file
- Property file
- Config block
models:
<resource-path>:
+materialized: materialized_view
+on_configuration_change: apply | continue | fail
+dist: all | auto | even | <field-name>
+sort: <field-name> | [<field-name>]
+sort_type: auto | compound | interleaved
+auto_refresh: true | false
+backup: true | false
version: 2
models:
- name: [<model-name>]
config:
materialized: materialized_view
on_configuration_change: apply | continue | fail
dist: all | auto | even | <field-name>
sort: <field-name> | [<field-name>]
sort_type: auto | compound | interleaved
auto_refresh: true | false
backup: true | false
{{ config(
materialized="materialized_view",
on_configuration_change="apply" | "continue" | "fail",
dist="all" | "auto" | "even" | "<field-name>",
sort=["<field-name>"],
sort_type="auto" | "compound" | "interleaved",
auto_refresh=true | false,
backup=true | false,
) }}
Many of these parameters correspond to their table counterparts and have been linked above. The parameters unique to materialized views are the auto-refresh and backup functionality, which are covered below.
Learn more about these parameters in Redshift's docs.
Auto-refresh
Parameter | Type | Required | Default | Change Monitoring Support |
---|---|---|---|---|
auto_refresh | <boolean> | no | false | alter |
Redshift supports automatic refresh configuration for materialized views.
By default, a materialized view does not automatically refresh.
dbt monitors this parameter for changes and applies them using an ALTER
statement.
Learn more information about the parameters in the Redshift docs.
Backup
Parameter | Type | Required | Default | Change Monitoring Support |
---|---|---|---|---|
backup | <boolean> | no | true | n/a |
Redshift supports backup configuration of clusters at the object level.
This parameter identifies if the materialized view should be backed up as part of the cluster snapshot.
By default, a materialized view will be backed up during a cluster snapshot.
dbt cannot monitor this parameter as it is not queryable within Redshift.
If the value changes, the materialized view will need to go through a --full-refresh
to set it.
Learn more about these parameters in Redshift's docs.
Limitations
As with most data platforms, there are limitations associated with materialized views. Some worth noting include:
- Materialized views cannot reference views, temporary tables, user-defined functions, or late-binding tables.
- Auto-refresh cannot be used if the materialized view references mutable functions, external schemas, or another materialized view.
Find more information about materialized view limitations in Redshift's docs.