fusetools.db_etl_tools.Redshift

class fusetools.db_etl_tools.Redshift[source]

Bases: object

Generic functions for Redshift SQL queries and ETL.

../_images/redshift11.png

Methods

insert_df_rs

Executes an INSERT INTO statement for a given Pandas DataFrame into a Redshift table..

insert_val_rs

Creates SQL to run an INSERT operation of a given Redshift table.

make_df_tbl_rs

Creates SQL to run a CREATE TABLE statement based on a Pandas DataFrame.

make_tbl_complete_rs

Executes a series of SQL statements to CREATE and INSERT into a table from a Pandas DataFrame.

run_query_rs

Executes a SQL statement with a Redshift database connection.

sequential_load_rs

param override

sequential_load_rs_wk

param rptg_dates

upsert_tbl_rs

Creates SQL to run an UPSERT (INSERT new records or UPDATE existing records) operation of a given Redshift table.

classmethod insert_df_rs(cursor, conn, df, tbl_name)[source]

Executes an INSERT INTO statement for a given Pandas DataFrame into a Redshift table..

Parameters
  • cursor – Redshift database cursor object.

  • conn – Redshift database connection object.

  • df – Pandas DataFrame to insert into a Redshift table.

  • tbl_name – Redshift table name.

Returns

Elapsed time to execute query.

classmethod insert_val_rs(col_list, val_list, tbl_name)[source]

Creates SQL to run an INSERT operation of a given Redshift table.

Parameters
  • col_list – List of columns to INSERT or UPDATE.

  • val_list – List of values to INSERT or UPDATE.

  • tbl_name – Name of Postgres table.

Returns

SQL to run an INSERT statement.

classmethod make_df_tbl_rs(tbl_name, df)[source]

Creates SQL to run a CREATE TABLE statement based on a Pandas DataFrame.

Parameters
  • tbl_name – Redshift table name.

  • df – Pandas DataFrame.

Returns

CREATE TABLE SQL statement.

classmethod make_tbl_complete_rs(df, tbl_name, conn, cursor, batch_size=False)[source]

Executes a series of SQL statements to CREATE and INSERT into a table from a Pandas DataFrame.

Parameters
  • df – Pandas DataFrame to create a table from.

  • tbl_name – Name of table to be created.

  • conn – Redshift database connection object.

  • cursor – Redshift database cursor object.

  • batch_size – Records to load per batch.

Returns

Elapsed time to execute query.

classmethod run_query_rs(conn, sql)[source]

Executes a SQL statement with a Redshift database connection.

Parameters
  • conn – Redshift database connection object,

  • sql – SQL Statement to execute.

Returns

Elapsed time to execute query.

classmethod sequential_load_rs(override, tgt_tbl, conn, dt_start, dt_end, saved_day_id_range_placeholder, dt1_interval, dt2_interval, sql_loop_fn, sql_loop_fn_type, filter_day_id_field1=False, sql_loop_fn_dt_placeholder1=False, filter_day_id_field2=False, filter_id_type2=False, sql_loop_fn_dt_placeholder2=False, filter_day_id_field3=False, filter_id_type3=False, sql_loop_fn_dt_placeholder3=False, loop_src1=False, loop_src2=False, loop_src3=False, log_dir=False)[source]
Parameters
  • override

  • tgt_tbl

  • conn

  • dt_start

  • dt_end

  • saved_day_id_range_placeholder

  • dt1_interval

  • dt2_interval

  • sql_loop_fn

  • sql_loop_fn_type

  • filter_day_id_field1

  • sql_loop_fn_dt_placeholder1

  • filter_day_id_field2

  • filter_id_type2

  • sql_loop_fn_dt_placeholder2

  • filter_day_id_field3

  • filter_id_type3

  • sql_loop_fn_dt_placeholder3

  • loop_src1

  • loop_src2

  • loop_src3

  • log_dir

Returns

classmethod sequential_load_rs_wk(rptg_dates, override, tgt_tbl, conn, rptg_wk, rptg_wk_start, rptg_wk_end, sql_loop_fn, filter_dt_field1=False, filter_dt_type1=False, filter_dt_placeholder1=False, filter_dt_field2=False, filter_dt_type2=False, filter_dt_placeholder2=False, filter_dt_field3=False, filter_dt_type3=False, filter_dt_placeholder3=False, log_dir=False)[source]
Parameters
  • rptg_dates

  • override

  • tgt_tbl

  • conn

  • rptg_wk

  • rptg_wk_start

  • rptg_wk_end

  • sql_loop_fn

  • filter_dt_field1

  • filter_dt_type1

  • filter_dt_placeholder1

  • filter_dt_field2

  • filter_dt_type2

  • filter_dt_placeholder2

  • filter_dt_field3

  • filter_dt_type3

  • filter_dt_placeholder3

  • log_dir

Returns

classmethod upsert_tbl_rs(src_tbl, tgt_tbl, src_join_cols, src_insert_cols, src_update_cols=False, update_compare_cols=False)[source]

Creates SQL to run an UPSERT (INSERT new records or UPDATE existing records) operation of a given Redshift table.

Parameters
  • src_tbl – Redshift source table that contains data to be merged from.

  • tgt_tbl – Redshift target table to receive UPSERT operation.

  • src_join_cols – Columns to use to join source and target tables.

  • src_insert_cols – Columns to be inserted from source table.

  • src_update_cols – Columns to be updated from source table.

  • update_compare_cols – Columns to use to compare values across source and target tables.

Returns

A SQL Insert statement and a SQL Update statement.