DVD Rental Database#

This example translates a query from the DVD Rental Sample Database. It covers most of the more advanced features that have been implemented.

Start the database#

Using Docker, start the database by running:

docker run -p 5002:5432 --rm rsundqvist/sakila-preload:postgres

from the terminal. To describe the database, run

psql -h localhost -p 5001 -U postgres -d sakila -c "\d+"
# password: Sofia123!

from a separate terminal window. Leave out the last part (-c "\d+") to query the database manually. For details about this image, see rsundqvist/sakila-preload on Docker Hub.

Query to translate#

We will use a small query to describe rental transactions in the database:

Query returning DVD rental transactions.#
 1SELECT customer_id,
 2       inventory.film_id,
 3       film_category.category_id,
 4       staff_id,
 5       rental_date,
 6       return_date
 7FROM rental
 8         LEFT JOIN inventory ON rental.inventory_id = inventory.inventory_id
 9         LEFT JOIN film_category on film_category.film_id = inventory.film_id
10ORDER BY rental.rental_id DESC;

The query above shows who rented what and when, what store they rented from and from whom.

Randomly sampled rows from the query. The first column is the record index in the query.#

customer_id

film_id

category_id

staff_id

rental_date

return_date

708

313

797

12

1

2005-08-22 20:55:56

2005-08-27 18:52:56

1486

107

767

12

1

2005-08-21 16:22:59

2005-08-22 21:15:59

1833

54

7

5

2

2005-08-21 04:34:11

2005-08-27 10:30:11

4649

565

454

12

1

2005-08-02 18:56:28

2005-08-11 23:19:28

5023

419

487

4

2

2005-08-02 05:39:12

2005-08-08 00:09:12

7359

381

489

2

1

2005-07-29 16:31:32

2005-07-30 12:39:32

10609

26

773

15

2

2005-07-09 21:32:29

2005-07-15 00:27:29

12460

493

134

2

1

2005-07-06 04:24:42

2005-07-09 02:37:42

13654

261

479

8

1

2005-06-18 15:34:18

2005-06-19 16:22:18

13703

20

994

13

2

2005-06-18 11:46:26

2005-06-22 11:37:26

Configuration files#

The database has a few quirks, which are managed by configuration. See the Translator Configuration Files page to learn more about config files.

Translation configuration, mapping, and definition of the categories.#
 1################################################################################
 2# For help, see https://rics.readthedocs.io                                    #
 3################################################################################
 4[translator]
 5fmt = "{id}:{name}[ {last_name}]"
 6
 7# ------------------------------------------------------------------------------
 8# Name-to-source mapping configuration. Binds names to source, eg 'cute_animals'
 9# -> 'my_database.animals'. Overrides take precedence over scoring logic.
10[translator.mapping]
11unmapped_values_action = "raise"
12score_function = "modified_hamming"
13
14[[translator.mapping.score_function_heuristics]]
15function = "like_database_table"
16
17[[translator.mapping.filter_functions]]
18function = "require_regex_match"
19regex = ".*_id$"
20where = "name"
21
22################################################################################
23# Fetching configuration.
24################################################################################
25[fetching.MemoryFetcher.data.category]
26id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
27name = [
28    "Action", "Animation",
29    "Children", "Classics", "Comedy",
30    "Documentary", "Drama",
31    "Family", "Foreign",
32    "Games", "Horror", "Music", "New",
33    "Sci-Fi", "Sports", "Travel"
34]
Configuration for fetching SQL data.#
 1[fetching.SqlFetcher]
 2connection_string = "@DVDRENTAL_CONNECTION_STRING"
 3password = "@DVDRENTAL_PASSWORD"
 4blacklist_tables = ["category"]  # Let's pretend this table is off-limits
 5
 6[fetching.SqlFetcher.engine_kwargs]
 7# fast_executemany = true  # Not supported by the driver
 8
 9# ------------------------------------------------------------------------------
10# Placeholder mapping configuration. Binds actual names in sources (such as
11# column names in an SQL table) to wanted names, eg id = 'animal_id'.
12[fetching.mapping]
13unmapped_values_action = "ignore"
14
15[[fetching.mapping.filter_functions]]
16function = "banned_substring"
17substrings = ["p2007"]
18where = "context"
19
20[[fetching.mapping.score_function_heuristics]]
21function = "value_fstring_alias"
22fstring = "first_{value}"
23[[fetching.mapping.score_function_heuristics]]
24function = "value_fstring_alias"
25fstring = "{context}_{value}"
26
27[fetching.mapping.overrides.address]
28code = "postal_code"
29[fetching.mapping.overrides.film]
30name = "title"

Translating#

Translating now becomes a simple matter. The following snippet is a test case which translates all of the ~16000 rows returned by the query, verifying a random sample of 5 rows.

 1from os import getenv
 2from pathlib import Path
 3from sys import platform
 4
 5import pandas as pd
 6import pytest
 7import sqlalchemy
 8
 9from .conftest import QUERY, check_status, get_connection_string, get_translator
10
11DIALECTS = [
12    "mysql",
13    "postgresql",
14    "mssql",  # Quite slow, mostly since the (pyre-python) driver used doesn't support fast_executemany
15]
16
17
18@pytest.mark.filterwarnings("ignore:Did not recognize type:sqlalchemy.exc.SAWarning")
19@pytest.mark.parametrize("dialect", DIALECTS)
20@pytest.mark.skipif(getenv("CI") == "true" and platform != "linux", reason="No Docker for Mac and Windows in CI/CD.")
21def test_dvd_rental(dialect):
22    check_status(dialect)
23    engine = sqlalchemy.create_engine(get_connection_string(dialect))
24    translator = get_translator(dialect)
25    expected = pd.read_csv(
26        Path(__file__).with_name("translated.csv"), index_col=0, parse_dates=["rental_date", "return_date"]
27    )
28    df: pd.DataFrame = pd.read_sql(QUERY, engine).loc[expected.index]
29    actual = translator.translate(df)
30
31    assert actual is not None
32    pd.testing.assert_frame_equal(actual, expected)

Result#

Date columns are not translated, nor is the first (row number/index) column.

Translated data.#

customer_id

film_id

category_id

staff_id

rental_date

return_date

708

313:DONALD MAHON

797:SILENCE KANE

12:Music

1:Mike Hillyer

2005-08-22 20:55:56

2005-08-27 18:52:56

1486

107:FLORENCE WOODS

767:SCALAWAG DUCK

12:Music

1:Mike Hillyer

2005-08-21 16:22:59

2005-08-22 21:15:59

1833

54:TERESA ROGERS

7:AIRPLANE SIERRA

5:Comedy

2:Jon Stephens

2005-08-21 04:34:11

2005-08-27 10:30:11

4649

565:JAIME NETTLES

454:IMPACT ALADDIN

12:Music

1:Mike Hillyer

2005-08-02 18:56:28

2005-08-11 23:19:28

5023

419:CHAD CARBONE

487:JINGLE SAGEBRUSH

4:Classics

2:Jon Stephens

2005-08-02 05:39:12

2005-08-08 00:09:12

7359

381:BOBBY BOUDREAU

489:JUGGLER HARDLY

2:Animation

1:Mike Hillyer

2005-07-29 16:31:32

2005-07-30 12:39:32

10609

26:JESSICA HALL

773:SEABISCUIT PUNK

15:Sports

2:Jon Stephens

2005-07-09 21:32:29

2005-07-15 00:27:29

12460

493:BRENT HARKINS

134:CHAMPION FLATLINERS

2:Animation

1:Mike Hillyer

2005-07-06 04:24:42

2005-07-09 02:37:42

13654

261:DEANNA BYRD

479:JEDI BENEATH

8:Family

1:Mike Hillyer

2005-06-18 15:34:18

2005-06-19 16:22:18

13703

20:SHARON ROBINSON

994:WYOMING STORM

13:New

2:Jon Stephens

2005-06-18 11:46:26

2005-06-22 11:37:26