This example translates a query from the DVD Rental Sample Database. It covers most of the more advanced features that have been implemented.
Using Docker, start the database by running:
docker run -p 5002:5432 --rm rsundqvist/sakila-preload:postgres
from the terminal. To describe the database, run
psql -h localhost -p 5001 -U postgres -d sakila -c "\d+"
# password: Sofia123!
from a separate terminal window. Leave out the last part (-c "\d+") to query the database manually. For details
about this image, see rsundqvist/sakila-preload on Docker Hub.
We will use a small query to describe rental transactions in the database:
1SELECT customer_id,
2 inventory.film_id,
3 film_category.category_id,
4 staff_id,
5 rental_date,
6 return_date
7FROM rental
8 LEFT JOIN inventory ON rental.inventory_id = inventory.inventory_id
9 LEFT JOIN film_category on film_category.film_id = inventory.film_id
10ORDER BY rental.rental_id DESC;
The query above shows who rented what and when, what store they rented from and from whom.
customer_id |
film_id |
category_id |
staff_id |
rental_date |
return_date |
|
|---|---|---|---|---|---|---|
708 |
313 |
797 |
12 |
1 |
2005-08-22 20:55:56 |
2005-08-27 18:52:56 |
1486 |
107 |
767 |
12 |
1 |
2005-08-21 16:22:59 |
2005-08-22 21:15:59 |
1833 |
54 |
7 |
5 |
2 |
2005-08-21 04:34:11 |
2005-08-27 10:30:11 |
4649 |
565 |
454 |
12 |
1 |
2005-08-02 18:56:28 |
2005-08-11 23:19:28 |
5023 |
419 |
487 |
4 |
2 |
2005-08-02 05:39:12 |
2005-08-08 00:09:12 |
7359 |
381 |
489 |
2 |
1 |
2005-07-29 16:31:32 |
2005-07-30 12:39:32 |
10609 |
26 |
773 |
15 |
2 |
2005-07-09 21:32:29 |
2005-07-15 00:27:29 |
12460 |
493 |
134 |
2 |
1 |
2005-07-06 04:24:42 |
2005-07-09 02:37:42 |
13654 |
261 |
479 |
8 |
1 |
2005-06-18 15:34:18 |
2005-06-19 16:22:18 |
13703 |
20 |
994 |
13 |
2 |
2005-06-18 11:46:26 |
2005-06-22 11:37:26 |
The database has a few quirks, which are managed by configuration. See the Translator Configuration Files page to learn more about config files.
1################################################################################
2# For help, see https://rics.readthedocs.io #
3################################################################################
4[translator]
5fmt = "{id}:{name}[ {last_name}]"
6
7# ------------------------------------------------------------------------------
8# Name-to-source mapping configuration. Binds names to source, eg 'cute_animals'
9# -> 'my_database.animals'. Overrides take precedence over scoring logic.
10[translator.mapping]
11unmapped_values_action = "raise"
12score_function = "modified_hamming"
13
14[[translator.mapping.score_function_heuristics]]
15function = "like_database_table"
16
17[[translator.mapping.filter_functions]]
18function = "require_regex_match"
19regex = ".*_id$"
20where = "name"
21
22################################################################################
23# Fetching configuration.
24################################################################################
25[fetching.MemoryFetcher.data.category]
26id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
27name = [
28 "Action", "Animation",
29 "Children", "Classics", "Comedy",
30 "Documentary", "Drama",
31 "Family", "Foreign",
32 "Games", "Horror", "Music", "New",
33 "Sci-Fi", "Sports", "Travel"
34]
1[fetching.SqlFetcher]
2connection_string = "@DVDRENTAL_CONNECTION_STRING"
3password = "@DVDRENTAL_PASSWORD"
4blacklist_tables = ["category"] # Let's pretend this table is off-limits
5
6[fetching.SqlFetcher.engine_kwargs]
7# fast_executemany = true # Not supported by the driver
8
9# ------------------------------------------------------------------------------
10# Placeholder mapping configuration. Binds actual names in sources (such as
11# column names in an SQL table) to wanted names, eg id = 'animal_id'.
12[fetching.mapping]
13unmapped_values_action = "ignore"
14
15[[fetching.mapping.filter_functions]]
16function = "banned_substring"
17substrings = ["p2007"]
18where = "context"
19
20[[fetching.mapping.score_function_heuristics]]
21function = "value_fstring_alias"
22fstring = "first_{value}"
23[[fetching.mapping.score_function_heuristics]]
24function = "value_fstring_alias"
25fstring = "{context}_{value}"
26
27[fetching.mapping.overrides.address]
28code = "postal_code"
29[fetching.mapping.overrides.film]
30name = "title"
Translating now becomes a simple matter. The following snippet is a test case which translates all of the ~16000 rows returned by the query, verifying a random sample of 5 rows.
1from os import getenv
2from pathlib import Path
3from sys import platform
4
5import pandas as pd
6import pytest
7import sqlalchemy
8
9from .conftest import QUERY, check_status, get_connection_string, get_translator
10
11DIALECTS = [
12 "mysql",
13 "postgresql",
14 "mssql", # Quite slow, mostly since the (pyre-python) driver used doesn't support fast_executemany
15]
16
17
18@pytest.mark.filterwarnings("ignore:Did not recognize type:sqlalchemy.exc.SAWarning")
19@pytest.mark.parametrize("dialect", DIALECTS)
20@pytest.mark.skipif(getenv("CI") == "true" and platform != "linux", reason="No Docker for Mac and Windows in CI/CD.")
21def test_dvd_rental(dialect):
22 check_status(dialect)
23 engine = sqlalchemy.create_engine(get_connection_string(dialect))
24 translator = get_translator(dialect)
25 expected = pd.read_csv(
26 Path(__file__).with_name("translated.csv"), index_col=0, parse_dates=["rental_date", "return_date"]
27 )
28 df: pd.DataFrame = pd.read_sql(QUERY, engine).loc[expected.index]
29 actual = translator.translate(df)
30
31 assert actual is not None
32 pd.testing.assert_frame_equal(actual, expected)
Date columns are not translated, nor is the first (row number/index) column.
customer_id |
film_id |
category_id |
staff_id |
rental_date |
return_date |
|
|---|---|---|---|---|---|---|
708 |
313:DONALD MAHON |
797:SILENCE KANE |
12:Music |
1:Mike Hillyer |
2005-08-22 20:55:56 |
2005-08-27 18:52:56 |
1486 |
107:FLORENCE WOODS |
767:SCALAWAG DUCK |
12:Music |
1:Mike Hillyer |
2005-08-21 16:22:59 |
2005-08-22 21:15:59 |
1833 |
54:TERESA ROGERS |
7:AIRPLANE SIERRA |
5:Comedy |
2:Jon Stephens |
2005-08-21 04:34:11 |
2005-08-27 10:30:11 |
4649 |
565:JAIME NETTLES |
454:IMPACT ALADDIN |
12:Music |
1:Mike Hillyer |
2005-08-02 18:56:28 |
2005-08-11 23:19:28 |
5023 |
419:CHAD CARBONE |
487:JINGLE SAGEBRUSH |
4:Classics |
2:Jon Stephens |
2005-08-02 05:39:12 |
2005-08-08 00:09:12 |
7359 |
381:BOBBY BOUDREAU |
489:JUGGLER HARDLY |
2:Animation |
1:Mike Hillyer |
2005-07-29 16:31:32 |
2005-07-30 12:39:32 |
10609 |
26:JESSICA HALL |
773:SEABISCUIT PUNK |
15:Sports |
2:Jon Stephens |
2005-07-09 21:32:29 |
2005-07-15 00:27:29 |
12460 |
493:BRENT HARKINS |
134:CHAMPION FLATLINERS |
2:Animation |
1:Mike Hillyer |
2005-07-06 04:24:42 |
2005-07-09 02:37:42 |
13654 |
261:DEANNA BYRD |
479:JEDI BENEATH |
8:Family |
1:Mike Hillyer |
2005-06-18 15:34:18 |
2005-06-19 16:22:18 |
13703 |
20:SHARON ROBINSON |
994:WYOMING STORM |
13:New |
2:Jon Stephens |
2005-06-18 11:46:26 |
2005-06-22 11:37:26 |