DVD Rental Database#
This example translates a query from the DVD Rental Sample Database. It covers most of the more advanced features that have been implemented.
Start the database#
Using Docker, start the database by running:
docker run -p 5001:5432 moertel/postgresql-sample-dvdrental@sha256:e35f8dc4011d053777631208c85e3976a422b65e12383579d8a856a7849082c5
from the terminal. To describe the database, run
psql -h localhost -p 5001 -U postgres -d dvdrental -c "\d+"
from a separate terminal window. Leave out the last part (-c "\d+") to query the database manually.
Query to translate#
We will use a small query to describe rental transactions in the database:
1SELECT customer_id,
2 inventory.film_id,
3 film_category.category_id,
4 staff_id,
5 store_id,
6 rental_date,
7 return_date
8FROM rental
9LEFT JOIN inventory ON rental.inventory_id = inventory.inventory_id
10LEFT JOIN film_category on film_category.film_id = inventory.film_id;
The query above shows who rented what and when, what store they rented from and from whom.
customer_id |
film_id |
staff_id |
store_id |
rental_date |
return_date |
|
|---|---|---|---|---|---|---|
9564 |
385 |
129 |
1 |
2 |
2005-07-31 02:40:37 |
2005-08-01 01:59:37 |
14933 |
328 |
158 |
1 |
1 |
2005-08-22 05:53:52 |
2005-08-27 06:20:52 |
544 |
506 |
448 |
2 |
1 |
2005-05-28 07:24:28 |
2005-06-06 01:31:28 |
1869 |
206 |
786 |
2 |
1 |
2005-06-17 02:27:03 |
2005-06-20 22:41:03 |
928 |
421 |
743 |
1 |
2 |
2005-05-30 12:53:01 |
2005-05-31 14:28:01 |
Configuration files#
The database has a few quirks, which are managed by configuration. See the Translator Configuration Files page to learn more about config files.
1################################################################################
2# For help, see https://rics.readthedocs.io #
3################################################################################
4[translator]
5fmt = "{id}:{name}[ {last_name}]"
6
7# ------------------------------------------------------------------------------
8# Name-to-source mapping configuration. Binds names to source, eg 'cute_animals'
9# -> 'my_database.animals'. Overrides take precedence over scoring logic.
10[translator.mapping]
11unmapped_values_action = "raise"
12score_function = "modified_hamming"
13
14[[translator.mapping.score_function_heuristics]]
15function = "like_database_table"
16
17[[translator.mapping.filter_functions]]
18function = "require_regex_match"
19regex = ".*_id$"
20where = "name"
21
22################################################################################
23# Fetching configuration.
24################################################################################
25[fetching.MemoryFetcher.data.category]
26id = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
27name = [
28 "Action", "Animation",
29 "Children", "Classics", "Comedy",
30 "Documentary", "Drama",
31 "Family", "Foreign",
32 "Games", "Horror", "Music", "New",
33 "Sci-Fi", "Sports", "Travel"
34]
1[fetching.SqlFetcher]
2connection_string = "postgresql+pg8000://postgres@localhost:5001/dvdrental"
3blacklist_tables = [ "category" ] # Let's pretend this table is off-limits
4
5# ------------------------------------------------------------------------------
6# Placeholder mapping configuration. Binds actual names in sources (such as
7# column names in an SQL table) to wanted names, eg id = 'animal_id'.
8[fetching.mapping]
9unmapped_values_action = "ignore"
10
11[[fetching.mapping.filter_functions]]
12function = "banned_substring"
13substrings = ["p2007"]
14where = "context"
15
16[[fetching.mapping.score_function_heuristics]]
17function = "value_fstring_alias"
18fstring = "first_{value}"
19[[fetching.mapping.score_function_heuristics]]
20function = "value_fstring_alias"
21fstring = "{context}_{value}"
22
23[fetching.mapping.overrides.addess]
24code = "postal_code"
25[fetching.mapping.overrides.film]
26name = "title"
Translating#
Translating now becomes a simple matter. The following snippet is a test case which translates all of the ~16000 rows returned by the query, verifying a random sample of 5 rows.
1import pandas as pd
2import pytest
3
4from rics.translation import Translator
5
6from .conftest import DATE_COLUMNS, DVD_RENTAL_SKIP_REASON, wait_for_dvdrental
7
8
9@pytest.fixture(scope="module")
10def translator():
11 extra_fetchers = ["tests/translation/dvdrental/sql-fetcher.toml"]
12 config = "tests/translation/dvdrental/translation.toml"
13 return Translator.from_config(config, extra_fetchers)
14
15
16@pytest.mark.skipif(not wait_for_dvdrental(), reason=DVD_RENTAL_SKIP_REASON)
17def test_dvd_rental(translator):
18 expected = pd.read_csv("tests/translation/dvdrental/translated.csv", index_col=0, parse_dates=DATE_COLUMNS)
19
20 with open("tests/translation/dvdrental/query.sql") as f:
21 query = f.read()
22
23 engine = translator._fetcher.fetchers[-1]._engine # The SQL fetcher was given last
24 df: pd.DataFrame = pd.read_sql(query, con=engine)
25 translator.translate(df, inplace=True)
26
27 assert (df.select_dtypes("datetime").columns == DATE_COLUMNS).all()
28 actual = df.sample(len(expected), random_state=0)
29 assert expected.equals(actual), actual.to_string()
The translated rows are:
customer_id |
film_id |
category_id |
staff_id |
store_id |
rental_date |
return_date |
|
|---|---|---|---|---|---|---|---|
12503 |
29:ANGELA HERNANDEZ |
235:DIVIDE MONSTER |
14:Sci-Fi |
1:Mike Hillyer |
2:MySQL |
2005-08-18 13:20:13 |
2005-08-22 12:55:13 |
13357 |
565:JAIME NETTLES |
191:CROOKED FROGMEN |
3:Children |
2:Jon Stephens |
1:MySakila |
2005-08-19 21:07:54 |
2005-08-20 17:29:54 |
1638 |
259:LENA JENSEN |
643:ORIENT CLOSER |
7:Drama |
2:Jon Stephens |
2:MySQL |
2005-06-16 08:46:26 |
2005-06-24 14:39:26 |
7129 |
574:JULIAN VEST |
235:DIVIDE MONSTER |
14:Sci-Fi |
1:Mike Hillyer |
2:MySQL |
2005-07-27 06:33:06 |
2005-07-28 06:15:06 |
5931 |
258:MYRTLE FLEMING |
739:ROCKY WAR |
7:Drama |
2:Jon Stephens |
2:MySQL |
2005-07-10 22:11:04 |
2005-07-19 18:12:04 |