The Translator
The Translator is the main entry point for all translation tasks. For a simple usage example, see the
Translating IDs in 30 seconds section.
- class rics.translation.Translator(fetcher: Union[Fetcher, TranslationMap, Dict[SourceType, PlaceholderTranslations], Dict[SourceType, Union[PlaceholderTranslations, DataFrame, Dict[str, Sequence[Any]]]]], fmt: Union[str, Format] = '{id}:{name}', mapper: Optional[Mapper] = None, default_fmt: Optional[Union[str, Format]] = None, default_translations: Optional[Union[DefaultTranslations, Dict[str, Union[Dict[str, Any], Dict[str, Dict[str, Any]]]]]] = None)[source]
Translate IDs to human-readable labels.
Untranslatable IDs will be None by default if neither default_fmt nor default_translations is given.
- Parameters
fetcher – A Fetcher or ready-to-use translations.
fmt – String Format specification for translations.
mapper – A Mapper instance for binding names to sources.
default_fmt – Alternative format specification to use instead of fmt for fallback translation of unknown IDs.
default_translations – Shared and/or source-specific default placeholder values for unknown IDs.
See also
Related classes:
rics.translation.offline.Format, the format specification.
rics.translation.offline.TranslationMap, application of formats.
rics.translation.fetching.Fetcher, fetching of translation data from external sources.
- classmethod from_config(path: Union[str, bytes, PathLike, Dict[str, Any]]) Translator[source]
Create a translator from a YAML file.
- Parameters
path – Path to a YAML file, or a pre-parsed dict.
- Returns
A Translator object.
- Raises
ConfigurationError – If the config is invalid.
- translate(translatable: DefaultTranslatable, names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, inplace: bool = False) Optional[DefaultTranslatable][source]
Translate IDs to human-readable strings.
- Parameters
translatable – A data structure to translate.
names – Explicit names to translate. Will try to derive from translatable if not given. May also be a predicate which indicates (returns True for) derived names to keep.
ignore_names – Names not to translate. Always takes precedence over names, both explicit and derived. May also be a predicate which indicates (returns True for) names to ignore.
inplace – If True, translation is performed in-place and this function returns None.
- Returns
A copy of translatable with IDs replaced by translations if inplace=False, otherwise None.
- Raises
UntranslatableTypeError – If translatable is not translatable using any standard IOs.
AttributeError – If names are not given and cannot be derived from translatable.
MappingError – If required (explicitly given) names fail to map to a source.
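The interplay between names and ignore_names described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: resolve_names is a hypothetical helper, names are assumed to be strings, and the derived names are assumed to be something like DataFrame column names.

```python
from typing import Callable, Iterable, List, Optional, Union

NameSpec = Union[str, Iterable[str], Callable[[str], bool]]


def resolve_names(
    derived: List[str],
    names: Optional[NameSpec] = None,
    ignore_names: Optional[NameSpec] = None,
) -> List[str]:
    """Hypothetical helper: decide which names to translate."""
    if names is None:
        selected = list(derived)  # nothing given: fall back to derived names
    elif callable(names):
        selected = [n for n in derived if names(n)]  # predicate keeps derived names
    elif isinstance(names, str):
        selected = [names]  # a single explicit name
    else:
        selected = list(names)  # several explicit names

    if ignore_names is None:
        return selected
    if callable(ignore_names):
        return [n for n in selected if not ignore_names(n)]
    ignored = {ignore_names} if isinstance(ignore_names, str) else set(ignore_names)
    # ignore_names wins over both explicit and derived names
    return [n for n in selected if n not in ignored]


columns = ["actor_id", "title_id", "year"]
print(resolve_names(columns, names=lambda n: n.endswith("_id")))
# ['actor_id', 'title_id']
print(resolve_names(columns, names=["actor_id", "title_id"], ignore_names="title_id"))
# ['actor_id']
```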
- map_to_sources(translatable: DefaultTranslatable, names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None) Optional[DirectionalMapping][source]
Map names to translation sources.
- Parameters
translatable – A data structure to map names for.
names – Explicit names to translate. Will try to derive from translatable if not given. May also be a predicate which indicates (returns True for) derived names to keep.
ignore_names – Names not to translate. Always takes precedence over names, both explicit and derived. May also be a predicate which indicates (returns True for) names to ignore.
- Returns
A mapping of names to translation sources. Returns None if mapping failed but success was not required.
- Raises
AttributeError – If names are not given and cannot be derived from translatable.
MappingError – If required (explicitly given) names fail to map to a source.
- fetch(translatable: DefaultTranslatable, name_to_source: DirectionalMapping[NameType, SourceType], data_structure_io: Optional[Type[DataStructureIO]] = None) TranslationMap[source]
Fetch translations.
- Parameters
translatable – A data structure to translate.
name_to_source – Mappings of names in translatable to translation sources known to the fetcher.
data_structure_io – Static Data Structure IO class used to extract IDs from translatable. None=derive.
- Returns
A TranslationMap.
- Raises
OfflineError – If disconnected from the fetcher, i.e. not online.
- property online: bool
Return connectivity status. If False, no new translations may be fetched.
- store(translatable: Optional[DefaultTranslatable] = None, names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, ignore_names: Optional[Union[NameType, Iterable[NameType], Callable[[NameType], bool]]] = None, delete_fetcher: bool = True) Translator[source]
Retrieve and store translations in a local cache.
- Parameters
translatable – Data from which IDs to fetch will be extracted. None=fetch all IDs.
names – Explicit names to translate. Will try to derive from translatable if not given. May also be a predicate which indicates (returns True for) derived names to keep.
ignore_names – Names not to translate. Always takes precedence over names, both explicit and derived. May also be a predicate which indicates (returns True for) names to ignore.
delete_fetcher – If True, go offline after retrieving data. The translation will still function, but some methods may raise exceptions and new data cannot be retrieved. Deleting allows the fetcher to close files and connections. If the fetcher has a close()-method, it will be called before deletion.
- Returns
Self, for chained assignment.
- Raises
ForbiddenOperationError – If the fetcher does not permit the FETCH_ALL operation (only when translatable is None).
MappingError – If a translatable is given, but no names to translate could be extracted.
To actually get the translations, a Fetcher implementation is needed.
Handling unknown IDs
Untranslatable IDs will be None by default. Both an alternative translation format and default values
may be specified to handle IDs which weren’t returned by the underlying fetcher. Alternative formats work just like
regular formats, but if any placeholders other than id are specified, these must be included in the default
translations. As an example, by copying the default_fmt and default-translations sections from config.yaml,
we see that the output for an unknown title with ID “tt0043440” is translated the way we specified it.
| Actor | Debut Title |
|---|---|
| nm0038172:Peter Aryans *1918†2001 | tt0063897:Floris (original: Floris) *1969†1969 |
| nm0040962:Ugo Attanasio *1887†1969 | tt0043440:Title unknown (original: Original title unknown) *?†? |
Hint
A simple default_fmt such as "{id} not translated" or just "unknown" may be enough, and will only
fail if the fetcher is configured to fail for unknown IDs. Using one of these we could’ve skipped the
default-translations section entirely in the example above.
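The fallback behaviour can be sketched with plain string formatting. This is an illustration under stated assumptions, not the library's code: translate_id is a hypothetical function, and the placeholder names name and original_name are made up for the example.

```python
from typing import Mapping, Optional


def translate_id(
    id_: str,
    known: Mapping[str, Mapping[str, str]],
    fmt: str,
    default_fmt: Optional[str] = None,
    default_translations: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical sketch of translating one ID with optional fallbacks."""
    if id_ in known:
        return fmt.format_map({"id": id_, **known[id_]})
    if default_fmt is None and default_translations is None:
        return None  # no fallback configured: the ID stays untranslated
    # Placeholders other than 'id' must come from the default translations.
    placeholders = {"id": id_, **(default_translations or {})}
    return (default_fmt or fmt).format_map(placeholders)


fmt = "{id}:{name} (original: {original_name})"
known = {"tt0063897": {"name": "Floris", "original_name": "Floris"}}
defaults = {"name": "Title unknown", "original_name": "Original title unknown"}

print(translate_id("tt0063897", known, fmt))
# tt0063897:Floris (original: Floris)
print(translate_id("tt0043440", known, fmt))
# None
print(translate_id("tt0043440", known, fmt, default_translations=defaults))
# tt0043440:Title unknown (original: Original title unknown)
print(translate_id("tt0043440", known, fmt, default_fmt="{id} not translated"))
# tt0043440 not translated
```

The last call shows why a simple default_fmt needs no default translations: it only uses the id placeholder, which is always available.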
Fetching: SQL database
Implementation based on SQLAlchemy. Any supported dialect should work out of the box, though drivers for your particular dialect may need to be installed separately.
- class rics.translation.fetching.SqlFetcher(connection_string: str, password: Optional[str] = None, whitelist_tables: Optional[Iterable[str]] = None, blacklist_tables: Optional[Iterable[str]] = None, include_views: bool = True, fetch_in_below: int = 1200, fetch_between_over: int = 10000, fetch_between_max_overfetch_factor: float = 2.5, fetch_all_limit: Optional[int] = 100000, **kwargs: Any)[source]
Fetch data from a SQL source. Requires SQLAlchemy.
- Parameters
connection_string – A SQLAlchemy connection string. Read from an environment variable if connection_string starts with ‘@’ followed by the variable name. Example: @TRANSLATION_DB_CONNECTION_STRING reads from the TRANSLATION_DB_CONNECTION_STRING environment variable.
password – Password to insert into the connection string. Will be escaped to allow for special characters. If given, the connection string must contain a password key, e.g. dialect://user:{password}@host:port. Can be an environment variable just like connection_string.
whitelist_tables – The only tables the fetcher may access.
blacklist_tables – Tables the fetcher may not access.
include_views – If True, discover views as well.
fetch_in_below – Always use an IN-clause when fetching fewer than fetch_in_below IDs.
fetch_between_over – Always use a BETWEEN-clause when fetching more than fetch_between_over IDs.
fetch_between_max_overfetch_factor – If the number of IDs to fetch is between fetch_in_below and fetch_between_over, use this factor to choose between an IN and a BETWEEN clause.
fetch_all_limit – Maximum size of table to allow a fetch all-operation. None=No limit, 0=never allow.
- Raises
ValueError – If both whitelist_tables and blacklist_tables are given.
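One plausible reading of the three fetch_* thresholds is sketched below. The documentation does not say how the overfetch factor is computed, so the estimate used here (span of an integer ID range divided by the number of requested IDs) is an assumption, and choose_clause is a hypothetical function rather than part of SqlFetcher.

```python
from typing import Sequence


def choose_clause(
    ids: Sequence[int],
    fetch_in_below: int = 1200,
    fetch_between_over: int = 10_000,
    max_overfetch_factor: float = 2.5,
) -> str:
    """Hypothetical sketch: pick 'IN' or 'BETWEEN' for a set of integer IDs."""
    num_ids = len(ids)
    if num_ids < fetch_in_below:
        return "IN"  # few IDs: list them explicitly
    if num_ids > fetch_between_over:
        return "BETWEEN"  # many IDs: a range query keeps the statement small
    # ASSUMPTION: BETWEEN returns roughly one row per integer in [min, max],
    # so span / num_ids estimates how many unwanted rows are fetched.
    span = max(ids) - min(ids) + 1
    overfetch_factor = span / num_ids
    return "BETWEEN" if overfetch_factor <= max_overfetch_factor else "IN"


print(choose_clause(range(100)))            # IN
print(choose_clause(range(20_000)))         # BETWEEN
print(choose_clause(range(5_000)))          # BETWEEN (dense range, factor 1.0)
print(choose_clause(range(0, 50_000, 10)))  # IN (sparse range, factor ~10)
```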
- sanitize_id(arg: IdType) IdType[source]
Sanitize an input.
- fetch_placeholders(instr: FetchInstruction) PlaceholderTranslations[source]
Fetch columns from a SQL database.
- property online: bool
Return connectivity status. If False, no new translations may be fetched.
- property allow_fetch_all: bool
Flag indicating whether the fetch_all() operation is permitted.
- classmethod parse_connection_string(connection_string: str, password: Optional[str]) str[source]
Parse a connection string. Read from environment if connection_string starts with ‘@’.
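The ‘@’ convention and password escaping might look roughly like the sketch below. It is not the actual parse_connection_string implementation: build_connection_string and read_env_or_literal are hypothetical names, the PostgreSQL URL is an invented example, and the use of urllib.parse.quote_plus for escaping is an assumption.

```python
import os
from typing import Optional
from urllib.parse import quote_plus


def read_env_or_literal(value: str) -> str:
    """Return the environment variable named after '@', or the value itself."""
    if value.startswith("@"):
        return os.environ[value[1:]]
    return value


def build_connection_string(connection_string: str, password: Optional[str] = None) -> str:
    """Hypothetical sketch: resolve '@VARIABLE' references and insert the password."""
    connection_string = read_env_or_literal(connection_string)
    if password is None:
        return connection_string
    if "{password}" not in connection_string:
        raise ValueError("connection string must contain a '{password}' key")
    # Escape the password so characters such as '@' or '/' cannot break the URL.
    escaped = quote_plus(read_env_or_literal(password))
    return connection_string.replace("{password}", escaped)


os.environ["TRANSLATION_DB_CONNECTION_STRING"] = "postgresql://user:{password}@host:5432/db"
print(build_connection_string("@TRANSLATION_DB_CONNECTION_STRING", "p@ss w/rd"))
# postgresql://user:p%40ss+w%2Frd@host:5432/db
```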
- make_table_summary(table: Table) TableSummary[source]
Create a table summary.
- get_approximate_table_size(table: Table) int[source]
Return the approximate size of a table.
Called only by the make_table_summary() method during discovery. The default implementation performs a count on the ID column, which may be expensive.
- Parameters
table – A table object.
- Returns
An approximate size for table.
- get_metadata() MetaData[source]
Create a populated metadata object.
Fetching: Local files
Implementation wrapping a pandas Read-function where file names are interpreted as source names. Most readers in pandas.io should work, though additional dependencies may be required for some of them. Many of these functions do not actually require the file to be present on the local file system, allowing translation data to be shared if stored centrally.
- class rics.translation.fetching.PandasFetcher(read_function: ~typing.Union[~typing.Callable[[~typing.Union[str, bytes, ~os.PathLike], ~typing.Any, ~typing.Any], ~pandas.core.frame.DataFrame], str] = <function read_pickle>, read_path_format: ~typing.Optional[~typing.Union[str, ~typing.Callable[[~typing.Union[str, bytes, ~os.PathLike]], str]]] = 'data/{}.pkl', read_function_args: ~typing.Optional[~typing.Iterable[~typing.Any]] = None, read_function_kwargs: ~typing.Optional[~typing.Mapping[str, ~typing.Any]] = None, **kwargs: ~typing.Any)[source]
Fetcher using pandas DataFrames as the data format.
Fetch data from serialized DataFrames. How this is done is determined by the read_function. This is typically a Pandas function such as pandas.read_csv() or pandas.read_pickle(), but any function that accepts a string source as the first argument and returns a data frame can be used.
- Parameters
read_function – A Pandas read-function.
read_path_format – A formatting string or a callable to apply to a source before passing it to read_function. Must contain a source as its only placeholder. Example: data/{source}.pkl. None=leave as-is.
read_function_args – Additional positional arguments for read_function.
read_function_kwargs – Additional keyword arguments for read_function.
See also
The official Pandas IO documentation
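How read_path_format turns a source name into a path can be sketched as follows. resolve_path is a hypothetical helper, not part of PandasFetcher. Because both an anonymous placeholder (the default data/{}.pkl) and a named one (data/{source}.pkl) appear above, the sketch accepts either form.

```python
from typing import Callable, Optional, Union

PathFormat = Optional[Union[str, Callable[[str], str]]]


def resolve_path(source: str, read_path_format: PathFormat = "data/{}.pkl") -> str:
    """Hypothetical sketch: map a source name to the path given to read_function."""
    if read_path_format is None:
        return source  # None=leave as-is
    if callable(read_path_format):
        return read_path_format(source)
    # Supplying the source positionally and by keyword fills both '{}' and '{source}'.
    return read_path_format.format(source, source=source)


print(resolve_path("cities"))                            # data/cities.pkl
print(resolve_path("cities", "data/{source}.csv"))       # data/cities.csv
print(resolve_path("cities", lambda s: f"/mnt/{s}.pq"))  # /mnt/cities.pq
print(resolve_path("data/cities.pkl", None))             # data/cities.pkl
```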
- read(source_path: Union[str, bytes, PathLike]) DataFrame[source]
Read a DataFrame from a source path.
- Parameters
source_path – Path to serialized DataFrame.
- Returns
A deserialized DataFrame.
- find_sources() Dict[str, Path][source]
Search for source paths to pass to read_function using read_path_format.
- Returns
A dict {source, path}.
- Raises
IOError – If files cannot be read.
- fetch_placeholders(instr: FetchInstruction) PlaceholderTranslations[source]
Read data from disk.
Fetching: User implementations
The base class may be inherited by users to customize all aspects of the fetching process. You will find the API reference for this class below.
- class rics.translation.fetching.Fetcher(allow_fetch_all: bool = True, placeholder_overrides: Optional[Union[PlaceholderOverrides, Dict[str, Union[Dict[str, str], Dict[str, Dict[str, str]]]]]] = None)[source]
Base class for fetching translations from an external source.
Users who wish to define their own fetching logic should inherit this class, but there are implementations for common use cases. See PandasFetcher for a versatile base fetcher or SqlFetcher for a more specialized solution.
- Parameters
allow_fetch_all – If False, an error will be raised when fetch_all() is called.
placeholder_overrides – Placeholder name overrides. Used to adapt placeholder names in sources to wanted names.
See also
Related classes:
rics.translation.offline.Format, the format specification.
rics.translation.offline.TranslationMap, application of formats.
rics.translation.Translator, the main user interface for translation.
- property online: bool
Return connectivity status. If False, no new translations may be fetched.
- assert_online() None[source]
Raise an error if offline.
- Raises
OfflineError – If not online.
- abstract property sources: List[SourceType]
Source names known to the fetcher, such as cities or languages.
- property placeholder_overrides: Optional[PlaceholderOverrides]
Return the override.
- property allow_fetch_all: bool
Flag indicating whether the fetch_all() operation is permitted.
- fetch(ids_to_fetch: Iterable[IdsToFetch], placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]
Fetch translations.
- Parameters
ids_to_fetch – Tuples (source, ids) to fetch. If ids=None, retrieve data for as many IDs as possible.
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
UnknownPlaceholderError – For placeholder(s) that are unknown to the fetcher.
UnknownSourceError – For source(s) that are unknown to the fetcher.
ForbiddenOperationError – If trying to fetch all IDs when not possible or permitted.
ImplementationError – For errors made by the inheriting implementation.
Notes
Placeholders are usually columns in relational database applications. These are the components which are combined to create ID translations. See rics.translation.offline.Format documentation for details.
- fetch_all(placeholders: Iterable[str] = (), required: Iterable[str] = ()) Dict[SourceType, PlaceholderTranslations][source]
Fetch as much data as possible.
- Parameters
placeholders – All desired placeholders in preferred order.
required – Placeholders that must be included in the response.
- Returns
A mapping {source: PlaceholderTranslations} for translation.
- Raises
ForbiddenOperationError – If fetching all IDs is not possible or permitted.
UnknownPlaceholderError – For placeholder(s) that are unknown to the fetcher.
ImplementationError – For errors made by the inheriting implementation.
- abstract fetch_placeholders(instruction: FetchInstruction) PlaceholderTranslations[source]
Fetch translations.
- Parameters
instruction – A single instruction for IDs to fetch. If IDs is None, the fetcher should retrieve data for as many IDs as possible.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If the placeholder is unknown to the fetcher.
- classmethod make_and_verify(instr: FetchInstruction, known_placeholders: Collection[str], records: Sequence[Sequence[Any]]) PlaceholderTranslations[source]
Make a PlaceholderTranslations instance from records.
Convenience method meant for use by implementations.
- Parameters
instr – A fetch instruction.
known_placeholders – Known placeholders for the instr.source.
records – Records produced from the instruction.
- Returns
Placeholder translation elements.
- Raises
UnknownPlaceholderError – If required placeholders are missing.
ImplementationError – If the underlying fetcher does not return enough IDs.
- classmethod verify_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) None[source]
Verify required placeholders for a source.
Convenience method meant for use by implementations.
- Parameters
instr – A fetch instruction.
known_placeholders – Known placeholders for the instr.source.
- Raises
UnknownPlaceholderError – If required placeholders are missing.
- classmethod select_placeholders(instr: FetchInstruction, known_placeholders: Collection[str]) List[str][source]
Select as many known, requested placeholders as possible.
- Parameters
instr – A fetch instruction.
known_placeholders – Known placeholders for the instr.source.
- Returns
Known placeholders in the desired order.
- Raises
UnknownPlaceholderError – If required placeholders are missing.
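The placeholder selection described for select_placeholders can be illustrated with a small standalone sketch. The Instruction dataclass and the exception class below are simplified, hypothetical stand-ins for FetchInstruction and UnknownPlaceholderError, and the logic is an interpretation of the docstrings rather than the library's code.

```python
from dataclasses import dataclass
from typing import Collection, List, Tuple


class UnknownPlaceholderError(ValueError):
    """Local stand-in for the library exception of the same name."""


@dataclass(frozen=True)
class Instruction:
    """Hypothetical, simplified stand-in for FetchInstruction."""

    source: str
    placeholders: Tuple[str, ...]
    required: Tuple[str, ...] = ()


def select_placeholders(instr: Instruction, known_placeholders: Collection[str]) -> List[str]:
    """Keep known, requested placeholders in the requested order."""
    missing = [p for p in instr.required if p not in known_placeholders]
    if missing:
        raise UnknownPlaceholderError(f"{missing} not available in source {instr.source!r}")
    # Optional placeholders that the source lacks are silently skipped.
    return [p for p in instr.placeholders if p in known_placeholders]


instr = Instruction("cities", placeholders=("id", "name", "population"), required=("id",))
print(select_placeholders(instr, {"id", "name", "country"}))
# ['id', 'name']
```

A fetch_placeholders() implementation could run a check like this before querying its data, so that missing required placeholders fail early.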
Offline translation
If you do not want to keep the fetcher connected to a database or the file system, you can use the translator's
store() method to fetch as much data as possible, after which the fetcher will be
disconnected and discarded. Alternatively, you may supply a TranslationMap as the
fetcher instance when initializing the translator. Keeping all translations in memory may cause high memory consumption.
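The store-then-disconnect pattern can be sketched with two toy classes. DemoFetcher and CachingTranslator are hypothetical stand-ins that only mimic the behaviour described for store() and delete_fetcher. They are not the library's classes.

```python
from typing import Dict, Optional


class DemoFetcher:
    """Hypothetical stand-in for a fetcher that holds a connection."""

    def __init__(self) -> None:
        self.closed = False

    def fetch_all(self) -> Dict[str, Dict[int, str]]:
        return {"cities": {1: "Stockholm", 2: "Oslo"}}

    def close(self) -> None:
        self.closed = True


class CachingTranslator:
    """Hypothetical sketch of fetching once and then working offline."""

    def __init__(self, fetcher: DemoFetcher) -> None:
        self._fetcher: Optional[DemoFetcher] = fetcher
        self._cache: Dict[str, Dict[int, str]] = {}

    @property
    def online(self) -> bool:
        return self._fetcher is not None

    def store(self, delete_fetcher: bool = True) -> "CachingTranslator":
        if self._fetcher is None:
            raise RuntimeError("offline: nothing to fetch from")
        self._cache = self._fetcher.fetch_all()  # everything is kept in memory
        if delete_fetcher:
            self._fetcher.close()  # let the fetcher release files and connections
            self._fetcher = None
        return self  # allows chained assignment

    def translate(self, source: str, id_: int) -> Optional[str]:
        name = self._cache.get(source, {}).get(id_)
        return None if name is None else f"{id_}:{name}"


fetcher = DemoFetcher()
translator = CachingTranslator(fetcher).store()
print(translator.online, fetcher.closed)  # False True
print(translator.translate("cities", 1))  # 1:Stockholm
```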