Using importlib_resources

importlib_resources is a library that leverages Python’s import system to provide access to resources within packages and alongside modules. Given that this library is built on top of the import system, it is highly efficient and easy to use. This library’s philosophy is that, if one can import a module, one can access resources associated with that module. Resources can be opened or read, in either binary or text mode.

What exactly do we mean by “a resource”? It’s easiest to think about the metaphor of files and directories on the file system, though it’s important to keep in mind that this is just a metaphor. Resources and packages do not have to exist as physical files and directories on the file system.

If you have a file system layout such as:

data/
    __init__.py
    one/
        __init__.py
        resource1.txt
        module1.py
        resources1/
            resource1.1.txt
    two/
        __init__.py
        resource2.txt
standalone.py
resource3.txt

then the directories are data, data/one, and data/two. Each of these is also a Python package by virtue of the fact that they all contain __init__.py files. That means that in Python, all of these import statements work:

import data
import data.one
from data import two

Each import statement gives you a Python module corresponding to the __init__.py file in each of the respective directories. These modules are packages since packages are just special module instances that have an additional attribute, namely a __path__ [1].

In this analogy then, resources are just files or directories contained in a package directory, so data/one/resource1.txt and data/two/resource2.txt are both resources, as are the __init__.py files in all the directories.

Resources in packages are always accessed relative to the package that they live in. resource1.txt and resources1/resource1.1.txt are resources within the data.one package, and two/resource2.txt is a resource within the data package.

Resources may also be referenced relative to another anchor, a module in a package (data.one.module1) or a standalone module (standalone). In this case, resources are loaded from the same loader that loaded that module.
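To make the package-relative lookup concrete, here is a minimal sketch that recreates part of the layout above in a temporary directory and reads resources relative to their packages. The top-level package is named demo_data (a hypothetical name, chosen to avoid clashing with any real data module), and the fallback to the standard library's importlib.resources (Python 3.9+) is an assumption made so the sketch runs without the backport installed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

try:
    from importlib_resources import files
except ImportError:  # assumption: stdlib equivalent, Python 3.9+
    from importlib.resources import files

# Recreate part of the layout shown above under a temporary directory.
root = Path(tempfile.mkdtemp())
layout = {
    "demo_data/__init__.py": "",
    "demo_data/one/__init__.py": "",
    "demo_data/one/resource1.txt": "one",
    "demo_data/one/resources1/resource1.1.txt": "one point one",
    "demo_data/two/__init__.py": "",
    "demo_data/two/resource2.txt": "two",
}
for relative, text in layout.items():
    target = root / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Resources addressed relative to the demo_data.one package.
one = files("demo_data.one")
r1 = one.joinpath("resource1.txt").read_text(encoding="utf-8")
r11 = (
    one.joinpath("resources1")
    .joinpath("resource1.1.txt")
    .read_text(encoding="utf-8")
)

# The same kind of lookup, relative to the top-level package.
r2 = (
    files("demo_data")
    .joinpath("two")
    .joinpath("resource2.txt")
    .read_text(encoding="utf-8")
)
```

The joinpath calls are chained one segment at a time, which is the most conservative form across versions of the API.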

Example

Let’s say you are writing an email parsing library and in your test suite you have a sample email message in a file called message.eml. You would like to access the contents of this file for your tests, so you put this in your project under the email/tests/data/message.eml path. Let’s say your unit tests live in email/tests/test_email.py.

Your test could read the data file by doing something like:

import os

# __file__ is email/tests/test_email.py, so its directory is already email/tests.
data_dir = os.path.join(os.path.dirname(__file__), 'data')
data_path = os.path.join(data_dir, 'message.eml')
with open(data_path, encoding='utf-8') as fp:
    eml = fp.read()

But there’s a problem with this! The use of __file__ doesn’t work if your package lives inside a zip file, since in that case this code does not live on the file system.

You could use the pkg_resources API like so:

# In Python 3, resource_string() actually returns bytes!
from pkg_resources import resource_string as resource_bytes
eml = resource_bytes('email.tests.data', 'message.eml').decode('utf-8')

This requires you to make Python packages of both email/tests and email/tests/data, by placing an empty __init__.py file in each of those directories.

The problem with the pkg_resources approach is that, depending on the packages in your environment, pkg_resources can be expensive just to import. This behavior can have a serious negative impact on things like command line startup time for commands implemented in Python.

importlib_resources solves this performance challenge by being built entirely on the back of the stdlib importlib. By taking advantage of all the efficiencies in Python’s import system, and the fact that it’s built into Python, using importlib_resources can be much more performant. The equivalent code using importlib_resources would look like:

from importlib_resources import files
# Reads contents with UTF-8 encoding and returns str.
eml = files('email.tests.data').joinpath('message.eml').read_text()
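The zip file case mentioned earlier can be sketched as follows. This is an illustration under stated assumptions, not part of the library's documentation: the package name zipped_mail and the archive name bundle.zip are hypothetical, and the import falls back to the standard library's importlib.resources (Python 3.9+) when the backport is not installed.

```python
import importlib
import os
import sys
import tempfile
import zipfile
from pathlib import Path

try:
    from importlib_resources import files
except ImportError:  # assumption: stdlib equivalent, Python 3.9+
    from importlib.resources import files

# Build a zip archive containing a hypothetical package with one resource.
archive = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_mail/__init__.py", "")
    zf.writestr("zipped_mail/data/__init__.py", "")
    zf.writestr("zipped_mail/data/message.eml", "Subject: hello\n\nbody\n")

# Putting the archive on sys.path lets the import system load from it.
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

# The package imports fine, but its __file__ is not a real path on disk,
# which is why the os.path approach breaks here.
package = importlib.import_module("zipped_mail.data")
file_on_disk = os.path.exists(package.__file__)

# files() reads the resource through the import system instead.
eml = files("zipped_mail.data").joinpath("message.eml").read_text(encoding="utf-8")
```

The calling code is identical whether the package lives in a directory or in an archive.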

Anchors

The importlib_resources files API takes an anchor as its first parameter, which can either be a package name (as a str) or an actual module object. If a string is passed in, it must name an importable Python module, which is imported prior to loading any resources. Thus the above example could also be written as:

import email.tests.data
from importlib_resources import files

eml = files(email.tests.data).joinpath('message.eml').read_text()
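A quick way to see that both anchor forms resolve to the same place is to compare them. This sketch uses the standard library's json package purely as a stand-in anchor because it is importable everywhere. That choice, and the fallback to the stdlib importlib.resources (Python 3.9+), are assumptions for illustration.

```python
import json

try:
    from importlib_resources import files
except ImportError:  # assumption: stdlib equivalent, Python 3.9+
    from importlib.resources import files

# Anchor given as a package name: the package is imported for you.
by_name = files("json")

# Anchor given as an already imported module object.
by_module = files(json)

# Both anchors point at the same package location.
same_location = str(by_name) == str(by_module)
anchor_is_container = by_name.is_dir()
```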

Namespace Packages

importlib_resources supports namespace packages as anchors just like any other package. Similar to modules in a namespace package, resources in a namespace package are not allowed to collide by name. For example, if two packages both expose nspkg/data/foo.txt, those resources are unsupported by this library. The package will also likely experience problems due to the collision with installers.

It’s perfectly valid, however, for two packages to present different resources in the same namespace package, regular package, or subdirectory. For example, one package could expose nspkg/data/foo.txt and another expose nspkg/data/bar.txt and those two packages could be installed into separate paths, and the resources should be queryable:

import importlib_resources

data = importlib_resources.files('nspkg').joinpath('data')
data.joinpath('foo.txt').read_text()
data.joinpath('bar.txt').read_text()

File system or zip file

A consumer need not worry whether any given package is on the file system or in a zip file, as the importlib_resources APIs abstract those details. Sometimes though, the user needs a path to an actual file on the file system. For example, some SSL APIs require a certificate file to be specified by a real file system path, and C’s dlopen() function also requires a real file system path.

To support this need, importlib_resources provides an API to extract the resource from a zip file to a temporary file or folder and return the file system path to this materialized resource as a pathlib.Path object. In order to properly clean up this temporary file, what’s actually returned is a context manager for use in a with-statement:

import email.tests.data
from importlib_resources import files, as_file

source = files(email.tests.data).joinpath('message.eml')
with as_file(source) as eml:
    third_party_api_requiring_file_system_path(eml)

Use all the standard contextlib APIs to manage this context manager.
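For instance, contextlib.ExitStack can hold the context open when a with-statement is inconvenient. The sketch below is illustrative only: the package certs_demo, the archive certs_bundle.zip, and the resource ca.pem are hypothetical, and the import falls back to the stdlib importlib.resources (Python 3.9+) if the backport is missing. The resource lives in a zip file so that the temporary extraction and cleanup are visible.

```python
import importlib
import sys
import tempfile
import zipfile
from contextlib import ExitStack
from pathlib import Path

try:
    from importlib_resources import as_file, files
except ImportError:  # assumption: stdlib equivalent, Python 3.9+
    from importlib.resources import as_file, files

# Hypothetical package stored inside a zip archive.
archive = Path(tempfile.mkdtemp()) / "certs_bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("certs_demo/__init__.py", "")
    zf.writestr("certs_demo/ca.pem", "dummy certificate\n")
sys.path.insert(0, str(archive))
importlib.invalidate_caches()

# ExitStack keeps the extracted file alive until close() is called.
stack = ExitStack()
source = files("certs_demo").joinpath("ca.pem")
cert_path = stack.enter_context(as_file(source))

# While the stack is open, cert_path is a real file system path.
on_disk_while_open = cert_path.is_file()
content = cert_path.read_text(encoding="utf-8")

# Closing the stack removes the temporary copy.
stack.close()
on_disk_after_close = cert_path.exists()
```

A long-running program might instead register stack.close with atexit so the path stays valid for the life of the process.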

Migrating from Legacy

Starting with Python 3.9 and importlib_resources 1.4, this package introduced the files() API, to be preferred over the legacy API, i.e. the functions open_binary, open_text, path, contents, read_text, read_binary, and is_resource.

To port to the files() API, refer to the _legacy module to see simple wrappers that enable drop-in replacement based on the preferred API, and either copy those or adapt the usage to utilize the files and Traversable interfaces directly.
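As a rough guide to what such wrappers can look like, here is a sketch of four of the legacy functions expressed in terms of files(). These are simplified illustrations written for this page, not the actual contents of the _legacy module. The stdlib json package is used only as a convenient anchor, and the stdlib import fallback (Python 3.9+) is an assumption.

```python
try:
    from importlib_resources import files
except ImportError:  # assumption: stdlib equivalent, Python 3.9+
    from importlib.resources import files


def read_text(package, resource, encoding="utf-8", errors="strict"):
    """Return the decoded contents of a resource as str."""
    target = files(package).joinpath(resource)
    with target.open("r", encoding=encoding, errors=errors) as fp:
        return fp.read()


def read_binary(package, resource):
    """Return the raw contents of a resource as bytes."""
    return files(package).joinpath(resource).read_bytes()


def is_resource(package, name):
    """Return True if name is a file directly inside the package."""
    return files(package).joinpath(name).is_file()


def contents(package):
    """Return the names of all entries directly inside the package."""
    return [item.name for item in files(package).iterdir()]


# Usage with a package that is importable everywhere.
init_text = read_text("json", "__init__.py")
init_bytes = read_binary("json", "__init__.py")
init_is_resource = is_resource("json", "__init__.py")
names = contents("json")
```

Once call sites use wrappers like these, they can be inlined gradually so that code works with the Traversable objects directly.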

Extending

Starting with Python 3.9 and importlib_resources 2.0, this package provides an interface for non-standard loaders, such as those used by executable bundlers, to supply resources. These loaders should supply a get_resource_reader method, which is passed a module name and should return a TraversableResources instance.
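The shape of such a loader can be sketched as below. Everything here is hypothetical: the classes DirectoryResources and BundleLoader, the module name bundled_app, and the choice to return None for unknown modules are assumptions for illustration. The sketch only shows the resource hook. A real loader would also implement module loading and be registered with the import system, which is omitted. The import tries the backport first and then the standard library locations of TraversableResources.

```python
import tempfile
from pathlib import Path

try:
    from importlib_resources.abc import TraversableResources
except ImportError:
    try:
        # Standard library location on Python 3.11 and later.
        from importlib.resources.abc import TraversableResources
    except ImportError:
        # Standard library location on Python 3.9 and 3.10.
        from importlib.abc import TraversableResources


class DirectoryResources(TraversableResources):
    """Hypothetical reader that serves resources from one directory."""

    def __init__(self, root):
        self._root = Path(root)

    def files(self):
        # pathlib.Path provides the methods required of a Traversable.
        return self._root


class BundleLoader:
    """Hypothetical loader; only the resource hook is sketched."""

    def __init__(self, resource_dirs):
        # Maps a module name to the directory holding its resources.
        self._resource_dirs = resource_dirs

    def get_resource_reader(self, fullname):
        root = self._resource_dirs.get(fullname)
        # Assumption: unknown modules get no reader.
        return DirectoryResources(root) if root is not None else None


# Exercise the hook directly, without wiring it into the import system.
bundle_dir = Path(tempfile.mkdtemp())
(bundle_dir / "greeting.txt").write_text("hello", encoding="utf-8")

loader = BundleLoader({"bundled_app": bundle_dir})
reader = loader.get_resource_reader("bundled_app")
greeting = reader.files().joinpath("greeting.txt").read_text(encoding="utf-8")
```

Subclassing TraversableResources means only files() has to be written. The base class derives the older reader methods, such as is_resource, from it.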

Footnotes