Usage

PlasmaStMan maps Apache Arrow Tensors and Tables (i.e., their Object IDs in the Plasma store) to individual columns within a casacore Table.

Arrow Tensors map one-to-one to casacore Columns. Each mapping entry is a pair of strings: the Object ID of the Tensor in the Plasma store and the name of the casacore Table column it provides data to. Checks ensure that a Tensor’s shape and type match those of the corresponding column of the casacore Table. This mapping supports all casacore data types except Strings.
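For readers who want to validate a Tensor mapping before creating a table, the following is a hypothetical sketch of the kind of checks described above (one Tensor per column, matching type and shape, no String columns). The ArrayInfo descriptor, the plain-string type names and the convention that the caller supplies the expected shape are all assumptions of this sketch; it is not PlasmaStMan's own validation code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ArrayInfo:
    """Hypothetical descriptor: element type name and full shape."""
    dtype: str
    shape: Tuple[int, ...]


def check_tensor_mapping(
    mapping: Dict[str, str],         # Object ID -> column name
    tensors: Dict[str, ArrayInfo],   # Object ID -> Tensor metadata
    columns: Dict[str, ArrayInfo],   # column name -> expected metadata
) -> List[str]:
    """Return a list of problems; an empty list means the mapping passes."""
    problems: List[str] = []
    seen: Dict[str, str] = {}
    for object_id, column in mapping.items():
        # One-to-one: a column may be fed by a single Tensor only.
        if column in seen:
            problems.append(
                f"column {column!r} fed by both {seen[column]!r} and {object_id!r}"
            )
        seen[column] = object_id
        expected = columns[column]
        if expected.dtype == "String":
            problems.append(f"column {column!r}: String columns cannot be backed by Tensors")
            continue
        actual = tensors[object_id]
        if actual.dtype != expected.dtype:
            problems.append(f"column {column!r}: type {actual.dtype} != {expected.dtype}")
        if actual.shape != expected.shape:
            problems.append(f"column {column!r}: shape {actual.shape} != {expected.shape}")
    return problems
```

Returning a list of messages instead of raising on the first error lets a script report every inconsistency in one pass.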

Arrow Tables, on the other hand, contain one or more Fields, each of which maps to a casacore Column. Here each mapping entry is a pair of strings: the Object ID of the Table in the Plasma store and the name of the Field to use, which should match the name of the casacore Table column it provides data to. As with Tensors, a Field’s shape (length) and type are checked against those of the corresponding column of the casacore Table. Columns in an Arrow Table have a single dimension, so they are currently supported only as scalar columns. Additionally, Arrow Tables do not support complex values natively, so Complex and DComplex values are supported as Arrow Struct objects with r and i fields.
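The r/i layout for complex columns can be illustrated with the standard library alone. The sketch below splits Python complex numbers into records with separate real and imaginary parts and joins them back; the function names are hypothetical. With pyarrow, such a list of dicts can be turned into a struct array with `pyarrow.array(rows, type=pyarrow.struct([("r", pyarrow.float32()), ("i", pyarrow.float32())]))`; using float32 fields for Complex and float64 fields for DComplex, mirroring casacore's precisions, is an assumption of this sketch.

```python
from typing import Dict, Iterable, List


def complex_to_struct_rows(values: Iterable[complex]) -> List[Dict[str, float]]:
    """Split complex values into {"r": real, "i": imag} records, the
    struct layout used for Complex/DComplex columns in Arrow Tables."""
    return [{"r": v.real, "i": v.imag} for v in values]


def struct_rows_to_complex(rows: Iterable[Dict[str, float]]) -> List[complex]:
    """Inverse operation: rebuild complex numbers from r/i records."""
    return [complex(row["r"], row["i"]) for row in rows]


if __name__ == "__main__":
    data = [1 + 2j, -0.5 + 0j, 3j]
    rows = complex_to_struct_rows(data)
    print(rows)
    # The round trip is lossless for Python floats.
    assert struct_rows_to_complex(rows) == data
```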

Configuration

PlasmaStMan always needs to connect to a Plasma store, which it does through a Unix socket in the filesystem. The socket location defaults to /tmp/plasma, but it can be overridden by setting the PLASMA_SOCKET environment variable.
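The socket lookup rule can be mirrored in Python, for instance to start a Plasma store on the same socket that PlasmaStMan will use. This sketch follows the documented behaviour (PLASMA_SOCKET if set, otherwise /tmp/plasma); treating an empty PLASMA_SOCKET as unset is an assumption of the sketch, not documented PlasmaStMan behaviour.

```python
import os
from typing import Mapping, Optional

DEFAULT_PLASMA_SOCKET = "/tmp/plasma"  # documented default location


def plasma_socket_path(env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the Plasma socket: PLASMA_SOCKET if set, else /tmp/plasma."""
    env = os.environ if env is None else env
    # Assumption: an empty PLASMA_SOCKET is treated as if it were unset.
    return env.get("PLASMA_SOCKET") or DEFAULT_PLASMA_SOCKET


if __name__ == "__main__":
    print(plasma_socket_path())
```

The resolved path can then be given to the Plasma store (e.g. `plasma_store -m 1000000000 -s <path>`) so that the store and PlasmaStMan agree on the socket.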

Whether reading or writing, certain aspects of PlasmaStMan can be configured at runtime via storage manager properties (arbitrary key-value pairs). PlasmaStMan supports the following properties:

  • PLASMACONNECTRETRIES: the number of times the Plasma client should try to connect to the Plasma store before giving up. Defaults to 50.

  • PLASMAGETTIMEOUT: the timeout in milliseconds to use when getting an object from the Plasma store that is not immediately available. Defaults to 10000.
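A small sketch of building a property set from the documented defaults follows. In python-casacore, runtime data manager properties are read and changed with `table.getdmprop()` and `table.setdmprop()`; passing a dictionary like this one to `setdmprop()` for a PlasmaStMan-backed column is the assumed usage, not something this page documents. Rejecting unknown keys and negative values is likewise a choice of the sketch.

```python
from typing import Dict, Mapping, Optional

# Documented defaults of the runtime properties understood by PlasmaStMan.
PLASMASTMAN_DEFAULTS: Dict[str, int] = {
    "PLASMACONNECTRETRIES": 50,  # connection attempts before giving up
    "PLASMAGETTIMEOUT": 10000,   # ms to wait for an object not yet available
}


def plasmastman_properties(overrides: Optional[Mapping[str, int]] = None) -> Dict[str, int]:
    """Return the documented defaults updated with user overrides.

    The validation below is a choice of this sketch, not documented
    PlasmaStMan behaviour.
    """
    props = dict(PLASMASTMAN_DEFAULTS)
    for key, value in (overrides or {}).items():
        if key not in PLASMASTMAN_DEFAULTS:
            raise KeyError(f"unknown PlasmaStMan property: {key}")
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer, got {value!r}")
        props[key] = value
    return props


if __name__ == "__main__":
    # For example, fail faster in tests: fewer retries and a 2 s get timeout.
    print(plasmastman_properties({"PLASMACONNECTRETRIES": 5, "PLASMAGETTIMEOUT": 2000}))
```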

Reading

When reading data from a Table backed by a PlasmaStMan storage manager, users need to ensure that the libplasmastman shared library is visible in the dynamic linker’s path (e.g., on Linux, by adding the directory containing the library to the LD_LIBRARY_PATH environment variable).

Other than this, existing casacore-based applications do not require any modification or recompilation.
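When a casacore tool is launched from a Python program, the library directory must be on the LD_LIBRARY_PATH of the child process. On glibc-based Linux systems the dynamic linker reads LD_LIBRARY_PATH when a process starts, so modifying os.environ inside an already running interpreter is generally too late for that interpreter itself; passing an adjusted environment to the launched program is more reliable. The sketch below does this for casacore's showtableinfo; both helper names are hypothetical and the library directory is a placeholder for wherever libplasmastman was built.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_with_library_dir(lib_dir: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy an environment and prepend lib_dir to LD_LIBRARY_PATH."""
    env = dict(os.environ if base is None else base)
    current = env.get("LD_LIBRARY_PATH")
    # Avoid a trailing separator when the variable was unset or empty.
    env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + current if current else lib_dir
    return env


def show_table_info(table_name: str, lib_dir: str) -> str:
    """Run casacore's showtableinfo on table_name with lib_dir on the search path."""
    result = subprocess.run(
        ["showtableinfo", f"in={table_name}"],
        env=env_with_library_dir(lib_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```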

Writing

Note

At the moment PlasmaStMan does not support writing data to Plasma.

Writing is a trickier business.

Even though the data itself cannot be written through PlasmaStMan, it is currently possible to create a casacore table that points to data already stored in Plasma. To do so, one must tell the storage manager how Object IDs map to columns. This can be done in two ways:

  • If writing a program in C++, one can use the PlasmaStMan class to create the storage manager object and bind it to tables. The main constructor of this class accepts two std::map objects to provide the mapping from Object ID to column name for Tensors and Tables.

  • Storage managers allow specifications to be given at creation time. This includes the properties specified above, along with the following additional keys:

    • PLASMASOCKET: the Unix socket used to connect to Plasma; it overrides the PLASMA_SOCKET environment variable.

    • TENSOROBJECTIDS: a casacore Record object (i.e., a mapping) where keys are Tensor Object IDs and values are column names.

    • TABLEOBJECTIDS: a casacore Record object (i.e., a mapping) where keys are Table Object IDs and values are column names.

    Because this is a generic mechanism, these specifications can be given through different interfaces. For example, the TaQL language supports creating tables with a given Data Manager specification (see section 8.2, Data manager specification), and the python-casacore bindings also allow creating tables with specific Data Manager information (see the dminfo argument).
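As a sketch of the python-casacore route, the function below assembles a dminfo dictionary in the shape python-casacore uses for data manager information (entries keyed "*1", "*2", … with TYPE, NAME, SPEC and COLUMNS), putting TENSOROBJECTIDS, TABLEOBJECTIDS, PLASMASOCKET and any properties into SPEC. The function name is hypothetical, the TYPE string "PlasmaStMan" is assumed to be the registered data manager name, and the Object IDs in the usage block are placeholders.

```python
from typing import Dict, List, Mapping, Optional


def plasmastman_dminfo(
    tensor_object_ids: Mapping[str, str],
    table_object_ids: Mapping[str, str],
    socket: Optional[str] = None,
    properties: Optional[Mapping[str, int]] = None,
    name: str = "PlasmaStMan",
) -> Dict[str, dict]:
    """Build a python-casacore style dminfo dict describing one PlasmaStMan.

    Both mappings go from Object ID to column name, mirroring the
    TENSOROBJECTIDS and TABLEOBJECTIDS specification keys.
    """
    spec: Dict[str, object] = {
        "TENSOROBJECTIDS": dict(tensor_object_ids),
        "TABLEOBJECTIDS": dict(table_object_ids),
    }
    if socket is not None:
        spec["PLASMASOCKET"] = socket  # takes precedence over PLASMA_SOCKET
    if properties:
        spec.update(properties)  # e.g. PLASMACONNECTRETRIES, PLASMAGETTIMEOUT
    columns: List[str] = sorted(
        set(tensor_object_ids.values()) | set(table_object_ids.values())
    )
    return {
        "*1": {
            "TYPE": "PlasmaStMan",  # assumed data manager type name
            "NAME": name,
            "SPEC": spec,
            "COLUMNS": columns,
        }
    }


if __name__ == "__main__":
    # Placeholder Object IDs; use the real IDs of objects already in Plasma.
    dminfo = plasmastman_dminfo(
        {"<tensor-object-id>": "DATA"},
        {"<table-object-id>": "TIME"},
        socket="/tmp/plasma",
    )
    print(dminfo)
```

With python-casacore installed, the result would then be passed as `casacore.tables.table(name, tabledesc, nrow=nrows, dminfo=dminfo)`, where `tabledesc` describes the DATA and TIME columns. Remember that each TABLEOBJECTIDS column name must also match a Field name in the corresponding Arrow Table.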

Example

Note

This example needs pyarrow installed.

The plasma-storage-manager repository includes a Python script that demonstrates how to create a casacore Table pointing to Plasma-stored Tensors and Tables. It can be used to test PlasmaStMan from external programs:

# Start a Plasma store, store Tensor and Table data with arbitrary values,
# and create a table pointing to this new data (using TaQL).
# Use -h for more information on how to use the script.
$> python scripts/plasma_writer.py -o <table_name> -t <tensor1> -t <tensor2> -T <table1> ... &

# Make the new storage manager visible to third-party apps
$> export LD_LIBRARY_PATH=your-build-directory/src/ska/plasma${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

# Read the table metadata with casacore's showtableinfo
$> showtableinfo in=<table_name>

# Read the table data back with casacore's taql
$> taql 'SELECT * FROM <table_name>'