API

Casacore classes

ska::plasma::PlasmaStMan and ska::plasma::PlasmaStManColumn are the two main classes implementing the Storage Manager API as mandated by casacore.

class PlasmaStMan : public DataManager

The Plasma-based storage manager

This class uses the pimpl idiom to hide the particulars of the implementation, and its dependencies, from users.

Public Functions

PlasmaStMan(std::string plasma_socket = "", const std::map<std::string, ObjectID> &tensor_object_ids = {}, const std::map<std::string, ObjectID> &table_object_ids = {})

Creates a new instance of the Plasma Storage Manager connected to the given socket, and mapping columns to Arrow Tensors and Tables as indicated in the given mappings.

Parameters
  • plasma_socket – The UNIX socket where the Plasma store listens for connections. If not given, or empty, it defaults to /tmp/plasma, unless the PLASMA_SOCKET environment variable is set, in which case its value takes precedence.

  • tensor_object_ids – A mapping from column names to Object IDs in the Plasma store where Arrow Tensors with the data for the respective column can be found.

  • table_object_ids – A mapping from column names to Object IDs in the Plasma store where Arrow Tables with the data for the respective column can be found (the name of the column being mapped must be the same as the column name in the Arrow Table).
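The socket defaulting rules above fit in a few lines. This is a minimal sketch, not the actual implementation: the helper name resolve_plasma_socket is hypothetical, and it assumes that an explicitly given socket is used without consulting PLASMA_SOCKET, and that an empty PLASMA_SOCKET counts as unset.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper mirroring the documented defaults (not the real code).
std::string resolve_plasma_socket(const std::string &plasma_socket)
{
    // 1. An explicit, non-empty socket is used as-is (assumption).
    if (!plasma_socket.empty()) {
        return plasma_socket;
    }
    // 2. Otherwise PLASMA_SOCKET takes precedence over the default, if set.
    //    An empty value is treated as unset here (another assumption).
    const char *env = std::getenv("PLASMA_SOCKET");
    if (env != nullptr && *env != '\0') {
        return env;
    }
    // 3. Fall back to the documented default.
    return "/tmp/plasma";
}
```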

~PlasmaStMan()

The destructor is declared explicitly because of the pimpl idiom; its implementation is otherwise defaulted.
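The explicit destructor matters when the implementation is held through a std::unique_ptr (an assumption here): the unique_ptr deleter needs the complete implementation type wherever the object is destroyed. The minimal pimpl sketch below uses a hypothetical StorageManager class and a hypothetical plasma_get_timeout getter, which PlasmaStMan does not document. It also shows how a public call such as set_plasma_get_timeout can simply forward to the implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical class following the pimpl pattern described above.
class StorageManager {
public:
    StorageManager();
    // Declared here, defined below once impl is complete: std::unique_ptr's
    // deleter needs the full impl type at the point of destruction.
    ~StorageManager();

    void set_plasma_get_timeout(std::int64_t timeout);
    std::int64_t plasma_get_timeout() const;  // hypothetical getter

private:
    class impl;
    std::unique_ptr<impl> pimpl_;
};

// In a real project everything below would live in the .cpp file.
class StorageManager::impl {
public:
    std::int64_t get_timeout = 0;  // milliseconds
};

StorageManager::StorageManager() : pimpl_(std::make_unique<impl>()) {}
StorageManager::~StorageManager() = default;

// Public calls forward to the hidden implementation.
void StorageManager::set_plasma_get_timeout(std::int64_t timeout)
{
    pimpl_->get_timeout = timeout;
}

std::int64_t StorageManager::plasma_get_timeout() const
{
    return pimpl_->get_timeout;
}
```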

void ping_plasma()

void set_plasma_get_timeout(std::int64_t timeout)

void set_plasma_connect_retries(int connect_retries)

Public Static Functions

static casacore::DataManager *makeObject(const casacore::String &aDataManType, const casacore::Record &spec)

Factory function invoked by casacore to create an instance of PlasmaStMan from a given DataManager specification.

Parameters
  • aDataManType – The type name of the data manager.

  • spec – The specification of the data manager.

Returns

A new PlasmaStMan object.
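Conceptually, makeObject is a factory with a fixed signature that casacore can call by data manager type name; casacore keeps its own registry of such functions (see DataManager::registerCtor). The self-contained sketch below reproduces that pattern with hypothetical stand-ins: Manager replaces casacore::DataManager, a string map replaces casacore::Record, and the plasma_socket spec key is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for casacore::DataManager and casacore::Record.
struct Manager {
    virtual ~Manager() = default;
    virtual std::string type() const = 0;
};
using Spec = std::map<std::string, std::string>;
using Factory = std::function<std::unique_ptr<Manager>(const std::string &, const Spec &)>;

struct PlasmaLikeManager : Manager {
    explicit PlasmaLikeManager(std::string path) : socket_path(std::move(path)) {}
    std::string type() const override { return "PlasmaStMan"; }
    std::string socket_path;
};

// Same shape as makeObject: (type name, spec) -> new manager.
std::unique_ptr<Manager> make_object(const std::string & /*type*/, const Spec &spec)
{
    auto it = spec.find("plasma_socket");  // assumed spec key
    return std::make_unique<PlasmaLikeManager>(it != spec.end() ? it->second : "");
}

// Registry of factories keyed by data manager type name.
std::map<std::string, Factory> &registry()
{
    static std::map<std::string, Factory> factories;
    return factories;
}

// Look up the factory for a type name and invoke it with the spec.
std::unique_ptr<Manager> create(const std::string &type, const Spec &spec)
{
    auto it = registry().find(type);
    if (it == registry().end()) {
        throw std::runtime_error("unknown data manager type: " + type);
    }
    return it->second(type, spec);
}
```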

class impl

The Plasma-based storage manager implementation

This class fully implements the Plasma-based storage manager; PlasmaStMan merely exposes this implementation while hiding its dependencies.

Public Functions

impl(std::string plasma_socket = "", std::map<std::string, ObjectID> tensor_object_ids = {}, std::map<std::string, ObjectID> table_object_ids = {})

~impl()

The destructor is declared explicitly because one of the members uses the incomplete PlasmaStManColumn type; its implementation is otherwise defaulted.

void ping_plasma()

void set_plasma_get_timeout(std::int64_t timeout)

void set_plasma_connect_retries(int connect_retries)

DataManager *clone() const

See also

PlasmaStMan::clone

String dataManagerType() const

See also

PlasmaStMan::dataManagerType

String dataManagerName() const

See also

PlasmaStMan::dataManagerName

void create64(rownr_t aNrRows)

See also

PlasmaStMan::create64

rownr_t open64(rownr_t aRowNr, AipsIO &ios)

See also

PlasmaStMan::open64

rownr_t resync64(rownr_t aRowNr)

See also

PlasmaStMan::resync64

Bool flush(AipsIO&, Bool doFsync)

See also

PlasmaStMan::flush

DataManagerColumn *makeScalarColumn(const String &aName, int aDataType, const String &aDataTypeID)

See also

PlasmaStMan::makeScalarColumn

DataManagerColumn *makeDirArrColumn(const String &aName, int aDataType, const String &aDataTypeID)

See also

PlasmaStMan::makeDirArrColumn

DataManagerColumn *makeIndArrColumn(const String &aName, int aDataType, const String &aDataTypeID)

See also

PlasmaStMan::makeIndArrColumn

void deleteManager()

See also

PlasmaStMan::deleteManager

void addRow64(rownr_t aNrRows)

See also

PlasmaStMan::addRow64

Record dataManagerSpec() const

See also

PlasmaStMan::dataManagerSpec

Record getProperties() const

See also

PlasmaStMan::getProperties

void setProperties(const Record &props)

See also

PlasmaStMan::setProperties

inline rownr_t nrows() const

Return the number of rows used by all columns managed by this storage manager.

Returns

The number of rows used by all columns managed by this storage manager

Public Static Functions

static DataManager *makeObject(const String &aDataManType, const Record &spec)

class PlasmaStManColumn : public StManColumnBase

A single column of the Plasma Storage Manager

A PlasmaStManColumn manages a single column of a casacore Table, backed by an Arrow object stored in Plasma. The underlying Arrow object is handled through an ArrowReader instance, which hides the differences between the various types of Arrow objects that can hold data. At the moment the only supported reader is TensorReader (and this class still silently assumes so), but more will come. When the Tensor is retrieved from Plasma, this class creates the corresponding TensorReader instance, which ensures the data types are compatible. In addition, upon data access (again, through the reader), the tensor’s shape is compared against the column’s cell shape to ensure that the tensor and the column define the same dimensionality.

While casacore is column-major, Arrow is row-major by default. Moreover, the dimensions this column receives via setShapeColumn are those of individual cells, whereas Arrow Tensors contain the data of the full column. Thus:

  • The first dimension of the Tensor should always be the number of rows of the column.

  • The remaining dimensions should match the column cell’s shape in reverse order.

In principle, support for non-row-major Tensors could be added, but this is left as a future improvement.
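The shape rules above amount to prepending the row count to the reversed cell shape. For example, a column with 10 rows and a cell shape of [4, 2] expects a Tensor of shape [10, 2, 4]. The helpers below are hypothetical, and std::vector stands in for casacore’s shape type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Shape = std::vector<std::size_t>;

// Expected shape of a row-major Arrow Tensor holding a whole column:
// the number of rows first, then the casacore cell shape reversed
// (casacore is column-major, Arrow Tensors are row-major by default).
Shape expected_tensor_shape(const Shape &cell_shape, std::size_t nrows)
{
    Shape tensor_shape{nrows};
    tensor_shape.insert(tensor_shape.end(), cell_shape.rbegin(), cell_shape.rend());
    return tensor_shape;
}

// True if a tensor shape is compatible with the column's cell shape and row count.
bool conforms(const Shape &tensor_shape, const Shape &cell_shape, std::size_t nrows)
{
    return tensor_shape == expected_tensor_shape(cell_shape, nrows);
}
```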

Public Functions

PlasmaStManColumn(const std::string &name, PlasmaClient &client, PlasmaStMan::impl &storage_manager, const ArrowObjectInfo &object_info, int dataType)

Create a new PlasmaStManColumn with the given name and data type. Upon construction it connects to Plasma and retrieves the underlying Arrow object, if already known at this stage; otherwise initialize_reader must be called before attempting to read anything.

Parameters
  • name – The name of this column.

  • client – The Plasma client object used to read Arrow objects off Plasma.

  • storage_manager – A reference to the owning storage manager, used to retrieve the number of rows after table creation.

  • object_info – Structure containing the Object ID and type of Arrow object to read from Plasma. If the type is ArrowObjectType::UNKNOWN then no reading occurs.

  • dataType – The data type of this column.

void initialize_reader(const ArrowObjectInfo &object_info)

Initializes the underlying reader object with the provided information.

Parameters

object_info – Structure containing the Object ID and type of Arrow object to read from Plasma. If the type is ArrowObjectType::UNKNOWN then no initialization occurs.

bool reader_initialized() const

Returns

Whether the underlying reader is initialized or not.
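The two-phase initialisation described for PlasmaStManColumn (eager when the Arrow object is known, deferred otherwise) can be sketched as follows. All types here are simplified, hypothetical stand-ins; in particular the TENSOR and TABLE enumerators and the exception thrown on premature reads are assumptions, not documented behaviour.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Simplified stand-ins; only UNKNOWN is taken from the documentation.
enum class ArrowObjectType { UNKNOWN, TENSOR, TABLE };
struct ArrowObjectInfo {
    std::string object_id;
    ArrowObjectType type = ArrowObjectType::UNKNOWN;
};

struct Reader {
    explicit Reader(const ArrowObjectInfo &info) : info(info) {}
    ArrowObjectInfo info;
};

// Column that builds its reader at construction when the object is known,
// or later through initialize_reader().
class Column {
public:
    explicit Column(const ArrowObjectInfo &object_info) { initialize_reader(object_info); }

    void initialize_reader(const ArrowObjectInfo &object_info)
    {
        if (object_info.type == ArrowObjectType::UNKNOWN) {
            return;  // nothing known yet: leave the reader uninitialized
        }
        reader_ = std::make_unique<Reader>(object_info);
    }

    bool reader_initialized() const { return reader_ != nullptr; }

    // Reading before initialization is rejected (assumed behaviour).
    const Reader &reader() const
    {
        if (!reader_) {
            throw std::logic_error("initialize_reader must be called before reading");
        }
        return *reader_;
    }

private:
    std::unique_ptr<Reader> reader_;
};
```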

Plasma access

class PlasmaClient

A class encapsulating access to a Plasma Store.

This class encapsulates access to a Plasma Store. Although it is a very thin wrapper around ::plasma::PlasmaClient, it makes certain aspects configurable, such as timeouts, the socket to connect to and connection retries, among others.

Public Functions

PlasmaClient(std::string socket)

Create a new PlasmaClient that will connect to the given socket.

Parameters

socket – The Plasma socket to connect to.

void ping()

Ensure communication between the client and the server works.

inline void set_get_timeout(std::int64_t timeout)

Set the timeout for the Plasma Get operation, in milliseconds.

Parameters

timeout – The timeout for the Plasma Get operation, in milliseconds.

inline std::int64_t get_timeout() const

Returns

The timeout for the Plasma Get operation, in milliseconds.

inline void set_connect_retries(int connect_retries)

Set the number of attempts to connect to the Plasma socket before failing.

Parameters

connect_retries – The number of attempts to connect to the Plasma socket before failing.

inline int connect_retries() const

Returns

The number of attempts to connect to the Plasma socket before failing.
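A retry policy like the one configured through set_connect_retries can be sketched as a bounded loop over connection attempts. The helper below is hypothetical: it receives the connection attempt as a callable, and a real client would likely also wait between attempts.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical helper: try to connect up to connect_retries times and fail
// only after every attempt has failed. Returns the number of attempts used.
int connect_with_retries(int connect_retries, const std::function<bool()> &try_connect)
{
    for (int attempt = 1; attempt <= connect_retries; ++attempt) {
        if (try_connect()) {
            return attempt;
        }
    }
    throw std::runtime_error("could not connect to the Plasma socket");
}
```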

::plasma::ObjectBuffer get(const ObjectID &object_id)

Read an object from the Plasma store. A plasma_error exception is thrown if no such object is found within the timeout.

Parameters

object_id – The ID of the object to read.

Returns

A Plasma Object Buffer pointing to the object in the Plasma Store.
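The documented Get semantics (return the object, or throw plasma_error once the timeout expires) are sketched below against a hypothetical in-memory store. The polling loop only stands in for the waiting that Plasma itself performs, and plasma_error is redefined locally as an assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <thread>

// Local stand-in for the plasma_error exception mentioned above.
struct plasma_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical in-memory store used in place of a real Plasma Store.
using Store = std::map<std::string, std::string>;

// Poll the store until the object appears or timeout_ms elapses.
std::string get_object(const Store &store, const std::string &object_id, std::int64_t timeout_ms)
{
    const auto deadline = std::chrono::steady_clock::now() + std::chrono::milliseconds(timeout_ms);
    while (true) {
        auto it = store.find(object_id);
        if (it != store.end()) {
            return it->second;
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            throw plasma_error("object " + object_id + " not found within timeout");
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
```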

inline std::string socket() const

Returns

The socket this Plasma client connects to.

Data reading

Internally, data reading is organised as a hierarchy of Reader classes, each responsible for reading a different type of Arrow object.

class ArrowReader

Base class for Arrow data readers used by the PlasmaStManColumn class.

Arrow offers different storage types, like Tensors and Tables. This base class offers a common interface for accessing data from these different storage types.

Subclassed by ska::plasma::TableReader, ska::plasma::TensorReader

Public Functions

inline ArrowReader(const std::string &column_name, casacore::DataType data_type)

Constructs a reader for the given column and data type.

Parameters
  • column_name – The casacore column backed by this reader.

  • data_type – The casacore data type of the column backed by this reader.

virtual ~ArrowReader() = default

Virtual destructor, required because this is a polymorphic base class.

inline void check_conformance(const Shape &column_shape)

Checks that the data type and the shape of the underlying Arrow object match those of the casacore column backed by this reader. The column data type is known at construction time, while the column shape is given here.

Parameters

column_shape – The shape of the casacore column backed by this reader.

virtual void read_scalar(rownr_t rownr, void *dataPtr) = 0

Read a single scalar value from the underlying Arrow object. The scalar value is that corresponding to the cell in row rownr.

Parameters
  • rownr – The (casacore) row number of the cell for which the scalar is being read.

  • dataPtr – The address where the scalar should be written to.
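As an illustration of the read_scalar contract, the sketch below copies one element of a flat buffer into caller-provided memory. ScalarBufferReader is hypothetical and assumes a scalar column stored as one element per row; the real readers work on Arrow objects rather than a std::vector, and use rownr_t rather than std::size_t.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical reader for a scalar column backed by a 1-D buffer of T,
// one element per row, following the read_scalar(rownr, dataPtr) contract.
template <typename T>
class ScalarBufferReader {
public:
    explicit ScalarBufferReader(std::vector<T> data) : data_(std::move(data)) {}

    // Copy the value of row rownr into the memory pointed to by dataPtr.
    void read_scalar(std::size_t rownr, void *dataPtr) const
    {
        assert(rownr < data_.size());
        std::memcpy(dataPtr, &data_[rownr], sizeof(T));
    }

private:
    std::vector<T> data_;
};
```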

virtual void read_array(ArrayBase &array, std::size_t offset) = 0

Read an array from the underlying Arrow object, starting at the given offset. The array’s shape determines how much data is actually read. Depending on the case, the array may or may not be created with zero-copy.

Parameters
  • array – The array where the data should be read into.

  • offset – The offset in the underlying Arrow object at which reading will start.
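Similarly, read_array can be pictured as copying a contiguous run of elements into the destination array. In this hypothetical sketch the offset is assumed to count elements, ArrayLike stands in for casacore’s ArrayBase, and the number of elements read comes from the destination’s size, mirroring the role of the array’s shape. A zero-copy reader would instead point the array at the buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a casacore array: only its element count
// (derived from its shape in casacore) and its storage matter here.
struct ArrayLike {
    std::vector<float> storage;
};

// Fill `array` with as many elements as it holds, starting at `offset`
// (an element offset, by assumption) in a flat, row-major buffer.
void read_array(const std::vector<float> &buffer, ArrayLike &array, std::size_t offset)
{
    const std::size_t count = array.storage.size();
    assert(offset + count <= buffer.size());
    const auto first = buffer.begin() + static_cast<std::ptrdiff_t>(offset);
    std::copy(first, first + static_cast<std::ptrdiff_t>(count), array.storage.begin());
}
```

For a column whose cells hold 3 elements, the cell of row 1 starts at element offset 1 × 3 = 3.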

class TensorReader : public ska::plasma::ArrowReader

An ArrowReader that reads data off an Arrow Tensor.

TODO: The current implementation contains two private templated methods to handle all data types. As a result, the casacore data type has to be checked at runtime on every access to select the correct template instance. This could be avoided by offering a TensorReaderBase class that handles all common aspects, a TensorReader class templated on the casacore data type, and a factory function, called once from PlasmaStManColumn, that creates the correct reader for the given casacore data type.
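The refactoring proposed in the TODO could look roughly like the sketch below: a non-templated base, a reader templated on the element type, and a factory that performs the data type dispatch once. All names are hypothetical (the DataType enumerators are merely named after casacore’s), and a raw byte buffer replaces the Arrow input stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical subset of casacore data types.
enum class DataType { TpInt, TpDouble };

// Common, non-templated part shared by all element types.
class TensorReaderBase {
public:
    virtual ~TensorReaderBase() = default;
    virtual void read_scalar(std::size_t rownr, void *dataPtr) const = 0;
};

// Element-type-specific part.
template <typename T>
class TypedTensorReader : public TensorReaderBase {
public:
    explicit TypedTensorReader(std::vector<T> values) : values_(std::move(values)) {}

    void read_scalar(std::size_t rownr, void *dataPtr) const override
    {
        // T is fixed at compile time, so no runtime type switch happens here.
        *static_cast<T *>(dataPtr) = values_.at(rownr);
    }

private:
    std::vector<T> values_;
};

// Reinterpret raw bytes as a vector of T.
template <typename T>
std::vector<T> decode(const std::vector<unsigned char> &bytes)
{
    std::vector<T> values(bytes.size() / sizeof(T));
    if (!values.empty()) {
        std::memcpy(values.data(), bytes.data(), values.size() * sizeof(T));
    }
    return values;
}

// Factory: the only place where the data type is inspected at runtime.
std::unique_ptr<TensorReaderBase> make_tensor_reader(DataType type, const std::vector<unsigned char> &bytes)
{
    switch (type) {
    case DataType::TpInt:
        return std::make_unique<TypedTensorReader<int>>(decode<int>(bytes));
    case DataType::TpDouble:
        return std::make_unique<TypedTensorReader<double>>(decode<double>(bytes));
    }
    throw std::invalid_argument("unsupported casacore data type");
}
```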

Public Functions

TensorReader(const std::string &column_name, casacore::DataType data_type, arrow::io::InputStream *input_stream)

Constructs a TensorReader for the given casacore data type and column from an input stream.

Parameters
  • column_name – The casacore column backed by this reader.

  • data_type – The casacore data type of the column backed by this reader.

  • input_stream – The input stream from where the Tensor will be read. This is possibly created from an object read from Plasma.

virtual void read_scalar(rownr_t rownr, void *dataPtr) override

Read a single scalar value from the underlying Arrow object. The scalar value is that corresponding to the cell in row rownr.

Parameters
  • rownr – The (casacore) row number of the cell for which the scalar is being read.

  • dataPtr – The address where the scalar should be written to.

virtual void read_array(ArrayBase &array, std::size_t offset) override

Read an array from the underlying Arrow object, starting at the given offset. The array’s shape determines how much data is actually read. Depending on the case, the array may or may not be created with zero-copy.

Parameters
  • array – The array where the data should be read into.

  • offset – The offset in the underlying Arrow object at which reading will start.

class TableReader : public ska::plasma::ArrowReader

An ArrowReader that reads data off an Arrow Table.

Tables can contain multiple “fields” or “columns”. The column read by this reader is the one with the same name as the casacore Table column backed by this reader. If no such field/column is found in the Arrow Table, an error is raised. Only Tables written as a single RecordBatch are currently supported.
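The field lookup described above can be sketched as a search over the Table’s field names. find_field_index is a hypothetical helper; with the Arrow C++ API the equivalent lookup would typically go through the table’s schema (for example arrow::Schema::GetFieldIndex, which returns -1 when no field matches).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: find the Arrow Table field whose name matches the
// casacore column name, raising an error when there is none.
std::size_t find_field_index(const std::vector<std::string> &field_names, const std::string &column_name)
{
    for (std::size_t i = 0; i < field_names.size(); ++i) {
        if (field_names[i] == column_name) {
            return i;
        }
    }
    throw std::runtime_error("column " + column_name + " not found in Arrow Table");
}
```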

Public Functions

TableReader(const std::string &column_name, casacore::DataType data_type, arrow::io::InputStream *input_stream)

Constructs a TableReader for the given casacore data type and column from an input stream. The column name in casacore must be the same as the column in the Arrow Table that will be read.

Parameters
  • column_name – The casacore column backed by this reader. Should be the same as the column in the Arrow Table.

  • data_type – The casacore data type of the column backed by this reader.

  • input_stream – The input stream from where the Table will be read. This is possibly created from an object read from Plasma.

virtual void read_scalar(rownr_t rownr, void *dataPtr) override

Read a single scalar value from the underlying Arrow object. The scalar value is that corresponding to the cell in row rownr.

Parameters
  • rownr – The (casacore) row number of the cell for which the scalar is being read.

  • dataPtr – The address where the scalar should be written to.

virtual void read_array(ArrayBase &array, std::size_t offset) override

Read an array from the underlying Arrow object, starting at the given offset. The array’s shape determines how much data is actually read. Depending on the case, the array may or may not be created with zero-copy.

Parameters
  • array – The array where the data should be read into.

  • offset – The offset in the underlying Arrow object at which reading will start.

Misc

class ObjectID

Simple, immutable class containing an Object ID.

This is a simplified version of Plasma’s own Object ID class that does not carry its dependencies. It gives Object IDs a dedicated type (rather than a plain std::string) without spreading Plasma dependencies throughout the codebase.

Public Functions

ObjectID() = default

Constructs an empty ObjectID, which cannot be used for anything.

ObjectID(const std::string &object_id)

Constructs an Object ID for the given string, which must be a valid plasma Object ID.

Parameters

object_id – The contents of the Object ID.

ObjectID(const char *object_id)

Constructs an Object ID for the given null-terminated C string, which must be a valid plasma Object ID.

Parameters

object_id – The contents of the Object ID.

inline const std::string &string() const

Returns the underlying string.

Returns

The underlying string

inline bool valid() const

Returns whether this is a valid Object ID or not.

Returns

true if this Object ID is valid
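A minimal sketch of such a wrapper is shown below. It is not the actual class: it assumes that validity simply means a length of exactly 20 bytes (taken here as the size of a Plasma Object ID), and that the constructors accept any string and leave the check to valid().

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified Object ID wrapper.
class ObjectID {
public:
    // Assumed length of a Plasma Object ID, in bytes.
    static constexpr std::size_t kSize = 20;

    // Empty Object ID: not valid, and not usable to look anything up.
    ObjectID() = default;

    // Non-explicit, so string literals and std::strings convert implicitly.
    ObjectID(const std::string &object_id) : id_(object_id) {}
    ObjectID(const char *object_id) : id_(object_id) {}

    const std::string &string() const { return id_; }

    // Validity check under the 20-byte assumption.
    bool valid() const { return id_.size() == kSize; }

private:
    std::string id_;  // not modified by any member function
};
```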