
NEP 17 — Split Out Masked Arrays

Author: Stéfan van der Walt <stefanv@berkeley.edu>
Status: Rejected
Type: Standards Track
Created: 2018-03-22
Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-May/078026.html

Abstract

This NEP proposes removing MaskedArray functionality from NumPy, and publishing it as a stand-alone package.

Detailed description

MaskedArrays are a subclass of the NumPy ndarray that adds masking capabilities, i.e. the ability to ignore or hide certain array values during computation.

While it was historically convenient to distribute this class inside NumPy, improved packaging tools have made it possible to distribute it separately without difficulty.

Motivations for this move include:

  • Focus: the NumPy package should strive to only include the ndarray object, and the essential utilities needed to manipulate such arrays.
  • Complexity: the MaskedArray implementation is non-trivial, and imposes a significant maintenance burden.
  • Compatibility: MaskedArray objects, being subclasses [1] of ndarrays, often cause complications when used with other packages. Fixing these issues is outside the scope of NumPy development.

This NEP proposes a deprecation pathway through which MaskedArrays would still be accessible to users, but no longer as part of the core package.

Implementation

Currently, a MaskedArray is created as follows:

from numpy import ma
ma.array([1, 2, 3], mask=[True, False, True])

This will return an array where the values 1 and 3 are masked (no longer visible to operations such as np.sum).
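To make the masking semantics concrete, the snippet below uses the existing np.ma API to show what "no longer visible" means in practice: reductions skip masked entries, while the underlying data is left untouched. It illustrates current behavior only and makes no claim about how maskedarray would implement it.

```python
import numpy as np
from numpy import ma

x = ma.array([1, 2, 3], mask=[True, False, True])

total = np.sum(x)                # masked entries (1 and 3) are skipped -> 2
n_visible = x.count()            # number of unmasked entries -> 1
filled = x.filled(0)             # plain ndarray with masked slots replaced by 0
raw_total = np.asarray(x).sum()  # the underlying data still holds 1, 2 and 3 -> 6

print(x)  # [-- 2 --]
```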

We propose refactoring the np.ma subpackage into a new pip-installable library called maskedarray [2], which would be used in a similar fashion:

import maskedarray as ma
ma.array([1, 2, 3], mask=[True, False, True])

For two releases of NumPy, maskedarray would become a NumPy dependency, and would expose MaskedArrays under the existing name, np.ma. If imported as np.ma, a NumpyDeprecationWarning will be raised, describing the impending deprecation with instructions on how to modify code to use maskedarray.

After two releases, np.ma will be removed entirely. To obtain MaskedArrays, users will install maskedarray via pip or via their package manager. Subsequently, importing maskedarray on a version of NumPy that still includes masked arrays integrally will raise an ImportError.

Documentation

NumPy’s internal documentation refers explicitly to MaskedArrays in certain places, e.g. numpy.concatenate:

> When one or more of the arrays to be concatenated is a MaskedArray, this function will return a MaskedArray object instead of an ndarray, but the input masks are not preserved. In cases where a MaskedArray is expected as input, use the ma.concatenate function from the masked array module instead.

Such documentation will be removed, since the expectation is that users of maskedarray will use methods from that package to operate on MaskedArrays.
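The docstring behavior quoted in this section can be reproduced with the current API. The snippet is an illustration based on that docstring: np.concatenate returns a MaskedArray but drops the input mask, whereas ma.concatenate keeps it. Exact reprs may differ between NumPy versions, so only the masks are printed.

```python
import numpy as np
from numpy import ma

a = ma.array([1, 2, 3], mask=[True, False, True])
b = np.array([4, 5])

# np.concatenate returns a MaskedArray here, but the input mask is not carried over.
joined_plain = np.concatenate([a, b])

# ma.concatenate is the masked-array-aware version and preserves the masks.
joined_masked = ma.concatenate([a, b])

print(ma.getmaskarray(joined_plain))   # every entry unmasked
print(ma.getmaskarray(joined_masked))  # first and third entries masked
```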

Other appearances

Explicit MaskedArray support will be removed from:

  • numpy.genfromtxt
  • numpy.lib.recfunctions.merge_arrays, numpy.lib.recfunctions.stack_arrays

Backward compatibility

For two releases of NumPy, apart from a deprecation notice, there will be no user-visible changes. Thereafter, np.ma will no longer be available; MaskedArrays will instead live in the maskedarray package.

Note also that new NEPs on array-like objects may eventually provide better support for MaskedArrays than is currently available.

Alternatives

After a lively discussion on the mailing list:

  • There is support for (and active interest in) making a better, new masked array class.
  • The new class should be a consumer of the external NumPy API with no special status (unlike today, where there are hacks across the codebase to support it).
  • MaskedArray will stay where it is, at least until the new masked array class materializes and has been tried in the wild.