mangadap.util.covariance module¶
Defines a class used to store and interface with covariance matrices.
 License:
 Copyright (c) 2015, SDSSIV/MaNGA Pipeline Group
 Licensed under BSD 3clause license  see LICENSE.rst
 Source location:
 $MANGADAP_DIR/python/mangadap/util/covariance.py
 Imports and python version compliance:
from __future__ import division from __future__ import print_function from __future__ import absolute_import from __future__ import unicode_literals import sys if sys.version > '3': long = int import numpy from scipy import sparse from astropy.io import fits from matplotlib import pyplot
 Usage examples:
You can calculate the covariance matrix for a given wavelength channel in a
mangadap.drpfits.DRPFits
object:# Access the DRP RSS file from mangadap.drpfits import DRPFits drpf = DRPFits(7495, 12703, 'RSS', read=True) # Calculate a single covariance matrix C = drpf.covariance_matrix(2281) # Show the result in an image C.show() # Access specific elements print(C[0,0]) # Convert to a 'dense' array dense_C = C.toarray() # Get the triplets of the nonzero elements i, j, v = C.find() # Write it to disk (clobber existing file) C.write('test_covariance.fits', clobber=True)
The covariance matrix is stored in “coordinate” format in a fits binary table. Since the covariance matrix is symmetric by definition, only those nonzero elements in the upper triangle (\(C_{ij}\) where \(i\leq j\)) are saved (in memory or on disk). You can read an existing covariance matrix fits file:
from mangadap.util.covariance import Covariance C = Covariance(ifile='test_covariance.fits')
You can calculate a set of covariance matrices or the full covariance cube:
# Access the DRP RSS file from mangadap.drpfits import DRPFits drpf = DRPFits(7495, 12703, 'RSS', read=True) # Calculate the full cube CC = drpf.covariance_cube() # BEWARE: This may require a LOT of memory # Calculate fewer but many covariance matrices channels = [ 0, 1000, 2000, 3000, 4000 ] C = drpf.covariance_cube(channels=channels) # Access individual elements print(C[0,0,0]) print(C[0,0,1000]) # Write the full cube, or set of channels CC.write('full_covariance_cube.fits') # BEWARE: This will be a BIG file C.write('covariance_channel_set.fits')
Although you can access the data in the covariance matrix as explained above, this is generally inefficient because the
Covariance.__getitem__()
function currently cannot handle slices. If you need to perform operations with the covariance matrix, you’re better off working with theCovariance.cov
attribute directly. To do this, you won’t be able to use the aliasing of the channel indices for 3D covariance matrices.The
Covariance
class also allows you to toggle between accessing the matrix as a true covariance matrix, or by splitting the matrix into its variance and correlation components. For example,:# Get the covariance matrix for a single wavelength channel from mangadap.drpfits import DRPFits drpf = DRPFits(7495, 12703, 'RSS', read=True) C = drpf.covariance_matrix(2281) # Show the covariance matrix before and after changing it to a # correlation matrix C.show() C.to_correlation() C.show() print(C.is_correlation) C.revert_correlation()
Covariance matrices that have been converted to correlation matrices can be written and read in without issue. See
Covariance.write()
andCovariance.read()
. For example:# Get the covariance matrix for a single wavelength channel import numpy from mangadap.util.covariance import Covariance from mangadap.drpfits import DRPFits drpf = DRPFits(7495, 12703, 'RSS', read=True) channels = [ 0, 1000, 2000, 3000, 4000 ] Cov = drpf.covariance_cube(channels=channels) Cov.to_correlation() Cov.show(channel=2000) Cov.write('correlation_matrix.fits', clobber=True) Cov.revert_correlation() Corr = Covariance(ifile='correlation_matrix.fits') Corr.revert_correlation() assert not (numpy.abs(numpy.sum(Cov.toarray(channel=2000)  Corr.toarray(channel=2000))) > 0.0)
 Revision history:
 23 Feb 2015: Original Implementation by K. Westfall (KBW)04 Aug 2015: (KBW) Sphinx documentation and minor edits.29 Mar 2016: (KBW) Allow the object to flip between a true covariance matrix or a correlation matrix with saved variances. Allow for a function that only returns the astropy.io.fits.BinTableHDU object with the coordinateformat covariance data.06 Apr 2016: (KBW) Allow
Covariance.read()
to read from a file or an astropy.io.fits.hdu.hdulist.HDUList object, and allow the specification of the extensions to read the header, covariance, and plane data.23 Feb 2017: (KBW) Major revisions: Use DAPFitsUtil to read/write the HDUList. Output is always a correlation matrix. Rearrange ordering of 3D covariance matrices to reflect the ordering in the DRPFits cube and the DAP Maps channels. Change plane keyword to channel. Change the format of the covariance matrices when written/read from a file to match what is done by the DRP. Change read method tofrom_fits()
class method. Includefrom_samples()
class method.
Todo
 Allow for calculation of the inverse of the covariance matrix.
 Instead of 3D covariance cubes being an array of sparse objects, make the whole thing a sparse array.

class
mangadap.util.covariance.
Covariance
(inp, input_indx=None, impose_triu=False, correlation=False)[source]¶ Bases:
object
A general utility for storing, manipulating, and file I/O of large but sparse covariance matrices.
Works under the assumption that covariance matrices are symmetric by definition.
Todo
 Can create empty object. Does this make sense?
Parameters:  inp (scipy.sparse.csr_matrix or numpy.ndarray) – Covariance matrix to store. Input must be coviariance data, not correlation data. Data type can be either a single scipy.sparse.csr_matrix object or a 1D array of them. Assumes all sparse matrices in the input ndarray have the same size. If None, the covariance object is instantiated empty.
 input_indx (numpy.ndarray) – (Optional) If inp is an array of
scipy.sparse.csr_matrix objects, this is an integer
array specifying a pseudoset of indices to use instead of
the direct index in the array. I.e., if inp is an array
with 5 elements, one can provide a 5element array for
input_indx
that are used instead as the designated index for the covariance matrices. See:__getitem__()
,_grab_true_index()
.  impose_triu (bool) – (Optional) Flag to force the scipy.sparse.csr_matrix object to only be the upper triangle of the covariance matrix. The covariance matrix is symmetric such that C_ij = C_ji, so it’s not necessary to keep both values. This flag will force a call to scipy.sparse.triu when setting the covariance matrix. Otherwise, the input matrix is assumed to only have the upper triangle of numbers.
 correlation (bool) – (Optional) Convert the input to a correlation matix. The input must be the covariance matrix.
Raises: TypeError
– Raised if the input array is not onedimensional or the input covariance matrix are not scipy.sparse.csr_matrix objects.Exception
– Raised if theinput_indx
either does not have the correct size or is not required due to the covariance object being a single matrix.

cov
¶ The covariance matrix stored in sparse format.
Type: scipy.sparse.csr_matrix or numpy.ndarray

shape
¶ Shape of the full array.
Type: tuple

dim
¶ The number of dimensions in the covariance matrix. This can either be 2 (a single covariance matrix) or 3 (multiple covariance matrices).
Type: int

nnz
¶ The number of nonzero covariance matrix elements.
Type: int

input_indx
¶ If
cov
is an array of scipy.sparse.csr_matrix objects, this is an integer array specifying a pseudoset of indices to use instead of the direct index in the array. I.e., ifcov
is an array with 5 elements, one can provide a 5element array forinput_indx
that are used instead as the designated index for the covariance matrices. See:__getitem__()
,_grab_true_index()
.Type: numpy.ndarray

inv
¶ The inverse of the covariance matrix. This is not currently calculated!
Type: scipy.sparse.csr_matrix or numpy.ndarray

var
¶ Array with the variance provided by the diagonal of the/each covariance matrix. This is only populated if necessary, either by being requested (
variance()
) or if needed to convert between covariance and correlation matrices.Type: numpy.ndarray

is_correlation
¶ Flag that the covariance matrix has been saved as a variance vector and a correlation matrix.
Type: bool

_grab_true_index
(inp)[source]¶ In the case of a 3D array, return the true arrayelement index given the pseudo index.
Parameters: inp (int) – Requested index to convert to the real index in the stored array of covariance matrices. Returns: The index in the stored array of the requested covariance matrix. Return type: int Todo
Search operation is inefficient. Apparently a better option is a feature request for numpy 2.0

_impose_upper_triangle
()[source]¶ Force
cov
to only contain nonzero elements in its upper triangle.

_with_lower_triangle
(channel=None)[source]¶ Return a scipy.sparse.csr_matrix object with both its upper and lower triangle filled, ensuring that they are symmetric.
Parameters: channel (int) – (Optional) The pseudoindex of the covariance matrix to return. Returns: The sparse matrix with both the upper and lower triangles filled (with symmetric information). Return type: scipy.sparse.csr_matrix Raises: ValueError
– Raised if the object is 3D and channel is not provided.

apply_new_variance
(var)[source]¶ Using the correlation matrix defined by self, use the provided variance data to construct a new covariance matrix.
The input variance must be of the correct shape.

bin_to_spaxel_covariance
(bin_indx)[source]¶ Propagate the covariance matrix data for the stacked spectra into the full cube.
This (self) should be the covariance/correlation matrix for the binned spectra.
Parameters: bin_indx (numpy.ndarray) – The integer vector with the bin associated with each spectrum in the DRP cube. This is the flattened BINID array. Returns: mangadap.util.covariance.Covariance: Covariance/Correlation matrix for the spaxelized binned data. Return type: class Todo
 This needs to be tested.

copy
()[source]¶ Return a copy of this Covariance object, not a reference to it, by returning a new Covariance instance with the same data. Only sticking point is making sure to keep track of whether the object was saved as a covariance matrix or as a correlation matrix with a variance vector.

find
(channel=None)[source]¶ Find the nonzero values in the full covariance matrix (not just the upper triangle).
Parameters: channel (int) – (Optional) The pseudoindex of the covariance matrix to plot. Required if the covariance object is 3D. Returns: A tuple of arrays i, j, and c. The arrays i and j contain the index coordinates of the nonzero values, and c contains the values themselves. Return type: tuple

classmethod
from_fits
(source, ivar_ext='IVAR', covar_ext='CORREL', row_major=False, impose_triu=False, correlation=False, quiet=False)[source]¶ Read an existing covariance object previously written to disk; see
write()
. The class can read covariance data written by other programs as long as they have a commensurate format.If the extension names and column names are correct,
Covariance
can read fits files that were not necessarily produced by the class itself. This is useful for MaNGA DAP products that include the covariance data in among other output products. Because of this,output_hdus()
andoutput_fits_data()
are provided to produce the data that can be placed in any fits file.The function determines if the output data were reordered by checking the number of binary columns.
Parameters:  source (str or astropy.io.fits.hdu.hdulist.HDUList) – Initialize the object using an astropy.io.fits.hdu.hdulist.HDUList object or path to a fits file.
 ivar_ext (str) – (Optional) If reading the data from source, this is the name of the extension with the inverse variance data. Default is ‘IVAR’. If None, the variance is taken as unity.
 covar_ext (str) – (Optional) If reading the data from source, this is the name of the extension with covariance data. Default is ‘CORREL’.
 row_major (bool) – (Optional) If reading the data from an
HDUList, this sets if the data arrays have been
rearranged into a pythonnative (rowmajor) structure.
See
mangadap.util.fitsutil.DAPFitsUtil.transpose_image_data()
. Default is False.  impose_triu (bool) – (Optional) Flag to force the scipy.sparse.csr_matrix object to only be the upper triangle of the covariance matrix. The covariance matrix is symmetric such that C_ij = C_ji, so it’s not necessary to keep both values. This flag will force a call to scipy.sparse.triu when setting the covariance matrix. Otherwise, the input matrix is assumed to only have the upper triangle of numbers.
 correlation (bool) – (Optional) Return the matrix as a
correlation matrix. Default (False) is to use the data
(always saved in correlation format; see
write()
) to construct the covariance matrix.  quiet (bool) – (Optional) Suppress terminal output.

classmethod
from_matrix_multiplication
(T, Sigma)[source]¶ Construct the covariance matrix that results from a matrix multiplication.
The matrix multiplication should be of the form:
\[{\mathbf T} \times {\mathbf X} = {\mathbf Y}\]where \({\mathbf T}\) is a transfer matrix of size \(N_y\times N_x\), \({\mathbf X}\) is a vector of size \(N_x\), and \({\mathbf Y}\) is the vector of length \({N_y}\) that results from the multiplication.
The covariance matrix is then
\[{\mathbf C} = {\mathbf T} \times {\mathbf \Sigma} \times {\mathbf T}^{\rm T},\]where \({\mathbf \Sigma}\) is the covariance matrix for the elements of \({\mathbf X}\). If Sigma is provided as a vector of length \(N_x\), it is assumed that the elements of \({\mathbf X}\) are independent and the provided vector gives the variance in each element; i.e., the provided data represent the diagonal of \({\mathbf \Sigma}\).

classmethod
from_samples
(samples, cov_tol=None, rho_tol=None)[source]¶ Define a covariance object using descrete samples from an Ndimensional parameter space using the numpy covariance function. The shape of the input array must be \(N_{\rm par}\times N_{\rm samples}\).
Any correlation coefficent less than tol is forced to zero, if tol is provided.

classmethod
from_variance
(variance, correlation=False)[source]¶ Construct a diagonal covariance matrix using the provided variance.

output_fits_data
(reshape=False)[source]¶ Construct the data arrays to write to a fits file, providing the covariance matrix/matrices in coordinate format.
Regardless of whether or not the current internal data is a covariance matrix or a correlation matrix, the data is always returned as a correlation matrix.
If the covariance matrix is 2 dimensional, four columns are output:
i, j: The i,j indices of the covariance matrix.
rho_ij: The correlation coefficient between pixels i and j.
 variance: The variance in each pixel; i.e., the value of
 \(C_{ii} \forall i\). If reshape is True, this will be output as a twodimenional array.
If the covariance matrix is 3dimensional, one additional column is output:
 k: The k indices of the channels associated with each
 covariance matrix. This is either just the index of the covariance matrix or the provided pseudoindices of each channel.
If using the reshape option, the i and j indices are converted to two columns each providing the indices in the associated reshaped array with coordinates c1,c2.
Parameters: reshape (bool) – (Optional) Reshape the output in the case when i and j are actually pixels in a twodimensional image. The shape of the image is expected to be square, such that the shape of the covariance matrix is \(N_x\times N_y\), where \(N_x = N_y\). Each of the I and J output columns are then split into two columns according to associate coordinates, such that there are 8 output columns. Returns: Either 6 or 8 output arrays depending on if the data has been reshaped into an image. Return type: numpy.ndarray

output_hdus
(reshape=False, hdr=None)[source]¶ Construct the output HDUs and header that contain the covariance data.

static
ravel_indices
(size, i, j)[source]¶ Return the indices in the 1D vector that results from flattening a
size
bysize
square array.

static
reshape_indices
(size, i)[source]¶ Return the indices in the 2D array that results from reshaping a vector of length size into a square 2D array. The indices of interest in the vector are give by i.

revert_correlation
()[source]¶ Revert the object from a correlation matrix back to a full covariance matrix.
Todo
Allow this function to take in a new set of variances to use.
This function does nothing if the correlation flag has not been flipped. The variances must have already been calculated!

show
(channel=None, zoom=None, ofile=None, log10=False)[source]¶ Convert the (selected) covariance matrix to a filled array and plot the array using matplotlib.pyplot.imshow. If an output file is provided, the image is redirected to the designated output file; otherwise, the image is plotted to the screen.
Parameters:  channel (int) – (Optional) The pseudoindex of the covariance matrix to plot. Required if the covariance object is 3D.
 zoom (float) – (Optional) Factor by which to zoom in on the center of the image by removing the other regions of the array. E.g. zoom=2 will show only the central quarter of the covariance matrix.
 ofile (str) – (Optional) If provided, the array is output to this file instead of being plotted to the screen.

spaxel_to_bin_covariance
(bin_indx)[source]¶ Opposite of
bin_to_spaxel_covariance()
: Revert the covariance matrix to the covariance between the unique binned spectra.This (self) should be the covariance/correlation matrix for the binned spectra redistributed to the size of the original spaxels map.
Warning
This does NOT propagate the covariance between the spaxels into the covariance in the binned data. That operation is done by, e.g.,
mangadap.proc.spectralstack.SpectralStack._stack_with_covariance()
. This only performs the inverse operation ofbin_to_spaxel_covariance()
.Parameters: bin_indx (numpy.ndarray) – The integer vector with the bin associated with each spectrum in the DRP cube. This is the flattened BINID array. Returns: mangadap.util.covariance.Covariance: Covariance/Correlation matrix for the stacked spectra. Return type: class Todo
 This needs to be tested.

to_correlation
()[source]¶ Convert the covariance matrix into a correlation matrix by dividing each element by the variances.
If the matrix is a correlation matrix already (see
is_correlation
), no operations are performed. Otherwise, the variance vectors are computed, if necessary, and used to normalize the covariance values.A
Covariance
object can be reverted from a correlation matrix; seerevert_correlation()
.

toarray
(channel=None)[source]¶ Convert the covariance to a full array, filled with zeros when appropriate.
Parameters: channel (int) – (Optional) The pseudoindex of the covariance matrix to plot. Required if the covariance object is 3D. Returns: Dense array with the full covariance matrix. Return type: numpy.ndarray

variance
(copy=True)[source]¶ If not already done, grab the variances along the diagonal of the covariance matrix. The function returns the variance for all channels if more than one exists.

write
(ofile, reshape=False, hdr=None, clobber=False)[source]¶ Write the covariance object to a fits file such that it can be read for later use; see
from_fits()
. The covariance matrix (matrices) are stored in “coordinate” format using fits binary tables; see scipy.sparse.coo_matrix. The matrix is also always stored as a correlation matix, even if the object is currently in the state holding the covariance data.Independent of the dimensionality of the covariance matrix, the written file has a
PRIMARY
extension with the keywordCOVSHAPE
that specifies the original dimensions of the covariance matrix; seeshape
.The correlation data are written to the
CORREL
extension. The number of columns in this extension depends on the provided keywords; seeoutput_fits_data()
. The column names are:INDXI
,INDXJ
,INDXK
: indices in the covariance matrix. The last index is provided only if the object is 3D.INDXI
andINDXJ
are separated into two columns if the output is reshaped; these columns areINDXI_C1
,INDXI_C2
,INDXJ_C1
,INDXJ_C2
.RHOIJ
: The nonzero correlation coefficients located the specified coordinates
The inverse of the variance along the diagonal of the covariance matrix is output in an ImageHDU in extension
IVAR
.For 3D matrices, if pseudoindices have been provided, these are used in the
INDXK
column; however, theCOVSHAPE
header keyword only gives the shape of the unique indices of the covariance channels.Parameters:  ofile (str) – File name for the output.
 reshape (bool) – (Optional) Reshape the output in the case when i and j are actually pixels in a twodimensional image. The shape of the image is expected to be square, such that the shape of the covariance matrix is \(N_x\times N_y\), where \(N_x = N_y\). Each of the I and J output columns are then split into two columns according to associate coordinates.
 hdr (astropy.io.fits.Header) – (Optional) A header object to include in the PRIMARY extension. The SHAPE keyword will be added/overwritten.
 clobber (bool) – (Optional) Overwrite any existing file.
Raises: TypeError
– Raise if the input hdr does not have the correct type.