How to Fit a non-MaNGA Datacube
After the initial ingestion of the MaNGA datacube, the main DAP command-line script is effectively independent of the specific data format. Although all algorithms were fine-tuned using MaNGA datacubes, the pipeline can be executed on data from other instruments. This document details how this is done and gives an example using data from Keck’s KCWI.
Overview
The critical components that the DAP needs to perform its analysis modules are:
wavelengths are in vacuum,
spectra that are geometrically binned in wavelength,
spectral errors, pixel masks, and an estimate of the spectral resolution,
an estimate of the redshift that is accurate to within \(\pm 2000\) km/s,
conventions for how to identify the datacube being analyzed,
a plan for the analyses to be performed, and
a naming convention for the DAP output.
The first four components are handled by the datacube container class,
DataCube
. Therefore, to run the DAP on a
custom datacube, you need to build a custom
DataCube
subclass.
The last two components are handled by the parameter class
AnalysisPlanSet
. Depending on the
desired output path structure, it is likely that DAP analysis of a custom
datacube can use this class directly, instead of having to create a new
subclass.
Creating a new DataCube subclass
The primary function of the DataCube
is to
provide an interface for the DAP to read the data to be analyzed and provide
access to it in a way that the DAP understands. All datacubes analyzed by the
DAP must subclass from this object.
To work directly with the manga_dap
command-line script, the subclass must
instantiate the datacube object either directly from the datacube file (as its
__init__
method) or via a classmethod
that reads a “configuration” file,
where the class method is called from_config
. In the example below, we
focus on the former; however, the limitation is that the metadata required to
run the DAP (the redshift estimate) must be read from the file or hard-coded.
For reference, MaNGA datacubes analyzed at the survey level use configuration
files; see from_config()
.
In its simplest implementation, a custom datacube subclass would look like this:
from pathlib import Path
from astropy.io import fits
from astropy.wcs import WCS
from mangadap.datacube.datacube import DataCube
class CustomDataCube(DataCube):
r"""
Container class for a custom datacube.
Args:
ifile (:obj:`str`):
The name of the file to read.
"""
instrument = 'custom'
"""
Set the name of the instrument. This is used to construct the
output file names.
"""
def __init__(self, ifile):
_ifile = Path(ifile).resolve()
if not _ifile.exists():
raise FileNotFoundError(f'File does not exist: {_ifile}')
# Set the paths
self.directory_path = _ifile.parent
self.file_name = _ifile.name
# Collect the metadata into a dictionary
meta = {}
# Open the file and initialize the DataCube base class
with fits.open(str(_ifile)) as hdu:
print('Reading datacube ...', end='\r')
prihdr = hdu[0].header
wcs = WCS(header=prihdr, fix=True)
...
print('Reading datacube ... DONE')
# Instantiate the base class
super().__init__(flux, ivar=ivar, mask=mask, sres=sres, wave=wave, meta=meta,
prihdr=prihdr, wcs=wcs, name=_ifile.name.split('_')[0])
In terms of the requirements above:
Geometrically binned spectra: The binning is not checked by the DAP immediately when the code executed (yet). Instead, the code will start and likely crash as soon as it tries to construct the template library to be fit. See Resampling and the KCWI example below for how to use DAP code to resample a datacube. NOTE: The DAP expects datacube arrays to be oriented with the two spatial axes first and the spectral data along the last axis; i.e., the flux array should be ordered \((x,y,$\lambda$)\). This can be done internal to the instantiation by the base class; see the
DataCube
axes
argument.Ancillary data: The script provides spectral errors in the
ivar
array, a pixel mask in themask
array, and an estimate of the spectral resolution in thesres
array (this can be a single number if the resolution is constant as a function of wavelength and position). It also provides the primary header, which is simply copied to output files, and the WCS read from the fits file (assuming it can be constructed from the primary header).Redshift estimate: The example above does not explicitly define the redshift. This is assumed to be done within the
...
block. The redshift must be included by definingmeta['z']
. The code will crash if this is not defined as soon as the DAP attempts to analyze the datacube. All other metadata that can be used by the DAP is optional; seepopulate_metadata()
.Naming conventions: The code establishes the naming convention for output files using the defined
instrument
andname
for each datacube. In this example, theinstrument
is defined as a class attribute, and thename
is an instance attribute defined in the instantiation of the base class based on the input file name. The actual value of these strings is free form, but at least one of them must be defined. The name of the output files is then{instrument}-{name}
, if both are defined, and just theinstrument
orname
string if only one is defined.
Example: KCWI
As an example, here’s a fully implemented
DataCube
subclass for use with KCWI data,
where the data was reduced by the the Keck KCWI DRP (see here).
from pathlib import Path
import numpy
from astropy.io import fits
from astropy.wcs import WCS
from mangadap.datacube.datacube import DataCube
from mangadap.util.sampling import Resample, angstroms_per_pixel
class KCWIDataCube(DataCube):
r"""
Container class for KCWI datacubes.
Args:
ifile (:obj:`str`):
The name of the file to read.
"""
instrument = 'kcwi'
"""
Set the name of the instrument. This is used to construct the
output file names.
"""
def __init__(self, ifile):
_ifile = Path(ifile).resolve()
if not _ifile.exists():
raise FileNotFoundError(f'File does not exist: {_ifile}')
# Set the paths
self.directory_path = _ifile.parent
self.file_name = _ifile.name
# Collect the metadata into a dictionary
meta = {}
meta['z'] = 0.028 # Specific to the target observed
sres = 1800 # Specific to this KCWI instrument setting
# Open the file and initialize the DataCube base class
with fits.open(str(_ifile)) as hdu:
print('Reading KCWI datacube data ...', end='\r')
prihdr = hdu[0].header
wcs = WCS(header=prihdr, fix=True)
flux = hdu[0].data
err = hdu['UNCERT'].data
mask = hdu['MASK'].data.astype(bool) | numpy.logical_not(err > 0.)
print('Reading KCWI datacube data ... DONE')
# Resample to a geometric sampling
# - Get the wavelength vector
spatial_shape = flux.shape[1:][::-1]
nwave = flux.shape[0]
coo = numpy.array([numpy.ones(nwave), numpy.ones(nwave), numpy.arange(nwave)+1]).T
wave = wcs.all_pix2world(coo, 1)[:,2]*wcs.wcs.cunit[2].to('angstrom')
# - Convert the fluxes to flux density
dw = angstroms_per_pixel(wave, regular=False)
flux /= dw[:,None,None]
# - Set the geometric step to the mean value. This means some
# pixels will be oversampled and others will be averaged.
dlogl = numpy.mean(numpy.diff(numpy.log10(wave)))
# - Resample all the spectra. Note that the Resample arguments
# expect the input spectra to be provided in 2D arrays with the
# last axis as the dispersion axis.
r = Resample(flux.T.reshape(-1,nwave), e=err.T.reshape(-1,nwave),
mask=mask.T.reshape(-1,nwave), x=wave, inLog=False, newRange=wave[[0,-1]],
newLog=True, newdx=dlogl)
# - Reshape and reformat the resampled data in prep for
# instantiation
ivar = r.oute.reshape(*spatial_shape,-1)
mask = r.outf.reshape(*spatial_shape,-1) < 0.8
ivar[mask] = 0.0
gpm = numpy.logical_not(mask)
ivar[gpm] = 1/ivar[gpm]**2
_sres = numpy.full(ivar.shape, sres, dtype=float)
flux = r.outy.reshape(*spatial_shape,-1)
# Default name assumes file names like, e.g., '*_icubew.fits'
super().__init__(flux, ivar=ivar, mask=mask, sres=_sres,
wave=r.outx, meta=meta, prihdr=prihdr, wcs=wcs,
name=_ifile.name.split('_')[0])
A few specifics of this implementation to note:
The redshift and spectral resolution for the data to reduce are hard-coded. To be more general, one would rather have a
from_config
classmethod for KCWI where the redshift can be file specific.Much of the instantiation is concerned with resampling the data from a linear step in wavelength to a geometric step, which relies on the
Resample
class.Note that the
wcs
is still provided, unaltered by the change in the wavelength coordinates. This is only possible because the wavelength vector (wave
) is provided to the base-class instantiation directly, taking precedence over anything provided by the original WCS. Beware, though, this means the input and output WCS may be inconsistent for the wavelength axis (GitHub issue raised).
Creating a new AnalysisPlan subclass
For a detailed description of the components of the
AnalysisPlan
and its associated toml
file, see Analysis Plans.
The primary function of the AnalysisPlan
class is to set how the DAP analyzes the data. However, much of how this is
determined is by using abstracted methods and keywords, meaning that the
implementation of the AnalysisPlan
class
is largely independent of the data being analyzed. The only exception to this
is the definition of the output paths. Adopting a top-level output path
referred to as the analysis_path
, the default DAP output paths are:
{analysis_path}/common/
: The path where analysis products “common” to many analysis methods are placed.
{analysis_path}/{plan-key}/
: The path where analysis products specific to a single analysis method are placed.
{analysis_path}/{plan-key}/ref
: The path for DAP reference files
{analysis_path}/{plan-key}/qa
: The path for DAP quality assurance plots
These defaults are set by
common_path()
and
method_path()
. The root of the
output file names is set by
dap_file_root()
. In most
cases, you will likely not want to change these. However, the survey-level
execution of the DAP for the MaNGA Survey did change these to allow the files to
be organized by the plate and IFU number of the analyzed datacube. See
MaNGAAnalysisPlan
for an example of how to
create a subclass that can define specific output paths for your data (beyond
the ability of simply defining the top-level root directory).
Executing the DAP
The main DAP scripts, manga_dap
, can accept the user-defined classes as
command-line arguments. Say you have defined the class in a local file called
my_datacube.py
, and the name of the implemented datacube class is
MyDataCube
. You can execute the DAP on a file (input_file.fits
) with
the datacube to analyze as follows:
manga_dap -f input_file.fits --cube_module my_datacube MyDataCube --plan_module mangadap.config.analysisplan.AnalysisPlan -o dap
A few notes:
I’ve used the
-f
tag, which specifies the input file as the file that contains the datacube data. To use a configuration file, you would use the-c
option, andMyDataCube
would have to have afrom_config
class method.In defining the cube module to use, note that I did not include the
.py
extension of the file name, and I included a space betweenmy_datacube
andMyDataCube
; these are the current way that you indicate that you’re trying to import the relevant class from a local file.Currently, the
manga_dap
defaults to using the MaNGA-specific classes. This means that you must force the code to use theAnalysisPlan
base class, if you haven’t created a new subclass. Here, I’ve defined the full object to import, connected by.
.I did not include a plan file (using the
-p
option). This means the code will use the “default” analysis plan; seedefault()
. See default_plan.tomlI set the top-level directory for the output to be
./dap/
.
Problems?
If you have problems, particularly those that you think may be a more general problem, please Submit an issue.