Running the Pipeline

Pipeline Description

The SALT3 training pipeline consists of several procedures that are run in series. For each procedure, the pipeline modifies a base input file to create a customized one and calls the external program with the customized input. Details are given below; the pipeline will be described and published in Dai et al. (in prep.).

Param File

General Structure

Each section in the param file defines one procedure in the pipeline. The general structure is as follows:

[Procedure Name] ([byosed], [simulation], [training], [lcfitting], [cosmology], ...)

# external program to call
pro =

# arguments for the external program
proargs =

# base input location
baseinput =

# define the section (optional), key and value to be added or changed from the base input

set_key= [NCOL] # 2 if no section or 3 if section exists in the config file
    [SECTION1] [KEY] [VALUE]
    [SECTION2] [KEY2] [VALUE2]
    [SECTION2] [KEY3] [VALUE3]
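
To make the set_key layout concrete, the sketch below reads one param-file section with Python's standard configparser and splits the set_key value into its column count and its rows. The section contents (program name, paths, section and key names) are made-up placeholders, and parse_set_key is a hypothetical helper, not the pipeline's own parser. It assumes the indented rows are read as continuation lines of set_key.

```python
import configparser

# Hypothetical param-file fragment: the program name, paths, and keys are
# illustrative placeholders, not values taken from the SALT3 repository.
PARAM_TEXT = """
[training]
pro = trainsalt
proargs = -c
baseinput = training/base_training.conf
set_key = 3
    trainparams waverange 2000,9200
    iodata snlist training/snlist.txt
"""


def parse_set_key(raw):
    """Split a set_key value into (ncol, rows).

    The first non-empty line holds NCOL (2 = key/value, 3 = section/key/value);
    each following line is one row with NCOL whitespace-separated columns.
    """
    lines = [ln.split() for ln in raw.splitlines() if ln.strip()]
    ncol = int(lines[0][0])
    rows = []
    for fields in lines[1:]:
        # The first ncol-1 fields are section/key; the remainder is the value.
        head, value = fields[:ncol - 1], " ".join(fields[ncol - 1:])
        if len(head) != ncol - 1 or not value:
            raise ValueError(f"expected {ncol} columns, got {fields}")
        rows.append((*head, value))
    return ncol, rows


config = configparser.ConfigParser()
config.read_string(PARAM_TEXT)
ncol, rows = parse_set_key(config["training"]["set_key"])
for row in rows:
    print(row)
```

With NCOL set to 2, the same helper would return (key, value) pairs instead of (section, key, value) triples.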

Batch mode

The pipeline supports batch submission for certain stages (e.g. simulation, lcfitting, …).

To enable it, set batch=True under that stage in the param file.

Running the Pipeline

The pipeline class

The pipeline can be run using the SALT3pipe class. In the examples/pipelinetest directory, you can run the pipeline with the following commands:

from salt3.pipeline.pipeline import SALT3pipe
pipe = SALT3pipe(finput='sampleinput.txt')
pipe.build()
pipe.configure()
pipe.run()

Building the pipeline with selected stages

The build method needs to be called before configure and run. The default pipeline includes all the stages. Currently they are ['byosed','sim','train','lcfit','getmu','cosmofit'] or ['data','train','lcfit','getmu','cosmofit'], depending on the value of the data option. The default pipeline is built simply by

pipe.build()

The data option selects between data (data=True) and simulations (data=False), for example:

pipe.build(data=False)

The default value is data=True.

To run only certain stages or to skip certain stages, set the option mode='customize' and list the stages in onlyrun or skip. Note that only one of these two options can be set.

pipe.build(data=False,mode='customize',onlyrun=['lcfit','getmu','cosmofit'])
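
The selection rules above can be sketched as a small standalone function. This is only an illustration of the behaviour described here, not the SALT3pipe implementation: select_stages is a hypothetical name, the mode value 'default' is an assumed placeholder for any non-customized mode, and the error handling is a guess at what a reasonable implementation might do.

```python
# Stage lists as given in the text above.
SIM_STAGES = ['byosed', 'sim', 'train', 'lcfit', 'getmu', 'cosmofit']
DATA_STAGES = ['data', 'train', 'lcfit', 'getmu', 'cosmofit']


def select_stages(data=True, mode='default', onlyrun=None, skip=None):
    """Return the ordered list of stages a build call would select.

    Hypothetical helper: mirrors the documented options, not the real code.
    """
    stages = DATA_STAGES if data else SIM_STAGES
    if mode != 'customize':
        # Without mode='customize', every stage is included.
        return list(stages)
    if onlyrun is not None and skip is not None:
        raise ValueError("set only one of onlyrun or skip")
    requested = onlyrun if onlyrun is not None else (skip or [])
    unknown = [s for s in requested if s not in stages]
    if unknown:
        raise ValueError(f"unknown stages for data={data}: {unknown}")
    if onlyrun is not None:
        # Keep the pipeline order, whatever order the user listed.
        return [s for s in stages if s in onlyrun]
    return [s for s in stages if s not in requested]


print(select_stages(data=False, mode='customize',
                    onlyrun=['lcfit', 'getmu', 'cosmofit']))
```

Keeping the result in pipeline order, rather than in the order the caller listed the stages, reflects the fact that the procedures are run in series.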

After the build method is called, the configure method needs to be called so that the input files are properly configured.

Connecting the input/output of different stages using the ‘glue’ method

The glue method connects the output of one stage to the input of the next. It overwrites the config (input) files of the glued stages and should be called after configure.

pipe.glue(['sim','train'])

For stages that are connected to multiple other stages, the on option specifies which input/output files to glue on:

pipe.glue(['train','lcfit'],on='model')
pipe.glue(['sim','lcfit'],on='phot')

Running the pipeline

After calling build, configure, and glue, call the run method to execute the pipeline:

pipe.run()

Note that the build, configure, glue, and run methods can be called multiple times to build a customized pipeline. Keep in mind that each call to configure modifies the config (input) files of the stages specified in build, and each call to glue overwrites the existing config (input) files. These methods should therefore be called in an order that matches how the pipeline is run.

The following example first runs the Simulation and Training stages with their input/output properly connected, then runs the LCfitting, Getmu, and Cosmofit stages. The run is split in two because gluing Training to LCfitting (light-curve fitting with the trained model) requires the trained model files to exist, so the training code needs to be run first.

from salt3.pipeline.pipeline import SALT3pipe

def test_pipeline():
    pipe = SALT3pipe(finput='sampleinput.txt')
    pipe.build(data=False,mode='customize',onlyrun=['byosed','sim','train'])
    pipe.configure()
    pipe.glue(['sim','train'])
    pipe.run()
    pipe.build(data=False,mode='customize',onlyrun=['lcfit','getmu','cosmofit'])
    pipe.configure()
    pipe.glue(['train','lcfit'],on='model')
    pipe.glue(['sim','lcfit'],on='phot')
    pipe.glue(['lcfit','getmu'])
    pipe.glue(['getmu','cosmofit'])
    pipe.run()

Running the Pipeline using the runpipe.py utility [batch submission supported]

Currently the runpipe.py utility is under salt3/pipeline/. We plan to pre-install it in the future.

Using runpipe.py

To use the utility, first define the environment variable MY_SALT3_DIR:

export MY_SALT3_DIR='THE_SALT3_DIRECTORY'

Then in the terminal call:

python $MY_SALT3_DIR/SALT3/salt3/pipeline/runpipe.py -[OPTIONS] [OPTVALUES]

To see the currently available options, use

python $MY_SALT3_DIR/SALT3/salt3/pipeline/runpipe.py --help
usage: runpipe.py [-h] [-c PIPEINPUT] [--mypipe MYPIPE]
                  [--batch_mode BATCH_MODE] [--batch_script BATCH_SCRIPT]
                  [--randseed RANDSEED] [--fseeds FSEEDS] [--num NUM]
                  [--norun]

Run SALT3 Pipe.

optional arguments:
  -h, --help            show this help message and exit
  -c PIPEINPUT          pipeline input file
  --mypipe MYPIPE       define your own pipe in yourownfilename.py
  --batch_mode BATCH_MODE
                        >0 to specify how many batch jobs to submit
  --batch_script BATCH_SCRIPT
                        base batch submission script
  --randseed RANDSEED   [internal use] specify randseed for single simulation
  --fseeds FSEEDS       provide a list of randseeds for multiple batch jobs
  --num NUM             [internal use] suffix for multiple batch jobs
  --norun               set to only check configurations without launch jobs
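
When runpipe.py is driven from another script, the command line can be assembled programmatically. The sketch below builds an argument list using only the flags shown in the help text above and the script path given in this section. build_runpipe_command is a hypothetical helper, and the file names in the example call are placeholders.

```python
import os


def build_runpipe_command(pipeinput, mypipe=None, batch_mode=None,
                          batch_script=None, norun=False, salt3_dir=None):
    """Assemble the argument list for a runpipe.py call.

    Hypothetical helper; only flags listed in the --help output are used.
    """
    # Fall back to the MY_SALT3_DIR environment variable described above.
    salt3_dir = salt3_dir or os.environ["MY_SALT3_DIR"]
    script = os.path.join(salt3_dir, "SALT3", "salt3", "pipeline", "runpipe.py")
    cmd = ["python", script, "-c", pipeinput]
    if mypipe is not None:
        cmd += ["--mypipe", mypipe]
    if batch_mode is not None:
        cmd += ["--batch_mode", str(batch_mode)]
    if batch_script is not None:
        cmd += ["--batch_script", batch_script]
    if norun:
        cmd.append("--norun")
    return cmd


# Placeholder directory and input file, used only to show the result.
cmd = build_runpipe_command("sampleinput.txt", norun=True,
                            salt3_dir="/path/to/salt3dir")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Starting with --norun checks the configuration without launching any jobs.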

Define your own pipeline

runpipe.py supports user-defined pipelines.

Write your own pipeline in a MYPIPE.py file (the name can be arbitrary) and use the --mypipe MYPIPE flag when calling the program. Make sure to drop the pipe.run() line; runpipe.py calls and runs the returned pipeline itself. Example MYPIPE.py file:

def MyPipe(finput,**kwargs):
    from pipeline import SALT3pipe
    # write your own pipeline here
    pipe = SALT3pipe(finput)
    pipe.build(data=False,mode='customize',onlyrun=['byosed','sim','train','lcfit'])
    pipe.configure()
    pipe.glue(['sim','train'])
    pipe.glue(['sim','lcfit'])
    return pipe
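
For readers curious how a driver script can pick up a function such as MyPipe from an arbitrary file, the sketch below shows one generic way to do it with the standard importlib machinery. It is not taken from runpipe.py, which may load the file differently. load_mypipe is a hypothetical helper, the stub file stands in for a real MYPIPE.py so the example runs without SALT3 installed, and the randseed keyword is only an illustrative argument.

```python
import importlib.util
import os
import tempfile


def load_mypipe(path, func_name="MyPipe"):
    """Import the Python file at `path` and return its `func_name` callable."""
    spec = importlib.util.spec_from_file_location("user_mypipe", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, func_name)


# Stand-in for a real MYPIPE.py: it returns a dict instead of a SALT3pipe
# object so that this sketch has no dependency on the salt3 package.
STUB = (
    "def MyPipe(finput, **kwargs):\n"
    "    return {'finput': finput, 'kwargs': kwargs}\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "MYPIPE.py")
    with open(path, "w") as fh:
        fh.write(STUB)
    factory = load_mypipe(path)
    result = factory("sampleinput.txt", randseed=12345)

print(result)
```

In a real driver, the object returned by MyPipe would be the configured pipeline, and the driver would call its run method. This is why the user-defined function should return the pipe without running it.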