Cumulus WDL workflows and Dockerfiles
All of our Docker images are publicly available on Quay and Docker Hub. Our workflows use Quay as the default Docker registry. To use Docker Hub instead, enter cumulusprod for the workflow input “docker_registry”; you can also enter a custom registry name of your own choice.
If you use Cumulus in your research, please consider citing:
Li, B., Gould, J., Yang, Y. et al. “Cumulus provides cloud-based data analysis for large-scale single-cell and single-nucleus RNA-seq”. Nat Methods 17, 793–798 (2020). https://doi.org/10.1038/s41592-020-0905-x
Release Highlights in Current Stable
2.0.0 March 14, 2022
Overall:
Cumulus workflows are now released on Dockstore:
- Add the tutorial on importing Cumulus workflows to Terra.
- Archive the legacy versions on Broad Method Registry.
Add support for multiple platforms via the backend input: gcp for Google Cloud, aws for Amazon AWS, and local for a local machine. Google Cloud support is enabled by default.
For the Amazon AWS backend, add the awsMaxRetries input to set the maximum number of retries allowed for job execution at runtime. The default is 5.
Update the command-line job submission tutorial to work with Altocumulus v2.0.0 or later.
On Examples:
- Update gene expression, hashing and CITE-Seq example tutorial.
- Add tutorial on 10x CellPlex analysis using Cumulus workflows on Cloud.
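The backend, awsMaxRetries and docker_registry inputs described above could be assembled into a workflow inputs file along the following lines. This is a minimal sketch, not part of Cumulus: the helper name build_inputs and the workflow name cellranger_workflow are placeholders chosen for illustration, and the "workflow.input" key form assumes the usual WDL inputs JSON convention. Only the input names and their values (gcp, aws, local, the default of 5, and cumulusprod) come from the notes above.

```python
import json

# Backend values listed in the release notes.
VALID_BACKENDS = {"gcp", "aws", "local"}


def build_inputs(workflow_name, backend="gcp", aws_max_retries=5, docker_registry=None):
    """Build a dict of workflow inputs (hypothetical helper, for illustration only)."""
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(VALID_BACKENDS)}, got {backend!r}")

    inputs = {f"{workflow_name}.backend": backend}

    # awsMaxRetries is described only for the Amazon AWS backend.
    if backend == "aws":
        inputs[f"{workflow_name}.awsMaxRetries"] = aws_max_retries

    # Leave docker_registry unset to keep the workflow default (Quay).
    if docker_registry is not None:
        inputs[f"{workflow_name}.docker_registry"] = docker_registry

    return inputs


if __name__ == "__main__":
    # "cellranger_workflow" is an assumed name; substitute the workflow you actually run.
    example = build_inputs("cellranger_workflow", backend="aws", docker_registry="cumulusprod")
    print(json.dumps(example, indent=2))
```

Check the exact input names against the documentation of the specific workflow before submitting a job.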
Workflow-specific:
Add STARsolo_create_reference workflow to build genome references for STARsolo counting. See its documentation for details.
On Cellranger workflow:
- Add support for 10x Cell Ranger versions 6.1.1 and 6.1.2, and use 6.1.2 by default. See the Cell Ranger v6.1 release notes.
- Add support for 10x Cell Ranger ARC version 2.0.1, and use it by default. See the Cell Ranger ARC v2.0 release notes.
- Upgrade cumulus_feature_barcoding to version 0.7.0 to allow manually setting the barcode starting position (via input crispr_barcode_pos).
- Add support for non-10x CRISPR assays. See the description of the crispr DataType value in this section for details.
- For input data consisting of fastq files, handle both flat (all fastq files in one folder) and nested (one subfolder per sample listed in the input sample sheet) folder structures.
- Add fastq_outputs to workflow output, which contains mkfastq step output folders for samples listed in the input sample sheet.
- Add count_outputs to workflow output, which contains count step output folders for samples listed in the input sample sheet.
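The flat and nested fastq folder layouts mentioned above can be told apart with a small directory check. The sketch below is an illustration only and is not how the workflow itself resolves files: the function name find_sample_fastqs is hypothetical, and it assumes that in the flat layout each file name starts with the sample name followed by an underscore (as in the common Illumina naming scheme).

```python
from pathlib import Path

# File name endings treated as fastq files in this sketch.
FASTQ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def is_fastq(path):
    """Return True if path is a regular file with a fastq-like name."""
    return path.is_file() and path.name.endswith(FASTQ_SUFFIXES)


def find_sample_fastqs(root, sample):
    """Collect fastq files for one sample from a flat or nested folder.

    Nested: root/<sample>/ holds the files for that sample.
    Flat:   all files sit directly in root; this sketch ASSUMES their
            names start with "<sample>_".
    """
    root = Path(root)
    nested_dir = root / sample

    if nested_dir.is_dir():
        # Nested layout: take every fastq file inside the sample subfolder.
        return sorted(p for p in nested_dir.iterdir() if is_fastq(p))

    # Flat layout: filter files in root by the assumed name prefix.
    prefix = sample + "_"
    return sorted(p for p in root.iterdir() if is_fastq(p) and p.name.startswith(prefix))
```

Matching on the prefix with a trailing underscore keeps a sample named s2 from also picking up files that belong to a sample named s20.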
On Spaceranger workflow:
Add support for 10x Space Ranger versions 1.3.0 and 1.3.1, and use 1.3.1 by default. See the Space Ranger v1.3 release notes.
For input data consisting of fastq files, handle both flat (all fastq files in one folder) and nested (one subfolder per library) folder structures.
Add output section for the workflow. See here for details.
Retire old genome references:
- Keep GRCh38-2020-A and mm10-2020-A.
- Retire GRCh38, mm10, GRCh38-2020-A-premrna and mm10-2020-A-premrna. Users can still reach out to the Cumulus team to ask for URIs to these old references, but they are not provided by default.
In the description of the ReorientImages field of the input sample sheet, add information on its valid values.
On STARsolo workflow:
Add support for STAR version 2.7.9a, and use it by default. See the STAR v2.7.9a release notes.
Reorganize the workflow by exposing more inputs to users.
Add support for more protocols: 10x multiome, 10x 5’ (both SC5P-R2 and SC5P-PE), Slide-Seq and Share-Seq. See here (./starsolo.html#prepare-a-sample-sheet) for details.
Use input read1_fastq_pattern and read2_fastq_pattern to support fastq files generated by Cell Ranger or SeqWell, as well as Sequence Read Archive (SRA) data.
For input data consisting of fastq files, handle both flat (all fastq files in one folder) and nested (one subfolder per library) folder structures.
Do not attach a filename prefix to output files, to avoid the incorrect SJ raw feature.tsv symlink error, which would cause folder delocalization to fail (see the discussion with the STAR team).
Add the STAR log file to workflow output. This is the Log.out file produced when running STAR locally; it can be used for tracking the process and for sharing with the STAR team when opening an issue there.
Retire old genome references:
- Keep GRCh38-2020-A, mm10-2020-A, and GRCh38-and-mm10-2020-A.
- Retire old references listed here. Users can still reach out to the Cumulus team to ask for URIs to them, but they are not provided by default.
On Demultiplexing workflow:
- Upgrade demuxEM to version 0.1.7 for bug fix.
On Cellranger_create_reference workflow:
- Add the generated reference file to the workflow output.
- Fix a bug in the use of the memory input.
- Update documentation to suggest only using Cell Ranger version 6.1.1 or later for building references, as v6.0.1 has issues that leave the job running without terminating.
On Cellranger_atac_create_reference workflow:
- Add the generated reference file to the workflow output.
On Cellranger_vdj_create_reference workflow:
- Add the generated reference file to the workflow output.