Nanostring GeoMx DSP

This section contains two workflows: geomxngs_fastq_to_dcc and geomxngs_dcc_to_count_matrix.

The geomxngs_fastq_to_dcc workflow wraps the Nanostring GeoMx Digital Spatial NGS Pipeline and converts FASTQ files into DCC files.

The geomxngs_dcc_to_count_matrix workflow takes the DCC zip file from geomxngs_fastq_to_dcc, together with other files produced by the GeoMx DSP machine, as inputs and outputs an area of illumination (AOI) by probe count matrix with pathologists’ annotation.


Convert FASTQ files into DCC files by the Nanostring GeoMx Digital Spatial NGS Pipeline

The geomxngs_fastq_to_dcc workflow converts FASTQ files to DCC files by wrapping the Nanostring GeoMx Digital Spatial NGS Pipeline. After generating DCC files, use the geomxngs_dcc_to_count_matrix workflow to generate an area of illumination (AOI) by probe count matrix.

Workflow Input

Relevant workflow inputs are described below (required inputs in bold).

| Name | Description | Example | Default |
|---|---|---|---|
| **fastq_directory** | FASTQ directory URL | “gs://foo/bar/fastqs” or “s3://foo/bar/fastqs” | |
| **ini** | Configuration file in INI format, containing pipeline processing parameters | “gs://foo/bar/config.ini” | |
| **output_directory** | URL to write results | “gs://foo/bar/out” or “s3://foo/bar/out” | |
| fastq_rename | Optional two-column TSV file with no header, used to map original FASTQ names to FASTQ names that GeoMx recognizes | “gs://foo/bar/fastq_rename.tsv” | |
| delete_fastq_directory | Whether to delete the input FASTQs upon successful completion | true | false |
| geomxngs_version | Version of the GeoMx software; currently only “2.3.3.10” is available | “2.3.3.10” | “2.3.3.10” |
| docker_registry | Docker registry to use for this workflow. Options: “quay.io/cumulus” for images on Red Hat registry; “cumulusprod” for backup images on Docker Hub | “quay.io/cumulus” | “quay.io/cumulus” |
| backend | Backend for computation. Available options: “gcp” for Google Cloud; “aws” for Amazon AWS; “local” for local machine | “aws” | “gcp” |
| zones | Google Cloud zones | “us-central1-a” | “us-central1-a us-central1-b us-central1-c us-central1-f” |
| preemptible | Number of preemptible tries | 2 | 2 |
| memory | Memory string | “64GB” | “64GB” |
| cpu | Number of CPUs | 4 | 4 |
| disk_space | Disk space in GB | 500 | 500 |
| aws_queue_arn | The ARN of the AWS job queue to be used. Only takes effect when backend is “aws” | “arn:aws:batch:us-east-1:xxx:job-queue/priority-gwf” | “” |
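
To show how these inputs fit together, here is a minimal Python sketch that assembles an inputs document for geomxngs_fastq_to_dcc. It assumes the workflow is launched from a JSON inputs file whose keys are prefixed with the workflow name (a common WDL convention that this section does not state). The helper name build_fastq_to_dcc_inputs is hypothetical, and the URLs are the placeholder examples from the table above.

```python
import json

WORKFLOW = "geomxngs_fastq_to_dcc"


def build_fastq_to_dcc_inputs(fastq_directory, ini, output_directory, **optional):
    """Return workflow inputs keyed as '<workflow>.<input name>'.

    The three positional arguments are the inputs listed without a default;
    any other input from the table can be passed as a keyword argument.
    """
    inputs = {
        "fastq_directory": fastq_directory,
        "ini": ini,
        "output_directory": output_directory,
    }
    inputs.update(optional)
    # Assumption: keys are namespaced by workflow name, as in WDL inputs JSON.
    return {f"{WORKFLOW}.{name}": value for name, value in inputs.items()}


inputs = build_fastq_to_dcc_inputs(
    "gs://foo/bar/fastqs",
    "gs://foo/bar/config.ini",
    "gs://foo/bar/out",
    backend="gcp",
    delete_fastq_directory=False,
)

# Print the JSON document; redirect it to a file to reuse it when launching.
print(json.dumps(inputs, indent=2))
```

Inputs that are left out keep the defaults listed in the table, so only values that differ from the defaults need to be passed.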

Workflow Output

| Name | Description | Type |
|---|---|---|
| dcc_zip | URL to the output DCC zip file | String |
| geomxngs_output | URL to the output of geomxngspipeline; the DCC zip file is part of the output here | String |


Generate probe count matrix with pathologists’ annotation

The geomxngs_dcc_to_count_matrix workflow generates an area of illumination (AOI) by probe count matrix with pathologists’ annotation from the output of the geomxngs_fastq_to_dcc workflow and user inputs.

Workflow Input

Workflow inputs are described below (required inputs in bold).

| Name | Description | Example | Default |
|---|---|---|---|
| **dcc_zip** | DCC zip file from the geomxngs_fastq_to_dcc workflow output | “gs://foo/bar/out/DCC-20221001.zip” | |
| **ini** | Configuration file in INI format, containing pipeline processing parameters | “gs://foo/bar/config.ini” | |
| **lab_worksheet** | A text file containing library setups | “gs://foo/bar/LabWorksheet.txt” | |
| **dataset** | Data QC and annotation file (Excel) downloaded from the instrument after uploading the DCC zip file; only the first tab (SegmentProperties) is used | “gs://foo/bar/BioprobeQC.xlsx” | |
| **pkc** | GeoMx DSP configuration file to associate assay targets with GeoMx HybCode barcodes and Seq Code primers. Options: CTA_v1.0-4 for Cancer Transcriptome Atlas; COVID-19_v1.0 for COVID-19 Immune Response Atlas; Human_WTA_v1.0 for Human Whole Transcriptome Atlas; Mouse_WTA_v1.0 for Mouse Whole Transcriptome Atlas. If your configuration file is not listed, you can provide a URL to a PKC zip file or PKC file instead | “Human_WTA_v1.0” | |
| **output_directory** | URL to write results | “gs://foo/bar/out” or “s3://foo/bar/out” | |
| backend | Backend for computation. Available options: “gcp” for Google Cloud; “aws” for Amazon AWS; “local” for local machine | “aws” | “gcp” |
| docker_registry | Docker registry to use for this workflow. Options: “quay.io/cumulus” for images on Red Hat registry; “cumulusprod” for backup images on Docker Hub | “quay.io/cumulus” | “quay.io/cumulus” |
| docker_version | Docker image version | “1.0.0” | “1.0.0” |
| preemptible | Number of preemptible tries | 2 | 2 |
| memory | Memory string | “8GB” | “8GB” |
| cpu | Number of CPUs | 1 | 1 |
| extra_disk_space | Extra disk space in GB | 5 | 5 |
| aws_queue_arn | The ARN of the AWS job queue to be used. Only takes effect when backend is “aws” | “arn:aws:batch:us-east-1:xxx:job-queue/priority-gwf” | “” |
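
Because several of these inputs accept only specific values, it can help to check them before launching. The following Python sketch is one possible pre-flight check, not part of the workflow itself. The function name check_inputs is hypothetical. It treats every input listed without a default as required, and it assumes that a custom pkc URL ends in ".pkc" or ".zip"; neither rule is spelled out by the workflow documentation.

```python
KNOWN_PKC = {"CTA_v1.0-4", "COVID-19_v1.0", "Human_WTA_v1.0", "Mouse_WTA_v1.0"}
BACKENDS = {"gcp", "aws", "local"}
# Assumption: inputs listed without a default value are required.
REQUIRED = ("dcc_zip", "ini", "lab_worksheet", "dataset", "pkc", "output_directory")


def check_inputs(inputs):
    """Return a list of human-readable problems found in a dict of inputs."""
    problems = []
    for name in REQUIRED:
        if not inputs.get(name):
            problems.append(f"missing required input: {name}")

    # pkc is either one of the listed names or a URL to a PKC (zip) file.
    pkc = inputs.get("pkc", "")
    if pkc and pkc not in KNOWN_PKC and not pkc.lower().endswith((".pkc", ".zip")):
        problems.append(f"unrecognized pkc value: {pkc}")

    backend = inputs.get("backend", "gcp")  # "gcp" is the documented default
    if backend not in BACKENDS:
        problems.append(f"unknown backend: {backend}")
    if inputs.get("aws_queue_arn") and backend != "aws":
        problems.append("aws_queue_arn is only used when backend is aws")
    return problems


good = {
    "dcc_zip": "gs://foo/bar/out/DCC-20221001.zip",
    "ini": "gs://foo/bar/config.ini",
    "lab_worksheet": "gs://foo/bar/LabWorksheet.txt",
    "dataset": "gs://foo/bar/BioprobeQC.xlsx",
    "pkc": "Human_WTA_v1.0",
    "output_directory": "gs://foo/bar/out",
}
bad = dict(good, pkc="Rat_WTA", backend="azure")  # deliberately invalid values

print(check_inputs(good))
print(check_inputs(bad))
```

The dcc_zip value here is the example URL from the table; in practice it would be the dcc_zip output reported by geomxngs_fastq_to_dcc.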

Workflow Output

| Name | Description | Type |
|---|---|---|
| count_matrix_h5ad | URL to a count matrix in h5ad format. X contains the count matrix, obs contains AOI information, and var contains probe metadata | String |
| count_matrix_text | URL to a count matrix in text format. Each row is one probe and each column is one AOI. The first column is RTS_ID (Readout Tag Sequence ID). The second column is Gene (probes that map to the same gene share the same value). The third column is Probe (probes that map to the same gene have distinct values, e.g. control_1, control_2). Counts start at column 4 | String |
| count_matrix_metadata | URL to the count matrix metadata in text format. All columns from the dataset file are included; each row describes one AOI (area of illumination) | String |
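
The column layout described for count_matrix_text can be read with the Python standard library alone. The sketch below is illustrative and rests on several assumptions: the file is tab-delimited (the description above says only "text format"), counts are numeric, and the sample rows, RTS IDs, gene names, and AOI names are invented for the example. The function name read_count_matrix is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Invented sample following the documented layout: RTS_ID, Gene, Probe, then AOIs.
SAMPLE = (
    "RTS_ID\tGene\tProbe\tAOI_1\tAOI_2\n"
    "RTS0001\tGeneA\tGeneA\t10\t3\n"
    "RTS0002\tGeneB\tGeneB_1\t1\t0\n"
    "RTS0003\tGeneB\tGeneB_2\t2\t5\n"
)


def read_count_matrix(handle, delimiter="\t"):
    """Split a probe-by-AOI text matrix into AOI names, probe info, and counts."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    aoi_names = header[3:]  # counts start at column 4
    probes, counts = [], []
    for row in reader:
        probes.append({"RTS_ID": row[0], "Gene": row[1], "Probe": row[2]})
        counts.append([float(value) for value in row[3:]])
    return aoi_names, probes, counts


aoi_names, probes, counts = read_count_matrix(io.StringIO(SAMPLE))

# Group probe names by gene: several probes may map to the same gene.
probes_by_gene = defaultdict(list)
for probe in probes:
    probes_by_gene[probe["Gene"]].append(probe["Probe"])

# Sum counts down each column to get a total per AOI.
totals = [sum(column) for column in zip(*counts)]

print(aoi_names)
print(dict(probes_by_gene))
print(totals)
```

For a real file, replace io.StringIO(SAMPLE) with an open file handle on a local copy of count_matrix_text.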