Cell Ranger alternatives to generate gene-count matrices for 10X data
This count workflow generates gene-count matrices from 10X FASTQ data using methods other than Cell Ranger. If your data start from BCL files, please first run BCL Convert to demultiplex the flowcells and generate FASTQ files.
Prepare input data and import workflow
1. Import count
Import the count workflow to your workspace.
See the Terra documentation for adding a workflow. The count workflow is under Broad Methods Repository with name “cumulus/count”.
Moreover, in the workflow page, click the Export to Workspace... button, and select the workspace to which you want to export the count workflow in the drop-down menu.
2. Prepare a sample sheet
2.1 Sample sheet format:
The sample sheet for the count workflow should be in TSV format, i.e. columns are separated by tabs, not commas. The columns in the TSV can be in any order, but the column names must match the recognized headings.
The sample sheet describes how to identify flowcells and generate channel-specific count matrices.
A brief description of the sample sheet format is listed below (required column headers are shown in bold).
Column | Description |
---|---|
Sample | Contains sample names. Each 10x channel should have a unique sample name. |
Flowcells | Indicates the Google bucket URLs of folder(s) holding FASTQ files of this sample. |
The sample sheet supports sequencing the same 10x channel across multiple flowcells. If a sample is sequenced across multiple flowcells, simply list all of its flowcells separated by commas. In the following example, we have 2 samples sequenced in two flowcells.
Example:
Sample	Flowcells
sample_1	gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4/sample_1_fastqs,gs://fc-e0000000-0000-0000-0000-000000000000/VK10WBC9Z2/sample_1_fastqs
sample_2	gs://fc-e0000000-0000-0000-0000-000000000000/VK18WBC6Z4/sample_2_fastqs

Moreover, if one flowcell of a sample contains multiple FASTQ files for each read, i.e. sequences from multiple lanes, keep your sample sheet unchanged. The count workflow will automatically merge the lanes for the sample before counting.
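Before uploading, it can help to check the sheet locally. The following is a minimal Python sketch, not part of the workflow itself. It assumes that both Sample and Flowcells are required columns and that every flowcell entry is a gs:// URL; the function name check_sample_sheet is hypothetical.

```python
import csv
import io

# Assumption: both headings are required by the workflow.
REQUIRED_COLUMNS = ("Sample", "Flowcells")


def check_sample_sheet(tsv_text):
    """Parse a TSV sample sheet and return {sample name: [flowcell URLs]}.

    Raises ValueError if a required column is missing, a sample name is
    repeated, or a flowcell entry is not a gs:// URL.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")

    samples = {}
    for row in reader:
        name = (row.get("Sample") or "").strip()
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        # Multiple flowcells for one sample are separated by commas.
        urls = [u.strip() for u in (row.get("Flowcells") or "").split(",") if u.strip()]
        if not urls or any(not u.startswith("gs://") for u in urls):
            raise ValueError(f"sample {name!r}: Flowcells must be comma-separated gs:// URLs")
        samples[name] = urls
    return samples


bucket = "gs://fc-e0000000-0000-0000-0000-000000000000"
example = (
    "Sample\tFlowcells\n"
    f"sample_1\t{bucket}/VK18WBC6Z4/sample_1_fastqs,{bucket}/VK10WBC9Z2/sample_1_fastqs\n"
    f"sample_2\t{bucket}/VK18WBC6Z4/sample_2_fastqs\n"
)
parsed = check_sample_sheet(example)
for name, urls in parsed.items():
    print(name, len(urls), "flowcell folder(s)")
```

A sheet that was saved with commas instead of tabs fails the column check, because its whole header is read as a single field.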
2.2 Upload your sample sheet to the workspace bucket:
Use gsutil (you already have it if you’ve installed the Google Cloud SDK) in your Unix terminal to upload your sample sheet to the workspace bucket.
Example:
gsutil cp /foo/bar/projects/sample_sheet.tsv gs://fc-e0000000-0000-0000-0000-000000000000/
3. Launch analysis
In your workspace, open count in the WORKFLOWS tab. Select the desired snapshot version (e.g. latest). Select Process single workflow from files and click the SAVE button. Select Use call caching and click INPUTS. Then fill in appropriate values in the Attribute column. Alternatively, you can upload a JSON file to configure the inputs by clicking Drag or click to upload json.
Once the INPUTS are appropriately filled, click RUN ANALYSIS and then click LAUNCH.
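If you prefer the JSON upload route, the file can be generated with a short script. The sketch below is illustrative only. It assumes that Terra expects each key to be prefixed with the workflow name (here "count"); the helper to_inputs_json is hypothetical, and the input names and example values are the ones listed under Workflow inputs.

```python
import json

# Assumption: input keys in the uploaded JSON take the form "<workflow>.<input>".
WORKFLOW_NAME = "count"

bucket = "gs://fc-e0000000-0000-0000-0000-000000000000"
inputs = {
    "input_tsv_file": f"{bucket}/sample_sheet.tsv",
    "genome": "GRCh38",
    "chemistry": "tenX_v3",
    "output_directory": f"{bucket}/count_result",
    "count_tool": "StarSolo",
}


def to_inputs_json(values, workflow=WORKFLOW_NAME):
    """Return a JSON string whose keys are prefixed with the workflow name."""
    prefixed = {f"{workflow}.{name}": value for name, value in values.items()}
    return json.dumps(prefixed, indent=2, sort_keys=True)


inputs_json = to_inputs_json(inputs)
print(inputs_json)
```

Write the printed text to a file and compare the keys with those shown in the INPUTS tab before uploading it.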
Workflow inputs
Below are the inputs for the count workflow. Note that required inputs are in bold.
Name | Description | Example | Default |
---|---|---|---|
input_tsv_file | Input TSV sample sheet describing metadata of each sample. | “gs://fc-e0000000-0000-0000-0000-000000000000/sample_sheet.tsv” | |
genome | Genome reference name. Currently supported: GRCh38, mm10. | “GRCh38” | |
chemistry | 10x Genomics chemistry name. Currently supported: “tenX_v3” (for V3 chemistry), “tenX_v2” (for V2 chemistry), “dropseq” (for Drop-Seq). | “tenX_v3” | |
output_directory | GS URL of output directory. | “gs://fc-e0000000-0000-0000-0000-000000000000/count_result” | |
run_count | Whether to run count tools to generate gene-count matrices. | true | true |
count_tool | Count tool used to generate results. Options referenced by the tool-specific inputs below: StarSolo, Alevin, Bustools, Optimus. | “StarSolo” | “StarSolo” |
docker_registry | Docker registry to use. Note that the docker image for Bustools is separate. | “quay.io/cumulus” | “quay.io/cumulus” |
config_version | Version of the config docker image to use. This docker image is used for parsing the input sample sheet for downstream execution. | “0.2” | “0.2” |
zones | Google cloud zones to consider for execution. | “us-east1-d us-west1-a us-west1-b” | “us-central1-a us-central1-b us-central1-c us-central1-f us-east1-b us-east1-c us-east1-d us-west1-a us-west1-b us-west1-c” |
num_cpu | Number of CPUs to request for count per channel. Note that when Optimus is used for counting, this input only affects the file-copying steps; Optimus allocates CPUs according to its own strategy. | 32 | 32 |
disk_space | Disk space in GB needed for count per channel. Note that when Optimus is used for counting, this input only affects the file-copying steps; Optimus allocates disk space according to its own strategy. | 500 | 500 |
memory | Memory size in GB needed for count per channel. Note that when Optimus is used for counting, this input only affects the file-copying steps; Optimus allocates memory according to its own strategy. | 120 | 120 |
preemptible | Maximum number of preemptible tries allowed. Note that when Optimus is used for counting, this input only affects the file-copying steps; Optimus sets preemptible tries according to its own strategy. | 2 | 2 |
merge_fastq_memory | Memory size in GB needed for merging FASTQ files per channel. | 32 | 32 |
starsolo_star_version | STAR version to use. Currently only “2.7.3a” is supported. This input only takes effect when count_tool is set to StarSolo. | “2.7.3a” | “2.7.3a” |
alevin_version | Salmon version to use. Currently only “1.1” is supported. This input only takes effect when count_tool is set to Alevin. | “1.1” | “1.1” |
bustools_output_loom | Whether BUSTools generates gene-count matrices in loom format. This input only takes effect when count_tool is set to Bustools. | false | false |
bustools_output_h5ad | Whether BUSTools generates gene-count matrices in h5ad format. This input only takes effect when count_tool is set to Bustools. | false | false |
bustools_docker | Docker image used for Kallisto BUSTools count. This input only takes effect when count_tool is set to Bustools. | “shaleklab/kallisto-bustools” | “shaleklab/kallisto-bustools” |
bustools_version | kb version to use. Currently only “0.24.4” is supported. This input only takes effect when count_tool is set to Bustools. | “0.24.4” | “0.24.4” |
optimus_output_loom | Whether Optimus generates gene-count matrices in loom format. This input only takes effect when count_tool is set to Optimus. | true | true |
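The constraints in the table can be checked before launching a run. The following Python sketch is one possible pre-flight check, not something the workflow provides. It assumes that the four inputs without a default value are the required ones, and that the count_tool options are the four tools named in the tool-specific rows; check_inputs is a hypothetical helper.

```python
# Supported values as listed in the table. The count_tool set is an
# assumption inferred from the tool-specific rows.
SUPPORTED = {
    "genome": {"GRCh38", "mm10"},
    "chemistry": {"tenX_v3", "tenX_v2", "dropseq"},
    "count_tool": {"StarSolo", "Alevin", "Bustools", "Optimus"},
}

# Inputs that only take effect for one count tool.
TOOL_SPECIFIC = {
    "starsolo_star_version": "StarSolo",
    "alevin_version": "Alevin",
    "bustools_output_loom": "Bustools",
    "bustools_output_h5ad": "Bustools",
    "bustools_docker": "Bustools",
    "bustools_version": "Bustools",
    "optimus_output_loom": "Optimus",
}

# Assumption: inputs without a default value are the required ones.
REQUIRED = ("input_tsv_file", "genome", "chemistry", "output_directory")


def check_inputs(values):
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for name in REQUIRED:
        if name not in values:
            problems.append(f"{name} is required")
    for name, allowed in SUPPORTED.items():
        if name in values and values[name] not in allowed:
            problems.append(f"{name}={values[name]!r} is not one of {sorted(allowed)}")
    tool = values.get("count_tool", "StarSolo")  # default taken from the table
    for name, owner in TOOL_SPECIFIC.items():
        if name in values and owner != tool:
            problems.append(f"{name} has no effect unless count_tool is {owner}")
    return problems


bucket = "gs://fc-e0000000-0000-0000-0000-000000000000"
base = {
    "input_tsv_file": f"{bucket}/sample_sheet.tsv",
    "genome": "GRCh38",
    "chemistry": "tenX_v3",
    "output_directory": f"{bucket}/count_result",
}
ok = check_inputs(base)
bad = check_inputs({**base, "genome": "hg19", "alevin_version": "1.1"})
print(ok)
for problem in bad:
    print(problem)
```

Returning a list rather than raising on the first problem lets all issues be reported in one pass.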
Workflow outputs
See the table below for count workflow outputs.
Name | Type | Description |
---|---|---|
output_folder | String | Google Bucket URL of output directory. Within it, each folder is for one sample in the input sample sheet. |