Getting started

Installing RADSex

Requirements

  • A C++11 compliant compiler (GCC >= 4.8.1, Clang >= 3.3)

  • The zlib library (usually installed on Linux by default)

Installation

There are three ways to install RADSex:

1. Install the latest release

  • Download the latest release from GitHub

  • Unzip the archive

  • Navigate to the RADSex directory

  • Run make

The compiled radsex binary will be located in RADSex/bin/.

2. Install the latest stable development version

To install the latest stable version of RADSex directly from the GitHub repository, run the following commands:

git clone https://github.com/SexGenomicsToolkit/RADSex.git
cd RADSex
make

The compiled radsex binary will be located in RADSex/bin/.

3. Install RADSex with conda

RADSex is available in Bioconda. To install RADSex with Conda, run the following command:

conda install -c bioconda radsex

Update RADSex

To update RADSex, you can download the latest stable release and install it as described in the Installation section.

If you installed RADSex directly from the GitHub repository, update RADSex by running the following commands from the RADSex directory:

git pull
make rebuild

If you installed RADSex with Conda, run:

conda update -c bioconda radsex

Before starting

Before running the pipeline, you should prepare the following files:

  • A set of demultiplexed reads. The current version of RADSex does not implement demultiplexing. Raw sequencing reads can be demultiplexed using Stacks or pyRAD.

  • A group information file (popmap): a tabulated file with individual ID as the first column and group as the second column. It is important that the individual IDs in the popmap are the same as the names of the demultiplexed reads files (see the Group info section).

  • To align markers to a genome: the genome file in fasta format.
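The requirement that popmap IDs match the names of the read files can be checked before running the pipeline. The following is a minimal sketch, not part of RADSex: the function names are hypothetical, and it assumes that an individual ID is the read file name up to its first dot (for instance ind1 for ind1.fastq.gz). Adjust sample_id() if your files are named differently.

```python
import csv
from pathlib import Path


def sample_id(path):
    """Return the file name up to its first dot (assumed naming convention)."""
    return Path(path).name.split(".", 1)[0]


def read_popmap(popmap_path):
    """Read a two-column, tab-separated popmap into {individual_id: group}."""
    groups = {}
    with open(popmap_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 2:
                continue  # skip blank or incomplete lines
            groups[row[0]] = row[1]
    return groups


def check_popmap(popmap_path, reads_dir):
    """Return (IDs only in the popmap, IDs only in the reads directory)."""
    popmap_ids = set(read_popmap(popmap_path))
    read_ids = {sample_id(p) for p in Path(reads_dir).iterdir() if p.is_file()}
    return sorted(popmap_ids - read_ids), sorted(read_ids - popmap_ids)
```

Calling check_popmap("popmap.tsv", "./samples") returns two lists; both should be empty when the popmap and the reads directory agree.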

Note

When visualizing map results with sgtr, linkage groups / chromosomes are automatically inferred from scaffold names in the reference genome if their name starts with LG, CHR, or NC (case-insensitive). If chromosomes are named differently in the genome, you should prepare a tabulated file with reference contig ID in the first column and corresponding chromosome name in the second column (see the Chromosomes file section).
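To decide whether a chromosomes file is needed, the contig names in the genome can be screened against the prefix rule described in this note. The sketch below only mimics that rule as stated here; it is not sgtr's implementation, the function names are hypothetical, and it assumes the contig ID is the first whitespace-delimited word of each fasta header.

```python
def is_auto_detected(contig_id):
    """True if the name starts with LG, CHR, or NC, ignoring case."""
    return contig_id.upper().startswith(("LG", "CHR", "NC"))


def fasta_contig_ids(fasta_path):
    """Return contig IDs (assumed: first word of each '>' header line)."""
    ids = []
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    ids.append(fields[0])
    return ids


def contigs_not_auto_detected(fasta_path):
    """List contigs whose names do not match the LG / CHR / NC prefix rule."""
    return [c for c in fasta_contig_ids(fasta_path) if not is_auto_detected(c)]
```

The returned list is only a set of candidates to review: unplaced scaffolds will appear in it as well, and only the contigs that are actual chromosomes would need an entry in the chromosomes file.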

Running RADSex

Computing the markers depth table

The first step of RADSex is to create a table of marker depths for the entire dataset using the process command:

radsex process --input-dir ./samples --output-file markers_table.tsv --threads 16 --min-depth 1

In this example, demultiplexed reads are located in ./samples and the markers table generated by process will be saved to markers_table.tsv. The parameter --threads specifies the number of threads to use, and --min-depth specifies the minimum depth to consider a marker present in an individual: markers which are not present with depth higher than this value in at least one individual will not be retained in the markers table. It is advised to keep the minimum depth to the default value of 1 for this step, as it can be adjusted for each analysis later.

The resulting file markers_table.tsv is a tabulated file described in the Markers depth table section.

Computing the distribution of markers between groups

The distrib command computes the distribution of markers between groups from a markers depth table:

radsex distrib --markers-table markers_table.tsv --output-file distribution.tsv --popmap popmap.tsv --min-depth 5 --groups M,F

In this example, --markers-table is the table generated in the Computing the markers depth table section, and the distribution of markers between groups will be saved to distribution.tsv. The group of each individual in the population is given by popmap.tsv (see the Group info section). Groups of individuals to compare (as defined in popmap.tsv) are specified manually with the parameter --groups. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.

The resulting file distribution.tsv is a table described in the Distribution of markers between groups section.

This distribution can be visualized with the radsex_distrib() function of sgtr, which generates a tile plot of marker counts with number of males on the x-axis and number of females on the y-axis.
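The tallying behind this distribution can be illustrated with a small sketch. It is a toy model under stated assumptions, not RADSex's implementation: markers are held in memory as a hypothetical dictionary of per-individual depths rather than read from markers_table.tsv, and a marker is treated as present in an individual when its depth is at least the minimum depth, as described above.

```python
from collections import Counter


def is_present(depth, min_depth):
    """A marker is present when its depth is not lower than min_depth."""
    return depth >= min_depth


def distribution(marker_depths, popmap, groups, min_depth):
    """Count markers for each (n present in group 1, n present in group 2).

    marker_depths: {marker_id: {individual_id: depth}} (toy in-memory layout)
    popmap:        {individual_id: group}
    groups:        the two group labels to compare, e.g. ("M", "F")
    """
    group1, group2 = groups
    counts = Counter()
    for depths in marker_depths.values():
        n1 = sum(1 for ind, d in depths.items()
                 if popmap.get(ind) == group1 and is_present(d, min_depth))
        n2 = sum(1 for ind, d in depths.items()
                 if popmap.get(ind) == group2 and is_present(d, min_depth))
        counts[(n1, n2)] += 1
    return counts
```

With a minimum depth of 5, a marker with depths 8 and 6 in two males and 0 and 1 in two females is tallied under (2, 0); each key of the returned counter corresponds to one tile of the plot.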

Extracting markers significantly associated with sex

Markers significantly associated with sex are obtained with the signif command:

radsex signif --markers-table markers_table.tsv --output-file markers.tsv --popmap popmap.tsv --min-depth 5 --groups M,F

In this example, --markers-table is the table generated in the Computing the markers depth table section, and markers significantly associated with sex are saved to markers.tsv. The sex of each individual in the population is given by popmap.tsv (see the Group info section). Groups of individuals to compare (as defined in popmap.tsv) are specified manually with the parameter --groups. The minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual.

By default, the signif command generates an output file in the same format as the markers depth table. Markers can also be exported to a fasta file using the --output-fasta parameter (see the Fasta file section).

The markers table generated by signif can be visualized with the radsex_markers_depth() function of sgtr, which generates a heatmap showing the depth of each marker in each individual.

Aligning markers to a genome

Markers can be aligned to a genome using the map command:

radsex map --markers-file markers_table.tsv --output-file alignment_results.tsv --popmap popmap.tsv --genome-file genome.fasta --min-quality 20 --min-frequency 0.1 --min-depth 5 --groups M,F

In this example, --markers-file is the markers depth table generated in the Computing the markers depth table step, and the path to the reference genome file is given by --genome-file; results are saved to alignment_results.tsv. The sex of each individual in the population is given by popmap.tsv (see the Group info section), and the minimum depth to consider a marker present in an individual is set to 5, meaning that markers with depth lower than 5 in an individual will not be considered present in this individual. Groups of individuals to compare (as defined in popmap.tsv) are specified manually with the parameter --groups.

The parameter --min-quality specifies the minimum mapping quality (as defined in BWA) to consider a marker properly aligned and is set to 20 in this example. The parameter --min-frequency specifies the minimum frequency of a marker in the population to retain this marker and is set to 0.1 here, meaning that only sequences present in at least 10% of individuals of the population are aligned to the genome.
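The effect of --min-frequency can be sketched as a simple filter. This is an illustration only: the function names are hypothetical, and it assumes that the frequency is the fraction of all individuals in which the marker is present, with presence decided by the minimum depth.

```python
def marker_frequency(depths, min_depth):
    """Fraction of individuals in which the marker reaches min_depth."""
    if not depths:
        return 0.0
    present = sum(1 for d in depths if d >= min_depth)
    return present / len(depths)


def passes_min_frequency(depths, min_depth, min_frequency):
    """True if the marker is present in at least min_frequency of individuals."""
    return marker_frequency(depths, min_depth) >= min_frequency
```

For a population of 20 individuals and --min-frequency 0.1, a marker present in 2 individuals is kept under this model, while a marker present in only 1 is not aligned.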

The resulting file alignment_results.tsv is a table described in the Alignment results section.

Alignment results from map can be visualized with the radsex_map_circos() function of sgtr, which generates a circular plot showing bias and association with sex for each marker aligned to the genome. The same data can be shown in a manhattan plot with the radsex_map_manhattan() function.

Alignment results for a specific contig can be visualized with the radsex_map_region() function to show the same metrics for a single contig.