Output files
Markers depth table
A markers depth table is a tab-separated file (i.e. a file using the tab character “\t” as a separator) with a comment line (starting with ‘#’) and a header line. This file can be generated for the entire dataset using the process command, or for specific subsets of markers using the subset and signif commands. For a table generated with process, the comment line indicates the total number of markers in the table; for tables generated with signif or subset, the comment line has the following format:
#source:<signif/subset>;<subset parameters if applicable>;min_depth:<value of --min-depth>;signif_threshold:<value of --signif-threshold>;bonferroni:<true/false>
The first column in the table contains marker IDs, and the second column contains the marker sequences. Each additional column contains the depth of the corresponding marker in a given individual. An example of a markers depth table is given below for 4 markers and 5 individuals (sequences were shortened for readability):
#Number of markers : 4
id sequence individual_1 individual_2 individual_3 individual_4 individual_5
0 TGCA..TATT 0 15 24 17 21
1 TGCA..GACC 20 18 3 26 4
2 TGCA..ATCG 2 1 5 16 0
3 TGCA..CCGA 14 29 23 2 19
In this example, marker “1”, corresponding to the sequence “TGCA..GACC”, has a depth of 20 in individual_1 and 4 in individual_5.
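As a minimal sketch of how such a table could be read, the Python snippet below parses the example above. It assumes a single comment line starting with ‘#’, a header line, and tab-separated columns. The function name read_markers_table and the min_depth value of 10 are illustrative choices and are not part of the software.

```python
import csv
import io

# The example table from above, written with tab separators.
EXAMPLE = (
    "#Number of markers : 4\n"
    "id\tsequence\tindividual_1\tindividual_2\tindividual_3\tindividual_4\tindividual_5\n"
    "0\tTGCA..TATT\t0\t15\t24\t17\t21\n"
    "1\tTGCA..GACC\t20\t18\t3\t26\t4\n"
    "2\tTGCA..ATCG\t2\t1\t5\t16\t0\n"
    "3\tTGCA..CCGA\t14\t29\t23\t2\t19\n"
)


def read_markers_table(handle):
    """Return (comment, markers); markers maps a marker ID to
    (sequence, {individual: depth})."""
    comment = handle.readline().rstrip("\n")  # comment line starting with '#'
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)                     # header line
    individuals = header[2:]                  # columns after id and sequence
    markers = {}
    for row in reader:
        if not row:
            continue
        depths = dict(zip(individuals, map(int, row[2:])))
        markers[row[0]] = (row[1], depths)
    return comment, markers


comment, markers = read_markers_table(io.StringIO(EXAMPLE))
sequence, depths = markers["1"]
print(sequence, depths["individual_1"], depths["individual_5"])

# Hypothetical threshold: number of individuals in which marker "1"
# reaches a depth of at least 10.
min_depth = 10
present = sum(depth >= min_depth for depth in depths.values())
print(present)
```

For a file on disk, the same function can be called on a handle opened with open(path, newline="").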
Distribution of markers between groups
The distribution of markers between groups is a tab-separated file (i.e. a file using the tab character “\t” as a separator) with a header line. This distribution is generated using the distrib command.
The first and second columns indicate the number of individuals from the first and second compared groups in which a marker is present. The third column contains the number of markers present in exactly this combination of individuals from the first and second compared groups. The fourth column contains the p-value of a chi-squared test for association with group, and the fifth column contains the corrected p-value (i.e. the p-value multiplied by the total number of markers in the table). The sixth column indicates whether the p-value is significant after Bonferroni correction. The last column contains the bias between groups, defined as:
(Number of individuals from the first group / Total number of individuals from the first group) - (Number of individuals from the second group / Total number of individuals from the second group)
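The formula above can be sketched in Python as follows. The function name group_bias and its argument names are illustrative, and the group sizes of 3 used in the calls are hypothetical.

```python
def group_bias(n_first, total_first, n_second, total_second):
    """Bias between groups: proportion of individuals carrying the marker
    in the first group minus the same proportion in the second group."""
    return n_first / total_first - n_second / total_second


# Marker present in 0 of 3 individuals of the first group
# and in 1 of 3 individuals of the second group.
print(round(group_bias(0, 3, 1, 3), 3))

# Marker present in all individuals of the first group only.
print(group_bias(3, 3, 0, 3))
```

The bias ranges from -1 (marker present only in all individuals of the second group) to 1 (marker present only in all individuals of the first group).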
An example of a distribution table is given below for 3 individuals from a “Males” group and 3 individuals from a “Females” group:
Males Females Markers P CorrectedP Signif Bias
0 1 7 1 1 False -0.333
0 2 3 0.39 1 False -0.666
0 3 1 0.10 1 False -1.000
1 0 6 1 1 False 0.333
1 1 5 1 1 False 0.000
1 2 1 1 1 False -0.333
1 3 2 0.39 1 False -0.666
2 0 3 0.39 1 False 0.666
2 1 8 1 1 False 0.333
2 2 4 1 1 False 0.000
2 3 2 1 1 False -0.333
3 0 4 0.10 1 False 1.000
3 1 7 0.39 1 False 0.666
3 2 6 1 1 False 0.333
3 3 9 1 1 False 0.000
In this example, there are 68 sequences in total. A sequence is therefore significantly associated with sex if the p-value of a chi-squared test on the number of males and females is lower than 0.05 / 68 = 0.00074 (Bonferroni correction).
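The arithmetic above can be checked with a short Python sketch. The marker counts are copied from the “Markers” column of the example table. The helper names corrected_p and is_significant are illustrative, and capping the corrected p-value at 1 is an assumption made to match the example table, where CorrectedP never exceeds 1.

```python
# Marker counts from the "Markers" column of the example table above.
marker_counts = [7, 3, 1, 6, 5, 1, 2, 3, 8, 4, 2, 4, 7, 6, 9]
alpha = 0.05  # significance level used in the text

total_markers = sum(marker_counts)
threshold = alpha / total_markers
print(total_markers, round(threshold, 5))


def corrected_p(p, n_tests):
    """Bonferroni-corrected p-value (assumption: capped at 1)."""
    return min(1.0, p * n_tests)


def is_significant(p, n_tests, alpha=0.05):
    """True if p is below the Bonferroni-adjusted threshold alpha / n_tests."""
    return p < alpha / n_tests


print(corrected_p(0.39, total_markers), is_significant(0.10, total_markers))
```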
FASTA file
FASTA files are generated by the subset and signif commands for a subset of markers using the parameter --output-fasta.
FASTA headers are generated with the following pattern:
><ID>_<G1>:<G1_C>_<G2>:<G2_C>_p:<P>_pcorr:<P_corrected>_mindepth:<D>
<ID>: marker ID in the markers depth table
<G1>: name of the first compared group
<G1_C>: number of individuals from the first compared group in which the marker is present
<G2>: name of the second compared group
<G2_C>: number of individuals from the second compared group in which the marker is present
<P>: p-value of association with group
<P_corrected>: p-value of association with group corrected with Bonferroni
<D>: minimum depth to consider a marker present in an individual
Example:
>4495827_F:0_M:21_p:1.14577e-07_pcorr:3.64567e-03_mindepth:10
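A header following this pattern could be split back into its fields with a regular expression, as in the sketch below. The names HEADER_RE and parse_header are illustrative. The expression assumes that marker IDs contain no underscore and that group names contain no colon.

```python
import re

# One named group per field of the header pattern described above.
HEADER_RE = re.compile(
    r"^>(?P<id>[^_]+)"
    r"_(?P<g1>.+?):(?P<g1_count>\d+)"
    r"_(?P<g2>.+?):(?P<g2_count>\d+)"
    r"_p:(?P<p>[^_]+)"
    r"_pcorr:(?P<p_corrected>[^_]+)"
    r"_mindepth:(?P<min_depth>\d+)$"
)


def parse_header(line):
    """Parse a FASTA header line into a dictionary of typed fields."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognized header: {line!r}")
    fields = match.groupdict()
    return {
        "id": fields["id"],
        "groups": {
            fields["g1"]: int(fields["g1_count"]),
            fields["g2"]: int(fields["g2_count"]),
        },
        "p": float(fields["p"]),
        "p_corrected": float(fields["p_corrected"]),
        "min_depth": int(fields["min_depth"]),
    }


header = parse_header(">4495827_F:0_M:21_p:1.14577e-07_pcorr:3.64567e-03_mindepth:10")
print(header)
```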
Alignment results
Alignment results from the map command are stored as a tab-separated file (i.e. a file using the tab character “\t” as a separator) with a header line.
The first and second columns indicate the contig and the position on this contig where the marker was aligned, and the third column gives the length of this contig. The fourth column contains the marker ID from the markers depth table. The fifth column contains the bias between groups, as defined in the Distribution of markers between groups section. The sixth and seventh columns contain the p-value and corrected p-value of a chi-squared test for association with group, and the last column indicates whether the corrected p-value is significant.
An example of alignment results is given below:
Contig Position Length Marker_id Bias P CorrectedP Signif
LG03 18366992 36623554 4335174 -0.202 0.073 1 False
LG05 28289991 33792114 4335919 0 1 1 False
LG05 29738230 33792114 4336169 0.149 0.356 1 False
LG22 71119 28810691 4336631 0.159 0.162 1 False
LG15 20142338 30000224 4336732 0 1 1 False
LG02 26668964 31118443 4337320 0 1 1 False
LG03 4463700 36623554 4337383 -0.033 0.973 1 False
LG13 32240045 33409148 4338936 -0.073 0.704 1 False
LG13 19113343 33409148 4340342 0.064 0.479 1 False
LG22 22503191 28810691 4341087 -0.080 0.704 1 False
LG01 17881236 39973033 8678129 -0.736 1.112e-03 1 True
LG01 16475480 39973033 8888270 -0.705 4.773e-03 1 True
LG01 15761951 39973033 8954765 -0.769 2.629e-04 1 True
LG01 16562550 39973033 8990122 -0.736 1.112e-03 1 True
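As an illustration, the sketch below reads a few rows of the example above, assuming tab-separated columns and the header names shown, then counts significant markers per contig. Only four of the example rows are included to keep the snippet short, and the function name read_alignments is illustrative.

```python
import csv
import io
from collections import Counter

# A subset of the example rows above, written with tab separators.
EXAMPLE = (
    "Contig\tPosition\tLength\tMarker_id\tBias\tP\tCorrectedP\tSignif\n"
    "LG03\t18366992\t36623554\t4335174\t-0.202\t0.073\t1\tFalse\n"
    "LG05\t28289991\t33792114\t4335919\t0\t1\t1\tFalse\n"
    "LG01\t17881236\t39973033\t8678129\t-0.736\t1.112e-03\t1\tTrue\n"
    "LG01\t15761951\t39973033\t8954765\t-0.769\t2.629e-04\t1\tTrue\n"
)


def read_alignments(handle):
    """Read alignment results into a list of dictionaries with typed values."""
    rows = []
    for record in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "contig": record["Contig"],
            "position": int(record["Position"]),
            "length": int(record["Length"]),
            "marker_id": record["Marker_id"],
            "bias": float(record["Bias"]),
            "p": float(record["P"]),
            "corrected_p": float(record["CorrectedP"]),
            "signif": record["Signif"] == "True",
        })
    return rows


rows = read_alignments(io.StringIO(EXAMPLE))
significant = [row for row in rows if row["signif"]]
per_contig = Counter(row["contig"] for row in significant)
print(per_contig)
```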
Distribution of markers in the population
The distribution of markers in the population is a tab-separated file (i.e. a file using the tab character “\t” as a separator) with a header line. This distribution is generated using the freq command.
The first column indicates the number of individuals in which a marker is present, and the second column gives the number of markers present in that number of individuals.
An example of a distribution table is given below for a population with 10 individuals:
Frequency Count
1 10389
2 3869
3 2884
4 1824
5 1672
6 1276
7 1261
8 1278
9 1355
10 1291
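A table like the one above lends itself to simple summaries. The sketch below uses the example values to compute the total number of markers and the number of markers present in at least a given number of individuals. The helper name markers_in_at_least is illustrative.

```python
# Frequency -> Count, copied from the example table above.
counts = {
    1: 10389, 2: 3869, 3: 2884, 4: 1824, 5: 1672,
    6: 1276, 7: 1261, 8: 1278, 9: 1355, 10: 1291,
}

total = sum(counts.values())


def markers_in_at_least(k):
    """Number of markers present in k or more individuals."""
    return sum(count for frequency, count in counts.items() if frequency >= k)


print(total)
print(markers_in_at_least(8))
# Fraction of markers found in every individual of the population.
print(counts[10] / total)
```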
Distribution of marker depth in each individual
The distribution of marker depth in each individual is a tab-separated file (i.e. a file using the tab character “\t” as a separator) with a header line. This distribution is generated using the depth command.
The first and second columns contain the ID and group of each individual. The third column indicates the total number of reads in the individual. The fourth and fifth columns indicate the total number of markers in the individual and the number of markers retained to compute the marker depth statistics (i.e. markers present in at least 75% of individuals). The last four columns give the minimum, maximum, median, and average depth of a retained marker in the individual.
An example of a depth distribution table is given below for a population with 10 individuals and two groups (M and F):
Individual Group Reads Markers Retained Min_depth Max_depth Median_depth Average_depth
SRR1519834 M 3929067 669084 72938 0 60604 60 71
SRR1519837 M 6018684 963531 72938 0 48628 44 53
SRR1519830 F 4844480 818700 72938 0 35358 54 72
SRR1519853 M 3462244 502028 72938 0 27276 28 33
SRR1519824 F 3518348 604081 72938 0 23912 21 27
SRR1519819 F 3815684 622309 72938 0 36001 24 32
SRR1519846 M 4731003 758814 72938 0 30307 31 36
SRR1519829 F 6928277 909117 72938 0 64723 45 61
SRR1519812 F 7547724 1165312 72938 0 44358 36 46
SRR1519862 M 5948867 945346 72938 0 64356 69 81
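As a sketch of how this table might be summarized, the snippet below reads four of the example rows, assuming tab-separated columns and the header names shown. It computes the mean of the median depths per group and, for each individual, the fraction of its markers that were retained. The variable names are illustrative.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Four of the example rows above (two per group), written with tab separators.
EXAMPLE = (
    "Individual\tGroup\tReads\tMarkers\tRetained\tMin_depth\tMax_depth\tMedian_depth\tAverage_depth\n"
    "SRR1519834\tM\t3929067\t669084\t72938\t0\t60604\t60\t71\n"
    "SRR1519837\tM\t6018684\t963531\t72938\t0\t48628\t44\t53\n"
    "SRR1519830\tF\t4844480\t818700\t72938\t0\t35358\t54\t72\n"
    "SRR1519824\tF\t3518348\t604081\t72938\t0\t23912\t21\t27\n"
)

medians_by_group = defaultdict(list)
retained_fraction = {}
for record in csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"):
    medians_by_group[record["Group"]].append(int(record["Median_depth"]))
    # Share of the individual's markers that were kept for depth statistics.
    retained_fraction[record["Individual"]] = (
        int(record["Retained"]) / int(record["Markers"])
    )

mean_median = {group: mean(values) for group, values in medians_by_group.items()}
print(mean_median)
```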