Output

 

resquiggle

image

f5c resquiggle aligns raw signals to basecalled reads. f5c resquiggle output is explained below.

resquiggle TSV output

The default output is an intuitive TSV format with the following columns.

Col Type Name Description
1 string read_id Read identifier name
2 int kmer_idx k-mer index on the basecalled read (0-based)
3 int start_raw_idx Raw signal start index for the corresponding k-mer (0-based; BED-like; closed)
4 int end_raw_idx Raw signal end index for the corresponding k-mer (0-based; BED-like; open)

If a corresponding base has no corresponding signal samples, a . will be printed.

resquiggle PAF output

The above TSV output is bulky. Specifying -c will produce a condensed output in PAF-like format (inspired by UNCALLED) with the following columns: The query is the raw-signal and the target is the basecalled-read.

Col Type Name Description
1 string read_id Read identifier name
2 int len_raw_signal Raw signal length (number of samples)
3 int start_raw Raw signal start index (0-based; BED-like; closed)
4 int end_raw Raw signal end index (0-based; BED-like; open)
5 char strand Relative strand: “+” or “-“
6 string read_id Same as column 1
7 int len_kmer base-called sequence length (no. of k-mers)
8 int start_kmer k-mer start index on basecalled sequence (0-based; see note below)
9 int end_kmer k-mer end index on basecalled sequence (0-based; see note below)
10 int matches Number of k-mers matched on basecalled sequence
11 int len_block Same as column 7
12 int mapq Mapping quality (0-255; 255 for missing)

For DNA reads, column 8 is a closed coordinate (inclusive) and column 9 is an open coordinate (not-inclusive); and, column 8 coordinate is always smaller than column 9. However, this is different in direct-RNA reads because sequencing of direct-RNA happens in the reverse direction (3’->5’) and therefore the raw signal is also in the reverse direction. The basecalled read output by basecallers are however in the correct direction (5’->3’). Thus, For RNA reads, column 8 coordinate will be larger than that on column 9. column 9 is a closed coordinate (inclusive) while column 8 is an open coordinate (not-inclusive) in this case, in contrary to DNA.

10,11 and 12 are yet to be decided. column 12 is always 255 for now.

Following optional tags are present, which are subsequently described in detail after the table:

Tag Type Description
sc f Post alignment recalibrated scale parameter
sh f Post alignment recalibrated shift parameter
ss Z signal alignment string in format

The sh and sc tag values can be used to scale the raw signal to the pore model. This can be done as: scaled pA current values = (pA - sh) / sc where pA = (raw_signal + offset) * range / digitisation)).

ss tag

ss string is a custom encoding that compacts the signal-base alignment. It can be thought of as an extended version CIGAR string that accommodates for signal alignment needs. This ss string was inspired by the –sam option in Nanopolish for eventalign.

Consider the example 8,5,4,8I4,3D4,5, for DNA. This means 8 signal samples map to the starting base of the sequence; the next 5 samples to the next base, the next 4 samples to the next base; 8 next samples are missing a mapping in the basecalled read (insertion to reference); 4 samples map to the next base; 3 bases in the basecalled read have no corresponding signal samples (deletion); 4 samples map to the next base; and 5 samples map to the next base. Note that the start indexes of read and signal are the absolute values in columns 8 and 3 above. the ss string is relative to this.

The ,, D and I can be though of as three different operations. The , after a number means step one base in the basecall reference, while step number of samples preceding the , in the raw signal. The D after a number means step number of bases preceding the D in the basecalled read and no stepping in the raw signal. The I after a number means step number of samples preceding the I in the raw signal and no stepping in the basecalled read.

DNA Example

To make things further clear, given below is an illustration for an alignment of DNA raw-signal to a basecalled read.

image

Each dot in the top sequence in the illustration represent a k-mer and the number above each dot is the corresponding k-mer index. If the basecalled read is ACGGTAACTATAC and assuming the k-mer size in the k-mer model is 6, the 0th k-mer is ACGGTA, 1st k-mer is CGGTAA … and the 7th k-mer is CTATAC. Each dot in the bottom sequence in the illustration represent a raw-signal sample and the number below each dot is the corresponding signal index.

The TSV output from resquiggle will look like below:

read_id kmer_idx start_raw_idx end_raw_idx
rid0 0 2 4
rid0 1 4 7
rid0 2 7 9
rid0 3 11 12
rid0 4 . .
rid0 5 . .
rid0 6 12 14
rid0 7 14 17

The PAF output from resquiggle will look like below (the header is not present in the actual output):

read_id len_raw_signal start_raw end_raw strand read_id len_kmer start_kmer end_kmer matches len_block mapq  
rid0 20 2 17 + rid0 8 0 8 6 8 255 ss:Z:2,3,2,2I1,2D2,3,

direct-RNA Example

Now see the illustration below for direct-RNA.

image

Note that the RNA is sequenced 3’->5’ end, so the raw signal is 3’->5’ direction. As the basecaller outputs the basecalled read in 5’->3’ direction, the basecalled read is reversed to be 3’->5’ in the illustration (note: indices in illustration denote the actual index in the basecalled read in 5’->3’ direction). If the basecalled read in 5’->3’ direction is ACGGUAACUAUA and assuming the k-mer size in the k-mer model is 5, the 0th k-mer is ACGGU, 1st k-mer is CGGUA … and the 7th k-mer is CUAUA.

The TSV output will look like below:

read_id kmer_idx start_raw_idx end_raw_idx
rid0 7 2 4
rid0 6 4 7
rid0 5 7 9
rid0 4 11 12
rid0 3 . .
rid0 2 . .
rid0 1 12 14
rid0 0 14 17

The PAF output will look like below:

read_id len_raw_signal start_raw end_raw strand read_id len_kmer start_kmer end_kmer matches len_block mapq  
rid0 20 2 17 + rid0 8 8 0 6 8 255 ss:Z:2,3,2,2I1,2D2,3,

C code snippet to parse ss tag

A C code snippet that converts the value in the ss tag (a readable code which is not optimised) is given below:

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

#define MAX_LEN_KMER 20000

int main(){

    //paf fields
    char *ss="2,3,2,2I1,2D2,3,";
    int start_raw=2; int end_raw=17; int len_raw_signal=20;
    int start_kmer=0; int end_kmer=8; int len_kmer=8;

    // Raw signal start index for the corresponding k-mer and Raw signal end index for the corresponding k-mer
    int st_raw_idx[MAX_LEN_KMER]; int end_raw_idx[MAX_LEN_KMER];

    //intialise to -1
    for(int i=0; i<MAX_LEN_KMER; i++){ st_raw_idx[i]=end_raw_idx[i]=-1; }

    int st_k = start_kmer; int end_k = end_kmer; //if DNA, start k-kmer index is start_kmer column in paf and end k-kmer index is end_kmer column in paf
    int8_t rna = start_kmer > end_kmer ? 1 : 0; //if RNA start_kmer>end_kmer in paf
    if(rna){ st_k = end_kmer; end_k = start_kmer; } //if RNA, start k-kmer index is end_kmer column in paf and end k-kmer index is start_kmer column in paf

    int i_k = st_k; int i_raw = start_raw; //current k-mer index and current raw signal index

    //buffer for storing digits preceding each operation and its index
    char buff[11]; int i_buff=0;

    while(*ss){

        if(*ss==',' || *ss=='I' || *ss=='D'){
            if(i_buff <= 0){ fprintf(stderr,"Bad ss: Preceding digit missing\n"); exit(1); }//if nothing in buff

            buff[i_buff]=0; //null terminate buff
            int num = atoi(buff);
            if(num < 0){ fprintf(stderr,"Bad ss: Cannot have negative numbers\n"); exit(1); }
            i_buff=0; buff[0]=0; //reset buff

            if(*ss=='I'){ //if an insertion, current raw signal index is incremented by num
                i_raw += num;
            } else if(*ss=='D'){ //if an deletion, current k-mer index is incremented by num
                i_k += num;
            } else if (*ss==','){ //if a mapping, increment accordingly and set raw signal indices for the current k-mer
                end_raw_idx[i_k] = i_raw; i_raw += num;
                st_raw_idx[i_k] = i_raw; i_k++;
            }
        } else {
            if(!isdigit(*ss)){ fprintf(stderr,"Bad ss: A non-digit found when expected a digit\n"); exit(1); }
            buff[i_buff++]=*ss;
        }
        ss++;
    }

    if(i_raw!=end_raw){ fprintf(stderr,"Bad ss: Signal end mismatch\n"); exit(1); } //current raw signal index should be equal to end_raw
    if(i_k!=end_k){ fprintf(stderr,"Bad ss: Kmer end mismatch\n"); exit(1); } //current k-mer index should be equal to end_k

    for(int i=st_k; i<end_k; i++){
        if(end_raw_idx[i]==-1){
            if(st_raw_idx[i] != -1) { fprintf(stderr,"Bad ss: This shoud not have happened\n"); exit(1); }//if st_raw_idx[i] is -1, then end_raw_idx[i] should also be -1
            printf("%d\t.\t.\n", rna ? len_kmer-i-1 : i);
        }else {
            printf("%d\t%d\t%d\n", rna ? len_kmer-i-1 : i, end_raw_idx[i], st_raw_idx[i]);
        }
    }
}

Note that the beginning of the alignment could be crude due to the adaptor and barcode in the raw signal. In future versions, an option for this trimming may be introduced. Until then, if there is a huge step in the alignment in the beginning, that part could be trimmed by the user.

eventalign

eventalign TSV output

The default TSV output is as below:

Col Type Name Description
1 string contig Contig on the reference which the read maps to
2 int position Start index on the contig (0-based; BED-like; closed)
3 string reference_kmer The k-mer on the reference
4 int read_index Index in the BAM file for the coresponding read (0-based)
5 char strand Legacy, ignore
6 int event_index Index of the event on the event table (0-based; BED-like; closed)
7 float event_level_mean Mean level of the current values of the event
8 float event_stdv Standard deviation of the current values of the event
9 float event_length Length of the event (in seconds)
10 int model_kmer The k-mer on the pore-model which this event matched
11 int model_mean Scaled mean level on the pore-model for the matched k-mer (scaling.scale * level_mean + scaling.shift) where level_mean in the first column in the pore-model
12 int model_stdv Scaled standard deviation on the pore-model for the matched k-mer (level_stdv * scaling.var) where level_stdv in the second column in the pore-model
13 int standardized_level (event_level_mean - model_mean) / (sqrt(scalings.var) * model_stdv)

Command line options can be used to modify default columns or print additional columns. --samples will print two additional columns, namely start_idx and end_idx.

Following optional columns are controlled via command line options:

Command_line_Option Type Name Description
--print-read-name string read_name Column 4 will become read_name that prints the read ID
--signal-index int start_idx start_idx is the starting index on the raw signal which the corresponding k-mer maps to (0-based; BED-like; closed)
--signal-index int end_idx end_idx is the ending index on the raw signal which the corresponding k-mer maps to (0-based; BED-like; open)
--scale-events float event_level_mean Intead of scaling the model to the events, now events will be scaled to the model. column 7 becomes (event_level_mean-scaling.shift)/scaling.scale
--scale-events float model_mean Intead of scaling the model to the events, now events will be scaled to the model. column 11 becomes level_mean (level_mean is the first column in the pore model)
--scale-events float model_stdv Intead of scaling the model to the events, now events will be scaled to the model. column 12 becomes level_stdv (level_stdv is the second column in the pore model)
--samples float,float,… samples Prints the comma separated signal samples corresponding to the mapped k-mer (scaled pA current values). scaled pA current values = (pA - scaling.shift) / scaling.scale where pA = (raw_signal + offset) * range / digitisation

eventalign PAF output

Specifying -c will generate the output in a PAF-like format (much smaller than the TSV output, f5c v1.3 or higher). This is similar to the PAF output format in f5c resquiggle with major difference being that the “basecalled read” in resquiggle is now the “refererence sequence” in eventalign. Assumming that the reader is well familiarised with the PAF output explained in resquiggle, that information is not repeated here, instead only a summary is given. Unlike in resquiggle, the strand column (column 5) can be now both ‘+’ and ‘-‘. The query is the raw-signal and the target is the reference.

Col Type Name Description
1 string read_id Read identifier name
2 int len_raw_signal Raw signal length (number of samples)
3 int start_raw Raw signal start index (0-based; BED-like; closed)
4 int end_raw Raw signal end index (0-based; BED-like; open)
5 char strand Relative strand: “+” or “-“
6 string read_id Reference sequence name
7 int len_kmer Reference sequence length (no. of k-mers)
8 int start_kmer k-mer start index on reference sequence (0-based; see note below)
9 int end_kmer k-mer end index on sequence sequence (0-based; see note below)
10 int matches Number of k-mers matched on reference sequence
11 int len_block Number of k-mers on the mapped segment on reference sequence
12 int mapq Mapping quality (0-255; 255 for missing)

For DNA reads, column 8 is a closed coordinate (inclusive) and column 9 is an open coordinate (not-inclusive); and, column 8 coordinate is always smaller than column 9. However, this is different in direct-RNA reads because sequencing of direct-RNA happens in the reverse direction (3’->5’) and therefore the raw signal is also in the reverse direction. The sequences in the reference transcriptome are however in the correct direction (5’->3’). Thus, For RNA reads, column 8 coordinate will be larger than that on column 9. column 9 is a closed coordinate (inclusive) while column 8 is an open coordinate (not-inclusive) in this case, in contrary to DNA.

10,11 and 12 are yet to be decided. column 12 is always 255 for now.

Following optional tags are present, which are subsequently described in detail after the table:

Tag Type Description
sc f Post alignment recalibrated scale parameter
sh f Post alignment recalibrated shift parameter
ss Z signal alignment string in format described

The sh and sc tag values can be used to scale the raw signal to the pore model. This can be done as: scaled pA current values = (pA - sh) / sc where pA = (raw_signal + offset) * range / digitisation)). Please refer to the explanations in f5c resquiggle about the ss tag. Some examples are given below.

DNA examples

Positive strand

Assume we have a read signal named rid0 of 1000 signal samples, mapped to a reference contig named ctg0 of 35 bases Assume a k-mer size of 6. We have a total of 30 k-mers in the reference. Assume the signal-reference alignment looks like in teh figure below. Assume that the 12-24th bases (0-index; bed-like) inthis contig are TTGATGGTGGAA. Thus, 12th kmer is TTGATG, 13th k-mer is TGATGG, 14th k-mer is GATGGT, .. and the 18th k-mer is GTGGAA.

image

The TSV output from resquiggle will look like below (assume --print-read-name and --signal-index are provided):

contig position reference_kmer read_id strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx
ctg0 12 TTGATG rid0 t 50 77.39 1.226 0.00150 TTGATG 68.19 3.20 2.64 265 271
ctg0 12 TTGATG rid0 t 51 85.51 1.016 0.00125 NNNNNN 0.00 0.00 inf 271 276
ctg0 12 TTGATG rid0 t 52 87.75 1.029 0.00125 NNNNNN 0.00 0.00 inf 276 281
ctg0 14 GATGGT rid0 t 53 94.90 1.399 0.00100 GATGGT 99.32 3.48 -1.17 281 285
ctg0 14 GATGGT rid0 t 54 91.33 1.452 0.00075 GATGGT 99.32 3.48 -2.11 285 288
ctg0 14 GATGGT rid0 t 55 93.10 0.863 0.00150 GATGGT 99.32 3.48 -1.64 288 294
ctg0 15 ATGGTG rid0 t 56 76.79 1.980 0.00075 ATGGTG 76.10 4.06 0.16 294 297
ctg0 15 ATGGTG rid0 t 57 71.11 1.023 0.00100 ATGGTG 76.10 4.06 -1.13 297 301
ctg0 16 TGGTGG rid0 t 58 63.34 3.575 0.00125 TGGTGG 64.60 2.53 -0.46 301 306
ctg0 17 GGTGGA rid0 t 59 94.76 2.090 0.00175 GGTGGA 94.13 3.39 0.17 306 313
ctg0 18 GTGGAA rid0 t 60 78.95 1.436 0.00100 GTGGAA 80.55 3.83 -0.38 313 317

The PAF output from eventalign will look like below (the header is not present in the actual output):

read_id len_raw_signal start_raw end_raw strand ref_id len_kmer start_kmer end_kmer matches len_block mapq  
rid0 1000 265 317 + ctg0 30 12 19 6 7 255 ss:Z:16,1D13,7,5,7,4,

Negative strand

Assume we have a read signal named rid1 of 1000 signal samples, mapped to a reference contig named ctg0 of 35 bases Assume a k-mer size of 6. We have a total of 30 k-mers in the reference. Assume the signal-reference alignment looks like in the figure below (note: indices in illustration denote the actual index in the + strand of the reference genome in 5’->3’ direction).

image

Assume that the 11-24th bases (0-index; bed-like) in this contig are ATTGATGGTGGAA. Thus, 11th kmer is ATTGAT, 12th k-mer is TTGATG, 13th k-mer is TGATGG, .. 17th k-mer is GGTGGA and the 18th k-mer is GTGGAA.

Th negative strand is like:

5' ATTGATGGTGGAA 3' + strand
   |||||||||||||
3' TAACTACCACCTT 5' - strand

The Reverse complement is thus TTCCACCATCAAT. The 11th k-mer ATTGAT in the + strand relates to ATCAAT in the - strand, 12th k-mer TTGATG relates to CATCAA, 13th k-mer TGATGG relates to CCATC ,… , 17th k-mer GGTGGA relates to TCCACC and 18th k-mer GTGGAA relates to TTCCAC.

The TSV output from resquiggle will look like below (assume --print-read-name and --signal-index are provided):

contig position reference_kmer read_id strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx
ctg0 11 ATTGAT rid1 t 216 80.09 0.969 0.00150 NNNNNN 0.00 0.00 inf 723 729
ctg0 13 TGATGG rid1 t 215 78.14 0.577 0.00100 CCATCA 75.83 2.68 0.79 719 723
ctg0 13 TGATGG rid1 t 214 74.39 1.898 0.00075 CCATCA 75.83 2.68 -0.49 716 719
ctg0 14 GATGGT rid1 t 213 86.91 2.630 0.00200 ACCATC 86.41 2.16 0.21 708 716
ctg0 15 ATGGTG rid1 t 212 93.88 1.184 0.00150 CACCAT 97.64 2.26 -1.52 702 708
ctg0 15 ATGGTG rid1 t 211 99.83 1.552 0.00125 CACCAT 97.64 2.26 0.88 697 702
ctg0 15 ATGGTG rid1 t 210 98.60 0.511 0.00075 CACCAT 97.64 2.26 0.39 694 697
ctg0 15 ATGGTG rid1 t 209 99.54 1.254 0.00225 CACCAT 97.64 2.26 0.77 685 694
ctg0 15 ATGGTG rid1 t 208 97.82 1.226 0.00075 CACCAT 97.64 2.26 0.07 682 685
ctg0 15 ATGGTG rid1 t 207 99.46 0.562 0.00100 CACCAT 97.64 2.26 0.73 678 682
ctg0 15 ATGGTG rid1 t 206 98.85 0.916 0.00125 CACCAT 97.64 2.26 0.49 673 678
ctg0 16 TGGTGG rid1 t 205 86.25 1.327 0.00100 CCACCA 85.37 1.55 0.52 669 673
ctg0 17 GGTGGA rid1 t 204 86.88 1.283 0.00075 TCCACC 87.17 1.76 -0.15 666 669
ctg0 17 GGTGGA rid1 t 203 85.62 0.589 0.00075 TCCACC 87.17 1.76 -0.81 663 666
ctg0 18 GTGGAA rid1 t 202 90.79 1.335 0.00075 TTCCAC 91.76 1.96 -0.45 660 663

The PAF output from eventalign will look like below (the header is not present in the actual output):

read_id len_raw_signal start_raw end_raw strand ref_id len_kmer start_kmer end_kmer matches len_block mapq  
rid1 1000 660 729 - ctg0 30 11 19 7 8 255 ss:Z:3,6,4,35,8,7,1D6,

RNA examples

Positive strand

Assume we have a read signal named rid0 of 3000 signal samples, mapped to a reference transcript (or can be a ctg in the reference genome) named trn0 of 65 bases. Assume a k-mer size of 5. We have a total of 61 k-mers in the reference.

Assume the signal-reference alignment looks like in the figure below. Note that the RNA is sequenced 3’->5’ end, so the raw signal is 3’->5’ direction. However, as transcipts in the reference are in 5’->3’ direction, the transcript is reversed to be 3’->5’ in the illustration (note: indices in illustration denote the actual index in the transcriptd in 5’->3’ direction).

Assume that the 45-56th bases (0-index; bed-like) in this transcript in 5’->3’ direction is GAGAGCCCTGA. Then, 45th kmer is GAGAG, 46th k-mer is AGAGC, 47th k-mer is GAGCC, .. and the 51st k-mer is CCTGA.

image

The TSV output from resquiggle will look like below (assume --print-read-name and --signal-index are provided):

contig position reference_kmer read_id strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx
trn0 45 GAGAG rid0 t 62 139.24 11.306 0.02390 GAGAG 140.16 10.24 -0.07 2199 2271
trn0 46 AGAGC rid0 t 63 141.04 11.203 0.01926 AGAGC 144.32 9.93 -0.25 2141 2199
trn0 47 GAGCC rid0 t 64 117.06 5.038 0.01029 GAGCC 118.37 5.26 -0.19 2110 2141
trn0 47 GAGCC rid0 t 65 114.03 1.993 0.00365 GAGCC 118.37 5.26 -0.62 2099 2110
trn0 47 GAGCC rid0 t 66 116.30 3.843 0.00398 GAGCC 118.37 5.26 -0.30 2087 2099
trn0 48 AGCCC rid0 t 67 132.00 1.294 0.00697 AGCCC 126.60 5.74 0.71 2066 2087
trn0 48 AGCCC rid0 t 68 93.27 13.847 0.00863 NNNNN 0.00 0.00 inf 2040 2066
trn0 50 CCCTG rid0 t 69 80.59 2.676 0.00498 CCCTG 86.42 4.10 -1.08 2025 2040
trn0 51 CCTGA rid0 t 70 100.31 2.563 0.00398 CCTGA 109.18 4.97 -1.35 2013 2025
trn0 51 CCTGA rid0 t 71 116.03 2.440 0.00631 CCTGA 109.18 4.97 1.04 1994 2013
trn0 51 CCTGA rid0 t 72 101.67 1.433 0.00332 CCTGA 109.18 4.97 -1.14 1984 1994

The PAF output from eventalign will look like below (the header is not present in the actual output):

read_id len_raw_signal start_raw end_raw strand ref_id len_kmer start_kmer end_kmer matches len_block mapq  
rid0 3000 1984 2271 + trn0 61 52 45 6 7 255 ss:Z:41,15,1D47,54,58,72,

Note that start_kmer and end_kmer are otherway round compared to DNA.

Negative strand

Assume we have a read signal named rid1 of 500 signal samples, mapped to a reference contig named ctg1 of 20 bases. Assume a k-mer size of 5. We have a total of 24 k-mers in the reference. Assume the signal-reference alignment looks like in the figure below (note: indices in illustration denote the actual index in the + strand of the reference genome in 5’->3’ direction).

image

Assume that the 4-13th bases (0-index; bed-like) in this contig are AAATGGCTGA. Thus, 4th kmer is AAATG, 5th k-mer is AATGG, .. 8th k-mer is GGCTG and the 9th k-mer is GCTGA.

Th negative strand is like:

5' AAATGGCTGA 3' + strand
   ||||||||||
3' TTTACCGACT 5' - strand

The Reverse complement is thus TCAGCCATTT. The 4th k-mer AAATG in the + strand relates to CATTT in the - strand, 5th k-mer AATGG relates to CCATT ,… , 8th k-mer GGCTG relates to CAGCC and 9th k-mer GCTGA relates to TCAGC.

The TSV output from resquiggle will look like below (assume --print-read-name and --signal-index are provided):

contig position reference_kmer read_id strand event_index event_level_mean event_stdv event_length model_kmer model_mean model_stdv standardized_level start_idx end_idx
ctg1 4 AAATG rid1 t 14 100.70 2.678 0.00664 CATTT 100.40 6.94 0.04 61 81
ctg1 5 AATGG rid1 t 13 96.55 4.060 0.00432 CCATT 98.42 2.68 -0.62 81 94
ctg1 6 ATGGC rid1 t 12 83.36 1.416 0.00332 GCCAT 84.06 2.64 -0.24 94 104
ctg1 6 ATGGC rid1 t 11 87.00 6.172 0.00299 GCCAT 84.06 2.64 0.99 104 113
ctg1 7 TGGCT rid1 t 10 121.45 4.970 0.00598 AGCCA 121.42 4.13 0.01 113 131
ctg1 7 TGGCT rid1 t 9 128.10 1.094 0.00299 AGCCA 121.42 4.13 1.44 131 140
ctg1 7 TGGCT rid1 t 8 124.88 4.039 0.00631 AGCCA 121.42 4.13 0.75 140 159
ctg1 8 GGCTG rid1 t 7 115.80 2.588 0.00465 CAGCC 115.81 3.78 -0.00 159 173
ctg1 8 GGCTG rid1 t 6 111.04 2.482 0.00332 CAGCC 115.81 3.78 -1.13 173 183
ctg1 9 GCTGA rid1 t 5 99.95 2.843 0.00498 TCAGC 101.66 6.39 -0.24 183 198

The PAF output from eventalign will look like below (the header is not present in the actual output):

read_id len_raw_signal start_raw end_raw strand ref_id len_kmer start_kmer end_kmer matches len_block mapq  
rid1 500 61 198 - ctg1 24 10 4 6 6 255 ss:Z:20,13,19,46,24,15,

eventalign SAM output

Specifying -a will generate the output in SAM format (much smaller than the TSV output, f5c v1.3 or higher). This output SAM file will contain the following two additional tags added to the S/BAM file given as the input to f5c:

Tag Type Description
si Z coordinates associated with the ss tag below (explained below)
ss Z signal alignment string in format described under here

si tag contains four comma separated values start_raw, end_raw, start_kmer and end_kmer, respectively. Those values have the same as the columns 3,4,8 and 9 in the PAF format explained above when specified along with -c.

call-methylation

Same as Nanopolish output.