Hello,

I am trying to filter some VCF files produced by mpileup using the average base quality of the reference bases and the average base quality of the variant bases.

In the VCF file, the I16 field holds 16 values, defined as:

1. Number of reference Q13 bases on the forward strand
2. Number of reference Q13 bases on the reverse strand
3. Number of non-ref Q13 bases on the forward strand
4. Number of non-ref Q13 bases on the reverse strand
5. Sum of reference base qualities
6. Sum of squares of reference base qualities
7. Sum of non-ref base qualities
8. Sum of squares of non-ref base qualities
9. Sum of ref mapping qualities
10. Sum of squares of ref mapping qualities
11. Sum of non-ref mapping qualities
12. Sum of squares of non-ref mapping qualities
13. Sum of tail distance for ref bases
14. Sum of squares of tail distance for ref bases
15. Sum of tail distance for non-ref bases
16. Sum of squares of tail distance for non-ref bases

My problem is that values 1-4 count only the high-quality bases, while values 5 and 7 sum the base qualities of all reads, not just the reads counted in 1-4.

Is there a way to change how I16 values 5 and 7 are calculated?

Or are there any other settings I can change so that the output gives the average base quality of only the high-quality bases, i.e. the bases counted in I16[1-4]?
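
For reference, here is a minimal Python sketch of the averages in question. It assumes that the counts in values 1-4 and the sums in values 5 and 7 cover the same set of bases, which is exactly the point in doubt above, so the results are only meaningful if that holds. The function names and the example INFO string are made up for illustration and do not come from a real VCF.

```python
def parse_i16(info):
    """Return the 16 I16 values from a VCF INFO string, or None if I16 is absent."""
    for entry in info.split(";"):
        if entry.startswith("I16="):
            values = [float(v) for v in entry[len("I16="):].split(",")]
            if len(values) != 16:
                raise ValueError("expected 16 I16 values, got %d" % len(values))
            return values
    return None


def mean_base_qualities(i16):
    """Return (mean ref base quality, mean non-ref base quality).

    Assumption: the sums (values 5 and 7) cover the same bases as the
    counts (values 1-4). A mean is None when the matching count is zero.
    """
    ref_count = i16[0] + i16[1]  # values 1 and 2: ref bases, both strands
    alt_count = i16[2] + i16[3]  # values 3 and 4: non-ref bases, both strands
    ref_mean = i16[4] / ref_count if ref_count else None  # value 5 / ref count
    alt_mean = i16[6] / alt_count if alt_count else None  # value 7 / non-ref count
    return ref_mean, alt_mean


# Made-up INFO string, for illustration only.
example_info = (
    "DP=20;I16=8,6,3,2,420,12800,150,4600,"
    "700,35000,250,12500,200,4000,70,1200;QS=0.7,0.3"
)
ref_mean, alt_mean = mean_base_qualities(parse_i16(example_info))
print(ref_mean, alt_mean)
```

If the sums really do include bases that are left out of the counts, the means from this calculation would come out too high, which is why the definition of values 5 and 7 matters for this kind of filter.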

This is a great question, and I would also love to know the answer.