
bcftools norm inconsistency with allelic depth (AD) when splitting multi-allelic sites #360

@freeseek


I hope this has not already been asked. Here is an example VCF file:

##fileformat=VCFv4.1
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=20,length=63025520>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
20	20202020	.	C	G,T	0	PASS	.	GT:AD	1/2:0,48,61

If I split this file with the command "bcftools norm -m -any", I obtain:

##fileformat=VCFv4.1
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##contig=<ID=20,length=63025520>
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	SAMPLE
20	20202020	.	C	G	0	PASS	.	GT:AD	1/0:0,48
20	20202020	.	C	T	0	PASS	.	GT:AD	0/1:0,61
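To make the behaviour above concrete, here is a small Python sketch that reproduces the split records shown (it mirrors the observed output only and is not bcftools' actual code; the function name `split_current` is hypothetical). For the i-th ALT, every other allele in GT becomes 0, and AD keeps only the REF depth and the i-th ALT depth, so each record retains just part of the original 109 reads.

```python
def split_current(gt, ad):
    """Hypothetical reproduction of the split shown above.

    gt: tuple of allele indices, e.g. (1, 2) for "1/2"
    ad: tuple of depths in REF, ALT1, ALT2, ... order (Number=R)
    Returns one (new_gt, new_ad) pair per ALT allele.
    """
    n_alts = len(ad) - 1
    records = []
    for i in range(1, n_alts + 1):
        # The i-th ALT becomes allele 1; every other allele is recoded as 0 (REF).
        new_gt = tuple(1 if a == i else 0 for a in gt)
        # Only REF and the i-th ALT depths are kept; other ALT reads are dropped.
        new_ad = (ad[0], ad[i])
        records.append((new_gt, new_ad))
    return records


gt = (1, 2)        # 1/2
ad = (0, 48, 61)   # AD from the example record
for new_gt, new_ad in split_current(gt, ad):
    print("/".join(map(str, new_gt)), new_ad, "sum =", sum(new_ad), "of", sum(ad))
# 1/0 (0, 48) sum = 48 of 109
# 0/1 (0, 61) sum = 61 of 109
```

With a REF depth of 0 in both records, the retained AD looks like a homozygous ALT call while GT says heterozygous.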

However, I am now in the uncomfortable situation where each record is called heterozygous (0/1) even though its allelic depth, with zero reads for REF, supports a homozygous 1/1 call. I am sure people will have different opinions about this, but part of the reason many want to split multi-allelic sites is to interpret each alternate allele against all other alleles combined. It would be great to have at least an option to re-format the AD field so that the total of the AD values is preserved after splitting; that is, instead of splitting as:

1/2:AD[0],AD[1],AD[2] -> 1/0:AD[0],AD[1] and 0/1:AD[0],AD[2]

It would instead be split as:

1/2:AD[0],AD[1],AD[2] -> 1/0:AD[0]+AD[2],AD[1] and 0/1:AD[0]+AD[1],AD[2]
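A minimal Python sketch of the proposed scheme follows (`split_proposed` is a hypothetical name, not an existing bcftools option). It assumes every AD value is present (no "." entries). For sites with more than two ALT alleles, it assumes the natural generalization of the rule above: the REF slot of the i-th record receives the depth of every allele other than the i-th ALT, so each record's AD still sums to the original total depth.

```python
def split_proposed(gt, ad):
    """Sketch of the proposed AD-preserving split (hypothetical helper).

    gt: tuple of allele indices, e.g. (1, 2) for "1/2"
    ad: tuple of depths in REF, ALT1, ALT2, ... order; assumes no missing values
    """
    total = sum(ad)
    records = []
    for i in range(1, len(ad)):
        # Same GT recoding as the current split: the i-th ALT -> 1, all else -> 0.
        new_gt = tuple(1 if a == i else 0 for a in gt)
        # "Other" depth = REF plus all remaining ALTs, so the total is preserved.
        new_ad = (total - ad[i], ad[i])
        records.append((new_gt, new_ad))
    return records


for new_gt, new_ad in split_proposed((1, 2), (0, 48, 61)):
    print("/".join(map(str, new_gt)), new_ad, "sum =", sum(new_ad))
# 1/0 (61, 48) sum = 109
# 0/1 (48, 61) sum = 109
```

For the two-ALT example this reproduces AD[0]+AD[2],AD[1] and AD[0]+AD[1],AD[2], and the resulting depths are consistent with the heterozygous genotypes.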

I hope this makes sense.
