Example Filtering Strategies

VariantDB supports several filtering strategies. Below are a few example setups.

1. De Novo Trio:

De Novo Pedigree

  1. Case: Two healthy parents, no familial history. Expected model: de novo mutation. Sequencing data from all three individuals are available.
  2. Strategy: Select variants not present in the parents, with an effect on protein function.
  3. Upload the three VCF files (and BAM files) to VariantDB (see the upload instructions).
  4. Set family relations in the Sample Management section: assign both parents to the offspring sample (see image below).
    De Novo Set Relations
  5. At 'Variant Filter', select the correct sample 'Affected Daughter'
  6. Set the following filters:
    1. Filter On Family Information : 'Not Match', 'In Parent', 'Father + Mother', 'As Any Genotype'
    2. Filter On snpEff Information : 'Match', 'Effect Impact', 'High'
      De Novo Filter Settings
  7. Select some relevant annotations
  8. Execute Query

The above approach is a good starting point: it selects high-impact variants that are absent from both parents. Variants present in either parent are excluded, regardless of the exact genotype (heterozygous or homozygous) in the child or the parents. The remaining variants have a severe impact (frameshift, stop-gain, ...) on at least one transcript, as annotated by snpEff. To restrict the result further, add quality thresholds. If no clear candidate remains, relax the filters, for example by also accepting 'Moderate' effect impact from snpEff. Alternatively, remove the snpEff filter and filter on CADD score instead, for example requiring a PHRED-scaled CADD score above 30. This threshold corresponds to the 0.1% most deleterious of all possible SNVs in the genome.
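The de novo rules boil down to simple set logic. The following is a minimal sketch, assuming a hypothetical in-memory model (each sample is a dict from a variant key to a genotype string, and snpEff impacts are supplied as a separate dict); it is not VariantDB's implementation. The hypothetical helper `cadd_phred` shows the PHRED scaling behind the CADD threshold: a scaled score of 10 marks the top 10% of all possible SNVs, 20 the top 1%, and 30 the top 0.1%.

```python
import math
from typing import Dict, Set, Tuple

# Hypothetical model, not VariantDB's own data structures: a variant is keyed
# by (chromosome, position, ref, alt); each sample maps the variants it
# carries to a genotype string, "het" or "hom".
Variant = Tuple[str, int, str, str]
Calls = Dict[Variant, str]


def de_novo_candidates(child: Calls, father: Calls, mother: Calls,
                       impact: Dict[Variant, str]) -> Set[Variant]:
    """Child variants absent from both parents with a HIGH snpEff impact."""
    result = set()
    for variant in child:
        # 'Not Match', 'In Parent', 'Father + Mother', 'As Any Genotype':
        # drop the variant if either parent carries it, whatever the genotype.
        if variant in father or variant in mother:
            continue
        # 'Match', 'Effect Impact', 'High'
        if impact.get(variant) == "HIGH":
            result.add(variant)
    return result


def cadd_phred(top_fraction: float) -> float:
    """PHRED-scaled CADD score for a variant ranked within the given top
    fraction of all possible SNVs: -10 * log10(fraction)."""
    return -10 * math.log10(top_fraction)
```

Here `cadd_phred(0.001)` evaluates to 30 (up to floating-point rounding), which is why a threshold of 30 selects roughly the top 0.1%.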

2. Autosomal Dominant Family:

Autosomal Dominant Pedigree

  1. Case: Large pedigree with familial history. Expected model: autosomal dominant. Sequence data are available from three affected family members and one unaffected sibling (S2) of the index patient (S1).
  2. Strategy: Select variants shared amongst the affected members and absent from the healthy sibling.
  3. Set family relations. Cousin is not an available relation type in VariantDB, but you can assign the affected cousin S3 as a sibling of S1 without any other consequences.
    Relation settings for Autosomal Dominant filtering
  4. At 'Variant Filter', select the correct sample 'S1.Affected.Index'
  5. Set the following filters:
    1. Filter On Family Information : 'Match', 'In Parent', 'S4.Affected.Mother'
    2. Filter On Family Information : 'Match', 'In Sibling', 'S3.Affected.Cousin'
    3. Filter On Family Information : 'Not Match', 'In Sibling', 'S2.Unaffected.sibling'
      Autosomal Dominant Familial Filter Settings
  6. Select Filters with regard to function (eg RefSeq, snpEff, CADD, ...)
  7. Select Annotations
  8. Execute Query

The above approach selects variants shared amongst the affected family members and absent from the unaffected sibling. Because dominant inheritance is expected, no genotype criteria are set: any genotype (heterozygous or homozygous) is expected to cause the phenotype.
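The dominant rules can be sketched in the same hypothetical model (variant key mapped to "het"/"hom" per sample). This is an illustration of the logic under that assumption, not VariantDB code: each affected relative contributes one 'Match' rule, the unaffected sibling one 'Not Match' rule, and genotype is ignored throughout.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical model: variant key -> genotype ("het" or "hom") per sample.
Variant = Tuple[str, int, str, str]
Calls = Dict[Variant, str]


def dominant_candidates(index: Calls,
                        affected: Iterable[Calls],
                        unaffected: Iterable[Calls]) -> Set[Variant]:
    """Index variants carried by every affected relative and by no unaffected one."""
    affected_list = list(affected)
    unaffected_list = list(unaffected)
    return {
        variant for variant in index
        # One 'Match' rule per affected relative; any genotype counts.
        if all(variant in relative for relative in affected_list)
        # 'Not Match' for each unaffected relative; any genotype counts.
        and not any(variant in relative for relative in unaffected_list)
    }
```

For the pedigree above, the call would be `dominant_candidates(s1, [s4, s3], [s2])`, with the sample names standing in for the per-sample dicts.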

3. Autosomal Recessive

Autosomal Recessive Pedigree

  1. Case: Pedigree with familial history. Expected model: autosomal recessive. Sequence data from the affected index patient (S1) and both parents (S3, S4), and optionally from an unaffected sibling (S2).
  2. Strategy: Select variants homozygous in the index patient, heterozygous in both parents (and not homozygous in the unaffected sibling).
  3. Set Family relations.
  4. At 'Variant Filter', select the correct sample 'S1.Affected.Index'
  5. Set the following filters:
    1. Filter On Family Information : 'Match', 'In Parent', 'S4.Mother', 'Heterozygous'
    2. Filter On Family Information : 'Match', 'In Parent', 'S3.Father', 'Heterozygous'
    3. Filter On Family Information : 'Not Match', 'In Sibling', 'S2.Unaffected.sibling', 'Homozygous'
      Autosomal Recessive Familial Filter Settings
  6. Select Filters with regard to function (eg RefSeq, snpEff, CADD, ...)
  7. Select Annotations
  8. Execute Query
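The recessive strategy can be sketched in the same hypothetical model (variant key mapped to "het"/"hom" per sample); it is an illustration of the logic, not VariantDB code. Note that the homozygosity requirement on the index patient comes from the strategy in step 2, while the three listed rules constrain only the relatives.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical model: variant key -> genotype ("het" or "hom") per sample.
Variant = Tuple[str, int, str, str]
Calls = Dict[Variant, str]


def recessive_candidates(index: Calls, mother: Calls, father: Calls,
                         unaffected_siblings: Iterable[Calls] = ()) -> Set[Variant]:
    """Homozygous index variants for which both parents are heterozygous."""
    siblings = list(unaffected_siblings)
    return {
        variant for variant, genotype in index.items()
        # Strategy (step 2): homozygous in the index patient.
        if genotype == "hom"
        # Rule 1: 'Match', 'In Parent', mother, 'Heterozygous'
        and mother.get(variant) == "het"
        # Rule 2: 'Match', 'In Parent', father, 'Heterozygous'
        and father.get(variant) == "het"
        # Rule 3: 'Not Match', 'In Sibling', sibling, 'Homozygous'
        and all(sib.get(variant) != "hom" for sib in siblings)
    }
```

Leaving out the sibling data simply drops rule 3, which matches the optional sibling in the case description.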

Note the difference with the de novo filtering. For de novo variants, we required that the variant in the index patient is absent from both father and mother. This was expressed in a single rule (multi-select box with 'Not Match', which translates to: NOT (in Father OR in Mother)). In contrast, to require that the variant is present in both father and mother, we need two rules, each matching one parent. Furthermore, we specified the genotype each parent must carry.

IMPORTANT: Selecting multiple values in a single filtering rule matches if any one of the values matches (for example: either non-synonymous, or frameshift, or stop-gain).
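The difference between one multi-value rule and several single-value rules can be made concrete with a small sketch. `FamilyRule` and `passes_all` are hypothetical names, genotype options are omitted, and the assumption that separate rules are combined with AND (unless changed under 'Filter Logic') is inferred from the text rather than taken from VariantDB's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical model: variant key -> genotype ("het" or "hom") per sample.
Variant = Tuple[str, int, str, str]
Calls = Dict[Variant, str]


@dataclass
class FamilyRule:
    """Hypothetical stand-in for one 'Filter On Family Information' rule
    (genotype options omitted for brevity)."""
    match: bool                  # True for 'Match', False for 'Not Match'
    relatives: Sequence[Calls]   # every value selected in the multi-select box

    def passes(self, variant: Variant) -> bool:
        # Values selected within one rule are OR-ed together.
        hit = any(variant in relative for relative in self.relatives)
        return hit if self.match else not hit


def passes_all(variant: Variant, rules: List[FamilyRule]) -> bool:
    # Assumption: separate rules are AND-ed unless 'Filter Logic' says otherwise.
    return all(rule.passes(variant) for rule in rules)
```

With `FamilyRule(False, [father, mother])`, a variant passes only if neither parent carries it. A single `FamilyRule(True, [father, mother])` would accept a variant carried by only one parent, which is why the recessive example uses two separate 'Match' rules.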

4. Linkage Based

  1. Case: Pedigree with familial history. SNP-array based linkage analysis resulted in a single region with a high LOD score. Sequencing data are available from one affected individual.
  2. Strategy: Select variants within the linkage region.
  3. At 'Variant Filter', select the correct sample
  4. Set the following filters:
    1. Filter On Location : 'Match', 'Chromosome', 'Chromosome of linkage region'
    2. Filter On Location : 'Match', 'Position', 'Bigger than', 'linkage region start coordinate'
    3. Filter On Location : 'Match', 'Position', 'Smaller than', 'linkage region end coordinate'
  5. Select Filters with regard to function (eg RefSeq, snpEff, CADD, ...)
  6. Select Annotations
  7. Execute Query

This filtering method selects variants within a specific genomic region. If additional samples from the linkage study are available, results can be refined using the family-based filters described above.
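The three location rules amount to a chromosome check plus a bounded position check. The sketch below assumes that 'Bigger than' and 'Smaller than' are strict comparisons, so variants exactly at the region boundaries are excluded; this is an assumption, and if it matters you can widen the region by one base on each side. The chromosome name must also follow the same convention as your data (for example "7" versus "chr7"). Function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def in_region(variant: Variant, chrom: str, start: int, end: int) -> bool:
    """Chromosome must match and the position must lie strictly between start and end."""
    v_chrom, pos, _ref, _alt = variant
    # Strict comparisons mirror 'Bigger than' / 'Smaller than' (assumed exclusive).
    return v_chrom == chrom and start < pos < end


def linkage_candidates(variants: Iterable[Variant], chrom: str,
                       start: int, end: int) -> List[Variant]:
    """Keep only the variants that fall inside the linkage region."""
    return [v for v in variants if in_region(v, chrom, start, end)]
```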

5. Complex pedigree or Case/Control study

Complex Pedigree

  1. Case: Large cohort or family screened. Expected inheritance: complex, or with incomplete penetrance.
  2. Strategy: Select variants whose occurrence ratio is above a minimum among cases and below a maximum among controls.
  3. Organize your affected and unaffected samples into two separate projects from the Sample Management page. In this case:
    1. Project Complex_Pedigree_Affected : Individuals 12, 14, 37, 1, 34 and 38
    2. Project Complex_Pedigree_Unaffected : All other available individuals
  4. At 'Variant Filter', select an affected sample (eg. Indv0001)
  5. Set the following filters:
    1. Filter On Occurence : 'Match', 'Relative Occurence By Project', 'Complex_Pedigree_Affected', 'Bigger Than', '0.83'. This allows one false negative amongst the 6 affected samples (5/6 = 0.833).
    2. Filter On Occurence : 'Match', 'Relative Occurence By Project', 'Complex_Pedigree_Unaffected', 'Smaller Than', '0.1'. This allows a small fraction of healthy family members to carry the variant. Adjust this value to reflect the degree of penetrance.
  6. Select Filters with regard to function (eg RefSeq, snpEff, CADD, ...)
  7. At 'Filter Logic', further refine the filtering scheme if necessary. As shown in the example below, you might want to bypass quality filters if a mutation is known in ClinVar to be pathogenic. In this example, the scheme returns all high-quality variants that are frequent among affected family members and rare among unaffected family members. In addition, it returns known pathogenic mutations from ClinVar, and high-quality variants with a CADD score over 40, which are also very likely to be pathogenic, regardless of their frequency in the pedigree.
    Complex Pedigree
  8. Select Annotations
  9. Execute Query
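The 'Filter Logic' scheme from step 7 combines three branches with OR. The sketch below writes it as one boolean expression, assuming each condition has already been evaluated per variant; the `VariantAnnotation` fields are hypothetical names, and the exact tree configured in VariantDB may differ.

```python
from dataclasses import dataclass


@dataclass
class VariantAnnotation:
    """Hypothetical per-variant summary; field names are illustrative only."""
    high_quality: bool         # passes the quality filters
    occurrence_ok: bool        # passes both relative-occurrence filters
    clinvar_pathogenic: bool   # annotated as pathogenic in ClinVar
    cadd_phred: float          # PHRED-scaled CADD score


def passes_filter_logic(a: VariantAnnotation) -> bool:
    return (
        # Frequent in affected, rare in unaffected, and of good quality.
        (a.high_quality and a.occurrence_ok)
        # Known pathogenic in ClinVar: bypasses quality and occurrence filters.
        or a.clinvar_pathogenic
        # Good quality and CADD > 40: kept regardless of pedigree frequency.
        or (a.high_quality and a.cadd_phred > 40)
    )
```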

With these occurrence settings, more than 83% of the affected family members (at least 5 of the 6) must carry the mutant allele, while fewer than 10% of the unaffected family members may carry it. Setting the affected ratio below 100% allows for false-negative calls, while allowing a maximal ratio for unaffected family members accounts for incomplete penetrance. This filtering approach is also available using absolute occurrence counts, for pedigrees too small for meaningful ratios. For large case/control cohorts, the same approach can be applied with appropriate frequency thresholds.
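The relative-occurrence rules can be sketched as a ratio per project, again in the hypothetical model where each sample maps variant keys to genotypes (any genotype counts as carrying). 'Bigger Than' and 'Smaller Than' are assumed to be strict comparisons; the function names are illustrative, not VariantDB's.

```python
from typing import Dict, Sequence, Set, Tuple

# Hypothetical model: variant key -> genotype ("het" or "hom") per sample.
Variant = Tuple[str, int, str, str]
Calls = Dict[Variant, str]


def relative_occurrence(variant: Variant, samples: Sequence[Calls]) -> float:
    """Fraction of the samples in a project that carry the variant (any genotype)."""
    if not samples:
        raise ValueError("project contains no samples")
    return sum(1 for s in samples if variant in s) / len(samples)


def occurrence_candidates(index: Calls, cases: Sequence[Calls],
                          controls: Sequence[Calls],
                          min_case_ratio: float = 0.83,
                          max_control_ratio: float = 0.1) -> Set[Variant]:
    """Variants of the selected affected sample that pass both occurrence rules."""
    return {
        v for v in index
        if relative_occurrence(v, cases) > min_case_ratio          # 'Bigger Than'
        and relative_occurrence(v, controls) < max_control_ratio   # 'Smaller Than'
    }
```

A consequence of the strict comparison: with ten unaffected samples, a single carrier gives a ratio of exactly 0.1 and is rejected. If one carrier in ten should be tolerated, raise the threshold slightly.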