# Inferring Sequence Traits (like Drug Resistance) Unfortunately this method currently only works with VCF-input files. It will be updated to work with Fasta-input soon! The augur function `sequence-traits` can identify any trait associated with particular nucleotide or amino-acid mutations, but it's often used to identify drug resistance mutations (DRMs). To tell augur which sites confer what trait (or drug resistance), you'll need to pass a file detailing these sites. The file should usually contain five columns: GENE, SITE, ALT, DISPLAY_NAME, and FEATURE. DISPLAY_NAME can be blank, and the GENE column can be omitted if only nucleotide locations are used. ### Amino Acid Sites For example, for drug resistance in TB, we list the gene, the AA position in the gene, the AA mutation that confers resistance (you can list a site multiple times if multiple bases give resistance), and the name of the drug this mutation gives resistance to: ``` GENE SITE ALT DISPLAY_NAME FEATURE gyrB 461 N Fluoroquinolones gyrB 499 D Fluoroquinolones rpoB 432 E Rifampicin rpoB 432 K Rifampicin ``` We can leave DISPLAY_NAME blank, as auspice will by default display the gene, site, and original and alternative base. ### Nucleotide Sites For mutations outside of protein-coding genes, we can specify their position using nucleotides, and specify how we'd like them to be named when displayed: ``` GENE SITE ALT DISPLAY_NAME FEATURE nuc 1472749 A rrs: C904A Streptomycin nuc 1473246 G rrs: A1401G Amikacin Capreomycin Kanamycin nuc 1673423 T fabG1: G-17T Isoniazid Ethionamide nuc 1673425 T fabG1: C-15T Isoniazid Ethionamide ``` In the TB literature, these mutations are still referred to by their position within non-protein-coding genes (`rrs`) or location near genes (`-17 fabG1`), not their nucleotide location. We can ensure auspice displays the more useful common nomenclature by giving entries for the DISPLAY_NAME column. If you are only using nucleotide sites, you can also omit the GENE column: ``` SITE ALT DISPLAY_NAME FEATURE 1472749 A rrs: C904A Streptomycin 1473246 G rrs: A1401G Amikacin Capreomycin Kanamycin ``` ### Both Nucleotide Sites and AA Sites You can also mix sites identified by nucleotide position and those identified by AA position: ``` GENE SITE ALT DISPLAY_NAME FEATURE gyrB 461 N Fluoroquinolones gyrB 499 D Fluoroquinolones rpoB 432 E Rifampicin rpoB 432 K Rifampicin nuc 1472749 A rrs: C904A Streptomycin nuc 1473246 G rrs: A1401G Amikacin Capreomycin Kanamycin nuc 1673423 T fabG1: G-17T Isoniazid Ethionamide nuc 1673425 T fabG1: C-15T Isoniazid Ethionamide ``` ### Options `sequence-traits` will return a value for each "feature" - for example, all the mutations on the tree that lead to resistance to Streptomycin. It will also generate a count either of the total number of "features" each node has (ex: the total number of drugs a sequence is resistant to), or the total number or mutations specified in the file each node has (ex: the total number of DRMs a sequence has, even if some are for the same drug). You can specify a name for this count using the `--label` argument (ex: "Drug_Resistance"). The `--count` argument value specifies whether to count the number of traits (ex: drugs resistant to) (use `traits`) or number of overall mutations (use `mutations`).