============
Data formats
============

.. contents:: Table of Contents
   :local:

TSV
===

Nextstrain strongly prefers using TSV files for metadata even though Augur commands support other delimiters as inputs.
If you are using other formats, we recommend using :doc:`augur curate passthru ` to convert them to TSV.

Nextstrain tools and workflows produce `RFC 4180 CSV-like TSVs `__.

When using `csvtk `__

* the ``--lazy`` (``-l``) option should not be necessary
* the ``fix-quotes``/``del-quotes`` commands should not be necessary

When using `tsv-utils `__

* pass the inputs through ``csv2tsv --csv-delim $'\t'``
* pass the final ``tsv-utils`` outputs through ``csvtk fix-quotes --tabs``

.. code-block:: bash

   csv2tsv --csv-delim $'\t' metadata.tsv \
     | tsv-select -H -f strain,date \
     | tsv-uniq -H -f strain \
     | csvtk fix-quotes --tabs > output.tsv

If you are writing Python scripts that process TSV files, we recommend using the `csv module `__ for file I/O.

.. note::

   Be sure to follow `csv module's recommendation `__ to open files with ``newline=''``.

Reading a TSV file:

.. code-block:: Python

   import csv

   with open(input_file, 'r', newline='') as handle:
       reader = csv.reader(handle, delimiter='\t')
       for row in reader:
           ...

Writing a TSV file:

.. code-block:: Python

   import csv

   with open(output_file, 'w', newline='') as output_handle:
       tsv_writer = csv.writer(output_handle, delimiter='\t')
       tsv_writer.writerow(header)
       for record in records:
           tsv_writer.writerow(record)

See our internal `discussion on TSV standardization `__ for more details.

.. _data-formats-json:

JSON
====

Nextstrain uses a few different kinds of `JSON `__ files at various stages in a typical build.

The primary JSON files used by Nextstrain are those consumed by Auspice to display a dataset.
Without these **dataset files**, Auspice has nothing to display.
These files are typically the final output of a build and produced by the Augur command :doc:`augur export `.
They come in two versions:

v2
  Newer format, with a filename of your choosing like ``${name}.json``.
  This is often referred to as the **"main" file**.

v1
  Original format, with filenames like ``${name}_tree.json`` and ``${name}_meta.json``, often referred to as the **"tree" and "meta" files**.

Secondary JSON files used by Nextstrain come in two flavors: **sidecar** files and **node data** files.

**Sidecar files** are produced by Augur for direct consumption by Auspice, alongside the primary JSON files described above.
They come in three types with filenames enforced by convention:

.. _data-formats-root-sequence:

root-sequence
  Filenames like ``${name}_root-sequence.json``, produced by ``augur export v2``'s ``--include-root-sequence`` option.

tip-frequencies
  Filenames like ``${name}_tip-frequencies.json``, produced by :doc:`augur frequencies ` with the ``--output-format auspice --output …`` options.

measurements
  Filenames like ``${name}_measurements.json``, produced by one of the :doc:`augur measurements ` subcommands, ``export`` or ``concat``.

**Node data** files are typically produced by various Augur commands such as :doc:`augur traits ` or :doc:`augur ancestral ` and are then fed into :doc:`augur export ` to be merged together into a final output for Auspice.
Node data files can have any filename you want but some common names are:

- ``nt_muts.json``
- ``aa_muts.json``
- ``traits.json``
- ``branch_lengths.json``
- ``${name}_aa-mutation-frequencies.json``
- ``${name}_entropy.json``
- ``${name}_frequencies.json``
- ``${name}_sequences.json``
- ``${name}_titers.json``

Node data files have a :doc:`generic structure ` to allow them to contain all kinds of data about your tree.
In advanced builds, custom node data files are often produced by build-specific scripts in addition to the ones produced by Augur commands.
For example, our `ncov build `__ produces a custom ``epiweeks.json`` node data file using `this workflow step `__ and `this script `__.
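
As a rough illustration of what a build-specific script of this kind might look like, the sketch below reads a metadata TSV and writes a small custom node data file.
It is not the ncov script referenced above: the function names, the ``year`` attribute, and the ``strain`` and ``date`` column names are hypothetical, and the top-level ``nodes`` object keyed by node name is an assumption that should be checked against the generic structure documentation.

.. code-block:: Python

   import csv
   import json


   def build_node_data(metadata_path):
       """Return a node-data-style dict giving each strain a hypothetical ``year`` attribute."""
       nodes = {}
       with open(metadata_path, 'r', newline='') as handle:
           reader = csv.DictReader(handle, delimiter='\t')
           for row in reader:
               # Assumes dates start with a four-digit year (YYYY-MM-DD);
               # records with an unknown year (e.g. XXXX-XX-XX) are skipped.
               year = row['date'][:4]
               if year.isdigit():
                   nodes[row['strain']] = {'year': int(year)}
       # Assumed layout: a top-level "nodes" object keyed by node (strain) name.
       return {'nodes': nodes}


   def write_node_data(node_data, output_path):
       """Write the node data dict to a JSON file."""
       with open(output_path, 'w') as handle:
           json.dump(node_data, handle, indent=2)

A file written this way, for example ``write_node_data(build_node_data('metadata.tsv'), 'years.json')``, could then be passed to ``augur export v2`` through its ``--node-data`` option along with the node data files produced by Augur commands.
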
Similarly, it's possible for other bioinformatics software to produce compatible dataset JSONs (primary or sidecars) for use by Auspice; they aren't required to be generated by Augur, although that is the most common way.
Augur's :doc:`validation command ` can check that dataset JSONs have the required schema.

Once you have Nextstrain JSON files, you can visualize and share them in a variety of ways.
See :doc:`our guide to sharing your results ` to find a way that meets your needs for privacy and collaboration.