Metadata-Version: 2.4
Name: vvv2_display
Version: 0.2.5.0
Summary: Viral Variant Visualizer 2 display
Home-page: https://github.com/ANSES-Ploufragan/vvv2_display
Author: Fabrice Touzain
Author-email: fabrice.touzain@anses.fr
Project-URL: Bug Reports, https://github.com/ANSES-Ploufragan/vvv2_display/issues
Project-URL: Source, https://github.com/ANSES-Ploufragan/vvv2_display/
Keywords: display,variant,virus,viral
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Topic :: Software Development :: Build Tools
Classifier: License :: OSI Approved :: GPL3
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python>=3.9
Requires-Dist: r-ggplot2>=3.4.4
Requires-Dist: r-gridextra>=2.3
Requires-Dist: r-cowplot>=1.1.1
Requires-Dist: r-stringr>=1.5.1
Requires-Dist: r-jsonlite>=1.8.8
Requires-Dist: pysam==0.19.1
Requires-Dist: numpy>=1.23.1
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# vvv2_display

# Description

Tools to create:
- a _.png_ image file showing all variants (obtained from the vardict-java variant caller) along a genome/assembly (which you provide), with their proportions on the y-axis and with CDS descriptions (obtained from the vadr annotator). The coverage depth distribution can be displayed at the top of the figure (if the `-o cov_depth_f` option is provided).
- a _.tsv_ file giving the details of all significant variants (according to the proportion threshold chosen by the user, default: 7 percent)
- [optional] a _.vcf_ file describing all significant variants (according to the proportion threshold)

Python/R scripts and Galaxy wrapper to use them.

It uses the results of:
- vadr >= 1.4.1 for annotation (of reference/assembly, tested with vadr 1.6.4 too)
- vardict-java 1.8.3 for variant calling (on the BAM alignment of the reads against the reference/assembly)

# Programs

- ```vvv2_display.py```: main script running each step of analyses
This script can be run independently, once __vvv2__ conda environment is installed and activated.
Type ```./vvv2_display.py``` then enter to get help on how to use it.

- ```PYTHON_SCRIPTS/convert_tbl2json.py```: 
Convert ```vadr``` annotation output .tbl file to json

- ```PYTHON_SCRIPTS/convert_vcffile_to_readablefile.py```: 
Convert ```vardict-java``` variant calling vcf file to human readable txt file

- ```PYTHON_SCRIPTS/correct_multicontig_vardict_vcf.py```: 
Correct ```vadr``` annotation output .tbl file for contig positions when the provided assembly is composed of more than one contig.

<!-- - ```R_SCRIPTS/visualize_coverage_depth.R```: -->
<!-- Create a .png file showing coverage depth alongside the genome, from a bam alignment file. -->

- ```R_SCRIPTS/visualize_snp_v4.R```:
Create a single .png figure showing:
  - the coverage depth distribution along the genome/assembly (if the `-o cov_depth_d` option is provided)
  - the variant proportions along the genome/assembly, with CDS positions.

# Installation

Use a conda environment:
```
conda create -n vvv2_display -y
conda activate vvv2_display
# use either conda or mamba for the next command, not both
conda install -c bioconda -c conda-forge vvv2_display
```
Prefer mamba if you are setting up a completely new conda environment (it is faster). Do not mix mamba and conda.

Description of all options:
```
vvv2_display.py -h
```

Typical usage:
```bash
vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png -u res_vvv2_display_snp_summary.tsv -o cov_depth_f.txt -y -w 10 -x res_vvv2_display_snp_summary.vcf
```
where:
  - `res_vadr_pass.tsv` is the 'pass' file of the vadr annotation program run on the genome/assembly (__input__)
  - `res_vadr_fail.tsv` is the 'fail' file of the vadr annotation program (__input__)
  - `res_vadr_seqstat.txt` is the 'seqstat' file of the vadr annotation program (__input__)
  - `res_vardict_all.vcf` is the result of the vardict-java variant caller (__input__)
  - `res_vvv2_display.png` is the name of the main output file, the figure (will be created) (__main output__)
  - `res_vvv2_display_snp_summary.tsv` is the name of the tsv summary file (it is __always__ created; this option __lets you choose its name__) (__main output__)
  - `cov_depth_f.txt` is the coverage depth by position, produced by `samtools depth` run on the BAM alignment file (__optional input__)
  - `-y` displays the coverage depth in _linear scale_ (default: _log10 scale_) (__optional input__)
  - `-w 10` sets the variant significance threshold to _10%_ (default: _7%_): the figure displays all variants, while the tsv summary keeps only significant ones (proportion higher than this threshold) (__optional input__)
  - `res_vvv2_display_snp_summary.vcf` is the summary of significant variants in vcf format (__optional output__)

> All other options exist for Galaxy wrapper compatibility (they are intermediate temporary files that must appear as parameters for the Galaxy wrapper but are not used in a usual command line call)

Minimal usage:
```bash
vvv2_display.py -p res_vadr_pass.tsv -f res_vadr_fail.tsv -s res_vadr_seqstat.txt -n res_vardict_all.vcf -r res_vvv2_display.png [-o cov_depth_f.txt]
```
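
For scripted runs, the command lines above can be assembled from Python. The following is a minimal sketch and not part of vvv2_display: the helper name `build_vvv2_command` and its parameter names are hypothetical, and it only uses the flags documented in this README (`-p`, `-f`, `-s`, `-n`, `-r`, `-u`, `-o`, `-y`, `-w`, `-x`).

```python
import shlex
from typing import List, Optional


def build_vvv2_command(
    pass_f: str,
    fail_f: str,
    seqstat_f: str,
    vcf_f: str,
    png_out: str,
    tsv_out: Optional[str] = None,
    cov_depth_f: Optional[str] = None,
    linear_scale: bool = False,
    threshold_pct: Optional[int] = None,
    vcf_out: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build the argument list for vvv2_display.py."""
    # Mandatory inputs and main output (same as the minimal usage).
    cmd = [
        "vvv2_display.py",
        "-p", pass_f,
        "-f", fail_f,
        "-s", seqstat_f,
        "-n", vcf_f,
        "-r", png_out,
    ]
    # Optional arguments are appended only when requested.
    if tsv_out is not None:
        cmd += ["-u", tsv_out]
    if cov_depth_f is not None:
        cmd += ["-o", cov_depth_f]
    if linear_scale:
        cmd.append("-y")
    if threshold_pct is not None:
        cmd += ["-w", str(threshold_pct)]
    if vcf_out is not None:
        cmd += ["-x", vcf_out]
    return cmd


if __name__ == "__main__":
    cmd = build_vvv2_command(
        "res_vadr_pass.tsv",
        "res_vadr_fail.tsv",
        "res_vadr_seqstat.txt",
        "res_vardict_all.vcf",
        "res_vvv2_display.png",
        cov_depth_f="cov_depth_f.txt",
    )
    print(shlex.join(cmd))
    # To run it for real (requires vvv2_display to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Keeping the arguments in a list, rather than a single string, avoids shell quoting problems when file names contain spaces.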

# Output example

The example was obtained from Turkey Coronavirus sequencing data, using the first draft assembly as reference.

* png file:

![img/res_vvv2.png](img/res_vvv2.png)

> Dotted vertical lines are contig boundaries.


* tsv summary file:
```
indice	position	position_ori	ref	alt	freq	gene	prot	lseq	rseq	isHomo*
1	6388	6388	A	G	0.1429	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	GTATGGTCATCAAAATACAT	GTATTGTAGAAATTGTGATG	no
2	6622	6622	A	G	0.0833	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	GGAAGCATTGAAATGTGAAC	GAAGAAAGCTGTTTTTCTTA	no
3	6838	6838	A	G	0.1429	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	TATAATTTCTGTAGATACTG	AGTTTGTGACATTTTGTCTA	no
4	7014	7014	R	A	0.8824	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP3  putative papain-like protease	CTGATAAATTAACACCTCGT	TACCGTCATATGGTATAGAC	no
5	7833	7833	G	A	0.0909	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4	ATGCACCTGGAGCTTTACCA	ATTGTTTTAATGGTGATAAT	no
6	8110	8110	T	A	0.0833	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP4	TAGTACATTCTTTACTGGTG	AGAACTTATGTTTAATATGG	no
7	9328	9328	A	G	0.1034	1a	ORF1a,ORF1ab polyprotein [exception ribosomal slippage],NSP5  putative 3C-like proteinase	CCTACATGGTGAGTTCTATG	TGCATTACACACTGGAACGG	no
8	13404	48	A	C	0.1429	intergene	intergene	TTTAGTTGATCTTAGAACGT	GTTAGTGGGAACATCCAATA	no
9	15255	1358	A	T	0.0882	1ab	similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase	GTTGTCAATACCGTTAGTAT	CTGTGGTAATCATAAACCAA	no
10	15319	1422	C	T	0.0769	1ab	similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase	AGCGAAAATGTTGATGATTT	TACAGGGCTAATTGTGCTGG	no
11	15326	1429	A	G	0.08	1ab	similar to ORF1ab polyprotein,similar to NSP13:GBSEP:putative helicase	ATGTTGATGATTTTAATCAA	CTAATTGTGCTGGCAGCGAA	no
12	19937	6040	G	A	0.0714	1ab	similar to ORF1ab polyprotein,similar to NSP16:GBSEP:putative 2-O-ribose methyltransferase	AAAATTTATATGACATTGCA	TAACAGAGACAAGTTGGCAC	no
13	21092	7195	T	C	0.0811	S	similar to spike protein	GTTTCTTATGATTATCAGTG	TTACGTGGTGATAACACTGG	no
14	25794	11897	TT	AA	0.0838	5b	5b protein	CTTAACAAAGCAGGACAAGC	AGGATTAGATTGTGTTTACT	no

*NB: a region is flagged as homopolymer ('yes') if it contains a run of at least 3 identical nucleotides.
     This may look restrictive, but Ion Torrent and Nanopore sequencing perform poorly on such regions, so make sure you verify these variants.
```
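
The tsv summary can be post-processed with the Python standard library. The sketch below is an illustration only, not code from vvv2_display: the function names are hypothetical, the embedded rows are copied from the example above with the `prot`, `lseq` and `rseq` fields shortened, and reading `position - position_ori` as a contig offset is an interpretation suggested by the example (the two columns differ from row 8 onward), not something this README states.

```python
import csv
import io
from typing import Dict, List, TextIO

# Two rows from the example above; prot/lseq/rseq are shortened here.
EXAMPLE_TSV = (
    "indice\tposition\tposition_ori\tref\talt\tfreq\tgene\tprot\tlseq\trseq\tisHomo*\n"
    "1\t6388\t6388\tA\tG\t0.1429\t1a\tORF1a\tGTATGG\tGTATTG\tno\n"
    "8\t13404\t48\tA\tC\t0.1429\tintergene\tintergene\tTTTAGT\tGTTAGT\tno\n"
)


def read_summary(handle: TextIO) -> List[Dict[str, str]]:
    """Read variant rows, skipping lines that are not variant records."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Free-text notes (such as an NB line) have no numeric position.
        if (row.get("position") or "").isdigit():
            rows.append(row)
    return rows


def contig_offset(row: Dict[str, str]) -> int:
    """Difference between the two position columns (assumed contig offset)."""
    return int(row["position"]) - int(row["position_ori"])


if __name__ == "__main__":
    for row in read_summary(io.StringIO(EXAMPLE_TSV)):
        print(row["indice"], row["gene"], float(row["freq"]), contig_offset(row))
```

To read a real file, replace the `io.StringIO(...)` object with `open("res_vvv2_display_snp_summary.tsv", newline="")`.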

# Test set

Input data files to test the program are provided in the __test-data__ directory, available when you clone the vvv2_display repository.

You can then run one of the following commands, depending on the graphical output you expect.

* if you __don't want the coverage depth__ displayed in the picture or __do not have coverage depth information__ for your sample:
```
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv
```

* if you __want coverage depth graphical display__ in the picture (log scale)
```
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -o test-data/res2_covdepth.txt -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv
```

* if you __want coverage depth graphical display__ in the picture (linear scale)
```
vvv2_display.py -p test-data/res2_vadr_pass.tbl -f test-data/res2_vadr_fail.tbl -s test-data/res2_vadr.seqstat -n test-data/res2_vardict.vcf -o test-data/res2_covdepth.txt -r test-data/res2_vvv2.png -u test-data/res2_vvv2.tsv -y
```
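
Before plotting, it can be useful to check the coverage depth file passed with `-o`. The sketch below is a hedged illustration, not part of vvv2_display: it assumes the default `samtools depth` layout (tab-separated contig name, 1-based position, depth, one line per position) and uses a hypothetical function name.

```python
import io
from collections import defaultdict
from typing import Dict, TextIO


def mean_depth_per_contig(handle: TextIO) -> Dict[str, float]:
    """Mean depth per contig, over the positions listed in the file."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for line in handle:
        line = line.rstrip("\n")
        # Skip empty lines and an optional header line starting with '#'.
        if not line or line.startswith("#"):
            continue
        contig, _position, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}


if __name__ == "__main__":
    # Tiny made-up input, for illustration only.
    demo = io.StringIO("contig1\t1\t10\ncontig1\t2\t20\ncontig2\t1\t5\n")
    for contig, mean in mean_depth_per_contig(demo).items():
        print(contig, mean)
```

With a real file, pass `open("test-data/res2_covdepth.txt")` instead of the `io.StringIO` object. Positions absent from the file are not counted, so the mean only reflects the positions that were reported.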

# Citation

If you use __vvv2_display__ and publish results, please cite:
- The __article__: Flageul, Alexandre, Edouard Hirchaud, Céline Courtillon, Flora Carnet, Paul Brown, Béatrice Grasland, and Fabrice Touzain. "vvv2_align_SE, vvv2_align_PE / __vvv2_display__: Galaxy-Based Workflows and Tool Designed to Perform, Summarize and Visualize Variant Calling and Annotation in Viral Genome Assemblies". _Viruses_. 2025;17:1385. https://doi.org/10.3390/v17101385.

And for __vardict-java__ and __vadr__, respectively:
- Lai, Zhongwu, Aleksandra Markovets, Miika Ahdesmaki, Brad Chapman, Oliver Hofmann, Robert McEwen, Justin Johnson, Brian Dougherty, J. Carl Barrett, and Jonathan R. Dry. “__VarDict__: A Novel and Versatile Variant Caller for next-Generation Sequencing in Cancer Research.” _Nucleic Acids Research_ 44, no. 11 (June 20, 2016): e108–e108. https://doi.org/10.1093/nar/gkw227.
- Schäffer, Alejandro A., Eneida L. Hatcher, Linda Yankie, Lara Shonkwiler, J. Rodney Brister, Ilene Karsch-Mizrachi, and Eric P. Nawrocki. “__VADR__: Validation and Annotation of Virus Sequence Submissions to GenBank.” _BMC Bioinformatics_ 21, no. 1 (December 2020): 211. https://doi.org/10.1186/s12859-020-3537-3.

# Galaxy wrapper

- ```vvv2_display.xml```:
Allows Galaxy integration of ```vvv2_display.py```, so that vvv2_display can be used in Galaxy pipelines.
> It can be found in the __Galaxy toolshed__ at https://toolshed.g2.bx.psu.edu/repository

# Related Galaxy workflows on workflowhub

* with bwa-mem2 alignment of __Illumina paired-end__ sequencing data (MiSeq, NextSeq, NovaSeq, HiSeq, iSeq):
https://workflowhub.eu/workflows/1738
* with bwa-mem2 alignment of __Illumina__ or __Proton__ __single-end__ sequencing data:
https://workflowhub.eu/workflows/1739
* with bwa-mem2 alignment of __Nanopore__ sequencing data (MinION, PromethION, GridION):
https://workflowhub.eu/workflows/1740
* with minimap2 alignment of __PacBio__ sequencing data (high quality long reads):
https://workflowhub.eu/workflows/1741


# Additional information / data for upstream programs

* The poster of the program, accepted at the JOBIM 2025 conference in Bordeaux (France, July 2025), can be found here:
  [doi: 10.5281/zenodo.16918391](https://zenodo.org/records/16918392) or accessed using this QR code (A0 pdf, __2.7 MB__):

  ![QRcode_poster](img/QRcode_poster.png)
  

* Additional vadr database for specific viruses:
  - Porcine circovirus: [doi: 10.5281/zenodo.15065124](https://zenodo.org/records/15065124)
