Metadata-Version: 2.4
Name: viralrecall
Version: 3.0
Summary: Tool to identify giant viruses integrated into eukaryotic genomes
Author-email: "Abdeali M. Jivaji" <abdeali@vt.edu>, "Frank O. Aylward" <faylward@vt.edu>
License-Expression: MIT
Keywords: giant virus,virus,genomics,viralrecall
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy==2.2.*
Requires-Dist: pyhmmer>=0.11.3
Requires-Dist: pyrodigal-gv==0.3.*
Requires-Dist: pyfaidx==0.9.0
Requires-Dist: pandas==2.2.*
Requires-Dist: progressbar==2.5
Requires-Dist: requests==2.32.*
Requires-Dist: matplotlib==3.10.*
Dynamic: license-file

# Viralrecall v3.0

Written by Abdeali Jivaji, PhD Candidate in the Aylward Lab, and Dr. Frank Aylward, Assoc. Professor, Dept. of Biological Sciences, Virginia Tech. Please submit issues through GitHub Issues or email Abdeali <abdeali@vt.edu> or Dr. Aylward <faylward@vt.edu>.

## Introduction

Viralrecall is a Python tool whose primary purpose is to identify Giant Endogenous Viral Elements (GEVEs) integrated in eukaryotic genomes. The current version is an update of the original tool by Dr. Aylward and uses the same GVOG HMM database to detect signatures of giant viruses. The key motivation for updating Viralrecall was to make it more efficient at processing the larger eukaryotic genomes that are being published with the rise in popularity of long-read sequencing.
We also include a small set of HMMs for key Mirusvirus hallmark proteins to aid in the detection of Mirusviruses. However, this feature is still in its early stages, and any detection of Mirusvirus proteins should be independently and manually verified by the user.

## Installation

### Install via conda (Recommended)

The tool is available as a conda package through the bioconda channel, which also installs all the dependencies. To install it, run:

``` bash
conda create -n viralrecall -c bioconda viralrecall
```

This will create an environment named viralrecall and install the package with all its dependencies. You can choose a different environment name by changing the value after the `-n` flag. If you already have an environment, activate it and run:

``` bash
conda install -c bioconda viralrecall
```

The `-c bioconda` flag tells conda to use the bioconda channel, through which the package is distributed.

### Install from source

Source installation may be preferable if you want the latest version of viralrecall, as the conda package may not always be up to date. You can download Viralrecall v3.0 by running:

``` bash
git clone https://github.com/abdealijivaji/ViralRecall_3.0.git
```

To install viralrecall in a conda environment from source, run:

``` bash
# The directory name matches the repository name, including its capitalization
cd ViralRecall_3.0
conda env create -n viralrecall -f environment.yaml
conda activate viralrecall
pip install --no-build-isolation --no-deps .
```

This will install viralrecall and set up its dependencies in a conda environment called `viralrecall`. You can change the environment name by passing a different value to the `-n` flag.
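
After either installation route, a quick check that the commands are on your `PATH` can save debugging time later. The sketch below assumes the conda environment is already activated and relies only on the two command names used in this README (`viralrecall` and `viralrecall_database`). The `check_tool` helper is a hypothetical name introduced for illustration and is not part of viralrecall.

```bash
# check_tool is a hypothetical helper, not part of viralrecall.
# It prints whether a command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Report on both commands, even if the first one is missing.
status=0
for tool in viralrecall viralrecall_database; do
    check_tool "$tool" || status=1
done
```

If either command is reported as missing, `status` is set to 1; the most likely cause is that the environment has not been activated in the current shell.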

## Database Download

Viralrecall needs its HMM database before it can run. To download it, use:

``` bash
viralrecall_database  
```

This will automatically download and set up the database directory. The `-d` flag can be used to specify the download directory, and `-n` can be used to set the name of the database directory.

Alternatively, you can download and extract the database manually:

```bash
wget https://zenodo.org/records/17859729/files/hmm.tar.gz
tar -xvzf hmm.tar.gz
```

## Basic Usage

```bash
viralrecall -i < path to input file or directory > -o < path to output directory > -d < path to database directory >
```

The input can be a genome file in FASTA format or a directory containing genome files in FASTA format. The tool recognizes a directory automatically and runs in batch mode to process all the files in parallel. By default, viralrecall uses all available CPU cores, but the number of cores can be set with the `-c` flag.
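
As an illustration of batch mode, the sketch below runs viralrecall on a directory of genomes while leaving roughly half of the cores free for other work. The directory names (`genomes`, `viralrecall_out`, `hmm`) are placeholders, not defaults of the tool, and halving the core count is only one possible choice. Only the `-i`, `-o`, `-d`, and `-c` flags come from the usage above.

```bash
# Placeholder paths (assumptions); replace them with your own.
input_dir="genomes"
out_dir="viralrecall_out"
db_dir="hmm"

# Use half of the online cores, but never fewer than one.
total_cores=$(getconf _NPROCESSORS_ONLN)
cores=$(( total_cores / 2 ))
if [ "$cores" -lt 1 ]; then
    cores=1
fi

# Only attempt the run when the command and the input directory exist.
if command -v viralrecall >/dev/null 2>&1 && [ -d "$input_dir" ]; then
    viralrecall -i "$input_dir" -o "$out_dir" -d "$db_dir" -c "$cores"
else
    echo "viralrecall or $input_dir not found; nothing was run" >&2
fi
```

Capping the core count this way can be useful on shared machines, where taking every core by default may not be welcome.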
