Metadata-Version: 2.4
Name: gfftk
Version: 26.2.12
Summary: GFFtk: genome annotation GFF3 tool kit
Project-URL: Homepage, https://github.com/nextgenusfs/gfftk
Project-URL: Repository, https://github.com/nextgenusfs/gfftk.git
Author-email: Jon Palmer <nextgenusfs@gmail.com>
License: BSD 2-Clause License
        
        Copyright (c) 2016, Jonathan M. Palmer
        All rights reserved.
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        * Redistributions of source code must retain the above copyright notice, this
          list of conditions and the following disclaimer.
        
        * Redistributions in binary form must reproduce the above copyright notice,
          this list of conditions and the following disclaimer in the documentation
          and/or other materials provided with the distribution.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE.md
Keywords: annotation,bioinformatics,completeness,genome
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: Unix
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.7.0
Requires-Dist: gb-io>=0.3.2
Requires-Dist: natsort
Requires-Dist: numpy
Requires-Dist: requests
Description-Content-Type: text/markdown

[![Latest Github release](https://img.shields.io/github/release/nextgenusfs/gfftk.svg)](https://github.com/nextgenusfs/gfftk/releases/latest)
![Conda](https://img.shields.io/conda/dn/bioconda/gfftk)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Tests](https://github.com/nextgenusfs/gfftk/actions/workflows/tests.yml/badge.svg)](https://github.com/nextgenusfs/gfftk/actions/workflows/tests.yml)
[![codecov](https://codecov.io/gh/nextgenusfs/gfftk/branch/master/graph/badge.svg)](https://codecov.io/gh/nextgenusfs/gfftk)

# GFFtk: genome annotation tool kit

GFFtk is a toolkit for working with genome annotation files in GFF3, GTF, TBL, and GenBank formats. It converts between these formats, extracts sequences, filters annotations, and sorts, sanitizes, and renames features.

## Features

- **Format Conversion**: Convert between GFF3, GTF, TBL, and GenBank formats
- **Combined GFF3+FASTA**: Support for combined files containing both annotations and sequences
- **Sequence Extraction**: Extract protein and transcript sequences from annotations
- **Advanced Filtering**: Filter annotations using flexible regex patterns
- **Consensus Models**: Generate consensus gene models from multiple sources
- **Non-Standard Features**: Support for intron, noncoding_exon, five_prime_UTR_intron, and pseudogenic_exon features
- **File Manipulation**: Sort, sanitize, and rename features in annotation files

## Installation

To install a release version, use pip:
```bash
python -m pip install gfftk
```

To install the latest code from the master branch, run:
```bash
python -m pip install git+https://github.com/nextgenusfs/gfftk.git
```
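
To confirm which version ended up in the active environment, one option is to query the installed distribution metadata from Python. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, not part of GFFtk:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if gfftk is installed in this environment, else None.
print(installed_version("gfftk"))
```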

## Quick Start

### Basic Format Conversion
```bash
# Convert GFF3 to GTF
gfftk convert -i input.gff3 -f genome.fasta -o output.gtf

# Extract protein sequences
gfftk convert -i input.gff3 -f genome.fasta -o proteins.faa --output-format proteins
```

### Combined GFF3+FASTA Format
```bash
# Create a combined file from separate GFF3 and FASTA files
gfftk convert -i input.gff3 -f genome.fasta -o combined.gff --output-format combined

# Read a combined file (no separate FASTA file needed)
gfftk convert -i combined.gff -o output.gff3 --output-format gff3
```
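
The GFF3 specification allows sequences to be embedded after a `##FASTA` directive that follows the annotation lines. Assuming the combined file follows that convention (an assumption, not confirmed here for GFFtk's output), the following Python sketch shows how such a file could be split by hand. `split_combined` is a hypothetical helper, not part of GFFtk:

```python
from typing import List, Tuple


def split_combined(text: str) -> Tuple[List[str], List[str]]:
    """Split combined GFF3+FASTA text into (annotation lines, FASTA lines).

    Assumes the sequence section starts after a line reading ``##FASTA``.
    """
    gff_lines: List[str] = []
    fasta_lines: List[str] = []
    in_fasta = False
    for line in text.splitlines():
        if line.strip() == "##FASTA":
            in_fasta = True  # everything after the directive is sequence data
            continue
        (fasta_lines if in_fasta else gff_lines).append(line)
    return gff_lines, fasta_lines


# A tiny made-up record, for illustration only.
example = "\n".join(
    [
        "##gff-version 3",
        "contig1\texample\tgene\t1\t9\t.\t+\t.\tID=gene1",
        "##FASTA",
        ">contig1",
        "ATGAAATAA",
    ]
)
gff, fasta = split_combined(example)
print(len(gff), "annotation lines;", len(fasta), "FASTA lines")
```

The sketch only handles the explicit `##FASTA` directive; for real data, `gfftk convert` reads the combined file directly.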

### Advanced Filtering
```bash
# Keep only kinase genes
gfftk convert -i input.gff3 -f genome.fasta -o kinases.gff3 --grep product:kinase

# Remove augustus predictions
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 --grepv source:augustus

# Case-insensitive filtering with regex
gfftk convert -i input.gff3 -f genome.fasta -o results.gff3 --grep product:KINASE:i

# Combined filtering
gfftk convert -i input.gff3 -f genome.fasta -o filtered.gff3 \
    --grep product:kinase --grepv source:augustus
```

### Filter Pattern Syntax
- `key:pattern` - Basic string matching against the value of `key`
- `key:pattern:i` - Case-insensitive matching
- `key:regex` - Regular expression matching
- `--grep` keeps matching annotations and `--grepv` removes them; repeat either flag to apply several filters

Common filter keys: `product`, `source`, `name`, `note`, `contig`, `strand`, `type`, `db_xref`, `go_terms`
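
To illustrate how a `key:pattern[:i]` filter string can be read, here is a minimal Python sketch built on the standard `re` module. It is not GFFtk's actual implementation: `parse_filter` is a hypothetical helper, and treating a trailing `:i` as the case-insensitive flag is an assumption based on the syntax listed above.

```python
import re
from typing import Tuple


def parse_filter(spec: str) -> Tuple[str, "re.Pattern"]:
    """Turn 'key:pattern' or 'key:pattern:i' into (key, compiled regex)."""
    key, sep, rest = spec.partition(":")
    if not sep or not key or not rest:
        raise ValueError(f"expected key:pattern, got {spec!r}")
    flags = 0
    # Assumption: a trailing ':i' requests case-insensitive matching.
    if rest.endswith(":i") and len(rest) > 2:
        rest = rest[:-2]
        flags = re.IGNORECASE
    return key, re.compile(rest, flags)


# A made-up annotation record, for illustration only.
annotation = {"product": "serine/threonine protein kinase", "source": "augustus"}

key, pattern = parse_filter("product:KINASE:i")
print(key, bool(pattern.search(annotation[key])))
```

Splitting only on the first colon keeps any further colons inside the pattern, so a value such as a `db_xref` entry that itself contains a colon still parses under this scheme.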

For more examples and detailed documentation, see the [tutorial](docs/tutorial.rst).

## Development

### Code Formatting

This project uses [pre-commit](https://pre-commit.com/) to ensure code quality and consistency. The pre-commit hooks run Black (code formatter), isort (import sorter), and flake8 (linter).

To set up pre-commit:

1. Install pre-commit:

```bash
pip install pre-commit
```

2. Install the git hooks:

```bash
pre-commit install
```

3. (Optional) Run against all files:

```bash
pre-commit run --all-files
```

After installation, the pre-commit hooks will run automatically on each commit to ensure your code follows the project's style guidelines.
