Metadata-Version: 2.4
Name: atol-qc-raw-pacbio
Version: 0.2.2
Summary: python3 wrapper for AToL raw read QC steps (PacBio version)
Author-email: Amy Tims <amy.tims@unimelb.edu.au>, Tom Harrop <tharrop@unimelb.edu.au>
License-Expression: GPL-3.0-or-later
Project-URL: Homepage, https://github.com/amytims/atol-qc-raw-pacbio
Classifier: Development Status :: 3 - Alpha
Classifier: Natural Language :: English
Classifier: Operating System :: POSIX :: Linux
Classifier: Private :: Do Not Upload
Classifier: Programming Language :: Python :: 3.12
Requires-Python: <3.13,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: snakemake<10,>=9.11.6
Requires-Dist: pandas<3,>=2.3.3
Dynamic: license-file

# atol-qc-raw-pacbio

Runs QC and produces summary statistics on PacBio HiFi reads:

1. Filter .bam file on tag `rq >= 0.99` to remove low-quality reads (`bamtools`)
2. Convert .bam file to .fastq for downstream processing
3. Filter reads (`cutadapt`)
4. Compress output (`pigz`)
5. Output stats and read length distribution plot (`seqkit`)
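
To make the step 1 criterion concrete, here is a minimal Python sketch of the `rq >= 0.99` threshold applied to in-memory records. It is illustrative only: the tool itself performs this step with `bamtools`, and the record layout (`name`, `tags`), the `filter_by_rq` helper, and the choice to drop reads that have no `rq` tag are all assumptions made for this example.

```python
from typing import Iterable, Iterator

# Threshold quoted in step 1 of the pipeline description.
MIN_RQ = 0.99


def filter_by_rq(records: Iterable[dict], min_rq: float = MIN_RQ) -> Iterator[dict]:
    """Yield records whose `rq` tag is at least `min_rq`.

    Each record is a hypothetical dict such as
    {"name": "read1", "tags": {"rq": 0.999}}.
    Records without an `rq` tag are dropped (an assumption of this sketch).
    """
    for record in records:
        rq = record.get("tags", {}).get("rq")
        if rq is not None and rq >= min_rq:
            yield record


# Made-up example reads, not real data.
reads = [
    {"name": "read1", "tags": {"rq": 0.999}},
    {"name": "read2", "tags": {"rq": 0.95}},
    {"name": "read3", "tags": {"rq": 0.99}},
    {"name": "read4", "tags": {}},
]

kept = [r["name"] for r in filter_by_rq(reads)]
print(kept)  # ['read1', 'read3']
```

The comparison is inclusive, so a read with `rq` exactly 0.99 is kept.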

## Installation: Use the [BioContainer](https://quay.io/repository/biocontainers/atol-qc-raw-pacbio?tab=tags)

*e.g.* with Apptainer/Singularity:

```bash
apptainer exec \
  docker://quay.io/biocontainers/atol-qc-raw-pacbio:0.2.0--pyhdfd78af_0 \
  atol-qc-raw-pacbio
```

## Usage
```bash
atol-qc-raw-pacbio \
    --bam data/reads.bam \
    --out results/filtered_reads.fastq.gz \
    --stats results/stats.json \
    --logs results/logs \
    --match-read-wildcards \
    --revcomp \
    --discard-trimmed
```

### Full Usage
```
atol-qc-raw-pacbio version 0.1.dev1+g8817b02cb.d20260316
usage: atol-qc-raw-pacbio [-h] [-t THREADS] [-m MEM_GB] [-n] --bam BAM [--pacbio_adapters PACBIO_ADAPTERS] [--error-rate ERROR_RATE] [--overlap OVERLAP] [--match-read-wildcards]
                          [--revcomp] [--discard-trimmed] [--min-length MIN_LENGTH] --out READS_OUT --stats STATS [--logs LOGS_DIRECTORY]

options:
  -h, --help            show this help message and exit
  -t THREADS, --threads THREADS
  -m MEM_GB, --mem MEM_GB
                        Intended maximum RAM in GB. NOTE: some steps don't allow memory usage to be specified by the user.
  -n                    Dry run

Input:
  --bam BAM             Input .bam file
  --pacbio_adapters PACBIO_ADAPTERS

cutadapt options:
  --error-rate ERROR_RATE
  --overlap OVERLAP
  --match-read-wildcards
  --revcomp
  --discard-trimmed
  --min-length MIN_LENGTH
                        Minimum length read to output. Default is 1, i.e. keep all reads.

Output:
  --out READS_OUT       Combined output in fastq.gz
  --stats STATS         Stats output (json)
  --logs LOGS_DIRECTORY
                        Log output directory. Default: logs are discarded.
```
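
If you want to call the tool from a Python script, one possible approach is to assemble the argument list and pass it to `subprocess`. The sketch below uses only flags that appear in the help output; the `build_command` and `run_qc` helpers and the example paths are hypothetical and not part of the package.

```python
import shlex
import subprocess


def build_command(bam, reads_out, stats, logs=None, threads=None,
                  min_length=None, dry_run=False):
    """Assemble an atol-qc-raw-pacbio command line as a list of strings.

    Hypothetical helper: only the required flags and a few optional
    ones from the help output are covered here.
    """
    cmd = [
        "atol-qc-raw-pacbio",
        "--bam", str(bam),
        "--out", str(reads_out),
        "--stats", str(stats),
    ]
    if logs is not None:
        cmd += ["--logs", str(logs)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if min_length is not None:
        cmd += ["--min-length", str(min_length)]
    if dry_run:
        cmd.append("-n")
    return cmd


def run_qc(cmd):
    """Run the command; requires atol-qc-raw-pacbio on PATH."""
    return subprocess.run(cmd, check=True)


cmd = build_command(
    "data/reads.bam",
    "results/filtered_reads.fastq.gz",
    "results/stats.json",
    logs="results/logs",
    threads=8,
    dry_run=True,
)

# Print the command for inspection; call run_qc(cmd) to execute it.
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.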
