Metadata-Version: 2.2
Name: cifi
Version: 0.2.3
Summary: CiFi - toolkit for downstream processing of CiFi long reads.
Keywords: bioinformatics,chromatin,chromosome-conformation-capture,cifi,hi-c,pore-c,porec,hic,restriction-enzyme,paired-end,long-reads,pacbio,hifi
Author-Email: Mohamed Abuelanin <mabuelanin@gmail.com>
Maintainer-Email: Mohamed Abuelanin <mabuelanin@gmail.com>
License: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: C++
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Project-URL: Homepage, https://github.com/mr-eyes/cifi-toolkit
Project-URL: Documentation, https://github.com/mr-eyes/cifi-toolkit#readme
Project-URL: Repository, https://github.com/mr-eyes/cifi-toolkit
Project-URL: Issues, https://github.com/mr-eyes/cifi-toolkit/issues
Project-URL: Changelog, https://github.com/mr-eyes/cifi-toolkit/releases
Requires-Python: >=3.9
Requires-Dist: click>=8.0
Requires-Dist: jinja2>=3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Provides-Extra: pdf
Requires-Dist: weasyprint>=60.0; extra == "pdf"
Provides-Extra: plots
Requires-Dist: matplotlib>=3.5; extra == "plots"
Provides-Extra: all
Requires-Dist: weasyprint>=60.0; extra == "all"
Requires-Dist: matplotlib>=3.5; extra == "all"
Requires-Dist: pytest>=7.0; extra == "all"
Requires-Dist: pytest-cov>=4.0; extra == "all"
Description-Content-Type: text/markdown

# CiFi

Toolkit for downstream processing of CiFi long reads.

https://dennislab.org/cifi

## Install

```bash
pip install cifi
# or
mamba install bioconda::cifi
```

## Commands

| Command | Description |
|---------|-------------|
| `cifi qc` | Sample reads and report enzyme site frequency, fragment sizes, and estimated yield |
| `cifi digest` | In-silico digestion → paired-end FASTQ (all pairwise contacts) |
| `cifi filter` | MAPQ-based filtering of aligned paired-end BAM |
| `cifi enzymes` | List built-in restriction enzymes |

### qc

```bash
cifi qc reads.bam -e HindIII -o qc_out
cifi qc reads.bam -e NlaIII -n 50000 -o qc_out    # sample 50k reads
cifi qc reads.bam -e HindIII -n 0 -o qc_out        # all reads
cifi qc reads.bam --site GANTC --cut-pos 1 -o qc_out  # custom site
```

Writes an output directory containing an HTML report, JSON, TSV tables, distribution plots (PNG), and a multi-page PDF.

### digest

```bash
cifi digest reads.bam -e HindIII -o output
cifi digest reads.fq.gz -e NlaIII -o output -m 5 --gzip
cifi digest reads.bam --site GANTC --cut-pos 1 -o output
```

Produces `{prefix}_R1.fastq` and `{prefix}_R2.fastq` (gzipped when `--gzip` is given), plus an HTML report and JSON stats.

### filter

```bash
cifi filter aligned.bam -o filtered.bam -q 30
```

Keeps only properly paired reads in which both mates meet the MAPQ threshold set with `-q`.
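
The filtering rule above can be illustrated with a small, self-contained sketch. It is not the toolkit's implementation: it works on plain `(qname, flag, mapq)` tuples rather than a BAM file, the function name `filter_pairs` is hypothetical, and it assumes "meet the threshold" means MAPQ greater than or equal to the cutoff. Only the proper-pair bit (`0x2`) comes from the SAM specification.

```python
from collections import defaultdict

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned


def filter_pairs(records, min_mapq=30):
    """Keep pairs where both mates are properly paired and pass min_mapq.

    records: iterable of (qname, flag, mapq) tuples (a stand-in for BAM records).
    Assumption: a read passes when mapq >= min_mapq.
    """
    # Group mates by read name so the pair can be judged as a unit.
    by_name = defaultdict(list)
    for rec in records:
        by_name[rec[0]].append(rec)

    kept = []
    for mates in by_name.values():
        both_pass = len(mates) == 2 and all(
            (flag & PROPER_PAIR) and mapq >= min_mapq
            for _, flag, mapq in mates
        )
        if both_pass:
            kept.extend(mates)
    return kept


records = [
    ("r1", 99, 60), ("r1", 147, 60),   # proper pair, both mates high MAPQ
    ("r2", 99, 60), ("r2", 147, 10),   # proper pair, one mate below threshold
    ("r3", 65, 60), ("r3", 129, 60),   # high MAPQ but not flagged as proper pair
]
kept_names = sorted({name for name, _, _ in filter_pairs(records)})
```

In this toy input only `r1` survives: `r2` is dropped because one mate falls below the threshold, and `r3` because the proper-pair bit is not set.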

## Enzymes

Built-in enzymes:

| 4-cutters | 6-cutters |
|-----------|-----------|
| NlaIII (CATG) | HindIII (AAGCTT) |
| DpnII (GATC) | |
| MboI (GATC) | |
| Sau3AI (GATC) | |

Any recognition site can be specified with `--site` and `--cut-pos`, including IUPAC degenerate bases (N, R, Y, W, S, M, K, B, D, H, V).
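
To make the custom-site options concrete, here is a minimal sketch of how a degenerate recognition site could be matched and cut. It is an illustration, not the toolkit's code: the helper names `site_to_regex` and `digest` are hypothetical, and it assumes `--cut-pos` is the 0-based offset from the start of the site at which the cut falls (so `GANTC` with cut position 1 cuts as `G^ANTC`).

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_to_regex(site):
    """Compile a recognition site; the lookahead lets overlapping sites match."""
    pattern = "".join(IUPAC[base] for base in site.upper())
    return re.compile(f"(?=({pattern}))")


def digest(seq, site, cut_pos):
    """Split seq at every occurrence of site.

    Assumption: cut_pos is the 0-based offset within the site where the cut falls.
    """
    regex = site_to_regex(site)
    cuts = [m.start() + cut_pos for m in regex.finditer(seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments that arise when a cut lands on a sequence end.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


fragments = digest("AAGACTCTTGATTCGG", "GANTC", 1)
# fragments == ["AAG", "ACTCTTG", "ATTCGG"]
```

The sequence contains two `GANTC` matches (`GACTC` and `GATTC`), so it is cut into three fragments.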

## How it works

CiFi reads are concatemers of restriction fragments from genomic regions in 3D proximity. The toolkit finds all enzyme cut sites in each read, extracts fragments, and generates every pairwise combination as a pseudo paired-end read:

```
Read with 4 fragments: [A]-[B]-[C]-[D]
Pairs: A-B, A-C, A-D, B-C, B-D, C-D  (6 pairs)
```
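
The pairing step can be sketched in a few lines of Python. This is an illustrative sketch rather than the toolkit's implementation, and the function name `pairwise_contacts` is hypothetical; it only shows that a read with n fragments yields n(n-1)/2 pairs, kept in read order.

```python
from itertools import combinations


def pairwise_contacts(fragments):
    """Return every unordered pair of fragments, preserving read order."""
    return list(combinations(fragments, 2))


pairs = pairwise_contacts(["A", "B", "C", "D"])
labels = ["-".join(pair) for pair in pairs]
# labels == ["A-B", "A-C", "A-D", "B-C", "B-D", "C-D"]
```

Because the count grows quadratically, a read with 10 fragments would produce 45 pairs.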

## Citation

Coming soon.
