现在可以在CageCat Web服务器上安装CBLASTER和CRENKER。
基因簇比较图生成器
熟料是一种容易生成出版物质量基因簇比较图的管道。

给定一组GenBank文件,熟料将自动提取蛋白质翻译,在每个群集中的序列之间执行全局对齐,确定基于群集相似性的最佳显示顺序,并生成可以在以前进行广泛调整的交互式可视化(使用clustermmap.js)被导出为SVG文件。
熟料的设计主要是一种可视化同源生物合成基因簇的组的简单方法,这些基因簇通常是很小的基因组区域,其基因不多(如示例GIF中)。它使用Biopython内置的对齐器在所有输入文件中执行所有基因的成对对齐,然后在浏览器中生成一个交互式SVG文档。对齐阶段的比对阶段的缩放范围很差,可用于多个基因的多个基因组,并且鉴于其将包含多少个SVG元素,所得的可视化也将非常慢。如果您希望将整个基因组保持一致,则使用为此目的构建的工具(例如仙人掌),您可能会更好地为您服务。

熟料可以通过PIP直接安装:
pip install clinker
通过从github克隆源代码:
git clone https://github.com/gamcil/clinker.git
cd clinker
pip install .
或者,通过conda:
conda create -n clinker -c conda-forge -c bioconda clinker-py
conda activate clinker
如果您发现熟料有用,请引用:
clinker & clustermap.js: Automatic generation of gene cluster comparison figures.
Gilchrist, C.L.M., Chooi, Y.-H., 2020.
Bioinformatics. doi: https://doi.org/10.1093/bioinformatics/btab007
运行熟料可以很简单:
clinker clusters/*.gbk
这将在文件夹中的所有GenBank文件中读取,对齐它们,并将对齐方式打印到终端。要生成可视化,请使用-p/--plot参数:
clinker clusters/*.gbk -p <optional: file name to save static HTML>
熟料还可以解析GFF3文件:
clinker cluster1.gff3 cluster2.gff3 -p
注意:必须在同一目录中找到相同名称的相应fasta文件(扩展名为“ .fa”,“ .fsa”,“ .fna”,“ .fasta”或“ cluster1.fa ”)。 cluster1.fa和cluster2.fa 。
请参阅-h/--help有关更多信息:
usage: clinker [-h] [--version] [-r RANGES [RANGES ...]] [-gf GENE_FUNCTIONS] [-na] [-i IDENTITY] [-j JOBS] [-s SESSION] [-ji JSON_INDENT] [-f] [-o OUTPUT] [-p [PLOT]] [-dl DELIMITER] [-dc DECIMALS] [-hl] [-ha] [-mo MATRIX_OUT] [-ufo] [files ...]
clinker: Automatic creation of publication-ready gene cluster comparison figures.
clinker generates gene cluster comparison figures from GenBank files. It performs pairwise local or global alignments between every sequence in every unique pair of clusters and generates interactive, to-scale comparison figures using the clustermap.js library.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
Input options:
files Gene cluster GenBank files
-r RANGES [RANGES ...], --ranges RANGES [RANGES ...]
Scaffold extraction ranges. If a range is specified, only features within the range will be extracted from the scaffold. Ranges should be formatted like: scaffold:start-end (e.g. scaffold_1:15000-40000)
-gf GENE_FUNCTIONS, --gene_functions GENE_FUNCTIONS
2-column CSV file containing gene functions, used to build gene groups from same function instead of sequence similarity (e.g. GENE_001,PKS-NRPS).
Alignment options:
-na, --no_align Do not align clusters
-i IDENTITY, --identity IDENTITY
Minimum alignment sequence identity [default: 0.3]
-j JOBS, --jobs JOBS Number of alignments to run in parallel (0 to use the number of CPUs) [default: 0]
Output options:
-s SESSION, --session SESSION
Path to clinker session
-ji JSON_INDENT, --json_indent JSON_INDENT
Number of spaces to indent JSON [default: none]
-f, --force Overwrite previous output file
-o OUTPUT, --output OUTPUT
Save alignments to file
-p [PLOT], --plot [PLOT]
Plot cluster alignments using clustermap.js. If a path is given, clinker will generate a portable HTML file at that path. Otherwise, the plot will be served dynamically using Python's HTTP server.
-dl DELIMITER, --delimiter DELIMITER
Character to delimit output by [default: human readable]
-dc DECIMALS, --decimals DECIMALS
Number of decimal places in output [default: 2]
-hl, --hide_link_headers
Hide alignment column headers
-ha, --hide_aln_headers
Hide alignment cluster name headers
-mo MATRIX_OUT, --matrix_out MATRIX_OUT
Save cluster similarity matrix to file
Visualisation options:
-ufo, --use_file_order
Display clusters in order of input files
Example usage
-------------
Align clusters, plot results and print scores to screen:
$ clinker files/*.gbk
Only save gene-gene links when identity is over 50%:
$ clinker files/*.gbk -i 0.5
Save an alignment session for later:
$ clinker files/*.gbk -s session.json
Save alignments to file, in comma-delimited format, with 4 decimal places:
$ clinker files/*.gbk -o alignments.csv -dl "," -dc 4
Generate visualisation:
$ clinker files/*.gbk -p
Save visualisation as a static HTML document:
$ clinker files/*.gbk -p plot.html
Cameron Gilchrist, 2020
默认情况下,熟料会自动为每组同源基因分配名称和颜色。您可以使用-gf/--gene_functions参数预先分配名称(IE函数),该参数采用2列逗号分隔的文件,例如:
GENE_001,Cytochrome P450
GENE_002,Cytochrome P450
GENE_003,Methyltransferase
GENE_004,Methyltransferase
这将产生两组,分别是细胞色素P450(Gene_001和002)和甲基转移酶(Gene_003,Gene_004)。如果发现其他同源基因,它们将自动添加到这些组中。
从熟料v0.0.28开始,您现在可以指定由-gf/--gene_functions参数定义的基因的颜色。为此,请使用-cm/--colour_map参数,该参数还采用一个包含组名称和十六进制颜色代码的2个CSV文件,例如:
Cytochrome P450,#FF0000
Methyltransferase,#0000FF