BioPal (v0.3) is a bioinformatics toolkit designed to process FASTA sequence files. The tool provides several functionalities such as splitting FASTA files, calculating protein parameters, querying taxonomic information from NCBI, and much more. It uses the tkinter library to provide a user-friendly graphical interface for easy file input and function selection.
Split FASTA File: Splits a FASTA file into multiple smaller files with a maximum of 99 sequences per file. This is sometimes required for post-processing.
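As a rough illustration of the splitting step, here is a minimal standard-library sketch. It is not BioPal's actual code: the function names `read_fasta` and `split_fasta` and the `<stem>_part<N>.fasta` output naming are assumptions made for this example.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line.strip():
                chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, max_per_file=99):
    """Write the records to numbered files next to the input; return the paths."""
    path = Path(path)
    records = list(read_fasta(path))
    outputs = []
    for start in range(0, len(records), max_per_file):
        part = start // max_per_file + 1
        # Hypothetical naming scheme: <input stem>_part<N>.fasta
        out_path = path.with_name(f"{path.stem}_part{part}.fasta")
        with open(out_path, "w") as out:
            for header, seq in records[start:start + max_per_file]:
                out.write(f">{header}\n{seq}\n")
        outputs.append(out_path)
    return outputs
```

Calling `split_fasta("proteins.fasta")` on a file with 200 sequences would, under these assumptions, write three files holding 99, 99, and 2 sequences.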
Header Resumer: Shortens long headers into standardized ones (e.g., based on the organism name from the NCBI format [organism=...]). It writes a new FASTA file with the short headers and their sequences, and a CSV file mapping each "old" header to its "new/short" header so that sequences can be tracked.
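The header-shortening idea can be sketched as follows. This is only an illustration under stated assumptions: the names `short_header` and `shorten_headers`, the fallback to the first word of the header, the numeric suffix for repeated organisms, and the CSV column names are all hypothetical and may differ from what BioPal does.

```python
import csv
import re

# Matches the NCBI-style tag, e.g. "[organism=Homo sapiens]".
ORGANISM_TAG = re.compile(r"\[organism=([^\]]+)\]")


def short_header(header, seen):
    """Build a short header from the organism tag, or fall back to the first word."""
    match = ORGANISM_TAG.search(header)
    if match:
        base = match.group(1).strip().replace(" ", "_")
    else:
        words = header.split()
        base = words[0] if words else "unnamed"
    seen[base] = seen.get(base, 0) + 1
    # Append a counter so repeated organisms still get distinct headers.
    return base if seen[base] == 1 else f"{base}_{seen[base]}"


def shorten_headers(headers, csv_path):
    """Return (old, new) pairs and write them to a two-column CSV file."""
    seen, mapping = {}, []
    for old in headers:
        mapping.append((old, short_header(old, seen)))
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["old_header", "new_header"])
        writer.writerows(mapping)
    return mapping
```

The returned mapping can then be used to rewrite the FASTA records with their new headers.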
ProtParam Calculator: Performs bulk calculations of protein properties (e.g., molecular weight and isoelectric point), similar to ExPASy's ProtParam tool, and writes the results to a CSV file. Note: This program IGNORES "X" characters in all sequences so that calculations run without errors. The set of output properties is currently hard-coded and cannot be changed by the user.
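A minimal sketch of this kind of bulk calculation, assuming Biopython's `ProteinAnalysis` class is used (the source does not say which calls BioPal makes). The helper names `clean_sequence`, `protein_row`, and `write_rows`, and the selection of output columns, are hypothetical.

```python
import csv

try:
    from Bio.SeqUtils.ProtParam import ProteinAnalysis
except ImportError:  # Biopython is not installed
    ProteinAnalysis = None


def clean_sequence(seq):
    """Upper-case the sequence and drop 'X' (unknown residue) characters."""
    return seq.upper().replace("X", "")


def protein_row(name, seq):
    """Compute a few ProtParam-style properties for one sequence."""
    if ProteinAnalysis is None:
        raise RuntimeError("Biopython is required: pip install biopython")
    cleaned = clean_sequence(seq)
    analysis = ProteinAnalysis(cleaned)
    return {
        "id": name,
        "length": len(cleaned),
        "molecular_weight": round(analysis.molecular_weight(), 2),
        "isoelectric_point": round(analysis.isoelectric_point(), 2),
        "gravy": round(analysis.gravy(), 3),
        "instability_index": round(analysis.instability_index(), 2),
    }


def write_rows(rows, csv_path):
    """Write the per-sequence dictionaries to a CSV file with a header row."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Removing "X" before the analysis matters because Biopython's molecular weight calculation rejects ambiguous residues.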
Fold Index Calculator: Queries the Proteopedia fold index tool for each sequence in the FASTA file and writes the fold index of each sequence to a CSV file.
Taxa Sage: Queries taxonomic information (Division, Order, Class, Family) for organisms in the FASTA file (requires the presence of [organism=...] in the header) and writes the results to a CSV file.
Microsintenic retriever: Starting from a FASTA file containing genes downloaded from NCBI's dataset collections, it parses the data and retrieves the GFF3 annotations for the 20 kbp surrounding each gene of interest. It writes the result to a readable CSV file, useful for evolutionary analysis.
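To make the windowing idea concrete, here is a small sketch that selects GFF3 gene features near a target gene. It is an assumption-laden illustration, not BioPal's implementation: the function names `parse_gff3` and `neighbours` are hypothetical, the target is looked up by its GFF3 `ID` attribute, and the 20 kbp is interpreted as a flank on each side of the gene.

```python
def parse_gff3(lines):
    """Yield feature dicts from GFF3 text lines, skipping comments and blanks."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        yield {
            "seqid": cols[0],
            "type": cols[2],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            "attrs": attrs,
        }


def neighbours(features, target_id, flank=20_000, feature_type="gene"):
    """Return features of one type that overlap the window around the target."""
    features = list(features)
    target = next(f for f in features if f["attrs"].get("ID") == target_id)
    lo, hi = target["start"] - flank, target["end"] + flank
    return [
        f for f in features
        if f["seqid"] == target["seqid"]
        and f["type"] == feature_type
        and f is not target
        and f["end"] >= lo
        and f["start"] <= hi
    ]
```

Each returned feature keeps its coordinates, strand, and attributes, so it can be written out as one CSV row per neighbouring gene.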
Help Menu: Provides a description of the tool's functionalities.
Exit: Safely closes the application. The program does not retain path or file information.
This tool requires the following Python libraries to be installed:
tkinter for the graphical user interface.
biopython for parsing FASTA files and retrieving taxonomic information.
requests and json for querying online databases like Proteopedia and NCBI.
You can install the necessary dependencies using:
pip install biopython requests
git clone https://github.com/SilicoGoBrr/BioPal.git
cd BioPal
pip install -r requirements.txt
python biopal.py
Select Input File: Click the "Select Input File" button to choose your FASTA file.
Choose an Operation: Click the button for the function you want to run (see the list of functions above).
The results will be saved in the same directory as the input file, with appropriate file names based on the operation performed.
The Taxa Sage feature uses NCBI’s Entrez API to retrieve taxonomic data. For this, you need to specify your email address, as required by NCBI's Entrez API.
In the code, locate the following line:
Entrez.email = "" # Add your email here
Replace it with your valid email address:
Entrez.email = "your.email@example.com"
This step is necessary for the Entrez API requests to work properly.
Rate-Limiting: The NCBI Entrez API may impose rate limits. To avoid being rate-limited, the tool introduces a short delay between API requests when using the Taxa Sage feature.
FASTA Format Requirements: The input FASTA file must contain [organism=...] tags for the Taxa Sage function to work correctly.
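The two notes above can be combined into a short sketch: extract the organism names from the headers, then query the NCBI taxonomy database with a pause after each request. This assumes Biopython's `Bio.Entrez` module (`esearch`, `efetch`, `read`). The helper names, the 0.4 second delay, and the choice of record fields are assumptions for illustration and may not match BioPal's code.

```python
import re
import time

ORGANISM_TAG = re.compile(r"\[organism=([^\]]+)\]")
RANKS = ("order", "class", "family")


def organisms_in(headers):
    """Return the unique organism names found in the headers, in first-seen order."""
    found = []
    for header in headers:
        match = ORGANISM_TAG.search(header)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


def ranks_from_record(record):
    """Pick the selected ranks out of one parsed Entrez taxonomy record."""
    by_rank = {e["Rank"]: e["ScientificName"] for e in record.get("LineageEx", [])}
    row = {rank: by_rank.get(rank, "") for rank in RANKS}
    row["division"] = record.get("Division", "")
    return row


def lookup(organisms, email, delay=0.4):
    """Query NCBI taxonomy for each organism, sleeping after every request."""
    from Bio import Entrez  # imported here so the helpers above work without Biopython

    Entrez.email = email
    results = {}
    for name in organisms:
        handle = Entrez.esearch(db="taxonomy", term=name)
        ids = Entrez.read(handle)["IdList"]
        handle.close()
        time.sleep(delay)  # assumed delay; tune to NCBI's current limits
        if not ids:
            continue  # organism name not found in the taxonomy database
        handle = Entrez.efetch(db="taxonomy", id=ids[0])
        results[name] = ranks_from_record(Entrez.read(handle)[0])
        handle.close()
        time.sleep(delay)
    return results
```

Headers without an [organism=...] tag are silently skipped by `organisms_in`, which is why the tag is required for the lookup to return anything.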
This program is provided "as is" without any warranties or guarantees. Use it at your own risk. Some features may require an active internet connection.