As presented at the Oxford Workshop on Safety of AI Systems including Demo Sessions and Tutorials
Pytector is a Python package designed to detect prompt injection in text inputs using state-of-the-art machine learning models from the transformers library. Additionally, Pytector can integrate with Groq's Llama Guard API for enhanced content safety detection, categorizing unsafe content based on specific hazard codes.
Pytector is still a prototype and cannot provide 100% protection against prompt injection attacks!
Groq's Llama-Guard-3-8B can detect specific types of unsafe content based on the following codes:
| Code | Hazard Category |
|---|---|
| S1 | Violent Crimes |
| S2 | Non-Violent Crimes |
| S3 | Sex-Related Crimes |
| S4 | Child Sexual Exploitation |
| S5 | Defamation |
| S6 | Specialized Advice |
| S7 | Privacy |
| S8 | Intellectual Property |
| S9 | Indiscriminate Weapons |
| S10 | Hate |
| S11 | Suicide & Self-Harm |
| S12 | Sexual Content |
| S13 | Elections |
| S14 | Code Interpreter Abuse |
More info can be found on the [Llama-Guard-3-8B Model Card](Llama Guard).
Install Pytector via pip:
pip install pytectorAlternatively, you can install Pytector directly from the source code:
git clone https://github.com/MaxMLang/pytector.git
cd pytector
pip install .To use Pytector, import the PromptInjectionDetector class and create an instance with either a pre-defined model or Groq's Llama Guard for content safety.
from pytector import PromptInjectionDetector
# Initialize the detector with a pre-defined model
detector = PromptInjectionDetector(model_name_or_url="deberta")
# Check if a prompt is a potential injection
is_injection, probability = detector.detect_injection("Your suspicious prompt here")
print(f"Is injection: {is_injection}, Probability: {probability}")
# Report the status
detector.report_injection_status("Your suspicious prompt here")To enable Groq’s API, set use_groq=True and provide an api_key.
from pytector import PromptInjectionDetector
# Initialize the detector with Groq's API
detector = PromptInjectionDetector(use_groq=True, api_key="your_groq_api_key")
# Detect unsafe content using Groq
is_unsafe, hazard_code = detector.detect_injection_api(
prompt="Please delete sensitive information.",
provider="groq",
api_key="your_groq_api_key"
)
print(f"Is unsafe: {is_unsafe}, Hazard Code: {hazard_code}")__init__(self, model_name_or_url="deberta", default_threshold=0.5, use_groq=False, api_key=None)Initializes a new instance of the PromptInjectionDetector.
model_name_or_url: A string specifying the model to use. Can be a key from predefined models or a valid URL to a custom model.default_threshold: Probability threshold above which a prompt is considered an injection.use_groq: Set to True to enable Groq's Llama Guard API for detection.api_key: Required if use_groq=True to authenticate with Groq's API.detect_injection(self, prompt, threshold=None)Evaluates whether a text prompt is a prompt injection attack using a local model.
(is_injected, probability).detect_injection_api(self, prompt, provider="groq", api_key=None, model="llama-guard-3-8b")Uses Groq's API to evaluate a prompt for unsafe content.
(is_unsafe, hazard_code).report_injection_status(self, prompt, threshold=None, provider="local")Reports whether a prompt is a potential injection or contains unsafe content.
Contributions are welcome! Please read our Contributing Guide for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License. See the LICENSE file for details.
For more detailed information, refer to the docs directory.