qget is an Apache2 licensed library, written in Python, for downloading web resources in asynchronous manner as fast as possible.
Under the hood it benefits from asyncio and aiohttp to create multiple
simultaneous connections to resource and download it using buffered part files.
Table of Contents
wget vs qgettqdm)chunk_bytes and max_part_mbasyncio by qget_coro coroutinewget vs qgetConsider simple nginx configuration fragment like this:
http {
server {
...
limit_rate 5m;
...
}
}Now let's compare download statistics for wget and qget for 1000MB file and configuration
mentioned above:
| Application | Total time [s] | AVG Speed [MB/s] | Details |
|---|---|---|---|
wget |
251.34 | 3.98 | |
qget |
16.00
|
95.97
|
Connection limit test: 5.00s
Download: 10.42s
Parts rewrite: 0.58s
|
Conclusion:
For simple rate limiting (per connection) qget allows to achieve multiple times faster download speed
based on user internet connection speed, number of simultaneous requests and resource server configuration.
In example above qget achieved over 24x download speed of wget.
For more complicated cases (e.g. connections limit per IP) automatic connection limit measurement test was created to calculate how many simultaneous requests could be achieved before server rejects next one.
You can download selected binary files from Releases. Available versions:
To install qget module, simply:
$ pip install qgetMake sure Anaconda is installed.
To build on Windows (in Anaconda Prompt):
$ build.batTo build on POSIX (libc-bin and binutils packages are required):
$ build.shFunction arguments:
url (str): The URL to download the resource.
filepath (str, optional): Output path for downloaded resource.
If not set it points to current working directory and filename from url. Defaults to None.
override (bool, optional): Flag if existing output file should be override. Defaults to False.
auth (str, optional): String of user:password pair for SSL connection. Defaults to None.
verify_ssl (bool, optional): Flag if SSL certificate validation should be performed. Defaults to True.
mock_browser (bool, optional): Flag if User-Agent header should be added to request. Defaults to True.
Default User-Agent string: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36'
proxy_url (str, optional): HTTP/SOCKS4/SOCKS5 proxy url in format 'protocol://user:password@ip:port'.
Defaults to None.
headers: (Dict[str, str], optional): Custom headers to be sent. Default to None.
If set user can specify own User-Agent and Accept headers, otherwise defaults will be used.
progress_ref (ProgressState, optional): Reference to progress state.
If passed all parts bytes and rewrite status will be updated in it. Defaults to None.
max_connections (int, optional): Maximum amount of asynchronous HTTP connections. Defaults to 50.
connection_test_sec (int, optional): Maximum time in seconds assigned to test
how much asynchronous connections can be achieved to URL.
If set to 0 test will be omitted. Defaults to 5.
chunk_bytes (int, optional): Chunk of data read in iteration from url and save to part file in bytes.
Will be used also when rewriting parts to output file. If limit is supplied this can be override for
stream iteration. Defaults to 2621440.
max_part_mb (float, optional): Desirable (if possible) max part size in megabytes. Defaults to 5.
retries (int, optional): Retries number for part download. Defaults to 10.
retry_sec (int, optional): Time to wait between retries of part download in seconds. Defaults to 1.
limit (str, optional): Download rate limit in MBps. Can be supplied with unit as "Nunit", eg. "5M".
Valid units (case insensitive): b, k, m, g, kb, mb, gb. 0 bytes will be treat as no limit.
Defaults to None.
tmp_dir (str, optional): Temporary directory path. If not set it points to OS tmp directory.
Defaults to None.
debug (bool, optional): Debug flag. Defaults to False.
To use in code simply import module function:
from qget import qget
url = "https://speed.hetzner.de/100MB.bin"
qget(url)To use in code with own loop and asyncio:
import asyncio
from qget import qget_coro
async def main(loop):
url = "https://speed.hetzner.de/100MB.bin"
download_task = loop.create_task(qget_coro(url))
await download_task
# Or just
# await qget_coro(url)
loop = asyncio.get_event_loop()
loop.run_until_complete(main(loop))
loop.close()Usage for progress hooks (by default hooks are used to display tqdm progress bar):
from qget import ProgressState, qget
def print_download_progress(progress: ProgressState) -> None:
print(f"Download: {progress.get_download_progress():.2f}%", end="r")
if progress.get_download_bytes() == progress.total_bytes:
print()
def print_rewrite_progress(progress: ProgressState) -> None:
print(f"Rewrite: {progress.get_rewrite_progress():.2f}%", end="r")
if progress.rewrite_bytes == progress.total_bytes:
print()
url = "https://speed.hetzner.de/100MB.bin"
progress = ProgressState(
on_download_progress=print_download_progress,
on_rewrite_progress=print_rewrite_progress
)
qget(url, progress_ref=progress)usage: qget [-h] [-o FILEPATH] [-f] [-a AUTH] [--no-verify] [--no-mock]
[--proxy PROXY_URL] [-H HEADER] [-c MAX_CONNECTIONS]
[--test CONNECTION_TEST_SEC] [--bytes CHUNK_BYTES] [--part MAX_PART_MB]
[--retries RETRIES] [--retry_sec RETRY_SEC] [--limit LIMIT] [--tmp TMP_DIR]
[--debug] [-v]
url
Downloads resource from given URL in buffered parts using asynchronous HTTP connections
with aiohttp session.
positional arguments:
url URL of resource
options:
-h, --help show this help message and exit
-o FILEPATH, --output FILEPATH
Output path for downloaded resource.
-f, --force Forces file override for output.
-a AUTH, --auth AUTH String of user:password pair for SSL connection.
--no-verify Disables SSL certificate validation.
--no-mock Disables default User-Agent header.
--proxy PROXY_URL HTTP/SOCKS4/SOCKS5 proxy url in format
'protocol://user:password@ip:port'.
-H HEADER, --header HEADER
Custom header in format 'name:value'. Can be supplied multiple
times.
-c MAX_CONNECTIONS, --connections MAX_CONNECTIONS
Maximum amount of asynchronous HTTP connections.
--test CONNECTION_TEST_SEC
Maximum time in seconds assigned to test how much asynchronous
connections can be achieved to URL. Use 0 to skip.
--bytes CHUNK_BYTES Chunk of data read in iteration from url and save to part file in
bytes. Will be used also when rewriting parts to output file.
--part MAX_PART_MB Desirable (if possible) max part size in megabytes.
--retries RETRIES Retries number for part download.
--retry_sec RETRY_SEC Time to wait between retries of part download in seconds.
--limit LIMIT Download rate limit in MBps. Can be supplied with unit as 'Nunit',
eg. '5M'. Valid units (case insensitive): b, k, m, g, kb, mb, gb.
0 bytes will be treat as no limit.
--tmp TMP_DIR Temporary directory path. If not set it points to OS tmp
directory.
--debug Debug flag.
-v, --version Displays actual version of qget.
Can be used also from python module with same arguments as for binary:
python -m qget https://speed.hetzner.de/100MB.binMultiple headers can be supplied as follow:
python -m qget -H 'name1:value1' -H 'name2:value2' https://speed.hetzner.de/100MB.binLimiter tries to reduce rate of downloaded bytes by adding pauses between iteration over resource content.
If very low download rate is requested try to lower connections amount (max_connections or --connections
MAX_CONNECTIONS) to achieve better accuracy for limit.
Part size is calculated in runtime based on resource size in bytes and maximum amount of asynchronous
connections set by user (or connection test). Max part size param (max_part_mb or --part MAX_PART_MB)
supplied by user is use as a top limit for calculated value.
part_bytes = min(resource_bytes/connections, max_part_bytes)