phinde - generic web search engine

Self-hosted search engine you can use for your static blog or just about any other website that needs search functionality.

My live instance is at http://search.cweiske.de/ and indexes my website, blog and all linked URLs.

Features

Crawler and indexer with the ability to run many in parallel
Shows and highlights text that contains search words
Boolean search queries:
- foo bar searches for foo AND bar
- foo OR bar
- title:foo searches for foo only in the page title
Facets for tag, domain, language and type
Date search:
- before:2016-08-30 - modification date before that day
- after:2016-08-30 - modified after that day
- date:2016-08-30 - exact modification day match
Site search
- Query: foo bar site:example.org/dir/
- or use the site GET parameter: /?q=foo&site=example.org/dir
OpenSearch support with HTML and Atom result lists
Instant indexing with WebSub (formerly PubSubHubbub)
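
The query syntax above can also be used from the command line. The following is a minimal sketch and not part of phinde: it assumes an instance at the placeholder address http://phinde.example.org/ whose root URL accepts the q and site GET parameters listed above. The function names site_query and phinde_search are made up for illustration.

```shell
#!/bin/sh
# Placeholder address of a phinde instance; replace with your own.
PHINDE_URL="http://phinde.example.org/"

# Append a site: restriction to the search words,
# producing e.g. "foo bar site:example.org/dir/".
site_query() {
    site=$1
    shift
    printf '%s site:%s\n' "$*" "$site"
}

# Send a query to the instance. curl -G turns the --data-urlencode
# values into a URL-encoded GET query string.
# $1: query string, $2: optional value for the site parameter
phinde_search() {
    if [ -n "${2:-}" ]; then
        curl -sG --data-urlencode "q=$1" --data-urlencode "site=$2" "$PHINDE_URL"
    else
        curl -sG --data-urlencode "q=$1" "$PHINDE_URL"
    fi
}

# Print the query that would be sent; no network access happens here.
site_query example.org/dir/ foo bar
```

Calling phinde_search "title:foo" example.org/dir would then request the HTML result list for that query, provided the instance is reachable.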

Dependencies

PHP 8.x
Elasticsearch 2.0
MySQL or MariaDB for WebSub subscriptions
Gearman (Debian 9: gearman-job-server, not gearman-server)
gearadmin command line tool (gearman-tools package)
PHP Gearman extension
Some PHP libraries that get installed with composer

Setup

Install and run Elasticsearch and Gearman
Install php-gearman and gearman-tools

Get a local copy of the code:

$ git clone https://git.cweiske.de/phinde.git phinde

Install dependencies via composer:
```
$ composer install --no-dev
```
Point your webserver's document root to phinde's www directory
Copy data/config.php.dist to data/config.php and adjust it. Make sure you add your domain to the crawl whitelist.
Create a MySQL database and import the schema from data/schema.sql
Run bin/setup.php which sets up the Elasticsearch schema

Put your homepage into the queue:

$ ./bin/process.php http://example.org/

Start at least one worker to process the crawl+index queue:
```
$ ./bin/phinde-worker.php
```
Check phinde's status page in your browser. Both the number of open tasks and the number of workers should be greater than 0.

Re-index when your site changes

When your site changes, the search engine needs to re-crawl and re-index the affected pages.

Simply tell phinde that something changed by running:

$ ./bin/process.php http://example.org/foo.htm

phinde supports HTML pages and Atom feeds, so if your blog has a feed it's enough to let phinde reindex that one. It will find all linked pages automatically.
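
If several pages changed at once, the same command can be run in a loop. This is a rough sketch under the assumption that it is started from the phinde directory; the function name queue_urls, the PROCESS variable and the file name urls.txt are hypothetical.

```shell
#!/bin/sh
# Command that queues one URL; defaults to phinde's bin/process.php.
PROCESS=${PROCESS:-./bin/process.php}

# Queue every URL listed on stdin (one per line) for re-indexing.
# Blank lines and lines starting with # are skipped.
queue_urls() {
    while IFS= read -r url; do
        case $url in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps the command from consuming the URL list.
        $PROCESS "$url" </dev/null || echo "failed to queue: $url" >&2
    done
}

# Usage (urls.txt is a hypothetical file name):
#   queue_urls < urls.txt
```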

Website integration

Adding a simple search form to your website is easy. It needs two things:

A <form> tag with an action that points to the phinde instance
A text field named q for the search words

Example:

<form method="get" action="http://phinde.example.org">
  <input type="text" name="q" placeholder="Search text"/>
  <button type="submit">Search</button>
</form>

System service

When using systemd, you can let it run multiple worker instances when the system boots up:

Copy files data/systemd/phinde*.service into /etc/systemd/system/
Adjust user and group names, and the work directories

Enable three worker processes:

$ systemctl daemon-reload
$ systemctl enable phinde@1
$ systemctl enable phinde@2
$ systemctl enable phinde@3
$ systemctl enable phinde
$ systemctl start phinde

Now three workers are running. Restarting the phinde service also restarts the workers.
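
To run a different number of workers, the same sequence of commands can be generated for any count. The sketch below only prints the commands so they can be reviewed first. It assumes that additional phinde@N instances behave like the three shown above; worker_commands is a made-up helper name.

```shell
#!/bin/sh
# Print the systemctl commands that enable N worker instances.
# $1: number of workers
worker_commands() {
    count=$1
    echo "systemctl daemon-reload"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "systemctl enable phinde@$i"
        i=$((i + 1))
    done
    echo "systemctl enable phinde"
    echo "systemctl start phinde"
}

# Show the commands for five workers.
worker_commands 5
```

After checking the output, it can be piped into a root shell to actually run the commands.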

Cron job

Run bin/renew-subscriptions.php once a day with cron. It will renew the WebSub subscriptions.
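
A crontab entry for this could look like the line printed by the following sketch. The installation directory /var/www/phinde and the time of day (03:30) are assumptions, and cron_line is a hypothetical helper; adjust all of them to your setup.

```shell
#!/bin/sh
# Hypothetical installation directory; adjust to where phinde lives.
PHINDE_DIR=${PHINDE_DIR:-/var/www/phinde}

# Build a crontab line that runs the renewal script once a day at 03:30.
# Fields: minute hour day-of-month month day-of-week command
cron_line() {
    printf '30 3 * * * cd %s && php bin/renew-subscriptions.php\n' "$PHINDE_DIR"
}

# Print the line; add it with "crontab -e" for the user that runs phinde.
cron_line
```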

Howto

Delete index data from one domain:

$ curl -iv -XDELETE -H 'Content-Type: application/json' -d '{"query":{"term":{"domain":"example.org"}}}' http://127.0.0.1:9200/phinde/_query

This uses the delete-by-query plugin of Elasticsearch 2.0, see https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html
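
When this has to be done for several domains, the command above can be wrapped in a small function. This is a sketch only: domain_query and delete_domain are made-up names, and the index address is taken from the example above. The domain is validated before it is placed into the JSON body, so no escaping is needed.

```shell
#!/bin/sh
# Address of the Elasticsearch index, as in the command above.
ES_INDEX_URL=${ES_INDEX_URL:-http://127.0.0.1:9200/phinde}

# Print the delete-by-query request body for one domain.
# Only letters, digits, dots and hyphens are accepted.
domain_query() {
    case $1 in
        ''|*[!A-Za-z0-9.-]*)
            echo "invalid domain: $1" >&2
            return 1
            ;;
    esac
    printf '{"query":{"term":{"domain":"%s"}}}\n' "$1"
}

# Delete all indexed documents of one domain.
delete_domain() {
    body=$(domain_query "$1") || return 1
    curl -s -XDELETE -H 'Content-Type: application/json' \
        -d "$body" "$ES_INDEX_URL/_query"
}

# Print the request body only; nothing is deleted here.
domain_query example.org
```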

Subscribe to a website/feed

Phinde supports WebSub to subscribe to changes of a website. When phinde gets notified by the website's hub about changes, it will immediately crawl and index the changed pages.

Subscribe to a website's feed:

$ php bin/subscribe.php http://example.org/feed.atom

Phinde will determine the website's hub and send a registration request to it.

The status page shows the number of working subscriptions and the number of open ones.

Unsubscribing also happens on command line:

$ php bin/unsubscribe.php http://example.org/feed.atom

About phinde

Source code

phinde's source code is available from http://git.cweiske.de/phinde.git or the mirror on GitHub.

License

phinde is licensed under the AGPL v3 or later.

Author

phinde was written by Christian Weiske.

Additional Information

Version 1.0.0
Type Other source code
Update Time 2025-03-13
size 87.03KB
From Github
