CLARIN Federated Content Search v3.0 Aggregator – Augmenting your Search Engine
The CLARIN Federated Content Search (CLARIN-FCS) introduces an interface specification that decouples the search engine functionality from its exploitation (i.e. user interfaces and third-party applications), so that services can access heterogeneous search engines in a uniform way.
The Aggregator v3.0 is running at The National Swedish Language Bank's text division as well as at CLARIN.
The Specification for Federated Content Search v2.0 is available as a PDF document. For more details, visit the CLARIN FCS - Technical Details page.
For a detailed list of changes, please take a look at CHANGELOG.md.
Backwards compatibility gives you, as a Centre search engine maintainer, a smooth transition to the new features and capabilities at your own convenience.
These additions to CLARIN-FCS not only give power users more possibilities when querying repositories, but also make it easier for less experienced users to explore different corpora.
If you have any kind of RESTful API to your search engine, using the Korp Endpoint Reference Implementation as a starting point should be the way to go. If you are using Korp specifically, only a simple adaptation to your corpora and tagsets should be needed. In any case, do not forget to look at the tests.
To test your Endpoint you can point the IDS Endpoint Tester (code) to your Endpoint.
There is also an Endpoint developer's tutorial available.
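Before pointing a full tester at an Endpoint, a quick manual check can help. The following is a minimal sketch, assuming the Endpoint answers SRU explain requests and accepts the `x-fcs-endpoint-description=true` parameter described in the FCS specification. The base URL and the function names `explain_url` and `check_endpoint` are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Hypothetical Endpoint base URL (an assumption); replace it with your own.
ENDPOINT_URL="${ENDPOINT_URL:-http://localhost:8080/sru}"

# Build the SRU explain URL that also requests the FCS endpoint description.
# Assumes the base URL does not already contain a query string.
explain_url() {
    printf '%s?operation=explain&x-fcs-endpoint-description=true\n' "$1"
}

# Fetch the explain response and print it.
# Returns non-zero on HTTP errors or after a 10 second timeout.
check_endpoint() {
    curl --fail --silent --show-error --max-time 10 "$(explain_url "$1")"
}
```

Calling `check_endpoint "$ENDPOINT_URL"` should print an XML explain response if the Endpoint is reachable. This is no substitute for the Endpoint Tester, which checks conformance in far more detail.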
Building the FCS Aggregator takes a few simple steps (if you have not changed anything, just skip to step 3):
1. ./build.sh --npm
2. ./build.sh --jsx
3. ./build.sh --jar
The frontend (React) and backend (jersey servlet) are then built using node and maven.
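The three build steps can also be chained so that a failure stops the sequence early. This is a minimal sketch, not part of the repository: the function name `build_all` is hypothetical, and it assumes `build.sh` exits with a non-zero status when a step fails.

```shell
#!/bin/sh
# Run the three build steps in order and stop at the first failure.
# The script path defaults to ./build.sh in the current directory.
build_all() {
    build_script="${1:-./build.sh}"
    for step in --npm --jsx --jar; do
        echo "Running: $build_script $step"
        "$build_script" "$step" || {
            echo "Step $step failed" >&2
            return 1
        }
    done
}
```

Run `build_all` from the repository root. For an unchanged checkout, calling `./build.sh --jar` directly is enough, as noted in the steps.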
Check the aggregator_devel.yml configuration file. If you want to sideload your endpoint, simply add it to either additionalCQLEndpoints or additionalFCSEndpoints before running:
./build.sh --run
You might also want to change the paths to your cache files in AGGREGATOR_FILE_PATH and AGGREGATOR_FILE_PATH_BACKUP, respectively.
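To review these settings before starting the Aggregator, it can help to list the relevant lines of the configuration file. The sketch below only searches for the setting names mentioned above. The function name `show_aggregator_settings` is hypothetical, and the default location of aggregator_devel.yml (the current directory) is an assumption.

```shell
#!/bin/sh
# Print the lines (with line numbers) of the configuration file that mention
# the sideloading and cache path settings.
# The default file location is an assumption; pass the real path as $1.
show_aggregator_settings() {
    config_file="${1:-aggregator_devel.yml}"
    grep -nE 'additional(CQL|FCS)Endpoints|AGGREGATOR_FILE_PATH(_BACKUP)?' "$config_file"
}
```

`show_aggregator_settings path/to/aggregator_devel.yml` returns a non-zero status if none of the settings appear in the file, which may indicate that the wrong file was passed.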
You can then access the locally running Aggregator at http://localhost:4019/
See DEPLOYMENT.md for example deployment configurations and descriptions of the settings.