# Installation

Installation was only tested on Debian bullseye (on amd64).
The instructions below are for this system.
(Please adapt to other environments.)

## System packages

```
apt install pandoc tidy python3-systemd openjdk-17-jre-headless
apt install protobuf-compiler libprotobuf-dev build-essential libpython3-dev
```

Java is needed for tika.
The second line is required for python package gcld3 (see below).

## PostgreSQL database

We need access to a PostgreSQL database. Install PostgreSQL or provide connectivity to a PostgreSQL database over TCP/IP. Create a new database:

```
createdb -E UTF8 --lc-collate=C --lc-ctype=C -T template0 -O atextcrawler atextcrawler
```

## Elasticsearch

We need access to an elasticsearch instance (over TCP/IP).

Note: TLS is not yet supported, so install this service locally.

See [elasticsearch howto](elasticsearch.md).

Create an API key (using the password for user elastic):

```
http --auth elastic:******************* -j POST http://127.0.0.1:9200/_security/api_key name=atext role_descriptors:='{"atext": {"cluster": [], "index": [{"names": ["atext_*"], "privileges": ["all"]}]}}'
```

## Tensorflow model server

We need access to a tensorflow model server (over TCP/IP).
It should serve `universal_sentence_encoder_multilingual` or a similar language model.

Note: TLS is not yet supported, so install this service locally.

See [tensorflow howto](tensorflow_model_server.md).

## Setup virtualenv and install atextcrawler

```
apt install python3-pip
adduser --home /srv/atextcrawler --disabled-password --gecos "" atextcrawler
su - atextcrawler
cat >>.bashrc <>.profile <
```
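
Since PostgreSQL, Elasticsearch and the tensorflow model server all have to be reachable over TCP/IP, it can help to confirm that each one accepts connections before going further. The following is only a minimal sketch, not part of atextcrawler: the function names `port_open` and `check_service` are made up for illustration, port 5432 is the PostgreSQL default, port 9200 is taken from the API key example above, and port 8501 is an assumption (a commonly used REST port for tensorflow model servers). Adjust hosts and ports to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper script; not shipped with atextcrawler.

# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
port_open() {
    local host="$1" port="$2"
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Print one status line for a service; never aborts the script.
check_service() {
    local name="$1" host="$2" port="$3"
    if port_open "$host" "$port"; then
        echo "${name}: reachable at ${host}:${port}"
    else
        echo "${name}: NOT reachable at ${host}:${port}"
    fi
}

check_service "PostgreSQL" 127.0.0.1 5432               # default PostgreSQL port
check_service "Elasticsearch" 127.0.0.1 9200            # port from the API key example
check_service "Tensorflow model server" 127.0.0.1 8501  # assumed REST port
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials, the Elasticsearch API key, or which model the tensorflow model server is serving.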