This guide covers handling INCEpTION from an administrator’s perspective.
Installation
You can run INCEpTION on any major platform supporting Java, i.e. Linux, macOS or Windows. However, we do not provide explicit support for setting up a production-ready instance on each of these platforms.
This guide assumes Debian 9.1 (Stretch). It may also work on Ubuntu with some modifications, but we do not test this. Instructions for other Linux distributions and other platforms (i.e. macOS and Windows) likely deviate significantly.
It is further assumed that the user www-data already exists on the system and that it shall be used to run the application.
All commands assume that you are logged in as the root user.
If you cannot log in as root but have to use sudo to become root, then the recommended way to do that is using the command sudo su -.
System Requirements
Browser | Chrome or Safari |
Operating System | Linux (64bit), macOS (64bit), Windows (64bit) |
Java Runtime Environment | version 8 or higher |
Apache Tomcat (or compatible) | version 9.0 or higher (Servlet API 4.0.0) |
MySQL Server (or compatible) | version 5 or higher |
Install Java
You can install an OpenJDK 8 JDK using the following commands.
$ apt-get update
$ apt-get install openjdk-8-jdk
Application home folder
The INCEpTION home folder is the place where INCEpTION’s configuration file settings.properties
resides and where INCEpTION stores its data. The settings.properties
file is not automatically created by the application
and needs to be created manually in the case that settings need to be configured.
Mind that if you are using a MySQL database server
(recommended), then INCEpTION also stores some data in the MySQL database. This is important when
you plan to perform a backup, as both the home folder and the database content need to be
included in the backup.
Now, let’s go through the steps of setting up a home folder for INCEpTION and creating a configuration file instructing INCEpTION to access the MySQL database (prepared as described in the Database section).
-
Create INCEpTION home folder. This is the directory where INCEpTION settings files and projects (documents, annotations, etc.) are stored
$ mkdir /srv/inception
-
Create and edit /srv/inception/settings.properties to define the database connection as well as internal backup properties:
database.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
database.driver=com.mysql.jdbc.Driver
database.url=jdbc:mysql://localhost:3306/inception?useSSL=false&serverTimezone=UTC
database.username=inception
database.password=t0t4llYSecreT
# 60 * 60 * 24 * 30 = 30 days
backup.keep.time=2592000
# 60 * 5 = 5 minutes
backup.interval=300
backup.keep.number=10
-
Fix permissions in INCEpTION home folder
$ chown -R www-data /srv/inception
Database
INCEpTION uses a SQL database to store project and user data.
By default, INCEpTION uses an embedded HSQLDB database. However, we recommend using the embedded database only for testing purposes. For production use, we recommend using a MySQL server. The reasons for this are:
-
some users have reported that HSQLDB databases may become corrupt when the computer crashes (note that this could probably also happen with MySQL, but we have not received any such reports so far);
-
most INCEpTION developers use MySQL when running INCEpTION on their servers;
-
in the past, we had cases where we described in-place upgrade procedures that required performing SQL commands to change the data model as part of the upgrade. We promise to try avoiding this in the future. However, in case we offer advice on fixing anything directly in the database, this advice will refer to a MySQL database.
We try to keep the data model simple, so there should be no significant requirements to the database
being used. Theoretically, it should be possible to use any JDBC-compatible database after adding a
corresponding driver to the classpath and configuring INCEpTION to use the driver in the
settings.properties
file.
MySQL
For production use of INCEpTION, it is highly recommended to use a MySQL database. In this section, we briefly describe how to install a MySQL server and how to prepare it for use with the application.
Prepare database
-
Install MySQL
$ apt-get install mysql-server
-
make sure your MySQL server is configured for UTF-8. Check that the following lines are present in /etc/mysql/mariadb.conf.d/50-server.cnf (this is specific to Debian 9; on other systems the relevant file may be /etc/mysql/my.cnf):
character-set-server = utf8
collation-server = utf8_bin
-
also ensure the default settings for client connections are UTF-8 in /etc/mysql/mariadb.conf.d/50-client.cnf (again Debian 9; likely in /etc/mysql/my.cnf on other systems)
default-character-set = utf8
-
login to MySQL
$ mysql -u root -p
-
create a database
mysql> CREATE DATABASE inception DEFAULT CHARACTER SET utf8 COLLATE utf8_bin ;
-
create a database user called inception with the password t0t4llYSecreT which is later used by the application to access the database (see the instructions for the settings.properties file in the Application home folder section).
mysql> CREATE USER 'inception'@'localhost' IDENTIFIED BY 't0t4llYSecreT';
mysql> GRANT ALL PRIVILEGES ON inception.* TO 'inception'@'localhost';
mysql> FLUSH PRIVILEGES;
For production use, make sure you choose a different, secret, and secure password.
Configuration options
This section explains some settings that can be added to the database.url
in the
settings.properties
file when using MySQL. Settings are separated from the host name and database
name with a ?
character and multiple settings are separated using the &
character, e.g.:
database.url=jdbc:mysql://localhost:3306/inception?useSSL=false&serverTimezone=UTC
To suppress the warning about non-SSL database connections with recent MySQL databases, append the
following setting to the database.url
:
useSSL=false
Recent MySQL drivers may refuse to work unless a database server timezone has been specified. The
easiest way to do this is to add the following setting to the database.url
:
serverTimezone=UTC
If you plan to use UTF-8 encoding for project names and tagset/tag names, make sure that either of the following settings is in place for the MySQL database:
-
in the settings.properties file, make sure that database.url includes useUnicode=true&characterEncoding=UTF-8
-
change the my.cnf MySQL database configuration file to include the following line
character-set-server = utf8
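The rule described in this section (settings follow the database name after a ? and are joined with &) can be sketched in a few lines of Python. This is only an illustration for generating the database.url line; the helper name build_database_url is hypothetical and not part of INCEpTION.

```python
from urllib.parse import urlencode


def build_database_url(host, port, database, settings):
    """Return a MySQL JDBC URL with the settings appended after '?'.

    Hypothetical helper for illustration only.
    """
    base = f"jdbc:mysql://{host}:{port}/{database}"
    if not settings:
        return base
    # urlencode joins the key=value pairs with '&' in insertion order
    return base + "?" + urlencode(settings)


url = build_database_url("localhost", 3306, "inception", {
    "useSSL": "false",
    "serverTimezone": "UTC",
    "useUnicode": "true",
    "characterEncoding": "UTF-8",
})
print("database.url=" + url)
```

The printed line can be pasted into settings.properties as-is.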
HSQLDB (embedded)
INCEpTION displays a warning in the user interface when an embedded database is being used. It is not recommended to use an embedded database for various reasons:
-
HSQLDB databases are known to run a risk of becoming corrupt in case of power failures which may render the application inaccessible and your data difficult to recover.
-
In very rare cases it may be necessary to fix the database content which is more inconvenient for embedded databases.
In case that you really want to run INCEpTION with an embedded database in production,
you probably want to disable this warning. To do so, please add the following entry to
the settings.properties
file:
warnings.embeddedDatabase=false
Running via embedded Tomcat (JAR)
The INCEpTION standalone JAR comes with an embedded Tomcat server and can be easily set up as a UNIX service. This is the recommended way of running INCEpTION on a server.
The instructions below expect a Debian Linux system. Details may vary on other OSes and Linux distributions.
Installing as a service
To set it up as a service, you can do the following steps. For the following
example, we assume that you install INCEpTION in /srv/inception
:
-
Copy the standalone JAR file inception-app-standalone-0.17.4.jar to /srv/inception/inception.jar. Note the change of the filename to inception.jar.
-
Create the file /srv/inception/inception.conf with the following content
JAVA_OPTS="-Djava.awt.headless=true -Dinception.home=/srv/inception"
-
In the previous step, you have already created the /srv/inception/settings.properties file. You may optionally configure the Tomcat port using the following line
server.port=18080
If you need to do additional configurations of the embedded Tomcat, best refer to the documentation of Spring Boot itself.
-
Make sure that the file /srv/inception/inception.conf is owned by the root user. If this is not the case, INCEpTION will ignore it and any settings made there will not have any effect. If you start INCEpTION and it uses an embedded database instead of the MySQL database, then you should double-check that /srv/inception/inception.conf is owned by the root user.
$ chown root:root /srv/inception/inception.conf
-
We will run INCEpTION as the user www-data. Change the owner/group of /srv/inception/inception.jar to www-data. Do NOT run INCEpTION as root.
$ chown www-data:www-data /srv/inception/inception.jar
-
Make the JAR file executable:
$ chmod +x /srv/inception/inception.jar
-
Create a file in /etc/systemd/system/inception.service with the following content:
[Unit]
Description=INCEpTION

[Service]
ExecStart=/srv/inception/inception.jar
User=www-data

[Install]
WantedBy=multi-user.target
-
Enable the INCEpTION service using
$ systemctl enable inception
-
Start INCEpTION using
$ systemctl start inception
-
Check the log output
$ journalctl -u inception
-
Stop INCEpTION using
$ systemctl stop inception
Running the standalone behind HTTPD
These are optional instructions if you want to run INCEpTION behind an Apache web-server instead of accessing it directly. This assumes that you already have the following packages installed:
-
Apache Web Server
-
mod_proxy
-
mod_proxy_ajp
-
Add the following lines to /srv/inception/settings.properties:
server.ajp.port=18009
server.ajp.secret=SECRET_STRING_YOU_CHOOSE
server.ajp.address=127.0.0.1
server.servlet.context-path=/inception
server.use-forward-headers=true
-
Edit /etc/apache2/conf.d/inception.local.conf
ProxyPreserveHost On

<Proxy ajp://localhost/inception >
  Order Deny,Allow
  Deny from none
  Allow from all
</Proxy>

<Location /inception >
  ProxyPass ajp://localhost:18009/inception timeout=1200 secret="SECRET_STRING_YOU_CHOOSE"
  ProxyPassReverse http://localhost/inception
</Location>
-
Restart Apache web server
$ service apache2 restart
The secret option is supported e.g. in Apache HTTP 2.5 mod_proxy_ajp (https://httpd.apache.org/docs/trunk/mod/mod_proxy_ajp.html). If you are using a reverse proxy which does not support passing along a secret, you may set server.ajp.secret-required=false in the settings.properties file.
Securing with SSL
This section assumes Debian 9.1 (Stretch) as the operating system using NGINX as a web server.
It further assumes that you want to use Let’s Encrypt as a CA for obtaining valid SSL certificates.
-
In addition, you will need a fully registered domain name. This tutorial uses
example.com
. Replace it accordingly.
We strongly encourage securing your production system with a firewall like UFW.
Obtaining a Let’s Encrypt certificate
The Certification Authority (CA) Let’s Encrypt provides free TLS/SSL certificates. These certificates allow for secure HTTPS connections on web servers. Let’s Encrypt provides the software Certbot which automates the obtaining process for NGINX.
-
Enable the Stretch backports repo if needed
-
Install Certbot preconfigured for NGINX
$ apt-get install python-certbot-nginx -t stretch-backports
-
Obtain the certificates for your domain
example.com
$ certbot --nginx certonly -d example.com
-
You will be prompted to enter your e-mail address and asked to agree to the terms of service. Certificate renewal information will be sent to this e-mail. If the certification process is successful it will yield the information where your certificates can be found.
IMPORTANT NOTES:
 - Congratulations! Your certificate and chain have been saved at
   /etc/letsencrypt/live/example.com/fullchain.pem. Your cert will
   expire on 2019-04-22. To obtain a new or tweaked version of this
   certificate in the future, simply run certbot again with the
   "certonly" option. To non-interactively renew *all* of your
   certificates, run "certbot renew"
 - Your account credentials have been saved in your Certbot
   configuration directory at /etc/letsencrypt. You should make a
   secure backup of this folder now. This configuration directory will
   also contain certificates and private keys obtained by Certbot so
   making regular backups of this folder is ideal.
 - If you like Certbot, please consider supporting our work by:

   Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
   Donating to EFF:                    https://eff.org/donate-le
Certificates issued by Let’s Encrypt are valid for 90 days. You will receive an expiry notification to the e-mail address you provided during the certification process.
-
Run Certbot with the command
renew
to renew all certificates that are due. You can also create a cron job for this purpose. The command for renewal is
$ certbot --nginx renew
-
You can simulate the certificate renewal process with the command
$ certbot --nginx renew --dry-run
-
The directory
/etc/letsencrypt/live/example.com/
now contains the necessary certificates to proceed
$ ls /etc/letsencrypt/live/example.com
Output:
cert.pem  chain.pem  fullchain.pem  privkey.pem
Installing NGINX
This section assumes Debian 9.1 (Stretch) as the operating system using NGINX as a web server. It further assumes that you want to use Let’s Encrypt as a CA for obtaining valid SSL certificates.
-
You can install NGINX by typing
$ apt-get update
$ apt-get install nginx
-
Verify the installation with
$ systemctl status nginx
Output:
● nginx.service - A high-performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-01-21 14:42:01 CET; 20h ago
     Docs: man:nginx(8)
  Process: 7947 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status=0/SUCCESS)
  Process: 7953 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 7950 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 7955 (nginx)
    Tasks: 9 (limit: 4915)
   CGroup: /system.slice/nginx.service
           ├─7955 nginx: master process /usr/sbin/nginx -g daemon on; master_process on;
           ├─7956 nginx: worker process
-
You can stop, start or restart NGINX with
$ systemctl stop nginx
$ systemctl start nginx
$ systemctl restart nginx
Putting it all together
By now you should have
-
INCEpTION running on port 8080
-
NGINX running with default configurations on port 80
-
your issued SSL certificates
If you are running INCEpTION on a different port than 8080, please make sure to adjust the configurations below accordingly!
We will now configure NGINX to proxy pass all traffic received at example.com/inception
to our INCEpTION instance.
Create a new virtual host for your domain. Inside of /etc/nginx/sites-available/ create a new file for your domain (e.g. example.com). Paste the following contents:
# Server block for insecure http connections on port 80. Redirect to https on port 443
server {
    listen 80;
    listen [::]:80;
    server_name example.com;
    return 301 https://$server_name$request_uri;
}

# Server block for secure https connections
server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name example.com;

    ssl on;

    # Replace certificate paths
    ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
    ssl_trusted_certificate /etc/letsencrypt/live/example.com/fullchain.pem;

    # Modern SSL Config from
    # https://mozilla.github.io/server-side-tls/ssl-config-generator/
    ssl_protocols TLSv1.2;
    ssl_ciphers 'ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256';
    ssl_prefer_server_ciphers on;
    ssl_session_timeout 1d;
    ssl_session_tickets off;
    add_header Strict-Transport-Security max-age=15768000;
    ssl_stapling on;
    ssl_stapling_verify on;

    # Pass through headers from INCEpTION which are considered invalid by NGINX server.
    ignore_invalid_headers off;

    # Change body size if needed. This defines the maximum upload size for files.
    client_max_body_size 10M;

    # Uncomment this for a redirect from example.com to example.com/inception
    #location / {
    #    return 301 https://$host/inception;
    #}

    location ^~ /inception/ {
        proxy_pass http://127.0.0.1:8080/inception/;
        proxy_redirect default;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_max_temp_file_size 0;

        proxy_connect_timeout 180;
        proxy_send_timeout 180;
        proxy_read_timeout 180;

        proxy_temp_file_write_size 64k;

        # Required for new HTTP-based CLI
        proxy_request_buffering off;
        proxy_buffering off; # Required for HTTP-based CLI to work over SSL
        proxy_set_header Connection ""; # Clear for keepalive
    }

    # Deny access to Apache .htaccess files. They have no special meaning for NGINX and might leak sensitive information
    location ~ /\.ht {
        deny all;
    }
}
Create a symlink for the new configuration file to the folder for accessible websites:
$ ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/example.com
Test if the NGINX configuration file works without restarting (and possibly breaking) the webserver:
$ nginx -t
Output:
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
If the config works, restart the webserver to enable the new site
$ service nginx restart
Tell INCEpTION that it is running behind a proxy
If you are running INCEpTION via the JAR file, edit the settings.properties
file to add these
settings:
server.tomcat.internal-proxies=127\.0\.[0-1]\.1
server.tomcat.remote-ip-header=x-forwarded-for
server.tomcat.accesslog.request-attributes-enabled=true
server.tomcat.protocol-header=x-forwarded-proto
server.tomcat.protocol-header-https-value=https
Restart INCEpTION
$ service inception restart
INCEpTION now knows how to interpret the proxy header fields from NGINX. With this step, everything is now set up to access INCEpTION through a secure https connection.
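To see which proxy addresses the server.tomcat.internal-proxies expression from the settings above accepts, the pattern can be tried out in Python. This is a rough sketch that assumes the expression has to match the entire remote address, which is our reading of how Tomcat applies it; the function name is hypothetical.

```python
import re

# Pattern as written in the settings above
INTERNAL_PROXIES = re.compile(r"127\.0\.[0-1]\.1")


def is_internal_proxy(remote_ip):
    """Return True if the whole address matches the pattern (assumed semantics)."""
    return INTERNAL_PROXIES.fullmatch(remote_ip) is not None


for ip in ("127.0.0.1", "127.0.1.1", "127.0.0.2", "192.168.0.1"):
    print(ip, is_internal_proxy(ip))
```

If NGINX runs on a different host than INCEpTION, the expression would need to be extended to cover the address of that host.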
CSRF protection
Depending on your situation, you may get an error message such as this when trying to use INCEpTION.
Whitelabel Error Page
This application has no explicit mapping for /error, so you are seeing this as a fallback.
Fri Nov 29 14:01:15 BRT 2019
There was an unexpected error (type=Bad Request, status=400).
Origin does not correspond to request
If this is the case, then CSRF protection is kicking in. What seems to work in this case is to turn
off CSRF entirely by adding the following lines to your settings.properties
file (see Settings):
wicket.core.csrf.enabled=false
wicket.core.csrf.no-origin-action=allow
wicket.core.csrf.conflicting-origin-action=allow
Turning off a security feature is obviously not a great solution. Better check out the documentation for the Wicket Spring Boot CSRF settings and, if you figure out a better solution than the above, please get in touch with us via our issue tracker.
Running via Docker
Quick start
If you have Docker installed, you can run INCEpTION using
$ docker run -it --name inception -p8080:8080 inceptionproject/inception:0.17.4
The command downloads INCEpTION from Dockerhub and starts it on port 8080. If this port is not
available on your machine, you should provide another port to the -p
parameter.
The logs will be printed to the console. To stop the container, press CTRL-C
.
To run the INCEpTION docker in the background use
$ docker run -d --name inception -p8080:8080 inceptionproject/inception:0.17.4
Logs are accessible by typing
$ docker logs inception
Use docker run only the first time that you run INCEpTION. If you try it a second time, Docker will complain about the name inception already being in use. If you follow Docker's suggestion to delete the container, you will lose all your INCEpTION data. Further below, we explain how you can store your data outside the container in a folder on your host.
When you want to run INCEpTION again later, use the command
$ docker start -ai inception
or for the background mode
$ docker start inception
Storing data on the host
If you follow the quick start instructions above, INCEpTION will store all its data inside the docker container. This is normally not what you want because as soon as you delete the container, all data is gone. That means for example that you cannot easily upgrade to a new version of the INCEpTION docker image when one is released.
To store your data on your host computer, first create a folder where you want to store your data.
For example, if you are on Linux, you could create a folder /srv/inception
:
$ mkdir /srv/inception
When you run INCEpTION via Docker, you then mount this folder into the container:
$ docker run -it --name inception -v /srv/inception:/export -p8080:8080 inceptionproject/inception:0.17.4
Settings file
The dockerized INCEpTION expects the settings.properties file in the /export folder. Instead of injecting a custom settings.properties file into the container, it is strongly recommended to use the instructions above (Storing data on the host) to mount a folder from the host system to /export and then to place the settings.properties file into the mounted folder. Thus, if you follow the instructions above, the settings file would go to /srv/inception/settings.properties on the host system.
Connecting to a MySQL database
By default, INCEpTION uses an embedded SQL database to store its metadata (not the texts, annotations and knowledge bases, these are stored in files on disk). For production use, it is highly recommended to use a separate MySQL database instead of the embedded SQL database.
Docker Compose
Using Docker Compose, you can manage multiple related containers. This section illustrates how to use Docker Compose to jointly set up an INCEpTION container as well as a database container (here, based on the mysql:5 image).
The following Compose script sets these containers up.
##
# docker-compose up [-d]
# docker-compose down
##
version: '2.1'

networks:
  inception-net:

services:
  mysqlserver:
    image: "mysql:5"
    container_name: inception_mysql
    environment:
      - MYSQL_RANDOM_ROOT_PASSWORD=yes
      - MYSQL_DATABASE=inception
      - MYSQL_USER=${DBUSER}
      - MYSQL_PORT=3306
      - MYSQL_PASSWORD=${DBPASSWORD}
    volumes:
      - ${INCEPTION_HOME}/mysql-data:/var/lib/mysql
    command: ["--character-set-server=utf8", "--collation-server=utf8_bin"]
    healthcheck:
      test: ["CMD", "mysqladmin" ,"ping", "-h", "localhost", "-p${DBPASSWORD}", "-u${DBUSER}"]
      interval: 20s
      timeout: 10s
      retries: 10
    networks:
      inception-net:

  webserver:
    image: "inceptionproject/inception:0.17.4"
    container_name: inception_webserver
    ports:
      - "${INCEPTION_PORT}:8080"
    environment:
      - INCEPTION_DB_DIALECT=org.hibernate.dialect.MySQL5InnoDBDialect
      - INCEPTION_DB_DRIVER=com.mysql.jdbc.Driver
      - INCEPTION_DB_URL=jdbc:mysql://mysqlserver:3306/inception?useSSL=false&useUnicode=true&characterEncoding=UTF-8
      - INCEPTION_DB_USERNAME=${DBUSER}
      - INCEPTION_DB_PASSWORD=${DBPASSWORD}
    volumes:
      - ${INCEPTION_HOME}/server-data:/export
    depends_on:
      mysqlserver:
        condition: service_healthy
    mem_limit: 1g
    memswap_limit: 1g
    restart: unless-stopped
    networks:
      inception-net:
Place the script into any folder, change to that folder, and issue the following commands which define the username/password you wish to use for INCEpTION to talk to the database, the folder on the host system where the application data is stored, and the port on which the application will run. The last command starts the containers.
$ export DBUSER=<USER_NAME>
$ export DBPASSWORD=<PASSWORD>
$ export INCEPTION_HOME=/srv/inception
$ export INCEPTION_PORT=8080
$ docker-compose -p inception up -d
This will start two docker containers: inception_mysql and inception_webserver (the names set via container_name in the Compose script).
You can check the logs of each by running
$ docker logs inception_mysql
$ docker logs inception_webserver
The actual name of these containers might vary. A list of running containers can be retrieved by
$ docker ps
Two directories in your INCEpTION home folder will be created: mysql-data and server-data.
No data is stored in the containers themselves, so you can safely delete them with
$ docker-compose -p inception down
You can also just stop or pause them, please see the docker-compose reference for details.
Monitoring the INCEpTION instance
Available metrics
We expose some metrics of the running INCEpTION instance via JMX. These are currently
-
the number of active as well as enabled users
-
the overall number of documents
-
the number of enabled recommenders
-
the number of annotation documents i.e. documents being annotated per user
To make the metrics available, spring.jmx.enabled=true and monitoring.metrics.enabled=true must be set in the settings.properties file (see Application home folder for more on this file).
Setting up metrics exporter
To export the metrics so they can be queried by the monitoring solution Prometheus, you can e.g. use the JMX exporter as a java agent.
The JMX exporter can be run as a .jar file that should be placed together with its config.yml
file next to the INCEpTION .jar file. An example config.yml
file that exposes metrics from
INCEpTION but not webanno brat metrics (metrics associated with brat rendering) and conforms JMX metric
names to Prometheus Naming conventions is:
ssl: false
whitelistObjectNames: ["de.tudarmstadt.ukp.inception.recommendation.metrics:*", "de.tudarmstadt.ukp.clarin.webanno.api.dao.metrics:*", "de.tudarmstadt.ukp.clarin.webanno.security.metrics:*"]
blacklistObjectNames: ["de.tudarmstadt.ukp.clarin.webanno.brat.metrics:*"]
lowercaseOutputName: true
lowercaseOutputLabelNames: true
rules:
  - pattern: 'de.tudarmstadt.ukp.inception.recommendation.metrics<name=recommendationMetricsImpl, type=RecommendationMetricsImpl><>(\w+): (\d+)'
    name: inception_$1
    value: $2
    help: "Inception metric $1"
    type: GAUGE
    attrNameSnakeCase: true
  - pattern: 'de.tudarmstadt.ukp.clarin.webanno.([\.\w]+).metrics<name=(\w+), type=(\w+)><>(\w+): (\d+)'
    name: webanno_$4
    value: $5
    help: "Inception metric $4"
    type: GAUGE
    attrNameSnakeCase: true
The following line will run the JMX exporter for the JVM that runs the inception.jar. The exporter will expose the metrics on the http-endpoint localhost:9404. Make sure to use a port, 9404 in this case, that is not open to the public (only to the local network that your Prometheus instance runs in).
java -javaagent:./jmx_prometheus_javaagent-0.13.0.jar=9404:config.yml -jar inception.jar
The JMX exporter will also automatically expose JVM metrics in the java.lang
namespace
which can be used to e.g. monitor memory usage:
-
jvm_memory_bytes_used: Used bytes of a given JVM memory area.
-
jvm_memory_bytes_committed: Committed (bytes) of a given JVM memory area. This means (opposed to max memory) that this memory is available to the JVM.
and others.
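As a quick check that the exporter works, the exposed text can be parsed with a few lines of Python. The sketch below parses a hard-coded sample instead of contacting the endpoint; the sample values are made up, and the parser assumes plain "name value" lines without timestamps.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {name_with_labels: value}."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


# Hypothetical sample; real data could be fetched from the exporter, e.g. with
# urllib.request.urlopen("http://localhost:9404/metrics").read().decode()
SAMPLE = """\
# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap"} 1.5E8
jvm_memory_bytes_committed{area="heap"} 2.0E8
"""

metrics = parse_metrics(SAMPLE)
heap_used = metrics['jvm_memory_bytes_used{area="heap"}']
heap_committed = metrics['jvm_memory_bytes_committed{area="heap"}']
print(f"heap usage: {heap_used / heap_committed:.0%}")
```

For regular monitoring, Prometheus itself scrapes the endpoint; a script like this is only useful for a one-off sanity check.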
Upgrading
Backup your data
-
Make a copy of your INCEpTION home folder
-
If you are using MySQL, make a backup of your INCEpTION database, e.g. using the mysqldump command.
Upgrading with embedded Tomcat
-
Stop the INCEpTION service
-
Replace the inception.jar file with the new version
-
Ensure that the file has the right owner/group (usually www-data)
-
Start the INCEpTION service again
Remote API
In order to programmatically manage annotation projects, a REST-like remote API is offered. This API is disabled by default. In order to enable it, add the setting remote-api.enabled=true to the settings.properties file.
Setting | Description | Default | Example |
---|---|---|---|
remote-api.enabled | Enable remote API | false | true |
Once the remote API is enabled, it becomes possible to assign the role ROLE_REMOTE
to a user. Create a new user, e.g. remote-api
via the user management page and assign at least the roles ROLE_USER
and ROLE_REMOTE
. Most of the actions accessible through the remote API require administrator access, so adding the ROLE_ADMIN
is usually necessary as well.
Once the remote API has been enabled, it offers a convenient and self-explanatory web-based user interface under <APPLICATION_URL>/swagger-ui.html which can be accessed by any user with the role ROLE_REMOTE. Here, you can browse the different operations, their parameters, and even try them out directly via a web browser. The actual AERO remote API uses <APPLICATION_URL>/api/aero/v1 as the base URL for its operations.
The API follows the Annotation Editor Remote Operations (AERO) protocol.
The third-party Python library pycaprio can be used to facilitate accessing the remote API.
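Without a client library, a request against the base URL can be prepared with the Python standard library. The following is a minimal sketch under assumptions that are not stated in this guide: that the API accepts HTTP Basic authentication and that a /projects operation exists (check the Swagger UI for the actual operations). The helper name, user name and password are placeholders.

```python
import base64
import urllib.request


def aero_request(application_url, path, username, password):
    """Build (but do not send) a request to the AERO remote API.

    Assumes HTTP Basic authentication; hypothetical helper for illustration.
    """
    base = application_url.rstrip("/") + "/api/aero/v1"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(base + path)
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Accept", "application/json")
    return request


# '/projects' is an assumed operation path
request = aero_request("http://localhost:8080", "/projects", "remote-api", "secret")
print(request.full_url)
# To send it: urllib.request.urlopen(request).read()
```

The user given here needs the roles described above, otherwise the server is expected to reject the request.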
Webhooks
Webhooks allow INCEpTION to notify external services about certain events. For example, an external service can be triggered when an annotator marks a document as finished or when all documents in a project have been completely curated.
Webhooks are declared in the settings.properties file. For every webhook, it is necessary to specify a URL (url) and a set of topics (topics) about which the remote service listening at the given URL is notified. If the remote service is accessible via https and the certificate is not known to the JVM running INCEpTION, the certificate verification can be disabled (verify-certificates).
The following topics are supported:
-
DOCUMENT_STATE - events related to the change of a document state such as when any user starts annotating or curating the document.
-
ANNOTATION_STATE - events related to the change of an annotation state such as when a user starts or completes the annotation of a document.
-
PROJECT_STATE - events related to the change of an entire project such as when all documents have been curated.
Example webhook configuration:
webhooks.globalHooks[0].url=http://localhost:3333/
webhooks.globalHooks[0].topics[0]=DOCUMENT_STATE
webhooks.globalHooks[0].topics[1]=ANNOTATION_STATE
webhooks.globalHooks[0].topics[2]=PROJECT_STATE
webhooks.globalHooks[0].verify-certificates=false
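For testing a webhook configuration like the one above, a tiny receiver listening on http://localhost:3333/ can be written with the Python standard library. This is a sketch only: it assumes that notifications arrive as HTTP POST requests with a JSON body, which this guide does not specify, and it simply prints whatever it receives.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_notification(body):
    """Decode a request body as JSON; fall back to raw text if it is not JSON."""
    try:
        return json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {"raw": body.decode("utf-8", errors="replace")}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client
        length = int(self.headers.get("Content-Length", 0))
        payload = decode_notification(self.rfile.read(length))
        print("notification:", payload)
        # Acknowledge without a response body
        self.send_response(204)
        self.end_headers()


def serve(port=3333):
    """Listen on localhost until interrupted."""
    HTTPServer(("localhost", port), WebhookHandler).serve_forever()

# Call serve() to start listening, then trigger an event in INCEpTION.
```

Printing the payload first is a simple way to learn what a given topic actually delivers before writing a real integration.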
Settings
Settings are configured in the settings.properties file, which must reside in the application home folder. This file is optional and might need to be created first in the Application home folder. If the file does not exist, default values are assumed.
General Settings
Setting | Description | Default | Example |
---|---|---|---|
warnings.unsupportedBrowser | Warn about unsupported browser | true | false |
debug.showExceptionPage | Show a page with a stack trace instead of an "Internal error" page. Do not use in production! | false | true |
login.message | Custom message to appear on the login page, such as project web-site, annotation guideline link, … The message can be HTML content. | unset | |
user.profile.accessible | Whether regular users can access their own profile to change their password and other profile information. This setting has no effect when running in pre-authentication mode. | false | true |
user-selection.hideUsers | Whether the list of users shown in the users tab of the project settings is restricted. If this setting is enabled, the full name of a user has to be entered into the input field before the user can be added. If this setting is disabled, it is possible to see all enabled users and to add any of them to the project. | false | true |
Database connection
Setting | Description | Default | Example |
---|---|---|---|
database.dialect | Database dialect | org.hibernate.dialect.HSQLDialect | org.hibernate.dialect.MySQL5InnoDBDialect |
database.driver | Database driver | org.hsqldb.jdbc.JDBCDriver | com.mysql.jdbc.Driver |
database.url | JDBC connection string | location in application home | jdbc:mysql://localhost:3306/weblab?useUnicode=true&characterEncoding=UTF-8&serverTimezone=UTC |
database.username | Database username | sa | user |
database.password | Database password | unset | pass |
database.initial-pool-size | Initial database connection pool size | 4 | |
database.min-pool-size | Minimum database connection pool size | 4 | |
database.max-pool-size | Maximum database connection pool size | 10 | |
warnings.embeddedDatabase | Warn about using an embedded database | true | false |
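Put together, a settings.properties fragment for a local MySQL server might look like the sketch below. It only reuses the example values from the table. The database name inception and the credentials user and pass are placeholders and must be replaced with the values of your own installation.

```properties
# Connect to a MySQL database instead of the embedded database
database.dialect=org.hibernate.dialect.MySQL5InnoDBDialect
database.driver=com.mysql.jdbc.Driver
# Host, port and database name are placeholders
database.url=jdbc:mysql://localhost:3306/inception?useUnicode=true&characterEncoding=UTF-8&serverTimezone=UTC
# Placeholder credentials
database.username=user
database.password=pass
```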
The basic database connection details can also be configured via environment variables. When these
environment variables are present, they are preferred over the settings.properties
file.
The following environment variables can be used:
Setting | Description | Default | Example |
---|---|---|---|
|
Database dialect |
org.hibernate.dialect.HSQLDialect |
org.hibernate.dialect.MySQL5InnoDBDialect |
|
Database driver |
org.hsqldb.jdbc.JDBCDriver |
com.mysql.jdbc.Driver |
|
JDBC connection string |
location in application home |
jdbc:mysql://localhost:3306/inception?useUnicode=true&characterEncoding=UTF-8 |
|
Database username |
sa |
user |
|
Database password |
unset |
pass |
Server Settings
These settings relate to the embedded web server in the JAR version of INCEpTION.
Setting | Description | Default | Example |
---|---|---|---|
server.port | Port on which the server listens | 8080 | 18080 |
server.address | IP address on which the server listens | 0.0.0.0 | 127.0.0.1 |
server.ajp.port | Port for AJP connector | -1 (disabled) | 8009 |
server.ajp.address | IP address on which the AJP connector listens | 127.0.0.1 | 0.0.0.0 |
server.ajp.secret-required | Whether AJP connections require a shared secret | true | false |
server.ajp.secret | Shared secret for AJP connections | unset | some secret string of your choice |
The application is based on Spring Boot and uses an embedded Tomcat server. You can configure additional aspects of the embedded web server using the default Spring Boot configuration settings.
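For illustration, the following sketch restricts the embedded server to the local interface and enables the AJP connector with a shared secret. It assumes a reverse proxy running on the same machine. The port numbers are taken from the Example column above, and the secret is a placeholder that you should replace with a string of your own.

```properties
# Listen for HTTP only on the local interface, on a non-default port
server.address=127.0.0.1
server.port=18080
# Enable the AJP connector (disabled by default) on the local interface
server.ajp.port=8009
server.ajp.address=127.0.0.1
# Require a shared secret for AJP connections; the value is a placeholder
server.ajp.secret-required=true
server.ajp.secret=replace-with-your-own-secret
```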
Internal backup
INCEpTION stores its annotations internally in files. Whenever a user performs an action on a document, the file is updated. It is possible to configure INCEpTION to keep internal backups of these files, e.g. to safeguard against crashes or bugs.
The internal backups are controlled through three properties:
Setting | Description | Default | Example |
---|---|---|---|
backup.interval | Time between backups (seconds) | 0 (disabled) | 300 (60 * 5 = 5 minutes) |
backup.keep.number | Maximum number of backups to keep | 0 (unlimited) | 5 |
backup.keep.time | Maximum age of backups to keep (seconds) | 0 (unlimited) | 2592000 (60 * 60 * 24 * 30 = 30 days) |
By default, backups are disabled (backup.interval is set to 0). Changing this property to any positive number enables internal backups. The interval is the minimum time that must have elapsed between changes to a document before a new backup is created.
When backups are enabled, either or both of the properties backup.keep.number and backup.keep.time should be changed as well, because their default values cause backups to be stored indefinitely, which will eventually fill up the disk.
The properties backup.keep.number and backup.keep.time control the maximum number of backups to keep and how long backups are kept. Both settings apply simultaneously.
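Example: back up at most every 5 minutes and keep the 10 most recent backups
backup.interval = 300
backup.keep.number = 10
backup.keep.time = 0

Example: back up at most every 5 minutes and keep backups for 7 days (60 * 60 * 24 * 7 = 604800 seconds)
backup.interval = 300
backup.keep.number = 0
backup.keep.time = 604800

Example: back up at most every 5 minutes and keep at most 10 backups, none older than 7 days
backup.interval = 300
backup.keep.number = 10
backup.keep.time = 604800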
Custom header icons
INCEpTION allows adding custom icons to the page header. You can declare such custom icons in the settings.properties file as shown in the example below. Each declaration begins with the prefix style.header.icon. followed by an identifier (here myOrganization and mySupport). The suffixes .linkUrl and .imageUrl indicate the URL of the target page and of the icon image respectively. Images are automatically resized via CSS. However, to keep loading times low, you should point to a reasonably small image.
The order of the icons is controlled by the ID, not by the order in the configuration file!
style.header.icon.myOrganization.linkUrl=http://my.org
style.header.icon.myOrganization.imageUrl=http://my.org/logo.png
style.header.icon.mySupport.linkUrl=http://my.org/support
style.header.icon.mySupport.imageUrl=http://my.org/help.png
Setting | Description | Default | Example |
---|---|---|---|
style.logo | Logo image displayed in the upper-right corner | unset | path to an image file |
style.header.icon… | Icons/links to display in the page header. For details, see the description above. | unset | |
Annotation editor
Setting | Description | Default | Example |
---|---|---|---|
ui.brat.autoScroll | Whether to scroll the annotation being edited into the center of the page | true | |
ui.brat.pageSize | The number of sentences to display per page | 5 | |
ui.brat.singleClickSelection | Whether to select annotations with a single click | false | |
ui.brat.rememberLayer | Whether "remember layer" is activated by default | false | |
annotation.feature-support.string.autoCompleteThreshold | If the tagset is larger than the threshold, an auto-complete field is used instead of a standard combobox. | 75 | 100 |
annotation.feature-support.string.autoCompleteMaxResults | When an auto-complete field is used, this determines the maximum number of items shown in the dropdown menu. | 100 | 1000 |
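A possible combination of the annotation editor settings is sketched below. The values for ui.brat.pageSize and ui.brat.singleClickSelection are illustrative assumptions. The two auto-complete values are the ones from the Example column.

```properties
# Show 10 sentences per page instead of the default 5 (illustrative value)
ui.brat.pageSize=10
# Select annotations with a single click
ui.brat.singleClickSelection=true
# Use an auto-complete field for tagsets with more than 100 tags
annotation.feature-support.string.autoCompleteThreshold=100
# Show at most 1000 items in the auto-complete dropdown
annotation.feature-support.string.autoCompleteMaxResults=1000
```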
External pre-authentication
INCEpTION can be used in conjunction with header-based external pre-authentication. In this mode, the application looks for a special HTTP header (by default remote_user) and if that header exists, it is taken for granted that this user has been authenticated. The application checks its internal database for a user with the given name and creates the user if it does not exist yet.
Pre-authentication can be enabled by setting the property auth.mode to preauth. When enabling pre-authentication mode, the default roles for new users can be controlled using the auth.preauth.newuser.roles property. The role ROLE_USER is always added, even if not specified explicitly. Adding the role ROLE_PROJECT_CREATOR as well allows all auto-created users to create their own projects.
Since the default administrator user is not created in pre-authentication mode, it is useful to also declare at least one user as an administrator. This is done through the property auth.user.<username>.roles where <username> must be replaced with the name of the user. The example below shows how the user Franz is given administrator permissions.
Example: pre-authentication via the remote_user header, new users can create projects, user Franz is always admin.
auth.mode = preauth
auth.preauth.header.principal = remote_user
auth.preauth.newuser.roles = ROLE_PROJECT_CREATOR
auth.user.Franz.roles = ROLE_ADMIN
The roles specified through auth.preauth.newuser.roles are saved in the database when a
user logs in for the first time and can be changed after creation through the user interface.
|
The roles added through auth.user.<username>.roles properties are not saved in the
database and cannot be edited through the user interface.
|
Setting | Description | Default | Example |
---|---|---|---|
auth.mode | Authentication mode | database | preauth |
auth.preauth.header.principal | Principal header | remote_user | some other header |
auth.preauth.newuser.roles | Default roles for new users (comma separated) | <none> | ROLE_PROJECT_CREATOR |
auth.user.<username>.roles | Extra roles for user (comma separated) | <none> | ROLE_ADMIN |
Concept Linking
There are several configurable parameters related to the Concept Linking functionality:
This parameter controls the size of the Candidate Cache, which stores a set of candidates for a mention. Increasing the cache size reduces the number of queries that have to be made against the KB and therefore decreases the average retrieval time.
This parameter controls how many concepts the ranking approach takes into account by selecting the n most frequent concepts. Increasing this parameter will lead to a longer ranking time, since more candidates are considered for ranking.
This parameter declares the size k of the context, where the context is defined as the words included in a window of k words to both the left and the right of the mention.
This parameter defines how many concepts should be retrieved for the Candidate Retrieval step. Increasing this parameter will lead to a longer time to retrieve candidates from the KB.
This parameter defines how many concepts should be retrieved for the Semantic Signature of a candidate. Increasing this parameter will lead to a longer time to retrieve concepts for constructing the Semantic Signature.
This parameter regulates how many candidates will be displayed for a mention in the Concept Selector UI.
If no value for a parameter is specified, its default value is used. The default values are shown as examples of how the parameters can be configured below:
Setting | Description | Default | Example |
---|---|---|---|
inception.entity-linking.cacheSize | Cache size | 1024 | - |
inception.entity-linking.candidateQueryLimit | Candidate Retrieval Limit | 2500 | - |
inception.entity-linking.mentionContextSize | Mention Context Size | 5 | - |
inception.entity-linking.candidateDisplayLimit | Candidate Display Limit | 100 | - |
inception.entity-linking.signatureQueryLimit | Semantic Signature Query Limit | 2147483647 | - |
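Written out as a settings.properties fragment, the defaults from the table look as follows. Setting them explicitly does not change the behaviour. The sketch is only meant as a starting point for tuning individual values.

```properties
# Concept Linking settings, all set to their documented default values
inception.entity-linking.cacheSize=1024
inception.entity-linking.candidateQueryLimit=2500
inception.entity-linking.mentionContextSize=5
inception.entity-linking.candidateDisplayLimit=100
inception.entity-linking.signatureQueryLimit=2147483647
```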
Resources
In order to improve the quality of suggestions, several additional resources can be incorporated. These are to be put into the .inception/resources folder. These include:
- properties_with_labels.txt: a list of properties, one property per line, with tab-separated fields as in the following example:
ID | Label | Description | Aliases | Data type | Count |
---|---|---|---|---|---|
P6 | head of government | head of the executive power of this town, city, municipality, state, country, or other governmental body | government headed by, executive power headed by, president, chancellor | wikibase-item | 17,592 |
- property_blacklist.txt: a list of properties that are filtered when computing the Semantic Signature, one property ID per line, e.g. P1005, P1014
- stopwords-en.txt: a list of stopwords, one stopword per line, e.g. i, me
- wikidata_entity_freqs.map: each line consists of the ID of a concept and its frequency in the KB, tab-separated, e.g. Q4664130 409104, Q30 205747
Knowledge Base Settings
This section describes the global settings related to the knowledge base module.
The default max results parameter (knowledge-base.defaultMaxResults) determines the default value for the maximum number of results that can be retrieved from a SPARQL query. The queries are used to retrieve concepts, statements, properties, etc. from the knowledge base. The maximum number of results can also be configured separately for each knowledge base in the project settings.
The hard max results parameter (knowledge-base.hardMaxResults) is a hard limit for the max results parameter.
If no value for the parameter is specified, its default value is used. The default value is shown as an example of how the parameter can be configured below:
Setting | Description | Default | Example |
---|---|---|---|
knowledge-base.enabled | enable/disable KB support | true | false |
knowledge-base.defaultMaxResults | default result limit for SPARQL query | 1000 | 10000 |
knowledge-base.hardMaxResults | hard limit for the maximum number of results from a query | 10000 | 5000 |
knowledge-base.cacheSize | number of items (classes, instances and properties) to cache | 100000 | 500000 |
knowledge-base.cacheExpireDelay | time before items are expunged from the cache | 15m | 1h |
knowledge-base.cacheRefreshDelay | time before items are asynchronously refreshed | 5m | 30m |
Disabling the knowledge base support will lead to the loss of concept-linking features from documents/projects that were using them. If you wish to run the application without knowledge base support, it is strongly recommended to disable the feature immediately after the installation and not after any projects have potentially started using it.
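The following sketch shows how the knowledge base settings could be tuned in settings.properties. The values are illustrative and mostly taken from the Example column. The default result limit is kept below the hard limit here, which is an assumption about a sensible combination rather than a documented requirement.

```properties
# Keep knowledge base support enabled (the default)
knowledge-base.enabled=true
# Default and hard limit for the number of SPARQL query results
knowledge-base.defaultMaxResults=1000
knowledge-base.hardMaxResults=5000
# Cache more items and keep them longer than by default
knowledge-base.cacheSize=500000
knowledge-base.cacheExpireDelay=1h
knowledge-base.cacheRefreshDelay=30m
```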
Scheduler Settings
This section describes the global settings related to the scheduler.
This parameter determines the number of threads the scheduler uses. It should be less than the number of hardware threads available on the machine that runs INCEpTION. The higher the number, the more tasks can be run in parallel.
This parameter determines the maximum number of tasks that can be waiting in the scheduler queue. If the queue is full, then no new tasks can be scheduled until running tasks are completed.
If no value for the parameter is specified, its default value is used. The default value is shown as an example of how the parameter can be configured below:
Setting | Description | Default | Example |
---|---|---|---|
inception.scheduler.numberOfThreads | Number of threads that run tasks | 4 | 8 |
inception.scheduler.queueSize | Maximum number of tasks waiting for execution | 100 | 200 |
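As a sketch, the example values from the table could be set as shown below. The thread count of 8 assumes a machine with more than 8 hardware threads, in line with the recommendation above.

```properties
# Run up to 8 scheduler tasks in parallel (assumes more than 8 hardware threads)
inception.scheduler.numberOfThreads=8
# Allow up to 200 tasks to wait in the scheduler queue
inception.scheduler.queueSize=200
```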
Document Repositories Settings
This section describes the global settings related to the external document repository support.
Setting | Description | Default | Example |
---|---|---|---|
external-search.enable | Enable/disable document repository support | true | false |
Recommender Settings
This section describes the global settings related to the recommender module.
Setting | Description | Default | Example |
---|---|---|---|
recommender.enabled | enable/disable recommender support | true | false |
recommender.evaluation-page.enabled | enable/disable evaluation page | true | false |