How to manage mining client watchdog?

In this article we'll explore what a watchdog is, how it functions, how software watchdogs differ from hardware watchdogs, and how to set one up in various mining clients.

What is a watchdog and how does it function?

A watchdog is a utility that monitors your hardware and software operation. Let's see how watchdogs are utilized in mining.

Software watchdogs

With software watchdogs, the monitoring is done by a process running in the background on the system. It periodically checks whether your mining client is performing well, whether the GPUs are responsive, and so on. When it detects a failure, it takes an action, most often a restart or a forced exit of the client. The drawbacks of this approach become apparent when the mining client crashes so hard that the watchdog process crashes with it, or when the system experiences a hardware crash or gets stuck. In those cases the software watchdog cannot perform any action.
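The polling loop described above can be sketched in a few lines of Python. This is only an illustration, not how any particular mining client or the minerstat agent implements it. The names run_watchdog, is_healthy, and on_failure are hypothetical, and the health check and recovery action are passed in as plain callables.

```python
import time
from typing import Callable, Optional


def run_watchdog(
    is_healthy: Callable[[], bool],
    on_failure: Callable[[], None],
    interval_s: float = 5.0,
    max_checks: Optional[int] = None,
) -> int:
    """Poll is_healthy() every interval_s seconds and call on_failure()
    whenever a check fails. Returns the number of failed checks.

    max_checks bounds the loop (useful for testing); None means run forever.
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not is_healthy():
            failures += 1
            on_failure()  # e.g. restart or force-exit the mining client
        checks += 1
        time.sleep(interval_s)
    return failures
```

Because this loop is an ordinary process on the same machine, it stops checking as soon as the machine itself hangs, which is exactly the limitation of software watchdogs.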

Hardware watchdogs

With hardware watchdogs, the monitoring is done by an external device that periodically sends requests to the machine. Most often it is a USB device that sends a signal to the system's kernel. When it gets a reply, the watchdog "knows" that the system is working and resets its internal timer. If the request times out, the watchdog uses a simple electrical connection to the Reset Switch pins on the motherboard to force a system reset. The drawback of this approach becomes apparent when the mining client crashes "lightly" or gets stuck, reporting that it is mining when it actually is not. The system still responds to the hardware watchdog, so no action is taken, but effectively no mining is done.

Note: You can refer to the hardware watchdogs article for some of the supported models and more information about them.
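The timer behavior of a hardware watchdog can be modeled with a small Python class. This is a toy model for illustration only: real devices implement this in firmware, and the class name, method names, and the exact timeout comparison are assumptions.

```python
class WatchdogTimer:
    """Toy model of a hardware watchdog's internal timer (times in seconds)."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_reply = now

    def reply_received(self, now: float) -> None:
        """The system answered a request: reset the internal timer."""
        self.last_reply = now

    def should_reset(self, now: float) -> bool:
        """True once no reply has arrived for timeout_s seconds,
        i.e. when the device would short the Reset Switch pins."""
        return now - self.last_reply >= self.timeout_s
```

Nothing in this model looks at the hashrate. As long as the system keeps replying, should_reset stays False even if the miner is stuck.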

Configuring watchdogs

Configuring software watchdogs requires the advanced configuration editor instead of the simple one. However, using it isn't as hard as it may seem; just make sure to enter the options as listed.

Did you know? On msOS and the Windows node, a software watchdog is already built in as part of the agent. If the client on a Windows-based machine crashes without a manual (or automated) mining stop having been requested, the minerstat node will attempt to restart the miner. If an msOS machine is not mining, the agent reboots it and then starts the client. However, it is sometimes worth setting up an additional watchdog in the mining client.

PhoenixMiner

By default, PhoenixMiner has the watchdog enabled. To disable it, add the option -wdog 0.

When the watchdog is enabled, you can control its timer with the option -wdtimeout 30, where 30 is the number of seconds before the watchdog times out. The acceptable range is 30 to 300, and the default is 45.

You can also control the action that is performed when the watchdog timeout is triggered. The option for this is -rmode X:

  • -rmode 0 No restart, the miner shuts down. Notice that minerstat will detect the miner crash and restart it.
  • -rmode 1 The default option, the miner gets restarted with the same command line options.
  • -rmode 2 The miner shuts down and reboots the system.

Below is a configuration example of PhoenixMiner with the watchdog explicitly enabled and an automatic reboot after a 90-second timeout:

-worker (WORKER) -pool (POOL:ETC) -wal (WALLET:ETC).(WORKER) -pass x -coin etc -eres 0 -log 0 -gbase 0 -proto (AUTO) -wdog 1 -wdtimeout 90 -rmode 2
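If you generate configs programmatically, the three PhoenixMiner watchdog options can be assembled and range-checked with a small helper. This is a sketch under the limits stated in this article (timeout 30 to 300 seconds, restart mode 0 to 2); the function name phoenix_watchdog_args is hypothetical and not part of PhoenixMiner or minerstat.

```python
from typing import List


def phoenix_watchdog_args(enabled: bool = True, timeout_s: int = 45, rmode: int = 1) -> List[str]:
    """Build the PhoenixMiner watchdog options as a list of arguments."""
    if not enabled:
        return ["-wdog", "0"]
    if not 30 <= timeout_s <= 300:
        raise ValueError("-wdtimeout must be between 30 and 300 seconds")
    if rmode not in (0, 1, 2):
        raise ValueError("-rmode must be 0, 1, or 2")
    return ["-wdog", "1", "-wdtimeout", str(timeout_s), "-rmode", str(rmode)]
```

For example, " ".join(phoenix_watchdog_args(timeout_s=90, rmode=2)) yields the watchdog part of the config above: -wdog 1 -wdtimeout 90 -rmode 2.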

T-Rex

By default, T-Rex has the watchdog disabled. To enable it, replace the option "no-watchdog":true with "watchdog-exit-mode":"N:M:A", where:

  • N is the number of restarts
  • M is the number of minutes within which N restarts must occur to trigger the watchdog
  • A is the action that is taken when the watchdog is triggered. The actions are: e to exit the miner, r to reboot the system, s to shut down the system completely.

Here are a few examples:

  • "20:10:s" - the watchdog will shut down the system if the miner gets restarted 20 times within any 10 minute interval
  • "5:7:r" - the watchdog will reboot the system if the miner gets restarted 5 times within any 7 minute interval
  • "1:1:e" - the watchdog will exit the miner in case the mining process crashes

Here's the config of the miner with the last example used.

{
  "pools": [
    {
      "user": "(WALLET:ETC)",
      "worker": "(WORKER)",
      "url": "(POOL:ETC)",
      "pass": "x"
    }
  ],
  "no-nvml": true,
  "api-bind-http": "127.0.0.1:4068",
  "json-response": true,
  "pci-indexing": true,
  "retries": 3,
  "retry-pause": 5,
  "timeout": 500,
  "watchdog-exit-mode": "1:1:e",
  "algo": "etchash",
  "exit-on-cuda-error": true,
  "exit-on-connection-lost": false
}

Make sure not to remove any comma or bracket by accident when using the advanced config.
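To make the "N:M:A" format more concrete, here is a small Python sketch that parses such a value and checks whether a list of restart timestamps would trigger it. It only mirrors the description in this article. How T-Rex counts restarts internally is not documented here, so the sliding-window logic and the names parse_exit_mode and triggers are assumptions.

```python
from collections import deque
from typing import Iterable, NamedTuple

ACTIONS = {"e": "exit miner", "r": "reboot system", "s": "shut down system"}


class ExitMode(NamedTuple):
    restarts: int  # N
    minutes: int   # M
    action: str    # A: one of "e", "r", "s"


def parse_exit_mode(value: str) -> ExitMode:
    """Parse a string such as "5:7:r" into its three parts."""
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError("expected N:M:A")
    restarts, minutes, action = int(parts[0]), int(parts[1]), parts[2]
    if restarts < 1 or minutes < 1 or action not in ACTIONS:
        raise ValueError("invalid watchdog-exit-mode value")
    return ExitMode(restarts, minutes, action)


def triggers(mode: ExitMode, restart_times_s: Iterable[float]) -> bool:
    """True if any interval of mode.minutes minutes contains at least
    mode.restarts restarts (timestamps are given in seconds)."""
    window = deque()
    limit_s = mode.minutes * 60
    for t in sorted(restart_times_s):
        window.append(t)
        # Drop restarts that are too old to share a window with t.
        while t - window[0] > limit_s:
            window.popleft()
        if len(window) >= mode.restarts:
            return True
    return False
```

With "5:7:r", five restarts one minute apart trigger the reboot, while five restarts spread over more than 13 minutes do not.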

TeamRedMiner

By default, TeamRedMiner has the watchdog enabled. To disable it, add the option --watchdog_disabled.

There are several watchdog options available in TeamRedMiner:

  • --no_gpu_monitor Disables the miner's internal monitoring of GPU temperature and fan speed.
  • --temp_limit=TEMP Sets the temperature at which the GPUs are considered too hot and stop mining. The default is 85C (Celsius). Make sure to always set the resume temp (listed below) to configure this correctly.
  • --temp_resume=TEMP Sets the temperature at which the GPUs are considered cool enough to resume mining. The default is 99C (Celsius), which effectively disables the start-stop behavior.
  • --watchdog_script=X Configures the GPU watchdog to shut down the miner, run the specified platform-dependent script, and exit immediately. The default script is watchdog.bat/watchdog.sh in the current directory, but a different script can be provided as an optional argument, with an absolute or relative path if needed.
  • --watchdog_test Tests the configured watchdog script by triggering the same action as a dead GPU after ~20 secs of mining.
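The interplay of --temp_limit and --temp_resume is a classic hysteresis: a GPU stops at the upper threshold and resumes only after cooling to the lower one. The Python sketch below illustrates the idea. The function name, the exact comparison operators, and the example values are assumptions and not taken from TeamRedMiner.

```python
def next_mining_state(mining: bool, temp_c: float, temp_limit: float, temp_resume: float) -> bool:
    """Return whether a GPU should be mining after a temperature reading."""
    if mining and temp_c >= temp_limit:
        return False  # too hot: stop this GPU
    if not mining and temp_c <= temp_resume:
        return True   # cooled down enough: resume
    return mining     # otherwise keep the current state
```

With a hypothetical limit of 85 and resume temperature of 70, a GPU that stopped at 86C stays stopped at 75C and resumes only once it reads 70C or lower.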

For Ethash-based coins such as Ethereum Classic (ETC), there is also --eth_hashwatch=N,M, where N and M are hashrate values in MH/s that define a range. When the hashrate of one of the GPUs falls outside the specified range, the watchdog is triggered. You can enter -1 for one of the values to make that bound unlimited. For example, with --eth_hashwatch=-1,1000 the watchdog triggers only when the hashrate is above 1000MH/s, but not when it is as low as 0.01MH/s.

Here's a config example with the hashrate watcher set to 20-35MH/s and temperature and fan speed monitoring disabled:

--algo etchash -o (POOL:ETC) -u (WALLET:ETC).(WORKER) -p x --eth_dag_slowdown=9 --watchdog_script=/home/minerstat/minerstat-os/bin/reboot.sh --eth_no_ramp_up --eth_hashwatch=20,35 --no_gpu_monitor
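The range check behind --eth_hashwatch=N,M, including the -1 "unlimited" value, can be expressed as a short Python function. This is a sketch of the rule as described in this article, not TeamRedMiner's actual code. The function name is hypothetical, and treating the bounds as inclusive is an assumption.

```python
def hashwatch_triggered(hashrate_mhs: float, low: float, high: float) -> bool:
    """True if the hashrate (MH/s) is outside the [low, high] range.
    A bound of -1 means that side of the range is unlimited."""
    below = low != -1 and hashrate_mhs < low
    above = high != -1 and hashrate_mhs > high
    return below or above
```

With the range 20,35 from the config above, a GPU reporting 10MH/s or 40MH/s would trigger the watchdog, while 25MH/s would not.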

lolMiner

By default, lolMiner comes with the watchdog in "script" mode, which exits the miner and runs the file emergency.sh or emergency.bat (depending on the platform) from the miner's default directory. However, as the file is empty, this effectively puts the watchdog in "exit" mode. You can explicitly set it to off or exit:

  • "WATCHDOG": "off" This will do nothing except print a message. If only a single card crashed and not the whole driver, the other cards will continue mining.
  • "WATCHDOG": "exit" This will close the miner with an exit code of 42. The miner will automatically restart after a pause of a few seconds, as minerstat detects its crash. This is the recommended and default option.

Example of config with watchdog set to exit miner when a GPU is detected as "lost":

{
  "MINERSTAT": {
    "DEVICES": "AUTO",
    "APIPORT": 3333,
    "ALGO": "ETCHASH",
    "POOLS": [
      {
        "POOL": "(POOL:ETC)",
        "PORT": "(AUTO)",
        "USER": "(WALLET:ETC).(WORKER)",
        "PASS": "x"
      }
    ],
    "WATCHDOG": "exit"
  }
}

GMiner

By default, GMiner comes with the watchdog on. Note that in GMiner the watchdog also works in cases where the network connection to the pool is lost.

  • --watchdog 1 or, in short form, -w 1 - enables or disables the watchdog. The default value is 1 (enabled). Set this option to --watchdog 0 to disable the watchdog.
  • --watchdog_restart_delay 10 - miner restart delay for watchdog in seconds, default value is 10 seconds.

Example of config with watchdog set to restart the miner after 10 seconds of watchdog timeout:

-pass x --algo etchash --server (POOL:ETC) --port (AUTO) --ssl 0 --user (WALLET:ETC).(WORKER) --watchdog 1 --watchdog_restart_delay 10

NBMiner

By default, NBMiner comes with watchdog process enabled. You can disable it by adding option --no-watchdog.

Example of NBMiner configured with watchdog disabled:

-a etchash -o (POOL:ETC) -u (WALLET:ETC).(WORKER) -p x -long-format --no-watchdog