The first thing to note: do not panic. There are many reasons for a rig to restart, and diagnosing the cause may take some time, especially if the problem occurs only periodically. In that case, the Triggers functionality can be a great aid, but it's always better to prevent issues than to fix them later.
The most common reason for a rig on any OS to crash completely is the overclock profile in use. Unfortunately, setting one up is a classic example of being easy to start but hard to master, and even then there is the silicon lottery factor: the best settings found for one card cannot simply be copied to another card of the same model, because the actual physical chip is a bit different, the specific cooler may make slightly better contact, and so on.
When the configured OC is unstable, the card may lose its connection to the driver, and resources interacting with the driver or the card may hang or crash completely, which in some cases can trigger a restart, a BSOD, etc.
You can read more about configuring the overclock here.
And don't forget to apply the profile to the worker in the worker's configuration.
On the software side of things, not counting the operating system, there still exist a few things to watch out for, in several categories.
As the driver is an intermediary layer between the hardware, the operating system, and the other applications, it is crucial to have one working properly.
The currently recommended versions for Nvidia GPUs are 472.12 and older "Game Ready" releases, as the 49x.xx line has been reported to have performance issues with some systems and mining clients.
Power management mode should be set as Prefer Maximum Performance in the driver's control panel.
For AMD the most recent recommended versions are: 21.1.1 if you're using 6800, 6800XT, or 6900XT (or older GPUs), with 21.3.1 recommended for 6700XT, and 21.10.2 reported as the most recent driver that properly runs with 6600 and 6600 XT models.
Manual OC mode needs to be enabled in the Driver's Control Panel for each GPU in the system for values set by ClockTune or different OC software to apply.
In some cases a mining client isn't fully compatible with the hardware, especially if the hardware or driver in use has been released recently, so it might be a good idea to try another mining client as a test. If you're configuring advanced options for the mining client or are using a mining client with automatic configuration (e.g. T-Rex, NBMiner, or Gminer on a rig with LHR cards), it's worth trying to set a constant value for the auto-adjusting settings. For LHR settings see these articles: NBMiner, T-Rex, Gminer. For other settings or different mining clients, you might want to refer to the manual of the specific client you're using.
If the rig was stable and then the mining client updated (which is done automatically by default), you can select one of the previous versions of it in the worker's config to see if that resolves the issue.
Having third-party software installed and running during mining can interfere with the mining process, leading to a fluctuating load (which can destabilize the rig). For instance, it's recommended not to use multiple OC tools at the same time.
The other factor which plays a very important role is hardware. This includes all the components used in the system and BIOS settings.
The likeliest troublemakers are risers and the power cables used to run them. You should always power risers via a PCI-E 6-pin connection instead of SATA or MOLEX, as neither SATA nor MOLEX is guaranteed to provide the power that the cards in the system might need.
Still, a riser might be a cause of reboots, or of hardware not being detected or producing errors. It often helps to set PCI-E to Gen 2 in the BIOS of the system in order to prevent excessive stress on the risers.
The other part of the hardware most critical to the well-being of the system is the Power Supply Unit: with a good-quality PSU that is not big enough, the system will shut down to prevent damage, while a lower-quality power supply may outright damage the components. Read this article on how to choose a good power supply (or multiple).
Another one is the GPUs used in the system. Some degrade and require different clocks after years of mining, others might have issues with fans and overheat as a result, and some cards can break. A broken card might not work at all, might not be recognized by the driver, or might crash the system as soon as any load is applied to it.
It's necessary to take good care of the hardware in your system: dust the cards regularly, expect that a thermal paste replacement might be needed after several years, and, in the case of Nvidia's 3000-series GPUs (the generation also known as Ampere), consider replacing the thermal pads from day 1, especially on RTX 3090 cards.
CPU, RAM, and motherboard issues can all play a role too, for example by causing a BSOD. These should be noticeable in the Event Viewer, which you can launch by typing eventvwr.msc in the Run menu (you can open it quickly by pressing Win+R) or simply by typing Event Viewer in the Start menu of the OS.
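The same System log that Event Viewer shows can also be read from a script. The sketch below only builds a command line for the built-in wevtutil tool that asks for the most recent Critical and Error events. The function name and the default of 20 events are illustrative assumptions, not something this article prescribes.

```python
def build_event_query(max_events=20):
    """Build a wevtutil command listing recent Critical/Error System events.

    Hypothetical helper for illustration; it only returns the command
    as a list and does not run anything.
    """
    if max_events < 1:
        raise ValueError("max_events must be at least 1")
    # Level 1 = Critical, Level 2 = Error in the Windows event schema.
    xpath = "*[System[(Level=1 or Level=2)]]"
    return [
        "wevtutil", "qe", "System",
        "/q:" + xpath,            # filter by event level
        "/c:" + str(max_events),  # limit the number of events returned
        "/rd:true",               # newest events first
        "/f:text",                # human-readable output
    ]


# Show the command; on the rig itself the list could be passed to
# subprocess.run(...) from a Windows command prompt session.
print(build_event_query(10))
```

Keeping the command as a list avoids quoting problems with the spaces inside the filter expression.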
On the OS side of things, there are various settings one needs to configure correctly, some less obvious than others.
Not to be confused with VRAM or RAM, Virtual Memory is a part of the hard drive (or SSD) storage that's allocated for a special file called pagefile.sys. Setting this file to a big enough size allows the mining client to request memory from the system even if there is not enough actual RAM installed, which the mining client needs in order to work with the resources of the GPU. For most purposes it's recommended to set the virtual memory to 16000 MB minimum, and for 2+ GPUs in the system, to 8000 MB per GPU used in the system. See the details of how to configure it here.
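As a rough illustration of the sizing rule above, here is a minimal sketch that reads it as "8000 MB per GPU, but never less than 16000 MB". That reading, and the function name, are assumptions made for the example.

```python
MIN_PAGEFILE_MB = 16000  # recommended minimum from the article
PER_GPU_MB = 8000        # recommended amount per GPU from the article


def recommended_pagefile_mb(gpu_count):
    """Suggest a pagefile size in MB for a rig with gpu_count GPUs.

    Hypothetical helper: takes the larger of the fixed minimum and
    the per-GPU amount multiplied by the number of GPUs.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return max(MIN_PAGEFILE_MB, PER_GPU_MB * gpu_count)


for gpus in (1, 2, 6):
    print(gpus, "GPU(s):", recommended_pagefile_mb(gpus), "MB")
```

With this reading, rigs with one or two GPUs both land on the 16000 MB minimum, and a six-GPU rig would get 48000 MB.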
In the Power options for the system, opened by running powercfg.cpl and then selecting Change plan settings and Change advanced power settings, it's recommended to set the automatic power management settings to off or to prefer maximum performance. For example: Turn off hard disk: Never; PCI Express, Link State Power Management: Off; Display, Turn off display after: Never.
In some cases a rig can be restarting because it does not have a licensed version of Windows. It may also be required to re-enter your license key after you change hardware in the system.
Another common issue is updates changing the versions of installed drivers, changing previously configured settings (Defender turning back on is a frequent example of such behavior), or simply breaking the operating system outright, especially with major version changes. It is therefore recommended to leave the OS on a known working version and not let it update automatically.
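The three example settings above can also be applied from the command line with powercfg. The following is a sketch only: it assumes the rig runs on AC power and that the SUB_PCIEXPRESS and ASPM aliases are available on your Windows version (running powercfg /aliases lists them). The function name and the dry-run default are illustrative.

```python
import subprocess
import sys

# Commands mirroring the examples in the text (AC power plan values).
POWER_COMMANDS = [
    # Turn off hard disk: Never (0 minutes means never)
    ["powercfg", "/change", "disk-timeout-ac", "0"],
    # Turn off display after: Never
    ["powercfg", "/change", "monitor-timeout-ac", "0"],
    # PCI Express, Link State Power Management: Off (index 0)
    ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
     "SUB_PCIEXPRESS", "ASPM", "0"],
    # Re-apply the current scheme so the changed index takes effect
    ["powercfg", "/setactive", "SCHEME_CURRENT"],
]


def apply_power_settings(dry_run=True):
    """Return the commands as strings; run them only if dry_run is False."""
    lines = [" ".join(cmd) for cmd in POWER_COMMANDS]
    if not dry_run:
        if sys.platform != "win32":
            raise RuntimeError("powercfg is only available on Windows")
        for cmd in POWER_COMMANDS:
            subprocess.run(cmd, check=True)
    return lines


# Dry run: only prints what would be executed.
for line in apply_power_settings():
    print(line)
```

Calling apply_power_settings(dry_run=False) from an elevated prompt on the rig would actually run the commands. It is worth confirming the result in the Power options dialog afterwards.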
One of the ways to stop automatic updates is to open services.msc, locate Windows Update, open its Properties by double-clicking it, and set the Startup type setting to Manual.
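For rigs managed by script, the same Startup type change can be sketched with the built-in sc tool, where the Windows Update service is named wuauserv and the start value demand corresponds to Manual. The helper name and the dry-run default are assumptions for illustration; the command needs an elevated prompt to succeed.

```python
import subprocess
import sys


def windows_update_manual_command():
    """Build the sc command that sets Windows Update to Manual start."""
    # "start=" and its value are separate arguments, which matches the
    # space that sc expects after the equals sign.
    return ["sc", "config", "wuauserv", "start=", "demand"]


def set_windows_update_manual(dry_run=True):
    """Return the command string; execute it only if dry_run is False."""
    cmd = windows_update_manual_command()
    if not dry_run:
        if sys.platform != "win32":
            raise RuntimeError("sc is only available on Windows")
        subprocess.run(cmd, check=True)
    return " ".join(cmd)


# Dry run: shows the command without changing the service.
print(set_windows_update_manual())
```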
It is sometimes possible to roll back an update by opening appwiz.cpl, pressing View Installed Updates, then selecting the update in question and pressing Uninstall.
Hardware-accelerated GPU scheduling and Resizable BAR are features for which support has been added in the newest versions of Windows and for newer hardware.
The first can lead to performance issues and can be disabled in Windows Settings, Display, Graphics Settings, Change default graphics settings, by toggling the switch for Hardware-accelerated GPU scheduling.
The Resizable BAR is only supported on Intel's Comet Lake and AMD's Zen 2 series (or newer) processors and respective platforms, and it can be set to off in the BIOS settings.
While not as common, the operating system installation can get corrupted, especially when crashes occur in its internal components. Sometimes it can be "brought back" by updating with an installer of a newer version or by reinstalling "on top" of the current install, but in some cases a complete wipe is the easier place to start.
It is here that we can suggest trying msOS — a Linux-based operating system that comes ready to be used for mining purposes out-of-the-box.
Here's an article on how to prepare your rig for use with it. And here's the complete installation process in written and video forms.
This article goes over the most common things that can cause the rig to misbehave, leading to restarts in many cases. There exist other reasons, not mentioned in this article, so you can use these tools to diagnose this behavior:
Event Viewer: type eventvwr.msc in the Run menu (you can open it quickly by pressing Win+R) or simply type Event Viewer in the Start menu of the OS.
You can also email email@example.com or join our Discord server for support.