Why does my Windows rig keep restarting?

The first thing to note: don't panic. There are many reasons a rig might keep restarting, and diagnosing it can take time, especially if the problem only occurs periodically. In that case the Triggers functionality can be a great aid, but it's always better to prevent issues than to fix them after the fact.

OC

The most common reason for a rig on any OS to crash completely is the overclock profile in use. Unfortunately, setting one up is a classic example of "easy to start, hard to master". Even then, there is the silicon lottery: you can't simply copy the best settings found for one card to another card of the same model, because the physical chip is slightly different, its cooler may make slightly better contact, and so on.

When the configured OC is unstable, the card may lose its connection to the driver, and any processes interacting with the driver or the card may hang or crash completely, which in some cases can trigger a restart, a BSOD, etc.

You can read more about configuring the overclock here.

Keep in mind that running several applications that apply OC at the same time is not advised. If you intend to use a third-party tool for it, set the ClockTune values to skip, and don't forget to apply that profile to the worker in the worker's configuration.

Software

On the software side of things, not counting the operating system, there are still a few things to watch out for, in several categories.

Driver

As the driver is the intermediary layer between the hardware, the operating system, and the other applications, it is crucial that it works properly.

Nvidia

The currently recommended versions for Nvidia GPUs are 472.12 and older Game Ready releases, as the 49x.xx line has been reported to have performance issues with some systems and mining clients.

Power management mode should be set to Prefer Maximum Performance in the driver's control panel.
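
To check which driver a rig is actually running, the sketch below reads the version reported by nvidia-smi (installed together with the Nvidia driver) and compares it with the recommendation above. It is a minimal sketch: the verdict labels and the assumption that all GPUs in the rig share one driver version are ours, not part of any minerstat tooling.

```python
import shutil
import subprocess

# From this article: 472.12 and older releases are recommended,
# while the 49x.xx line has reported performance issues.
RECOMMENDED_MAX = (472, 12)


def parse_version(text):
    """Turn a driver string such as '472.12' into a comparable (major, minor) tuple."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def classify_driver(text):
    """Return a short verdict for an Nvidia driver version string."""
    version = parse_version(text)
    if 490 <= version[0] <= 499:
        return "49x.xx line: reported issues, consider 472.12 or older"
    if version <= RECOMMENDED_MAX:
        return "recommended range (472.12 or older)"
    return "newer than 472.12: not covered by this recommendation"


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        # nvidia-smi prints one line per GPU; we assume they all share one driver
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        lines = result.stdout.splitlines()
        if result.returncode == 0 and lines:
            print(lines[0], "->", classify_driver(lines[0]))
        else:
            print("nvidia-smi could not query the driver version")
    else:
        print("nvidia-smi not found; is the Nvidia driver installed?")
```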

AMD

For AMD, the recommended versions are: 21.1.1 for the RX 6800, 6800 XT, 6900 XT, and older GPUs; 21.3.1 for the RX 6700 XT; and 21.10.2, reported as the most recent driver that runs properly with the RX 6600 and 6600 XT.

Manual OC mode needs to be enabled in the driver's control panel for each GPU in the system, otherwise values set by ClockTune or other OC software won't take effect.

Mining client and its options

In some cases a mining client isn't fully compatible with the hardware, especially if the hardware or driver was released recently, so it can be worth trying another mining client as a test. If you're configuring advanced options for the mining client, or using a client with automatic tuning (e.g. T-Rex, NBMiner, or Gminer on a rig with LHR cards), try setting constant values for the auto-adjusting settings. For LHR settings, see these articles: NBMiner, T-Rex, Gminer. For other settings or other mining clients, refer to the manual of the specific client you're using.

If the rig was stable until the mining client updated (updates happen automatically by default), try selecting a previous version of the client in the worker's config to see if that resolves the issue.

Other software

Third-party software running during mining can interfere with the mining process and cause a fluctuating load, which can destabilize the rig. For instance, using several OC tools at the same time is not recommended.

Hardware

The other factor that plays a very important role is hardware. This includes all the components in the system, as well as the BIOS settings.

Risers and cables

The likeliest troublemakers are risers and the power cables used to run them. Always power risers via a PCI-E 6-pin connector rather than SATA or Molex, as neither SATA nor Molex connectors are guaranteed to provide the power the cards in the system might need.

Even so, a riser can still cause reboots, undetected hardware, or errors. It often helps to set PCI-E to Gen 2 in the system BIOS to reduce the stress on the risers.

PSU

The other part of the hardware that is most critical for the well-being of the system is the Power Supply Unit. A good-quality PSU that isn't big enough will shut the system down to prevent damage, while a lower-quality power supply may damage the components outright. Read this article on how to choose a good power supply (or several).
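
A common rule of thumb when sizing a PSU is to keep the total draw below a fraction of its rating. The sketch below assumes an 80% target load and 50 W rating steps; both numbers are illustrative assumptions, not figures from this article, and the per-component draw should come from your own measurements or the manufacturers' specifications.

```python
def recommended_psu_watts(component_draws_w, target_load_percent=80, step_w=50):
    """Smallest PSU rating (a multiple of step_w) that keeps the summed draw
    at or below target_load_percent of the rating."""
    if not 0 < target_load_percent <= 100:
        raise ValueError("target_load_percent must be between 1 and 100")
    total_w = sum(component_draws_w)
    # Required rating = total / (percent / 100), rounded up to the next step.
    # Integer ceiling division avoids floating-point rounding surprises.
    numerator = total_w * 100
    denominator = target_load_percent * step_w
    steps = -(-numerator // denominator)
    return steps * step_w


# Example: six GPUs at about 120 W each plus about 100 W for the rest of the system
print(recommended_psu_watts([120] * 6 + [100]))  # 1050
```

With several PSUs, the same calculation can be run separately for the components each one powers.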

GPUs

The GPUs themselves can also be the cause. Some degrade after years of mining and need different clocks, others develop fan issues and overheat as a result, and some break outright. A broken card might not work at all, might not be recognized by the driver, or might crash the system as soon as any load is applied to it.

It's necessary to take good care of the hardware in your system, e.g. by dusting the cards regularly. After several years, the thermal paste may need replacing, and for Nvidia's 3000-series (Ampere) GPUs, replacing the thermal pads is recommended from day one, especially on RTX 3090 cards.

Other hardware

CPU, RAM, and motherboard issues can also play a role, for example by causing a BSOD. These should show up in the Event Viewer, which you can launch by typing eventvwr.msc in the Run dialog (open it quickly with Win+R) or by searching for Event Viewer in the Start menu.
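
Instead of scrolling through the Event Viewer, the System log can also be filtered from a command prompt with wevtutil, which ships with Windows. The sketch below builds such a query for a few event IDs commonly linked to unexpected restarts: 41 (Kernel-Power), 1001 (BugCheck, i.e. a BSOD) and 6008 (unexpected previous shutdown). This choice of IDs is an assumption to start from, and other events logged around the time of a restart are worth checking too.

```python
import subprocess
import sys

# Event IDs often linked to unexpected restarts (a starting point, not a full list):
#   41   Kernel-Power: the system rebooted without cleanly shutting down first
#   1001 BugCheck: the system rebooted after a bugcheck (BSOD)
#   6008 EventLog: the previous system shutdown was unexpected
RESTART_EVENT_IDS = (41, 1001, 6008)


def build_restart_query(event_ids=RESTART_EVENT_IDS, count=10):
    """Build a wevtutil command that lists the newest matching System log events."""
    condition = " or ".join(f"EventID={eid}" for eid in event_ids)
    return [
        "wevtutil", "qe", "System",
        f"/q:*[System[({condition})]]",  # XPath filter on the event ID
        f"/c:{count}",                   # maximum number of events returned
        "/rd:true",                      # newest events first
        "/f:text",                       # human-readable output
    ]


if __name__ == "__main__":
    cmd = build_restart_query()
    if sys.platform == "win32":
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)
    else:
        # Not on Windows: just show the command to run on the rig
        print(subprocess.list2cmdline(cmd))
```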

OS

On the OS side of things, there are various settings that need to be configured correctly, some less obvious than others.

Virtual Memory

Not to be confused with VRAM or RAM, Virtual Memory is a part of the hard drive (or SSD) storage allocated to a special file called pagefile.sys. Setting this file to a large enough size lets the mining client reserve more memory than the physical RAM installed in the system, which it needs in order to work with the GPU's resources. For most purposes, set the virtual memory to at least 16000 MB, and for rigs with two or more GPUs, to 8000 MB per GPU. See the details of how to configure it here.
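
The sizing rule above comes down to taking the larger of the 16000 MB minimum and 8000 MB per GPU. A minimal sketch of that arithmetic:

```python
MIN_PAGEFILE_MB = 16000  # recommended minimum from this article
PER_GPU_MB = 8000        # recommended per GPU for rigs with 2+ GPUs


def recommended_pagefile_mb(gpu_count):
    """Virtual memory size (MB) following the article's rule of thumb."""
    if gpu_count < 0:
        raise ValueError("gpu_count cannot be negative")
    return max(MIN_PAGEFILE_MB, PER_GPU_MB * gpu_count)


for gpus in (1, 2, 6, 8):
    print(f"{gpus} GPU(s): {recommended_pagefile_mb(gpus)} MB")
```

For example, a six-GPU rig comes out at 48000 MB, which is the value to enter as the custom size for pagefile.sys.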

Power Settings and Screensaver

In the system's Power Options (run powercfg.cpl, then select Change plan settings and Change advanced power settings), it's recommended to turn automatic power management off or set it to maximum performance. For example: Turn off hard disk after: Never; PCI Express > Link State Power Management: Off; Display > Turn off display after: Never.
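
The same settings can also be applied from a command prompt with powercfg, which is handy when configuring several rigs. The sketch below lists the commands matching the three examples above for the active power plan (SCHEME_CURRENT). It only changes the "plugged in" (AC) values, which is an assumption that suits a rig on mains power; it may need an elevated prompt, and it's worth confirming the result in powercfg.cpl afterwards.

```python
import subprocess
import sys

# Each entry is one powercfg call. For the timeouts, 0 means "Never";
# for ASPM (Link State Power Management), 0 means "Off".
POWER_COMMANDS = [
    ["powercfg", "/change", "disk-timeout-ac", "0"],     # Turn off hard disk after: Never
    ["powercfg", "/change", "monitor-timeout-ac", "0"],  # Turn off display after: Never
    ["powercfg", "/setacvalueindex", "SCHEME_CURRENT",
     "SUB_PCIEXPRESS", "ASPM", "0"],                      # PCI Express LSPM: Off
    ["powercfg", "/setactive", "SCHEME_CURRENT"],         # re-apply the plan so the change takes effect
]

if __name__ == "__main__":
    for cmd in POWER_COMMANDS:
        if sys.platform == "win32":
            subprocess.run(cmd, check=True)
        else:
            # Not on Windows: print the commands to run on the rig instead
            print(subprocess.list2cmdline(cmd))
```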

License

In some cases, a rig may keep restarting because it isn't running a licensed copy of Windows. You may also need to re-enter your license key after changing hardware in the system.

Updates

Another common culprit is updates, which can change the versions of installed drivers, reset previously configured settings (Defender turning itself back on is a frequent example), or break the operating system outright, especially with major version changes. It is therefore recommended to keep the OS on a known working version and not let it update automatically.

One way to stop automatic updates is to open services.msc, locate Windows Update, double-click it to open its Properties, and set Startup type to Manual.

It is sometimes possible to roll back an update: open appwiz.cpl, click View installed updates, select the update in question, and click Uninstall.
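
Both steps also have command-line equivalents, which can help on a rig without a monitor attached: `sc config` sets the Windows Update service (wuauserv) to Manual ("demand"), and `wusa /uninstall` removes an update by its KB number. The sketch below only builds the commands; the KB number in the example is a hypothetical placeholder, so take the real one from View installed updates and run the commands from an elevated command prompt.

```python
import re
import subprocess

# Windows Update service (wuauserv) startup type -> Manual.
# sc.exe expects a space after "start=", hence the separate argument.
DISABLE_AUTO_UPDATE = ["sc", "config", "wuauserv", "start=", "demand"]


def uninstall_update_command(kb):
    """Build a wusa command that uninstalls one update, given e.g. 'KB1234567'."""
    match = re.fullmatch(r"(?:KB)?(\d+)", kb.strip(), flags=re.IGNORECASE)
    if not match:
        raise ValueError(f"not a KB number: {kb!r}")
    return ["wusa", "/uninstall", f"/kb:{match.group(1)}"]


print(subprocess.list2cmdline(DISABLE_AUTO_UPDATE))
print(subprocess.list2cmdline(uninstall_update_command("KB1234567")))  # placeholder KB
```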

Hardware-accelerated Scheduler and Resizable BAR

These are features supported by the newest versions of Windows on newer hardware.

Hardware-accelerated GPU scheduling can lead to performance issues. To disable it, go to Windows Settings > Display > Graphics settings > Change default graphics settings and turn off the Hardware-accelerated GPU scheduling switch.
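
On rigs managed remotely, this switch is commonly reported to be stored in the registry as the HwSchMode value (1 = off, 2 = on). That value name and its meaning are assumptions based on commonly reported behavior rather than on this article, so the sketch below only builds the reg.exe command; after running it from an elevated prompt and restarting, it's worth confirming the switch state in Graphics settings.

```python
# Assumption: HwSchMode under this key controls hardware-accelerated GPU
# scheduling, with 1 = off and 2 = on. A restart is needed for it to apply.
GRAPHICS_DRIVERS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def hags_command(enabled):
    """Build a reg.exe command that turns hardware-accelerated GPU scheduling on or off."""
    value = "2" if enabled else "1"
    return ["reg", "add", GRAPHICS_DRIVERS_KEY,
            "/v", "HwSchMode", "/t", "REG_DWORD", "/d", value, "/f"]


print(" ".join(hags_command(enabled=False)))
```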

Resizable BAR is only supported on Intel Comet Lake and AMD Zen 2 (or newer) processors and their platforms, and it can be turned off in the BIOS settings.

Corrupted OS

While less common, the operating system installation itself can get corrupted, especially when crashes occur in its internal components. Sometimes it can be repaired by running the installer of a newer version or by reinstalling "on top" of the current installation, but in some cases a clean install is the easier place to start.

This is where we suggest trying msOS, a Linux-based operating system that is ready for mining out of the box.

Here's an article on how to prepare your rig for it, and here's the complete installation process in written and video form.

Other reasons

This article covers the most common causes of a misbehaving rig, which in many cases lead to restarts. There are other causes not mentioned here, and you can use these tools to diagnose them:

  • In the dashboard you can use the latest activity and alerts view to see the latest reported events.
  • Running the diagnostic audit can help you diagnose issues with your build.
  • You can also check the mining client logs, located in the mining client's folder.
  • For system messages, the Event Viewer utility shows issues reported by the OS. Launch it by typing eventvwr.msc in the Run dialog (open it quickly with Win+R) or by searching for Event Viewer in the Start menu.
  • In extreme cases, reducing the number of variables can help diagnose instabilities. This can include swapping in known working parts, switching to msOS, or running the system with as little hardware connected as possible (e.g. removing a GPU and its riser if that specific card seems unstable).
  • Feel free to contact support@minerstat.com or join our discord server for support.
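
When going through mining client logs, a quick keyword scan can narrow down where to look. The sketch below searches .log and .txt files in a folder for a few keywords; the folder you pass in and the keyword list are hypothetical, since each mining client names its log files and words its errors differently.

```python
import sys
from pathlib import Path

# Hypothetical keyword list: adjust it to the messages your mining client prints
KEYWORDS = ("error", "crash", "out of memory", "illegal memory access")


def scan_logs(folder, keywords=KEYWORDS, patterns=("*.log", "*.txt")):
    """Yield (file name, line number, text) for every log line containing a keyword."""
    lowered = [k.lower() for k in keywords]
    for pattern in patterns:
        for path in sorted(Path(folder).rglob(pattern)):
            with path.open(encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    text = line.lower()
                    if any(k in text for k in lowered):
                        yield path.name, lineno, line.rstrip()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_logs.py "C:\path\to\mining\client\folder"
    for name, lineno, text in scan_logs(sys.argv[1]):
        print(f"{name}:{lineno}: {text}")
```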