Author Topic: DDR4 Failures  (Read 4730 times)

Offline maxTim

  • Newbie
  • *
  • Posts: 2
So I have failures across a single 16 GB DDR4 SODIMM. The general system test popped an advanced pattern test failure, and the extended RAM test had several failures:
  • Pattern
  • Bit High
  • Nibble Move
  • Checkerboard
  • Walking Ones Left
  • Walking Ones Right
  • Modulo20
  • Auxiliary Pattern
  • Moving Inversion

However, live OSes from several Linux distributions boot fine. So if a live OS works, I shouldn't have bad RAM, right?

Offline colinc

  • Administrator
  • Jr. Member
  • *****
  • Posts: 50
Without knowing the history of the situation that led you to run tests on the RAM in the first place... You need to take into account the faulty memory addresses and the Linux kernel virtual memory map. There are many great resources out there that explain Linux kernel memory management, so I will just comment on a few key areas that may help in confirming the health of the RAM module.

If you have the faulty memory addresses saved from your previous tests, you can compare them to the memory allocation on your Linux live OS and see if Linux is even using that address space. From a terminal, as root: cat /proc/iomem
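A rough sketch of that comparison in Python, in case it helps: it parses text in the /proc/iomem format and lists the regions that contain a given physical address. The sample text, the addresses, and the function names are made up for illustration; on a real system you would feed it the contents of /proc/iomem read as root.

```python
import re

# Hypothetical sample in the /proc/iomem format: hex ranges, with nested
# entries indented under their parent. Real data comes from /proc/iomem.
SAMPLE_IOMEM = """\
00001000-0009ffff : System RAM
000a0000-000bffff : PCI Bus 0000:00
00100000-7fffffff : System RAM
  01000000-01ffffff : Kernel code
  02000000-027fffff : Kernel data
80000000-8fffffff : Reserved
"""

LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.*)$")


def parse_iomem(text):
    """Return a list of (start, end, name) tuples in file order."""
    regions = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            start, end, name = m.groups()
            regions.append((int(start, 16), int(end, 16), name.strip()))
    return regions


def regions_containing(regions, addr):
    """Return the names of all regions whose range includes addr.

    Parents are listed before their children in the file, so the
    result runs from the outermost region to the innermost.
    """
    return [name for start, end, name in regions if start <= addr <= end]


if __name__ == "__main__":
    regions = parse_iomem(SAMPLE_IOMEM)
    # 0x01234567 stands in for a faulty address reported by the RAM test.
    print(regions_containing(regions, 0x01234567))
```

If a faulty address only shows up under a "Reserved" entry, or under no entry at all, Linux is probably not using it as ordinary RAM, which could explain a clean boot.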

The Linux kernel can be configured to mark bad areas as reserved and work around them. Depending on your live OS, it could be passing the memtest kernel parameter (for example memtest=4) on startup. You can grep the dmesg output for 'bad mem addr' to see whether the kernel's memtest found and reserved anything. This assumes that the bad memory address(es) reside in the kernel virtual memory mapped area.
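The same check can be scripted. Below is a small Python sketch that looks for a memtest parameter in a kernel command line and pulls address ranges out of 'bad mem addr' lines. The sample log is invented and its exact line layout is an assumption (only the 'bad mem addr' substring comes from the advice above); the function names are hypothetical too. On a real system you would pass in the contents of /proc/cmdline and the output of dmesg.

```python
import re

# Invented dmesg excerpt. The layout of the 'bad mem addr' line is an
# assumption made for this sketch, not copied from a real log.
SAMPLE_DMESG = """\
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz quiet memtest=4
[    0.000000] early_memtest: # of tests: 4
[    0.000000]   0000000000000000 bad mem addr 0x000000012345f000 - 0x0000000123460000 reserved
[    1.234567] usb 1-1: new high-speed USB device
"""

BAD_RE = re.compile(r"bad mem addr\s+(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)")


def memtest_requested(cmdline):
    """True if the command line contains a 'memtest' or 'memtest=...' token.

    This only checks that the parameter is present; it does not
    interpret the value after the '='.
    """
    return any(tok == "memtest" or tok.startswith("memtest=")
               for tok in cmdline.split())


def bad_ranges(dmesg_text):
    """Return (start, end) integer pairs for every 'bad mem addr' line."""
    return [(int(a, 16), int(b, 16)) for a, b in BAD_RE.findall(dmesg_text)]


if __name__ == "__main__":
    print(memtest_requested("BOOT_IMAGE=/vmlinuz quiet memtest=4"))
    for start, end in bad_ranges(SAMPLE_DMESG):
        print(f"{start:#x} - {end:#x}")
```

An empty result from bad_ranges does not clear the module: it may simply mean the kernel's memtest was not enabled or did not touch the faulty area.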

You will likely end up with app crashes and filesystem corruption if the faulty addresses get used by user space virtual memory. If you run a RAM- and CPU-intensive task, such as compiling the Linux kernel, you would likely find the bad RAM fairly quickly.

Also, your Linux live OS more than likely contains a version of memtest86. When in doubt, verify the results with another tool. But be patient: memtest86 may take a long time to identify the faulty address(es).
To err is human... effective mayhem requires the root password.