Setup:
AMD 7950X
RX 9070 XT
KDE Plasma
multiple monitors connected to both the iGPU and dGPU
It is stable with kernel 6.17.9 and prior.
On 6.18 it crashes only during video playback in the browser, with or without llama.cpp running on the dGPU.
That is the state as of 2025-12-17 at least; I don't know whether there has been an update since then.
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 timeout, signaled seq=1, emitted seq=3
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Process brave pid 4180 thread brave:cs0 pid 4244
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Starting comp_1.1.0 ring reset
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: reset compute queue (1:1:0)
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0)
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040A40
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: MORE_FAULTS: 0x0
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: WALKER_ERROR: 0x0
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: PERMISSION_FAULTS: 0x4
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: MAPPING_ERROR: 0x0
Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: RW: 0x1
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Ring comp_1.1.0 reset failed
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!. Source: 1
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Suspending all queues failed
Dec 17 10:12:53 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: remove_all_kfd_queues_mes: Failed to remove queue 3 for dev 3197
Dec 17 10:12:53 dt1 kernel: traps: llama-server[6229] general protection fault ip:7f0c7d6b09a2 sp:7f0c157fd4b0 error:0 in libc.so.6[289a2,7f0c7d6b0000+188000]
Dec 17 10:12:53 dt1 systemd-coredump[8938]: Process 6114 (llama-server) of user 0 terminated abnormally with signal 11/SEGV, processing...
...skipping...
Dec 17 10:12:57 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] *ERROR* Failed to initialize parser -125!
Dec 17 10:12:57 dt1 flatpak[4180]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
Dec 17 10:12:57 dt1 flatpak[3827]: [1217/101257.183840:ERROR:third_party/crashpad/crashpad/util/linux/scoped_ptrace_attach.cc:27] ptrace: Operation not permitted (1)
Dec 17 10:12:57 dt1 plasmashell[2892]: QRhiGles2: Context is lost.
Dec 17 10:12:57 dt1 plasmashell[2892]: Graphics device lost, cleaning up scenegraph and releasing RHI
Dec 17 10:12:57 dt1 systemd-coredump[8978]: Process 4180 (brave) of user 1000 terminated abnormally with signal 6/ABRT, processing...
Dec 17 10:12:57 dt1 systemd-coredump[8979]: Process 1057 (Xorg) of user 0 terminated abnormally with signal 6/ABRT, processing...
Dec 17 10:12:57 dt1 systemd[1]: Started Process Core Dump (PID 8978/UID 0).
Dec 17 10:12:57 dt1 systemd[1]: Started Process Core Dump (PID 8979/UID 0).
Dec 17 10:12:57 dt1 kernel: amdgpu 0000:03:00.0: [drm] device wedged, but recovered through reset
Dec 17 10:12:57 dt1 systemd[1]: Started Pass systemd-coredump journal entries to relevant user for potential DrKonqi handling.
Dec 17 10:12:57 dt1 systemd[1]: Started Pass systemd-coredump journal entries to relevant user for potential DrKonqi handling.
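For anyone who wants to compare their own journal against the log above, here is a minimal Python sketch that pulls the ring timeout and the process named right after it out of kernel log lines. The function name `find_ring_timeouts` and the result layout are made up for illustration, and the patterns only assume the message wording seen in this log; other kernel versions may phrase these lines differently.

```python
import re

# Matches lines such as:
#   amdgpu: ring comp_1.1.0 timeout, signaled seq=1, emitted seq=3
RING_TIMEOUT = re.compile(
    r"amdgpu: ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the line naming the process that submitted the hung job, e.g.:
#   amdgpu: Process brave pid 4180 thread brave:cs0 pid 4244
PROCESS = re.compile(r"amdgpu: Process (?P<name>\S+) pid (?P<pid>\d+)")


def find_ring_timeouts(lines):
    """Return one dict per ring timeout, with the process if it was logged."""
    events = []
    for line in lines:
        m = RING_TIMEOUT.search(line)
        if m:
            events.append({
                "ring": m["ring"],
                "signaled": int(m["signaled"]),
                "emitted": int(m["emitted"]),
                "process": None,
                "pid": None,
            })
            continue
        m = PROCESS.search(line)
        # Attach the process to the most recent timeout that has none yet.
        if m and events and events[-1]["process"] is None:
            events[-1]["process"] = m["name"]
            events[-1]["pid"] = int(m["pid"])
    return events


# Two lines copied from the log in this post.
sample = [
    "Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 timeout, signaled seq=1, emitted seq=3",
    "Dec 17 10:12:52 dt1 kernel: amdgpu 0000:03:00.0: amdgpu: Process brave pid 4180 thread brave:cs0 pid 4244",
]
print(find_ring_timeouts(sample))
```

To run it on a real crash, the lines could come from the previous boot's kernel log, for example by piping `journalctl -k -b -1` into a script that passes `sys.stdin` to the function.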
There is a bug in the 6.18.1 kernel that is triggered by AI workloads (ollama / ComfyUI / llama.cpp). No fixes yet.
Hmm, I did try with llama.cpp off and it still crashed... interesting!
Well, I am going to test again to see if that is the issue.
Thanks!
In my case it freezes even after I close ComfyUI, so if llama.cpp is running as a service for you, that might be the reason even if you turn it off afterwards.
You're right, I just tested it. It probably has something to do with ROCm; let's wait and see.
Thanks again!
Glad to see someone has the same problem as me, so I know it isn't just my setup. I was panicking until I saw this post.
Do the kernel developers know about this issue?
Last edited by laichiaheng (2025-12-18 17:16:33)
Yes, but I guess it would be OK to comment on either https://gitlab.freedesktop.org/drm/amd/-/issues/4765 or https://gitlab.freedesktop.org/drm/amd/-/issues/4783 to let them know it's affecting more people.
Hello, I have the exact same problem with an AMD 7800X3D CPU and an AMD 9070 XT GPU since Linux kernel version 6.18.1.
I would just like to add that the crash/coredump does not only happen when running LLMs, but also when attempting to play a video or a game.
Edit: reverting to kernel version 6.17.9 "fixed" it for now. I've added the kernel to the ignored packages in /etc/pacman.conf; guess I'll stick with it for now.
Last edited by pixeled (2025-12-21 20:05:00)
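In case it helps anyone holding the kernel back the same way, below is a small Python sketch that checks whether a package is covered by an `IgnorePkg` line in pacman.conf-style text. `IgnorePkg` is the real pacman.conf directive and accepts shell-style globs, but the package names `linux` and `linux-headers` in the example are assumptions (the post does not say which packages were pinned), and the helper names are hypothetical.

```python
import fnmatch


def ignored_packages(conf_text):
    """Collect every IgnorePkg entry from pacman.conf-style text."""
    ignored = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # pacman.conf comments start the line with '#'; section headers
        # such as [options] contain no '=' and are skipped as well.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "IgnorePkg":
            ignored.extend(value.split())
    return ignored


def is_held_back(package, conf_text):
    """True if any IgnorePkg entry (treated as a glob) matches the package."""
    return any(
        fnmatch.fnmatchcase(package, pattern)
        for pattern in ignored_packages(conf_text)
    )


# Example content only; the package names are assumed, not from the post.
example_conf = """
[options]
#IgnorePkg   =
IgnorePkg = linux linux-headers
"""

print(ignored_packages(example_conf))
print(is_held_back("linux", example_conf))
```

Reading the real file would be `open("/etc/pacman.conf").read()`; the sketch does not follow `Include` directives, so entries in included files would be missed.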
Well the bug is in the kernel, so it makes sense. But it's 100% reproducible with LLMs so it's easier to report. I will try to play some games today (holidays yey) and comment if it crashes for me.
It still happens on 6.18.2
Last edited by laichiaheng (2025-12-21 04:59:39)
I have the same bug: a random crash can occur when watching a YouTube video in full-screen mode in Firefox. The graphics card is a Radeon RX 7600.
Last edited by Potomac (2025-12-29 23:55:55)
It happens on 6.18.3. With ROCm installed, playing a modern video game crashes instantly, and playing video with VLC also causes random crashes.
And do you have the bug without ROCm installed?
I think the bug is not triggered by the presence of ROCm; it is a bug in the AMDGPU driver shipped with kernel 6.18.x.
I also get the bug when watching a YouTube video with Firefox (random crash), and it disappears when I downgrade the kernel to version 6.17.9.
I hope it will be solved in kernel 6.19.x.
Last edited by Potomac (2026-01-07 19:04:42)
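A rough way to check a machine against what has been reported in this thread is sketched below in Python. The two boundaries are taken only from the posts here (stable up to 6.17.9, crashes on 6.18.x); no fixed version is known from the thread, so there is deliberately no upper bound, and the helper names are hypothetical.

```python
import re

LAST_REPORTED_GOOD = (6, 17, 9)  # "stable with kernel 6.17.9 and prior"
FIRST_REPORTED_BAD = (6, 18, 0)  # crashes reported on 6.18.x in this thread


def parse_release(release):
    """Turn a `uname -r` style string such as '6.18.3-arch1-1' into (6, 18, 3)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return (int(major), int(minor), int(patch or 0))


def reported_status(release):
    """Classify a release string using only the reports in this thread."""
    version = parse_release(release)
    if version <= LAST_REPORTED_GOOD:
        return "reported stable"
    if version >= FIRST_REPORTED_BAD:
        return "reported affected (no fixed version known from this thread)"
    # Releases between the two boundaries were not mentioned by anyone.
    return "unknown"


for release in ("6.17.9-arch1-1", "6.18.3-arch1-1"):
    print(release, "->", reported_status(release))
```

On a live system the release string can be taken from `platform.release()` in the standard library, which returns the same value as `uname -r`.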
Up until 6.18.3 the issue was easily reproducible for me on a laptop with a Radeon 890M GPU. All I needed to do was join a Google Meet meeting: the Wayland Gnome session would crash immediately - I don't have ROCm installed.
As of 6.18.4 I can no longer reproduce the issue. I see some amdgpu commits in the 6.18.4 changelog that might have something to do with it.
To save more people from trying it: the bug is still there. It's specific to RDNA4 and ROCm. Just a simple `clinfo` is enough to trigger it.
There may be other factors involved.
Ryzen 9 9950X, Radeon RX 9060 XT, kernel 6.18.4, linux-firmware-amdgpu 20251125-2, with rocm-opencl-runtime installed.
$ clinfo
Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.1 AMD-APP.dbg (3581.0)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback
Platform Extensions function suffix AMD
Platform Host timer resolution 1ns
Platform Name AMD Accelerated Parallel Processing
Number of devices 2
Device Name gfx1200
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0
Driver Version 3581.0 (HSA1.1,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Radeon RX 9060 XT
Device PCI-e ID (AMD) 0x7590
Device Topology (AMD) PCI-E, 0000:03:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 16
SIMD per compute unit (AMD) 4
SIMD width (AMD) 32
SIMD instruction width (AMD) 1
Max clock frequency 2740MHz
Graphics IP (AMD) 12.0
Device Partition (core)
Max number of sub-devices 16
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple (kernel) 32
Wavefront width (AMD) 32
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 17095983104 (15.92GiB)
Global free memory (AMD) 16465920 (15.7GiB) 16465920 (15.7GiB)
Global memory channels (AMD) 4
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 14531585632 (13.53GiB)
Unified memory for Host and Device No
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 14531585632 (13.53GiB)
Preferred total size of global vars 17095983104 (15.92GiB)
Global Memory cache type Read/Write
Global Memory cache size 32768 (32KiB)
Global Memory cache line size 256 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 8192 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 16384x16384x8192 pixels
Max number of read image args 128
Max number of write image args 8
Max number of read/write image args 64
Max number of pipe args 16
Max active pipe reservations 16
Max pipe packet size 1646683744 (1.534GiB)
Local memory type Local
Local memory size 65536 (64KiB)
Local memory size per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 14531585632 (13.53GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 262144 (256KiB)
Max size 8388608 (8MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Number of P2P devices (AMD) 0
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 0ns (Thu Jan 1 01:00:00 1970)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) No
Number of async queues (AMD) 8
Max real-time compute queues (AMD) 8
Max real-time compute units (AMD) 16
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Device Name gfx1036
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 2.0
Driver Version 3581.0 (HSA1.1,LC)
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
Device Board Name (AMD) AMD Ryzen 9 9950X 16-Core Processor
Device PCI-e ID (AMD) 0x13c0
Device Topology (AMD) PCI-E, 0000:73:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 1
SIMD per compute unit (AMD) 4
SIMD width (AMD) 32
SIMD instruction width (AMD) 1
Max clock frequency 2200MHz
Graphics IP (AMD) 10.3
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x1024
Max work group size 256
Preferred work group size (AMD) 256
Max work group size (AMD) 1024
Preferred work group size multiple (kernel) 32
Wavefront width (AMD) 32
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (cl_khr_fp16)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (cl_khr_fp16)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 32427675648 (30.2GiB)
Global free memory (AMD) 31524228 (30.06GiB) 31524228 (30.06GiB)
Global memory channels (AMD) 4
Global memory banks per channel (AMD) 4
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 27563524296 (25.67GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing No
Atomics No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Preferred alignment for atomics
SVM 0 bytes
Global 0 bytes
Local 0 bytes
Max size for global variable 27563524296 (25.67GiB)
Preferred total size of global vars 32427675648 (30.2GiB)
Global Memory cache type Read/Write
Global Memory cache size 16384 (16KiB)
Global Memory cache line size 128 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 8192 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 pixels
Max 2D image size 16384x16384 pixels
Max 3D image size 16384x16384x8192 pixels
Max number of read image args 128
Max number of write image args 8
Max number of read/write image args 64
Max number of pipe args 16
Max active pipe reservations 16
Max pipe packet size 1793720520 (1.671GiB)
Local memory type Local
Local memory size 65536 (64KiB)
Local memory size per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max number of constant args 8
Max constant buffer size 27563524296 (25.67GiB)
Preferred constant buffer size (AMD) 16384 (16KiB)
Max size of kernel argument 1024
Queue properties (on host)
Out-of-order execution No
Profiling Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 262144 (256KiB)
Max size 8388608 (8MiB)
Max queues on device 1
Max events on device 1024
Prefer user sync for interop Yes
Number of P2P devices (AMD) 0
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 0ns (Thu Jan 1 01:00:00 1970)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) No
Number of async queues (AMD) 8
Max real-time compute queues (AMD) 8
Max real-time compute units (AMD) 1
printf() buffer size 4194304 (4MiB)
Built-in kernels (n/a)
Device Extensions cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx1200
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (2)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx1200
Device Name gfx1036
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (2)
Platform Name AMD Accelerated Parallel Processing
Device Name gfx1200
Device Name gfx1036
ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.3.4
ICD loader Profile OpenCL 3.0