You may need to use the gpu_memory_limit and/or lora_on_cpu config options to avoid running out of memory. If you still run out of CUDA memory, you can try to merge in system RAM with
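
As a rough illustration of the two options named above, a config fragment might look like the sketch below. It assumes the tool reads a YAML config file, and the values shown (a 20GiB cap, `true`) are placeholders chosen for the example rather than recommendations from the original text.

```yaml
# Hypothetical values; adjust to your hardware.
# Assumption: caps how much GPU memory is used during the merge.
gpu_memory_limit: 20GiB
# Assumption: keeps the LoRA adapter weights on the CPU to reduce GPU memory use.
lora_on_cpu: true
```

Trying the memory cap first and adding the CPU option only if the merge still fails keeps as much of the work on the GPU as possible.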