CogVLM2 model deployment fails with torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU

CogVLM2 setup
1: Model: cogvlm2-llama3-chinese-chat-19B
2: GPUs: 3 × V100, 32 GB each
3: Modified the device_map in cli_demo_multi_gpu.py as follows (smart quotes replaced with straight quotes):

    device_map = infer_auto_device_map(
        model=model,
        # max_memory={i: max_memory_per_gpu for i in range(num_gpus)},
        max_memory={i: "32GiB" for i in range(num_gpus)},
        no_split_module_classes=["CogVLMDecoderLayer"],
    )
4: Launched with the command CUDA_VISIBLE_DEVICES=0,1,2 chainlit run web_demo.py

Error message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU
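One likely cause (an assumption, not confirmed from the report): passing the full "32GiB" per card to infer_auto_device_map leaves no headroom for the CUDA context, activations, and fragmentation, so even a tiny 54 MiB allocation can fail at runtime. A minimal sketch of budgeting below the physical size; the helper name build_max_memory is illustrative, not from the CogVLM2 repo:

```python
def build_max_memory(num_gpus: int, total_gib: int = 32, reserve_gib: int = 3) -> dict:
    """Build a max_memory mapping that leaves reserve_gib free on each GPU
    for the CUDA context, activations, and allocator fragmentation."""
    budget = total_gib - reserve_gib
    return {i: f"{budget}GiB" for i in range(num_gpus)}

# The result would replace the "32GiB" mapping when calling accelerate's
# infer_auto_device_map, e.g.:
#   device_map = infer_auto_device_map(
#       model=model,
#       max_memory=build_max_memory(num_gpus=3),
#       no_split_module_classes=["CogVLMDecoderLayer"],
#   )
print(build_max_memory(3))  # {0: '29GiB', 1: '29GiB', 2: '29GiB'}
```

The exact reserve to use is a judgment call; values of 2 to 4 GiB per card are common starting points for weight placement with accelerate, and the reserve can be tuned down if the model no longer fits.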