CogVLM2 model deployment fails with torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU

CogVLM2 setup
1: Model: cogvlm2-llama3-chinese-chat-19B
2: GPUs: 3 × V100, 32 GB each
3: Modified the device_map in cli_demo_multi_gpu.py as follows (smart quotes replaced with straight quotes):

    device_map = infer_auto_device_map(
        model=model,
        # max_memory={i: max_memory_per_gpu for i in range(num_gpus)},
        max_memory={i: "32GiB" for i in range(num_gpus)},
        no_split_module_classes=["CogVLMDecoderLayer"],
    )
4: Launched with the command CUDA_VISIBLE_DEVICES=0,1,2 chainlit run web_demo.py

Error message: torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU
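One likely cause (an assumption, not confirmed from the report): passing the full "32GiB" per card to infer_auto_device_map leaves no headroom for the CUDA context, activations, and fragmentation, so even a tiny 54 MiB allocation can fail at runtime. A minimal sketch of budgeting below the physical size; the helper name build_max_memory is illustrative, not from the CogVLM2 repo:

```python
def build_max_memory(num_gpus: int, total_gib: int = 32, reserve_gib: int = 3) -> dict:
    """Build a max_memory mapping that leaves reserve_gib free on each GPU
    for the CUDA context, activations, and allocator fragmentation."""
    budget = total_gib - reserve_gib
    return {i: f"{budget}GiB" for i in range(num_gpus)}

# The result would replace the "32GiB" mapping when calling accelerate's
# infer_auto_device_map, e.g.:
#   device_map = infer_auto_device_map(
#       model=model,
#       max_memory=build_max_memory(num_gpus=3),
#       no_split_module_classes=["CogVLMDecoderLayer"],
#   )
print(build_max_memory(3))  # {0: '29GiB', 1: '29GiB', 2: '29GiB'}
```

The exact reserve to use is a judgment call; values of 2 to 4 GiB per card are common starting points for weight placement with accelerate, and the reserve can be tuned down if the model no longer fits.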