[BugFix] GPT inference error when pipeline_para_size > 1 and int8_mode != 0 #750

Open

00why00 wants to merge 1 commit into NVIDIA:main from WHY-Fork:main

Conversation


00why00 commented Aug 23, 2023

The problem is caused by the empty-tensor check in the quant ops:

```python
quant = torch.ops.fastertransformer.symmetric_quantize_last_axis_of_batched_matrix
self.weight_transpose_calibrate_quantize = lambda x: quant(x, torch.int8)
```

When pipeline_para_size > 1, the current rank does not load the weights of all layers, so quantizing the missing (empty) weight tensors raises an error. An is_load() check therefore needs to be added when traversing the layers.
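A minimal sketch of the idea behind the fix: skip quantization for layers the current pipeline rank never loaded. The `symmetric_quantize` stand-in, the `quantize_loaded_layers` helper, and the `is_load` predicate are hypothetical names for illustration, not FasterTransformer's actual API.

```python
import torch

def symmetric_quantize(w: torch.Tensor) -> torch.Tensor:
    # Toy stand-in for the fastertransformer quant op:
    # per-tensor symmetric quantization to int8.
    scale = w.abs().max() / 127.0
    return torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)

def quantize_loaded_layers(layer_weights, is_load, quant=symmetric_quantize):
    """Quantize only the layers this pipeline rank actually loaded.

    Without the is_load() guard, the empty placeholder tensors of layers
    owned by other pipeline ranks would be passed to the quant op and
    trip its empty-tensor check.
    """
    return {i: quant(w) for i, w in enumerate(layer_weights) if is_load(i)}

# Rank 0 of a 2-way pipeline holds layers 0-1; layers 2-3 are
# empty placeholders belonging to the other pipeline rank.
weights = [torch.randn(4, 4), torch.randn(4, 4),
           torch.empty(0), torch.empty(0)]
is_load = lambda i: weights[i].numel() > 0

quantized = quantize_loaded_layers(weights, is_load)
```

Only the locally loaded layers end up in `quantized`; the empty placeholders are never handed to the quant op.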

00why00 changed the title from [BigFix] to [BugFix] GPT inference error when pipeline_para_size > 1 and int8_mode != 0 on Sep 21, 2023
