[BugFix] GPT inference error when pipeline_para_size > 1 and int8_mode != 0 #750

Open

00why00 wants to merge 1 commit into NVIDIA:main from WHY-Fork:main

Conversation


00why00 commented Aug 23, 2023

The problem is caused by the empty-tensor check in the quant ops:

```python
quant = torch.ops.fastertransformer.symmetric_quantize_last_axis_of_batched_matrix
self.weight_transpose_calibrate_quantize = lambda x: quant(x, torch.int8)
```

When pipeline_para_size > 1, the current rank does not load the weights of all layers, so quantizing the missing (empty) weight tensors raises an error. An is_load() check therefore needs to be added when traversing the layers.
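A minimal sketch of the idea behind the fix: skip quantization for layers the current pipeline rank never loaded. The `symmetric_quantize` stand-in, the `quantize_loaded_layers` helper, and the `is_load` predicate are hypothetical names for illustration, not FasterTransformer's actual API.

```python
import torch

def symmetric_quantize(w: torch.Tensor) -> torch.Tensor:
    # Toy stand-in for the fastertransformer quant op:
    # per-tensor symmetric quantization to int8.
    scale = w.abs().max() / 127.0
    return torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)

def quantize_loaded_layers(layer_weights, is_load, quant=symmetric_quantize):
    """Quantize only the layers this pipeline rank actually loaded.

    Without the is_load() guard, the empty placeholder tensors of layers
    owned by other pipeline ranks would be passed to the quant op and
    trip its empty-tensor check.
    """
    return {i: quant(w) for i, w in enumerate(layer_weights) if is_load(i)}

# Rank 0 of a 2-way pipeline holds layers 0-1; layers 2-3 are
# empty placeholders belonging to the other pipeline rank.
weights = [torch.randn(4, 4), torch.randn(4, 4),
           torch.empty(0), torch.empty(0)]
is_load = lambda i: weights[i].numel() > 0

quantized = quantize_loaded_layers(weights, is_load)
```

Only the locally loaded layers end up in `quantized`; the empty placeholders are never handed to the quant op.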

00why00 changed the title from [BigFix] to [BugFix] GPT inference error when pipeline_para_size > 1 and int8_mode != 0 on Sep 21, 2023
