Description
Thanks for your great work!
I encountered the following error when using the model to generate an answer from audio input. The model version is ming-lite-omni-v1.5.
Traceback (most recent call last):
  File "/workspace/audio_eval_llms/evals/ming_lite_omni1_5.py", line 247, in <module>
    main(args)
  File "/workspace/audio_eval_llms/evals/ming_lite_omni1_5.py", line 219, in main
    generated_ids = model.generate(
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Ming/modeling_bailingmm.py", line 655, in generate
    audio_embeds, audio_embeds_lengths = self.extract_audio_feature(
  File "/workspace/Ming/modeling_bailingmm.py", line 313, in extract_audio_feature
    audio_embeds, _, audio_embeds_lengths = encode_audio_segments(
  File "/workspace/Ming/modeling_utils.py", line 913, in encode_audio_segments
    audio_feats_seg = encoder(feat_segs_batch)
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/Ming/modeling_whisper_encoder.py", line 24, in forward
    x = F.gelu(self.conv1(x))
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    return forward_call(*args, **kwargs)
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 375, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/whisper/model.py", line 57, in _conv_forward
    return super()._conv_forward(
  File "/anaconda3/envs/ming_lite_omni/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 370, in _conv_forward
    return F.conv1d(
RuntimeError: Given groups=1, weight of size [1280, 128, 3], expected input[1, 560, 353] to have 128 channels, but got 560 channels instead
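It seems that the new version of Ming uses a Whisper audio encoder whose first convolution (conv1) expects 128 input channels, while the audio features passed to it have 560 channels. In other words, the encoder cannot handle audio features in the format the original pipeline produces.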
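To make the mismatch concrete, here is a minimal pure-Python sketch that mirrors the channel check torch.nn.Conv1d performs (groups=1) and computes the output shape when the check passes. The helper name conv1d_out_shape is hypothetical; the weight shape [1280, 128, 3] comes from the traceback, and padding=1 is assumed to match the conv1 layer in openai-whisper's encoder. The 128 input channels correspond to a 128-bin log-mel spectrogram (as used by Whisper large-v3), so the encoder presumably expects features shaped [batch, 128, frames]. As an unconfirmed guess, 560 equals 7 × 80, which could indicate stacked 80-bin filterbank frames from an older feature extractor rather than 128-bin log-mel features.

```python
# Hypothetical diagnostic helper: reproduces the channel check that
# torch.nn.Conv1d (groups=1) applies, without loading the model.

def conv1d_out_shape(input_shape, weight_shape, padding=1, stride=1):
    """Return the 1-D convolution output shape, or raise on a channel mismatch.

    input_shape:  (batch, in_channels, frames)
    weight_shape: (out_channels, in_channels, kernel_size)
    """
    batch, in_ch, frames = input_shape
    out_ch, w_in_ch, kernel = weight_shape
    if in_ch != w_in_ch:
        # Same condition that triggers the RuntimeError in the traceback
        raise ValueError(
            f"expected input{list(input_shape)} to have {w_in_ch} channels, "
            f"but got {in_ch} channels instead"
        )
    # Standard Conv1d length formula (dilation=1)
    out_frames = (frames + 2 * padding - kernel) // stride + 1
    return (batch, out_ch, out_frames)


# Weight shape from the traceback; padding=1 is an assumption (Whisper's conv1)
WHISPER_CONV1 = (1280, 128, 3)

if __name__ == "__main__":
    # 128-channel input (128 mel bins) passes the check
    print(conv1d_out_shape((1, 128, 353), WHISPER_CONV1))
    # The shape from the traceback fails the check
    try:
        conv1d_out_shape((1, 560, 353), WHISPER_CONV1)
    except ValueError as err:
        print("mismatch:", err)
```

With kernel size 3 and padding 1, a [1, 128, 353] input keeps its frame count and yields (1, 1280, 353), while the [1, 560, 353] input from the traceback fails the channel check, which suggests the fix lies in how the audio features are extracted rather than in the encoder itself.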
How can I make this work?
Looking forward to your reply. Thanks!