[python] Add multi-threaded prefetch for pytorch streaming read #7143
XiaoHongbo-Hope wants to merge 11 commits into apache:master
Conversation
The current read path is synchronous. Yes, we have performance test results for the data loader: a single thread reaches 200 MB/s, versus 260–270 MB/s with 16 workers (processes) and 10 prefetch threads.
The multi-threaded prefetch idea in this PR comes from the configuration of PyTorch's OSS connector.
Can you share the code link?
It seems the native code is not open source. The Python code is at https://github.com/aliyun/oss-connector-for-ai-ml, and the docs are at https://github.com/aliyun/oss-connector-for-ai-ml/blob/a9b536d174163f0cd6db8e83261fcffc628e5f8c/docs/torchconnector/configuration.md?plain=1#L94, but the Python code does little itself; the logic is on the native side.
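Since the connector's prefetch logic lives in closed native code, here is a minimal sketch of the general technique this PR describes: several background threads pull records ahead of the consumer from a shared blocking reader into a bounded queue. All names (`PrefetchIterator`, `num_threads`, `capacity`) are illustrative, not the PR's actual API, and record ordering is not preserved across threads.

```python
import queue
import threading
from typing import Iterable

_SENTINEL = object()  # marks one worker thread finishing

class PrefetchIterator:
    """Wrap a blocking iterator; N daemon threads read ahead into a bounded queue."""

    def __init__(self, source: Iterable, num_threads: int = 10, capacity: int = 32):
        self._it = iter(source)
        self._lock = threading.Lock()  # guards the shared source iterator
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(num_threads)
        ]
        for t in self._threads:
            t.start()

    def _worker(self):
        while True:
            with self._lock:  # only the blocking next() is serialized
                try:
                    item = next(self._it)
                except StopIteration:
                    break
            self._q.put(item)  # blocks when the queue is full (backpressure)
        self._q.put(_SENTINEL)

    def __iter__(self):
        finished = 0
        while finished < len(self._threads):
            item = self._q.get()
            if item is _SENTINEL:
                finished += 1
            else:
                yield item

# Usage: wrap a stand-in for a slow synchronous record reader.
records = list(PrefetchIterator(range(100), num_threads=4))
print(sorted(records) == list(range(100)))
```

In a real streaming reader, the expensive work (the I/O behind `next()`) would happen outside the lock, e.g. each worker fetching an independent file split; the bounded queue caps memory while keeping the consumer fed.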
Purpose
Linked issue: close #xxx
Tests
API and Format
Documentation