Avoid using multiple processes for downloads #375
Merged
tompollard merged 1 commit into main on Jun 3, 2022
@bemoody please could you fix the conflict?
The standard multiprocessing module distributes a task across multiple processes, which is useful for heavy computation given the limitations of CPython. However, making this work depends on the ability to fork processes, or else to kludgily emulate forking on systems that don't support it. In particular, it tends to cause problems on Windows unless you are very scrupulous about how you write your program.

Therefore, as a rule, the multiprocessing module shouldn't be used by general-purpose libraries, and should only be invoked by application programmers themselves (who are in a position to guarantee that imports have no side effects, the main script uses 'if __name__ == "__main__"', etc.).

However, downloading a file isn't a CPU-bound task; it's an I/O-bound task. For this purpose, parallel threads should work as well as or even better than parallel processes. The multiprocessing.dummy module provides the same API as the multiprocessing module, but uses threads instead of processes, so it should be safe to use in a general-purpose library.
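For reference, a minimal sketch of the kind of swap described above. It is not the code changed in this pull request: `fetch` and `fetch_all` are hypothetical names, and `fetch` only simulates a download so the example runs without network access. The one real API used is `multiprocessing.dummy.Pool`, which exposes the same `map` interface as `multiprocessing.Pool` but runs the workers as threads.

```python
# Thread-backed pool with the same interface as multiprocessing.Pool.
from multiprocessing.dummy import Pool


def fetch(name):
    """Hypothetical stand-in for an I/O-bound download.

    A real implementation would request the file over the network;
    here we only build a fake payload and report its size.
    """
    payload = ("data for " + name).encode()
    return name, len(payload)


def fetch_all(names, n_workers=2):
    """Run fetch() over all names using n_workers threads."""
    # Because the workers are threads, nothing is forked or re-imported,
    # so the caller does not need an 'if __name__ == "__main__"' guard.
    with Pool(n_workers) as pool:
        # map() returns results in the same order as the inputs.
        return pool.map(fetch, names)
```

Assuming the existing code already goes through `Pool.map` or a similar call, changing the import should be the only edit needed, since the two modules share an API.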
7a3ffde to 5357f55
@lbugnon does this solve the Windows downloading issue?
multiprocessing is messy; it's a pretty nifty tool but requires you to write your entire application with multiprocessing in mind. For simply downloading files in parallel, there's little reason to use processes rather than threads.
This should, though I haven't tested it, solve the issues mentioned in pull #330 and probably also issue #306. I can't guarantee that this will solve every problem, but it should be strictly better than what we have now.