AVRO-4185: [C++] Use a temp file in StreamTests#3496
AVRO-4185: [C++] Use a temp file in StreamTests#3496gahr wants to merge 2 commits intoapache:mainfrom
Conversation
StreamTests uses the test_str.bin file. This is problematic in our setup because we build and test 32-bit and 64-bit versions in parallel. When the 32-bit and 64-bit StreamTest happen to run concurrently, one ends up deleting the file that the other is trying to read. My solution here is to use a temporary file instead of the hard-coded test_str.bin.
There was a problem hiding this comment.
Pull Request Overview
This PR fixes a concurrency issue in StreamTests where parallel execution of 32-bit and 64-bit test builds would interfere with each other by sharing the same hard-coded test file. The solution replaces the fixed filename with temporary files generated at runtime.
- Replaces hard-coded "test_str.bin" filename with dynamically generated temporary files
- Refactors FileRemover struct to FileGuard for better encapsulation of temporary file management
- Ensures each test run uses its own unique temporary file to prevent race conditions
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| explicit FileRemover(const char *fn) : file(fn) {} | ||
| ~FileRemover() { std::filesystem::remove(file); } | ||
| struct FileGuard { | ||
| const std::filesystem::path path{ std::tmpnam(nullptr) }; |
There was a problem hiding this comment.
Using std::tmpnam(nullptr) is unsafe as it can cause race conditions and security vulnerabilities. Consider using std::filesystem::temp_directory_path() with a unique filename generator or platform-specific secure temporary file functions instead.
There was a problem hiding this comment.
I don't think these concerns apply to this test code.
There was a problem hiding this comment.
Isn't tmpnam deprecated ?
Avro uses C++17, so we can use something like the following instead:
#include <random>
...
std::filesystem::path path = std::filesystem::temp_directory_path() /
("avro_test_" + std::to_string(std::random_device{}()));
There was a problem hiding this comment.
Isn't
tmpnamdeprecated ?
I can't find references to its deprecation.
I don't think using std::random_device is a good idea, as it can be implemented using a PRNG, i.e., two tests running at the same time would produce the same number.
There was a problem hiding this comment.
https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2010/hs3e7355(v=vs.100)
In the example code there is // Note: tmpnam is deprecated; consider using tmpnam_s instead
Also in the build logs we can see:
2025-10-23T08:00:00.3420284Z [ 68%] Linking CXX executable StreamTests
2025-10-23T08:00:00.4955456Z /usr/bin/ld: CMakeFiles/StreamTests.dir/test/StreamTests.cc.o: in function `avro::stream::FileGuard::FileGuard()':
2025-10-23T08:00:00.4957287Z /home/runner/work/avro/avro/lang/c++/test/StreamTests.cc:138:(.text._ZN4avro6stream9FileGuardC2Ev[_ZN4avro6stream9FileGuardC5Ev]+0x2a): warning: the use of `tmpnam' is dangerous, better use `mkstemp'
2025-10-23T08:00:00.6147334Z [ 68%] Built target StreamTests
There was a problem hiding this comment.
I don't have a more standard way, sorry.
There was a problem hiding this comment.
Since random_device() is meant to be used for seeding a PNRG and not directly for generating random numbers, we can do something like:
static std::random_device rdev;
static std::mt19937 rng(rdev());
struct FileGuard {
const std::filesystem::path path{ rng() };
...
}
There was a problem hiding this comment.
This doesn't address my concern: two processes can still generate the same sequence.
There was a problem hiding this comment.
Remember the code is in test. The conflict happens when two people run the run the exactly at the same sub-microsecond. How likely is it? Even if happens, who cares? Just rerun the test. It is anyway better than what we have now.
If we are still paranoid, let's xor the value of random_device() with pid.
There was a problem hiding this comment.
Where is the spec that say that random_device must be implemented in such a way? In the standard I read it can generate the same sequence each time.
| explicit FileRemover(const char *fn) : file(fn) {} | ||
| ~FileRemover() { std::filesystem::remove(file); } | ||
| struct FileGuard { | ||
| const std::filesystem::path path{ std::tmpnam(nullptr) }; |
There was a problem hiding this comment.
Isn't tmpnam deprecated ?
Avro uses C++17, so we can use something like the following instead:
#include <random>
...
std::filesystem::path path = std::filesystem::temp_directory_path() /
("avro_test_" + std::to_string(std::random_device{}()));
| ~FileRemover() { std::filesystem::remove(file); } | ||
| struct FileGuard { | ||
| const std::filesystem::path path{ std::tmpnam(nullptr) }; | ||
| ~FileGuard() { std::filesystem::remove(path); } |
There was a problem hiding this comment.
Should we care about failures in remove() here ?
Or since it is a test we don't care ?
There was a problem hiding this comment.
I would not care, and I wouldn't know how to recover from such an error.
There was a problem hiding this comment.
Not recover, but log it.
But this is minor. We can ignore it.
There was a problem hiding this comment.
If we fail to remove(), it is not catastrophic, but it would point to a bug somewhere. At lease we should report the error.
There was a problem hiding this comment.
A bug, or some temporary condition.
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
What is the purpose of the change
StreamTests uses the test_str.bin file. This is problematic in our setup because we build and test 32-bit and 64-bit versions in parallel. When the 32-bit and 64-bit StreamTest happen to run concurrently, one ends up deleting the file that the other is trying to read.
My solution here is to use a temporary file instead of the hard-coded test_str.bin.
This fixes AVRO-4185.
Verifying this change
This change is already covered by existing tests, such as StreamTests.
Documentation