feat: integration s3 with arrow filesystem #548
base: main
Conversation
|
Thanks for adding this!
Yes, I believe this is worth doing. I supposed to reuse
There is a related discussion with regard to |
|
I recommend using MinIO. It is relatively stable and suitable for the current project development phase. Once the community reaches a consensus, the cost of replacing MinIO will not be high. |
|
I think it is fine to use MinIO for now to unblock us. Let me know what you think of the approach I proposed above. We might also need to add a FileIO registry that provides default implementations and lets users override them with their own implementations for S3 and other backends. The key in the FileIO registry can be associated with table property |
|
We may also need to add top-level CMake options like |
FYI, there is a PR to replace MinIO with RustFS, apache/iceberg#14928 |
ArrowFileSystemFileIO is OK. Following MakeLocalFileIO as a reference, I implemented a simple MakeS3FileIO interface using the Arrow filesystem.
You mean this? It's equivalent to setting the io-impl string in the catalog's properties. RestCatalog then asks the FileIORegistry to look up the implementation in the io-impl map. Is that roughly how it works? If so, I can try implementing some simple code to see if it's correct. |
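A minimal sketch of the lookup described above, to make the discussion concrete. Everything here is an assumption for illustration: the `FileIO` stand-in, the `FileIORegistry` class, its `Register` and `Load` methods, and the implementation names `"local"` and `"s3"` are hypothetical and are not the project's actual API. Only the `io-impl` property key comes from the conversation.

```cpp
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the project's FileIO interface.
class FileIO {
 public:
  virtual ~FileIO() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical implementations used only to show the lookup.
class LocalFileIO : public FileIO {
 public:
  std::string Name() const override { return "local"; }
};

class S3FileIO : public FileIO {
 public:
  std::string Name() const override { return "s3"; }
};

using Properties = std::unordered_map<std::string, std::string>;
using FileIOFactory = std::function<std::unique_ptr<FileIO>(const Properties&)>;

// Hypothetical registry: maps an io-impl name to a factory.
class FileIORegistry {
 public:
  // Registers a factory; registering an existing name overrides it,
  // which is how a user could swap in their own implementation.
  void Register(const std::string& impl, FileIOFactory factory) {
    factories_[impl] = std::move(factory);
  }

  // Reads "io-impl" from the properties and falls back to default_impl
  // when the key is absent. Throws if no factory is registered.
  std::unique_ptr<FileIO> Load(const Properties& props,
                               const std::string& default_impl) const {
    auto prop = props.find("io-impl");
    const std::string& impl = (prop != props.end()) ? prop->second : default_impl;
    auto factory = factories_.find(impl);
    if (factory == factories_.end()) {
      throw std::invalid_argument("unknown io-impl: " + impl);
    }
    return factory->second(props);
  }

 private:
  std::unordered_map<std::string, FileIOFactory> factories_;
};
```

With this shape, a catalog would call `Load(catalog_properties, "local")` once at construction time. Whether the real registry should throw or return a status object is a separate design choice that this sketch does not settle.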
|
Yes, I think it looks reasonable. |
8436b72 to 5197fa9
|
The current code is only a simple implementation. Could you help me check whether it is OK? |
I have implemented S3 access through the Arrow FileSystem, but I'm still not sure whether it meets the requirements.
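One detail that an Arrow-backed S3 FileIO probably has to handle is location normalization: Arrow's S3 filesystem addresses objects as `bucket/key` paths without a URI scheme, while table locations are usually written as `s3://bucket/key`. The helper below is a sketch of that conversion using only the standard library. The name `ToArrowS3Path` is hypothetical and this is not the code in this PR.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: converts "s3://bucket/key" into "bucket/key".
// Returns std::nullopt for other schemes or when the bucket is missing.
std::optional<std::string> ToArrowS3Path(std::string_view location) {
  constexpr std::string_view kScheme = "s3://";
  // substr clamps the length, so short inputs are safe here.
  if (location.substr(0, kScheme.size()) != kScheme) {
    return std::nullopt;
  }
  location.remove_prefix(kScheme.size());
  // Reject "s3://" and "s3:///key", which carry no bucket name.
  if (location.empty() || location.front() == '/') {
    return std::nullopt;
  }
  return std::string(location);
}
```

The sketch deliberately ignores variants such as `s3a://` and `s3n://`; whether those should be accepted is a decision for the actual implementation.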
There are still open tasks and questions for the current PR, so it is not ready for merging yet.
Question:
Currently, the object storage options include Azure, AWS, and GCS. I have chosen AWS as the implementation for now. Is that OK?
Task:
I need to deploy MinIO to test S3 access, but I'm not sure where it would be best to set it up.