Add bounded multipart upload parallelism#228
Conversation
|
✅ Action performedReview finished.
|
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthrough
ChangesConcurrent multipart upload and credential synchronization
Sequence Diagram(s)sequenceDiagram
participant App
participant PutObject as Client::PutObject
participant AsyncPool as std::async workers
participant Server as MinIO Server
App->>PutObject: PutObject(args{max_inflight_parts=N})
PutObject->>PutObject: Validate() — reject if max_inflight_parts==0
PutObject->>Server: InitiateMultipartUpload → upload_id
rect rgba(70, 130, 180, 0.5)
note over PutObject,AsyncPool: Concurrent path (N > 1)
loop for each part buffer (round-robin)
PutObject->>AsyncPool: std::async UploadPart(buffer, part_num)
AsyncPool->>Server: PUT ?uploadId&partNumber=K
Server-->>AsyncPool: ETag
end
PutObject->>AsyncPool: drain_one() per completed future
AsyncPool-->>PutObject: Part{etag, checksum}
end
PutObject->>Server: CompleteMultipartUpload(parts[])
Server-->>App: result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/tests.cc (1)
711-721: ⚡ Quick winIncrease part count so bounded inflight behavior is actually exercised.
With ~10 MiB input and auto part sizing, this test usually produces only 2 parts. That validates correctness, but it doesn’t stress the producer/drain cap logic for
max_inflight_parts(especially2and4). Consider forcing a smaller explicit part size and larger payload sopart_count > max_inflight_parts.Suggested patch
- // 10MB data -> auto-calc ~5MiB parts = 2 parts, exercising multipart path - const size_t data_size = 10 * 1024 * 1024; + // Force many parts so inflight cap/drain path is exercised. + const size_t part_size = 5 * 1024 * 1024; // 5 MiB + const size_t data_size = (part_size * 5) + 7; // 6 parts total @@ - minio::s3::PutObjectArgs args(ss, static_cast<long>(data_size), 0); + minio::s3::PutObjectArgs args(ss, static_cast<long>(data_size), + static_cast<long>(part_size));🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/tests.cc` around lines 711 - 721, The test with 10MB data produces only 2 parts due to auto-calculated part sizing, which doesn't adequately stress the bounded inflight behavior for max_inflight_parts values of 2 and 4. Increase the data_size constant or specify a smaller explicit part size in the PutObjectArgs to ensure the part count exceeds the maximum inflight_values being tested (particularly for the values 2 and 4), so the producer/drain cap logic is properly exercised across all inflight test cases.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/client.cc`:
- Line 568: The user-controlled max_inflight_parts value is not properly
constrained before buffer allocation, which can cause memory exhaustion if a
huge value is passed. After the cast assignment of max_inflight_parts from
args.max_inflight_parts on line 568, add clamping logic to constrain
max_inflight_parts to both the known part_count value and a documented maximum
safe limit. This clamping must occur before any reserve() calls or buffer
allocations happen. Apply the same clamping logic to the related code block at
lines 730-744 that also uses max_inflight_parts.
- Line 774: The call to inflight_upload.future.get() at line 774 can throw
std::system_error or other exceptions from the async operation, but these
exceptions are not being caught and converted to error responses as required by
the function's error handling pattern. Wrap the future.get() call in a try-catch
block, catch std::system_error and other relevant exceptions, and convert them
into appropriate error responses instead of allowing them to propagate as
unhandled exceptions. Apply the same exception handling pattern to the similar
future.get() calls mentioned at lines 925-929 to ensure consistent error
handling throughout the function.
---
Nitpick comments:
In `@tests/tests.cc`:
- Around line 711-721: The test with 10MB data produces only 2 parts due to
auto-calculated part sizing, which doesn't adequately stress the bounded
inflight behavior for max_inflight_parts values of 2 and 4. Increase the
data_size constant or specify a smaller explicit part size in the PutObjectArgs
to ensure the part count exceeds the maximum inflight_values being tested
(particularly for the values 2 and 4), so the producer/drain cap logic is
properly exercised across all inflight test cases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4f7e0ee8-250d-468d-86aa-9748d3b82b27
📒 Files selected for processing (4)
include/miniocpp/args.hsrc/args.ccsrc/client.cctests/tests.cc
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/client.cc`:
- Around line 1077-1221: The multipart upload loop has a data race where
buf_pool[buf_idx] is overwritten at the top of the loop while a previous
UploadPart async task that captured a pointer into the same buffer may still be
running. The capacity drain that waits for the slot to be available happens too
late (after the buffer has already been read into). Move the capacity wait logic
(the if block checking inflight.size() >= max_inflight that pops from inflight
and waits for the future) to the beginning of the loop before the buffer read
operation (before the char* buf = static_cast<char*>(buf_pool[buf_idx].ptr)
line). This ensures we wait for any in-flight tasks using a buffer before
reusing that buffer. After moving the capacity wait to the beginning, remove the
now-redundant capacity wait block that was originally at line 1170-1181.
- Around line 1049-1056: The concurrent multipart upload path does not invoke
the progress callback function (`args.progressfunc`) that is available and used
in the sequential path. To fix this, you need to integrate progress reporting
into the concurrent code path by invoking `args.progressfunc` whenever parts
complete during concurrent uploads. Specifically, add progress callback
invocation in the section where completed parts are being drained (around the
area mentioned at line 1172) to report progress as each part finishes uploading.
Alternatively, if implementing this is not feasible, add explicit documentation
in the code or API documentation clarifying that `progressfunc` callbacks are
not supported when `max_inflight_parts > 1`.
In `@tests/tests.cc`:
- Around line 739-742: The RemoveObject cleanup calls at the PutObject error
path (around line 740) and at lines 770-772 can throw exceptions that overwrite
the original test failure. Wrap these RemoveObject cleanup calls in try-catch
blocks to make them best-effort operations that don't propagate exceptions. This
ensures the original exception from PutObject or other operations is preserved
and not masked by cleanup failures. Apply this pattern consistently to all error
paths where RemoveObject is called for cleanup purposes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: cf3aff48-3fed-4b36-a1cf-059315f701ea
📒 Files selected for processing (4)
include/miniocpp/args.hsrc/args.ccsrc/client.cctests/tests.cc
🚧 Files skipped from review as they are similar to previous changes (2)
- src/args.cc
- include/miniocpp/args.h
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/tests.cc`:
- Around line 722-723: The comment above the data_size variable declaration
states "Generate 10MB of random data" but the actual allocation is 100 * 1024 *
1024 bytes, which equals 100 MiB. Either update the comment to say "Generate
100MiB of random data" to match the actual allocation, or change the data_size
assignment to 10 * 1024 * 1024 if 10 MiB was the intended size. Ensure the
comment and code are aligned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b6246c40-53c7-4693-9632-f13999974ae6
📒 Files selected for processing (4)
include/miniocpp/args.hsrc/args.ccsrc/client.cctests/tests.cc
🚧 Files skipped from review as they are similar to previous changes (3)
- src/args.cc
- include/miniocpp/args.h
- src/client.cc
apply suggestion
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/client.cc (3)
1150-1178: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winPreserve single-PUT fallback for unknown-size small streams.
When the first read discovers
part_count == 1, this branch still starts multipart upload. That diverges from the sequential path’sBaseClient::PutObjectfallback and changes small-object behavior for callers that setmax_inflight_parts > 1.🛠️ Proposed fix
uploaded_size += part_size; + if (part_count.has_value() && *part_count == 1) { + PutObjectApiArgs api_args; + api_args.extra_query_params = args.extra_query_params; + api_args.bucket = args.bucket; + api_args.region = args.region; + api_args.object = args.object; + api_args.data = std::string_view(buf, part_size); + api_args.buf = buf; + api_args.size = part_size; + api_args.progressfunc = args.progressfunc; + api_args.progress_userdata = args.progress_userdata; + api_args.headers = headers; + return BaseClient::PutObject(api_args); + } + // Create multipart upload on first part. if (upload_id.empty()) {As per PR objectives, this path should handle “single-part fallback to
BaseClient::PutObject”.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` around lines 1150 - 1178, When the first read determines there is only one part to upload (part_count == 1), the code should fall back to using BaseClient::PutObject instead of initiating a multipart upload. Add a check before the CreateMultipartUploadArgs instantiation to test if part_count has a value and equals 1; if true, execute the BaseClient::PutObject call with the buffered data and skip the CreateMultipartUpload flow entirely. This ensures single-part uploads use the optimized path rather than the multipart upload process.
1098-1105: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winHonor progress callback cancellation in the parallel path.
The sequential path aborts when
progressfuncreturnsfalse; these parallel drains ignore that return value, so caller-requested cancellation continues uploading and can still complete the multipart upload.🛠️ Proposed fix
actual_args.uploaded_bytes = completed_bytes; actual_args.userdata = args.progress_userdata; - args.progressfunc(actual_args); + if (!args.progressfunc(actual_args)) { + first_err = error::Error("aborted by progress function"); + break; + } } }actual_args.uploaded_bytes = completed_bytes; actual_args.userdata = args.progress_userdata; - args.progressfunc(actual_args); + if (!args.progressfunc(actual_args) && !first_err) { + first_err = error::Error("aborted by progress function"); + } } }Also applies to: 1230-1237
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` around lines 1098 - 1105, The parallel upload path is ignoring the return value of the progressfunc callback, which means caller-requested cancellation (when progressfunc returns false) is not being honored. In the code block where args.progressfunc(actual_args) is called (around line 1098), capture the return value instead of discarding it, then check if it returns false and abort the parallel drain operation if so. Apply the same fix to the second location mentioned at lines 1230-1237. Reference how the sequential path handles progressfunc return value cancellation and implement the same logic in both parallel code sections.
1037-1049: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winEnable RDMA per registered buffer, not per pool.
!rdma_regs.empty()only proves at least one buffer registered. If a later buffer registration fails, parts using that buffer still getrdmaclientand can take the RDMA path with an unregistered buffer.🛠️ Proposed fix
`#ifdef` MINIO_CPP_RDMA // Register each pool buffer for RDMA, mirroring the sequential path. cuObjClient& rdma_client = SharedRDMAClient(); std::vector<ScopedRDMARegistration> rdma_regs; + std::vector<bool> rdma_registered(max_inflight, false); rdma_regs.reserve(max_inflight); bool rdma_connected = rdma_client.isConnected(); if (rdma_connected) { for (unsigned int i = 0; i < max_inflight; i++) { char* pool_buf = static_cast<char*>(buf_pool[i].ptr); if (rdma_client.cuMemObjGetDescriptor(pool_buf, args.part_size) == 0) { rdma_regs.emplace_back(&rdma_client, pool_buf); + rdma_registered[i] = true; } } } `#endif``#ifdef` MINIO_CPP_RDMA - if (!rdma_regs.empty()) { + if (rdma_registered[buf_idx]) { up_args.rdmaclient = &rdma_client; }Also applies to: 1191-1194
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` around lines 1037 - 1049, The RDMA registration logic currently checks if any buffer was registered using !rdma_regs.empty(), but this is insufficient because a buffer could fail registration while others succeed. Instead of making the RDMA decision at the pool level, track registration status per individual buffer by creating a data structure (such as a bool vector aligned with buf_pool indices) that records which specific buffers were successfully registered by cuMemObjGetDescriptor(). When determining whether to use the RDMA path for any operation, check the registration status of the specific buffer being used rather than checking if the rdma_regs vector is non-empty. This ensures unregistered buffers never attempt the RDMA path even if other buffers in the pool are registered.
🧹 Nitpick comments (1)
src/client.cc (1)
1026-1026: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winTrim comments that only restate the next statement.
These headings are redundant with the code immediately below them; keep the non-obvious lifetime/RDMA/race comments.
As per coding guidelines,
**/*.{h,cc,cpp,hpp}: “Do not add obvious comments (e.g., 'Safe: Validate() ensures X has value')”.Also applies to: 1161-1161, 1181-1181, 1222-1222, 1253-1253
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` at line 1026, Remove redundant comments that only restate what the code immediately below them does. In src/client.cc, delete the obvious comments at lines 1026, 1161, 1181, 1222, and 1253 that simply describe the code statement (like "Allocate pool of page-aligned buffers"). Keep only comments that provide non-obvious information about lifetime semantics, RDMA behavior, race conditions, or other implementation details that aren't immediately apparent from the code itself.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/client.cc`:
- Around line 1150-1178: When the first read determines there is only one part
to upload (part_count == 1), the code should fall back to using
BaseClient::PutObject instead of initiating a multipart upload. Add a check
before the CreateMultipartUploadArgs instantiation to test if part_count has a
value and equals 1; if true, execute the BaseClient::PutObject call with the
buffered data and skip the CreateMultipartUpload flow entirely. This ensures
single-part uploads use the optimized path rather than the multipart upload
process.
- Around line 1098-1105: The parallel upload path is ignoring the return value
of the progressfunc callback, which means caller-requested cancellation (when
progressfunc returns false) is not being honored. In the code block where
args.progressfunc(actual_args) is called (around line 1098), capture the return
value instead of discarding it, then check if it returns false and abort the
parallel drain operation if so. Apply the same fix to the second location
mentioned at lines 1230-1237. Reference how the sequential path handles
progressfunc return value cancellation and implement the same logic in both
parallel code sections.
- Around line 1037-1049: The RDMA registration logic currently checks if any
buffer was registered using !rdma_regs.empty(), but this is insufficient because
a buffer could fail registration while others succeed. Instead of making the
RDMA decision at the pool level, track registration status per individual buffer
by creating a data structure (such as a bool vector aligned with buf_pool
indices) that records which specific buffers were successfully registered by
cuMemObjGetDescriptor(). When determining whether to use the RDMA path for any
operation, check the registration status of the specific buffer being used
rather than checking if the rdma_regs vector is non-empty. This ensures
unregistered buffers never attempt the RDMA path even if other buffers in the
pool are registered.
---
Nitpick comments:
In `@src/client.cc`:
- Line 1026: Remove redundant comments that only restate what the code
immediately below them does. In src/client.cc, delete the obvious comments at
lines 1026, 1161, 1181, 1222, and 1253 that simply describe the code statement
(like "Allocate pool of page-aligned buffers"). Keep only comments that provide
non-obvious information about lifetime semantics, RDMA behavior, race
conditions, or other implementation details that aren't immediately apparent
from the code itself.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 23067705-c006-4284-b2e2-0cb5836c1752
📒 Files selected for processing (2)
src/client.cctests/tests.cc
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/tests.cc
|
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/client.cc (1)
1233-1234: 🩺 Stability & Availability | 🔴 CriticalFix race condition in concurrent credential fetching on shared provider.
Multiple concurrent
UploadPartthreads invokeprovider->Fetch()without synchronization—both through the HTTP path (ToHttpRequest→BuildHeaders→provider->Fetch()at request.cc:350) and the RDMA path (rdmaPut→provider->Fetch()at rdma.h:132).ChainedProvider::Fetch()mutatescreds_andprovider_members without locks (providers.cc:78–95), creating data races when concurrent threads callFetch()simultaneously. This can corrupt credentials, cause unsigned requests, or produce invalid signatures.Protect the provider's credential fetching with a mutex, or ensure credentials are fetched once per request in a thread-safe manner before dispatching to
std::async.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` around lines 1233 - 1234, Multiple concurrent UploadPart threads spawned via std::async are invoking provider->Fetch() without synchronization, causing data races when ChainedProvider::Fetch() mutates creds_ and provider_ members. Fix this by fetching credentials in a thread-safe manner before dispatching the std::async call in the lambda, or by adding mutex protection around the provider's Fetch() method calls. Ensure that either credentials are obtained once per request prior to creating the async task, or that concurrent access to the provider's internal state is serialized with a lock.
♻️ Duplicate comments (2)
src/client.cc (2)
1021-1035: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy liftClamp
max_inflight_partsbefore allocating the buffer pool.
max_inflightis taken directly from the user-controlledargs.max_inflight_parts(validated only against0) and one full part buffer is allocated per slot at Lines 1028-1035. A small upload paired with an accidentally huge value can exhaust memory before any useful work begins. Clamp to the knownpart_countand to a documented maximum before allocating.🛡️ Proposed guard
if (args.max_inflight_parts > 1) { unsigned int max_inflight = args.max_inflight_parts; + if (args.part_count.has_value() && + max_inflight > static_cast<unsigned int>(*args.part_count)) { + max_inflight = static_cast<unsigned int>(*args.part_count); + } + if (max_inflight > static_cast<unsigned int>(utils::kMaxMultipartCount)) { + max_inflight = static_cast<unsigned int>(utils::kMaxMultipartCount); + } size_t alloc_size = (args.part_count > 0) ? args.part_size : (args.part_size + 1);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` around lines 1021 - 1035, The max_inflight variable derived from user-controlled args.max_inflight_parts is used directly to allocate buffers without proper constraints, which can lead to memory exhaustion. Before the buffer allocation loop that calls AlignedAlloc, add validation to clamp the max_inflight value to a reasonable maximum and to args.part_count (if part_count is greater than zero) to prevent excessive memory allocation from untrusted input.
1233-1234: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winConvert
std::async/future.get()exceptions intoPutObjectResponseerrors.
std::async(std::launch::async, …)can throwstd::system_errorwhen a thread cannot be started, andfuture.get()(Lines 1090 and 1244) rethrows any exception fromUploadPart. These would escape as thrown exceptions and bypass the function's error-response contract; wrap the dispatch and theget()calls so failures becomefirst_err/error responses.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` around lines 1233 - 1234, The std::async call that creates ip.future and the subsequent future.get() calls at lines 1090 and 1244 can throw exceptions (std::system_error from thread creation, or exceptions rethrown from UploadPart) that escape as uncaught exceptions rather than being converted to error responses. Wrap the std::async dispatch in a try-catch block to handle std::system_error and set first_err appropriately, and similarly wrap both future.get() calls in try-catch blocks to catch and convert any exceptions from UploadPart into error responses instead of letting them propagate.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/client.cc`:
- Around line 1233-1234: Multiple concurrent UploadPart threads spawned via
std::async are invoking provider->Fetch() without synchronization, causing data
races when ChainedProvider::Fetch() mutates creds_ and provider_ members. Fix
this by fetching credentials in a thread-safe manner before dispatching the
std::async call in the lambda, or by adding mutex protection around the
provider's Fetch() method calls. Ensure that either credentials are obtained
once per request prior to creating the async task, or that concurrent access to
the provider's internal state is serialized with a lock.
---
Duplicate comments:
In `@src/client.cc`:
- Around line 1021-1035: The max_inflight variable derived from user-controlled
args.max_inflight_parts is used directly to allocate buffers without proper
constraints, which can lead to memory exhaustion. Before the buffer allocation
loop that calls AlignedAlloc, add validation to clamp the max_inflight value to
a reasonable maximum and to args.part_count (if part_count is greater than zero)
to prevent excessive memory allocation from untrusted input.
- Around line 1233-1234: The std::async call that creates ip.future and the
subsequent future.get() calls at lines 1090 and 1244 can throw exceptions
(std::system_error from thread creation, or exceptions rethrown from UploadPart)
that escape as uncaught exceptions rather than being converted to error
responses. Wrap the std::async dispatch in a try-catch block to handle
std::system_error and set first_err appropriately, and similarly wrap both
future.get() calls in try-catch blocks to catch and convert any exceptions from
UploadPart into error responses instead of letting them propagate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 09ee2e6e-234d-42b6-bbe0-12bf3f7aba8d
📒 Files selected for processing (2)
src/client.cctests/tests.cc
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/tests.cc
|
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/client.cc (1)
1090-1090: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse
std::optionalfor the unset upload-speed state.
upload_speed = -1models “not observed yet”; preferstd::optional<double>here, or remove the field in this branch if concurrent uploads do not report speed.As per coding guidelines,
**/*.{h,cc,cpp,hpp}should “Usestd::optional<T>for values that may be uninitialized.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/client.cc` at line 1090, The upload_speed variable is currently initialized to -1 to represent an unobserved state, which relies on magic values and doesn't clearly express intent. Replace the double upload_speed variable declaration with std::optional<double> upload_speed, then update all code that checks or uses upload_speed to work with the optional type (checking if it has a value before dereferencing, and initializing it without a value when unset). This aligns with the coding guidelines requiring std::optional for potentially uninitialized values.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/client.cc`:
- Around line 1290-1301: The code unconditionally appends a Part to the parts
vector after the try-catch block in the final drain, without validating that the
UploadPartResponse is successful. After the try-catch block that handles
exceptions and sets first_err, add a check to verify the response is valid
before calling parts.push_back and report_progress, similar to how the capacity
drain handles invalid responses. If the response indicates failure, set
first_err and skip appending the part to prevent a failed upload from being
incorrectly reported as completed progress.
---
Nitpick comments:
In `@src/client.cc`:
- Line 1090: The upload_speed variable is currently initialized to -1 to
represent an unobserved state, which relies on magic values and doesn't clearly
express intent. Replace the double upload_speed variable declaration with
std::optional<double> upload_speed, then update all code that checks or uses
upload_speed to work with the optional type (checking if it has a value before
dereferencing, and initializing it without a value when unset). This aligns with
the coding guidelines requiring std::optional for potentially uninitialized
values.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: caf53dec-4df1-45fd-9a96-8dded270a04c
📒 Files selected for processing (3)
include/miniocpp/providers.hsrc/client.ccsrc/providers.cc
|
✅ Action performedReview finished.
|
fix: #216
Add bounded multipart upload parallelism
PutObjectnow supports concurrentUploadPartcalls via the newPutObjectArgs::max_inflight_partsfield (default1= sequential behavior preserved).Changes
include/miniocpp/args.hunsigned int max_inflight_parts = 1toPutObjectArgssrc/args.ccmax_inflight_parts > 0src/client.ccPutObjectinto sequential + parallel branchestests/tests.ccPutObjectWithInflight()— MD5 integrity testsrc/client.cchighlightsmax_inflight_parts <= 1): Original single-buffer loop, unchanged.max_inflight_parts > 1):AlignedBufferpool.std::asyncdispatchesUploadPart.std::deque+drain_onelambda.CompleteMultipartUploaddoesn't care about completion order.one_byteread-ahead indexing in the unknown-size path (buf[part_size]instead ofbuf[part_size + 1]).Testing
PutObjectWithInflight()generates 10MB of random data, computes the original MD5, uploads withmax_inflight_parts = 1, 2, 4, downloads each object, and verifies MD5 matches the original.Summary by CodeRabbit
Release Notes
New Features
max_inflight_partstoPutObjectArgs(default: 1) to control concurrent multipart part uploads.max_inflight_parts> 1.Bug Fixes
max_inflight_parts = 0.Tests
max_inflight_parts= 1, 2, and 4.