Read argv and environ from process memory instead of the static /proc/pid/cmdline and /proc/pid/environ files. This shows the current state, including runtime modifications made with setenv(3), putenv(3), or by overwriting argv entries.
For live processes, look up the `environ`/`__environ` symbol via dwfl module iteration and dereference the pointer array to read the current environment. Walk the initial process stack layout (scanning downward from AT_RANDOM to locate the auxv, argv, and environ pointer arrays) to recover the original arguments and environment, which serves as the primary source for argv and a fallback for environ.
For core files, apply the same initial stack walk and environ symbol lookup against the core's memory, falling back to the systemd journal COREDUMP_CMDLINE/COREDUMP_ENVIRON fields and then ELF note metadata.
Change the `ProcSource::read_cmdline`/`read_environ` signatures from `Vec<u8>` to `Vec<OsString>`, and `read_memory` from `bool` to `io::Result<usize>` to support partial reads and proper error propagation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Both process_vm_readv (live) and the core-file PT_LOAD reader can return a short read when a request spans a page or segment boundary. read_words previously treated any short read as a fatal error, causing read_environ_from_symbol (which reads 512 pointers at a time) to fail and silently fall back to stale /proc/[pid]/environ data. Now read_words loops until the buffer is filled, consistent with how read_cstring already handles page boundaries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
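The retry loop described in this commit can be sketched roughly as follows. This is not the PR's code: the `ReadMemory` trait, `read_full`, and the toy `PagedMemory` reader (which never crosses a 16-byte "page") are hypothetical names, chosen only to show how a reader returning `io::Result<usize>` can be driven until the buffer is full.

```rust
use std::io;

/// Hypothetical stand-in for the PR's memory-reading interface: returns the
/// number of bytes actually read, which may be fewer than `buf.len()`.
trait ReadMemory {
    fn read_memory(&self, addr: u64, buf: &mut [u8]) -> io::Result<usize>;
}

/// Keep reading until `buf` is full. A zero-length read means no further
/// progress is possible, so it is reported as an error rather than retried.
fn read_full<M: ReadMemory>(mem: &M, addr: u64, buf: &mut [u8]) -> io::Result<()> {
    let mut done = 0;
    while done < buf.len() {
        match mem.read_memory(addr + done as u64, &mut buf[done..])? {
            0 => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "short read with no progress",
                ))
            }
            n => done += n,
        }
    }
    Ok(())
}

/// Toy memory (assumption for illustration) that never reads across a
/// 16-byte boundary, mimicking short reads at page or segment edges.
struct PagedMemory {
    base: u64,
    data: Vec<u8>,
}

impl ReadMemory for PagedMemory {
    fn read_memory(&self, addr: u64, buf: &mut [u8]) -> io::Result<usize> {
        const PAGE: u64 = 16;
        let off = match addr.checked_sub(self.base) {
            Some(off) if (off as usize) < self.data.len() => off as usize,
            _ => return Ok(0), // unmapped address: nothing readable
        };
        let to_page_end = (PAGE - addr % PAGE) as usize;
        let n = buf.len().min(to_page_end).min(self.data.len() - off);
        buf[..n].copy_from_slice(&self.data[off..off + n]);
        Ok(n)
    }
}
```

Treating a zero-byte read as an error keeps the loop from spinning forever when the request runs past the end of a mapping.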
Pull request overview
This PR updates the process-inspection “source” layer to read argv and environment from target process memory (live processes and core files) rather than relying primarily on /proc/[pid]/cmdline and /proc/[pid]/environ, enabling visibility into runtime modifications (e.g., via setenv(3) / overwritten argv).
Changes:
- Adds initial-stack scanning (`AT_RANDOM` → auxv → argv/envp) and `environ`/`__environ` symbol lookup to recover argv/environ from process/core memory.
- Changes `ProcSource::{read_cmdline,read_environ}` to return `Vec<OsString>` and `read_memory` to return `io::Result<usize>` to support partial reads and propagate errors.
- Updates CLI/display/docs/manpage generation to reflect the new argv/environ retrieval approach.
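As a rough illustration of the pointer-array dereference mentioned in the first bullet, the sketch below follows a NULL-terminated array of string pointers through a toy address space. `FakeMemory` and `read_string_array` are hypothetical names, and the code assumes 64-bit native-endian pointers; the real implementation reads from a live process or core file instead.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;

/// Toy address space (assumption for illustration): `bytes` is mapped at `base`.
struct FakeMemory {
    base: u64,
    bytes: Vec<u8>,
}

impl FakeMemory {
    /// Borrow everything from `addr` to the end of the mapping, if mapped.
    fn slice_from(&self, addr: u64) -> Option<&[u8]> {
        let off = addr.checked_sub(self.base)? as usize;
        self.bytes.get(off..)
    }

    /// Read one native-endian 64-bit word.
    fn read_word(&self, addr: u64) -> Option<u64> {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(self.slice_from(addr)?.get(..8)?);
        Some(u64::from_ne_bytes(raw))
    }

    /// Read a NUL-terminated string; fails if no terminator is mapped.
    fn read_cstring(&self, addr: u64) -> Option<OsString> {
        let tail = self.slice_from(addr)?;
        let len = tail.iter().position(|&b| b == 0)?;
        Some(OsString::from_vec(tail[..len].to_vec()))
    }
}

/// Follow a NULL-terminated array of string pointers starting at `array_addr`,
/// as one would for the array that `environ` points to.
fn read_string_array(mem: &FakeMemory, array_addr: u64) -> Option<Vec<OsString>> {
    let mut out = Vec::new();
    let mut slot = array_addr;
    loop {
        let ptr = mem.read_word(slot)?;
        if ptr == 0 {
            return Some(out); // NULL entry terminates the array
        }
        out.push(mem.read_cstring(ptr)?);
        slot += 8;
    }
}
```

Returning `OsString` rather than `String` avoids rejecting entries that are not valid UTF-8, which matches the motivation for the `Vec<OsString>` signature change.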
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/source/mod.rs | Extends ProcSource with memory helpers (read_words, read_cstring, read_environ_from_symbol, etc.) and updates signatures to OsString/io::Result<usize>. |
| src/source/live.rs | Implements live argv/environ sourcing via initial stack + environ symbol with /proc fallback; updates read_memory to return partial-read sizes. |
| src/source/initial.rs | New module: initial Linux stack layout discovery for argv/envp and string caching. |
| src/source/elf.rs | Core PT_LOAD memory reads now return io::Result<usize> and support partial reads. |
| src/source/dw.rs | Adds cross-module environ/__environ symbol lookup logic (via new dwfl module iteration + symbol search). |
| src/source/coredump.rs | Mirrors live behavior for core files: initial stack + environ symbol + journal/notes fallback; updates cmdline/env types. |
| src/source/apport.rs | Adds warning for lossy ProcCmdline parsing; tweaks comment punctuation. |
| src/proc/mod.rs | Removes old argv()/environ() parsing helpers; adds read_cmdline/read_environ and auxv string helper. |
| src/model/auxv.rs | Removes AuxvType::is_string_pointer helper (now handled via read_auxv_string). |
| src/dw/dwfl/module.rs | Adds ModuleRef::find_symbol helper (handles versioned symbols, optional symbol-type filtering). |
| src/dw/dwfl/handle.rs | Adds DwflRef::modules() traversal wrapper for enumerating modules. |
| src/display.rs | Updates env/cmdline/auxv printing to use new read_* APIs and auxv string deref helper. |
| src/bin/pargs.rs | Updates cmdline read wrapper and error message. |
| build.rs | Updates generated manpage descriptions for new live-memory behavior. |
| README.md | Updates penv(1) documentation to describe “current environment” behavior. |
For live processes, arguments and environment variables are read \
from process memory and reflect the current state, including any \
modifications made at runtime (e.g., via setenv(3) or by overwriting \
argv). This is in contrast to /proc/pid/cmdline and /proc/pid/environ, \
which are static snapshots captured at process start. \
The manpage descriptions claim that for live processes args/env “reflect the current state” because they are read from process memory. In practice the code falls back to the initial stack and then to /proc/[pid]/cmdline or /proc/[pid]/environ, without an explicit warning, when memory access or environ symbol lookup fails (common under ptrace restrictions). Update the wording to describe this fallback and/or ensure the tools warn when they cannot read live memory, so the documentation matches actual behavior.
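One way to make the fallback visible is to return the source alongside the data, so the caller can warn when it is only showing a start-time snapshot. The sketch below is an assumption about how that could look, not the PR's design: `EnvSource`, `EnvReader`, `read_with_source`, and `fallback_warning` are hypothetical names, and plain function pointers stand in for the real readers.

```rust
use std::ffi::OsString;
use std::io;

/// Where an environment listing came from (illustrative names, not the PR's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EnvSource {
    EnvironSymbol,
    InitialStack,
    ProcFile,
}

impl EnvSource {
    /// Only the `environ` symbol path reflects setenv/putenv changes.
    fn is_live(self) -> bool {
        self == EnvSource::EnvironSymbol
    }
}

/// Simplification: real readers would be methods capturing per-process state.
type EnvReader = fn() -> io::Result<Vec<OsString>>;

/// Try each reader in order and return the first success with its source.
fn read_with_source(
    readers: &[(EnvSource, EnvReader)],
) -> io::Result<(EnvSource, Vec<OsString>)> {
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "no environment source available");
    for &(source, reader) in readers {
        match reader() {
            Ok(env) => return Ok((source, env)),
            Err(e) => last_err = e, // remember why this source failed, try the next
        }
    }
    Err(last_err)
}

/// Message a tool could print when the result is only a start-time snapshot.
fn fallback_warning(source: EnvSource) -> Option<String> {
    if source.is_live() {
        None
    } else {
        Some(format!(
            "warning: could not read live environment; showing start-time snapshot from {:?}",
            source
        ))
    }
}
```

A caller would pass the readers in priority order (`EnvironSymbol`, `InitialStack`, `ProcFile`) and send any `fallback_warning` to stderr before printing the environment.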
+fn read_cmdline(&self) -> io::Result<Vec<OsString>> {
+    // Try the stack walk first (recovers original argv even if overwritten).
+    if let Ok(initial) = self.ensure_initial_stack() {
+        return Ok(initial.args.clone());
     }
-    let val = std::fs::read(format!("/proc/{}/cmdline", self.pid))?;
-    let _ = self.cmdline.set(val.clone());
-    Ok(val)
+    // Fallback: cached /proc/pid/cmdline.
+    Ok(self.ensure_cmdline()?.clone())
 }
-fn read_environ(&self) -> io::Result<Vec<u8>> {
-    if let Some(val) = self.environ.get() {
-        return Ok(val.clone());
+fn read_environ(&self) -> io::Result<Vec<OsString>> {
+    // Try the current environ from the environ symbol (includes setenv changes).
+    if let Ok(final_env) = self.ensure_final_env() {
+        return Ok(final_env.clone());
     }
-    let val = std::fs::read(format!("/proc/{}/environ", self.pid))?;
-    let _ = self.environ.set(val.clone());
-    Ok(val)
+    // Try the initial environ from the stack walk.
+    if let Ok(initial) = self.ensure_initial_stack() {
+        return Ok(initial.env.clone());
+    }
+    // Fallback: cached /proc/pid/environ.
+    Ok(self.ensure_environ()?.clone())
New behavior here (preferring environ symbol + initial stack walk, then falling back to /proc) isn’t covered by existing integration tests (the current examples/pargs_penv helper never mutates argv or calls setenv/putenv). Adding a test helper that (1) overwrites argv[0] in-place and (2) updates/creates an env var after startup would let tests assert that pargs still reports the original argv and penv reports the updated env when memory reads are permitted, and that fallback behavior is explicit when they aren’t.
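A test helper along these lines could start from something like the sketch below. It is only a partial illustration under stated assumptions: it covers the environment half (not the in-place argv[0] overwrite), the function names are hypothetical, and `snapshot_after_setenv` relies on Linux's /proc/self/environ. With glibc one would typically expect it to return `None` for a variable added after startup, which is the gap the new memory-based path is meant to close.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::OsStrExt;

/// Split a NUL-separated environ-style buffer into entries.
/// Empty entries are dropped, which is fine for environ but would lose
/// empty arguments if applied to a cmdline buffer.
fn split_environ(raw: &[u8]) -> Vec<OsString> {
    raw.split(|&b| b == 0)
        .filter(|entry| !entry.is_empty())
        .map(|entry| OsStr::from_bytes(entry).to_os_string())
        .collect()
}

/// Look up `key` in a list of `KEY=value` entries.
fn lookup<'a>(env: &'a [OsString], key: &str) -> Option<&'a OsStr> {
    env.iter().find_map(|entry| {
        let rest = entry.as_bytes().strip_prefix(key.as_bytes())?;
        let value = rest.strip_prefix(b"=")?;
        Some(OsStr::from_bytes(value))
    })
}

/// Set a variable after startup, then report what /proc/self/environ shows
/// for it (Linux only; the kernel file reflects the initial stack region).
fn snapshot_after_setenv(key: &str, value: &str) -> std::io::Result<Option<OsString>> {
    // `set_var` is an unsafe fn in edition 2024; the block is harmless elsewhere.
    unsafe { std::env::set_var(key, value) };
    let raw = std::fs::read("/proc/self/environ")?;
    Ok(lookup(&split_environ(&raw), key).map(OsStr::to_os_string))
}
```

An integration test could run a helper binary that calls something like `snapshot_after_setenv`, then compare the output of `penv` against the helper's own view of the variable.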
dwfl.modules(|module| {
    if other_addr.is_some() {
        return Ok(());
    }
find_environ_symbol stops searching for non-libc copies as soon as it finds any libc environ symbol (if libc_addr.is_some() { return Ok(()) }). This contradicts the function’s stated preference for the copy-relocated (non-libc) slot and can cause the libc symbol (often stale/zero) to be returned even when the executable’s relocated symbol exists. Continue scanning after finding libc, and only short-circuit once a preferred non-libc address has been found (or once both candidates have been recorded).
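The selection rule requested here can be sketched independently of dwfl. In the sketch below, `Candidate`, `is_libc`, and `pick_environ` are hypothetical names, and the libc file-name check is an assumed heuristic; the point is only that the scan stops early for a non-libc match and keeps going after a libc one.

```rust
/// One `environ`/`__environ` definition found in a loaded module (illustrative shape).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Candidate {
    module: String,
    addr: u64,
}

/// Assumed heuristic: a module is libc if its file name starts with
/// "libc.so" or "libc-". Real code may need a more robust check.
fn is_libc(module: &str) -> bool {
    let name = module.rsplit('/').next().unwrap_or(module);
    name.starts_with("libc.so") || name.starts_with("libc-")
}

/// Scan the candidates; return the first non-libc address, else the first libc one.
fn pick_environ(candidates: &[Candidate]) -> Option<u64> {
    let mut libc_addr = None;
    for c in candidates {
        if !is_libc(&c.module) {
            return Some(c.addr); // preferred copy-relocated slot: safe to stop
        }
        if libc_addr.is_none() {
            libc_addr = Some(c.addr);
        }
        // Keep scanning: a later module may hold the preferred slot.
    }
    libc_addr
}
```

With this shape the result does not depend on whether libc or the executable is enumerated first.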
eprintln!(
    "warning: cmdline reconstructed from apport ProcCmdline; \
     arguments containing spaces or empty arguments may be wrong"
);
This warning for lossy apport ProcCmdline parsing is emitted during parse_crash_fields, which runs when opening an apport .crash file even if the caller never requests cmdline output. That can produce unexpected stderr noise for unrelated commands (e.g., penv/pauxv). Consider deferring the warning until COREDUMP_CMDLINE is actually used as the cmdline source (or store a flag/metadata alongside the field and warn at read time).
Suggested change:
-eprintln!(
-    "warning: cmdline reconstructed from apport ProcCmdline; \
-     arguments containing spaces or empty arguments may be wrong"
-);
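Deferring the warning could look roughly like the sketch below, which records lossiness at parse time and surfaces the message only when the cmdline is actually read. `CrashFields` and its methods are hypothetical, and splitting on whitespace is an assumption about how ProcCmdline is reconstructed, not a statement about the PR's parser.

```rust
use std::cell::Cell;

/// Illustrative container for a parsed crash report (not the PR's actual type).
struct CrashFields {
    cmdline: Vec<String>,
    /// True when `cmdline` was rebuilt from a joined string and may be lossy.
    cmdline_lossy: bool,
    /// Tracks whether the warning has already been handed out.
    warned: Cell<bool>,
}

impl CrashFields {
    /// Parsing only records the lossiness; it prints nothing.
    fn from_proc_cmdline(joined: &str) -> Self {
        CrashFields {
            cmdline: joined.split_whitespace().map(str::to_owned).collect(),
            cmdline_lossy: true,
            warned: Cell::new(false),
        }
    }

    /// Return the arguments, plus a warning the first time lossy data is used.
    fn read_cmdline(&self) -> (&[String], Option<&'static str>) {
        let warning = if self.cmdline_lossy && !self.warned.replace(true) {
            Some(
                "warning: cmdline reconstructed from apport ProcCmdline; \
                 arguments containing spaces or empty arguments may be wrong",
            )
        } else {
            None
        };
        (self.cmdline.as_slice(), warning)
    }
}
```

Commands that never request the cmdline (for example penv or pauxv) would then never trigger the message, and a caller that does can pass the returned warning to `eprintln!`.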
`cat /proc/[pid]/environ` or `ps e`, which only show a static snapshot
captured at process start, `penv(1)` reads live process memory (via the
`environ` symbol) to show the **current** environment, including any variables
added or modified at runtime with `setenv(3)` or `putenv(3)`. No other
Linux command-line tool provides this capability; on stock Linux the only
alternative is attaching a full debugger. This makes `penv(1)` invaluable
for debugging processes whose environment changes after startup.
The README now states that penv(1) reads live process memory to show the current environment, but the implementation falls back silently to the initial stack and then /proc/[pid]/environ when memory access or environ symbol lookup fails (e.g., due to ptrace/Yama restrictions). This makes the documentation misleading in common configurations. Please document the fallback behavior and/or emit an explicit warning when live-memory reads aren’t available so users know when they’re seeing a static snapshot.
Suggested change:
-`cat /proc/[pid]/environ` or `ps e`, which only show a static snapshot
-captured at process start, `penv(1)` reads live process memory (via the
-`environ` symbol) to show the **current** environment, including any variables
-added or modified at runtime with `setenv(3)` or `putenv(3)`. No other
-Linux command-line tool provides this capability; on stock Linux the only
-alternative is attaching a full debugger. This makes `penv(1)` invaluable
-for debugging processes whose environment changes after startup.
+`cat /proc/[pid]/environ` or `ps e`, which read a static snapshot
+captured at process start, `penv(1)` first attempts to read live process
+memory (via the `environ` symbol) to show the **current** environment,
+including any variables added or modified at runtime with `setenv(3)` or
+`putenv(3)`. When live-memory access or `environ` symbol lookup is not
+possible (for example due to ptrace/Yama restrictions, hardened builds, or
+missing symbols), `penv(1)` falls back to inspecting the initial stack and
+then `/proc/[pid]/environ`, which only provide the start-time snapshot of
+the environment. In these fallback cases, the reported environment may not
+reflect changes made after process startup, and the fallback currently
+occurs silently so users should be aware of this limitation when debugging
+processes whose environment changes at runtime.
Read argv and environ from process memory instead of the static /proc/pid/cmdline and /proc/pid/environ files. This shows the current state, including runtime modifications made with setenv(3), putenv(3), or by overwriting argv entries.
For live processes, look up the `environ`/`__environ` symbol via dwfl module iteration and dereference the pointer array to read the current environment. Walk the initial process stack layout (scanning downward from AT_RANDOM to locate the auxv, argv, and environ pointer arrays) to recover the original arguments and environment, which serves as the primary source for argv and a fallback for environ.
For core files, apply the same initial stack walk and environ symbol lookup against the core's memory, falling back to the systemd journal COREDUMP_CMDLINE/COREDUMP_ENVIRON fields and then ELF note metadata.
Change the `ProcSource::read_cmdline`/`read_environ` signatures from `Vec<u8>` to `Vec<OsString>`, and `read_memory` from `bool` to `io::Result<usize>` to support partial reads and proper error propagation.
Fixes #47
Fixes #75
Fixes #95