feat: Read args and environment from live process memory #106

Merged
basil merged 3 commits into master from args-env on Mar 3, 2026

Conversation

@basil (Owner) commented Mar 3, 2026

Read argv and environ from process memory instead of the static /proc/pid/cmdline and /proc/pid/environ files. This shows the current state, including runtime modifications made with setenv(3), putenv(3), or by overwriting argv entries.

For live processes, look up the environ/__environ symbol via dwfl module iteration and dereference the pointer array to read the current environment. Walk the initial process stack layout (scanning downward from AT_RANDOM to locate the auxv, argv, and environ pointer arrays) to recover the original arguments and environment, which serves as the primary source for argv and a fallback for environ.
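
The backward half of that stack walk can be sketched as below. This is a simplified illustration, not the code in src/source/initial.rs: it assumes the stack words have already been read into a slice and that the index of the first auxv entry is known (the PR finds it by scanning downward from AT_RANDOM). The names `InitialPointers` and `walk_back_from_auxv` are hypothetical.

```rust
/// Pointer arrays recovered from an initial stack image (hypothetical type).
#[derive(Debug, PartialEq)]
struct InitialPointers {
    argv: Vec<u64>,
    envp: Vec<u64>,
}

/// Walk backward from the first auxv entry at `words[auxv_start]`.
///
/// Assumed layout, in ascending address order:
///   argc, argv[0..argc], NULL, envp[..], NULL, auxv pairs...
fn walk_back_from_auxv(words: &[u64], auxv_start: usize) -> Option<InitialPointers> {
    // The word just below the auxv must be the envp NULL terminator.
    let mut i = auxv_start.checked_sub(1)?;
    if *words.get(i)? != 0 {
        return None;
    }

    // Collect envp pointers until the argv NULL terminator is reached.
    let mut envp = Vec::new();
    loop {
        i = i.checked_sub(1)?;
        if words[i] == 0 {
            break;
        }
        envp.push(words[i]);
    }
    envp.reverse();

    // Collect argv pointers until a word equals the number collected so far;
    // that word is argc. Pointers are large addresses, so they cannot be
    // mistaken for a small count.
    let mut argv = Vec::new();
    loop {
        i = i.checked_sub(1)?;
        if words[i] == argv.len() as u64 {
            break;
        }
        argv.push(words[i]);
    }
    argv.reverse();

    Some(InitialPointers { argv, envp })
}
```

Each recovered pointer would then be dereferenced with a C-string read to obtain the actual argument or `KEY=value` text.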

For core files, apply the same initial stack walk and environ symbol lookup against the core's memory, falling back to the systemd journal COREDUMP_CMDLINE/COREDUMP_ENVIRON fields and then ELF note metadata.

Change the ProcSource::read_cmdline/read_environ signatures from Vec<u8> to Vec<OsString>, and read_memory from bool to io::Result<usize> to support partial reads and proper error propagation.
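
As an illustration of why `Vec<OsString>` is the more convenient return type, here is a minimal sketch of converting a raw NUL-terminated block (the format of /proc/pid/cmdline) into separate entries. The helper name `split_nul_terminated` is hypothetical and is not taken from this PR; the sketch assumes a Unix target.

```rust
use std::ffi::OsString;
use std::os::unix::ffi::OsStringExt;

/// Split a block of NUL-terminated entries into individual `OsString`s.
/// Bytes are kept as-is, so non-UTF-8 arguments survive the conversion.
fn split_nul_terminated(raw: &[u8]) -> Vec<OsString> {
    if raw.is_empty() {
        return Vec::new();
    }
    // Entries are terminated (not separated) by NUL, so drop one trailing
    // NUL to avoid producing a spurious empty entry at the end.
    let body = raw.strip_suffix(b"\0").unwrap_or(raw);
    body.split(|&b| b == 0)
        .map(|entry| OsString::from_vec(entry.to_vec()))
        .collect()
}
```

Empty arguments are preserved: the input `ls\0-l\0\0` yields three entries, the last of which is the empty string.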

Fixes #47
Fixes #75
Fixes #95

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@basil added the label Type: Enhancement (a request, idea, or new functionality) Mar 3, 2026
@github-actions bot added the labels Component: Process Handle (issues related to process handles), Component: Process Source (issues related to process sources), Component: pargs (issues related to pargs(1)), and Type: Documentation (improvements or additions to documentation) Mar 3, 2026
@basil basil requested a review from Copilot March 3, 2026 01:36
basil and others added 2 commits March 2, 2026 17:41
Both process_vm_readv (live) and the core-file PT_LOAD reader can
return a short read when a request spans a page or segment boundary.
read_words previously treated any short read as a fatal error, causing
read_environ_from_symbol (which reads 512 pointers at a time) to fail
and silently fall back to stale /proc/[pid]/environ data.

Now read_words loops until the buffer is filled, consistent with how
read_cstring already handles page boundaries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
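
The retry loop described in this commit message can be sketched roughly as follows. It is an illustration rather than the PR's `read_words`: the `read_at` callback stands in for a `read_memory`-style function returning `io::Result<usize>`, and the name `read_fully` is hypothetical.

```rust
use std::io;

/// Fill `buf` completely by retrying after short reads.
/// `read_at(addr, out)` may return fewer bytes than requested, for example
/// when the request crosses a page or segment boundary.
fn read_fully<F>(mut read_at: F, mut addr: u64, buf: &mut [u8]) -> io::Result<()>
where
    F: FnMut(u64, &mut [u8]) -> io::Result<usize>,
{
    let mut filled = 0;
    while filled < buf.len() {
        match read_at(addr, &mut buf[filled..])? {
            // Zero bytes means no progress is possible: report it instead
            // of looping forever.
            0 => {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "short read: no further data at this address",
                ))
            }
            n => {
                filled += n;
                addr += n as u64;
            }
        }
    }
    Ok(())
}
```

Treating a zero-length read as an error keeps the loop bounded while still letting an ordinary short read continue from where it stopped.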

Copilot AI left a comment

Pull request overview

This PR updates the process-inspection “source” layer to read argv and environment from target process memory (live processes and core files) rather than relying primarily on /proc/[pid]/cmdline and /proc/[pid]/environ, enabling visibility into runtime modifications (e.g., via setenv(3) / overwritten argv).

Changes:

  • Adds initial-stack scanning (AT_RANDOM → auxv → argv/envp) and environ/__environ symbol lookup to recover argv/environ from process/core memory.
  • Changes ProcSource::{read_cmdline,read_environ} to return Vec<OsString> and read_memory to return io::Result<usize> to support partial reads and propagate errors.
  • Updates CLI/display/docs/manpage generation to reflect the new argv/environ retrieval approach.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/source/mod.rs | Extends ProcSource with memory helpers (read_words, read_cstring, read_environ_from_symbol, etc.) and updates signatures to OsString/io::Result<usize>. |
| src/source/live.rs | Implements live argv/environ sourcing via initial stack + environ symbol with /proc fallback; updates read_memory to return partial-read sizes. |
| src/source/initial.rs | New module: initial Linux stack layout discovery for argv/envp and string caching. |
| src/source/elf.rs | Core PT_LOAD memory reads now return io::Result<usize> and support partial reads. |
| src/source/dw.rs | Adds cross-module environ/__environ symbol lookup logic (via new dwfl module iteration + symbol search). |
| src/source/coredump.rs | Mirrors live behavior for core files: initial stack + environ symbol + journal/notes fallback; updates cmdline/env types. |
| src/source/apport.rs | Adds warning for lossy ProcCmdline parsing; tweaks comment punctuation. |
| src/proc/mod.rs | Removes old argv()/environ() parsing helpers; adds read_cmdline/read_environ and auxv string helper. |
| src/model/auxv.rs | Removes AuxvType::is_string_pointer helper (now handled via read_auxv_string). |
| src/dw/dwfl/module.rs | Adds ModuleRef::find_symbol helper (handles versioned symbols, optional symbol-type filtering). |
| src/dw/dwfl/handle.rs | Adds DwflRef::modules() traversal wrapper for enumerating modules. |
| src/display.rs | Updates env/cmdline/auxv printing to use new read_* APIs and auxv string deref helper. |
| src/bin/pargs.rs | Updates cmdline read wrapper and error message. |
| build.rs | Updates generated manpage descriptions for new live-memory behavior. |
| README.md | Updates penv(1) documentation to describe “current environment” behavior. |

Comment on lines +220 to +224
For live processes, arguments and environment variables are read \
from process memory and reflect the current state, including any \
modifications made at runtime (e.g., via setenv(3) or by overwriting \
argv). This is in contrast to /proc/pid/cmdline and /proc/pid/environ, \
which are static snapshots captured at process start. \

Copilot AI Mar 3, 2026

The manpage descriptions claim that for live processes args/env “reflect the current state” because they are read from process memory. In practice the code falls back to the initial stack and then /proc/[pid]/cmdline or /proc/[pid]/environ without an explicit warning when memory access or environ symbol lookup fails (common under ptrace restrictions). Update the wording to describe this fallback and/or ensure the tools warn when they cannot read live memory so the documentation matches actual behavior.
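
One way to make the fallback visible, sketched under assumptions: return the source that actually produced the data alongside the data, so the caller can warn when it is not live memory. `EnvSource`, `Reader`, and `read_environ_with_source` are hypothetical names for illustration and are not part of this PR.

```rust
use std::ffi::OsString;
use std::io;

/// Where the environment was actually read from (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EnvSource {
    EnvironSymbol,
    InitialStack,
    ProcFile,
}

/// A deferred read attempt; stands in for the individual reader methods.
type Reader<'a> = &'a dyn Fn() -> io::Result<Vec<OsString>>;

/// Try each reader in order and report which one succeeded.
fn read_environ_with_source(
    readers: &[(EnvSource, Reader<'_>)],
) -> io::Result<(EnvSource, Vec<OsString>)> {
    let mut last_err = io::Error::new(io::ErrorKind::NotFound, "no environment source available");
    for (source, read) in readers {
        match read() {
            Ok(env) => return Ok((*source, env)),
            // Remember the most recent failure in case every source fails.
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

A caller could then print a warning whenever the returned source is anything other than `EnvSource::EnvironSymbol`, which would address the documentation mismatch without changing the fallback order.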

Copilot uses AI. Check for mistakes.
Comment on lines +211 to +230
+    fn read_cmdline(&self) -> io::Result<Vec<OsString>> {
+        // Try the stack walk first (recovers original argv even if overwritten).
+        if let Ok(initial) = self.ensure_initial_stack() {
+            return Ok(initial.args.clone());
         }
-        let val = std::fs::read(format!("/proc/{}/cmdline", self.pid))?;
-        let _ = self.cmdline.set(val.clone());
-        Ok(val)
+        // Fallback: cached /proc/pid/cmdline.
+        Ok(self.ensure_cmdline()?.clone())
     }

-    fn read_environ(&self) -> io::Result<Vec<u8>> {
-        if let Some(val) = self.environ.get() {
-            return Ok(val.clone());
+    fn read_environ(&self) -> io::Result<Vec<OsString>> {
+        // Try the current environ from the environ symbol (includes setenv changes).
+        if let Ok(final_env) = self.ensure_final_env() {
+            return Ok(final_env.clone());
         }
-        let val = std::fs::read(format!("/proc/{}/environ", self.pid))?;
-        let _ = self.environ.set(val.clone());
-        Ok(val)
+        // Try the initial environ from the stack walk.
+        if let Ok(initial) = self.ensure_initial_stack() {
+            return Ok(initial.env.clone());
+        }
+        // Fallback: cached /proc/pid/environ.
+        Ok(self.ensure_environ()?.clone())

Copilot AI Mar 3, 2026

New behavior here (preferring environ symbol + initial stack walk, then falling back to /proc) isn’t covered by existing integration tests (the current examples/pargs_penv helper never mutates argv or calls setenv/putenv). Adding a test helper that (1) overwrites argv[0] in-place and (2) updates/creates an env var after startup would let tests assert that pargs still reports the original argv and penv reports the updated env when memory reads are permitted, and that fallback behavior is explicit when they aren’t.
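
A rough sketch of the environment half of such a helper, under assumptions: it creates a variable after startup and checks that the start-time snapshot in /proc/self/environ does not contain it. The function names are hypothetical, the argv-overwrite half is not shown, and the snapshot check only applies on Linux.

```rust
use std::fs;
use std::io;

/// Returns true if any `KEY=value` entry in a NUL-separated environment
/// block (the format of /proc/[pid]/environ) has exactly the given key.
fn snapshot_has_key(snapshot: &[u8], key: &str) -> bool {
    snapshot.split(|&b| b == 0).any(|entry| {
        entry
            .strip_prefix(key.as_bytes())
            .map_or(false, |rest| rest.first() == Some(&b'='))
    })
}

/// Creates or updates a variable after process startup.
#[allow(unused_unsafe)] // `set_var` is only an unsafe fn from the 2024 edition on
fn set_after_startup(key: &str, value: &str) {
    // SAFETY (assumption): no other thread is reading or writing the
    // environment while this runs.
    unsafe { std::env::set_var(key, value) };
}

/// Checks the start-time snapshot of the current process (Linux only).
fn startup_snapshot_has(key: &str) -> io::Result<bool> {
    Ok(snapshot_has_key(&fs::read("/proc/self/environ")?, key))
}
```

An integration test could run a helper like this, then assert that penv reports the new variable while the /proc snapshot does not.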

Comment on lines +97 to +100
dwfl.modules(|module| {
if other_addr.is_some() {
return Ok(());
}

Copilot AI Mar 3, 2026

find_environ_symbol stops searching for non-libc copies as soon as it finds any libc environ symbol (if libc_addr.is_some() { return Ok(()) }). This contradicts the function’s stated preference for the copy-relocated (non-libc) slot and can cause the libc symbol (often stale/zero) to be returned even when the executable’s relocated symbol exists. Continue scanning after finding libc, and only short-circuit once a preferred non-libc address has been found (or once both candidates have been recorded).
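
The selection rule requested here can be sketched in isolation as follows. This is not the dwfl-based code in src/source/dw.rs: `ModuleSym`, `is_libc`, and `pick_environ` are hypothetical, and the libc name check is a crude heuristic chosen for the example.

```rust
/// One loaded module and the address of its `environ` symbol, if any
/// (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ModuleSym<'a> {
    name: &'a str,
    environ_addr: Option<u64>,
}

/// Crude heuristic: treat file names like `libc.so.6` or `libc-2.31.so` as libc.
fn is_libc(name: &str) -> bool {
    let base = name.rsplit('/').next().unwrap_or(name);
    base.starts_with("libc.so") || base.starts_with("libc-")
}

/// Prefer the first non-libc (copy-relocated) slot; keep scanning after a
/// libc hit and use the libc address only if nothing better turns up.
fn pick_environ(modules: &[ModuleSym<'_>]) -> Option<u64> {
    let mut libc_addr = None;
    for m in modules {
        let Some(addr) = m.environ_addr else { continue };
        if is_libc(m.name) {
            if libc_addr.is_none() {
                libc_addr = Some(addr);
            }
        } else {
            // A preferred candidate was found, so the scan can stop here.
            return Some(addr);
        }
    }
    libc_addr
}
```

The key point is that the early exit happens only on a non-libc match, so module iteration order no longer affects which slot is chosen.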

Comment on lines +129 to +132
eprintln!(
"warning: cmdline reconstructed from apport ProcCmdline; \
arguments containing spaces or empty arguments may be wrong"
);

Copilot AI Mar 3, 2026

This warning for lossy apport ProcCmdline parsing is emitted during parse_crash_fields, which runs when opening an apport .crash file even if the caller never requests cmdline output. That can produce unexpected stderr noise for unrelated commands (e.g., penv/pauxv). Consider deferring the warning until COREDUMP_CMDLINE is actually used as the cmdline source (or store a flag/metadata alongside the field and warn at read time).

Suggested change
-            eprintln!(
-                "warning: cmdline reconstructed from apport ProcCmdline; \
-                 arguments containing spaces or empty arguments may be wrong"
-            );
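
The alternative mentioned in the comment, storing a flag next to the field and warning at read time, might look roughly like this. `CmdlineField` and its `read` method are hypothetical; the `warn` callback stands in for `eprintln!` so the behavior can be observed.

```rust
use std::ffi::OsString;

/// A parsed cmdline plus a marker recording that it was reconstructed
/// lossily, e.g. from apport's ProcCmdline (hypothetical type).
struct CmdlineField {
    args: Vec<OsString>,
    lossy: bool,
}

impl CmdlineField {
    /// Return the arguments, emitting the warning only when the field is
    /// actually used as the cmdline source.
    fn read(&self, mut warn: impl FnMut(&str)) -> &[OsString] {
        if self.lossy {
            warn("warning: cmdline reconstructed from apport ProcCmdline; \
                  arguments containing spaces or empty arguments may be wrong");
        }
        &self.args
    }
}
```

With this shape, opening a crash file for penv or pauxv stays silent because nothing calls `read` on the cmdline field.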

Comment on lines +183 to +189
`cat /proc/[pid]/environ` or `ps e`, which only show a static snapshot
captured at process start, `penv(1)` reads live process memory (via the
`environ` symbol) to show the **current** environment, including any variables
added or modified at runtime with `setenv(3)` or `putenv(3)`. No other
Linux command-line tool provides this capability; on stock Linux the only
alternative is attaching a full debugger. This makes `penv(1)` invaluable
for debugging processes whose environment changes after startup.

Copilot AI Mar 3, 2026

The README now states that penv(1) reads live process memory to show the current environment, but the implementation falls back silently to the initial stack and then /proc/[pid]/environ when memory access or environ symbol lookup fails (e.g., due to ptrace/Yama restrictions). This makes the documentation misleading in common configurations. Please document the fallback behavior and/or emit an explicit warning when live-memory reads aren’t available so users know when they’re seeing a static snapshot.

Suggested change
- `cat /proc/[pid]/environ` or `ps e`, which only show a static snapshot
- captured at process start, `penv(1)` reads live process memory (via the
- `environ` symbol) to show the **current** environment, including any variables
- added or modified at runtime with `setenv(3)` or `putenv(3)`. No other
- Linux command-line tool provides this capability; on stock Linux the only
- alternative is attaching a full debugger. This makes `penv(1)` invaluable
- for debugging processes whose environment changes after startup.
+ `cat /proc/[pid]/environ` or `ps e`, which read a static snapshot
+ captured at process start, `penv(1)` first attempts to read live process
+ memory (via the `environ` symbol) to show the **current** environment,
+ including any variables added or modified at runtime with `setenv(3)` or
+ `putenv(3)`. When live-memory access or `environ` symbol lookup is not
+ possible (for example due to ptrace/Yama restrictions, hardened builds, or
+ missing symbols), `penv(1)` falls back to inspecting the initial stack and
+ then `/proc/[pid]/environ`, which only provide the start-time snapshot of
+ the environment. In these fallback cases, the reported environment may not
+ reflect changes made after process startup, and the fallback currently
+ occurs silently so users should be aware of this limitation when debugging
+ processes whose environment changes at runtime.

@basil basil merged commit 42450b2 into master Mar 3, 2026
5 checks passed

Development

Successfully merging this pull request may close these issues.

  • pargs(1) support without /proc
  • Decide what to do about environment filtering
  • Print current environment

2 participants