Upstream describes LLDB as a next generation, high-performance debugger. It is built on top of the LLVM/Clang toolchain and features great integration with it. At the moment, it primarily supports debugging C, C++ and ObjC code, and there is interest in extending it to more languages.
In February, I started working on LLDB under contract with the NetBSD Foundation. So far I've been working on reenabling continuous integration, squashing bugs, improving NetBSD core file support and updating the NetBSD distribution to LLVM 8 (the last of which is still stalled by unresolved regressions in inline assembly syntax). You can read more about that in my March 2019 report.
In April, my main focus was on fixing and enhancing the support for reading and writing CPU registers. In this report, I'd like to briefly summarize what I have done, what I have learned in the process and what I still need to do.
Buildbot status update
Last month I reported a temporary outage of the buildbot service. I am glad to follow up on that and inform you that the service has been restored and the results of CI testing are once again available at: http://lab.llvm.org:8011/builders/netbsd-amd64. While the tests are currently failing, they still serve as a useful source of information on potential issues and regressions.
The new discoveries include an update on the flaky tests problem. It turned out that the flaky markings I tried to use to work around it do not currently work with the lit test runner. I am still looking for a good way of implementing this, and will probably work on it more when I finish my priority tasks. It is possible that, for the time being, I will just skip the most problematic tests instead.
Additionally, the libc++ test suite identified that NetBSD is missing the nexttowardl() function. Kamil noticed that and asked me whether I could implement it. From a quick manpage reading, I came to the conclusion that nexttowardl() is equivalent to nextafterl(), and implemented it accordingly as an alias: 517c7caa3d9643 in src.
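Since both functions take and return long double with the same semantics, the implementation can be a thin forwarding function or an alias. A minimal sketch (illustrative only; the actual commit uses libm's alias machinery rather than a wrapper):

#include <math.h>

/* Hedged sketch: nexttowardl() simply forwards to nextafterl(), since on
   NetBSD both take two long double arguments and have the same semantics. */
long double
nexttowardl(long double x, long double y)
{
    return nextafterl(x, y);
}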
Fixing MM register support
The first task on my main TODO was to fix a bug in reading/writing MM registers that was identified earlier. The MM registers were introduced as part of the MMX extensions to x86, and they were designed to overlap with the earlier ST registers used by the x87 FPU. For this reason, they are returned by the ptrace() call as a single fx_87_ac array whose elements are afterwards used to access both kinds of registers.
The bug in question turned out to be a mistaken use of fx_xmm instead of fx_87_ac. As a result, the values of the mm0..mm7 registers were mapped to subsets of the xmm0..xmm7 registers, rather than to the correct set of st(0)..st(7) registers. The fix for the problem landed as r358178.
However, the fix itself was the easier part. The natural consequence of identifying a problem with the register was to add a regression test for it. This in turn triggered a whole set of events that deserve a section of their own.
Adding tests for register operations
Initially, the test for the MM and XMM registers consisted of a simple program written in pure amd64 assembly that wrote known patterns to the registers in question and then triggered SIGTRAP via int3, plus a lit test that ran LLDB in order to execute the program, read the registers and compare their values to the expected patterns. However, following upstream recommendations it quickly evolved.
Firstly, upstream suggested replacing the assembly file with inline assembly in a C or C++ program, in order to improve portability between platforms. As a result, I ended up learning how to use GCC's extended inline assembly syntax (whose documentation is not exactly the most straightforward to use) and created a test case that works fine for both i386 and amd64, and on a wide range of platforms supported by LLDB.
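To give a flavor of what such a test program looks like, here is a minimal sketch (not the actual test source; the pattern constants and the choice of register are illustrative) that loads a known 128-bit pattern into xmm0 using GCC extended inline assembly and then raises SIGTRAP via int3 so the debugger can inspect the register:

#include <stdint.h>

struct xmm_pattern {
    uint64_t lo, hi;
} __attribute__((aligned(16)));

int main(void) {
    /* hypothetical pattern; the real tests define their own constants */
    struct xmm_pattern pat = {0x0706050403020100ULL, 0x0f0e0d0c0b0a0908ULL};

    __asm__ volatile(
        "movaps %0, %%xmm0\n\t" /* load the known pattern into xmm0 */
        "int3\n\t"              /* raise SIGTRAP so LLDB can read the register */
        :
        : "m"(pat)
        : "xmm0");
    return 0;
}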
Secondly, it was necessary to restrict the tests to native runs on i386 and amd64 hardware. I discovered that lit partially provides for this by defining the native feature whenever LLDB is being built as a native executable (vs. cross-compiled). It also defines a few platform-related features, so it seemed only natural to extend them with explicit target-x86 and target-x86_64 features, corresponding to the i386 and amd64 targets. This was done in r358177.
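With those features in place, a test can declare its requirements via lit's REQUIRES directive embedded in its source. A hypothetical header (the directive is real lit syntax, the combination shown is illustrative):

// Only run this test when LLDB was built natively for an amd64 host.
// REQUIRES: native && target-x86_64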
Thirdly, upstream asked me to also add tests for other register types, as well as for writing registers. This overlapped with our need to test the new register routines for NetBSD, so I focused on them.
The main problem in adding more tests was that I needed to verify whether the processor supported specific instruction sets. For the time being, it seemed reasonable to assume that every possible user of LLDB would have at least SSE, and to restrict tests specific to long mode to the amd64 platform. However, adding tests for the registers introduced by the AVX extensions required an explicit check.
I have discussed the problem with Pavel Labath of LLDB upstream, and considered multiple options. His suggestion was to make the test program itself run the cpuid instruction and exit with a specific status if the needed registers are not supported. Then I could catch this status from a dotest.py test and mark the test as unsupported. However, I really preferred plain lit over dotest.py (mostly because it more naturally resembles LLDB usage), and wanted to avoid duplicating cpuid code in multiple tests.
However, lit does not seem to support translating a specific exit status into 'unsupported'. The 'lit way' of solving this is to determine whether the necessary feature is available up front, and make the test depend on it. Of course, the problem was how to check supported CPU extensions from within lit.
Firstly, I considered the possibility of determining cpuinfo from within Python. This would be the most trivial option; however, the Python stdlib does not seem to provide appropriate functions, and I wanted to avoid relying on external modules.
Secondly, I considered the possibility of running clang from within lit in order to build a simple test program running cpuid, and using its output to fill in the supported features.
Finally, I arrived at the simpler idea of making lit-cpuid, an additional utility program built as part of LLDB. This program uses the very nice cpuid API exposed by the LLVM libraries in order to determine the available extensions and print them for lit's use. This landed as r359303 and opened the way for more register tests.
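For illustration, the following sketch shows the kind of probing involved (the real lit-cpuid relies on LLVM's host-CPU detection API rather than raw cpuid, and the printed feature names here are illustrative):

#include <cpuid.h>
#include <stdio.h>

int main(void) {
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 1;

    /* Print one feature name per line for lit to pick up. */
    if (ecx & bit_SSE4_2)
        puts("sse4.2");
    if (ecx & bit_AVX)
        puts("avx");   /* a complete check would also verify OSXSAVE/XGETBV */

    return 0;
}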
So far, I've implemented the following tests:
- tests for the mm0..mm7 64-bit MMX registers and xmm0..xmm7 128-bit SSE registers mentioned above, common to i386 and amd64; read: r358178, write: r359681.
- tests for the 8 general-purpose registers *AX..*DX, *SP, *BP, *SI, *DI, in separate versions for i386 (32-bit registers) and amd64 (64-bit registers); read: r359438, write: r359441.
- tests for the 8 additional 64-bit general-purpose registers r8..r15, and the 8 additional 128-bit xmm8..xmm15 registers introduced on amd64; read: r359210, write: r359682.
- tests for the 256-bit ymm0..ymm15 registers introduced by AVX, in separate versions for i386 (where only ymm0..ymm7 are available) and amd64; read: r359304, write: r359783.
- tests for the 512-bit zmm0..zmm31 registers introduced by AVX-512, in separate versions for i386 (where only zmm0..zmm7 are available) and amd64; read: r359439, write: r359797.
- tests for the xmm16..xmm31 and ymm16..ymm31 registers that were implicitly added by AVX-512 (the xmm, ymm and zmm registers overlap/extend their predecessors); read: r359780, write: r359797.
Fixing the memory reading and writing routines
The general-purpose register tests were initially failing on NetBSD. More specifically, the test worked correctly up to the point of reading registers, but afterwards LLDB indicated a timeout and terminated the program instead of resuming it.
While investigating this, I discovered that it was caused by overwriting RBP. Curiously enough, it happened only when large values were written to it. I 'bisected' it to an approximate maximum value that still worked fine, and Kamil identified it as being close to vm.maxaddress.
GDB did not suffer from this issue. I discussed it with Pavel Labath and he suggested it might be related to unwinding. Upon debugging it further, I noticed that lldb-server was apparently calling ptrace() in an infinite loop, and that this was causing communication with the CLI process (LLDB uses a client-server model internally) to time out. Ultimately, I pinpointed it to the memory reading routine not expecting a read to set piod_len to 0 bytes (EOF). Apparently, this is exactly what happens when you try to read past the maximum virtual memory address.
I've made a patch for this. While reviewing it, Kamil also noticed that the routines were not summing up the results of multiple split read/write calls. I've addressed both issues in r359572.
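In rough terms, the corrected logic looks like the sketch below (not the actual code from r359572): a PT_IO read loop that treats piod_len == 0 as EOF and accumulates partial transfers.

#include <sys/types.h>
#include <sys/ptrace.h>

/* Hedged sketch of a PT_IO read loop; error handling simplified. */
ssize_t
read_memory(pid_t pid, void *addr, void *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        struct ptrace_io_desc io;
        io.piod_op = PIOD_READ_D;
        io.piod_offs = (char *)addr + done;
        io.piod_addr = (char *)buf + done;
        io.piod_len = len - done;

        if (ptrace(PT_IO, pid, &io, 0) == -1)
            return -1;
        if (io.piod_len == 0)       /* EOF, e.g. reading past vm.maxaddress */
            break;
        done += io.piod_len;        /* accumulate partial transfers */
    }
    return (ssize_t)done;
}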
Necessary extension of ptrace interface
At the moment, NetBSD implements 4 requests related to i386/amd64 registers:
- PT_[GS]ETREGS, which covers general-purpose registers, IP, flags and segment registers,
- PT_[GS]ETFPREGS, which covers FPU registers (and the xmm0..xmm15 registers on amd64),
- PT_[GS]ETDBREGS, which covers debug registers,
- PT_[GS]ETXMMREGS, which covers the xmm0..xmm15 registers on i386 (not present on amd64).
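For reference, these requests are used roughly as follows (a minimal sketch assuming a stopped, traced child; the data argument selects the LWP, and 0 is used here assuming a single-threaded child):

#include <sys/types.h>
#include <sys/ptrace.h>
#include <machine/reg.h>

void
dump_regs(pid_t child)
{
    struct reg gpr;     /* general-purpose registers */
    struct fpreg fpr;   /* FPU registers */

    ptrace(PT_GETREGS, child, &gpr, 0);
    ptrace(PT_GETFPREGS, child, &fpr, 0);
}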
The interface is missing methods to get the AVX and AVX-512 registers, namely ymm0..ymm15 and zmm0..zmm31. Apparently there is a struct xsave_ymm for the former in the kernel headers, but it is not used anywhere. I am considering different options for extending this.
Important points worth noting:
- The YMM registers extend the XMM registers, and therefore overlap with them. The existing struct xsave_ymm seems to rely on that, and expects only the upper half of each YMM register to be stored there, with the lower half being accessible via XMM. The same holds for ZMM vs. YMM.
- AVX-512 increased the register count from 16 to 32. This means that there are 16 new XMM registers that are not accessible via the current API.
This also opens questions about future extensibility of the interface. After all, we are not only seeing new register types added but also an increase in the number of registers of existing types. What I'd really like to avoid is having an increasingly cluttered interface.
How do other systems solve this?
Linux introduced the PTRACE_[GS]ETREGSET requests, which accept an NT_* constant identifying the register set to operate on, and an iovec structure containing the buffer location and size. For x86, constants equivalent to the older PTRACE_* requests are available, as well as NT_X86_XSTATE, which uses the full XSAVE area. The interface supports operating on the complete XSAVE area only, and requires the caller to determine the correct size for the CPU in use beforehand.
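As a sketch, reading the XSAVE area via the Linux interface looks roughly like this (the buffer size is illustrative; in practice it must be queried via cpuid for the host CPU):

#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/uio.h>
#include <elf.h>        /* NT_X86_XSTATE */

void
read_xstate(pid_t child)
{
    unsigned char xsave[2688];  /* illustrative size; depends on the CPU */
    struct iovec iov = { .iov_base = xsave, .iov_len = sizeof(xsave) };

    ptrace(PTRACE_GETREGSET, child, NT_X86_XSTATE, &iov);
}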
FreeBSD introduced the PT_[GS]ETXSTATE requests, which operate on full or partial XSAVE data. If the buffer provided is smaller than necessary, it is partially filled. Additionally, PT_GETXSTATE_INFO is provided to get the buffer size for the CPU in use.
A similar solution would be convenient for future extensions, as the caller would be able to implement them without having the kernel updated. Its main disadvantage is that it requires all callers to implement XSAVE area format parsing. Pavel Labath also suggested that we could further optimize it by supplying an offset argument, in order to support partial XSAVE area transfer.
An alternative is to keep adding new requests for new register types, i.e. PT_[GS]ETYMMREGS for YMM and PT_[GS]ETZMMREGS for ZMM. In this case, it is necessary to further discuss the data format used. It could either be the 'native' XSAVE format (i.e. YMM would contain only the upper halves of the registers, and ZMM would contain the upper halves of zmm0..zmm15 plus the complete data of zmm16..zmm31), or, more conveniently for clients (at the cost of data duplication), whole registers. If the latter, another question arises: should we then provide a dedicated interface for xmm16..xmm31 (and ymm16..ymm31), or infer them from the zmm16..zmm31 registers?
Future plans
My work continues with the two milestones from last month, plus a third that's closely related:
- Add support for FPU registers for NetBSD/i386 and NetBSD/amd64.
- Support XSAVE, XSAVEOPT, ... registers in core(5) files on NetBSD/amd64.
- Add support for Debug Registers for NetBSD/i386 and NetBSD/amd64.
The most important point right now is deciding on the format for passing the remaining registers, and implementing the missing ptrace interface kernel-side. Support for core files should then follow, using the same format.
On the userland side, I will work on adding matching ATF tests for the ptrace features and implement the LLDB side of support for the new ptrace interface and core file notes. Afterwards, I will start working on improving support for the same things for 32-bit (i386) executables.
This work is sponsored by The NetBSD Foundation
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:
We are very happy to announce The NetBSD Foundation Google Summer of Code 2019 projects:
- Akul Abhilash Pillai - Adapting TriforceAFL for NetBSD kernel fuzzing
- Manikishan Ghantasala - Add KNF (NetBSD style) clang-format configuration
- Siddharth Muralee - Enhancing Syzkaller support for NetBSD
- Surya P - Implementation of COMPAT_LINUX and COMPAT_NETBSD32 DRM ioctls support for NetBSD kernel
- Jason High - Incorporation of Argon2 Password Hashing Algorithm into NetBSD
- Saurav Prakash - Porting NetBSD to HummingBoard Pulse
- Naveen Narayanan - Porting WINE to amd64 architecture on NetBSD
The community bonding period - where students get in touch with mentors and the community - started yesterday. The coding period will run from May 27 until August 19.
Please welcome all our students, and best of luck to the students and mentors!
A big thank you to Google and to The NetBSD Foundation organization mentors and administrators!
Looking forward to a great Google Summer of Code!
I am improving signaling code in the NetBSD kernel, covering corner cases with regression tests, and improving the documentation. I've been working at the level of system calls (syscalls): forking, threading, handling these with GDB, and tracing syscalls. Some work happens behind the scenes as I support the work of Michal Gorny on LLDB/ptrace features.
clone(2)/__clone(2) tracing fixes
clone(2) (and its alias __clone(2)) is a Linux-compatible system call that is equivalent to fork(2) and vfork(2), but with more customization options. Some programs use clone(2) directly, and in some cases it is simply easier to build the same program for the NetBSD distribution without extra changes, keeping the direct use of clone(2) in the NetBSD build.
During my work on the fork1(9) kernel function -- which handles debugger-related events -- I implemented regression tests for this syscall. These covered certain supported modes of operation of clone(2), in particular checking the supported flags; the combinations tested did not use more than one flag in the same test.
Naturally, a judicious selection of edge cases in the regression tests should give meaningful results. I plan to stress the kernel with random sets of flags using a kernel fuzzer; in turn, this will help catch immediate kernel problems quickly.
During my work I discovered that debugger support for clone(2) has been defective since its inception. It never worked, due to a small 1-byte programming mistake. The fix landed in sys/kern/kern_fork.c r. 1.207. As the fork1(9) code has evolved since the introduction of PSL_TRACEFORK, the fix is no longer a one-liner, but it still removes only 3 bytes from the kernel code (in the past it would have been a 1-byte removal)!
@@ -477,11 +477,11 @@ fork1(struct lwp *l1, int flags, int exitsig, void *stack, size_t stacksize,
	 * Trace fork(2) and vfork(2)-like events on demand in a debugger.
	 */
	tracefork = (p1->p_slflag & (PSL_TRACEFORK|PSL_TRACED)) ==
-	    (PSL_TRACEFORK|PSL_TRACED) && (flags && FORK_PPWAIT) == 0;
+	    (PSL_TRACEFORK|PSL_TRACED) && (flags & FORK_PPWAIT) == 0;
	tracevfork = (p1->p_slflag & (PSL_TRACEVFORK|PSL_TRACED)) ==
-	    (PSL_TRACEVFORK|PSL_TRACED) && (flags && FORK_PPWAIT) != 0;
+	    (PSL_TRACEVFORK|PSL_TRACED) && (flags & FORK_PPWAIT) != 0;
	tracevforkdone = (p1->p_slflag & (PSL_TRACEVFORK_DONE|PSL_TRACED)) ==
-	    (PSL_TRACEVFORK_DONE|PSL_TRACED) && (flags && FORK_PPWAIT);
+	    (PSL_TRACEVFORK_DONE|PSL_TRACED) && (flags & FORK_PPWAIT);
	if (tracefork || tracevfork)
		proc_changeparent(p2, p1->p_pptr);
	if (tracefork) {
In the expressions above, the original code used the logical AND operator (&&), while bitwise AND (&), as in (flags & FORK_PPWAIT), was intended.
Despite many eyes reading and editing this code, this particular issue was overlooked until the introduction of the regression tests. The effect of this bug was that every clone(2) variation was incorrectly mapped to the corresponding fork(2)/vfork(2) event.
More information about the relevant C operator semantics can be found in online resources.
Now clone(2) should work as well as fork(2) and vfork(2). The current work is to map all clone(2) calls that stop the parent to vfork(2), and all those that don't to fork(2). This approach allows me to directly map clone(2) variations onto well-defined interfaces in debuggers, which distinguish 3 types of forking events:
- FORK
- VFORK
- VFORK_DONE
From a debugger's point of view it doesn't matter whether or not clone(2) shares the file descriptor table with its parent. It's an implementation detail, and either way it is expected to be handled by a tracer in the same way.
More options of clone(2) can be found in the NetBSD manual pages.
I still plan to check a similar interface, posix_spawn(3), which performs both operations in one call: clone(2) and exec(2). Most likely, judging from my code reading, the syscall is not handled appropriately in kernel space, and I will need to map it to the proper forking events. My motivation here is to support all system interfaces that spawn new processes.
child_return(9) refactoring
child_return(9) is a kernel function that prepares a newly spawned child to return the value 0 from fork(2), while its parent process returns the child's process id. Originally, child_return(9) was implemented purely in the machine-dependent (MD) part of each NetBSD port. I've since changed this and converted child_return(9) into machine-independent (MI) code that is shared between all architectures; md_child_return() is now used for port-specific code only.
The updated child_return(9) contains ptrace(2)- and ktruss(1)-related code that is now shared between all ports.
Incidentally, I noticed a bug in NetBSD's aarch64 (ARM64) port: a set of functions called from its original child_return() failed to call userret(9), so the return path to user mode was incorrect. The bug has since been corrected, and this resulted in several ATF tests passing.
This code has also been hardened against races that are theoretically possible but unlikely to happen in practice -- on the other hand, such a statement usually means that the bug can be triggered easily in a loop within a short period of time. Rather than risk assuming that the code, now and after future changes, is safe enough, I've added additional checks which ensure that we do not generate a debugger-related event under abnormal conditions, such as a process that has just received SIGKILL having it ignored and overwritten with another signal (SIGTRAP). The code now performs a racy check first; if the condition evaluates to true, the locks are taken, the state is rechecked under them, and its integrity is verified before an event is generated for the debugger.
/*
 * MI code executed in each newly spawned process before returning to userland.
 */
void
child_return(void *arg)
{
	struct lwp *l = arg;
	struct proc *p = l->l_proc;

	if (p->p_slflag & PSL_TRACED) {
		/* Paranoid check */
		mutex_enter(proc_lock);
		if (!(p->p_slflag & PSL_TRACED)) {
			mutex_exit(proc_lock);
			goto my_tracer_is_gone;
		}
		mutex_enter(p->p_lock);
		eventswitch(TRAP_CHLD);
	}

my_tracer_is_gone:
	md_child_return(l);

	/*
	 * Return SYS_fork for all fork types, including vfork(2) and clone(2).
	 *
	 * This approach simplifies the code and avoids extra locking.
	 */
	ktrsysret(SYS_fork, 0, 0);
}
Forking improvements under ptrace(2)
I've refactored the test cases and verified some of the forking semantics in the kernel. This included narrow cases such as nested vfork(2). Thankfully, this worked correctly "as is" (except for the typical vfork(2) races that still exist).
I've added support to fork1(9) for scenarios where a process is detached or killed in the middle of spawning a new child. This is needed by debuggers such as GDB, which can follow either the forker or the forkee, immediately detaching from the other one. Bugs in these scenarios have been corrected, and I have verified that GDB behaves correctly in these situations.
Threading improvements under ptrace(2)
I've reworked the current code for reporting threading (LWP) events to debuggers (LWP_CREATE and LWP_EXITED). The updated revision is also no longer prone to masking SIGTRAP in a child. Since these improvements, LWP events are now significantly more stable than they used to be. Reporting LWP_EXITED is still incorrect as there is a race condition between WAIT() and EXIT(). The effect of this is that the signal from EXIT() is never delivered to a debugger that is polling for it, and therefore it is missed.
Other changes
ATF ptrace(2) test corrections for handling of SIGILL crashes on SPARC, and detection of FPU availability on ARM, have been added.
The PT_IO operation in ptrace(2) could report success even though no byte transfer had been performed. Michal Gorny detected this problem in LLDB for invalid frame pointer reads: the NetBSD driver overlooked this scenario and looped infinitely. The same surprising property was also detected in the PT_WRITE/PT_READ operations and found to be triggered by GDB. As a workaround, I have disabled 0-length transfer requests in PT_IO by returning EINVAL. Zero-byte transfers call the PT_WRITE/PT_READ semantics into question, as we have no way to distinguish a successful operation from an empty transfer returning success.
In turn, this means PT_WRITE/PT_READ should be deprecated. I plan to document this clearly.
I've decided to finally forbid setting the Program Counter to 0x0 in PT_CONTINUE/PT_DETACH/etc., as it's hardly ever a valid executable address. There are two main factors behind this decision.
I've added previously missing support for KTR (ktrace(1)) events, in particular for debugger-related signals -- except in the vfork(2) case, because that creates unkillable processes. I am considering fixing this by synchronizing the parent of the vfork(2)ed process with its child. This change will make debugger event signals visible in ktruss(1) logs.
GDB support improvements
During the last month I introduced a regression in passing crash signals to the debugger: I reduced some of the information passed to a tracer, indirectly improving NetBSD's stability when debugging in GDB. The trade-off was a slight reduction in the readability of LLDB backtraces and crash reports.
Independently, I was asked by Christos Zoulas to fix GDB support for threads. I addressed the kernel shortcomings quickly and reworked the NetBSD platform code in GDB. Dead code (left over from the original FreeBSD code) was removed, missing code was added, and the monitoring of debugger-related events was reworked; the latter now uses the improved kernel APIs produced during my overall work. GDB still exhibits issues with threads, for example with convoluted Golang binaries, but it has been improved to the extent that the ATF regression tests for GDB pass again.
Syscall tracing API
The episode of GDB fixes prompted me to add support for passing the syscall number along with the SIGTRAP signal. I've described the interface in the commit message:
commit 7dd3c39f7d951a10642fce0f99d9e86d28156836
Author: kamil
Date:   Mon May 6 08:05:03 2019 +0000

    Ship with syscall information with SIGTRAP TRAP_SCE/TRAP_SCX for tracers

    Expand siginfo_t (struct size not changed) to new values for SIGTRAP
    TRAP_SCE/TRAP_SCX events.

     - si_sysnum -- syscall number (int)
     - si_retval -- return value (2 x int)
     - si_error  -- error code (int)
     - si_args   -- syscall arguments (8 x uint64_t)

    TRAP_SCE delivers si_sysnum and si_args.
    TRAP_SCX delivers si_sysnum, si_retval, si_error and si_args.

    Users: debuggers (like GDB) and syscall tracers (like strace, truss).

    This MI interface is similar to the Linux kernel proposal of
    PTRACE_GET_SYSCALL_INFO by the strace developer team.
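A rough sketch of how a tracer might consume these fields (this is not picotrace's actual code; field and constant names follow the commit message above, the child is assumed to be already attached and stopped, and error handling is omitted):

#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <signal.h>
#include <stdio.h>

void
trace_syscalls(pid_t child)
{
    int status;

    ptrace(PT_SYSCALL, child, (void *)1, 0);    /* stop at next syscall entry/exit */

    while (waitpid(child, &status, 0) != -1 && WIFSTOPPED(status)) {
        struct ptrace_siginfo psi;

        ptrace(PT_GET_SIGINFO, child, &psi, sizeof(psi));

        if (psi.psi_siginfo.si_code == TRAP_SCE)
            printf("SCE: syscall %d\n", psi.psi_siginfo.si_sysnum);
        else if (psi.psi_siginfo.si_code == TRAP_SCX)
            printf("SCX: syscall %d error %d\n",
                psi.psi_siginfo.si_sysnum, psi.psi_siginfo.si_error);

        ptrace(PT_SYSCALL, child, (void *)1, 0);
    }
}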
In order to verify the updated API before merging it into the kernel, I wrote a truss-/strace-like tool exercising the kernel interfaces. I authored three versions of picotrace: the first in C, the second in Lua+C, and the third again in C. The final, third version has been published and imported as pkgsrc/devel/picotrace.
The upstream source code is available online at https://github.com/krytarowski/picotrace.
It is documented in pkgsrc as follows:
picotrace enables syscall trace logging for the specified processes. The tracer uses the ptrace(2) system call to perform the tracing. The picotrace program implements bare functionality by design. It has no pretty printing of data structures or interpretation of the numerical arguments passed to syscalls. picotrace is designed to be a framework for other, more advanced tracers, and to illustrate canonical usage of ptrace(2). New features are not expected unless they reflect a new feature in the kernel.
Summary
I was able to run all of the existing ATF ptrace(2) tests from the test suite and pass all of them. Unfortunately, VFORK and LWP operations still present race conditions in the kernel and can cause failures. In order to reduce concerns from other developers, I have disabled the racy tests by default. There is also a new observation that one test that used to be rock stable is now sometimes flaky. It has not been investigated yet, but I suspect that the pipe(2) kernel code has either regressed or surfaced an old problem. I plan to investigate this once I finish the other ptrace(2) tasks.
Plan for the next milestone
I will visit BSDCan this month to speak about NVMM and HAXM. I will resume my work on forking and threading bugs after the conference. The lwp_exit() and wait() race will be prioritized, as it affects the most users at the moment. After resolving this problem, I will return to posix_spawn(2), followed by addressing the vfork(2) races.
This work was sponsored by The NetBSD Foundation.
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL to chip in what you can:
The NetBSD Project is pleased to announce NetBSD 8.1 RC1, the first (and hopefully final) release candidate for the upcoming NetBSD 8.1 release.
Over the last year, many changes have been made to the NetBSD 8 stable branch. As it is a stable branch, the release engineering team and the NetBSD developers are conservative with changes to it, and many users rely on the binaries from our regular auto-builds for production use. Now it is high time to cut a formal release, right before we enter the next release cycle with the upcoming branch for NetBSD 9.
Besides the workarounds for the latest CPU-specific vulnerabilities, this also includes many bug fixes and a few selected new drivers. For more details and instructions, see the 8.1 RC1 announcement.
Get NetBSD 8.1 RC1 from our CDN (provided by fastly) or one of the ftp mirrors.
Complete source and binaries for NetBSD are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, and other services may be found at https://www.NetBSD.org/mirrors/.
Please test RC1; we are looking forward to your feedback. Please send-pr any bugs, or mail us at releng at NetBSD.org with more general comments.