I've finished the integration of sanitizers with the distribution build framework. A bootable and installable distribution is now available, verified with Address Sanitizer, with Undefined Behavior Sanitizer, or with both concurrently. A few dozen bugs were detected and the majority of them addressed.

LLVM sanitizers are compiler features that help find common software bugs. The following sanitizers are available:

  • TSan: Finds threading bugs,
  • MSan: Finds uninitialized memory reads,
  • ASan: Finds invalid address usage bugs,
  • UBSan: Finds unspecified code semantics at run time.

The new MKSANITIZER option supports full coverage of the NetBSD code base with these sanitizers, which helps reduce bugs and serve high security demands.

A brief overview of MKSANITIZER

A sanitizer is a special type of addition to a compiled program, and is included from a toolchain (LLVM or GCC). There are a few types of sanitizers. Their usual purposes are: bug detecting, profiling, and security hardening.

NetBSD already supports the most useful ones with a decent completeness:

  • Address Sanitizer (ASan, memory usage bug detector),
  • Undefined Behavior Sanitizer (UBSan, unspecified semantics in runtime detector),
  • Thread Sanitizer (TSan, data race detector), and
  • Memory Sanitizer (MSan, uninitialized memory read detector).

It's possible to combine compatible sanitizers in a single application; NetBSD and MKSANITIZER support doing so.

There are various advantages and limitations. Properties and requirements vary, mainly reflecting the type of sanitization. Comparisons against other software with similar properties (such as Valgrind) may provide a fuller picture.

Sanitizers usually introduce a relatively small overhead (~2x) compared to Valgrind (~20x). The portability is decent, as the sanitizers don't depend heavily on the underlying CPU architecture; UBSan works on basically everything, including VAX. Valgrind's portability, in contrast, is extremely dependent on the kernel and the CPU, which makes that diagnostic tool very difficult to port across platforms.

ASan, MSan and TSan require a large addressable memory due to their design. This restricts MSan and TSan to 64-bit architectures with a lot of RAM, and ASan to architectures that can completely cover the 4GB (32-bit) address space (it's still possible to use ASan with small resources, but it's a tradeoff between usability, time investment, and gain). Although the memory usage is higher with sanitized programs, the modern design and implementation of the memory management subsystem in the NetBSD kernel allows it to be managed lazily: regardless of the TBs of buffers reserved for metadata, the physically used memory is significantly lower, usually about double the regular memory usage of a process. Memory demands are higher for processes that are being fuzzed, and thus there is an option to restrict the maximum number of used physical pages; exceeding it causes the program to halt (the default for libFuzzer is 2GB).

A selection of LLVM sanitizers may conflict with some tools (like Valgrind) and mechanisms (like PaX ASLR in the ASan, TSan and MSan case). Others, like PaX MPROTECT (sometimes called W^X), are fully compatible with all the currently supported sanitizers.

The main purposes of sanitizations from a user point of view are:

  • bug detecting and assuring correctness,
  • high security demands, and
  • auxiliary feature for fuzzing.

It's worth adding a few notes on the security part as there are numerous good security approaches. One of them is proactive secure coding that is a regime of using safe constructs in the source code and replacement of functions that are prone to errors with versions that are harder to misuse.

However, the disadvantage of this approach is that it's just a regime during the coding period. The probability of introducing a bug is minimized, but it still exists. Bugs in programs of either style (proactive secure coding or careless coding) are almost indistinguishable in the final product, and an attacker can use the same methods, such as integer overflow or use after free, to violate the program.

The usual way to guard against bugs is to assume that the code is buggy and add mitigations that aim to reduce the chance of exploiting it. An example of this is the sandboxing of an application.

Code that is aided with sanitizers can be configured, either at build time or at run time, to report a bug such as an integer overflow at the moment it happens and cause the application to halt immediately. No coding regime can have the same effect, and the number of programming languages with this property is perhaps also limited.

In order to use sanitizers effectively within a distribution, a program and all of its dependencies (with few exceptions) need to be rebuilt with the same sanitizing configuration. Furthermore, in order to use some versions of fuzzing engines with some types of sanitizers, the fuzzing libraries need to be built with the same sanitization as well (this is true, for example, for Memory Sanitizer used together with libFuzzer).

This was my primary motivation towards introduction of a new NetBSD distribution build option: MKSANITIZER.

NetBSD is probably the only distribution that ships with a fully sanitized distribution option. Today there is "just" need for a locally patched external LLVM toolchain and the work on this is still ongoing.

The whole-userland sanitization skips the following exceptions, where it is not applicable:

  • low-level libc libraries crt0, crtbegin, crtend, crti, crtn etc,
  • libc,
  • libm,
  • librt,
  • libpthread,
  • bootloader,
  • crunchgen programs like rescue,
  • dynamic ELF loader (implemented as a library),
  • as of today static libraries and executables,
  • as of today as an exception ldd(1) that borrows parts from the dynamic ELF loader.

The selection of unsanitized base libraries such as libc follows from the design of the sanitizers: part of the base code stays unsanitized, and the sanitizers install interceptors for its public symbols. Sanitizers expect to use these APIs and their features from a high level, and so avoid recursive sanitization (although this still happens in narrow cases). A good illustration of this design choice is the sanitization of users of the threading library. Sanitizers, and TSan in particular, register interceptors for the public symbols of libpthread and treat it mostly as a black box (with a few exceptions). The alternative, a fully sanitized libpthread, would require a fully OS-dependent implementation of each feature in the sanitizers, based on the selection of kernel features, and would have to handle relatively opaque syscalls, CPU-specific differences in the implementation, and so on. In the end it would be very difficult to handle operations like pthread_join(3) without a full reimplementation of libpthread.

The sanitization of static programs as of today is a low priority and falls outside the scope of my work.

The situation with ldd(1) will be cleared up in the future, and it will most probably be sanitized.

Kernel and kernel modules use a different version of sanitizers and the porting process of Kernel-AddressSanitizer and Kernel-UndefinedBehaviorSanitizer is ongoing out of the MKSANITIZER context.

There used to be an analogous attempt in the Gentoo land (asantoo), however these efforts stalled two years ago. The Google Chromium team uses a set of scripts to bootstrap sanitized dependencies for their programs on top of a Linux distribution (as of today Ubuntu Trusty x86_64).

I've started to document bugs detected with MKSANITIZER in a dedicated directory on my NetBSD homepage with my code and notes. So far there are 35 documented findings. Most of them are real problems in programs, some of them might be considered overcautious (mostly ones detected with UBSan) and probably all of them are without serious security risk or privilege escalation or system crash. Some of the findings (0029-0035 - MemorySanitizer userland one) contain problems located probably in sanitizers (the proper NetBSD support in them).

This list shows that some of the problems are located in formally externally-maintained software like tmux, heimdal, grep, nvi or nawk.

I think that the following patch is a good example of a valuable finding in a privileged (setuid) program, passwd(1), which reads a vector out of bounds and writes a null character into a random byte on the stack (documented as report 0024).

From 28dd358940af30f434a930fd1977e3bf2b69dcb1 Mon Sep 17 00:00:00 2001
From: kamil 
Date: Sun, 24 Jun 2018 01:53:14 +0000
Subject: [PATCH] Prevent underflow buffer read in trim_whitespace() in
 libutil/passwd.c

If a string is empty or contains only white characters, the algorithm of
removal of white characters at the end of the passed string will read
buffer at index -1 and keep iterating backward.

Detected with MKSANITIZER/ASan when executing passwd(1).
---
 lib/libutil/passwd.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/libutil/passwd.c b/lib/libutil/passwd.c
index 9cc1d481a349..cee168e7d678 100644
--- a/lib/libutil/passwd.c
+++ b/lib/libutil/passwd.c
@@ -1,4 +1,4 @@
-/*	$NetBSD: passwd.c,v 1.52 2012/06/25 22:32:47 abs Exp $	*/
+/*	$NetBSD: passwd.c,v 1.53 2018/06/24 01:53:14 kamil Exp $	*/
 
 /*
  * Copyright (c) 1987, 1993, 1994, 1995
@@ -31,7 +31,7 @@
 
 #include 
 #if defined(LIBC_SCCS) && !defined(lint)
-__RCSID("$NetBSD: passwd.c,v 1.52 2012/06/25 22:32:47 abs Exp $");
+__RCSID("$NetBSD: passwd.c,v 1.53 2018/06/24 01:53:14 kamil Exp $");
 #endif /* LIBC_SCCS and not lint */
 
 #include 
@@ -503,13 +503,21 @@ trim_whitespace(char *line)
 
 	_DIAGASSERT(line != NULL);
 
+	/* Handle empty string */
+	if (*line == '\0')
+		return;
+
 	/* Remove leading spaces */
 	p = line;
 	while (isspace((unsigned char) *p))
 		p++;
 	memmove(line, p, strlen(p) + 1);
 
-	/* Remove trailing spaces */
+	/* Handle empty string after removal of whitespace characters */
+	if (*line == '\0')
+		return;
+
+	/* Remove trailing spaces, line must not be empty string here */
 	p = line + strlen(line) - 1;
 	while (isspace((unsigned char) *p))
 		p--;
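
The guarded logic of the patch above can be tried in isolation with a minimal sketch like the following. The function name trim_whitespace_sketch() is an assumption made for this illustration, and the final line that terminates the string is my completion of the part of the function that the diff context does not show; it is not copied from libutil.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Illustrative re-creation of the patched logic. The name
 * trim_whitespace_sketch() is hypothetical, not the libutil symbol.
 */
static void
trim_whitespace_sketch(char *line)
{
    char *p;

    assert(line != NULL);

    /* Nothing to trim in an empty string. */
    if (*line == '\0')
        return;

    /* Remove leading whitespace by shifting the rest to the front. */
    p = line;
    while (isspace((unsigned char)*p))
        p++;
    memmove(line, p, strlen(p) + 1);

    /* A whitespace-only input is empty now; stop before using strlen - 1. */
    if (*line == '\0')
        return;

    /*
     * line is non-empty and line[0] is not whitespace here, so the
     * backward scan stops at index 0 at the latest.
     */
    p = line + strlen(line) - 1;
    while (isspace((unsigned char)*p))
        p--;

    /* Assumed completion: cut the string after the last kept character. */
    p[1] = '\0';
}
```

Without the two early returns, an input such as "   " would make the backward scan start at index -1, which is the out-of-bounds read that ASan reported.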

The first boot of a MKSANITIZER distribution with Address Sanitizer

The process of getting a bootable and installable (and ignoring the aspect of buildable and generatable) installation ISO image was a loop of fixing bugs and retrying the process. At the end of the process there is an option to install a fully sanitized userland with ASan, UBSan or both. The MSan version is scheduled after finishing the kernel ptrace(2) work. Other options like a target prebuilt with ThreadSanitizer, safestack or The Scudo Hardened Allocator are untested.

I have also documented an example of a Heimdal bug that appeared during a login attempt to a fully ASanitized userland (and actually prevented the login).

This particular issue has been fixed with the following patch:

From ddc98829a64357ad73af0d0fa60c8d9c8499cce3 Mon Sep 17 00:00:00 2001
From: kamil 
Date: Sat, 16 Jun 2018 18:51:36 +0000
Subject: [PATCH] Do not reference buffer after the code scope {}

rk_getpwuid_r() returns a pointer pwd->pw_dir to a buffer pwbuf[].

It's not safe to store another a copy of pwd->pw_dir in outter scope and
use it out of the scope where there exists pwbuf[].

This fixes a problem reported by ASan under MKSANITIZER.
---
 crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c b/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c
index 47cb4481962e..6af30502ed5e 100644
--- a/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c
+++ b/crypto/external/bsd/heimdal/dist/lib/krb5/config_file.c
@@ -1,4 +1,4 @@
-/*	$NetBSD: config_file.c,v 1.3 2017/09/08 15:29:43 christos Exp $	*/
+/*	$NetBSD: config_file.c,v 1.4 2018/06/16 18:51:36 kamil Exp $	*/
 
 /*
  * Copyright (c) 1997 - 2004 Kungliga Tekniska Hogskolan
@@ -430,6 +430,8 @@ krb5_config_parse_file_multi (krb5_context context,
     if (ISTILDE(fname[0]) && ISPATHSEP(fname[1])) {
 #ifndef KRB5_USE_PATH_TOKENS
 	const char *home = NULL;
+	struct passwd pw, *pwd = NULL;
+	char pwbuf[2048];
 
 	if (!_krb5_homedir_access(context)) {
 	    krb5_set_error_message(context, EPERM,
@@ -441,9 +443,6 @@ krb5_config_parse_file_multi (krb5_context context,
 	    home = getenv("HOME");
 
 	if (home == NULL) {
-	    struct passwd pw, *pwd = NULL;
-	    char pwbuf[2048];
-
 	    if (rk_getpwuid_r(getuid(), &pw, pwbuf, sizeof(pwbuf), &pwd) == 0)
 		home = pwd->pw_dir;
 	}
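
The pattern behind this fix can be shown with a small self-contained sketch. Everything in it is hypothetical: lookup_home_r() is a stand-in for a reentrant lookup such as rk_getpwuid_r() that returns a pointer into caller-provided storage, and home_length() is an invented caller. The point is only that the buffer must live in the same scope as the pointer that refers to it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a reentrant lookup: it fills the caller's
 * buffer and hands back a pointer into that same buffer.
 */
static int
lookup_home_r(char *buf, size_t buflen, const char **result)
{
    int n = snprintf(buf, buflen, "/home/%s", "demo");

    if (n < 0 || (size_t)n >= buflen) {
        *result = NULL;
        return -1;
    }
    *result = buf;
    return 0;
}

/* Hypothetical caller: returns the length of the home directory path. */
static size_t
home_length(const char *env_home)
{
    const char *home = env_home;
    /*
     * The buffer is declared next to `home`, not inside the if-block
     * below, so `home` never outlives the storage it points into.
     */
    char buf[64];

    if (home == NULL) {
        if (lookup_home_r(buf, sizeof(buf), &home) != 0)
            return 0;
    }
    return strlen(home);
}
```

If buf were declared inside the if-block, the strlen() call would read it after its scope had ended, which is the kind of stack-use-after-scope access that ASan flags.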

Sending this patch upstream is on my TODO list, so that other projects can benefit from this work. A single patch preventing NULL pointer arithmetic in tmux has already been submitted upstream and merged.

After a long run of booting newer and newer versions of the locally patched distribution, I finally entered a functional shell.

And a stored "copy-pasted" terminal screenshot after login into a shell:

also known as NetBSD-current.  It is very possible that it has serious bugs,
regressions, broken features or other problems.  Please bear this in mind
and use the system with care.

You are encouraged to test this version as thoroughly as possible.  Should you
encounter any problem, please report it back to the development team using the
send-pr(1) utility (requires a working MTA).  If yours is not properly set up,
use the web interface at: http://www.NetBSD.org/support/send-pr.html

Thank you for helping us test and improve NetBSD.

We recommend that you create a non-root account and use su(1) for root access.
qemu# uname -a
NetBSD qemu 8.99.19 NetBSD 8.99.19 (GENERIC) #12: Sat Jun 16 02:39:37 CEST 2018
 root@chieftec:/public/netbsd-root/sys/arch/amd64/compile/GENERIC amd64
qemu# nm /bin/ksh |grep asan|grep init
0000000000439bf8 B _ZN6__asan11asan_initedE
0000000000439bfc B _ZN6__asan20asan_init_is_runningE
00000000004387a1 b _ZN6__asanL14tsd_key_initedE
0000000000430f18 b _ZN6__asanL20dynamic_init_globalsE
000000000043a190 b _ZZN6__asan18asanThreadRegistryEvE11initialized
00000000000cfaf0 T __asan_after_dynamic_init
00000000000cf8a0 T __asan_before_dynamic_init
0000000000199b50 T __asan_init
qemu#

The sshd(8) crash has been fixed by Christos Zoulas. There are still at least two unfixed ASan bugs left in the installer, and a few that prevent booting and using the distribution without noticing that the sanitizers are enabled. The most notorious ones are the ssh(1) and sshd(8) startup breakage and egrep(1) misbehavior in corner cases; both might be false positives and bugs in the sanitizers.

Validation of the MKSANITIZER=yes distribution

I've managed to execute the ATF regression tests against a sanitized distribution prebuilt with Address Sanitizer and in another attempt against Undefined Behavior Sanitizer.

In my setup of the external toolchain I had a broken C++ runtime library, caused by a complicated bootstrap chain. Building various LLVM projects from a GCC distribution requires generic work with the LLVM projects, and intermediate steps need to be built and reused. For example, the compiler-rt project, which contains various low-level libraries (including sanitizers), requires Clang as the compiler, as otherwise it's not buildable. This is the reason why I've deferred testing all the features at the current stage, and I'm trying to coordinate the process of upgrading the LLVM projects in the NetBSD distribution with the maintainer, Joerg Sonnenberger. I will reuse it to rebase my patches and ship a readme text to users and other developers expecting to run a release with the MKSANITIZER option.

The lack of C++ runtime pushed me towards reusing non-sanitized ATF tests (as the ATF framework is written in C++) against the sanitized userland. Two bugs have been detected:

  • expr(1) triggering Undefined Behavior in the routines detecting overflow in arithmetic operations,
  • sh(1) use after free in corner case of redefining an active function.

I've addressed the expr(1) issues and added new ATF tests in order to catch regressions in future potential changes. The Almquist Shell bug has been reported to the maintainer K. Robert Elz and fixed accordingly.

libFuzzer integration with the userland programs

During the Google Summer of Code project "libFuzzer integration with the basesystem" by Yang Zheng, it was detected that the original expr(1) fix that I introduced is not fully correct.

Yang Zheng has detected that the new version of expr(1) is still crashing in narrow cases. I've checked his integration patch of expr(1) with libFuzzer, reproduced the problem myself and documented:

$ ./expr -only_ascii=1 -max_len=32 -dict=expr-dict expr_corpus/ 1>/dev/null 
Dictionary: 12 entries
INFO: Seed: 2332047193
INFO: Loaded 1 modules   (725 inline 8-bit counters): 725 [0x7a11f0, 0x7a14c5), 
INFO: Loaded 1 PC tables (725 PCs): 725 [0x579d18,0x57ca68), 
INFO:      269 files found in expr_corpus/
INFO: seed corpus: files: 269 min: 1b max: 31b total: 3629b rss: 29Mb
expr.y:377:12: runtime error: signed integer overflow: 9223172036854775807 * -3 cannot be represented in type 'long'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior expr.y:377:12 in 
MS: 0 ; base unit: 0000000000000000000000000000000000000000
0x39,0x32,0x32,0x33,0x31,0x37,0x32,0x30,0x33,0x36,0x38,0x35,0x34,0x37,0x37,0x35,0x38,0x30,0x37,0x20,0x2a,0x20,0x2d,0x33,
9223172036854775807 * -3
artifact_prefix='./'; Test unit written to ./crash-9c3dd31298882557484a14ce0261e7bfd38e882d
Base64: OTIyMzE3MjAzNjg1NDc3NTgwNyAqIC0z

And the offending operation is INT * -INT:

$ eval ./expr-ubsan '9223372036854775807 \* -3'
expr.y:377:12: runtime error: signed integer overflow: 9223372036854775807 * -3 cannot be represented in type 'long'
-9223372036854775805

This has been fixed as well and the set of ATF tests for expr(1) extended for missing scenarios.
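
For readers who want to see how such an overflow can be detected before the multiplication is performed, here is a minimal sketch. It is not the code from expr(1); checked_mul() is a hypothetical helper written for this illustration, using division-based range checks so that the signed multiplication is only executed when the result is representable.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: stores a * b in *res and returns true when the
 * product fits in int64_t; returns false (leaving *res untouched)
 * when the multiplication would overflow.
 */
static bool
checked_mul(int64_t a, int64_t b, int64_t *res)
{
    if (a == 0 || b == 0) {
        *res = 0;
        return true;
    }

    if (a > 0) {
        if (b > 0) {
            /* positive * positive: must not exceed INT64_MAX */
            if (a > INT64_MAX / b)
                return false;
        } else {
            /* positive * negative: must not drop below INT64_MIN */
            if (b < INT64_MIN / a)
                return false;
        }
    } else {
        if (b > 0) {
            /* negative * positive: must not drop below INT64_MIN */
            if (a < INT64_MIN / b)
                return false;
        } else {
            /* negative * negative: must not exceed INT64_MAX */
            if (a < INT64_MAX / b)
                return false;
        }
    }

    *res = a * b;
    return true;
}
```

With this helper the INT * -INT case from the report above is rejected instead of wrapping around. GCC and Clang also provide __builtin_mul_overflow(), which performs the same check in one call where portability to other compilers is not a concern.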

MKSANITIZER implementation

The initial implementation of MKSANITIZER has been designed and implemented by Christos Zoulas. I took this code and continued working on it with an external LLVM toolchain (version 7svn with local patches). The final result has been documented in share/mk/bsd.README:

MKSANITIZER     if "yes", use the selected sanitizer to compile userland
                programs as defined in USE_SANITIZER, which defaults to
                "address". A selection of available sanitizers:
                        address:        A memory error detector (default)
                        thread:         A data race detector
                        memory:         An uninitialized memory read detector
                        undefined:      An undefined behavior detector
                        leak:           A memory leak detector
                        dataflow:       A general data flow analysis
                        cfi:            A control flow detector
                        safe-stack:     Protect against stack-based corruption
                        scudo:          The Scudo Hardened allocator
                It's possible to specify multiple sanitizers within the
                USE_SANITIZER option (comma separated). The USE_SANITIZER value
                is passed to the -fsanitize= argument to the compiler.
                Additional arguments can be passed through SANITIZERFLAGS.
                The list of supported features and their valid combinations
                depends on the compiler version and target CPU architecture.

As an illustration, in order to build a distribution with ASan and UBSan, using the LLVM toolchain one needs to enter a command line like:

./build.sh -V MKLLVM=yes -V MKGCC=no -V HAVE_LLVM=yes -V MKSANITIZER=yes -V USE_SANITIZER="address,undefined" distribution

There is an ongoing effort on upstreaming the remaining toolchain patches and right now we need to use a specially preprocessed external LLVM toolchain with a pile of local patches.

The GCC toolchain is a downstream for LLVM sanitizers and is out of the current focus, although there are local NetBSD patches for ASan, UBSan and LSan in GCC's libsanitizer. Starting with GCC 8.x, there is the first upstreamed block of NetBSD code pulled in from LLVM sanitizers.

Golang and TSan (-race)

The compiler-rt update patch has finally been merged in Golang:

runtime/race: update most syso files to compiler-rt fe2c72

These were generated using the racebuild configuration from
https://golang.org/cl/115375, with the LLVM compiler-rt repository at
commit fe2c72c59aa7f4afa45e3f65a5d16a374b6cce26 for most platforms.

The Windows build is from an older compiler-rt revision, because the
compiler-rt build script for the Go race detector has been broken
since January 2017 (https://reviews.llvm.org/D28596).

Updates #24354.

Change-Id: Ica05a5d0545de61172f52ab97e7f8f57fb73dbfd
Reviewed-on: https://go-review.googlesource.com/112896
Reviewed-by: Brad Fitzpatrick 
Run-TryBot: Brad Fitzpatrick 
TryBot-Result: Gobot Gobot 

This means that the TSan/amd64 support syso file has been included for NetBSD next to Darwin, FreeBSD and Linux (Windows is broken and no longer maintained). The remaining patches for shell scripts and Go files still need to be merged; the code is in review, waiting for feedback.

Changes merged with the NetBSD sources

  • ksh: Remove symbol clash with libc -- rename twalk() to ksh_twalk()
  • ktruss: Remove symbol clash with libc -- rename wprintf() to xwprintf()
  • ksh: Remove symbol clash with libc -- rename glob() to ksh_glob()
  • Don't pass -z defs to libc++ with MKSANITIZER=yes
  • Mark sigbus ATF tests in t_ptrace_wait as expected failure
  • Make new DTrace and ZFS code buildable with Clang/LLVM
  • Fix the MKGROFF=no MKCXX=yes build
  • Correct Undefined Behavior in ifconfig(8)
  • Correct Undefined Behavior in libc/citrus
  • Correct Undefined Behavior in gzip(1)
  • Do not use index out of bounds in nawk
  • Change type of tilde_ok from int to unsigned int in ksh(1)
  • Rework perform_arith_op() in expr(1) to omit Undefined Behavior
  • Add 2 new expr(1) ATF tests
  • Prevent Undefined Behavior in shift of signed integer in grep(1)
  • Set NOSANITIZER in i386 mbr files
  • Disable sanitizers for libm and librt
  • Avoid Undefined Behavior in DEFAULT_ALIGNMENT in GNU grep(1)
  • Detect properly overflow in expr(1) for 0 + INT
  • Make the alignof() usage more portable in grep(1)
  • heimdal: Do not reference buffer after the code scope {}
  • Do not cause Undefined Behavior in vi(1)
  • Disable MKSANITIZER in lib/csu
  • Disable SANITIZER for ldd(1)
  • Set NOSANITIZER in rescue/Makefile
  • Add new option -s to crunchgen(1) -- enable sanitization
  • Make building of dhcp compatible with MKSANITIZER
  • Refactor MKSANITIZER flags in mk rules
  • Specify NOSANITIZER in distrib/amd64/ramdisks/common
  • Fix invalid free(3) in sysinst(8)
  • Fix integer overflow in installboot(8)
  • Specify -Wno-format-extra-args for Clang/LLVM in gpl2/gettext
  • sysinst: Enlarge the set_status[] array by a single element
  • Prevent underflow buffer read in trim_whitespace() in libutil/passwd.c
  • Fix stack use after scope in libutil/pty
  • Prevent signed integer left shift UB in FD_SET(), FD_CLR(), FD_ISSET()
  • Reset SANITIZERFLAGS when specified NOSANITIZER / MKSANITIZER=no
  • Enhance the documentation of MKSANITIZER in bsd.README
  • Avoid unportable offsetof(3) calculation in nvi in log1.c
  • Add a framework for renaming symbols in libc&co for MKSANITIZER
  • Specify SANITIZER_RENAME_SYMBOL in nvi
  • Specify SANITIZER_RENAME_SYMBOL in diffutils
  • Specify SANITIZER_RENAME_SYMBOL in grep
  • Specify SANITIZER_RENAME_SYMBOL in cvs
  • Specify SANITIZER_RENAME_SYMBOL in chpass
  • Include for offsetof(3)
  • Avoid UB in tmux/window_copy_add_formats()
  • Document sanitizers in acronyms.comp
  • Add TODO.sanitizer
  • Avoid misaligned access in disklabel(8) in find_label() (patch by Christos Zoulas)
  • Improve the * operator handling in expr(1)
  • Add a couple of new ATF expr(1) tests
  • Add a missing check to handle correctly 0 * 0 in expr(1)
  • Add 3 more expr(1) ATF tests detecting overflow

Changes merged with the LLVM projects

  • LLVM: Handle NetBSD specific path in findDebugBinary()
  • compiler-rt: Disable recursive interceptors in signal(3)/MSan
  • Introduce CheckASLR() in sanitizers

Plan for the next milestone

The ptrace(2) tasks have been preempted by the suspended work on sanitizers, in order to actively collaborate with the Google Summer of Code students (libFuzzer integration with userland, KUBSan, KASan).

I have planned the following tasks before returning back to the ptrace(2) fixes:

  • upgrade base Clang/LLVM, libcxx, libcxxabi to at least 7svn (HEAD) (needs cooperation with Joerg Sonnenberger)
  • compiler-rt import and integration with base (needs cooperation with Joerg Sonnenberger)
  • merge TSan, MSan and libFuzzer ATF tests
  • prepare MKSANITIZER readme
  • kernel-asan port
  • kernel-ubsan port
  • switch syscall(2)/__syscall(2) to libc calls
  • upstream local patches, mostly to compiler-rt
  • develop fts(3) interceptors (MSan, for ls(1), find(1), mtree(8))
  • investigate and address the libcxx failing tests on NetBSD
  • no-ASLR boot.cfg option, required for MKSANITIZER

My plan for the next milestone is to reduce the list and keep actively collaborating with the summer students.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:

http://netbsd.org/donations/#how-to-donate

Posted Monday afternoon, July 2nd, 2018

The NetBSD Project is pleased to announce NetBSD 8.0 RC 2, the second (and hopefully final) release candidate for the upcoming NetBSD 8.0 release.

Unfortunately the first release candidate did not hold up in our extensive testing (also known as eating our own dog food): many NetBSD.org servers/machines were updated to it and worked fine, but the auto build cluster, where we produce our binaries, did not work well. The issue was tracked down to a driver bug (Intel 10 GBit ethernet), only showing up in certain configurations, and it has been fixed now.

Other security events, like the new FPU related exploit on some Intel CPUs, caused further kernel changes, so we are not going to release NetBSD 8.0 directly, but instead provide this new release candidate for additional testing.

The official RC2 announcement lists these major changes compared to older releases:

  • USB stack rework, USB3 support added
  • In-kernel audio mixer
  • Reproducible builds
  • Full userland debug information (MKDEBUG) available. While most install media do not come with them (for size reasons), the debug and xdebug sets can be downloaded and extracted as needed later. They provide full symbol information for all base system and X binaries and libraries and allow better error reporting and (userland) crash analysis.
  • PaX MPROTECT (W^X) memory protection enforced by default on some architectures with fine-grained memory protection and suitable ELF formats: i386, amd64, evbarm, landisk, pmax
  • PaX ASLR enabled by default on:
    i386, amd64, evbarm, landisk, pmax, sparc64
  • MKPIE (position independent executables) by default for userland on: i386, amd64, arm, m68k, mips, sh3, sparc64
  • added can(4), a socket layer for CAN busses
  • added ipsecif(4) for route-based VPNs
  • made part of the network stack MP-safe (the NET_MPSAFE kernel option is required to try it)
  • WAPBL stability and performance improvements

Specific to i386 and amd64 CPUs:
  • Meltdown mitigation: SVS (separate virtual address spaces)
  • Spectre mitigation (support in gcc, used by default for kernels)
  • Lazy FPU saving disabled on some Intel CPUs ("eagerfpu")
  • SMAP support
  • (U)EFI bootloader

Various new drivers:
  • nvme(4) for modern solid state disks
  • iwm(4), a driver for Intel Wireless devices (AC7260, AC7265, AC3160...)
  • ixg(4): X540, X550 and newer device support.
  • ixv(4): Intel 10G Ethernet virtual function driver.
  • bta2dpd - new Bluetooth Advanced Audio Distribution Profile daemon

Many evbarm kernels now use FDT (flat device tree) information (loadable at boot time from an external file) for device configuration; the number of kernels has decreased but the number of boards has vastly increased.

Lots of updates to 3rd party software included:
  • GCC 5.5 with support for Address Sanitizer and Undefined Behavior Sanitizer
  • GDB 7.12
  • GNU binutils 2.27
  • Clang/LLVM 3.8.1
  • OpenSSH 7.6
  • OpenSSL 1.0.2k
  • mdocml 1.14.1
  • acpica 20170303
  • ntp 4.2.8p11-o
  • dhcpcd 7.0.6
  • Lua 5.3.4

The NetBSD developers and the release engineering team have spent a lot of effort to make sure NetBSD 8.0 will be a superb release, but we have not yet fixed most of the accompanying documentation. So the included release notes and install documents will be updated before the final release, and also the above list of major items may lack important things.

Get NetBSD 8.0 RC2 from our CDN (provided by fastly) or one of the ftp mirrors.

Complete source and binaries for NetBSD are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, and other services may be found at http://www.NetBSD.org/mirrors/.

Please test RC2, so we can make the final release the best one ever so far. We are looking forward to your feedback. Please send-pr any bugs or mail us at releng at NetBSD.org for more general comments.

Posted Monday evening, July 2nd, 2018

Prepared by Siddharth Muralee (@Tr3x__) as a part of GSoC'18

I have been working on porting the Kernel Address Sanitizer (KASAN) to the NetBSD kernel. This report summarizes the work done until the second evaluation.

Refer here for the link to the first report.

What is a Kernel Address Sanitizer?

The Kernel Address Sanitizer, or KASAN, is a fast and efficient memory error detector designed by developers at Google. It relies heavily on compiler instrumentation and has been very effective in reporting bugs in the Linux kernel.

The aim of my project is to build the NetBSD kernel with KASAN and use it to find bugs and improve code quality in the kernel. This sanitizer will help detect a lot of memory errors that would otherwise be hard to detect.

Porting code from Linux to NetBSD

The design of KASAN in the NetBSD kernel is based on its Linux counterpart. The Linux code is GPL-licensed, hence we intend to rewrite it completely and/or relicense certain parts of the code. We will handle this once we have a working prototype ready.

This is in no way an easy task, especially when the code we are porting comes from multiple areas of the kernel, such as the memory management system and process management.

The total port requires a transfer of around 3000 lines in around 6 files with references in around 20 other locations or more.

Design of KASAN and how it works

Kernel Address Sanitizer works by instrumenting all memory accesses and keeping a separate "shadow buffer" that tracks which addresses are legitimate and accessible; it complains (very descriptively!) when the kernel reads or writes elsewhere.

The basic idea behind Kernel ASan is to set aside a map/buffer where each byte of kernel memory is represented by one bit. This means the size of the buffer is 1/8th of the total memory accessible by the kernel. On amd64 (also known as x86_64) this means setting aside 16TB of memory to cover a total of 128TB of kernel memory.
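
The sizing arithmetic can be checked with a few lines of C. This is only a sketch of the 1/8 ratio described above; the macro and function names (SKETCH_SHADOW_SHIFT, shadow_size_for()) are invented for the illustration and are not taken from the NetBSD or Linux sources.

```c
#include <stdint.h>

/* One shadow bit per byte means one shadow byte covers 8 bytes. */
#define SKETCH_SHADOW_SHIFT 3   /* hypothetical name; 2^3 = 8 */

#define SKETCH_TB (UINT64_C(1) << 40)

/* Returns the shadow buffer size needed to cover `covered_bytes`. */
static uint64_t
shadow_size_for(uint64_t covered_bytes)
{
    return covered_bytes >> SKETCH_SHADOW_SHIFT;
}
```

For 128TB of kernel memory, shadow_size_for(128 * SKETCH_TB) evaluates to 16 * SKETCH_TB, matching the figures in the text.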

Implementation Outline

The bulk of the work is done by the compiler-inserted code itself (GCC as of now), but there are still a lot of features we have to implement:

  • Checking and reporting Infrastructure
  • Allocation and population of the Shadow buffer during boot
  • Modification of Allocators to update the Shadow buffer upon allocations and deallocations

Kernel Address Sanitizer is useful in finding bugs/coding errors in the kernel such as:

  • Use-after-free
  • Stack, heap and global buffer overflows
  • Double free
  • Use-after-scope

The design makes it faster than other tools such as kmemcheck. The average slowdown is expected to be around 2x or less.

KASAN Initialisation

KASAN initialization happens in two stages:

  • early in the boot stage, we set each page entry of the entire shadow region to zero_page (early_kasan_init)
  • after the physical memory has been mapped and the pmap(9) has been bootstrapped during kernel startup, the zero_pages are unmapped and the real pages are allocated and mapped (kasan_init).

Below is a short description of what kasan_init() does in the Linux code:

  • It loads the kernel boot time page table and clears all the page table entries for the shadow buffer region which had been populated with zero_pages during early_kasan_init.
  • It marks the shadow buffer offsets of the parts of kernel memory that we don't want to track or that are prohibited, by populating them using kasan_populate_zero_shadow, which iterates through all the page tables.
  • It write-protects the mappings and flushes the TLB.

Allocating the shadow buffer

Instead of iterating through the page table entries as Linux preferred to do, we decided to use our low-level kernel memory allocators to do the job for us. This helped in reducing the code complexity and allowed us to reduce the size of the code by a significant amount.

One may ask: does that allocator then need to be sanitized? We propose adding a kasan_inited variable so that sanitization only starts after the initialization has completed.
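
The proposed guard could look roughly like the sketch below. Only the kasan_inited name comes from the text; the hook, the counter and the function that ends initialization are hypothetical stand-ins used to show the idea, not the code from the port.

```c
#include <assert.h>
#include <stdbool.h>

static bool kasan_inited = false;   /* flag proposed in the text */
static unsigned long kasan_checks;  /* hypothetical counter, for the demo only */

/* Hypothetical check hook: does nothing until the shadow buffer is ready. */
static void
kasan_check_access(const void *addr)
{
    (void)addr;
    if (!kasan_inited)
        return;         /* shadow not set up yet: skip the check */
    kasan_checks++;     /* a real hook would consult the shadow buffer here */
}

/* Hypothetical end of initialization: enable checking from now on. */
static void
kasan_mark_inited(void)
{
    kasan_inited = true;
}
```

Any access made by the allocator while the shadow buffer is still being built is then ignored, and checking starts only after kasan_mark_inited() has run.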

We are still in the process of testing this part.

Shadow translation (Address Sanitizer Algorithm)

The translation from a memory address to the corresponding shadow offset must be fast, since it happens on every memory read/write. It is implemented along the lines of the code below:

    shadow_address = KmemToShadow(address);

    void *KmemToShadow(void *addr) {
        /* Shift as an integer: C does not allow shifting a pointer. */
        return (void *)(((uintptr_t)addr >> Shadow_scale) + Shadow_buffer_start);
    }

The reverse function, from shadow offsets back to kernel memory addresses, is similar.

The shadow translation functions have already been implemented and can be found in kasan.h in my GitHub repository.
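
To make both directions concrete, here is a small self-contained sketch of the translation and its inverse. It works on plain 64-bit integers instead of pointers, and the scale and offset values are placeholders chosen for illustration; the real values live in the port's kasan.h and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration only. */
#define SHADOW_SCALE  3
#define SHADOW_OFFSET UINT64_C(0xdffffc0000000000)

/* Kernel address -> address of the shadow byte that describes it. */
static uint64_t
kmem_to_shadow(uint64_t addr)
{
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET;
}

/* Shadow address -> first kernel address of the 8-byte granule it covers. */
static uint64_t
shadow_to_kmem(uint64_t shadow)
{
    return (shadow - SHADOW_OFFSET) << SHADOW_SCALE;
}
```

All 8 addresses of one granule map to the same shadow byte, so the reverse function can only recover the start of the granule, not the exact original address.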

Error Detection

Every read/write is instrumented with a check that decides whether the memory access is legitimate. This is done in the manner shown below:

    shadow_address = KmemToShadow(address);
    if (IsPoisoned(shadow_address)) {
        ReportError(address, Size, IsWrite);
    }

The actual implementation of the error detection is a bit more complex, since we have to take the mapping into account as well.

Each byte of the shadow buffer maps to a qword (8 bytes) of kernel memory. As a result, a shadow value (*shadow_address) has only 3 possibilities:

  • The value can be 0 (meaning that all 8 bytes are unpoisoned)
  • The value can be negative (meaning that all 8 bytes are poisoned)
  • The value can be k, with 0 < k < 8 (meaning that the first k bytes are unpoisoned and the remaining 8 - k are poisoned)

We can therefore use the value itself to assist in error detection.
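
The three cases above translate into a short check. This is a sketch of the general Address Sanitizer technique under the assumption that an access does not cross an 8-byte granule; the function name is hypothetical and the code is not taken from the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether an access of `size` bytes (1..8, not crossing an 8-byte
 * granule) starting at `addr` touches poisoned memory, given the shadow
 * byte of that granule.
 */
static bool
access_is_poisoned(uint64_t addr, unsigned size, int8_t shadow)
{
    assert(size >= 1 && size <= 8);
    if (shadow == 0)
        return false;   /* all 8 bytes are addressable */
    if (shadow < 0)
        return true;    /* the whole granule is poisoned */
    /* Only the first `shadow` bytes of the granule are addressable. */
    unsigned last = (unsigned)(addr & 7) + size - 1;
    return last >= (unsigned)shadow;
}
```

For example, with a shadow value of 4, a 4-byte read at offset 0 of the granule is fine, while a 4-byte read at offset 2 reaches byte 5 and is reported.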

Basic Bug Report

The information about each bug is stored in struct kasan_access_info, which is then used to determine the following information:

  • The kind of bug
  • Whether read/write caused it
  • Process ID of the task being executed
  • The address which caused the error

We also print the stack backtrace which helps in identifying the function with the bug and also helps in finding the execution flow which caused the bug.

One of the best features is that we will be able to use the address where the error occurred to show the poisoning in the shadow buffer. This diagram will be pretty useful for developers trying to fix the bugs found by KASAN.

Unfortunately, since we haven't finished modifying the allocators to update the shadow buffer on allocations and deallocations, we are not able to test this yet.

Summary

I have managed to get a good initial grasp of the internals of the NetBSD kernel over the last two months.

I would like to thank my mentor Kamil for his constant support and valuable suggestions. A huge thanks to the NetBSD community who have been supportive throughout.

Most of my work is done on my fork of NetBSD.

Work left to be done

Many important features still remain to be implemented. Below is the list of features that I will be working on.

  • Solve licensing issues
  • sysctl switches to tune options of kern_asan.c (quarantine size, halt_on_error etc)
  • Move the KASAN code to src/sys/kernel and name the MI part kern_asan.c (similar to kern_ubsan.c)
  • Ability to run KUBSAN & KASAN concurrently
  • Refactor kasan_depth and in_ubsan to be shared between sanitizers: probably as a bit in private LWP bitfield
  • ATF tests verifying KASAN's detection of bugs
  • The first boot to a functional shell of a kernel executing with KASAN
  • Finish execution of ATF tests with a kernel running with KASAN
  • Quarantine List
  • Report generation
  • Continue execution
  • Allocator hooks and functions
  • Memory hotplug
  • Kernel module shadowing
  • Quarantine for reusable structs like LWP
Posted late Wednesday afternoon, July 11th, 2018 Tags:

Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

This is the second part of the project to integrate libFuzzer for userland applications. You can learn about the first part of this project in this post.

After the preparation in the first part, I started to fuzz userland programs with libFuzzer. We chose five programs:

  1. expr(1)
  2. sed(1)
  3. sh(1)
  4. file(1)
  5. ping(8)

After we fuzzed them with libFuzzer, we also tried other fuzzers, namely American Fuzzy Lop (AFL), honggfuzz and Radamsa.

Fuzz Userland Programs with libFuzzer

"LLVM Logo" by Teresa Chang / All Right Retained by Apple

In this section, I'll show how to fuzz the five programs with libFuzzer. libFuzzer is an in-process, coverage-guided fuzzing engine. It defines some interfaces to be implemented by the user:

  • LLVMFuzzerTestOneInput: fuzzing target
  • LLVMFuzzerInitialize: initialization function to access argc and argv
  • LLVMFuzzerCustomMutator: user-provided custom mutator
  • LLVMFuzzerCustomCrossOver: user-provided custom cross-over function

Of the above functions, only LLVMFuzzerTestOneInput must be implemented for every fuzzing target. This function takes a buffer and its length as input; it is the target that is fuzzed again and again. Users who want to do initialization work with the argc and argv parameters also need to implement LLVMFuzzerInitialize. With LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver, users can also change how a new input buffer is produced from one or two old input buffers. For more details, you can refer to this document.
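
A minimal harness using the two most common interfaces might look like the sketch below. The two LLVMFuzzer* signatures are the ones libFuzzer expects; count_spaces is a hypothetical stand-in for the code under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical function under test: counts the spaces in the input. */
static size_t
count_spaces(const uint8_t *data, size_t size)
{
    size_t n = 0;
    for (size_t i = 0; i < size; i++)
        if (data[i] == ' ')
            n++;
    return n;
}

/* Optional: called once before fuzzing starts, with access to argc/argv. */
int
LLVMFuzzerInitialize(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    return 0;
}

/* Required: called again and again with a new input buffer. */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)count_spaces(data, size);
    return 0;
}
```

There is no main function: libFuzzer supplies it when the file is compiled with clang and -fsanitize=fuzzer.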

Fuzz Userland Programs with Sanitizers

libFuzzer can be used with different sanitizers. It is quite simple to use sanitizers together with libFuzzer: you just need to add the sanitizer names to the option, like -fsanitize=fuzzer,address,undefined. However, the memory sanitizer seems to be an exception. When we tried to use it together with libFuzzer, we got some runtime errors. The official documentation mentions that "using MemorySanitizer (MSAN) with libFuzzer is possible too, but tricky", but it doesn't say how to use it properly.

In the rest of this article, you can assume that we used the address and undefined sanitizers together with the fuzzers unless stated otherwise.

Fuzz expr(1) with libFuzzer

expr(1) takes some parameters from the command line as input and then treats the command line as a whole expression to be calculated. An example usage of expr(1) looks like this:

    $ expr 1 + 1
    2
  
This program is relatively easy to fuzz: all we need to do is transform the original main function into the form of LLVMFuzzerTestOneInput. Since the parser in expr(1) takes the argc and argv parameters as input, we need to transform the buffer provided by LLVMFuzzerTestOneInput into the format needed by the parser. In the implementation, I assume the buffer is composed of several strings separated by whitespace characters (i.e., ' ', '\t' and '\n'). We can then split the buffer into separate strings and organize them into the form of argc and argv parameters.

However, the first problem appeared when I started to fuzz expr(1) with this modification. Since libFuzzer treats every exit as an error while fuzzing, there will be a lot of false positives. Fortunately, the implementation of expr(1) is simple, so we only need to replace exit(3) with a return statement. In the sections on fuzzing the other programs, I'll show how to handle exit(3) and other error handling interfaces elegantly.

You can also pass a fuzzing dictionary file (to provide keywords) and initial input cases to libFuzzer, so that it can produce test cases more smartly. For expr(1), the dictionary file looks like this:

    min="-9223372036854775808"
    max="9223372036854775807"
    zero="0"
    one="1"
    negone="-1"
    div="/"
    mod="%"
    add="+"
    sub="-"
    or="|"
    and="&"
  
And there is only one initial test case:
    1 / 2
  

With this setting, we can quickly reproduce an existing bug which has been fixed by Kamil Rytarowski in this patch: when you feed either of the expressions -9223372036854775808 / -1 or -9223372036854775808 % -1 to expr(1), you get a SIGFPE. After this fix was applied, the fuzzer also detected an integer overflow bug when feeding expr(1) with 9223372036854775807 * -3. This bug was detected with the help of the undefined behavior sanitizer (UBSan) and has been fixed in this commit. The fuzzing of expr(1) can be reproduced with this script.
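
To illustrate the two classes of bugs, here is a sketch of the kind of guards that avoid them on 64-bit signed integers. It is not the code from the patch or the commit mentioned above; the function names are hypothetical, and __builtin_mul_overflow is a GCC/Clang builtin used here for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a / b (or a % b) cannot be evaluated safely on int64_t. */
static bool
div_is_unsafe(int64_t a, int64_t b)
{
    /* INT64_MIN / -1 does not fit in int64_t and raises SIGFPE on amd64. */
    return b == 0 || (a == INT64_MIN && b == -1);
}

/* Multiply with overflow detection; returns false if the result overflows. */
static bool
checked_mul(int64_t a, int64_t b, int64_t *out)
{
    return !__builtin_mul_overflow(a, b, out);
}
```

Both inputs from the text are caught: div_is_unsafe(INT64_MIN, -1) is true, and checked_mul(INT64_MAX, -3, &r) reports an overflow.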

Fuzz sed(1) with libFuzzer

sed(1) reads from files or standard input (stdin) and modifies the input as specified by a list of commands. It is more complicated to fuzz than expr(1), as it can receive input from several sources, including command line parameters (commands), standard input (text to be operated on) and files (both commands and text). After reading the source code of sed(1), I made two observations:

  1. The commands are added by the add_compunit function
  2. The input files (including standard input) are organized by the s_flist structure and the mf_fgets function

With these observations, we can manually parse the libFuzzer buffer and feed it through the interfaces above. So I organized the buffer as below:
    command #1
    command #2
    ...
    command #N
        // an empty line
    text strings
  
The first several lines are the commands, one command per line. An empty line then marks the end of the command list. Finally, the remaining part of the buffer is the text to be operated on. After parsing the buffer like this, we can add the commands one by one with the add_compunit interface. For the text, since the whole text is already available as a buffer, I re-implemented the mf_fgets interface to get the input directly from the buffer provided by libFuzzer.
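
The first parsing step, separating the commands from the text at the first empty line, could be sketched like this. The struct and function are hypothetical helpers, not the project's code, and the sketch treats input without an empty line as consisting of commands only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sed_input {
    const uint8_t *cmds;      /* command lines, one command per line */
    size_t         cmds_len;
    const uint8_t *text;      /* text to be operated on */
    size_t         text_len;
};

/* Split the buffer at the first empty line, i.e. the first "\n\n". */
static struct sed_input
split_sed_input(const uint8_t *data, size_t size)
{
    struct sed_input in = {
        .cmds = data, .cmds_len = size, .text = data, .text_len = 0
    };
    for (size_t i = 0; i + 1 < size; i++) {
        if (data[i] == '\n' && data[i + 1] == '\n') {
            in.cmds_len = i + 1;         /* keep the newline ending the last command */
            in.text = data + i + 2;      /* skip the empty line */
            in.text_len = size - (i + 2);
            break;
        }
    }
    return in;
}
```

Each line of the cmds part would then be handed to add_compunit, and the text part would back the re-implemented mf_fgets.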

As mentioned before for expr(1), exit(3) results in false positives with libFuzzer. Replacing exit(3) with a return statement solves this problem in expr(1), but it does not work in sed(1) because of the deeper function call stack. The exit(3) interface is usually used to handle unexpected cases in programs, so it would be a good idea to replace it with exceptions. Unfortunately, the programs we fuzzed are all implemented in C rather than C++. In the end, I chose to use the setjmp/longjmp interfaces: setjmp defines an exit point in the LLVMFuzzerTestOneInput function, and longjmp jumps to this point whenever the original implementation wants to call exit(3).
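
The setjmp/longjmp pattern can be sketched as below. fuzz_exit and process_input are hypothetical names: the first stands in for the replacement of exit(3), the second for a deep call path inside the fuzzed program.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static jmp_buf fuzz_exit_point;
static int last_exit_status;

/* Replacement for exit(3) inside the fuzzed program: unwind, do not exit. */
static void
fuzz_exit(int status)
{
    last_exit_status = status;
    longjmp(fuzz_exit_point, 1);
}

/* Hypothetical call path that would normally call exit(3) on error. */
static void
process_input(const uint8_t *data, size_t size)
{
    if (size > 0 && data[0] == '!')
        fuzz_exit(1);
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* setjmp returns 0 when called directly, 1 when reached via longjmp. */
    if (setjmp(fuzz_exit_point) == 0)
        process_input(data, size);
    /* Reached in both cases; any cleanup would go here. */
    return 0;
}
```

One design caveat: longjmp skips the normal return path, so anything allocated before the jump has to be released at the exit point, otherwise the sanitizers will report leaks.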

The dictionary file for it is like this:

    newline="\x0A"
    "a\\\"
    "b"
    "c\\\"
    "d"
    "D"
    "g"
    "G"
    "h"
    "H"
    "i\\\"
    "l"
    "n"
    "N"
    "p"
    "P"
    "q"
    "t"
    "x"
    "y"
    "!"
    ":"
    "="
    "#"
    "/"
  
And here is an initial test case:
    s/hello/hi/g

    hello, world!
  
which replaces "hello" with "hi" in the text "hello, world!". The fuzzing script for sed(1) can be found here.

Fuzz sh(1) with libFuzzer

sh(1) is the standard command interpreter for the system. I chose the evalstring function as the fuzzing entry point for sh(1). This function takes a string containing the commands to be executed, so we can pass the libFuzzer input buffer directly to this function to start fuzzing. The dictionary file we used looks like this:

    "echo"
    "ls"
    "cat"
    "hostname"
    "test"
    "["
    "]"
  
We can also add other commands and shell script syntax to this file to reproduce other conditions. An initial test case is also provided:
    echo "hello, world!"
  
You can also reproduce the fuzzing of sh(1) with this script.

Fuzz file(1) with libFuzzer

The fuzzing of file(1) has been done by Christos Zoulas in this project. Unlike the other programs on the list, its main functionality is provided by the libmagic library. As a result, we can directly fuzz the important functions of this library (e.g., magic_buffer).

Fuzz ping(8) with libFuzzer

ping(8) is quite different from all of the programs mentioned above: its main input source is the network instead of the command line, standard input or files. This is quite a challenge, because network data is usually received through the socket interface, which makes it more complex to transform a single buffer into the socket model.

Fortunately, ping(8) organizes all the network interfaces in the form of hooks registered in a structure. So I re-implemented all the necessary interfaces (including socket(2), recvfrom(2), sendto(2), poll(2), etc.) for ping(8). These re-implemented interfaces take the data from the libFuzzer buffer and transform it into the data accessed through the network interfaces. After that, we can use libFuzzer to fuzz the network data for ping(8). The script to reproduce this can be found here.
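
The hook idea can be illustrated with a single fake receive function. The structure layout and all names below are hypothetical; the real hook table in ping(8) has more entries and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical hook table with a single entry. */
struct net_hooks {
    ssize_t (*recv_packet)(void *buf, size_t len);
};

/* Remaining fuzzer input, consumed by the fake receive hook. */
static const uint8_t *fuzz_data;
static size_t fuzz_left;

/* Point the fake network at a new fuzzer buffer. */
static void
fuzz_set_input(const uint8_t *data, size_t size)
{
    fuzz_data = data;
    fuzz_left = size;
}

/* Fake receive: hand out the next bytes of the fuzzer buffer as a "packet". */
static ssize_t
fuzz_recv_packet(void *buf, size_t len)
{
    size_t n = len < fuzz_left ? len : fuzz_left;
    if (n > 0)
        memcpy(buf, fuzz_data, n);
    fuzz_data += n;
    fuzz_left -= n;
    return (ssize_t)n;
}

static const struct net_hooks fuzz_hooks = {
    .recv_packet = fuzz_recv_packet
};
```

The fuzzing entry point would call fuzz_set_input with the libFuzzer buffer and then run the program with fuzz_hooks registered instead of the real socket functions.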

Fuzz Userland Programs with Other Fuzzers

To compare libFuzzer with other fuzzers from different aspects, including modification effort, performance and functionality, we also fuzzed these five programs with AFL, honggfuzz and Radamsa.

Fuzz Programs with AFL and honggfuzz

AFL and honggfuzz can fuzz input from standard input and files. They both provide specific compilers (such as afl-cc, afl-clang, hfuzz-cc, hfuzz-clang, etc.) to fuzz programs with coverage information. So, the basic process to fuzz programs with them is to:

  1. Use the specific compilers to compile programs with necessary sanitizers
  2. Run the fuzzed programs with proper command line parameters

For detailed parameters, you can refer to the scripts for expr(1), sed(1), sh(1), file(1) and ping(8).

"Miniature Lop" (A kind of fuzzy lop) from Wikipedia / CC BY-SA 3.0

There is no need to do any modification to fuzz sed(1), sh(1) and file(1) with AFL and honggfuzz, because these programs mainly get input from standard input or files. But this doesn't mean that they can achieve the same functionalities as libFuzzer. For example, to fuzz the sed(1), you may also need to pass the commands in the command line parameters. This means that you need to manually specify the commands in the command line and you cannot fuzz them with AFL and honggfuzz, because they can only fuzz input from standard input and files. There is an option of reusing the modifications from the fuzzing process with libFuzzer, but we need to further add a main function for the fuzzed program.

"Höngg" (A quarter in district 10 in Zürich) by Ikiwaner / CC BY-SA 3.0

For expr(1) and ping(8), we need even more modifications than with the libFuzzer solution, because expr(1) mainly gets its input from command line parameters and ping(8) mainly gets its input from the network.

During this period, I also prepared a package to install honggfuzz for the pkgsrc-wip repository. To make it compatible with NetBSD, we also contributed improvements to the code in the official repository; for more details, you can refer to this pull request.

Fuzz Programs with Radamsa

Radamsa is a test case generator: it works by reading sample files and generating different interesting outputs. Radamsa does not depend on the fuzzed programs, only on the input samples, which means it does not record coverage information.

"The Moomins" ("Radamsa" is a word spoken by a creature in Moomins) from the comic book cover by Tove Jansson

With Radamsa, we can use scripts to fuzz different programs with different input sources. For the expr(1), we can generate the mutated string and store it to a variable in the shell script and then feed it to the expr(1) in command line parameters. For the sed(1), we can generate both command strings and text by Radamsa and then feed them by command line parameters and file separately. For both sh(1) and file(1), we can generate the needed input file by Radamsa in the shell scripts.

It seems that the combination of shell scripts and Radamsa can fuzz any kind of program, but it runs into problems with ping(8). Although Radamsa supports generating input cases as a network server or client, it doesn't support the ICMP protocol. This means that we cannot fuzz ping(8) without modifications or help from other applications.

Comparison Among Different Fuzzers

In this project, we have tried four different fuzzers: libFuzzer, AFL, honggfuzz and Radamsa. In this section, I will introduce a comparison from different aspects.

Modification of Fuzzing

For the programs mentioned above, the table below lists the lines of code we needed to modify, as a measure of porting difficulty:

                     expr(1)  sed(1)  sh(1)  file(1)  ping(8)
    libFuzzer            128      96     60       48      582
    AFL/honggfuzz        142       0      0        0      590
    Radamsa                0       0      0        0      N/A

As mentioned before, libFuzzer requires more modified lines for programs that mainly get input from standard input and files. However, for other programs (i.e., expr(1) and ping(8)), AFL and honggfuzz need more added lines of code to get input from these sources. As for Radamsa, since it only needs sample input data to generate outputs, it can fuzz all programs without modification except ping(8).

Binary Sizes

The binary sizes of these fuzzers should also be considered if we want to ship them with NetBSD. The following binary sizes are based on NetBSD-current with a nearly up-to-date LLVM (compiled from source) as an external toolchain:

                 Dependency  Compilers  Fuzzer  Tools   Total
    libFuzzer             0       56MB     N/A      0    56MB
    AFL                   0       24KB   292KB  152KB   468KB
    honggfuzz          36KB      840KB   124KB      0  1000KB
    Radamsa           588KB          0   608KB      0  1196KB

The above table shows the space needed to install the different fuzzers. The "Dependency" column shows the size of the dependent libraries; the "Compilers" column shows the size of the compilers used for re-compiling the fuzzed programs; the "Fuzzer" column shows the size of the fuzzer itself; and the "Tools" column shows the size of the analysis tools.

For libFuzzer, if the system already includes LLVM together with compiler-rt as the toolchain, we don't need extra space to import it. The fuzzer part of libFuzzer is compiled together with the user's program, so its size is not counted. The compiler size shown in this table is the size of the statically compiled clang compiler. If we compile it dynamically, there will be plenty of dependent libraries that should be considered. For AFL, there is no dependent library except libc, so the size is zero. It also comes with some tools such as afl-analyze and afl-cmin. honggfuzz depends on the libBlocksRuntime library, whose size is 36KB. This library is also included in the compiler-rt of LLVM, so if you have already installed it, this size can be ignored. As for Radamsa, it needs Owl Lisp during the build process, so the size of the dependency is the size of the Owl Lisp interpreter.

Compiler Compatibility

All these fuzzers except libFuzzer are compatible with both GCC and clang. The AFL and honggfuzz provide a wrapper for the native compiler, and the Radamsa does not care about the compilers. As for the libFuzzer, it is implemented in the compiler-rt of LLVM, so it cannot support the GCC compiler.

Support for Sanitizers

All these fuzzers can work together with sanitizers, but only libFuzzer can provide a relatively strong guarantee of that support. AFL and honggfuzz, as mentioned above, provide wrappers for the underlying compiler, so whether they can fuzz programs with sanitizers depends on the native compiler. Radamsa can only fuzz the binary directly, so the programs must be compiled with the sanitizers first. With libFuzzer, however, since the sanitizers live in compiler-rt together with libFuzzer, you can directly add the sanitizer flags while compiling the fuzzed programs.

Performance

Finally, you may wonder how fast these fuzzers are at finding an existing bug. For the programs we fuzzed in NetBSD, only libFuzzer found bugs: two, in expr(1). However, we cannot assert that libFuzzer performs better than the others. To further evaluate the performance of the different fuzzers we used, I chose some simple functions with bugs to measure how fast the fuzzers can find them. The table below shows the time each fuzzer took to find the first bug:

                         libFuzzer  AFL      honggfuzz  Radamsa
    DivTest+S            <1s        7s       1s         7s
    DivTest              >10min     >10min   2s         >10min
    SimpleTest+S         <1s        >10min   1s         >10min
    SimpleTest           <1s        >10min   1s         >10min
    CxxStringEqTest+S    <1s        >10min   2s         >10min
    CxxStringEqTest      >10min     >10min   2s         >10min
    CounterTest+S        1s         5min     1s         7min
    CounterTest          1s         4min     1s         7min
    SimpleHashTest+S     <1s        3s       1s         2s

The "+S" symbol means the version with sanitizers (in this evaluation, I used address and undefined sanitizers). In this table, we can observe that libFuzzer and honggfuzz perform better than others in most cases. And another point is that fuzzers can work better with sanitizers. For example, in the case of DivTest, the primary goal of this test is to trigger a "divide-by-zero" error, however, when working with the undefined sanitizer, all these fuzzers will trigger the "integer overflow" error more quickly. I only present a part of the interesting results of this evaluation here. You can refer to this script to reproduce some results or do more evaluation by yourself.

Summary

Over the past month, I mainly contributed to:

  1. Porting the libFuzzer to NetBSD
  2. Preparing a pkgsrc-wip package for honggfuzz
  3. Fuzzing some userland programs with libFuzzer and other three different fuzzers
  4. Evaluating different fuzzers from different aspects

Regarding the third contribution, I tried different methods to handle the programs according to their features. During this period, I was fortunate to find two bugs in expr(1).

I'd like to thank my mentor Kamil Rytarowski and Christos Zoulas for their suggestions and proposals. I also want to thank Kamil Frankowicz for his advice on fuzzing and playing with AFL. Finally, thanks to Google and the NetBSD community for giving me a good opportunity to work on this project.

Posted early Friday morning, July 13th, 2018 Tags:

On July 7th and 8th there was pkgsrcCon 2018 in Berlin, Germany. It was my first pkgsrcCon and it was really really nice... So, let's share a report about it, what we have done, the talk presented and everything else!

Friday (06/07): Social Event

I arrived by plane at Berlin Tegel Airport in the middle of the afternoon. The TXL buses were pretty full, but after waiting for 3 of them I was finally on my way to Berlin Hauptbahnhof (the nice thing about the buses is that once they get too full, they start to arrive minute after minute!). I then took the S7 to Berlin Jannowitzbrücke station, just a couple of minutes on foot from republik-berlin (for the Friday social event).

At 18:00 we met in republik-berlin for the social event. We had good burgers there and one^Wtwo^Wsome beers together!

The place was a bit noisy because of the Belgium vs Brazil World Cup match, but we still had nice discussions together (and also without losing a lot of people cheering on! :))

There was also a table tennis table, and spz, maya, youri and I played (I'm a terrible table tennis player, but it was great fun to play wild west style, without any rules! :)).

Saturday (07/07): Talks session

Meet & Greet -- Pierre Pronchery (khorben), Thomas Merkel (tm)

Pierre and Thomas welcomed us (aliens! :)) in c-base. c-base is a space station under Berlin (or probably one of the oldest hackerspaces, old enough that the word "hackerspace" didn't even exist!).

Slides (PDF) are available!

Keynote: Beautiful Open Source -- Hugo Teso

Hugo talked about his experience as an open source developer and focused in particular on how important the user interface is.

He discussed this by examining some projects he worked on (Inguma, Bokken, Iaitö and Cutter) and extracting patterns from his experience.

Slides (PDF) are available!

The state of desktops in pkgsrc -- Youri Mouton (youri)

Youri discussed the state of desktop environments (DE) in pkgsrc, starting with xfce, MATE, LXDE, KDE and Defora.

He then discussed the WIP desktop environments (Cinnamon, LXQT, Gnome 3 and CDE), hardware support and login managers.

Help is more than welcome, especially for the WIP desktop environments, so if you're interested in any of that and would like to help (that's also a great way to get involved in pkgsrc!), please get in touch with youri and/or take a look at the wip/*/TODO files in pkgsrc-wip!

NetBSD & Mercurial: One year later -- Jörg Sonnenberger (joerg)

Jörg started by discussing Git (citing High-level Problems with Git and How to Fix Them by Gregory Szorc) and then discussed why to use Mercurial.

He then announced the latest changes: hgmaster.NetBSD.org and anonhg.NetBSD.org, which make it possible to experiment with Mercurial, and the source-changes-hg@ and pkgsrc-changes-hg@ mailing lists.

The talk ended with a description of the missing/TODO steps.

Slides (HTML) are available!

Maintaining qmail in 2018 -- Amitai Schleier (schmonz)

Amitai shared his long experience in maintaining qmail.

He shared a lot of lessons learned along the way, and it was also fun to see how, starting as MAINTAINER, he became more and more involved, ending up writing patches and tools for qmail.

Slides (HTML) are available!

A beginner's introduction to GCC -- Maya Rashish (maya)

Maya discussed GCC. She first gave an overview of the toolchain (in general) and the corresponding GCC projects, how to pass flags to each of them and how to stop the compilation process at each of them.

She then talked about the black magic that happens in the preprocessor: for example, what an #include <math.h> does to a program and why __NetBSD__ is defined.

We then saw that with -save-temps it is possible to save all intermediate results, and how helpful this is for debugging possible problems.

The compiler, assembler and linker were then discussed. We also looked at specfiles, readelf and other GCC internals.

Slides (HTML) are available!

Handling the workflow of pkgsrc-security -- Leonardo Taccari (leot)

I discussed the workflow of the pkgsrc Security Team (pkgsrc-security).

I gave a brief introduction to the nmh (new MH) message handling system.

I then talked about the mission, tasks and workflow of pkgsrc-security.

For the last part of the talk, I tried to put everything together and showed how to automate some parts of the pkgsrc-security workflow with nmh and some shell scripting.

Slides (PDF) are available!

Preaching for releng-pkgsrc -- Benny Siegert (bsiegert)

Benny discussed the pkgsrc Releng team (releng-pkgsrc).

The talk started with the pkgsrc quarterly releases. Since 2003Q4, a new pkgsrc release has been made every quarter. Stable releases are the basis for binary packages. Security, build and bug fixes get applied over the lifetime of the release via pullups, until the next quarterly release. The release procedure and freeze period were also discussed.

Then we examined the life of a pullup. Benny first introduced what a pullup is, the rules for requesting one and a practical example of how to file a good pullup request. The under-the-hood parts of releng were also discussed, for example how tickets are handled with req, a help script to ease the pullup, etc.

The talk concluded with the importance of releng-pkgsrc and a call for volunteers to join releng-pkgsrc! (Although they're doing really great work, at the moment there is a shortage of members in releng-pkgsrc, so if you are interested and would like to join them, please get in touch with them!)

Something old, something new, something borrowed -- Sevan Janiyan (sevan)

Sevan discussed the state of the NetBSD/macppc port.

A lot of improvements and news have happened (particular kudos to macallan for doing amazing work on the macppc port!). HEAD-llvm builds for macppc were added; awacs(4) Bluetooth support, IPsec support, Veriexec support are all enabled by default now.

radeonfb(4) and the XCOFF boot loader had several improvements, and DVI is now supported on the G4 Mac Mini.

The other big news in macppc land is G5 support, which will probably also be interesting for possible pkgsrc bulk builds.

Sevan also discussed some current problems (and workarounds!): bulk builds take time and no modern browser with JavaScript support is easily available right now. He also described how using the macppc port helped to spot several bugs.

He then discussed Upspin (please also take a look at the corresponding package in wip/go-upspin!).

Slides (PDF) are available!

Magit -- Christoph Badura (bad)

Christoph's talk was a live introduction to Magit, a Git interface for Emacs.

The talk started by quoting James Mickens' talk It Was Never Going to Work, So Let's Have Some Tea, presented at USENIX LISA15, in which James Mickens gave a high-level picture of how Git works.

We then saw how to clone a repository inside Magit, how to navigate the commits, how to create a new branch, edit a file and look at unstaged changes, stage just some hunks of a change and commit them, and how to rebase them (everything is just one or two keystrokes away!).

Post conf dinner

After the talks we had some burgers and beers together at Spud Bencer.

We formed several groups to go there from c-base and I was actually in the group that went there on foot so it was also a nice chance to sightsee Berlin (thanks to khorben for being a very nice guide! :)).

Sunday (08/07): Hacking session

An introduction to Forth -- Valery Ushakov (uwe)

On Sunday morning Valery talked about Forth from the ground up.

We saw how to implement a Forth interpreter step by step and discussed threaded code.

Unfortunately the talk was not recorded... However, if you are curious, I suggest taking a look at the nbuwe/forth BitBucket repository. The internals.txt file also contains a lot of interesting resources about Forth.


Hacking session

After Valery's talk there was the hacking session, where we hacked on pkgsrc, discussed together, etc.

Late in the afternoon some of us visited the Computerspielemuseum.

More than 50 years of computer games are covered there, and it was fun to also play several historical and more recent video games.

We then met again for a dinner together in Potsdamer Platz.

Group photograph of the pkgsrcCon 2018 kindly taken by Gilberto Taccari

Conclusion

pkgsrcCon 2018 was really really great!

First of all I would like to thank all the pkgsrcCon organizers: khorben and tm. It was very well organized and everything went well, thank you Pierre and Thomas!

A big thank you also to wiedi: just a few hours later all the recordings of the talks were shared, and that's really impressive!

Thanks also to youri and Gilberto for photographs.

Last, but not least, thanks to The NetBSD Foundation for supporting three developers to attend the conference, to c-base for kindly providing a very nice location for pkgsrcCon, and to our sponsors: Defora Networks for sponsoring the t-shirts and badges for the conference and SkyLime for sponsoring the catering on Saturday.

Thank you!

Posted early Saturday morning, July 14th, 2018 Tags:

Prepared by Keivan Motavalli as part of GSoC 2018.

Packages may install code (both machine executable code and interpreted programs), documentation and manual pages, source headers, shared libraries and other resources such as graphic elements, sounds, fonts, document templates, translations and configuration files, or a combination of them.

Configuration files are usually the means by which the behaviour of software without a user interface is specified. This covers parts of the operating system, network daemons and programs in general that don't come with an interactive graphical or textual interface as the principal means for setting options.

System wide configuration for operating system software tends to be kept under /etc, while configuration for software installed via pkgsrc ends up under LOCALBASE/etc (e.g., /usr/pkg/etc).

Software packaged as part of pkgsrc provides example configuration files, if any, which usually get extracted to LOCALBASE/share/examples/PKGBASE/.

After a package has been extracted, prepending the PREFIX (usually LOCALBASE) to the relative file paths listed in the PLIST file, metadata entries (such as +BUILD_INFO, +DESC, etc.) get extracted to PKG_DBDIR/PKGNAME-PKGVERSION (creating files under /usr/pkg/pkgdb/tor-0.3.2.10, as an example).

Some shell scripts also get extracted there, such as +INSTALL and +DEINSTALL. These incorporate further snippets that get copied out to distinct files after pkg_add executes the +INSTALL script with UNPACK as argument.

Two main frameworks exist taking care of installation and deinstallation operations: pkgtasks, still experimental, is structured as a library of POSIX-compliant shell scripts implementing functions that get included from LOCALBASE/share/pkgtasks-1 and called by the +INSTALL and +DEINSTALL scripts upon execution.

Currently pkgsrc defaults to using the pkginstall framework which, as mentioned, copies separate, monolithic scripts out of the main file. These handle the creation and removal of directories on the system outside the PKGBASE, user accounts, shells, the setup of fonts... Among these and other duties, +FILES ADD, as called by +INSTALL, copies files from the PKGBASE to the system with the correct permissions, if required by parts of the package such as init scripts and configuration files.

Files to be copied are added as comments to the script at package build time; here's an example:

# FILE: /etc/rc.d/tor cr share/examples/rc.d/tor 0755
# FILE: etc/tor/torrc c share/examples/tor/torrc.sample 0644

"c" indicates that LOCALBASE/share/examples/rc.d/tor is to be copied in place to /etc/rc.d/tor with permissions 755, "r" that it is to be handled as an rc.d script.

LOCALBASE/share/examples/tor/torrc.sample, the example file coming with default configuration options for the tor network daemon, is to be copied to LOCALBASE/etc/tor/torrc.
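To make the entry format concrete, here is a minimal sketch that spells out what a single "# FILE:" line asks for. The helper name describe_file_entry is hypothetical and not part of pkginstall; it only interprets the "c" and "r" flags described above and assumes relative paths are resolved against LOCALBASE.

```shell
#!/bin/sh
# Sketch only: describe_file_entry is a hypothetical helper, not pkginstall code.
# It explains one "# FILE:" entry in plain words.
describe_file_entry() {
    # Drop the "# FILE: " marker, then split the remaining fields.
    entry=${1#"# FILE: "}
    read -r file flags example mode _rest <<EOF
$entry
EOF
    # The "r" flag marks an rc.d script; anything else is treated here
    # as a configuration file.
    case $flags in
    *r*) kind="rc.d script" ;;
    *)   kind="configuration file" ;;
    esac
    # Absolute targets are used as-is; relative ones are assumed to live
    # under LOCALBASE.
    case $file in
    /*) dest=$file ;;
    *)  dest="LOCALBASE/$file" ;;
    esac
    echo "copy LOCALBASE/$example to $dest (mode $mode, $kind)"
}
```

Calling it with the torrc entry shown above prints "copy LOCALBASE/share/examples/tor/torrc.sample to LOCALBASE/etc/tor/torrc (mode 0644, configuration file)".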

As of today, this only happens if the package has never been installed before and said configuration file doesn't already exist on the system. This avoids overwriting explicit option changes made by the user (or site administrator) when upgrading or reinstalling packages.

Let's see how it's done... actions are defined under case switches:

case $ACTION in
ADD)
        ${SED} -n "/^\# FILE: /{s/^\# FILE: //;p;}" ${SELF} | ${SORT} -u |
        while read file f_flags f_eg f_mode f_user f_group; do
	…
	case "$f_flags:$_PKG_CONFIG:$_PKG_RCD_SCRIPTS" in
	*f*:*:*|[!r]:yes:*|[!r][!r]:yes:*|[!r][!r][!r]:yes:*|*r*:yes:yes)
	if ${TEST} -f "$file"; then
		${ECHO} "${PKGNAME}: $file already exists"
	elif ${TEST} -f "$f_eg" -o -c "$f_eg"; then
		${ECHO} "${PKGNAME}: copying $f_eg to $file"
		${CP} $f_eg $file
		[...]
[...]

Programs and commands are called using variables set in the script and replaced with platform specific paths at build time, using the FILES_SUBST facility (see mk/pkginstall/bsd.pkginstall.mk) and platform tools definitions under mk/tools.

To keep revisions of example configuration files, +FILES needs to be modified to always store them in a version control system (VCS), and to attempt merging changes non-interactively when a configuration file is already installed on the system.

In order to avoid breakage, installed configuration is backed up first in the VCS, separating user-modified files from files that have been already automatically merged in the past, in order to allow the administrator to easily restore the last manually edited file in case of breakage.

Branches are deliberately not used, since not everyone may wish to get familiar with version control systems technicalities when attempting to make a broken system work again.

Here's what the modified pkginstall +FILES script does when installing spamd:

		case "$f_flags:$_PKG_CONFIG:$_PKG_RCD_SCRIPTS" in
		*f*:*:*|[!r]:yes:*|[!r][!r]:yes:*|[!r][!r][!r]:yes:*|*r*:yes:yes)
		if ${TEST} "$_PKG_RCD_SCRIPTS" = "no" -a ! -n "$NOVCS"; then

VCS functionality only applies to configuration files, not to rc.d scripts, and only if the environment variable $NOVCS is unset. Set it to any value - yes will work :) - to disable the handling of configuration file revisions.

A small note: these options could, in the future, be parsed by pkg_add from some configuration file and passed on by calling setenv before executing +INSTALL, without the need to pass them as arguments and thus minimizing code path changes.

$VCSDIR is used to set a working directory for VCS functionality different from the default one, VARBASE/confrepo.
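As a rough illustration of how these two variables interact, the following sketch resolves them the way the text describes. The function names vcs_enabled and vcs_workdir are hypothetical, and the fallback of VARBASE to /var is an assumption taken from the /var/confrepo paths seen in the installation output in this post.

```shell
#!/bin/sh
# Sketch only: vcs_enabled and vcs_workdir are hypothetical helper names.

# Succeeds when revision tracking should run, i.e. NOVCS is unset or empty.
vcs_enabled() {
    [ -z "${NOVCS:-}" ]
}

# Prints the VCS working directory: $VCSDIR if set, else VARBASE/confrepo.
# Falling back to /var for VARBASE is an assumption for this example.
vcs_workdir() {
    echo "${VCSDIR:-${VARBASE:-/var}/confrepo}"
}
```

With nothing set in the environment, vcs_enabled succeeds and vcs_workdir prints /var/confrepo; exporting NOVCS=yes makes vcs_enabled fail.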

VCSDIR/automergedfiles is a plain-text list of the absolute paths of installed configuration files that have already been automatically merged in the past during package upgrades.

Manually remove entries from the list when you make manual configuration changes after a package has been automatically merged!
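Removing an entry can be done in any text editor, but a small sketch may help if you want to script it. The helper forget_automerged is hypothetical; it assumes the list holds exactly one absolute path per line, as described above.

```shell
#!/bin/sh
# Sketch only: forget_automerged is a hypothetical helper.
# Usage: forget_automerged LIST PATH
# Removes the line that is exactly PATH from the automergedfiles list.
forget_automerged() {
    list=$1
    path=$2
    # Nothing to do if the list has not been created yet.
    [ -f "$list" ] || return 0
    # -F: fixed string, -x: match whole lines only, -v: keep the others.
    grep -vFx -e "$path" "$list" > "$list.new"
    # grep exits 1 when no lines are left, which is fine; 2 or more is an error.
    [ $? -le 1 ] || { rm -f "$list.new"; return 1; }
    mv "$list.new" "$list"
}
```

For example, forget_automerged /var/confrepo/automergedfiles /usr/pkg/etc/spamd.conf would make the next upgrade treat spamd.conf as user-edited again.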

And don't worry: automatic merging is disabled by default; set $VCSAUTOMERGE to enable it.

When a configuration file already exists on the system and is absent from VCSDIR/automergedfiles, it is assumed to be user-edited and is copied to VCSDIR/user/path/to/installed/file, a working file that is REGISTERed (added and committed) to the version control system.
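The decision between the two backup areas can be sketched as a simple lookup in the list. The helper backup_path is hypothetical and only mirrors the rule stated in the text: files listed in automergedfiles go under VCSDIR/automerged, everything else under VCSDIR/user.

```shell
#!/bin/sh
# Sketch only: backup_path is a hypothetical helper.
# Usage: backup_path VCSDIR INSTALLED_FILE
# Prints where the currently installed file would be backed up.
backup_path() {
    vcsdir=$1
    path=$2
    list="$vcsdir/automergedfiles"
    if [ -f "$list" ] && grep -qFx -e "$path" "$list"; then
        # Already merged automatically in the past.
        echo "$vcsdir/automerged$path"
    else
        # Not in the list: assumed to carry manual edits.
        echo "$vcsdir/user$path"
    fi
}
```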

Check it out and restore it from there in case of breakage!

If the file is about to get automatically merged, and the operation already succeeded in the past, then you can find automatically merged revisions of installed configuration files under VCSDIR/automerged/path/to/installed/file; check out the required revision!

A new script, +VERSIONING, handles operations such as PREPARE (checks that a VCS repository is initialized), REGISTER (adds a configuration file from the working directory to the repo), COMMIT (commits multiple REGISTER actions after all configuration has been handled by the +FILES script, for VCSs that support atomic transactions), CHECKOUT (checks out the last revision of a file to the working directory) and CHECKOUT-FIRST (checks out the first revision of a file).
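To give a feel for how such actions could map onto a backend, here is a dry-run sketch for RCS that only prints commands and never runs them. The function versioning_plan and the exact command lines are assumptions made for illustration, not the contents of the real +VERSIONING script; ci and co are the standard RCS check-in and check-out tools, and 1.1 is the first revision number RCS assigns.

```shell
#!/bin/sh
# Sketch only: versioning_plan is hypothetical and prints, but never runs,
# an RCS command that each action could correspond to.
versioning_plan() {
    action=$1
    file=$2
    case $action in
    PREPARE)
        # RCS needs no repository setup beyond an existing directory.
        echo "mkdir -p $(dirname "$file")" ;;
    REGISTER)
        # Check in a new revision and keep a locked working copy.
        echo "ci -q -l -m'new revision' $file" ;;
    COMMIT)
        # RCS has no atomic multi-file commit, so nothing is left to do.
        echo ":" ;;
    CHECKOUT)
        echo "co -q -f $file" ;;
    CHECKOUT-FIRST)
        echo "co -q -f -r1.1 $file" ;;
    *)
        echo "unknown action: $action" >&2
        return 1 ;;
    esac
}
```

Printing the plan instead of executing it keeps the sketch safe to try on a system without RCS installed.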

The version control system to be used as a backend can be set through $VCS. It defaults to RCS, the Revision Control System, which works only locally and doesn't support atomic transactions.

It will get set up as a tool when bootstrapping pkgsrc on platforms that don't already come with it.

Other backends such as CVS are supported and more will come; since these are used at the explicit request of the administrator, they need to be already installed and placed in a directory that is part of $PATH.

Let's see what happens with rcs when NOVCS is unset, installing spamd (for the first time).

cd pkgsrc/mail/spamd
# bmake
=> Bootstrap dependency digest>=20010302: found digest-20160304
===> Skipping vulnerability checks.
> Fetching spamd-20060330.tar.gz
[...]
bmake install
===> Installing binary package of spamd-20060330nb2
spamd-20060330nb2: Creating group ``_spamd''
spamd-20060330nb2: Creating user ``_spamd''
useradd: Warning: home directory `/var/chroot/spamd' doesn't exist, and -m was not specified
rcs: /var/confrepo/defaults//usr/pkg/etc/RCS/spamd.conf,v: No such file or directory
/var/confrepo/defaults//usr/pkg/etc/spamd.conf,v  <--  /var/confrepo/defaults//usr/pkg/etc/spamd.conf
initial revision: 1.1
done
REGISTER /var/confrepo/defaults//usr/pkg/etc/spamd.conf
spamd-20060330nb2: copying /usr/pkg/share/examples/spamd/spamd.conf to /usr/pkg/etc/spamd.conf
===========================================================================
The following files should be created for spamd-20060330nb2:

        /etc/rc.d/pfspamd (m=0755)
            [/usr/pkg/share/examples/rc.d/pfspamd]

===========================================================================
===========================================================================
$NetBSD: MESSAGE,v 1.1.1.1 2005/06/28 12:43:57 peter Exp $

Don't forget to add the spamd ports to /etc/services:

spamd           8025/tcp                # spamd(8)
spamd-cfg       8026/tcp                # spamd(8) configuration

===========================================================================

/usr/pkg/etc/spamd.conf didn't already exist, so in the end, as usual, the example/default configuration /usr/pkg/share/examples/spamd/spamd.conf gets copied to PKG_SYSCONFDIR/spamd.conf.

The modified +FILES script also copied the example file into the VCS working directory, at /var/confrepo/defaults/usr/pkg/etc/spamd.conf, and then REGISTERed this (initial) revision of the default configuration with RCS.

When installing an updated (ouch!) spamd package, the installed configuration at /usr/pkg/etc/spamd.conf won't get touched, but a new revision of share/examples/spamd/spamd.conf will get stored using the revision control system.

For VCSs that support them, remote repositories can also be used via $REMOTEVCS. From the +VERSIONING comment:

REMOTEVCS, if set, must contain a string that the chosen VCS understands as
an URI to a remote repository, including login credentials if not specified
through other means. This is non standard across different backends, and
additional environment variables and cryptographic material 
may need to be provided.  

So, when using CVS to access a remote repository over ssh, one should set up keys on the systems, then set and export

VCS=cvs
CVS_RSH=/usr/bin/ssh
REMOTEVCS=user@hostname:/path/to/existing/repo

Remember to initialize (e.g., mkdir -p /path/to/repo; cd /path/to/repo; cvs init) the repository on the remote system before attempting to install new packages.

Let's try to make a configuration change to spamd.conf and reinstall it:

I will enable whitelists by uncommenting

#whitelist:\
#       :white:\
#       :method=file:\
#       :file=/var/mail/whitelist.txt:

...and enable automerge:

export VCSAUTOMERGE=yes
bmake install
[...]
merged with no conflict. installing it to /usr/pkg/etc/spamd.conf!

No conflicts get reported, and diff shows no output since the installed file is already identical to the automerged one, which is installed again and contains the whitelisting options uncommented:

more /usr/pkg/etc/spamd.conf

# Whitelists are done like this, and must be added to "all" after each
# blacklist from which you want the addresses in the whitelist removed.
#
whitelist:\
        :white:\
        :method=file:\
        :file=/var/mail/whitelist.txt:

Let's simulate instead the addition of a new configuration option in a new package revision: this shouldn't generate conflicts!

bmake extract
===> Extracting for spamd-20060330nb2
vi work/spamd-20060330/etc/spamd.conf
# spamd config file, read by spamd-setup(8) for spamd(8)
#
# See spamd.conf(5)
# this is a new comment!
#

save, run bmake; bmake install:

===> Installing binary package of spamd-20060330nb2
RCS file: /var/confrepo/defaults//usr/pkg/etc/spamd.conf,v
done
/var/confrepo/defaults//usr/pkg/etc/spamd.conf,v  <--  /var/confrepo/defaults//usr/pkg/etc/spamd.conf
new revision: 1.9; previous revision: 1.8
done
REGISTER /var/confrepo/defaults//usr/pkg/etc/spamd.conf
spamd-20060330nb2: /usr/pkg/etc/spamd.conf already exists
spamd-20060330nb2: attempting to merge /usr/pkg/etc/spamd.conf with new defaults!
saving the currently installed revision to /var/confrepo/automerged//usr/pkg/etc/spamd.conf
RCS file: /var/confrepo/automerged//usr/pkg/etc/spamd.conf,v
done
/var/confrepo/automerged//usr/pkg/etc/spamd.conf,v  <--  /var/confrepo/automerged//usr/pkg/etc/spamd.conf
file is unchanged; reverting to previous revision 1.1
done
/var/confrepo/defaults//usr/pkg/etc/spamd.conf,v  -->  /var/confrepo/defaults//usr/pkg/etc/spamd.conf
revision 1.1
done
merged with no conflict. installing it to /usr/pkg/etc/spamd.conf!
--- /usr/pkg/etc/spamd.conf     2018-07-09 22:21:47.310545283 +0200
+++ /var/confrepo/defaults//usr/pkg/etc/spamd.conf.automerge    2018-07-09 22:29:16.597901636 +0200
@@ -5,6 +5,7 @@
 # See spamd.conf(5)
 #
 # Configures whitelists and blacklists for spamd
+# this is a new comment!
 #
 # Strings follow getcap(3) convention escapes, other than you
 # can have a bare colon (:) inside a quoted string and it
revert from the last revision of /var/confrepo/automerged//usr/pkg/etc/spamd.conf if needed
===========================================================================
The following files should be created for spamd-20060330nb2:

        /etc/rc.d/pfspamd (m=0755)
            [/usr/pkg/share/examples/rc.d/pfspamd]

===========================================================================
===========================================================================
$NetBSD: MESSAGE,v 1.1.1.1 2005/06/28 12:43:57 peter Exp $

Don't forget to add the spamd ports to /etc/services:

spamd           8025/tcp                # spamd(8)
spamd-cfg       8026/tcp                # spamd(8) configuration

===========================================================================
more /usr/pkg/etc/spamd.conf

[...]
# See spamd.conf(5)
#
# Configures whitelists and blacklists for spamd
# this is a new comment!
#
# Strings follow getcap(3) convention escapes, other than you
[...]
# Whitelists are done like this, and must be added to "all" after each
# blacklist from which you want the addresses in the whitelist removed.
#
whitelist:\
        :white:\
        :method=file:\
        :file=/var/mail/whitelist.txt:

We're set for now. In case of merge conflicts, the user is notified, the installed configuration file is not replaced, and the conflict can be resolved manually by opening the file (as an example, /var/confrepo/defaults/usr/pkg/etc/spamd.conf.automerge) in a text editor.
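The overall merge logic can be sketched as a three-way merge between the installed file, the previous default and the new default. Note that this example uses diff3 -m (as found in GNU diffutils) as a stand-in, not the RCS-based merge the modified +FILES script actually performs, and that the function name automerge and its argument order are made up for illustration.

```shell
#!/bin/sh
# Sketch only: automerge is a hypothetical helper using diff3 as a stand-in.
# Usage: automerge INSTALLED OLD_DEFAULT NEW_DEFAULT OUT
# Merges the changes between OLD_DEFAULT and NEW_DEFAULT into INSTALLED.
automerge() {
    installed=$1
    old_default=$2
    new_default=$3
    out=$4
    # diff3 -m writes the merged result to stdout.
    # Exit status: 0 = clean merge, 1 = conflicts, 2 = trouble.
    diff3 -m "$installed" "$old_default" "$new_default" > "$out"
    status=$?
    if [ "$status" -eq 0 ]; then
        cp "$out" "$installed"
        echo "merged with no conflict"
        return 0
    fi
    # Leave the installed file untouched; OUT keeps any conflict markers
    # for manual resolution.
    echo "merge failed (status $status), resolve $out by hand" >&2
    return 1
}
```

The installed file is only overwritten after a clean merge, which matches the behaviour described above: on conflict the administrator edits the output file and installs it manually.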

Posted early Friday morning, July 20th, 2018 Tags:

The NetBSD Project is pleased to announce NetBSD 8.0, the sixteenth major release of the NetBSD operating system. It represents many bug fixes, additional hardware support and new security features. If you are running an earlier release of NetBSD, we strongly suggest updating to 8.0.

For more details, please see the release notes.

Complete source and binaries for NetBSD are available for download at many sites around the world and our CDN. A list of download sites providing FTP, AnonCVS, and other services may be found at the list of mirrors.

Posted early Sunday morning, July 22nd, 2018 Tags:

The NetBSD release engineering team is announcing a new support policy for our release branches. This affects NetBSD 8.0 and subsequent major releases (9.0, 10.0, etc.). All currently supported releases (6.x and 7.x) will keep their existing support policies.

Beginning with NetBSD 8.0, there will be no more teeny branches (e.g., netbsd-8-0).

This means that netbsd-8 will be the only branch for 8.x and there will be only one category of releases derived from 8.0: update releases. The first update release after 8.0 will be 8.1, the next will be 8.2, and so on. Update releases will contain security and bug fixes, and may contain new features and enhancements that are deemed safe for the release branch.

With this simplification of our support policy, users can expect:

  • More frequent releases
  • Better long-term support (example: quicker fixes for security issues, since there is only one branch to fix per major release)
  • New features and enhancements to make their way to binary releases faster (under our current scheme, no major release has received more than two feature updates in its life)

We understand that users of teeny branches may be concerned about the increased number of changes that update releases will bring. Historically, NetBSD stable branches (e.g., netbsd-7) have been managed very conservatively. Under this new scheme, the release engineering team will be even more strict in what changes we allow on the stable branch. Changes that would create issues with backwards compatibility are not allowed, and any changes made that prove to be problematic will be promptly reverted.

The support policy we've had until now was nice in theory, but it has not worked out in practice. We believe that this change will improve the situation for the vast majority of NetBSD users.

Posted Wednesday afternoon, July 25th, 2018 Tags: