- One in the base-system that includes a significant set of local patches.
- Another one in pkgsrc whose patching is limited to mostly build fixes.
The base-system version of GDB (GPLv3) still relies on local patching to work. I have set a goal to reduce the number of custom patches to bare minimum, ideally achieving the state of GDB working without any local modifications at all.
GDB changes
Last month, the NetBSD/amd64 support was merged into gdbserver. This month, the gdbserver target support was extended to NetBSD/i386 and NetBSD/aarch64. The gdbserver and gdb code was cleaned up, refactored and made capable of introducing even more NetBSD targets.
Meanwhile, the NetBSD/i386 build of GDB was fixed. The missing include of x86-bsd-nat.h as a common header was added to i386-bsd-nat.h. The i386 GDB code for BSD contained a runtime assert that verified whether the locally hardcoded struct sigcontext is compatible with the system headers. In reality, the system headers are no longer using this structure since 2003, after the switch to ucontext_t, and the validating code was no longer effective. After the switch to newer GCC, this was reported as a unused local variable by the compiler. I have decided to remove the check on NetBSD entirely. This was followed up by a small build fix.
The NetBSD team has noticed that the GDB's agent.cc code contains a portability bug and prepared a local fix. The traditional behavior of the BSD kernel is that passing random values of sun_len (part of sockaddr_un) can cause failures. In order to prevent the problems, the sockaddr_un structure is now zeroed before use. I've reimplemented the fix and successfully upstreamed it.
In order to easily resolve the issue with environment hardening enforced by PaX MPROTECT, I've introduced a runtime warning whenever byte transfers betweeen the debugee and debugger occur with the EACCES errno code.
binutils changes
I've added support for NetBSD/aarch64 upstream, in GNU BFD and GNU GAS. NetBSD still carries local patches for the GNU binutils components, and GNU ld does not build out of the box on NetBSD/aarch64.
Summary
The NetBSD support in GNU binutils and GDB is improving promptly, and the most popular platforms of amd64, i386 and aarch64 are getting proper support out of the box, without downstream patches. The remaining patches for these CPUs include: streamlining kgdb support, adding native GDB support for aarch64, upstreaming local modifications from the GNU binutils components (especially BFD and ld) and introducing portability enhancements in the dependent projects like libiberty and gnulib. Then, the remaining work is to streamline support for the remaining CPUs (Alpha, VAX, MIPS, HPPA, IA64, SH3, PPC, etc.), to develop the missing generic features (such as listing open file descriptors for the specified process) and to fix failures in the regression test-suite.
- One in the base-system that includes a significant set of local patches.
- Another one in pkgsrc whose patching is limited to mostly build fixes.
The base-system version of GDB (GPLv3) still relies on local patching to work. I have set a goal to reduce the number of custom patches to bare minimum, ideally achieving the state of GDB working without any local modifications at all.
GDB changes
Last month, the NetBSD/amd64 support was merged into gdbserver. This month, the gdbserver target support was extended to NetBSD/i386 and NetBSD/aarch64. The gdbserver and gdb code was cleaned up, refactored and made capable of introducing even more NetBSD targets.
Meanwhile, the NetBSD/i386 build of GDB was fixed. The missing include of x86-bsd-nat.h as a common header was added to i386-bsd-nat.h. The i386 GDB code for BSD contained a runtime assert that verified whether the locally hardcoded struct sigcontext is compatible with the system headers. In reality, the system headers are no longer using this structure since 2003, after the switch to ucontext_t, and the validating code was no longer effective. After the switch to newer GCC, this was reported as a unused local variable by the compiler. I have decided to remove the check on NetBSD entirely. This was followed up by a small build fix.
The NetBSD team has noticed that the GDB's agent.cc code contains a portability bug and prepared a local fix. The traditional behavior of the BSD kernel is that passing random values of sun_len (part of sockaddr_un) can cause failures. In order to prevent the problems, the sockaddr_un structure is now zeroed before use. I've reimplemented the fix and successfully upstreamed it.
In order to easily resolve the issue with environment hardening enforced by PaX MPROTECT, I've introduced a runtime warning whenever byte transfers betweeen the debugee and debugger occur with the EACCES errno code.
binutils changes
I've added support for NetBSD/aarch64 upstream, in GNU BFD and GNU GAS. NetBSD still carries local patches for the GNU binutils components, and GNU ld does not build out of the box on NetBSD/aarch64.
Summary
The NetBSD support in GNU binutils and GDB is improving promptly, and the most popular platforms of amd64, i386 and aarch64 are getting proper support out of the box, without downstream patches. The remaining patches for these CPUs include: streamlining kgdb support, adding native GDB support for aarch64, upstreaming local modifications from the GNU binutils components (especially BFD and ld) and introducing portability enhancements in the dependent projects like libiberty and gnulib. Then, the remaining work is to streamline support for the remaining CPUs (Alpha, VAX, MIPS, HPPA, IA64, SH3, PPC, etc.), to develop the missing generic features (such as listing open file descriptors for the specified process) and to fix failures in the regression test-suite.
- One in the base-system that includes a significant set of local patches.
- Another one in pkgsrc whose patching is limited to mostly build fixes.
The base-system version of GDB (GPLv3) still relies on local patching to work. I have set a goal to reduce the number of custom patches to bare minimum, ideally achieving the state of GDB working without any local modifications at all.
GDB changes
Last month, the NetBSD/amd64 support was merged into gdbserver. This month, the gdbserver target support was extended to NetBSD/i386 and NetBSD/aarch64. The gdbserver and gdb code was cleaned up, refactored and made capable of introducing even more NetBSD targets.
Meanwhile, the NetBSD/i386 build of GDB was fixed. The missing include of x86-bsd-nat.h as a common header was added to i386-bsd-nat.h. The i386 GDB code for BSD contained a runtime assert that verified whether the locally hardcoded struct sigcontext is compatible with the system headers. In reality, the system headers are no longer using this structure since 2003, after the switch to ucontext_t, and the validating code was no longer effective. After the switch to newer GCC, this was reported as a unused local variable by the compiler. I have decided to remove the check on NetBSD entirely. This was followed up by a small build fix.
The NetBSD team has noticed that the GDB's agent.cc code contains a portability bug and prepared a local fix. The traditional behavior of the BSD kernel is that passing random values of sun_len (part of sockaddr_un) can cause failures. In order to prevent the problems, the sockaddr_un structure is now zeroed before use. I've reimplemented the fix and successfully upstreamed it.
In order to easily resolve the issue with environment hardening enforced by PaX MPROTECT, I've introduced a runtime warning whenever byte transfers betweeen the debugee and debugger occur with the EACCES errno code.
binutils changes
I've added support for NetBSD/aarch64 upstream, in GNU BFD and GNU GAS. NetBSD still carries local patches for the GNU binutils components, and GNU ld does not build out of the box on NetBSD/aarch64.
Summary
The NetBSD support in GNU binutils and GDB is improving promptly, and the most popular platforms of amd64, i386 and aarch64 are getting proper support out of the box, without downstream patches. The remaining patches for these CPUs include: streamlining kgdb support, adding native GDB support for aarch64, upstreaming local modifications from the GNU binutils components (especially BFD and ld) and introducing portability enhancements in the dependent projects like libiberty and gnulib. Then, the remaining work is to streamline support for the remaining CPUs (Alpha, VAX, MIPS, HPPA, IA64, SH3, PPC, etc.), to develop the missing generic features (such as listing open file descriptors for the specified process) and to fix failures in the regression test-suite.
This post is a follow up of the first report and second report. Post summarizes the work done during the third and final coding period for the Google Summer of Code (GSoc’20) project - Enhance Syzkaller support for NetBSD
Sys2syz
Sys2syz would give an extra edge to Syzkaller for NetBSD. It has a potential of efficiently automating the conversion of syscall definitions to syzkaller’s grammar. This can aid in increasing the number of syscalls covered by Syzkaller significantly with the minimum possibility of manual errors. Let’s delve into its internals.
A peek into Syz2syz Internals
This tool parses the source code of device drivers present in C to a format which is compatible with grammar customized for syzkaller. Here, we try to cull the details of the target device by compiling, and then collocate the details with our python code. For further details about proposed design for the tool, refer to previous post.
Python code follows 4 major steps:
- Extractor.py - Extraction of all ioctl commands of a given device driver along with their arguments from the header files.
- Bear.py - Preprocessing of the device driver's files using compile_commands.json generated during the setup of tool using Bear.
- C2xml.py - XML files are generated by running c2xml on preprocessed device files. This eases the process of fetching the information related to arguments of commands
- Description.py - Generates descriptions for the ioctl commands and their arguments (builtin-types, arrays, pointers, structures and unions) using the XML files.
Extraction:
This step involves fetching the possible ioctl commands for the target device driver and getting the files which have to be included in our dev_target.txt file. We have already seen all the commands for device drivers are defined in a specific way. These commands defined in the header files need to be grepped along with the major details, regex comes in as a rescue for this
io = re.compile("#define\s+(.*)\s+_IO\((.*)\).*")
iow = re.compile("#define\s+(.*)\s+_IOW\((.*),\s+(.*),\s+(.*)\).*")
ior = re.compile("#define\s+(.*)\s+_IOR\((.*),\s+(.*),\s+(.*)\).*")
iowr = re.compile("#define\s+(.*)\s+_IOWR\((.*),\s+(.*),\s+(.*)\).*")
Code scans through all the header files present in the target device folder and extracts all the commands along with their details using compiled regex expressions. Details include the direction of buffer(null, in, out, inout) based on the types of Ioctl calls(_IO, _IOR, _IOW, _IOWR) and the argument of the call. These are stored in a file named ioctl_commands.txt at location out/<target_name>. Example output:
out, I2C_IOCTL_EXEC, i2c_ioctl_exec_t
Preprocessing:
Preprocessing is required for getting XML files, about which we would look in the next step. Bear plays a major role when it comes to preprocessing C files. It records the commands executed for building the target device driver. This step is performed when setup.sh script is executed.
Extracted commands are modified with the help of parse_commands() function to include ‘-E’ and ‘-fdirectives’ flags and give it a new output location. Commands extracted by this function are then used by the compile_target function which filters out the unnecessary flags and generates preprocessed files in our output directory.
Generating XML files
Run C2xml on the preprocessed files to fetch XML files which stores source code in a tree-like structure, making it easier to collect all the information related to each and every element of structures, unions etc. For eg:
<symbol type="struct" id="_5970" file="am2315.i" start-line="13240" start-col="16" end-line="13244" end-col="11" bit-size="96" alignment="4" offset="0">
<symbol type="node" id="_5971" ident="ipending" file="am2315.i" start-line="13241" start-col="33" end-line="13241" end-col="41" bit-size="32" alignment="4" offset="0" base-type-builtin="unsigned int"/<
<symbol type="node" id="_5972" ident="ilevel" file="am2315.i" start-line="13242" start-col="33" end-line="13242" end-col="39" bit-size="32" alignment="4" offset="4" base-type-builtin="int"/>
<symbol type="node" id="_5973" ident="imasked" file="am2315.i" start-line="13243" start-col="33" end-line="13243" end-col="40" bit-size="32" alignment="4" offset="8" base-type-builtin="unsigned int"/>
</symbol>
<symbol type="pointer" id="_5976" file="am2315.i" start-line="13249" start-col="14" end-line="13249" end-col="25" bit-size="64" alignment="8" offset="0" base-type-builtin="void"/>
<symbol type="array" id="_5978" file="am2315.i" start-line="13250" start-col="33" end-line="13250" end-col="39" bit-size="288" alignment="4" offset="0" base-type-builtin="unsigned int" array-size="9"/>
We would further see how attributes like - idents, id, type, base-type-builtin etc conveniently helps us to analyze code and generate descriptions in a trouble-free manner .
Descriptions.py
Final part, which offers a txt file storing all the required descriptions as its output. Here, information from the xml files and ioctl_commands.txt are combined together to generate descriptions of ioctl commands and their arguments.
Xml files for the given target device are parsed to form trees,
for file in (os.listdir(self.target)):
tree = ET.parse(self.target+file)
self.trees.append(tree)
We then traverse through these trees to search for the arguments of a particular ioctl command (particularly _IOR, _IOW, _IOWR commands) by the name of the argument. Once an element with the same value for ident attribute is found, attributes of the element are further examined to get its type. Possible types for these arguments are - struct, union, enum, function, array, pointer, macro and node. Using the type information we determine the way to define the element in accordance with syzkaller’s grammar syntax.
Building structs and unions involves defining their elements too, XML makes it easier. Program analyses each and every element which is a child of the root (struct/union) and generates its definitions. A dictionary helps in tracking the structs/unions which have been already built. Later, the dictionary is used to pretty print all the structs and union in the output file. Here is a code snippet which depicts the approach
name = child.get("ident")
if name not in self.structs_and_unions.keys():
elements = {}
for element in child:
elem_type = self.get_type(element)
elem_ident = element.get("ident")
if elem_type == None:
elem_type = element.get("type")
elements[element.get("ident")] = elem_type
element_str = ""
for element in elements:
element_str += element + "\t" + elements[element] + "\n"
self.structs_and_unions[name] = " {\n" + element_str + "}\n"
return str(name)
Task of creating descriptions for arrays is made simpler due to the attribute - `array-size`. When it comes to dealing with pointers, syzkaller needs the user to fill in the direction of the pointer. This has already been taken care of while analyzing the ioctl commands in Extractor.py. The second argument with in/out/inout as its possible value depends on ‘fun’ macros - _IOR, _IOW, _IOWR respectively.
There is another category named as nodes which can be distinguished using the base-type-builtin and base-type attributes.
Result
Once the setup script for sys2syz is executed, sys2syz can be used for a certain target_device file by executing the python wrapper script (sys2syz.py) with :
#bin/sh
python sys2syz.py -t <absolute_path_to_device_driver_source> -c compile_commands.json -v
This would generate a dev_<device_driver>.txt file in the out directory. An example description file autogenerated by sys2syz for i2c device driver.
#Autogenerated by sys2syz
include
resource fd_i2c[fd]
syz_open_dev$I2C(dev ptr[in, string["/dev/i2c"]], id intptr, flags flags[open_flags]) fd_i2c
ioctl$I2C_IOCTL_EXEC(fd fd_i2c, cmd const[I2C_IOCTL_EXEC], arg ptr[out, i2c_ioctl_exec])
i2c_ioctl_exec {
iie_op flags[i2c_op_t_flags]
iie_addr int16
iie_buflen len[iie_buf, intptr]
iie_buf buffer[out]
iie_cmdlen len[iie_cmd, intptr]
iie_cmd buffer[out]
}
Future Work
Though we have a basic working structure of this tool, yet a lot has to be worked upon for leveling it up to make the best of it. Perfect goals would be met when there would be least of manual labor needed. Sys2syz still looks forward to automating the detection of macros used by the flag types in syzkaller. List of to-dos also includes extending syzkaller’s support for generation of description of syscalls.
Some other yet-to-be-done tasks include-
- Generating descriptions for function type
- Calculating attributes for structs and unions
Summary
We have surely reached closer to our goals but the project needs active involvement and incremental updates to scale it up to its full potential. Looking forward to much more learning and making more contribution to NetBSD community.
Atlast, a word of thanks to my mentors William Coldwell, Siddharth Muralee, Santhosh Raju and Kamil Rytarowski as well as the NetBSD organization for being extremely supportive. Also, I owe a big thanks to Google for giving me such a glaring opportunity to work on this project.
This post is a follow up of the first report and second report. Post summarizes the work done during the third and final coding period for the Google Summer of Code (GSoc’20) project - Enhance Syzkaller support for NetBSD
Sys2syz
Sys2syz would give an extra edge to Syzkaller for NetBSD. It has a potential of efficiently automating the conversion of syscall definitions to syzkaller’s grammar. This can aid in increasing the number of syscalls covered by Syzkaller significantly with the minimum possibility of manual errors. Let’s delve into its internals.
A peek into Syz2syz Internals
This tool parses the source code of device drivers present in C to a format which is compatible with grammar customized for syzkaller. Here, we try to cull the details of the target device by compiling, and then collocate the details with our python code. For further details about proposed design for the tool, refer to previous post.
Python code follows 4 major steps:
- Extractor.py - Extraction of all ioctl commands of a given device driver along with their arguments from the header files.
- Bear.py - Preprocessing of the device driver's files using compile_commands.json generated during the setup of tool using Bear.
- C2xml.py - XML files are generated by running c2xml on preprocessed device files. This eases the process of fetching the information related to arguments of commands
- Description.py - Generates descriptions for the ioctl commands and their arguments (builtin-types, arrays, pointers, structures and unions) using the XML files.
Extraction:
This step involves fetching the possible ioctl commands for the target device driver and getting the files which have to be included in our dev_target.txt file. We have already seen all the commands for device drivers are defined in a specific way. These commands defined in the header files need to be grepped along with the major details, regex comes in as a rescue for this
io = re.compile("#define\s+(.*)\s+_IO\((.*)\).*")
iow = re.compile("#define\s+(.*)\s+_IOW\((.*),\s+(.*),\s+(.*)\).*")
ior = re.compile("#define\s+(.*)\s+_IOR\((.*),\s+(.*),\s+(.*)\).*")
iowr = re.compile("#define\s+(.*)\s+_IOWR\((.*),\s+(.*),\s+(.*)\).*")
Code scans through all the header files present in the target device folder and extracts all the commands along with their details using compiled regex expressions. Details include the direction of buffer(null, in, out, inout) based on the types of Ioctl calls(_IO, _IOR, _IOW, _IOWR) and the argument of the call. These are stored in a file named ioctl_commands.txt at location out/<target_name>. Example output:
out, I2C_IOCTL_EXEC, i2c_ioctl_exec_t
Preprocessing:
Preprocessing is required for getting XML files, about which we would look in the next step. Bear plays a major role when it comes to preprocessing C files. It records the commands executed for building the target device driver. This step is performed when setup.sh script is executed.
Extracted commands are modified with the help of parse_commands() function to include ‘-E’ and ‘-fdirectives’ flags and give it a new output location. Commands extracted by this function are then used by the compile_target function which filters out the unnecessary flags and generates preprocessed files in our output directory.
Generating XML files
Run C2xml on the preprocessed files to fetch XML files which stores source code in a tree-like structure, making it easier to collect all the information related to each and every element of structures, unions etc. For eg:
<symbol type="struct" id="_5970" file="am2315.i" start-line="13240" start-col="16" end-line="13244" end-col="11" bit-size="96" alignment="4" offset="0">
<symbol type="node" id="_5971" ident="ipending" file="am2315.i" start-line="13241" start-col="33" end-line="13241" end-col="41" bit-size="32" alignment="4" offset="0" base-type-builtin="unsigned int"/<
<symbol type="node" id="_5972" ident="ilevel" file="am2315.i" start-line="13242" start-col="33" end-line="13242" end-col="39" bit-size="32" alignment="4" offset="4" base-type-builtin="int"/>
<symbol type="node" id="_5973" ident="imasked" file="am2315.i" start-line="13243" start-col="33" end-line="13243" end-col="40" bit-size="32" alignment="4" offset="8" base-type-builtin="unsigned int"/>
</symbol>
<symbol type="pointer" id="_5976" file="am2315.i" start-line="13249" start-col="14" end-line="13249" end-col="25" bit-size="64" alignment="8" offset="0" base-type-builtin="void"/>
<symbol type="array" id="_5978" file="am2315.i" start-line="13250" start-col="33" end-line="13250" end-col="39" bit-size="288" alignment="4" offset="0" base-type-builtin="unsigned int" array-size="9"/>
We would further see how attributes like - idents, id, type, base-type-builtin etc conveniently helps us to analyze code and generate descriptions in a trouble-free manner .
Descriptions.py
Final part, which offers a txt file storing all the required descriptions as its output. Here, information from the xml files and ioctl_commands.txt are combined together to generate descriptions of ioctl commands and their arguments.
Xml files for the given target device are parsed to form trees,
for file in (os.listdir(self.target)):
tree = ET.parse(self.target+file)
self.trees.append(tree)
We then traverse through these trees to search for the arguments of a particular ioctl command (particularly _IOR, _IOW, _IOWR commands) by the name of the argument. Once an element with the same value for ident attribute is found, attributes of the element are further examined to get its type. Possible types for these arguments are - struct, union, enum, function, array, pointer, macro and node. Using the type information we determine the way to define the element in accordance with syzkaller’s grammar syntax.
Building structs and unions involves defining their elements too, XML makes it easier. Program analyses each and every element which is a child of the root (struct/union) and generates its definitions. A dictionary helps in tracking the structs/unions which have been already built. Later, the dictionary is used to pretty print all the structs and union in the output file. Here is a code snippet which depicts the approach
name = child.get("ident")
if name not in self.structs_and_unions.keys():
elements = {}
for element in child:
elem_type = self.get_type(element)
elem_ident = element.get("ident")
if elem_type == None:
elem_type = element.get("type")
elements[element.get("ident")] = elem_type
element_str = ""
for element in elements:
element_str += element + "\t" + elements[element] + "\n"
self.structs_and_unions[name] = " {\n" + element_str + "}\n"
return str(name)
Task of creating descriptions for arrays is made simpler due to the attribute - `array-size`. When it comes to dealing with pointers, syzkaller needs the user to fill in the direction of the pointer. This has already been taken care of while analyzing the ioctl commands in Extractor.py. The second argument with in/out/inout as its possible value depends on ‘fun’ macros - _IOR, _IOW, _IOWR respectively.
There is another category named as nodes which can be distinguished using the base-type-builtin and base-type attributes.
Result
Once the setup script for sys2syz is executed, sys2syz can be used for a certain target_device file by executing the python wrapper script (sys2syz.py) with :
#bin/sh
python sys2syz.py -t <absolute_path_to_device_driver_source> -c compile_commands.json -v
This would generate a dev_<device_driver>.txt file in the out directory. An example description file autogenerated by sys2syz for i2c device driver.
#Autogenerated by sys2syz
include
resource fd_i2c[fd]
syz_open_dev$I2C(dev ptr[in, string["/dev/i2c"]], id intptr, flags flags[open_flags]) fd_i2c
ioctl$I2C_IOCTL_EXEC(fd fd_i2c, cmd const[I2C_IOCTL_EXEC], arg ptr[out, i2c_ioctl_exec])
i2c_ioctl_exec {
iie_op flags[i2c_op_t_flags]
iie_addr int16
iie_buflen len[iie_buf, intptr]
iie_buf buffer[out]
iie_cmdlen len[iie_cmd, intptr]
iie_cmd buffer[out]
}
Future Work
Though we have a basic working structure of this tool, yet a lot has to be worked upon for leveling it up to make the best of it. Perfect goals would be met when there would be least of manual labor needed. Sys2syz still looks forward to automating the detection of macros used by the flag types in syzkaller. List of to-dos also includes extending syzkaller’s support for generation of description of syscalls.
Some other yet-to-be-done tasks include-
- Generating descriptions for function type
- Calculating attributes for structs and unions
Summary
We have surely reached closer to our goals but the project needs active involvement and incremental updates to scale it up to its full potential. Looking forward to much more learning and making more contribution to NetBSD community.
Atlast, a word of thanks to my mentors William Coldwell, Siddharth Muralee, Santhosh Raju and Kamil Rytarowski as well as the NetBSD organization for being extremely supportive. Also, I owe a big thanks to Google for giving me such a glaring opportunity to work on this project.
After a small delay*, the NetBSD Project is pleased to announce NetBSD 9.1, the first feature and stability maintenance release of the netbsd-9 stable branch.
The new release features (among various other changes) many bug fixes, a few performance enhancements, stability improvements for ZFS and LFS and support for USB security keys in a mode easily usable in Firefox and other applications.
For more details and instructions see the 9.1 announcement.
Get NetBSD 9.1 from our CDN (provided by fastly) or one of the ftp mirrors.
Complete source and binaries for NetBSD are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, and other services may be found at https://www.NetBSD.org/mirrors/.
* for the delay: let us say there was a minor hickup and we took the opportunity to provide up to date timezone files for NetBSD users in Fiji.
After a small delay*, the NetBSD Project is pleased to announce NetBSD 9.1, the first feature and stability maintenance release of the netbsd-9 stable branch.
The new release features (among various other changes) many bug fixes, a few performance enhancements, stability improvements for ZFS and LFS and support for USB security keys in a mode easily usable in Firefox and other applications.
For more details and instructions see the 9.1 announcement.
Get NetBSD 9.1 from our CDN (provided by fastly) or one of the ftp mirrors.
Complete source and binaries for NetBSD are available for download at many sites around the world. A list of download sites providing FTP, AnonCVS, and other services may be found at https://www.NetBSD.org/mirrors/.
* for the delay: let us say there was a minor hickup and we took the opportunity to provide up to date timezone files for NetBSD users in Fiji.