Saturday, 25 January 2014

FreeBSD-SA-09:14.devfs

Around the same time as the pipe vulnerability, a race condition was discovered in devfs. It manifested as the use of an uninitialized vnode pointer: the pointer is still NULL, and another process can use it before it is assigned to an actual vnode. The vulnerability doesn't have a specific "place" in the code, because it results from how devfs and VFS interact. However, the fix was made in devfs.

The bug turned out to be exploitable, and the exploit is nicely described in a XORL blog post. Here I go into a little more detail about the code paths leading to the vulnerability.

First, a process tries to open a devfs file (e.g. /dev/null or similar). This is done through the open system call, which eventually executes the kern_open kernel function.

int
kern_open(struct thread *td, char *path, enum uio_seg pathseg, int flags, int mode)
{
    ....
    /* An extra reference on `nfp' has been held for us by falloc(). */
    fp = nfp;
    cmode = ((mode &~ fdp->fd_cmask) & ALLPERMS) &~ S_ISTXT;
    NDINIT(&nd, LOOKUP, FOLLOW, pathseg, path, td);
    td->td_dupfd = -1; /* XXX check for fdopen */
    error = vn_open(&nd, &flags, cmode, indx);
    ...
}
Almost at the very beginning, the call goes down into vn_open, which executes the VFS-specific functionality. vn_open (a thin wrapper around vn_open_cred, shown here) performs many checks, such as whether the user has access to the file and whether the access flags are correct. It eventually passes control to the devfs subsystem for the actual device opening:

int
vn_open_cred(ndp, flagp, cmode, cred, fdidx)
    struct nameidata *ndp;
    int *flagp, cmode;
    struct ucred *cred;
    int fdidx;
{
    ...
restart:
    vfslocked = 0;
    fmode = *flagp;
    ...
        ndp->ni_cnd.cn_nameiop = LOOKUP;
        ndp->ni_cnd.cn_flags = ISOPEN |
            ((fmode & O_NOFOLLOW) ? NOFOLLOW : FOLLOW) |
            LOCKSHARED | LOCKLEAF | MPSAFE;
        if ((error = namei(ndp)) != 0)
            return (error);
        ndp->ni_cnd.cn_flags &= ~MPSAFE;
        vfslocked = (ndp->ni_cnd.cn_flags & GIANTHELD) != 0;
        vp = ndp->ni_vp;
    }
    ...
    if ((error = VOP_OPEN(vp, fmode, cred, td, fdidx)) != 0)
        goto bad;
    if (fmode & FWRITE)
        vp->v_writecount++;
    *flagp = fmode;
    ASSERT_VOP_LOCKED(vp, "vn_open_cred");
    if (fdidx == -1)
        VFS_UNLOCK_GIANT(vfslocked);
    return (0);
bad:
    NDFREE(ndp, NDF_ONLY_PNBUF);
    vput(vp);
    VFS_UNLOCK_GIANT(vfslocked);
    *flagp = fmode;
    ndp->ni_vp = NULL;
    return (error);
}
From here the call is passed through to devfs via the VOP_OPEN macro:
static int
devfs_open(struct vop_open_args *ap)
{
    ...
    dsw = dev_refthread(dev);
    if (dsw == NULL)
        return (ENXIO);
    /* XXX: Special casing of ttys for deadfs.  Probably redundant. */
    if (dsw->d_flags & D_TTY)
        vp->v_vflag |= VV_ISTTY;
    VOP_UNLOCK(vp, 0, td);
    ...
    vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
    dev_relthread(dev);
    ...
    fp = ap->a_td->td_proc->p_fd->fd_ofiles[ap->a_fdidx];
    KASSERT(fp->f_ops == &badfileops,
        ("Could not vnode bypass device on fdops %p", fp->f_ops));
    fp->f_ops = &devfs_ops_f;
    fp->f_data = dev;
    return (error);
}
So far so good: nothing terribly bad has happened and there is no memory corruption. The problem is that right after the VOP_UNLOCK(vp, 0, td) call, another thread can start using the file descriptor. At this point the kernel has not yet assigned the vnode to the file structure (fp), so if the second thread does not check the vnode pointer, it is in trouble.

This assignment happens later, in kern_open, in the same execution thread. In fact, it happens just before the function returns to userland.
int
kern_open(struct thread *td, char *path, enum uio_seg pathseg, int flags,
    int mode)
{
    ...
    FILEDESC_LOCK(fdp);
    FILE_LOCK(fp);
    if (fp->f_count == 1) {
        mp = vp->v_mount;
        KASSERT(fdp->fd_ofiles[indx] != fp,
            ("Open file descriptor lost all refs"));
        FILE_UNLOCK(fp);
        FILEDESC_UNLOCK(fdp);
        VOP_UNLOCK(vp, 0, td);
        vn_close(vp, flags & FMASK, fp->f_cred, td);
        VFS_UNLOCK_GIANT(vfslocked);
        fdrop(fp, td);
        td->td_retval[0] = indx;
        return (0);
    }
    fp->f_vnode = vp; /* <-- the vnode is only assigned here */
    if (fp->f_data == NULL)
        fp->f_data = vp;
    fp->f_flag = flags & FMASK;
    if (fp->f_ops == &badfileops)
        fp->f_ops = &vnops;
    fp->f_seqcount = 1;
    fp->f_type = (vp->v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
    FILE_UNLOCK(fp);
    FILEDESC_UNLOCK(fdp);
    ...
}
The fp->f_vnode = vp assignment above is where it happens. As mentioned before, that is too late: there is a window in which the still-NULL pointer can be used, and that is exactly what the exploit code did. The fix was to go back to the devfs_open call, break the abstraction, and assign the vnode to the file structure right after the unlock happens.
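
To make the ordering problem concrete, here is a minimal single-threaded model of the two orderings. It is a sketch under stated assumptions, not kernel code: the structures are toy stand-ins, model_devfs_open, model_kern_open_tail and racing_thread_peeks are hypothetical names, and the racing thread is simulated by a plain function call placed inside the window.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields the post discusses. */
struct vnode { int v_id; };

enum fileops { BADFILEOPS, DEVFS_OPS_F };

struct file {
    enum fileops  f_ops;
    struct vnode *f_vnode;   /* NULL until the kernel assigns it */
};

enum ordering { VULNERABLE_ORDER, FIXED_ORDER };

/* Hypothetical model of devfs_open(). */
static void
model_devfs_open(struct file *fp, struct vnode *vp, enum ordering o)
{
    /* VOP_UNLOCK(vp, 0, td) would happen here: the window opens. */
    if (o == FIXED_ORDER)
        fp->f_vnode = vp;    /* the fix as described: assign right after the unlock */
    fp->f_ops = DEVFS_OPS_F;
}

/* Hypothetical model of the tail of kern_open(). */
static void
model_kern_open_tail(struct file *fp, struct vnode *vp)
{
    fp->f_vnode = vp;        /* the late assignment */
}

/* Stand-in for a second thread using the descriptor inside the window. */
static struct vnode *
racing_thread_peeks(const struct file *fp)
{
    return fp->f_vnode;
}

/* Returns 1 if the racing thread observed a NULL vnode pointer, else 0. */
int
race_sees_null(enum ordering o)
{
    struct vnode vn = { 1 };
    struct file f = { BADFILEOPS, NULL };
    struct vnode *seen;

    model_devfs_open(&f, &vn, o);
    seen = racing_thread_peeks(&f);   /* runs before kern_open finishes */
    model_kern_open_tail(&f, &vn);

    assert(f.f_vnode == &vn);         /* either way, the open itself ends up fine */
    return seen == NULL;
}
```

In this model race_sees_null(VULNERABLE_ORDER) reports that the second thread saw NULL, while race_sees_null(FIXED_ORDER) does not. A real race depends on scheduling, which the model deliberately leaves out.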

Saturday, 18 January 2014

FreeBSD-SA-09:13.pipe

Code seems to age much quicker than anything else. Way back - not so long ago - in 2009 there was a bug in the FreeBSD kernel pipe and event handling code. This turned out to be exploitable in versions 6.x of the kernel. It was never truly patched; instead, the code was redesigned in order to eliminate a whole set of potential vulnerabilities, including this one. The bug was published in the FreeBSD security advisory FreeBSD-SA-09:13.pipe.asc. A proof of concept exploit is available for this vulnerability: http://www.frasunek.com/pipe.txt.

For this analysis I needed to figure out what sequence of events leads to the vulnerability manifesting. I won't go into details about how the corruption happens or how the exploit works. Also, I haven't actually tried to execute it, so I'm merely assuming that it works.

Unless you know the details of how kqueues, knote lists and pipes work, this vulnerability is actually quite hard to spot even if the patch is available. The patch covers a lot of code and does not highlight the bug itself. So, if for some strange reason you're trying to figure out this vulnerability then this post should give you the initial steps.

The vulnerable version of the kernel is still available in the current (as of this writing) FreeBSD SVN: http://svnweb.freebsd.org/base/release/6.0.0/. All analysis below follows that code.

We start with the function pipe_close, which gets called via the close system call.
static int
pipe_close(fp, td)
    struct file *fp;
    struct thread *td;
{
    struct pipe *cpipe = fp->f_data;

    fp->f_ops = &badfileops;
    fp->f_data = NULL;
    funsetown(&cpipe->pipe_sigio);
    pipeclose(cpipe);
    return (0);
}
This function obtains the pipe structure and passes it on to the pipeclose function. It is important to note that a pipe has two parts, a read end and a write end, but it is one entity. The pipe pair is allocated from a single UMA (Universal Memory Allocator) zone as one chunk of memory, so the read and write pipes really refer to the same general space.
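
The "one entity" point can be sketched as a toy allocation. This is a simplified model, not the kernel's definition: the field names are modeled on sys/pipe.h but should be treated as assumptions, pipepair_alloc and pipe_mtx are hypothetical helpers, and an int stands in for the real mutex.

```c
#include <stdlib.h>

struct pipepair;

struct pipe {
    struct pipe     *pipe_peer;     /* the other end of the pipe */
    struct pipepair *pipe_pair;     /* the allocation both ends live in */
    int              pipe_present;
};

/* Both ends and their shared lock sit in one chunk of memory. */
struct pipepair {
    struct pipe pp_rpipe;
    struct pipe pp_wpipe;
    int         pp_mtx;             /* placeholder for the shared mutex */
};

/* Hypothetical helper: allocate both ends with a single allocation. */
struct pipepair *
pipepair_alloc(void)
{
    struct pipepair *pp = calloc(1, sizeof(*pp));

    if (pp == NULL)
        return NULL;
    pp->pp_rpipe.pipe_pair = pp->pp_wpipe.pipe_pair = pp;
    pp->pp_rpipe.pipe_peer = &pp->pp_wpipe;
    pp->pp_wpipe.pipe_peer = &pp->pp_rpipe;
    pp->pp_rpipe.pipe_present = pp->pp_wpipe.pipe_present = 1;
    return pp;
}

/* Hypothetical helper: whichever end you hold, you reach the same lock. */
int *
pipe_mtx(struct pipe *p)
{
    return &p->pipe_pair->pp_mtx;
}
```

In this model, closing one end cannot simply free its own memory, because the other end lives in the same allocation and shares the same lock.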

pipeclose then obtains the pair, tries to flush the pipe, and clears out any knotes attached to it. A knote is part of kqueue, a mechanism for kernel-to-user event notification. In very basic terms it is a select optimized for a special case. With select the user has to pass a whole list of identifiers to the kernel on every call, while with kqueue a user subscribes once to a filter (the event criteria), which allows for much more granular event notification. The user process maintains a kqueue of the events it is listening for, while each identifier being listened on maintains a linked list of knotes to know whom to notify. A much more detailed description of this mechanism can be found in this paper: kqueue.pdf.

Once the various closing and flushing steps are complete, the pipeclose function tries to clear out the pipe, starting with the knote list. About halfway down the function, the following sequence is executed:
static void
pipeclose(cpipe)
      struct pipe *cpipe;
{
      struct pipepair *pp;
      struct pipe *ppipe;
      .....
      PIPE_UNLOCK(cpipe);
      pipe_free_kmem(cpipe);
      PIPE_LOCK(cpipe);
      cpipe->pipe_present = 0;
      pipeunlock(cpipe);
      knlist_clear(&cpipe->pipe_sel.si_note, 1);
      knlist_destroy(&cpipe->pipe_sel.si_note);
      .....
}


Can you spot the bug? The above code is basically it. I wouldn't expect you to, unless you're a kernel hacker working on this particular portion of the kernel. Specifically, the last two calls in the above listing, knlist_clear and knlist_destroy, cause the problem.

  • The PIPE_LOCK mutex isn't protecting the pipe as a whole; it is protecting the state of the lock taken by pipelock.
  • The pipe is UNLOCKED by the pipeunlock call before calls to knlist_clear and knlist_destroy are made.
This means that two processes can be calling knlist_clear and knlist_destroy at the same time, and neither function is thread safe. So it can happen that the linked list of knotes for the pipe is reinitialized (via the destroy call) while it is still being cleared. Clearing is a blocking procedure that sends out notifications to the processes on the knote list. While the clearing function is still traversing the list, another process can easily destroy it, because that process sees what looks like an already cleared list.
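
The interleaving described above can be replayed deterministically. The following is a toy model under stated assumptions, not the kernel's knlist code: model_knlist_clear and model_knlist_destroy are hypothetical simplifications, the kl_valid and kl_stale_uses fields exist only to make the problem observable, and the second process is simulated by a callback invoked at the point where the clear would block.

```c
#include <stddef.h>

struct knote { struct knote *kn_next; };

struct knlist {
    struct knote *kl_head;
    int kl_valid;       /* model only: 0 once the list has been destroyed */
    int kl_stale_uses;  /* model only: accesses made after destruction */
};

/* Model of the destroy call: reinitialize the list head. */
void
model_knlist_destroy(struct knlist *kl)
{
    kl->kl_head = NULL;
    kl->kl_valid = 0;
}

/* Model of the clear call: walk the list, possibly blocking per knote. */
void
model_knlist_clear(struct knlist *kl, void (*while_blocked)(struct knlist *))
{
    struct knote *kn = kl->kl_head;

    while (kn != NULL) {
        struct knote *next = kn->kn_next;

        if (while_blocked != NULL)
            while_blocked(kl);      /* the clear blocks; someone else runs */
        if (!kl->kl_valid)
            kl->kl_stale_uses++;    /* about to touch a list that is gone */
        kl->kl_head = next;         /* unlink kn and carry on regardless */
        kn = next;
    }
}

/* Returns how many times the clear touched an already destroyed list. */
int
stale_uses_after_close(int interleave)
{
    struct knote second = { NULL };
    struct knote first = { &second };
    struct knlist kl = { &first, 1, 0 };

    model_knlist_clear(&kl, interleave ? model_knlist_destroy : NULL);
    if (!interleave)
        model_knlist_destroy(&kl);  /* the safe order: destroy after clear */
    return kl.kl_stale_uses;
}
```

With no interleaving the clear never touches a destroyed list. With the destroy running at the blocking point, every remaining step of the traversal operates on a list that has already been reinitialized, and the clear even writes a stale pointer back into the freshly reset head.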





Thursday, 9 January 2014

FreeBSD-SA-10:09.pseudofs

Older FreeBSD 7 and 8 versions had a bug in the pseudofs module, back in 2010. The bug manifested as an unnecessary mutex release, which turned out to be exploitable through a NULL pointer dereference. In this post I will walk through the events that lead up to the bug. For more information check out the security advisory: FreeBSD-SA-10:09.pseudofs.asc. There is also a proof of concept exploit available on Security Focus.

The chain starts with a call to extattr_get_link, implemented in kern/vfs_extattr.c:
ssize_t extattr_get_link(const char *path, int attrnamespace,
const char *attrname, void *data, size_t nbytes);
This system call obtains extended attributes from a vnode. In the kernel, the function looks like this:
int extattr_get_link(td, uap)
    struct thread *td;
    struct extattr_get_link_args /* {
        const char *path;
        int attrnamespace;
        const char *attrname;
        void *data;
        size_t nbytes;
    } */ *uap;
{
    ...
    vfslocked = NDHASGIANT(&nd);
    error = extattr_get_vp(nd.ni_vp, uap->attrnamespace, attrname,
        uap->data, uap->nbytes, td);
    vrele(nd.ni_vp);
    VFS_UNLOCK_GIANT(vfslocked);
    return (error);
}
The important part is the call to extattr_get_vp, which is essentially a wrapper adding a few bells and whistles to the process.
static int
extattr_get_vp(struct vnode *vp, int attrnamespace, const char *attrname,
    void *data, size_t nbytes, struct thread *td)
{
    struct uio auio, *auiop;
    struct iovec aiov;
    ssize_t cnt;
    size_t size, *sizep;
    int error;
    VFS_ASSERT_GIANT(vp->v_mount);
    vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
    ...

    error = VOP_GETEXTATTR(vp, attrnamespace, attrname, auiop, sizep,
        td->td_ucred, td);
    if (auiop != NULL) {
        cnt -= auio.uio_resid;
        td->td_retval[0] = cnt;
    } else
        td->td_retval[0] = size;
done:
    VOP_UNLOCK(vp, 0);
    return (error);
}
We are starting to see a few statements directly relating to the vulnerability. First there is the vn_lock
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
which takes in the vnode pointer that we are interested in. This is the call that locks the vnode. The code for that is slightly convoluted: instead of using mtx_lock, it works by invoking the lock manager.
// in kern/vfs_vnops.c
#define vn_lock(vp, flags) _vn_lock(vp, flags, __FILE__, __LINE__)

int
_vn_lock(struct vnode *vp, int flags, char *file, int line)
{
    int error;
    VNASSERT((flags & LK_TYPE_MASK) != 0, vp,
        ("vn_lock called with no locktype."));
    do {
        ...
        error = VOP_LOCK1(vp, flags, file, line);
        flags &= ~LK_INTERLOCK; /* Interlock is always dropped. */
        ...
    } while (flags & LK_RETRY && error != 0);
    return (error);
}
The function passes control to the VOP_LOCK1 macro. Here control goes into pseudofs territory. The module, however, does not implement the lock function, and so a default locking function is used.
// kern/vfs_default.c
int
vop_stdlock(ap)
    struct vop_lock1_args /* {
        struct vnode *a_vp;
        int a_flags;
        char *file;
        int line;
    } */ *ap;
{
    struct vnode *vp = ap->a_vp;
    return (_lockmgr_args(vp->v_vnlock, ap->a_flags, VI_MTX(vp),
        LK_WMESG_DEFAULT, LK_PRIO_DEFAULT, LK_TIMO_DEFAULT, ap->a_file,
        ap->a_line));
}
Here we see that the default implementation passes the lock, vp->v_vnlock, to the lock manager via the _lockmgr_args function, a function too complex to show here and unnecessary for my purposes. The vnode lock is now officially held. Going back to extattr_get_vp, we see a call to VOP_GETEXTATTR:
error = VOP_GETEXTATTR(vp, attrnamespace, attrname, auiop, sizep,
    td->td_ucred, td);
This macro sends us to the module code where the actual extended attribute extraction occurs. The call resolves to the pfs_getextattr function through various function pointer magic. While it is essentially a wrapping function, akin to a Java synchronized block, this is where the bug lives.
static int
pfs_getextattr(struct vop_getextattr_args *va)
{
    struct vnode *vn = va->a_vp;
    struct pfs_vdata *pvd = vn->v_data;
    struct pfs_node *pn = pvd->pvd_pn;
    struct proc *proc;
    int error;
    PFS_TRACE(("%s", pn->pn_name));
    pfs_assert_not_owned(pn);
    /*
     * This is necessary because either process' privileges may
     * have changed since the open() call.
     */
    if (!pfs_visible(curthread, pn, pvd->pvd_pid, &proc))
        PFS_RETURN (EIO);
    if (pn->pn_getextattr == NULL)
        error = EOPNOTSUPP;
    else
        error = pn_getextattr(curthread, proc, pn,
            va->a_attrnamespace, va->a_name, va->a_uio,
            va->a_size, va->a_cred);
    if (proc != NULL)
        PROC_UNLOCK(proc);

    pfs_unlock(pn); //<---- BUG
    PFS_RETURN (error);
}
That pfs_unlock call is the culprit. It releases pn->pn_mutex, the pseudofs node's own mutex, which nothing on this path has acquired (the function even calls pfs_assert_not_owned(pn) at its top); the vnode lock taken earlier through VOP_LOCK1 is a separate lock, which extattr_get_vp releases itself. In retrospect the bug seems obvious. Why would the developer think that it is OK to release a lock here when all the locking is handled at a much higher abstraction layer? I'm sure there was a good reason at the time. Perhaps the code was refactored to the point where this unlocking step no longer made sense. Regardless, the line was there and it caused a vulnerability. pfs_unlock itself is a simple inline function defined in fs/pseudofs/pseudofs_internal.h:
static inline void
pfs_unlock(struct pfs_node *pn)
{
    mtx_unlock(&pn->pn_mutex);
}
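
The effect of a stray unlock can be replayed with a toy mutex that remembers its owner. This is a sketch under stated assumptions, not kernel code: it assumes, as the advisory describes, that the mutex released by pfs_unlock was never acquired on this path, and toy_mtx, toy_lock, toy_unlock and replay_getextattr are all hypothetical names.

```c
#include <stddef.h>

/* Toy mutex that records its owner; NULL means unlocked. */
struct toy_mtx { const void *owner; };

/* Returns 0 on success, -1 if the mutex is already held. */
int
toy_lock(struct toy_mtx *m, const void *who)
{
    if (m->owner != NULL)
        return -1;
    m->owner = who;
    return 0;
}

/* Returns 0 on success, -1 if the caller does not hold the mutex. */
int
toy_unlock(struct toy_mtx *m, const void *who)
{
    if (m->owner != who)
        return -1;
    m->owner = NULL;
    return 0;
}

/*
 * Replays the call chain from the post. Returns 1 if a stray unlock was
 * detected, 0 if locking was balanced, -1 if the model itself misbehaved.
 */
int
replay_getextattr(int with_stray_unlock)
{
    struct toy_mtx vnode_lock = { NULL };
    struct toy_mtx pn_mutex = { NULL };
    int self = 0;       /* its address serves as the thread identity */
    int stray = 0;

    if (toy_lock(&vnode_lock, &self) != 0)      /* vn_lock() in extattr_get_vp */
        return -1;
    if (with_stray_unlock && toy_unlock(&pn_mutex, &self) != 0)
        stray = 1;                              /* pfs_unlock(pn): never locked */
    if (toy_unlock(&vnode_lock, &self) != 0)    /* VOP_UNLOCK() in extattr_get_vp */
        return -1;
    return stray;
}
```

The toy mutex simply refuses the bad unlock and reports it. The real mtx_unlock has no such luxury on a production kernel, which is why an unlock of a mutex nobody holds can end somewhere much worse than an error code.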
Now, for completeness, the other unlocking call happens back in the extattr_get_vp function, through a call to a VOP function:
static int
extattr_get_vp(struct vnode *vp, int attrnamespace, const char *attrname,
    void *data, size_t nbytes, struct thread *td)
{

    ...
    VOP_UNLOCK(vp, 0);
    return (error);
}
Again, pseudofs does not implement the unlocking function and falls back to a default implementation, which uses the lock manager.
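
The fallback to a default implementation, which applies to both the lock and the unlock calls, can be sketched as a table of function pointers with a pointer to a default table. This is a simplified model, not the kernel's dispatch code: the vop_default field name is modeled on FreeBSD's struct vop_vector but should be treated as an assumption, and the op array, vop_dispatch, dispatch_on_pseudofs and the model_ functions with their return codes are hypothetical.

```c
#include <stddef.h>

struct vnode;
typedef int (*vop_fn)(struct vnode *);

enum vop_id { OP_LOCK1, OP_UNLOCK, OP_GETEXTATTR, OP_COUNT };

struct vop_vector {
    const struct vop_vector *vop_default;   /* consulted when an entry is NULL */
    vop_fn                   op[OP_COUNT];
};

struct vnode { const struct vop_vector *v_op; };

/* Return codes only identify which implementation ran. */
static int model_stdlock(struct vnode *vp)        { (void)vp; return 100; }
static int model_stdunlock(struct vnode *vp)      { (void)vp; return 101; }
static int model_pfs_getextattr(struct vnode *vp) { (void)vp; return 200; }

/* The defaults provide locking but no extended attribute support. */
static const struct vop_vector model_default_vnodeops = {
    NULL, { model_stdlock, model_stdunlock, NULL }
};

/* The filesystem provides getextattr and leaves locking to the defaults. */
static const struct vop_vector model_pfs_vnodeops = {
    &model_default_vnodeops, { NULL, NULL, model_pfs_getextattr }
};

/* Walk the chain of vectors until one implements the operation. */
int
vop_dispatch(struct vnode *vp, enum vop_id which)
{
    const struct vop_vector *v;

    for (v = vp->v_op; v != NULL; v = v->vop_default)
        if (v->op[which] != NULL)
            return v->op[which](vp);
    return -1;      /* nobody implements it */
}

/* Convenience wrapper: dispatch on a vnode that uses the pseudofs table. */
int
dispatch_on_pseudofs(enum vop_id which)
{
    struct vnode vn = { &model_pfs_vnodeops };

    return vop_dispatch(&vn, which);
}
```

In this model the lock and unlock operations land in the default table while the extended attribute call lands in the filesystem's own function, which mirrors the split seen in the call chain: locking is handled generically, and only the attribute handler is pseudofs code.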

I've not gone into the details of how the actual corruption occurs and how it can be exploited. Perhaps another time. For my task I just needed to know the call chains that lead up to the bug. Hope you enjoyed reading it. An unpatched version can be seen here: 8.3.0/sys/fs/pseudofs/pseudofs_vnops.c