The bug turned out to be exploitable with the exploit nicely described by XORL blog post. I will be going into a little more detail about the code paths leading to the vulnerability.
First a process tries to open a devfs file (i.e. /dev/null or similiar). This is done through the open system call which eventually executes kern_open kernel function.
intAlmost at the very beginning the call goes down the path of vn_open which executes the VFS specific functionalities. vn_open performs many checks, such as does the user have access to the file or are the access flags correct? It eventually passes control to the devfs subsystem for the actual device opening:
kern_open(struct thread *td, char *path, enum uio_seg pathseg, int flags, int mode)
{
....
/* An extra reference on `nfp' has been held for us by falloc(). */
fp = nfp;
cmode = ((mode &~ fdp->fd_cmask) & ALLPERMS) &~ S_ISTXT;
NDINIT(&nd, LOOKUP, FOLLOW, pathseg, path, td);
td->td_dupfd = -1; /* XXX check for fdopen */
error = vn_open(&nd, &flags, cmode, indx);
...
}
intHere the call is passed through to devfs via the VOP_OPEN marco call.
vn_open_cred(ndp, flagp, cmode, cred, fdidx)
struct nameidata *ndp;
int *flagp, cmode;
struct ucred *cred;
int fdidx;
{
...
restart:
vfslocked = 0;
fmode = *flagp;
...
ndp->ni_cnd.cn_nameiop = LOOKUP;
ndp->ni_cnd.cn_flags = ISOPEN |
((fmode & O_NOFOLLOW) ? NOFOLLOW : FOLLOW) |
LOCKSHARED | LOCKLEAF | MPSAFE;
if ((error = namei(ndp)) != 0)
return (error);
ndp->ni_cnd.cn_flags &= ~MPSAFE;
vfslocked = (ndp->ni_cnd.cn_flags & GIANTHELD) != 0;
vp = ndp->ni_vp;
}
...
if ((error = VOP_OPEN(vp, fmode, cred, td, fdidx)) != 0)
goto bad;
if (fmode & FWRITE)
vp->v_writecount++;
*flagp = fmode;
ASSERT_VOP_LOCKED(vp, "vn_open_cred");
if (fdidx == -1)
VFS_UNLOCK_GIANT(vfslocked);
return (0);
bad:
NDFREE(ndp, NDF_ONLY_PNBUF);
vput(vp);
VFS_UNLOCK_GIANT(vfslocked);
*flagp = fmode;
ndp->ni_vp = NULL;
return (error);
}
static intSo far so good, nothing terribly bad has happened, there is no memory corruption. However, the problem is that right after the VOP_UNLOCK(vp, 0, td) call another thread can start using the file descriptor. If the second thread does not check the vnode pointer then it would be in trouble. At this point the kernel has not assigned the vnode to the file descriptor (fp) structure.
devfs_open(struct vop_open_args *ap)
{
...
dsw = dev_refthread(dev);
if (dsw == NULL)
return (ENXIO);
/* XXX: Special casing of ttys for deadfs. Probably redundant. */
if (dsw->d_flags & D_TTY)
vp->v_vflag |= VV_ISTTY;
VOP_UNLOCK(vp, 0, td);
...
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
dev_relthread(dev);
...
fp = ap->a_td->td_proc->p_fd->fd_ofiles[ap->a_fdidx];
KASSERT(fp->f_ops == &badfileops,
("Could not vnode bypass device on fdops %p", fp->f_ops));
fp->f_ops = &devfs_ops_f;
fp->f_data = dev;
return (error);
}
This assignment happens later in the kern_open call in the same execution thread. In fact, it happens just before the function returns to the user land.
intThe assignment marked above is where it happens. As mentioned before, it is too late by that time and there is a danger that the pointer could be used. That is exactly what happened in the exploit code. The fix was to go back to the devfs_open call, break the abstraction, and assign the vnode to the file descriptor right after the unlock happens.
kern_open(struct thread *td, char *path, enum uio_seg pathseg, int flags,
int mode)
{
...
FILEDESC_LOCK(fdp);
FILE_LOCK(fp);
if (fp->f_count == 1) {
mp = vp->v_mount;
KASSERT(fdp->fd_ofiles[indx] != fp,
("Open file descriptor lost all refs"));
FILE_UNLOCK(fp);
FILEDESC_UNLOCK(fdp);
VOP_UNLOCK(vp, 0, td);
vn_close(vp, flags & FMASK, fp->f_cred, td);
VFS_UNLOCK_GIANT(vfslocked);
fdrop(fp, td);
td->td_retval[0] = indx;
return (0);
}
fp->f_vnode = vp;
if (fp->f_data == NULL)
fp->f_data = vp;
fp->f_flag = flags & FMASK;
if (fp->f_ops == &badfileops)
fp->f_ops = &vnops;
fp->f_seqcount = 1;
fp->f_type = (vp->v_type == VFIFO ? DTYPE_FIFO : DTYPE_VNODE);
FILE_UNLOCK(fp);
FILEDESC_UNLOCK(fdp);
...
}