Thursday, March 31, 2011

Names in Minifilters - Implementing Name Provider Callbacks

Since we're going to talk a bit more about names, one important aspect to cover is how to implement minifilters that change the namespace in some way. Of course, this is a very large topic, so in this post I'm going to cover one particular aspect of it: how to implement the name provider callbacks. The "name provider" callbacks are in fact two members of the FLT_REGISTRATION structure, GenerateFileNameCallback and NormalizeNameComponentCallback (generally referred to as the generate callback and the normalize callback). There are two other callbacks related to this (NormalizeContextCleanupCallback and NormalizeNameComponentExCallback) but they should be pretty easy to figure out. Strangely enough, Microsoft doesn't currently provide any sample showing what these callbacks are supposed to look like, even in a basic case, so I'll do that in this post. Please note that this is written almost from scratch. It is very loosely based on some existing code I have, but that was way too complicated for the purpose of this post. I just wanted to show what the callbacks are supposed to do in principle. So the code here hasn't been extensively tested, it might fail in unusual circumstances and so on. It should only be used as a reference and not in a production environment (not that it does all that much anyway). With that said, if you do run this code and it fails under some circumstances, or you spot an error just by looking at it, please let me know and I'll investigate and update it so that everyone benefits from it.

I'll be referring to minifilters that implement these name provider callbacks as name provider minifilters, or simply name providers.

First let's talk a bit about why these callbacks exist. As you may have read on this blog or you may know from experience, name generation is a pretty complicated business (I use name generation in the general sense, referring both to creating a name for a file and to normalizing that name). In preCreate it might require looking in the FileObject and RelatedFileObject, or it might require looking up a fileID into a table to get the name and so on. If there is support for links (hardlinks for example) then the name depends on the FILE_OBJECT that's asking for it. For filters, it also matters if the file object is going to be a target of some operation that changes the namespace (rename, hardlink), in which case the name during the preOp is different from the name in postOp. Then you have the tunnel cache which might change the name and so on. Legacy filters spent a lot of time (both during development and at runtime) trying to get names for files, and so the Filter Manager team at MS started the project with the intention to simplify things. In the process of doing that they wrote a lot of code (I've heard that about 20% of the initial FltMgr code was dedicated to this) and they've noticed a couple of problems. First, performance isn't all that great and second (and this is the real nasty discovery in my opinion), that there is still a need to violate the strict layering rules FltMgr tries hard to obey (I haven't given it enough thought to see if there may have been another way so I'll just take their word for it). So they tried to address both these issues by creating a model where only minifilters that actually need to be involved in the name generation path (namespace virtualization and such) need to see the ugliness, while the rest of them can just enjoy the rather simple abstraction of calling FltGetFileNameInformation and getting back a name that is ready to be (mis)used.

For name provider minifilters, FltMgr requires that they implement two callbacks that it will call when generating (or normalizing) a name. Moreover, the IO generated by FltMgr while trying to generate these names will only be shown to these types of minifilters (name providers). This is a pretty serious decision because the minifilter model is designed so that minifilters really do see all IO happening on a volume, so excluding a certain class of IO was not a decision to be taken lightly. However, the performance benefits were enough that it seemed justified and, in terms of what IO is shown to minifilters, pretty much all minifilters that I know of work just fine without processing that FltMgr IO (in fact, it comes as a surprise to most people that there is some IO happening behind the scenes, which shows that it is truly transparent). Another problem with that IO is that, as I said before, it might violate layering. Specifically, the callbacks might be called when minifilters below them call FltGetFileNameInformation, which means that the developers of these kinds of minifilters need to be extremely careful when implementing their filters. In conclusion, my advice is that unless a minifilter actually MUST implement a name provider it should really avoid doing so, no matter how much its developers want to process "all IO". In general a minifilter MUST implement a name provider if it implements some sort of name virtualization scheme where the name of a file above its layer is different from the one below, or where it takes over part of the namespace, or if the verifier warning I was talking about in this post occasionally pops up.

Let's not waste more time on warnings and talk about how the callbacks need to be implemented and what they should do.



GenerateFileNameCallback



The generate callback should be thought of as "the function that gets called when someone asks for an opened name" (FLT_FILE_NAME_OPENED). Its purpose is to return a file name given a FILE_OBJECT and a FLT_CALLBACK_DATA structure. There are a couple of important points about this function:
  • The FLT_CALLBACK_DATA structure might be missing if the request for the name does not come from a component that is involved in an IO operation (for example, if a minifilter is also registered to receive process creation notifications and it wants to call FltGetFileNameInformation from that callback it can't because it doesn't have a FLT_CALLBACK_DATA structure; in that case its only recourse is to call FltGetFileNameInformationUnsafe). However, the minifilter writer can safely assume that either the FILE_OBJECT is opened (FILE_OBJECT->FsContext is not NULL) or there is a FLT_CALLBACK_DATA structure. It is impossible to get an unopened FILE_OBJECT outside of the IRP_MJ_CREATE path and in that path a minifilter must never call FltGetFileNameInformationUnsafe anyway. This might not strike you as very important but it matters a lot for case sensitivity (that's what you get for changing names...). The information about whether operations on a file are to be case sensitive or not is stored in the FILE_OBJECT flags when the file is opened (FO_OPENED_CASE_SENSITIVE). However, if the file is not yet open then the information can be found in the IRP_MJ_CREATE request (the SL_CASE_SENSITIVE flag), in IO_STACK_LOCATION->Flags for legacy filters and in FLT_CALLBACK_DATA->Iopb->OperationFlags for minifilters.
  • The filter has the option to tell FltMgr whether the name it returns should be cached or not. This is very important because if FltMgr caches the wrong name it will run into very interesting issues that are pretty complicated to debug. I've had a lot of fun that way. On the other hand, if the minifilter always tells FltMgr not to cache the name then performance will suffer greatly. When to cache really depends on minifilter architecture so there isn't much more general advice I can give.
  • In general a minifilter should never return a name it is not prepared to handle in the future. For example, I've seen cases where a minifilter (A) was returning a name that was only valid during preCreate (a GUID that the minifilter used as a key in a hash and that was used to get a real file name below the minifilter's layer, after which the GUID was discarded). The minifilter then got in trouble when some other minifilter (B) used that name later on to open its own handle to the same file: minifilter A no longer had the GUID in its internal hash and so it had no idea what the real file was. I guess my advice is to try not to do anything too fancy here.
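The case sensitivity lookup from the first point can be sketched roughly as follows. This is a user-mode model and not driver code: the MOCK_* structures and the PtIsCaseSensitive helper are hypothetical stand-ins, and the two flag values are defined locally only so that the snippet compiles on its own. In a real driver the FILE_OBJECT, FLT_CALLBACK_DATA and the flags come from the WDK headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins so the sketch builds in user mode (assumption: in a
   driver these definitions are taken from the WDK headers instead). */
#define FO_OPENED_CASE_SENSITIVE 0x00020000u
#define SL_CASE_SENSITIVE        0x80u

typedef struct { void *FsContext; unsigned long Flags; } MOCK_FILE_OBJECT;
typedef struct { unsigned char OperationFlags; } MOCK_IO_PARAMETER_BLOCK;
typedef struct { MOCK_IO_PARAMETER_BLOCK *Iopb; } MOCK_CALLBACK_DATA;

/* Hypothetical helper: decide whether name operations for this request
   have to be case sensitive. */
bool PtIsCaseSensitive(const MOCK_FILE_OBJECT *FileObject,
                       const MOCK_CALLBACK_DATA *CallbackData)
{
    if (FileObject->FsContext != NULL) {
        /* The file is already open: the decision was recorded on the
           file object at open time. */
        return (FileObject->Flags & FO_OPENED_CASE_SENSITIVE) != 0;
    }

    /* Not opened yet, so this must be the IRP_MJ_CREATE path and the
       callback data has to be present. */
    assert(CallbackData != NULL);
    return (CallbackData->Iopb->OperationFlags & SL_CASE_SENSITIVE) != 0;
}
```

The assert mirrors the guarantee described above: an unopened file object never arrives without a callback data structure.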

NormalizeNameComponentCallback



The normalize callback is called when someone asks for a normalized name. FltMgr gets a regular name (using the generate callback) and then it looks at each component and if it thinks it might be a short name it calls the name providers with the parent directory path and the name of the component it is trying to normalize. Here are some of the important things to mention:
  • This function has a lot of potential for recursion. Try calling FltGetFileNameInformation to get a normalized name in preCreate from your own name provider and you'll see what I'm talking about. Don't use a lot of stack and try to avoid recursion as much as possible. This will require quite a lot of ingenuity to work around.
  • The name of a file can be different inside a transaction and so any IO you perform must be in the context of that transaction. Possibly other operations (registry lookups ?) will need to be transacted as well.
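The component-by-component flow described above can be modeled in user mode like this. It is only a toy model and not FltMgr's actual algorithm: NormalizePath, MightBeShortName, the NORMALIZE_COMPONENT callback type and ToyLookup are all hypothetical names, and the "contains a tilde" test is a crude assumption standing in for whatever check FltMgr really performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the normalize callback: given the parent
   directory and one component, write the long name into out. */
typedef bool (*NORMALIZE_COMPONENT)(const char *parent, const char *component,
                                    char *out, size_t outSize);

/* Crude heuristic (assumption): 8.3-sized and contains '~'. */
bool MightBeShortName(const char *component)
{
    return strlen(component) <= 12 && strchr(component, '~') != NULL;
}

/* Walk the path left to right and expand components that might be short
   names. A component the callback cannot resolve is kept unchanged. */
bool NormalizePath(const char *path, NORMALIZE_COMPONENT normalize,
                   char *out, size_t outSize)
{
    size_t used = 0;

    assert(path != NULL && normalize != NULL && out != NULL);
    if (outSize == 0) {
        return false;
    }
    out[0] = '\0';

    while (*path != '\0') {
        char component[256];
        char expanded[256];
        size_t len, need;

        while (*path == '\\') {
            path++;                         /* skip separators */
        }
        len = strcspn(path, "\\");
        if (len == 0) {
            break;                          /* trailing separator */
        }
        if (len >= sizeof(component)) {
            return false;
        }
        memcpy(component, path, len);
        component[len] = '\0';
        path += len;

        /* "out" holds the already normalized parent directory. */
        if (!MightBeShortName(component) ||
            !normalize(used == 0 ? "\\" : out, component,
                       expanded, sizeof(expanded))) {
            strcpy(expanded, component);    /* same buffer size, fits */
        }

        need = strlen(expanded) + 1;        /* separator + name */
        if (used + need >= outSize) {
            return false;
        }
        out[used] = '\\';
        memcpy(out + used + 1, expanded, need);   /* includes the NUL */
        used += need;
    }
    return true;
}

/* Toy directory lookup for the example: knows exactly one short name. */
bool ToyLookup(const char *parent, const char *component,
               char *out, size_t outSize)
{
    if (strcmp(parent, "\\") == 0 && strcmp(component, "PROGRA~1") == 0) {
        snprintf(out, outSize, "%s", "Program Files");
        return true;
    }
    return false;
}
```

With ToyLookup as the callback, a path such as \PROGRA~1\NEWFIL~1.TXT has its first component expanded, while the final component, which the toy directory doesn't know about, is returned unchanged.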
And finally, here is the code that can be plugged into the passthrough sample to make it filter name provider requests:
 BOOLEAN  
 PtDoRequestOperationStatus(  
   __in PFLT_CALLBACK_DATA Data  
   );  
 NTSTATUS PtNormalizeNameComponentExCallback(  
   __in   PFLT_INSTANCE Instance,  
   __in_opt PFILE_OBJECT FileObject,  
   __in   PCUNICODE_STRING ParentDirectory,  
   __in   USHORT VolumeNameLength,  
   __in   PCUNICODE_STRING Component,  
   __out  PFILE_NAMES_INFORMATION ExpandComponentName,  
   __in   ULONG ExpandComponentNameLength,  
   __in   FLT_NORMALIZE_NAME_FLAGS Flags,  
   __inout PVOID *NormalizationContext  
   );  
 NTSTATUS PtNormalizeNameComponentCallback(  
   __in   PFLT_INSTANCE Instance,  
   __in   PCUNICODE_STRING ParentDirectory,  
   __in   USHORT VolumeNameLength,  
   __in   PCUNICODE_STRING Component,  
   __out  PFILE_NAMES_INFORMATION ExpandComponentName,  
   __in   ULONG ExpandComponentNameLength,  
   __in   FLT_NORMALIZE_NAME_FLAGS Flags,  
   __inout PVOID *NormalizationContext  
   );  
 NTSTATUS PtGenerateFileNameCallback(  
   __in   PFLT_INSTANCE Instance,  
   __in   PFILE_OBJECT FileObject,  
   __in_opt PFLT_CALLBACK_DATA CallbackData,  
   __in   FLT_FILE_NAME_OPTIONS NameOptions,  
   __out   PBOOLEAN CacheFileNameInformation,  
   __out   PFLT_NAME_CONTROL FileName  
   );  
 //  
 // Assign text sections for each routine.  
 //  
 #ifdef ALLOC_PRAGMA  
 #pragma alloc_text(INIT, DriverEntry)  
 ...  
 //  
 // This defines what we want to filter with FltMgr  
 //  
 CONST FLT_REGISTRATION FilterRegistration = {  
   sizeof( FLT_REGISTRATION ),     // Size  
   FLT_REGISTRATION_VERSION,      // Version  
   0,                 // Flags  
   NULL,                // Context  
   Callbacks,             // Operation callbacks  
   PtUnload,              // MiniFilterUnload  
   PtInstanceSetup,          // InstanceSetup  
   PtInstanceQueryTeardown,      // InstanceQueryTeardown  
   PtInstanceTeardownStart,      // InstanceTeardownStart  
   PtInstanceTeardownComplete,     // InstanceTeardownComplete  
   PtGenerateFileNameCallback,     // GenerateFileName  
   PtNormalizeNameComponentCallback,  // NormalizeNameComponent  
   NULL,                // NormalizeContextCleanup  
 #if FLT_MGR_LONGHORN  
   NULL,                // TransactionNotification  
   PtNormalizeNameComponentExCallback, // NormalizeNameComponentEx  
 #endif // FLT_MGR_LONGHORN  
 };  
 ...  
 NTSTATUS PtGenerateFileNameCallback(  
   __in   PFLT_INSTANCE Instance,  
   __in   PFILE_OBJECT FileObject,  
   __in_opt PFLT_CALLBACK_DATA CallbackData,  
   __in   FLT_FILE_NAME_OPTIONS NameOptions,  
   __out   PBOOLEAN CacheFileNameInformation,  
   __out   PFLT_NAME_CONTROL FileName  
   )  
 {  
   NTSTATUS status = STATUS_SUCCESS;  
   PFLT_FILE_NAME_INFORMATION belowFileName = NULL;  
   PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,  
          ("PassThrough!PtGenerateFileNameCallback: Entered\n") );  
   __try {  

     //
     //  We expect to only get requests for opened and short names.
     //  If we get something else, fail. Please note that it is in
     //  fact possible that if we get a normalized name request the
     //  code would work because it's not really doing anything other 
     //  than calling FltGetFileNameInformation which would handle the
     //  normalized name request just fine. However, in a real name 
     //  provider this might require a different implementation. 
     //

     if (!FlagOn( NameOptions, FLT_FILE_NAME_OPENED ) && 
         !FlagOn( NameOptions, FLT_FILE_NAME_SHORT )) {

         ASSERT(!"we have received a request for an unknown format. investigate!");

         //
         //  Leave the __try block instead of returning from inside it.
         //
         status = STATUS_NOT_SUPPORTED;
         __leave;
     }

     //  
     // First we need to get the file name. We're going to call   
     // FltGetFileNameInformation below us to get the file name from FltMgr.   
     // However, it is possible that we're called by our own minifilter for   
     // the name so in order to avoid an infinite loop we must make sure to   
     // remove the flag that tells FltMgr to query this same minifilter.   
     //  
     ClearFlag( NameOptions, FLT_FILE_NAME_REQUEST_FROM_CURRENT_PROVIDER );  
     //  
     // this will be called for FltGetFileNameInformationUnsafe as well and  
     // in that case we don't have a CallbackData, which changes how we call   
     // into FltMgr.  
     //  
     if (CallbackData == NULL) {  
       //  
       // This must be a call from FltGetFileNameInformationUnsafe.  
       // However, in order to call FltGetFileNameInformationUnsafe the   
       // caller MUST have an open file (assert).  
       //  
       ASSERT( FileObject->FsContext != NULL );  
       status = FltGetFileNameInformationUnsafe( FileObject,  
                            Instance,  
                            NameOptions,  
                            &belowFileName );   
       if (!NT_SUCCESS(status)) {  
         __leave;  
       }                              
     } else {  
       //  
       // We have a callback data, we can just call FltMgr.  
       //  
       status = FltGetFileNameInformation( CallbackData,  
                         NameOptions,  
                         &belowFileName );   
       if (!NT_SUCCESS(status)) {  
         __leave;  
       }                              
     }  
     //  
     // At this point we have a name for the file (the opened name) that   
     // we'd like to return to the caller. We must make sure we have enough   
     // buffer to return the name or we must grow the buffer. This is easy   
     // when using the right FltMgr API.  
     //  
     status = FltCheckAndGrowNameControl( FileName, belowFileName->Name.Length );  
     if (!NT_SUCCESS(status)) {  
       __leave;  
     }  
     //  
     // There is enough buffer, copy the name from our local variable into  
     // the caller provided buffer.  
     //  
     RtlCopyUnicodeString( &FileName->Name, &belowFileName->Name );   
     //  
     // And finally tell the user they can cache this name.  
     //  
     *CacheFileNameInformation = TRUE;  
   } __finally {  
     if ( belowFileName != NULL) {  
       FltReleaseFileNameInformation( belowFileName );        
     }  
   }  
   return status;  
 }  
 NTSTATUS PtNormalizeNameComponentCallback(  
   __in   PFLT_INSTANCE Instance,  
   __in   PCUNICODE_STRING ParentDirectory,  
   __in   USHORT VolumeNameLength,  
   __in   PCUNICODE_STRING Component,  
   __out  PFILE_NAMES_INFORMATION ExpandComponentName,  
   __in   ULONG ExpandComponentNameLength,  
   __in   FLT_NORMALIZE_NAME_FLAGS Flags,  
   __inout PVOID *NormalizationContext  
   )  
 {  
   //  
   // This is just a thin wrapper over PtNormalizeNameComponentExCallback.   
   // Please note that we don't pass in a FILE_OBJECT because we don't   
   // have one.   
   //  
   return PtNormalizeNameComponentExCallback( Instance,  
                         NULL,  
                         ParentDirectory,   
                         VolumeNameLength,   
                         Component,   
                         ExpandComponentName,  
                         ExpandComponentNameLength,  
                         Flags,  
                         NormalizationContext );  
 }  
 NTSTATUS PtNormalizeNameComponentExCallback(  
   __in   PFLT_INSTANCE Instance,  
   __in_opt PFILE_OBJECT FileObject,  
   __in   PCUNICODE_STRING ParentDirectory,  
   __in   USHORT VolumeNameLength,  
   __in   PCUNICODE_STRING Component,  
   __out  PFILE_NAMES_INFORMATION ExpandComponentName,  
   __in   ULONG ExpandComponentNameLength,  
   __in   FLT_NORMALIZE_NAME_FLAGS Flags,  
   __inout PVOID *NormalizationContext  
   )  
 {  
   NTSTATUS status = STATUS_SUCCESS;  
   HANDLE parentDirHandle = NULL;  
   OBJECT_ATTRIBUTES parentDirAttributes;  
   BOOLEAN isDestinationFile;  
   BOOLEAN isCaseSensitive;  
   IO_STATUS_BLOCK ioStatus;  
 #if FLT_MGR_LONGHORN  
   IO_DRIVER_CREATE_CONTEXT driverContext;  
   PTXN_PARAMETER_BLOCK txnParameter = NULL;  
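 #endif // FLT_MGR_LONGHORN  
   //  
   // These parameters are not used by this simple implementation.  
   //  
   UNREFERENCED_PARAMETER( VolumeNameLength );  
   UNREFERENCED_PARAMETER( NormalizationContext );  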
   PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,  
          ("PassThrough!PtNormalizeNameComponentExCallback: Entered\n") );  
   __try {  
     //  
     // Initialize the boolean variables. we only use the case sensitivity  
     // one but we initialize both just to point out that you can tell   
     // whether Component is a "destination" (target of a rename or hardlink  
     // creation operation).  
     //  
     isCaseSensitive = BooleanFlagOn( Flags,   
                      FLTFL_NORMALIZE_NAME_CASE_SENSITIVE );  
     isDestinationFile = BooleanFlagOn( Flags,   
                       FLTFL_NORMALIZE_NAME_DESTINATION_FILE_NAME );  
     //  
     // Open the parent directory for the component we're trying to   
     // normalize. It might need to be a case sensitive operation so we   
     // set that flag if necessary.  
     //  
     InitializeObjectAttributes( &parentDirAttributes,  
                   (PUNICODE_STRING)ParentDirectory,   
                   OBJ_KERNEL_HANDLE | (isCaseSensitive ? 0 : OBJ_CASE_INSENSITIVE ),  
                   NULL,  
                   NULL );  
 #if FLT_MGR_LONGHORN  
     //  
     // In Vista and newer this must be done in the context of the same  
     // transaction the FileObject belongs to.      
     //  
     IoInitializeDriverCreateContext( &driverContext );  
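     //  
     // FileObject is optional (the non-Ex callback passes NULL), so only  
     // look for a transaction when we actually have a file object.  
     //  
     if (FileObject != NULL) {  
       txnParameter = IoGetTransactionParameterBlock( FileObject );  
     }  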
     driverContext.TxnParameters = txnParameter;  
     status = FltCreateFileEx2( gFilterHandle,  
                   Instance,  
                   &parentDirHandle,  
                   NULL,  
                   FILE_LIST_DIRECTORY | SYNCHRONIZE,  
                   &parentDirAttributes,  
                   &ioStatus,  
                   0,  
                   FILE_ATTRIBUTE_NORMAL | FILE_ATTRIBUTE_DIRECTORY,   
                   FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,  
                   FILE_OPEN,  
                   FILE_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT,  
                   NULL,  
                   0,  
                   IO_IGNORE_SHARE_ACCESS_CHECK,  
                   &driverContext );  
 #else // !FLT_MGR_LONGHORN  
     //  
     // preVista we don't care about transactions  
     //  
     status = FltCreateFile( gFilterHandle,  
                 Instance,  
                 &parentDirHandle,  
                 FILE_LIST_DIRECTORY | SYNCHRONIZE,  
                 &parentDirAttributes,  
                 &ioStatus,  
                 0,  
                 FILE_ATTRIBUTE_NORMAL | FILE_ATTRIBUTE_DIRECTORY,   
                 FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,  
                 FILE_OPEN,  
                 FILE_DIRECTORY_FILE | FILE_SYNCHRONOUS_IO_NONALERT,  
                 NULL,  
                 0,  
                 IO_IGNORE_SHARE_ACCESS_CHECK );  
 #endif // FLT_MGR_LONGHORN  
     if (!NT_SUCCESS(status)) {  
       __leave;  
     }  
     //  
     // Now that we have a handle to the parent directory of Component, we  
     // need to query its long name from the file system. We're going to use  
     // ZwQueryDirectoryFile because the handle we have for the directory   
     // was opened with FltCreateFile and so targeting should work just fine.  
     //  
     status = ZwQueryDirectoryFile( parentDirHandle,  
                     NULL,  
                     NULL,  
                     NULL,  
                     &ioStatus,  
                     ExpandComponentName,  
                     ExpandComponentNameLength,  
                     FileNamesInformation,  
                     TRUE,  
                     (PUNICODE_STRING)Component,  
                     TRUE );   
   } __finally {  
     if (parentDirHandle != NULL) {  
       FltClose( parentDirHandle );  
     }  
   }  
   return status;  
 }  
Update 04/12/2011: added check and assert to PtGenerateFileNameCallback.
Update 03/08/2012: see this post.

Thursday, March 24, 2011

Names in Minifilters - Using FltGetTunneledName

I'd like to start a longer set of posts on the topic of names in minifilters and using FltMgr name APIs (as always, feel free to submit suggestions and questions using the comment mechanism). This post is the first in that series and it's a topic that I've seen come up a lot. It's a pretty well documented feature but it still seems to be less known. The main articles on it (that I'm aware of) are:

This topic is very important to minifilter writers that have minifilters that use names. In particular minifilters that call FltGetFileNameInformation with FLT_FILE_NAME_NORMALIZED in preCreate (and there seem to be quite a few of those despite the various issues associated with this design) and in preSetInformation. But to explain why this is an issue we need to look at FltMgr's behavior in preCreate and during IRP_MJ_SET_INFORMATION.
As you're probably aware, FltMgr caches the names it generates every time someone calls FltGetFileNameInformation. Because name generation is pretty expensive, a cache is a very good idea and performance does benefit greatly, especially on systems with multiple minifilters installed (and since Windows now ships with 2 minifilters and most people use anti-virus software, which also uses file system filters, you can imagine that the performance benefits add up). The name cache is stored in a context associated with a stream (but it might be per FILE_OBJECT if there are multiple hardlinks to the file that the stream belongs to). However, in preCreate the stream is not yet opened. The IRP_MJ_CREATE operation must complete in the file system for the stream to be known and so FltMgr can't use the name cache in preCreate. This has two major implications:
  1. Performance will be affected since the name will have to be generated for every single IRP_MJ_CREATE (which is a pretty big reason why one shouldn't query names in preCreate if they can avoid it).
  2. Not only can the name not be looked up in the cache, but it also can't be stored in the cache. So now consider the case of 3 minifilters that all query the name in preCreate. If FltMgr didn’t cache the name at all it would have to build it three times. FltMgr does what it can in this case and it caches the name in an internal cache associated with the IRP_MJ_CREATE operation. Then, once the operation completes, the cache is transparently moved to the stream cache.
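The second point can be illustrated with a small user-mode model. Everything in it is hypothetical (NAME_CACHE, CREATE_OP, QueryNameInPreCreate and CompleteCreate are invented for this sketch and say nothing about FltMgr's real data structures); it only shows the idea of a per-operation cache that is handed over to the stream cache once the create completes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical cache entry: a name plus a "valid" marker. */
typedef struct {
    char name[64];
    int  valid;
} NAME_CACHE;

/* Hypothetical state for one in-flight IRP_MJ_CREATE. */
typedef struct {
    NAME_CACHE perCreate;   /* lives only while the create is in flight */
    int        buildCount;  /* how many times the expensive path ran    */
} CREATE_OP;

/* Called by each minifilter in preCreate. The expensive name generation
   (modeled by copying generatedName) runs only for the first caller. */
const char *QueryNameInPreCreate(CREATE_OP *op, const char *generatedName)
{
    assert(op != NULL && generatedName != NULL);
    if (!op->perCreate.valid) {
        snprintf(op->perCreate.name, sizeof(op->perCreate.name),
                 "%s", generatedName);
        op->perCreate.valid = 1;
        op->buildCount++;
    }
    return op->perCreate.name;
}

/* Called once the file system has completed the create and the stream
   is known: move the per-operation entry into the stream cache. */
void CompleteCreate(CREATE_OP *op, NAME_CACHE *streamCache)
{
    assert(op != NULL && streamCache != NULL);
    if (op->perCreate.valid) {
        *streamCache = op->perCreate;
        op->perCreate.valid = 0;
    }
}
```

In this model three minifilters calling QueryNameInPreCreate for the same operation cause the name to be built only once, and after CompleteCreate the stream cache holds that same name.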

One other thing to consider is that the name generated in preCreate, even if the caller asks for FLT_FILE_NAME_NORMALIZED, might still contain a short name. This can happen when the IRP_MJ_CREATE is trying to create a new file (that doesn't exist yet) specifying only the short name. FltMgr tries to normalize the name and it gets a normalized path for the parent directory of that file, but the file itself is not in the directory and there is nothing FltMgr can do except return the short name to the caller for that file (the final component). This is in fact where the Name Tunneling Cache comes into play. In this case, when the create reaches the file system, if the file system finds the short file name in the tunnel cache it will create the file with the long name from the cache as well. But as we've explained before, the minifilter might have a normalized name from the preCreate which no longer matches the file.
In order to show when this happens (and to show some code using this function), I wrote a small modification on top of the passthrough minifilter sample from the WDK:

 FLT_PREOP_CALLBACK_STATUS  
 PtPreOperationPassThrough (  
   __inout PFLT_CALLBACK_DATA Data,  
   __in PCFLT_RELATED_OBJECTS FltObjects,  
   __deref_out_opt PVOID *CompletionContext  
   )  
 /*++  
 Routine Description:  
   This routine is the main pre-operation dispatch routine for this  
   miniFilter. Since this is just a simple passThrough miniFilter it  
   does not do anything with the callbackData but rather return  
   FLT_PREOP_SUCCESS_WITH_CALLBACK thereby passing it down to the next  
   miniFilter in the chain.  
   This is non-pageable because it could be called on the paging path  
 Arguments:  
   Data - Pointer to the filter callbackData that is passed to us.  
   FltObjects - Pointer to the FLT_RELATED_OBJECTS data structure containing  
     opaque handles to this filter, instance, its associated volume and  
     file object.  
   CompletionContext - The context for the completion routine for this  
     operation.  
 Return Value:  
   The return value is the status of the operation.  
 --*/  
 {  
   NTSTATUS status;  
   PFLT_FILE_NAME_INFORMATION name;  
   UNREFERENCED_PARAMETER( FltObjects );  
   PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,  
          ("PassThrough!PtPreOperationPassThrough: Entered\n") );  
   if (Data->Iopb->MajorFunction == IRP_MJ_CREATE) {  
     //  
     // this is a preCreate, get the name.  
     //  
     status = FltGetFileNameInformation( Data,   
                       FLT_FILE_NAME_NORMALIZED | FLT_FILE_NAME_QUERY_DEFAULT,   
                       &name );  
     if (NT_SUCCESS(status)) {  
       //  
       // send it to postCreate  
       //  
       *CompletionContext = (PVOID) name;  
     } else {  
       PT_DBG_PRINT( PTDBG_TRACE_FILE_NAME_FAILURES,  
              ("PassThrough!PtPreOperationPassThrough: Failed to get file name, status=%08x\n",  
               status) );  
     }  
   }  
   //  
   // See if this is an operation we would like the operation status  
   // for. If so request it.  
   //  
   // NOTE: most filters do NOT need to do this. You only need to make  
   //    this call if, for example, you need to know if the oplock was  
   //    actually granted.  
   //  
   if (PtDoRequestOperationStatus( Data )) {  
     status = FltRequestOperationStatusCallback( Data,  
                           PtOperationStatusCallback,  
                           (PVOID)(++OperationStatusCtx) );  
     if (!NT_SUCCESS(status)) {  
       PT_DBG_PRINT( PTDBG_TRACE_OPERATION_STATUS,  
              ("PassThrough!PtPreOperationPassThrough: FltRequestOperationStatusCallback Failed, status=%08x\n",  
               status) );  
     }  
   }  
   return FLT_PREOP_SUCCESS_WITH_CALLBACK;  
 }  
 FLT_POSTOP_CALLBACK_STATUS  
 PtPostOperationPassThrough (  
   __inout PFLT_CALLBACK_DATA Data,  
   __in PCFLT_RELATED_OBJECTS FltObjects,  
   __in_opt PVOID CompletionContext,  
   __in FLT_POST_OPERATION_FLAGS Flags  
   )  
 /*++  
 Routine Description:  
   This routine is the post-operation completion routine for this  
   miniFilter.  
   This is non-pageable because it may be called at DPC level.  
 Arguments:  
   Data - Pointer to the filter callbackData that is passed to us.  
   FltObjects - Pointer to the FLT_RELATED_OBJECTS data structure containing  
     opaque handles to this filter, instance, its associated volume and  
     file object.  
   CompletionContext - The completion context set in the pre-operation routine.  
   Flags - Denotes whether the completion is successful or is being drained.  
 Return Value:  
   The return value is the status of the operation.  
 --*/  
 {  
   NTSTATUS status;  
   PFLT_FILE_NAME_INFORMATION name = NULL;  
   PFLT_FILE_NAME_INFORMATION realName = NULL;  
   UNREFERENCED_PARAMETER( FltObjects );  
   UNREFERENCED_PARAMETER( Flags );  
   PT_DBG_PRINT( PTDBG_TRACE_ROUTINES,  
          ("PassThrough!PtPostOperationPassThrough: Entered\n") );  
   name = (PFLT_FILE_NAME_INFORMATION)CompletionContext;  
   if (name != NULL) {  
     //  
     // we got a name from preCreate. check if it's a tunneled name.  
     //  
     status = FltGetTunneledName( Data,   
                    name,  
                    &realName );  
     if (NT_SUCCESS(status)) {  
       //  
       // see if we actually got a tunneled name.  
       //  
       if (realName != NULL) {  
         DbgPrint( "Got a tunneled name: File:%p, original name: %wZ, real name:%wZ\n",  
              FltObjects->FileObject,  
              &name->Name,  
              &realName->Name );  
       }  
     }  
     FltReleaseFileNameInformation( name );  
     if (realName != NULL) {  
       FltReleaseFileNameInformation( realName );  
     }  
   }  
   return FLT_POSTOP_FINISHED_PROCESSING;  
 }  
To recap, you need to care about this only if your minifilter queries normalized names in preCreate or preSetInformation and then uses them in the postOp callback. Also, you might need to worry about this if you keep a normalized name in a context. However, this is only a concern for renames (you can't use a context in preCreate) and for minifilters that keep a name in a context the easiest way is to update it in postRename and not do anything in preRename.

Thursday, March 17, 2011

How File System Filters Attach to Volumes - Part II

And now it's time to take a look at how minifilters fit in the picture. One of the first things to note is that the minifilter model does away completely with CDOs. The only objects that a minifilter interacts with are FLT_VOLUMEs, which are equivalent to VDOs. This is not normally a problem because most operations that are sent to the CDO are not really relevant to file system filters. However, IRP_MN_MOUNT_VOLUME is the one operation minifilters might be interested in, because some minifilters might want to block mounting of a volume. This is where IRP_MJ_VOLUME_MOUNT comes in. This is a virtual IRP (there isn't such an IRP function in the IO manager) that is only relevant to minifilters. When an IRP_MN_MOUNT_VOLUME is received by FltMgr on the CDO for a file system, it will create a request with the type IRP_MJ_VOLUME_MOUNT and send it to minifilters that have registered for that notification. Please note that there are some limitations on what the minifilter can do at this time. The intention is that a minifilter uses this notification only if it wants to block a volume mount, and it must figure out whether this volume needs to be blocked or not without much help from the file system. Blocking a mount means that the volume will not be mounted at all on the system; it is not the mechanism a minifilter uses to tell FltMgr that it doesn't want to be attached to that volume (which is handled in the InstanceSetupCallback). Here are some things that are special about this operation:
  • The minifilter sees this request before the volume is mounted by the file system (in fact at this point the file system doesn't know anything about the volume, the IRP_MN_MOUNT_VOLUME would be the first notification the file system receives about that volume), so it really can't do any file system operation on that volume.
  • Only the Filter and the Volume members of the FLT_RELATED_OBJECTS structure are set up.
  • The IO manager has locks held at this point and any file system IO to the same volume or even a different volume that might not be mounted at this time might deadlock. Block level IO to the volume should work though.
  • This is listed as a FAST_IO operation, though in fact it's not. Don't return FLT_PREOP_DISALLOW_FASTIO to it. Don't return FLT_PREOP_PENDING either. You must either return FLT_PREOP_SUCCESS_WITH_CALLBACK, FLT_PREOP_SUCCESS_NO_CALLBACK or FLT_PREOP_COMPLETE (if you want to prevent the mount from reaching the file system CDO, in which case the status must not be STATUS_SUCCESS).
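The return-value rule in the last bullet can be written down as a small user-mode model. This is only a sketch: the `preop_result` enum, the `MODEL_STATUS_*` constants and `premount_result_is_valid` are hypothetical names made up for illustration (they echo the WDK names but are not the real types), and the only rule encoded is the one stated above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FltMgr pre-operation return values. The
   names echo the WDK ones, but this enum is a model, not the real type. */
typedef enum {
    PREOP_SUCCESS_WITH_CALLBACK,
    PREOP_SUCCESS_NO_CALLBACK,
    PREOP_PENDING,
    PREOP_DISALLOW_FASTIO,
    PREOP_COMPLETE
} preop_result;

/* Model status codes (assumption: only these two are needed here). */
#define MODEL_STATUS_SUCCESS       INT32_C(0)
#define MODEL_STATUS_ACCESS_DENIED INT32_C(-1073741790) /* 0xC0000022 */

/* Returns true if a pre-mount callback may return `result` while leaving
   `io_status` as the status of the request, per the rule in the text. */
bool premount_result_is_valid(preop_result result, int32_t io_status)
{
    switch (result) {
    case PREOP_SUCCESS_WITH_CALLBACK:
    case PREOP_SUCCESS_NO_CALLBACK:
        return true;                 /* let the mount continue */
    case PREOP_COMPLETE:
        /* Blocking the mount: the status must not be STATUS_SUCCESS. */
        return io_status != MODEL_STATUS_SUCCESS;
    default:
        /* PENDING and DISALLOW_FASTIO are not allowed for this operation. */
        return false;
    }
}
```

In a real minifilter the same check would live in the pre-operation callback registered for IRP_MJ_VOLUME_MOUNT; the model only captures which combinations are acceptable.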
This is what it looks like when the request reaches a minifilter (the passthrough sample in my case). Things to note are the fields in FltObjects and the fact that the volume isn't initialized yet (I've highlighted them):
1: kd> kn
 # ChildEBP RetAddr  
00 9960b8c4 9604319a PassThrough!PtPreOperationPassThrough+0x3c [c:\temp\passthrough\passthrough.c @ 675]
01 9960b930 960489ec fltmgr!FltpPerformPreMountCallbacks+0x1d0
02 9960b998 96048c5b fltmgr!FltpFsControlMountVolume+0x116
03 9960b9c8 828454bc fltmgr!FltpFsControl+0x5b
04 9960b9e0 829c102d nt!IofCallDriver+0x63
05 9960ba44 828a5424 nt!IopMountVolume+0x1d8
06 9960ba7c 82a48f9f nt!IopCheckVpbMounted+0x64
07 9960bb60 82a2a26b nt!IopParseDevice+0x7c9
08 9960bbdc 82a502d9 nt!ObpLookupObjectName+0x4fa
09 9960bc38 82a4862b nt!ObOpenObjectByName+0x165
0a 9960bcb4 82a53f42 nt!IopCreateFile+0x673
0b 9960bd00 8284c44a nt!NtCreateFile+0x34
0c 9960bd00 778464f4 nt!KiFastCallEntry+0x12a
1: kd> ?? FltObjects
struct _FLT_RELATED_OBJECTS * 0x9960b8e8
   +0x000 Size             : 0x18
   +0x002 TransactionContext : 0
   +0x004 Filter           : 0x9299d678 _FLT_FILTER
   +0x008 Volume           : 0x9297fad8 _FLT_VOLUME
   +0x00c Instance         : (null) 
   +0x010 FileObject       : (null) 
   +0x014 Transaction      : (null) 
1: kd> !fltkd.volume 0x9297fad8 
FLT_VOLUME: 9297fad8 "\Device\Harddisk0\DR0"
   FLT_OBJECT: 9297fad8  [04000000] Volume
      RundownRef               : 0x00000002 (1)
      PointerCount             : 0x00000001 
      PrimaryLink              : [92cdaa74-92cdaa74] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000008] Mounting
   FileSystemType           : [00000001] FLT_FSTYPE_RAW
   VolumeLink               : [92cdaa74-92cdaa74] 
   DeviceObject             : 926b16d8 
   DiskDeviceObject         : 92f036e8 
   FrameZeroVolume          : 00000000 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Device\RawDisk" 
   CDODriverName            : "\FileSystem\RAW" 
   TargetedOpenCount        : 0 
   Callbacks                : (9297fb6c)
   ContextLock              : (9297fdc4)
   VolumeContexts           : (9297fdc8)  Count=0
   StreamListCtrls          : (9297fdcc)  rCount=0 
   FileListCtrls            : (9297fe10)  rCount=0 
   NameCacheCtrl            : (9297fe58)
   InstanceList             : (9297fb28)
The next thing to talk about is the InstanceSetupCallback. This is the callback that gets called when a new instance gets created. This callback allows the minifilter to decide whether it needs to attach to the volume and to set up its internal state for that volume. The interesting thing about this is understanding when it gets called. One important factor in the decision is that the minifilter needs to be able to perform operations on the file system in its InstanceSetupCallback (like opening a file for example; see the MetadataManager minifilter sample in the WDK for a minifilter that does that). Let's look at some of the factors that would impact the decision about when the notification needs to be called:
  • The IO manager needs to be able to process operations on the volume. This means that the mount must be completed, because FltCreateFile (and most other operations) would go to the IO manager and if the IO manager doesn't know that the mount is completed it will block the operation behind the mount. So clearly, the InstanceSetupCallback could not have been called anywhere during IRP_MN_MOUNT_VOLUME processing.
  • The InstanceSetupCallback must be called before ANY other callback is sent to the minifilter on that volume, which means it can't be asynchronous with other operations because for asynchronous operations, the order in which they reach various layers is impossible to guarantee. So all operations above a certain layer must be blocked until InstanceSetupCallback for that layer is completed.
  • Because minifilters need to be able to perform IO on the file system in their InstanceSetupCallback, the filters below them need to see that operation (since minifilters should be able to filter all operations on a volume). This means that the InstanceSetupCallback must have already been called for all the minifilters below (otherwise they wouldn't be able to process operations).
So when considering all these factors we arrive at the current implementation. When any operation reaches FltMgr on a volume, if InstanceSetupCallbacks have not been called yet, FltMgr will block that operation and call the InstanceSetupCallbacks for all the minifilters, starting from the lowest one and going up (where up means higher altitude numbers). After all the InstanceSetupCallbacks are complete FltMgr will release the lock and IO can proceed normally. Let's take a look in the debugger and see these effects. Things to note are how we're still in the context of the NtCreateFile request where we were before. Also, please note that the FileInfo minifilter is below PassThrough and it's already set up:
1: kd> kn
 # ChildEBP RetAddr  
00 9960b880 96049bf5 PassThrough!PtInstanceSetup [c:\temp\passthrough\passthrough.c @ 393]
01 9960b8b4 9604a417 fltmgr!FltpDoInstanceSetupNotification+0x69
02 9960b900 9604a7d1 fltmgr!FltpInitInstance+0x25d
03 9960b970 9604a8d7 fltmgr!FltpCreateInstanceFromName+0x285
04 9960b9dc 96053cde fltmgr!FltpEnumerateRegistryInstances+0xf9
05 9960ba2c 960487f4 fltmgr!FltpDoFilterNotificationForNewVolume+0xe0
06 9960ba70 828454bc fltmgr!FltpCreate+0x206
07 9960ba88 82a496ad nt!IofCallDriver+0x63
08 9960bb60 82a2a26b nt!IopParseDevice+0xed7
09 9960bbdc 82a502d9 nt!ObpLookupObjectName+0x4fa
0a 9960bc38 82a4862b nt!ObOpenObjectByName+0x165
0b 9960bcb4 82a53f42 nt!IopCreateFile+0x673
0c 9960bd00 8284c44a nt!NtCreateFile+0x34
1: kd> !fltkd.volume 0x9297fad8 
FLT_VOLUME: 9297fad8 "\Device\Harddisk0\DR0"
   FLT_OBJECT: 9297fad8  [04000000] Volume
      RundownRef               : 0x00000006 (3)
      PointerCount             : 0x00000001 
      PrimaryLink              : [92cdaa68-924ea7f4] 
   Frame                    : 92cda9c8 "Frame 0" 
   Flags                    : [00000066] PendingSetupNotify SetupNotifyCalled EnableNameCaching FilterAttached
   FileSystemType           : [00000001] FLT_FSTYPE_RAW
   VolumeLink               : [92cdaa68-924ea7f4] 
   DeviceObject             : 926b16d8 
   DiskDeviceObject         : 92f036e8 
   FrameZeroVolume          : 9297fad8 
   VolumeInNextFrame        : 00000000 
   Guid                     : "" 
   CDODeviceName            : "\Device\RawDisk" 
   CDODriverName            : "\FileSystem\RAW" 
   TargetedOpenCount        : 0 
   Callbacks                : (9297fb6c)
   ContextLock              : (9297fdc4)
   VolumeContexts           : (9297fdc8)  Count=0
   StreamListCtrls          : (9297fdcc)  rCount=0 
   FileListCtrls            : (9297fe10)  rCount=0 
   NameCacheCtrl            : (9297fe58)
   InstanceList             : (9297fb28)
      FLT_INSTANCE: 92979b40 "PassThrough Instance" "370030"
      FLT_INSTANCE: 923f2dc8 "FileInfo" "45000"
Before we go on I'd like to recap the sequence of events during a mount:
  1. A request to open a file (that ultimately arrives in IopCreateFile) is sent to a volume that is not mounted yet.
  2. During the IopParseDevice call the IO manager discovers that the volume isn't mounted so it tries to mount it. It does this behind a lock so multiple requests would be queued here until the mount completes.
  3. The IO manager sends the IRP_MJ_FILE_SYSTEM_CONTROL with IRP_MN_MOUNT_VOLUME request to the CDO of each file system.
  4. FltMgr is attached to each CDO and so it gets this request and sends the IRP_MJ_VOLUME_MOUNT request to minifilters.
  5. If no minifilters blocked the request, FltMgr sends the request to the FS CDO below.
  6. The FS creates a VDO and the volume is mounted.
  7. The IO manager knows the volume is mounted and it releases all the operations blocked behind that volume mount.
  8. FltMgr gets all these operations and it discovers that the topmost instance on the volume hasn't been initialized yet, so it blocks all the operations behind a lock again.
  9. It then calls InstanceSetupCallback for the lowest minifilter in the stack, then for the one above it, and then for the one above that one and so on. Please note that this notification happens in the context of whichever thread happens to win the race for the lock, so if there is more than one thread trying to perform an operation on a volume, it's possible that the IRP_MJ_VOLUME_MOUNT callback is called in the context of one thread and InstanceSetupCallback in the context of a different one.
  10. Once all the instances have been set up, FltMgr allows all operations to continue and the initialization of the volume is now complete.
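The ordering in step 9 (lowest instance first, then upward by altitude) can be illustrated with a tiny user-mode model. This is a sketch under assumptions: `instance_model` and `order_for_instance_setup` are hypothetical names, and altitudes are parsed with `strtod` for simplicity even though real altitudes are decimal strings that FltMgr compares in its own way.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of an instance as listed by !fltkd.volume. */
typedef struct {
    const char *name;
    const char *altitude;   /* decimal string, e.g. "370030" */
} instance_model;

/* qsort comparator: lower altitude sorts first. */
static int by_altitude_ascending(const void *a, const void *b)
{
    double x = strtod(((const instance_model *)a)->altitude, NULL);
    double y = strtod(((const instance_model *)b)->altitude, NULL);
    return (x > y) - (x < y);
}

/* Reorders the array into the order in which the setup callbacks would be
   invoked in this model: from the lowest altitude to the highest. */
void order_for_instance_setup(instance_model *instances, size_t count)
{
    assert(instances != NULL || count == 0);
    qsort(instances, count, sizeof instances[0], by_altitude_ascending);
}
```

With the two instances from the debugger output above, FileInfo (45000) comes before PassThrough Instance (370030), which matches the observation that FileInfo was already set up when PtInstanceSetup was called.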
Finally I'd like to talk about a couple of deadlocks that I've seen and some design decisions to avoid.
  • One interesting deadlock happened with a minifilter that blocked preCreate and called a user mode service to scan the file (like an anti-virus). When another minifilter above that one tried to create a file in its InstanceSetupCallback (it actually was the MetadataManager sample), this minifilter blocked that create and sent it to the user mode service. The user mode service tried to open the file to scan it but it was blocked in FltMgr because the instance setup phase wasn't complete so all top level IO was blocked. Alternative approaches that would have avoided that deadlock would have been to scan in postCreate (which is what most such filters do) or to use a private communication channel with the user mode service to ensure that all IO issued by the user mode service is layered properly.
  • Another interesting deadlock can happen with the registry. As you can tell from the stack above, FltMgr needs to read minifilter configuration information from the registry (see that call to fltmgr!FltpEnumerateRegistryInstances). However, the registry has some very complicated locking rules and so if it happens that the registry is locked when FltMgr needs to read its configuration, FltMgr will wait for it. In one case I've seen, a driver (not a minifilter) was calling ZwLoadKey() for a file on a different volume from the system volume. The volume wasn't mounted so inside the ZwLoadKey() call the registry would acquire a lock, try to open the file, which resulted in a mount and then FltMgr tried to check the registry for any minifilter instances and it got blocked behind the registry lock. One possible solution in this case would be to make sure that the volume is mounted before calling ZwLoadKey(). Please note that this might happen in many cases: any operation that ties a registry operation to a file system operation can potentially deadlock. For example, a registry filter that tries to log operations to a file might also cause the same deadlock if the volume containing the log file hasn't been mounted yet.
  • Another pretty well known deadlock happens in InstanceSetup with the MountMgr. It is described in great detail here http://www.osronline.com/showThread.cfm?link=90003, so I won't repeat it here. This should be fixed in Vista and Win7.
I hope this post has been useful in explaining how instances get created on a volume and how they might sometimes deadlock and what to look for when such deadlocks occur.

Thursday, March 10, 2011

How File System Filters Attach to Volumes - Part I

I want to talk a bit about how FltMgr attaches to volumes and how instances are created when a new volume arrives. I want to use that as the basis to talk about what minifilters can do in their InstanceSetup callback. This should also explain some possible deadlocks in that path and emphasize the point that doing things in postCreate is preferable to preCreate. I also want to talk about IRP_MJ_VOLUME_MOUNT and how it works and why it's there. I was going to write just one post but it's too long already and I'm not done so I'll split it in a couple of posts...

I'll start with a refresher on how file systems mount volumes and how legacy file system filters attach to file systems. When a file system driver is initialized it creates what is called a Control Device Object (CDO). It can create more than one of those (look at the FastFat WDK sample for an example of a file system creating more than one CDO). The reason the file system needs to do that is that it must register a device with the IO manager when it tells it that it is a file system (by calling IoRegisterFileSystem and passing in the CDO(s)). Please note that this mechanism predates PNP and as you can see it is very different. These CDOs are named device objects and their purpose is to receive commands for the file system. One such command is the IRP_MJ_FILE_SYSTEM_CONTROL with the IRP_MN_MOUNT_VOLUME minor code (which I'll just refer to as IRP_MN_MOUNT_VOLUME from now on since IRP_MN_MOUNT_VOLUME is only delivered through an IRP_MJ_FILE_SYSTEM_CONTROL and there is no possibility of confusion), which is sent by the IO manager when it wants to mount a volume. One possible sequence of operations is this:

  1. Volume DEVICE_OBJECT is created, usually by the volume manager, with a name like "\Device\HarddiskVolume2".
  2. The volume manager alerts the system of the arrival of the volume by calling IoRegisterDeviceInterface() with the MOUNTDEV_MOUNTED_DEVICE_GUID or GUID_DEVINTERFACE_VOLUME (which are in fact the same GUID). This alerts MountMgr that a volume has arrived.
  3. MountMgr queries the volume for the name and sets up the NT volume name (which looks like "\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}") and the DOS volume name (which might look like "C:"). Both these names point to the volume device ("\Device\HarddiskVolume2").
  4. At this point the volume is not mounted and it has a VPB structure associated with it that keeps track of that.
  5. After a while someone issues an operation to the volume (like trying to open "C:\foo.txt", or "\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}\foo.txt" or "\Device\HarddiskVolume2\foo.txt", which are different names for the same thing). While trying to issue the IRP_MJ_CREATE, IO manager will check if the volume is mounted and if not it will mount it (nt!IopCheckVpbMounted). See my post "About IRP_MJ_CREATE and minifilter design considerations - Part II" and look at step 2 in my steps for nt!IopParseDevice.
  6. If the volume is not mounted in nt!IopCheckVpbMounted then IO mgr calls nt!IopMountVolume which walks through the registered file systems for that device type (hence the need for more than one CDO) and sends the IRP_MN_MOUNT_VOLUME request to each of the devices on the list of registered file systems (which is a list of CDOs).
  7. When a file system receives an IRP_MN_MOUNT_VOLUME it checks whether it can mount the volume (reads some sectors and does whatever it needs to do to figure out whether it is its volume) and then it creates a new DEVICE_OBJECT (anonymous this time) which is called a Volume Device Object (VDO), which is linked through the VPB to the actual volume DEVICE_OBJECT (the one that has a name and a drive letter).
  8. Once nt!IopCheckVpbMounted completes and a volume is mounted nt!IopParseDevice continues and an IRP_MJ_CREATE is sent to the newly mounted volume, which is the first operation that the file system processes on that VDO.
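Steps 6 and 7 can be pictured with a small user-mode model of the walk over the registered file systems. This is a sketch, not what nt!IopMountVolume actually does internally: `cdo_model`, `mount_volume_model` and the two recognizer functions are hypothetical, and the model assumes the walk stops at the first file system that accepts the volume. The NTFS-like recognizer only looks for the "NTFS    " OEM ID at offset 3 of the boot sector, and the RAW-like one accepts anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a registered file system CDO: a name plus a
   function that decides whether the volume belongs to that file system. */
typedef struct {
    const char *driver_name;
    bool (*recognizes)(const unsigned char *boot_sector);
} cdo_model;

/* Walks the list in registration order and returns the index of the first
   file system that accepts the volume, or -1 if none does. */
int mount_volume_model(const cdo_model *registered, size_t count,
                       const unsigned char *boot_sector)
{
    assert(registered != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (registered[i].recognizes(boot_sector)) {
            return (int)i;   /* this file system would create the VDO */
        }
    }
    return -1;               /* nobody recognized the volume */
}

/* Illustrative recognizer: checks the OEM ID field of the boot sector. */
bool ntfs_like_recognizes(const unsigned char *boot_sector)
{
    return memcmp(boot_sector + 3, "NTFS    ", 8) == 0;
}

/* Illustrative recognizer: a RAW-like file system accepts any volume. */
bool raw_like_recognizes(const unsigned char *boot_sector)
{
    (void)boot_sector;
    return true;
}
```

A volume with no recognizable on-disk structure falls through to the RAW-like entry in this model, which is consistent with the earlier debugger output showing a volume whose FileSystemType is FLT_FSTYPE_RAW.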
Another way to look at this is that the CDO device functions as a factory for file system instances, and the IRP_MN_MOUNT_VOLUME is a request for the factory to generate an instance associated with the storage volume DEVICE_OBJECT, which will either fail if the file system doesn't recognize the volume or will return the file system VDO, which is the file system instance for that volume. Here is some debugger output to illustrate all this. In order to generate all this I took a 32bit Win7 and rebooted it and put a breakpoint on nt!IopMountVolume (that's why NTFS has no volumes and just a CDO). I'm showing it mainly to showcase some more windbg commands that are useful when debugging file systems:
This is NTFS initialized, with just one DEVICE_OBJECT, the CDO. Also please note how the CDO is a named device:
0: kd> !drvobj NTFS
Driver object (924d5758) is for:
 \FileSystem\Ntfs
Driver Extension List: (id , addr)

Device Object list:
93215638  
0: kd> !devobj 93215638  
Device object (93215638) is for:
 Ntfs \FileSystem\Ntfs DriverObject 924d5758
Current Irp 00000000 RefCount 1 Type 00000008 Flags 00000040
Dacl 973af50c DevExt 00000000 DevObjExt 932156f0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 93211020 \FileSystem\FltMgr
Device queue is not busy.
This is what the stack looks like when IopMountVolume is called. Please note that volsnap is opening a file on a volume. Also, note how the DeviceObject member of the VPB is null (since no file system is mounted on the volume), and the VPB flags are also all clear:
0: kd> kb
ChildEBP RetAddr  Args to Child              
984b18cc 828ad424 934cd768 924d7a00 00000000 nt!IopMountVolume
984b1904 82a50f9f 924d7a48 984b1a30 984b19c8 nt!IopCheckVpbMounted+0x64
984b19e8 82a3226b 934cd768 844d6f78 924f55e8 nt!IopParseDevice+0x7c9
984b1a64 82a582d9 00000000 984b1ab8 00000240 nt!ObpLookupObjectName+0x4fa
984b1ac4 82a5062b 984b1c44 924d6f78 93b15900 nt!ObOpenObjectByName+0x165
984b1b40 82a8b67e 984b1c90 00120089 984b1c44 nt!IopCreateFile+0x673
984b1b88 8285444a 984b1c90 00120089 984b1c44 nt!NtOpenFile+0x2a
984b1b88 828527c1 984b1c90 00120089 984b1c44 nt!KiFastCallEntry+0x12a
984b1c18 969b0414 984b1c90 00120089 984b1c44 nt!ZwOpenFile+0x11
984b1c94 969b9194 934d60d8 00000000 00000000 volsnap!VspOpenControlBlockFile+0x108
984b1d1c 969b9eea 934d60d8 935775ac 934c78bc volsnap!VspOpenFilesAndValidateSnapshots+0x2e
984b1d34 969a5e59 935775a8 00000000 93500020 volsnap!VspSetIgnorableBlocksInBitmapWorker+0x40
984b1d50 82a1f6d3 934c79ac 432d39b1 00000000 volsnap!VspWorkerThread+0x83
984b1d90 828d10f9 969a5dd6 934cd6a0 00000000 nt!PspSystemThreadStartup+0x9e
00000000 00000000 00000000 00000000 00000000 nt!KiThreadStartup+0x19
0: kd> !obja 984b1c44 
Obja +984b1c44 at 984b1c44:
 Name is \Device\HarddiskVolume2\System Volume Information\{3808876b-c176-4e48-b7ae-04046e6cc752}
 OBJ_CASE_INSENSITIVE
0: kd> !devobj \Device\HarddiskVolume2
Device object (934cd768) is for:
 HarddiskVolume2 \Driver\volmgr DriverObject 93b00388
Current Irp 00000000 RefCount 1 Type 00000007 Flags 00003150
Vpb 934cb290 Dacl 973af50c DevExt 934cd820 DevObjExt 934cd908 Dope 934cab20 DevNode 934cfc48 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 934d0b70 \Driver\fvevol
Device queue is not busy.
0: kd> !vpb 934cb290 
Vpb at 0x934cb290
Flags: 0x0 
DeviceObject: 0x00000000
RealDevice:   0x934cd768
RefCount: 0
Volume Label: 
Next we're going to step out of this function and look at the objects again. There is a new, anonymous DEVICE_OBJECT that NTFS created, which is pointed to by VPB->DeviceObject, and the VPB flags have changed to indicate that the volume is mounted.
1: kd> gu
nt!IopCheckVpbMounted+0x64:
828ad424 8b4d10          mov     ecx,dword ptr [ebp+10h]
0: kd> gu
nt!IopParseDevice+0x7c9:
82a50f9f 8945c4          mov     dword ptr [ebp-3Ch],eax
0: kd> !drvobj NTFS
Driver object (924d5758) is for:
 \FileSystem\Ntfs
Driver Extension List: (id , addr)

Device Object list:
93690020  93215638  
0: kd> !devobj 93690020  
Device object (93690020) is for:
  \FileSystem\Ntfs DriverObject 924d5758
Current Irp 00000000 RefCount 0 Type 00000008 Flags 00040000
DevExt 936900d8 DevObjExt 93690fb0 
ExtensionFlags (0x00000800)  
                             Unknown flags 0x00000800
AttachedDevice (Upper) 93566c08 \FileSystem\FltMgr
Device queue is not busy.
0: kd> !vpb 934cb290 
Vpb at 0x934cb290
Flags: 0x1 mounted 
DeviceObject: 0x93690020
RealDevice:   0x934cd768
RefCount: 15
Volume Label: 
Filters have largely been out of the picture so far (except for the fact that FltMgr was attached both to NTFS' CDO and the newly created VDO). So let's talk about how legacy filters (FltMgr being a legacy filter) enter this picture. When NTFS calls IoRegisterFileSystem, FltMgr creates and attaches a DEVICE_OBJECT of its own on top of NTFS. So FltMgr will have a device attached to all CDOs. Then, when an IRP_MN_MOUNT_VOLUME request arrives on that CDO, FltMgr creates a new DEVICE_OBJECT (that will be attached to the VDO created by the file system if the mount is successful or discarded if the mount is not successful) and then it simply passes the IRP_MN_MOUNT_VOLUME request below. Please note that FltMgr can't know in advance if the file system will actually mount the volume or not, so it must wait until the IRP_MN_MOUNT_VOLUME is completed to do more significant work. However, if it waited for the completion of IRP_MN_MOUNT_VOLUME before allocating the new DEVICE_OBJECT, it might end up in the position where the mount was successful but allocating the new DEVICE_OBJECT failed so it wouldn't be able to attach to the volume. The only reason I'm mentioning this is to illustrate that the safe approach when filtering something is to pre-allocate all resources that might be necessary (and perform all checks) before the operation is sent to the layer below (and if anything fails then fail the operation), because if the layer below successfully completes the operation the filter must not fail in processing it or it might end up in a broken state. Alternatively it might have to undo the operation performed at the underlying layer, which might not be easy or even possible.
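The "pre-allocate, then pass down" rule from the paragraph above can be sketched as a user-mode model. Everything here is hypothetical and for illustration only: `filter_device_model`, `filter_mount_model`, the `alloc_ok` switch used to simulate an allocation failure, and the two stand-in lower layers. It is not FltMgr's code, just the shape of the pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the filter's own device object. */
typedef struct {
    int attached_to;          /* identifier of the VDO we attached to */
} filter_device_model;

/* Hypothetical lower layer: returns true and fills in the VDO on success. */
typedef bool (*lower_mount_fn)(int *vdo_out);

int lower_calls = 0;          /* how many times the layer below was called */

bool lower_mount_ok(int *vdo_out)   { lower_calls++; *vdo_out = 42; return true; }
bool lower_mount_fail(int *vdo_out) { lower_calls++; (void)vdo_out; return false; }

/* Returns the attached filter device, or NULL if the mount did not happen.
   alloc_ok == false simulates running out of memory. */
filter_device_model *filter_mount_model(bool alloc_ok, lower_mount_fn lower)
{
    assert(lower != NULL);

    /* 1. Pre-allocate everything before the layer below sees the request. */
    filter_device_model *dev = alloc_ok ? malloc(sizeof *dev) : NULL;
    if (dev == NULL) {
        return NULL;          /* fail early; nothing below has changed */
    }

    /* 2. Pass the request down. */
    int vdo = 0;
    if (!lower(&vdo)) {
        free(dev);            /* mount failed: discard the spare device */
        return NULL;
    }

    /* 3. Post-processing only uses resources we already hold. */
    dev->attached_to = vdo;
    return dev;
}
```

The point of the ordering is visible in the failure cases: when the allocation fails the lower layer is never called, and when the lower layer fails the only cleanup needed is freeing the spare allocation.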
The key things to remember from this post are:
  • The drive letter (DOS name) and other volume names (NT name) are not associated with the file system device, but rather with the storage volume.
  • Mounting the volume happens on first access to that volume.
  • Also, the first IO on a volume is an IRP_MJ_CREATE, so for a filter (both legacy and minifilter) the preCreate callback will be the first operation callback called on a newly mounted file system volume.

Thursday, March 3, 2011

Duplicating User Mode Handles

Among the many new verifier checks in Win7 is a particular one about using user handles in kernel mode. I won't go into the details of why that is potentially bad and instead I'll focus on how to work around the issue. However I'd like to point to some documentation explaining the improvements in Driver Verifier in Win7 in general and this check in particular. There is a PPT "Driver Verifier Advancements In Windows 7" that is pretty good for a high-level view and there is also a more detailed document "Driver Verifier in Windows 7". The verifier bugcheck message in this case is this:

DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught.  This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 000000f6, Referencing user handle as KernelMode.
Arg2: xxxxxxxx, Handle value being referenced.
Arg3: xxxxxxxx, Address of the current process.
Arg4: xxxxxxxx, Address inside the driver that is performing the incorrect reference.
Before we go into more detail I'd also like to explain what I was trying to do in one case where I ran into this issue. I was writing a driver that had a user mode command line utility that was used to send commands to the driver. One of these commands required sending a user mode file to the driver so it could write logging information into that file. One approach could be to pass in the name of the file and use the driver to open the file, but this is not trivial for various reasons:

  • How to get the file name? The user might call the command line utility with a relative path (like "foo.exe -file ..\bar.txt") and so I need to figure out either the full path or to send the current directory path to the driver (yuck!).
  • Even using a full path wouldn't be enough because the drive letter might be different depending on the session the user is in. Besides, who knows what a path really points to? Some followers of this blog might know how much I dislike file names and how I try to avoid them.
  • The user might not actually have access to write to that file but the kernel would so I would have to impersonate the user before trying to create the file.
So anyway my decision was to open the file in user mode and then call the driver and tell it the handle to the file and let the driver figure out what the object is and how to use it. However, what I wanted was to use the ZwWriteFile API to write to the file and so I needed a handle to that object. Looking at the OB APIs it's easy to see that we could simply call ObReferenceObjectByHandle followed by ObOpenObjectByPointer to create the new kernel handle. This is what my code looked like:
        status = ObReferenceObjectByHandle( ioctlBuffer->UserHandle,
                                            FILE_READ_DATA | FILE_WRITE_DATA | SYNCHRONIZE | STANDARD_RIGHTS_READ | FILE_READ_ATTRIBUTES,
                                            *IoFileObjectType,
                                            UserMode,
                                            &userFileObject,
                                            NULL );
 
        if (!NT_SUCCESS(status)) {
 
            __leave;
        }
 
        ASSERT(FlagOn( userFileObject->Flags, FO_HANDLE_CREATED ) && 
                   !FlagOn( userFileObject->Flags, FO_CLEANUP_COMPLETE ));
 
        status = ObOpenObjectByPointer( userFileObject,
                                        OBJ_KERNEL_HANDLE,
                                        NULL,
                                        0,//FILE_READ_DATA | FILE_WRITE_DATA,
                                        *IoFileObjectType,
                                        KernelMode,
                                        &kernelFileHandle);
This worked pretty well for local files but when I tried to open a file that was on a remote file system it failed to open the kernel handle with STATUS_ACCESS_DENIED. I spent some time tracing through the code and what I found was that ObOpenObjectByPointer in this case always ends up sending an IRP_MJ_QUERY_SECURITY request to the file system. Moreover, this request seemed to always ask for all the security information (which you can see if you disassemble nt!ObpGetObjectSecurity and look at how it sets the SecurityInformation; on my Win7 it's a "mov dword ptr [xxx],1Fh"):
#define OWNER_SECURITY_INFORMATION       (0x00000001L)
#define GROUP_SECURITY_INFORMATION       (0x00000002L)
#define DACL_SECURITY_INFORMATION        (0x00000004L)
#define SACL_SECURITY_INFORMATION        (0x00000008L)
#define LABEL_SECURITY_INFORMATION       (0x00000010L)
 
3: kd> dt 0xfffff9801a3d8fb8 nt!_IO_STACK_LOCATION Parameters.QuerySecurity.
   +0x008 Parameters                : 
      +0x000 QuerySecurity             : 
         +0x000 SecurityInformation       : 0x1f
         +0x008 Length                    : 0x100
However, while this works well on the local system, it almost always fails over SMB. Looking at the page "2.2.1.3 SECURITY_INFORMATION", there is this table that describes what the caller needs in order to be able to read various information types. For SACL_SECURITY_INFORMATION we see that in fact READ_CONTROL is not enough and that a certain privilege is required. This privilege is unlikely to be granted to any client of the server and so in the general case the IRP_MJ_QUERY_SECURITY issued by nt!ObpGetObjectSecurity will fail with STATUS_ACCESS_DENIED.

Security information access requested | Rights required of caller on server | Privileges required of caller on server
OWNER_SECURITY_INFORMATION            | READ_CONTROL                        | Does not apply.
GROUP_SECURITY_INFORMATION            | READ_CONTROL                        | Does not apply.
DACL_SECURITY_INFORMATION             | READ_CONTROL                        | Does not apply.
SACL_SECURITY_INFORMATION             | Does not apply.                     | Security privilege.
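To make the connection between the 0x1f mask and the table explicit, here is a small user-mode sketch that checks which requirements a given SecurityInformation value triggers. The flag values are the ones listed earlier; the two helper functions are hypothetical names, and LABEL_SECURITY_INFORMATION is left out of the checks because the table above does not cover it.

```c
#include <stdbool.h>
#include <stdint.h>

#define OWNER_SECURITY_INFORMATION 0x00000001u
#define GROUP_SECURITY_INFORMATION 0x00000002u
#define DACL_SECURITY_INFORMATION  0x00000004u
#define SACL_SECURITY_INFORMATION  0x00000008u
#define LABEL_SECURITY_INFORMATION 0x00000010u

/* True if, per the table, the query needs READ_CONTROL on the object. */
bool needs_read_control(uint32_t security_information)
{
    const uint32_t mask = OWNER_SECURITY_INFORMATION |
                          GROUP_SECURITY_INFORMATION |
                          DACL_SECURITY_INFORMATION;
    return (security_information & mask) != 0;
}

/* True if, per the table, the query needs the security privilege. */
bool needs_security_privilege(uint32_t security_information)
{
    return (security_information & SACL_SECURITY_INFORMATION) != 0;
}
```

For the 0x1f value seen in the IRP the second function returns true, which is why the query fails for a typical client, while a query limited to owner, group and DACL (0x07) would only need READ_CONTROL.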

So now since this approach was out of the picture, I needed something else. Unfortunately, I have been unable to figure out a documented way to achieve this (pretty much anything I tried called nt!ObpGetObjectSecurity at some point). However, there is one undocumented function that actually does exactly what I wanted, ZwDuplicateObject. So now my code looks something like this:
    //
    // duplicate the handle... first get a handle to the system process
    // so we can call ZwDuplicateObject on it.
    //

    status = ObOpenObjectByPointer( PsInitialSystemProcess,
                                    OBJ_KERNEL_HANDLE,
                                    NULL,
                                    STANDARD_RIGHTS_READ,
                                    NULL,
                                    KernelMode,
                                    &systemProcessHandle );

    
    if (!NT_SUCCESS(status)) {

        return status;
    }

    status = ZwDuplicateObject( NtCurrentProcess(),
                                ioctlBuffer->UserHandle,
                                systemProcessHandle,
                                &kernelFileHandle,
                                FILE_READ_DATA | FILE_WRITE_DATA,
                                OBJ_KERNEL_HANDLE,
                                DUPLICATE_SAME_ATTRIBUTES | DUPLICATE_SAME_ACCESS );

This approach works because when using DUPLICATE_SAME_ACCESS ZwDuplicateObject() doesn't actually try to validate the access. This works fine in cases like the one I described where I didn't want any more rights than the user had. However, if the driver needs more (or maybe just different) access to the object then this function will also perform access checks.
Another possible approach would have been to open the file again in the driver and create a new handle, which would work because the security privilege isn't necessary to open a file on a remote server. However in this case I would have had to make sure that the user had the right type of access to the file. There is also the performance issue to consider, since issuing a new IRP_MJ_CREATE is not exactly cheap. It didn't really matter in my case but I'm mentioning it here just in case.
And finally, there are some caveats to consider for this approach:
  • Because we've duplicated the handle, we are effectively using the same FILE_OBJECT as the user and so any changes we make will potentially affect them. For example, if the IO manager keeps track of the current byte offset for this FILE_OBJECT, operations the driver might perform change that and so the user mode component might get confused. So if the driver is planning on being transparent to the user mode client, then it needs to be extra careful about this sort of thing. This wasn't a concern in my case since the user mode client was aware it was sending the handle to a driver and didn't use the handle afterwards, but it might be different for a minifilter.
  • Since we've duplicated the handle, it is possible that IRP_MJ_CLEANUP no longer arrives in the context of the user process (depending on whether the kernel handle gets closed first or not). This might have an impact on some minifilters as well as on any byte range locks on the file.
  • Since ZwDuplicateObject() is not documented it might not be supported in this form (or at all) in future OS releases. Though IMO Microsoft should document this API.
  • In XP and Server 2003 (SRV03) there is a bug (fixed, I believe, in XP SP3 and SRV03 R2 SP1, though I'm not sure about the exact versions) where the handle that is returned by ZwDuplicateObject is a kernel handle (it belongs in the system process' handle table) but is not marked as such (the most significant bit is not set).
  • Finally, there is one other flag to ZwDuplicateObject that might be interesting to anyone using it to duplicate user mode handles, DUPLICATE_CLOSE_SOURCE. This closes the user mode handle before the function returns. According to Gary Nebbett's book, it will close the handle regardless of the status of the operation.
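The "most significant bit" marker mentioned in the XP/SRV03 caveat can be checked with a one-line test. This is a user-mode sketch that treats the handle as a pointer-sized integer; `looks_like_kernel_handle` is a hypothetical helper, not an OS API, and it only tests the bit described above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the top bit of a pointer-sized handle value is set,
   which is how the text says a kernel handle is marked. */
bool looks_like_kernel_handle(uintptr_t handle_value)
{
    const uintptr_t top_bit =
        (uintptr_t)1 << (sizeof(uintptr_t) * CHAR_BIT - 1);
    return (handle_value & top_bit) != 0;
}
```

A check like this could be used in a debug build to notice the unmarked handles described in that bullet, keeping in mind that it says nothing about which handle table the value actually belongs to.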