Working with Excel through COM Interop
I got into programming years ago with Excel macros and before long I was writing Visual Basic for Application (VBA) codes. When I made the switch to VB.NET, I thought automating Excel would be simple. Except for overcoming the sizable difference between VBA and VB.NET, it was pretty straightforward…that is if you’re not concerned with leaking resources. The following are some things I picked up about Excel interop and some gotchas.
Interoperating with COM is made easy because the .NET framework has a sophisticated system tha allows managed and unmanaged codes to work together. When you add a reference to a COM type library, Visual Studio automatically generates an interop assembly. Types from this assembly can then be used by your classes like any other managed types.
For the most part, types exposed by interop assemblies are proxies. That means you’re not dealing directly with actual COM objects. That’s a good thing because this saves you time and reduces complexity. Proxies, when created as early-bound objects, are specific types. They implement the interfaces the COM objects implement and expose their methods, properties, and events. Furthermore, proxies handle the task of passing data between managed and unmanaged environments…think of method parameters and return values. As an example, when a COM method returns a pointer to a native string object, the proxy copies the string to the managed world and frees the native string.
What is a Runtime Callable Wrapper?
A runtime callable wrapper (RCW) is another name for the proxies exposed by interop assemblies. Each RCW internally holds a pointer to a real COM object and implements the interfaces of that object. Because RCWs are managed objects but hold references to unmanaged objects, they implement class destructors. When you try to use RCWs that have been garbage collected, you get this cryptic error: “COM object that has been separated from its underlying RCW cannot be used.” This is one of the most common exception with COM interoperability.
The weird thing about runtime callable wrappers is they maintain two different reference counts: COM object reference count and marshaling reference count. Since the runtime creates one RCW for each COM object, the internal COM object reference count is either 1 (alive) or 0 (dead). The marshaling count on the other hand represents the number of times the COM object has been marshaled into the managed world. Basically the first time a pointer to a COM object is received, marshaling reference count is set to 1. Thereafter, additional calls to methods or properties that return the same COM object, will return the same RCW but with its marshaling count incremented.
A better way to explain marshaling count is through an example. Here we create an Excel application and add a new workbook. Then we retrieve that workbook from the workbooks collection two times. At that point, marshaling count for the RCW that wraps the Excel Workbook is 3 because the COM object was marshaled into the managed world three times. The output shows ending marshaling counts after each call to ReleaseComObject(). This method simply returns the internal marshalling count after decreasing it by 1. Once the marshaling count reaches 0, RCW will call IUnkown::Release thus freeing the COM object.
|Dim app As Excel.Application = New Excel.Application()
Dim books As Excel.Workbooks = app.Workbooks
Dim book1 As Excel.Workbook = books.Add()
Dim book2 As Excel.Workbook = books(1)
Dim book3 As Excel.Workbook = books(1)Debug.WriteLine(“book3=” + Marshal.ReleaseComObject(book3))
Debug.WriteLine(“book2=” + Marshal.ReleaseComObject(book2))
Debug.WriteLine(“book1=” + Marshal.ReleaseComObject(book1))
Debug.WriteLine(“books=” + Marshal.ReleaseComObject(books))
Debug.WriteLine(“app=” + Marshal.ReleaseComObject(app))
Properly Disposing of Excel Objects
Disposing of Excel objects is straight forward. After you’re done with the object just remember to call Marshal.ReleaseComObject. If the same Excel object has been marshaled multiple times, the internal marshaling count will be greater than one. In that case, the last call to Marshal.ReleaseComObject which brings the marshaling count to zero will detach the COM reference. Also, calling Marshal.ReleaseComObject on RCWs with marshaling count already at zero doesn’t do anything. The method simply returns -1.
A note on the “two dot rule” when working with COM interop. This rule is frequently mentioned in discussion boards and states you shouldn’t do this: “xlApp.Workbooks.Add()”. You shouldn’t have two dots within any expression. The reason is all of the temporary variables that holds intermediate objects. Temporary variables are unreachable. A more complete rule should be if a method or property returns a System.__ComObject (base class for RCWs), those objects should be explicitly assigned to variables so Marshal.ReleaseComObject can be later called.
There’s also another method to set the internal marshaling count to zero. Calling Marshal.FinalReleaseComObject will bypass the marshaling count logic and immediately set the marshaling count to zero. This may simplify things but Marshal.ReleaseComObject is my preferred choice.
So what happens if you don’t call Marshal.ReleaseComObject? Runtime callable wrappers are cleaned up in one of two ways: (1) the garbage collector will take over and release RCWs that are not in use via the finalizer thread or (2) when a new COM object is activated or when an interface pointer enters the runtime for the first time, the runtime will clean up all RCWs that are no longer in use.
The second way requires more explanation. Normally, garbage collection may not occur for a long time so RCWs no longer in user could hang around until application exits. However, if you create another Excel instance, RCWs subject to garbage collection get cleaned up right away. The end result is at most only one or two Excel instances are ever kept alive.
This automatic clean up sounds good an dandy but there’s one flaw. If the Excel instance is not visible, calls to Marshal.ReleaseComObject alone will not kill the process. You must explicitly call Excel.Application.Quit() prior to having Marshal.ReleaseComObject called on the application class. Without this, the associated COM server will keep running. So if you create a dozen Excel instances, a dozen Excel processes will be orphaned. You’ll have to kill them via the Task Manager.
Although the .NET framework has this sophisticated system, working with Excel interop still requires a lot of bookkeeping. You have to track each and every COM object that are marshaled into the managed world. From what I understand, COM events are even trickier. But all’s not lost. There are plenty of libraries that will manage this for you. You could also write your own library. I don’t think that’ll be too difficult.