While working on document generation on server side, where Open XML SDK is the tool for a developer, I came across a scenario where the requirement was to generate Open XML based Excel Workbooks (XLSX), from older format of Excel Workbooks (XLS) files which contained a huge amount of data, like million rows in a single sheet!

We were able to employ XSL Transform technique to very effeciently generate a part (sheet.xml) of Open XML workbook. I have talked about this whole scenario at the following post:

How to efficiently generate OpenXML Workbook from Excel files containing very large amount of data – Visual Studio Office Development (VSOD) Support Team – Site Home – MSDN Blogs

This method is quite fast as well as extensible. During my testing, the generation of an Open XML workbook from a data XML containing around 1150 rows with 30+ columns each, took even less than a second to generate. Also, this technique can be extended to pull in data from a back-end database as XML and generate Open XML workbooks on the fly.

Have a look into this article and feel free to drop you comments/queries here or on the above post, I will surely try to answer them.

Hello everyone! In the last post we left off with a small example of how to automate Microsoft Office applications from our custom solutions. This week we are going to discuss about what does Microsoft Office object model means, and will discuss about various classes/interfaces available in the Office applications.

As I had mentioned earlier, Microsoft Office applications (from now on, wherever I mentions this term, I am referring to Microsoft Word, Excel, and PowerPoint) are also COM (Component Object Model) servers. This enables them to provide a way for developers to use their features in other programs.

As a developer, all you need to do is call a function to launch the server, and once it is up and running, you can ask for the various interfaces provided by the server and utilize its functionality into your application.

The task of launching an Office application from your program can be performed in different ways depending on whether you are linking to the COM server’s type library statically, or you want to launch it dynamically. While linking statically, all you need to do is call the constructor of the main class of the COM Server and then you can access the other Interfaces by using properties/methods of the main class object. Linking dynamically involves calling CoCreateInstance API specifying either the CSLID or ProgID of the application to launch. In the example code of my previous post, we have seen the static linking technique.

Each of the Office applications provide their main class by the name Application, and rest of the Interfaces/Classes that we can use are exposed as Properties of the Application class. It represents a running instance of the Sever and provides methods to manipulate its state like, activating it, setting its visibility etc.

After the Application class, each Office application provides a class that represents the document it works on. In case of Word it is Document class, for Excel it is called Workbook and in case of PowerPoint it is named as Presentation. The Application class maintains the list of all currently opened documents and provides a property that represents this collection. To access a particular document, we need to first get the collection and then find our required document object.

In case of Excel and PowerPoint, each working document contains another level of isolation for its contents. Excel provides separate Worksheets in a workbook and a PowerPoint presentation contains various Slides. You might say that even Word documents contain various Pages, and you are very right. However, Word doesn’t provide page as another isolation of its content. All of the content of a document is represented by the Document class only.

Ok, I think that is enough of the theory; let’s see how this architecture of Office applications can be depicted:

 Office Object Model Architecture

As shown above, Range class is common to Word and Excel, and it represents the portion of the content that can be manipulated with these applications. In case of Word, it could represent a paragraph, a line of text, or a single word. In case of Excel, it represents a collection of Cells, or a particular cell. In PowerPoint, the content of a slide is represented by a collection of Shapes, i.e., each item of a slide in PowerPoint is a shape.

A point to be noted here is that even though I have mentioned Range/Shape as THE object to manipulate content in Office application, it not the only one. There are other objects, too, that can be used to play around with the content of these applications. Here is a small list of such objects:

  • Word
    • Selection – type of Range object
    • Shapes – represents drawings, embedded images or objects, etc.
    • InlineShapes – shapes that are displayed in-line with text
  • Excel
    • Rows – Range object representing a row of data in Excel
    • Columns – Range object representing a column of data in Excel
    • Cells – another Range object, for a single/multiple cell(s)
    • Shapes – represents drawings, embedded images or objects etc.
    • Selection – Represents any selected item on the sheet, could be a Cell, a Range of cells, or a Shape etc.
  • PowerPoint
    • Selection – represents a currently selected object on the slide
    • SlideRange – object to manipulate single or multiple slides
    • TextRange – object to manipulate a selected piece of text
    • ShapeRange – object representing a collection of shapes

In Part-1 of this post, we saw code snippets in VC++ and C# to insert a piece of text in a Word document. If you were to implement similar task for an Excel workbook or a PowerPoint Presentation, you can very easily implement it by knowing what type of object you should work with. For instance, you can use Range or Cell object in Excel, and you can use Shape object in case of PowerPoint.

With this, I am going to end this post. I am sure this post has broadened your view of how to work with Office applications. You can also refer to Microsoft Office SDK Documentation and Developer References for more details and list of other classes/objects available to you.

In case you have any queries, please feel free to drop in your comments; I will be more than happy to respond.

It has been a while when I last posted something… and with this post I am making a comeback :D

While working with Word during these days, I encountered a scenario where, if I had an Alternate Data Stream (ADS) associated with Word 97-2003 format document and I converted it to the newest format (DOCX format) in Word 2007, the associated ADS would get lost and there is no way to recover it.

I have posted a programmatic solution to retain this ADS when a document gets converted at:

Visual Studio Office Development (VSOD) Support Team : How to retain alternate data stream associated with a word document while converting it to newer file format version

After programming with the Microsoft Office applications like Word, Excel, and PowerPoint for some time, I feel that using these applications in your own solutions can be quite overwhelming for the first timers. So, I thought of writing this post to introduce the concepts to everyone who wants to program/use Office applications in their own solutions.

The idea of this post is to get you acquainted with the generic concepts related to programming Microsoft Office applications (from here on, whenever I use this term I am mainly referring to Word/Excel/PowerPoint; however, this doesn’t mean that other Office applications cannot be automated). I assume that you already know how to create WinForms/MFC projects using Microsoft Visual Studio. So, let’s begin…

Most widely used Microsoft Office suite applications like Word, Excel, and PowerPoint provide developers with various interfaces, using which they can automate many tasks of these applications; called Office Object Model. Office Object Model is nothing, but a set of interfaces/classes/objects (different terms – same meaning) which allows programmers to call/ check/operate various features of these applications programmatically, which is also called *Automation*.

Apart from being desktop applications, Microsoft Office Applications are also COM components, which expose custom Interfaces for you to utilize in your own solutions. Using these components in a C++ based application involves using the COM APIs (like CoCreateInstance etc.). On the other hand, if you are creating a .Net framework based application then you need something called Microsoft Office Primary Interop Assemblies (PIA) installed on your machine. These are the assemblies provided by Microsoft for use in .Net applications specifically. You can download these from following URLs:

  1. Download link for Office 2003 PIAs
  2. Download link for Office 2007 PIAs

Now, before we get into the thick of things, let’s see what it means to use Microsoft Office applications in custom solutions. We will create two applications, viz., a MFC based Dialog box application and a C# based WinForms application. We will then launch Word from these two, and insert a very magical sentence in a new Word document: “Hello World!”… :) .

Create a MFC based Dialog box application. Add a button to your main dialog box; double click on it and Visual Studio will generate a click event handler for your button. Leave it as it is for now, we’ll come back to it in a few minutes.

To be able to automate Word from MFC based application, we need to define (or rather generate) few classes which will handle to task of instantiating required COM interface, and making function calls to these interfaces for us. For this, right click on your project node (in solution explorer window) and select Add -> Class… From the Add Class dialog box, select “MFC Class from TypeLib” template and click Add (you may have to select the MFC node in the Categories tree to see this Template); as below:

Add new MFC Class from TypeLib

In the Add Class from TypeLib wizard, select “Microsoft Word XX.0 Object Library” (the value of XX will be 12 if you have Office 2007, and 11 if you have Office 2003 installed on your machine). From the list of available interfaces, select the ones shown in the following screenshot:

Select the Interfaces for which Classes will be generated

On clicking the “Finish” button, the wizard will generate header files corresponding to the classes which we have selected above.

At this moment, if you try to compile your project, you may receive many errors. This is due to a known issue with MFC Class generation wizard (don’t know if Microsoft knows about this, but at least I know it.. ;) ), and to fix this open each one of the generated files, and comment the following line:

 

#import "C:\\Program Files\\Microsoft Office\\Office12\\MSWORD.OLB" no_namespace

 

This #import is actually not required when using MFC classes generated from TypeLib.

Now, copy and paste the following code snippet in your newly created button’s event handler:

 

::CoInitialize(NULL); // Initialize the COM related stuff
VARIANT vMissing; // VARIANT defining a missing/optional parameter.
vMissing.vt = VT_ERROR;
vMissing.scode = DISP_E_PARAMNOTFOUND; 

CApplication wordApp; // Variable to hold our Application class object
wordApp.CreateDispatch(_T("Word.Application")); // create the instance of Word
wordApp.put_Visible(TRUE); // make our instance visible to user 

CDocuments docs(wordApp.get_Documents()); // Get the Application.Documents collection object
CDocument0 doc ( docs.Add( // Add a new Document to the Documents collection.
	&vMissing, &vMissing, &vMissing, &vMissing) ); // All the four input parameters are optional. 

CSelection selection ( wordApp.get_Selection() ); // Get the current selection/cursor position in the newly created document.
selection.put_Text ( _T("Hello World!") ); // Here goes our magical sentence in the document. 

selection.ReleaseDispatch(); // free the COM resrouces held by our Selection object.
doc.ReleaseDispatch(); // free the COM resrouces held by our Document object.
docs.ReleaseDispatch(); // free the COM resrouces held by our Documents collection object.
wordApp.ReleaseDispatch(); // free the COM resrouces held by our Application object. 

::CoUninitialize(); // Clean up previously loaded COM related stuff

 

Also, remember to #include the four header files in the file where you will paste the above code (XXXDlg.cpp).

That’s it; go ahead compile and run your project. When you click your button, it should launch Word with a new document containing “Hello World!” in it… (isn’t that magical?… well I told you so ;) )

If you are a .Net guy, and looking for a similar example in C#, here are the steps you need to perform…

Create a C# based WinForms application; add a button and its click event handler to your form. In the event handler button copy and paste the following code snippet:

 

object vMissing = Type.Missing;        //object defining missing/optional parameters

// Create a new instance of Word
Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();

// Make it visible to the user
wordApp.Visible = true;

// Get the Documents collection
Microsoft.Office.Interop.Word.Documents docs = wordApp.Documents;

// Add a new document to the Documents collection
Microsoft.Office.Interop.Word.Document doc = docs.Add(ref vMissing, ref vMissing, ref vMissing, ref vMissing);

// Get the current selection/cursor position in the newly created document.
Microsoft.Office.Interop.Word.Selection selection = wordApp.Selection;

// Here goes our magical sentence in the document.
selection.Text = "Hello World!";

 

Now, before you can compile your project, you need to add reference to the assembly which contains definitions of the classes for used above. For this, right click on the *References* folder in your project’s solution explorer, and select Add Reference. From the list of available .Net assemblies select Microsoft.Office.Interop.Word assembly, and click Ok.

Well, that’s it for today. We will discuss about the classes, terms and other related concepts that I have used in today’s snippets in the coming posts, soon. Till then, take care and happy coding :)

While working with PowerPoint 2007 today I noticed a strange behavior:

If you have a Chart in an Excel workbook, and you want to insert this chart as link in a PowerPoint Slide by using Paste Special, it may fail to paste anything in the slide if the current view is Outline View.

Also, even if the current active tab on the Normal View of a PowerPoint presentation is Outline View and it has the input focus, PowerPoint will fail to paste anything. And, the stranger part is that you will not get any message box informing so!? (…not surprised!)

How to avoid this:

You can paste the chart by following any of the following ways:

  1. Close Pane 1 of Normal View.
  2. Activate the Slides tab in Pane 1 (see screenshot below).
  3. While keeping the Outline tab as selected tab in Pane 1, Activate Pane 2 (Slide pane) by clicking in it, before you select Paste Special.

PowerPoint Panes

 

Please note that you may encounter the issue even if the Outline tab in Pane 1 is activated and it has input focus.