
Microsoft Word to PDF/HTML Converter

While developing a content management system, I came across a requirement where uploaded Word documents need to be converted to PDF on the server, saved, and later made available for display in the browser.


This is a clean approach: the content of the pages can change once in a while, so there’s no point in making the content static and then reinventing the wheel each time it changes.

The application uploads the Word documents to the server, where they are saved as PDF and become part of a dynamic menu that shows a link for each document that has been uploaded. Whenever the content of an uploaded file changes, the changed file can be uploaded again and the modified content becomes available to the user.

One prerequisite for this functionality is that the Save as PDF add-in is available in MS Word 2007.

If it is not available, it can be downloaded from here.

I’ll not go into the creation of the dynamic menu and all the other stuff; I’ll just explain how the “.docx” to “.pdf” conversion works.

Here is the code for the conversion.

public static string ConvertDocument(string filePath, string folder_to_save_in, string FileName)
{
    // Set up a Word application (Word must be installed on the server).
    Microsoft.Office.Interop.Word.ApplicationClass wordApplication = new Microsoft.Office.Interop.Word.ApplicationClass();
    Microsoft.Office.Interop.Word.Document doc = null;
    string newfilename = string.Empty;

    // Placeholder passed for every optional COM parameter we do not set.
    object o_nullobject = System.Reflection.Missing.Value;
    object o_noSave = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;

    try
    {
        // Open the Word document.
        object o_filePath = filePath;
        doc = wordApplication.Documents.Open(ref o_filePath,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject);

        // Save the document in HTML/PDF format.
        //newfilename = folder_to_save_in + @"\" + FileName.Replace(".docx", ".html");
        newfilename = folder_to_save_in + @"\" + FileName.Replace(".docx", ".pdf");
        object o_newfilename = newfilename;

        //object o_format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatHTML;
        object o_format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;
        object o_encoding = null;
        object o_endings = Microsoft.Office.Interop.Word.WdLineEndingType.wdCRLF;

        // Save through the document we opened rather than ActiveDocument.
        doc.SaveAs(ref o_newfilename, ref o_format, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_encoding, ref o_nullobject,
            ref o_nullobject, ref o_endings, ref o_nullobject);
    }
    catch (Exception)
    {
        // The conversion failed: return an empty string instead of the path
        // of a file that was never written. Log the exception here as needed.
        newfilename = string.Empty;
    }
    finally
    {
        // Close the original document without saving changes to it.
        if (doc != null)
        {
            doc.Close(ref o_noSave, ref o_nullobject, ref o_nullobject);
        }

        // Quit Word so that no WINWORD.EXE process is left running on the server.
        // The cast avoids the ambiguity between the Quit method and the Quit event.
        ((Microsoft.Office.Interop.Word._Application)wordApplication).Quit(ref o_noSave, ref o_nullobject, ref o_nullobject);
    }

    return newfilename;
}
 
For this code to work, you need to add a reference to the Microsoft.Office.Interop.Word DLL (in VS 2008 there will be two versions, 11.0 and 12.0; this code uses 12.0).


The code works by creating an instance of the Word application, then opening the document specified in the “filePath” parameter, and finally saving it in the “folder_to_save_in” folder.
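
One detail worth noting in the code above: the output name is built with FileName.Replace(".docx", ".pdf"), which is case-sensitive and leaves names such as "Report.DOCX" or "notes.doc" unchanged. Below is a minimal sketch of an alternative using the standard System.IO.Path methods. The PdfPathHelper class and BuildPdfPath method are hypothetical names made up for this illustration and are not part of the original code.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the original post.
public static class PdfPathHelper
{
    // Builds the target PDF path from the destination folder and the
    // uploaded file name, whatever the case or kind of the extension.
    public static string BuildPdfPath(string folderToSaveIn, string fileName)
    {
        // Drop any directory part the caller may have included.
        string nameOnly = Path.GetFileName(fileName);

        // Swap the existing extension (.docx, .DOCX, .doc, ...) for .pdf.
        string pdfName = Path.ChangeExtension(nameOnly, ".pdf");

        // Let the framework insert the directory separator.
        return Path.Combine(folderToSaveIn, pdfName);
    }
}
```

If this fits your setup, the line that assigns newfilename could call PdfPathHelper.BuildPdfPath(folder_to_save_in, FileName) instead of concatenating the strings by hand.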

The WdSaveFormat enum has a number of options (the same ones you get when using the Save As functionality of MS Word), so the document can also be saved as HTML.

This code returns a string containing the location where the document was saved, because the requirement was to update an XML file. Please make modifications as required.

Hope this was helpful,

Till next we connect…

Happy Coding!


