Microsoft Word to PDF/HTML Converter

While developing a content management system, I came across a requirement where uploaded Word documents need to be converted to PDF on the server, saved, and later made available for display in the browser.


This is a clean approach: the content of the pages changes once in a while, so there is no point in making the content static and then reinventing the wheel each time it changes.

The application uploads the Word documents to the server, where they are saved as PDF and become part of a dynamic menu that shows a link for each document that has been uploaded. Whenever the content of a file changes, the changed file can be uploaded again and the modified content becomes available to the user.

One prerequisite for this functionality is that the Save as PDF add-in is available in MS Word 2007.

If it is not available, it can be downloaded from here.

I’ll not go into the creation of the dynamic menu and all the other stuff; I’ll just explain how the “.docx” to “.pdf” conversion works.

Here is the code for the conversion.

    public static string ConvertDocument(string filePath, string folder_to_save_in, string FileName)
    {
        // Set up a Word application; it is shut down in the finally block.
        Microsoft.Office.Interop.Word.ApplicationClass wordApplication = new Microsoft.Office.Interop.Word.ApplicationClass();
        Microsoft.Office.Interop.Word.Document doc = null;
        string newfilename = string.Empty;
        object o_nullobject = System.Reflection.Missing.Value;

        try
        {
            // Open the Word document.
            object o_filePath = filePath;
            doc = wordApplication.Documents.Open(ref o_filePath,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject);

            // Build the target path; ChangeExtension also copes with ".DOCX" or ".doc".
            //newfilename = System.IO.Path.Combine(folder_to_save_in, System.IO.Path.ChangeExtension(FileName, ".html"));
            newfilename = System.IO.Path.Combine(folder_to_save_in, System.IO.Path.ChangeExtension(FileName, ".pdf"));
            object o_newfilename = newfilename;

            // Save the document in PDF format (swap in the commented lines for HTML).
            //object o_format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatHTML;
            object o_format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;
            object o_encoding = null;
            object o_endings = Microsoft.Office.Interop.Word.WdLineEndingType.wdCRLF;

            doc.SaveAs(ref o_newfilename, ref o_format, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
                ref o_nullobject, ref o_nullobject, ref o_encoding, ref o_nullobject,
                ref o_nullobject, ref o_endings, ref o_nullobject);
        }
        finally
        {
            // Close the original document and quit Word even if the conversion failed,
            // so that no Word process is left running on the server.
            // Exceptions are no longer swallowed; they propagate to the caller.
            if (doc != null)
            {
                doc.Close(ref o_nullobject, ref o_nullobject, ref o_nullobject);
            }
            ((Microsoft.Office.Interop.Word._Application)wordApplication).Quit(ref o_nullobject, ref o_nullobject, ref o_nullobject);
        }

        return newfilename;
    }
 
For this code to work, you need to add a reference to the Microsoft.Office.Interop.Word DLL (in VS 2008 there will be two versions, version 11.0 and version 12.0; this code uses 12.0).

The code works by creating an instance of the Word application, opening the document specified in the “filePath” parameter, and finally saving it in the “folder_to_save_in” folder.

The WdSaveFormat enum has a number of options (the same ones you get when using the Save As functionality of MS Word), so the document can also be saved as HTML.
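Since the save format and the file extension have to change together, here is a minimal sketch of one way to keep them in step. The `TargetFormat` enum and the `OutputNaming.BuildOutputPath` helper are hypothetical names introduced only for illustration; they are not part of the Interop library or of the code above.

```csharp
using System;
using System.IO;

// Hypothetical enum for illustration; not part of Microsoft.Office.Interop.Word.
public enum TargetFormat { Pdf, Html }

public static class OutputNaming
{
    // Returns the path the converted file would be saved to.
    // When calling SaveAs, pair TargetFormat.Pdf with WdSaveFormat.wdFormatPDF
    // and TargetFormat.Html with WdSaveFormat.wdFormatHTML.
    public static string BuildOutputPath(string folder, string fileName, TargetFormat format)
    {
        string extension = format == TargetFormat.Html ? ".html" : ".pdf";

        // ChangeExtension replaces whatever extension the upload had,
        // regardless of its case (".docx", ".DOCX", ".doc").
        return Path.Combine(folder, Path.ChangeExtension(fileName, extension));
    }
}
```

The value returned by `BuildOutputPath` would take the place of the hand-built file name, and the matching WdSaveFormat member would go into the format argument of SaveAs.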

This code returns a string containing the location where the document was saved, because the requirement was to update an XML file. Please make modifications as required.
Hope this was helpful,

Till next we connect…
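As a rough sketch of how the returned path might be written to such an XML file, assuming LINQ to XML (System.Xml.Linq, available from .NET 3.5) is an option. The `<menu>`/`<item>` layout, the `title` and `href` attributes, and the `MenuXml` helper are all assumptions made up for this example; the real menu file will look different.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MenuXml
{
    // Adds an <item> for the converted file, or updates the href of the
    // existing <item> with the same title so that a re-upload does not
    // create a duplicate. Element and attribute names are hypothetical.
    public static void AddOrUpdateEntry(XDocument menu, string title, string pdfPath)
    {
        XElement existing = menu.Root.Elements("item")
            .FirstOrDefault(e => (string)e.Attribute("title") == title);

        if (existing != null)
        {
            existing.SetAttributeValue("href", pdfPath);
        }
        else
        {
            menu.Root.Add(new XElement("item",
                new XAttribute("title", title),
                new XAttribute("href", pdfPath)));
        }
    }
}
```

In use, the menu file would be loaded with `XDocument.Load`, passed to `AddOrUpdateEntry` together with the string returned by ConvertDocument, and then written back with `Save`.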

Happy Coding!


