Microsoft Word to PDF/HTML Converter

While developing a content management system, I came across a requirement: uploaded Word documents had to be converted to PDF on the server, saved, and later made available for display in the browser.

This is a clean approach because the content of the pages changes once in a while, and there is no point in making the content static and then reinventing the wheel every time it changes.

The application uploads the Word documents to the server, where they are saved as PDF and become part of a dynamic menu; the menu shows a link for each document that has been uploaded. Whenever the content of a file changes, the updated file can be uploaded again and the modified content becomes available to the user.

One prerequisite for this functionality is that the Save as PDF option is available in MS Word 2007.

If it is not available, it can be installed with Microsoft's "Save as PDF or XPS" add-in for Office 2007.

I’ll not go into the creation of the dynamic menu and everything else; I’ll just explain how the “.docx” to “.pdf” conversion works.

Here is the code for the conversion.

// Requires "using System;" and a reference to Microsoft.Office.Interop.Word (version 12.0).
public static string ConvertDocument(string filePath, string folder_to_save_in, string FileName)
{
    // Set up a Word application.
    Microsoft.Office.Interop.Word.ApplicationClass wordApplication = new Microsoft.Office.Interop.Word.ApplicationClass();
    string newfilename = string.Empty;
    object o_nullobject = System.Reflection.Missing.Value;

    try
    {
        // Open the Word document.
        object o_filePath = filePath;
        Microsoft.Office.Interop.Word.Document doc = wordApplication.Documents.Open(ref o_filePath,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject);

        // Save the document in PDF (or HTML) format.
        // ChangeExtension also handles ".DOCX" and ".doc", which Replace(".docx", ...) would miss.
        //newfilename = System.IO.Path.Combine(folder_to_save_in, System.IO.Path.ChangeExtension(FileName, ".html"));
        newfilename = System.IO.Path.Combine(folder_to_save_in, System.IO.Path.ChangeExtension(FileName, ".pdf"));
        object o_newfilename = newfilename;
        //object o_format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatHTML;
        object o_format = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;
        object o_encoding = null;
        object o_endings = Microsoft.Office.Interop.Word.WdLineEndingType.wdCRLF;

        doc.SaveAs(ref o_newfilename, ref o_format, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject, ref o_nullobject,
            ref o_nullobject, ref o_nullobject, ref o_encoding, ref o_nullobject,
            ref o_nullobject, ref o_endings, ref o_nullobject);

        // Close the original document.
        doc.Close(ref o_nullobject, ref o_nullobject, ref o_nullobject);
    }
    catch (Exception)
    {
        // The conversion failed: return an empty string rather than a path to a file that was never written.
        // Log the exception here as appropriate for your application.
        newfilename = string.Empty;
    }
    finally
    {
        // Always shut Word down without saving; otherwise WINWORD.EXE processes pile up on the server.
        // The cast to _Application avoids the ambiguity between the Quit method and the Quit event.
        object o_dontSave = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;
        ((Microsoft.Office.Interop.Word._Application)wordApplication).Quit(ref o_dontSave, ref o_nullobject, ref o_nullobject);
    }

    return newfilename;
}
 
For this code to work, you need to add a reference to the Microsoft.Office.Interop.Word DLL. (In VS 2008 there will be two versions, 11.0 and 12.0; this code uses 12.0.)


The code works by creating an instance of the Word application, opening the document specified in the “filePath” parameter, and finally saving it in the “folder_to_save_in” folder.

The WdSaveFormat enum has a number of options (the same ones you get when using the Save As functionality of MS Word), so the document can be saved as HTML as well.
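If you want the same method to produce either PDF or HTML, one option is to pick the format from the target file extension. The following is only a sketch under my own assumptions: the class and method names are hypothetical, and the two constants simply mirror the numeric values of WdSaveFormat.wdFormatHTML (8) and WdSaveFormat.wdFormatPDF (17), so the helper compiles without the interop reference.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original code.
public static class SaveFormatSketch
{
    public const int WdFormatHtml = 8;   // same value as WdSaveFormat.wdFormatHTML
    public const int WdFormatPdf = 17;   // same value as WdSaveFormat.wdFormatPDF

    // Returns the WdSaveFormat value that matches the extension of the target file name.
    public static int FormatForTarget(string targetFileName)
    {
        string ext = Path.GetExtension(targetFileName ?? string.Empty).ToLowerInvariant();
        switch (ext)
        {
            case ".pdf":
                return WdFormatPdf;
            case ".html":
            case ".htm":
                return WdFormatHtml;
            default:
                throw new NotSupportedException("No save format mapped for '" + ext + "'");
        }
    }
}
```

Inside the conversion method, the result could then be cast back to the enum before it is boxed, for example: object o_format = (Microsoft.Office.Interop.Word.WdSaveFormat)SaveFormatSketch.FormatForTarget(newfilename);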

This code returns a string containing the location where the document was saved, because my requirement was to update an XML file with it. Please make modifications as required.
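To give an idea of how the returned path might be used, here is a minimal sketch that adds or updates a menu entry with LINQ to XML (available from .NET 3.5). The XML layout is purely an assumption for illustration: a root "menu" element with "item" children carrying "title" and "href" attributes. The class and method names are hypothetical too; the actual menu file in my application is not shown here.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; the <menu>/<item> layout is assumed, not taken from the original application.
public static class MenuXmlSketch
{
    // Adds an <item title="..." href="..."/> under the root element,
    // or updates the href of an existing item with the same title.
    public static XDocument UpsertMenuItem(XDocument menu, string title, string pdfPath)
    {
        XElement root = menu.Root;
        XElement item = root.Elements("item")
            .FirstOrDefault(e => (string)e.Attribute("title") == title);

        if (item == null)
        {
            item = new XElement("item", new XAttribute("title", title));
            root.Add(item);
        }

        // SetAttributeValue creates the attribute if missing and overwrites it otherwise.
        item.SetAttributeValue("href", pdfPath);
        return menu;
    }
}
```

A caller would typically load the file with XDocument.Load, pass in the string returned by ConvertDocument (after checking that it is not empty), and write the result back with Save.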

Hope this was helpful.

Till next we connect…

Happy coding!


