Convert PDF document to XML using C#.net [Resolved]

Posted by Mdjack under C# on 7/19/2012 | Points: 10 | Views : 20453 | Status : [Member] | Replies : 8


Hi,

Can anyone help me with an urgent requirement?

I need to convert a PDF document to XML using C#.NET.

N. MOHAMED ZACKKARIAH


Responses

Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 50


You can use Spire.PDF and Spire.Doc for this task, but not in a single step. First extract the text and images from the PDF with Spire.PDF (http://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html), then use Spire.Doc (http://www.e-iceblue.com/Introduce/word-for-net-introduce.html) to convert the extracted text to XML.
The code below extracts the text and images from a PDF:

    // Required namespaces: System.Collections.Generic, System.Drawing,
    // System.Drawing.Imaging, System.IO, System.Text, Spire.Pdf

    // Load the PDF document.
    PdfDocument doc = new PdfDocument();
    doc.LoadFromFile(@"C:\Program Files\e-iceblue\Spire.Pdf\Demos\Data\Sample2.pdf");

    // Collect the text and images from every page.
    StringBuilder buffer = new StringBuilder();
    IList<Image> images = new List<Image>();
    foreach (PdfPageBase page in doc.Pages)
    {
        buffer.Append(page.ExtractText());
        foreach (Image image in page.ExtractImages())
        {
            images.Add(image);
        }
    }
    doc.Close();

    // Save the text. The output is plain text, so use a .txt extension
    // (File.WriteAllText does not produce a valid .docx file).
    String fileName = "TextInPdf.txt";
    File.WriteAllText(fileName, buffer.ToString());

    // Save each image as a PNG file.
    int index = 0;
    foreach (Image image in images)
    {
        String imageFileName = String.Format("Image-{0}.png", index++);
        image.Save(imageFileName, ImageFormat.Png);
    }

    // Open the text file in its default application.
    System.Diagnostics.Process.Start(fileName);

Then convert the Word document to XML using Spire.Doc:

    private void button1_Click(object sender, EventArgs e)
    {
        // Load the Word document.
        Document document = new Document();
        document.LoadFromFile(@"D:\Sample.doc");

        // Save it as XML.
        document.SaveToFile("Sample.xml", FileFormat.Xml);

        // Open the resulting XML file.
        WordDocViewer("Sample.xml");
    }

    private void WordDocViewer(string fileName)
    {
        try
        {
            System.Diagnostics.Process.Start(fileName);
        }
        catch
        {
            // Ignore any failure to launch the file
            // (for example, no associated application).
        }
    }
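Before opening the saved file, it may be worth checking that it really is well-formed XML. The helper below is only a sketch that uses the standard System.Xml.Linq parser; the class and method names (XmlCheck, IsWellFormedXml) are made up for illustration and are not part of Spire.Doc.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class XmlCheck
{
    // Hypothetical helper: returns true when xmlText parses as XML and
    // reports the name of the root element through rootName.
    public static bool IsWellFormedXml(string xmlText, out string rootName)
    {
        try
        {
            XDocument doc = XDocument.Parse(xmlText);
            rootName = doc.Root.Name.LocalName;
            return true;
        }
        catch (XmlException)
        {
            // The parser rejected the text, so it is not well-formed XML.
            rootName = null;
            return false;
        }
    }
}
```

You could call it with the file contents, for example XmlCheck.IsWellFormedXml(System.IO.File.ReadAllText("Sample.xml"), out root).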

I hope this method helps, but I can't promise it will work; I am only saying it is possible. This really is a tough task.

Never give up! Smile to the world!
http://excelcsharp.blog.com/


Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 25


I only know how to convert XML to PDF. Converting a PDF directly to XML is hard, so why not extract the PDF content first and then convert the resulting Word document to XML:
http://www.e-iceblue.com/Knowledgebase/Spire.PDF/Program-Guide/Extract-and-Insert-PDF-Images-Text-for-WPF.html
http://www.e-iceblue.com/Knowledgebase/Spire.Doc/Program-Guide.html


Posted by: Mdjack on: 7/19/2012 [Member] Starter | Points: 25

Thanks, Megan. Can you give me an idea of how to achieve this?

Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 25


It is really hard to convert a PDF to XML directly. If possible, first extract the text and images from the PDF and then convert the resulting Word document to XML. Be aware that this changes the structure of the original PDF, which is what makes the task hard. You can still give my suggestion a try. If I find more information, I will reply as soon as possible.
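If Spire.Doc is not available for the second step, one alternative is to write the extracted text into a simple XML file yourself. This is just a rough sketch under assumptions: it expects one string per page (for example, the result of calling ExtractText() on each page), and the element names "document", "page" and "line" are invented for illustration. It does not recover the original PDF layout.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class PdfTextToXml
{
    // Hypothetical helper: wraps already-extracted page text in XML.
    // pageTexts holds one string per PDF page.
    public static XDocument BuildXml(IEnumerable<string> pageTexts)
    {
        var root = new XElement("document");
        int pageNumber = 1;
        foreach (string text in pageTexts)
        {
            var page = new XElement("page", new XAttribute("number", pageNumber++));
            string[] lines = (text ?? string.Empty).Split(
                new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string line in lines)
            {
                // Drop characters that are not legal in XML 1.0 (a simplification:
                // this also drops characters outside the Basic Multilingual Plane).
                string clean = new string(line.Where(XmlConvert.IsXmlChar).ToArray());
                page.Add(new XElement("line", clean.Trim()));
            }
            root.Add(page);
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", null), root);
    }
}
```

For example, PdfTextToXml.BuildXml(new[] { "Hello\nWorld", "Page two" }).Save("TextInPdf.xml") writes two page elements, the first with two line elements. Special characters such as < and & are escaped by XElement automatically.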


Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 25

You can only convert PDFs that were created from text documents. If the PDF contains image-only pages (such as scanned documents), you cannot convert it this way.
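A quick way to spot such pages is to check whether text extraction returned anything at all. The sketch below assumes you already have one extracted string per page; the names ScannedPageCheck and FindPagesWithoutText are hypothetical. An empty result is only a hint that a page is a scanned image, not proof.

```csharp
using System;
using System.Collections.Generic;

public static class ScannedPageCheck
{
    // Hypothetical helper: returns the 1-based numbers of pages whose
    // extracted text is null, empty, or only whitespace.
    public static List<int> FindPagesWithoutText(IList<string> pageTexts)
    {
        var pagesWithoutText = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            if (string.IsNullOrWhiteSpace(pageTexts[i]))
            {
                pagesWithoutText.Add(i + 1);
            }
        }
        return pagesWithoutText;
    }
}
```

Pages reported by this check would need OCR or manual handling before any text-to-XML step.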

Posted by: Mdjack on: 7/19/2012 [Member] Starter | Points: 25

Hi. Do I need a third-party DLL to convert a PDF to DOC? Can you give me some code that converts a PDF to a Word document?

Posted by: Zaiba on: 11/27/2013 [Member] Starter | Points: 25

You can convert PDF to XML, and XML to PDF, in C#/.NET with the Aspose.PDF for .NET library: http://www.aspose.com/.net/pdf-component.aspx

Posted by: t5j9033387989 on: 11/28/2013 [Member] Starter | Points: 25

Take a look at this link, which gives a step-by-step solution: https://bytescout.com/products/developer/pdfextractorsdk/index.html

Please mark this as the answer if it helps.

Thanks & regards,
Ketan
