Convert PDF document to XML using C#.net [Resolved]

Posted by Mdjack under C# on 7/19/2012 | Points: 10 | Views : 11804 | Status : [Member] | Replies : 9
Hi,

Can anyone help me with an urgent requirement?

I need to convert a PDF document to XML using C#.NET.

N. MOHAMED ZACKKARIAH


Responses

Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 50


Resolved
You can use Spire.PDF and Spire.Doc for this task, but not in a single step. First extract the text and images from the PDF with Spire.PDF: http://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html
Then use Spire.Doc to convert the extracted text to XML: http://www.e-iceblue.com/Introduce/word-for-net-introduce.html
You can use the code below to extract the content of your PDF:
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Text;
using Spire.Pdf;

// Load the PDF document.
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Program Files\e-iceblue\Spire.Pdf\Demos\Data\Sample2.pdf");

// Collect the text and images from every page.
StringBuilder buffer = new StringBuilder();
IList<Image> images = new List<Image>();
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
    foreach (Image image in page.ExtractImages())
    {
        images.Add(image);
    }
}
doc.Close();

// Save the text. The output is plain text, so use a .txt extension rather than .docx.
string fileName = "TextInPdf.txt";
File.WriteAllText(fileName, buffer.ToString());

// Save each image as a PNG file.
int index = 0;
foreach (Image image in images)
{
    string imageFileName = string.Format("Image-{0}.png", index++);
    image.Save(imageFileName, ImageFormat.Png);
}

// Open the text file in the default application.
System.Diagnostics.Process.Start(fileName);


Then convert the Word document to XML using Spire.Doc:
// Requires: using System; using Spire.Doc;
private void button1_Click(object sender, EventArgs e)
{
    // Load the Word document.
    Document document = new Document();
    document.LoadFromFile(@"D:\Sample.doc");

    // Save the document as XML.
    document.SaveToFile("Sample.xml", FileFormat.Xml);

    // Open the XML file in the default application.
    WordDocViewer("Sample.xml");
}

private void WordDocViewer(string fileName)
{
    try
    {
        System.Diagnostics.Process.Start(fileName);
    }
    catch { } // Ignore failures to launch a viewer.
}

I hope this method helps, but it is not guaranteed to work; I am only saying it is possible. This really is a tough task.
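
If a Word file is not needed as an intermediate step, the extracted text can also be written to XML directly with the .NET `System.Xml.Linq` classes. The following is only a rough sketch, not part of Spire.PDF or Spire.Doc: the class name `PdfTextToXml` and the `document`/`page`/`line` element names are made up for illustration, and it assumes you already have one string of extracted text per page (for example from `page.ExtractText()` above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class PdfTextToXml
{
    // Builds <document><page number="1"><line>...</line></page>...</document>
    // from one string of extracted text per page (hypothetical layout).
    public static XDocument Build(IEnumerable<string> pageTexts)
    {
        var root = new XElement("document");
        int pageNumber = 1;
        foreach (string pageText in pageTexts)
        {
            var page = new XElement("page", new XAttribute("number", pageNumber++));
            string[] lines = (pageText ?? string.Empty).Split(
                new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
            foreach (string line in lines)
            {
                // Drop characters that are not legal in XML 1.0 (for example form feeds),
                // otherwise saving the document would throw. This also drops surrogate pairs.
                string clean = new string(line.Where(c => XmlConvert.IsXmlChar(c)).ToArray());
                page.Add(new XElement("line", clean.Trim()));
            }
            root.Add(page);
        }
        return new XDocument(new XDeclaration("1.0", "utf-8", null), root);
    }

    public static void Main()
    {
        // Sample strings standing in for text extracted from two PDF pages.
        var pages = new[] { "Invoice 42\nTotal: 10 < 20", "Second page" };
        XDocument xml = Build(pages);
        xml.Save("TextInPdf.xml"); // special characters such as '<' are escaped automatically
        Console.WriteLine(xml);
    }
}
```

This keeps only the text, split by page and line; the layout, fonts, and images of the original PDF are not represented.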


Never give up! Smile to the world!
http://excelcsharp.blog.com/


Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 25

I only know how to convert XML to PDF. Converting PDF directly to XML is hard, so why not extract the PDF content first and then convert Word to XML:
http://www.e-iceblue.com/Knowledgebase/Spire.PDF/Program-Guide/Extract-and-Insert-PDF-Images-Text-for-WPF.html
http://www.e-iceblue.com/Knowledgebase/Spire.Doc/Program-Guide.html


Posted by: Mdjack on: 7/19/2012 [Member] Starter | Points: 25

Thanks, Megan. Can you give me an idea of how to achieve this?


Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 25

It is really hard to convert PDF to XML directly. If possible, first extract the PDF text and images and then convert Word to XML. This will change the structure of the original PDF, which is what makes it hard, but you can give my suggestion a try. If I find any other information, I will reply as soon as possible.


Posted by: Megan00 on: 7/19/2012 [Member] Starter | Points: 25

You can only convert PDFs that were created from text documents. If the PDF contains image-only pages (such as scanned documents), you cannot convert it this way.
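
One way to spot such pages before attempting a conversion is to check whether text extraction returned anything useful. The sketch below is a simple heuristic, not a feature of any PDF library: the class name `ScannedPageCheck` and the `minChars` threshold are assumptions for illustration, and the input is assumed to be one extracted string per page. Pages it flags would need OCR, which is outside this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScannedPageCheck
{
    // Returns the 1-based numbers of pages whose extracted text contains
    // fewer than minChars letters or digits (likely image-only pages).
    public static List<int> FindPagesWithoutText(IList<string> pageTexts, int minChars = 1)
    {
        var result = new List<int>();
        for (int i = 0; i < pageTexts.Count; i++)
        {
            string text = pageTexts[i] ?? string.Empty;
            int meaningful = text.Count(c => char.IsLetterOrDigit(c));
            if (meaningful < minChars)
            {
                result.Add(i + 1);
            }
        }
        return result;
    }

    public static void Main()
    {
        // Sample strings standing in for per-page extraction results.
        var pages = new List<string> { "Chapter 1: Introduction", "  \n ", null };
        List<int> suspect = FindPagesWithoutText(pages);
        Console.WriteLine("Pages with no extractable text: " + string.Join(", ", suspect));
    }
}
```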


Posted by: Mdjack on: 7/19/2012 [Member] Starter | Points: 25

Hi,
Do I need a third-party DLL to convert PDF to DOC?
Can you give me any code to convert a PDF to a Word document?


Posted by: Zaiba on: 11/27/2013 [Member] Starter | Points: 25

You can convert PDF to XML and vice versa in C#/.NET using the Aspose.PDF for .NET library.

http://www.aspose.com/.net/pdf-component.aspx


Posted by: t5j9033387989 on: 11/28/2013 [Member] Starter | Points: 25

https://bytescout.com/products/developer/pdfextractorsdk/index.html

Take a look at this link; it gives a step-by-step solution.

Mark this as the answer if it really helps you.

Thanks&Regards
ketan


Posted by: Evanpan on: 1/20/2016 [Member] Starter | Points: 25

I wonder whether there is any difference between PDF extraction and PDF-to-text conversion?
http://www.pqscan.com/pdf-to-text/

