
You can use Spire.PDF and Spire.Doc for this task, but not directly. First you have to extract the text and images from the PDF using Spire.PDF:
http://www.e-iceblue.com/Introduce/pdf-for-net-introduce.html
Then use Spire.Doc to convert the extracted text to XML:
http://www.e-iceblue.com/Introduce/word-for-net-introduce.html
You can use the code below to extract the text and images from your PDF:
// Namespaces needed: System.Collections.Generic, System.Drawing,
// System.Drawing.Imaging, System.IO, System.Text, Spire.Pdf

// Load the PDF document.
PdfDocument doc = new PdfDocument();
doc.LoadFromFile(@"C:\Program Files\e-iceblue\Spire.Pdf\Demos\Data\Sample2.pdf");

// Collect the text and images from every page.
StringBuilder buffer = new StringBuilder();
IList<Image> images = new List<Image>();
foreach (PdfPageBase page in doc.Pages)
{
    buffer.Append(page.ExtractText());
    foreach (Image image in page.ExtractImages())
    {
        images.Add(image);
    }
}
doc.Close();

// Save the text. ExtractText() returns plain text, so write a .txt file.
string fileName = "TextInPdf.txt";
File.WriteAllText(fileName, buffer.ToString());

// Save each image as a PNG file.
int index = 0;
foreach (Image image in images)
{
    string imageFileName = string.Format("Image-{0}.png", index++);
    image.Save(imageFileName, ImageFormat.Png);
}

// Open the text file.
System.Diagnostics.Process.Start(fileName);
Then convert the Word document to XML using Spire.Doc:
// Namespaces needed: System, Spire.Doc

private void button1_Click(object sender, EventArgs e)
{
    // Load the Word document.
    Document document = new Document();
    document.LoadFromFile(@"D:\Sample.doc");

    // Save it as XML.
    document.SaveToFile("Sample.xml", FileFormat.Xml);

    // Open the resulting XML file.
    WordDocViewer("Sample.xml");
}

private void WordDocViewer(string fileName)
{
    try
    {
        System.Diagnostics.Process.Start(fileName);
    }
    catch { } // Ignore the error if no program can open the file.
}
I hope this method helps, but I can't promise it will; I'm only saying it is possible. You really have a tough task here.
Never give up! Smile to the world!
http://excelcsharp.blog.com/