Sorting XML files so that differences can then be found
I need to compare two XML files, each of which is about 13,000 lines long.
Sadly the code that generates these files doesn't generate the data in the same order each time (the data comes from a database).
Therefore, I get false positives when using a standard line-by-line diff utility (WinMerge), even after canonicalising the XML file.
As an example of my problem:
file1:
<a>
<b key="fruit.preferred">banana</b>
<b key="fruit.available">pineapple</b>
<b key="fruit.available">apple</b>
<b key="fruit.available">orange</b>
</a>
file2:
<a>
<b key="fruit.available">pineapple</b>
<b key="fruit.preferred">banana</b>
<b key="fruit.available">apple</b>
<b key="fruit.available">orange</b>
</a>
These files have the same content, but because the banana line is in a different position, a traditional line-based diff considers them different. Are there any tools that can sort the files so that they are considered the same?
By the way, the XML file structures are more complicated than the examples above!
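If sibling order carries no meaning in these files (an assumption the question implies but does not state), one way to get such a sort is to reorder every element's children before diffing. Below is a minimal Python 3.9+ sketch that uses only the standard library. It drops whitespace-only text, puts attributes in name order, and sorts children recursively (so nested structures are handled) by their serialized form. It then re-indents the result so that an ordinary line diff (WinMerge, `diff`, or `difflib`) sees one element per line. The function names are illustrative. Mixed content (text between child elements) and namespaces are not handled specially.

```python
import difflib
import xml.etree.ElementTree as ET

# The two example files from the question.
FILE1 = """<a>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>
</a>"""

FILE2 = """<a>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>
</a>"""


def normalize(elem):
    """Drop whitespace-only text/tails and put attributes in name order."""
    for node in elem.iter():
        if node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
        attrs = sorted(node.attrib.items())
        node.attrib.clear()
        node.attrib.update(attrs)


def sort_children(elem):
    """Recursively sort each element's children by their serialized form."""
    for child in elem:
        sort_children(child)  # sort descendants first so the keys are stable
    elem[:] = sorted(elem, key=lambda child: ET.tostring(child, encoding="unicode"))


def sorted_lines(xml_text):
    """Parse, normalize, sort and re-indent a document; return its lines."""
    root = ET.fromstring(xml_text)
    normalize(root)
    sort_children(root)
    ET.indent(root)  # available from Python 3.9
    return ET.tostring(root, encoding="unicode").splitlines()


if __name__ == "__main__":
    diff = difflib.unified_diff(sorted_lines(FILE1), sorted_lines(FILE2),
                                "file1", "file2", lineterm="")
    print("\n".join(diff) or "no differences")
```

For the real dumps, pass each file's contents, e.g. `sorted_lines(open("file1.xml", encoding="utf-8").read())`. You can also write the sorted lines back to disk and open the two results in WinMerge. Sorting by the whole serialized subtree is blunt but gives a total order. A key based on one attribute such as `key` would group related elements more naturally, but it would need a tie-breaker for duplicates like the three `fruit.available` entries. Canonicalisation on its own (for example `xml.etree.ElementTree.canonicalize`, which implements C14N 2.0) normalizes attribute order and namespace declarations but leaves element order untouched. That is consistent with the false positives described above.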
xml sorting diff
Why don't you sort the data you're getting from the database before you write the file?
– Ramhound
Sep 13 '11 at 12:07
I don't have access to the database, just the application's front end. I have one instance of the application which works, one which doesn't. I'm trying to compare their configuration, and the only way I can do that is to output a dump of their configuration and compare them :(
– Rich
Sep 13 '11 at 13:17
What do you want the output to look like? Is it sufficient to say that (for example) file1 has mango and file2 does not? Or do you need line numbers, xml attributes, etc?
– jdigital
Sep 13 '11 at 21:35
I asked this on softwarerecs.se
– Jan Doggen
May 13 at 20:15
edited Mar 20 '17 at 10:17 by Community♦
asked Sep 13 '11 at 11:41 by Rich
1 Answer
I think you can use a tool such as diffxml for this purpose:
http://diffxml.sourceforge.net/
The tool's web page states:
The standard Unix tools diff and patch are used to find the differences between text files and to apply the differences. These tools operate on a line by line basis using well-studied methods for computing the longest common subsequence (LCS).
Using these tools on hierarchically structured data (XML etc) leads to sub-optimal results, as they are incapable of recognizing the tree-based structure of these files.
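The quoted point can be seen on the question's own example. The Python sketch below is a swapped-in illustration, not diffxml itself, and it assumes sibling order is irrelevant. It first runs a line-based diff with `difflib.ndiff`, which uses Python's own matching algorithm rather than the classic LCS but, like `diff`, works line by line and therefore reports the reordering as changed lines. It then compares the documents as order-insensitive multisets of (path, attributes, text) entries. That second comparison gives the "file1 has mango and file2 does not" style of report asked about in the comments. The entry format and the function names are hypothetical choices made for this illustration. Element tails are ignored.

```python
import difflib
import xml.etree.ElementTree as ET
from collections import Counter

# The two example files from the question.
FILE1 = """<a>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>
</a>"""

FILE2 = """<a>
  <b key="fruit.available">pineapple</b>
  <b key="fruit.preferred">banana</b>
  <b key="fruit.available">apple</b>
  <b key="fruit.available">orange</b>
</a>"""


def changed_lines(a, b):
    """Lines that a line-based diff marks as removed ("- ") or added ("+ ")."""
    return [line for line in difflib.ndiff(a.splitlines(), b.splitlines())
            if line.startswith(("- ", "+ "))]


def entries(xml_text):
    """Count (path, sorted attributes, stripped text) for every element."""
    counts = Counter()

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}"
        attrs = tuple(sorted(elem.attrib.items()))
        counts[(path, attrs, (elem.text or "").strip())] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def compare(a, b):
    """Return (entries only in a, entries only in b), ignoring sibling order."""
    ca, cb = entries(a), entries(b)
    return ca - cb, cb - ca


if __name__ == "__main__":
    print("line diff:", *changed_lines(FILE1, FILE2), sep="\n")
    only1, only2 = compare(FILE1, FILE2)
    print("only in file1:", dict(only1))
    print("only in file2:", dict(only2))
```

Because each entry records only its path, and not which parent instance it sits under, moving a child between two sibling parents with the same path goes unnoticed. A tree-aware comparison is still needed when that matters.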
answered Dec 15 '13 at 14:07 by Kenneth Yrke Joergensen