Sorting XML files so that differences can then be found

I need to compare two XML files, each of which is about 13,000 lines long.



Sadly the code that generates these files doesn't generate the data in the same order each time (the data comes from a database).



Therefore, I get false positives when using a standard line-by-line diff utility (WinMerge), even after canonicalising the XML files.



As an example of my problem:



file1:



<a>
<b key="fruit.preferred">banana</b>
<b key="fruit.available">pineapple</b>
<b key="fruit.available">apple</b>
<b key="fruit.available">orange</b>
</a>


file2:



<a>
<b key="fruit.available">pineapple</b>
<b key="fruit.preferred">banana</b>
<b key="fruit.available">apple</b>
<b key="fruit.available">orange</b>
</a>


These files have the same content, but the position of the banana line means that a traditional diff considers them different. Are there any tools that can sort the files so that they are considered the same?



By the way, the XML file structures are more complicated than the examples above!
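One way to get there without a dedicated tool is to sort both documents yourself and then diff the sorted copies. The Python sketch below uses only the standard library (`ET.indent` needs Python 3.9 or later). It recursively sorts the children of every element by tag, attributes, text and then by their own sorted children, and re-indents the result so that WinMerge or `diff` sees identical lines wherever the content is the same. The function names and the `sortxml.py` file name are hypothetical. The sketch also assumes that sibling order carries no meaning in these configuration dumps; where order does matter, sorting will hide real differences.

```python
import sys
import xml.etree.ElementTree as ET


def sort_key(elem):
    """Deterministic key: tag, sorted attributes, stripped text,
    then the keys of the element's (already sorted) children."""
    attrs = tuple(sorted(elem.attrib.items()))
    text = (elem.text or "").strip()
    children = tuple(sort_key(child) for child in elem)
    return (elem.tag, attrs, text, children)


def sort_tree(elem):
    """Sort every element's children in place, deepest level first."""
    for child in elem:
        sort_tree(child)
    elem[:] = sorted(elem, key=sort_key)


def normalise(xml_source):
    """Parse XML (str or bytes), sort it and return re-indented text."""
    root = ET.fromstring(xml_source)
    sort_tree(root)
    ET.indent(root)  # rewrites whitespace-only text/tails; Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python sortxml.py file1.xml > file1.sorted.xml
    with open(sys.argv[1], "rb") as f:  # read bytes so the declared encoding is honoured
        print(normalise(f.read()))
```

Running it on file1 and file2 from the example should produce identical output, with the three `fruit.available` entries first (apple, orange, pineapple) followed by the `fruit.preferred` banana. Note that the default parser drops comments and processing instructions, so they will not show up in the diff.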

  • Why don't you sort the data you're getting from the database before you write the file?
    – Ramhound
    Sep 13 '11 at 12:07










  • I don't have access to the database, just the application's front end. I have one instance of the application which works, one which doesn't. I'm trying to compare their configuration, and the only way I can do that is to output a dump of their configuration and compare them :(
    – Rich
    Sep 13 '11 at 13:17










  • What do you want the output to look like? Is it sufficient to say that (for example) file1 has mango and file2 does not? Or do you need line numbers, XML attributes, etc.?
    – jdigital
    Sep 13 '11 at 21:35










  • I asked this on softwarerecs.se
    – Jan Doggen
    May 13 at 20:15

xml sorting diff

edited Mar 20 '17 at 10:17









Community

asked Sep 13 '11 at 11:41









Rich

1 Answer

I think you can use a tool such as diffxml for this purpose.



http://diffxml.sourceforge.net/



The tool's webpage states:




The standard Unix tools diff and patch are used to find the differences between text files and to apply the differences. These tools operate on a line by line basis using well-studied methods for computing the longest common subsequence (LCS).



Using these tools on hierarchically structured data (XML etc) leads to sub-optimal results, as they are incapable of recognizing the tree-based structure of these files.
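The tree-based idea can also be approximated with a few lines of code, which gives roughly the "file1 has mango and file2 does not" style of output jdigital asks about in the comments. This is not diffxml's algorithm; it is a simpler, order-insensitive comparison written in Python with only the standard library. Each leaf element is reduced to its path from the root (tags plus sorted attributes) and its stripped text, and the two files are compared as multisets with `collections.Counter`, so an entry present in only one file is reported no matter where it appears. The names `leaf_entries`, `compare`, `describe` and `xmlcompare.py` are illustrative. As a simplification, text on elements that also have children is ignored, and differences are reported per leaf rather than as a minimal edit script.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def leaf_entries(elem, prefix=()):
    """Yield (path, text) for every leaf element, where path is the chain
    of (tag, sorted attributes) from the root down to that leaf."""
    path = prefix + ((elem.tag, tuple(sorted(elem.attrib.items()))),)
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from leaf_entries(child, path)


def compare(xml_a, xml_b):
    """Return (only_in_a, only_in_b) as Counters; sibling order is ignored."""
    a = Counter(leaf_entries(ET.fromstring(xml_a)))
    b = Counter(leaf_entries(ET.fromstring(xml_b)))
    return a - b, b - a


def describe(entry):
    """Render an entry as an XPath-like string for display."""
    path, text = entry
    steps = []
    for tag, attrs in path:
        steps.append(tag + "".join(f'[@{k}="{v}"]' for k, v in attrs))
    return "/" + "/".join(steps) + f" = {text!r}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python xmlcompare.py file1.xml file2.xml
    with open(sys.argv[1], "rb") as fa, open(sys.argv[2], "rb") as fb:
        only_a, only_b = compare(fa.read(), fb.read())
    for entry in sorted(only_a.elements()):
        print("<", describe(entry))
    for entry in sorted(only_b.elements()):
        print(">", describe(entry))
```

For the two example files this should print nothing. If one file had mango where the other had orange, you would expect one line prefixed with `<` and one with `>`, each naming the `b` element by its `key` attribute and text.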

        answered Dec 15 '13 at 14:07









        Kenneth Yrke Joergensen