Having trouble parsing a csv file from the commandline












2















I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.



awk is not processing the Regex as expected.



Here are the inputs:




  • GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

  • Regex: /([(.*?)])|[^,]+/g

  • Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

  • Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
    hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31


When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr



I pipe from cat -> sed -> awk (the sample text above is what comes out of sed) and run the following command (I only want the first 9 fields, which includes the entirety of the first field enclosed in , but not anything after that):



awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'


What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)



What I'm actually getting out is the full line that was fed in.



From playing around with the variables, I found the following field assigments coming out of awk:




  • $1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

  • $2 = role2,

  • $3 = role3],[recipe1,

  • $4 = recipe2,

  • $5 = recipe3],2019-01-10

  • $6 = 06:06:31


I've tried using the accepted answer from This Stack Overflow question outright, and I've tried tweaking it to use as the delimiters instead of ", and that gets me closer, but it's still not treating the role field as a single field.










share|improve this question









New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





















  • What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

    – Kusalananda
    5 hours ago











  • Actually, what are you trying to do with the regex generally? Which lines do you want to print?

    – terdon
    5 hours ago











  • I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

    – Taegost
    4 hours ago











  • @Taegost OK, but what is the regex doing? Your print statement already does that.

    – terdon
    4 hours ago
















2















I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.



awk is not processing the Regex as expected.



Here are the inputs:




  • GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

  • Regex: /([(.*?)])|[^,]+/g

  • Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

  • Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
    hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31


When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr



I pipe from cat -> sed -> awk (the sample text above is what comes out of sed) and run the following command (I only want the first 9 fields, which includes the entirety of the first field enclosed in , but not anything after that):



awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'


What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)



What I'm actually getting out is the full line that was fed in.



From playing around with the variables, I found the following field assigments coming out of awk:




  • $1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

  • $2 = role2,

  • $3 = role3],[recipe1,

  • $4 = recipe2,

  • $5 = recipe3],2019-01-10

  • $6 = 06:06:31


I've tried using the accepted answer from This Stack Overflow question outright, and I've tried tweaking it to use as the delimiters instead of ", and that gets me closer, but it's still not treating the role field as a single field.










share|improve this question









New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





















  • What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

    – Kusalananda
    5 hours ago











  • Actually, what are you trying to do with the regex generally? Which lines do you want to print?

    – terdon
    5 hours ago











  • I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

    – Taegost
    4 hours ago











  • @Taegost OK, but what is the regex doing? Your print statement already does that.

    – terdon
    4 hours ago














2












2








2








I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.



awk is not processing the Regex as expected.



Here are the inputs:




  • GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

  • Regex: /([(.*?)])|[^,]+/g

  • Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

  • Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
    hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31


When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr



I pipe from cat -> sed -> awk (the sample text above is what comes out of sed) and run the following command (I only want the first 9 fields, which includes the entirety of the first field enclosed in , but not anything after that):



awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'


What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)



What I'm actually getting out is the full line that was fed in.



From playing around with the variables, I found the following field assigments coming out of awk:




  • $1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

  • $2 = role2,

  • $3 = role3],[recipe1,

  • $4 = recipe2,

  • $5 = recipe3],2019-01-10

  • $6 = 06:06:31


I've tried using the accepted answer from This Stack Overflow question outright, and I've tried tweaking it to use as the delimiters instead of ", and that gets me closer, but it's still not treating the role field as a single field.










share|improve this question









New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.












I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.



awk is not processing the Regex as expected.



Here are the inputs:




  • GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

  • Regex: /([(.*?)])|[^,]+/g

  • Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

  • Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
    hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31


When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr



I pipe from cat -> sed -> awk (the sample text above is what comes out of sed) and run the following command (I only want the first 9 fields, which includes the entirety of the first field enclosed in , but not anything after that):



awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'


What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)



What I'm actually getting out is the full line that was fed in.



From playing around with the variables, I found the following field assigments coming out of awk:




  • $1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

  • $2 = role2,

  • $3 = role3],[recipe1,

  • $4 = recipe2,

  • $5 = recipe3],2019-01-10

  • $6 = 06:06:31


I've tried using the accepted answer from This Stack Overflow question outright, and I've tried tweaking it to use as the delimiters instead of ", and that gets me closer, but it's still not treating the role field as a single field.







awk regular-expression csv






share|improve this question









New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|improve this question









New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|improve this question




share|improve this question








edited 3 hours ago









Rui F Ribeiro

39.4k1479131




39.4k1479131






New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 5 hours ago









TaegostTaegost

1155




1155




New contributor




Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.













  • What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

    – Kusalananda
    5 hours ago











  • Actually, what are you trying to do with the regex generally? Which lines do you want to print?

    – terdon
    5 hours ago











  • I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

    – Taegost
    4 hours ago











  • @Taegost OK, but what is the regex doing? Your print statement already does that.

    – terdon
    4 hours ago



















  • What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

    – Kusalananda
    5 hours ago











  • Actually, what are you trying to do with the regex generally? Which lines do you want to print?

    – terdon
    5 hours ago











  • I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

    – Taegost
    4 hours ago











  • @Taegost OK, but what is the regex doing? Your print statement already does that.

    – terdon
    4 hours ago

















What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago





What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago













Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon
5 hours ago





Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon
5 hours ago













I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago





I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago













@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon
4 hours ago





@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon
4 hours ago










2 Answers
2






active

oldest

votes


















2














By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:



awk -F, '{...}' 


To have awk print comma separated output, you need to set the OFS variable:



awk -F, -vOFS=, '{...}' 


The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:



$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:



$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]


The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.






share|improve this answer


























  • That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

    – Taegost
    4 hours ago











  • @Taegost they might be for csv, but they aren't for awk.

    – terdon
    4 hours ago











  • Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

    – Taegost
    4 hours ago











  • If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

    – DopeGhoti
    4 hours ago











  • @Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

    – terdon
    4 hours ago



















3














If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) then a proper CSV parser is going to save a lot of headaches e.g. with csvtool



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 -
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"


or (to remove the quotes)



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 - | tr -d '"'
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.



perl -MText::CSV -lpe '
BEGIN{$p = Text::CSV->new()}
$_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)
'





share|improve this answer


























  • This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

    – Taegost
    3 hours ago











  • @Taegost if you figure out how to do it in Ruby, please add that as an answer!

    – steeldriver
    3 hours ago











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});






Taegost is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f494013%2fhaving-trouble-parsing-a-csv-file-from-the-commandline%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









2














By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:



awk -F, '{...}' 


To have awk print comma separated output, you need to set the OFS variable:



awk -F, -vOFS=, '{...}' 


The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:



$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:



$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]


The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.






share|improve this answer


























  • That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

    – Taegost
    4 hours ago











  • @Taegost they might be for csv, but they aren't for awk.

    – terdon
    4 hours ago











  • Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

    – Taegost
    4 hours ago











  • If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

    – DopeGhoti
    4 hours ago











  • @Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

    – terdon
    4 hours ago
















2














By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:



awk -F, '{...}' 


To have awk print comma separated output, you need to set the OFS variable:



awk -F, -vOFS=, '{...}' 


The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:



$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:



$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]


The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.






share|improve this answer


























  • That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

    – Taegost
    4 hours ago











  • @Taegost they might be for csv, but they aren't for awk.

    – terdon
    4 hours ago











  • Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

    – Taegost
    4 hours ago











  • If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

    – DopeGhoti
    4 hours ago











  • @Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

    – terdon
    4 hours ago














2












2








2







By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:



awk -F, '{...}' 


To have awk print comma separated output, you need to set the OFS variable:



awk -F, -vOFS=, '{...}' 


The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:



$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:



$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]


The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.






share|improve this answer















By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:



awk -F, '{...}' 


To have awk print comma separated output, you need to set the OFS variable:



awk -F, -vOFS=, '{...}' 


The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:



$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:



$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]


The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.







share|improve this answer














share|improve this answer



share|improve this answer








edited 4 hours ago

























answered 5 hours ago









terdonterdon

129k32253428




129k32253428













  • That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

    – Taegost
    4 hours ago











  • @Taegost they might be for csv, but they aren't for awk.

    – terdon
    4 hours ago











  • Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

    – Taegost
    4 hours ago











  • If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

    – DopeGhoti
    4 hours ago











  • @Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

    – terdon
    4 hours ago



















  • That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

    – Taegost
    4 hours ago











  • @Taegost they might be for csv, but they aren't for awk.

    – terdon
    4 hours ago











  • Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

    – Taegost
    4 hours ago











  • If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

    – DopeGhoti
    4 hours ago











  • @Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

    – terdon
    4 hours ago

















That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago





That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago













@Taegost they might be for csv, but they aren't for awk.

– terdon
4 hours ago





@Taegost they might be for csv, but they aren't for awk.

– terdon
4 hours ago













Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago





Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago













If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago





If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago













@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon
4 hours ago





@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon
4 hours ago













3














If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) then a proper CSV parser is going to save a lot of headaches e.g. with csvtool



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 -
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"


or (to remove the quotes)



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 - | tr -d '"'
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.



perl -MText::CSV -lpe '
BEGIN{$p = Text::CSV->new()}
$_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)
'





share|improve this answer


























  • This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

    – Taegost
    3 hours ago











  • @Taegost if you figure out how to do it in Ruby, please add that as an answer!

    – steeldriver
    3 hours ago
















3














If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) then a proper CSV parser is going to save a lot of headaches e.g. with csvtool



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 -
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"


or (to remove the quotes)



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 - | tr -d '"'
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.



perl -MText::CSV -lpe '
BEGIN{$p = Text::CSV->new()}
$_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)
'





share|improve this answer


























  • This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

    – Taegost
    3 hours ago











  • @Taegost if you figure out how to do it in Ruby, please add that as an answer!

    – steeldriver
    3 hours ago














3












3








3







If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) then a proper CSV parser is going to save a lot of headaches e.g. with csvtool



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 -
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"


or (to remove the quotes)



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 - | tr -d '"'
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.



perl -MText::CSV -lpe '
BEGIN{$p = Text::CSV->new()}
$_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)
'





share|improve this answer















If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) then a proper CSV parser is going to save a lot of headaches e.g. with csvtool



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 -
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"


or (to remove the quotes)



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 - | tr -d '"'
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]




If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.



perl -MText::CSV -lpe '
BEGIN{$p = Text::CSV->new()}
$_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)
'






share|improve this answer














share|improve this answer



share|improve this answer








edited 4 hours ago

























answered 4 hours ago









steeldriversteeldriver

34.9k35184




34.9k35184













  • This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

    – Taegost
    3 hours ago











  • @Taegost if you figure out how to do it in Ruby, please add that as an answer!

    – steeldriver
    3 hours ago



















  • This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

    – Taegost
    3 hours ago











  • @Taegost if you figure out how to do it in Ruby, please add that as an answer!

    – steeldriver
    3 hours ago

















This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago





This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago













@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago





@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago










Taegost is a new contributor. Be nice, and check out our Code of Conduct.










draft saved

draft discarded


















Taegost is a new contributor. Be nice, and check out our Code of Conduct.













Taegost is a new contributor. Be nice, and check out our Code of Conduct.












Taegost is a new contributor. Be nice, and check out our Code of Conduct.
















Thanks for contributing an answer to Unix & Linux Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f494013%2fhaving-trouble-parsing-a-csv-file-from-the-commandline%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Mouse cursor on multiple screens with different PPI

Agildo Ribeiro

Sometime when accessing a menu: “Ubuntu 16.04 has experienced an internal error”