Having trouble parsing a csv file from the commandline

I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.

awk is not processing the Regex as expected.

Here are the inputs:

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

Regex: /([(.*?)])|[^,]+/g

Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31

When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr

I pipe from cat -> sed -> awk (the sample text above is what comes out of sed) and run the following command (I only want the first 9 fields, which includes the entirety of the first field enclosed in , but not anything after that):

awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'

What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)

What I'm actually getting out is the full line that was fed in.

From playing around with the variables, I found the following field assigments coming out of awk:

$1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

$2 = role2,

$3 = role3],[recipe1,

$4 = recipe2,

$5 = recipe3],2019-01-10

$6 = 06:06:31

I've tried using the accepted answer from This Stack Overflow question outright, and I've tried tweaking it to use as the delimiters instead of ", and that gets me closer, but it's still not treating the role field as a single field.

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

asked 5 hours ago

Taegost

1155

New contributor

What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago

Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon♦
5 hours ago

I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago

@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon♦
4 hours ago

add a comment |

I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.

awk is not processing the Regex as expected.

Here are the inputs:

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

Regex: /([(.*?)])|[^,]+/g

Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31

When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr

awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'

What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)

What I'm actually getting out is the full line that was fed in.

From playing around with the variables, I found the following field assigments coming out of awk:

$1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

$2 = role2,

$3 = role3],[recipe1,

$4 = recipe2,

$5 = recipe3],2019-01-10

$6 = 06:06:31

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

asked 5 hours ago

Taegost

1155

New contributor

What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago

Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon♦
5 hours ago

I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago

@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon♦
4 hours ago

add a comment |

I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.

awk is not processing the Regex as expected.

Here are the inputs:

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

Regex: /([(.*?)])|[^,]+/g

Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31

When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr

awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'

What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)

What I'm actually getting out is the full line that was fed in.

From playing around with the variables, I found the following field assigments coming out of awk:

$1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

$2 = role2,

$3 = role3],[recipe1,

$4 = recipe2,

$5 = recipe3],2019-01-10

$6 = 06:06:31

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

asked 5 hours ago

Taegost

1155

New contributor

I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.

awk is not processing the Regex as expected.

Here are the inputs:

GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

Regex: /([(.*?)])|[^,]+/g

Sample text hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31

When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr

awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'

What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes)

What I'm actually getting out is the full line that was fed in.

From playing around with the variables, I found the following field assigments coming out of awk:

$1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

$2 = role2,

$3 = role3],[recipe1,

$4 = recipe2,

$5 = recipe3],2019-01-10

$6 = 06:06:31

awk regular-expression csv

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

asked 5 hours ago

Taegost

1155

New contributor

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

asked 5 hours ago

Taegost

1155

New contributor

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

edited 3 hours ago

Rui F Ribeiro

39.4k1479131

asked 5 hours ago

Taegost

1155

New contributor

asked 5 hours ago

Taegost

1155

asked 5 hours ago

Taegost

1155

New contributor

Taegost is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago

Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon♦
5 hours ago

I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago

@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon♦
4 hours ago

add a comment |

What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago

Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon♦
5 hours ago

I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago

@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon♦
4 hours ago

What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

– Kusalananda
5 hours ago

Actually, what are you trying to do with the regex generally? Which lines do you want to print?

– terdon♦
5 hours ago

I only included the g because Regexr did, in the current state, it doesn't change the output at all. @terdon - I want it to print all of hte fields up to (and including) the field that contains the 3 roles, but nothing after that

– Taegost
4 hours ago

@Taegost OK, but what is the regex doing? Your print statement already does that.

– terdon♦
4 hours ago

add a comment |

2 Answers
2

active

oldest

votes

By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:

awk -F, '{...}'

To have awk print comma separated output, you need to set the OFS variable:

awk -F, -vOFS=, '{...}'

The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:

$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:

$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.

edited 4 hours ago

answered 5 hours ago

terdon♦

129k32253428

That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago

@Taegost they might be for csv, but they aren't for awk.

– terdon♦
4 hours ago

Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago

If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago

@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon♦
4 hours ago

|
show 1 more comment

If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) then a proper CSV parser is going to save a lot of headaches e.g. with csvtool

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 -

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"

or (to remove the quotes)

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 - | tr -d '"'

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.

perl -MText::CSV -lpe '

  BEGIN{$p = Text::CSV->new()} 

  $_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)

'

edited 4 hours ago

answered 4 hours ago

steeldriver

34.9k35184

This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago

@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

Taegost is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f494013%2fhaving-trouble-parsing-a-csv-file-from-the-commandline%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:

awk -F, '{...}'

To have awk print comma separated output, you need to set the OFS variable:

awk -F, -vOFS=, '{...}'

$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:

$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

edited 4 hours ago

answered 5 hours ago

terdon♦

129k32253428

That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago

@Taegost they might be for csv, but they aren't for awk.

– terdon♦
4 hours ago

Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago

If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago

@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon♦
4 hours ago

|
show 1 more comment

By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:

awk -F, '{...}'

To have awk print comma separated output, you need to set the OFS variable:

awk -F, -vOFS=, '{...}'

$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:

$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

edited 4 hours ago

answered 5 hours ago

terdon♦

129k32253428

That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago

@Taegost they might be for csv, but they aren't for awk.

– terdon♦
4 hours ago

Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago

If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago

@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon♦
4 hours ago

|
show 1 more comment

By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:

awk -F, '{...}'

To have awk print comma separated output, you need to set the OFS variable:

awk -F, -vOFS=, '{...}'

$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:

$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

edited 4 hours ago

answered 5 hours ago

terdon♦

129k32253428

By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:

awk -F, '{...}'

To have awk print comma separated output, you need to set the OFS variable:

awk -F, -vOFS=, '{...}'

$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:

$ awk -F']' -vOFS=, '{gsub(/"/,"");print $1"]"}' file

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

edited 4 hours ago

answered 5 hours ago

terdon♦

129k32253428

edited 4 hours ago

answered 5 hours ago

terdon♦

129k32253428

answered 5 hours ago

terdon♦

129k32253428

answered 5 hours ago

terdon♦

129k32253428

That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago

@Taegost they might be for csv, but they aren't for awk.

– terdon♦
4 hours ago

Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago

If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago

@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon♦
4 hours ago

|
show 1 more comment

That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago

@Taegost they might be for csv, but they aren't for awk.

– terdon♦
4 hours ago

Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago

If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago

@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon♦
4 hours ago

That field with the roles and the recipes are a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

– Taegost
4 hours ago

@Taegost they might be for csv, but they aren't for awk.

– terdon♦
4 hours ago

Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

– Taegost
4 hours ago

If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

– DopeGhoti
4 hours ago

@Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

– terdon♦
4 hours ago

|
show 1 more comment

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 -

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"

or (to remove the quotes)

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 - | tr -d '"'

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.

perl -MText::CSV -lpe '

  BEGIN{$p = Text::CSV->new()} 

  $_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)

'

edited 4 hours ago

answered 4 hours ago

steeldriver

34.9k35184

This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago

@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago

add a comment |

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 -

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"

or (to remove the quotes)

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 - | tr -d '"'

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.

perl -MText::CSV -lpe '

  BEGIN{$p = Text::CSV->new()} 

  $_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)

'

edited 4 hours ago

answered 4 hours ago

steeldriver

34.9k35184

This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago

@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago

add a comment |

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 -

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"

or (to remove the quotes)

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 - | tr -d '"'

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.

perl -MText::CSV -lpe '

  BEGIN{$p = Text::CSV->new()} 

  $_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)

'

edited 4 hours ago

answered 4 hours ago

steeldriver

34.9k35184

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 -

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"

or (to remove the quotes)

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 

    csvtool col 1-9 - | tr -d '"'

hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]

If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.

perl -MText::CSV -lpe '

  BEGIN{$p = Text::CSV->new()} 

  $_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)

'

edited 4 hours ago

answered 4 hours ago

steeldriver

34.9k35184

edited 4 hours ago

answered 4 hours ago

steeldriver

34.9k35184

answered 4 hours ago

steeldriver

34.9k35184

answered 4 hours ago

steeldriver

34.9k35184

This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago

@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago

add a comment |

This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago

@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago

This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

– Taegost
3 hours ago

@Taegost if you figure out how to do it in Ruby, please add that as an answer!

– steeldriver
3 hours ago

add a comment |

Taegost is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Taegost is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Unix & Linux Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Bdtyktl