The ultimate free solution: parse any CSV file to a JSON array via a Power Automate flow. This flow will parse any CSV file, no matter the line ending. It will handle Unix, Windows, or Mac files. The key values (i.e. the header row) are mapped dynamically, and it will handle CSV files of all shapes and sizes. Whilst I have previously blogged about parsing an array with a Select action, that solution did not handle varying header lengths automatically.
The solution below will automatically handle the following:
- multiple sources within O365: OneDrive, SharePoint, or Outlook
- common line endings (Unix, Windows & Mac)
- varying header sizes
- and, most importantly, it is free for you to use and copy
What does the Flow look like? The Pseudo Code
- Get file content, this action is for OneDrive, where the data is returned as a string. I will also cover the SharePoint and email data sources below; those sources need their content converted from Base64 encoding to a string.
- Compose (Delimiter), this is a parameter; leave it unchanged as a comma ,. If you want to parse a pipe-delimited or semicolon-delimited file, change it to | or ; as appropriate.
- Compose (FileContent), this returns the file content from the Get file content action. I also took the opportunity to replace quotes with an empty string, to cater for quote-encapsulated CSVs. Note that values containing the separator will cause problems aligning the values/columns; I suggest you watch the following video here on a possible solution.
- Compose (LineEnding), uses an expression to check for the appearance of one of three return characters, 0d0a, 0a, or 0d, and sets the value accordingly.
- Compose (Headers), split the first element of the file content using the LineEnding, then split again using the Delimiter. This creates an array of header values to be used as keys in the Apply to Each action.
- Apply To Each, split the file content using the LineEnding and skip the header row. Each iteration is passed a single row of the file. I have turned on concurrency for maximum efficiency.
- Select (EachObject), we create a range from 0 to the header array length. The header/key is mapped by selecting each element by an integer index. We return the value by splitting the row by our Delimiter and, as before, selecting the element by the same integer. You can watch how we might call array elements by integer here.
- Compose (Replace), we want to work with an object at this point. We convert the array to a string and replace the curly brackets { and } with an empty string ''. We do the same with the square brackets [ and ], replacing them with curly brackets { and }.
- Compose (json), we run the new string through the json expression. This eliminates any escaped characters for our final array.
- Compose (Re Purposed Array), finally, we’ve made it (!). Outside of the apply to each, we can bring our objects together and create our re-purposed Array.
Flow Actions in More Detail
Below I have split the flow into two parts. The first demonstrates the data preparation: defining the Delimiter, preparing the FileContent, identifying the LineEnding, and splitting the Headers. Each of the expressions used is in the comments, but they are also available below to copy and paste.
3. FileContent:
replace(outputs('Get_file_content')?['body'],'"','')
4. LineEnding:
if(equals(indexof(outputs('FileContent'), decodeUriComponent('%0D%0A')), -1), if(equals(indexof(outputs('FileContent'), decodeUriComponent('%0A')), -1), decodeUriComponent('%0D'), decodeUriComponent('%0A')), decodeUriComponent('%0D%0A'))
5. Headers:
split(first(split(outputs('FileContent'),outputs('LineEnding'))),outputs('Delimiter'))
The second part of the solution is the Apply to Each. For each of the rows in the CSV we map out each of the keys (i.e. the headers) and values (the column data values). We then convert the array returned by the Select into an object. The json expression removes any escaped characters, i.e. \". This enables the final action to bring the objects back together as an array.
6. Apply to Each:
skip(split(outputs('FileContent'),outputs('LineEnding')),1)
7. EachObject:
FROM: range(0, length(outputs('Headers')))
KEY: outputs('Headers')?[item()]
VALUE: split(items('Apply_to_each'),outputs('Delimiter'))?[item()]
8. Replace:
replace(replace(replace(replace(string(body('EachObject')), '{', ''), '}', ''), '[', '{'), ']', '}')
9. JSON:
json(outputs('replace'))
10. Re-Purposed Array:
outputs('json')
The Input vs Output
The advantage of retrieving file content from OneDrive is that it is returned as a string. Remember that this solution does not need you to specify the header keys; it works them out dynamically. If you have ten columns in the CSV, you will get ten keys in your array. If you change the file structure, the flow will immediately adapt to the new layout.
The final action of the Flow will return a perfectly formatted JSON Array. Note that the column headers have been applied to the keys.
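To make this concrete, here is an illustrative example. The column names and sample values are my own assumptions (based on the first_name example below), not a prescribed format. Given this input file:

id,first_name,email
1,Hanni,hanni@example.com
2,Dev,dev@example.com

the Re Purposed Array would look like:

[ { "id": "1", "first_name": "Hanni", "email": "hanni@example.com" }, { "id": "2", "first_name": "Dev", "email": "dev@example.com" } ]

Note that every value is returned as a string; a CSV carries no type information.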
You can now call any of the keys by building up an expression. For example, to return first_name Hanni, you would use the following:
first(outputs('Re_Purposed_Array'))?['first_name']
or
outputs('Re_Purposed_Array')?[0]?['first_name']
If you wanted to return all of the email addresses in an array so that you can send an email to all of the users, you could use a select action. Watch here to see this in action.
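As a hedged sketch (assuming your CSV contains an email column; adjust the key to match your own headers), the Select action could be configured as:

From: outputs('Re_Purposed_Array')
Map (text mode): item()?['email']

This returns a plain array of email addresses, ready to pass to a join() or to an apply to each.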
Other file sources, SharePoint and Outlook
For both SharePoint and Outlook, the file is retrieved as Base64, so you need to use the base64ToString expression. After this conversion, the actions of the flow are identical. You will recall that I used a compose called FileContent; if you replace its input with the SharePoint or Outlook source (including the replace), the flow will operate exactly the same with minimal disruption.
To get file content from SharePoint, you build an expression to select the $content from the body. Remember that we must convert from Base64 to string, as sketched below.
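A minimal sketch of the full FileContent expression for SharePoint, assuming the action is named Get file content:

replace(base64ToString(body('Get_file_content')?['$content']),'"','')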
When an email is received in your mailbox with a CSV attachment, convert the Base64-encoded attachment contentBytes to a string and replace the double quotes with an empty string, for example as follows.
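A minimal sketch, assuming a 'When a new email arrives' trigger with include attachments enabled, and that we want the first attachment:

replace(base64ToString(triggerOutputs()?['body/attachments']?[0]?['contentBytes']),'"','')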
Want to understand File Encodings?
It is interesting to understand the return characters used by Unix, Windows, and Mac files. Rather than repeat what is already out there, please read up on Wikipedia here if you are interested in learning more. I have highlighted the difference pictorially below for the more common line endings, Windows and Unix. My flow handles all three automatically via the indexOf expression: if one line ending doesn't appear in the file contents, it will check for the other types.
Windows File Encoding with 0d0a Line Break
Unix File Encoding with 0a Line Break
And finally….
There are paid-for actions out there to do this for you, for example, Encodian. There are also a few other ways of doing this. What can you learn from them?
- Power Automate: How to parse a CSV File to create a JSON array (Paulie Murana)
- Power Automate: How to parse a CSV file (Manuel Gomes)
- 3 steps to read csv files from SharePoint (Pieter Veenstra)
This solution covers a multitude of expressions and actions. It demonstrates how you can parse most, if not all, CSV files to a JSON array without the need to consider the size and shape of the array. The headers and column sizes will be handled dynamically. As demonstrated, the source can be OneDrive, SharePoint, or attachments in Outlook.
A couple of discoveries from testing: some files contain a byte order mark (BOM), so you might need to replace EF BB BF with an empty string if you see strange characters at the beginning of your file. I've also noticed that some arrays contain an empty final object. This object could be skipped using a filter array where the object is empty. I've left you to explore this for yourself.
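As a hedged sketch of the BOM fix for the OneDrive source (adapt the inner reference to your own data source), the FileContent compose could become:

replace(replace(outputs('Get_file_content')?['body'],decodeUriComponent('%EF%BB%BF'),''),'"','')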
Hi, I’m getting an error with the ‘EachObject’ part.
Unable to process template language expressions in action 'EachObject' inputs at line '1' and column '6110': 'The template language expression 'outputs('Headers')?[item()]' cannot be evaluated because property '357e1bac-a7b2-4000-bae0-d7c813b39d6e,City Fibre,CF1 ,123 Made Up Street,Made up Town,Made Up City,Made Up county,AB12 3CD,4' cannot be selected. Array elements can only be selected using an integer index. Please see https://aka.ms/logicexpressions for usage details.'.
Hi Joe, hard to diagnose without seeing your flow. Have you managed to sort it? It's basically saying you are returning many elements, and you need to select them by an integer index, ?[0], ?[1], etc. If not, drop me a message.
Thank you very much !!
I expect this is a simple question, but I've been at it for a couple of weeks and hope you can assist.
Using your guide I have my flows working fine from a SP file and email; thanks. However, I can't get the next step working: parsing the JSON and writing the data to a SP list.
I'm using the 'Parse JSON' action with Content: outputs('Re_Purposed_Array'), and I take the schema from a sample run. However, each time I get the same error: ValidationFailed. The schema validation failed.
Any advice you can provide I would appreciate.
I'm not a Parse JSON fan, and if you're keen to learn, I would suggest you try to understand the JSON payload. Happy to help you there. Best bet is to look at the output in a beautifier; then you will learn to construct your own expressions and not rely on Parse JSON. Take a wee look here and let me know if you need any more help: https://youtu.be/6nJSUNh579w
Thanks for the super-quick reply. I took a look at my output and watched the video – very helpful, thanks. I'll have to give more time to that. From the output I identified my issue, though I haven't really solved it. It's the final row of the CSV: a CRLF or similar creates a final empty (null) item in the JSON output.
Do you think that could be addressed by adjusting the 'LineEnding' compose action in your solution, or perhaps another set of actions to find the length of the JSON array and then take all but the last item?
length() and take() are certainly one option; you can also use the filter array action, where the row is not equal to ''.
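As a hedged sketch of the filter array option, filtering the rows before they are parsed:

From: skip(split(outputs('FileContent'),outputs('LineEnding')),1)
Condition (advanced mode): @not(equals(item(), ''))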
Hi, I'm new to Power Automate and I have a question.
When I check the Re Purposed Array result, it correctly returns the JSON objects, but it always returns an extra object with null fields. At the end of your post, you comment:
"I've also noticed that some arrays contain an empty final object. This object could be skipped using a filter array where the object is empty. I've left you to explore this for yourself."
Could you show me how to fix this problem in your example, please?
Thanks a lot!
I want to know where specifically you put the filter array … I'm trying to do this but I can't fix it because I'm new to this 🙂
After step 10, add a filter array action on the new array, where Id (or one of your key names) is not empty. This should filter the array and remove the empty object. Alternatively you can use the take expression https://docs.microsoft.com/en-us/azure/logic-apps/workflow-definition-language-functions-reference#take and take the first length() of the array minus 1.
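As a hedged sketch of the take option, assuming the final compose is named Re Purposed Array as in the article:

take(outputs('Re_Purposed_Array'), sub(length(outputs('Re_Purposed_Array')), 1))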
Congratulations on a super well explained job! This is truly amazing. Thank you very much for sharing
Hello, I'm getting an error in FileContent and the following is the error message:
InvalidTemplate. Unable to process template language expressions in action 'FileContent' inputs at line '0' and column '0': 'The template language function 'base64tostring' expects its parameter to be a string. The provided value is of type 'Array'. Please see https://aka.ms/logicexpressions#base64ToString for usage details.'.
This is what I get for an Outlook email; when I tried SharePoint, I got an error saying 'The provided value is of type 'Object''.
I was wondering if anyone knows the solution to this.
Delimiter was ‘|’
FileContent:
For Email
replace(base64tostring(triggerOutputs()?['body/attachments']),'"','')
For SharePoint
replace(base64tostring(outputs('Get_file_content')?['body']),'"','')
Hey Randy, you're trying to parse the array of attachments rather than the file content of an attachment, hence the error about an array. As there could be multiple attachments, you will need to obtain the first attachment object with first() or ?[0] after body/attachments. If you are triggering on email received, you need to enable 'include attachments' under the advanced options, or use the get attachment action.
Damian,
In your if statement, it looks like you are checking whether the index of each of the encoded line break characters is equal to -1, if I am reading that correctly.
I haven't seen it done like that before. Is that a way of checking whether it exists at all in the given string?
Would, for example, something like greater than or equal to 0 work the same way? Much appreciated, thank you.
Hi, the function returns -1 if the string is not found. See here https://docs.microsoft.com/en-us/azure/logic-apps/workflow-definition-language-functions-reference#indexof
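For example, indexOf('a;b;c', ',') returns -1 because the comma is not found, whereas indexOf('a,b,c', ',') returns 1. So a presence check written as greaterOrEquals(indexOf(...), 0) would indeed work the same way; comparing against -1 is simply the inverse test.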
🙂 Eventually I got there 🙂 Fantastic! Thank you for your response and of course the clever way of doing this! Really useful! All the best!
Thanks for your excellent solution.
Last but not least, thank you for enlightening the audience about the byte order mark issue.
I have been extremely frustrated by this issue but didn't know there was a term for the underlying cause.
The weird symbols were messing up the file header and subsequent parsing.
Could you be persuaded to do a short video/article on how to go about replacing 'EF BB BF', as you mention at the end of your article?
I don't know exactly where to apply the replace() function you suggest using.
Thank you for sharing! I am trying to create a SharePoint list item for each row in the CSV file and I’m having a little trouble.
How can I have a SP list item created for each CSV row? Is there a way to do this using the ‘Re Purposed Array’ output?
Thanks for all your help!
Hi there, absolutely. The new array can be passed to an apply to each, followed by a create item action; then for each field, create an expression item()?['nameoffieldinarray'].
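A minimal sketch, assuming hypothetical list columns Title and Email that match keys in your array:

Apply to each: outputs('Re_Purposed_Array')
Create item, Title field: item()?['first_name']
Create item, Email field: item()?['email']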
Thank you!! This worked beautifully.
Hello again. I have an additional question. Is there a way to account for cell data that contains a comma? Can we use an expression to replace them so that they don't create new array entries?
I guess it's quote encapsulated? Two options: replace all instances of "," with another delimiter like |, then split on |. Otherwise look at Office Scripts, which can use regex.
https://youtu.be/9J6ThPWGaG0
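As a hedged sketch of the first option, assuming every field in the file is quote encapsulated (so the "," sequence only ever appears between fields): keep the quotes in the FileContent compose (i.e. drop the replace there) and split each row inside the apply to each with:

split(replace(replace(items('Apply_to_each'), '","', '|'), '"', ''), '|')

The inner replace swaps each between-field quote-comma-quote for a pipe, the outer replace strips the remaining leading and trailing quotes, and the split then uses the pipe. The Headers compose would need the same treatment.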
Thanks for the response!! Any chance you could show what those replace and split expressions look like and where in the flow to put them?
Thanks for all your help!!
Thank you for the detailed instructions on how you developed this flow. I was able to get it to work in my environment. There was one challenge that I could not overcome, but I am sure it's very doable. I have data inside double quotes and separated by commas, but it's not every data value. For example, I have:
reportId, companyName,amount
1234, "GE, Inc.", 2345
4321, "Ford, Inc.",3492
The company name in this example has a comma but it should not be considered as a separate value.
Can you help with this type of input string?
Power Automate on its own is not great at this. But if you are happy using Office Scripts, you might be able to tackle this with regex: CSV to Excel Power Automate and Office Scripts Any File Encoding – Free | Fast | Easy
https://youtu.be/9J6ThPWGaG0
@DamoBird365, thank you for your suggestion. Unfortunately, my workplace has not approved the use of Office Scripts. I have seen an article, but the instructions were not easy to follow and were hard for me to understand. That article's approach was to parse each data field to find the double quotes, replace the commas with ###, and then merge the data. Have you tried that approach?
BTW, here is the article: https://sharepains.com/2020/09/08/handle-commas-csv-files-power-automate/
I built the flow based on these instructions but it times out. I have about 80 columns to process. In the test csv file, I only had 20 rows.
My production csv will have over 3,000 rows, and it keeps growing.
You could try watching the following
https://youtu.be/Ci9EO7SRalY
DamoBird365, I have seen that video. In the video you are replacing all comma/quotes with a pipe. This works for you because you have this pattern for all columns. In my case I only have double quotes for data fields that have a comma as part of the data. All other fields don't have double quotes.
Here is one record:
3935,13781,"GE, Inc.",461,9/28/2022 23:44
Have you asked Peter 😉 I haven't got a solution for this one. Scripts would work with regex, but you've said that's not possible. Is your csv always in that format, i.e. one field in quotes? If you post a sample on the forum and tag me, I can have a look. I have an idea to hack it.
Magnificent write-up! Thank you for taking the time to explain each step! I have one question though, as this didn't work with my CSV file. That's because the CSV file I obtain from an application contains information in a column that has the same carriage returns as the end of line. It also has double quotes and commas in it. That means it looks similar to this:
"Person", "Date", "Amount", "Comments", "ID"
"Mike Leavy", "7/10/2023", "5,345","This is the first line in this field
This is the second line
and so forth", "788"
"Jen Carol", "5/3/2022", "124", "This is another field
where people hit enter and use commas and "quotation marks",
so the output of the csv in this field has carriage returns and commas", "789"
Now when you open the CSV in a text editor, it appears exactly like this:
"Person", "Date", "Amount", "Comments", "ID"\n
"Mike Leavy", "7/10/2023", "5,345","This is the first line in this field\nThis is the second line\nand so forth","788"\n
"Jen Carol", "5/3/2022", "124", "This is another field\nwhere people hit enter and use commas and ""quotation marks"",\nso the output of the csv in this field has carriage returns and commas","789"\n
So what's happening is that everywhere a quote appears in that field, the CSV doubles the quotes, and commas appear inline exactly as typed, because each field is surrounded by double quotes. However, when I search and split the entries on "\n" or "\r\n" or any type of carriage return within Power Automate, it splits the line right where the first \n appears in the "Comments" field, breaking the rest of the import and producing an error. I've opened a hex editor and the carriage return at the end of the line (0a) is the same as the carriage return used in the field with carriage returns. Just wondering if you have any suggestions for this type of complex CSV file that a very popular application generates but does not allow me to modify? Thank you!!
You could try Office Scripts? They allow regex, which might better suit your requirements: CSV to Excel Power Automate and Office Scripts Any File Encoding – Free | Fast | Easy
https://youtu.be/9J6ThPWGaG0
Hello, I want to know the limitations of the 'EachObject' action (the Select action in the portal), as I can see it is creating a payload for up to 20 array elements.