Encoding

Post by Lembasts »

Product, version and build: 246
Operating system: Windows 10
PowerShell version(s): 5.1

Just came across this:
https://dsccommunity.org/styleguidelines/general/
It says don't use UTF-8 with BOM. Is there anything specific about editing in PowerShell Studio that makes this requirement unnecessary or bad?
And if I edit a file and change the encoding from UTF-8 with BOM to plain UTF-8, will that cause any issues?
Thanks
David
by Alexander Riedel » Mon Jul 15, 2024 5:29 pm
Unfortunately, the guideline does not state any reason for that recommendation. It dates from 2019, so it is five years old. I somewhat object to the notion that one encoding is more 'correct' than another. Who's the judge? There are certainly cases where one or the other may not work, but none are listed there. As far as PowerShell Studio is concerned, it should make no difference.

Here are a few pointers about encoding (also for anyone else stumbling across this). These are my opinions based on experience. Use what you need for your case.

UTF-8 with BOM versus UTF-8 without BOM: The BOM (Byte Order Mark) makes it easy for any application reading or rewriting the file to determine which encoding you want.
If you use different applications to edit, run or modify such a file, their defaults should not matter; each application is supposed to respect the encoding designated by the BOM.
If you use UTF-8, I always recommend the version with BOM. I have rarely found a case where this became an issue. Your mileage may of course vary.
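
For anyone curious what "respecting the BOM" amounts to, here is a minimal Python sketch. It is only an illustration, not how PowerShell Studio or any particular editor does it; the function name `detect_bom` is made up for this example, and UTF-32 signatures are left out to keep it short.

```python
import codecs

# BOM signatures and the encoding each one designates.
# (UTF-32 is deliberately omitted in this sketch.)
BOM_SIGNATURES = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, encoding in BOM_SIGNATURES:
        if data.startswith(bom):
            return encoding
    return None


with_bom = codecs.BOM_UTF8 + "Write-Host 'Hello'".encode("utf-8")
without_bom = "Write-Host 'Hello'".encode("utf-8")

print(detect_bom(with_bom))     # the BOM settles the question
print(detect_bom(without_bom))  # None: the reader has to guess
```

With a BOM the answer comes from the first two or three bytes; without one the function has nothing to go on.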

Without a BOM, it becomes guesswork. A standard English-language text file without any special characters is technically both a UTF-8 file and a Windows-1252 file at the same time. Upon loading a file, an editor (or any application) must examine the bytes in that file until it finds a sequence that (hopefully) identifies the encoding. If there is none, well, then it becomes a matter of opinion, so to speak. As you may imagine, the detection process can be flawed in some circumstances. Who's to know what you want it to be?
You will need to make sure that any application you touch this file with is set to use UTF-8 by default. And don't let anyone else touch it.
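
The guesswork can be shown in a few lines of Python. This is just a sketch with made-up sample strings: pure ASCII text produces identical bytes in UTF-8 and Windows-1252, and a wrong guess on text with an accented character fails without any warning.

```python
# Pure ASCII text: the bytes are identical in both encodings,
# so nothing in the file tells a reader which one was intended.
ascii_text = "Write-Host 'Hello'"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("cp1252")
print(same_bytes)

# One accented character is enough to make the two encodings diverge.
accented = "caf\u00e9"  # "café"
utf8_bytes = accented.encode("utf-8")

# Reading UTF-8 bytes with a Windows-1252 guess does not fail;
# it silently produces garbled text.
misread = utf8_bytes.decode("cp1252")
print(misread)

# The opposite mistake is at least noticed: Windows-1252 bytes
# are not valid UTF-8 here, so decoding raises an error.
try:
    accented.encode("cp1252").decode("utf-8")
    reverse_failed = False
except UnicodeDecodeError:
    reverse_failed = True
print(reverse_failed)
```

The silent case is the dangerous one, because the file loads fine and only the special characters come out wrong.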

UTF-16 LE, also referred to as 'Unicode Little Endian': Each character takes at least 16 bits, whether it needs encoding or not (characters outside the Basic Multilingual Plane, such as most emoji, take two 16-bit units). It has a BOM by default (there is no option to skip it) and I have generally found it to work with everything. If you routinely mix different languages and symbols in your files, I always recommend using this. It is the default encoding I use for ALL files I create, and I have not run into problems (so far). It does make a code file twice as large as its Windows-1252 counterpart, but who cares? Code files are not that large and disk space is usually not a problem anymore. Also of note: when you package your files for PowerShell 7 using our Script Packager, everything is always converted to UTF-16 LE by default.
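
A quick way to check the size claim yourself, sketched in Python with a made-up sample line. This says nothing about how the Script Packager does its conversion; it only compares byte counts.

```python
import codecs

line = "Write-Host 'Hello'"

cp1252_size = len(line.encode("cp1252"))    # one byte per character
utf16_size = len(line.encode("utf-16-le"))  # two bytes per character here

print(cp1252_size, utf16_size)

# The "utf-16-le" codec does not write a BOM by itself,
# so a file on disk would carry two extra bytes (FF FE) up front.
file_bytes = codecs.BOM_UTF16_LE + line.encode("utf-16-le")
print(len(file_bytes) - utf16_size)

# Characters outside the Basic Multilingual Plane need two 16-bit units.
emoji_size = len("\U0001F600".encode("utf-16-le"))
print(emoji_size)
```

For ordinary script text the file is simply twice the size plus the two-byte BOM.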

UTF-16 BE (Big Endian) is a remnant from the time when Apple used Motorola processors, which stored the two 8-bit bytes of a 16-bit value in the opposite order from Intel processors.
I will not bore you with listing ye olde mainframe systems also using this byte order :D
If you want to geek out on this, please see here: https://en.wikipedia.org/wiki/Endianness
It is mostly unused and should be avoided nowadays. Of note, however, is that the first version of the PowerShell ISE encoded new files as UTF-16 BE by default, which caused a lot of grief because not every editor supported it at the time. So if you download some really old PowerShell samples, you might encounter it.
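
To make the byte order difference concrete, here is a small Python sketch. The sample text is made up; the point is only that the same character is stored with its two bytes swapped, and that the BOM tells a reader which order was used.

```python
import codecs

# The same character, stored in both byte orders.
little = "A".encode("utf-16-le")  # low byte first
big = "A".encode("utf-16-be")     # high byte first
print(little, big)

# The BOM is the same code point written in each byte order,
# which is how a reader can tell the two apart.
print(codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# An old big-endian sample file still decodes correctly when the
# reader honors the BOM: Python's generic "utf-16" codec reads the
# BOM, picks the byte order, and strips the BOM from the result.
old_sample = codecs.BOM_UTF16_BE + "Get-Date".encode("utf-16-be")
decoded = old_sample.decode("utf-16")
print(decoded)
```

An editor that ignored the BOM and assumed little endian would see every byte pair swapped, which is the kind of grief those old ISE files caused.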

Hope this helps.
Alexander Riedel
SAPIEN Technologies, Inc.