utf-8 - PHP: How to remove all non-printable characters in a string?


I imagine I need to remove characters 0-31 and 127.

Is there a function or piece of code to do this efficiently?

7 bit ASCII?

If your Tardis just landed in 1963, and you want only the 7-bit printable ASCII characters, you can rip out everything from 0-31 and 127-255 like this:

  $string = preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $string);

This matches any byte in the ranges 0-31 and 127-255 and removes it.
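As a minimal sketch of the 7-bit case, the pattern can be wrapped in a small helper. The function name `strip_to_printable_ascii` and the sample input are assumptions made for illustration. Note that every byte of a multi-byte UTF-8 character falls in 128-255, so such characters are dropped entirely.

```php
<?php
// Hypothetical helper name; the pattern itself is the one shown above.
function strip_to_printable_ascii(string $s): string
{
    // Remove bytes 0x00-0x1F (control characters), 0x7F (DEL)
    // and 0x80-0xFF (everything outside 7-bit ASCII).
    return preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $s);
}

// Sample input: tab, NUL, DEL, a UTF-8 encoded "é" (0xC3 0xA9) and a newline.
$input = "Hello\tWorld\x00!\x7F caf\xC3\xA9\n";
$clean = strip_to_printable_ascii($input);

echo $clean, PHP_EOL; // prints "HelloWorld! caf"
```

The tab and newline are removed as well, because they sit in the 0-31 range; exclude `\x09`, `\x0A` and `\x0D` from the class if you need to keep whitespace.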

8 bit Extended ASCII?

You fell into a Hot Tub Time Machine, and you are back in the eighties. If you have some form of 8-bit ASCII, then you might want to keep the characters in the range 128-255. An easy adjustment: just look for 0-31 and 127:

  $string = preg_replace('/[\x00-\x1F\x7F]/', '', $string);
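A short sketch of the 8-bit case follows. It assumes the input is ISO-8859-1, where the single byte 0xE9 is "é"; the helper name `strip_control_bytes` is hypothetical. The output is shown as hex so the surviving high byte is visible regardless of terminal encoding.

```php
<?php
// Hypothetical helper name; removes only 0x00-0x1F and 0x7F, keeps 0x80-0xFF.
function strip_control_bytes(string $s): string
{
    return preg_replace('/[\x00-\x1F\x7F]/', '', $s);
}

// "caf\xE9" is "café" in ISO-8859-1 (assumed encoding for this example),
// followed by CR, LF and BEL control characters.
$latin1 = "caf\xE9\r\n\x07";
$clean  = strip_control_bytes($latin1);

echo bin2hex($clean), PHP_EOL; // prints "636166e9"
```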

UTF-8?

Ah, welcome back to the 21st century. If you have a UTF-8 encoded string, then the /u modifier can be used on the regex:

  $string = preg_replace('/[\x00-\x1F\x7F]/u', '', $string);

This removes only 0-31 and 127. It works in both ASCII and UTF-8 because the two share the same control-character range (as mentioned by mgutt below). Strictly speaking, it would work without the /u modifier, but the modifier makes life easier if you want to remove other characters...
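One practical consequence of /u is worth a sketch: with the modifier set, PCRE validates the subject as UTF-8, and `preg_replace` returns `null` when the input is not valid UTF-8. The helper name `strip_controls_utf8` and the sample strings below are assumptions for illustration.

```php
<?php
// Hypothetical helper name. Returns null when $s is not valid UTF-8,
// because the /u modifier makes PCRE validate the subject string.
function strip_controls_utf8(string $s): ?string
{
    return preg_replace('/[\x00-\x1F\x7F]/u', '', $s);
}

// Valid UTF-8: multi-byte characters survive, NUL and newline are removed.
$valid      = "na\u{00EF}ve\x00 \u{65E5}\u{672C}\n";
$cleanValid = strip_controls_utf8($valid);

// Invalid UTF-8: 0xFF can never appear in a well-formed UTF-8 string.
$invalid      = "bad\xFFbyte";
$cleanInvalid = strip_controls_utf8($invalid);
$error        = preg_last_error(); // error code of the call just above

echo $cleanValid, PHP_EOL;
var_dump($cleanInvalid === null, $error === PREG_BAD_UTF8_ERROR);
```

Checking for `null` before using the result avoids silently replacing a whole string with nothing when the input encoding is not what you expected.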

If you are working with Unicode, there are potentially many non-printing characters, but let's consider a simple one: NO-BREAK SPACE (U+00A0).

In a UTF-8 string, it is encoded as 0xC2A0. You could search for and remove that specific byte sequence, but with the /u modifier in place, you can simply add \xA0 to the character class:

  $string = preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', $string);
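To illustrate why /u matters once \xA0 is in the class, here is a small comparison sketch. The sample text is an assumption: it contains "à", whose UTF-8 encoding is 0xC3 0xA0, so its second byte happens to be 0xA0.

```php
<?php
// "à la carte" + NO-BREAK SPACE + "menu", written with code point escapes.
$text = "\u{00E0} la carte\u{00A0}menu";

// With /u the class matches whole code points, so only U+00A0 is removed.
$withU = preg_replace('/[\x00-\x1F\x7F\xA0]/u', '', $text);

// Without /u the class matches single bytes, so the 0xA0 byte inside "à"
// is removed as well, leaving a stray 0xC3 (and a stray 0xC2 from the NBSP).
$withoutU = preg_replace('/[\x00-\x1F\x7F\xA0]/', '', $text);

// An empty pattern with /u is a cheap validity check:
// preg_match returns 1 for valid UTF-8 and false for invalid input.
$withUIsValid    = preg_match('//u', $withU) === 1;
$withoutUIsValid = preg_match('//u', $withoutU) === 1;

echo $withU, PHP_EOL; // prints "à la cartemenu"
var_dump($withUIsValid, $withoutUIsValid);
```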

Appendix: What about str_replace?

preg_replace is pretty efficient, but if you are doing this operation a lot, you could build an array of the characters you want to remove and use str_replace, as described by mgutt below, e.g.

  // Build an array we can reuse across several operations
  $badchar = array(
      // control characters
      chr(0), chr(1), chr(2), chr(3), chr(4), chr(5), chr(6), chr(7), chr(8), chr(9), chr(10),
      chr(11), chr(12), chr(13), chr(14), chr(15), chr(16), chr(17), chr(18), chr(19), chr(20),
      chr(21), chr(22), chr(23), chr(24), chr(25), chr(26), chr(27), chr(28), chr(29), chr(30),
      chr(31),
      // non-printing characters
      chr(127)
  );

  // Replace the unwanted characters
  $str2 = str_replace($badchar, '', $str);

Intuitively, this seems like it would be faster, but that is not always the case, so you should definitely benchmark to see whether it saves you anything. I ran some benchmarks across a variety of string lengths with random data, and this pattern emerged using PHP 7.0.12:

      2 chars  str_replace     5.3439ms  preg_replace     2.9919ms  preg_replace is 44.01% faster
      4 chars  str_replace     6.0701ms  preg_replace     1.4119ms  preg_replace is 76.74% faster
      8 chars  str_replace     5.8119ms  preg_replace     2.0721ms  preg_replace is 64.35% faster
     16 chars  str_replace     6.0401ms  preg_replace     2.1980ms  preg_replace is 63.61% faster
     32 chars  str_replace     6.0320ms  preg_replace     2.6770ms  preg_replace is 55.62% faster
     64 chars  str_replace     7.4198ms  preg_replace     4.4160ms  preg_replace is 40.48% faster
    128 chars  str_replace    12.7239ms  preg_replace     7.5412ms  preg_replace is 40.73% faster
    256 chars  str_replace    19.8820ms  preg_replace    17.1330ms  preg_replace is 13.83% faster
    512 chars  str_replace    34.3399ms  preg_replace    34.0221ms  preg_replace is  0.93% faster
   1024 chars  str_replace    57.1141ms  preg_replace    67.0300ms  str_replace  is 14.79% faster
   2048 chars  str_replace    94.7111ms  preg_replace   123.3189ms  str_replace  is 23.20% faster
   4096 chars  str_replace   227.7029ms  preg_replace     ~258.4ms  str_replace  is 11.87% faster
   8192 chars  str_replace   506.3410ms  preg_replace   555.6269ms  str_replace  is  8.87% faster
  16384 chars  str_replace  1116.8811ms  preg_replace  1098.0589ms  preg_replace is  1.69% faster
  32768 chars  str_replace  2299.3128ms  preg_replace  2222.8632ms  preg_replace is  3.32% faster

The timings themselves are for 10000 iterations, but what is more interesting is the relative differences. Up to 512 characters, I was seeing preg_replace always win. In the 1-8 KB range, str_replace had a marginal edge.

I thought it was an interesting result, so I am including it here. The important thing is not to take this result and use it to decide which method to use, but to benchmark against your own data and then decide.
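If you want to run that kind of comparison yourself, a rough harness might look like the sketch below. It is not the script behind the figures above: the helper `time_ms`, the string lengths, the iteration count and the use of `random_bytes` for test data are all assumptions, and you should substitute strings that resemble your real input.

```php
<?php
// Characters to strip: 0-31 and 127, the same set as the array above.
$badchar = array_map('chr', array_merge(range(0, 31), [127]));

// Hypothetical helper: run $fn $iterations times, return elapsed milliseconds.
function time_ms(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) * 1000;
}

$iterations = 1000; // assumed value, chosen to keep the run short
$results    = [];

foreach ([16, 512, 4096] as $length) {
    $data = random_bytes($length); // random test data (PHP 7+)

    // Sanity check: both approaches should produce identical output.
    $same = str_replace($badchar, '', $data)
        === preg_replace('/[\x00-\x1F\x7F]/', '', $data);

    $strMs = time_ms(function () use ($badchar, $data) {
        str_replace($badchar, '', $data);
    }, $iterations);

    $pregMs = time_ms(function () use ($data) {
        preg_replace('/[\x00-\x1F\x7F]/', '', $data);
    }, $iterations);

    $results[$length] = ['same' => $same, 'str_replace' => $strMs, 'preg_replace' => $pregMs];
    printf("%6d chars  str_replace %10.4fms  preg_replace %10.4fms\n", $length, $strMs, $pregMs);
}
```

Absolute numbers will vary with PHP version, hardware and input, so only the relative difference on your own data is meaningful.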

