Recently I've came across an interesting case of names transformation to so called "slug" format:
First, quite readable solution:Step 1: covert to lowercase Step 2: replace any alpheNumeric to hypen(-). Start to End of the string: Don't consider unicodes Step 3: Remove any adjacent hypen(-) Step 4: Remove trailing or leading hyphens
public static final String CHARS_REG = "[^a-zA-Z0-9]"; public static final String DOUBLE_REG = "[-]+"; public static final String TRIM_REG = "-$|^-"; public static final String DASH = "-"; static final Pattern CHARS = Pattern.compile(CHARS_REG); static final Pattern DOUBLE = Pattern.compile(DOUBLE_REG); static final Pattern TRIM = Pattern.compile(TRIM_REG); public static String makeSlugNamePatterny(String name) { String it = name.toLowerCase().trim(); it = CHARS.matcher(it).replaceAll(DASH); it = DOUBLE.matcher(it).replaceAll(DASH); it = TRIM.matcher(it).replaceAll(""); return it; }Unfortunately, out-of-the-box java regexp support can not trim and lowercase, so we see 5 operations here. It was curious for me to implement it regex-free and to compare performance:
public static String makeSlugNameOptimized(String name) { String result = name + " "; StringBuilder temp = new StringBuilder(); char ch1 = 0; char ch2 = toLowCaseAlfanumeric(result.charAt(0)); for (int i = 0; i < result.length() - 1; i++) { ch1 = ch2; ch2 = toLowCaseAlfanumeric(result.charAt(i + 1)); if (ch2 != ch1) { temp.append(ch1); } else if ('-' != ch1) { temp.append(ch1); } } while ('-' == temp.charAt(0)) { temp.deleteCharAt(0); } while ('-' == temp.charAt(temp.length() - 1)) { temp.deleteCharAt(temp.length() - 1); } return temp.toString(); } private static final int Aa = 'a' - 'A'; private static char toLowCaseAlfanumeric(char c) { if (c >= 'a' && c <= 'z') { return c; } if (c >= 'A' && c <= 'Z') { return (char) (c + Aa); } if (c >= '0' && c <= '9') { return c; } return '-'; }Note that optimized case doesn't trim or lowercase separately - it's a part of the loop.
On 500 random strings 500 times gave following performance: patterny-2059 milliseconds, optimised - 277.
Conclusion
10x time improvements - it's the cost of regexps in my case. Readability can be improved by extracting some methods, so it should not be a concern here. Generally speaking, either regexp is simple - in this case you can work around like I did, or it's complex - in this case it's both slow and unreadable.
Basically, as classic(https://xkcd.com/1171/) says:
No comments:
Post a Comment