Remove all space between Chinese words in Regex
I would like to remove all spaces between Chinese characters only, leaving the rest of the text untouched.
MY TEXT: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"
IDEAL OUTPUT: "請把這裡的 10 多個字合併. Can you help me?"
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace("/ /", "");
I have studied a related question, but its solution does not seem to work in my situation, so I am asking here for help. Below is the question that I think is similar to mine.
Reference
javascript regex
1
Does your spaces actually are or you just used it guessing?
– Justinas
1 hour ago
.replace(/ /g,'')
– Nitesh Virani
1 hour ago
Using the latest ECMAScript 2018 regex syntax you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
– Wiktor Stribiżew
1 hour ago
Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.
– Wiktor Stribiżew
44 mins ago
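For context on why the attempt in the question has no effect: when the first argument of replace is a string, it is matched literally, so "/ /" searches for a slash, a space and a slash. A regex literal with the g flag, as suggested in the comments, does remove spaces, but all of them. A minimal sketch (the variable names are only for illustration):

```javascript
const input = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A string pattern is matched literally; the text contains no "/ /", so nothing changes.
const literalAttempt = input.replace("/ /", "");

// A regex literal with the g flag removes every space, including the English ones.
const allSpacesRemoved = input.replace(/ /g, '');

console.log(literalAttempt === input); // true
console.log(allSpacesRemoved);         // 請把這裡的10多個字合併.Canyouhelpme?
```

Neither variant gives the ideal output, which is why the pattern has to look at the characters on both sides of the space.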
Question asked 1 hour ago by Needa Hell; edited 1 hour ago by Gurman
5 Answers
Using @Brett Zamir's solution for matching Chinese characters in a regex:
Javascript unicode string, chinese character but no punctuation
const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');
const ret = str.replace(regex, '$1$2');
console.log(ret);
It looks like:
([blabla chinese chars]) ([blabla chinese chars])*
1
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
1 hour ago
You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p
– jonatjano
1 hour ago
I've edited my post to match your desire
– Grégory NEUT
1 hour ago
I'd use \s+ instead of ' '
– HerrSerker
1 hour ago
What about e.g. 請 的 10 多 個 a
– bobble bubble
31 mins ago
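The edge case raised in the last comment can be checked with a short sketch. It compares a consuming pattern in the style of the answer with a lookahead variant. As a simplifying assumption, only the basic CJK Unified Ideographs block (\u4E00-\u9FFF) is treated as Chinese here, and the names HAN, consuming and lookahead are hypothetical:

```javascript
// Simplified assumption: only the basic CJK Unified Ideographs block counts as Chinese.
const HAN = '[\\u4E00-\\u9FFF]';

// Consuming pattern in the style of the answer: char, space, optional char, space.
const consuming = new RegExp('(' + HAN + ') (' + HAN + ')* ', 'g');

// Lookahead variant: the following char is checked but not consumed.
const lookahead = new RegExp('(' + HAN + ')\\s+(?=' + HAN + ')', 'g');

const sample = '請 的 10 多 個 a';
const consumed = sample.replace(consuming, '$1$2');
const kept = sample.replace(lookahead, '$1');

console.log(consumed); // 請的10 多個a
console.log(kept);     // 請的 10 多個 a
```

Because the consuming pattern also swallows the space after the second character, the spaces before "10" and "a" are lost; the lookahead leaves them in place.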
The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use this regex, which matches one or more Chinese characters followed by whitespace and uses the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) to ensure that a Chinese character follows:
([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)
And replace it with $1.
Demo
var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
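One practical consequence of matching \s+ rather than a literal space: in JavaScript, \s also covers tabs and the ideographic space U+3000 that is common in CJK text. The sketch below reuses the range from this answer; joinHan is a hypothetical helper name, and astral (non-BMP) Han characters are not covered by this range.

```javascript
// Range taken from the answer above; astral (non-BMP) Han characters are not covered.
const hanRun = '[\\u2E80-\\u2FD5\\u3190-\\u319f\\u3400-\\u4DBF\\u4E00-\\u9FCC]+';
const betweenHan = new RegExp('(' + hanRun + ')\\s+(?=' + hanRun + ')', 'g');

// Hypothetical helper name, for illustration only.
function joinHan(text) {
  return text.replace(betweenHan, '$1');
}

// Mixed separators: ASCII space, ideographic space (U+3000) and tab.
console.log(joinHan('請 把\u3000這\t裡 的 10 多 個')); // 請把這裡的 10 多個
```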
Getting to the Chinese char matching pattern
Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into
[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u
Transpiling it to ES5 using the ES2015 Unicode regular expression transpiler, we get the
(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])
pattern to match any Chinese char using JS RegExp.
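The reason the transpiled pattern needs the extra surrogate alternatives can be seen with a single astral character. The sketch below uses U+20000 (the first code point of CJK Unified Ideographs Extension B) and three deliberately shortened patterns; they are reduced for illustration and are not the full classes listed above.

```javascript
// U+20000 is stored as two UTF-16 code units (a surrogate pair: D840 DC00).
const astral = '\u{20000}';

console.log(astral.length);                      // 2
console.log(astral.codePointAt(0).toString(16)); // 20000

// Shortened patterns, for illustration only.
const bmpOnly = /^[\u4E00-\u9FEF]$/;                                         // BMP range only
const withUFlag = /^[\u4E00-\u9FEF\u{20000}-\u{2A6D6}]$/u;                  // code point aware
const surrogates = /^(?:[\u4E00-\u9FEF]|[\uD840-\uD868][\uDC00-\uDFFF])$/;  // ES5 style

console.log(bmpOnly.test(astral), withUFlag.test(astral), surrogates.test(astral)); // false true true
```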
So, you may use
s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant, you may use a shorter
s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char
\s+ - any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
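Since the shorter pattern only compiles in engines with Unicode property escapes, one possible way to combine both variants is to try the ES2018 form and fall back when it throws. This is only a sketch: buildHanSpaceRegex is a hypothetical helper, Han is the long name of the script that the answer writes with the alias Hani, and the fallback range is a reduced BMP-only assumption rather than the full class above.

```javascript
// Hypothetical helper: builds the "whitespace between Han chars" regex for the current engine.
function buildHanSpaceRegex() {
  try {
    // Throws a SyntaxError in engines without Unicode property escapes.
    return new RegExp('(\\p{Script=Han})\\s+(?=\\p{Script=Han})', 'gu');
  } catch (e) {
    // Fallback assumption: BMP ideographs only (astral Han characters are not matched).
    return new RegExp('([\\u3400-\\u4DBF\\u4E00-\\u9FFF])\\s+(?=[\\u3400-\\u4DBF\\u4E00-\\u9FFF])', 'g');
  }
}

const hanSpace = buildHanSpaceRegex();
const result = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'.replace(hanSpace, '$1');
console.log(result); // 請把這裡的 10 多個字合併. Can you help me?
```

Building the pattern with the RegExp constructor matters here: a regex literal using \p{...} would fail at parse time in an older engine, before the try block could catch anything.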
FYI: if only one whitespace is expected between Chinese chars, remove + after \s.
– Wiktor Stribiżew
1 hour ago
Try this
str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
I took the range \u4E00-\u9FCC from here; it contains about 20,000 characters (enough for daily usage).
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
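To see how the two alternatives in this pattern cooperate, the replacement string can be swapped for a callback that records which group matched. This is an illustrative sketch only; the log array and the parameter names han and digits are not part of the answer.

```javascript
const text = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const pattern = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

const log = [];
const out = text.replace(pattern, (match, han, digits) => {
  log.push(han !== undefined ? 'han:' + han : 'digits:' + digits);
  // Same effect as '$1$2': an unmatched group contributes an empty string.
  return (han || '') + (digits || '');
});

console.log(out); // 請把這裡的 10 多個字合併. Can you help me?
console.log(log);
```

The second alternative consumes "10 " together with its trailing space and writes it back unchanged, which is what protects the space before 多. Note that the first alternative does not check what precedes the space, so in this sketch a Chinese character after an English word (for example 'me? 請') would also be joined.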
3
The space in front of the 10 is missing.
– holydragon
1 hour ago
@holydragon it's fixed now
– Kamil Kiełczewski
1 hour ago
var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';
var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];
// Returns true only if every code point of str falls inside one of the ranges above.
var isChinese = function (str) {
  // Guard: an undefined token (past the end of the array) or an empty one is not Chinese.
  if (!str) {
    return false;
  }
  var charCode;
  var flag;
  var range;
  for (var i = 0; i < str.length;) {
    charCode = str.codePointAt(i);
    flag = false;
    for (var j = 0; j < chineseRange.length; j++) {
      range = chineseRange[j];
      if (charCode >= range[0] && charCode <= range[1]) {
        flag = true;
        break;
      }
    }
    if (!flag) {
      return false;
    }
    // Astral code points occupy two UTF-16 code units.
    if (charCode <= 0xffff) {
      i++;
    } else {
      i += 2;
    }
  }
  return true;
};
// For more information about chinese.js, visit this demo on GitHub.
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js
// I wrote this loop to remove the spaces between Chinese words.
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
  if (isChinese(spl[i])) {
    // Keep a space only when the next token is not Chinese.
    if (!isChinese(spl[i + 1])) {
      text += spl[i] + ' ';
    } else {
      text += spl[i];
    }
  } else {
    text += spl[i] + ' ';
  }
}
console.log(text);
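The code point walk in isChinese can also be written with for...of, which iterates a string by code point, so no manual index arithmetic is needed for surrogate pairs. A compact sketch under stated assumptions: HAN_RANGES is a reduced range list chosen for illustration (the answer's full list could be substituted), and isHanCodePoint and isAllHan are hypothetical names.

```javascript
// Assumption: a reduced range list for illustration; the answer's full list could be used instead.
const HAN_RANGES = [[0x3400, 0x4dbf], [0x4e00, 0x9fff], [0x20000, 0x2a6df]];

function isHanCodePoint(cp) {
  return HAN_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// for...of iterates by code point, so a surrogate pair arrives as one string.
function isAllHan(token) {
  if (token.length === 0) return false;
  for (const ch of token) {
    if (!isHanCodePoint(ch.codePointAt(0))) return false;
  }
  return true;
}

console.log(isAllHan('請把'), isAllHan('\u{20000}'), isAllHan('10'), isAllHan('')); // true true false false
```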
The first answer (the pattern credited to @Brett Zamir) was answered 1 hour ago by Grégory NEUT (edited 1 hour ago).
The second answer (the lookahead pattern over the range \u2E80-\u9FCC) was answered 1 hour ago by Pushpesh Kumar Rajwanshi (edited 59 mins ago).
add a comment |
add a comment |
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant you may use a shorter
s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN)- Capturing group 1 ($1in the replacement pattern): any Chinese char
s+- any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN)- there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));
FYI: if only one whitespace is expected between Chinese chars, remove+afters.
– Wiktor Stribiżew
1 hour ago
add a comment |
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant you may use a shorter
s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN)- Capturing group 1 ($1in the replacement pattern): any Chinese char
s+- any 1+ whitespaces (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN)- there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));
FYI: if only one whitespace is expected between Chinese chars, remove+afters.
– Wiktor Stribiżew
1 hour ago
add a comment |
Getting to the Chinese char matching pattern
Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into
[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u
Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get
(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])
pattern to match any Chinese char using JS RegExp.
So, you may use
s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant, you may use the shorter
s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
Pattern details
(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char
\s+ - 1 or more whitespace chars (any Unicode whitespace)
(?=CHINESE_CHAR_PATTERN) - a positive lookahead: there must be a Chinese char immediately to the right of the current location.
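Why the right-hand side is a lookahead rather than a second capturing group can be shown with a small comparison. This is only an illustrative sketch: the short sample string and the variable names are assumptions, and it relies on ECMAScript 2018 property escapes.

```javascript
const text = '請 把 這 裡';

// Consuming variant: the second Han character becomes part of the match,
// so it cannot serve as the left-hand character of the next match.
const consuming = text.replace(/(\p{Script=Hani})\s+(\p{Script=Hani})/gu, '$1$2');

// Lookahead variant: the right-hand character is only checked, not consumed,
// so every gap between two Han characters is visited.
const lookahead = text.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');

console.log(consuming); // 請把 這裡
console.log(lookahead); // 請把這裡
```

With the consuming variant every second gap survives, which is why the pattern only asserts the following character.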
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
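If the same code has to run on both older and ECMAScript 2018 engines, one option is to pick the pattern at run time. The sketch below is an assumption-laden illustration, not part of the answer: the helper names buildHanSpaceRegex and removeHanSpaces are hypothetical, and the fallback character class is deliberately simplified to a few BMP blocks instead of the full transpiled pattern above.

```javascript
// Hypothetical helper: returns a regex that matches whitespace between two Han characters.
function buildHanSpaceRegex() {
  try {
    // Throws a SyntaxError on engines without the u flag or Unicode property escapes.
    return new RegExp('(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})', 'gu');
  } catch (e) {
    // Simplified fallback (assumption): common BMP ideograph blocks only.
    var han = '[\\u3400-\\u4DBF\\u4E00-\\u9FFF\\uF900-\\uFAFF]';
    return new RegExp('(' + han + ')\\s+(?=' + han + ')', 'g');
  }
}

// Hypothetical helper: removes whitespace that sits between two Han characters.
function removeHanSpaces(input) {
  return input.replace(buildHanSpaceRegex(), '$1');
}

console.log(removeHanSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
```

Note the doubled backslashes: inside a string passed to the RegExp constructor, a single \s or \p would lose its backslash before the regex engine sees it.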
edited 35 mins ago
answered 1 hour ago
Wiktor Stribiżew
Try this
str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
I took the range \u4E00-\u9FCC from here; it contains ~20,000 chars (enough for daily usage).
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
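The pattern above removes any space that precedes a character in the range, whatever stands before that space, and protects only digits. A possible variation, sketched here under the assumption that the engine supports ECMAScript 2018 lookbehind, checks both neighbours instead. The function names and the second test string are made up for illustration.

```javascript
// The pattern from this answer: drops a space before a Han character,
// and re-emits "digits + space" unchanged so that space is skipped.
function spaceBeforeHan(s) {
  return s.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
}

// Variation (assumption): drop spaces only when both neighbours are in the range.
function spaceBetweenHan(s) {
  return s.replace(/(?<=[\u4E00-\u9FCC]) +(?=[\u4E00-\u9FCC])/g, '');
}

console.log(spaceBeforeHan('Hello 世 界'));  // Hello世界
console.log(spaceBetweenHan('Hello 世 界')); // Hello 世界
console.log(spaceBetweenHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
```

Both functions cover only the \u4E00-\u9FCC block, so characters outside it are treated as non-Chinese.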
The space in front of the 10 is missing.
– holydragon
1 hour ago
@holydragon it's fixed now
– Kamil Kiełczewski
1 hour ago
edited 52 mins ago
answered 1 hour ago
Kamil Kiełczewski
var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';
// Code point ranges treated as Chinese
var chineseRange = [[0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0x2a700, 0x2b73f], [0x2b740, 0x2b81f], [0x2b820, 0x2ceaf], [0xf900, 0xfaff], [0x3300, 0x33ff], [0xfe30, 0xfe4f], [0x2f800, 0x2fa1f]];
// Returns true only if every code point of str falls into one of the ranges
var isChinese = function (str) {
    var charCode;
    var flag;
    var range;
    for (var i = 0; i < str.length;) {
        charCode = str.codePointAt(i);
        flag = false;
        for (var j = 0; j < chineseRange.length; j++) {
            range = chineseRange[j];
            if (charCode >= range[0] && charCode <= range[1]) {
                flag = true;
                break;
            }
        }
        if (!flag) {
            return false;
        }
        // Code points above 0xFFFF occupy two UTF-16 code units
        if (charCode <= 0xffff) {
            i++;
        } else {
            i += 2;
        }
    }
    return true;
};
// For more information about chinese.js, visit this demo on GitHub
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js
// I wrote this loop to remove the spaces between Chinese words
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
    // Drop the separator only when this token and the next one are both Chinese
    if (isChinese(spl[i]) && i + 1 < spl.length && isChinese(spl[i + 1])) {
        text += spl[i];
    } else {
        text += spl[i] + ' ';
    }
}
console.log(text.trim());
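The same token-based idea can be written more compactly with code point iteration. This is a hedged sketch rather than the code from the answer: the names HAN_RANGES, isHanOnly and joinHanTokens are hypothetical, and the range list is a shortened subset chosen for illustration.

```javascript
// Subset of the ranges used above (assumption: enough for this illustration).
const HAN_RANGES = [[0x3400, 0x4dbf], [0x4e00, 0x9fff], [0xf900, 0xfaff], [0x20000, 0x2a6df]];

// True when the token is non-empty and every code point lies in one of the ranges.
function isHanOnly(token) {
  if (token.length === 0) return false;
  for (const ch of token) { // for...of walks the string by code point
    const cp = ch.codePointAt(0);
    if (!HAN_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi)) return false;
  }
  return true;
}

// Splits on whitespace and glues neighbouring tokens when both are Han-only.
function joinHanTokens(input) {
  const tokens = input.trim().split(/\s+/);
  let out = '';
  tokens.forEach((tok, i) => {
    const next = tokens[i + 1];
    const glue = next !== undefined && isHanOnly(tok) && isHanOnly(next);
    out += glue ? tok : tok + ' ';
  });
  return out.trim();
}

console.log(joinHanTokens('請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?'));
// 請把這裡的 10 多個字合併 . Can you help me?
console.log(joinHanTokens('合 併.'));
// 合 併.
```

The second call shows a limitation of working on whole tokens: a token with punctuation attached, such as 併., is not Han-only, so the space in front of it is kept. It also collapses every run of whitespace to a single space, which the regex approaches do not.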
edited 40 mins ago
answered 1 hour ago
Younes Zaidi