Remove all space between Chinese words in Regex

I would like to replace all spaces among CHINESE TEXT ONLY.

MY TEXT: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"

IDEAL OUTPUT: "請把這裡的 10 多個字合併. Can you help me?"

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

str = str.replace("/&nbsp;/", "");

I have studied a related question, but its solution does not seem to work in my situation, so I am asking here for help. Below is the question that I think is similar to mine.

Reference

asked 1 hour ago by Needa Hell (new contributor); edited 1 hour ago by Gurman

Are your spaces actually &nbsp;, or did you just guess?

– Justinas
1 hour ago

.replace(/ /g,'')

– Nitesh Virani
1 hour ago

Using the latest ECMAScript 2018 regex syntax, you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

– Wiktor Stribiżew
1 hour ago

Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.

– Wiktor Stribiżew
44 mins ago

javascript regex


5 Answers

Using @Brett Zamir's solution for matching Chinese characters in a regex:

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');
const ret = str.replace(regex, '$1$2');
console.log(ret);

In short, the pattern has this shape (a Chinese character, a space, an optional Chinese character, and another space):

([Chinese chars]) ([Chinese chars])* 

answered 1 hour ago by Grégory NEUT; edited 1 hour ago

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
1 hour ago

You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p

– jonatjano
1 hour ago

I've edited my post to match the desired output.

– Grégory NEUT
1 hour ago

I'd use \s+ instead of ' '

– HerrSerker
1 hour ago

What about, e.g., 請的 10 多個 a

– bobble bubble
31 mins ago


The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It matches one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that the whitespace is itself followed by a Chinese character:

([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)

Replace each match with $1.

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));

answered 1 hour ago by Pushpesh Kumar Rajwanshi; edited 59 mins ago

Getting to the Chinese char matching pattern

Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into

[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u

Transpiling it to ES5 using the ES2015 Unicode regular expression transpiler, we get

(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])

pattern to match any Chinese char using JS RegExp.
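As a rough illustration of why the transpiled form needs surrogate-pair alternatives, the sketch below checks a single astral character (U+20000, chosen here only as an example) against three small patterns. The variable names are illustrative and not part of the answer.

```javascript
// U+20000 lies outside the Basic Multilingual Plane, so a JS string
// stores it as two UTF-16 code units: \uD840 followed by \uDC00.
const astral = '\u{20000}';
const codeUnits = astral.length;

// With the u flag, the regex works on whole code points.
const matchesWithUFlag = /^[\u{20000}-\u{2A6D6}]$/u.test(astral);

// Without the u flag, the same character must be described as a
// high-surrogate class followed by a low-surrogate class.
const matchesAsSurrogatePair = /^[\uD840-\uD868][\uDC00-\uDFFF]$/.test(astral);

// A BMP-only range cannot match it at all.
const matchesBmpRangeOnly = /^[\u4E00-\u9FEF]$/.test(astral);

console.log(codeUnits, matchesWithUFlag, matchesAsSurrogatePair, matchesBmpRangeOnly);
// → 2 true true false
```

This is why the ES5 pattern is so much longer than the ES6 one: every astral range has to be spelled out as pairs of code units.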

So, you may use

s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant, you may use the shorter

s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

\s+ - one or more whitespace characters (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
// The backslash in \\s must be doubled inside a string literal passed to RegExp.
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
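If it is unclear whether the target engine supports ES2018 property escapes, one possible approach is to detect support at run time. The following is only a sketch: `buildHanSpaceRegex` and `removeHanSpaces` are hypothetical names, and the fallback range `\u4E00-\u9FFF` is an assumed, BMP-only simplification rather than the full transpiled pattern from the answer.

```javascript
// Hypothetical helper: prefer the ES2018 property escape when available.
function buildHanSpaceRegex() {
  try {
    // The RegExp constructor throws a SyntaxError on engines
    // that do not understand \p{...} with the u flag.
    return new RegExp('(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})', 'gu');
  } catch (e) {
    // Assumed fallback: CJK Unified Ideographs in the BMP only.
    return /([\u4E00-\u9FFF])\s+(?=[\u4E00-\u9FFF])/g;
  }
}

// Hypothetical helper: removes whitespace that sits between two Han characters.
function removeHanSpaces(text) {
  return text.replace(buildHanSpaceRegex(), '$1');
}

console.log(removeHanSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// → 請把這裡的 10 多個字合併. Can you help me?
```

Because the second Han character is only checked in a lookahead and not consumed, it can serve as the left-hand character of the next match, which is what lets a long run of single spaced characters collapse in one pass.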

answered 1 hour ago by Wiktor Stribiżew; edited 35 mins ago

FYI: if only one whitespace is expected between Chinese chars, remove the + after \s.

– Wiktor Stribiżew
1 hour ago


Try this

str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

I took the range \u4E00-\u9FCC from here; it contains about 20,000 characters (enough for daily usage).

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
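The answer does not spell out how the two alternatives cooperate, so here is one way to observe it. The sketch swaps the '$1$2' replacement string for an equivalent callback that records which branch matched; `traceReplace` is a hypothetical helper introduced only for this illustration.

```javascript
// Same pattern as in the answer, with the \u escapes written out.
const pattern = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

// Hypothetical helper: runs the replacement and records the branch of each match.
function traceReplace(input) {
  const branches = [];
  const output = input.replace(pattern, (match, han, digits) => {
    branches.push(han !== undefined ? 'space+han' : 'digits+space');
    // Equivalent to '$1$2': the group that did not participate becomes ''.
    return (han || '') + (digits || '');
  });
  return { output, branches };
}

const result = traceReplace('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?');
console.log(result.output);
// → 請把這裡的 10 多個字合併. Can you help me?
console.log(result.branches.join(', '));
```

The second branch consumes "10 " together with its trailing space and writes it back unchanged, so the space in front of 多 is no longer available to the first branch. One caveat of this approach, as far as this sketch shows: the first branch does not look at what precedes the space, so 'abc 請' becomes 'abc請'.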

answered 1 hour ago by Kamil Kiełczewski; edited 52 mins ago

The space in front of the 10 is missing.

– holydragon
1 hour ago

@holydragon it's fixed now

– Kamil Kiełczewski
1 hour ago


var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

// Returns true only if every code point of str falls inside one of the ranges above.
var isChinese = function (str) {
  var charCode;
  var flag;
  var range;
  for (var i = 0; i < str.length;) {
    charCode = str.codePointAt(i);
    flag = false;
    for (var j = 0; j < chineseRange.length; j++) {
      range = chineseRange[j];
      if (charCode >= range[0] && charCode <= range[1]) {
        flag = true;
        break;
      }
    }
    if (!flag) {
      return false;
    }
    // Code points above 0xffff occupy two UTF-16 code units.
    if (charCode <= 0xffff) {
      i++;
    } else {
      i += 2;
    }
  }
  return true;
};

// For more information about is-chinese, see the project on GitHub.
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

// I wrote this loop to remove the spaces between Chinese words.
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
  if (isChinese(spl[i])) {
    // The bounds check avoids calling isChinese(undefined) on the last token.
    if (i + 1 >= spl.length || !isChinese(spl[i + 1])) {
      text += spl[i] + ' ';
    } else {
      text += spl[i];
    }
  } else {
    text += spl[i] + ' ';
  }
}
console.log(text);

answered 1 hour ago by Younes Zaidi; edited 40 mins ago


5 Answers
5

active

oldest

votes

5 Answers
5

active

oldest

votes

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([blabla chinese chars]) ([blabla chinese chars])*

edited 1 hour ago

answered 1 hour ago

Grégory NEUT

8,69621437

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
1 hour ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
1 hour ago

I've edited my post to match your desire

– Grégory NEUT
1 hour ago

I'd use s+ instead of ' '

– HerrSerker
1 hour ago

What about eg 請的 10 多個 a

– bobble bubble
31 mins ago

|
show 2 more comments

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([blabla chinese chars]) ([blabla chinese chars])*

edited 1 hour ago

answered 1 hour ago

Grégory NEUT

8,69621437

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
1 hour ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
1 hour ago

I've edited my post to match your desire

– Grégory NEUT
1 hour ago

I'd use s+ instead of ' '

– HerrSerker
1 hour ago

What about eg 請的 10 多個 a

– bobble bubble
31 mins ago

|
show 2 more comments

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([blabla chinese chars]) ([blabla chinese chars])*

edited 1 hour ago

answered 1 hour ago

Grégory NEUT

8,69621437

Using @Brett Zamir soluce on how to match chinese character in regex

Javascript unicode string, chinese character but no punctuation

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

It looks like :

([blabla chinese chars]) ([blabla chinese chars])*

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';



const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');



const ret = str.replace(regex, '$1$2');



console.log(ret);

edited 1 hour ago

answered 1 hour ago

Grégory NEUT

8,69621437

edited 1 hour ago

answered 1 hour ago

Grégory NEUT

8,69621437

answered 1 hour ago

Grégory NEUT

8,69621437

answered 1 hour ago

Grégory NEUT

8,69621437

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
1 hour ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
1 hour ago

I've edited my post to match your desire

– Grégory NEUT
1 hour ago

I'd use s+ instead of ' '

– HerrSerker
1 hour ago

What about eg 請的 10 多個 a

– bobble bubble
31 mins ago

|
show 2 more comments

1

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
1 hour ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
1 hour ago

I've edited my post to match your desire

– Grégory NEUT
1 hour ago

I'd use s+ instead of ' '

– HerrSerker
1 hour ago

What about eg 請的 10 多個 a

– bobble bubble
31 mins ago

The output here doesn't match with the ideal output. Notice the space in front of the 10.

– holydragon
1 hour ago

you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p

– jonatjano
1 hour ago

I've edited my post to match your desire

– Grégory NEUT
1 hour ago

I'd use s+ instead of ' '

– HerrSerker
1 hour ago

What about eg 請的 10 多個 a

– bobble bubble
31 mins ago

|
show 2 more comments

([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)

And replace it by $1

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)/g, "$1"));

edited 59 mins ago

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

add a comment |

([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)

And replace it by $1

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)/g, "$1"));

edited 59 mins ago

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

add a comment |

([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)

And replace it by $1

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)/g, "$1"));

edited 59 mins ago

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)

And replace it by $1

Demo

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)/g, "$1"));

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)/g, "$1"));

var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';

console.log(str.replace(/([u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)s+(?=[u2E80-u2FD5u3190-u319fu3400-u4DBFu4E00-u9FCC]+)/g, "$1"));

edited 59 mins ago

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

edited 59 mins ago

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

answered 1 hour ago

Pushpesh Kumar Rajwanshi

5,7322827

add a comment |

Getting to the Chinese char matching pattern

Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into

[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

s+ - any 1+ whitespaces (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));

edited 35 mins ago

answered 1 hour ago

Wiktor Stribiżew

310k16131206

FYI: if only one whitespace is expected between Chinese chars, remove + after s.

– Wiktor Stribiżew
1 hour ago

add a comment |

Getting to the Chinese char matching pattern

Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into

[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

s+ - any 1+ whitespaces (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));

edited 35 mins ago

answered 1 hour ago

Wiktor Stribiżew

310k16131206

FYI: if only one whitespace is expected between Chinese chars, remove + after s.

– Wiktor Stribiżew
1 hour ago

add a comment |

Getting to the Chinese char matching pattern

Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into

[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]

In ES6, to match a single Chinese char, it can be used as

/[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u

Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get

(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])

pattern to match any Chinese char using JS RegExp.

So, you may use

s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')

See the regex demo.

If your JS environment is ECMAScript 2018 compliant you may use a shorter

s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')

Pattern details

(CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char

s+ - any 1+ whitespaces (any Unicode whitespace)

(?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.

JS demo:

var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]"; 

console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));

// ECMAScript 2018 only

console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));

edited 35 mins ago

answered 1 hour ago

Wiktor Stribiżew

FYI: if only one whitespace is expected between Chinese chars, remove + after \s.

– Wiktor Stribiżew
1 hour ago
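A short sketch of what that change means in practice, assuming an ES2018-capable engine: without the +, a run of two spaces between Chinese chars is left untouched, because the char right after the first space is another space rather than a Chinese char.

```javascript
// Two spaces between the chars, written as escapes so they stay visible.
var doubled = '請\u0020\u0020把';

var single = doubled.replace(/(\p{Script=Hani})\s(?=\p{Script=Hani})/gu, '$1');
var many = doubled.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');

console.log(single.length); // 4 (unchanged: both spaces kept)
console.log(many.length);   // 2 (both spaces removed)
```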

Try this

str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

I took the range \u4E00-\u9FCC from here; it contains about 20,000 chars (enough for daily usage).

var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

console.log(str);
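As a rough illustration of what "enough for daily usage" leaves out, the sketch below checks one char inside the \u4E00-\u9FCC range and one code point from the supplementary planes (U+20000, chosen here only as an example). The \p{Script=Hani} check assumes an ES2018-capable engine.

```javascript
var bmpOnly = /[\u4E00-\u9FCC]/;
var hanScript = /\p{Script=Hani}/u;

// U+20000 lies outside the Basic Multilingual Plane,
// so in a JS string it is stored as a surrogate pair.
var supplementary = '\u{20000}';

console.log(bmpOnly.test('請'));             // true
console.log(bmpOnly.test(supplementary));   // false
console.log(hanScript.test(supplementary)); // true
```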

edited 52 mins ago

answered 1 hour ago

Kamil Kiełczewski

The space in front of the 10 is missing.

– holydragon
1 hour ago

@holydragon it's fixed now

– Kamil Kiełczewski
1 hour ago

var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

// Code point ranges treated as Chinese (inclusive bounds).
var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0x2f800, 0x2fa1f]];

// Returns true only if every code point of str falls inside one of the ranges above.
var isChinese = function (str) {
  var charCode;
  var flag;
  var range;
  for (var i = 0; i < str.length;) {
    charCode = str.codePointAt(i);
    flag = false;
    for (var j = 0; j < chineseRange.length; j++) {
      range = chineseRange[j];
      if (charCode >= range[0] && charCode <= range[1]) {
        flag = true;
        break;
      }
    }
    if (!flag) {
      return false;
    }
    // Code points above 0xFFFF occupy two UTF-16 code units.
    if (charCode <= 0xffff) {
      i++;
    } else {
      i += 2;
    }
  }
  return true;
};

// For more information about isChinese, see the project on GitHub.
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

// I wrote this loop to remove the spaces between Chinese words.
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
  // Guard the lookahead so isChinese is never called with undefined on the last token.
  var next = spl[i + 1];
  if (isChinese(spl[i]) && next !== undefined && isChinese(next)) {
    text += spl[i];
  } else {
    text += spl[i] + ' ';
  }
}
console.log(text.trim());

edited 40 mins ago

answered 1 hour ago

Younes Zaidi
