Remove all space between Chinese words in Regex












7















I would like to remove all spaces within CHINESE TEXT ONLY.



MY TEXT: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"



IDEAL OUTPUT: "請把這裡的 10 多個字合併. Can you help me?"



var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace("/ /", "");


I have studied a related question, but its solution does not seem to work in my situation, so I am asking here for help. Below is the question that I think is similar to mine.



Reference
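For context, here is a minimal sketch of why the attempt above leaves the string unchanged, assuming standard String.prototype.replace semantics: a string first argument such as "/ /" is searched for literally (slash, space, slash) rather than compiled as a regex. The variable names are illustrative only.

```javascript
const input = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A string pattern is matched literally: the text "/ /" never occurs
// in the input, so the result is identical to the input.
const withStringPattern = input.replace("/ /", "");

// A regex literal with the g flag does match, but it strips every space,
// including the ones around "10" and inside the English sentence.
const withGlobalRegex = input.replace(/ /g, "");

console.log(withStringPattern === input); // true
console.log(withGlobalRegex); // 請把這裡的10多個字合併.Canyouhelpme?
```

Neither form distinguishes Chinese neighbours from other characters, which is why the pattern has to inspect what sits on both sides of the whitespace.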


























  • 1





    Are your spaces actually &nbsp;, or did you just use that as a guess?

    – Justinas
    1 hour ago











  • .replace(/ /g,'')

    – Nitesh Virani
    1 hour ago











  • Using the latest ECMAScript 2018 regex syntax you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

    – Wiktor Stribiżew
    1 hour ago











  • Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.

    – Wiktor Stribiżew
    44 mins ago
















javascript regex






edited 1 hour ago by Gurman

asked 1 hour ago by Needa Hell
















5 Answers


















8














Using @Brett Zamir's solution for matching Chinese characters in a regex:



Javascript unicode string, chinese character but no punctuation








const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');

const ret = str.replace(regex, '$1$2');

console.log(ret);







The pattern has this shape:



([blabla chinese chars]) ([blabla chinese chars])*
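The shape above consumes the neighbouring character and the trailing space, which matters for inputs like the one raised in the comments on this answer. The sketch below compares it with a lookahead variant; the character class is a simplified stand-in chosen for illustration (BMP CJK Unified Ideographs only), not the full class from the answer, and all names are hypothetical.

```javascript
// Simplified Han class for illustration only (BMP CJK Unified Ideographs).
const han = '[\\u4E00-\\u9FFF]';

// Shape from the answer: both neighbours and the trailing space are consumed.
const consuming = new RegExp('(' + han + ') (' + han + ')* ', 'g');

// Lookahead variant: only the captured character and the whitespace are consumed,
// so the next character is still available for the following match.
const lookahead = new RegExp('(' + han + ')\\s+(?=' + han + ')', 'g');

const edgeCase = '請 的 10 多 個 a';
const consumingResult = edgeCase.replace(consuming, '$1$2');
const lookaheadResult = edgeCase.replace(lookahead, '$1');

console.log(consumingResult); // 請的10 多個a
console.log(lookaheadResult); // 請的 10 多個 a
```

On the sample string from the question both patterns happen to give the ideal output; they diverge once a Chinese run has an even number of characters before a non-Chinese token.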


























  • 1





    The output here doesn't match with the ideal output. Notice the space in front of the 10.

    – holydragon
    1 hour ago











  • You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p

    – jonatjano
    1 hour ago











  • I've edited my post to match the desired output

    – Grégory NEUT
    1 hour ago











  • I'd use \s+ instead of ' '

    – HerrSerker
    1 hour ago











  • What about eg 請 的 10 多 個 a

    – bobble bubble
    31 mins ago



















2














The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It matches one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that the whitespace is followed by another Chinese character:



([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)


Then replace each match with $1.



Demo



var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));









































    2














    Getting to the Chinese char matching pattern



    Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into



    [\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]


    In ES6, to match a single Chinese char, it can be used as



    /[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u


    Transpiling it to ES5 with the ES2015 Unicode regular expression transpiler gives the following pattern, which matches any Chinese char using a JS RegExp:



    (?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])



    So, you may use



    s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')


    See the regex demo.



    If your JS environment is ECMAScript 2018 compliant you may use a shorter



    s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')


    Pattern details





    • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


    • \s+ - any 1+ whitespaces (any Unicode whitespace)


    • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


    JS demo:






    var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
    var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
    // \\s is escaped twice because the pattern is built from a string
    console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
    // ECMAScript 2018 only
    console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
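As a variation on the ECMAScript 2018 form above, the capture group can be swapped for a lookbehind, so the replacement is simply an empty string. This is only a sketch: it assumes an engine that supports both Unicode property escapes and lookbehind assertions, and the function name joinHan is made up for illustration.

```javascript
// Assumes an ES2018+ engine (Unicode property escapes and lookbehind).
// Matches whitespace that has a Han character on both sides, without consuming either.
const betweenHan = /(?<=\p{Script=Hani})\s+(?=\p{Script=Hani})/gu;

// Hypothetical helper: removes whitespace runs that sit between two Han characters.
function joinHan(text) {
  return text.replace(betweenHan, '');
}

console.log(joinHan('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
```

Because neither neighbour is consumed, whitespace next to digits, punctuation, or Latin letters is left exactly as it was.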


































    • FYI: if only one whitespace is expected between Chinese chars, remove + after \s.

      – Wiktor Stribiżew
      1 hour ago



















    1














    Try this



    str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');


    I took the code range \u4E00-\u9FCC from here; it contains ~20000 chars (enough for daily usage).






    var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
    str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

    console.log(str);
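To make the "enough for daily usage" trade-off concrete, the sketch below checks what the \u4E00-\u9FCC range does and does not cover. The sample characters are chosen for illustration, and the \p{Script=Han} comparison assumes an ES2018+ engine.

```javascript
// The single BMP range used in the answer above.
const bmpRange = /[\u4E00-\u9FCC]/;
// Unicode property escape for comparison (ES2018+ only).
const hanScript = /\p{Script=Han}/u;

const common = '請';        // U+8ACB, inside the range
const rare = '\u{20000}';   // U+20000, CJK Extension B, outside the BMP

console.log(bmpRange.test(common)); // true
console.log(bmpRange.test(rare));   // false
console.log(hanScript.test(rare));  // true
```

So the short range is fine for common text, while rarer characters in the supplementary planes need the longer surrogate-pair pattern or the property escape.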





























    • 3





      The space in front of the 10 is missing.

      – holydragon
      1 hour ago











    • @holydragon it's fixed now

      – Kamil Kiełczewski
      1 hour ago





















    0

















    var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

    var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

    // Returns true when every code point of str falls inside one of the ranges above.
    var isChinese = function (str) {
        var charCode;
        var flag;
        var range;
        for (var i = 0; i < str.length;) {
            charCode = str.codePointAt(i);
            flag = false;
            for (var j = 0; j < chineseRange.length; j++) {
                range = chineseRange[j];
                if (charCode >= range[0] && charCode <= range[1]) {
                    flag = true;
                    break;
                }
            }
            if (!flag) {
                return false;
            }
            // Code points above 0xffff occupy two UTF-16 code units.
            if (charCode <= 0xffff) {
                i++;
            } else {
                i += 2;
            }
        }
        return true;
    };
    // For more information about chinese.js, see the project on GitHub.
    // Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

    // I wrote this loop to remove the spaces between Chinese words.

    var spl = chine.trim().split(/\s+/);
    var text = '';
    for (var i = 0; i < spl.length; i++) {
        // Guard the lookup so the last token never reads past the end of the array.
        var nextIsChinese = i + 1 < spl.length && isChinese(spl[i + 1]);
        if (isChinese(spl[i]) && nextIsChinese) {
            text += spl[i];
        } else {
            text += spl[i] + ' ';
        }
    }
    console.log(text);

































      First answer above (pattern based on @Brett Zamir's solution): answered 1 hour ago by Grégory NEUT, edited 1 hour ago








      Second answer above (range with lookahead): answered 1 hour ago by Pushpesh Kumar Rajwanshi, edited 59 mins ago























              2














              Getting to the Chinese char matching pattern



              Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



              [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


              In ES6, to match a single Chinese char, it can be used as



              /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


              Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



              (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


              pattern to match any Chinese char using JS RegExp.



              So, you may use



              s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


              See the regex demo.



              If your JS environment is ECMAScript 2018 compliant you may use a shorter



              s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')


              Pattern details





              • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


              • \s+ - one or more whitespace characters (any Unicode whitespace)


              • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.
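The lookahead in the last bullet matters when single characters alternate with spaces. As a minimal sketch (assuming an ECMAScript 2018 environment so that `\p{Script=Hani}` is available), compare it with a variant that consumes the second character; the variable names are illustrative only.

```javascript
var s = '請 把 這 裡';

// Consuming variant: the second Han character becomes part of the match,
// so it cannot start the next match and every other space survives.
var consuming = s.replace(/(\p{Script=Hani})\s+(\p{Script=Hani})/gu, '$1$2');

// Lookahead variant: the second Han character is only checked, not consumed,
// so it is still available as group 1 of the next match.
var lookahead = s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');

console.log(consuming); // 請把 這裡
console.log(lookahead); // 請把這裡
```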


              JS demo:






              var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
              var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
              // The backslash in \s must be doubled inside a string literal passed to RegExp.
              console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
              // ECMAScript 2018 only
              console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
















              edited 35 mins ago

























              answered 1 hour ago









              Wiktor Stribiżew














              • FYI: if only one whitespace is expected between Chinese chars, remove the + after \s.

                – Wiktor Stribiżew
                1 hour ago
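To illustrate the comment above, here is a small sketch (again assuming ECMAScript 2018 support for `\p{Script=Hani}`) of how the two quantifiers differ when more than one space sits between two Chinese characters. The sample string is made up for this example.

```javascript
// Two spaces between 字 and 合, one space between 合 and 併.
var sample = '字\u0020\u0020合 併';

// \s+ removes the whole run of whitespace.
var many = sample.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');

// \s matches exactly one whitespace character, so the double space is left
// alone: after the first space the lookahead sees another space, not a Han char.
var single = sample.replace(/(\p{Script=Hani})\s(?=\p{Script=Hani})/gu, '$1');

console.log(many);                    // 字合併
console.log(JSON.stringify(single));  // "字  合併"
```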






























              1














              Try this:



              str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');


              I took the range \u4E00-\u9FCC from here; it covers about 20,000 characters (enough for daily usage).






              var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
              str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

              console.log(str);
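A short sketch of how the two alternatives in this pattern interact, and of one case it does not cover. The second input string is a made-up example, not from the question.

```javascript
var re = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

// The question's sample: "10 " is consumed by the second alternative and
// written back through $2, so the following 多 has no leading space left to strip.
var sample = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
var fixedSample = sample.replace(re, '$1$2');
console.log(fixedSample); // 請把這裡的 10 多個字合併. Can you help me?

// Only digits are protected this way: a Latin word followed by a
// Chinese character still loses the space between them.
var latinCase = 'Can 你 help'.replace(re, '$1$2');
console.log(latinCase); // Can你 help
```

The pattern removes any space that precedes a character in the range, whatever comes before that space, so it fits inputs shaped like the sample better than mixed Latin and Chinese text.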
















              edited 52 mins ago

























              answered 1 hour ago









              Kamil Kiełczewski









              • The space in front of the 10 is missing.

                – holydragon
                1 hour ago











              • @holydragon it's fixed now

                – Kamil Kiełczewski
                1 hour ago





























              0

















              var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

              // Inclusive [start, end] code point ranges that are treated as Chinese.
              var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

              // Returns true only if every code point of str falls inside one of the ranges.
              var isChinese = function (str) {
                var charCode;
                var flag;
                var range;
                for (var i = 0; i < str.length;) {
                  charCode = str.codePointAt(i);
                  flag = false;
                  for (var j = 0; j < chineseRange.length; j++) {
                    range = chineseRange[j];
                    if (charCode >= range[0] && charCode <= range[1]) {
                      flag = true;
                      break;
                    }
                  }
                  if (!flag) {
                    return false;
                  }
                  // Code points above U+FFFF occupy two UTF-16 code units.
                  if (charCode <= 0xffff) {
                    i++;
                  } else {
                    i += 2;
                  }
                }
                return true;
              };
              // For more information, see the is-chinese project on GitHub.
              // Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

              // I wrote this part to remove the spaces between Chinese words:
              // split on whitespace, then re-join without a space between two Chinese tokens.
              var spl = chine.trim().split(/\s+/);
              var text = '';
              for (var i = 0; i < spl.length; i++) {
                if (isChinese(spl[i])) {
                  // Guard the last token: spl[i + 1] is undefined there and isChinese would throw.
                  if (i + 1 >= spl.length || !isChinese(spl[i + 1])) {
                    text += spl[i] + ' ';
                  } else {
                    text += spl[i];
                  }
                } else {
                  text += spl[i] + ' ';
                }
              }
              console.log(text);
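The `i++` versus `i += 2` step in `isChinese` exists because JavaScript strings are indexed by UTF-16 code units. A minimal sketch of the difference, using one character from the Basic Multilingual Plane and one from CJK Extension B (the variable names are illustrative):

```javascript
var bmp = '字';            // U+5B57, fits in one UTF-16 code unit
var astral = '\u{20000}';  // U+20000, needs a surrogate pair

console.log(bmp.length, bmp.codePointAt(0).toString(16));       // 1 5b57
console.log(astral.length, astral.codePointAt(0).toString(16)); // 2 20000
console.log(astral.charCodeAt(0).toString(16));                 // d840

// for...of iterates by code point, so no manual index arithmetic is needed.
var count = 0;
for (var ch of bmp + astral) {
  count++;
}
console.log(count); // 2
```

Iterating with `for...of` would be one way to avoid the manual index step, at the cost of requiring an ES2015 environment, which `codePointAt` already assumes.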
















                  edited 40 mins ago

























                  answered 1 hour ago









                  Younes Zaidi
