Output Devanagari (Hindi) from raw unicode using luatex

I can get the following code to compile, using luatex, with the Hindi/Devanagari characters correctly printed in the pdf:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi नमस्ते }

end{document}

However, I'm using a program that outputs the tex and that won't allow me to type the Hindi script into my tex editor; instead, it will only give me the unicode version of the word, "नमस्ते", which is "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>".

How can I get luatex to compile correctly from these raw code characters? What I want to compile (to produce a pdf with the single word "नमस्ते") is something like this:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> }

end{document}

...but that won't work.

edited 30 mins ago

ShreevatsaR

28.2k873102

asked 3 hours ago

lethalSinger

203

Can you get your program to output char"0928char"092Echar"0938char"094Dchar"0924 char"0947 instead of <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>?

– Mico
2 hours ago

1

Yes, I could do that! What would the full script then need to look like?

– lethalSinger
2 hours ago

I'm afraid I cannot answer your question as I don't know which scripting tool you employ. I just posted an answer, though, which creates a Lua function that converts <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> to char"0928char"092Echar"0938char"094Dchar"0924 char"0947.

– Mico
2 hours ago

add a comment |

I can get the following code to compile, using luatex, with the Hindi/Devanagari characters correctly printed in the pdf:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi नमस्ते }

end{document}

How can I get luatex to compile correctly from these raw code characters? What I want to compile (to produce a pdf with the single word "नमस्ते") is something like this:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> }

end{document}

...but that won't work.

edited 30 mins ago

ShreevatsaR

28.2k873102

asked 3 hours ago

lethalSinger

203

Can you get your program to output char"0928char"092Echar"0938char"094Dchar"0924 char"0947 instead of <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>?

– Mico
2 hours ago

1

Yes, I could do that! What would the full script then need to look like?

– lethalSinger
2 hours ago

I'm afraid I cannot answer your question as I don't know which scripting tool you employ. I just posted an answer, though, which creates a Lua function that converts <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> to char"0928char"092Echar"0938char"094Dchar"0924 char"0947.

– Mico
2 hours ago

add a comment |

I can get the following code to compile, using luatex, with the Hindi/Devanagari characters correctly printed in the pdf:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi नमस्ते }

end{document}

How can I get luatex to compile correctly from these raw code characters? What I want to compile (to produce a pdf with the single word "नमस्ते") is something like this:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> }

end{document}

...but that won't work.

edited 30 mins ago

ShreevatsaR

28.2k873102

asked 3 hours ago

lethalSinger

203

I can get the following code to compile, using luatex, with the Hindi/Devanagari characters correctly printed in the pdf:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi नमस्ते }

end{document}

How can I get luatex to compile correctly from these raw code characters? What I want to compile (to produce a pdf with the single word "नमस्ते") is something like this:

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



begin{document}

Here is normal text.

{hindi <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> }

end{document}

...but that won't work.

fonts luatex languages characters indic

edited 30 mins ago

ShreevatsaR

28.2k873102

asked 3 hours ago

lethalSinger

203

edited 30 mins ago

ShreevatsaR

28.2k873102

asked 3 hours ago

lethalSinger

203

edited 30 mins ago

ShreevatsaR

28.2k873102

edited 30 mins ago

ShreevatsaR

28.2k873102

edited 30 mins ago

ShreevatsaR

28.2k873102

asked 3 hours ago

lethalSinger

203

asked 3 hours ago

lethalSinger

203

asked 3 hours ago

lethalSinger

203

Can you get your program to output char"0928char"092Echar"0938char"094Dchar"0924 char"0947 instead of <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>?

– Mico
2 hours ago

1

Yes, I could do that! What would the full script then need to look like?

– lethalSinger
2 hours ago

I'm afraid I cannot answer your question as I don't know which scripting tool you employ. I just posted an answer, though, which creates a Lua function that converts <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> to char"0928char"092Echar"0938char"094Dchar"0924 char"0947.

– Mico
2 hours ago

add a comment |

Can you get your program to output char"0928char"092Echar"0938char"094Dchar"0924 char"0947 instead of <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>?

– Mico
2 hours ago

1

Yes, I could do that! What would the full script then need to look like?

– lethalSinger
2 hours ago

I'm afraid I cannot answer your question as I don't know which scripting tool you employ. I just posted an answer, though, which creates a Lua function that converts <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> to char"0928char"092Echar"0938char"094Dchar"0924 char"0947.

– Mico
2 hours ago

Can you get your program to output char"0928char"092Echar"0938char"094Dchar"0924 char"0947 instead of <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>?

– Mico
2 hours ago

Yes, I could do that! What would the full script then need to look like?

– lethalSinger
2 hours ago

I'm afraid I cannot answer your question as I don't know which scripting tool you employ. I just posted an answer, though, which creates a Lua function that converts <U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> to char"0928char"092Echar"0938char"094Dchar"0924 char"0947.

– Mico
2 hours ago

add a comment |

1 Answer
1

active

oldest

votes

(added an extra operation in the Lua function 'conv' to address the OP's follow-up request)

Since you're using LuaLaTeX, here's a solution that employs a Lua function to convert strings of the form '<U%+(.-)>' to '\char"%1'; here, %+ represents the literal character + and %1 represents the non-greedy "capture" of the pattern (.-) -- in words: "0 or more characters other than >". In a second step, the Lua function converts any whitespace characters present in the string to explicit (interword) whitespace.

In addition, the code also sets up a LaTeX macro that acts as a front-end for the Lua function. Thus, one may call the Lua function via a conv{<your string here>} directive.

You can either manually encase the sequences of unicode code in conv{...} statements or, depending on how far you can get your program to do the work for you, instruct the scripting program to encase the sequences of unicode code in a conv{...} statements automatically.

enter image description here

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



%%%% -- copy the next eight lines of code to your document -- 

usepackage{luacode} % for 'luacode' env. and 'luastringN' macro

begin{luacode}

function conv ( s ) 

   s = s:gsub ( '<U%+(.-)>' , '\char"%1' ) 

   tex.sprint ( ( s:gsub( '%s+' , '\ ' ) ) )

end

end{luacode}

newcommandconv[1]{directlua{conv(luastringN{#1})}}



begin{document}

Latin-alphabet text.



{hindi नमस्ते }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>} }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}

end{document}

edited 42 mins ago

answered 2 hours ago

Mico

287k32393781

1

This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?

– lethalSinger
2 hours ago

1

@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)

– Mico
46 mins ago

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "85"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2ftex.stackexchange.com%2fquestions%2f485697%2foutput-devanagari-hindi-from-raw-unicode-using-luatex%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

(added an extra operation in the Lua function 'conv' to address the OP's follow-up request)

In addition, the code also sets up a LaTeX macro that acts as a front-end for the Lua function. Thus, one may call the Lua function via a conv{<your string here>} directive.

enter image description here

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



%%%% -- copy the next eight lines of code to your document -- 

usepackage{luacode} % for 'luacode' env. and 'luastringN' macro

begin{luacode}

function conv ( s ) 

   s = s:gsub ( '<U%+(.-)>' , '\char"%1' ) 

   tex.sprint ( ( s:gsub( '%s+' , '\ ' ) ) )

end

end{luacode}

newcommandconv[1]{directlua{conv(luastringN{#1})}}



begin{document}

Latin-alphabet text.



{hindi नमस्ते }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>} }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}

end{document}

edited 42 mins ago

answered 2 hours ago

Mico

287k32393781

1

This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?

– lethalSinger
2 hours ago

1

@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)

– Mico
46 mins ago

add a comment |

(added an extra operation in the Lua function 'conv' to address the OP's follow-up request)

In addition, the code also sets up a LaTeX macro that acts as a front-end for the Lua function. Thus, one may call the Lua function via a conv{<your string here>} directive.

enter image description here

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



%%%% -- copy the next eight lines of code to your document -- 

usepackage{luacode} % for 'luacode' env. and 'luastringN' macro

begin{luacode}

function conv ( s ) 

   s = s:gsub ( '<U%+(.-)>' , '\char"%1' ) 

   tex.sprint ( ( s:gsub( '%s+' , '\ ' ) ) )

end

end{luacode}

newcommandconv[1]{directlua{conv(luastringN{#1})}}



begin{document}

Latin-alphabet text.



{hindi नमस्ते }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>} }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}

end{document}

edited 42 mins ago

answered 2 hours ago

Mico

287k32393781

1

This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?

– lethalSinger
2 hours ago

1

@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)

– Mico
46 mins ago

add a comment |

(added an extra operation in the Lua function 'conv' to address the OP's follow-up request)

In addition, the code also sets up a LaTeX macro that acts as a front-end for the Lua function. Thus, one may call the Lua function via a conv{<your string here>} directive.

enter image description here

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



%%%% -- copy the next eight lines of code to your document -- 

usepackage{luacode} % for 'luacode' env. and 'luastringN' macro

begin{luacode}

function conv ( s ) 

   s = s:gsub ( '<U%+(.-)>' , '\char"%1' ) 

   tex.sprint ( ( s:gsub( '%s+' , '\ ' ) ) )

end

end{luacode}

newcommandconv[1]{directlua{conv(luastringN{#1})}}



begin{document}

Latin-alphabet text.



{hindi नमस्ते }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>} }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}

end{document}

edited 42 mins ago

answered 2 hours ago

Mico

287k32393781

(added an extra operation in the Lua function 'conv' to address the OP's follow-up request)

In addition, the code also sets up a LaTeX macro that acts as a front-end for the Lua function. Thus, one may call the Lua function via a conv{<your string here>} directive.

enter image description here

documentclass{article}

usepackage{fontspec}

setmainfont{Times New Roman}

newfontscript{Devanagari}{deva,dev2}

newfontface{hindi}[Script=Devanagari]{Lohit-Devanagari.ttf}



%%%% -- copy the next eight lines of code to your document -- 

usepackage{luacode} % for 'luacode' env. and 'luastringN' macro

begin{luacode}

function conv ( s ) 

   s = s:gsub ( '<U%+(.-)>' , '\char"%1' ) 

   tex.sprint ( ( s:gsub( '%s+' , '\ ' ) ) )

end

end{luacode}

newcommandconv[1]{directlua{conv(luastringN{#1})}}



begin{document}

Latin-alphabet text.



{hindi नमस्ते }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947>} }



{hindi conv{<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>}}

end{document}

edited 42 mins ago

answered 2 hours ago

Mico

287k32393781

edited 42 mins ago

answered 2 hours ago

Mico

287k32393781

answered 2 hours ago

Mico

287k32393781

answered 2 hours ago

Mico

287k32393781

1

This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?

– lethalSinger
2 hours ago

1

@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)

– Mico
46 mins ago

add a comment |

1

This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?

– lethalSinger
2 hours ago

1

@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)

– Mico
46 mins ago

This gets incredibly close. The only problem now is with breaks between words, which get ignored. E.g. "नमस्ते राज" (2 words) gets printed as "नमस्तेराज" (1 single word) even though there is the proper space between the unicode characters: "<U+0928><U+092E><U+0938><U+094D><U+0924><U+0947> <U+0930><U+093E><U+091C>". How can I fix the spacing issue?

– lethalSinger
2 hours ago

@lethalSinger - Please see the updated answer I just posted. (The solution is to add a second gsub (short for "global substitution") operation.)

– Mico
46 mins ago

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to TeX - LaTeX Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Kdjykuj