In 1958, the government of mainland China introduced the official romanization system currently used in mainland China, Singapore, and Macau (but not in Taiwan), known in Chinese characters as 漢語拼音 and in its own spelling as Hànyǔ Pīnyīn. The idea to use an official romanization system for phonetic transcription was extremely wise, for several reasons: the Latin alphabet was generally better suited to transcribing Chinese words than the Cyrillic; countries using the Latin alphabet far outnumbered those using any other and made up the majority of the world economy at the time; the Western press would never have tolerated a phonetic transcription system not based on the Latin alphabet (such as the ㄓㄨˋㄧㄣ ㄈㄨˊㄏㄠˋ system used under the Republic of China government); and the Latin alphabet ultimately became the world’s global alphabet with the spread of the Internet. Still, Hànyǔ Pīnyīn is based on English spelling but does not conform to its rules, and (see below) the popular press uses it as an arbitrary romanization for a language that is not conventionally written in the Latin alphabet. To some degree, then, it is a sort of deception when used for the general public, which has no idea how to pronounce the relevant words without having already studied the romanization system thoroughly. However, only one aspect of this shall concern us here: the unfortunate treatment of tone marks in the everyday Western transcription of Chinese words.
Tones are important in all the Sinitic languages, where they commonly distinguish words from one another. Yet the way the 漢語拼音 romanization system treats tone marks results in their being omitted in the vast majority of cases where Mandarin Chinese words are transcribed into the Latin alphabet. This leads to occasional absurdities. For instance, the provinces of Shanxi (spelled Shānxī in 拼音 with tone marks) and Shaanxi (spelled Shǎnxī in 拼音 with tone marks) differ in standard Mandarin pronunciation only by tone. In principle, they should either both be spelled Shanxi or, in order to differentiate them, always be spelled with their tone marks. Instead of doing either, the conventional spelling of Shaanxi uses 國語羅馬字 (a now rarely used romanization system created under the Republic of China and specifically designed to avert the need for tone marks) for the first syllable and toneless 漢語拼音 for the second! Much the same reasoning is followed by those inhabitants of Taiwan who choose to romanize their names in 國語羅馬字, but that system’s ways of indicating tone are so unintuitive that it is quite inadvisable to use it for the benefit of the Latin script-reading general public.
Given that English does not use diacritics at all except in foreign loanwords, and that no Western language normally distinguishes tones via spelling (though many other languages written in the Latin alphabet do), Hànyǔ Pīnyīn-style tonal diacritics tend to be omitted from romanized Mandarin and Cantonese text. Much software infamously handles tone diacritics poorly: for example, in Substack, letters carrying the third-tone Hànyǔ Pīnyīn caron appear differently from the surrounding font (they’re bolder and lack serifs). Even names from languages written in the Latin alphabet have their diacritical marks stripped by newspapers (though, very importantly, not by English Wikipedia) in a rather biased fashion; for instance, the New York Times and Wall Street Journal both privilege French and Portuguese diacritical marks over Czech, Vietnamese, and Polish ones. To some extent, Chinese diacritics are omitted because Mandarin and Cantonese, being generally written in 漢字 rather than in Hànyǔ Pīnyīn, are not considered part of the family of languages written in the Latin script, such as Portuguese, Polish, Turkish, and Vietnamese. Instead, they are viewed as belonging with the languages that require transcription into the Latin script for the benefit of Westerners, such as Russian, Korean, Arabic, and Bengali, all of which are notorious for essentially arbitrary romanization. This is strongly suggested by the fact that Taiwanese Mandarin names in the popular press are transcribed not according to the rules of 漢語拼音 but according to whatever romanization system the Taiwanese prefer foreigners to use (thus “Tsai Ing-wen” rather than Cài Yīngwén, and Kaohsiung rather than Gāoxióng). Much has been written on the perils of arbitrary Mandarin romanization in Taiwan, and I will not go through it here; suffice it to say that it has no beneficial consequences.
Aside from any more fundamental reform of 漢語拼音 spelling (for which there are many good reasons I will not discuss here), I have a modest proposal: Mandarin tone marks should (as in the ㄓㄨˋㄧㄣ ㄈㄨˊㄏㄠˋ phonetic transcription system) be placed next to the syllable as separate characters (example: Beiˇjingˉ) rather than as diacritics over the syllable nucleus, as they currently are in Hànyǔ Pīnyīn. This would kill two birds with one stone: it would end the practice of occasional 國語羅馬字-type romanization for mainland Chinese words, and it would end the confusing (to Westerners) practice of placing an apostrophe to separate two syllables in a single word (e.g., "Xi'an"), since the tone mark would itself show where the syllable ends. It would also be more likely to preserve the tone marks in newspapers and social media. Removing diacritics is almost universal for Chinese text romanized in Hànyǔ Pīnyīn (indeed, given their absence from English and the treatment of Chinese romanization as arbitrary, the diacritics almost beg to be omitted), but removing altogether separate characters would likely be too much for most newspaper editors and, as a result, for ordinary Westerners. Some have suggested that numbers be used instead of tone marks, but using numbers in writing to indicate sound differences would be so unconventional for Westerners as to be an immediate nonstarter. (Cantonese, for which both Jyutping and modified Yale use numbers to note tone, may well be out of luck here, but, either way, it is extremely doubtful whether it will survive the twenty-first century, to the degree that it’s not even in Google Translate.) I am sure this proposal is wise and should be implemented by all users of romanized Chinese words immediately. Does anyone else think this is a good idea? Comment below and
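For the technically inclined, the respelling proposed above is mechanical enough to sketch in a few lines of Python. This is only an illustration under stated assumptions, not a finished tool: the function names `move_tone_mark` and `respell` are made up for this example, the input is assumed to be already split into syllables (splitting running Pinyin text is a separate problem), and the choice of the Unicode spacing modifier letters ˉ ˊ ˇ ˋ as the separate tone characters is my reading of the Beiˇjingˉ example.

```python
import unicodedata

# Combining tone diacritics used in Hanyu Pinyin, mapped to the spacing
# modifier letters that would follow the syllable instead (an assumption
# based on the "Beiˇjingˉ" example).
TONE_MARKS = {
    "\u0304": "\u02c9",  # tone 1: combining macron -> ˉ
    "\u0301": "\u02ca",  # tone 2: combining acute  -> ˊ
    "\u030c": "\u02c7",  # tone 3: combining caron  -> ˇ
    "\u0300": "\u02cb",  # tone 4: combining grave  -> ˋ
}


def move_tone_mark(syllable: str) -> str:
    """Strip the tone diacritic from one syllable and append it as a
    separate character. Neutral-tone syllables come back unchanged."""
    # NFD splits e.g. "ě" into "e" plus a combining caron.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = ""
    kept = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)
    # NFC recomposes anything that is not a tone mark, such as the
    # diaeresis of "ü".
    return unicodedata.normalize("NFC", "".join(kept)) + tone


def respell(syllables: list[str]) -> str:
    """Join pre-split Pinyin syllables in the proposed spelling."""
    return "".join(move_tone_mark(s) for s in syllables)


print(respell(["Běi", "jīng"]))  # Beiˇjingˉ
print(respell(["Xī", "ān"]))     # Xiˉanˉ
print(respell(["Shǎn", "xī"]))   # Shanˇxiˉ
```

Note that in the second example the tone mark after "Xi" does the job the apostrophe does in "Xi'an", and in the third the two provinces come out as Shanˉxiˉ and Shanˇxiˉ with no change of romanization system.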
I don't think tone marks are necessary for infrequently used Chinese words and proper nouns when writing for an English-speaking audience. They will be meaningless to most readers. A small minority will recognize and understand them as tone marks, but will have no idea what they actually sound like. Only an even smaller minority will have an idea of how the tones actually work and sound. Pinyin or some other Romanization without tone marks will be sufficient in most cases.
Your proposal is similar to how Wade-Giles Romanization handles tones. Wade-Giles uses the numbers 1 to 4 for the respective tones and places them to the right of the syllable or as a superscript to the right of the syllable. Using numbers is a bit inelegant, but one benefit is that it's easier to read the numbers than the small dashes in the Pinyin tone marks.
"Tsai Ing-Wen" and "Kaohsiung" are the names in Wade-Giles, without the tones. Wade-Giles has traditionally been used in Taiwan and its use still persists today.