Java source code could switch to UTF-8 encoding

The source code for the Java Development Kit (JDK) would be converted to UTF-8 (Unicode Transformation Format, 8-bit) to give it a well-defined encoding, according to a plan underway in the OpenJDK Java community.

The proposal, created in early January and updated on February 28, describes the JDK source code as currently having an “ill-defined encoding”: no encoding is officially declared, and the files are mostly ASCII but contain some non-ASCII characters whose encoding is not well defined. This situation creates unnecessary problems when working with the JDK codebase, for no reason other than history, the proposal states.

UTF-8, the byte-oriented encoding form of Unicode that is considered the web standard for character encoding, became the default charset of the standard Java APIs with the release of JDK 18 in March 2022. The new proposal would convert the JDK codebase to UTF-8 using the following steps:

  • Tell Git that the text files are encoded in UTF-8.
  • Examine the code base for text files that contain non-ASCII characters, and convert them to UTF-8 if they are not already.
  • Update the tools used in the Java build, by changing their compiler flags, so that they treat the files as UTF-8.
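
The second step, finding files that contain non-ASCII bytes and are not already valid UTF-8, can be sketched in Java roughly as follows. This is only an illustration and not the tooling the proposal uses: the class name Utf8Audit and its methods are hypothetical, and the sketch assumes that every regular file under the given directory is a text file small enough to read into memory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of the JDK or of the proposal.
class Utf8Audit {

    /** Returns true if any byte lies outside the 7-bit ASCII range. */
    static boolean hasNonAscii(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) { // 0x80..0xFF are negative as signed Java bytes
                return true;
            }
        }
        return false;
    }

    /** Returns true if the bytes decode as UTF-8 without any error. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Collects files that contain non-ASCII bytes but are not valid UTF-8. */
    static List<Path> findSuspectFiles(Path root) throws IOException {
        List<Path> suspects = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths.filter(Files::isRegularFile)::iterator) {
                // Assumption: every regular file is text; a real scan would skip binaries.
                byte[] bytes = Files.readAllBytes(p);
                if (hasNonAscii(bytes) && !isValidUtf8(bytes)) {
                    suspects.add(p);
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        findSuspectFiles(root).forEach(System.out::println);
    }
}
```

Any file the sketch prints would then need converting to UTF-8 from whatever encoding it was actually written in, which has to be determined case by case. Pure-ASCII files need no conversion, since ASCII text is already valid UTF-8.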

Copyright © 2023 IDG Communications, Inc.
