Class CodePageUtil


  • public class CodePageUtil
    extends java.lang.Object
    Utilities for working with Microsoft CodePages.

    Provides constants for understanding numeric codepages, along with utilities to translate these into Java Character Sets.

    • Constructor Summary

      Constructors 
      Constructor Description
      CodePageUtil()  
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static java.lang.String codepageToEncoding(int codepage)
      Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format).
      static java.lang.String codepageToEncoding(int codepage, boolean javaLangFormat)
      Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming.
      static java.lang.String cp950ToString(byte[] data, int offset, int lengthInBytes)
      Attempts to convert a little-endian byte array in cp950 (Microsoft's dialect of Big5) to a String.
      static byte[] getBytesInCodePage(java.lang.String string, int codepage)
      Converts a string into bytes, using the character encoding equivalent to the supplied codepage number.
      static java.lang.String getStringFromCodePage(byte[] string, int codepage)
      Converts the bytes into a String, using the character encoding equivalent to the supplied codepage number.
      static java.lang.String getStringFromCodePage(byte[] string, int offset, int length, int codepage)
      Converts the bytes into a String, using the character encoding equivalent to the supplied codepage number.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • DOUBLE_BYTE_CHARSETS

        public static final java.util.Set<java.nio.charset.Charset> DOUBLE_BYTE_CHARSETS
      • CP_UTF16_BE

        public static final int CP_UTF16_BE

        Codepage for UTF-16 big-endian

        See Also:
        Constant Field Values
      • CP_WINDOWS_1250

        public static final int CP_WINDOWS_1250

        Codepage for Windows 1250

        See Also:
        Constant Field Values
      • CP_WINDOWS_1251

        public static final int CP_WINDOWS_1251

        Codepage for Windows 1251

        See Also:
        Constant Field Values
      • CP_WINDOWS_1252

        public static final int CP_WINDOWS_1252

        Codepage for Windows 1252

        See Also:
        Constant Field Values
      • CP_WINDOWS_1253

        public static final int CP_WINDOWS_1253

        Codepage for Windows 1253

        See Also:
        Constant Field Values
      • CP_WINDOWS_1254

        public static final int CP_WINDOWS_1254

        Codepage for Windows 1254

        See Also:
        Constant Field Values
      • CP_WINDOWS_1255

        public static final int CP_WINDOWS_1255

        Codepage for Windows 1255

        See Also:
        Constant Field Values
      • CP_WINDOWS_1256

        public static final int CP_WINDOWS_1256

        Codepage for Windows 1256

        See Also:
        Constant Field Values
      • CP_WINDOWS_1257

        public static final int CP_WINDOWS_1257

        Codepage for Windows 1257

        See Also:
        Constant Field Values
      • CP_WINDOWS_1258

        public static final int CP_WINDOWS_1258

        Codepage for Windows 1258

        See Also:
        Constant Field Values
      • CP_MAC_ROMAN

        public static final int CP_MAC_ROMAN

        Codepage for Macintosh Roman (Java: MacRoman)

        See Also:
        Constant Field Values
      • CP_MAC_JAPAN

        public static final int CP_MAC_JAPAN

        Codepage for Macintosh Japan (Java: unknown - use SJIS, cp942 or cp943)

        See Also:
        Constant Field Values
      • CP_MAC_CHINESE_TRADITIONAL

        public static final int CP_MAC_CHINESE_TRADITIONAL

        Codepage for Macintosh Chinese Traditional (Java: unknown - use Big5, MS950, or cp937)

        See Also:
        Constant Field Values
      • CP_MAC_KOREAN

        public static final int CP_MAC_KOREAN

        Codepage for Macintosh Korean (Java: unknown - use EUC_KR or cp949)

        See Also:
        Constant Field Values
      • CP_MAC_ARABIC

        public static final int CP_MAC_ARABIC

        Codepage for Macintosh Arabic (Java: MacArabic)

        See Also:
        Constant Field Values
      • CP_MAC_HEBREW

        public static final int CP_MAC_HEBREW

        Codepage for Macintosh Hebrew (Java: MacHebrew)

        See Also:
        Constant Field Values
      • CP_MAC_GREEK

        public static final int CP_MAC_GREEK

        Codepage for Macintosh Greek (Java: MacGreek)

        See Also:
        Constant Field Values
      • CP_MAC_CYRILLIC

        public static final int CP_MAC_CYRILLIC

        Codepage for Macintosh Cyrillic (Java: MacCyrillic)

        See Also:
        Constant Field Values
      • CP_MAC_CHINESE_SIMPLE

        public static final int CP_MAC_CHINESE_SIMPLE

        Codepage for Macintosh Chinese Simplified (Java: unknown - use EUC_CN, ISO2022_CN_GB, MS936 or cp935)

        See Also:
        Constant Field Values
      • CP_MAC_ROMANIA

        public static final int CP_MAC_ROMANIA

        Codepage for Macintosh Romanian (Java: MacRomania)

        See Also:
        Constant Field Values
      • CP_MAC_UKRAINE

        public static final int CP_MAC_UKRAINE

        Codepage for Macintosh Ukrainian (Java: MacUkraine)

        See Also:
        Constant Field Values
      • CP_MAC_THAI

        public static final int CP_MAC_THAI

        Codepage for Macintosh Thai (Java: MacThai)

        See Also:
        Constant Field Values
      • CP_MAC_CENTRAL_EUROPE

        public static final int CP_MAC_CENTRAL_EUROPE

        Codepage for Macintosh Central Europe (Latin-2) (Java: MacCentralEurope)

        See Also:
        Constant Field Values
      • CP_MAC_ICELAND

        public static final int CP_MAC_ICELAND

        Codepage for Macintosh Iceland (Java: MacIceland)

        See Also:
        Constant Field Values
      • CP_MAC_TURKISH

        public static final int CP_MAC_TURKISH

        Codepage for Macintosh Turkish (Java: MacTurkish)

        See Also:
        Constant Field Values
      • CP_MAC_CROATIAN

        public static final int CP_MAC_CROATIAN

        Codepage for Macintosh Croatian (Java: MacCroatian)

        See Also:
        Constant Field Values
      • CP_ISO_8859_1

        public static final int CP_ISO_8859_1

        Codepage for ISO-8859-1

        See Also:
        Constant Field Values
      • CP_ISO_8859_2

        public static final int CP_ISO_8859_2

        Codepage for ISO-8859-2

        See Also:
        Constant Field Values
      • CP_ISO_8859_3

        public static final int CP_ISO_8859_3

        Codepage for ISO-8859-3

        See Also:
        Constant Field Values
      • CP_ISO_8859_4

        public static final int CP_ISO_8859_4

        Codepage for ISO-8859-4

        See Also:
        Constant Field Values
      • CP_ISO_8859_5

        public static final int CP_ISO_8859_5

        Codepage for ISO-8859-5

        See Also:
        Constant Field Values
      • CP_ISO_8859_6

        public static final int CP_ISO_8859_6

        Codepage for ISO-8859-6

        See Also:
        Constant Field Values
      • CP_ISO_8859_7

        public static final int CP_ISO_8859_7

        Codepage for ISO-8859-7

        See Also:
        Constant Field Values
      • CP_ISO_8859_8

        public static final int CP_ISO_8859_8

        Codepage for ISO-8859-8

        See Also:
        Constant Field Values
      • CP_ISO_8859_9

        public static final int CP_ISO_8859_9

        Codepage for ISO-8859-9

        See Also:
        Constant Field Values
      • CP_ISO_2022_JP1

        public static final int CP_ISO_2022_JP1

        Codepage for ISO-2022-JP

        See Also:
        Constant Field Values
      • CP_ISO_2022_JP2

        public static final int CP_ISO_2022_JP2

        Another codepage for ISO-2022-JP

        See Also:
        Constant Field Values
      • CP_ISO_2022_JP3

        public static final int CP_ISO_2022_JP3

        Yet another codepage for ISO-2022-JP

        See Also:
        Constant Field Values
      • CP_ISO_2022_KR

        public static final int CP_ISO_2022_KR

        Codepage for ISO-2022-KR

        See Also:
        Constant Field Values
      • CP_US_ASCII2

        public static final int CP_US_ASCII2

        Another codepage for US-ASCII

        See Also:
        Constant Field Values
    • Constructor Detail

      • CodePageUtil

        public CodePageUtil()
    • Method Detail

      • getBytesInCodePage

        public static byte[] getBytesInCodePage(java.lang.String string,
                                                int codepage)
                                         throws java.io.UnsupportedEncodingException
        Converts a string into bytes, using the character encoding equivalent to the supplied codepage number.
        Parameters:
        string - The string to convert
        codepage - The codepage number
        Throws:
        java.io.UnsupportedEncodingException
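        The conversion described above can be sketched with the JDK alone. This is a minimal illustration, not the CodePageUtil implementation: the class CodePageEncodeSketch and its methods are hypothetical, and the mapping is assumed to cover only the Windows 125x family, named "windows-" followed by the number.

```java
import java.io.UnsupportedEncodingException;

/** Hypothetical stand-in for illustration; not the CodePageUtil implementation. */
final class CodePageEncodeSketch {

    /** Assumed mapping: only the Windows 125x family, named "windows-<number>". */
    static String encodingNameFor(int codepage) throws UnsupportedEncodingException {
        if (codepage >= 1250 && codepage <= 1258) {
            return "windows-" + codepage;
        }
        throw new UnsupportedEncodingException("Codepage not covered by this sketch: " + codepage);
    }

    /** Encodes the string with the charset that corresponds to the codepage number. */
    static byte[] encode(String string, int codepage) throws UnsupportedEncodingException {
        return string.getBytes(encodingNameFor(codepage));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+00E9 (e with acute accent) occupies a single byte in Windows 1252.
        byte[] bytes = encode("caf\u00e9", 1252);
        System.out.println(bytes.length + " bytes");
    }
}
```

        A caller of the real method would pass one of the CP_ constants instead of a literal number and would handle UnsupportedEncodingException in the same way.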
      • getStringFromCodePage

        public static java.lang.String getStringFromCodePage(byte[] string,
                                                             int codepage)
                                                      throws java.io.UnsupportedEncodingException
        Converts the bytes into a String, using the character encoding equivalent to the supplied codepage number.
        Parameters:
        string - The bytes of the string to convert
        codepage - The codepage number
        Throws:
        java.io.UnsupportedEncodingException
      • getStringFromCodePage

        public static java.lang.String getStringFromCodePage(byte[] string,
                                                             int offset,
                                                             int length,
                                                             int codepage)
                                                      throws java.io.UnsupportedEncodingException
        Converts the bytes into a String, using the character encoding equivalent to the supplied codepage number.
        Parameters:
        string - The bytes of the string to convert
        offset - The index of the first byte to convert
        length - The number of bytes to convert
        codepage - The codepage number
        Throws:
        java.io.UnsupportedEncodingException
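        Decoding a slice of a larger buffer can be sketched as follows. This is an illustration using only the JDK, not the CodePageUtil implementation: the class CodePageDecodeSketch is hypothetical, and the sketch assumes that offset and length select a byte range and that only the Windows 125x codepages are handled.

```java
import java.io.UnsupportedEncodingException;

/** Hypothetical stand-in for illustration; not the CodePageUtil implementation. */
final class CodePageDecodeSketch {

    /** Decodes length bytes starting at offset, assuming a Windows 125x codepage. */
    static String decode(byte[] bytes, int offset, int length, int codepage)
            throws UnsupportedEncodingException {
        if (codepage < 1250 || codepage > 1258) {
            throw new UnsupportedEncodingException("Codepage not covered by this sketch: " + codepage);
        }
        return new String(bytes, offset, length, "windows-" + codepage);
    }

    /** Convenience overload that decodes the whole array. */
    static String decode(byte[] bytes, int codepage) throws UnsupportedEncodingException {
        return decode(bytes, 0, bytes.length, codepage);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Two Windows 1251 bytes surrounded by unrelated bytes in a record buffer.
        byte[] record = {0x00, 0x00, (byte) 0xCF, (byte) 0xF0, 0x00};
        System.out.println(decode(record, 2, 2, 1251).length() + " characters");
    }
}
```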
      • codepageToEncoding

        public static java.lang.String codepageToEncoding(int codepage)
                                                   throws java.io.UnsupportedEncodingException

        Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format).

        Parameters:
        codepage - The codepage number
        Returns:
        The character encoding's name. If the codepage number is 65001, the encoding name is "UTF-8". All other positive numbers are mapped to their Java NIO names, normally either "windows-" followed by the number (e.g. "windows-1251") or "cp" followed by the number (e.g. "cp1252" for codepage 1252).
        Throws:
        java.io.UnsupportedEncodingException - if the specified codepage is less than zero.
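        The naming rule in the Returns text can be approximated with the JDK's own charset registry. The sketch below is one reading of that description, not the library's code: the class CodePageNameSketch and the constant CP_UTF8 are hypothetical, and trying "windows-" before "cp" is an assumption. It returns the JDK's canonical name for whichever spelling the JDK recognises, so codepages the JDK does not know are rejected here, which the real method may not do.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

/** Hypothetical stand-in for illustration; not the CodePageUtil implementation. */
final class CodePageNameSketch {

    /** The UTF-8 codepage number mentioned in the description above. */
    static final int CP_UTF8 = 65001;

    /** Maps a codepage number to the canonical NIO name of a matching JDK charset. */
    static String toNioName(int codepage) throws UnsupportedEncodingException {
        if (codepage < 0) {
            throw new UnsupportedEncodingException("Codepage number may not be " + codepage);
        }
        if (codepage == CP_UTF8) {
            return "UTF-8";
        }
        // Assumption: try the "windows-" spelling first, then the "cp" spelling.
        for (String candidate : new String[] {"windows-" + codepage, "cp" + codepage}) {
            if (Charset.isSupported(candidate)) {
                return Charset.forName(candidate).name();
            }
        }
        throw new UnsupportedEncodingException("No JDK charset found for codepage " + codepage);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(toNioName(1251));
        System.out.println(toNioName(CP_UTF8));
    }
}
```

        Because the JDK treats "cp1252" as an alias of "windows-1252", either spelling can be passed to Charset.forName and resolves to the same charset.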
      • codepageToEncoding

        public static java.lang.String codepageToEncoding(int codepage,
                                                          boolean javaLangFormat)
                                                   throws java.io.UnsupportedEncodingException

        Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming.

        Parameters:
        codepage - The codepage number
        javaLangFormat - true to use Java Lang naming, false to use Java NIO naming
        Returns:
        The character encoding's name, in either Java Lang format (e.g. Cp1251, ISO8859_5) or Java NIO format (e.g. windows-1252, ISO-8859-9)
        Throws:
        java.io.UnsupportedEncodingException - if the specified codepage is less than zero.
        See Also:
        Supported Encodings
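        The difference between the two naming schemes can be observed with the JDK alone. In the sketch below, the class CharsetNamingSketch and its methods are hypothetical helpers and do not call CodePageUtil. It relies on two documented JDK behaviours: Charset.name() returns the NIO canonical name, and InputStreamReader.getEncoding() returns the historical (Java Lang) name when the charset has one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical helpers for illustration; they do not call CodePageUtil. */
final class CharsetNamingSketch {

    /** Returns the Java NIO canonical name for any name or alias the JDK knows. */
    static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    /** Returns the historical (Java Lang) name, read from a throwaway reader. */
    static String langName(String anyName) {
        Charset charset = Charset.forName(anyName);
        try (InputStreamReader reader =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset)) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(nioName("Cp1251") + " / " + langName("windows-1251"));
        System.out.println(nioName("ISO8859_5") + " / " + langName("ISO-8859-5"));
    }
}
```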
      • cp950ToString

        public static java.lang.String cp950ToString(byte[] data,
                                                     int offset,
                                                     int lengthInBytes)
        Attempts to convert a little-endian byte array in cp950 (Microsoft's dialect of Big5) to a String. Microsoft zero-pads ASCII characters in this format, and those padding bytes are dropped. There may be room for improvement in this conversion.
        Parameters:
        data - The byte array to decode
        offset - The index of the first byte to decode
        lengthInBytes - The number of bytes to decode
        Returns:
        Decoded String
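        One possible reading of this description is sketched below. It rests on assumptions the documentation does not spell out: each character is taken to occupy two bytes with the low (trail) byte first, a zero high byte is taken to mark a zero-padded ASCII character, and the JDK's "Big5" charset stands in for Microsoft's cp950 dialect. The class Cp950LittleEndianSketch is hypothetical and is not the library's implementation.

```java
import java.nio.charset.Charset;

/** Hypothetical sketch of one reading of the description; not the library's code. */
final class Cp950LittleEndianSketch {

    static String decode(byte[] data, int offset, int lengthInBytes) {
        // Assumption: the JDK's Big5 charset stands in for Microsoft's cp950.
        Charset big5 = Charset.forName("Big5");
        StringBuilder sb = new StringBuilder();
        int end = offset + lengthInBytes;
        // Walk the range in two-byte units; a trailing odd byte is ignored.
        for (int i = offset; i + 1 < end; i += 2) {
            byte low = data[i];
            byte high = data[i + 1];
            if (high == 0) {
                // Zero-padded ASCII: keep the character, drop the padding byte.
                sb.append((char) (low & 0xFF));
            } else {
                // Swap to lead-byte-first order before handing the pair to Big5.
                sb.append(new String(new byte[] {high, low}, big5));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Encode U+6587 with Big5, then store it low byte first, followed by padded 'A'.
        byte[] be = "\u6587".getBytes(Charset.forName("Big5"));
        byte[] data = {be[1], be[0], 'A', 0};
        System.out.println(decode(data, 0, data.length).length() + " characters");
    }
}
```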