Package org.apache.sis.util
Class Characters.Filter
Object
  Character.Subset
    Characters.Filter

Enclosing class:
Characters

public static class Characters.Filter extends Character.Subset
Subsets of Unicode characters identified by their general category. The categories are identified by constants defined in the Character class, like LOWERCASE_LETTER, UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER and SPACE_SEPARATOR.

An instance of this class can be obtained from an enumeration of character types using the forTypes(byte...) method, or by using one of the constants predefined in this class. Unicode characters can then be tested for inclusion in the subset by calling the contains(int) method.

Relationship with international standards

ISO 19162:2015 §B.5.2 recommends ignoring spaces, case and the following characters when comparing two identified object names: “_” (underscore), “-” (minus sign), “/” (solidus), “(” (left parenthesis) and “)” (right parenthesis). The same specification also limits the set of valid characters in a name to the following (§6.3.1):

A-Z a-z 0-9 _ [ ] ( ) { } < = > . , : ; + - (space) % & ' " * ^ / \ ? | °

Note: SIS does not enforce this restriction in its programmatic API, but may perform some character substitutions at Well Known Text (WKT) formatting time.

If we take only the characters in the above list which are valid in a Unicode identifier and remove the characters that ISO 19162 recommends ignoring, the only characters left are letters and digits.

Since:
0.3
See Also:
Character.Subset, Character.getType(int), WKT 2 specification §B.5

Defined in the sis-utility module
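
The name comparison rule described above can be illustrated with a minimal sketch that uses only the JDK. It assumes that "ignoring" a character means dropping it before comparison; the class NameComparisonSketch and its methods are hypothetical names introduced for illustration and are not part of Apache SIS. With SIS on the classpath, the letter-or-digit test would presumably be delegated to LETTERS_AND_DIGITS.contains(int) instead.

```java
// Illustrative sketch only; not the Apache SIS implementation.
final class NameComparisonSketch {
    /** Keeps only letters and digits, lower-cased, iterating by code point. */
    static String simplify(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        name.codePoints()
            .filter(Character::isLetterOrDigit)   // drops spaces, "_", "-", "/", "(", ")"
            .map(Character::toLowerCase)          // ignores case
            .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Compares two names after removing the ignorable characters. */
    static boolean sameName(String a, String b) {
        return simplify(a).equals(simplify(b));
    }

    public static void main(String[] args) {
        System.out.println(sameName("WGS 84 / UTM zone 31N", "wgs84_utm-zone(31n)")); // true
        System.out.println(sameName("WGS 84", "WGS 72"));                             // false
    }
}
```

Note that this sketch drops every character that is not a letter or digit, which is slightly broader than the five punctuation characters listed by ISO 19162.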

Field Summary

Modifier and Type         Field               Description
static Characters.Filter  LETTERS_AND_DIGITS  The subset of all characters for which Character.isLetterOrDigit(int) returns true.
static Characters.Filter  UNICODE_IDENTIFIER  The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters.

Method Summary

Modifier and Type         Method                   Description
boolean                   contains(int codePoint)  Returns true if this subset contains the given Unicode character.
boolean                   containsType(int type)   Returns true if this subset contains the characters of the given type.
static Characters.Filter  forTypes(byte... types)  Returns a subset representing the union of all Unicode characters of the given types.

Methods inherited from class Character.Subset
equals, hashCode, toString

Field Detail

LETTERS_AND_DIGITS
public static final Characters.Filter LETTERS_AND_DIGITS
The subset of all characters for which Character.isLetterOrDigit(int) returns true. This subset includes the following general categories: Character.LOWERCASE_LETTER, UPPERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER and DECIMAL_DIGIT_NUMBER.
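
As a rough illustration of the category list above, the following JDK-only sketch tests a code point against those six general categories. The class LettersAndDigitsSketch is a hypothetical name, and the switch is an assumption about how such a test could be written, not a description of how SIS implements the constant.

```java
// Illustrative sketch only; not the Apache SIS implementation.
final class LettersAndDigitsSketch {
    /** Tests the general category of the code point against the six listed categories. */
    static boolean isLetterOrDigitByCategory(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER:
            case Character.UPPERCASE_LETTER:
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER:
            case Character.DECIMAL_DIGIT_NUMBER:
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Count the code points where the category test and the JDK method disagree.
        int mismatches = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (isLetterOrDigitByCategory(cp) != Character.isLetterOrDigit(cp)) {
                mismatches++;
            }
        }
        System.out.println("mismatches: " + mismatches);
    }
}
```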

UNICODE_IDENTIFIER
public static final Characters.Filter UNICODE_IDENTIFIER
The subset of all characters for which Character.isUnicodeIdentifierPart(int) returns true, excluding ignorable characters. This subset includes all the LETTERS_AND_DIGITS categories with the addition of the following ones: Character.LETTER_NUMBER, CONNECTOR_PUNCTUATION, NON_SPACING_MARK and COMBINING_SPACING_MARK.
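
The first sentence of this description can be approximated with two JDK methods, as in the sketch below. It assumes that "ignorable characters" refers to those for which Character.isIdentifierIgnorable(int) returns true; the class UnicodeIdentifierSketch is a hypothetical name. Because the JDK may accept some characters based on Unicode properties rather than general category, this test and the category list above need not agree on every code point.

```java
// Illustrative sketch only; not the Apache SIS implementation.
final class UnicodeIdentifierSketch {
    /** True for Unicode identifier parts that are not ignorable characters. */
    static boolean isNonIgnorableIdentifierPart(int codePoint) {
        return Character.isUnicodeIdentifierPart(codePoint)
            && !Character.isIdentifierIgnorable(codePoint);
    }

    public static void main(String[] args) {
        // Letter, digit, underscore, Roman numeral four, combining acute accent,
        // space, hyphen-minus and soft hyphen (a format character).
        int[] samples = {'a', '7', '_', 0x2163, 0x0301, ' ', '-', 0x00AD};
        for (int cp : samples) {
            System.out.printf("U+%04X type=%d accepted=%b%n",
                    cp, Character.getType(cp), isNonIgnorableIdentifierPart(cp));
        }
    }
}
```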

Method Detail

contains
public boolean contains(int codePoint)
Returns true if this subset contains the given Unicode character.
Parameters:
codePoint - the Unicode character, as a code point value.
Returns:
true if this subset contains the given character.

containsType
public final boolean containsType(int type)
Returns true if this subset contains the characters of the given type. The given type shall be one of the Character constants like LOWERCASE_LETTER, UPPERCASE_LETTER, DECIMAL_DIGIT_NUMBER or SPACE_SEPARATOR.
Parameters:
type - one of the Character constants.
Returns:
true if this subset contains the characters of the given type.
See Also:
Character.getType(int)

forTypes
public static Characters.Filter forTypes(byte... types)
Returns a subset representing the union of all Unicode characters of the given types.
Parameters:
types - the character types, as Character constants.
Returns:
the subset of Unicode characters of the given types.
See Also:
Character.LOWERCASE_LETTER, Character.UPPERCASE_LETTER, Character.DECIMAL_DIGIT_NUMBER, Character.SPACE_SEPARATOR
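
To make the relationship between forTypes, containsType and contains concrete, here is a minimal JDK-only sketch of a category-based subset. It assumes the set of types is stored as a bit mask indexed by the Character type constants; that storage choice, the class name TypeFilterSketch and the argument check are all assumptions made for illustration, and the actual SIS implementation may differ.

```java
// Illustrative sketch only; not the Apache SIS implementation.
final class TypeFilterSketch {
    /** Bit n is set when the general category with value n belongs to the subset. */
    private final long mask;

    private TypeFilterSketch(long mask) {
        this.mask = mask;
    }

    /** Builds the union of the given general categories. */
    static TypeFilterSketch forTypes(byte... types) {
        long mask = 0;
        for (byte type : types) {
            if (type < 0) {
                throw new IllegalArgumentException("Not a Character type constant: " + type);
            }
            mask |= 1L << type;
        }
        return new TypeFilterSketch(mask);
    }

    /** Tests whether the given general category is part of the subset. */
    boolean containsType(int type) {
        return type >= 0 && type < Long.SIZE && (mask & (1L << type)) != 0;
    }

    /** Tests a code point by looking up its general category. */
    boolean contains(int codePoint) {
        return containsType(Character.getType(codePoint));
    }

    public static void main(String[] args) {
        TypeFilterSketch letters = forTypes(Character.LOWERCASE_LETTER, Character.UPPERCASE_LETTER);
        System.out.println(letters.contains('a'));                            // true
        System.out.println(letters.contains('1'));                            // false
        System.out.println(letters.containsType(Character.SPACE_SEPARATOR));  // false
    }
}
```

With a mask, the union of types is a bitwise OR and each membership test is a single shift and AND, which is one reason this representation is a natural fit for a small, fixed set of category constants.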