java.lang.Object
   ↳ java.text.Collator
      ↳ java.text.RuleBasedCollator
A concrete implementation class for Collation.
RuleBasedCollator has the following restrictions for efficiency
(other subclasses may be used for more complex languages):
If a character is not found in the RuleBasedCollator, the
default Unicode Collation Algorithm (UCA) rule-based table is automatically
searched as a backup.
The collation table is composed of a list of collation rules, where each rule takes one of three forms:
<modifier>
<relation> <text-argument>
<reset> <text-argument>
The rule elements are defined as follows:
Unquoted whitespace in a text-argument is ignored; for example, b c is
treated as bc.
This sounds more complicated than it is in practice. For example, the following are equivalent ways of expressing the same thing:
a < b < c
a < b & b < c
a < c & a < b
Notice that the order is important, as the subsequent item goes immediately after the text-argument. The following are not equivalent:
a < b & a < c
a < c & a < b
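The effect of rule order can be checked directly. The following is a minimal sketch, assuming the reset behavior described above; the class name RuleOrderDemo and the helper build are hypothetical, and each rule string starts with a leading "<" relation.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RuleOrderDemo {
    // Hypothetical helper: builds a collator and converts the checked
    // ParseException into an unchecked exception.
    static RuleBasedCollator build(String rules) {
        try {
            return new RuleBasedCollator(rules);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Three spellings that all place b before c.
        String[] equivalent = {"< a < b < c", "< a < b & b < c", "< a < c & a < b"};
        for (String rules : equivalent) {
            System.out.println(rules + " : " + build(rules).compare("b", "c"));
        }
        // Not equivalent: c is inserted immediately after a, ahead of b.
        String different = "< a < b & a < c";
        System.out.println(different + " : " + build(different).compare("b", "c"));
    }
}
```

A negative result means "b" sorts before "c"; a positive result means the reverse.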
Either the text-argument must already be present in the sequence, or some
initial substring of the text-argument must be present. For example,
"a < b & ae < e" is valid since "a" is present in the sequence before
"ae" is reset. In this latter case, "ae" is not entered and treated as a
single character; instead, "e" is sorted as if it were expanded to two
characters: "a" followed by an "e". This difference appears in natural
languages: in traditional Spanish "ch" is treated as if it contracts to a
single character (expressed as "c < ch < d"), while in traditional
German a-umlaut is treated as if it expands to two characters (expressed as
"a,A < b,B ... & ae;ä & AE;Ä", where ä and Ä are a-umlaut, written in
source code as the escape sequences \u00E4 and \u00C4).
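A contraction such as the Spanish "ch" can be observed by comparing strings that would otherwise sort the other way. This is a minimal sketch; the class ContractionDemo, its helper, and the extra "z" in the rule string are illustrative assumptions, not part of the original example.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class ContractionDemo {
    // Hypothetical rule fragment: "ch" contracts to one element between c and d.
    static final String RULES = "< c < ch < d < z";

    // Compares two strings using the contraction rules above.
    static int compare(String left, String right) {
        try {
            return new RuleBasedCollator(RULES).compare(left, right);
        } catch (ParseException e) {
            throw new IllegalStateException("Rule string should be valid", e);
        }
    }

    public static void main(String[] args) {
        // "ch" is a single element that sorts after "c", so "cz" comes first.
        System.out.println(compare("cz", "ch"));
        // The contracted "ch" still sorts before "d".
        System.out.println(compare("ch", "d"));
    }
}
```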
For ignorable characters, the first rule must start with a relation (the
examples we have used above are really fragments; "a < b" really
should be "< a < b"). If, however, the first relation is not
"<", then all text-arguments up to the first "<" are
ignorable. For example, ", - < a < b" makes "-" an ignorable
character.
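The following sketch illustrates an ignorable character at primary strength. It assumes that the hyphen must be quoted as '-' in the rule string, in line with the restriction on unquoted punctuation listed under Errors; the class IgnorableDemo and its helper are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class IgnorableDemo {
    // Compares two strings at PRIMARY strength under the given rules.
    static int comparePrimary(String rules, String left, String right) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            collator.setStrength(Collator.PRIMARY);
            return collator.compare(left, right);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // The hyphen appears before the first "<", so it is ignorable.
        System.out.println(comparePrimary(", '-' < a < b", "a-b", "ab"));
        // Here the hyphen has its own primary position, so it is significant.
        System.out.println(comparePrimary("< '-' < a < b", "a-b", "ab"));
    }
}
```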
Normalization and Accents
RuleBasedCollator automatically processes its rule table to include
both pre-composed and combining-character versions of accented characters.
Even if the provided rule string contains only base characters and separate
combining accent characters, the pre-composed accented characters matching
all canonical combinations of characters from the rule string will be entered
in the table.
This allows you to use a RuleBasedCollator to compare accented strings even
when the collator is set to NO_DECOMPOSITION. However, if the strings to be
collated contain combining sequences that may not be in canonical order, you
should set the collator to CANONICAL_DECOMPOSITION to enable sorting of
combining sequences. For more information, see The Unicode Standard, Version 3.0.
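As a minimal sketch of the decomposition setting, the following compares a pre-composed accented character with its combining sequence. The class DecompositionDemo is hypothetical, and the use of the Locale.US collator is an assumption made for illustration.

```java
import java.text.Collator;
import java.util.Locale;

final class DecompositionDemo {
    // Compares two strings with canonical decomposition enabled.
    static int compareDecomposed(String left, String right) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        String precomposed = "caf\u00e9";   // e with acute as one code point
        String combining = "cafe\u0301";    // e followed by combining acute
        System.out.println(compareDecomposed(precomposed, combining));
    }
}
```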
Errors
The following rules are not valid:
A text-argument contains unquoted punctuation symbols, for example "a < b-c < d".
A relation or reset character is not followed by a text-argument, for example "a < , b".
A reset where the text-argument (or an initial substring of the text-argument) is not already in the sequence or allocated in the default UCA table, for example "a < b & e < f".
If you produce one of these errors, RuleBasedCollator throws a ParseException.
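Because the constructor reports rule errors through a checked ParseException, callers usually wrap construction in a try/catch. The sketch below shows one way to do that; the class RuleValidationDemo and the helper isValid are hypothetical names.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RuleValidationDemo {
    // Returns true if the rule string can be parsed into a collator.
    static boolean isValid(String rules) {
        try {
            new RuleBasedCollator(rules);
            return true;
        } catch (ParseException e) {
            // getErrorOffset() may help locate the problem in the rule string.
            System.err.println("Bad rules near offset " + e.getErrorOffset()
                    + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("< a < b < c < d"));   // well-formed
        System.out.println(isValid("< a < b-c < d"));     // unquoted punctuation
    }
}
```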
Examples
Normally, to create a rule-based collator object, you will use
Collator's factory method getInstance. However, to create a
rule-based collator object with specialized rules tailored to your needs, you
construct the RuleBasedCollator with the rules contained in a
String object. For example:
    String Simple = "< a < b < c < d";
    RuleBasedCollator mySimple = new RuleBasedCollator(Simple);
Or:
    String Norwegian = "< a,A< b,B< c,C< d,D< e,E< f,F< g,G< h,H< i,I"
        + "< j,J< k,K< l,L< m,M< n,N< o,O< p,P< q,Q< r,R"
        + "< s,S< t,T< u,U< v,V< w,W< x,X< y,Y< z,Z"
        // å and Å, each declared equal to its decomposed form (letter + U+030A)
        + "< \u00E5=a\u030A,\u00C5=A\u030A"
        + ";aa,AA< æ,Æ< ø,Ø";
    RuleBasedCollator myNorwegian = new RuleBasedCollator(Norwegian);
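Since a Collator is also a Comparator, a collator built from a rule string can be passed straight to a sort. This is a minimal sketch with a deliberately reversed, hypothetical rule string; the class CustomSortDemo and the helper sortWith are illustrative names.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CustomSortDemo {
    // Returns a sorted copy of words, ordered by the given collation rules.
    static List<String> sortWith(String rules, List<String> words) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            List<String> sorted = new ArrayList<>(words);
            sorted.sort(collator);
            return sorted;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // Reversed alphabet for the four letters covered by the rules.
        String reversed = "< d < c < b < a";
        System.out.println(sortWith(reversed, Arrays.asList("a", "b", "c", "d")));
    }
}
```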
Combining Collators is as simple as concatenating strings. Here is
an example that combines two Collators from two different locales:
    // Create an en_US Collator object
    RuleBasedCollator en_USCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("en", "US", ""));
    // Create a da_DK Collator object
    RuleBasedCollator da_DKCollator = (RuleBasedCollator)
        Collator.getInstance(new Locale("da", "DK", ""));
    // Combine the two collators
    // First, get the collation rules from en_USCollator
    String en_USRules = en_USCollator.getRules();
    // Second, get the collation rules from da_DKCollator
    String da_DKRules = da_DKCollator.getRules();
    RuleBasedCollator newCollator = new RuleBasedCollator(en_USRules + da_DKRules);
    // newCollator has the combined rules
The next example shows how to make changes to an existing table to create a new
Collator object. For example, add "& C < ch, cH, Ch, CH" to
the rules of the en_USCollator object to create your own:
    // Create a new Collator object with additional rules
    String addRules = "& C < ch, cH, Ch, CH";
    RuleBasedCollator myCollator =
        new RuleBasedCollator(en_USCollator.getRules() + addRules);
    // myCollator contains the new rules
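The effect of appending a rule to an existing collator's rules can be seen by comparing the same pair of strings before and after. This is a sketch under the assumption that the Locale.US collator is a RuleBasedCollator, as the cast in the examples implies; the class TailoringDemo and its methods are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class TailoringDemo {
    // The untailored en_US collator.
    static RuleBasedCollator base() {
        return (RuleBasedCollator) Collator.getInstance(Locale.US);
    }

    // The en_US rules with "ch" added as a unit that sorts after C.
    static RuleBasedCollator tailored() {
        String addRules = "& C < ch, cH, Ch, CH";
        try {
            return new RuleBasedCollator(base().getRules() + addRules);
        } catch (ParseException e) {
            throw new IllegalStateException("Tailored rules should parse", e);
        }
    }

    public static void main(String[] args) {
        // Untailored: "ch" sorts before "cz" because h precedes z.
        System.out.println(base().compare("cz", "ch"));
        // Tailored: "ch" is one element after c, so "cz" now comes first.
        System.out.println(tailored().compare("cz", "ch"));
    }
}
```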
The following example demonstrates how to change the order of non-spacing
accents:
    // old rule
    String oldRules = "= ¨ ; ¯ ; ¿"
        + "< a , A ; ae, AE ; æ , Æ"
        + "< b , B < c, C < e, E & C < d, D";
    // change the order of accent characters
    String addOn = "& ¿ ; ¯ ; ¨;";
    RuleBasedCollator myCollator = new RuleBasedCollator(oldRules + addOn);
The last example shows how to put new primary ordering in before the default
setting. For example, in the Japanese Collator, you can sort
English characters either before or after Japanese characters:
    // get en_US Collator rules
    RuleBasedCollator en_USCollator =
        (RuleBasedCollator) Collator.getInstance(Locale.US);
    // add a few Japanese characters to sort before English characters
    // suppose the last character before the first base letter 'a' in
    // the English collation rule is ア
    String jaString = "& ア , ー < ト";
    RuleBasedCollator myJapaneseCollator =
        new RuleBasedCollator(en_USCollator.getRules() + jaString);
Public Constructors
public RuleBasedCollator (String rules)
Since: API Level 1
Constructs a new instance of RuleBasedCollator using the specified rules.
The rules are usually either hand-written based on the class description
or the result of a former getRules() call.
Note that the rules are actually interpreted as a delta to the
standard Unicode Collation Algorithm (UCA). This differs
slightly from other implementations which work with full rules
specifications and may result in different behavior.
Public Methods
public Object clone ()
Since: API Level 1
Returns a new collator with the same collation rules, decomposition mode and
strength value as this collator.
Returns
a shallow copy of this collator.
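A short sketch of clone in use: the copy starts with the same settings, and changing the strength of the copy leaves the original untouched. The class CloneDemo and the rule string are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class CloneDemo {
    // Returns {original strength, copy strength} after changing only the copy.
    static int[] strengthsAfterChangingCopy() {
        try {
            RuleBasedCollator original = new RuleBasedCollator("< a < b < c");
            original.setStrength(Collator.TERTIARY);
            RuleBasedCollator copy = (RuleBasedCollator) original.clone();
            copy.setStrength(Collator.PRIMARY);
            return new int[] {original.getStrength(), copy.getStrength()};
        } catch (ParseException e) {
            throw new IllegalStateException("Rule string should be valid", e);
        }
    }

    public static void main(String[] args) {
        int[] strengths = strengthsAfterChangingCopy();
        System.out.println("original=" + strengths[0] + ", copy=" + strengths[1]);
    }
}
```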
public int compare (String source, String target)
Since: API Level 1
Compares the source text to the target text according to
the collation rules, strength and decomposition mode for this
RuleBasedCollator. See the Collator class description
for an example of use.
General recommendation: If comparisons are to be done with the same strings
multiple times, it is more efficient to generate CollationKey
objects for the strings and use CollationKey.compareTo(CollationKey)
for the comparisons. If each string is compared only once, using
RuleBasedCollator.compare(String, String) has better performance.
Returns
an integer which may be a negative value, zero, or else a
positive value depending on whether source is less than,
equivalent to, or greater than target.
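The recommendation to use CollationKey objects for repeated comparisons applies naturally to sorting, where each string is compared many times. The following is a minimal sketch; the class KeySortDemo is hypothetical, and the Locale.US collator is an assumption made for illustration.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class KeySortDemo {
    // Sorts words by precomputing one CollationKey per word.
    static List<String> sortByKeys(List<String> words) {
        Collator collator = Collator.getInstance(Locale.US);
        List<CollationKey> keys = new ArrayList<>();
        for (String word : words) {
            keys.add(collator.getCollationKey(word));
        }
        // CollationKey is Comparable, so the keys sort without the collator.
        Collections.sort(keys);
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortByKeys(Arrays.asList("cherry", "Banana", "apple")));
    }
}
```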
public boolean equals (Object obj)
Since: API Level 1
Compares the specified object with this RuleBasedCollator and
indicates if they are equal. In order to be equal, the object must be
an instance of Collator with the same collation rules and the
same attributes.
Returns
true if the specified object is equal to this
RuleBasedCollator; false otherwise.
public CollationElementIterator getCollationElementIterator (String source)
Since: API Level 1
Obtains a CollationElementIterator for the given string.
Returns
the CollationElementIterator for source.
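A CollationElementIterator makes the collation elements of a string visible, which is useful for seeing how a contraction is treated. This is a minimal sketch; the class ElementIteratorDemo, its helper, and the rule string are hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;

final class ElementIteratorDemo {
    // Returns the primary order of each collation element in text.
    static List<Integer> primaryOrders(String rules, String text) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            CollationElementIterator it = collator.getCollationElementIterator(text);
            List<Integer> orders = new ArrayList<>();
            for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
                orders.add(CollationElementIterator.primaryOrder(e));
            }
            return orders;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        String rules = "< c < ch < d";
        // Three characters, but "ch" contracts into a single element.
        System.out.println(primaryOrders(rules, "chd"));
        System.out.println(primaryOrders(rules, "cd"));
    }
}
```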
public CollationElementIterator getCollationElementIterator (CharacterIterator source)
Since: API Level 1
Obtains a CollationElementIterator for the given
CharacterIterator. The source iterator's integrity will be
preserved since a new copy will be created for use.
Returns
a CollationElementIterator for source.
public CollationKey getCollationKey (String source)
Since: API Level 1
Returns the CollationKey for the given source text.
Returns
the CollationKey for the given source text.
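Keys produced by a collator are expected to order strings the same way as a direct compare call on that collator. The sketch below checks this for a pair of strings; the class KeyConsistencyDemo is hypothetical, and the Locale.US collator is an assumption.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

final class KeyConsistencyDemo {
    // Returns true if comparing via keys agrees in sign with compare().
    static boolean keysAgree(String left, String right) {
        Collator collator = Collator.getInstance(Locale.US);
        CollationKey leftKey = collator.getCollationKey(left);
        CollationKey rightKey = collator.getCollationKey(right);
        int direct = collator.compare(left, right);
        int viaKeys = leftKey.compareTo(rightKey);
        return Integer.signum(direct) == Integer.signum(viaKeys);
    }

    public static void main(String[] args) {
        System.out.println(keysAgree("apple", "banana"));
        System.out.println(keysAgree("a", "A"));
    }
}
```

Keys from different collators are not meant to be compared with each other.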
public String getRules ()
Since: API Level 1
Returns the collation rules of this collator. These rules can be
fed into the RuleBasedCollator(String) constructor.
Note that the rules are actually interpreted as a delta to the
standard Unicode Collation Algorithm (UCA). Hence, an empty rules
string results in the default UCA rules being applied. This differs
slightly from other implementations which work with full rules
specifications and may result in different behavior.
Returns
the collation rules.
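As a minimal sketch of feeding getRules() back into the constructor, the following rebuilds a collator from another collator's rules and checks that a custom ordering survives. The class RulesRoundTripDemo and the reversed rule string are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class RulesRoundTripDemo {
    // Builds a collator, then rebuilds it from its own getRules() output
    // and compares two strings with the rebuilt collator.
    static int compareAfterRoundTrip(String rules, String left, String right) {
        try {
            RuleBasedCollator original = new RuleBasedCollator(rules);
            RuleBasedCollator rebuilt = new RuleBasedCollator(original.getRules());
            return rebuilt.compare(left, right);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        // The rules place b before a; the rebuilt collator should agree.
        System.out.println(compareAfterRoundTrip("< b < a", "b", "a"));
    }
}
```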
public int hashCode ()
Since: API Level 1
Returns an integer hash code for this object. By contract, any two
objects for which equals(Object) returns true must return
the same hash code value. This means that subclasses of Object
usually override both methods or neither method.
Note that hash values must not change over time unless information used in equals
comparisons also changes.
See Writing a correct hashCode method
if you intend implementing your own hashCode method.
Returns
this object's hash code.
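The equals and hashCode contract can be checked for two collators built from the same rule string. This is a sketch only; the class HashCodeDemo and the rule string are hypothetical.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

final class HashCodeDemo {
    // Returns true if two collators built from the same rules are equal
    // and report the same hash code.
    static boolean equalCollatorsShareHash(String rules) {
        try {
            RuleBasedCollator first = new RuleBasedCollator(rules);
            RuleBasedCollator second = new RuleBasedCollator(rules);
            return first.equals(second) && first.hashCode() == second.hashCode();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid rules: " + rules, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(equalCollatorsShareHash("< a < b < c"));
    }
}
```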