Java基础:数据容器源码分析之BitSet      
View on GitHub

龙珠

修炼自己与发现世界

Java基础:数据容器源码分析之BitSet

By arthur503 -- 15 Oct 2013

之前在写布隆过滤器的时候,就想要一个管理位的class,当时是自己写的对位的处理。现在看到BitSet,对bit的操作实现的更舒服。

一、默认参数

BitSet中存储位的值所使用的word是long变量,也就是说,一个word有64个位。BitSet使用long数组存储。BitSet中的默认值如下:

/**
 * The internal field corresponding to the serialField "bits".
 */
private long[] words;

/*
 * BitSets are packed into arrays of "words."  Currently a word is
 * a long, which consists of 64 bits, requiring 6 address bits.
 * The choice of word size is determined purely by performance concerns.
 */
private final static int ADDRESS_BITS_PER_WORD = 6;
private final static int BITS_PER_WORD = 1 << ADDRESS_BITS_PER_WORD;
private final static int BIT_INDEX_MASK = BITS_PER_WORD - 1;

由于long存储64位值,因此每个word需要寻址位数为6(2^6 = 64)。

二、构造方法

BitSet构造方法有两种,空参数构建和int参数构建。空参数构建时默认为64个bit,有参数时按照参数设定的bit位数构建。代码如下:

public BitSet() {
    initWords(BITS_PER_WORD);
    sizeIsSticky = false;
}

public BitSet(int nbits) {
    // nbits can't be negative; size 0 is OK
    if (nbits < 0)
        throw new NegativeArraySizeException("nbits < 0: " + nbits);

    initWords(nbits);
    sizeIsSticky = true;
}

private void initWords(int nbits) {
    words = new long[wordIndex(nbits-1) + 1];
}

我们可以看到,BitSet的初始化参数可以为0,但不能为负数。若为空,默认初始化为BITS_PER_WORD=64位。

三、位操作

BitSet中,对位的操作有两类,包括:取值和设值。

1. 取值

取值的方法为get(),代码如下:

/**
 * Returns the value of the bit with the specified index. The value
 * is {@code true} if the bit with the index {@code bitIndex}
 * is currently set in this {@code BitSet}; otherwise, the result
 * is {@code false}.
 *
 * @param  bitIndex   the bit index
 * @return the value of the bit with the specified index
 * @throws IndexOutOfBoundsException if the specified index is negative
 */
public boolean get(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    checkInvariants();

    int wordIndex = wordIndex(bitIndex);
    return (wordIndex < wordsInUse)
        && ((words[wordIndex] & (1L << bitIndex)) != 0);
}

可以看到,get()方法中先通过wordIndex()方法确定bitIndex的位置,之后返回对应位置的bit值。如果bitIndex超过了最大值,则返回false。

2. 设值

设值操作包括三种:设定为true、设定为false、设定为相反值。

set()默认设定bitIndex为true,代码如下:

public void set(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    int wordIndex = wordIndex(bitIndex);
    expandTo(wordIndex);

    words[wordIndex] |= (1L << bitIndex); // Restores invariants

    checkInvariants();
}

clear()方法为设定bitIndex为false,代码如下:

public void clear(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    int wordIndex = wordIndex(bitIndex);
    if (wordIndex >= wordsInUse)
        return;

    words[wordIndex] &= ~(1L << bitIndex);

    recalculateWordsInUse();
    checkInvariants();
}

此外,set(int bitIndex, boolean value)还可以根据用户的参数设定bitIndex为true或false。

public void set(int bitIndex, boolean value) {
    if (value)
        set(bitIndex);
    else
        clear(bitIndex);
}

flip()方法为设定bitIndex为对应位的相反值,可以使用对应位与1异或的方法,代码如下:

/**
 * Sets the bit at the specified index to the complement of its
 * current value.
 *
 * @param  bitIndex the index of the bit to flip
 * @throws IndexOutOfBoundsException if the specified index is negative
 * @since  1.4
 */
public void flip(int bitIndex) {
    if (bitIndex < 0)
        throw new IndexOutOfBoundsException("bitIndex < 0: " + bitIndex);

    int wordIndex = wordIndex(bitIndex);
    expandTo(wordIndex);

    words[wordIndex] ^= (1L << bitIndex);

    recalculateWordsInUse();
    checkInvariants();
}
四、位操作寻址和数组的扩展

之前代码中已经看到,若要对位操作,先需要确定某位所在的word的wordIndex,方法如下:

/**
 * Given a bit index, return word index containing it.
 */
private static int wordIndex(int bitIndex) {
    return bitIndex >> ADDRESS_BITS_PER_WORD;
}

这个操作基本每个方法都会用到。如果用户输入的bitIndex超过了总大小,BitSet不是返回错误,而是会通过expandTo()方法自动扩充为原来的2倍!代码如下:

/**
 * Ensures that the BitSet can accommodate a given wordIndex,
 * temporarily violating the invariants.  The caller must
 * restore the invariants before returning to the user,
 * possibly using recalculateWordsInUse().
 * @param wordIndex the index to be accommodated.
 */
private void expandTo(int wordIndex) {
    int wordsRequired = wordIndex+1;
    if (wordsInUse < wordsRequired) {
        ensureCapacity(wordsRequired);
        wordsInUse = wordsRequired;
    }
}

/**
 * Ensures that the BitSet can hold enough words.
 * @param wordsRequired the minimum acceptable number of words.
 */
private void ensureCapacity(int wordsRequired) {
    if (words.length < wordsRequired) {
        // Allocate larger of doubled size or required size
        int request = Math.max(2 * words.length, wordsRequired);
        words = Arrays.copyOf(words, request);
        sizeIsSticky = false;
    }
}

根据BitSet,我们可以写bitmap或布隆过滤器一类的算法实现。

参考资料: