HashMap源码解析(JDK11)
概览
- HashMap 根据 key 的 hashCode 值进行储存数据,大多数情况喜爱可以直接定位到他的值,因此有很快的访问速度
- 遍历的顺序不确定,并且不能保证顺序不改变
- 扩容是一个特别耗性能的操作,所以 在使用时可以先给一个大致的容量
- 允许一条记录的 key 为 null,且允许多条记录的 value 为 null
- 线程不安全,即同一时刻有多个线程同时写 HashMap,对HashMap 造成结构性变化(A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that an instance already contains is not a structural modification.),可能会导致数据不一致,此时可以使用一下方法获得同步方法:Map m = Collections.synchronizedMap(new HashMap(…));或者使用 ConcurrentHashMap
- 提供了耗时 constant-time 的基本操作(get, put)
- Iteration over collection views requires time proportional to the “capacity” of the
HashMap
instance (the number of buckets) plus its size (the number of key-value mappings). Thus, it’s very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important. - As a general rule, the default load factor (0.75) offers a good tradeoff between time and space costs.
- Note that using many keys with the same
hashCode()
is a sure way to slow down performance of any hash table. To ameliorate impact, when keys areComparable
, this class may use comparison order among keys to help break ties.
类名
public class HashMap<K,V> extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {
参数
private static final long serialVersionUID = 362498820763181265L; /** * The default initial capacity - MUST be a power of two. */ static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16 /** * The maximum capacity, used if a higher value is implicitly specified * by either of the constructors with arguments. * MUST be a power of two <= 1<<30. */ static final int MAXIMUM_CAPACITY = 1 << 30; /** * The load factor used when none specified in constructor. */ static final float DEFAULT_LOAD_FACTOR = 0.75f; /** * The bin count threshold for using a tree rather than list for a * bin. Bins are converted to trees when adding an element to a * bin with at least this many nodes. The value must be greater * than 2 and should be at least 8 to mesh with assumptions in * tree removal about conversion back to plain bins upon * shrinkage. */ static final int TREEIFY_THRESHOLD = 8; /** * The bin count threshold for untreeifying a (split) bin during a * resize operation. Should be less than TREEIFY_THRESHOLD, and at * most 6 to mesh with shrinkage detection under removal. */ static final int UNTREEIFY_THRESHOLD = 6; /** * The smallest table capacity for which bins may be treeified. * (Otherwise the table is resized if too many nodes in a bin.) * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts * between resizing and treeification thresholds. */ static final int MIN_TREEIFY_CAPACITY = 64;
MAXIMUM_CAPACITY
MAXIMUM_CAPACITY
表示 table 的最大容量,它是 int 类型,因此最大为 1 << 30
(最高位为符号位)
为什么 Map 的容量要限制为2的整数次方
Map 的容量作出这样的限制主要是为了快速定位到数组的位置,并且使数据更加分散,减少碰撞,查看 getNode 源码我们可以发现 :
final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) { if (first.hash == hash && // always check first node ((k = first.key) == key || (key != null && key.equals(k)))) return first; if ((e = first.next) != null) { if (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreeNode(hash, key); do { if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null); } } return null; }
first = tab[(n - 1) & hash]
这行代码的作用是通过hash值取出数组中的值,其中 tab 为数组,n 为数组长度,hash为key 的hash值,在默认情况下,n = 16,则n – 1 = 15,二进制为 1111, 这样进行 & hash 操作最终结果还是 hash, 假设n = 15 ,n – 1 = 14(1110), 这样会造成某些位置(如0001, 0011, 0101…)无法存放元素,空间浪费很大,并且数组中可以使用的位置比数组长度小了很多,这意味着进一步增加了碰撞的几率,减慢了查询的效率。因此数组长度使用 2 的整数次方,可以是数据分布得更分散,减少碰撞。
DEFAULT_LOAD_FACTOR
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution
翻译:
treeNodes 大约是 普通 nodes 两倍,因此我们只在达到 TREEIFY_THRESHOLD(8)时会使用, 并且当 treeNodes 数量变少达到 UNTREEIFY_THRESHOLD(6)时,会将树转变为正常 node。使用hashcode使得节点散列正常,红黑树在平时很少用到。理论上,在随机的 hashcode 情况下,节点出现的概率遵从 Poisson 分布
默认的装载因子为 0.75,这是因为bin(指数组中的一个桶) 中节点出现的概率遵循Poisson分布(泊松分布),此时load factor = 0.75, λ=0.5 $$ P(X=k) =\frac{\lambda^ke^{-\lambda}}{k!} $$ 假如在长度为 16 的数组中放入0.75 * 16 = 12个数据时,数组中某个下标放入 k 个数据(即数组后面链表中的数据量)的概率如下
存储数据量 | 概率 |
---|---|
0 | 0.60653066 |
1 | 0.30326533 |
2 | 0.07581633 |
3 | 0.01263606 |
4 | 0.00157952 |
5 | 0.00015795 |
6 | 0.00001316 |
7 | 0.00000094 |
8 | 0.00000006 |
我们可以看到,一个下标中存在8个nodes的概率是很低的,因此 此时将链表转为 treeNode是合乎情理的,并且性价比较高 ,可以通过增加有限的空间消耗换取时间消耗上的减少。
域
/** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) */ transient Node<K,V>[] table; /** * Holds cached entrySet(). Note that AbstractMap fields are used * for keySet() and values(). */ transient Set<Map.Entry<K,V>> entrySet; /** * The number of key-value mappings contained in this map. */ transient int size; /** * The number of times this HashMap has been structurally modified * Structural modifications are those that change the number of mappings in * the HashMap or otherwise modify its internal structure (e.g., * rehash). This field is used to make iterators on Collection-views of * the HashMap fail-fast. (See ConcurrentModificationException). */ transient int modCount; /** * The next size value at which to resize (capacity * load factor). * * @serial */ // (The javadoc description is true upon serialization. // Additionally, if the table array has not been allocated, this // field holds the initial array capacity, or zero signifying // DEFAULT_INITIAL_CAPACITY.) int threshold; /** * The load factor for the hash table. * * @serial */ final float loadFactor;
threshold
???
构造函数
/** * Constructs an empty {@code HashMap} with the default initial capacity * (16) and the default load factor (0.75). */ public HashMap() { this.loadFactor = DEFAULT_LOAD_FACTOR; // all other fields defaulted } public HashMap(int initialCapacity) { this(initialCapacity, DEFAULT_LOAD_FACTOR); } public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); this.loadFactor = loadFactor; this.threshold = tableSizeFor(initialCapacity); } static final int tableSizeFor(int cap) { int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1); return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; } public HashMap(Map<? extends K, ? extends V> m) { this.loadFactor = DEFAULT_LOAD_FACTOR; putMapEntries(m, false); } final void putMapEntries(Map<? extends K, ? extends V> m, boolean evict) { int s = m.size(); if (s > 0) { if (table == null) { // pre-size float ft = ((float)s / loadFactor) + 1.0F; int t = ((ft < (float)MAXIMUM_CAPACITY) ? (int)ft : MAXIMUM_CAPACITY); if (t > threshold) threshold = tableSizeFor(t); } else if (s > threshold) resize(); for (Map.Entry<? extends K, ? extends V> e : m.entrySet()) { K key = e.getKey(); V value = e.getValue(); putVal(hash(key), key, value, false, evict); } } }
通过四个构造函数我们可以看到,其主要工作为初始化这些参数:loadFactor
, threshold
,并没有实际初始化链表和数组,这样可以节省空间。
在 JDK8 中, threshold 的初始化方法:
/** * Returns a power of two size for the given target capacity. */ static final int tableSizeFor(int cap) { int n = cap - 1; n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16; return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; }
JDK11 中调用了 Integer.numberOfLeadingZeros()
方法,该方法主要作用:指定 int 值的二进制补码表示形式中最高位(最左边)的 1 位之前,返回零位的数量。 可以参考博客:jdk11源码–Integer.numberOfLeadingZeros
/** * Returns a power of two size for the given target capacity. */ static final int tableSizeFor(int cap) { int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1); return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1; }
以上两个 tableSizeFor(int cap)
都是返回最近的不小于输入参数的2的整数次幂。比如10,则返回16,集体步骤如下:
- Integer.numberOfLeadingZeros(cap – 1) = 28,二进制表示为
00000000 00000000 00000000 00011100
- n = -1 >>> 28:
- -1 的原码:
10000000 00000000 00000000 00000001
- -1 的反码:
11111111 11111111 11111111 11111110
- -1 的补码:
11111111 11111111 11111111 11111111
- -1 无符号 右移28位:
00000000 00000000 00000000 00001111
= 15
- -1 的原码:
- 最终结果:15 + 1 = 16
存储结构
HashMap 的存储结构为 数组 + 链表 + 红黑树(JDK8后新加),我们看源码需要弄清楚两个问题:数据底层具体存储的是什么?这样的存储方式有什么好处?
Node
HashMap 中的 Node 内部类实现了 Map.Entry 接口,本质是一个键值对,并且 HashMap 内部有一个Node<K, V>[] table 字段,这是一个 哈希桶数组
public static interface Map.Entry<K,V>
A map entry (key-value pair). The Map.entrySet
method returns a collection-view of the map, whose elements are of this class. The only way to obtain a reference to a map entry is from the iterator of this collection-view. These Map.Entry
objects are valid only for the duration of the iteration; more formally, the behavior of a map entry is undefined if the backing map has been modified after the entry was returned by the iterator, except through the setValue
operation on the map entry.
/** * The table, initialized on first use, and resized as * necessary. When allocated, length is always a power of two. * (We also tolerate length zero in some operations to allow * bootstrapping mechanics that are currently not needed.) */ transient Node<K,V>[] table; /** * Basic hash bin node, used for most entries. (See below for * TreeNode subclass, and in LinkedHashMap for its Entry subclass.) */ static class Node<K,V> implements Map.Entry<K,V> { // 用来定位数据索引的位置 final int hash; final K key; V value; // 链表的下一个 Node Node<K,V> next; Node(int hash, K key, V value, Node<K,V> next) { this.hash = hash; this.key = key; this.value = value; this.next = next; } public final K getKey() { return key; } public final V getValue() { return value; } public final String toString() { return key + "=" + value; } public final int hashCode() { return Objects.hashCode(key) ^ Objects.hashCode(value); } public final V setValue(V newValue) { V oldValue = value; value = newValue; return oldValue; } public final boolean equals(Object o) { if (o == this) return true; if (o instanceof Map.Entry) { Map.Entry<?,?> e = (Map.Entry<?,?>)o; if (Objects.equals(key, e.getKey()) && Objects.equals(value, e.getValue())) return true; } return false; } }
重要方法
首先明白一下概念,HashMap就是使用哈希表来存储的。哈希表为解决冲突,可以采用开放地址法和链地址法等来解决问题,Java中HashMap采用了链地址法。链地址法,简单来说,就是数组加链表的结合。在每个数组元素上都一个链表结构,当数据被Hash后,得到数组下标,把数据放在对应下标元素的链表上。
确定哈希桶数组索引位置
/** * Computes key.hashCode() and spreads (XORs) higher bits of hash * to lower. Because the table uses power-of-two masking, sets of * hashes that vary only in bits above the current mask will * always collide. (Among known examples are sets of Float keys * holding consecutive whole numbers in small tables.) So we * apply a transform that spreads the impact of higher bits * downward. There is a tradeoff between speed, utility, and * quality of bit-spreading. Because many common sets of hashes * are already reasonably distributed (so don't benefit from * spreading), and because we use trees to handle large sets of * collisions in bins, we just XOR some shifted bits in the * cheapest possible way to reduce systematic lossage, as well as * to incorporate impact of the highest bits that would otherwise * never be used in index calculations because of table bounds. * * 翻译: * 计算 key.hashCode()的值,并且将hash的高位通过异或(XORs)扩展到地位,由于table * 使用2的指数进行masking,因此仅在其范围进行mask的hash将会发生冲突(已知的例子为在 * 小table中连续的Float keys)所以这里使用了转换将高位延展到地位,这是在speed * utility和quality of bit-spreading之间的tradeoff,因为很多常见的hash已经合理 * 地分布 (所以不会从spread中受益),并且我哦们使用红黑树来处理多hash冲突,我们只XOR * 一 些位来减少系统缺失,并且否则高位的不准确影响不会影响到index 计算。 * */ static final int hash(Object key) { int h; return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16); } //jdk1.7的源码,jdk1.8没有这个方法,但是实现原理一样的 static int indexFor(int h, int length) { return h & (length-1); } // JDK 1.8 源码 final Node<K,V> getNode(int hash, Object key) { // ... (first = tab[(n - 1) & hash]) != null) // ... }
确定数组索引位置一共有三步:
- 取 hashCode值:h = key.hashCode()
- 高位参与运算:h ^ (h >>> 16)(此实现从JDK1.8开始,优化了高位运算的算法,通过hashCode()的高16位异或低16位实现的:(h = k.hashCode()) ^ (h >>> 16),主要是从速度、功效、质量来考虑的,这么做可以在�数组table的length比较小的时候,也能保证考虑到高低Bit都参与到Hash的计算中,同时不会有太大的开销。)
- 取模运算:(n – 1)& hash (n 为 capacity,这里利用位运算进行取模,效率更高,该表达式相当于 hash % n)
我们这里重点看一下取模运算,这里用了位运算,而没有使用 %
,这是因为 数组的长度 n (capacity)是2的指数
我们这里以 x = 1 << 4 为例
x: 00010000 x - 1: 00001111
令 y 与 x – 1 做 & 运算
y : 10110110 x - 1: 00001111 y&(x - 1): 00000110
这个表达式与 y 对 x 取模结果是一样的
y : 10110110 x : 00010000 y & x : 00000110
我们还能从运算中得出结论:**数组的 长度 n 越小,最终计算出的 索引 越有可能发生碰撞,因此我们在进行取模运算之前需要先进行高位异或运算,来减小发生碰撞的概率。**仅仅异或一下,既减少了系统的开销,也不会造成因为高位没有参与下标的计算(table长度比较小时),从而引起的碰撞。如下表中的例子
key.hashCode() | 00001010 00001111 00001001 00001101 |
---|---|
无符号右移16位 | 00000000 00000000 00001010 00001111 |
两者疑惑的结果 | 00001010 00001111 00000011 00000010 |
另外有一点要注意,HashMap
允许 null
作为 key
, 所以在 hash
方法中我们可以看到 null
对象的 hash
结果为 0, HashMap
使用第 0 个桶来存放键为 null
的键值对。
put 方法
public V put(K key, V value) { return putVal(hash(key), key, value, false, true); } final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) { Node<K,V>[] tab; Node<K,V> p; int n, i; // 1. 判断 table 是否为空或为null,如果为空或null则调用 `resize()` 进行初始化 if ((tab = table) == null || (n = tab.length) == 0) n = (tab = resize()).length; // 2. 判断当前索引是否有值 if ((p = tab[i = (n - 1) & hash]) == null) tab[i] = newNode(hash, key, value, null); else { Node<K,V> e; K k; // 3. 判断数组当前索引存在的 Node 的 hash 与正要加入 Node 的hash 是否相同,以及两者的 key 是否同一引用或者equals方法返回true if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) e = p; // 4. 判断目前索引的值是否在红黑树上 else if (p instanceof TreeNode) e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value); else { // 5. 遍历当前索引链表,如果链表长度大于8,则将链表转为红黑树,在红黑树中进行插入操作,否则进行链表的插入操作,如果遍历过程中发现相同的node,则直接退出循环 for (int binCount = 0; ; ++binCount) { if ((e = p.next) == null) { p.next = newNode(hash, key, value, null); if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st treeifyBin(tab, hash); break; } // key 已经存在则直接覆盖 if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) break; p = e; } } if (e != null) { // existing mapping for key V oldValue = e.value; if (!onlyIfAbsent || oldValue == null) e.value = value; afterNodeAccess(e); return oldValue; } } ++modCount; // 6. 判断是否超过 threshold,超过则扩容 if (++size > threshold) resize(); afterNodeInsertion(evict); return null; } /* * The following package-protected methods are designed to be * overridden by LinkedHashMap, but not by any other subclass. * Nearly all other internal methods are also package-protected * but are declared final, so can be used by LinkedHashMap, view * classes, and HashSet. */ // Create a regular (non-tree) node Node<K,V> newNode(int hash, K key, V value, Node<K,V> next) { return new Node<>(hash, key, value, next); }
Put 方法可以抽象概括为一下六步:
- 判断 table 是否为空或为null,如果为空或null则调用
resize()
进行初始化。 - 计算 key 的 hash 值,并最终通过上面的方法求出该 key 在数组中对应的索引 i,如果 table[i] == null,直接通过
newNode()
新建非树节点(Node),转向 步骤6,否则转向 步骤3. - 判断数组当前索引存在的 首个Node 的 hash 与正要加入 Node 的hash 是否相同(用到
key.hashCode()
),以及两者的 key 是否同一引用或者equals方法(用到key.equals()
)返回true,**这一步就解释了为什么作为 key 的类一定要重写hashCode
和equals
方法。**比较之后如果相同,则直接覆盖 value,否则 步骤4。 - 判断 table[i] 是否为 TreeNode,即是否是红黑树,如果是红黑树直接在树上插入键值对,否则转向 步骤5.
- 遍历当前索引链表,如果链表长度大于8,则将链表转为红黑树,在红黑树中进行插入操作,否则进行链表的插入操作,如果遍历过程中发现相同的node,则直接覆盖。从源码中我们可以看到,从JDK8开始,链表的插入方法由头插法改成了尾插法。
- 插入成功后(这里的插入指之前 i 索引处无 node时插入),判断实际存在的键值对数量是否超过了最大容量 threshold,如果超过,进行扩容。
resize 方法
/** * Initializes or doubles table size. If null, allocates in * accord with initial capacity target held in field threshold. * Otherwise, because we are using power-of-two expansion, the * elements from each bin must either stay at same index, or move * with a power of two offset in the new table. * * @return the table */ final Node<K,V>[] resize() { Node<K,V>[] oldTab = table; int oldCap = (oldTab == null) ? 0 : oldTab.length; int oldThr = threshold; int newCap, newThr = 0; if (oldCap > 0) { // 当数组容量达到最大时不再扩容 if (oldCap >= MAXIMUM_CAPACITY) { threshold = Integer.MAX_VALUE; return oldTab; } else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY && oldCap >= DEFAULT_INITIAL_CAPACITY) newThr = oldThr << 1; // double threshold } else if (oldThr > 0) // initial capacity was placed in threshold newCap = oldThr; // 创建 hashMap 之后第一次put else { // zero initial threshold signifies using defaults newCap = DEFAULT_INITIAL_CAPACITY; newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY); } if (newThr == 0) { float ft = (float)newCap * loadFactor; newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ? (int)ft : Integer.MAX_VALUE); } threshold = newThr; @SuppressWarnings({"rawtypes","unchecked"}) Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap]; table = newTab; // 循环遍历老map中的元素,迁移到对应新数组中去,进行扩容操作 if (oldTab != null) { for (int j = 0; j < oldCap; ++j) { Node<K,V> e; if ((e = oldTab[j]) != null) { // 老数组中元素引用置为null oldTab[j] = null; // 该索引只有一个元素,则直接计算新索引将 e 复制到新数组 if (e.next == null) newTab[e.hash & (newCap - 1)] = e; // 如果是红黑树 else if (e instanceof TreeNode) ((TreeNode<K,V>)e).split(this, newTab, j, oldCap); // 如果该所以链表由多个值 else { // preserve order Node<K,V> loHead = null, loTail = null; Node<K,V> hiHead = null, hiTail = null; Node<K,V> next; do { next = e.next; // 这里通过当前hash与数组长度进行逻辑与操作,判断是否为0,来区分该元素是不变位置还是需要重新更换位置 if ((e.hash & oldCap) == 0) { if (loTail == null) loHead = e; else loTail.next = e; loTail = e; } else { if (hiTail == null) hiHead = e; else hiTail.next = e; hiTail = e; } } while ((e = next) != null); if (loTail != null) { loTail.next = null; newTab[j] = loHead; } if (hiTail != null) { hiTail.next = null; // 现有index+现有数组的长度就是新数组中的索引位置 newTab[j + oldCap] = hiHead; } } } } } return newTab; }
分析以上代码可以得出如下结论:
- 新建一个 HashMap 对象后,第一次
put
会触发resize
操作,此时会用threshold
(初始threshold为12) 来初始化capacity
(line 687newCap = oldThr;
) - 当当前数组长度 >= 16 时,数组会扩容为原来的两倍(
newCap = oldCap << 1`` newThr = oldThr << 1
) - 扩容时不会重新计算hash值,hash值保存在 Node 对象中,只会改变 Node 对象在数组中的索引
- 扩容时,现有元素要么不动,要么索引变为原来索引 + 原来数组长度
- 扩容时判断元素是否移动是通过
(e.hash & oldCap) == 0
,下面举例说明
我们以 map.put("a1","a");
为例, a1 计算后的hash值为3056,经过计算,初始化时 a1存在数组的0索引处
初始化数组长度为16, 16 – 1 = 15 对应的二进制 | 00000000 00001111 |
---|---|
3056对应的二进制 | 00001011 11110000 |
3056 & 15 | 00000000 00000000 |
扩容后,数组长度变为32,此时计算 (e.hash & oldCap)
数组初始化长度16 | 00000000 00010000 |
---|---|
3056 二进制值 | 00001011 11110000 |
3056 & 16 | 00000000 00010000 |
可以 看到结果不为0, 说明 a1 需要迁移,我们来计算一下迁移后的新位置索引
扩容后数组长度为32, 32 – 1 = 31 二进制值 | 00000000 00011111 |
---|---|
3056 二进制值 | 00001011 11110000 |
31 & 3056 | 00000000 00010000 |
我们可以看到计算出的索引为16,正好等于j + oldCap
(0 + 16)
,因为扩容后,31 相对于 15 在高位多了一个 1, 此时只要通过 (e.hash & oldCap)
计算出该高位多出的 1 对索引的影响是0还是2的指数即可,是0的话索引不变,不为0则变成 原索引+oldCap
,因此扩容时node 要么不变,要么就移动2的指数。
注意:
**JDK7中HashMap使用了链表的头插法,在多线程环境下容易出现死循环,JDK8以后才有尾插法,不存在这个问题。**具体可看这篇 面试官: HashMap 为什么线程不安全?
treeifyBin 方法
/** * Replaces all linked nodes in bin at index for given hash unless * table is too small, in which case resizes instead. */ final void treeifyBin(Node<K,V>[] tab, int hash) { int n, index; Node<K,V> e; // 如果table为null或者长度小于64时不转换为红黑树,此时进行扩容 if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) resize(); // 如果数组该索引的元素不为null else if ((e = tab[index = (n - 1) & hash]) != null) { TreeNode<K,V> hd = null, tl = null; // 循环遍历链表,将链表转为红黑树 do { // 根据链表的 node 创建 TreeNode TreeNode<K,V> p = replacementTreeNode(e, null); if (tl == null) hd = p; else { p.prev = tl; tl.next = p; } tl = p; } while ((e = e.next) != null); if ((tab[index] = hd) != null) hd.treeify(tab); } }
我们可以看到这里用到了 MIN_TREEIFY_CAPACITY(64)
参数,如果链表长度大于等于8,但是Node数组长度小于64,此时不会转为红黑树,而是进行 resize()
扩容,红黑树虽然查询时间复杂度为O(logN),但是空间占用大HashMap的设计是尽量不用红黑树。
get 方法
public V get(Object key) { Node<K,V> e; return (e = getNode(hash(key), key)) == null ? null : e.value; } final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; // 判断table不为null,且table长度大于0,且对应索引的第一个node不为null if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) { if (first.hash == hash && // always check first node ((k = first.key) == key || (key != null && key.equals(k)))) return first; if ((e = first.next) != null) { if (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreeNode(hash, key); do { if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null); } } return null; }
通过源码我们整理出 get 的步骤大致如下:
- 判断table不为null,且table长度大于0,且对应索引的第一个node不为null,如果不满足直接返回null
- 判断通过key计算出的hash与当前索引第一个节点的hash是否相同,并且key是否相同(满足equals方法或者引用相同),相同则返回当前第一个节点,不同继续下面的步骤。
- 若第一个节点first为红黑树,则用红黑树算法取对应value,若是链表,则循环链表,按照上面同样的方法,想比较hash是否相同,在比较key是否相同。
红黑树查询时间复杂度:O(logN)
链表查询时间复杂度:O(N)
fail fast
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw
ConcurrentModificationException
on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.