Pharo By Example - Chapter 14: Collections

2024-10-24 16:38 by farmCoder

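To make good use of the collection classes, the reader needs to know at least which collections exist and what their commonalities and differences are. That is what this chapter is about.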

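The collection classes form a loosely defined group of general-purpose subclasses of Collection and Stream. Some of these subclasses, such as Bitmap or CompiledMethod, are special-purpose classes. They were designed for use in other parts of the system or in applications, so the system organization does not categorize them as collections.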

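In this chapter, we use the term collection hierarchy to mean Collection and those of its subclasses that are also in the packages labelled Collections-*. We use the term stream hierarchy to mean Stream and those of its subclasses that are also in the Collections-Streams package.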

In this chapter we focus mainly on the subset of collection classes shown in Figure 14-1. Streams are covered in a chapter of their own.

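Pharo provides a good set of collections by default. In addition, the Containers project (available at http://www.github.com/Pharo-Containers/) offers alternative implementations as well as new collections and data structures.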

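Let us start with an important point about the design of Pharo collections: their API makes heavy use of higher-order functions. We could write for loops as in old-style Java, but most of the time Pharo developers use an iterator style based on higher-order functions.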

Object
  └─ Collection
       ├─ Bag
       ├─ SequenceableCollection
       │   ├─ LinkedList
       │   ├─ Interval
       │   ├─ OrderedCollection
       │   │    └─ SortedCollection
       │   └─ ArrayedCollection
       │        ├─ Array
       │        ├─ String
       │        │    ├─ ByteString
       │        │    └─ Symbol
       │        └─ Text
       └─ HashedCollection
            ├─ Set
            │    ├─ IdentitySet
            │    └─ PluggableSet
            └─ Dictionary
                 ├─ IdentityDictionary
                 ├─ PluggableDictionary
                 └─ KeyedTree

Figure 14-1

14.1 Higher-order functions

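Programming with higher-order functions, rather than with the individual elements of a collection, is an important way to raise the level of abstraction of a program. An early example of this style is the Lisp function map. It applies an argument function to every element of a list and returns a new list containing the results. Following its Smalltalk roots, Pharo adopts this collection-based higher-order programming as a central tenet. Modern functional programming languages such as ML and Haskell have followed Smalltalk's lead.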

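Why is this a good idea? Suppose you have a data structure containing a collection of student records, and you want to perform some action on all the students who meet some criterion. A programmer used to an imperative language will immediately reach for a loop, but a Pharo programmer will write: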

students
  select: [ :each | each gpa < threshold ]

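This expression returns a new collection containing exactly those elements of students for which the block (the function in square brackets) returns true. The block can be thought of as a lambda expression defining the anonymous function λx. x gpa < threshold. This code has the simplicity and elegance of a domain-specific query language.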

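All collections in Pharo understand the message select:. There is no need to find out whether the student data structure is an array or a linked list. With a loop, by contrast, you must know whether students is an array or a linked list before you can write the loop.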
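The Playground sketch below illustrates this uniformity. It sends the same select: with the same block to an Array, an OrderedCollection and a Set. The block variable isEven exists only for this illustration. The comments show the answers we expect from a recent Pharo image.

```smalltalk
"isEven is an illustrative block, not part of the Pharo library"
| isEven |
isEven := [ :each | each even ].
#(1 2 3 4 5 6) select: isEven.                               "answers #(2 4 6)"
(OrderedCollection withAll: #(1 2 3 4 5 6)) select: isEven.  "answers an OrderedCollection(2 4 6)"
(Set withAll: #(1 2 3 4 5 6)) select: isEven.                "answers a Set of 2, 4 and 6, in no particular order"
```

Each receiver answers a collection of its own kind, yet the calling code never needs to know which kind it was given.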

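In Pharo, "a collection" without further qualification means an object that supports well-defined protocols for testing membership and enumerating elements. All collections understand the testing messages includes:, isEmpty and occurrencesOf:. All collections also understand the enumeration messages do:, select:, reject: (the opposite of select:), collect: (similar to Lisp's map), detect:ifNone:, inject:into: (which performs a left fold) and many more. This protocol is powerful because it is both ubiquitous and varied.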
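Here are the testing and enumeration messages just listed, all sent to one Array. The numbers are arbitrary, and each comment gives the answer we expect.

```smalltalk
"numbers is an arbitrary illustrative Array"
| numbers |
numbers := #(3 1 4 1 5 9 2 6).
numbers includes: 4.                                    "answers true"
numbers isEmpty.                                        "answers false"
numbers occurrencesOf: 1.                               "answers 2"
numbers reject: [ :each | each > 4 ].                   "answers #(3 1 4 1 2)"
numbers collect: [ :each | each * 10 ].                 "answers #(30 10 40 10 50 90 20 60)"
numbers detect: [ :each | each > 4 ] ifNone: [ nil ].   "answers 5, the first match"
numbers inject: 0 into: [ :sum :each | sum + each ].    "answers 31"
```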

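The following table summarizes the standard protocols supported by most classes in the collection hierarchy. Subclasses of Collection define, redefine or optimize these methods, and occasionally even forbid them.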

Protocol	Methods
accessing	`size`,`capacity`,`at:`,`at:put:`
testing	`isEmpty`,`includes:`,`contains:`,`occurrencesOf:`
adding	`add:`,`addAll:`
removing	`remove:`,`remove:ifAbsent:`,`removeAll:`
enumerating	`do:`,`collect:`,`select:`,`reject:`,`detect:`,`detect:ifNone:`,`inject:into:`
converting	`asBag`,`asSet`,`asOrderedCollection`,`asSortedCollection`,`asArray`,`asSortedCollection:`
creating	`with:`,`with:with:`,`with:with:with:`,`with:with:with:with:`,`withAll:`

14.2 The varieties of collections

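Beyond this basic uniformity, there are many kinds of collections. They either support different protocols or provide different behaviour for the same requests. Let us briefly survey some of the key differences: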

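Sequenceable: Instances of all subclasses of SequenceableCollection start from a first element and proceed in a well-defined order to a last element. Instances of Set, Bag and Dictionary, on the other hand, are not sequenceable.
Sortable: A SortedCollection keeps its elements in sort order.
Indexable: Most sequenceable collections are also indexable, that is, their elements can be retrieved with the message at: anIndex. Array is the familiar indexable data structure with a fixed size: anArray at: n retrieves the nth element, and anArray at: n put: v changes the nth element to v. A LinkedList is sequenceable but not indexable, that is, it understands first and last, but not at:.
Keyed: Instances of Dictionary and its subclasses are accessed by keys instead of indices.
Mutable: Most collections are mutable, but Interval and Symbol are not. An Interval represents a range of integers. For example, 5 to: 16 by: 2 is an interval containing the elements 5, 7, 9, 11, 13 and 15. You can access its elements with at:, but you cannot change them with at:put:.
Growable: Instances of Interval and Array always have a fixed size. Other kinds of collections (sorted collections, ordered collections and linked lists) can grow after creation. OrderedCollection is more general than Array: its size grows on demand, and it defines the messages addFirst: and addLast: as well as at: and at:put:.
Accepts duplicates: A Set filters out duplicates, but a Bag does not. Dictionary, Set and Bag compare elements with the = method provided by the elements. Their Identity variants use ==, which tests whether the arguments are the same object. Their Pluggable variants use an arbitrary equivalence relation supplied by whoever creates the collection.
Heterogeneous: Most collections can hold any kind of element. A String, CharacterArray or Symbol, however, only holds Characters. An Array can hold any mix of objects, but a ByteArray only holds bytes. A LinkedList is constrained to hold elements that conform to the Link accessing protocol.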
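Several of these distinctions are easy to check in a Playground. The sketch below shows that an Interval is indexable, that an OrderedCollection can grow at both ends, and that a Set drops duplicates while a Bag keeps them. The values are arbitrary.

```smalltalk
(5 to: 16 by: 2) asArray.            "answers #(5 7 9 11 13 15)"
(5 to: 16 by: 2) at: 3.              "answers 9: an Interval is indexable"
OrderedCollection new addFirst: 2; addFirst: 1; addLast: 3; yourself.
                                     "answers an OrderedCollection(1 2 3): it grows at both ends"
(Set withAll: #(1 1 2)) size.        "answers 2: the duplicate 1 is filtered out"
(Bag withAll: #(1 1 2)) size.        "answers 3: the duplicate is kept"
```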

14.3 Collection implementations

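These functional categories are not our only concern; we must also consider how the collection classes are implemented. As shown in Figure 14-2, five main implementation techniques are used.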

Implementation	Classes
Arrayed	Array, String, Symbol
Ordered	OrderedCollection, SortedCollection, Text, Heap
Hashed	Set, IdentitySet, PluggableSet, Bag, IdentityBag, Dictionary, IdentityDictionary, PluggableDictionary
Linked	LinkedList, SkipList
Interval	Interval

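Arrays store their elements in the (indexable) instance variables of the collection object itself. As a consequence, an array must have a fixed size, but it can be created with a single memory allocation.
OrderedCollection and SortedCollection store their elements in an array referenced by one of the collection's instance variables. When the number of elements exceeds the storage capacity, the internal array can therefore be replaced by a larger one.
The various kinds of Set and Dictionary also reference a subsidiary array for storage, but use that array as a hash table. A Bag uses a subsidiary Dictionary, with the elements of the bag as keys and their numbers of occurrences as values.
A LinkedList uses a standard singly-linked representation.
An Interval is represented by three numbers that record its start, its end and its step size.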
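These internal representations are not visible without an inspector, but some public messages expose the information they hold. In the sketch below, occurrencesOf: reports the count a Bag keeps for an element. The messages first, last and increment report an Interval's endpoints and step. We believe increment is the accessor name in current Pharo images; browse the Interval class if it differs in yours.

```smalltalk
"bag and interval are illustrative variables"
| bag interval |
bag := Bag withAll: #(7 3 1 3).
bag occurrencesOf: 3.        "answers 2: the bag keeps one count per distinct element"
interval := 1 to: 10 by: 3.
interval first.              "answers 1"
interval last.               "answers 10"
interval increment.          "answers 3: the step"
interval asArray.            "answers #(1 4 7 10)"
```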

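In addition to these classes, there are weak variants of Array, Set and the various kinds of Dictionary. These collections hold their elements weakly, that is, they do not prevent the elements from being garbage collected. Pharo knows about these classes and handles them specially.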

14.4 Examples of key classes

We now present the most common or important collection classes through simple code examples. The main protocols of collections are:

at:, at:put: - to access an element
add:, remove: - to add or remove an element
size, isEmpty, includes: - to get some information about the collection
do:, collect:, select: - to iterate over the collection

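Each collection may or may not implement such a protocol. When it does, it interprets the protocol to fit its own semantics. We suggest you browse the classes themselves to identify more specific and advanced protocols.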

We will focus on the most common collection classes: OrderedCollection, Set, SortedCollection, Dictionary, Interval and Array.

14.5 Common creation protocol

There are several ways to create instances of collections. The most generic ones use the messages new: aSize and with: anElement.

new: anInteger creates a collection of size anInteger whose elements are initially nil.
with: anObject creates a collection containing anObject as its initial element.

Different collections realize these behaviours differently.
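new: is a good example of this. For an Array, the argument is the number of slots. As far as we know, for an OrderedCollection it is only an initial capacity, so the new collection is still empty. A quick Playground check:

```smalltalk
Array new: 3.                        "answers #(nil nil nil): three nil slots"
(Array new: 3) size.                 "answers 3"
(OrderedCollection new: 3) size.     "answers 0: 3 is only the initial capacity"
(OrderedCollection with: 42) first.  "answers 42"
```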

You can repeat with: to create a collection with up to six initial elements.

Array with: 1
>>> #(1)

Array with: 1 with: 2
>>> #(1 2)

Array with: 1 with: 2 with: 3
>>> #(1 2 3)

Array with: 1 with: 2 with: 3 with: 4 with: 5 with: 6
>>> #(1 2 3 4 5 6)

You can also use addAll: aCollection to add all the elements of one collection to another:

(1 to: 5) asOrderedCollection addAll: '678'; yourself
>>> an OrderedCollection(1 2 3 4 5 $6 $7 $8)

Note: addAll: returns its argument, not the receiver! This is why the example above ends with yourself.

You can also use the message withAll: to create many kinds of collections:

Array withAll: #(7 3 1 3)
>>> #(7 3 1 3)

OrderedCollection withAll: #(7 3 1 3)
>>> an OrderedCollection(7 3 1 3)

SortedCollection withAll: #(7 3 1 3)
>>> a SortedCollection(1 3 3 7)

Set withAll: #(7 3 1 3)
>>> a Set(7 1 3)

Bag withAll: #(7 3 1 3)
>>> a Bag(7 1 3 3)

14.6 Array

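An Array is a fixed-size collection of elements accessed by integer indices. Unlike in C, the first element of an array has index 1, not 0. The main protocol for accessing array elements is at: and at:put:.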

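Since arrays are fixed-size collections, you cannot add or remove elements at the end of an array. The following code creates an array of size 5, puts values in the first 3 positions and returns the first element.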

| anArray |
anArray := Array new: 5.
anArray at: 1 put: 4.
anArray at: 2 put: 3/2.
anArray at: 3 put: 'ssss'.
anArray at: 1
>>> 4

There are several ways to create instances of Array. We can use:

new:, with:
#() - literal arrays
{ . } - the dynamic compact syntax

Creation with `new:`

The message new: anInteger creates an array of size anInteger. Array new: 5 creates an array of size 5 whose elements are initially nil.

Creation with `with:`

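The with:* family of messages lets you specify the values of the elements. The following code creates an array of three elements: the number 4, the fraction 3/2 and the string 'lulu'.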

Array with: 4 with: 3/2 with: 'lulu'
>>> { 4 . (3/2) . 'lulu' }

Creating literal arrays with `#()`

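The expression #() creates a literal array whose elements are constants or literals. These elements must be known when the expression is compiled, not when it is executed. The following code creates an array of size 2 whose first element is the (literal) number 1 and whose second element is the (literal) string 'here'.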

#(1 'here') size
>>> 2

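If you evaluate the expression #(1+2), you do not get an array with the single element 3. Instead you get the array #(1 #+ 2), which has three elements: 1, the symbol #+ and the number 2.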

#(1+2)
>>> #(1 #+ 2)

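This happens because the #() construct does not execute the expressions it contains. Its elements are simply objects created while the expression is parsed (called literal objects). The expression is scanned and the resulting elements are fed to a new array. Literal arrays can contain numbers, nil, true, false, symbols, strings and other literal arrays. No message is sent during the execution of a #() expression.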

Dynamic creation with `{ . }`

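Finally, you can create an array dynamically with the { . } construct. The expression { a . b } is completely equivalent to Array with: a with: b. In particular, the expressions inside { } are evaluated, which is the opposite of what happens with #().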

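[Translator's note: in my experience, #() is like building a list with quote in Lisp, while {} is like building a list with backquote (quasiquote) in Lisp.]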

{ 1 + 2 }
>>> #(3)

{(1/2) asFloat} at: 1
>>> 0.5

{10 atRandom. 1/3} at: 2
>>> (1/3)
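You can check the difference between the two syntaxes directly. The sketch below compares a brace array with the equivalent Array with:with: expression. It then counts the elements each syntax produces from the same source text 1 + 2.

```smalltalk
{ 1 + 2 . #foo } = (Array with: 3 with: #foo).   "answers true"
#(1 + 2) size.                                   "answers 3: the elements 1, #+ and 2"
{ 1 + 2 } size.                                  "answers 1: the single element 3"
```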

Accessing elements

The elements of all sequenceable collections can be accessed with the messages at: anIndex and at: anIndex put: anObject.

| anArray |
anArray := #(1 2 3 4 5 6) copy.
anArray at: 3 
>>> 3
anArray at: 3 put: 33.
anArray at: 3
>>> 33

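Note: as a general principle, never modify a literal array! Literal arrays are kept in the literal frame of the compiled method, the space that stores the literals appearing in a method. Unless you copy the array, the literal may no longer have the value you expect the second time the code runs. In this example, without the copy, the literal #(1 2 3 4 5 6) would actually be #(1 2 33 4 5 6) on the second run! Dynamic arrays do not have this problem because they are not stored in the literal frame.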

14.7 OrderedCollection

OrderedCollection is one of the collections that can grow, and elements can be added to it sequentially. It offers a variety of messages such as add:, addFirst:, addLast: and addAll:.

| ordCol |
ordCol := OrderedCollection new.
ordCol add: 'Seaside'; add: 'SmalltalkHub'; addFirst: 'GitHub'.
ordCol
>>> an OrderedCollection('GitHub' 'Seaside' 'SmalltalkHub')

Removing elements

The message remove: anObject removes the first occurrence of an object from the collection. If the collection does not contain the object, an error is raised.

ordCol add: 'GitHub'.
ordCol remove: 'GitHub'.
ordCol
>>> an OrderedCollection('Seaside' 'SmalltalkHub' 'GitHub')

A variant, remove:ifAbsent:, takes as its second argument a block that is executed if the element to be removed is not in the collection.

result := ordCol remove: 'zork' ifAbsent: [ 33 ].
result
>>> 33

Conversion

You can convert an Array (or any other collection) into an OrderedCollection by sending it the message asOrderedCollection:

#(1 2 3) asOrderedCollection
>>> an OrderedCollection(1 2 3)

'hello' asOrderedCollection
>>> an OrderedCollection($h $e $l $l $o)

14.8 Interval

The class Interval represents ranges of numbers. For example, the interval of numbers from 1 to 100 is defined as follows:

Interval from: 1 to: 100
>>> (1 to: 100)

The printString of this interval shows that the class Number provides a convenience method, to:, for generating intervals:

(Interval from: 1 to: 100) = (1 to: 100)
>>> true

We can use Interval class >> from:to:by: or Number >> to:by: to specify the step between two numbers:

(Interval from: 1 to: 100 by: 0.5) size
>>> 199

(1 to: 100 by: 0.5) at: 198
>>> 99.5

(1/2 to: 54/7 by: 1/3) last
>>> (15/2)

14.9 Dictionary

Dictionaries are important collections whose elements are accessed by keys. The most commonly used dictionary messages include at: aKey, at: aKey put: aValue, at: aKey ifAbsent: aBlock, keys and values.

| colors |
colors := Dictionary new.
colors at: #yellow put: Color yellow.
colors at: #blue put: Color blue.
colors at: #red put: Color red.
colors at: #yellow
>>> Color yellow

colors keys
>>> #(#red #blue #yellow)

colors values
>>> {Color red . Color blue . Color yellow}
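The example above does not use at: aKey ifAbsent: aBlock. In the following sketch, the dictionary ages and its keys are invented for illustration. The block is evaluated only when the key is missing. A plain at: with a missing key would instead signal an error (KeyNotFound in recent Pharo versions).

```smalltalk
"ages is an illustrative dictionary"
| ages |
ages := Dictionary new.
ages at: 'alice' put: 31.
ages at: 'alice' ifAbsent: [ 0 ].   "answers 31: the key exists, the block is not evaluated"
ages at: 'bob' ifAbsent: [ 0 ].     "answers 0: the key is missing, the block's value is answered"
ages includesKey: 'bob'.            "answers false: ifAbsent: does not add the key"
```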

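Dictionaries compare keys by equality: two keys are considered the same if comparing them with = returns true. A common and hard-to-find bug is to use as a key an object whose = method has been overridden without also overriding its hash method. The Dictionary implementation uses both methods when comparing keys.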
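To see why = and hash must agree, the sketch below stores a value under a copy of the string 'foo'. It then looks the value up with a different but equal string. The lookup succeeds because String defines = and hash consistently. The variables d and key are illustrative.

```smalltalk
"d and key are illustrative variables"
| d key |
d := Dictionary new.
key := 'foo' copy.
d at: key put: 1.
d at: 'foo'.               "answers 1: 'foo' is equal to key"
key == 'foo'.              "answers false: they are two distinct objects"
key hash = 'foo' hash.     "answers true: equal strings have equal hashes"
```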

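A Dictionary can be seen as a collection of key/value pairs (associations, created with the message ->). We can create a dictionary from a collection of associations, or convert a dictionary into an array of associations.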

| colors |
colors := Dictionary newFrom: { #blue -> Color blue . #red -> Color red
          . #yellow -> Color yellow }.
colors removeKey: #blue.
colors associations
>>> {#yellow->Color yellow. #red->Color red}

14.10 IdentityDictionary

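A Dictionary uses = and hash to decide whether two keys are the same. The class IdentityDictionary instead uses the identity of the keys (the message ==) rather than their values. In other words, it considers two keys equal only if they are the same object.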

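Symbols are often used as keys, and in that case an IdentityDictionary is the natural choice, because every Symbol is globally unique. If your keys are Strings, on the other hand, use a plain Dictionary, or you may run into trouble: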

a := 'footbar'.
b := a copy.
trouble := IdentityDictionary new.
trouble at: a put: 'a'; at: b put: 'b'.
trouble at: a
>>> 'a'

trouble at: b
>>> 'b'

trouble at: 'footbar'
>>> 'a'

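Since a and b are different objects, they are treated as different keys. Interestingly, the literal 'footbar' is allocated only once, so it is actually the same object as a. You do not want your code to depend on behaviour like this! A plain Dictionary would answer the same value for any key equal to 'footbar'.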

Use only globally unique objects (such as Symbols or SmallIntegers) as keys of an IdentityDictionary, and use Strings (or other objects) as keys of a plain Dictionary.

IdentityDictionary example

The expression Smalltalk globals returns an instance of SystemDictionary, a subclass of IdentityDictionary. Accordingly, all its keys are ByteSymbols (ByteSymbol is a subclass of Symbol):

Smalltalk globals keys collect: [ :each | each class ] as: Set
>>> a Set(ByteSymbol)

14.11 Set

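The class Set behaves like a mathematical set: it is an unordered collection without duplicates. Elements are added to a Set with add:, and they cannot be accessed with at:. Objects put in a set should implement the methods hash and =.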

s := Set new.
s add: 4/2; add: 4; add: 2.
s size
>>> 2

You can also create a set with Set class >> newFrom: or with the conversion message Collection >> asSet:

(Set newFrom: #(1 2 3 1 4)) = #(1 2 3 4 3 2 1) asSet
>>> true

asSet offers a convenient way to eliminate duplicates from a collection:

{ Color black . Color white. (Color red + Color blue + Color green) } asSet size
>>> 2

Note that red + blue + green = white.

A Bag is much like a Set, except that it allows duplicates:

{ Color black. Color white. (Color red + Color blue + (Color green))}
    asBag size
>>> 3

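The set operations union, intersection and membership test are implemented by the Collection messages union:, intersection: and includes:. The receiver is first converted to a Set, so these operations work for all kinds of collections.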

(1 to: 6) union: (4 to: 10)
>>> a Set(1 2 3 4 5 6 7 8 9 10)

'hello' intersection: 'there'
>>> 'eh'

#Pharo includes: $a
>>> true

As we explain below, the elements of a set are accessed through iterators.
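Because a Set has no indices, iterating is the way to reach its elements. A short sketch follows, in which the variable total is an invented accumulator. It also reflects our understanding that collect: on a Set answers another Set.

```smalltalk
"s and total are illustrative variables"
| s total |
s := Set withAll: #(3 1 4 1 5).
total := 0.
s do: [ :each | total := total + each ].
total.                              "answers 13: 3 + 1 + 4 + 5, the duplicate 1 counted once"
s collect: [ :each | each \\ 2 ].   "answers a Set containing 0 and 1"
```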

14.12 SortedCollection

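Unlike an OrderedCollection, a SortedCollection keeps its elements in sort order. By default, a SortedCollection sorts with the message <=. It can therefore sort instances of subclasses of the abstract class Magnitude, which implement the protocol of comparable objects (<, =, >, >=, between:and:...).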

You can create a new SortedCollection instance and then add elements to it:

SortedCollection new add: 5; add: 2; add: 50; add: -10; yourself.
>>> a SortedCollection(-10 2 5 50)

More often, though, you will convert an existing collection into a SortedCollection by sending it asSortedCollection:

#(5 2 50 -10) asSortedCollection
>>> a SortedCollection(-10 2 5 50)

'hello' asSortedCollection
>>> a SortedCollection($e $h $l $l $o)

How do you convert this result back into a string? Unfortunately, asString returns the printString representation, which is not what we want:

'hello' asSortedCollection asString
>>> 'a SortedCollection($e $h $l $l $o)'

The correct approach is to use String class >> newFrom:, String class >> withAll: or Object >> as::

'hello' asSortedCollection as: String
>>> 'ehllo'

String newFrom: 'hello' asSortedCollection
>>> 'ehllo'

String withAll: 'hello' asSortedCollection
>>> 'ehllo'

A SortedCollection can contain elements of different kinds, as long as they are all comparable. For example, we can mix integers, floats and fractions:

{ 5 . 2/ -3 . 5.21 } asSortedCollection
>>> a SortedCollection((-2/3) 5 5.21)

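Suppose you want to sort objects that do not implement <=, or you want a different sorting criterion. You can do this by giving the SortedCollection a two-argument block, called a sortBlock. For example, Color is not a Magnitude and does not implement <=. We can still specify a block stating that colors should be sorted by their luminance: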

col := SortedCollection
         sortBlock: [ :c1 :c2 | c1 luminance <= c2 luminance ].
col addAll: { Color red . Color yellow . Color white . Color black }.
col
>>> a SortedCollection(Color black Color red Color yellow Color white)
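A sortBlock is not limited to classes that lack <=. It can also override their default order. In this sketch the words are arbitrary and are ordered by decreasing length. The same idea is then applied with asSortedCollection:, which appears in the converting protocol listed earlier.

```smalltalk
"words is an illustrative variable"
| words |
words := SortedCollection sortBlock: [ :a :b | a size >= b size ].
words addAll: #('pear' 'fig' 'banana').
words asArray.                                            "answers #('banana' 'pear' 'fig')"
(#(3 1 2) asSortedCollection: [ :a :b | a >= b ]) asArray. "answers #(3 2 1)"
```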

14.13 Strings

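In Pharo, a String is a collection of Characters. It is sequenceable, indexable, mutable and homogeneous, containing only Character instances. Like arrays, strings have a dedicated syntax. They are normally created by writing a String literal between single quotes, but the usual collection creation methods work as well.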

'Hello'
>>> 'Hello'

String with: $A
>>> 'A'

String with: $h with: $i with: $!
>>> 'hi!'

String newFrom: #($h $e $l $l $o)
>>> 'hello'

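In fact, String is abstract. When we instantiate a String, we actually get either an 8-bit ByteString or a 32-bit WideString. To keep things simple, we usually ignore the difference and just talk about instances of String.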

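Although strings are delimited by single quotes, a string can contain a single quote: to put one in, type it twice. The resulting string contains only one quote character at that position, not two, as shown below: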

'l''idiot' at: 2
>>> $'

'l''idiot' at: 3
>>> $i

The message , (comma) concatenates two instances of String. Such messages can be chained as follows:

s := 'no', ' ', 'worries'.
s
>>> 'no worries'

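Since a string is a mutable collection, we can also change it with the message at:put:. From a design point of view, however, it is better to avoid modifying strings, because they are often shared across method executions.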

s at: 4 put: $h; at: 5 put: $u.
s
>>> 'no hurries'

Note that the comma method is defined by Collection, so it works for any kind of collection!

(1 to: 3), '45'
>>> #(1 2 3 $4 $5)

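We can also modify an existing string with replaceAll:with: or replaceFrom:to:with:, as shown below. Note that the replacement characters and the index interval must have the same size.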

s replaceAll: $n with: $N.
s
>>> 'No hurries'

s replaceFrom: 4 to: 5 with: 'wo'.
s
>>> 'No worries'

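In contrast to the methods described above, copyReplaceAll:with: creates a new string. (Curiously, its arguments are substrings rather than individual characters, and their sizes do not have to match.)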

s copyReplaceAll: 'rries' with: 'mbats'
>>> 'No wombats'
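The sketch below checks two things. First, copyReplaceAll:with: really leaves its receiver unchanged. Second, the replacement may be longer than the text it replaces. The variables original and changed are illustrative.

```smalltalk
"original and changed are illustrative variables"
| original changed |
original := 'No worries' copy.
changed := original copyReplaceAll: 'worries' with: 'wombats'.
changed.                                       "answers 'No wombats'"
original.                                      "answers 'No worries': the receiver is untouched"
original copyReplaceAll: 'o' with: 'oo'.       "answers 'Noo woorries'"
```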

A quick look at these methods shows that they are not defined only for String: they are defined for any kind of SequenceableCollection, so the following code also works:

(1 to: 6) copyReplaceAll: (3 to: 5) with: { 'three' . 'etc.' }
>>> #(1 2 'three' 'etc.' 6)

String matching

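You can ask whether a pattern matches a given string by sending match: to the pattern. The pattern can use * to match any sequence of characters and # to match a single character. Note that match: is sent to the pattern, not to the string being matched.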

'Linux *' match: 'Linux mag'
>>> true

'GNU#Linux #ag' match: 'GNU/Linux tag'
>>> true
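A few more cases follow, exercising both wildcards and showing failed matches. The file names are arbitrary. In every line the pattern is the receiver.

```smalltalk
'*.st' match: 'Collections.st'.   "answers true"
'*.st' match: 'notes.txt'.        "answers false"
'a#c' match: 'abc'.               "answers true: # matches exactly one character"
'a#c' match: 'abbc'.              "answers false: two characters sit between a and c"
```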

More advanced pattern-matching facilities are provided by the Regex package.

Substrings

For substring operations, we can use first, first:, allButFirst:, copyFrom:to: and other methods defined in SequenceableCollection.

'alphabet' at: 6
>>> $b

'alphabet' first
>>> $a

'alphabet' first: 5
>>> 'alpha'

'alphabet' allButFirst: 3
>>> 'habet'

'alphabet' copyFrom: 5 to: 7
>>> 'abe'

'alphabet' copyFrom: 3 to: 3
>>> 'p'       "not $p"

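Note that the type of the result depends on the method used. Most substring-related methods return String instances, but methods that answer a single element always return a Character. For example, 'alphabet' at: 6 returns the character $b. For a complete list of substring-related messages, browse the SequenceableCollection class (especially the accessing protocol).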

Some predicates on strings

The following examples illustrate isEmpty, includes: and anySatisfy:. These messages are not limited to strings; they apply to collections in general.

'Hello' isEmpty
>>> false

'Hello' includes: $a
>>> false

'JOE' anySatisfy: [ :c | c isLowercase ]
>>> false

'Joe' anySatisfy: [ :c | c isLowercase ]
>>> true

String templating

Three messages are useful for string templating: format:, expandMacros and expandMacrosWith:.

'{1} is {2}' format: {'Pharo' . 'cool'}
>>> 'Pharo is cool'

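The expandMacros family of messages provides variable substitution. It uses <n> for a carriage return, <t> for a tab, <1s>, <2s>, <3s> for arguments (<1p>, <2p> wrap the string in single quotes), and <1?value1:value2> for conditionals. [Translator's note: this is quite similar to Lisp's format: <1s> corresponds to ~a and <1p> to ~s.]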

'look-<t>-here' expandMacros
>>> 'look-    -here'

'<1s> is <2s>' expandMacrosWith: 'Pharo' with: 'cool'
>>> 'Pharo is cool'

'<2s> is <1s>' expandMacrosWith: 'Pharo' with: 'cool'
>>> 'cool is Pharo'

'<1p> or <1s>' expandMacrosWith: 'Pharo' with: 'cool'
>>> '''Pharo'' or Pharo'

'<1?Quentin:Thibaut> plays' expandMacrosWith: true
>>> 'Quentin plays'

'<1?Quentin:Thibaut> plays' expandMacrosWith: false
>>> 'Thibaut plays'

Some useful methods

The String class provides many utilities, including asLowercase, asUppercase and capitalized.

'XYZ' asLowercase
>>> 'xyz'

'xyz' asUppercase
>>> 'XYZ'

'tintin' capitalized
>>> 'Tintin'

'Tintin' uncapitalized
>>> 'tintin'

'1.54' asNumber
>>> 1.54

'this sentence is without a doubt far too long' contractTo: 20
>>> 'this sent...too long'

asString vs. printString

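There is usually a difference between two operations: asking an object for its string representation with printString, and converting it to a string with asString. Here is an example of the difference: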

#ASymbol printString
>>> '#ASymbol'

#ASymbol asString
>>> 'ASymbol'

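A symbol is similar to a string, but it is guaranteed to be globally unique. For this reason, symbols are better suited than strings as dictionary keys, especially for instances of IdentityDictionary. For more on strings and symbols, see the [Basic Classes] chapter.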
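You can observe the uniqueness of symbols with ==, which tests identity. In this sketch, two equal strings built at run time are distinct objects. By contrast, a string converted with asSymbol yields the very same object as the literal symbol.

```smalltalk
#pharo == #pharo.                  "answers true"
'pharo' asSymbol == #pharo.        "answers true: asSymbol returns the unique symbol"
'pharo' copy == 'pharo' copy.      "answers false: two separate String objects"
'pharo' copy = 'pharo' copy.       "answers true: same characters"
```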

14.14 Collection iterators

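In Pharo, loops and conditionals are simply messages sent to collections or other objects, such as integers and blocks. There are low-level messages such as to:do:, which evaluates a block with an argument ranging from an initial value to a final value. Beyond these, the collection hierarchy offers a variety of high-level iterators. Using them makes your code more robust and compact.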

Iterating (do:)

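The method do: is the basic collection iterator. It applies its argument (a one-argument block) to each element of the receiver. The following example prints every string in the receiver to the Transcript.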

#('bob' 'joe' 'toto') do: [ :each | Transcript show: each; cr ]

Variants

There are many variants of do:, such as do:without:, doWithIndex: and reverseDo:.

For indexable collections (Array, OrderedCollection, SortedCollection), the message doWithIndex: also gives access to the current index. This message is related to to:do:, which is defined in class Number.

#('bob' 'joe' 'toto')
    doWithIndex: [ :each :i | (each = 'joe') ifTrue: [ ^ i ] ]
>>> 2

For ordered collections, the message reverseDo: traverses the collection in reverse order.
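Here is a minimal sketch of reverseDo:. Instead of printing the elements, it collects them in visiting order into an illustrative OrderedCollection named visited.

```smalltalk
"visited is an illustrative variable"
| visited |
visited := OrderedCollection new.
#('bob' 'joe' 'toto') reverseDo: [ :each | visited add: each ].
visited asArray.     "answers #('toto' 'joe' 'bob')"
```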

The following code shows an interesting message, do:separatedBy:, which executes its second block only between two elements:

| res |
res := ''.
#('bob' 'joe' 'toto')
   do: [ :e | res := res, e ]
   separatedBy: [ res := res, '.' ].
res
>>> 'bob.joe.toto'

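Note that this code is not very efficient, because it creates intermediate strings. It is better to buffer the result in a stream (see the [Stream] chapter):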

String streamContents: [ :stream |
   #('bob' 'joe' 'toto') asStringOn: stream delimiter: '.' ]
>>> 'bob.joe.toto'

Dictionaries

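When do: is sent to a dictionary, the elements it visits are the values, not the associations. To iterate over keys, values or associations explicitly, use keysDo:, valuesDo: or associationsDo: respectively.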

colors := Dictionary newFrom: { #yellow -> Color yellow. #blue ->
    Color blue. #red -> Color red }.
colors keysDo: [ :key | Transcript show: key; cr ].
colors valuesDo: [ :value | Transcript show: value; cr ].
colors associationsDo: [ :association | Transcript show: association; cr ].
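Printing to the Transcript hides the results, so the variant below accumulates them instead. The dictionary prices is invented for the example. It also shows that inject:into:, which is built on do:, sees only a dictionary's values.

```smalltalk
"prices, keys and total are illustrative variables"
| prices keys total |
prices := Dictionary newFrom: { #apple -> 3 . #pear -> 5 }.
keys := OrderedCollection new.
prices keysDo: [ :key | keys add: key ].
total := 0.
prices valuesDo: [ :value | total := total + value ].
total.                                                "answers 8"
prices inject: 0 into: [ :sum :each | sum + each ].   "answers 8: do:-based iterators see the values"
keys asSet = (Set withAll: #(#apple #pear)).          "answers true; key order is unspecified"
```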

14.15 Collecting results (collect:)

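To apply a function to every element of a collection and get a new collection of results, use collect: or another iterator rather than do:. Most of these iterators are in the enumerating protocol of Collection and its subclasses.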

Suppose we want to build, from one collection, a new collection whose elements are twice the originals. With do:, we would have to write:

| double |
double := OrderedCollection new.
#(1 2 3 4 5 6) do: [ :e | double add: 2 * e ].
double
>>> an OrderedCollection(2 4 6 8 10 12)

The message collect: evaluates its argument block on each element and returns a new collection containing the results. With collect:, the code is much simpler:

#(1 2 3 4 5 6) collect: [ :e | 2 * e ]
>>> #(2 4 6 8 10 12)

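The advantages of collect: over do: are even clearer in the next example. We take a collection of integers and produce a new collection containing the absolute values of the original elements: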

aCol := #(2 -3 4 -35 4 11).
result := aCol species new: aCol size.
1 to: aCol size do: [ :each |
  result at: each put: (aCol at: each) abs ].
result
>>> #(2 3 4 35 4 11)

Contrast this with the equivalent expression using collect::

#(2 -3 4 -35 4 11) collect: [ :each | each abs ]
>>> #(2 3 4 35 4 11)

The second solution has another advantage: it also works for sets and bags. In general, avoid do: unless you want to send a message to each element of a collection.

Note that collect: returns the same kind of collection as the receiver. For this reason, the following code fails (a String cannot hold integer values):

'abc' collect: [ :ea | ea asciiValue ]
>>> "error!"

Instead, we must first convert the string to an Array or an OrderedCollection:

'abc' asArray collect: [ :ea | ea asciiValue ]
>>> #(97 98 99)

Strictly speaking, collect: does not guarantee a collection of exactly the receiver's class, only one of the same species. When the receiver is an Interval, the species is Array!

(1 to: 5) collect: [ :ea | ea * 2 ]
>>> #(2 4 6 8 10)

14.16 Selecting and rejecting elements

The message select: returns the elements of the receiver that satisfy a given condition:

(2 to: 20) select: [ :each | each isPrime ]
>>> #(2 3 5 7 11 13 17 19)

The message reject: does the opposite:

(2 to: 20) reject: [ :each | each isPrime ]
>>> #(4 6 8 9 10 12 14 15 16 18 20)

Identifying an element with `detect:`

The message detect: returns the first element of the receiver that matches the block argument.

'through' detect: [ :each | each isVowel ]
>>> $o

The message detect:ifNone: is a variant of detect:. Its second block is evaluated when no element matches.

Smalltalk globals allClasses
    detect: [ :each | '*cobol*' match: each asString]
    ifNone: [ nil ]
>>> nil

Accumulating results with `inject:into:`

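Functional programming languages often provide a higher-order function called fold or reduce. It accumulates a result by repeatedly applying a binary operator over all the elements of a collection. In Pharo this is done with Collection >> inject:into:.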

The first argument is an initial value, and the second is a two-argument block. The block is applied in turn to the result so far and to each element.

(1 to: 100) inject: 0 into: [ :sum :each | sum + each ]
>>> 5050

Another example is the following one-argument block, which computes factorials:

factorial := [ :n |
  (1 to: n)
    inject: 1
    into: [ :product :each | product * each ] ].
factorial value: 10
>>> 3628800
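The accumulated value does not have to be a sum. The sketch below keeps a running maximum and concatenates strings. It also uses subtraction to show that the fold proceeds from the left, computing ((0 - 1) - 2) - 3.

```smalltalk
#(3 9 2) inject: 0 into: [ :best :each | best max: each ].       "answers 9"
#('a' 'b' 'c') inject: '' into: [ :acc :each | acc , each ].      "answers 'abc'"
"A left fold: ((0 - 1) - 2) - 3"
#(1 2 3) inject: 0 into: [ :acc :each | acc - each ].             "answers -6"
```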

14.17 Other higher-order messages

There are many other iterator messages; browse the Collection class to find them. Here is a small selection:

count: returns the number of elements satisfying a condition, expressed as a Boolean-valued block.

Smalltalk globals allClasses
    count: [ :each | 'Collection*' match: each asString ]
>>> 10

includes: checks whether its argument is contained in the collection.

| colors |
colors := {Color white . Color yellow . Color blue . Color orange}.
colors includes: Color blue.
>>> true

anySatisfy: returns true if at least one element of the collection satisfies the condition given as argument.

colors anySatisfy: [ :c | c red > 0.5 ]
>>> true
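The count: result above depends on the classes present in your image. Here is a sketch on a plain Array, where the expected answers are easy to verify by hand. The variable numbers is illustrative.

```smalltalk
"numbers is an illustrative variable"
| numbers |
numbers := #(1 2 3 4 5 6 7 8 9 10).
numbers count: [ :each | each even ].        "answers 5"
numbers anySatisfy: [ :each | each > 9 ].    "answers true"
numbers anySatisfy: [ :each | each > 10 ].   "answers false"
numbers includes: 11.                        "answers false"
```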

14.18 A common mistake: using the result of `add:`

The following is one of the most frequent mistakes in Smalltalk.

| collection |
collection := OrderedCollection new add: 1; add: 2.
collection
>>> 2

Here the variable collection does not hold the newly created collection, but the last number added. This is because add: returns the element added, not the receiver.

The following code gives the expected result:

| collection |
collection := OrderedCollection new.
collection add: 1; add: 2.
collection
>>> an OrderedCollection(1 2)

You can also end a cascade with the message yourself to return the receiver:

| collection |
collection := OrderedCollection new add: 1; add: 2; yourself
>>> an OrderedCollection(1 2)

14.19 A common mistake: removing an element while iterating

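Another mistake is to remove an element from a collection you are currently iterating over. This produces bugs that are hard to track down, because the iteration order may change depending on the collection's storage strategy.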

| range |
range := (2 to: 20) asOrderedCollection.
range do: [ :aNumber | aNumber isPrime
                         ifFalse: [ range remove: aNumber ]].
range
>>> "error!"

The solution is to copy the collection before iterating over it:

| range |
range := (2 to: 20) asOrderedCollection.
range copy do: [ :aNumber |
                   aNumber isPrime
                     ifFalse: [ range remove: aNumber ]].
range
>>> an OrderedCollection(2 3 5 7 11 13 17 19)
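Copying is not the only option. The sketch below shows two alternatives. The first builds a new collection with select: and leaves the original alone. The second asks the collection to remove unwanted elements itself with removeAllSuchThat:. That method is defined on OrderedCollection in the Pharo images we know of; check your image if it is missing.

```smalltalk
"range and primes are illustrative variables"
| range primes |
"1. Build a new collection and leave the original untouched"
primes := (2 to: 20) asOrderedCollection select: [ :each | each isPrime ].
"2. Let the collection remove the matching elements itself"
range := (2 to: 20) asOrderedCollection.
range removeAllSuchThat: [ :each | each isPrime not ].
range.     "answers an OrderedCollection(2 3 5 7 11 13 17 19)"
```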

14.20 A common mistake: not redefining both `=` and `hash`

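A hard-to-find error occurs when you redefine = without redefining hash to match. The symptoms are lost elements in collections, or other strange behaviour. Kent Beck proposed redefining hash with bitXor:. Suppose we want two books to be equal when their titles and authors are the same. We can then redefine both = and hash as follows: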

Book >> = aBook
    self class = aBook class ifFalse: [ ^ false ].
    ^ title = aBook title and: [ authors = aBook authors ]
    
Book >> hash
    ^ title hash bitXor: authors hash
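Book is only a hypothetical class here. Point, however, is a standard class that already redefines both = and hash. The sketch below uses it to show a Set treating equal but distinct objects as a single element.

```smalltalk
"points is an illustrative variable"
| points |
points := Set new.
points add: 3 @ 4; add: 3 @ 4; add: 4 @ 3.
points size.                     "answers 2: the two equal points count once"
(3 @ 4) = (3 @ 4).               "answers true"
(3 @ 4) hash = (3 @ 4) hash.     "answers true: equal objects must have equal hashes"
(3 @ 4) == (3 @ 4).              "answers false: they are still different objects"
```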

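Another serious problem arises if you use a mutable object, one whose hash value can change over time, as an element of a Set or as a key of a Dictionary. Don't do this unless you love debugging!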

14.21 Chapter summary

The collection hierarchy provides a common vocabulary for uniformly manipulating a variety of different kinds of collections.

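A key distinction is between SequenceableCollections, which keep their elements in a given order; Dictionary and its subclasses, which maintain associations between keys and values; and Sets and Bags, which are unordered.
You can convert most collections to another kind of collection by sending them messages such as asArray or asOrderedCollection.
To sort a collection, send it the message asSortedCollection.
#( ... ) creates arrays containing only literal objects (objects created without sending messages); { ... } creates dynamic arrays with a compact syntax.
A Dictionary compares keys by equality and is most useful when the keys are instances of String. An IdentityDictionary compares keys by object identity instead; it is better suited to Symbol keys or to mapping object references to values.
Strings also understand the usual collection messages. In addition, a String supports a simple form of pattern matching. For more advanced applications, see the RegEx package.
The basic iteration message is do:. It is useful for imperative code, such as modifying each element of a collection or sending each element a message.
Rather than do:, use collect:, select:, reject:, includes:, inject:into: and other higher-level messages to process collections uniformly.
Never remove an element from a collection you are iterating over. If you must, iterate over a copy instead.
If you override =, remember to override hash as well!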
