Study Notes: SAM and SPM

Posted by: shili8 | Published: 2025-01-21 17:29

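In natural language processing, the Self-Attention Mechanism (SAM) and Sparse Positional Encoding (abbreviated SPM in these notes) are two important concepts. In Transformer models they handle self-attention and positional encoding, respectively.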

### SAM (Self-Attention Mechanism)

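SAM is a key component of the Transformer model. It lets the model take every element of the sequence into account at once, rather than scanning strictly left to right or right to left. Because any position can attend directly to any other position, the model can capture long-range dependencies, which improves its language understanding.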

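#### How SAM works

The core idea of SAM is to split the input sequence into small units (tokens), each of which represents one position. For every position, the model then computes an attention weight against every position in the sequence. These weights express how important each position is to the one being processed.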
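For reference, the weight computation described here is usually written as scaled dot-product attention. The symbols are not used elsewhere in these notes and follow the common Transformer convention: $Q$, $K$, and $V$ are the query, key, and value matrices obtained from the input by learned linear projections, and $d_k$ is the dimension of each key vector.

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
$$

The softmax is applied row by row, so the weights that one position assigns to all positions are non-negative and sum to 1.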

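Here is a simple example.

Suppose we have the short sequence `[CLS] Hello, world! [SEP]`, treated here as four tokens: `[CLS]`, `Hello`, `world!`, and `[SEP]`.

We can picture it as two chunks, `[CLS] Hello` and `world! [SEP]`, but the grouping is only a visual aid: self-attention computes a value for every pair of tokens, including pairs that sit in different chunks.

For each position, we compute its attention weight against every position. The numbers below are illustrative raw scores; after softmax normalization, the weights for one position sum to 1:

* `[CLS]` with `[CLS]`: 1
* `[CLS]` with `Hello`: 0.5
* `[CLS]` with `world!`: 0.2
* `[CLS]` with `[SEP]`: 0.1

The same computation is repeated for the `Hello`, `world!`, and `[SEP]` positions.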
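Attention weights are usually obtained by passing raw scores through a softmax, so that the weights for one position sum to 1. The following is a minimal sketch of that step in plain Python. Reading the example's four numbers as raw scores is an assumption, and the helper name `softmax` is chosen here for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


tokens = ["[CLS]", "Hello", "world!", "[SEP]"]
raw_scores = [1.0, 0.5, 0.2, 0.1]  # the example values for the [CLS] position

weights = softmax(raw_scores)
for token, weight in zip(tokens, weights):
    print(f"[CLS] -> {token}: {weight:.3f}")
print(f"sum of weights: {sum(weights):.3f}")
```

The ordering of the scores is preserved: `[CLS]` still attends most strongly to itself and least to `[SEP]`, but the values are now a proper distribution.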

#### SAM implementation code

    import math

    import torch
    import torch.nn as nn


    class SelfAttention(nn.Module):
        def __init__(self, num_heads=8, hidden_size=512):
            super().__init__()
            assert hidden_size % num_heads == 0, "hidden_size must be divisible by num_heads"
            self.num_heads = num_heads
            self.hidden_size = hidden_size
            self.head_dim = hidden_size // num_heads
            # A single projection produces queries, keys, and values together.
            self.query_key_value = nn.Linear(hidden_size, 3 * hidden_size)
            # Scaled dot-product attention divides the scores by sqrt(head_dim).
            self.scale = math.sqrt(self.head_dim)

        def forward(self, x):
            batch_size, seq_len, _ = x.size()
            qkv = self.query_key_value(x).view(batch_size, seq_len, 3, self.num_heads, self.head_dim)
            # q, k, v each have shape (batch_size, num_heads, seq_len, head_dim).
            q, k, v = qkv.permute(2, 0, 3, 1, 4)
            attention_weights = torch.softmax(torch.matmul(q, k.transpose(-2, -1)) / self.scale, dim=-1)
            context_vector = torch.matmul(attention_weights, v)
            # Merge the heads back into the hidden dimension.
            return context_vector.transpose(1, 2).reshape(batch_size, seq_len, self.hidden_size)


    # Initialize the SelfAttention model
    self_attention_model = SelfAttention(num_heads=8, hidden_size=512)

    # Forward pass
    input_tensor = torch.randn(1, 5, 512)  # (batch_size, seq_len, hidden_size)
    output_tensor = self_attention_model(input_tensor)
    print(output_tensor.shape)  # torch.Size([1, 5, 512])
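The earlier point that attention is not limited by distance can be checked on a toy case. The sketch below is a simplified single-head attention in plain Python with no learned projections (queries, keys, and values are all the raw vectors), which is an assumption made to keep it short. The 2-D embeddings are made up so that the first and last tokens point in the same direction.

```python
import math


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head attention with Q = K = V = vectors (no learned weights)."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Scaled dot product between this position and every position.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in vectors]
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is a weighted mix of all value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)])
    return outputs, all_weights


# Hypothetical embeddings: positions 0 and 4 are similar, 1 to 3 are different.
tokens = [[2.0, 0.0], [0.0, 2.0], [0.0, 2.0], [0.0, 2.0], [2.0, 0.0]]
outputs, weights = self_attention(tokens)

print("weight from position 0 to its neighbour (1):", round(weights[0][1], 3))
print("weight from position 0 to the far end (4):  ", round(weights[0][4], 3))
```

Position 0 gives more weight to position 4 than to its immediate neighbour, because the weight depends on similarity rather than on how far apart the two positions are.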

### SPM (Sparse Positional Encoding)

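SPM is a positional-encoding method that represents position information as sparse vectors. This gives the model a way to capture position-dependent relationships in a sequence.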

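#### How SPM works

SPM represents position information as a series of sparse vectors, one per position. Most entries of each vector are zero; in the example below, the single nonzero entry marks the index of the position.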

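Here is a simple example.

Suppose we have the short sequence `[CLS] Hello, world! [SEP]`, treated here as four tokens.

SPM assigns each token its own position, and each position maps to a sparse vector. The vectors here have five slots, so the last slot goes unused for this four-token sequence:

* `[CLS]` maps to `[1,0,0,0,0]`
* `Hello` maps to `[0,1,0,0,0]`
* `world!` maps to `[0,0,1,0,0]`
* `[SEP]` maps to `[0,0,0,1,0]`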
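The vectors in the list can be generated mechanically. This is a minimal sketch in plain Python; the helper name `one_hot_position` and the choice of five slots are assumptions taken from the example, not a fixed part of the method.

```python
def one_hot_position(index, num_positions):
    """Return a vector of zeros with a single 1 at `index`."""
    if not 0 <= index < num_positions:
        raise ValueError(f"index {index} is outside 0..{num_positions - 1}")
    vector = [0] * num_positions
    vector[index] = 1
    return vector


tokens = ["[CLS]", "Hello", "world!", "[SEP]"]
num_positions = 5  # five slots, as in the example vectors

encodings = {token: one_hot_position(i, num_positions) for i, token in enumerate(tokens)}
for token, vector in encodings.items():
    print(f"{token:8s} {vector}")

# A sparse representation only needs the index of the nonzero entry.
sparse_indices = [vector.index(1) for vector in encodings.values()]
print(sparse_indices)
```

Storing only the index is what makes the representation cheap: a one-hot vector of any length carries no more information than a single integer.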

#### SPM implementation code

    import torch
    import torch.nn as nn


    class SparsePositionalEncoding(nn.Module):
        def __init__(self, num_positions=512):
            super().__init__()
            self.num_positions = num_positions
            # Lookup table with one row per position. Note that nn.Embedding rows
            # are learned dense vectors, not fixed one-hot vectors.
            self.position_encoding = nn.Embedding(num_positions + 1, num_positions)

        def forward(self, x):
            # x: (batch_size, seq_len); only its length and device are used.
            batch_size, seq_len = x.size()
            positions = torch.arange(seq_len, device=x.device)
            return self.position_encoding(positions)  # (seq_len, num_positions)


    # Initialize the SparsePositionalEncoding model
    sparse_positional_encoding_model = SparsePositionalEncoding(num_positions=512)

    # Forward pass
    input_tensor = torch.randn(1, 5)  # (batch_size, seq_len)
    output_tensor = sparse_positional_encoding_model(input_tensor)
    print(output_tensor.shape)  # torch.Size([5, 512])
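The `nn.Embedding` layer above returns learned dense rows, while the worked example uses one-hot vectors. The two views are related: multiplying a one-hot vector by an embedding's weight matrix selects the same row that a lookup would return. The sketch below illustrates this with `torch.nn.functional.one_hot`; the embedding width of 16 is an arbitrary value chosen for the demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_positions = 512
seq_len = 5
positions = torch.arange(seq_len)

# Explicit one-hot vectors: exactly one nonzero entry per position.
one_hot = F.one_hot(positions, num_classes=num_positions).float()  # (5, 512)
nonzeros_per_row = (one_hot != 0).sum(dim=-1)

# An embedding lookup is equivalent to one-hot times the weight matrix.
embedding = nn.Embedding(num_positions, 16)  # 16 is an arbitrary width
looked_up = embedding(positions)             # (5, 16)
multiplied = one_hot @ embedding.weight      # (5, 16)

print(tuple(one_hot.shape), nonzeros_per_row.tolist())
print(torch.allclose(looked_up, multiplied))
```

In practice the lookup form is preferred because it never materializes the mostly-zero vectors.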

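In these notes we covered the concepts, working principles, and implementation code of SAM and SPM. SAM computes attention weights between positions, and SPM encodes where each token sits; together they help a model capture dependencies in a sequence and improve its language understanding.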
