Brief Notes and Summary of Paper Reading (2023.07.05-2023.07.15)

Posted by: shili8 | Published: 2025-01-11 13:14

**Paper Reading Notes**

**Period**: July 5, 2023 to July 15, 2023

**Topic**: Paper reading notes and summary

**I. Paper List**

The papers read during this period are listed below:

1. **"Attention Is All You Need"** (Vaswani et al., 2017)
2. **"BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"** (Devlin et al., 2019)
3. **"Improving Language Understanding by Generative Models with Adversarial Training"** (Gan et al., 2020)

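**II. Reading Notes**

### 1. "Attention Is All You Need"

This paper proposes the Transformer, a network architecture built entirely on attention, with no recurrence or convolution. Because it does not step through the sequence one position at a time, it can process all positions in parallel. Its core is the self-attention mechanism, which lets the model attend to every element of the input sequence at once.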

**Key points**:

* Self-attention mechanism
* Transformer architecture
* Sequence data processing

**Code example**:

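    import math

    import torch
    import torch.nn as nn


    class SelfAttention(nn.Module):
        def __init__(self, num_heads=8, hidden_size=512):
            super().__init__()
            self.num_heads = num_heads
            self.hidden_size = hidden_size
            self.head_dim = hidden_size // num_heads  # hidden_size must be divisible by num_heads
            # A single projection produces queries, keys and values together.
            self.query_key_value = nn.Linear(hidden_size, 3 * hidden_size)

        def forward(self, x):
            # x: (batch_size, sequence_length, hidden_size)
            b, n, _ = x.shape
            qkv = self.query_key_value(x).view(b, n, 3, self.num_heads, self.head_dim)
            # q, k, v: (batch_size, num_heads, sequence_length, head_dim)
            q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
            weights = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.head_dim), dim=-1)
            # Merge the heads back: (batch_size, sequence_length, hidden_size).
            # The final output projection from the paper is omitted for brevity.
            return (weights @ v).transpose(1, 2).reshape(b, n, self.hidden_size)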
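
To make the arithmetic behind self-attention concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, in plain Python with no tensor library. The function names and the toy `tokens` input are made up for illustration. A real Transformer would first map the input through learned query, key and value projections; this sketch skips that step and feeds the same vectors in as all three.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)])
    return outputs


# Toy self-attention: queries, keys and values all come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each row of `attended` is a weighted mix of all three input vectors, which is what "considering every element of the sequence at once" means in practice.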

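### 2. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"

This paper proposes BERT, a pre-trained language model built on a deep bidirectional Transformer encoder. It is pre-trained with two objectives, masked language modeling (MLM) and next sentence prediction (NSP), and then fine-tuned for language understanding tasks. The code in these notes simplifies the model to self-attention followed by global pooling over the sequence; BERT itself uses the final hidden state of the `[CLS]` token as the sequence-level representation.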

**Key points**:

* Self-attention mechanism
* Global pooling
* Language understanding

**Code example**:

    import torch
    import torch.nn as nn


    class BERT(nn.Module):
        """Toy encoder: one self-attention layer plus pooling, not the full BERT."""

        def __init__(self, hidden_size=512, num_heads=8):
            super().__init__()
            # SelfAttention is the class defined in section 1.
            self.self_attention = SelfAttention(num_heads=num_heads, hidden_size=hidden_size)
            # Averages over the last dimension, which is the sequence after the transpose below.
            self.global_pooling = nn.AdaptiveAvgPool1d(1)

        def forward(self, x):
            # x: (batch_size, sequence_length, hidden_size)
            attention_output = self.self_attention(x)
            # (batch_size, sequence_length, hidden_size) -> (batch_size, hidden_size, sequence_length)
            pooled_output = self.global_pooling(attention_output.transpose(1, 2)).squeeze(-1)
            # pooled_output: (batch_size, hidden_size)
            return pooled_output
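
The class above does not show how BERT is pre-trained, so here is a minimal sketch of the masked language modeling corruption step described in the BERT paper: about 15% of positions are selected, and of those 80% become `[MASK]`, 10% become a random token, and 10% are left unchanged. The `mask_tokens` helper, the tiny `vocab` and the example sentence are hypothetical. The sketch works on plain strings and ignores WordPiece tokenization and special tokens such as `[CLS]` and `[SEP]`.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (corrupted_tokens, labels) using the 80/10/10 masking rule.

    labels[i] holds the original token at selected positions and None elsewhere,
    so a loss would only be computed where a prediction is required.
    """
    if rng is None:
        rng = random.Random()
    corrupted, labels = [], []
    for token in tokens:
        if rng.random() >= mask_prob:
            # Position not selected: keep the token, no prediction target.
            corrupted.append(token)
            labels.append(None)
            continue
        labels.append(token)
        roll = rng.random()
        if roll < 0.8:
            corrupted.append(MASK_TOKEN)         # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
        else:
            corrupted.append(token)              # 10%: keep the original token
    return corrupted, labels


vocab = ["the", "cat", "sat", "on", "mat", "dog"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, labels = mask_tokens(sentence, vocab, rng=random.Random(0))
```

The model is then trained to recover `labels` from `corrupted`, which forces it to use context on both sides of each selected position.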

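### 3. "Improving Language Understanding by Generative Models with Adversarial Training"

As recorded in these notes, this paper applies adversarial training, the mechanism behind generative adversarial networks (GANs), to improve language understanding, rather than introducing GANs themselves. A generator and a discriminator are trained against each other, with the stated goal of strengthening the model's language generation ability.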

**Key points**:

* Adversarial training
* Language generation

**Code example**:

    import torch
    import torch.nn as nn


    class GAN(nn.Module):
        """Forward pass only: the generator encodes x and the discriminator scores it."""

        def __init__(self, hidden_size=512, num_heads=8):
            super().__init__()
            # BERT is the simplified encoder defined in section 2.
            self.generator = BERT(hidden_size=hidden_size, num_heads=num_heads)
            # Assumes the generator returns one hidden_size vector per sequence.
            self.discriminator = nn.Sequential(
                nn.Linear(hidden_size, 128),
                nn.ReLU(),
                nn.Linear(128, 1),
            )

        def forward(self, x):
            # x: (batch_size, sequence_length, hidden_size)
            generated_output = self.generator(x)
            # generated_output: (batch_size, hidden_size)
            discriminator_output = self.discriminator(generated_output).squeeze(-1)
            # One real/fake logit per sequence: (batch_size,)
            return discriminator_output
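
The class above only wires a generator to a discriminator; it contains no adversarial objective. As a rough illustration of what "adversarial training" optimizes, the sketch below computes the standard GAN losses from discriminator probabilities in plain Python. This is the textbook formulation with the non-saturating generator loss, which is an assumption here: the notes do not say which objective the paper uses. The function names are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator: real -> 1, generated -> 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: push D's score on generated samples toward 1."""
    return -math.log(d_fake)


# d_real and d_fake are the discriminator's probabilities (strictly between 0 and 1)
# that a real sample and a generated sample, respectively, are real.
confident = discriminator_loss(d_real=0.9, d_fake=0.1)  # D separates real from generated
fooled = discriminator_loss(d_real=0.9, d_fake=0.9)     # D mistakes generated for real
```

The two losses pull in opposite directions: the discriminator's loss falls as it scores generated samples lower, while the generator's loss falls as those same scores rise. Training alternates updates between the two.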

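**III. Summary**

The papers read during this period focus on the Transformer architecture, the self-attention mechanism, and the use of adversarial training for language understanding. These techniques can improve the performance of language models and their ability to handle language data.

**IV. Future Work**

The next step is to study the potential applications of these techniques further and to try combining them to build stronger language models.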

**V. References**

* Vaswani et al. (2017). Attention Is All You Need.
* Devlin et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
* Gan et al. (2020). Improving Language Understanding by Generative Models with Adversarial Training.

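**VI. Code Notes**

Notes on the classes used in the code examples above:

* `SelfAttention` class: implements the self-attention mechanism.
* `BERT` class: a simplified sketch of the BERT encoder (self-attention plus pooling).
* `GAN` class: connects a generator to a discriminator, as used in adversarial training.

That concludes the reading notes and summary for this period.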
