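Shell command development guidelines

This document contains guidelines for working with processes and files in the GitLab codebase. These guidelines are meant to make your code more reliable and secure.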

References

Use File and FileUtils instead of shell commands

Sometimes we invoke basic Unix commands through the shell when a Ruby API exists for the same operation. If a Ruby API exists, use it.

# Wrong
system "mkdir -p tmp/special/directory"
# Better (separate tokens)
system(*%W(mkdir -p tmp/special/directory))
# Best (do not use a shell command)
FileUtils.mkdir_p "tmp/special/directory"

# Wrong
contents = `cat #{filename}`
# Correct
contents = File.read(filename)

# Sometimes a shell command is the best solution. The example below has no user input and is hard to implement correctly in Ruby:
# delete all files and directories older than 120 minutes under /some/path, but not /some/path itself.
Gitlab::Popen.popen(%W(find /some/path -not -path /some/path -mmin +120 -delete))

This coding style could have prevented CVE-2013-4490.
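
As a rough illustration of the advice above, the following sketch performs a few common file operations with Ruby's standard library instead of shelling out. The helper name and all file names are made up for this example, and everything happens inside a temporary directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Creates a nested directory and a file under +root+, then reads the file back.
# The helper name and the paths are hypothetical and exist only for this example.
def write_and_read_without_shell(root)
  dir = File.join(root, 'tmp', 'special', 'directory')
  FileUtils.mkdir_p(dir)        # instead of: mkdir -p
  file = File.join(dir, 'note.txt')
  File.write(file, "hello\n")   # instead of: echo hello > file
  FileUtils.touch(file)         # instead of: touch
  File.read(file)               # instead of: cat
end

Dir.mktmpdir do |root|
  puts write_and_read_without_shell(root)
end
```

No string is ever handed to /bin/sh here, so file names containing spaces or shell metacharacters need no special treatment.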

Always use the configurable Git binary path for Git commands

# Wrong
system(*%W(git branch -d -- #{branch_name}))

# Correct
system(*%W(#{Gitlab.config.git.bin_path} branch -d -- #{branch_name}))

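Bypass the shell by splitting commands into separate tokens

When we pass a shell command to Ruby as a single string, Ruby lets /bin/sh evaluate the entire string. Essentially, we are asking the shell to evaluate a one-line script. This creates a risk of shell injection attacks. It is better to split the command into tokens ourselves. Sometimes we use the scripting capabilities of the shell to change the working directory or set environment variables. All of this can also be done securely straight from Ruby.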

# Wrong
system "cd /home/git/gitlab && bundle exec rake db:#{something} RAILS_ENV=production"
# Correct
system({'RAILS_ENV' => 'production'}, *%W(bundle exec rake db:#{something}), chdir: '/home/git/gitlab')

# Wrong
system "touch #{myfile}"
# Better
system "touch", myfile
# Best (do not run a shell command at all)
FileUtils.touch myfile

This coding style could have prevented CVE-2013-4546.

For another example, see https://gitlab.com/gitlab-org/gitlab/-/merge_requests/93030 and https://starlabs.sg/blog/2022/07-gitlab-project-import-rce-analysis-cve-2022-2185/.
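
The sketch below shows one way to observe the difference in practice. It runs a child Ruby process with separate tokens, an environment hash, and a chdir option, then checks that a hostile-looking argument is not interpreted by a shell. The helper name, the EXAMPLE_ENV variable, and the hostile string are assumptions made for illustration; Open3 from the standard library is used here in place of Gitlab::Popen so that the example runs on its own.

```ruby
require 'open3'
require 'rbconfig'
require 'tmpdir'

# Runs a child Ruby process without going through /bin/sh.
# The child prints its first argument, one environment variable, and its
# working directory. RbConfig.ruby is used only so the example does not
# depend on a particular Unix tool being installed.
def run_without_shell(argument, workdir)
  script = 'puts ARGV[0], ENV["EXAMPLE_ENV"], Dir.pwd'
  output, status = Open3.capture2(
    { 'EXAMPLE_ENV' => 'production' },            # environment for the child only
    RbConfig.ruby, '-e', script, '--', argument,  # one token per array element
    chdir: workdir                                # replaces "cd ... &&"
  )
  raise "child failed with status #{status.exitstatus}" unless status.success?
  output.lines.map(&:chomp)
end

Dir.mktmpdir do |dir|
  hostile = 'x; touch pwned'
  arg, env, _cwd = run_without_shell(hostile, dir)
  puts arg                                    # x; touch pwned
  puts env                                    # production
  puts File.exist?(File.join(dir, 'pwned'))   # false
end
```

Because the hostile string reaches the child as a single literal argument, the "touch pwned" part is never executed.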

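Separate options from arguments with --

Use -- to make the difference between options and arguments clear to the argument parsers of system commands. Many Unix commands support this, but not all.

To understand what -- does, consider the problem below.

# Example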
$ echo hello > -l
$ cat -l

cat: illegal option -- l
usage: cat [-benstuv] [file ...]

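In the example above, the argument parser of cat assumes that -l is an option. The solution is to make it clear to cat that -l is really an argument, not an option. Many Unix command-line tools follow the convention of separating options from arguments with --.

# Example (continued)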
$ cat -- -l

hello

In the GitLab codebase, we avoid the option/argument ambiguity by always using -- for commands that support it.

# Wrong
system(*%W(#{Gitlab.config.git.bin_path} branch -d #{branch_name}))
# Correct
system(*%W(#{Gitlab.config.git.bin_path} branch -d -- #{branch_name}))

This coding style could have prevented CVE-2013-4582.
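
A small sketch of how the convention might be wrapped in a helper. The helper name is hypothetical, and git_bin stands in for Gitlab.config.git.bin_path so that the example runs outside the GitLab codebase.

```ruby
# Builds the argument list for deleting a branch.
# Everything after "--" is treated by git as a branch name, never as an option.
# Hypothetical helper; git_bin stands in for Gitlab.config.git.bin_path.
def branch_delete_command(git_bin, branch_name)
  [git_bin, 'branch', '-d', '--', branch_name]
end

# A branch name that looks like an option stays in argument position.
command = branch_delete_command('/usr/bin/git', '--force')
p command
```

Passing the result to system(*command) would then run git without a shell, with "--force" handled as a branch name.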

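Do not use backticks

Capturing the output of shell commands with backticks reads nicely, but it forces you to pass the command to the shell as a single string. We explained above that this is unsafe. In the main GitLab codebase, the solution is to use Gitlab::Popen.popen instead.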

# Wrong
logs = `cd #{repo_dir} && #{Gitlab.config.git.bin_path} log`
# Correct
logs, exit_status = Gitlab::Popen.popen(%W(#{Gitlab.config.git.bin_path} log), repo_dir)

# Wrong
user = `whoami`
# Correct
user, exit_status = Gitlab::Popen.popen(%W(whoami))

In other repositories, such as GitLab Shell, you can also use IO.popen.

# Safe IO.popen example
logs = IO.popen(%W(#{Gitlab.config.git.bin_path} log), chdir: repo_dir) { |p| p.read }

Note that, unlike Gitlab::Popen.popen, IO.popen does not capture standard error.
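
When standard error is needed outside the main codebase, one option from Ruby's standard library is Open3.capture2e, which merges the child's standard error into its standard output. The sketch below is an assumption about how this could be done, not a description of how Gitlab::Popen is implemented; the helper name is hypothetical.

```ruby
require 'open3'
require 'rbconfig'

# Runs a command given as an array of tokens and returns
# [combined_output, success]. Standard error is merged into standard output.
# Hypothetical helper for illustration only.
def capture_with_stderr(command)
  output, status = Open3.capture2e(*command)
  [output, status.success?]
end

# The child process writes one line to each stream.
child = [RbConfig.ruby, '-e', '$stdout.puts "to stdout"; $stdout.flush; $stderr.puts "to stderr"']
output, ok = capture_with_stderr(child)
puts output
puts ok
```

As with the other examples, the command is passed as separate tokens, so no shell is involved.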

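Avoid user input at the start of path strings

Various methods for opening and reading files in Ruby can be used to read the standard output of a process instead of a file. The following two commands do roughly the same thing: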

`touch /tmp/pawned-by-backticks`
File.read('|touch /tmp/pawned-by-file-read')

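The key is to open a 'file' whose name starts with |. Affected methods include Kernel#open, File::read, File::open, IO::open and IO::read.

You can protect against this behavior of 'open' and 'read' by ensuring that an attacker cannot control the start of the filename string you are opening. For instance, the following is sufficient to protect against accidentally starting a shell command with |:

# We assume that repo_path is not controlled by the attacker (user)
path = File.join(repo_path, user_input)
# path cannot start with '|' now.
File.read(path)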

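If you have to use a user-supplied relative path, prefix the path with ./.

Prefixing user-supplied paths also offers extra protection against paths starting with - (see the discussion about using -- above).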
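
A minimal sketch of such a prefixing step, assuming a hypothetical helper name. It leaves absolute paths and paths that already start with ./ untouched.

```ruby
# Prefixes a user-supplied relative path with "./" so that the resulting
# string can start with neither "|" nor "-". Hypothetical helper.
def prefix_relative_path(user_path)
  return user_path if user_path.start_with?('/', './')

  "./#{user_path}"
end

puts prefix_relative_path('|touch /tmp/x')    # ./|touch /tmp/x
puts prefix_relative_path('-l')               # ./-l
puts prefix_relative_path('docs/readme.md')   # ./docs/readme.md
```

The prefix only changes how the start of the string is interpreted. It does not stop ../ components, which are a separate concern.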

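Guard against path traversal

Path traversal is a security vulnerability in which the program (GitLab) tries to restrict user access to a certain directory on disk, yet the user manages to open a file outside that directory by using the ../ path notation.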

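# Suppose the user gave us a path and is trying to trick us
user_input = '../other-repo.git/other-file'

# We look up the repository path somewhere
repo_path = 'repositories/user-repo.git'

# The intention of the code below is to open a file under repo_path, but
# because the user used '..' they can 'break out' into
# 'repositories/other-repo.git'
full_path = File.join(repo_path, user_input)
File.open(full_path) do # Oops!

A good way to protect against this is to compare the full path with its 'absolute path' according to Ruby's File.absolute_path. File.absolute_path resolves any .. components, so the two values differ whenever the joined path contains one. The comparison assumes that repo_path is itself an absolute path; with a relative repo_path such as the one in the example above, the check would reject every path.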

full_path = File.join(repo_path, user_input)
if full_path != File.absolute_path(full_path)
  raise "Invalid path: #{full_path.inspect}"
end

File.open(full_path) do # Etc.

A check like this could have avoided CVE-2013-4583.
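
A related approach, sketched below with a hypothetical helper name, is to normalize the joined path and then verify that it still lies under the base directory. This swaps the equality comparison for a prefix check, so it also accepts a relative base directory. It assumes the base directory is not the filesystem root, and it works on the path string only, without resolving symbolic links.

```ruby
# Joins +user_input+ onto +base_dir+ and raises if the result escapes +base_dir+.
# Hypothetical helper; ".." is resolved lexically and symlinks are not followed.
def join_within(base_dir, user_input)
  base = File.expand_path(base_dir)
  full_path = File.expand_path(File.join(base, user_input))
  # The trailing separator prevents "/a/repo-evil" from passing as "/a/repo".
  unless full_path.start_with?(base + File::SEPARATOR)
    raise ArgumentError, "Invalid path: #{user_input.inspect}"
  end

  full_path
end

puts join_within('/repositories/user-repo.git', 'refs/heads/main')

begin
  join_within('/repositories/user-repo.git', '../other-repo.git/other-file')
rescue ArgumentError => e
  puts e.message
end
```

The second call raises because the normalized path points into a sibling repository.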

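Properly anchor regular expressions to the start and end of strings

When using regular expressions to validate user input that is passed as an argument to a shell command, make sure to use the \A and \z anchors, which designate the start and end of the string, rather than ^ and $, or no anchors at all.

If you don't, an attacker could use this to execute commands with potentially harmful effects.

For example, when a project's import_url is validated as shown below, the user could trick GitLab into cloning from a Git repository on the local file system.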

validates :import_url, format: { with: URI.regexp(%w(ssh git http https)) }
# URI.regexp(%w(ssh git http https)) roughly evaluates to /(ssh|git|http|https):(something_that_looks_like_a_url)/

Suppose the user submits the following as their import URL:

file://git:/tmp/lol

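Because the regular expression has no anchors, the git:/tmp/lol part of the value matches, and the validation passes.

When importing, GitLab executes the following command, passing the import_url as an argument: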

git clone file://git:/tmp/lol

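Git ignores the git: part, interprets the path as file:///tmp/lol, and imports the repository into the new project. This could give the attacker access to any repository on the system, whether private or not.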
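
The effect of the different anchors can be seen with a deliberately simplified pattern. The three constants below are assumptions made for illustration and are not the patterns GitLab uses.

```ruby
# Simplified scheme checks, written only to compare anchoring styles.
UNANCHORED    = /(ssh|git|http|https):\S+/
LINE_ANCHOR   = /^(ssh|git|http|https):\S+$/
STRING_ANCHOR = /\A(ssh|git|http|https):\S+\z/

single_line = 'file://git:/tmp/lol'
multi_line  = "file:///tmp/lol\nhttp://example.com"

puts UNANCHORED.match?(single_line)     # true: "git:/tmp/lol" matches mid-string
puts LINE_ANCHOR.match?(single_line)    # false
puts LINE_ANCHOR.match?(multi_line)     # true: ^ also matches after a newline
puts STRING_ANCHOR.match?(multi_line)   # false
puts STRING_ANCHOR.match?('https://example.com/repo.git')   # true
```

In Ruby, ^ and $ match at the start and end of every line, so a value with an embedded newline can slip past them. Only \A and \z pin the match to the whole string.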