
Metrics instrumentation guide

This guide describes how to develop Service Ping metrics using metrics instrumentation.

For a video tutorial, see Adding Service Ping metric via instrumentation class.

Nomenclature

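Instrumentation class:
- Inherits one of the metric classes: DatabaseMetric, NumbersMetric, or GenericMetric.
- Implements the logic that calculates the value for a Service Ping metric.
Metric definition: The Service Data metric YAML definition.
Hardening: Hardening a method is the process that ensures a method fails safe, returning a fallback value like -1.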

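How metrics instrumentation works

Every metric must have a corresponding metric definition to be included in the Service Ping payload. A metric definition can contain an instrumentation_class field, which can be set to a class.

The defined instrumentation class should inherit one of the existing metric classes: DatabaseMetric, NumbersMetric, or GenericMetric.

The current convention is that a single instrumentation class corresponds to a single metric.

Using an instrumentation class ensures that metrics can fail safe individually, without breaking the entire Service Ping generation process.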
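The fail-safe behavior can be illustrated with a plain-Ruby sketch. `SketchMetric`, its `FALLBACK` constant, and the metric keys are hypothetical stand-ins, not GitLab's metric classes. The sketch only shows the general idea: each metric rescues its own errors, so one failing metric reports -1 while the rest of the payload is still generated.

```ruby
# Hypothetical stand-in for a hardened metric; not GitLab's metric base classes.
class SketchMetric
  FALLBACK = -1

  def initialize(&block)
    @block = block
  end

  # Any StandardError raised while computing the value is swallowed and
  # replaced by FALLBACK, so the caller never sees the exception.
  def value
    @block.call
  rescue StandardError
    FALLBACK
  end
end

# Hypothetical key paths: one healthy metric and one whose query fails.
metrics = {
  'counts.issues' => SketchMetric.new { 42 },
  'counts.broken' => SketchMetric.new { raise 'query timed out' }
}

# Each metric is evaluated independently; the failure does not abort the loop.
payload = metrics.transform_values(&:value)
puts payload.inspect
```

Running it produces a payload in which counts.issues is 42 and counts.broken is -1.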

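Database metrics

Whenever possible, we recommend using internal event tracking instead of database metrics. Database metrics can create unnecessary load on the database of larger GitLab instances, and potential optimizations can affect instance performance.

You can use database metrics to track data kept in the database, for example, a count of issues that exist on a given instance.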

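operation: The operation to perform on the given relation; one of count, distinct_count, sum, or average.
relation: Assigns a lambda that returns the ActiveRecord::Relation for the objects to perform the operation on. The assigned lambda can accept at most one parameter. The parameter is hashed and stored under the options key in the metric definition.
start: Specifies the start value of the batch counting. Defaults to relation.minimum(:id).
finish: Specifies the end value of the batch counting. Defaults to relation.maximum(:id).
cache_start_and_finish_as: Specifies the cache key for the start and finish values and sets up caching for them. Use this call when start and finish are expensive queries that should be reused between different metric calculations.
available?: Specifies whether the metric should be reported. Defaults to true.
timestamp_column: Optionally specifies the timestamp column used to filter records for time-constrained metrics. Defaults to created_at.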

Example of a merge request that adds a database metric.
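The following minimal in-memory sketch shows how start and finish bound a batch count. `Row`, `ROWS`, and `batch_count` are hypothetical. A real DatabaseMetric runs batched SQL queries over the relation rather than scanning an array. Here start and finish default to the minimum and maximum id, as described above.

```ruby
# Hypothetical in-memory stand-in for a table: each row has an id and a confidential flag.
Row = Struct.new(:id, :confidential)

ROWS = (1..25).map { |i| Row.new(i, i.even?) }

# Counts rows matching `filter`, walking the id range [start, finish] in fixed-size batches.
# start/finish default to the minimum and maximum id of the rows.
def batch_count(rows, filter:, batch_size: 10, start: nil, finish: nil)
  start ||= rows.map(&:id).min
  finish ||= rows.map(&:id).max
  total = 0
  (start..finish).step(batch_size) do |batch_start|
    batch_end = [batch_start + batch_size - 1, finish].min
    total += rows.count { |r| r.id.between?(batch_start, batch_end) && filter.call(r) }
  end
  total
end

confidential = ->(row) { row.confidential }
puts batch_count(ROWS, filter: confidential)                        # ids 1..25
puts batch_count(ROWS, filter: confidential, start: 1, finish: 10)  # ids 1..10 only
```

The first call counts the 12 matching rows across all ids. The second call counts only the 5 matching rows inside the custom 1..10 range.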

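Optimization recommendations and examples

Any single query for a Service Ping metric must stay below 1 second of execution time with a cold cache.

Use specialized indexes. For examples, see these merge requests:
- Example 1
- Example 2
Use defined start and finish values. These values can be memoized and reused, as in this example merge request.
Avoid joins and unnecessary complexity in your queries. See this example merge request.
Set a custom batch_size for distinct_count, as in this example merge request.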
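The following minimal sketch shows why memoizing start and finish helps, assuming a plain Hash as the cache. `BoundaryCache` and the cache keys are hypothetical. They do not reflect how cache_start_and_finish_as is implemented. The sketch only shows that the expensive boundary lookup runs once per key, however many metrics reuse it.

```ruby
# Hypothetical sketch: a plain Hash stands in for the cache. This is not how
# cache_start_and_finish_as is implemented.
class BoundaryCache
  attr_reader :query_count

  def initialize
    @store = {}
    @query_count = 0
  end

  # Runs the (expensive) block only the first time a key is requested.
  def fetch(key)
    return @store[key] if @store.key?(key)

    @query_count += 1
    @store[key] = yield
  end
end

ids = [7, 3, 42, 19] # stand-in for a table's primary key values
cache = BoundaryCache.new

# Two metrics over the same table ask for the same bounds.
2.times do
  start = cache.fetch(:issues_start) { ids.min }
  finish = cache.fetch(:issues_finish) { ids.max }
  puts "batching over ids #{start}..#{finish}"
end
puts "boundary queries run: #{cache.query_count}"
```

Both iterations batch over 3..42, but only two boundary lookups run: one for start and one for finish.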

Database metric examples

Count example

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class CountIssuesMetric < DatabaseMetric
          operation :count

          relation ->(options) { Issue.where(confidential: options[:confidential]) }
        end
      end
    end
  end
end

Batch counters example

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class CountIssuesMetric < DatabaseMetric
          operation :count

          start { Issue.minimum(:id) }
          finish { Issue.maximum(:id) }

          relation { Issue }
        end
      end
    end
  end
end

Distinct batch counters example

# frozen_string_literal: true

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class CountUsersAssociatingMilestonesToReleasesMetric < DatabaseMetric
          operation :distinct_count, column: :author_id

          relation { Release.with_milestones }

          start { Release.minimum(:author_id) }
          finish { Release.maximum(:author_id) }
        end
      end
    end
  end
end

Sum example

# frozen_string_literal: true

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class JiraImportsTotalImportedIssuesCountMetric < DatabaseMetric
          operation :sum, column: :imported_issues_count

          relation { JiraImportState.finished }
        end
      end
    end
  end
end

Average example

# frozen_string_literal: true

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class CountIssuesWeightAverageMetric < DatabaseMetric
          operation :average, column: :weight

          relation { Issue }
        end
      end
    end
  end
end

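Estimated batch counters

The estimated batch counter functionality handles ActiveRecord::StatementInvalid errors when used through the provided estimate_batch_distinct_count method. On error, it returns a value of -1.

This functionality estimates the distinct count of a specific ActiveRecord_Relation in a given column using the HyperLogLog algorithm. Because HyperLogLog is probabilistic, the results always include some error. The highest error rate encountered is 4.9%.

When used correctly, the estimate_batch_distinct_count method enables efficient counting over columns that contain non-unique values, which other counters cannot guarantee.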
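The following minimal in-memory HyperLogLog sketch with 1,024 registers makes the algorithm concrete. `TinyHyperLogLog` is hypothetical and is not GitLab's implementation, which works in batches against the database (see Gitlab::Database::PostgresHll::BatchDistinctCounter). The sketch shows why duplicate values do not inflate the estimate and why the result is approximate rather than exact.

```ruby
require 'digest'

# Hypothetical, in-memory HyperLogLog for illustration only.
class TinyHyperLogLog
  def initialize(precision = 10)
    @p = precision
    @m = 1 << precision            # number of registers (1,024 by default)
    @registers = Array.new(@m, 0)
  end

  def add(value)
    hash = Digest::SHA256.digest(value.to_s).unpack1('Q>') # 64-bit unsigned hash
    index = hash >> (64 - @p)                              # top p bits pick a register
    rest = hash & ((1 << (64 - @p)) - 1)                   # remaining 64 - p bits
    rank = (64 - @p) - rest.bit_length + 1                 # position of the leftmost 1-bit
    @registers[index] = rank if rank > @registers[index]   # keep the maximum rank seen
  end

  def estimate
    alpha = 0.7213 / (1 + 1.079 / @m)
    raw = alpha * @m * @m / @registers.sum { |r| 2.0**-r }
    zeros = @registers.count(0)
    # Small-range correction (linear counting) while many registers are still empty.
    return @m * Math.log(@m.to_f / zeros) if raw <= 2.5 * @m && zeros.positive?

    raw
  end
end

author_ids = (1..10_000).flat_map { |id| [id, id] } # every author appears twice
hll = TinyHyperLogLog.new
author_ids.each { |id| hll.add(id) }
puts hll.estimate.round # close to 10,000, but usually not exact
```

Adding a value that was already seen cannot raise any register, so the 20,000 non-unique inputs yield the same estimate as the 10,000 distinct ones.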

`estimate_batch_distinct_count` method

Method:

estimate_batch_distinct_count(relation, column = nil, batch_size: nil, start: nil, finish: nil)

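The method includes the following arguments:

relation: The ActiveRecord_Relation to perform the count on.
column: The column to perform the distinct count on. Defaults to the primary key.
batch_size: The batch size. Defaults to Gitlab::Database::PostgresHll::BatchDistinctCounter::DEFAULT_BATCH_SIZE, which is 10,000.
start: A custom start value for the batch counting, to avoid a complex minimum calculation.
finish: A custom end value for the batch counting, to avoid a complex maximum calculation.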

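The method has the following prerequisites:

The supplied relation must include a primary key defined as a numeric column. For example: id bigint NOT NULL.
estimate_batch_distinct_count can handle a joined relation. To use its ability to count non-unique columns, the joined relation must not have a one-to-many relationship, such as has_many :boards.
The start and finish arguments should always represent primary key values, even if the estimated count refers to another column, for example: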

    estimate_batch_distinct_count(::Note, :author_id, start: ::Note.minimum(:id), finish: ::Note.maximum(:id))

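Examples:

Simple execution of the estimated batch counter, with only the relation provided. The returned value represents the estimated number of unique values in the id column (the primary key) of the Project relation: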

    estimate_batch_distinct_count(::Project)

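Execution of the estimated batch counter on a relation with an additional filter applied (.where(time_period)), estimating the number of unique values in a custom column (:author_id). The start and finish parameters together set the boundaries of the range of the provided relation to analyze: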

    estimate_batch_distinct_count(::Note.with_suggestions.where(time_period), :author_id, start: ::Note.minimum(:id), finish: ::Note.maximum(:id))

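Numbers metrics

operation: The operation to perform on the given data block. Currently, only the add operation is supported.
data: A block that contains an array of numbers.
available?: Specifies whether the metric should be reported. Defaults to true.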

# frozen_string_literal: true

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class IssuesBoardsCountMetric < NumbersMetric
          operation :add

          data do |time_frame|
            [
              CountIssuesMetric.new(time_frame: time_frame).value,
              CountBoardsMetric.new(time_frame: time_frame).value
            ]
          end
        end
      end
    end
  end
end

You must also include the instrumentation class name in the YAML setup:

time_frame: 28d
instrumentation_class: IssuesBoardsCountMetric
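The following plain-Ruby sketch illustrates the add operation. It assumes the data block receives the metric's time_frame, as in the example above, and returns an array whose elements are summed. `SketchNumbersMetric` and the per-time-frame values are invented for illustration and are not GitLab's NumbersMetric.

```ruby
# Hypothetical sketch of the numbers-metric idea: a data block maps a time frame
# to an array of numbers, and the :add operation sums them.
class SketchNumbersMetric
  def initialize(time_frame:, &data)
    @time_frame = time_frame
    @data = data
  end

  def value
    @data.call(@time_frame).sum # the :add operation
  end
end

# Invented sub-metric values per time frame, standing in for CountIssuesMetric/CountBoardsMetric.
issues = { '28d' => 120, 'all' => 4_000 }
boards = { '28d' => 8, 'all' => 300 }

metric = SketchNumbersMetric.new(time_frame: '28d') { |tf| [issues[tf], boards[tf]] }
puts metric.value
```

With time_frame set to 28d, the sketch adds 120 and 8 and reports 128.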

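Generic metrics

You can use generic metrics for other metrics, for example, an instance's database version.

value: Specifies the value of the metric.
available?: Specifies whether the metric should be reported. Defaults to true.

Example of a merge request that adds a generic metric.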

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class UuidMetric < GenericMetric
          value do
            Gitlab::CurrentSettings.uuid
          end
        end
      end
    end
  end
end

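Prometheus metrics

This instrumentation class lets you handle Prometheus queries by passing a Prometheus client object as an argument to the value block. Any Prometheus error handling should be done in the block itself.

value: Specifies the value of the metric. The Prometheus client object is passed as the first argument.
available?: Specifies whether the metric should be reported. Defaults to true.

Example of a merge request that adds a Prometheus metric.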

module Gitlab
  module Usage
    module Metrics
      module Instrumentations
        class GitalyApdexMetric < PrometheusMetric
          value do |client|
            result = client.query('avg_over_time(gitlab_usage_ping:gitaly_apdex:ratio_avg_over_time_5m[1w])').first

            next FALLBACK unless result

            result['value'].last.to_f
          end
        end
      end
    end
  end
end
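Because error handling belongs inside the value block, it can help to exercise that logic against a stub client. In the sketch below, `FakePrometheusClient` is hypothetical. It only mimics the shape of a Prometheus instant-query vector result, where each entry's `value` is a `[timestamp, string]` pair. The method `gitaly_apdex` restates the block's logic and adds a `rescue`, so a raised query error also returns FALLBACK. The sample timestamp and value are invented.

```ruby
FALLBACK = -1

# Hypothetical stub: returns a canned Prometheus-style vector result, or raises
# if constructed with an exception. It is not GitLab's Prometheus client.
class FakePrometheusClient
  def initialize(results)
    @results = results
  end

  def query(_promql)
    raise @results if @results.is_a?(Exception)

    @results
  end
end

# The value block's logic restated as a method, with error handling kept inside it:
# an empty result and a raised error both produce FALLBACK.
def gitaly_apdex(client)
  result = client.query('avg_over_time(gitlab_usage_ping:gitaly_apdex:ratio_avg_over_time_5m[1w])').first
  return FALLBACK unless result

  result['value'].last.to_f
rescue StandardError
  FALLBACK
end

healthy = FakePrometheusClient.new([{ 'metric' => {}, 'value' => [1_700_000_000, '0.987'] }])
puts gitaly_apdex(healthy)
puts gitaly_apdex(FakePrometheusClient.new([]))
puts gitaly_apdex(FakePrometheusClient.new(StandardError.new('connection refused')))
```

The healthy client yields 0.987. The empty result and the raised error both yield -1.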

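Create a new metric instrumentation class

The generator takes the class name as an argument and accepts the following options:

--type=TYPE Required. Indicates the metric type. Must be one of: database, generic, redis, numbers.
--operation Required for the database and numbers types.
- For database, must be one of: count, distinct_count, estimate_batch_distinct_count, sum, average.
- For numbers, must be: add.
--ee Indicates whether the metric is for EE.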

rails generate gitlab:usage_metric CountIssues --type database --operation distinct_count
        create lib/gitlab/usage/metrics/instrumentations/count_issues_metric.rb
        create spec/lib/gitlab/usage/metrics/instrumentations/count_issues_metric_spec.rb

After implementing the metric, run Service Ping locally to verify that the metric is included in the payload and works as expected.

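Migrate Service Ping metrics to instrumentation classes

This guide describes how to migrate a Service Ping metric from lib/gitlab/usage_data.rb or ee/lib/ee/gitlab/usage_data.rb to instrumentation classes.

Choose the metric type (for example, a database, numbers, or generic metric).

Determine the location of the instrumentation class: either under ee or outside ee.
Generate the instrumentation class file.
Fill in the instrumentation class body:
- Add the code logic for the metric. This might be similar to the metric implementation in usage_data.rb.
- Add tests for the individual metric in spec/lib/gitlab/usage/metrics/instrumentations/.
- Add tests for Service Ping.
Generate the metric definition file.
Remove the code from lib/gitlab/usage_data.rb or ee/lib/ee/gitlab/usage_data.rb.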
Remove the tests from spec/lib/gitlab/usage_data_spec.rb or ee/spec/lib/ee/gitlab/usage_data_spec.rb.

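Troubleshoot metrics

Sometimes metrics fail for reasons that are not immediately clear. The failures can be related to performance issues or other problems. The following pairing session video gives you an example of an investigation into a real-world failing metric.

Watch the video from Product Intelligence Office Hours on October 27 to learn more about the metrics troubleshooting process.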