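List partition

Description

Add the partitioning key column to the table you are partitioning. Include the partitioning key in the following constraints:

The primary key.
All foreign keys referencing the table to be partitioned.
All unique constraints.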
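
The checklist above can be sketched as a small plain-Ruby audit. This is only an illustration: the `Constraint` struct, the `constraints_missing_partition_key` method, and the sample constraint names are hypothetical and are not part of GitLab or Rails.

```ruby
# Hypothetical, simplified description of a table's constraints.
Constraint = Struct.new(:name, :kind, :columns)

PARTITION_KEY = :partition_id

# Returns the constraints whose column list does not include the partition key.
def constraints_missing_partition_key(constraints, partition_key = PARTITION_KEY)
  constraints.reject { |constraint| constraint.columns.include?(partition_key) }
end

# Sample data (assumed names, for illustration only).
constraints = [
  Constraint.new("table_name_pkey", :primary_key, [:id]),
  Constraint.new("index_name_unique", :unique, [:id, :partition_id]),
  Constraint.new("fk_365d1db505_p", :foreign_key, [:partition_id, :foreign_key_id])
]

missing = constraints_missing_partition_key(constraints)
missing.each do |constraint|
  puts "#{constraint.name} (#{constraint.kind}) must include #{PARTITION_KEY}"
end
```

In this sample only the primary key is reported, because it is the one constraint that does not yet list `partition_id`.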

Example

Step 1 - Add partition key

Add the partitioning key column. For example, in a Rails migration:

class AddPartitionNumberForPartitioning < Gitlab::Database::Migration[2.1]
  TABLE_NAME = :table_name
  COLUMN_NAME = :partition_id
  DEFAULT_VALUE = 100

  def change
    add_column(TABLE_NAME, COLUMN_NAME, :bigint, default: DEFAULT_VALUE)
  end
end

Step 2 - Create required indexes

Add indexes that include the partitioning key column. For example, in a Rails migration:

class PrepareIndexesForPartitioning < Gitlab::Database::Migration[2.1]
  disable_ddl_transaction!

  TABLE_NAME = :table_name
  INDEX_NAME = :index_name

  def up
    add_concurrent_index(TABLE_NAME, [:id, :partition_id], unique: true, name: INDEX_NAME)
  end

  def down
    remove_concurrent_index_by_name(TABLE_NAME, INDEX_NAME)
  end
end

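Step 3 - Enforce unique constraint

Change all unique indexes to include the partitioning key column, including the primary key index. You can start by adding a unique index on [primary_key_column, :partition_id], which is required for the next two steps. For example, in a Rails migration:

class PrepareUniqueConstraintForPartitioning < Gitlab::Database::Migration[2.1]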
  disable_ddl_transaction!

  TABLE_NAME = :table_name
  OLD_UNIQUE_INDEX_NAME = :index_name_unique
  NEW_UNIQUE_INDEX_NAME = :new_index_name

  def up
    add_concurrent_index(TABLE_NAME, [:id, :partition_id], unique: true, name: NEW_UNIQUE_INDEX_NAME)

    remove_concurrent_index_by_name(TABLE_NAME, OLD_UNIQUE_INDEX_NAME)
  end

  def down
    add_concurrent_index(TABLE_NAME, :id, unique: true, name: OLD_UNIQUE_INDEX_NAME)

    remove_concurrent_index_by_name(TABLE_NAME, NEW_UNIQUE_INDEX_NAME)
  end
end

Step 4 - Enforce foreign key constraint

Enforce foreign keys that include the partitioning key column. For example, in a Rails migration:

class PrepareForeignKeyForPartitioning < Gitlab::Database::Migration[2.1]
  disable_ddl_transaction!

  SOURCE_TABLE_NAME = :source_table_name
  TARGET_TABLE_NAME = :target_table_name
  COLUMN = :foreign_key_id
  TARGET_COLUMN = :id
  FK_NAME = :fk_365d1db505_p
  PARTITION_COLUMN = :partition_id

  def up
    add_concurrent_foreign_key(
      SOURCE_TABLE_NAME,
      TARGET_TABLE_NAME,
      column: [PARTITION_COLUMN, COLUMN],
      target_column: [PARTITION_COLUMN, TARGET_COLUMN],
      validate: false,
      on_update: :cascade,
      name: FK_NAME
    )

    # This should be done in a separate post migration when dealing with a high traffic table
    validate_foreign_key(SOURCE_TABLE_NAME, [PARTITION_COLUMN, COLUMN], name: FK_NAME)
  end

  def down
    with_lock_retries do
      remove_foreign_key_if_exists(SOURCE_TABLE_NAME, name: FK_NAME)
    end
  end
end

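The on_update: :cascade option is mandatory if we want the partitioning column to be updatable. It cascades the update to all dependent rows. Without it, updating the partition column on the target table results in a Key is still referenced from table ... error, and updating the partition column on the source table raises a Key is not present in table ... error.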
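
To make the target-side case concrete, here is a toy in-memory model of a composite foreign key on (partition_id, id). It is a sketch only: `ToyForeignKey` and its methods are hypothetical, and the code mimics the observable behavior rather than how PostgreSQL enforces foreign keys.

```ruby
# Toy in-memory model of a composite foreign key (partition_id, id).
# Hypothetical class for illustration; it only mimics the error cases.
class ToyForeignKey
  attr_reader :targets, :sources

  def initialize(on_update_cascade:)
    @cascade = on_update_cascade
    @targets = [] # referenced rows:  { partition_id:, id: }
    @sources = [] # referencing rows: { partition_id:, foreign_key_id: }
  end

  def add_target(partition_id:, id:)
    @targets << { partition_id: partition_id, id: id }
  end

  # A source row is accepted only if the full (partition_id, id) pair exists.
  def add_source(partition_id:, foreign_key_id:)
    exists = @targets.any? { |t| t[:partition_id] == partition_id && t[:id] == foreign_key_id }
    raise "Key is not present in table" unless exists

    @sources << { partition_id: partition_id, foreign_key_id: foreign_key_id }
  end

  # Moves a target row to another partition value.
  def update_target_partition(id:, to:)
    target = @targets.find { |t| t[:id] == id }
    raise KeyError, "no target row with id #{id}" unless target

    dependents = @sources.select do |s|
      s[:foreign_key_id] == id && s[:partition_id] == target[:partition_id]
    end
    raise "Key is still referenced from table" if dependents.any? && !@cascade

    dependents.each { |s| s[:partition_id] = to } # the cascade step
    target[:partition_id] = to
  end
end

fk = ToyForeignKey.new(on_update_cascade: true)
fk.add_target(partition_id: 100, id: 1)
fk.add_source(partition_id: 100, foreign_key_id: 1)
fk.update_target_partition(id: 1, to: 101)
puts fk.sources.first[:partition_id]
```

With `on_update_cascade: false`, the same `update_target_partition` call raises instead of moving the dependent row.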

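Step 5 - Swap primary key

Swap the primary key for one that includes the partitioning key column. This can be done only after all referencing foreign keys include the partition key. For example, in a Rails migration: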

class PreparePrimaryKeyForPartitioning < Gitlab::Database::Migration[2.1]
  disable_ddl_transaction!

  TABLE_NAME = :table_name
  PRIMARY_KEY = :primary_key
  OLD_INDEX_NAME = :old_index_name
  NEW_INDEX_NAME = :new_index_name

  def up
    swap_primary_key(TABLE_NAME, PRIMARY_KEY, NEW_INDEX_NAME)
  end

  def down
    add_concurrent_index(TABLE_NAME, :id, unique: true, name: OLD_INDEX_NAME)
    add_concurrent_index(TABLE_NAME, [:id, :partition_id], unique: true, name: NEW_INDEX_NAME)

    unswap_primary_key(TABLE_NAME, PRIMARY_KEY, OLD_INDEX_NAME)

    # We need to add back referenced FKs if any, eg: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/113725/diffs
  end
end

Do not forget to set the primary key explicitly in your model, because ActiveRecord does not support composite primary keys.

class Model < ApplicationRecord
  self.primary_key = :id
end

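Step 6 - Create parent table and attach existing table as the initial partition

You can now create the parent table and attach the existing table as the initial partition by using the following helpers provided by the database team.

For example, using list partitioning in Rails post-deployment migrations: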

class PrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1]
  include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers

  disable_ddl_transaction!

  TABLE_NAME = :table_name
  PARENT_TABLE_NAME = :p_table_name
  FIRST_PARTITION = 100
  PARTITION_COLUMN = :partition_id

  def up
    prepare_constraint_for_list_partitioning(
      table_name: TABLE_NAME,
      partitioning_column: PARTITION_COLUMN,
      parent_table_name: PARENT_TABLE_NAME,
      initial_partitioning_value: FIRST_PARTITION
    )
  end

  def down
    revert_preparing_constraint_for_list_partitioning(
      table_name: TABLE_NAME,
      partitioning_column: PARTITION_COLUMN,
      parent_table_name: PARENT_TABLE_NAME,
      initial_partitioning_value: FIRST_PARTITION
    )
  end
end

initial_partitioning_value can be an array of values. It must include all existing partition values. For more details, see the related issue.
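
As a rough illustration of why every existing value must be listed, the sketch below builds the kind of CHECK expression that could describe the rows of the existing table, for either a single value or an array. The `partition_check_expression` method is hypothetical, and the exact SQL generated by the GitLab helper is an assumption here, not taken from its source.

```ruby
# Hypothetical helper: builds a CHECK-style expression covering one or
# more partition values. Not the GitLab implementation.
def partition_check_expression(column, values)
  list = Array(values).map { |value| Integer(value) } # accept a scalar or an array
  raise ArgumentError, "at least one partition value is required" if list.empty?

  "#{column} IN (#{list.join(', ')})"
end

puts partition_check_expression(:partition_id, 100)
puts partition_check_expression(:partition_id, [100, 101, 102])
```

If a value already present in the table were missing from the list, the existing rows with that value would not satisfy the expression.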

class ConvertTableToListPartitioning < Gitlab::Database::Migration[2.1]
  include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers

  disable_ddl_transaction!

  TABLE_NAME = :table_name
  PARENT_TABLE_NAME = :p_table_name
  FIRST_PARTITION = 100
  PARTITION_COLUMN = :partition_id

  def up
    convert_table_to_first_list_partition(
      table_name: TABLE_NAME,
      partitioning_column: PARTITION_COLUMN,
      parent_table_name: PARENT_TABLE_NAME,
      initial_partitioning_value: FIRST_PARTITION
    )
  end

  def down
    revert_converting_table_to_first_list_partition(
      table_name: TABLE_NAME,
      partitioning_column: PARTITION_COLUMN,
      parent_table_name: PARENT_TABLE_NAME,
      initial_partitioning_value: FIRST_PARTITION
    )
  end
end

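Do not forget to set the sequence name explicitly in your model, because it will be owned by the routing table and ActiveRecord cannot determine it. This can be cleaned up after table_name is changed to the routing table.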

class Model < ApplicationRecord
  self.sequence_name = 'model_id_seq'
end

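If the partitioning constraint migration takes more than 10 minutes to finish, it can be made to run asynchronously to avoid running the post-deployment migration during busy hours.

Prepend the following migration, AsyncPrepareTableConstraintsForListPartitioning, and use the async: true option. This change marks the partitioning constraint as NOT VALID and enqueues a scheduled job to validate the existing data in the table during the weekend.

The second post-deployment migration, PrepareTableConstraintsForListPartitioning, then only marks the partitioning constraint as validated, because the existing data was already checked during the previous weekend.

For example: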

class AsyncPrepareTableConstraintsForListPartitioning < Gitlab::Database::Migration[2.1]
  include Gitlab::Database::PartitioningMigrationHelpers::TableManagementHelpers

  disable_ddl_transaction!

  TABLE_NAME = :table_name
  PARENT_TABLE_NAME = :p_table_name
  FIRST_PARTITION = 100
  PARTITION_COLUMN = :partition_id

  def up
    prepare_constraint_for_list_partitioning(
      table_name: TABLE_NAME,
      partitioning_column: PARTITION_COLUMN,
      parent_table_name: PARENT_TABLE_NAME,
      initial_partitioning_value: FIRST_PARTITION,
      async: true
    )
  end

  def down
    revert_preparing_constraint_for_list_partitioning(
      table_name: TABLE_NAME,
      partitioning_column: PARTITION_COLUMN,
      parent_table_name: PARENT_TABLE_NAME,
      initial_partitioning_value: FIRST_PARTITION
    )
  end
end

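Step 7 - Re-point foreign keys to parent table

The tables that reference the initial partition must now be updated to point to the parent table. Without this change, records in those tables cannot locate rows in the next partitions, because they look for them in the initial partition.

Steps:

Add the foreign key to the partitioned table and validate it asynchronously.
Validate it synchronously after the asynchronous validation has completed on GitLab.com.
Remove the old foreign key and rename the new one to the old name.

Step 8 - Ensure ID uniqueness across partitions

All uniqueness constraints must include the partitioning key, so duplicate IDs can exist across partitions. To solve this, we enforce that only the database can set the ID values, and we use sequences to generate them because sequences are guaranteed to generate unique values.

For example: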

class EnsureIdUniquenessForPCiBuilds < Gitlab::Database::Migration[2.1]
  include Gitlab::Database::PartitioningMigrationHelpers::UniquenessHelpers

  TABLE_NAME = :p_ci_builds
  SEQ_NAME = :ci_builds_id_seq

  def up
    ensure_unique_id(TABLE_NAME, seq: SEQ_NAME)
  end

  def down
    revert_ensure_unique_id(TABLE_NAME, seq: SEQ_NAME)
  end
end
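
The idea behind this step can be sketched in plain Ruby: a single sequence shared by all partitions hands out every ID, and any ID supplied by the caller is ignored. `ToySequence` and `ToyPartitionedTable` are hypothetical names for illustration; how the real helper enforces this inside the database is not shown here.

```ruby
# Minimal stand-in for a database sequence: each call returns a new value.
class ToySequence
  def initialize(start = 1)
    @next = start
  end

  def nextval
    value = @next
    @next += 1
    value
  end
end

# Toy partitioned table in which only the shared sequence may assign IDs.
class ToyPartitionedTable
  attr_reader :partitions

  def initialize(sequence)
    @sequence = sequence
    @partitions = Hash.new { |hash, key| hash[key] = [] }
  end

  # A caller-supplied id is deliberately ignored (assumed behavior for this sketch).
  def insert(partition_id:, id: nil)
    row = { id: @sequence.nextval, partition_id: partition_id }
    @partitions[partition_id] << row
    row
  end

  def all_ids
    @partitions.values.flatten.map { |row| row[:id] }
  end
end

table = ToyPartitionedTable.new(ToySequence.new)
table.insert(partition_id: 100, id: 1)
table.insert(partition_id: 101, id: 1) # same requested id, different partition
puts table.all_ids.uniq.size == table.all_ids.size
```

Because both inserts draw from the same sequence, the two rows end up with different IDs even though the per-partition unique index alone would have allowed a duplicate.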

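Step 9 - Analyze the partitioned table and create new partitions

The autovacuum daemon does not process partitioned tables. You must periodically run a manual ANALYZE to keep the statistics of the table hierarchy up to date.

Models that implement Ci::Partitionable with the partitioned: true option are analyzed every week by default. To enable this and create new partitions, register the model in the PostgreSQL initializer.

Step 10 - Update the application to use the partitioned table

Now that the parent table is ready, we can update the application to use it: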

class Model < ApplicationRecord
  self.table_name = :partitioned_table
end

Depending on the model, it might be safer to use a change management issue.