Skip to content

MDEV-37009 bulk insert into vector index#5150

Draft
shabbann wants to merge 1 commit into
MariaDB:mainfrom
shabbann:MDEV-37009-bulk-insert-into-vector-index
Draft

MDEV-37009 bulk insert into vector index#5150
shabbann wants to merge 1 commit into
MariaDB:mainfrom
shabbann:MDEV-37009-bulk-insert-into-vector-index

Conversation

@shabbann
Copy link
Copy Markdown
Contributor

Add support for bulk insertion into MHNSW vector indexes during ALTER TABLE

@shabbann shabbann marked this pull request as draft May 31, 2026 04:09
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces bulk insert support for hierarchical navigable small world (mhnsw) indexes during table copy operations. The reviewer identified several critical issues, including compilation errors from accessing bulk_insert_active through the wrong pointer, an uninitialized member variable, and an ignored return value during bulk insert initialization. Suggestions were also made to reset the active flag on failure and correct formatting to match the repository's style.

Comment thread sql/sql_base.cc
Comment on lines +10148 to +10156
DBUG_ASSERT(s->hlindexes() == (hlindex != NULL));
if (hlindex && hlindex->in_use)
{
if (hlindex->bulk_insert_active)
return mhnsw_bulk_insert_row(this, key_info + s->keys);
else
return mhnsw_insert(this, key_info + s->keys);
}
return 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The member bulk_insert_active was added to the TABLE struct in sql/table.h, but here it is accessed via hlindex->bulk_insert_active. Since hlindex is not of type TABLE*, this will cause a compilation error. It should be accessed directly as bulk_insert_active (or this->bulk_insert_active).

Additionally, please update the indentation to 2 spaces to match the repository's implicit style guide.

  DBUG_ASSERT(s->hlindexes() == (hlindex != NULL));
  if (hlindex && hlindex->in_use)
  {
    if (bulk_insert_active)
      return mhnsw_bulk_insert_row(this, key_info + s->keys);
    else
      return mhnsw_insert(this, key_info + s->keys);
  }
  return 0;

Comment thread sql/sql_base.cc
Comment on lines +10216 to +10234
int TABLE::hlindexes_bulk_insert_begin(ha_rows rows)
{
if (hlindex && hlindex->in_use)
{
hlindex->bulk_insert_active= true;
return mhnsw_bulk_insert_begin(this, key_info + s->keys, rows);
}
return 0;
}

int TABLE::hlindexes_bulk_insert_end()
{
if (hlindex && hlindex->in_use)
{
hlindex->bulk_insert_active= false;
return mhnsw_bulk_insert_end(this, key_info + s->keys);
}
return 0;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are three issues here:

  1. bulk_insert_active is a member of TABLE, not hlindex, so accessing it via hlindex->bulk_insert_active will cause a compilation error.
  2. If mhnsw_bulk_insert_begin fails (returns a non-zero error code), bulk_insert_active should be reset to false so that subsequent inserts do not incorrectly attempt bulk insertion.
  3. The indentation should be updated to 2 spaces to match the repository's implicit style guide.
int TABLE::hlindexes_bulk_insert_begin(ha_rows rows)
{
  if (hlindex && hlindex->in_use)
  {
    bulk_insert_active= true;
    int err= mhnsw_bulk_insert_begin(this, key_info + s->keys, rows);
    if (err)
      bulk_insert_active= false;
    return err;
  }
  return 0;
}

int TABLE::hlindexes_bulk_insert_end()
{
  if (hlindex && hlindex->in_use)
  {
    bulk_insert_active= false;
    return mhnsw_bulk_insert_end(this, key_info + s->keys);
  }
  return 0;
}

Comment thread sql/sql_table.cc
Comment on lines +12666 to 12677
if (!to->s->long_unique_table)
{
to->file->ha_start_bulk_insert(from->file->stats.records,
ignore ? 0 : HA_CREATE_UNIQUE_INDEX_BY_SORT);
bulk_insert_started= 1;
to->file->ha_start_bulk_insert(from->file->stats.records,
ignore ? 0 : HA_CREATE_UNIQUE_INDEX_BY_SORT);
bulk_insert_started= 1;

if (to->s->hlindexes())
{
to->hlindexes_bulk_insert_begin(from->file->stats.records);
hlindex_bulk_started= 1;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The return value of to->hlindexes_bulk_insert_begin is currently ignored. If the bulk insert initialization fails, we should handle the error, print it, and avoid setting hlindex_bulk_started to 1 so that we do not proceed with the bulk insert or attempt to end it later.

Additionally, please update the indentation to 2 spaces to match the repository's implicit style guide.

  if (!to->s->long_unique_table)
  {
    to->file->ha_start_bulk_insert(from->file->stats.records,
                                   ignore ? 0 : HA_CREATE_UNIQUE_INDEX_BY_SORT);
    bulk_insert_started= 1;

    if (to->s->hlindexes())
    {
      if ((error= to->hlindexes_bulk_insert_begin(from->file->stats.records)))
      {
        to->file->print_error(error, MYF(0));
      }
      else
      {
        hlindex_bulk_started= 1;
      }
    }
  }

Comment thread sql/sql_table.cc
Comment on lines +13009 to +13015
if (hlindex_bulk_started && to->hlindexes_bulk_insert_end() && error <= 0)
{
if (!thd->is_error())
to->file->print_error(my_errno, MYF(0));
error= 1;
}
hlindex_bulk_started=0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Please update the indentation to 2 spaces and add proper spacing around operators (e.g., hlindex_bulk_started = 0;) to match the repository's implicit style guide.

  if (hlindex_bulk_started && to->hlindexes_bulk_insert_end() && error <= 0)
  {
    if (!thd->is_error())
      to->file->print_error(my_errno, MYF(0));
    error= 1;
  }
  hlindex_bulk_started= 0;

Comment thread sql/table.h
*/
bool alias_name_used; /* true if table_name is alias */
bool get_fields_in_item_tree; /* Signal to fix_field */
bool bulk_insert_active; /* mhnsw bulk_insert_started flag */
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The new member variable bulk_insert_active is not initialized. To prevent potential uninitialized memory bugs, it is highly recommended to initialize it to false directly in the struct definition using C++ member initialization.

  bool bulk_insert_active= false;    /* mhnsw bulk_insert_started flag */

@vuvova vuvova requested review from iMineLink and vuvova May 31, 2026 08:35
@vuvova vuvova added the GSoC label May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

2 participants