
fix type in featureselection converters #744

Merged
cristian-tamblay merged 2 commits into develop from feat/fix-type-featureselect on Jun 29, 2026

@Felipedino (Collaborator)

Problem

The feature selection converters (SelectKBest, SelectPercentile,
SelectFdr, SelectFpr, SelectFwe, GenericUnivariateSelect and
VarianceThreshold) hardcoded get_output_type to always return
Float (float64). These converters only drop columns and never modify the
values of the columns they retain, so the declared type was wrong for every
non-float column: an integer column was reported as float64 even though the
underlying data stayed integer.
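To make the mismatch concrete, here is a minimal pure-Python sketch. It assumes a dataset represented as a dict mapping column names to lists of values and type names given as plain strings; `select_columns`, `hardcoded_output_type` and `actual_type` are hypothetical stand-ins, not DashAI's API.

```python
from typing import Dict, List

# Hypothetical column-oriented dataset: column name -> list of values.
Dataset = Dict[str, List[object]]


def select_columns(data: Dataset, keep: List[str]) -> Dataset:
    """Drop every column not in `keep`; retained values are left untouched."""
    return {name: data[name] for name in keep}


def hardcoded_output_type(columns: List[str]) -> Dict[str, str]:
    """Mimics the old behavior: every retained column is declared float64."""
    return {name: "float64" for name in columns}


def actual_type(values: List[object]) -> str:
    """Infer a coarse type name from the stored Python values."""
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "int64"
    if all(isinstance(v, float) for v in values):
        return "float64"
    return "unknown"


data = {"age": [31, 45, 27], "score": [0.5, 0.9, 0.1], "noise": [1, 1, 1]}
kept = select_columns(data, ["age", "score"])
declared = hardcoded_output_type(list(kept))

for name, values in kept.items():
    print(name, "declared:", declared[name], "actual:", actual_type(values))
# age declared: float64 actual: int64
# score declared: float64 actual: float64
```

The integer column comes out of the selection unchanged, yet the hardcoded declaration still calls it float64.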

Solution

Preserve each retained column's original type:

  • Added a fit to the base class FeatureSelectionConverter that records the
    input column types, and a get_output_type that returns each retained
    column's original type (falling back to float64 only when the type is
    unknown). This covers the 6 scikit-learn selectors.
  • Removed the duplicated get_output_type (and the now-unused DashAIDataType
    import) from the 6 selector files, which now inherit the behavior.
  • Applied the same fix to VarianceThreshold, which had the same bug and
    likewise only drops columns.
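The inheritance pattern above can be sketched roughly as follows. This is a simplified model, not DashAI's actual code: types are plain strings rather than DashAIDataType, the data is a dict of columns, and the attribute and method names `column_types`, `selected_columns`, `_select` and the `get_output_type(column_name)` signature are hypothetical. Only the shape of the fix mirrors the description: fit records the input types, get_output_type returns each column's original type, and float64 is used only as a fallback.

```python
from statistics import pvariance
from typing import Dict, List

Dataset = Dict[str, List[object]]
FALLBACK_TYPE = "float64"  # used only when a column's type was never recorded


class FeatureSelectionConverter:
    """Hypothetical base class: remembers input types in fit."""

    def __init__(self) -> None:
        self.column_types: Dict[str, str] = {}
        self.selected_columns: List[str] = []

    def fit(self, data: Dataset, types: Dict[str, str]) -> "FeatureSelectionConverter":
        # Record the original type of every input column before selecting.
        self.column_types = dict(types)
        self.selected_columns = self._select(data)
        return self

    def _select(self, data: Dataset) -> List[str]:
        # Each concrete selector decides which columns to keep.
        raise NotImplementedError

    def get_output_type(self, column_name: str) -> str:
        # Selection never changes values, so the original type still applies.
        return self.column_types.get(column_name, FALLBACK_TYPE)


class VarianceThresholdSketch(FeatureSelectionConverter):
    """Hypothetical selector: keeps columns whose variance exceeds a threshold."""

    def __init__(self, threshold: float = 0.0) -> None:
        super().__init__()
        self.threshold = threshold

    def _select(self, data: Dataset) -> List[str]:
        return [name for name, values in data.items()
                if pvariance(values) > self.threshold]


data = {"age": [31, 45, 27], "score": [0.5, 0.9, 0.1],
        "noise": [1, 1, 1], "extra": [2, 3, 4]}
types = {"age": "int64", "score": "float64", "noise": "int64"}  # "extra" is unknown

conv = VarianceThresholdSketch().fit(data, types)
print(conv.selected_columns)                                # ['age', 'score', 'extra']
print([conv.get_output_type(c) for c in conv.selected_columns])  # ['int64', 'float64', 'float64']
```

The subclass defines only its selection rule; the type bookkeeping lives once in the base class, which is why the duplicated get_output_type could be removed from each selector file.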

Key detail

Types are captured in fit, not in transform. scikit-learn's
_SetOutputMixin.__init_subclass__ automatically wraps any transform
defined on a subclass of a scikit-learn transformer, and the wrapper would
coerce the output back into a pandas DataFrame. fit is never wrapped, and it
always runs before transform, so it is the safe place to record the types.
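The wrapping behavior can be illustrated with a stripped-down imitation of an `__init_subclass__` hook. `AutoWrapMixin` and `Selector` below are hypothetical and much simpler than scikit-learn's private `_SetOutputMixin` (which, among other things, also handles fit_transform and converts output according to the configured container); the sketch only shows why a transform defined on a subclass gets intercepted while fit does not.

```python
import functools


class AutoWrapMixin:
    """Hypothetical imitation of the hook: wraps `transform` on each subclass."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Only wrap a transform defined directly on this subclass.
        if "transform" in cls.__dict__:
            original = cls.__dict__["transform"]

            @functools.wraps(original)
            def wrapped(self, X):
                result = original(self, X)
                # Stand-in for container conversion: post-process the output.
                return {"container": "DataFrame", "data": result}

            cls.transform = wrapped


class Selector(AutoWrapMixin):
    def fit(self, X):
        # fit is never wrapped, so whatever it records stays as written.
        self.seen_types = {name: type(values[0]).__name__ for name, values in X.items()}
        return self

    def transform(self, X):
        # The return value of this method is intercepted by the wrapper.
        return X


sel = Selector().fit({"age": [31, 45]})
print(sel.seen_types)                    # {'age': 'int'}
print(sel.transform({"age": [31, 45]}))  # {'container': 'DataFrame', 'data': {'age': [31, 45]}}
```

Because the hook fires at class-creation time, the subclass author never sees the wrapping happen; anything done in fit, by contrast, runs exactly as written.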

Verification

  • End-to-end test: integer columns stay int64, float columns stay float64,
    and the declared type matches the underlying Arrow data.
  • ruff check passes with no errors.
  • Existing converter tests pass (including test_base_converter_metadata.py).
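A check in the spirit of the end-to-end test could compare each declared type against the values actually stored. The sketch below is hypothetical: `declared_matches_data` and the `PY_TO_DECLARED` mapping are illustrative assumptions, and unlike the real test it inspects plain Python values instead of Arrow data.

```python
from typing import Dict, List

# Hypothetical mapping from Python value types to declared type names.
PY_TO_DECLARED = {int: "int64", float: "float64"}


def declared_matches_data(declared: Dict[str, str],
                          data: Dict[str, List[object]]) -> Dict[str, bool]:
    """Return, per column, whether every stored value matches its declared type."""
    result = {}
    for name, values in data.items():
        expected = declared.get(name)
        result[name] = all(PY_TO_DECLARED.get(type(v)) == expected for v in values)
    return result


data = {"age": [31, 45, 27], "score": [0.5, 0.9, 0.1]}
fixed = {"age": "int64", "score": "float64"}   # per-column types preserved
old = {"age": "float64", "score": "float64"}   # previous hardcoded behavior

print(declared_matches_data(fixed, data))  # {'age': True, 'score': True}
print(declared_matches_data(old, data))    # {'age': False, 'score': True}
```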

Modified files

  • DashAI/back/converters/category/feature_selection.py
  • DashAI/back/converters/scikit_learn/select_k_best.py
  • DashAI/back/converters/scikit_learn/select_percentile.py
  • DashAI/back/converters/scikit_learn/select_fdr.py
  • DashAI/back/converters/scikit_learn/select_fpr.py
  • DashAI/back/converters/scikit_learn/select_fwe.py
  • DashAI/back/converters/scikit_learn/generic_univariate_select.py
  • DashAI/back/converters/scikit_learn/variance_threshold.py

cristian-tamblay changed the base branch from production to develop on June 29, 2026 01:17
cristian-tamblay merged commit 9965e96 into develop on June 29, 2026 (20 checks passed)
cristian-tamblay deleted the feat/fix-type-featureselect branch on June 29, 2026 02:35