0.9.6 by cristian-tamblay · Pull Request #745 · DashAISoftware/dashAI

cristian-tamblay · 2026-06-29T02:43:35Z

Summary

Release 0.9.6. Promotes the accumulated fixes and improvements from develop to production. The bulk of the work is converter/model robustness fixes (feature selection, dimensionality reduction, normalizer, samplers, categorical encoding), a refactor that centralizes categorical encoding and dtype restrictions, packaging fixes so the frozen executable can run HuggingFace image/translation models, and several frontend UX/translation improvements.

Changes (by file)

Converters – feature selection & dimensionality reduction

converters/category/feature_selection.py, select_k_best.py, select_fdr/fpr/fwe.py, select_percentile.py, generic_univariate_select.py: fix typing of feature-selection converters and remove duplicated per-file logic.
converters/scikit_learn/pca.py, category/dimensionality_reduction.py, fast_ica.py, nystroem.py, truncated_svd.py: add n_components validation against input cardinality.
converters/scikit_learn/variance_threshold.py: allow empty output by handling ValueError during fit.
converters/scikit_learn/missing_indicator.py: enhance MissingIndicator with missing-value normalization and indicator columns.
converters/scikit_learn/normalizer.py: clarify Normalizer is row-wise (display name + description); update static/images/normalizer.png.
converters/scikit_learn/additive_chi_2_sampler.py, skewed_chi_2_sampler.py: fix Spanish display names.
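The n_components check can be sketched roughly as follows. This is a minimal illustration, not DashAI's code. `validate_n_components` is a hypothetical helper, and it assumes the PCA-style upper bound of min(n_samples, n_features). Individual estimators can be stricter; for example, TruncatedSVD with the arpack solver requires n_components to be strictly less than the number of features.

```python
def validate_n_components(n_components: int, n_samples: int, n_features: int) -> int:
    """Hypothetical helper: reject n_components values the input cannot support.

    Assumes the PCA-style bound min(n_samples, n_features); real converters
    may apply a different bound per estimator.
    """
    if n_components < 1:
        raise ValueError(f"n_components must be at least 1, got {n_components}")
    limit = min(n_samples, n_features)
    if n_components > limit:
        raise ValueError(
            f"n_components={n_components} exceeds min(n_samples, n_features)={limit}"
        )
    return n_components


if __name__ == "__main__":
    print(validate_n_components(3, n_samples=100, n_features=5))  # 3
    try:
        validate_n_components(10, n_samples=100, n_features=5)
    except ValueError as exc:
        print(exc)  # explains which bound was exceeded
```

Raising early keeps the error in the converter's own terms, which gives the UI something concrete to surface. The alternative is letting the estimator fail deep inside fit.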

Converters – HuggingFace

converters/hugging_face/tokenizer.py, embedding.py: remove original text columns replaced by tok_* / emb_* counterparts; fix column type handling.
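A rough sketch of the column replacement follows. It uses a plain dict of column lists instead of DashAI's dataset type. The `emb_{column}_{i}` naming and the `replace_with_embeddings` helper are assumptions for illustration. The PR itself only states that the emb_* / tok_* columns replace the original text column.

```python
from typing import Callable, Dict, List, Sequence

Table = Dict[str, List]


def replace_with_embeddings(
    table: Table,
    column: str,
    embed: Callable[[str], Sequence[float]],
    prefix: str = "emb",
) -> Table:
    """Hypothetical helper: replace one text column with numeric embedding columns."""
    vectors = [list(embed(text)) for text in table[column]]
    dim = len(vectors[0]) if vectors else 0
    # Every other column is carried over untouched, in its original order.
    out: Table = {name: values for name, values in table.items() if name != column}
    # New numeric columns are appended after the surviving ones.
    for i in range(dim):
        out[f"{prefix}_{column}_{i}"] = [float(vec[i]) for vec in vectors]
    return out


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: length and space count."""
    return [len(text), text.count(" ")]


if __name__ == "__main__":
    reviews = {"id": [1, 2], "review": ["good", "bad movie"]}
    print(replace_with_embeddings(reviews, "review", toy_embed))
```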

Models

models/categorical_encoder_mixin.py (new), models/utils.py, scikit_learn/sklearn_like_model.py, mlp_regression.py: extract shared categorical-encoder mixin; encode categorical features in MLP regression; large cleanup of sklearn_like_model.py.
models/scikit_learn/* (linear/ridge/lasso/logistic/svc/svr/knn/sgd/etc.): remove per-type dtype restrictions in favor of a global dtype blacklist.
models/hugging_face/sd15_depth_controlnet_model.py: preserve source resolution in SD1.5 depth ControlNet.
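The global blacklist reduces to one shared set that is checked everywhere, instead of an allowed-type list on each model. This sketch assumes a hypothetical `NON_ALLOWED_DTYPES` set and hypothetical type names, because the actual blacklisted types are not listed in this PR. It also returns the excluded columns, which mirrors the frontend change that shows excluded data types in the column selector.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical blacklist; the real set lives in DashAI.
NON_ALLOWED_DTYPES: FrozenSet[str] = frozenset({"Text", "Image"})


def split_columns(column_types: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (usable column names, excluded columns mapped to their dtype)."""
    usable: List[str] = []
    excluded: Dict[str, str] = {}
    for name, dtype in column_types.items():
        if dtype in NON_ALLOWED_DTYPES:
            excluded[name] = dtype
        else:
            usable.append(name)
    return usable, excluded


if __name__ == "__main__":
    print(split_columns({"age": "Integer", "bio": "Text", "city": "Categorical"}))
```

With a single set, adding or removing a blocked type is a one-line change. The trade-off, as the Notes point out, is that models which previously accepted a narrower set of types now accept everything not on the list.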

Metrics / Optimizers / Exploration

metrics/regression/explained_variance.py: add MAXIMIZE = True so weighted scores compute correctly.
optimizers/optuna_optimizer.py: make CmaEsSampler work and drop incompatible GridSampler.
exploration/base_explorer.py + explorers: simplify explorer dtype restrictions (metadata adjustments; see the updated test).
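The reason a missing MAXIMIZE flag breaks weighted scores can be shown with one common convention: metrics to be minimized contribute with a flipped sign. DashAI's actual weighting formula is not part of this PR, so `MetricSpec` and `weighted_score` are hypothetical names. The sign-flip rule is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MetricSpec:
    name: str
    maximize: bool  # plays the role of a metric's MAXIMIZE flag
    weight: float = 1.0


def weighted_score(values: Dict[str, float], specs: List[MetricSpec]) -> float:
    """Higher is better: metrics to minimize contribute with a flipped sign."""
    return sum(
        spec.weight * (values[spec.name] if spec.maximize else -values[spec.name])
        for spec in specs
    )


if __name__ == "__main__":
    specs = [MetricSpec("explained_variance", maximize=True), MetricSpec("mse", maximize=False)]
    print(weighted_score({"explained_variance": 0.75, "mse": 0.25}, specs))  # 0.5
```

If explained variance were left with maximize=False, a model with higher explained variance would receive a lower combined score. That inversion is the kind of bug the flag prevents.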

Packaging / Build / CI

dashai.spec, requirements.txt, setup.py, .github/workflows/publish.yml: ship diffusers source and avoid AutoPipeline so the frozen exe can run image models; match GGUF filename case so llama.cpp models load on Linux; render AppImage icon from SVG at high resolution. Bump to 0.9.6.

Frontend

components/notebooks/ColumnSelector.jsx, RightBar.jsx: show excluded data types in column selector; cap "select all" to cardinality limit.
components/converterCreation/ConverterTargetColumnModal.jsx, ParameterStepConverter.jsx, FormConverterSection.jsx, ScopeStepConverter.jsx: modal target column restyle and parameter step fixes.
components/models/ModelComparisonTable.jsx: fix translation handling on language change.
components/generative/GenerativeChat.jsx: center chat title, adjust session info layout.
utils/i18n/locales/{de,en,es,pt,zh}/datasets.json: add excluded-data-types translations.

Docs

README.rst: add --force-reinstall --no-cache-dir to pip install commands.
docs/docs/deep-dive/benchmark.md (+ i18n): replace en dashes with hyphens.

Testing (optional)

tests/back/exploration/test_base_explorer_metadata.py updated for explorer metadata changes.
Verify image-generation and translation models load from the frozen/packaged executable (diffusers source + AutoPipeline removal).
On Linux: confirm GGUF (llama.cpp) models load and AppImage icon renders correctly.

Notes (optional)

Version bump to 0.9.6.
The dtype handling change is a behavioral shift: per-model dtype restrictions are replaced by a global blacklist, so any model that relied on the old per-type rules deserves a closer review.

Move the categorical feature/target encoding logic out of SklearnLikeModel into a reusable CategoricalEncoderMixin so tabular models share a single implementation instead of duplicating it. Drop the per-model CATEGORICAL_ENCODING strategy attribute and its enum: a column whose encoder preference is "label" is label-encoded, and every other column is one-hot encoded. The previous fallback only affected columns with an unrecognized encoder preference, which does not occur in practice.
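The encoding rule fits in a few lines. This is a dependency-free sketch, not the mixin itself. `encode_categorical` is a hypothetical function, and categories are sorted alphabetically for determinism. The real mixin may rely on fitted encoder objects rather than plain dicts.

```python
from typing import Dict, List, Optional, Sequence


def encode_categorical(
    name: str, values: Sequence[str], preference: Optional[str]
) -> Dict[str, List[int]]:
    """Label-encode if the column's preference is "label", else one-hot encode."""
    categories = sorted(set(values))  # deterministic category order
    if preference == "label":
        index = {cat: i for i, cat in enumerate(categories)}
        return {name: [index[v] for v in values]}
    # Any other preference (including None) falls through to one-hot.
    return {f"{name}_{cat}": [int(v == cat) for v in values] for cat in categories}


if __name__ == "__main__":
    colors = ["red", "blue", "red"]
    print(encode_categorical("color", colors, "label"))  # {'color': [1, 0, 1]}
    print(encode_categorical("color", colors, None))
```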

MLP regression fed raw Categorical columns straight into torch, crashing with "can't convert np.ndarray of type numpy.object_". It now inherits CategoricalEncoderMixin to encode categorical features (and any categorical target) before tensor conversion. At predict time, the fitted encoders are applied from stored state rather than rebuilt from the dataset's encoder metadata, which can drift between training and prediction. The fitted encoders are persisted in save/load so prediction matches training-time preprocessing.
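The "stored state" point can be made concrete with a short sketch. The class `FittedCategoryState` and its JSON format are hypothetical, and only one-hot encoding is shown. Categories are learned once at fit time, serialized, and reused at predict time. As a result, a prediction batch with an unseen category, or one that lacks a category, still produces the column layout the model was trained on. The output is a plain float matrix that could then be handed to `torch.tensor`.

```python
import json
from typing import Dict, List, Optional, Sequence


class FittedCategoryState:
    """Hypothetical stand-in for encoder state a model could persist with save/load."""

    def __init__(self, categories: Optional[Dict[str, List[str]]] = None) -> None:
        self.categories: Dict[str, List[str]] = categories or {}

    def fit(self, columns: Dict[str, Sequence[str]]) -> "FittedCategoryState":
        # Learn the category lists once, from training data only.
        self.categories = {name: sorted(set(vals)) for name, vals in columns.items()}
        return self

    def transform(self, columns: Dict[str, Sequence[str]]) -> List[List[float]]:
        """One-hot encode with the stored categories into a row-major float matrix."""
        n_rows = len(next(iter(columns.values()), []))
        rows: List[List[float]] = [[] for _ in range(n_rows)]
        for name, cats in self.categories.items():
            for r, value in enumerate(columns[name]):
                # A category never seen during fit yields an all-zero block.
                rows[r].extend(float(value == cat) for cat in cats)
        return rows

    def dumps(self) -> str:
        return json.dumps(self.categories)

    @classmethod
    def loads(cls, payload: str) -> "FittedCategoryState":
        return cls(json.loads(payload))


if __name__ == "__main__":
    state = FittedCategoryState().fit({"city": ["Lima", "Quito", "Lima"]})
    restored = FittedCategoryState.loads(state.dumps())
    print(restored.transform({"city": ["Quito", "Bogota"]}))  # [[0.0, 1.0], [0.0, 0.0]]
```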

Rename the display name to "Row Wise Normalizer" and rewrite the description to state that normalization is applied per row across the selected columns, warning that a single column collapses to plus or minus 1. Update the preview image to reflect row-wise normalization.
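A minimal stdlib version of row-wise L2 normalization shows why a single column collapses. L2 is the default norm of scikit-learn's Normalizer, which also leaves all-zero rows untouched. This is a sketch for intuition, not the converter's code.

```python
import math
from typing import List


def normalize_rows(rows: List[List[float]]) -> List[List[float]]:
    """Scale each row to unit L2 norm; all-zero rows are returned unchanged."""
    out: List[List[float]] = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        out.append([x / norm for x in row] if norm > 0 else list(row))
    return out


if __name__ == "__main__":
    print(normalize_rows([[3.0, 4.0], [1.0, 0.0]]))  # [[0.6, 0.8], [1.0, 0.0]]
    # With one selected column, every nonzero value becomes +1 or -1.
    print(normalize_rows([[-2.0], [5.0], [0.0]]))  # [[-1.0], [1.0], [0.0]]
```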

The packaged .exe strips .py source, but torch.jit.script runs at import time in diffusers (kolors) and needs the original source via inspect.getsource, crashing with 'could not get source code'. Ship diffusers source as a data dir in the spec (same as transformers) so TorchScript can read it, and load sdxl-turbo via StableDiffusionXLPipeline directly to avoid importing every pipeline class through AutoPipeline.
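The failure mode can be reproduced without torch. `inspect.getsource` raises OSError for any function whose source file cannot be found, which is the situation inside a frozen executable that ships only bytecode. The snippet fakes that situation with `compile` and a pseudo-filename. `source_available` is a hypothetical helper, not part of diffusers or PyInstaller.

```python
import inspect
import json


def source_available(func) -> bool:
    """Return True if inspect can recover func's source code."""
    try:
        inspect.getsource(func)
    except (OSError, TypeError):
        return False
    return True


# Build a function whose "file" does not exist on disk, like a frozen .exe
# that ships bytecode but no .py source.
namespace: dict = {}
exec(compile("def scripted(x):\n    return x + 1\n", "<no-source>", "exec"), namespace)

if __name__ == "__main__":
    print(source_available(json.dumps))  # True on a normal install that ships stdlib .py files
    print(source_available(namespace["scripted"]))  # False: inspect raises OSError
```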

The depth map was hardcoded to 512x512, forcing every output to that size regardless of the input image. Interpolate the depth map back to the source resolution instead, rounding each side down to a multiple of 8 as required by the SD 1.5 UNet latent downsampling.
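The size rule is plain integer arithmetic, sketched below. `sd15_output_size` is a hypothetical name. The sketch omits the actual resize of the depth map. With torch, one option would be `torch.nn.functional.interpolate(depth, size=(h, w))`, which takes height before width.

```python
from typing import Tuple


def sd15_output_size(width: int, height: int, multiple: int = 8) -> Tuple[int, int]:
    """Round each side down to a multiple of `multiple` (8 for the SD 1.5 latent grid)."""
    new_w = (width // multiple) * multiple
    new_h = (height // multiple) * multiple
    if new_w == 0 or new_h == 0:
        raise ValueError(f"image {width}x{height} is smaller than {multiple}px on a side")
    return new_w, new_h


if __name__ == "__main__":
    print(sd15_output_size(1023, 769))  # (1016, 768)
```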

Add the cmaes dependency so CmaEsSampler no longer fails with ModuleNotFoundError, and remove GridSampler from the sampler enum since it requires an explicit search_space and cannot run with the bounds-based suggest_int/suggest_float optimization flow.

Llama.from_pretrained matches the filename pattern with fnmatch, which is case-insensitive on Windows but case-sensitive on Linux. The SmolLM patterns used uppercase quant tags (Q4_K_M, Q8_0) while HuggingFaceTB ships lowercase files, and the Mixtral default fallback used a lowercase repo prefix. Both issues surfaced only in the packaged Linux AppImage. Align the patterns and default with the real repo filenames.
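The platform difference comes from the way fnmatch treats case. `fnmatch.fnmatch` passes both the name and the pattern through `os.path.normcase`, which lowercases on Windows and does nothing on POSIX. `fnmatch.fnmatchcase` never folds case. The sketch below uses hypothetical filenames, not the real SmolLM files, to show why an uppercase quant tag matched on Windows but not on Linux.

```python
import fnmatch
from typing import List

# Hypothetical repo listing with lowercase quant tags.
REPO_FILES = ["example-model-q4_k_m.gguf", "example-model-q8_0.gguf"]


def matching_files(pattern: str, case_sensitive: bool = True) -> List[str]:
    """Filter REPO_FILES by a glob pattern.

    case_sensitive=True reproduces fnmatch.fnmatch on Linux; on Windows,
    fnmatch.fnmatch lowercases both sides, which matches case_sensitive=False.
    """
    if case_sensitive:
        return [f for f in REPO_FILES if fnmatch.fnmatchcase(f, pattern)]
    return [f for f in REPO_FILES if fnmatch.fnmatchcase(f.lower(), pattern.lower())]


if __name__ == "__main__":
    print(matching_files("*Q4_K_M.gguf"))  # [] (Linux-like)
    print(matching_files("*Q4_K_M.gguf", case_sensitive=False))  # Windows-like
    print(matching_files("*q4_k_m.gguf"))  # lowercase pattern works everywhere
```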

The AppImage recipe extracted frame 0 of dashAI.ico, a 16x16 entry, so python-appimage installed the icon under hicolor/16x16 and desktops showed a generic icon. Rasterize the scalable dashai-isotype.svg to a 512x512 PNG via rsvg-convert instead, giving desktops a high-resolution icon.

Irozuku and others added 30 commits June 24, 2026 17:31

48b6df4 fix: replace en dashes with hyphens in benchmark docs
f919d54 fix: update translation handling for language changes in ModelComparisonTable
784aaf9 docs: add --force-reinstall --no-cache-dir to README pip install commands
c37ec8a fix: remove original text columns replaced by emb_* counterparts in Embedding class
7a7659c fix: remove original text columns replaced by tok_* counterparts in TokenizerConverter
ac64c23 fix: update Spanish display names for Additive and Skewed Chi² Samplers
4a68536 fix: add MAXIMIZE = True to explained variance metric
1eadfb8 feat: Solve problem with temp_dir in translation
95f5694 fix: cap select all to column cardinality limit
4801d55 refactor: replace per-type dtype restrictions with global dtype blacklist
601e423 feat: show excluded data types in column selector
6866edf feat: add excluded data types translations
c13a778 fix: allow empty output in VarianceThreshold by handling ValueError during fit
8a782b4 style: center chat title and adjust session info layout
9e43267 fix: add validation for n_components in dimensionality reduction converters and update UI to reflect input cardinality requirements
4ebe18b fix: update n_components warning message to use consistent variable naming in translations
c40808c feat: enhance MissingIndicator converter with missing value normalization and indicator columns
995ef58 feat: Change colors and style of the modal target column
da23123 fix type in converters
38f61f9 Merge pull request #722 from DashAISoftware/fix/benchmark-dash-punctuation (Replace en dashes with hyphens in benchmark docs)
57bf38b Merge pull request #723 from DashAISoftware/fix/metric-descriptions-language-update (Update translation handling for language changes in ModelComparisonTable)
8a93e1b Merge pull request #725 from DashAISoftware/docs/readme-pip-force-reinstall (Force clean reinstall in README pip install commands)

cristian-tamblay and others added 19 commits June 28, 2026 20:59

2b145d4 Merge pull request #726 from DashAISoftware/fix/embedding-tokenizer-column-type (Preserve column types after Embedding and Tokenizer conversion)
594b535 Merge pull request #728 from DashAISoftware/fix/mlp-regression-categorical-encoding (Encode categorical features in MLP regression)
690f019 Merge pull request #729 from DashAISoftware/fix/chi2-sampler-spanish-display-names (Update Spanish display names for Additive and Skewed Chi² Samplers)
e5a3972 Merge pull request #730 from DashAISoftware/fix/explained-variance-maximize (Mark ExplainedVariance metric as maximize so weighted scores compute correctly)
ca0bff1 Merge pull request #731 from DashAISoftware/fix/temp_dir (feat: Solve problem with temp_dir in translation)
e6fee4f Merge pull request #732 from DashAISoftware/fix/normalizer-row-wise-clarity (Clarify Normalizer is row wise in display name and description)
7507498 Merge pull request #734 from DashAISoftware/refactor/non-allowed-dtypes (Simplify explorer dtype restrictions and fix select all cap)
529d28c Merge pull request #735 from DashAISoftware/fix/exe-torchscript-source (Fix TorchScript source error for image models in packaged exe)
ff1faae Merge pull request #736 from DashAISoftware/fix/optuna-samplers (Make optuna CmaEsSampler work and drop incompatible GridSampler)
41fa0d6 Merge pull request #737 from DashAISoftware/fix/gguf-filename-case-linux (Match GGUF filename case so llama.cpp models load on Linux)
cabc62c Merge pull request #738 from DashAISoftware/fix/variance-threshold-allow-empty-output (Allow converter VarianceThreshold to remove all features instead of raising)
6239aef Merge pull request #739 from DashAISoftware/style/center-chat-title (Center generative chat header content and remove opacity)
2963d56 Merge pull request #740 from DashAISoftware/fix/appimage-icon-svg (Render AppImage icon from SVG isotype at high resolution)
98cba92 Merge pull request #741 from DashAISoftware/fix/pca-n-components-validation (Add n_components validation and warnings for dimensionality reduction converters)
0d3fa0f Merge pull request #742 from DashAISoftware/fix/missing-indicator-converter (feat: enhance MissingIndicator converter with missing value normaliza…)
af4af9b Merge pull request #743 from DashAISoftware/fix/modal-target (feat: Change colors and style of the modal target column)
43fa7bf Bump to 0.9.6
97eded9 Merge branch 'develop' into feat/fix-type-featureselect
9965e96 Merge pull request #744 from DashAISoftware/feat/fix-type-featureselect (fix type in featureselection converters)

cristian-tamblay merged commit af8ec3c into production Jun 29, 2026
22 checks passed

cristian-tamblay merged 49 commits into production from develop

4 participants
