
0.9.6 #745

Merged
cristian-tamblay merged 49 commits into production from develop on Jun 29, 2026
Conversation

@cristian-tamblay


Summary

Release 0.9.6. Promotes the accumulated fixes and improvements from develop to production. The bulk of the work is converter/model robustness fixes (feature selection, dimensionality reduction, normalizer, samplers, categorical encoding), a refactor that centralizes categorical encoding and dtype restrictions, packaging fixes so the frozen executable can run HuggingFace image/translation models, and several frontend UX/translation improvements.


Type of Change

Check all that apply like this [x]:

  • Backend change
  • Frontend change
  • CI / Workflow change
  • Build / Packaging change
  • Bug fix
  • Documentation

Changes (by file)

Converters – feature selection & dimensionality reduction

  • converters/category/feature_selection.py, select_k_best.py, select_fdr/fpr/fwe.py, select_percentile.py, generic_univariate_select.py: fix typing of feature-selection converters and remove duplicated per-file logic.
  • converters/scikit_learn/pca.py, category/dimensionality_reduction.py, fast_ica.py, nystroem.py, truncated_svd.py: add n_components validation against input cardinality.
  • converters/scikit_learn/variance_threshold.py: allow empty output by handling ValueError during fit.
  • converters/scikit_learn/missing_indicator.py: enhance MissingIndicator with missing-value normalization and indicator columns.
  • converters/scikit_learn/normalizer.py: clarify Normalizer is row-wise (display name + description); update static/images/normalizer.png.
  • converters/scikit_learn/additive_chi_2_sampler.py, skewed_chi_2_sampler.py: fix Spanish display names.
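
The n_components validation mentioned above could look roughly like the sketch below. The helper name `validate_n_components` and the exact rule (at least 1, at most the number of selected input columns) are assumptions made for illustration; the real converters may apply different bounds per algorithm.

```python
def validate_n_components(n_components, n_input_columns):
    """Return n_components if it fits the input, otherwise raise ValueError.

    Hypothetical helper: the bound used here (number of selected input
    columns) is an assumption, not necessarily the rule each converter uses.
    """
    if n_components < 1:
        raise ValueError(f"n_components must be at least 1, got {n_components}.")
    if n_components > n_input_columns:
        raise ValueError(
            f"n_components={n_components} exceeds the "
            f"{n_input_columns} selected input column(s)."
        )
    return n_components
```

Failing early with a readable message lets the UI surface the cardinality requirement before the underlying estimator is fitted.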

Converters – HuggingFace

  • converters/hugging_face/tokenizer.py, embedding.py: remove original text columns replaced by tok_* / emb_* counterparts; fix column type handling.

Models

  • models/categorical_encoder_mixin.py (new), models/utils.py, scikit_learn/sklearn_like_model.py, mlp_regression.py: extract shared categorical-encoder mixin; encode categorical features in MLP regression; large cleanup of sklearn_like_model.py.
  • models/scikit_learn/* (linear/ridge/lasso/logistic/svc/svr/knn/sgd/etc.): remove per-type dtype restrictions in favor of a global dtype blacklist.
  • models/hugging_face/sd15_depth_controlnet_model.py: preserve source resolution in SD1.5 depth ControlNet.

Metrics / Optimizers / Exploration

  • metrics/regression/explained_variance.py: add MAXIMIZE = True.
  • optimizers/optuna_optimizer.py: make CmaEsSampler work and drop incompatible GridSampler.
  • exploration/base_explorer.py + explorers: metadata adjustments (see updated test).

Packaging / Build / CI

  • dashai.spec, requirements.txt, setup.py, .github/workflows/publish.yml: ship diffusers source and avoid AutoPipeline so the frozen exe can run image models; match GGUF filename case so llama.cpp models load on Linux; render AppImage icon from SVG at high resolution. Bump to 0.9.6.

Frontend

  • components/notebooks/ColumnSelector.jsx, RightBar.jsx: show excluded data types in column selector; cap "select all" to cardinality limit.
  • components/converterCreation/ConverterTargetColumnModal.jsx, ParameterStepConverter.jsx, FormConverterSection.jsx, ScopeStepConverter.jsx: modal target column restyle and parameter step fixes.
  • components/models/ModelComparisonTable.jsx: fix translation handling on language change.
  • components/generative/GenerativeChat.jsx: center chat title, adjust session info layout.
  • utils/i18n/locales/{de,en,es,pt,zh}/datasets.json: add excluded-data-types translations.

Docs

  • README.rst: add --force-reinstall --no-cache-dir to pip install commands.
  • docs/docs/deep-dive/benchmark.md (+ i18n): replace en dashes with hyphens.

Testing (optional)

  • tests/back/exploration/test_base_explorer_metadata.py updated for explorer metadata changes.
  • Verify image-generation and translation models load from the frozen/packaged executable (diffusers source + AutoPipeline removal).
  • On Linux: confirm GGUF (llama.cpp) models load and AppImage icon renders correctly.

Notes (optional)

  • Version bump to 0.9.6.
  • The dtype handling change is a behavioral shift: per-model dtype restrictions are replaced by a global blacklist — worth a closer review for any model that relied on the old per-type rules.

Irozuku and others added 30 commits June 24, 2026 17:31
Move the categorical feature/target encoding logic out of SklearnLikeModel into a reusable CategoricalEncoderMixin so tabular models share a single implementation instead of duplicating it.

Drop the per-model CATEGORICAL_ENCODING strategy attribute and its enum: a column whose encoder preference is "label" is label-encoded; everything else is one-hot encoded. The previous fallback only affected columns with an unrecognized encoder preference, which does not occur in practice.

MLP regression fed raw Categorical columns straight into torch, crashing with "can't convert np.ndarray of type numpy.object_". It now inherits CategoricalEncoderMixin to encode categorical features (and any categorical target) before tensor conversion.

At predict time, the fitted encoders are applied from stored state rather than re-derived from the dataset's encoder metadata, which can drift between training and prediction. The fitted encoders are persisted in save/load so that prediction matches training-time preprocessing.
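
A minimal sketch of the idea described above, assuming a toy encoder rather than the actual CategoricalEncoderMixin: categories are learned once at fit time, saved with the model, and reused unchanged at predict time. The class `TinyCategoricalEncoder` and its methods are hypothetical names used only for illustration.

```python
import json


class TinyCategoricalEncoder:
    """Hypothetical stand-in used only for illustration; not DashAI's mixin."""

    def __init__(self, preference="one_hot", categories=None):
        self.preference = preference
        self.categories = list(categories or [])

    def fit(self, values):
        # Learn the category order once, at training time.
        self.categories = sorted(set(values))
        return self

    def transform(self, values):
        # Use only the stored categories; unseen values raise KeyError.
        index = {cat: i for i, cat in enumerate(self.categories)}
        if self.preference == "label":
            return [index[v] for v in values]
        width = len(self.categories)
        return [[1 if index[v] == i else 0 for i in range(width)] for v in values]

    def to_state(self):
        # Plain dict so the fitted state can be saved next to the model.
        return {"preference": self.preference, "categories": self.categories}

    @classmethod
    def from_state(cls, state):
        return cls(state["preference"], state["categories"])


encoder = TinyCategoricalEncoder("one_hot").fit(["red", "green", "red"])
saved = json.dumps(encoder.to_state())  # persisted at save time
restored = TinyCategoricalEncoder.from_state(json.loads(saved))  # at load time
```

Because `restored` carries the fitted category order, prediction does not depend on whatever encoder metadata the prediction dataset happens to have.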
Rename the display name to "Row Wise Normalizer" and rewrite the description to state that normalization is applied per row across the selected columns, warning that a single column collapses to ±1. Update the preview image to reflect row-wise normalization.
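
The single-column warning can be seen in a small pure-Python sketch of row-wise normalization. It assumes the L2 norm (scikit-learn's default for Normalizer); the function name `normalize_rows` is hypothetical.

```python
import math


def normalize_rows(rows):
    """Scale each row to unit L2 norm; all-zero rows are left as zeros."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm if norm else 0.0 for x in row])
    return normalized


# Two columns: each row keeps its direction but is rescaled to length 1.
two_columns = normalize_rows([[3.0, 4.0]])

# One column: every non-zero value is divided by its own magnitude,
# so only the sign survives.
one_column = normalize_rows([[5.0], [-2.0]])
```

With one selected column, the row norm equals the absolute value of the single entry, which is why the output carries no information beyond the sign.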
…un image models

The packaged .exe strips .py source, but torch.jit.script runs at import
time in diffusers (kolors) and needs original source via inspect.getsource,
crashing with 'could not get source code'. Ship diffusers source as a data
dir in the spec (same as transformers) so TorchScript can read it, and load
sdxl-turbo via StableDiffusionXLPipeline directly to avoid importing every
pipeline class through AutoPipeline.

The depth map was hardcoded to 512x512, forcing every output to that size
regardless of the input image. Interpolate the depth map back to the source
resolution instead, rounding each side down to a multiple of 8 as required by
the SD 1.5 UNet latent downsampling.
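
The rounding step described above amounts to integer floor division. A minimal sketch follows; the helper name `snap_to_multiple_of_8` is hypothetical, and the actual model code may do this inline.

```python
def snap_to_multiple_of_8(width, height):
    """Round each side down to the nearest multiple of 8."""
    return (width // 8) * 8, (height // 8) * 8


# A 1023x769 source image would be generated at 1016x768.
target_size = snap_to_multiple_of_8(1023, 769)
```

Sizes that are already multiples of 8, such as 512x512, pass through unchanged.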
Add the cmaes dependency so CmaEsSampler no longer fails with ModuleNotFoundError, and remove GridSampler from the sampler enum since it requires an explicit search_space and cannot run with the bounds-based suggest_int/suggest_float optimization flow.
Llama.from_pretrained matches the filename pattern with fnmatch, which is case-insensitive on Windows but case-sensitive on Linux. The SmolLM patterns used uppercase quant tags (Q4_K_M, Q8_0) while HuggingFaceTB ships lowercase files, and the Mixtral default fallback used a lowercase repo prefix. Both failed only in the packaged Linux AppImage. Align the patterns and default with the real repo filenames.
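
The mismatch can be reproduced on any platform with `fnmatch.fnmatchcase`, which always compares case-sensitively, as `fnmatch.fnmatch` does on Linux. The file names and the helper `find_matches` below are made up for illustration and are not the real repository contents.

```python
import fnmatch


def find_matches(filenames, pattern, case_sensitive=True):
    """Return the filenames matching a glob pattern."""
    if case_sensitive:
        return [f for f in filenames if fnmatch.fnmatchcase(f, pattern)]
    # Case-insensitive variant: compare lowercased name and pattern.
    return [
        f for f in filenames if fnmatch.fnmatchcase(f.lower(), pattern.lower())
    ]


# Hypothetical repo listing with lowercase quant tags.
repo_files = ["model-q4_k_m.gguf", "model-q8_0.gguf"]

linux_like = find_matches(repo_files, "*Q4_K_M.gguf")  # uppercase tag
fixed = find_matches(repo_files, "*q4_k_m.gguf")  # pattern aligned with files
windows_like = find_matches(repo_files, "*Q4_K_M.gguf", case_sensitive=False)
```

Aligning the pattern with the real file names, as this PR does, keeps the behavior identical across platforms without depending on how each OS normalizes case.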
The AppImage recipe extracted frame 0 of dashAI.ico, a 16x16 entry, so python-appimage installed the icon under hicolor/16x16 and desktops showed a generic icon. Rasterize the scalable dashai-isotype.svg to a 512x512 PNG via rsvg-convert instead, giving desktops a high-resolution icon.
…erters and update UI to reflect input cardinality requirements
…ation

Replace en dashes with hyphens in benchmark docs
…anguage-update

Update translation handling for language changes in ModelComparisonTable
…nstall

Force clean reinstall in README pip install commands
cristian-tamblay and others added 19 commits June 28, 2026 20:59
…olumn-type

Preserve column types after Embedding and Tokenizer conversion
…rical-encoding

Encode categorical features in MLP regression
…display-names

Update Spanish display names for Additive and Skewed Chi² Samplers
…ximize

Mark ExplainedVariance metric as maximize so weighted scores compute correctly
feat: Solve problem with temp_dir in translation
…larity

Clarify Normalizer is row wise in display name and description
Simplify explorer dtype restrictions and fix select all cap
Fix TorchScript source error for image models in packaged exe
Make optuna CmaEsSampler work and drop incompatible GridSampler
Match GGUF filename case so llama.cpp models load on Linux
…low-empty-output

Allow converter VarianceThreshold to remove all features instead of raising
Center generative chat header content and remove opacity
Render AppImage icon from SVG isotype at high resolution
…dation

Add n_components validation and warnings for dimensionality reduction converters
…verter

feat: enhance MissingIndicator converter with missing value normaliza…
feat: Change colors and style of the modal target column
@cristian-tamblay cristian-tamblay merged commit af8ec3c into production Jun 29, 2026
22 checks passed
4 participants