0.9.6 #745
Merged
…okenizerConverter
Move the categorical feature/target encoding logic out of SklearnLikeModel into a reusable CategoricalEncoderMixin so tabular models share a single implementation instead of duplicating it. Drop the per-model CATEGORICAL_ENCODING strategy attribute and its enum: a column whose encoder preference is "label" is label-encoded, and everything else is one-hot encoded. The previous fallback only affected columns with an unrecognized encoder preference, which does not occur in practice.
MLP regression fed raw Categorical columns straight into torch, crashing with "can't convert np.ndarray of type numpy.object_". It now inherits CategoricalEncoderMixin to encode categorical features (and any categorical target) before tensor conversion. At predict time the fitted encoders are applied from stored state rather than derived again from the dataset's encoder metadata, which can drift between training and prediction. The fitted encoders are persisted in save/load so prediction matches training time preprocessing.
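The encode-once, reuse-at-predict idea described above can be sketched in plain Python. This is a minimal illustration only: `FittedColumnEncoder`, `to_state`, and `from_state` are hypothetical names, not the actual CategoricalEncoderMixin API, and raising on unseen categories is an assumption of this sketch.

```python
from typing import Dict, List, Sequence


class FittedColumnEncoder:
    """Hypothetical stand-in for one fitted per-column encoder."""

    def __init__(self, strategy: str) -> None:
        # Rule from the commit message: "label" means label encoding,
        # any other preference falls through to one-hot encoding.
        self.strategy = "label" if strategy == "label" else "onehot"
        self.categories: List[str] = []

    def fit(self, values: Sequence[str]) -> "FittedColumnEncoder":
        # The sorted unique categories are the only state learned during fit.
        self.categories = sorted(set(values))
        return self

    def transform(self, values: Sequence[str]) -> List[List[float]]:
        index = {cat: i for i, cat in enumerate(self.categories)}
        rows: List[List[float]] = []
        for value in values:
            if value not in index:
                # Assumption of this sketch: unseen categories are an error.
                raise ValueError(f"unseen category: {value!r}")
            if self.strategy == "label":
                rows.append([float(index[value])])
            else:
                one_hot = [0.0] * len(self.categories)
                one_hot[index[value]] = 1.0
                rows.append(one_hot)
        return rows

    def to_state(self) -> Dict[str, object]:
        # What a save() would persist next to the model weights.
        return {"strategy": self.strategy, "categories": list(self.categories)}

    @classmethod
    def from_state(cls, state: Dict[str, object]) -> "FittedColumnEncoder":
        encoder = cls(str(state["strategy"]))
        encoder.categories = list(state["categories"])  # type: ignore[arg-type]
        return encoder


# Fit at training time, persist the state, then predict from the restored state.
train_colors = ["red", "green", "blue", "green"]
encoder = FittedColumnEncoder("onehot").fit(train_colors)
restored = FittedColumnEncoder.from_state(encoder.to_state())
encoded_predict = restored.transform(["green", "blue"])
label_encoded = FittedColumnEncoder("label").fit(train_colors).transform(["red"])
```

Because the restored encoder carries its own category list, prediction uses the same column layout as training even if the dataset's encoder metadata has changed in between.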
Rename display name to "Row Wise Normalizer" and rewrite the description to state that normalization is applied per row across the selected columns, warning that a single column collapses to plus or minus 1. Update the preview image to reflect row-wise normalization.
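The single-column warning follows directly from the arithmetic. The sketch below assumes the L2 norm (the scikit-learn `Normalizer` default); `normalize_rows` is a hypothetical helper written for illustration, not the converter's code.

```python
import math
from typing import List


def normalize_rows(rows: List[List[float]]) -> List[List[float]]:
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        normalized.append([v / norm for v in row] if norm > 0 else list(row))
    return normalized


# Two selected columns: each row keeps its direction, only its length changes.
two_columns = normalize_rows([[3.0, 4.0], [1.0, 0.0]])

# One selected column: every non-zero value is divided by its own magnitude,
# so only the sign survives.
one_column = normalize_rows([[250.0], [-0.5], [7.0]])
```

With one column, each row's norm is just the absolute value of its only entry, which is why the magnitude information is lost.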
…un image models
The packaged .exe strips .py source, but torch.jit.script runs at import time in diffusers (kolors) and needs original source via inspect.getsource, crashing with 'could not get source code'. Ship diffusers source as a data dir in the spec (same as transformers) so TorchScript can read it, and load sdxl-turbo via StableDiffusionXLPipeline directly to avoid importing every pipeline class through AutoPipeline.
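The failure mode can be reproduced without torch or a frozen build. The sketch below is an analogy under stated assumptions: it compiles a function from a string under the made-up filename `<stripped>`, so no source file exists for it, which loosely mimics a bundle that ships bytecode only. `source_available` is a hypothetical helper.

```python
import inspect

# Build a function whose code object points at a file that does not exist.
namespace = {"__name__": "stripped_module"}
exec(compile("def scale(x):\n    return x * 2\n", "<stripped>", "exec"), namespace)
scale = namespace["scale"]


def source_available(func) -> bool:
    """Return True if inspect can recover the function's source text."""
    try:
        inspect.getsource(func)
        return True
    except (OSError, TypeError):
        return False


# The function still runs, because execution only needs the compiled code.
result = scale(21)

# Anything that calls inspect.getsource on it fails, as TorchScript does
# according to the commit message above.
has_source = source_available(scale)
```

Ordinary calls are unaffected by missing source, which is why the crash only appears in code paths that introspect source text at import time.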
The depth map was hardcoded to 512x512, forcing every output to that size regardless of the input image. Interpolate the depth map back to the source resolution instead, rounding each side down to a multiple of 8 as required by the SD 1.5 UNet latent downsampling.
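The rounding rule is small enough to show in isolation. This is a sketch of the arithmetic only; `depth_map_size` is a hypothetical helper, and rejecting sides smaller than 8 is an assumption of the sketch rather than something the commit states.

```python
from typing import Tuple


def depth_map_size(width: int, height: int, multiple: int = 8) -> Tuple[int, int]:
    """Round each side down to the nearest multiple of `multiple`."""
    if width < multiple or height < multiple:
        # Assumption: a side that would round down to zero is rejected.
        raise ValueError(f"each side must be at least {multiple} pixels")
    return (width // multiple) * multiple, (height // multiple) * multiple


# A 1023x770 source keeps nearly its full resolution instead of 512x512.
target = depth_map_size(1023, 770)

# Sizes that are already multiples of 8 pass through unchanged.
unchanged = depth_map_size(512, 512)
```

Rounding down rather than up keeps the target inside the source bounds, at the cost of losing at most 7 pixels per side.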
Add the cmaes dependency so CmaEsSampler no longer fails with ModuleNotFoundError, and remove GridSampler from the sampler enum since it requires an explicit search_space and cannot run with the bounds-based suggest_int/suggest_float optimization flow.
Llama.from_pretrained matches the filename pattern with fnmatch, which is case insensitive on Windows but case sensitive on Linux. The SmolLM patterns used uppercase quant tags (Q4_K_M, Q8_0) while HuggingFaceTB ships lowercase files, and the Mixtral default fallback used a lowercase repo prefix. Both failed only in the packaged Linux AppImage. Align the patterns and default with the real repo filenames.
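The platform difference can be reproduced on any OS with the standard library. `fnmatch.fnmatch` normalizes both arguments with `os.path.normcase`, which lowercases on Windows and is a no-op on POSIX, while `fnmatch.fnmatchcase` never normalizes and so behaves like the Linux case everywhere. The file names below are hypothetical placeholders, not the real repository contents.

```python
import fnmatch
from typing import List

# Hypothetical lowercase file names, standing in for a repo's GGUF files.
repo_files = ["smollm-example-q4_k_m.gguf", "smollm-example-q8_0.gguf"]


def match_case_sensitive(pattern: str, files: List[str]) -> List[str]:
    """Match the way fnmatch behaves on Linux, regardless of the host OS."""
    return [name for name in files if fnmatch.fnmatchcase(name, pattern)]


# An uppercase quant tag finds nothing when matching is case sensitive.
uppercase_hits = match_case_sensitive("*Q4_K_M.gguf", repo_files)

# Aligning the pattern with the real lowercase names fixes the lookup.
lowercase_hits = match_case_sensitive("*q4_k_m.gguf", repo_files)
```

Testing patterns with `fnmatchcase` during development on Windows is one way to catch this class of bug before it reaches a Linux package.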
The AppImage recipe extracted frame 0 of dashAI.ico, a 16x16 entry, so python-appimage installed the icon under hicolor/16x16 and desktops showed a generic icon. Rasterize the scalable dashai-isotype.svg to a 512x512 PNG via rsvg-convert instead, giving desktops a high-resolution icon.
…erters and update UI to reflect input cardinality requirements
…aming in translations
…tion and indicator columns
…ation Replace en dashes with hyphens in benchmark docs
…anguage-update Update translation handling for language changes in ModelComparisonTable
…nstall Force clean reinstall in README pip install commands
…olumn-type Preserve column types after Embedding and Tokenizer conversion
…rical-encoding Encode categorical features in MLP regression
…display-names Update Spanish display names for Additive and Skewed Chi² Samplers
…ximize Mark ExplainedVariance metric as maximize so weighted scores compute correctly
feat: Solve problem with temp_dir in translation
…larity Clarify Normalizer is row wise in display name and description
Simplify explorer dtype restrictions and fix select all cap
Fix TorchScript source error for image models in packaged exe
Make optuna CmaEsSampler work and drop incompatible GridSampler
Match GGUF filename case so llama.cpp models load on Linux
…low-empty-output Allow converter VarianceThreshold to remove all features instead of raising
Center generative chat header content and remove opacity
Render AppImage icon from SVG isotype at high resolution
…dation Add n_components validation and warnings for dimensionality reduction converters
…verter feat: enhance MissingIndicator converter with missing value normaliza…
feat: Change colors and style of the modal target column
fix typing in feature selection converters
Summary
Release 0.9.6. Promotes the accumulated fixes and improvements from `develop` to `production`. The bulk of the work is converter/model robustness fixes (feature selection, dimensionality reduction, normalizer, samplers, categorical encoding), a refactor that centralizes categorical encoding and dtype restrictions, packaging fixes so the frozen executable can run HuggingFace image/translation models, and several frontend UX/translation improvements.

Type of Change
Check all that apply like this [x]:
Changes (by file)
Converters – feature selection & dimensionality reduction
- `converters/category/feature_selection.py`, `select_k_best.py`, `select_fdr/fpr/fwe.py`, `select_percentile.py`, `generic_univariate_select.py`: fix typing of feature-selection converters and remove duplicated per-file logic.
- `converters/scikit_learn/pca.py`, `category/dimensionality_reduction.py`, `fast_ica.py`, `nystroem.py`, `truncated_svd.py`: add `n_components` validation against input cardinality.
- `converters/scikit_learn/variance_threshold.py`: allow empty output by handling `ValueError` during `fit`.
- `converters/scikit_learn/missing_indicator.py`: enhance MissingIndicator with missing-value normalization and indicator columns.
- `converters/scikit_learn/normalizer.py`: clarify Normalizer is row-wise (display name + description); update `static/images/normalizer.png`.
- `converters/scikit_learn/additive_chi_2_sampler.py`, `skewed_chi_2_sampler.py`: fix Spanish display names.

Converters – HuggingFace
- `converters/hugging_face/tokenizer.py`, `embedding.py`: remove original text columns replaced by `tok_*`/`emb_*` counterparts; fix column type handling.

Models
- `models/categorical_encoder_mixin.py` (new), `models/utils.py`, `scikit_learn/sklearn_like_model.py`, `mlp_regression.py`: extract shared categorical-encoder mixin; encode categorical features in MLP regression; large cleanup of `sklearn_like_model.py`.
- `models/scikit_learn/*` (linear/ridge/lasso/logistic/svc/svr/knn/sgd/etc.): remove per-type dtype restrictions in favor of a global dtype blacklist.
- `models/hugging_face/sd15_depth_controlnet_model.py`: preserve source resolution in SD1.5 depth ControlNet.

Metrics / Optimizers / Exploration
- `metrics/regression/explained_variance.py`: add `MAXIMIZE = True`.
- `optimizers/optuna_optimizer.py`: make CmaEsSampler work and drop incompatible GridSampler.
- `exploration/base_explorer.py` + explorers: metadata adjustments (see updated test).

Packaging / Build / CI
- `dashai.spec`, `requirements.txt`, `setup.py`, `.github/workflows/publish.yml`: ship `diffusers` source and avoid `AutoPipeline` so the frozen exe can run image models; match GGUF filename case so llama.cpp models load on Linux; render AppImage icon from SVG at high resolution. Bump to 0.9.6.

Frontend
- `components/notebooks/ColumnSelector.jsx`, `RightBar.jsx`: show excluded data types in column selector; cap "select all" to cardinality limit.
- `components/converterCreation/ConverterTargetColumnModal.jsx`, `ParameterStepConverter.jsx`, `FormConverterSection.jsx`, `ScopeStepConverter.jsx`: modal target column restyle and parameter step fixes.
- `components/models/ModelComparisonTable.jsx`: fix translation handling on language change.
- `components/generative/GenerativeChat.jsx`: center chat title, adjust session info layout.
- `utils/i18n/locales/{de,en,es,pt,zh}/datasets.json`: add excluded-data-types translations.

Docs
- `README.rst`: add `--force-reinstall --no-cache-dir` to pip install commands.
- `docs/docs/deep-dive/benchmark.md` (+ i18n): replace en dashes with hyphens.

Testing (optional)
- `tests/back/exploration/test_base_explorer_metadata.py` updated for explorer metadata changes.

Notes (optional)