False negative from csv.Sniffer.has_header with only strings #102140

@Midnighter

Bug report

For the following CSV content, the csv.Sniffer.has_header method returns False although the content clearly has a header row.

sample,fastq_1,fastq_2
A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz
A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz
A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz
AD1-7a,/mnt/scratch/sarek/data/AD1-7a/AD1-7a_R1.fastq.gz,/mnt/scratch/sarek/data/AD1-7a/AD1-7a_R2.fastq.gz
AD1-83a,/mnt/scratch/sarek/data/AD1-83a/AD1-83a_R1.fastq.gz,/mnt/scratch/sarek/data/AD1-83a/AD1-83a_R2.fastq.gz
AD2-60a,/mnt/scratch/sarek/data/AD2-60a/AD2-60a_R1.fastq.gz,/mnt/scratch/sarek/data/AD2-60a/AD2-60a_R2.fastq.gz
Arg1366,/mnt/scratch/sarek/data/Arg1366/Arg1366_R1.fastq.gz,/mnt/scratch/sarek/data/Arg1366/Arg1366_R2.fastq.gz
Br795,/mnt/scratch/sarek/data/Br795/Br795_R1.fastq.gz,/mnt/scratch/sarek/data/Br795/Br795_R2.fastq.gz
Bt100,/mnt/scratch/sarek/data/Bt100/Bt100_R1.fastq.gz,/mnt/scratch/sarek/data/Bt100/Bt100_R2.fastq.gz
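A minimal reproduction sketch, using only the header and the first three data rows from the content above. The name SAMPLE is introduced here for illustration. The report observed False on Python 3.10.10 for the full file; the result on other versions may differ, so the script just prints what it gets.

```python
import csv

# Header plus the first three data rows, copied from the report.
SAMPLE = "\n".join([
    "sample,fastq_1,fastq_2",
    "A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz",
    "A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz",
    "A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz",
]) + "\n"

# has_header() sniffs the dialect itself and returns a bool.
result = csv.Sniffer().has_header(SAMPLE)
print("has_header:", result)
```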

I believe this is due to the following lines in the has_header method.

                if thisType != columnTypes[col]:
                    if columnTypes[col] is None: # add new column type
                        columnTypes[col] = thisType
                    else:
                        # type is inconsistent, remove column from
                        # consideration
                        del columnTypes[col]

When all columns are strings, both thisType and columnTypes[col] are integers denoting the string lengths rather than types. Since the lengths differ from row to row, all the columns are removed and columnTypes ends up being an empty dictionary. With no columns left to vote, the header score stays at zero, which leads to the false negative down the line.
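The bookkeeping described above can be sketched outside of csv.Sniffer. This is a simplified stand-in, not the CPython code: the helper name surviving_column_types is hypothetical, it reads with the default csv dialect instead of sniffing one, and it omits the cap on the number of rows checked. It keeps the same rule of falling back to the string length when a cell does not parse as a number.

```python
import csv
import io

# Header plus the first three data rows, copied from the report.
SAMPLE = "\n".join([
    "sample,fastq_1,fastq_2",
    "A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz",
    "A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz",
    "A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz",
]) + "\n"


def surviving_column_types(sample):
    """Return the columns still under consideration after scanning the data rows."""
    rows = csv.reader(io.StringIO(sample))  # assumption: default dialect, no sniffing
    header = next(rows)
    column_types = {col: None for col in range(len(header))}
    for row in rows:
        if len(row) != len(header):
            continue  # skip rows with an irregular number of columns
        for col in list(column_types):
            try:
                complex(row[col])
                this_type = complex
            except (ValueError, OverflowError):
                this_type = len(row[col])  # fallback: length of the string
            if this_type != column_types[col]:
                if column_types[col] is None:
                    column_types[col] = this_type  # first value seen for this column
                else:
                    del column_types[col]  # inconsistent: drop the column
    return column_types


remaining = surviving_column_types(SAMPLE)
print(remaining)
```

For these rows the sample IDs have lengths 7, 8 and 8 and the paths differ in length as well, so every column is dropped as soon as the second data row is read.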

I believe a special case needs to be introduced for when the values being compared are integer lengths rather than types.
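One possible shape for such a special case, offered only as a sketch and not as the fix CPython would adopt: instead of dropping a string column when its lengths differ, collect the set of lengths seen and let the header vote according to whether its own length appears in that set. The function name has_header_sketch, the use of a set to mark string columns, and the default dialect are all assumptions made for illustration.

```python
import csv
import io

# Header plus the first three data rows, copied from the report.
SAMPLE = "\n".join([
    "sample,fastq_1,fastq_2",
    "A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz",
    "A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz",
    "A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz",
]) + "\n"


def has_header_sketch(sample):
    """Hypothetical has_header variant that keeps string columns of varying length."""
    rows = csv.reader(io.StringIO(sample))  # assumption: default dialect, no sniffing
    header = next(rows)
    column_types = {col: None for col in range(len(header))}
    for row in rows:
        if len(row) != len(header):
            continue
        for col in list(column_types):
            try:
                complex(row[col])
                this_type = complex
            except (ValueError, OverflowError):
                this_type = {len(row[col])}  # a set marks a string column
            known = column_types[col]
            if known is None:
                column_types[col] = this_type
            elif isinstance(known, set) and isinstance(this_type, set):
                known |= this_type  # special case: remember every length seen
            elif known != this_type:
                del column_types[col]  # numbers mixed with strings: drop

    votes = 0
    for col, col_type in column_types.items():
        if col_type is None:
            continue  # no data rows were read for this column
        if isinstance(col_type, set):
            # String column: a header length never seen in the data counts for a header.
            votes += 1 if len(header[col]) not in col_type else -1
        else:
            try:
                complex(header[col])
            except ValueError:
                votes += 1  # numeric column with a non-numeric first row
            else:
                votes -= 1
    return votes > 0


print(has_header_sketch(SAMPLE))
```

This remains a heuristic. With all nine rows from the report, the first column would vote against a header because AD1-7a has the same length as sample, and the two path columns would outvote it.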

Your environment

  • CPython versions tested on: Python 3.10.10
  • Operating system and architecture: Linux helios 5.17.5-76051705-generic #202204271406~1653440576~20.04~6277a18-Ubuntu SMP PREEMPT Thu Ma x86_64 x86_64 x86_64 GNU/Linux

Labels

  • 3.13: bugs and security fixes
  • 3.14: bugs and security fixes
  • 3.15: pre-release feature fixes, bugs and security fixes
  • stdlib: Standard Library Python modules in the Lib/ directory
  • type-bug: An unexpected behavior, bug, or error

Projects

Status
In Progress
