"Symbols must also only contain uppercase Latin letters and Arabic numerals, and cannot be offensive or derogatory."
So how exactly should I abbreviate my new gene dicovery. Cytosine: Upper Nasal and Tracheal?
Geneticists have issued new guidelines in naming human genes – after spending years wrestling with Microsoft Excel and similar software that automatically converts the names of genes to dates. The Gene Nomenclature Committee of the Human Genome Organisation (HUGO), which sets the standard for the titles and shorthand labels of …
Veeeerrrrrryyyy carefully, at least in today's atmosphere of word censorship at the drop of
your knickers a hat.
More to the point, when are they going to get rid of such symbols as BRCA (Breast Cancer)? It is clearly micro aggression, indicating Gene researchers want all women to wear burkas.
For the humo(u)r impaired, that's .... ah, forget it.
Classic examples of how off the shelf software is fundamentally limited to what vendor can understand, and users embodying classic hammering of square peg into round hole.
I have often wondered how much money/time is spent around the world battling business software into half doing something other than business when a bespoke solution would do it out of the box. Odd when you consider the wasted brainpower of who is doing the battling.
Well, but if you are a dedicated scientist, then you won't throw up your hands, but try and find ways
to work around the problem.
For example, if you are faced with a column of text data, where some of it has been changed into dates,
then you can fairly easily copy that column into a new spreadsheet, change the format of that column
to 'Text', and then use the 'Search and replace' function to undo the damage. You could even write
a little macro, or a formula to do something like that. Yes, Excel is wrong and dumb, as they do not let the user decide up front what data format an imported data can have.
Way back when, I once wrote a macro in Lotus 123, that replaced all the missing zeros in a system
that required "left-filled zeros", and formatted the field as text. No good reason why, but can you argue with tax authorities?
With Lotus 123, I could actually do that. With Excel, not so much, because their macro language is not intuitive, not in the least. In fact, I consider it so utterly un-intuitive that it stinks.
Nonetheless, a scientist ought to be resourceful, and if they are not, they are not good scientists.
You can set field types on import.
As someone who works with Market Research data and has to do this very often and would kindly prefer if Microsoft stopped helping, it's a trivial step but seemingly manageable by this uni-drop out, so, one would hope not that difficult for phd's...
People doing data analysis on genetics shouldn't really be using excel, I think the quoted "Biologists in particular are reluctant to invest time in learning programming skills." is somewhat condescending when this is the community that produced Bioconductor. Experts in their field know how to analyse their data.
"I don't know why you think scientists who are experts in their field, should be any better at office software than they are say at plumbing or arc-welding."
They should because the office software is part of their job. Now probably it shouldn't be that big a part, but if they insist on using Excel for their database, then they need to know how to use it for the database-style things they intend to do. Column typing is one of those things.
I'm a programmer. I only need to know how to write code, so I shouldn't have to know very much about infrastructure which I don't administer, right? We probably all know programmers like that. Yet that knowledge is crucial to understanding how my code will be working and therefore making my products useful. It isn't a thing specifically named in my job description, but if I don't know how to do it, I am not as good an employee.
"They should because the office software is part of their job. Now probably it shouldn't be that big a part, but if they insist on using Excel for their database, then they need to know how to use it for the database-style things they intend to do. Column typing is one of those things."
It's no excuse to not know about the tools you use. No chef would say that they don't know about the different knives and pans they use. No violinist would be blase about the violin they use. As a programmer I'm really fussy about my hardware, hosting and tools.
I've known that Excel makes assumptions for years. One of the reasons I try and get people using a combination of SQL Server + some sort of sql reporting tool is that SQL doesn't mess with your data. You can also set up things like SSIS jobs that will import your data in a consistent way (like filtering data or removing columns). And this stuff is cheap now.. Put it on a server and run PowerBI.
I despise the auto data conversion (and many other autocorrect features that seem to get it wrong far more times than they get it right : I turn them off for a more productive experience).
But if you were regularly importing CSV into a spreadsheet, wouldn't you write a macro that set the column types first, then imported it ? And pass that macro around the community ?
"But if you were regularly importing CSV into a spreadsheet, wouldn't you write a macro that set the column types first, then imported it ?"
No, I would continue to expect the spreadsheet to treat my data as data and not pre-mangle it into corrupted data. If some of the data required cell formatting I would do that, as and when I wanted.
Bloody stupid marketing departments !!!!!!!!!
The frustrating thing is that for txt files there is a nice import wizard on open that allows you to set, among other things, the type of each column, so you can label the sensitive ones as text, but CSVs just open and you don't get to stop Excel from doing nasty things to them.
The workaround is to change the extension to txt before opening, but I'm sure that a lot of people in those research institutes aren't aware of it and don't know how or don't even have the permission to make Windows show extension for known types.
The additional problem with CSVs and Excel is regional formatting, because a lot of times a CSV gets exported under one set of local settings and then opened on another computer with different ones, such as reversed decimal and thousands separator (probably more prevalent in the Balkans, where half the computers run with US settings and half with local, but any large international cooperative is likely to bump into something like that).
Excel and CSVs, definitely not a match made in heaven, but if MS simply added an import wizard such as the one for txt files, it would almost make things bearable.
The additional problem with CSVs and Excel is regional formatting, because a lot of times a CSV gets exported under one set of local settings and then opened on another computer with different ones, such as reversed decimal and thousands separator (probably more prevalent in the Balkans, where half the computers run with US settings and half with local, but any large international cooperative is likely to bump into something like that).
How about the field separator? About half the world uses "," and about half uses ";", with the remainder using more exotic characters. Luckily this one can be worked around using a non-data header line specifying the separator.
One of the tasks I have to do often is taking a sub-set of data from a system and producing a report to show status. One of the identifiers is a Hex string but if Excel sees no letters it converts it to a number with exponent and due to the length of the string it will lose the last 8 digits...forever
An option in Excel settings. That's all what is needed. For Christ's sake they have options for every other conceivable thing under the Sun in there except "Would you like me to permanently stop guessing at what you are trying to do? Y/N"
God help us if Excel ever gets a real AI upgrade.
"Hi! Snoopy here, I've just instructed your bank to move all your money into Jelly Beans, based on your spreadsheet, "Investment tutorial for beginners"
I had a similar issue with customer reference numbers for bPay payments. These are 16 digits and excel changes the last digit to zero. I was using google sheets as it didn't do this.
Then google announced an upgrade to make it more compatible with Excel and introduced the same problem.
> An option in Excel settings. That's all what is needed.
No one would use that option. If there is no guessing then when opening any text file it would have to assume all columns are text. That would make any genuine numeric and date data unusable without the pain of excels data type conversion functions.
There are 2 problems with the current behaviour
First it guesses on a cell by cell basis, not a column basis. So if you have some values that might be a date they get converted but the rest of the column looks OK.
Second there is no visibility of what's been converted so when the user sees the first screen full and it looks alright they assume it's all good.
Even a notification on opening a non-native file that it is being interpreted/converted would be an improvement - I'm sure I'm not the only person in financial services that's had to break out the luhn algorithm to regenerate the last digit of credit card numbers after a csv file has been opened and saved in Excel.
Happens in Google sheets as well. Enter "10-12" to indicate pages 10 and 12 and you get Dec 10th or 12th Oct depending on your locale.
Incidentally did they take into account locales e.g. MAI would presumibly be mangled if users Locale was set to French so when nameing genes you would need to avoid all of this:
Happens in Google sheets as well. Enter "10-12" to indicate pages 10 and 12 and you get Dec 10th or 12th Oct depending on your locale.
For additional fun share a spreadsheet with people on different locales and ask them to fill the cells. Happened on an onedrive-shared Excel file. it was so annoying I had to split the date into three columns.
All dates on sheets I create are now entered in 'ISO 8641 Date' format for those very reasons.
Sometimes somebody will comment - "What is that value?" to which the answer is "A date, formatted unambiguously in compliance with an internationally recognised standard". People have complained that it is difficult to understand, to which the response is along the lines of "Well that's not a problem with the data or the system..."
Long week - must be about time to go to the brewery!
I work in the tech department of a Fortune 500 company in the USA.
Can't even get people to use 24H clocks, let alone ISO 8641 date format...
Having grown up in Europe, where even a six-year-old knows how to tell time in 24H, I'm baffled by the insistence here on using am and pm. At first I thought it was just my unfamiliarity with the system, but it wasn't long before I noticed US-born colleagues oversleeping because they thought the dot on their alarm clock meant am, not pm. Or missing planes because they got to the airport 12 hours late. Or setting up a cron job to restart a server in the middle of the night, only to have it go down at 15:00.
And if I got a penny for all the times I've been asked if 00:00 is the end of the day or the beginning of the day...
Unfortunately, the day doesn't really have "an end" point - there's a concrete *start* at 00:00, and various times leading up to that such as 23:59:59.9999. But no specific end point, just an analogue rollover. There's also no such time as 24:00; the constraint is 0 <= HH <= 23. So midnight is always YYYY-MM-DD 00:00, where DD represents the day that is just starting [which feels a bit counter-intuitive].
FWIW, the most serious Excel issue here doesn't seem to have been noted: the default association of a CSV with an application that doesn't handle CSV's properly. All one of those scientists has to do to destroy their data is double-click the file and then hit "Save"; there is no point at which it is even possible to intervene during the "import". This has caused grief in just about every business I've worked in, particularly when the marketing team have been involved.
Mine's the one with the group policy to associate CSV's with notepad....
And up until 2019 your argument would have been wrong - however TIL since ISO 8601-1:2019, both the term "midnight" and "24" as an hour value have been deprecated. You're now meant to say "beginning of the day" (as opposed to the end of the day, which is presumably "00:00"). Retrograde meddling.
I hate Excel it just breaks data by default.
Customer gets an EDI file does a small mod to a piece of text then saves.
Item codes with leading zeros, gone, long barcode numbers changed completely.
Importing prices for customer 016, does a tidy in Excel, now customer 16.
I use CSVED and Notepad++ as Excel will make me swear.
Finally NEVER open a DBF with it and save, it will not save in that format but completely change it.
I remember the telling off of the custoomer who did it and the trawls through backups.
Does almost as such damage as crypto worms.
And not forgetting that the data is formatted on input so when it then turns something into a date or whatever, when you then go and put the correct formatting in the 10/07 that was turned into 10 July gets converted into complete irrelevance.
It is called progress and the usual developers knowing best without any concept that someone may not want it to do this. The fact that you do not appear to be able to turn it off is just as infuriating.
It is "usability" gone nuts.
It's one of the reasons why so many scientists are switching to Python's Jupyter and Pandas, with Excel relegated to the format for reports.
Excel's import of text files has always been miserable, though this isn't helped by the deficiencies of the CSV format. But it really would be useful to be able to disable type inference as a preference and not fiddle with it, file by file.
We need a 'CSV 2' format, with first line containing 'CSV2', the second containing formatting info, the third containing the labels fior each field, and a 4th line containing 200 commas...
The data, from line 5 and down, will all begin with a comma.
That should make such a complete mess of Excrement's import function that whoever is using it can't help but notice and will be forced to take action...
Add a 'CSV2_CSV2_CSV2_CSV2' line every 100 line or so, with a random number of commas before it to screw with autogenerated field widths...
"Excel's import of text files has always been miserable, though this isn't helped by the deficiencies of the CSV format. But it really would be useful to be able to disable type inference as a preference and not fiddle with it, file by file."
Thankfully, I've not had to deal with the vagaries of CSV files and various programs' methods of imparting them for many, many years. But I do remember having to wrangle CSV files to get them into DBase II, Delta, SuperCalc and Lotus123, often writing code specific to a particular CSV file to make it look like what the destination program expected.
Files get passed around. If you rely on setting the program that opens them to something non-default you come up against the problem of relying on everyone to do that correctly every time otherwise munged files start getting circulated. It's a bit like you rely on everyone to use mv or rm -r correctly in root. Of course nobody ever got the latter wrong did they?
UPVOTE. I was literally about to post the exact same thing. I know people aren't all stupid, but really how difficult is it to define a spreadsheet column as text. I've had that problem myself when some entry in a spreadsheet column is defined by the spreadsheet as a date field. Right click, define as text, it's probably different in different software but takes two ticks (2 February 1943 to genetic scientists)
Easy to scream and shout at MS for this one, but methinks this is more a case of everything looking like a nail to a man who has only a hammer. Spreadsheets were not designed for such tasks, and jamming them in where they do not fit was bound to cause problems. Anyone who has worked support in an office environment will tell you the same.
And this alleged "solution" to the problem is depressingly like those that we see in companies every day — rather than use an appropriate tool, they've decided to beat and mangle what they've got until it just about fits into the holes they've already decided to use (I know that there isn't any one authority out there that's decided they're going to use spreadsheets, but I feel like the drum could have been sounded a little louder on the message of not using spreadsheets for anything that needs data integrity).
I look forward to the new names being broken by a future Excel "feature" update (or even a localisation issue in the current version), requiring another rename drive and at least two sets of mapping data to translate between the 3+ generations of naming convention. Personally, I'll stick with the time-honoured tradition of vetoing software that corrupts data silently.
I look forward to the new names being broken by a future Excel "feature" update...
Indeed. Or any other software TBH. As for example people know who try to write a publication/ text with words/ abbreviations that have a different format of caps and lower case than your software thinks appropriate...
“ Spreadsheets were not designed for such tasks,”
What alternative would you suggest?
Spreadsheets are designed exactly for such tasks, it’s just that spreadsheet manufacturers felt it was helpful to auto format some data on import/input, if that default behaviour could be changed then problem solved.
A spreadsheet is a computer application for organization, analysis and storage of data in tabular form. Spreadsheets were developed as computerized analogs of paper accounting worksheets. The program operates on data entered in cells of a table. Each cell may contain either numeric or text data, or the results of formulas that automatically calculate and display a value based on the contents of other cells. A spreadsheet may also refer to one such electronic document.
If this issue affects gene researchers, it might also affect others. Whilst the gene researchers have, as a group, decided on a workaround, it would be sensible for MS to allow users to configure Excel to not auto-format when importing files - this way benefitting other users who might otherwise run afoul. Alternatively, have Excel bring up a dialogue box telling the user how to convert the cell type - "Hey, it looks like you might be working with dates in this column. If you wish to convert it to a date format, select the column by double clicking the header and then right click, Format Cells".
Lots of people use Excel for some pretty exotic stuff, including control and automation. That horse has bolted. Some options, clear and well communicated, would seem to be the sensible way forward. Maybe just a choice at install: Are you an accountant or an engineer?
I look forward to the time that Excel gets a real AI appendage to its program.
Maybe the Doomsday clock will start ticking a little faster at that point, but have no fear, we can force M$ to take on the maintenance of that sayer of the end, and a few updates later, it will no longer be 23:59 31st December, but 23rd of the 59th month 31 minutes past 12 and humanity will have been saved forever by m$'s unflinching hubris.
"Alternatively, have Excel bring up a dialogue box telling the user how to convert the cell type - "Hey, it looks like you might be working with dates in this column. If you wish to convert it to a date format, select the column by double clicking the header and then right click, Format Cells"."
Maybe they could give it a little personality, and a name, say "Clippy"? (or don't, really, please don't)
The movie trailer;
In a dystopian future, where the Microsoft world believes you cannot think for yourself, an unwanted hero has has returned...
Clip-AI - "Hey I see you are trying to do something productive, so I'm going too help by converting some of your data randomly to date format"
Helpless user - "NOOOOOOOOO!"
Clipp-AI - "Glad I could help! Now lets take a look at your bank account and see what great new cloudy services we can sing you up for"
@ "Spreadsheets were not designed for such tasks", indeed they were designed to get rid of book keepers of such business records as invoicing and sales.
@"What alternative would you suggest?", software designed for the purpose by people who understand the subject i.e. use the right tool for the job
"A spreadsheet is a computer application for organisation, analysis and storage of data in tabular form", your definition also applies to other software including databases, report etc so let us agree that not everything that uses the format is EXCEL/SQL, that a lot of time is being wasted because EXCEL/SQL is selected because the people holding the purse strings recognise the name.In reality the tabbed format is just easy for humans to process since they are trained at school to extract data from reports in the same format.
"your definition "
not my definition, I lifted it from Wikipedia and included the Wikipedia link before the definition.
Yes i have a science degree,
Yes i was taught to reference my sources.
Yes i could have written my own definition but why bother creating definition pollution if someone else has done a perfectly adequate job already!!
I have been fortunate enough to work for a University Lab doing medical research.
One of the researchers asked me (a few years ago now) for a 4k laptop and 64bit office.
As you can imagine this was an arm and a leg, and at the time 64 Bit office was as stable as a very unstable thing.
When I asked why, he showed me ( I don't pretend to understand a fraction of what they do other than it was really important work for Diabetes ). They import the data then use the conditional colours to see the various gene snips.
It can make it much more obvious where changes / outliers are. you look down the column of red and see a blue indicates a different gene snip that can possibly indicate the faulty \ different gene sequence. Much easier than looking down a list of numbers \ letters
Knowing I was helping people in a very small way did sort of help my soul recover after working for _ing Lawers.
Beer (or Non Alcoholic beverage of their choice I know some of them don't drink, but the others take up their share ) for all the wonderful medical workers \ researchers trying to find cures for all kinds of stuff.
My experience of spreadsheet for science in academia is that the format is selected as it allows some protection of functions so as limit expertise theft and/or where the academic in question cannot understand/recreate the calculations themselves.
Yet another example of dumbing down due to IT again allowing the inept to present themselves as otherwise.
"t can make it much more obvious where changes / outliers are. you look down the column of red and see a blue indicates a different gene snip that can possibly indicate the faulty \ different gene sequence."
That is trivial to do in vim or XEMACS. Has been since the 16-bit days. On 4 megs of RAM. (Well, at least 8 meg for EMACS + 4 more for the GUI, of course ... and back then vim was Stevie & ran on an Atari ST.)
Some-time biologist who moved into databases here. That's databases as a complete career change. You can keep your emacs but vi is hardwired to the fingers.
Nevertheless a database might not be the right solution here. As soon as you start talking about tabular data and databases in the same breath you start thinking about RDBMS but the time taken to set that up for what might be a one-off is probably not worth it. There's almost certainly a need for a simple way to tabulate data without making too many assumptions about the nature of the data. It's a principle I've observed many times; the more assumptions you make in the design of an application the less capable it becomes.
In this case take on the data as text columns which is probably the most minimal assumption you can make. If you want to flag the column* as date - or anything else - that's the time to let the application reformat the text. If you want to use a database to store the data you can, of course, have the application generate the schema at this point.
* Spot the implied assumption here.
There is an entire Data section on the Excel ribbon, of which the left hand side is given over to importing and formatting data. The underlying functions have been available for decades.
The problem is largely around the complete failure of universities and other institutions to take software training seriously. Since most of these people can't even book a meeting properly, forget about them trying to manipulate data.
No, the problem is the brain dead default software behaviour of excel. People are lazy and always will be. And excel is a stupid program for any data analysis that is more complex than basic bookkeeping... Ok, that's three problems.
Looking at typical excel course curricula, the data import is probably in the advanced course. Fun thing: this stupid autoconvert happens (at least used to) also when your LOCALE setting is not English or American. In many other languages the string 1DEC has no calendaric interpretation.
Yeah, functions for this exist. They are a pain to use, unless you have scripted some sort of import filter - otherwise you have to jump through the hoops for every single file you use. Excel also has problems with csv (comma separated values) in the default settings when trying to load the file by just opening it.
I'll go with a poke at both sides.
As a curmudgeonly git I expect that whatever I type into an excel sheet* should stay exactly as I type it - until I do something about it.
Learning how to 'Format Cells...' isn't hard. Being a prima-donna about not learning to use the simple formatting tools is just unprofessional.
The first 20 years of desktop spreadsheets (visicalc onwards) had a lot of usage restrictions and yes formatting was a pain at times but at least you knew that the data you typed stayed as typed.
* I'd extend this to wordprocessors as well, underline in red by all means but don't change the letters I typed which may well be a correct technical reference or foreign word (or both).
"And excel is a stupid program for any data analysis that is more complex than basic bookkeeping... Ok, that's three problems."
I'm sure there's a much larger set of suitable problems than that. However there are other problems for which it isn't suitable and for which a similarly accessible application doesn't exist.
When I open .csv files Excel does exactly nothing to them. Perhaps that's because I open them as .csv rather than importing them into Excel? I then save them as .csv without making any changes. If I want to import data into an Excel file I usually copy and paste it into places where I've already set up the formatting. So dates get turned into dates in a way that I control. It's 20 years since I was a heavy Excel user, so I've forgotten loads of it now.
It does suffer from being a program designed to do financial stuff - that's morphed into a Swiss Army Knife - which then gets abused for jobs a chainsaw would do better. Yes, I did once use Excel to print a crude poster for a seminar on the Water Regulations, who's asking?
You do have that choice - and people are complaining about the auto-guess step. If you import via file open then you get to override those auto-guesses with a simple columnar application of format buttons. if you import via well.."file import from text/csv" then you can nerd out on column type choices and token separators.
This stuff has been around for decades.
This response exemplifies the user issue of simply not doing a very simple check on how to use the tool properly.
Interesting - I just today read Humble Pi which touches on this exact problem. And I'm sure we've all had to wrestle with Excel when it insists on treating text as numbers and then mangling them.
It does seem odd to tailor your processes to accommodate a vendor's quirky software, but realistically: Excel isn't going to change.
if someone sends a CSV and it opens in excel just by double-clicking, then people are going to double-click. You can't stop people doing things the easy way except by actually preventing it from working.
(Corrupting the data silently so they may not notice doesn't count as preventing it from working)
There's been a lot of similar commentardery with two common themes:
(1) I've spent my entire sad and geeky life buried in data. Why can't these people do what I do?
(2) Biologists should be able to deal with this.
Please remember that these people are just Lusers. They have all the aspects of the vast herd of non-biologists who drive the average IT support person to drink on a daily basis.
Your average Luser will, early on, learn to use a word processor and spread sheet at a very simple level and will then be bound by the few functions that they learned for the rest of their working life.
Trying to break them of the habit, or educate them, is a way to profound despair.
Hammer/nail. Financial people will do everything in a spreadsheet, however unsuitable this seems. Marketing people will use PowerPoint in a similar way.
Deciding to change what you can control instead of another decade or more of fighting what you can't control seems eminently sensible to me.
Just thinking about it makes me need ->
LibreOffice also likes to auto-detect data types, and then auto-format based on what it thinks you're doing.
One solution would be to have a setting for default cell formatting. Auto-detect can still be the default option, but let people set it so negative numbers are always red and in parentheses, for example.
Some people have this very strong aversion to databases. They don't necessarily provide a reason, and if they did you wouldn't understand it. I've seen lots of people do this, including several types who know about databases but still use spreadsheets. Part of it might be that they don't want to have to write the UI around the database and the only reasonably common portable database format is SQLite*, but that's not a great reason.
*For example, the MS Access database format isn't easy to open if you don't have a license for Access. Dumps from other databases might need tailoring if you're using a different server, or people just don't want to run servers. I view an SQLite file as a perfectly valid way to send a dataset, but I'm comfortable issuing SQL queries. I don't know if there are good GUIs for that which allow viewing, sorting, adding, and all of that without needing to learn SQL, but if not maybe we should write one for the biologists. Anyone want to collaborate on that open source project?
By the way, XKCD has been prescient as always.
Data sharing? CSV is a useful format for sharing data. Unfortunately, given Excel's behviour, the following scenario is probably: click on "download" link, file opens in Excel auto-corrupting date-like data, researcher saves file to disk using Excel's "save" function, imports into real database.
I did suggest that adding a machine-readable data format (JSON) would be helpful on the last project I worked on, but this wasn't taken up as it wasn't common for people to be able to process it.
I think the reasoning is like this: in a spreadsheet, the user can see where everything is, and navigate it easily - it's just a virtual grid of pigeon holes. If you can navigate yourself K3 or whatever, and know how to reference sets of row or columns, it's all easy, and the mental model is clear and simple.
However, who know what goes on in a "database"?-- anyone who prefers a spreadsheet probably doesn't, and very likely doesn't want to know either. Here the mental model is probably something like handing your stuff to a cloakroom attendant who puts it away where you can't see, and you can only do things by learning some weird query language which you will probably keep forgetting, and then have to export as a spreadsheet anyway -- again with some command or other -- to send to other people. Perhaps such misconceptions can be changed, but will it be easy?
Still, perhaps if this auto-mangling of spreadsheets gets more profile then attitudes could shift. Then again, IT stuff does weird incomprehensible shit all the time and ordinary people learn to shrug, adapt, and cope - more or less - with that. :-)
"I did suggest that adding a machine-readable data format (JSON) would be helpful on the last project I worked on, but this wasn't taken up as it wasn't common for people to be able to process it."
CSV is also a machine-readable format. The problem isn't having the data in a machine readable format, it's having the machine read it in the right way. People have a machine to read the CSV - it's just the wrong machine. For your JSON you'd also have to provide the machine.
Not necessarily. They just want a quick and dirty way of viewing and exchanging tabular text data. Excel is a case of "if you have a hammer...". An RDBMS would make that a sledgehammer. What's needed is something simpler, not something more complex. And I say that as a biologist who changed to become an RDBMS specialist mid-career.
I am an excellent mechanic. While it’s true that good mechanic doesn’t blame his tools, I have held plenty of crappy and badly performing tools in my hands. Excel’s auto format (and for that matter Word’s and most cell phones spelling auto correct “feature”) have caused me much grief over the years as these programs try to do my thinking for me. Sorry Winkypop, it IS the tool that is causing the problem.
My Snap On dealer once told me a story about a Ford production line where the car came down the belt with metric on one side and English on the other. True? Who knows, but funny. The argument about Excel (or any other program) being appropriate can easily be made (especially when a database should be used instead of a spreadsheet). However, in the real world, people use what’s at hand or what they’re familiar with. And, since most are not really skilled, the subtleties of controlling the formatting are not evident. But, even with me paying attention, I have had many bad experiences with Excel formatting.
"where the car came down the belt with metric on one side and English on the other. True? Who knows"
Not unless the Union negotiated an extra 15% for all lug-nut installers in order to reflect their updated training. 20% for lefties on the right side and vice-versa, of course.
Sometime mid seventies, some UK Fords came off the line with imperial heads on metric bolts. (Or possibly the other way around - it's been a long time since I took one to bits). I always assumed it was something to do with their production line transition to metric fixings.
(And I recall a Datsun of the same vintage which was metric throughout, except for a single 3/4 BSP thread for the temperature sensor. Though that had at least some excuse: the engine was a licensed version of the BMC engine found in Minis and Morris Minors.)
BSP -- British Standard Pipe -- is a metric thread. The metric thread for pipes.
I can only imagine the negotiations and horse trading which led to that result (standards creation is notoriously corrupt), but out of all the thought that went into creating the metric standards, BSP was chosen without modification.
For nuts/bolts, a modern thread profile was wanted, which was why all the existing British bolt threads were abandoned, and Britain had the choice of going with Europe, or continuing with the new standard it had just developed in co-operation with the USA. Britain abandoned the new trans-Atlantic standard, and adopted the European standard instead. And Europe chose BSP for pipe threads. Was that the best choice for Europe? I have no idea what their pipe threads looked like before metric threads. But after metric threads, I know what their pipe thread standard is: BSP.
"However, in the real world, people use what’s at hand or what they’re familiar with."
Of course. But it's still not the tool's fault when somebody tightens the wheel-bearing nuts on a Dana 60 with a Stillson wrench (for the uninitiated, that's the proper name for your common or garden pipe wrench).
"But it's still not the tool's fault when somebody tightens the wheel-bearing nuts on a Dana 60 with a Stillson [...]"
Resorting to a Stillson is possibly because of the need for leverage. Even a socket spanner's T-bar can bend - especially when getting extra leverage with a piece of steel pipe.
The recurring frustration when undoing my Range Rover's wheel nuts was only solved by a very large one-piece four-legged socket "spider". For a country which didn't have department stores in the early 1980s - Luxembourg City had a large Aladdin's Cave shop of tools. They even stocked road construction equipment.
Not lug nuts. Wheel bearing nuts. On a Dana 60, the retaining nut for the wheel bearings is a large (1.25") nut that, following the proper procedure, is ultimately tightened to just barely finger-tight, basically setting the bearing pre-load. A Stillson is massive over-kill for such a delicate operation, and has lead to many early wheel bearing failures. So why do people use the pipe-wrench? Because most shade-tree mechanics don't have the proper socket/spanner in that size ... and/or can't be arsed to read and follow the simple proper procedure.
Sound familiar, in the context of this series of comments?
 Finger tight for me, your mileage may vary. Consult your manual.
"No, it's not the tool. Its the user choosing the wrong tool, in this case a spreadsheet to do the work of a database. You don't use Metric tools on your old Ford, do you?"
Yes, if it fits, better than scrabbling through a box of AF, BA, Whitworth and BSF before finally grabbing an adjustable wrench.
No, it very much is the tool.
Consider the following scenario: web site offers data in CSV format for download. Researcher clicks "download" link - the browser downloads the file and helpfully opens it ... in Excel - at this point the data is now mangled. The researcher clicks "save as" to save it somewhere for import into something else later.
If you don't consider that the software is the problem in the above scenario, then I would suggest you go and listen to J Nielsen and D Norman for a bit - unexpected behaviour is a defect in the software when a sufficient proportion of people find it unexpected.
Honestly, Excel's auto-formatting of dates is a menace - CSV is a perfectly fine data format for data exchange, but Excel's behaviour does make it problematic
It's not the tool. It's the choice of tool. The tool is doing EXACTLY what it was designed to do. It is hardly the tool's fault that it doesn't do what somebody expects it to do.
It amazes me that some readers of this august forum seemingly expect the tool to read the mind of the user, and then do that thing instead of what it was programmed to do.
So let me get this straight ... if you order the fish at a steakhouse, and you actually received fish (as ordered) instead of steak you bitch to the waiter about it "because everybody knows that you eat steak at a steakhouse"? That's hardly sensible, now is it?
Be very careful what you wish for ... DWIM functionality causes far more headaches than it fixes. See modern autocorrection if you're not old enough to remember Damn Warren's Infernal Machine.
"It amazes me that some readers of this august forum seemingly expect the tool to read the mind of the user"
I think it's the opposite. By changing data without any express instruction, Excel does try to read the mind of the user. It wouldn't be so bad if it only changed the appearance of the data, but kept the original data intact behind the scenes.
Note that I didn't say I liked Excel. I don't. In fact, I never use it. Not for over ten years.
However, the reality is that for the vast majority of users it is behaving exactly the way they need/want it to behave. And for the few who don't like the default behavio(u)r, it can be changed to whatever they want it to do (in this particular case, anyway).
But no. Instead the gene researchers are going to change the nomenclature that the entire planet has been using for decades. Who, exactly, is going to re-write (and then pay to publish & distribute) all of the textbooks and other documentation to reflect this change? Or are they going to turn it into a giant clusterfuck because they can't be arsed to learn how to use the proper tool for the job?
If the job is important to you, LEARN TO PROPERLY USE THE PROPER FUCKING TOOLS TO DO THAT JOB OR BE PREPARED TO FAIL MISERABLY! How fucking hard is this to understand?
Nobody wants this feature of Excel.
Lots of people like Excel, and lots of people use Excel, but nobody likes this feature of Excel. There are a large number of people who haven't been bitten by this feature of Excel recently, who don't care today, and there are people who have been bitten, who do care, and there are people who don't use Excel, but there are no Excel users who want this feature of Excel.
"It's not the tool. It's the choice of tool."
I agree it's the choice of tool. But the correct choice of tool isn't something more complex than a spreadsheet, it's something simpler. The standard PC, whatever the OS, doesn't have it. If it were text you'd have the choice of your word processor or Notepad/Kedit/whatever. For tabular data such as CSV you have a choice of spreadsheet or.
"CSV is a perfectly fine data format for data exchange"
Oh no it isn't! Capital letter 'O' instead of numeral '0', etc. Some years ago, exam results in Scotland were screwed up by that problem in the data.
XML with a schema that enforces data types is something you could write into a contract to ensure reliable data transfers between organisations. You cannot reasonably type XML manually, but Excel can, with difficulty, be made to export that way.
But clearly XML is a non-starter for the situation the article describes.
Oh I wondered how long before someone brought up XML schemas. has anyone ever actually seen one that works, is supported by software and actually used to exchange data?
I bet there's a beautiful XML schema for gene wrangling. And I bet that 99% of the really clever (not Excel) software geneticists use won't read it because it's a ridiculously complicated construct that by trying to catch all the weird edge cases that got put up in the millennia of committee that wrote it, ends up so complicated that no-one in their right mind would try.
I have had to supply data in XML files to CMS (Medicare/Medicaid), so there is at least one schema that works and is routinely used to exchange data.
They even helpfully supplied a sample file, so I was able to write a program to extract our data and create the file they required.
Would this have any use to anyone besides moving data? Nah, certainly not a useful format to view or manipulate data.
"You don't use Metric tools on your old Ford, [...]"
In general a set of Metric spanners will fit most Imperial hex bolt heads. There are only a few Imperial sizes that are needed for some interpolations. Even then the bolt head or the spanner tolerance - or the bolt head wear - can mean the expected spanner doesn't quite fit. I have several nominal 13mm and 1/2inch spanners which are mix&match against some bolts/nuts.
Apparently LandRover put different size heads and nuts on one bolts' fitting - so you didn't need two spanners of the same Imperial size to adjust it.
"I've never understood why no one produces a standard C and ring spanner set with different sized ends on each spanner [...]"
It can be done - but possibly not in that way. You can buy "double ended" spanners with different sizes at the ends. They come as either both ends "open" or both ends "ring". So two spanners - one of each type - will replace two "combination" spanners of each size. You are still stuck if you want to use only the open or ring ends though.
"a spreadsheet to do the work of a database"
I have regularly used Excel as a database for membership lists of organisations. From there I could do Mail Merge postal or email messages to the members. Also it was a format that most other committee members could handle if necessary.
These are not the gene names, they are the common abbreviation used for the gene. Thus, the gene abbreviated as BRCA1 is named "Breast Cancer 1, early onset'. And yes, I realize people rarely use the full name, and then probably only once per article. This article is not about changing the name, but rather the standard abbreviation.
Also, excel is being used to display genetic information in a format people can easily read and manipulate for small subsets of data. The source genetic data typically uses formats like SAM or BAM or other specialized formats, which are much easier to work with programmatically, but rather difficult to browse (and they can be HUGE). There is a large amount of open source software for analyzing genetic data, but much of it is fairly complex and probably overkill for small subsets of data, particular for non-IT literate scientists.
You would lose a lot of talent, skills, various ways of seeing the world, and thus insights, if the only people who were allowed to be scientists were IT literate to a certain level. However, there some issues to be sorted out.
In a similar vein, a good number of working scientists aren't statisticians, and can as a result fall into traps. Some universities therefore employ some full time statisticians for researchers to consult.
So perhaps the answer is to educate scientists in IT enough so that they know what they don't know, and thus seek assistance - and then provide said assistance. Of course, the learning would go both ways, with the IT specialists learning about the sort of issues in IT the scientist in his or her field might come across in their workflow*.
*And that's another thing - if you're at the cutting edge of something, you don't always adhere to an established work flow. You might carefully record data in a logbook (or Excel file) out of habit, but only discover it to be significant down the line when something surprising has happened.
Excellent comment, particularly concerning the statistics. Statistical errors seem to be one of the most common reasons for retractions and corrections of scientific studies. It would be nice if all studies could include a professional statistician, since many of the issues are not at all obvious to non statisticians.
@RM Myers: "It would be nice if all studies could include a professional statistician, since many of the issues are not at all obvious to non statisticians."
Many studies would love to have a statitician on board, but there aren't enough statisticians to go around - there aren't even enough to go round for research ethics committees! My last REC was fortunate enough to have a really good statistician, and she (and, to a much lesser extent, I) used to get VERY frustrated with really fairly basic errors which meant that the project wasn't testing what it thought it was. All thus because no-one was adequately teaching stats to researchers, and there was no-one to run a project past.
"[...] I KNOW what it will do and not screw up my data."
I THINK I KNOW what it will do and not screw up my data.
Ah - the computer is always right.
There once was a database which was apparently losing track of important records on terminal searches. The symptom was only spotted when an astute operator was asked if they wanted to add a very common family name to the indexing eg "Smith".
An investigation showed that several names had been re-entered as "new" when they already existed - and this had happened several times for the same names. The result was that any previous records linked to that name became orphans. The records were still accessible as serial data - so were not actually lost.
The fault lay in the disk controller. When the logic rack voltage was in spec - but not spot on - it would very occasionally fail to recognise a data match. The relevant data key was for unique names - on which the system could request searches***.
***The database was serially archived and reloaded for efficiency every so often - by tidying up the additions and deletions. This changed which names' keys were potentially affected by their position on the disk.. Only the last record on a track would be potentially missed.
From Azure Function Host (https://github.com/Azure/azure-functions-host/issues/6480):
Update terminology used in runtime source #6480
@@ -75,30 +75,30 @@ public bool TryGetFunctionMetadata(string functionName, out FunctionMetadata fun
/// Gets the function metadata array from all providers.
/// <param name="forceRefresh">Forces reload from all providers.</param>
- /// <param name="applyWhitelist">Apply functions whitelist filter.</param>
+ /// <param name="applyAllowList">Apply functions allow list filter.</param>
/// <returns> An Immmutable array of FunctionMetadata.</returns>
public ImmutableArray<FunctionMetadata> GetFunctionMetadata(bool forceRefresh, bool applyWhitelist = true)
public ImmutableArray<FunctionMetadata> GetFunctionMetadata(bool forceRefresh, bool applyAllowList = true)
1K issues to deal with just with the Azure Function Host, and this one bubbles to the top of the list.
This is of limited value when the data arrived mangled because someone who passed it to the person who passed it to you didn't do that. Or possibly the person who passed it to them. Or an assistant who opened the CSV file that had been emailed to the boss.
I remember, once upon a time, you'd open a CSV and it'd give you the wizard.
Now, it defaults to just opening the CSV - and if you want it to open the Wizard, you have to rename it to .TXT and open it - or use some weird data import wizard that configures the CSV as a data source ?
So hundreds of comentards moan about removing derogatory language but are clearly not fussed about biologists changing their nomenclature to avoid spreadsheet application auto correct.
People’s standards are deplorable, I’m glad I don’t have to come into close proximity to you lot.
You must have read a different comments thread. Most of the people here are either bashing MS, suggesting how Excel should behave, suggesting that scientists should a database, defending the use of excel in this context, talking about cats, or saying things about scientific research culture.
Reading between the lines, most of the comments here are based upon an unease that scientists should have to change their way of working to fit a tool. That most here aren't expressing their unease as outrage actually speaks well of them.
( As for removing loaded, potentially offensive terms, yeah, I might differ from the majority here, but in the appropriate threads I merely suggest to them that experiments and studies have been conducted and that it's worthwhile looking at them. It's not so much offense that is an issue, but continuously and subtly perpetuating views and assumptions that do not withstand objective scrutiny. )
Cats? What cats? ElReg must have automatically killfiled the cats because The Beaky Ones have decided that that's the way most users want to see the thread.
Try to think about what I just typed in the context of this conversation before taking it literally. Perhaps have a beer while you cogitate. I'll get this round in.
Excel doesn't actually change e.g. 'DEC1' to '01-Dec' when your enter it. That's just what it shows the user. It actually modifies the data into the internal clock tick representation of the date (in this case the number 44166). Because of this, if you reformat the cells to "general" and back to "date", it now reads "01/12/2020", which is even further from your original entered data, but is still 44166 internally..
This has always been the default behaviour for anything that looks remotely like a date. It can only be overridden by opening Excel and then importing the data. At that point you get to specify how it's to be interpreted. We had this problem once when analysing a c. 10,000 entry password dump - almost half the passwords were corrupted by Excel.
However, if you know that has happened, you can import the Excel abomination into something like Alteryx and run a formula on a column that contains string data to replace 44166 with DEC1 again, then output into a useful format.
After all - you do have the data specification for your inputs don't you? - Don't answer that...
Easy to point a finger at researchers for blaming their tools, but they are scientists not developers. It would be better if the tool maker produced a product that made it easier for lay-people to achive what they want without having to go through additional training.
But as another posted pointed out; in this current era of MS making changes to your configuration without proper warning (yeah yeah you publish your changes but to my users that's like changing one digit in a phone book and expecting them to notice) it's a significant risk that if you change Excel to not carry out this particular piece of autoediting MS will reenable it randomly with updates or just for shits and giggles.
MS should consider the implication (but I know they won't) that a significant body of professional workers cannot trust their product to process information which is of massive importance to humanity.
The user interfaces for most of the scientific software I have seen (which is quite a lot) suggest that their practical and aesthetic sensibilities were developed sometime around 1990. While it wouldn't take much effort to update most of these interfaces, their file import/export, and god-forbid their network handling, they will also charge a phenomenal amount of money for updates. For those people who think Office is expensive (because they don't remember when Lotus 1-2-3 by itself was $2000 in the mid-80s), science packages run into the $10Ks with huge hikes to "support" operating system updates.
It doesn't help that nobody made any attempt early on to make CSV abide by any rules at all and then have programs that export in this format stick to them. Excel seems to have a few things it can put in the first line.
I build electronics and more often than not the easiest way to share co-ordinate data for components (aka a centroid file) is using a "CSV" . Exported from the Cad package this usually has some text at the top, author, owner, units etc - standards bodies could easily have agreed a format for that early on. These same programs will happily export fields that contain the separation character or worse, use tabs or spaces combined with excessively long fields that then break any attempt at using Calc or Excels methods of handling them.
And then one you have hacked and chopped at the supplied file so it imports vaguely sensibly you find you've missed the column Calc (yes it does it too) or Excel randomly decides you don't want your leading zeros because it surely must be a number. I don't understand why this sort of thing is only done to how it is displayed not how it is stored so you could undo it. Even a nice checkbox, treat all columns as text by default on the import dialogue or in the settings would be good.
Yes custom tools could do the job better but writing such a tool than also handles all the many ways people/CAD packages format the document is not a trivial job, all these mean the same thing... R1,R2,R3,R4,R5 or R1-5 or R1..5 or R1-R5 or R1,2,3,4,5,
a CSV is a CSV,
if you need more structure just use xml or if its specifically intended for opening in spreadsheets use the ODF spreadsheet format.
there are solutions. use a acsv for sharing CSV data, if it needs more structure then use ODF & structure as you want and need.
use a CSV then don't complain that there is no standard when there are standards you could use.
Your issue is different to those that want to import data into a sheet or the spreadsheet mangles the data on entry/import.
I turn all that auto formatting stuff off. It slows me down and having text flicker and faff about like that just looks too "broken".
It isn't like any typos will get through because they are still highlighted in red.
I also recommend the same with Visual Studio. I see more people fighting with the incorrect autoformatter than actually writing correct code.
This post has been deleted by its author
> "Really I think the issue is that non-explicit auto-conversion of data types is a bad default software behavior."
Absolutely. It should never be a default behaviour to automatically and irretrievably alter the base data based on some half-assed assumptions.
At most, it should be just a presentation layer that doesn't alter the original.
I have a similar issue with putting hex addresses into a spreadsheet. Every one containing a E gets auto-incorrected to an exponent.
Personally I get sick of having to turning off all the autoformat and autocorrect options. I'd like a single checkbox that says "What I typed is what I want. Don't fuck it up".
There is an RFC but it's not really a standard.
I have a regular task to import a very odd "csv" file to a MySQL database. My solution is some task-specific code. The generalised problem, for which I've only found one possible "one size fits all" conversion program (in Python & not tried it) is: identify the peculiarities of the incoming file, identify the requirements of the destination app, do the conversion.
It may be possible to write a file parser to identify the characteristics of the input file but the user would need to enter the requirements of the destination app.
My specific task, the incoming file uses tilde as field separator ("comma" equivalent) tilde being unlikely to appear in the data, then a mix of quoted and unquoted fields which may include quotes, commas, apostrophes, tabs, escape characters and a load of other "surprise" characters.
I read that 20% of a large body of scientific papers on genomics included CSV data that would be misinterpreted by excel
Treats the year 1900 as a leap year because Microsoft’s main aim when they introduced it, was to kill Lotus 123. Lotus (on the PC) replaced VisiCalc on the Apple II who’s epoch was 1 Jan 1904, so VisiCalc didn’t go back that far. Microsoft’s original spreadsheet Multiplan didn’t cut it, so Excel (originally introduced for the Macintosh) was rapidly promoted to run on Windows... and spreadsheet-jockeys attempt to run major business with it...
Credit Card numbers transformed to scientific notation -BTW these were test card #s - We weren't storing real customer numbers in excel.
When loading to excel - excel looks at first 20 rows and "guesses" the datatype so you end up having to load a Dummy row with chars at the top to prevent excel deciding its an Integer column.
Schizophrenic Compatibility Checker Dialog box - did i save it or not?
Dates transformed into numbers based off 1900 -01-01 as 1 - Why?
Non-standard copy-paste rules - you get one shot and then excel helpfully clears your clipboard.
VBA is still the best way to program with Excel - In 2020 really?
C# Interop is "not recommended" for Server code - what's plan B?
Named Ranges - A clue as to the Range names would be nice.
Formulas - can't append more than 255 chars - however CONCAT() works - why
Still no easy way to cycle thru sheets via keyboard (hint CTRL-TAB would be nice)
"Still no easy way to cycle thru sheets via keyboard (hint CTRL-TAB would be nice)"
In mine, CTRL with page up and down does that. it isn't great if you're at one edge and want to go to the other one, but otherwise it's pretty useful. Maybe that will work for your situation?
How many time have we said it: EXCEL IS NOT A DATABASE.
It a wonderfully, full function spreadsheet. It handles spreadsheet data. It does it well, and for most people it is pretty inuitive. I normally bash Microsoft, but Excel is very good.
But it is not a data storage or data transfer tool. We have very good databases (SQL Server or Access if you want to stick with Microsoft - other databases are available).
So, Scientists, use a tool suitable for the job and stop whining when you use a hammer to crack a peanut and your peanut goes to dust.
I'm not sure a database is the right application for just passing tabular data around. If this were someone complaining about importing straight text files into Word and getting unwanted auto-corrections you wouldn't tell them to use a desktop publishing application instead would you? You'd tell them to use Notepad. Here's the problem - there isn't the equivalent tell-it-like-it-is equivalent of Notepad for tabular data.
"there isn't the equivalent tell-it-like-it-is equivalent of Notepad for tabular data."
As I pointed out earlier, EMACS and vi are both perfectly suited for this kind of thing. There are many other editors that can also be
coerced tweaked into handling this kind of thing.
And NO, vi and/or EMACS are NOT too hard for scientists to learn to use for this kind of thing. The subset of commands required to adequately use either is on the order of a couple of dozen, including the basic editing commands. My 9 year old granddaughter learned that much vi in an afternoon (with a hand-written, by her, cheat sheet).
Or are geneticists devolving from their tool using ancestors?
The new guidelines only seem to be available to Nature subscribers
I found out of date info at:
http://www.hugo-international.org/HGM-News last news item 2018
https://www.genenames.org/about/guidelines/ seems to be the 2002 version
Presumable there is also a list of those genes that have been renamed but I didn't find it.
To be useful they need to disseminate the info as widely as possible.
I heard about this years ago:
First thing on a new install of LibreOffice: Disable Autocorrect except for smart quotes.Doesn't fix THIS problem.
I'm actually considering turning off that too as “” those are AltGr V and AltGr B on Linux. Shift for singles.
Though the auto format recognition is due to not setting cells to text, but having the default General. No idea how to change that default on Excel, Gnumeric or LO Calc.
General is an evil default as Indicative numbers may be in a text column and never ever should be treated as numbers, except maybe on some of them to check checksums (ISBN, IBAN, credit cards).
I'm surprised someone hasn't written a standard Macro or VB Script "Open Gene data file", so you just open Excel, press the hotkey to activate the Macro that imports a prompted-for file and does the formatting as part of the Macro. Then that Macro/VB script can be shared out and be a "must-have" 'plugin' for Scientists doing Gene-work.
I hate it when any app tries to guess my intentions, as at least 50% of the time they seem to get it wrong. Microsoft software being probably the worst at it. If it's wrong that often, have you really saved any time when you now must proofread your work carefully not just for your errors, but for ones introduced by the software?
I can't count the number of times I've had to work with a spreadsheet that someone else created and couldn't be bothered to simply 'format cells' and set the correct format. Or just change them to 'text' if no math needs to be done. Computing is still 'garbage in, garbage out' as it has been since the beginning.
... the Excel import wizard tries to make sense of the data in each cell and select (or suggest) an appropriate format. However, data in one column tends to be of the same type (or supertype in an object world). Failing that, it should be 'Text' (a binary blob or whatever the top level class is). I'm not sure what heuristics were used for the conversion. If row 1, column 5 reads 'DEC1' then a reasonable assumption is that this is a date. However if row 2, column 5 is 'BRCA2' then the date decision should be backtracked. My guess is that the spreadsheet type selection is either per cell or based on the first row (or several). This seems to be far too simplistic and a poor assumption on the part of the import wizard writer.
Another example of CSV-to-XLS(x) problems. Was asked to help migrate old data by exporting from their database into a format the new application could read. But the data included a Comment field, and that fields data entry allowed Carriage Returns. Meaning a single record could span multiple lines in the CSV file! They sent a nice letter to my boss about the extra effort developing a "bespoke migration tool" (really just a little PowerShell to stitch the lines back together by changing the extra CR to a flag character).
Back to the main thread, yeah - default should assume incoming CSV columns are text. Then I will assign formatting as needed for display or calculations.
I have used various spreadsheet programs from visicalc onwards. I use a spreadsheet for calculations, light database, forms, and anything else I see fits. For serious heavy lifting database use a spreadsheet will not do. A csv file is not serious heavy lifting database.
I have tweaked Excel to import data without mangling it, but I am a bit more proficient than the average Excel user. Even when I have disabled everything I could find (cannot remember which version), Excel would still present a date in a pulldown which had data that resembled a date.
I still spend way too much time disabling or working around Excel's propensity to alter what I am trying to enter.
For those commentards suggesting some automated solution to this problem, How do you think we got into this mess? The default should be just leave the data as it is.
There was no need to specify a standard for CSV or TAB files. They are quite simple. Each "record" is a line in the file terminated by the system's text file EOL character sequence. Each "record" has the same number of "fields". "Fields" are separated by a comma for CSV or a tab for TAB, (the delimiter cannot be part of the data). Any CSV or TAB file not following these simple rules was invalid. Both gave the ability to have varying length fields and having more compact files. The alternative was fixed field and/or record length which was usually inefficient at storage use, but usually gave better performance and there was no delimiter to worry about appearing in the data.
It will also tend to interpret e.g. 10-12 as a date regardless of any preferences on your side. In my case I was making a list of rack positions for test tubes (rack 10, position 12), and saw it happen as I typed and told Excel to keep its lousy excuse for smarts to itself.
It's not just dates; I had a customer saying "your data is useless" when they had opened the CSV in Excel, just to
check it wasn't corrupted, AND SAVED IT BACK. Thus corrupting it....
All the 64-bit integers were now double floats. Thus removing the significance.
These were 32 bits of latitude, and 32 of longitude, in 1/2^32 units of a full circle.
Handily that resolution is about a centimetre, we were very pleased with that encoding scheme for mapping.....
As someone who is not a programmer, yet spent years using spreadsheets to manipulate and make sense of huge database dumps, the fact that these so-called scientists can't be bothered to take 5 minutes to learn how to format cells during the import process, tell me they should be flipping burgers instead.
They also must be pretty sloppy when it comes to peer review, if his kind of thing is rampant in the field.
> can't be bothered to take 5 minutes to learn how to format cells during the import process
From the users' perspective there is no import process.
User looks at file in explorer - it's an Excel file because Excel has registered itself to handle .csv files
User opens file - Excel silently corrupts random values
User saves file - Data is now permanently corrupted
PS. I took four goes to write ".csv" without my browser correcting a perceived typo.
For the millionth time, this is a spreadsheet analysis tool, not a Relational Database Management system.
This article reads like "All the bus passengers are complaining about the lack of room on my scooter!"
Besides that, you can select an entire column in Excel and format the cell to a specific type of data (like TEXT data, instead of letting Excel assume a date type).
Leave it up to scientist to fail basic spreadsheet training. ;-)
Biting the hand that feeds IT © 1998–2020