I made BudgetSheet ( https://budgetsheet.net ) for myself and others who love to work in their own spreadsheets!
One of the MAJOR improvements of using Alteryx is that it is easier to audit and review the process. With Excel you can be down the rabbit hole of data manipulation or formula-driven outputs, realise you have made a mistake, and be unable to find where or how you went wrong without recreating the steps you followed. The workflow-style interface of Alteryx and other data analytics tools lets everyone see the process from start to finish of how you are getting to your outputs. It can also ingest and handle much larger datasets than default Excel.
I honestly believe that we will see a large shift from using Excel to using better-suited analytics tools. I cannot think of anything I can do in Excel that I cannot do in a GUI-based data analytics tool. Not to write Excel off completely, as I will certainly still be using it, but to a lesser extent and not where complex data manipulation is required.
Testing the range of inputs or outputs is not done enough, but worse, using the wrong functions can give a sheet that looks good but is very wrong.
The barrier to entry for Alteryx is nowhere near as high as for, say, Python or R for finance teams and consultants.
It is also seen as a mark of status to master a data analytics tool which improves a particular process either within the company or at a client.
I’m not disputing the quality of the tool, to be clear. It seems like a good change and a totally reasonable line item. Just questioning what it will take to actually replace Excel in a widespread way.
I’m always interested in visual programming tooling. Thanks for sharing.
This is why business is not considered a form of engineering, while software is (just barely). Imprecision is a tool in business, while in engineering it is an enemy.
It can pull in 3rd party data sources (including custom HTTP calls if you want to reach out to an API), has data types, can both process and store substantially larger data volumes (data is cached/stored separately from the spreadsheet data in a compressed, columnar format and processed by a separate data processing engine), and every step of the data processing pipeline is shown and easily referenced/reviewed (including the ability to leave comments in the code).
That said, PowerQuery is a graft onto Excel, originating in the SQL Server world. So "Excel" skills aren't very helpful/transferable; it leverages Microsoft's M language[2] instead of Excel functions, and the usage paradigm is far more natural for someone with a programming background than an Excel background.
But as someone else mentioned in another comment, Alteryx is anything but cheap. PowerQuery is already widely distributed and available to anyone with Excel installed[3]. So for anyone looking for this type of operational discipline around data, it's a handy tool to pick up. Bonus points that PowerQuery/M-code is portable between a variety of Microsoft tools (SQL Server, PowerBI, Excel, and a variety of Azure's data services).
[2] https://docs.microsoft.com/en-us/powerquery-m/
[3] Traditionally only available for Windows, but the base tech has recently been added to Excel for Mac and is being brought up to parity.
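To give a flavor of the pipeline style described above, here's a minimal M query against a hypothetical JSON API (the URL and field names are invented for illustration); every named step shows up in the editor and can carry comments:

    let
        // Reach out to a (placeholder) JSON API over HTTP
        Raw = Json.Document(Web.Contents("https://example.com/api/fx-rates")),
        // Each named step below is visible and reviewable in the editor
        AsTable = Table.FromRecords(Raw),
        // Strong column types, unlike worksheet cells
        Typed = Table.TransformColumnTypes(AsTable,
            {{"Currency", type text}, {"Rate", type number}})
    in
        Typed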
If you're pulling data from an external source that supports query folding[1] such as a database, it'll try to push those joins to the original source.
If you're pulling in data from a source that doesn't (or the join occurs after query folding is no longer possible), there are a host of performance considerations/optimizations.
By default PowerQuery knows nothing about your source data and makes no assumptions. If you're joining data using the GUI, the generated code will likely use the NestedJoin[2] function. A single "step" in the processing will have its RAM capped at 256MB[3], so depending on the size of the data you're joining, and absent any data processing steps that would give it guarantees about your data, you may or may not be paging massive amounts of data to disk as it loops through the entire dataset you're joining for each row being joined.
That said, PowerQuery has 6-7 join algorithms available, and a variety of techniques that can be used to optimize the processing. For example, if you add a primary key to the table (via Table.AddKey or Table.Distinct or Table.Group), it can short-circuit the data processing the moment it finds a match. This[4] article series is really helpful for optimizing joins in particular and the link-outs and tidbits scattered within the series are fantastic for getting a better understanding of PowerQuery's inner workings and performance considerations.
[1] https://docs.microsoft.com/en-us/power-query/power-query-folding
[2] https://docs.microsoft.com/en-us/powerquery-m/table-nestedjoin
[3] Mentioned halfway down this article: https://blog.crossjoin.co.uk/2019/04/21/power-bi-dataflow-container-size/
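To make the Table.AddKey short-circuit described above concrete, here's a minimal M sketch (table and column names are made up for illustration):

    let
        // Two hypothetical tables already loaded in the current workbook
        Orders = Excel.CurrentWorkbook(){[Name = "Orders"]}[Content],
        Customers = Excel.CurrentWorkbook(){[Name = "Customers"]}[Content],
        // Declare CustomerID as a primary key so the join can stop
        // scanning as soon as it finds a match for each row
        KeyedCustomers = Table.AddKey(Customers, {"CustomerID"}, true),
        // Without the key hint, NestedJoin may loop over the whole
        // Customers table for every row of Orders
        Joined = Table.NestedJoin(
            Orders, {"CustomerID"},
            KeyedCustomers, {"CustomerID"},
            "Customer", JoinKind.LeftOuter),
        Expanded = Table.ExpandTableColumn(Joined, "Customer", {"Name", "Region"})
    in
        Expanded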
Does M-code serve as a replacement for (the abysmal) VBA? And how similar is it to F#?
Sort of depends. M (and PowerQuery in general) is fundamentally about data processing. If you were only dropping into VBA for ETL purposes, then chances are you can use PowerQuery/M instead. With the advantage that it exists in a standard xlsx file, rather than the macro-enabled xlsm files that are more heavily restricted in many environments. It's not intended for more general purpose tasks, nor can it do stuff like interact with the Excel object model directly, so you'd still need to use VBA (or if targeting Excel 2016 and above, Javascript[1]) for that.
PowerQuery/M is basically a sidecar app within Excel, and doesn't have access to nor is aware of the Excel object model. Excel.CurrentWorkbook[2] is just a convenience function available when used within Excel, and it treats data from the current workbook the same as if it were pulling from an external workbook or csv file or web API or any other source. To the extent that even when solely accessing data from within the same workbook and not calling out externally at all, it prompts users with an "Enable External Connections" warning before use. You can then either have the output of that sync to a Table object within Excel (easily/generally available for use), or make it available from within the data model (only accessible via the object model or pivot tables).
That said, the data access functions[3] are fairly expansive, and you can get pretty creative in how you (ab)use them. For example, Web.Action is one I've abused often: pinging a web service to track usage, triggering external jobs, and exporting/syncing data to an external service. Although those last two uses require a bit of complexity in architecting, to account for cases when it gets manually refreshed/triggered in quick succession or with duplicate data.
> And how similar is it to F#?
I'm not familiar enough with F# to be able to answer that. But if you are, you can probably find out pretty quickly via this[4] article series. PowerQuery/M is used across a variety of Microsoft products, and most resources around it are akin to product-specific "how-tos". Ben's primer series is the only source I've come across that approaches it from a software engineering perspective.
[2] https://docs.microsoft.com/en-us/powerquery-m/excel-currentworkbook
[3] https://docs.microsoft.com/en-us/powerquery-m/accessing-data-functions
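To make the sidecar point concrete: pulling from the host workbook looks just like pulling from any external source. A minimal sketch (the Table name "Sales" and its columns are invented):

    let
        // A Table named "Sales" in the *current* workbook -- even this
        // triggers the "Enable External Connections" prompt
        Source = Excel.CurrentWorkbook(){[Name = "Sales"]}[Content],
        // From here it's indistinguishable from an external csv/API feed
        Typed = Table.TransformColumnTypes(Source,
            {{"Region", type text}, {"Amount", type number}})
    in
        Typed

The query's output can then be landed back on a worksheet as a Table, or loaded only to the data model.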
The capabilities were nice (and, according to your posts, more extensive than I knew!) but the interface was beyond counterintuitive: no discoverability, unlike any of the environments I've encountered in 10 years of programming. And the performance was painfully slow (compared to an RDBMS).
I have hopes for MS moving Excel to a less unique interface (I heard smooth scrolling is coming!) and I will definitely be reading your links.
And also yes, the Power Query Editor interface sucks. It's intended to be intuitive for Excel users, which inversely makes it a bit counterintuitive for someone coming from a dev background. The GUI functionality is also pretty limiting and only exposes a fraction of the overall capabilities. That said, there's an "Advanced Editor" button in the Ribbon that'll open up a modal that's far more familiar for a dev; shows you all of the code/Steps for the query all at once and has IntelliSense for code completion. Still crude as far as a dev environment goes, but far more productive than flailing around in the GUI the whole time.
Since you're coming from a dev background, I'd highly encourage that M primer[1] I referenced. It's a crash course on everything that's relevant to a developer (type system, error handling, patterns, antipatterns, custom functions, execution logic, etc). Between that and glancing over the standard library[2], you can avoid using the GUI entirely except for when it's convenient (such as scaffolding out some tedious stuff to then clean up in the Advanced Editor).
> and the performance was painfully slow (compared to an RDBMS).
Also of note is that Excel is only one of the data sources PowerQuery supports. There are a ton of others[3], including many RDBMSs natively supported and generic ODBC support if you have the appropriate driver installed.
[2] https://docs.microsoft.com/en-us/powerquery-m/power-query-m-function-reference
[3] https://docs.microsoft.com/en-us/power-query/connectors/
Looks like you need to install it separately as a module, it's not included with base Excel. So every user needs to download it, and have admin rights to install it. That makes it a non-starter in many of the corporate use-cases.
I've already written a rant about it before and I don't have the energy to repeat myself, but do not fool yourself into thinking switching from excel to Alteryx is doing yourself any favors. You're just trading one monster for another.
Save yourself a huge amount of money, not to mention your sanity, and just take the time to learn some Python, Julia, literally anything else to get the same results faster, more reliably, and not be locked into that nouveau Oracle-esque nightmare.
My point is about moving from Excel to Alteryx: the average population of people that uses Excel would not have the credentials to stop using Excel and move to Python to do the same job. I'm focusing on non-technical people, not programmers.
When Yahoo Groups shuts down, no more excel repair for governments?
Although it only helps to an extent, since it doesn't prevent bad data from being input into a table, and supports so many implicit conversions (listed out in [1]) that undesired behavior is still possible. Plus it lives in a little-known area of Excel rather than being baked into the customary workflows.
That said, it still comes in handy. A typical pattern of mine for worksheets I have to release into the wild (i.e. no telling who will end up (ab)using it and when it'll eventually come back to me in a bastardized form to salvage) is:
1. Create table(s) for raw/user-inputted data
2. Use that table as a source for PowerQuery
3. Leverage PowerQuery for type checking, data normalization/validation, and data joining.
4. Output to a new table on the worksheet, referencing this instead of the original table for all formulas.
Optionally, also put complex logic into the PowerQuery steps, so that the "output" table is ready for end use and doesn't require any complex, error-prone formulas.
Even without leveraging the data model and DAX functions, this works out super well. More sophisticated users can easily modify the PowerQuery steps, and less sophisticated users are protected from accidental mistakes by either using it as-is or reaching out for help earlier on, before there's a crisis for assistance.
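For illustration, steps 2-4 boiled down to a minimal query might look like this (all table and column names are invented):

    let
        // The raw, user-editable Table on the worksheet is the source
        Raw = Excel.CurrentWorkbook(){[Name = "tbl_RawInput"]}[Content],
        // Type checking: any cell that can't convert becomes an Error value
        Typed = Table.TransformColumnTypes(Raw, {
            {"OrderDate", type date},
            {"Quantity", Int64.Type},
            {"Sku", type text}}),
        // Normalization: trim stray whitespace in the join key
        Cleaned = Table.TransformColumns(Typed, {{"Sku", Text.Trim, type text}}),
        // Join against a reference table
        Joined = Table.NestedJoin(Cleaned, {"Sku"},
            Excel.CurrentWorkbook(){[Name = "tbl_Products"]}[Content], {"Sku"},
            "Product", JoinKind.LeftOuter)
    in
        // Load this to a new Table and point all worksheet formulas at it
        Joined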
You can still be exposed to data mangling on input/import, due to all sorts of formatting issues (like zip codes losing their leading zeroes or invalid phone numbers due to misaligned formatting).
In some cases, data validation[1] on the cells directly can help. Particularly if the data is going to be manually entered by the end user, rather than a bulk copy/paste job from elsewhere.
In other cases, you apply defensive checks within the PowerQuery processing. If a field is supposed to contain US zip codes, you can explicitly set the column type to text, split at dashes and keep the first part (dropping any ZIP+4 suffix), verify it only contains numerical content, optionally check for nonsense values (all zeros, all nines, 12345, etc), then left-pad it with zeros to ensure they're all 5 digits. You can also package up all of those steps into a function that can easily be added into future worksheets via copy/paste, saving you from reinventing the wheel. And depending on the data itself, it may be handy to call out to a third-party API for validation/normalization.
To maintain sanity, you can also create a helper column for validation. If any of the columns can't be "safely" normalized or validated, you update this field with details on what failed. At the end, split out any records where that column is not null into its own table. That way you both filter those records out of the "final" table, and can also easily add in an Exceptions Report section to the spreadsheet, clearly calling out the rows that had issues and providing the error details to the end user.
It sounds complicated, and it is compared to the alternative (Excel's undefined behavior silently swallowing it). But it's actually incredibly straightforward to implement and a fairly natural design pattern for someone coming from a software development background. You just start treating Excel as a UI and PowerQuery as a backend. All of the business/processing logic gets decoupled into PowerQuery, and formulas become primarily tools for presentation logic (such as adding an Exceptions notice to the primary worksheet if COUNT(EXCEPTION_TABLE)>0).
[1] https://www.howtoexcel.org/tips-and-tricks/11-awesome-examples-of-data-validation/
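A sketch of that zip-code defense plus the validation helper column in M (names invented; real-world rules will vary):

    let
        Raw = Excel.CurrentWorkbook(){[Name = "tbl_Addresses"]}[Content],
        // Keep the column as text so leading zeros survive
        AsText = Table.TransformColumnTypes(Raw, {{"Zip", type text}}),
        // Drop any ZIP+4 suffix, then left-pad to 5 digits
        Normalized = Table.TransformColumns(AsText, {{"Zip",
            each Text.PadStart(List.First(Text.Split(Text.Trim(_), "-")), 5, "0"),
            type text}}),
        // Helper column: null means the row passed every check
        Checked = Table.AddColumn(Normalized, "ValidationError", each
            if Text.Select([Zip], {"0".."9"}) <> [Zip] then "non-numeric zip"
            else if Text.Length([Zip]) <> 5 then "wrong length"
            else if List.Contains({"00000", "99999", "12345"}, [Zip]) then "nonsense value"
            else null, type nullable text),
        // Split the failures out for the Exceptions Report
        Good = Table.SelectRows(Checked, each [ValidationError] = null),
        Exceptions = Table.SelectRows(Checked, each [ValidationError] <> null)
    in
        Good

Load Exceptions to its own table and the COUNT-based notice described above falls out for free.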
There was always at least one mistake that put the numbers out by 8 digits.
This, incidentally, is every audit ever. I've been audited many times; every audit finds stuff and makes recommendations. Then management gets to pick which recommendations they adopt and which they'll derogate. I'm not saying it's a completely useless process, but it doesn't provide the cast-iron guarantees people think it does.
Then, of course, there's financial audits, which are meant to provide a cast-iron guarantee the firm is still a viable entity. But nearly every big firm that went bankrupt had passed such an audit in the previous year.
What was your assumption of growth?
Infinity.
What?! That’s ridiculous! Why would you assume that?
Forwards back an email you sent to me when I pointed out the growth rate seemed high, but you insisted it was a slam dunk (aka the only way the deal would go through and we’d get that bonus)
At times I feel like I put so many footnotes in the resulting presentation that the information should be interpreted as meaningless.
On another note, it's interesting to observe that the problems spreadsheets have are the same problems that plague poorly designed software everywhere - lack of documentation, lack of testing, lack of input validation, among others.
The other problem is specific to Excel. It just loves to mangle your data! The problem is so bad that geneticists have renamed some genes to stop Excel reinterpreting them as dates.
I recently hit a bug where all data was being checked to see if it was a datetime, and python interpreted `st1` as September 1st of the current year.
It has one order in the U.S. and a different one in most other countries.
https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year
If you have not seen a spreadsheet with, say, dedicated data input tabs, calculation tabs where you can see and audit the work being done, output tabs, revision notes, and a summary sheet, then you haven't seen a good spreadsheet.
Or if it does have this feature, no one knows how to access it
I dislike PHP just as much as the next guy, but I am quite impressed with how Hack turned out. It's Facebook's improved PHP. See eg https://hacklang.org/
It's not that Hack is a good language in absolute terms. But I think it's about the best language they could have come up with when starting from PHP.
(For a better comparison instead of C++, but still in similar spirit to C++, perhaps take D.)
For what it’s good for, it’s great. I love it because you can do many things faster with it than anything else, and the immediate feedback and visual data flow aren’t matched in any language’s IDE I’ve seen. But like all systems, when they grow they need checks. Like the trope about when your code grows to too much complexity, you start needing static types.
There's branching. There's Boolean operations. There's variables (cells, which is ultimately the point of the entire exercise). There's even a switch construct. There's even really simple web requests! All of these are of course functions. The only thing that's missing is function definitions (which can be done with VBA, but I excluded that specifically for the sake of this exercise).
It's not particularly pretty, but is usable by almost anyone, has good documentation, and a huge community of like-minded power users. As you mentioned, Excel doesn't have any of the constructs which help developers for other languages to avoid mistakes. The simple syntax, users which are not necessarily aware of good practices, and a lack of methods to implement such practices produces a perfect storm of factors to introduce bugs.
Fear the Excel-less world, for without it society is a shadow of its potential!
The biggest triumph of Excel is that you can incrementally learn new things, slowly turning a basic worksheet into a Turing complete masterpiece. In contrast, I believe that programming languages have a much higher barrier to entry.
A programming language need not necessarily come in the form of written text. We can say that the concept of the spreadsheet is the programming language, and Excel, Google Sheets, etc. are just implementations of the same concept.
The success of the spreadsheet lies in the fact that it's visual and intuitive, similar to what made Visual Basic and Delphi very successful.
In an ideal world, we would have psql (the PostgreSQL front-end) and AWK, and people who realize that thinking about the model before you start playing with layouts and copy-pasting formulas is a good idea.
Edit: Excel is a Dunning-Kruger effect enabler. People think they can do data analysis because they can type "=sum(B1:B4)" into a spreadsheet.
While I'm not an auditor, I've been the go-to Excel person at all my internships and in full time roles. I've reviewed, enhanced, and fixed hundreds of spreadsheets for tons of different functions (HR, supply chain, manufacturing production reporting). You can build a respectable system in Excel which has built-in error flags that make it obvious to the end user. Here are some of my tips for people who deal with spreadsheets:
* Cell styles, much like headings in Word, are really great but nobody uses them. They clue end users about what cells do. If my end users see an orange cell, they know it's an input they can modify.
* Lock sheets, even without a password. 90% of the time this will prevent people from breaking the spreadsheet in the first place.
* Stay up on new functions! =Unique, filter, sort, Switch, textjoin, and {sum|average|min|max}ifs have been added in the last ~18 months and can greatly clean up messy formulas.
* Document stuff like code should be documented. The Explanatory Text style is excellent for more in depth explanations but not visually clogging things up.
* Use tables. Name the tables. =Average(tbl_Production[Widgets Produced]) is self documenting, and the range expands with the table. Range references are notoriously obtuse.
--
The European Spreadsheet Risks Interest Group has some great information on best practices.
http://www.eusprig.org/best-practice.htm
Joel Spolsky's "You Suck at Excel" is also eye opening for most.
I call this "baby with a nailgun syndrome". You end up with a bloody mess.
We have these wonderful machines and 80% of people use them for browsing the web/using web apps and producing spreadsheets. That's the reality I tend to forget sometimes.
Is there any way to lock all cells in sheet except for certain input cells? That seems like it would prevent many errors.
When you Protect a spreadsheet, all Locked Cells become... locked.
So a typical use case is to only un-lock the cells you wish people to enter variables into, (color the cell to make it clear), and then protect the sheet so nothing else can be tampered with.
Example: a lookup list for “hardware models” which is protected. So you can’t add a new model, but some cretin forces you to keep using the sheet.
With great power comes great responsibility.
It would be interesting if excel had some sort of internal markup language perhaps. So those orange cells could be semantically represented as <input> or some such.
I do hope the group is aware that Yahoo! is closing down Yahoo Groups and has moved its mailing list
https://news.ycombinator.com/item?id=20491091
https://news.ycombinator.com/item?id=22431500
https://arxiv.org/abs/1901.11100
One of the interesting takeaways from Emery Berger's talk is that he was able to replicate, to 2 decimal places, a published spreadsheet transcription error rate: 5.26% of cells have errors.
I highly recommend the talk. https://www.youtube.com/watch?v=GyWKxFxyyrQ
"Excel doesn't stop you from doing stupid things because then it would stop you from doing smart things."
For anything serious, I'd probably choose Python/Pandas/Jupyter these days for better reproducibility.
At my old job I was pushing to migrate from classic spreadsheet analysis to Google Data Studio, and I am doing the same at my new job.
There's some resistance in old dogs not wanting to learn new tricks, but for most of what we do, GDS is really impressive.
Fast forward to 2016-today...Microsoft fixed the GPU acceleration bugs.
>start date
they forgot to elaborate on this one
Here's an explanation of the way Excel stores dates: https://news.ycombinator.com/item?id=24809037
- Alteryx at $5000/year/user?
- A database that requires specialized skill or going through a DBA for every little need?
- Salesforce? It's hell to configure if you don't have the skill.
- Sharepoint? Again. If you're a small business or nonprofit, you've gotta hire or train someone before this is a serious conversation.
I would in no way suggest that an enterprise can run on Excel with centralized data and multiple users. And that's where I believe a lot of criticism of Excel is unfair: the comparisons are not apples to apples.
And let's be honest. EVERYTHING comes with a level of human error.
- Enterprise level databases can get so big that the corrupt data in them can be impossible to cleanse.
- Reports made by SQL people are often flawed.
Over my 7 years in an international company I discovered report after report that didn't accurately match business rules or they were based on loads of incomplete records, duplicates, inconsistencies, incorrect math, etc.
How about fixing the reports? Often the decision was "no." Fixing the reports wasn't something a department head wanted to pay a consultant for.
Eventually, I had the DBA stop sending me reports. I started exporting the raw data into Excel and made my own reports.
- Brand new reports can be impossible to get if they don't justify the cost of a consultant to create the report. (Ah, but there's Excel.)
- Databases get hacked and they crash.
So. Where's the tool that's perfect, free and easy to use? It doesn't exist.
This is so frustrating because I've worked in small departments in an international company. Our problems were too small to get the consistent help we needed from IT, and we weren't important enough to get one of the limited licenses to the fancy reporting system.
Ultimately, there's all this bad talk about evil Excel, but we get no help or other resources.
And what about all the junk reports that were coming out of the company database that no one wanted to spend the money to fix?
THE ONLY VIABLE ALTERNATIVE WAS EXCEL.
But when developers complain they have to learn domain-specific items, people say they should keep up with the times. But suddenly the analysts and the others are exempt from this?
I'm just offering the context from which weird stuff happens. And the answer isn't always as easy as: put it in a database.
The context, the processes (or lack of processes), the decision-makers, budgets, timelines, they all play a part in the outcomes.
One thing Excel is extremely bad at is reliably importing data from CSV. It's unbelievable what errors you can get if you're not extremely careful.
They can do it by themselves with Excel. Can they do that with any other software?
Not many people would be able to do complex calculations in an RDBMS, especially if it requires joins or grouping. But many more are able to use a GUI tool to create a single database table and import some CSV data.
The recent UK disaster wouldn't have happened if the data had been imported into Access or SQLite instead of an Excel sheet.
Of course it's not a panacea. Badly designed databases can be just as error prone as Excel, but they have different error modes.
Even if/when the "average Excel user" finally sees the Light ... s/he still has to work in an environment where excel files (including .csv) are the lingua franca.
It's a bit like working for the German (or French or Japanese etc.) branch of a large US company. Of course the official language is English, and all documents should be written in English and emails should be in English and so on. Even if everybody (down to the janitor?) speaks acceptable English and is perfectly able to talk business in English... everyone will still use German unless they have to communicate with someone like me, whose German fluency is rated in Imaginary Numbers.
Get into the world of the Excel user. Some of us do learn coding. But generally speaking? No. If it happens, it's on our own.
And we can agree that spreadsheets have problems, and Excel users typically are not data people.
BUT!
When we agree on that, it becomes evident that a lot of suggested solutions are non-starters. Not all Excel users are in an enterprise that's throwing money around for training and dedicated software.
I'd like to see the Warehouse Manager who goes in Monday morning and says to the company vice-president, "I've been looking into this Python class ... "
The Warehouse Manager or Sales Assistant doesn't need to learn an alternative, but e.g. the Finance people should. PowerBI/Tableau for reporting and R/Python for modelling should be a thing in those worlds (even if Excel continues to be an amazing tool for entry level stuff and some quick number crunching).
In my experience these take ages to load data that is near instant on Excel. I can totally see how people prefer excel.
Do I need to say it again? My Excel skills were necessary because of garbage in databases.
I could sit at my desk; extract a data-dump of 50k rows of data; locate the fuzzy matches, identify which records needed to be kept and which ones deleted; merge it with the client's spreadsheet that they emailed to me and compare them; identify problems and report back to the angry client.
Why was the client angry? Because of inflexible garbage reports coming out of the database that people were blindly trusting.
Why was there garbage in the database? Not because of Excel.
Can’t we simply accept this for what it is? Code nerds preferring a code solution while normal people simply use the modern equivalent of graph paper?
That's what's frustrating about these anti-Excel discussions.
Ok. What happens when I say that the company bought another company and it had a product line that the new company wanted to pay sales people commissions on? And the compensation plan around it was complex.
For whatever reasons, that's not a report the DBA was up for developing from scratch (or, someone decided his time wasn't to be used that way). That's just the reality.
We can talk about what should have been or what ought to have been, but that doesn't change the reality.
The reality is that I ended up doing most of it ACCURATELY in Excel.
Yes, like a janitor for example.
> The reality is that I ended up doing most of it ACCURATELY in Excel.
Yes, it's the hammer for all your nails. You could also use Python+Jupyter. There are self-hosted and managed versions.
Because of guys like you, municipal installations run their sewage system with an Excel sheet because some guy implemented a VBA macro to talk to the sewage API endpoint.
let it go.
Or maybe not, and they’d just hire more people solely to calculate commission :/
- The process involved 2 people checking my work before it was sent to payroll. And we did catch and correct mistakes. We were able to trace back where and how the mistakes were made, and build in ways to prevent or flag future mistakes.
- The job was necessary because of all the garbage in the database. Excel's data-checking and data-cleansing and flexibility was superior to the database.
- After 1 year all of my work was audited. For several dozen sales people, there was ONE person I wasn't 100% accurate with. And that one was off by 0.5%. SUCCESS!
- If we'd gone with the report coming out of the database, things would have been such a confused mess.
But ok. So what you are saying is your Excel was reviewed by somebody else. How is this any different from reviewing somebody else’s code?
I work in a company where even the least technical analysts are capable of using numpy. We have libraries to use Google Sheets as input-output data. So you get the best of the two worlds.
What does Excel resolve here?
And I also do not believe that a python script would not do the same cleansing of the data in a much more maintainable way.
Why didn't you jump in and help us?
Also I am going to underline that what you did was relevant 16 years ago, which you did not state in your initial message about the usefulness of Excel.
Which is kinda funny: if I hadn’t asked, you would have omitted a lot of otherwise relevant information, which makes your opinion even less important.
It only makes me question why do you take this whole thread so personally? Probably because by now your skills are way less relevant than back in 2005.
What is the first tool a BA would reach for for most of these problems?
Stuff is organized in a manner that someone unfamiliar with data handling or programming can understand.
It takes a little bit to master, and some stuff (like loops and variable passing) is still unintuitive.
It uses a visual data flow approach to transform data step by step from one form to another. This approach is much better for data transformation. It is also less error prone as operations take place on columns or rows of data, not cells. But you wouldn't try to create a set of company accounts with it.
Most nodes (there are official and 3rd party nodes) do have error output, and there's a console to inspect what's going on.
You can also inspect data visually in most nodes.
Even someone who is not used to playing with data, who doesn't understand consistency and yada yada, will understand the advantages.
This Libre Office question&answer[1] in 2020 says it guesses that the users want leading zeros removed from numbers. This behavior is similar to other software tools I looked at.[2]
This means that if a csv has "02142021" as a date for Valentine's Day with a leading zero, it will be imported as number "2142021" -- unless one takes extra steps to prevent the default behavior.
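By contrast, a CSV pulled in through PowerQuery (discussed elsewhere in this thread) arrives as text until you say otherwise, so the leading zero survives by default. A minimal sketch (file path and column name are placeholders):

    let
        // Csv.Document reads every field as text: "02142021" stays "02142021"
        Source = Table.PromoteHeaders(
            Csv.Document(File.Contents("C:\data\dates.csv"),
                [Delimiter = ",", Encoding = 65001])),
        // Converting is now an explicit, reviewable step: parse the
        // MMDDYYYY text into a real date instead of a mangled number
        WithDate = Table.AddColumn(Source, "Date", each #date(
            Number.From(Text.End([EventDate], 4)),       // year
            Number.From(Text.Start([EventDate], 2)),     // month
            Number.From(Text.Middle([EventDate], 2, 2))  // day
        ), type date)
    in
        WithDate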
If you work with Excel with any regularity at all, it's immediately 100% clear that this is a Date issue. I can't emphasize enough how obvious and elementary this "mystery" is.
MS Excel internally represents dates as a floating-point serial number counting days from its epoch of Jan 1, 1900.
So "40335" or "40335.0" is serial day 40,335 in that scheme -- which is June 6, 2010.
An experienced Excel user often runs into "mysterious 5 digit numbers" from 3xxxx to 4xxxx and will instinctively think it's probably a date that's inadvertently formatted/interpreted as a number. By switching the formatting of that cell to "Date" (Excel right-click "Format cells..." choose Category "Date"), the mystery number becomes a readable mm/dd/yyyy.
Same idea as experienced programmers coming across a 10-digit number like "1602930678" in a JSON or XML data file and instinctively assume it's probably a UNIX timestamp rather than a phone number with area code (160).
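If you'd rather check than eyeball it, PowerQuery's M applies the same 1900-based serial scheme when converting numbers, so a one-off sanity check is easy (a throwaway sketch, not production code):

    let
        MysteryNumber = 40335,
        // Number-to-date conversion uses the same serial scheme as
        // Excel's 1900 date system
        AsDate = Date.From(MysteryNumber)  // 2010-06-06
    in
        AsDate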
These consultants exist to serve that niche.
Good thing Excel doesn't use Unix time... payroll would likely have just written the guy a $1.6 billion check. Look at that, Excel just reduced the company's expenses by 99.997%!
Why do I even have Excel skill? Because of the garbage reports and data that I was pulling out of databases. I spent YEARS exporting data-dumps and cleansing them in Excel to help customers whose lives were turned inside out because of garbage in databases.
Not a negative word should be uttered about Excel until you can guarantee that databases and software are pristine and never fail due to human error.
There's also the issue that they're often set up as an ad hoc analysis only to become a process carried out on a schedule. This gets particularly bad when macros are involved - especially when they're password protected.
Choices are:
(a) spend a day drafting a rough spreadsheet, then over the next months tinker with it, making it a bit better and more efficient, and adding more relevant functionality.
(b) put in a formal project request to IT. This will be put into the backlog, and most probably classified as non-priority unless there is some lobbying from people with clout to 'get this done'. When accepted, usually months or even years later, it will be analyzed at length by a business analyst who is unfamiliar with the problems that need to be solved, then handed over to a programmer (or worse, an external contractor) who will implement something with 30 assumptions filling the holes in the analysis, which will be accepted by the analyst and then thrown over to the poor user, who will have to figure out if something, anything relevant to the job can be done with it. The user can start asking for corrections or rewrites, but since most of the operation is CYA from start to finish, they have a high chance of being berated for 'wasting the company's resources' (after all, this project that could have been done in a day with a spreadsheet has now already cost the org 200k or more at the least, a significant amount of which went to the IT dept's internal platform rewrites that they are desperate to do but never get budget for from the business) and with high probability will land on the 'difficult/nagging' people list of the IT dept.
Every few years the enterprise will launch an initiative to 'clean up all our processes', typically when a new key manager moves in with 'friendly' IT contractors that he was pals with at his previous place, and they'll start with an inventory of all the 'shadow IT' (spreadsheets, Access DB solutions, homebrew VB programs, some SaaS stuff ...) in order to purge and consolidate it all onto the shiny new ERP/BPM/... platform that was sold to the CxO. The new platform, a few dozen million dollars later, will (if successful) deliver maybe 33% of the required functionality, and the users can either 'let the business fail to show the nobheads that it is an impossible system to run the business on' or, more likely, keep using their spreadsheets and other quick-fix solutions to keep the business running despite it.
Hear! Hear! It's another mystery of capitalism! People screw up majestically, band-aid it, and still they make money. I'm not complaining, btw, having made plenty on the phenomenon...One might suppose all those people laboring mightily to band-aid all those problems must have some kind of incentive to do it that way, eh? Magical!
The key is "Code Review" and reasonable proof by testing: not just unit testing, which is just nice, but integration testing, which is strong.
Responsibility is knowing the scale of potential impact; that's when you should 100% insist on having some proof with a proper tester.
Excel is a great 'programming language' to quickly _write_ business applications. Especially for relative lay-people.
But Excel is almost impossible to _read_. And it's very hard to make changes, or track them sensibly.
You’d be amazed at the types of errors we’ve found over the years: formulas referencing blank cells, relative-reference formulas that were meant to be static when the formula was copied and pasted to other cells. These were in some pretty serious models (e.g. trading models, risk monitoring models).
One point on Excel, it’s wildly powerful. We’ve done things in Excel that would impress most of this group. Things like 3D formulas (formulas that go through tabs), array functions (which are very infrequently used but are very powerful). I’d argue that most pieces of code (with the exception of recursion and other advanced things) can be replicated in Excel.
Half of the time when validating complex models implemented outside of Excel, we’d use Excel to replicate the logic to benchmark the result.
My big question to this group is, does this type of practice exist in non-banks? Do other industries do this?
Spreadsheets are truly ubiquitous in the financial industry, and for 20 years I've been struggling with exactly the same issues mentioned in the article and in other comments here.
The problem is that neither Excel nor most alternatives treat a spreadsheet as the code it essentially is. And being code, it needs to:
- be source control friendly
- be testable
- have business logic (formulas) decoupled from presentation
Only then can one seriously consider using a spreadsheet as part of any production-grade system.
For the past couple of years I've been working on https://ellx.io - a platform for visual reactive programming and a new generation spreadsheet/notebook designed to solve exactly this pain point.
Here are some examples of what you can do with it:
[1] https://ellx.io/dmaevsky/monte-carlo
What went wrong, besides bad luck? I was good enough at what I did that the perception formed that I'm not a real Finance person. So, instead of being used in roles with longevity, where writing code was an adjunct to domain expertise, I became The Tech Guy: a 'hired gun', used exclusively to build tools for the real Finance pros to use in perpetuity.
Reading these comments is upsetting because they suggest that in 2020, someone w/ my background should be able to support themselves. To those who understand the power of databases and automation, the inanity of using Excel to do almost everything except wash your dishes is painfully obvious. Trust me: there's a better way.
Many technical people -- W/OUT a business background -- are prepared to do whatever is necessary to improve their understanding of The Business Side. Why is it that asking finance types to write code is akin to asking them to perform brain surgery? Perhaps grasping math and accounting principles doesn't imply creativity, or competency w/ language and logic? I've worked with people who can do all of these things. In 20 years, those who can't may struggle to find work. Somehow, in spite of my background, I ended up in the same boat.
Sorry to sound bitter, but people with skills who can't find work are sometimes more miserable than those without any skills, who have resigned themselves to a life of earning minimum wage. I miss the good old days, when I could flush money down the toilet for sport.
http://j-paine.org/eusprig2005.html
https://www.youtube.com/watch?v=yb2zkxHDfUE