I made BudgetSheet ( https://budgetsheet.net ) for myself and others who love to work in their own spreadsheets!
One of the MAJOR improvements of using Alteryx is that it is easier to audit and review the process. With Excel you can be down the rabbit hole of data manipulation or formula-driven outputs, realise you have made a mistake, and be unable to find where or how you went wrong without recreating the steps you followed. The workflow-style interface of Alteryx and other data analytics tools lets everyone see the process from start to finish of how you are getting to your outputs. It can also ingest and handle much larger datasets than default Excel.
I honestly believe that we will see a large shift from using Excel to using better-suited analytics tools. I cannot think of anything I can do in Excel that I cannot do in a GUI-based data analytics tool. Not to write Excel off completely, as I will certainly still be using it, but to a lesser extent and not where complex data manipulation is required.
Testing the range of inputs or outputs is not done enough, but worse, using the wrong functions can give a sheet that looks good but is very wrong.
The barrier to entry for Alteryx is nowhere near as high as for, say, Python or R for finance teams and consultants.
It is also seen as a mark of status to master a data analytics tool which improves a particular process either within the company or at a client.
I’m not disputing the quality of the tool, to be clear. It seems like a good change and a totally reasonable line item. Just questioning what it will take to actually replace Excel in a widespread way.
I’m always interested in visual programming tooling. Thanks for sharing.
This is why business is not considered a form of engineering, while software is (just barely). Imprecision is a tool in business, while in engineering it is an enemy.
It can pull in 3rd party data sources (including custom HTTP calls if you want to reach out to an API), has data types, can both process and store substantially larger data volumes (data is cached/stored separately from the spreadsheet data in a compressed, columnar format and processed by a separate data processing engine), and every step of the data processing pipeline is shown and easily referenced/reviewed (including the ability to leave comments in the code).
That said, PowerQuery is a graft onto Excel, originating in the SQL Server world. So "Excel" skills aren't very helpful/transferable; it leverages Microsoft's M language[2] instead of Excel functions, and the usage paradigm is far more natural for someone with a programming background than an Excel background.
But as someone else mentioned in another comment, Alteryx is anything but cheap. PowerQuery is already widely distributed and available to anyone with Excel installed[3]. So for anyone looking for this type of operational discipline around data, it's a handy tool to pick up. Bonus points that PowerQuery/M-code is portable between a variety of Microsoft tools (SQL Server, PowerBI, Excel, and a variety of Azure's data services).
[2] https://docs.microsoft.com/en-us/powerquery-m/
[3] Traditionally only available for Windows, but the base tech has recently been added to Excel for Mac and is being brought up to parity.
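To give a flavor of the pipeline style described above, here's a minimal M query against a hypothetical JSON API (the URL and field names are invented for illustration); every named step shows up in the editor and can carry comments:

    let
        // Reach out to a (placeholder) JSON API over HTTP
        Raw = Json.Document(Web.Contents("https://example.com/api/fx-rates")),
        // Each named step below is visible and reviewable in the editor
        AsTable = Table.FromRecords(Raw),
        // Strong column types, unlike worksheet cells
        Typed = Table.TransformColumnTypes(AsTable,
            {{"Currency", type text}, {"Rate", type number}})
    in
        Typed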
If you're pulling data from an external source that supports query folding[1] such as a database, it'll try to push those joins to the original source.
If you're pulling in data from a source that doesn't (or the join occurs after query folding is no longer possible), there are a host of performance considerations/optimizations.
By default PowerQuery knows nothing about your source data and makes no assumptions. If you're joining data using the GUI, the generated code will likely use the NestedJoin[2] function. A single "step" in the processing will have its RAM capped at 256MB[3], so depending on the size of the data you're joining, and absent any data processing steps that would give it guarantees about your data, you may or may not be paging massive amounts of data to disk as it loops through the entire dataset you're joining for each row being joined.
That said, PowerQuery has 6-7 join algorithms available, and a variety of techniques that can be used to optimize the processing. For example, if you add a primary key to the table (via Table.AddKey or Table.Distinct or Table.Group), it can short-circuit the data processing the moment it finds a match. This[4] article series is really helpful for optimizing joins in particular and the link-outs and tidbits scattered within the series are fantastic for getting a better understanding of PowerQuery's inner workings and performance considerations.
[1] https://docs.microsoft.com/en-us/power-query/power-query-folding
[2] https://docs.microsoft.com/en-us/powerquery-m/table-nestedjoin
[3] Mentioned halfway down this article: https://blog.crossjoin.co.uk/2019/04/21/power-bi-dataflow-container-size/
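To make the Table.AddKey short-circuit described above concrete, here's a minimal M sketch (table and column names are made up for illustration):

    let
        // Two hypothetical tables already loaded in the current workbook
        Orders = Excel.CurrentWorkbook(){[Name = "Orders"]}[Content],
        Customers = Excel.CurrentWorkbook(){[Name = "Customers"]}[Content],
        // Declare CustomerID as a primary key so the join can stop
        // scanning as soon as it finds a match for each row
        KeyedCustomers = Table.AddKey(Customers, {"CustomerID"}, true),
        // Without the key hint, NestedJoin may loop over the whole
        // Customers table for every row of Orders
        Joined = Table.NestedJoin(
            Orders, {"CustomerID"},
            KeyedCustomers, {"CustomerID"},
            "Customer", JoinKind.LeftOuter),
        Expanded = Table.ExpandTableColumn(Joined, "Customer", {"Name", "Region"})
    in
        Expanded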
Does M-code serve as a replacement for (the abysmal) VBA? And how similar is it to F#?
Sort of depends. M (and PowerQuery in general) is fundamentally about data processing. If you were only dropping into VBA for ETL purposes, then chances are you can use PowerQuery/M instead. With the advantage that it exists in a standard xlsx file, rather than the macro-enabled xlsm files that are more heavily restricted in many environments. It's not intended for more general purpose tasks, nor can it do stuff like interact with the Excel object model directly, so you'd still need to use VBA (or if targeting Excel 2016 and above, Javascript[1]) for that.
PowerQuery/M is basically a sidecar app within Excel, and doesn't have access to nor is aware of the Excel object model. Excel.CurrentWorkbook[2] is just a convenience function available when used within Excel, and it treats data from the current workbook the same as if it were pulling from an external workbook or csv file or web API or any other source. To the extent that even when solely accessing data from within the same workbook and not calling out externally at all, it prompts users with an "Enable External Connections" warning before use. You can then either have the output of that sync to a Table object within Excel (easily/generally available for use), or make it available from within the data model (only accessible via the object model or pivot tables).
That said, the data access functions[3] are fairly expansive, and you can get pretty creative in how you (ab)use them. For example, Web.Action is one I've abused often: pinging a web service to track usage, triggering external jobs, and exporting/syncing data to an external service. Although those last two uses require a bit of complexity in architecting, to account for cases when it gets manually refreshed/triggered in quick succession or with duplicate data.
> And how similar is it to F#?
I'm not familiar enough with F# to be able to answer that. But if you are, you can probably find out pretty quickly via this[4] article series. PowerQuery/M is used across a variety of Microsoft products, and most resources around it are akin to product-specific "how-tos". Ben's primer series is the only source I've come across that approaches it from a software engineering perspective.
[2] https://docs.microsoft.com/en-us/powerquery-m/excel-currentworkbook
[3] https://docs.microsoft.com/en-us/powerquery-m/accessing-data-functions
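To make the sidecar point concrete: pulling from the host workbook looks just like pulling from any external source. A minimal sketch (the Table name "Sales" and its columns are invented):

    let
        // A Table named "Sales" in the *current* workbook -- even this
        // triggers the "Enable External Connections" prompt
        Source = Excel.CurrentWorkbook(){[Name = "Sales"]}[Content],
        // From here it's indistinguishable from an external csv/API feed
        Typed = Table.TransformColumnTypes(Source,
            {{"Region", type text}, {"Amount", type number}})
    in
        Typed

The query's output can then be landed back on a worksheet as a Table, or loaded only to the data model.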
The capabilities were nice (and, according to your posts, more extensive than I knew!) but the interface was beyond counterintuitive: no discoverability, unlike any of the environments I've encountered in 10 years of programming. And the performance was painfully slow (compared to an RDBMS).
I have hopes for MS moving Excel to a less unique interface (I heard smooth scrolling is coming!) and I will definitely be reading your links.
And also yes, the Power Query Editor interface sucks. It's intended to be intuitive for Excel users, which inversely makes it a bit counterintuitive for someone coming from a dev background. The GUI functionality is also pretty limiting and only exposes a fraction of the overall capabilities. That said, there's an "Advanced Editor" button in the Ribbon that'll open up a modal that's far more familiar for a dev; shows you all of the code/Steps for the query all at once and has IntelliSense for code completion. Still crude as far as a dev environment goes, but far more productive than flailing around in the GUI the whole time.
Since you're coming from a dev background, I'd highly encourage that M primer[1] I referenced. It's a crash course on everything that's relevant to a developer (type system, error handling, patterns, antipatterns, custom functions, execution logic, etc). Between that and glancing over the standard library[2], you can avoid using the GUI entirely except for when it's convenient (such as scaffolding out some tedious stuff to then clean up in the Advanced Editor).
> and the performance was painfully slow (compared to an RDBMS).
Also of note is that Excel is only one of the data sources PowerQuery supports. There are a ton of others[3], including many RDBMSs natively supported and generic ODBC support if you have the appropriate driver installed.
[2] https://docs.microsoft.com/en-us/powerquery-m/power-query-m-function-reference
[3] https://docs.microsoft.com/en-us/power-query/connectors/
Looks like you need to install it separately as a module, it's not included with base Excel. So every user needs to download it, and have admin rights to install it. That makes it a non-starter in many of the corporate use-cases.
I've already written a rant about it before and I don't have the energy to repeat myself, but do not fool yourself into thinking switching from excel to Alteryx is doing yourself any favors. You're just trading one monster for another.
Save yourself a huge amount of money, not to mention your sanity, and just take the time to learn some Python, Julia, literally anything else to get the same results faster, more reliably, and not be locked into that nouveau Oracle-esque nightmare.
My point is about moving from Excel to Alteryx: the average population of people that uses Excel would not have the credentials to stop using Excel and move to Python to do the same job. I'm focusing on non-technical people, not programmers.
When Yahoo Groups shuts down, no more excel repair for governments?
Although it only helps to an extent, since it doesn't prevent bad data from being input into a table, and supports so many implicit conversions (listed out in [1]) that undesired behavior is still possible. Plus it lives in a little-known area of Excel rather than being baked into the customary workflows.
That said, it still comes in handy. A typical pattern of mine for worksheets I have to release into the wild (i.e. no telling who will end up (ab)using it and when it'll eventually come back to me in a bastardized form to salvage) is:
1. Create table(s) for raw/user-inputted data
2. Use that table as a source for PowerQuery
3. Leverage PowerQuery for type checking, data normalization/validation, and data joining.
4. Output to a new table on the worksheet, referencing this instead of the original table for all formulas.
Optionally, also put complex logic into the PowerQuery steps, so that the "output" table is ready for end use and doesn't require any complex, error-prone formulas.
Even without leveraging the data model and DAX functions, this works out super well. More sophisticated users can easily modify the PowerQuery steps, and less sophisticated users are protected from accidental mistakes by either using it as-is or reaching out for help earlier on, before there's a crisis for assistance.
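For illustration, steps 2-4 boiled down to a minimal query might look like this (all table and column names are invented):

    let
        // The raw, user-editable Table on the worksheet is the source
        Raw = Excel.CurrentWorkbook(){[Name = "tbl_RawInput"]}[Content],
        // Type checking: any cell that can't convert becomes an Error value
        Typed = Table.TransformColumnTypes(Raw, {
            {"OrderDate", type date},
            {"Quantity", Int64.Type},
            {"Sku", type text}}),
        // Normalization: trim stray whitespace in the join key
        Cleaned = Table.TransformColumns(Typed, {{"Sku", Text.Trim, type text}}),
        // Join against a reference table
        Joined = Table.NestedJoin(Cleaned, {"Sku"},
            Excel.CurrentWorkbook(){[Name = "tbl_Products"]}[Content], {"Sku"},
            "Product", JoinKind.LeftOuter)
    in
        // Load this to a new Table and point all worksheet formulas at it
        Joined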
You can still be exposed to data mangling on input/import, due to all sorts of formatting issues (like zip codes losing their leading zeroes or invalid phone numbers due to misaligned formatting).
In some cases, data validation[1] on the cells directly can help. Particularly if the data is going to be manually entered by the end user, rather than a bulk copy/paste job from elsewhere.
In other cases, you apply defensive checks within the PowerQuery processing. If a field is supposed to contain US zip codes, you can explicitly set the column type to text, split at dashes and keep the first part (dropping any ZIP+4 suffix), verify it only contains numerical content, optionally check for nonsense values (all zeros, all nines, 12345, etc), then left-pad it with zeros to ensure they're all 5 digits. You can also package up all of those steps into a function that can easily be added into future worksheets via copy/paste, saving you from reinventing the wheel. And depending on the data itself, it may be handy to call out to a third-party API for validation/normalization.
To maintain sanity, you can also create a helper column for validation. If any of the columns can't be "safely" normalized or validated, you update this field with details on what failed. At the end, split out any records where that column is not null into its own table. That way you both filter those records out of the "final" table, and can also easily add in an Exceptions Report section to the spreadsheet, clearly calling out the rows that had issues and providing the error details to the end user.
It sounds complicated, and it is compared to the alternative (Excel's undefined behavior silently swallowing it). But it's actually incredibly straightforward to implement and a fairly natural design pattern for someone coming from a software development background. You just start treating Excel as a UI and PowerQuery as a backend. All of the business/processing logic gets decoupled into PowerQuery, and formulas become primarily tools for presentation logic (such as adding an Exceptions notice to the primary worksheet if COUNT(EXCEPTION_TABLE)>0).
[1] https://www.howtoexcel.org/tips-and-tricks/11-awesome-examples-of-data-validation/
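A sketch of that zip-code defense plus the validation helper column in M (names invented; real-world rules will vary):

    let
        Raw = Excel.CurrentWorkbook(){[Name = "tbl_Addresses"]}[Content],
        // Keep the column as text so leading zeros survive
        AsText = Table.TransformColumnTypes(Raw, {{"Zip", type text}}),
        // Drop any ZIP+4 suffix, then left-pad to 5 digits
        Normalized = Table.TransformColumns(AsText, {{"Zip",
            each Text.PadStart(List.First(Text.Split(Text.Trim(_), "-")), 5, "0"),
            type text}}),
        // Helper column: null means the row passed every check
        Checked = Table.AddColumn(Normalized, "ValidationError", each
            if Text.Select([Zip], {"0".."9"}) <> [Zip] then "non-numeric zip"
            else if Text.Length([Zip]) <> 5 then "wrong length"
            else if List.Contains({"00000", "99999", "12345"}, [Zip]) then "nonsense value"
            else null, type nullable text),
        // Split the failures out for the Exceptions Report
        Good = Table.SelectRows(Checked, each [ValidationError] = null),
        Exceptions = Table.SelectRows(Checked, each [ValidationError] <> null)
    in
        Good

Load Exceptions to its own table and the COUNT-based notice described above falls out for free.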
There was always at least one mistake that put the numbers out by 8 digits.
This, incidentally, is every audit ever. I've been audited many times; every audit finds stuff and makes recommendations. Then management gets to pick which recommendations they adopt and which they'll derogate. I'm not saying it's a completely useless process, but it doesn't provide the cast-iron guarantees people think it does.
Then, of course, there's financial audits, which are meant to provide a cast-iron guarantee the firm is still a viable entity. But nearly every big firm that went bankrupt had passed such an audit in the previous year.
What was your assumption of growth?
Infinity.
What?! That’s ridiculous! Why would you assume that?
Forwards back an email you sent to me when I pointed out the growth rate seemed high, but you insisted it was a slam dunk (aka the only way the deal would go through and we’d get that bonus)
At times I feel like I put so many footnotes in the resulting presentation that the information should be interpreted as meaningless.
On another note, it's interesting to observe that the problems spreadsheets have are the same problems that plague poorly designed software everywhere - lack of documentation, lack of testing, lack of input validation, among others.
The other problem is specific to Excel. It just loves to mangle your data! The problem is so bad that geneticists have renamed some genes to stop Excel reinterpreting them as dates.
I recently hit a bug where all data was being checked to see if it was a datetime, and python interpreted `st1` as September 1st of the current year.
It has one order in the U.S. and a different one in most other countries.
https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year
If you have not seen a spreadsheet with, say, dedicated data input tabs, calculation tabs where you can see and audit the work being done, output tabs, revision notes, and a summary sheet, then you haven't seen a good spreadsheet.
Or if it does have this feature, no one knows how to access it
I dislike PHP just as much as the next guy, but I am quite impressed with how Hack turned out. It's Facebook's improved PHP. See eg https://hacklang.org/
It's not that Hack is a good language in absolute terms. But I think it's about the best language they could have come up with when starting from PHP.
(For a better comparison instead of C++, but still in similar spirit to C++, perhaps take D.)
For what it’s good for, it’s great. I love it because you can do many things faster with it than anything else, and the immediate feedback and visual data flow aren’t matched in any language’s IDE I’ve seen. But like all systems, when they grow they need checks. Like the trope about when your code grows to too much complexity, you start needing static types.
There's branching. There's Boolean operations. There's variables (cells, which is ultimately the point of the entire exercise). There's even a switch construct. There's even really simple web requests! All of these are of course functions. The only thing that's missing is function definitions (which can be done with VBA, but I excluded that specifically for the sake of this exercise).
It's not particularly pretty, but is usable by almost anyone, has good documentation, and a huge community of like-minded power users. As you mentioned, Excel doesn't have any of the constructs which help developers for other languages to avoid mistakes. The simple syntax, users which are not necessarily aware of good practices, and a lack of methods to implement such practices produces a perfect storm of factors to introduce bugs.
Fear the Excel-less world, for without it society is a shadow of its potential!
The biggest triumph of Excel is that you can incrementally learn new things, slowly turning a basic worksheet into a Turing complete masterpiece. In contrast, I believe that programming languages have a much higher barrier to entry.
A programming language need not necessarily come in the form of written text. We can say that the concept of the spreadsheet is the programming language, and Excel, Google Sheets, etc. are just implementations of the same concept.
The success of the spreadsheet lies in the fact that it's visual and intuitive, similar to what made Visual Basic and Delphi very successful.
In an ideal world, we would have psql (the PostgreSQL front-end) and AWK, and people who realize that thinking about the model before you start playing with layouts and copy-pasting formulas is a good idea.
Edit: Excel is a Dunning-Kruger effect enabler. People think they can do data analysis because they can type "=sum(B1:B4)" into a spreadsheet.
While I'm not an auditor, I've been the go-to Excel person at all my internships and in full time roles. I've reviewed, enhanced, and fixed hundreds of spreadsheets for tons of different functions (HR, supply chain, manufacturing production reporting). You can build a respectable system in Excel which has built-in error flags that make it obvious to the end user. Here are some of my tips for people who deal with spreadsheets:
* Cell styles, much like headings in Word, are really great but nobody uses them. They clue end users about what cells do. If my end users see an orange cell, they know it's an input they can modify.
* Lock sheets, even without a password. 90% of the time this will prevent people from breaking the spreadsheet in the first place.
* Stay up on new functions! =Unique, filter, sort, Switch, textjoin, and {sum|average|min|max}ifs have been added in the last ~18 months and can greatly clean up messy formulas.
* Document stuff like code should be documented. The Explanatory Text style is excellent for more in depth explanations but not visually clogging things up.
* Use tables. Name the tables. =Average(tbl_Production[Widgets Produced]) is self documenting, and the range expands with the table. Range references are notoriously obtuse.
--
The European Spreadsheet Risks Interest Group has some great information on best practices.
http://www.eusprig.org/best-practice.htm
Joel Spolsky's "You Suck at Excel" is also eye opening for most.
I call this "baby with a nailgun syndrome". You end up with a bloody mess.
We have these wonderful machines and 80% of people use them for browsing the web/using web apps and producing spreadsheets. That's the reality I tend to forget sometimes.
Is there any way to lock all cells in sheet except for certain input cells? That seems like it would prevent many errors.
When you Protect a spreadsheet, all Locked Cells become... locked.
So a typical use case is to only un-lock the cells you wish people to enter variables into, (color the cell to make it clear), and then protect the sheet so nothing else can be tampered with.
Example: a lookup list for “hardware models” which is protected. So you can’t add a new model, but some cretin forces you to keep using the sheet.
With great power comes great responsibility.
It would be interesting if excel had some sort of internal markup language perhaps. So those orange cells could be semantically represented as <input> or some such.
I do hope the group is aware that Yahoo! is closing down Yahoo Groups and has moved its mailing list
https://news.ycombinator.com/item?id=20491091
https://news.ycombinator.com/item?id=22431500
https://arxiv.org/abs/1901.11100
One of the interesting takeaways from Emery Berger's talk is that he was able to replicate, to 2 decimal places, a published spreadsheet transcription error rate: 5.26% of cells have errors.
I highly recommend the talk. https://www.youtube.com/watch?v=GyWKxFxyyrQ
"Excel doesn't stop you from doing stupid things because then it would stop you from doing smart things."
For anything serious, I'd probably choose Python/Pandas/Jupyter these days for better reproducibility.
At my old job I was pushing to migrate from classic spreadsheet analysis to Google Data Studio, and I am doing the same at my new job.
There's some resistance in old dogs not wanting to learn new tricks, but for most of what we do, GDS is really impressive.
Fast forward to 2016-today...Microsoft fixed the GPU acceleration bugs.
>start date
they forgot to elaborate on this one
Here's an explanation of the way Excel stores dates: https://news.ycombinator.com/item?id=24809037
- Alteryx at $5000/year/user?
- A database that requires specialized skill or going through a DBA for every little need?
- Salesforce? It's hell to configure if you don't have the skill.
- Sharepoint? Again. If you're a small business or nonprofit, you've gotta hire or train someone before this is a serious conversation.
I would in no way suggest that an enterprise can run on Excel with centralized data and multiple users. And that's where I believe a lot of criticism of Excel is unfair: the comparisons are not apples to apples.
And let's be honest. EVERYTHING comes with a level of human error.
- Enterprise level databases can get so big that the corrupt data in them can be impossible to cleanse.
- Reports made by SQL people are often flawed.
Over my 7 years in an international company I discovered report after report that didn't accurately match business rules or they were based on loads of incomplete records, duplicates, inconsistencies, incorrect math, etc.
How about fixing the reports? Often the decision was "no." Fixing the reports wasn't something a department head wanted to pay a consultant for.
Eventually, I had the DBA stop sending me reports. I started exporting the raw data into Excel and made my own reports.
- Brand new reports can be impossible to get if they don't justify the cost of a consultant to create the report. (Ah, but there's Excel.)
- Databases get hacked and they crash.
So. Where's the tool that's perfect, free and easy to use? It doesn't exist.
This is so frustrating because I've worked in small departments in an international company. Our problems were too small to get the consistent help we needed from IT, and we weren't important enough to get one of the limited licenses to the fancy reporting system.
Ultimately, there's all this bad talk about evil Excel, but we get no help or other resources.
And what about all the junk reports that were coming out of the company database that no one wanted to spend the money to fix?
THE ONLY VIABLE ALTERNATIVE WAS EXCEL.
But when developers complain they have to learn domain-specific items, people say they should keep up with the times. But suddenly the analysts and the others are exempt from this?
I'm just offering the context from which weird stuff happens. And the answer isn't always as easy as: put it in a database.
The context, the processes (or lack of processes), the decision-makers, budgets, timelines, they all play a part in the outcomes.
One thing Excel is extremely bad at is reliably importing data from CSV. It's unbelievable what errors you can get if you're not extremely careful.
They can do it by themselves with Excel. Can they do that with any other software?
Not many people would be able to do complex calculations in an RDBMS, especially if it requires joins or grouping. But many more are able to use a GUI tool to create a single database table and import some CSV data.
The recent UK disaster wouldn't have happened if the data had been imported into Access or SQLite instead of an Excel sheet.
Of course it's not a panacea. Badly designed databases can be just as error prone as Excel, but they have different error modes.
Even if/when the "average Excel user" finally sees the Light ... s/he still has to work in an environment where excel files (including .csv) are the lingua franca.
It's a bit like working for the German (or French or Japanese etc.) branch of a large US company. Of course the official language is English, and all documents should be written in English and emails should be in English and so on. Even if everybody (down to the janitor?) speaks acceptable English and is perfectly able to talk business in English... everyone will still use German unless they have to communicate with someone like me, whose German fluency is rated in Imaginary Numbers.
Get into the world of the Excel user. Some of us do learn coding. But generally speaking? No. If it happens, it's on our own.
And we can agree that spreadsheets have problems, and Excel users typically are not data people.
BUT!
When we agree on that, it becomes evident that a lot of suggested solutions are non-starters. Not all Excel users are in an enterprise that's throwing money around for training and dedicated software.
I'd like to see the Warehouse Manager who goes in Monday morning and says to the company vice-president, "I've been looking into this Python class ... "
The Warehouse Manager or Sales Assistant doesn't need to learn an alternative, but e.g. the Finance people should. PowerBI/Tableau for reporting and R/Python for modelling should be a thing in those worlds (even if Excel continues to be an amazing tool for entry level stuff and some quick number crunching).
In my experience these take ages to load data that is near instant on Excel. I can totally see how people prefer excel.
Do I need to say it again? My Excel skills were necessary because of garbage in databases.
I could sit at my desk; extract a data-dump of 50k rows of data; locate the fuzzy matches, identify which records needed to be kept and which ones deleted; merge it with the client's spreadsheet that they emailed to me and compare them; identify problems and report back to the angry client.
Why was the client angry? Because of inflexible garbage reports coming out of the database that people were blindly trusting.
Why was there garbage in the database? Not because of Excel.
Can’t we simply accept this for what it is? Code nerds preferring a code solution while normal people simply use the modern equivalent of graph paper?
That's what's frustrating about these anti-Excel discussions.
Ok. What happens when I say that the company bought another company and it had a product line that the new company wanted to pay sales people commissions on? And the compensation plan around it was complex.
For whatever reasons, that's not a report the DBA was up for developing from scratch (or, someone decided his time wasn't to be used that way). That's just the reality.
We can talk about what should have been or what ought to have been, but that doesn't change the reality.
The reality is that I ended up doing most of it ACCURATELY in Excel.
Yes, like a janitor for example.
> The reality is that I ended up doing most of it ACCURATELY in Excel.
Yes, it's the hammer for all your nails. You could also use Python+Jupyter. There are self-hosted and managed versions.
Because of guys like you, municipal installations run their sewage system with an Excel sheet because some guy implemented a VBA macro to talk to the sewage API endpoint.
let it go.
Or maybe not, and they’d just hire more people solely to calculate commission :/
- The process involved 2 people checking my work before it was sent to payroll. And we did catch and correct mistakes. We were able to trace back where and how the mistakes were made, and build in ways to prevent or flag future mistakes.
- The job was necessary because of all the garbage in the database. Excel's data-checking and data-cleansing and flexibility was superior to the database.
- After 1 year all of my work was audited. For several dozen sales people, there was ONE person I wasn't 100% accurate with. And that one was off by 0.5%. SUCCESS!
- If we'd gone with the report coming out of the database, things would have been such a confused mess.
But ok. So what you are saying is your Excel was reviewed by somebody else. How is this any different from reviewing somebody else’s code?
I work in a company where even the least technical analysts are capable of using numpy. We have libraries to use Google Sheets as input-output data. So you get the best of the two worlds.
What does Excel resolve here?
And I also do not believe that a python script would not do the same cleansing of the data in a much more maintainable way.
Why didn't you jump in and help us?
Also I am going to underline that what you did was relevant 16 years ago, which you did not state in your initial message about the usefulness of Excel.
Which is kinda funny: if I hadn’t asked, you would have omitted a lot of otherwise relevant information, which makes your opinion even less important.
It only makes me question why do you take this whole thread so personally? Probably because by now your skills are way less relevant than back in 2005.
What is the first tool a BA would reach for for most of these problems?
Stuff is organized in a manner that someone unfamiliar with data handling or programming can understand.
It takes a little bit to master, and some stuff (like loops and variable passing) is still unintuitive.
It uses a visual data flow approach to transform data step by step from one form to another. This approach is much better for data transformation. It is also less error prone as operations take place on columns or rows of data, not cells. But you wouldn't try to create a set of company accounts with it.
Most nodes (there are official and 3rd party nodes) do have error output, and there's a console to inspect what's going on.
You can also inspect data visually in most nodes.
Even someone who is not used to playing with data, who doesn't understand consistency and yada yada, will understand the advantages.
This Libre Office question&answer[1] in 2020 says it guesses that the users want leading zeros removed from numbers. This behavior is similar to other software tools I looked at.[2]
This means that if a csv has "02142021" as a date for Valentine's Day with a leading zero, it will be imported as number "2142021" -- unless one takes extra steps to prevent the default behavior.
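By contrast, a CSV pulled in through PowerQuery (discussed elsewhere in this thread) arrives as text until you say otherwise, so the leading zero survives by default. A minimal sketch (file path and column name are placeholders):

    let
        // Csv.Document reads every field as text: "02142021" stays "02142021"
        Source = Table.PromoteHeaders(
            Csv.Document(File.Contents("C:\data\dates.csv"),
                [Delimiter = ",", Encoding = 65001])),
        // Converting is now an explicit, reviewable step: parse the
        // MMDDYYYY text into a real date instead of a mangled number
        WithDate = Table.AddColumn(Source, "Date", each #date(
            Number.From(Text.End([EventDate], 4)),       // year
            Number.From(Text.Start([EventDate], 2)),     // month
            Number.From(Text.Middle([EventDate], 2, 2))  // day
        ), type date)
    in
        WithDate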
If you work with Excel with any regularity at all, it's immediately 100% clear that this is a Date issue. I can't emphasize enough how obvious and elementary this "mystery" is.
MS Excel internally represents dates as a floating-point serial number counting days from its epoch of Jan 1, 1900.
So "40335" or "40335.0" is serial day 40,335 in that scheme -- which is June 6, 2010.
An experienced Excel user often runs into "mysterious 5 digit numbers" from 3xxxx to 4xxxx and will instinctively think it's probably a date that's inadvertently formatted/interpreted as a number. By switching the formatting of that cell to "Date" (Excel right-click "Format cells..." choose Category "Date"), the mystery number becomes a readable mm/dd/yyyy.
Same idea as experienced programmers coming across a 10-digit number like "1602930678" in a JSON or XML data file and instinctively assume it's probably a UNIX timestamp rather than a phone number with area code (160).
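If you'd rather check than eyeball it, PowerQuery's M applies the same 1900-based serial scheme when converting numbers, so a one-off sanity check is easy (a throwaway sketch, not production code):

    let
        MysteryNumber = 40335,
        // Number-to-date conversion uses the same serial scheme as
        // Excel's 1900 date system
        AsDate = Date.From(MysteryNumber)  // 2010-06-06
    in
        AsDate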
These consultants exist to serve that niche.
Good thing Excel doesn't use Unix time... payroll would likely have just written the guy a $1.6 billion check. Look at that, Excel just reduced the company's expenses by 99.997%!
Why do I even have Excel skill? Because of the garbage reports and data that I was pulling out of databases. I spent YEARS exporting data-dumps and cleansing them in Excel to help customers whose lives were turned inside out because of garbage in databases.
Not a negative word should be uttered about Excel until you can guarantee that databases and software are pristine and never fail due to human error.
There's also the issue that they're often set up as an ad hoc analysis only to become a process carried out on a schedule. This gets particularly bad when macros are involved - especially when they're password protected.
Choices are:
(a) spend a day drafting a rough spreadsheet, then over the next months tinker with it, making it a bit better and more efficient, and adding more relevant functionality.
(b) put in a formal project request to IT. This will be put into the backlog, and most probably classified as non-priority unless there is some lobbying from people with clout to 'get this done'. When accepted, usually months or even years later, it will be analyzed at length by a business analyst who is unfamiliar with the problems that need to be solved, then handed over to a programmer (or worse, an external contractor) who will implement something with 30 assumptions filling the holes in the analysis, which will be accepted by the analyst and then thrown over to the poor user, who will have to figure out if something, anything relevant to the job can be done with it. The user can start asking for corrections or rewrites, but since most of the operation is CYA from start to finish, they have a high chance of being berated for 'wasting the company's resources' (after all, this project that could have been done in a day with a spreadsheet has now already cost the org 200k or more at the least, a significant amount of which went to the IT dept's internal platform rewrites that they are desperate to do but never get budget for from the business) and with high probability will land on the 'difficult/nagging' people list of the IT dept.
Every few years the enterprise will launch an initiative to 'clean up all our processes', typically when a new key manager moves in with 'friendly' IT contractors that he was pals with at his previous place, and they'll start with an inventory of all the 'shadow IT' (spreadsheets, Access DB solutions, homebrew VB programs, some SaaS stuff ...) in order to purge and consolidate it all onto the shiny new ERP/BPM/... platform that was sold to the CxO. The new platform, a few dozen million dollars later, will (if successful) deliver maybe 33% of the required functionality, and the users can either 'let the business fail to show the nobheads that it is an impossible system to run the business on' or, more likely, keep using their spreadsheets and other quick-fix solutions to keep the business running despite it.
Hear! Hear! It's another mystery of capitalism! People screw up majestically, band-aid it, and still they make money. I'm not complaining, btw, having made plenty on the phenomenon...One might suppose all those people laboring mightily to band-aid all those problems must have some kind of incentive to do it that way, eh? Magical!
The key is "Code Review" and reasonable proof by testing: not just unit testing, which is just nice, but integration testing, which is strong.
Responsibility is knowing the scale of potential impact; that's when you should 100% insist on having some proof with a proper tester.
Excel is a great 'programming language' to quickly _write_ business applications. Especially for relative lay-people.
But Excel is almost impossible to _read_. And it's very hard to make changes, or track them sensibly.
You’d be amazed at the types of errors we’ve found over the years: formulas referencing blank cells, relative-reference formulas that were meant to be static when the formula was copied and pasted to other cells. These were in some pretty serious models (e.g. trading models, risk monitoring models).
One point on Excel, it’s wildly powerful. We’ve done things in Excel that would impress most of this group. Things like 3D formulas (formulas that go through tabs), array functions (which are very infrequently used but are very powerful). I’d argue that most pieces of code (with the exception of recursion and other advanced things) can be replicated in Excel.
Half of the time when validating complex models implemented outside of Excel, we’d use Excel to replicate the logic to benchmark the result.
My big question to this group is, does this type of practice exist in non-banks? Do other industries do this?
Spreadsheets are truly ubiquitous in the financial industry, and for 20 years I've been struggling with exactly the same issues mentioned in the article and in other comments here.
The problem is that neither Excel nor most alternatives treat a spreadsheet as the code it essentially is. And being code, it needs to:
- be source control friendly
- be testable
- have business logic (formulas) decoupled from presentation
Only then can one seriously consider using a spreadsheet as part of any production-grade system.
For the past couple of years I've been working on https://ellx.io - a platform for visual reactive programming and a new generation spreadsheet/notebook designed to solve exactly this pain point.
Here are some examples of what you can do with it:
[1] https://ellx.io/dmaevsky/monte-carlo
What went wrong, besides bad luck? I was good enough at what I did that the perception formed that I'm not a real Finance person. So, instead of being used in roles with longevity, where writing code was an adjunct to domain expertise, I became The Tech Guy: a 'hired gun', used exclusively to build tools for the real Finance pros to use in perpetuity.
Reading these comments is upsetting because they suggest that in 2020, someone w/ my background should be able to support themselves. To those who understand the power of databases and automation, the inanity of using Excel to do almost everything except wash your dishes is painfully obvious. Trust me: there's a better way.
Many technical people -- W/OUT a business background -- are prepared to do whatever is necessary to improve their understanding of The Business Side. Why is it that asking finance types to write code is akin to asking them to perform brain surgery? Perhaps grasping math and accounting principles doesn't imply creativity, or competency w/ language and logic? I've worked with people who can do all of these things. In 20 years, those who can't may struggle to find work. Somehow, in spite of my background, I ended up in the same boat.
Sorry to sound bitter, but people with skills who can't find work are sometimes more miserable than those without any skills, who have resigned themselves to a life of earning minimum wage. I miss the good old days, when I could flush money down the toilet for sport.
http://j-paine.org/eusprig2005.html
https://www.youtube.com/watch?v=yb2zkxHDfUE