Now I know that MySQL is the most popular (because there's a banner on mysql.com that says so! /s), and I get the sense that on raw speed alone, it's probably faster (this is purely anecdotal), but folks that use PostgreSQL speak about it with an air of superiority.
So I'm hoping some of you HN users can shed light on this, based on your experience.
Is there a clear winner, or is it purely a case of "it depends"?
(and if it's the latter, is there a generally agreed comparison list of pros and cons that I can refer to?)
Thanks!
The main pro about postgresql is that it’s immensely versatile and can do an incredible amount of things.
The main pro about mysql is that there are some open source and battle-tested multi-master extensions for high availability.
Meanwhile MariaDB lacks a distinct JSON type, but instead provides a lot of functions for manipulating JSON stored in textual types.
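For a flavor of that function-based style, here's a minimal sketch (the `items` table and `doc` column are made up):

    -- MariaDB: the JSON type is effectively LONGTEXT with a validity
    -- check, so manipulation goes through SQL functions, e.g.:
    SELECT JSON_VALUE(doc, '$.name'),
           JSON_EXTRACT(doc, '$.tags[0]')
    FROM   items;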
One thing MySQL does better than Postgres is fast replication across different data centers. See this blog: https://www.uber.com/en-DK/blog/postgres-to-mysql-migration/
However, what Uber is doing may not be what you're doing. So do some research, and hire qualified DBA or devops people to make an informed choice.
> Writing 512 byte blobs as separate transactions, in WAL mode with synchronous=normal, temp_store=memory, and mmap enabled, I got 13.78μs per write, or ~72,568 writes per second.
Look ma, no DBA required.
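For reference, a sketch of the quoted configuration as SQLite pragmas (the mmap size is an assumption; the quote doesn't give one):

    PRAGMA journal_mode = WAL;     -- write-ahead logging
    PRAGMA synchronous = NORMAL;   -- fsync at checkpoints, not every commit
    PRAGMA temp_store = MEMORY;    -- keep temp tables/indexes in RAM
    PRAGMA mmap_size = 268435456;  -- enable mmap'd I/O (256 MiB here)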
- MySQL from about 1999-2007
- Postgres from 2006-present
- Oracle at a job 4 years ago
One of the things that absolutely made my brain explode is that Oracle, too, does not support DDL transactions. I discovered this during a pretty funny code review.
“Why are you wrapping this DDL in a transaction? That’s pointless.”
“Wait… Oracle doesn’t support DDL transactions?”
“What do you mean ‘DDL transactions’? That’s… should you be working on DB code at all?”
“Wait… you mean if a migration in Oracle fails half way through it just… leaves things half broken? Let me show you what this would look like in Postgres”
“WHAT THATS AMAZING!!!”
It just happened to be a great opportunity for both of us to expand our mental model of how different systems work. If you imagine someone whose mental model is "DML commands can be transactional, DDL modifies the database structure directly" then it's pretty reasonable for them to go "Uh... this guy doesn't seem to know the difference between modifying data and modifying tables..."
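For the curious, a minimal sketch of what that Postgres demo might look like (table and column names are made up):

    -- Postgres: DDL participates in transactions, so a failed migration
    -- rolls back cleanly instead of leaving the schema half-changed.
    BEGIN;
    ALTER TABLE accounts ADD COLUMN balance_cents bigint NOT NULL DEFAULT 0;
    ALTER TABLE accounts DROP COLUMN legacy_balance;  -- if this fails...
    COMMIT;                                           -- ...neither change is applied
    -- Oracle and MySQL implicitly commit around each DDL statement,
    -- so a migration that dies halfway leaves the schema half-migrated.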
Having said that, my preference is for postgresql these days. It feels more ergonomic for the things I do (fairly boring, normal database kinda stuff).
The philosophy of postgresql was to do things the 'right' way; the philosophy of mysql was to do the best it can, as fast as it can, for as many users as it can. Not necessarily the 'right' or 'best' way.
It's not like either philosophy is formally documented or mandatory, but each "generally" matches up pretty closely to real-world behavior.
The biggest problem you'll have evaluating them is that both products are incredibly old and have changed over the decades, so you'll find plenty of web search results DEMANDING that mysql can't do transactions, for example, because that was written 25 years ago, and transactions have been a solved problem on mysql for decades. Similar issues exist for tooling, library support, and clustering. I wouldn't trust a many-years-old blog post on either product LOL.
For instance, a major upgrade of a MySQL server is usually painless (Debian even upgrades automatically). My last upgrade from Pg12 to 14 was not that easy. Replication has been easier with MySQL, though I've heard Pg has recently matured on this side. The last time I used a Pg DB that was queried by a web app, PgPool was installed because Pg could not handle many concurrent connections, but I've also heard this may not be required with recent Pg.
Client side, in many cases, a "poor man's search" with LIKE is enough or even suitable. With MySQL, I can declare `description TEXT collate utf8mb4_general_ci`, and then `LIKE '%de toutes façons%'` will match `De toutes facons`. Obtaining the same result with Postgres requires much more work.
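For comparison, one way to approximate that in Postgres, using the contrib unaccent extension (table name hypothetical):

    CREATE EXTENSION IF NOT EXISTS unaccent;
    SELECT *
    FROM   products
    WHERE  unaccent(description) ILIKE unaccent('%de toutes façons%');
    -- unaccent() isn't marked IMMUTABLE, so indexing this expression
    -- takes extra setup -- part of why it's "much more work" here.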
Now for the bad side of MySQL: it has so many footguns, like the various compatibility modes (silent truncation when inserting is still the default behavior, I think), or transactions being implicitly committed whenever a DDL statement runs. Support for complex types (arrays, JSON, specialized types like ISBN...) is also years ahead in PostgreSQL, and so are many advanced features.
Hasn't been the default since MySQL 5.7, which was released in 2015. Some distros may have altered the defaults, but I doubt it.
MySQL 5.7, released in Oct 2015, changed the default to enable strict sql_mode. All prior versions have hit end-of-life for support years ago, so there is no modern version of MySQL with this silent truncation behavior.
That said, there's still a major problem, but it's not MySQL's fault; rather, it's Amazon and their nonstandard defaults. AWS's managed database offerings inexplicably change the default sql_mode to disable strict-mode, both for regular RDS as well as Aurora. This is a huge problem, and it also definitely perpetuates this "MySQL still silently truncates!" perception.
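If you're on RDS/Aurora, a session-level sanity check looks something like this (the real fix is editing the parameter group):

    -- MySQL: inspect the effective sql_mode...
    SELECT @@GLOBAL.sql_mode, @@SESSION.sql_mode;
    -- ...and opt this session back into strict mode:
    SET SESSION sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';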
25 years ago PostgreSQL was clearly a slow DB, but its advocates didn't claim you didn't need performance, just that you had to measure first to make sure you really did (nobody did that measuring).
Things have changed. However my career ended up not needing much of a DB ever and so I've never had any need to see how much. I just like to watch the fights from the sidelines.
MySQL has very mature cluster managers like Vitess / PlanetScale. Citus for Postgres was bought by Microsoft and doesn't seem as popular as Vitess.
The functionality: from what I've heard and seen, PostgreSQL is commonly touted as the solution with more correctness in regards to SQL, but it also has great support for things like JSON, PL/pgSQL is pretty nice to use, and there are custom types and a whole lot of other useful stuff. That said, not every system needs that, and the MySQL/MariaDB approach of having just databases, instead of databases and schemas, is a bit simpler. Transactional DDL in PostgreSQL is a big plus, though, as someone else mentioned.
The performance: it probably depends on what you're doing. Benchmarking both for your particular workload is probably the way to go, though historically using pgBouncer has been a good idea for PostgreSQL. In the current day and age, both should be decent choices, especially with partitioning and clustering.
The tooling: here I'll come out and say that I like MySQL Workbench more than pgAdmin, both because it feels more like a native piece of software, but also because it has really great ER diagram functionality that, when coupled with forward/reverse engineering and schema sync, can allow you to plan and alter your schema in a visual way: https://dev.mysql.com/doc/workbench/en/wb-design-engineering.html (and it gives you SQL at the end of the day, so you can further alter things as necessary; really nice both for exploring the schema, so you don't need something like DBVis as much, and for quick edits). That said, pgAdmin has built-in support for displaying geospatial data (e.g. PostGIS), which is pretty great since you don't need something like QGIS as much.
The developer experience: both are good. Both have containers that are easy to run, both are easy to connect to and work with (though MySQL drivers sometimes aren't shipped with other software out of the box), both are widely supported by a variety of tools (DataGrip is one of the better ones out there as well), and they also have managed cloud offerings from many vendors, if needed. Both will scale up and down pretty decently, though PostgreSQL seems to use a bit less memory in containers and has faster startup when I last checked, though that's a minor detail. Either will be easier than trying to find a trustworthy Oracle XE container image, or to build your own with some DB setup and initialization scripts that actually work and don't throw errors about the DB being closed when the container restarts.
At the end of the day, it's hard to go wrong with either of them, much like even SQLite can be a good choice for some workloads. Right now I'm building a personal project with MariaDB, but Keycloak more or less demands PostgreSQL, so running that as well. Though in regards to MariaDB one could probably mention the whole SPAC situation that went down a bit ago.
I find it useful to remember that MySQL has various storage engines (which make different trade-offs, and may themselves have different options such as various row formats). When using, for instance, the default InnoDB, knowing that it uses index-organized tables (meaning it stores the data in a B-tree, as opposed to a heap) can influence which workloads perform well (is the primary key used heavily / load-bearing?).
Postgres often excels when queries are more complex; for instance, being able to use partial indexes (covering a subset of a table via a "where" clause in the index) can be very powerful.
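A minimal sketch of a partial index (table and column names hypothetical):

    -- Postgres: index only the rows the hot query actually touches.
    CREATE INDEX orders_pending_idx
        ON orders (created_at)
        WHERE status = 'pending';
    -- The index stays small and cheap to maintain, and a query like
    --   SELECT * FROM orders WHERE status = 'pending' ORDER BY created_at;
    -- can use it instead of scanning a full-table index.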
One other point of differentiation that has been useful for me to consider for various projects is programming within the database (stored programs): specifically, writing stored procedures/functions. Postgres has many high-quality procedural languages (some shipped in core, like PL/Perl; others external, like PL/R) that can be useful for expressing these.
Most answers allude to either: they are kinda the same and full-featured, or they have different philosophies. Neither really hints at how you pick one.
The fact that Postgres overlaps with MySQL when your use case is "backend for a common ORM" barely makes them comparable. Out of the box, Postgres gives you:
- Streaming Replication (keep a secondary instance for contingency or moving heavy queries away from the main database);
- Procedures in various languages;
- Foreign Data Wrappers for integration with other databases (much sturdier than Oracle's Heterogeneous Services, and there's even an FDW for Elasticsearch; see the sketch after this list);
- Backup/restore is quite easy as is upgrading.
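As flagged above, a minimal postgres_fdw sketch (server names and credentials are made up):

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SERVER reporting_srv
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'reporting.internal', dbname 'reports');
    CREATE USER MAPPING FOR CURRENT_USER
        SERVER reporting_srv
        OPTIONS (user 'reader', password 'secret');
    CREATE SCHEMA local_reports;
    IMPORT FOREIGN SCHEMA public
        FROM SERVER reporting_srv INTO local_reports;
    -- Foreign tables can now be queried and joined like local ones.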
The only thing I miss is packages like Oracle's.
So, unless I have no other option, I'll choose Postgresql.
MariaDB/MySQL's replication is also streamed. It's a continuous stream of replication data over an unbroken connection, for both statement-based and row-based methodologies. The only difference is that Postgres has long been able to break transactions up into smaller pieces and apply them piecemeal, which is something that didn't show up in the MariaDB/MySQL camp until a few years ago, when it was introduced in Galera.
Over here in Latvia, I've seen orgs that run Oracle and are still starting new projects in it, because that's what they know best and presumably have the licensing all figured out. Admittedly, their automatic indexing and a few other features in the recent releases are useful.
However, in practice it's easier to work with PostgreSQL and MySQL/MariaDB, because of how easy it is to launch either, be it locally or on a server, thanks to the availability of container images. Even if you can launch Oracle XE or something like it in containers, provided that you get your hands on such containers, there will still be challenges in regards to startup time, resource usage and various other quirks.
But hey, for people that don't need to use that approach, I guess nobody ever got fired for picking IBM, Microsoft or even Oracle. Personally I'd just go for PostgreSQL/MySQL/MariaDB in most cases because of the ease of use.
* If you have pre-5.6.4 date/time columns in your table and perform any kind of ALTER TABLE statement on that table (even one that doesn't involve the date/time columns), it will TAKE A TABLE LOCK AND REWRITE YOUR ENTIRE TABLE to upgrade to the new date/time types. In other words, your carefully-crafted online DDL statement will become fully offline and blocking for the entirety of the operation. To add insult to injury, the full table upgrade was UNAVOIDABLE until 5.6.24 when an option (still defaulted to off!) was added to decline the automatic upgrade of date/time columns. If you couldn't upgrade to 5.6.24, you had two choices with any table with pre-5.6.4 types: make no DDL changes of any kind to it or accept downtime while the full table rewrite was performed. To be as fair as possible, this is documented in the MySQL docs, but it is mind-blowing to me that any database team would release this behavior into production. In other words, in what world is the upgrade of date/time types to add a bit more fractional precision so important that all online DDL operations on that table will be silently, automatically, and unavoidably converted to offline operations in order to perform the upgrade? To me, this is indicative of the same mindset that released MySQL for so many years with the unsafe and silent downgrading of data as the default mode of operation.
* Dropping a table takes a global lock that prevents the execution of ANY QUERY until the underlying files for the table are removed from the filesystem. Under many circumstances, this would go unnoticed, but I experienced a 7-minute production outage when I dropped a 700GB table that was no longer used. Apparently, this delay is due to the time it takes the underlying Linux filesystems to delete the large table file. This was an RDS instance, so I had no visibility into the filesystem used and it was probably exacerbated by the EBS backing for the RDS instance, but still, what database takes out a GLOBAL LOCK TO DROP A TABLE? After the incident, I googled for and found this description of the problem (https://www.percona.com/blog/2009/06/16/slow-drop-table/) which isn't well-documented. It's almost as if you have to anticipate every possible way in which MySQL could screw you and then google for it if you want to avoid production downtime.
There are others, too, but to this day, those two still raise my blood pressure when I think about them. In addition to MySQL, I've pushed SQL Server and PostgreSQL pretty hard in production environments and never encountered gotchas like that. Unlike MySQL, those teams appear to understand the priorities of people who run production databases and they make it very clear when there are big and potentially disrupting changes and they don't make those changes obligatory, automatic, and silent as MySQL did. IMO, MySQL has its place for very specific workloads, but if you don't have deep enough knowledge of DB engines and your workload to know that yours is one of those specific workloads, you should default to PostgreSQL.
For anything other than a data-oriented startup, I'd pick Postgres all day long.
The troubles each database gives you do have different flavours. Postgres's query planning is mercurial and sometimes quixotic. It can be tedious forcing it to do the right thing, and it does sometimes switch strategy resulting in queries taking 100x or more longer than they used to, just because some stats changed. MySQL's query planner is more stupid, more predictable, and less likely to give you this kind of pain. OTOH you have to be more careful writing queries and you sometimes end up adding hints or encoding a query structure which locks in a query plan which might not be the best one long term. But it usually degrades gradually rather than suddenly. I'm not sure which is worse, though, because gradual degradation doesn't demand fixing.
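To illustrate "locking in a query structure" on the MySQL side, a sketch with made-up table names:

    -- MySQL: STRAIGHT_JOIN forces tables to be joined in the written
    -- order, pinning a plan the optimizer might otherwise abandon as
    -- statistics drift.
    SELECT STRAIGHT_JOIN o.id, c.name
    FROM   orders o
    JOIN   customers c ON c.id = o.customer_id
    WHERE  o.created_at > '2024-01-01';
    -- Postgres has no built-in hints; you nudge its planner indirectly
    -- (statistics targets, or session settings like enable_nestloop).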
To be fair, my PostgreSQL experience is more recent, and both products have matured a bunch, but I've also seen plenty of the bad old MySQL problems—like MyISAM tables and 3-byte unicode—in just the last couple of years.
No, not really. All else being equal:
I'd say that today, it's much muddier than it was maybe ten years ago, before the data science explosion and cloud explosion.
That is, turn-key cloud Postgres is not Y2K Postgres, and MySQL post-Maria is not Y2K MySQL. In the former, administration of Postgres has gotten much easier since Heroku got involved and Andy Pavlo started talking about self-driving databases. In the latter, MySQL can do text search well enough that I sat pair programming this with a young engineer around 2016-2017 when I'd have argued prior that you were wasting your time.
In the end, it probably comes down to how you rank your requirements:
usability, performance, suitability for a particular purpose.
Things like PostGIS and pgRouting may have few equals, so like others said, those can be huge differentiators if you need them and see the advantage in DB-backend computation without invoking the N+1 query problem in aggregate.
I'm biased since I used MySQL before the inefficient backend was deprecated, and I used PostgreSQL to do all manner of experimental EAV implementations using jsonb, including various forms of tuples from Rich Hickey.
All hail Stonebraker.
I recommend https://pgloader.io if you're currently on MySQL and want to switch your existing app.
Coming from Oracle, I've been able to use PostgreSQL pretty much as though it were Oracle, including Oracle's optimistic locking strategy. Some of the SQL functions are a little different, but it seems the underlying philosophy is the same. In other words, PostgreSQL seems like Oracle, just a little different.
MySQL has always felt completely foreign to me. Depending on your background and familiarity with other RDBMS's though, you may be right at home with it. Wouldn't be a bit surprised if someone had a similar anecdote to mine replacing PostgreSQL with MySQL and Oracle with SQL Server.
If you don't have a personal preference for either system then do some benchmarking on the platform you'll be running them on and the loads you're expecting and see if either particularly excels over the other for what you're intending to do.
The concept of a schema is also weird for MySQL users, although it's pervasive in Michael Stonebraker's worldview.
If anything Postgres has a much clearer development model. Lots of features in the pipeline.
All other things being equal they actually perform about the same for our use case.
Then let's talk replication… MySQL GTID/binlog replication is stupid easy to set up. And _it just works_ and recovers really easily. You can make hundreds of near-real-time replicas.
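For a sense of how little there is to it, a sketch of pointing a replica at a primary with GTID auto-positioning (MySQL 8.0.23+ syntax; older versions use CHANGE MASTER TO; hostnames and credentials are made up):

    -- On the replica, with gtid_mode=ON on both servers:
    CHANGE REPLICATION SOURCE TO
        SOURCE_HOST = 'primary.internal',
        SOURCE_USER = 'repl',
        SOURCE_PASSWORD = 'secret',
        SOURCE_AUTO_POSITION = 1;  -- GTIDs: no binlog file/offset bookkeeping
    START REPLICA;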
Finally, the tooling. I find MySQL's tooling far easier to use than Postgres's. Nearly everything is SQL. You have to memorize configuration in both cases, so take this last point as an opinion.
Both are a good choice. I think if you need absolute single-server performance, Postgres is probably the better option, but understand you're only getting marginal gains. If you want to make hundreds of read-only replicas or use their semi-sync HA plugin, MySQL is probably the way to go, but understand that multi-master writes are still not really a thing.
Our access patterns are mostly analytical. A lot of scanning and aggregating has to be done. Nobody cared about clustering before so we had to change a lot of table schemas to make our IO efficient.
I've only heard from colleagues that such issues are easier to handle in Postgres because it is more flexible. But for MySQL, you have to understand that the primary key and the clustering key are always the same thing.
(of course this depends on the storage engine, but it's the behavior of the standard engine innodb)
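To make that concrete, a sketch (table and columns hypothetical):

    -- MySQL/InnoDB: the PRIMARY KEY *is* the clustered index, so rows
    -- are physically ordered by it; here each user's events stay
    -- contiguous, making per-user range scans IO-efficient.
    CREATE TABLE events (
        user_id    BIGINT      NOT NULL,
        created_at DATETIME(6) NOT NULL,
        payload    JSON,
        PRIMARY KEY (user_id, created_at)
    ) ENGINE=InnoDB;
    -- Postgres tables are heaps; similar locality takes a one-time
    -- physical reorder with CLUSTER, or careful insert ordering.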
PostgreSQL is a great database, it's hard to go wrong with it.
I mean, unless there's something you know for sure that PostgreSQL won't do for you, it's a great default choice (even over commercial options such as Oracle or SQL Server).
(Meanwhile a senior dev in Iowa is picking up cow shit, and not regretting his career change.)
If you are already using MySQL, Postgres isn't sufficiently better that it justifies the cost of migrating.
Postgres back then was completely different from what it is now. It was mainly used by PhD and Master's students to hack on for their research. It was a mess internally and was hardly usable in production. I don't think MySQL went through this style of development.
What eventually happened is that around 1995 SQL was added to Postgres and a bunch of non-Berkeley people started hacking on it. They did a fantastic job, and they deserve all the credit for making it what it is today (Stonebraker has publicly said this).
- Take data types, for example. Seems like something simple, right? You can grok all the data types in PostgreSQL in an afternoon. Want to grok MySQL data types? You're going to waste an entire week on them. There are multiple data types that seem to do the same thing with very small differences and a lot of incompatibilities... and a lot of exceptional behavior (see the point below). MySQL data types are LESS powerful, less flexible, and 10x more complicated than PostgreSQL data types.
- Enums and sets need to have their possible values defined on a table-by-table basis. What a nightmare to keep them in sync. PostgreSQL allows you to do this once for the whole DB (a sketch follows this list).
- Open the MySQL manual and it's full of passages like: feature A does "blah" if mode X is enabled, but does "bar" if mode Y is enabled, and will corrupt data if a given storage engine is used with mode Z enabled.
- SQL strict mode can be disabled by the client. You cannot enforce that your server will work only with SQL strict mode on because a client can simply open a connection with SQL strict mode off. There's no way to enforce that clients will only use SQL strict mode. You have to trust your clients. (This other comment explains strict mode: https://www.reddit.com/r/PostgreSQL/comments/xblooo/comment/io3iy2c/?utm_source=reddit&utm_medium=web2x&context=3)
- According to the manual, if client A does stuff with strict mode on and client B does stuff with strict mode off, data corruption will occur.
- A lot of surprising and unexpected behavior, even with SQL Strict mode on
- String comparison is case-insensitive by default. There are workarounds, but they are still workarounds.
- A lot of behavior is dependent on server configuration. So switching between servers you can't expect them to behave the same.
- Behavior not standardized among versions and installs
- Changing the server SQL mode after creating and inserting data into partitioned tables can cause major changes in the behavior of such tables, and could lead to loss or corruption of data
- Useless error messages (see https://www.reddit.com/r/PostgreSQL/comments/xblooo/comment/io3pha2/?utm_source=reddit&utm_medium=web2x&context=3)
- MySQL, while less powerful and less flexible, has a lot more levers and knobs than PostgreSQL. PostgreSQL, while more powerful and flexible, is conceptually a lot simpler than MySQL.
- PostgreSQL stays out of your way; with MySQL you’re constantly fighting it
PostgreSQL -> simplicity
MySQL -> complexity
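As referenced in the enums point above, a sketch of the PostgreSQL way (type and table names made up):

    -- Postgres: define the value set once, reuse it everywhere.
    CREATE TYPE order_status AS ENUM ('pending', 'shipped', 'cancelled');
    CREATE TABLE orders  (id bigserial PRIMARY KEY, status order_status NOT NULL);
    CREATE TABLE refunds (id bigserial PRIMARY KEY, status order_status NOT NULL);
    -- MySQL instead repeats ENUM('pending','shipped','cancelled') in
    -- every column definition, and the per-table lists can drift apart.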
Today, both databases are well supported as a service on the major cloud platforms. Postgres does have some performance characteristics that led Uber to drop it [0], but this in itself is not a reason not to choose it, because most teams don't have their kind of workload.
If I were to start a new project today, however, I would seriously consider Postgres, given that Oracle is, in my opinion, fairly hostile. For example, they re-licensed the .NET driver from LGPL to GPL, so that anyone updating would either have to buy a commercial license or open-source their entire product. While there are OSS .NET drivers, this move alone is enough to make me think twice about using MySQL again for a future project.
[0] https://www.uber.com/en-RO/blog/postgres-to-mysql-migration/