RDS/Aurora was our most expensive resource, so we were looking at ways to cut that cost down, and MySQL was one option (though the way the app worked, and the extensions it relied on, ruled it out).
2) Yes, if you are already HUGE and have requirements that demand Vitess, then by all means use it. If so, you are not asking this question—see #1.
3) If it's a blog or something else where it doesn't matter, use a static site generator.
The easy deploy tools (DigitalOcean Apps, Dokku, Elastic Beanstalk, Heroku, etc.) destroy the instance (losing state). You either need to configure an offsite backup tool for your DB, or you have to manually set up a server to live forever.
Having a separate DB machine is so much easier to set up, and it plays better with the DIY tools.
MySQL's handling of Unicode has also been terrible historically, with way too many footguns, though that may be better in recent versions.
People aren't providing strong reasons because the question wasn't "what are some objective reasons for picking one over another", but "how do you pick between them". People are simply answering the question OP asked, and a lot of people's process is simply to pick PostgreSQL.
No one is holding the literal bits that make up the database executable accountable here, they are indicating they don't trust the devs of MySQL/MariaDB to do a good job. Whether or not that is an accurate view on their part is arguable, but it's pretty clear from context that they don't think that several if/else statements had it out for them.
Unicode generally "just works" if the charset in use is utf8mb4. As of MySQL 8.0, this is the default.
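A quick way to sanity-check this on a given server (names here are illustrative):

SELECT @@character_set_server, @@collation_server;
-- stock MySQL 8.0 returns utf8mb4 / utf8mb4_0900_ai_ci,
-- so new tables store full Unicode (including emoji) by default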
This is a common misconception, but it hasn't been the case for over 7 years. MySQL 5.7, released in Oct 2015, changed its defaults to enable strict sql_mode. All prior versions hit end-of-life years ago, so there is no modern version of MySQL with this silent truncation behavior.
The only reason this problem persists is because Amazon RDS (all versions and forms, including Aurora) uses nonstandard default settings which disable strict mode!
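Worth verifying if you're on RDS; a sketch (on RDS the persistent fix is a parameter-group change, since SET GLOBAL doesn't survive a restart):

SELECT @@GLOBAL.sql_mode;
-- should contain STRICT_TRANS_TABLES; with RDS/Aurora defaults it's often absent
SET GLOBAL sql_mode = CONCAT(@@GLOBAL.sql_mode, ',STRICT_TRANS_TABLES');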
That all said, I do believe Postgres is an excellent database, and a great choice for quite a large range of use-cases. But please, let's compare 2023 Postgres with 2023 MySQL, not 2023 Postgres with much older MySQL. It's only fair.
It might be better now. But for me it’s in the same shithole as mongodb, php (old style, so no recovery there even though it’s possible to create proper code) and most JavaScript.
Another thing is that people don't even want to use Oracle's MySQL, but MariaDB. Why the hell would I want to run a fork of something and still keep calling it something else?
The only reason for MySQL is WordPress.
Heh Heh Heh
On a humorous note, the official MySQL page (in the early 2000's) comparing MySQL vs other databases had the same problem.
They'd list the latest and greatest MySQL version, but compare it against archaic versions of the others. Clearly on purpose, because "Marketing" probably.
Asked them (via an official @postgresql.org email address) to please update that page to a more recent PostgreSQL, for fairness. And was completely ignored, of course.
So it's kind of amusing to see a request for fairness in the opposite direction (which I agree with anyway) ~20 years later. ;)
The fact is that MySQL historically was terrible for complex schemas with complex types, while Postgres was a pleasure to work with. MySQL had a huge performance edge for many years, but that came at a cost in resiliency and reliability. MySQL has greatly improved in these key areas, and Postgres has also made significant performance improvements. Systems these days are also so powerful that the database probably isn't your bottleneck.
Regardless, I always use Postgres every single time because I am just scarred from years of dealing with MySQL. What even is MySQL is also an open question at this point, there's MySQL and MariaDB and Percona flavors and the founder of Percona was just ousted and I can't be bothered to put in mental energy to untangle all this to even figure out what MySQL I should be developing against.
Compare this to Postgres where the community seems to have an extremely steady hand and constant progress. There's no forks, there's no infighting, there's no drama, there's a great regular release schedule with incremental improvements.
- Transactional DDL statements (schema modifications)
- Better support for UPSERT operations
- Better JSON support (including ability to index into JSON columns)
- the RETURNING clause, to return data that was inserted/updated
In general Postgres is a lot more featureful than MySQL.
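A minimal sketch combining two items from that list, against a hypothetical users table:

CREATE TABLE users (id serial PRIMARY KEY, email text UNIQUE, visits int NOT NULL DEFAULT 0);

-- upsert on the unique email, and get the resulting row back in one round trip
INSERT INTO users (email, visits)
VALUES ('a@example.com', 1)
ON CONFLICT (email) DO UPDATE SET visits = users.visits + 1
RETURNING id, visits;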
This has saved my ass a bunch of times. Not even as a column type, just in complex queries that otherwise became unwieldy monsters. The usual "started out simple but now it's a frankenbase" and you're stuck with a shitty schema. (The one thing worse than refactoring code: refactoring databases!)
In one case I was able to replace a 15-minute process with thousands of queries with one single query that aggregated all the data into a bunch of arrays. It completed in a few seconds. (Doing it without arrays would have been possible, but duplicated a lot of data in the result set.)
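The shape of that trick, with an invented schema, is roughly:

-- one row per order, with its line items packed into an array,
-- instead of one duplicated order row per item
SELECT o.id, o.customer_id, array_agg(i.sku ORDER BY i.sku) AS skus
FROM orders o
JOIN order_items i ON i.order_id = o.id
GROUP BY o.id, o.customer_id;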
That issue was filed in 2005 and it still isn't fixed.
Another gripe I have is that MySQL's JSON Path capabilities are much more limited than the Postgres JSON Path capabilities. MySQL doesn't have filter expressions nor any operators/methods but Postgres does. Don't get me wrong, neither is jq, but I hate having to jump through extra hoops to get my JSON in the right format.
Compare
https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH
and
https://dev.mysql.com/doc/refman/8.0/en/json.html#json-path-syntax
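For example, a jsonpath filter expression like this (table and column invented) has no direct equivalent in MySQL's path syntax:

SELECT jsonb_path_query(doc, '$.items[*] ? (@.price > 100)')
FROM orders;
-- MySQL paths have no "? (...)" filters; you'd typically reach for
-- JSON_TABLE plus a WHERE clause instead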
Again, what is this thing you are trying to compare? I just see MariaDB and MariaDB =)
However, because of my blindness to actual MySQL, I have totally not paid attention to any differences between the two. I guess "drop in replacement" isn't actually true any more. Thanks for the info
No server process and a single file per DB which I can put wherever I like.
"In a CREATE TABLE statement, if the "STRICT" table-option keyword is added to the end, after the closing ")", then strict typing rules apply to that table. ... The STRICT keyword at the end of a CREATE TABLE statement is only recognized by SQLite version 3.37.0 (2021-11-27) and later.
sqlite> create table x (a int);
sqlite> insert into x values ('hello');
sqlite> select * from x;
hello
sqlite> drop table x;
sqlite> create table x (a int) strict;
sqlite> insert into x values ('hello');
Runtime error: cannot store TEXT value in INT column x.a (19)
Basically, AFAIK, SQLite becomes problematic once you need more than one computer to handle requests.
They're probably not as battle tested as the PostgreSQL ones, but they are around, have users, and are actively developed.
The ones I remember off the top of my head:
* https://github.com/rqlite/rqlite <-- more of a "distributed database using RAFT" type of thing
Honestly though, if I need these sorts of distribution features, I would probably prefer the database to have them built in. I don't really see the point in using SQLite at that scale.
https://www.sqlite.org/whentouse.html
https://www.sqlite.org/quirks.html
Full-scale RDBMSs, especially Postgres, have lots of goodies that SQLite either doesn't have, or does have but not as richly featured. Once you've gotten hooked on a few of these, the distinction will feel a lot more clear.
Meanwhile the tipping points in favor of SQLite seem to be embedded systems, its whole "serverless" architecture, and plain ease of use. Which is why it still gets lots of love in those contexts.
It's a good reminder to give some thought to whether one actually needs MySQL|Postgres. If not, SQLite is the way to go. Most of my code that uses a DB is using SQLite.
But obviously, if you actually need MySQL|Postgres then SQLite is not an option.
My adventures with QGIS+PostGIS have also led me to discover the fun fact that even setting the TLS mode to "enforce" may enforce that you're using TLS, but not that the certificates are correct. Silly. It also won't use the system's certificate store when set to "full". Oh well.
By the way: I find Railway to be a great home for low-use Postgres/PostGIS databases. You basically pay for operations and the free tier is totally fine for semi-casual use (about 1-2k entities) with a few people from across the pond.
Whichever one you start out with, you will be annoyed if you switch to the other one 5 years later. I started out with mysql, and when I started working on a postgres project, I was shocked at some of the ways it was lacking (how am I supposed to store email addresses in a database without collations?).
But when postgres folks grouse about stuff in mysql, I'm usually nodding along and saying "yeah, that would be nice".
They're both great options. If anybody on your team is already an expert at one of them, use that one.
How long ago was this? :)
The documentation of Postgres 11 [2] states that this was not possible:
Note that while this system allows creating collations that “ignore case” or “ignore accents” or similar (using the ks key), PostgreSQL does not at the moment allow such collations to act in a truly case- or accent-insensitive manner. Any strings that compare equal according to the collation but are not byte-wise equal will be sorted according to their byte values.
[1]: https://www.postgresql.org/docs/12/collation.html
[2]: https://www.postgresql.org/docs/11/collation.html

Not familiar with MySQL, so trying to look that up: but with a constraint? Or just don't do that? A SO answer I found says "it's much more useful to have johndoe@ and JohnDoe@ treated as the same than it is to support case sensitive email addresses"... OK, it's also non-compliant, but whatever's "more useful", I guess!
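For what it's worth, Postgres 12 has since added nondeterministic ICU collations, which lift exactly the restriction quoted above. A sketch (locale string is the one from the Postgres docs; table name invented):

CREATE COLLATION case_insensitive (
  provider = icu,
  locale = 'und-u-ks-level2',  -- "ignore case" comparison strength
  deterministic = false
);

CREATE TABLE accounts (email text COLLATE case_insensitive UNIQUE);
-- 'JohnDoe@example.com' and 'johndoe@example.com' now collide on the unique index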
I will also add that there are giant companies relying on both databases with great success. Facebook still runs on MySQL, and contributes back to it. YouTube I'm not sure about, but it did run on MySQL for a long time, well after it got massive. I'm sure examples exist for Postgres (Amazon, since they moved off Oracle?).
Postgres in my experience has vastly better tooling and community, and is ahead of the curve tech-wise. Same with extension availability. And if you ever need to move away from it, to say CockroachDB or similar, that's much easier.
I would say go with what you know and are most comfortable with. You are more likely to get the better outcome.
I've heard countless times that Postgres is better, and I've watched talks where they show how loosey-goosey MySQL is with some things, but I know how to back up, restore, tune, secure, and replicate MySQL. I grok its permissions in more depth than I ever have with Postgres, and I've even written a MySQL plugin in C, so I have that in my toolbox if I need it. So by default I'd usually go with MySQL (or in some cases SQLite). But if I didn't have to administer or maintain it, and someone else was handling that, I think I'd be fine with Postgres too.
Also, I assume you mean MariaDB, as MySQL is owned by Oracle, and I would implore anyone and everyone to avoid Oracle as if it has herpes.
There are a lot of historic problems with MySQL accepting invalid data, committing data even when there are constraint issues, and having very poor transactional isolation, I am not sure if these have improved.
Truthfully, the only benefits you gain from using MariaDB or MySQL are:
* Memory tables
* Having inconsistent replicas (which can be useful when you want your downstream to have less data than your upstream and you know it won’t get updated.)
This is a thread comparing MySQL and Postgres and your claim is that postgres is harder to do major version upgrades than anything you have used??
Context is important here, have you honestly actually upgraded a MySQL node? It’s a lesson in pain and “major” version changes happen on minor versions, like the entire query planner completely trashing performance in 5.6->5.7
Postgres has two forms of updates:
1) in place binary upgrade.
Fast, clean, simple, requires that you have the binaries for the old and the new database.
2) dump/restore.
Serialise the database into text files, load a new database and deserialise those files into it.
Slow, but works flawlessly & consistently with relatively low danger.
MySQL can only do option 2.
You can sort of fake an "update" by abusing the fact that MySQL's replication offers no guarantees: make a new server a replica, then roll over. But it is impossible to know what data was lost in that transition, and MySQL will happily continue without ever telling you.
I have experienced this behaviour in large e-commerce retailers. MySQL was very popular for a very long time and I am intimately aware of operational best practices and how they are merely patching over an insane system.
MySQL absolutely fully supports in-place binary upgrades, saying otherwise is pure FUD. And the upgrade process in MySQL doesn't even iterate over your table data in any way, so claiming it will cause "data loss" is also pure FUD.
At Facebook we automated rolling in-place updates of our entire fleet, with new point builds of fb-mysql going out several times a month, to the largest MySQL deployment in the world. Worked flawlessly and this was a full decade ago.
MySQL is widely considered easier to upgrade (relative to Postgres) because MySQL's built-in replication has always been logical replication. Replicating from an older-version primary to a newer-version replica is fully supported. When upgrading a replica set, the usual dance is "upgrade the replicas in-place one at a time, promote one of the replicas to be the new primary while temporarily booting out the old primary; upgrade the old primary and then rejoin the replica set".
It is not relevant to the discussion how Facebook has managed to munge it to work reasonably well by pouring thousands of hours of engineer time into the effort; and MySQL's in-place upgrades absolutely do not work the way you describe consistently.
I know this because I have been in the code, and only after having experienced it. Maybe some of your lovely colleagues have helped your particular version to be marginally more sane.
It genuinely must be nice having a dozen people who can work around these issues, though; I certainly wouldn't consider it an operational win. Most companies have no DB automation engineers, DB performance engineers, or MySQL infrastructure engineers.
> Replicating from an older-version primary to a newer-version replica is fully supported.
Here also be dragons, as alluded to. I know it works quite often; I have used it.
FWIW: I ran global AAA online-only game profile systems on a handful of Postgres machines at about 120k transactions/s in 2016. I would have needed 5x as many instances to do it in MySQL, and this was only a tiny part of our hosted infra, which included a global edge deployment of game servers, auth servers, matchmaking, voice bridges, and so on.
and we only had two people responsible for the entire operation
I haven't made any comments regarding performance comparisons, and have also run extremely large DB footprints with tiny teams, but I don't see how any of that is relevant to the specific topic of new-version upgrade procedure!
I mean, I’ve also seen my share of “ERROR 1071 (42000) at line xxx: Specified key was too long; max key length is xxx bytes” showing up randomly, which basically means the machine needs manual recovery.
God help you if you don’t have innodb_file_per_table enabled to begin with too.
I know you want me to cite exactly. That will take me time to find because I stopped caring about MySQL 7 years ago, but I will dig for you.
What? This error has nothing to do with upgrades, nothing to do with manual recovery, and hasn't been a common problem for many many years.
In old versions of MySQL, it just meant you needed to configure a few things to increase the InnoDB index limit to 3072 bytes, instead of the older limit of 767 bytes:
innodb_file_per_table=ON
innodb_large_prefix=ON
innodb_file_format=barracuda
and then ensure the table's row_format is DYNAMIC or COMPRESSED.
But again, all of this happens by default in all modern versions of MySQL and MariaDB.
Should it have been the defaults much earlier? Absolutely yes, MySQL used to have bad defaults. It doesn't anymore.
It happens, and I seriously never claimed it was an ultra-common problem, merely that upgrades in Postgres are more intentional, and not painful except for a little extra work between major versions. The standard upgrade path within a major version (9.x, 10.x, 11.x, or 12.x) works just the same as MySQL's, except I have much more experience of MySQL completely fumbling its “automatic unattended” upgrade, or even the mysql_upgrade command.
Mostly because in the real world, outside of engineering cultures, databases are massively abused: MyISAM tables that are constantly updated, InnoDB ibdata1 in the terabytes, poor configs, replicas that have skipped a few queries, column changes inside a transaction that failed but actually modified data. It happens. Usually I am called in to clean up the mess.
The major difference here is that Postgres doesn’t leave a mess, so I never have the kind of issues I am describing in this thread with it. And you don’t, because I am guessing that you’re there when the servers are installed, someone with knowledge is actively maintaining them, or you have a lot of people to help with the shortcomings.
I get it though. You’ve got your sunk-cost knowledge of MySQL and you’ve been on large support teams for it. Maybe you’re afraid I’m suggesting that this knowledge goes out the window. And it has gotten better, but I wouldn’t give my kids watered-down, lead-infused soft drinks just because I had suffered through lead poisoning. I remember coming to blows with you in other threads over the years because you think MySQL can be saved or is totally fine, but honestly, just, no.
I'm responding to you because you're repeatedly posting factually incorrect items, for years. For example you and I have directly discussed the "MySQL doesn't use SemVer" thing before on HN, and yet here you are again in this thread, claiming 5.6 to 5.7 should be a "minor" upgrade.
Anyway, to the topic at hand, as others have also mentioned in this thread: historically the difficulty with Postgres upgrades has been the lack of cross-version replication, due to Postgres WAL replication being a low-level physical replication system. This made it difficult to perform an upgrade while keeping your site fully online. Perhaps the newer logical replication support makes this easier these days. I hope to learn more about it someday. If you can share your process for upgrading a Postgres cluster while keeping it online, that would be helpful and informative.
2. Failovers of any database are not instant, but they are indeed quick! So let’s not claim that you can do an upgrade with zero downtime.
3. In-place upgrades are extremely fast, and you can test the speed using a physical replica beforehand; usually it’s a couple of seconds, though the docs say minutes.
4. MySQL’s major version being in the minor position is exactly the kind of “you should be sure you know what you’re doing, but we won’t make it obvious” territory that I really despise.
It was nevertheless a pretty epic journey of dialectic discourse plunging _deep_ into the esoteric and nuanced realm of expert-level technical minutiae. A mostly intellectual journey, albeit distinctly punctuated by an undertone of emotional angst that steadily progressed in its growing intensity in a manner proportional to the magnitude of your collective disagreement… epic indeed.
I won't say it's impossible you ran into issues doing this, but it is the documented and supported upgrade path.
I love Postgres, but as someone who's maintained both for years, upgrades (at small scale) are the one area where I'd say MySQL has Postgres beat.
11.0->11.2 will work totally fine, with no command needed.
Minor versions on both Postgres and MySQL are painless, just install and restart the server. Major upgrades on MySQL are significantly less painful.
it is the only official replication mechanism built into the binary.
Postgres does separately support logical replication now, but it has some limitations, such as not permitting replication of DDL: https://www.postgresql.org/docs/current/logical-replication-restrictions.html
MSSQL has had this forever, and I’m pretty sure MySQL has it as well.
My gut tells me that I would do it in the query itself though, and not rely on the collation. Maybe I am misunderstanding.
SELECT * FROM users WHERE name = 'João' COLLATE ignore_accent;
It’s an excellent alternative to a WORM drive when you’re trying to preserve everything (say, a list of financial transactions).
I’ve looked for something like this in Postgres (which I love!), but sadly it doesn’t seem supported.
https://dev.mysql.com/doc/refman/8.0/en/archive-storage-engine.html
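It's just a table option; an illustrative schema:

CREATE TABLE ledger (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  posted_at DATETIME NOT NULL,
  amount_cents BIGINT NOT NULL
) ENGINE=ARCHIVE;
-- rows are zlib-compressed; INSERT and SELECT work, UPDATE and DELETE are refused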
Also, you might be interested in Parquet, perhaps as seen through Delta Lake https://delta.io/ or Postgres Foreign Data Wrappers like https://github.com/adjust/parquet_fdw -- Delta Lake's simple "LSM of Parquet files in an object store" design is pretty sweet.
herpes isn't that bad. most people will get it in their lifetime. 1 in 6 people have hsv-2, the less common variant. trying to avoid herpes is like trying to avoid chickenpox (although herpes isn't nearly as harmful as chickenpox).
you should avoid Oracle like it's a blood pathogen.
The worst part about having it is having to talk about having it. It's really not bad as a condition separate from societal concern.
Citation needed.
If you are an adult of typical sexual activity, it is likely you have already had sex with someone infected with herpes.
That doesn’t mean you have contracted it — carriers aren’t always shedding the virus.
I’m not sure I see any correlation between a country being conservative and an absence of sexually transmitted infection; the 10 countries where HIV is most prevalent are all (as far as I’m aware) relatively conservative.
Furthermore, here are some statistics from Wikipedia on HSV which may be referring to some of these conservative countries you’re referring to:
> Turkey— High levels of HSV-1 (97%) and HSV-2 (42%) were found amongst pregnant women in the city of Erzurum in Eastern Anatolia Region, Turkey. In Istanbul however, lower HSV-2 seroprevalence was observed; HSV-2 antibodies were found in 4.8% of sexually active adults, while HSV-1 antibodies were found in 85.3%. Only 5% of pregnant women were infected with HSV-2, and 98% were infected with HSV-1. Prevalence of these viruses was higher in sex workers of Istanbul, reaching levels of 99% and 60% for HSV-1 and HSV-2 prevalence respectively.
> Jordan— The prevalence of HSV-2 in Jordan is 52.8% for men and 41.5% for women.
> Israel— HSV-1 seroprevalence is 59.8% in the population of Israel and increases with age in both genders but the adolescent seroprevalence has been declining as in most industrialized nations. An estimated 9.2% of Israeli adults are infected with HSV-2. Infection of either HSV-1 or HSV-2 is higher in females; HSV-2 seroprevalence reaches 20.5% in females in their 40s. These values are similar to levels in HSV infection in Europe.
> Antibodies for HSV-1 or HSV-2 are also more likely to be found individuals born outside of Israel, and individuals residing in Jerusalem and Southern Israel; people of Jewish origin living in Israel are less likely to possess antibodies against herpes. Among pregnant women in Israel a small scale cross sectional study found the prevalence of HSV-2 infection was 13.3% and that of HSV-1 was 94.9%. The HSV-2 infection rate was 3-fold higher among immigrants from the former Soviet Union (27.5%) than among Israeli-born Jewish and Arab women (9%). Approximately 78% of HSV-2 infections in Israel are asymptomatic. HSV-1 causes 66.3% of genital herpes in the Tel Aviv area.
> Syria— Genital herpes infection from HSV-2 is predicted to be low in Syria although HSV-1 levels are high. HSV-1 infections is common (95%) among healthy Syrians over the age of 30, while HSV-2 prevalence is low in healthy individuals (0.15%), and persons infected with other sexually transmitted diseases (9.5%). High risk groups for acquiring HSV-2 in Syria, include prostitutes and bar girls; they have 34% and 20% seroprevalence respectively.
Again, not in conservative populations where marriage is the typical way to have sexual relations. Syria is one example that you quoted. A smart person would get his/her partner tested if they suspect anything before getting married.
In any case, assuming that infidelity or anything else sinful/haram doesn't occur in "conservative" populations strikes me as frightfully naïve.
The data is in. Arabs get herpes too.
Edit: it seems that HSV-1 is not an STD, but it causes genital herpes if engaging in oral sex? Which can explain the big gap between the two in conservative cultures.
Another related site: http://orthomolecular.org/resources/omns/index.shtml (scroll down for articles).
None of the above will help though in deciding between Postgres or MySQL ;)
Sneak peek photo [1] (from [2]). Just imagine it's literally 500-1000x more convoluted per B-tree leaf node. That's every Postgres table, unless you CLUSTER periodically.
[1]: https://josipmisko.com/img/clustered-vs-nonclustered-index.webp
[2]: https://josipmisko.com/posts/clustered-vs-non-clustered-index
Mind-boggling how many people aren't aware of clustered primary indexes in MySQL, which are not supported at all in Postgres. For certain data layouts, Postgres pays either 2x storage (a covering index containing every single column), >50x worse performance by effectively N+1 bombing the disk for range queries, or periodically blocking your table (CLUSTER).
In Postgres, the messiness of loading primary data after reaching the B-tree leaf nodes pollutes caches and takes longer. This is because you need to load one 8kb page for every row you want, instead of one 8kb page with 20-30 rows packed together.
Example: the Dropbox file history table. They initially used an autoinc id for the primary key in MySQL. This caused everybody's file changes to be mixed together in chronological order on disk in a B-tree. The first optimization they made was to change the primary key to (ns_id, latest, id), so that each user's (ns_id) latest versions would be grouped together on disk.
Dropbox scaling talk: https://youtu.be/PE4gwstWhmc?t=2770
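In InnoDB terms that optimization is just the primary key definition; a sketch, with table and column names guessed from the talk:

CREATE TABLE file_journal (
  ns_id   BIGINT NOT NULL,   -- namespace (per-user / shared-folder id)
  latest  TINYINT NOT NULL,
  id      BIGINT NOT NULL,
  payload BLOB,
  PRIMARY KEY (ns_id, latest, id)  -- InnoDB physically clusters rows in this key order
) ENGINE=InnoDB;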
If a Dropbox user has 1000 files and you can fit 20 file-version rows on each 8kb disk page (400 bytes/row), the difference in performance for querying across those 1000 files is 50 + logN disk reads (MySQL) vs 1000 + logN disk reads (Postgres). AKA 400KiB of data loaded (MySQL) vs 8.42MiB loaded (Postgres). AKA a >50x improvement in query time and disk page cache utilization.
In Postgres you get two bad options for doing this: 1) put every row of the table in the index, making it a covering index, and pay to store all data twice (index and PG heap); there's no way to disable the heap primary storage. 2) Take your DB offline every day and CLUSTER the table.
Realistically, PG users pay that 50x cost without thinking about it. Any time you query a list of items in PG, even using an index, you're N+1 querying against your disk and polluting your cache.
This is why MySQL is faster than Postgres most of the time. Hopefully more people become aware of disk data layout and how it affects query performance.
There is a hack for Postgres where you store data in an array within the row. This puts the data contiguously on disk. It works pretty well, sometimes, but it’s hacky. This strategy is part of the Timescale origin story.
Open to db perf consulting. email is in my profile.
This is not the same as "having a primary key", Postgres also has primary keys. It just stores the PK index separately from the bulk of the data.
Oracle also has primary keys, even if the order of the rows is different from the key order. In Oracle, when the rows are stored in the same order as the keys in the primary index, it is a special case, and these tables are called IOTs, index-organized tables.
The disadvantage of an IOT is that inserts are slower: in a normal table, the data is appended to the table, which is the fastest way to add data, and only the index needs to be reordered, whereas in an IOT the table storage itself is reordered to take the new data into account.
Select queries, OTOH, are much faster when using IOT, for obvious reasons, and this is what you describe in your comment.
If you use TEXT, BLOB, or JSON fields, even in MySQL, the actual data is stored separately.
You’re incorrect about IOTs reordering the entire table, at least w.r.t. MySQL. MySQL uses a B-tree to store rows, so at most it’s an insertion sort within one B-tree node and a rare B-tree rebalance. Most B-tree leaf nodes have empty space to allow adding new data without shifting more than a few hundred other rows. Also, non-IOT tables need to do a similar process to write to each of their indexes. Last, it’s sort of a tossup, since if you’re appending to an IOT table frequently, the right edge of the B-tree is likely cached (similarly for any small number of paths through the primary-index B-tree). At worst, the Postgres heap will need to surface one new heap disk page for writing, although I’m sure they have some strategy for caching the pages they write new data to.
Sorry to spam this info! Glad to see we both love databases, and I’m always pleased to see engagement on this topic!
Postgres also has other gotchas with indexes: MVCC row visibility isn't stored in the index for obvious performance reasons (writes to non-indexed columns would mean always updating all indexes instead of HOT updates [1]), so you have to hope the visibility information is set in the visibility map, or else you don't really get the benefit of index-only scans.
But OTOH, I've read that secondary indexes cause other performance penalties with having to refer back to the data in clustered indexes? Never looked into the details because no need to for postgres which we've been very happy with at our scale :)
[1] https://www.postgresql.org/docs/current/storage-hot.html
W.r.t. secondary indexes, yes and no. There is a cost to traverse a B-tree for point lookups. Also, foreign keys may now be composite keys, if the primary key is composite as in the Dropbox example.
If the secondary index is very different from the primary, it will be more expensive. However, it’s pretty common to at least use a user_id as the first part of the primary key. This makes partial scans a lot faster for queries regarding a single user: you only need to scan that user’s data, and it comes at a disk read cost 1-2 orders of magnitude cheaper. So you’d need a secondary index only if the data you need is spread across 1000s of pages (megabytes of data for a single user in one table) and you’re looking for only a handful of rows randomly located in that sequence.
Twitter is a characteristic case where you need many different clustered sets of the same data (tweets) to power different people’s feeds. I believe Twitter just stores many copies of tweets in different clusters in Redis: basically the same as having an (author_id, ts) primary-key tweets table and a (follower_id, ts) primary-key feed table, both with the tweet data inlined. If one clustered table isn’t enough, use two.
I will say: we kept every table we didn’t have to migrate for perf reasons in Postgres, and never regretted it.
Edit: and the index “fix” for Postgres often doesn’t work. Postgres will validate the row on disk, even if it’s in the index, if the page’s visibility-map bit isn’t set. If your data isn’t write-once, there’s a decent chance your page is dirty and it will still make the heap fetch.
Because you can only have a single clustered index, you're effectively paying for efficient range queries on a single clustering key by making all other queries slower.
This tradeoff may or may not be worth it depending on your query patterns. In my experience, you can often get away with adding some subset of columns to a non-clustered index to make it covering, and get efficient range queries without making a copy of the entire dataset.
And even with clustered indexes, as soon as you want a range query that's not supported by your clustered index, you're faced with the exact same choices, except that you have to pay the cost of the extra B-tree traversals.
Example: 80-byte keys give you a branching factor of roughly 100. 10M rows, and you can pack say 20 rows per page. That’s a 4GB table, give or take. That B-tree still only has 3 intermediate layers and primary data on a 4th layer. (Calculation: log(10M/20/0.75)/log(100)+1.) The first two layers take up less than a megabyte of RAM and are therefore easily cached. So you wind up only needing 2 disk reads for the final two layers. Unless Postgres is caching the entire index for point lookups, it should come out about even.
Can’t find any resource saying that B-tree height exceeds 5, so I’m thinking it’s at worst 2x the (very small) disk read cost vs Postgres.
I'm now super interested in the perf aspects of running pg_repack. It would definitely require scratch storage space to copy over the largest table in the DB (I'd guess 2.1x the largest table, vs 2x the total DB size). I imagine repacking isn't as efficient as putting stuff into a B-tree, but I wouldn't expect it to be anything like the 50x worse I portrayed above.
If you have a lot of write traffic, that log table will get really big, and pg_repack will have to do a lot of work to replay it. It's possible that the log table grows faster than pg_repack can replay it, which will cause it to never finish and eventually exhaust your storage space. You can also run into an issue where the log table never gets completely empty, and pg_repack never initiates that final lock and swap.
Partitioning helps a lot because it divides both the size of the tables and the write traffic to each table.
Granted I haven’t had to make a decision like this for several years, I’ve hardly even touched a database except to debug some stuff on localhost that’s outside of my explicit purview. So maybe Postgres solutions have narrowed the gap on this without my knowing it.
It's like saying C is faster than Ruby. It always will be.
PostgreSQL has memory tables too.
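The closest built-in equivalents are TEMPORARY and UNLOGGED tables; a sketch (UNLOGGED skips WAL writes, at the cost of being truncated after a crash):

CREATE UNLOGGED TABLE session_cache (
  key   text PRIMARY KEY,
  value jsonb
);
-- no WAL writes, so much faster; contents are truncated on crash recovery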
Galera is the main one I can think of:
* https://galeracluster.com/library/documentation/tech-desc-introduction.html
* https://mariadb.com/kb/en/what-is-mariadb-galera-cluster/
* https://packages.debian.org/search?keywords=galera
I'm not aware of any multi-master, active-active(-active) replication system that is open source for PostgreSQL.
In the Postgres space there is Citus, which provides multi-master.
There is also BDR from 2ndQuadrant if you want a paid/supported solution. https://www.enterprisedb.com/products/edb-postgres-distributed
Use Postgres if you need some of the quite-good extensions, most notably PostGIS, or if you just want things to work; most documentation will be Postgres-flavored. Postgres gets more love from web developers, and it shows.
Always use PostgreSQL. It's more logical, more extensible, saner, supports many extensions, and is more predictable.
MySQL is inconsistent crap, that trades away consistency, correctness and stability for a little bit of performance in standard use cases.
Do yourself a favor and always use PostgreSQL. I switched 15 years ago and never looked back. I've done 15-50 projects in Postgres since.
That said, if you don't expect to have tables with 100m+ rows, even vanilla postgres will be good enough.
Apart from that (and noticeably higher memory consumption), Postgres is most likely preferable.
But if you need something that can handle 100TB+, go with Vitess (MySQL-compatible).
Simple: I don't like having headaches. Therefore, I chose postgres.
I've always assumed that PostgreSQL is a step up, but never really bothered to look into what I get for the effort. Do I really get anything if I'm not trying to make apps at scale?
You're making a (mistaken) assumption that Postgres giving you a "step up" means that you also have to put in more effort. You don't, at least not in my experience. Both are database servers with a bunch of install & setup. There's phppgadmin if you want an 1:1 replacement for phpmyadmin (no opinion on these, haven't used either).
Postgres just gets you farther if you need to at a later point.
I would recommend you swap out mysql for postgres on your raspis. You're gaining experience on one of the two. But experience on Postgres seems to be more useful and valuable (cf. rest of the HN comments), for the same cost of your time.
There's so many things I want to learn I'm not sure postgres is such a step up from MySQL that it's worth being at the top of the list.
That includes VirtualBox, MySQL, Horracle Cloud. Just step back. Walk away. Do not pass go, do not collect $20,000 in lawyers' fees for unintended actions.
That VirtualBox extension pack? That ain't free... well, it is for personal use only, because they're not shaking individuals down. However, Oracle watches which domains download that extension pack, and sues companies when too many employees download it.
You can see that in this reddit thread: https://www.reddit.com/r/sysadmin/comments/d1ttzp/oracle_is_going_after_companies_using_virtualbox/
And this article also addresses the risks of dealing with Horracle.
And a company I worked for had dealings with them as well. Again, Horracle played dirty and did bullshit, except over Horracle DB itself.
... if that is too big I use SQLite.
For everyone else who's in most cases not even stuffing a million rows into their database… just stick with Postgres :)
If you need replication go with MySQL.
Dapper and ^ that works very well IME.
Agreed about replication.
A super long time ago (decades), when I was using Oracle regularly, I had to make a decision on which way to go. Although MySQL then had the mindshare, I thought that Postgres was more similar to Oracle, more standards-compliant, and more of a real enterprise type of DB. The rumor was also that Postgres was heavier than MySQL. Too many horror stories of lost data (MyISAM), bad transactions (MyISAM lacks transaction integrity), and the really long list of MySQL gotchas influenced me.
In time I actually found out that I had underestimated one of the most important attributes of Postgres, a huge strength over MySQL: the power of community. Because Postgres has a really superb community that can be found on Libera Chat and elsewhere, and they are very willing to help out, I think Postgres has a huge advantage over MySQL. RhodiumToad [Andrew Gierth] https://github.com/RhodiumToad & davidfetter [David Fetter] https://www.linkedin.com/in/davidfetter are incredibly helpful folks.
I don't know whether Postgres' licensing made a huge difference or not, but my perception is that there are a ton of 3rd-party products based on Postgres, customized to specific DB needs, because of the more liberal PG license, which is MIT/BSD-derived: https://www.postgresql.org/about/licence/
Some of the PG based 3rd party DBs:
Enterprise DB https://www.enterprisedb.com/ - general purpose PG with some variants
Greenplum https://greenplum.org/ - Data warehousing
Crunchydata https://www.crunchydata.com/products/hardened-postgres - high security Postgres for regulated environments
Citus https://www.citusdata.com - Distributed DB & Columnar
Timescale https://www.timescale.com/ - Time-series
Why Choose PG today?
If you want better ACID: Postgres
If you want more compliant SQL: Postgres
If you want more customizability to a variety of use-cases: Postgres using a variant
If you want the flexibility of using NOSQL at times: Postgres
If you want more product knowledge reusability for other backend products: Postgres
Even distributing the MySQL JDBC driver with your software means that you need to provide source code of the Java app that uses it.
I used MySQL by default for years. If you use it for internal-only apps then the GPL doesn't matter. But once you cross the line into sharing what you've built with others, MySQL is a non-starter unless you want to GPL everything you are sharing.
This is good to know. Thanks!
For most applications, either choice is going to be just fine. Use what your team has the most experience with. If you have no experience, try them both out and go with whatever you're most comfortable with.
Vim, *NIX (so Mac), monoliths, and lambda calculus all the way—FP!
jk, but FWIW I think sometimes, in rare instances, there does exist a pretty unequivocal consensus “right answer” to this sort of question … maybe such as like: Git vs any other distributed VCS ?
I think many of us can’t be bothered to go over (again) the issues we’ve had with MySQL in the past. The last straw for me was about ten years ago, when I caught MySQL merrily making up nonsense results for a query I’d issued that accidentally didn’t make any sense.
Very likely this particular issue, and others like it, have been fixed in the meantime. But I just got the sense that MySQL was developed by people who didn’t quite know what they were doing, and that people who really did know what they were doing weren’t ever likely to be attracted to that to fix it.
I'm sure there are use cases where MySQL will be the better choice over postgres, but the future for the stack looks bleak.
I'm not sure if there has been a recent breakthrough that has changed that. I think that still applies today. Correct me if I'm wrong.
Oracle seems to have let what used to be a core technology of the industry languish
I think slowly squeezing the life from MySQL was a very explicit goal for them. After the big wins (Wal-Mart, etc.) MySQL had 15-20 years ago, I think it was very clear MySQL was going to chip away at more and more of Oracle's business. I wonder how much Oracle spends on MySQL every year? They're spending a lot of money to keep MySQL in kind of a "not quite good enough" state. But they can't kill it outright; it'd be like boiling a frog fast instead of slow.
In the end, I wonder what extinguishing MySQL really accomplished for them. It might have bought them some breathing room but Postgres quickly filled MySQL's old segment.
I'm unfamiliar with Faire, but the rest were already using MySQL at the time of Oracle's acquisition in 2010. Switching backends would have been rough for those companies, and this was... 2010, meaning Postgres was not nearly as performant or full-featured as it is today. As mentioned in other comments, FB's investment in customizing MySQL has been extensive. They've poured a lot into their own fork of it.
More to the point, look at MySQL's progress since 2010. Do you think it has been largely stifled since then, or do you think it has kept pace with Postgres? It's been largely stagnant.
I'd love to hear your alternative theory, of course. You think Oracle bought MySQL to... what, exactly? Make it amazing?
I don’t think either DB is bad, but the whole MySQL is dead and needs serious work idea doesn’t make sense to me. What problem does MySQL still have that needs fixing in your opinion?
Anyone who says no investment has been made in MySQL, I suspect, never took the time to read the feature/release notes for MySQL 8.
Very recently they've mentioned they'll be changing this again to have separate LTS releases, which is a positive change stability-wise.
Facebook definitely has the resources to migrate to Postgres, if there was any motivating reason to do so. Indeed, Facebook developed their own LSM-based key-value store (RocksDB) and MySQL storage engine based on it (MyRocks) and then migrated the majority of their db fleet to that. In total that's massively more work than migrating to Postgres would have been.
Part of the reason was that MyRocks provides better compression than pretty much any other possible choice, and that adds up to an astronomical amount of cost savings at Facebook's scale. In contrast, what would Facebook have gained from moving to Postgres instead?
See, I'm pretty sure there basically weren't. It lucked out at the right moment in the late 1990s. Also, Slashdot used it.
The only use case I can think of is when you want an application, and it requires or is highly optimised to MySQL. Otherwise, it should actively be avoided.
Not that there aren't reasons. There are some. But for starting out with a new app without a very, very good reason to do something different? PostgreSQL every day of the week.
But I’m like you, MySQL did some nonsense once that took me hours to work out. So now I really can’t be bothered with any potential quirks it may still have. This is not an SNL sketch.
Here is an 18-year-old bug: DELETE triggers don't fire for foreign key cascades: https://bugs.mysql.com/bug.php?id=11472
That makes the entire feature mostly worthless. Reported in 2005, last updated in 2008.
---
While I would choose PostgreSQL every time, MySQL has the following advantages:
1. Write-performance, due to fundamental design tradeoffs. [1]
2. Per-connection resources, due to single-process design.
3. Related to #1, no vacuum requirement.
Just as an example: how do you create a read-only (SELECT-only) user in Postgres? In MySQL it's extremely simple and it works, while in Postgres it's a nightmare to create and maintain.
Isn’t that
GRANT SELECT ON ALL TABLES IN SCHEMA foo TO bar;
?

> and maintain
If you mean you want to grant a user select rights to whatever table gets created in the future (a somewhat questionable idea from a security viewpoint):
ALTER DEFAULT PRIVILEGES IN SCHEMA foo GRANT SELECT ON TABLES TO bar;
I think both are possible in PostgreSQL 9 and later (https://www.postgresql.org/docs/9.0/sql-grant.html , https://www.postgresql.org/docs/9.0/sql-alterdefaultprivileges.html). That version is from 2010.
I guess the lesson is that both these systems evolve fairly rapidly. You can’t use quirks you remember from over 5 years ago to judge their current versions.
I just want to either save a local date and time, or a UTC timestamp. PostgreSQL's timestamp(tz) types do neither and both at the same time.
- It's less featureful, and I'd consider that a strong virtue in the YAGNI camp - less to go wrong, less mental overhead.
- Maintenance is simpler and far less necessary in my general experience.
- Replication is simpler and more reliable.
- You can tell the query optimizer what to do. When this is needed, you'll be thankful. It's a godsend.
That said, I wouldn't run Oracle MySQL. I opt for MariaDB on smaller projects and AWS Aurora MySQL for larger projects. Aurora scales insanely well, and replication lag is almost non-existent.
In my general experience MySQL was always significantly faster, but it's been a number of years since I've worked with Postgres, and the comments here seem to indicate that that may no longer be the case. YMMV.
I do remember getting bad MyISAM tables a decade ago, sometimes after a bad shutdown.
Not saying it can't happen, but I don't think it's a common occurrence.
This doesn't really hold water in my opinion.
It's not like PostgreSQL is some minefield of misfeatures and quirky behavior. Some of these features exist, but have zero impact on you unless you actually opt to use them. But if you end up needing to: they're there, and you can just start using them.
Compare this to MySQL where they simply don't exist no matter how much you may need them. Need to apply an index to the result of a function to quickly fix a performance issue in prod? Sorry, you can't. Need window functions to accurately compute some analytics in a sane period of time? Sorry, you can't. The list of things you can do in PostgreSQL that you simply can't with MySQL is massive and grows every day.
The odds that you'll want, need, or greatly benefit from at least one of these features are not small. The flexibility of knowing these features exist, should you ever have a use-case for them, is massive.
Need to apply an index to the result of a function? No problem, use a functional index, supported since October 2018: https://dev.mysql.com/doc/refman/8.0/en/create-index.html#create-index-functional-key-parts
Need to use a window function? No problem, supported since April 2018: https://dev.mysql.com/doc/refman/8.0/en/window-functions.html
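Both now look much like their Postgres counterparts; an illustrative sketch (schema invented):

-- expression (functional) index, MySQL 8.0.13+ (note the extra parentheses):
CREATE INDEX idx_email_lower ON users ((LOWER(email)));

-- window function, MySQL 8.0+:
SELECT order_id, customer_id,
       SUM(total) OVER (PARTITION BY customer_id ORDER BY created_at) AS running_total
FROM orders;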
There are far more examples than just those two though. :)
There's no silver bullets in software or databases, no "one size fits all". Just various trade-offs between different design decisions.
but IMO, I think it’s worth noting that occasionally some rare, unique, and functionally superlative technology comes along that in practice transcends every alternative from the onset and indefinitely going forward, sometimes even at a more prominent scale than the MySQL / Postgres projects under discussion (which are not small by any means).
something maybe like Git, most immediately comes to mind, as an example of the de-facto standard for distributed VCS basically since … 2005* when Linus decided to create it?
edit: not 1995
hot take. might be the most popular, maybe even by a large margin, but I think you'll find a good chunk of people who have actually tried different VCSs don't think it's the best.
I suspect the die-hard proponents of Mercurial, or SVN, or whatever else (these few pagan heretics that might exist out there, wherever they’re hiding) have found themselves in a camp different from the Git standard likely on the basis of electing to be intentionally contrarian / anti-normative, and not as a function of struggling with Git to the point of being so disillusioned they call it quits and head out looking for greener pastures. I think in practice the most common result of encountering problems with Git is: fix the problems. And functionally I think that’s resulted only in furthering its supremacy over alternatives, despite there existing a handful of cultish weirdos who are _really_ into Mercurial and prefer not to fux with Git as a personal lifestyle choice, haha.
Secondly, no one has disagreed with me on the matter, as I haven’t put forth a personal opinion; I’m simply, impartially, making reference to the fact that Git is the de-facto standard. And from an intellectual perspective, I was hoping someone might elucidate why that is the case, given my conception is an inferential deduction, at best.
While I’m personally relatively familiar with Git internals, though by no means an expert, it’s the only distributed VCS I’ve ever used, and I don’t know anything substantive enough about the alternatives to credibly make a relative value comparison here. They could be the bees’ knees for all I know, but it seems unlikely given Git’s position as the consensus standard choice.
Again, no one size fits all...
* Handling several thousand connections per second without needing a proxy or pool (MySQL's connection model is thread/conn vs Postgres using process/conn)
* Handle extremely high volume of primary key range scan queries (MySQL's InnoDB uses a clustered index, see https://news.ycombinator.com/item?id=35909053)
* Handle workloads that lean heavily on UPDATE or DELETE, without major MVCC pain (see https://ottertune.com/blog/the-part-of-postgresql-we-hate-the-most for example)
* Semi-synchronous logical replication, for environments which cannot tolerate any data loss during failover; this ensures the statements have reached at least one replica but without the huge latency of synchronous replication
* Use index hints (see the sketch after this list), to ensure random index-stats changes don't cause unexpectedly negative query-plan adjustments (see discussion in subthread https://news.ycombinator.com/item?id=35909340)
* Handle very high-volume OLTP workloads using direct I/O, since InnoDB's buffer pool design is completely independent of filesystem/OS caching
* Achieve best-in-industry compression by using the MyRocks storage engine (MySQL's pluggable storage engine design has a lot of tradeoffs but it is inherently what makes this even possible)
* Use UNSIGNED int types, to store twice as high a max value in the same number of bytes, if you know negative numbers will not be present
* Use case-insensitive or accent-insensitive collations out-of-the-box without having to monkey with confusing user-defined CREATE COLLATION
* Ease-of-use commands like SHOW CREATE TABLE
* Silly cosmetic things like the ability to reorder columns in an ALTER TABLE (see https://wiki.postgresql.org/wiki/Alter_column_position)
* A tooling ecosystem which includes multiple battle-tested external online schema change tools, for safely making alterations of any type to tables with billions of rows
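For the index-hint item above, the hint sits inline in the query; a sketch with invented table and index names:

SELECT *
FROM orders FORCE INDEX (idx_customer_created)
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 10;
-- FORCE INDEX makes a table scan a last resort; USE INDEX is the softer variant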
And that one case I remember was perfectly solvable the regular way, with a little more time.
Do you know how to fix the table statistics quickly? Do you know how to change that query to force the execution plan you want? Do you know how long the solution will last until the stats change again?
MySQL is a bit more predictable for this case and if things go really bad for some unexpected reason, one comment can fix it.
I'm looking at it from ops perspective. The ratio during development doesn't matter that much - all issues are solvable at that stage. For me it's rather "which situation would I rather be in at 3am".
I think a better fix lies in the direction of making queries more of a first class object with options to nail down plans, add custom logic to pick plans dependent on parameter values, etc.
Personally never ran into this with Postgres nor had anyone I know worry about it - the query planner was reliable for me in 99.99% of cases but yeah, I admit that it's a black box for me that I expect to take care of internals - hopefully it continues to do so, but I got to give it to MySQL for allowing to override it then.
This is the usual culprit (cure: ANALYZE tablename).
Collecting more samples (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ...) may be useful.
"Extended Statistics" covers most(?) other cases: https://www.postgresql.org/docs/current/planner-stats.html#PLANNER-STATS-EXTENDED https://www.postgresql.org/docs/current/multivariate-statistics-examples.html
Imagine when you actually need any of the features that PostgreSQL provides like pub/sub, logical replication, JSONB etc. With MySQL you might have to hack a solution that is much more complex or you have to set up an entirely separate tool. What I find nice with PostgreSQL is that for simple cases you can get away without a dedicated key/value store or a queue or a full text search engine. You can do a lot of these kinds of tasks with just a single database.
MySQL has offered logical replication for considerably longer than Postgres, and it's a substantially more configurable and mature implementation. MySQL's built-in replication has always been logical replication. It's a feature MySQL has had for 23 years -- built-in logical replication is literally one of the top reasons why all the biggest MySQL users chose MySQL originally!
- it hung periodically and had to be restarted during the night, for no apparent reason
- compared to Postgres, the tooling is garbage (both for backups and even more for general database administration)
- essential features are missing, the most important one of which, for me, was proper CSV import/export. CSV-related functionality is so broken and terribly inconvenient to use. In a specific case I had to write a program to export millions of records manually since MySQL could not generate correct CSV export due to some columns containing text with special characters, quotes, newlines. Any combination of the export parameters ("ENCLOSED BY", "ESCAPED BY" and all the other garbage options) failed in one way or another. I even tried to use non-standard characters like \x15 and \r for column and row separation but even that failed. With Postgres, "with csv header" is simple and works every time.
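For the record, the Postgres version that handles quoting, embedded newlines, and special characters correctly out of the box is a one-liner (big_table is a placeholder):

COPY big_table TO '/tmp/export.csv' WITH (FORMAT csv, HEADER);
-- or client-side from psql, without needing server file access:
-- \copy big_table TO 'export.csv' WITH (FORMAT csv, HEADER)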
I also managed bigger Postgres databases (up to tens of terabytes) and never had the issues I encountered with MySQL.
I also use the same logic applied to document databases. Mongo or Postgres? Postgres
Also pub sub. Postgres or redis? Postgres
Use postgres until it’s not technically feasible to use it anymore.
From a former MySQL developer:
> let me point out something that I've been saying both internally and externally for the last five years (although never on a stage—which explains why I've been staying away from stages talking about MySQL): MySQL is a pretty poor database, and you should strongly consider using Postgres instead.
I also highly recommend investing in psql skills though if you are a Postgres user.
SHOW CREATE TABLE tbl;
SHOW TABLES;
SHOW DATABASES;
SHOW PROCESSLIST;
CockroachDB added these aliases ages ago.
For anything at home, I would use MySQL just for those things. The psql client feels very primitive by comparison to me - even though it isn't.
During those few weeks I'm actively using the database on the project, I can either get frustrated beyond belief with the CLI for Postgres, or just use what's at hand with MySQL. In fact, these days SQLite is getting more of my attention anyway, and I wrote a small CLI for it a decade or so back (before the sqlite3 client gave us most of the below) to provide:
- Timings for the queries (in fact I called it 'tsql')
- Aligned-column displays, with | separators between columns and index-indicators
- Ability to parse some blobs (Plists on the Mac, for example) and display them
- Things like "SHOW DATABASES", "SHOW TABLES", "SHOW TABLES LIKE" etc.
Mainly I wrote it to do some benchmarking, but I eventually preferred it over sqlite3 as the CLI.
Note that all this is personal stuff - When I do commercial stuff, the answer is always "what is best understood by the people maintaining it afterwards"...
\dt
\l
SELECT * FROM pg_stat_activity;
Not really; PostgreSQL doesn't store the original query, so you'll need to re-create it from pg_class, pg_attribute, and all of that (which is really what \d and such in psql do). The easiest way is probably pg_dump, but it's best to just get used to \-commands because it's really just the same thing.
But nerdy stuff:
Postgres stores data linearly (in a heap - which has nothing to do with the heap data structure used for sorting; it just means a pile of data). If you need fast access to the data, you add secondary indexes - and the secondary indexes point to a location in the heap: "this is where you find the data".
MySQL stores data in a tree - a table is a tree sorted by primary key. You can create secondary indexes, and instead of a pointer they contain the primary key value.
That means, for example, that rows with similar primary keys will be located physically near each other in MySQL but not in Postgres. At the same time, inserting new data with a random primary key (like a UUID) will write all over the table in MySQL, but will mostly "append at the end" in Postgres.
Postgres also implements MVCC with Serializable Snapshot Isolation - data that someone changes exists in multiple copies and needs to be cleaned up later, but there's no locking. MySQL relies on locks instead, so there's no duplication, but you might see transactions waiting for each other. I don't remember if MySQL implements proper serializable isolation - but that is not the default on any database anyway.
Interestingly, Oracle has a very similar design to Postgres (though it uses a rollback segment for old data, so there's no bloat and no vacuum, but you might get a "snapshot too old" error), while MS SQL Server is also a tree- and lock-based database like MySQL.
Does this impact you? It might, like in cases where MySQL performs terribly due to UUID keys, or Postgres can't vacuum fast enough due to a high volume of updates, or you're implementing money settlement logic and need proper serializable transactions, who knows. But it is cool to know the implementation details.
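A rough sketch of how that difference shows up in DDL, using a hypothetical events table (in InnoDB the primary-key choice decides the physical layout; in Postgres it's just another index):

    -- MySQL/InnoDB: the table IS the primary-key B-tree,
    -- so random UUIDs scatter inserts across pages
    CREATE TABLE events (
      id      BINARY(16) PRIMARY KEY,
      payload JSON
    ) ENGINE=InnoDB;

    -- Postgres: the table is a heap; new rows mostly append at the end
    -- regardless of the key, and the PK is a separate index into the heap
    CREATE TABLE events (
      id      uuid PRIMARY KEY,
      payload jsonb
    );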
That said, I'm not going to be sad with MySQL, though I'd probably go with MariaDB just because it's fully open source (note, I don't know any details there, being a Postgres guy).
1. It's solid as a really rich data platform (more than just a relational database). Its extension framework is quite unique compared to others. Its JSONB support was the first among relational databases and is feature-rich and performant. Multiple index types. Transactional DDL. The list goes on.
2. No central owner. A lot of open source is open in source code only, but maintained by a single company.
3. I mentioned extensions, but really that is understated. It can do really advanced geospatial, full text search, time series, the list goes on.
Having explained this a ton of times - first 10 years ago (https://www.craigkerstiens.com/2012/04/30/why-postgres/), then again 5 years later with an updated version - I most recently tried to capture more of it in updated form on the Crunchy Data blog: https://www.crunchydata.com/why-postgres
In my former work life we used Percona MySQL for the commercial support and very fast turnaround on bug fixes and new features, but we also used Postgres and Oracle. In those cases it was more important to have awesome DBAs who could do anything with any database. I learned a lot from them and they earned a lot of respect from me. One of them could find significant design flaws in an application just by reviewing the schema. They cut our memory usage in half and bought us time to wait for the server vendors to support more memory in their next motherboard release.
MySQL failed me big time in the past, so my functional requirement would be "don't f** with my data". So far, PostgreSQL and SQLite never have. I don't have time to give second chances, and I don't need to.
If there is a problem, I have probably seen it. MySQL won't start for some reason? I know what to do. Adding and removing users, permissions, running with SELinux, optimizing queries, indexes - all within my scope. Backups and restores, working with snapshots, MySQL relocation or master/slave replication. I have been able to upgrade MySQL versions for years without much hassle. I know the table structure well enough that I could downgrade in almost all cases - with adjustments I once downgraded MySQL 8 to 5.7, because back then it was too slow.
Now if I had the desire (I'm now past 40, so meh), or if I didn't have such a difference in knowledge, I'd probably seriously look at PostgreSQL. So my suggestion is to stay within your circle of competence. Regardless, if you go with MySQL, it is a fine database that many use at high scale.
It's the default choice for a number of reasons, but chief among them is just that it's higher quality. That is, it's developed to higher standards, thanks to the community bar being really high (thanks, Tom Lane et al., for your stewardship) and to testing and analysis of changes being ingrained into the culture.
By pursuing correctness first and performance second for many years, PostgreSQL has built a stronger foundation that is now paying dividends in both performance and the ability to ship powerful features quickly. The result is that the gap between MySQL and PostgreSQL has only continued to widen over the last 10 years.
So when would you consider picking MySQL?
To me that comes down to exactly one set of use-cases: workloads that are fundamentally incompatible with VACUUM. The PostgreSQL MVCC system requires that the table heap be maintained by the VACUUM process, both to ensure safety (txid wraparound) and to reclaim/reuse heap storage. This process is very expensive for workloads that do a lot of updates, especially on indexed columns (as indexes need VACUUMing too); it's less of an issue for non-indexed columns if you can use HOT (heap-only tuple) updates and tune the target fill ratio of heap pages appropriately.
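To make the fill-ratio point concrete, a sketch assuming a hypothetical counters table whose non-indexed value column is updated frequently:

    -- leave 30% of each heap page free so updated row versions can stay
    -- on the same page, keeping updates HOT (no index entries rewritten)
    CREATE TABLE counters (
      id    bigint PRIMARY KEY,
      value bigint NOT NULL
    ) WITH (fillfactor = 70);

    -- or on an existing table (affects newly written pages)
    ALTER TABLE counters SET (fillfactor = 70);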
In most cases it's highly unlikely your business will ever reach the level of write load where these deficiencies in write behaviour actually matter, but it is possible. Uber famously migrated from PostgreSQL primarily because of their experiences with write amplification and VACUUMing.
If, for instance, your data consists of a smaller live hot set and a warm set that is less frequently updated and easily separable by a deterministic factor like time, you can very easily use PostgreSQL table partitioning to isolate the two and continue to scale for a very, very long time on pure PostgreSQL.
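A sketch of that pattern with time-based range partitioning (table and column names are made up):

    CREATE TABLE metrics (
      recorded_at timestamptz NOT NULL,
      value       double precision
    ) PARTITION BY RANGE (recorded_at);

    CREATE TABLE metrics_2023_01 PARTITION OF metrics
      FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

    -- cold partitions can later be detached or dropped outright,
    -- sidestepping VACUUM entirely for that data
    ALTER TABLE metrics DETACH PARTITION metrics_2023_01;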
In practice this may be fixed in PostgreSQL one day. There was a project called zheap to implement an UNDO-log-style storage system for PostgreSQL (which would avoid all the VACUUM maintenance), but it stalled out - largely, I believe, because it wasn't able to provide obvious wins quickly enough to sustain further interest. However, OrioleDB has picked up the torch and does in fact seem to be showing very impressive results.
If such a storage engine is merged, in my mind there will no longer be any reason to consider MySQL for production workloads.
One exception: I did migrate two very large tables (15B+ rows) from Postgres to MySQL for performance reasons. InnoDB (MySQL's storage engine) arranges records on pages by primary key, and if you have a multi-value PK (user_id, uuid) it means all records for a user are in the same set of pages. Huuuuge improvement over having a user's data spread out over N different pages: the memory cache is way more efficient, cold queries got orders of magnitude faster, the cache hit rate went up, and costs went down from smaller servers.
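Roughly this shape, if I read the description right (names are illustrative):

    -- InnoDB clusters rows physically by primary key, so all of a
    -- user's rows land in the same run of B-tree pages
    CREATE TABLE user_events (
      user_id BIGINT NOT NULL,
      uuid    BINARY(16) NOT NULL,
      payload JSON,
      PRIMARY KEY (user_id, uuid)
    ) ENGINE=InnoDB;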
- Row-level security (RLS): depending on the tenancy and DB-level safeguards you need, Postgres supports it and MariaDB doesn't.
- Transactions for schema changes: Postgres (and even SQLite) supports this but MariaDB doesn't. Without it, a faulty migration can require manual clean-up to get the schema back to the state you need.
I once tried to migrate a SQL Server DB to Postgres and eventually gave up, with MySQL being a pretty easy switch with some minor stored procedure rewrites.
Also, it tends to do things way differently from every other DB. VACUUM is a completely different concept that can footgun you pretty fast.
Postgres is pretty powerful but it has certainly made some interesting design choices.
If you create the index with lower() it will use that; e.g. "create index on tbl (lower(email))" and then "select * from tbl where lower(email)=lower($1)". That's more or less the standard way to do it, but there are some other ways as well (one sketched below). It's more explicit than MySQL, so in that way it's better; it's more effort and easy to forget, and in that way it's worse. Either way: it's definitely possible to do case-insensitive searches with indexes.
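One of those "other ways" is presumably the citext extension, which bakes case-insensitivity into the column type so there's nothing to forget; a minimal sketch:

    CREATE EXTENSION IF NOT EXISTS citext;

    CREATE TABLE tbl (
      email citext UNIQUE   -- comparisons and the unique index ignore case
    );

    SELECT * FROM tbl WHERE email = 'Alice@Example.com';  -- matches alice@example.com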
When I first used PostgreSQL I ran into "how the hell do I do this?! MySQL just lets me [...]" kinds of issues, but after many years of PostgreSQL usage I now have the opposite when I use MariaDB, which also has its share of awkwardness and issues (just different ones).
The "MySQL just lets me" stuff eventually adds up and hinders development. For example, your lower()-on-the-param example is now incompatible with most ORMs unless you do it in code or create a special SQL statement for it. And these aren't all fringe cases that you only run into months in, either. I really wonder about some of these comments saying they've vetted both and Postgres wins hands down.
Postgres is solid but it definitely has its warts and downsides too.
It sounds like you're under the impression that MySQL just makes everything case-insensitive and is silent about this? That's decidedly not the case.
MySQL 8 ships with 41 different character sets, supporting a total of 286 different collations. Collation names explicitly include "ci" (case-insensitive) vs "cs" (case-sensitive), as well as "ai" (accent-insensitive) vs "as" (accent-sensitive), and also the language/region conventions used for sorting purposes.
You can choose collation at the column-level granularity, as well as setting defaults at the table, schema, and server levels. It's completely explicit and very configurable.
Besides, MySQL also supports functional indexes, so you could do the (lower(email)) approach in MySQL too if you really want!
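To make that concrete, a sketch of column-level collation control in MySQL 8 (table and column names are made up):

    CREATE TABLE users (
      email VARCHAR(255) COLLATE utf8mb4_0900_ai_ci,  -- accent- and case-insensitive
      token VARCHAR(255) COLLATE utf8mb4_0900_as_cs   -- accent- and case-sensitive
    );

    -- a plain index on email is already case-insensitive for lookups,
    -- no lower() wrapper needed
    CREATE INDEX idx_users_email ON users (email);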
I never said you can't use functional indexes in MySQL. Someone said "you can't do this in PostgreSQL" and I just commented "here's (one way) to do this, maybe that's helpful some day". That's it.
Interests: I work at TiDB
There is no RETURNING for insert/update/delete, which forces me to do a second query to get the new data. Why not just return what I have updated? It matters because the update is relative, something like: UPDATE wallets SET balance = balance + 1 (see the sketch below).
I had to give up on the JSON functions because they're hard to work with on MySQL, and do the aggregation in my code instead.
No UUID v4 support by default.
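For reference, the Postgres side of the first and third complaints, extending the wallets example above (gen_random_uuid() is built in since Postgres 13):

    -- read the post-increment value in the same statement
    UPDATE wallets SET balance = balance + 1
    WHERE user_id = $1
    RETURNING balance;

    -- UUID v4 default without any extension (v13+)
    CREATE TABLE wallets (
      id      uuid PRIMARY KEY DEFAULT gen_random_uuid(),
      user_id bigint NOT NULL,
      balance bigint NOT NULL DEFAULT 0
    );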
PostgreSQL. The answer has always been PostgreSQL, even at the height of MySQL's popularity and LAMP craze.
It is a VASTLY better piece of software and ecosystem - yet it's boring at the same time - all things I demand from the DB system whether it's a toy project or an enterprise app.
Mixing OLTP and blazing fast analytic queries on the same database, at the very same time, removing ETL needs...
Postgres - is it pg, pgsql, psql, postgres, postgresQL? The answer is "yes."
Plus the case behavior for table and column names drives me crazy. It's like some leftover VMS shit. I mean seriously, fix it. Can you or can you not use a capital letter for a table/column name? I can never remember. Or you can, but you have to quote it? Fuck.
Until recently (which, to be fair, might be 8-10 years ago) Postgres' performance-monitoring tools sucked compared to MySQL's. I know at one point in the last 10 years they still used SunOS 4 as their base configuration - because, you know, the OS had only been EOL for like a decade at that point.
MySQL is fire and forget. psql (or postgres or pg or postgresql?) is not fire and forget. It's twitchy and requires constant vigilance. I don't want a piece of infrastructure that requires constant vigilance.
That's not to say I won't use it. Its geo stuff is really great. Its JSON support is better than MongoDB's, from what I've heard. Row-level security is awesome. But are those features good enough to overcome its quirks? Sometimes.
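On the capital-letters question above: you can use them, but unquoted identifiers fold to lowercase, and once you create a quoted mixed-case name you have to quote it forever. A quick sketch:

    CREATE TABLE Users (id int);   -- actually creates "users"
    SELECT * FROM USERS;           -- fine: also folds to "users"

    CREATE TABLE "Visits" (id int);
    SELECT * FROM "Visits";        -- fine
    SELECT * FROM Visits;          -- error: looks for "visits"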
Learning any new CLI client is a bit daunting at first. With repetition and intention, I think the commands become very memorable, e.g. "describe table" is "\dt".
What exactly do you mean by this? I tried to find some more information, and all I could find were some fixes from 2001[1] (SunOS 4 was supported until 2003), a minor refactor in 2008 with "SunOS 4 is probably broken anyway"[2], and that's pretty much it. SunOS 4 was moved to "Unsupported Platforms" with the release of 7.3, in 2002.[3]
[1]: https://postgrespro.com/list/thread-id/1598869
[2]: https://www.postgresql.org/message-id/20081211091708.072607545A4%40cvs.postgresql.org
[3]: https://www.postgresql.org/docs/7.3/supported-platforms.html
If that’s not your interest, I will admit that Postgres array support is far ahead of any of the MySQLs. Most ORMs don’t use it but you can get optimal query prefetching via array subqueries.
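A sketch of that array-subquery prefetch trick, assuming a hypothetical users/orders schema:

    -- one round trip: each user row carries an array of its latest order ids
    SELECT u.id,
           u.name,
           ARRAY(SELECT o.id
                 FROM orders o
                 WHERE o.user_id = u.id
                 ORDER BY o.created_at DESC
                 LIMIT 10) AS recent_order_ids
    FROM users u;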
Kinda j/k. I'm intrigued by Postgres but I don't start that many new projects. It has at least one feature that MySQL doesn't: deferred referential integrity, which there's no good workaround for that I'm aware of. It arises extremely rarely, when you need two tables to each have an FK to the other.
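For anyone who hasn't seen it, deferred checking looks roughly like this in Postgres (hypothetical teams/players tables that reference each other):

    ALTER TABLE teams ADD CONSTRAINT teams_captain_fk
      FOREIGN KEY (captain_id) REFERENCES players (id)
      DEFERRABLE INITIALLY DEFERRED;

    BEGIN;
    -- player 1 doesn't exist yet; a non-deferred FK would fail here
    INSERT INTO teams   (id, captain_id) VALUES (1, 1);
    INSERT INTO players (id, team_id)    VALUES (1, 1);
    COMMIT;  -- the deferred FK is checked here instead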
* always use Postgres.
* if you need something else, it better have an actual reason.
MySQL/MariaDB is only for when you have a framework that requires it or is highly optimised for it (e.g., WordPress or MediaWiki).
I think MySQL is more popular in the enterprise partly because the enterprise edition offers transparent encryption that's a single click to enable.
We have over the years compared the rate of production incidents Postgres vs MySQL. It's roughly 1:10 (MySQL has around 10 times more production incidents than Postgres).
You may consider this anecdotal evidence, but the numbers involved here are quite significant.
The gist is that Postgres is not perfect nor free from required maintenance and occasional production incidents. But for the most part, it does the job. MySQL too, but with (at least from an operational perspective) many more nuances.
I have a production app on MySQL, but that was before Docker, and MySQL was a fair bit easier to set up back then. It fit my need at the time.
Easy, you pick Postgres. There's no reason to ever use MySQL.
Plenty of big reasons, but it's the small quality of life stuff that makes a lot of difference.
Postgres - Better SQL semantics.
Having supported both in "bet the business" scenarios, I would choose MySQL hands down. Operating state-of-the-art HA Postgres clusters today feels like running MySQL clusters in 2005.
- Better SQL standard support (modernish?).
- JSONB type, also with indexes! Very useful to have JSON fields to allow dynamic fields or to avoid sub-querying some tables (i.e. caching some computed results in a JSON field); see the sketch below.
I have to say that this was in 2018-2019; maybe MySQL has improved since then. Postgres is better in almost every possible aspect you can come up with.
So you'd choose MySQL in 2023 only if you have very, very specific requirements and constraints to justify the sacrifice.
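A sketch of the JSONB-plus-index combination from the list above, with a made-up docs table:

    CREATE TABLE docs (
      id      bigserial PRIMARY KEY,
      payload jsonb NOT NULL
    );

    -- a GIN index lets containment queries use the index
    CREATE INDEX docs_payload_gin ON docs USING GIN (payload);

    SELECT * FROM docs WHERE payload @> '{"status": "active"}';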
1. MySQL's replication is simpler, more reliable, and takes up less disk space than Postgres's.
2. Ability to control query optimization with optimizer hints. At scale, to lower tail latencies you will definitely need to help the query planner, since you have domain knowledge about your app and queries that the planner doesn't have. Yes, in PG you can use pg_hint_plan, but it is still not an official solution (see the sketch at the end of this comment).
3. MySQL has better connection scaling without a separate component like PgBouncer. Also, the cost of connection creation is lower due to MySQL's thread-per-connection model vs Postgres' process-per-connection model.
4. Collations suck in Postgres since it decided to depend on the OS for collation support. Version updates of other packages (glibc) or of the OS can corrupt your database due to this collation mess. The fix is to use ICU collations, but even they have multiple limitations (you can't specify a non-deterministic collation at the database level, etc.).
5. Postgres's MVCC implementation is the most inefficient among modern databases [1]. Not only is it inefficient, it causes maintenance headaches with managing and tuning autovacuum. It also causes increased disk usage and write amplification (the entire row is rewritten on each update, and every index is updated even if the column being updated is not indexed), plus increased bloat from multiple dead copies of rows. If you use wide tables (100s of columns) with updates (not even high-frequency, just moderate updates), MySQL will be far better. Heap-only tuples (HOT) only help if your rows are narrow and there is sufficient free space in the page containing the row being updated; for wide tables this is most often not the case, so even HOT won't help much. Almost everyone has read Uber's story of moving to MySQL [2], but even if you are not at Uber scale, Postgres's MVCC implementation and its associated pains are better avoided. Unfortunately, attempts to fix this in Postgres (zheap) have long been abandoned.
6. In MySQL (InnoDB), rows are physically ordered by primary key. This improves the cache-hit ratio and performance if most of your queries select or order by the PK, or by a prefix of the PK in the case of composite PKs.
So, if you need performance at scale, reliable replication for HA, less maintenance, and good connection scaling, go for MySQL. If your app depends on an extension or FDW that only Postgres has, then choose Postgres.
Often people complain about some obscure SQL syntax that doesn't work in MySQL or that Postgres implements more correctly, but mostly there will be an alternative you can use in MySQL.
[1] - https://ottertune.com/blog/the-part-of-postgresql-we-hate-the-most/
[2] - https://www.uber.com/en-US/blog/postgres-to-mysql-migration/
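On point 2 above, the two hint mechanisms look like this; a sketch with made-up table/index names (pg_hint_plan is a third-party extension):

    -- MySQL 8 (8.0.20+ for the INDEX hint): optimizer hints are built in
    SELECT /*+ INDEX(o idx_orders_user) */ *
    FROM orders o WHERE o.user_id = 42;

    -- Postgres with pg_hint_plan: the hint comment precedes the query
    /*+ IndexScan(o idx_orders_user) */
    SELECT * FROM orders o WHERE o.user_id = 42;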