I think this article is over the top. You don't lose control of your data if you store it, let's say, on S3. Use encryption if you wanna be safe.
You can write a similar article about storing data on a local system. It can get stolen, it can get hacked (ransomware). If you delete a file from your local drive there may still be data on disk. Software and OS often ping home and may read along. To think you have control if not in the cloud is an illusion.
Sure, cloud services have different security concerns, but to think that it is uncontrollable is just a step too far for me. Also note that there are a lot of different cloud services. A lot of privacy concerns have been related to SaaS. To think that you lose control of your data the moment you use Dropbox or S3, or start a container (or an enclave), is just not realistic.
I think this is a great point. The primitives that cloud providers have are pretty good. I wish that there was some way to federate data storage, e.g. I provide third-party services an S3 bucket where they can store any data related to my account. I grant them access to read/write just data related to their service, and I can revoke it at any time.
For example, with YouTube, my account credentials, settings, watch history, uploaded videos, would all be stored in an S3 (or S3 compatible) bucket that I allow YouTube access to. If I ever want to view my data, I can. If I ever want to revoke access, I can do that too. If YouTube gets hacked my data would still be safe (depending on the attack of course), since it's all in S3. If S3 gets hacked I'm still safe as long as there is encryption at rest.
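A rough sketch of what such a grant could look like, assuming an S3-style bucket policy. The bucket name, prefix, and account ARN are made up for illustration, and nothing here is an existing YouTube or AWS feature for federated accounts; it only shows "access to one prefix, revocable at any time" as a data structure.

```python
import json


def service_grant(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Build an S3-style bucket policy limiting one service to its own prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read/write/delete objects, but only under the service's prefix.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, so it is restricted by condition.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


def revoke(policy: dict, principal_arn: str) -> dict:
    """Revoking means dropping every statement that names the principal."""
    remaining = [
        s for s in policy["Statement"] if s["Principal"].get("AWS") != principal_arn
    ]
    return {**policy, "Statement": remaining}


# Hypothetical names, for illustration only.
SERVICE = "arn:aws:iam::111122223333:root"
policy = service_grant("my-personal-data", "youtube", SERVICE)
print(json.dumps(policy, indent=2))
print(len(revoke(policy, SERVICE)["Statement"]))  # 0 statements left after revoking
```

On a real bucket you would delete the policy rather than store one with no statements, and you would still need the service to cooperate by writing to your bucket in the first place.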
I know that there will be some tradeoffs, e.g. data locality, pricing, etc., but I feel like such a model would work so well. Taking the idea one step further: sell a plug-and-play consumer server that hosts the S3 storage. A consumer would purchase some off-the-shelf box, plug it in, and use a device they physically control to host all of their online data. This would require things like IPv6 to be more widely adopted, and for upload speeds to increase, but it would solve so many privacy issues.
Don't forget the lies of cost savings that the Cloud providers have shoved down our industry's throats. We are paying out the nose for cloud services and we are giving up all the rights to our data. It's a bad deal in the end.
I have a bunch of friends that work at SaaS companies and their cloud spend for pretty basic deployments is in the many thousands of dollars a month. Most of their deployments could be handled by a half rack with beefy servers in a couple of datacenters for a fraction of the cost. I pay for a full rack myself and it costs me ~$1200 a month for space, power and bandwidth (10Gb pipe with a current 1Gb commit), and my hardware costs for everything in that rack were a one time cost of around $3000. I have 160 GHz of CPU and 141 GiB of memory for my workloads with a few servers that are not yet provisioned into my Nomad cluster.
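To put numbers on that, here is a back-of-the-envelope comparison. The colo figures are the ones above; the cloud bill is an assumption standing in for "many thousands of dollars a month", and staff time and hardware refresh are deliberately left out.

```python
COLO_MONTHLY = 1200    # from the comment: space, power and bandwidth
COLO_HARDWARE = 3000   # from the comment: one-time hardware cost
CLOUD_MONTHLY = 5000   # ASSUMPTION: placeholder for "many thousands a month"


def cumulative_cost(months: int, monthly: int, upfront: int = 0) -> int:
    """Total spend after the given number of months."""
    return upfront + monthly * months


def break_even_month(colo_monthly: int, colo_upfront: int, cloud_monthly: int):
    """First month where cumulative colo cost is no higher than cloud, or None."""
    if cloud_monthly <= colo_monthly:
        return None  # colo never catches up on these numbers
    month = 1
    while cumulative_cost(month, colo_monthly, colo_upfront) > cumulative_cost(
        month, cloud_monthly
    ):
        month += 1
    return month


print(break_even_month(COLO_MONTHLY, COLO_HARDWARE, CLOUD_MONTHLY))  # 1
print(cumulative_cost(36, COLO_MONTHLY, COLO_HARDWARE))              # 46200
print(cumulative_cost(36, CLOUD_MONTHLY))                            # 180000
```

With a smaller assumed cloud bill the break-even moves out: at $1500 a month it lands at month 10.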
And before you say well there are costs involved with finding people that have the skills to do that kind of thing and time needed to set all of that up, yes that is true, but our industry has moved from one bucket to another one that is more expensive in the end with a bunch of downsides. I think there is a middle ground where you can use some cloud services and run the important stuff on hardware you own. The tooling to self-host your own stack in a rack of servers you own is light years better than it was 10 years ago and it keeps getting better. Tools like https://nebula.defined.net/docs/ and https://github.com/poseidon/typhoon for example enable you to use whatever providers you want and build a deployment that can cost less and gives you more control over your data, while being agile enough to make changes when the team needs something new or different.
I am excited for the next 10 years of progress and I'd expect we are going to see more companies self-hosting their deployments on bare metal.
When I switched from colo and managed datacenter services to AWS in its early days, I saved a ton of money. But all of that hardware was much more expensive back then, and the quality of OSS products wasn't quite as good, and all kinds of other temporal realities were at play.
Now, it's flipped. Hardware is pretty cheap. There are high quality OSS products that have easy-to-manage, native HA capabilities. It's no longer impractical to operate a production-ready deployment for a few hundred dollars.
I guess the early cloud folks saw this coming, and now they're reaping the rewards.
> Now, it's flipped. Hardware is pretty cheap. There are high quality OSS products that have easy-to-manage, native HA capabilities. It's no longer impractical to operate a production-ready deployment for a few hundred dollars.
Yes, and cloud got cheaper and easier as well. Why are we pretending here that the cloud stood still while hardware made strides?
Cloud is still cheaper
For sure. I was at a company in the mid 2000s and we had 50+ racks of servers and managing those back in the day was a very big lift. If we had the tooling we do now back then, we could have done it with a fraction of the cost. When I told some of my old coworkers I was going to get my own rack two years ago, there were some comments that I am a glutton for pain and suffering. Overall, it's been a great experience and cost savings on my end. I couldn't afford to run my workloads on any cloud provider.
I also understand that I have a skill set that enables me to take advantage of running my own hardware, but with the advancements in tooling and cheaper hardware costs I believe the folks that don't have these skills have a better chance of doing the same. I am hopeful that datacenter providers will shift their business models in the future to enable easier access to getting space, power and bandwidth. That would help enable smaller teams to run their own hardware. The network/bandwidth part has gotten better. Almost all of the major datacenter providers offer some sort of software defined network service so you can get cheaper bandwidth and IP space than from the cloud providers. It is still more expensive in the long run than getting a bandwidth provider yourself. IPv4 space is still expensive but you can lease blocks pretty easily too.
We are still way cheaper than before at a previous company.
No need to worry about maintaining/monitoring HVAC (plus emergency AC systems), generators, security systems, water intrusion alarms, fire suppression, backhaul links...
Secondary site for DR that has duplicates of everything else.
Firewalls that can handle the actual IPSec traffic on those backhaul links. (large Junipers are not cheap)
Crazy lead times on new systems (months?!), and needing to size everything to fit our peak load. (We now have some machines that add 12x more CPU to the database servers, run their reports in an hour, then remove all those extra CPUs.)
It sounds like your workloads need that kind of flexibility and using a cloud provider makes total sense. My point is a lot of deployments most likely don't have those requirements (and never will) and there are way more cost effective means to run them.
> And before you say well there are costs involved with finding people that have the skills to do that kind of thing and time needed to set all of that up, yes that is true, but our industry has moved from one bucket to another one that is more expensive in the end with a bunch of downsides
You really glossed over this point as if it wasn’t one of the most important points of debate in the cloud vs colo debate.
I've seen companies host a ton of tiny web sites, each in their own AWS account, replicating the same infrastructure N times. Tons of overhead. Total cost is many times what it would cost to host on a few mid range servers in a colo. But, it's "secure" and "scalable". Actual utilization is single digit requests/second, summed across all the sites, on a good day.
The problem is every company wants to have Google level usage so they plan for it and it never happens. This leaves them with an over engineered solution that costs an insane amount of money. The cloud providers continue to feed our industry that FUD so companies feel using their services is the best and only option.
When cloud was a young buzzword there was a popular test: replace "in the cloud" with "on the internet" and see if you want to continue.
- We store pictures of our kids [in the cloud|on the internet].
- We store all our proprietary code [in the cloud|on the internet].
- We store all our secrets [in the cloud|on the internet].
- We store all the sensitive customer information that we could be fined millions for losing [in the cloud|on the internet].
It still is a good test, but I guess this ship has sailed...
Disclaimer: Senior Cloud Engineer for a $billion+ SaaS company here
I think this argument is only valid if you would use cloud services without private networking set up. The #1 skill a company needs if it wants to leverage the cloud is network engineering/security. There are things like Azure ExpressRoute and AWS Direct Connect that give you private access to the cloud provider's own backbone network infrastructure to avoid sending traffic over the public internet. And if you are worried about securing the data at rest, you have everything available to encrypt and protect it. In my experience the problem is not that "the cloud" is insecure but companies trying to avoid the extra mile to properly set up their infrastructure for the sake of saving money and effort. Sure, the hardware is not owned by you. But why should it be? Running this stuff at hyperscale is the more efficient and ultimately secure and reliable way.
I might just move a few things to Resilio Sync and/or Cryptomator where possible, altho I wonder if you feel similarly about such quasi-replacements. I definitely need something that preserves something somewhere dynamically so I don't have the worry about data loss which would absolutely be catastrophic. Some things are impractical to constantly have to manually and consciously version control to preserve, is the way I would phrase my sentiments.
This article reads like the ramblings of a security absolutist that's unfamiliar with the concept of a threat model.
Yes, the cloud is not perfectly secure, breaches happen and customer data gets leaked. No, the overwhelming majority of people will never be capable of hosting their data more securely, and should therefore host it in the cloud or not host it at all.
This article is a good place to send people who don't understand why their ignorance plus the assertion that "everyone else does it" makes the cloud somehow OK.
The cloud has its uses, but people need to be aware of the complete lack of accountability and poor security before they decide to upload important data. This goes a long way to helping with the "I didn't know" defense. Read this article. Now you know.
> This article is a good place to send people who don't understand why their ignorance plus the assertion that "everyone else does it" makes the cloud somehow OK.
I'm not sure, the article seems to have been written with a highly technical audience in mind, and those folks tend to already care more about their privacy than people who are not. Example:
> Additionally, please note that I am using a simplified term of "cloud" which refers to storing data or meta-data of us in the public cloud. I am specifically not referring to cloud-computing in terms of processing nodes that may be even stateless.
Most people are not gonna have any idea what most of the domain terms in there mean. "data" vs "meta-data"? "public cloud" vs "private cloud"? processing nodes? stateless?
Someone who doesn't care about privacy or where their data is hosted won't be convinced by an article speaking a language they do not understand.
True, but it gives people a jumping off point if they want to learn more. Many / most people don't want more than just the summary ("assume everything you put in the cloud might be leaked, so consider the consequences"), but for those who want to learn more, this is where I can point them :)
Using the popular technique of `s/cloud/somebody else's computer` this becomes:
> You Can't Control Your Data in Somebody Else's Computer
Which seems like a pretty "common sense" sentiment.
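For the curious, the substitution as a runnable snippet. The input title is an assumption inferred from the result quoted above, and the pattern swaps "the cloud" rather than just "cloud" so the output matches that quote.

```python
import re


def rename_cloud(title: str) -> str:
    """Swap "the cloud" (any capitalization) for the phrase from the comment."""
    return re.sub(
        r"\bthe cloud\b", "Somebody Else's Computer", title, flags=re.IGNORECASE
    )


# ASSUMPTION: the article title, reconstructed from the quoted result.
print(rename_cloud("You Can't Control Your Data in the Cloud"))
# You Can't Control Your Data in Somebody Else's Computer
```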
> "2021-12: Gravatar lost 167 million names, usernames and MD5 hashes of email addresses (Source, German Source)"
Bad to pick one item and grumble, but I'm going to anyway. With Gravatar your identity is the MD5 hash of your email address. Go to any forum which uses Gravatar and look at the URL of the avatar image: that blob of text is the MD5 hash of the user's email. And of course you can click through to the Gravatar profile, and see their username on the forum page. It makes no sense to say they are "lost" or leaked - they are published in normal use.
You can then make a few guesses that their email will be some variant of [email protected] and that the first part will be nice enough for a human to type, and the second part must be a valid domain - and likely a common domain. So if you think "MD5 is too big to brute force", this is a very restricted input. [At least, this used to be true, I haven't verified if Gravatar have changed anything - but I doubt it because the idea is that you sign up your email address at a new forum and it's magically connected to your Gravatar on another domain, so if it had extra salts and passwords it would lose all its convenience].
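A minimal sketch of why the restricted input matters, assuming the classic scheme of hashing the trimmed, lowercased address with MD5. The user, the name guesses, and the domain list are all hypothetical.

```python
import hashlib
from itertools import product


def gravatar_hash(email: str) -> str:
    """MD5 of the trimmed, lowercased address, as used in classic avatar URLs."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def guess_email(target_hash: str, local_parts, domains):
    """Try every local@domain combination; return the first match or None."""
    for local, domain in product(local_parts, domains):
        candidate = f"{local}@{domain}"
        if gravatar_hash(candidate) == target_hash:
            return candidate
    return None


# Hypothetical user: the forum shows the name, the avatar URL shows the hash.
target = gravatar_hash("Jane.Doe@example.com")
found = guess_email(
    target,
    local_parts=["jdoe", "jane", "jane.doe"],      # guesses derived from the username
    domains=["example.org", "example.com"],        # stand-ins for common mail domains
)
print(found)  # jane.doe@example.com
```

The search space is only (name variants) x (common domains), which is tiny compared to the full MD5 space.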
I'm not sure where the state of art is currently, but what about Homomorphic encryption? (https://en.wikipedia.org/wiki/Homomorphic_encryption) Data that can be encrypted at rest while non-readable by the one holding it, while still being able to perform computations on it.
I'm not experienced/knowledgeable enough to know if it's feasible or not, it seems kind of immature at this point, but on the surface it sounds like something that can give more privacy than today where most things are clear-text, while still enabling the use case of hosting said data somewhere else.
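To make the idea concrete, here is a toy illustration using textbook RSA, which happens to be homomorphic for multiplication only. This is not a fully homomorphic scheme and it is not secure (tiny primes, no padding); it just shows a party multiplying two values it cannot read.

```python
# Toy textbook RSA parameters. Illustration only, NOT secure.
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    return pow(m, e, n)


def decrypt(c: int) -> int:
    return pow(c, d, n)


# The data owner encrypts two numbers and hands over only the ciphertexts.
c1, c2 = encrypt(6), encrypt(7)

# The host multiplies the ciphertexts without ever seeing 6 or 7.
c_product = (c1 * c2) % n

# Only the key holder can decrypt, and gets the product of the plaintexts.
print(decrypt(c_product))  # 42
```

Fully homomorphic schemes support both addition and multiplication on ciphertexts, which is what arbitrary computation needs, and that generality is where the performance cost comes from.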
Regarding control of data, the strangest discussions I have had are related to the customers being in control of the keys used to encrypt the data. For some reason it seems like using customer managed keys, or even customer managed HSM keys in the cloud, is enough to keep the lawyers happy and all compliance in check. Of course anyone who has a basic understanding of key encryption keys and data encryption keys will know that the actual keys used to decrypt the data will be available at the cloud provider. So in the end it all comes down to: do the customers trust the cloud service provider?
It's not just "in the cloud" - once data leaves your person, it is no longer under your control.
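A simplified sketch of the envelope encryption flow behind that point. The cipher is a toy (XOR with a SHA-256 counter keystream) chosen only to keep the example dependency-free; real systems use an authenticated cipher and a KMS or HSM, and the variable names are made up.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. Toy only; the same call decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


kek = secrets.token_bytes(32)   # customer-managed key encryption key
dek = secrets.token_bytes(32)   # data encryption key, generated per object

# Write path: data is encrypted with the DEK, and the DEK is wrapped with the KEK.
ciphertext = toy_cipher(dek, b"watch history")
wrapped_dek = toy_cipher(kek, dek)   # only the wrapped DEK is stored with the data

# Read path, running on the provider's servers:
unwrapped_dek = toy_cipher(kek, wrapped_dek)   # plaintext DEK now sits in provider memory
plaintext = toy_cipher(unwrapped_dek, ciphertext)
print(plaintext)  # b'watch history'
```

Whoever runs the read path holds the plaintext DEK while serving the request, no matter who "manages" the KEK.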
This has been true for countless millennia: write it down? It's leakable.
To control your data you'd already have to have a notion of "data" and "control", and that's already lacking from most people's daily life. And when you do have those concepts swirling around in your mind, you'll still find that out of the people who feel they care about it, most don't know what they're doing, how they could change, or what they can do to effect any change. By the time you've informed and sorted everyone you ever talked to, to get to the people that care enough and want to control their data, you might be left with 1 or 2 people...
My reaction to the article was "huh? this is news?" Hasn't all this been obvious for years now to anyone paying attention?
I am not convinced installing competing services locally on a NAS or home server device will get all that much easier to set up. We are at the stage where you buy a thing, run some setup for storage and then a few clicks for installing the application. The challenge is setting up the bits on your phone too and dealing with networking. This is just too much for too many people who long ago decided renting cloud whatever meant when it broke someone else would fix it.
I would love to see a move to Self hosted but I just can't see it happening. Cloud software as a service doesn't have to be a dishonest business but it happens to become so every single time so someone can make even more money.
It blows me away that this even needs to be said. You have precisely zero control over anything in the cloud, ever. Your provider is free to lie to you, scan or monetize your data, use anything you "own" to make competing products, etc. and the only protection you have is a thin ToS that likely mandates arbitration if it gives you any rights whatsoever.
Most B2C cloud ToS documents reserve all rights to the provider. You own nothing and control nothing, peon.
B2B can be better but it's still pretty meaningless unless you have negotiated a specific side document or contract. That'll never happen unless you are a big spender. If you aren't a high roller there are numerous cases of cloud and SaaS providers destroying businesses by randomly changing things, locking accounts for no reason, etc. There is nothing you can do to compel them to care.
There is a reason for all this though. The market's "revealed preference" is that people hate system administration so much they will trade all rights, all freedom, all privacy, and control/ownership of their own data as well as pay rent forever with zero leverage to control cost increases to avoid dealing with it.
The market has written cloud and SaaS an almost literal blank check in exchange for them taking on the hateful burden of system administration and IT.
I'm creating a new programming language, and while it hasn't been launched I'm starting to put out teasers.
One of the benefits of this new language is privacy; it solves this exact problem: https://ingig.substack.com/p/privacy