What tools would you recommend to obfuscate sensitive data when dumping a database? I know there are some gems, but I'm not sure how they would fit into this process.
In this example the DB and the Rails app are separate (which is why we pass in the host). But since the Rails app has access to the DB server, we can reach the database via the Rails app server.
As far as obfuscating sensitive data goes, things are simpler with smaller datasets, but when you're dealing with millions of records, it can quickly get complicated and take a long time. Some of the issues with large datasets would be:
the amount of time required to obfuscate the data
exporting only the last(1000) records could miss the associated records they reference
locking tables against writes if you're copying data over to a separate table
conflicts on columns with unique constraints
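To make a couple of those issues concrete, here is a minimal sketch in plain Ruby (no gem involved) of one way to obfuscate an email column. It derives the fake value deterministically from the real one, so the same input always maps to the same output and distinct inputs map to distinct outputs (barring an unlikely hash collision), which helps with unique constraints. It also works through the rows in batches. The names `obfuscate_email`, `obfuscate_rows`, the salt, and the sample rows are all made up for illustration; in a real Rails app you would presumably run something like this over your models rather than an array of hashes.

```ruby
require "digest"

# Hypothetical helper: derive a stable fake email from the real one.
# The salt is an assumption; keep it secret so hashes can't be guessed easily.
def obfuscate_email(email, salt: "dev-dump-salt")
  digest = Digest::SHA256.hexdigest("#{salt}:#{email.downcase}")
  # 16 hex characters keeps the value short; collisions are unlikely, not impossible.
  "user_#{digest[0, 16]}@example.com"
end

# Hypothetical helper: obfuscate rows in fixed-size batches so that only
# one batch is being transformed at a time.
def obfuscate_rows(rows, batch_size: 1000)
  rows.each_slice(batch_size).flat_map do |batch|
    batch.map { |row| row.merge(email: obfuscate_email(row[:email])) }
  end
end

# Made-up sample data standing in for a users table.
rows = [
  { id: 1, email: "alice@corp.test" },
  { id: 2, email: "bob@corp.test" }
]

masked = obfuscate_rows(rows)
masked.each { |row| puts "#{row[:id]} #{row[:email]}" }
```

This doesn't do anything about the time it takes on millions of records or about table locking; it only shows the unique-constraint and batching parts of the problem.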
I know that we sometimes get to the point of saying, "but I need the production data". At that point, I would probably do a "hotfix" to add more logging in that area of the application and see what makes this particular org/company/user so unique. From there, you should, hopefully, be able to replicate it in your staging environment. If you're seeing the same issues there, then try to replicate them locally. If you're still unable to replicate the issue locally, then you could pull the staging database where the issue is being experienced. If pulling and loading that data into your local environment doesn't replicate the issue, then there could be a difference between the environments (at the infrastructure level, or some kind of environment flag within the code).
So, while this is kind of avoiding the question, I think that we shouldn't ever pull the production database (obfuscated or not).
Is this only applicable when your DB server and App Server are in the same docker container?