Final answer:
B) The best choice for a custom machine learning pipeline that requires frequent data access is to use a single region bucket with standard storage, due to its speed, efficiency, and cost-effectiveness.
Step-by-step explanation:
The correct answer is option (B). Retain the data in use in a single region bucket with standard storage. This storage option is optimal for data that is accessed frequently, such as the common source data for machine learning model development.
Standard storage provides low latency and high throughput, which are essential for data scientists who need quick access to data for iterative model training and testing. Single-region storage ensures data residency and compliance with regulations that may limit data storage to a certain geographical location.
Moreover, this option is usually more cost-effective than multi-region or dual-region solutions, which are better suited for serving global audiences or ensuring higher redundancy.
The correct answer is option C) Retain the data in use in a multi-region bucket.
When building a custom machine learning pipeline with multiple teams of data scientists, it is important to have a storage option that can scale and accommodate collaboration. A multi-region bucket in Cloud Storage would be the best choice in this scenario.
A multi-region bucket allows for global accessibility to the data, making it easy for different teams to access and share the common source data irrespective of their geographical locations. It also provides high availability and redundancy by storing data in multiple regions, ensuring that the data is accessible even if one region experiences an outage.