Question

Your client is building a custom machine learning pipeline to identify lesions in the lungs based on x-rays. Different teams of data scientists are sharing common source data and building many ver-sions

asked Feb 1, 2024 69.5k views

Your client is building a custom machine learning pipeline to identify lesions in the lungs based on x-rays. Different teams of data scientists are sharing common source data and building many ver-sions of ML models. Which of these Cloud Storage options would be best for them?

(A). Retain the data in use in a single region bucket with nearline storage. Retain the data in use in a dual-region bucket.
(B). Retain the data in use in a single region bucket with standard storage.
(C). Retain the data in use in a multi-region bucket.
(D). Retain the data in use in a dual-region bucket.

Roopunk asked

by Roopunk

8.6k points

1 Answer

Ask a Question

Fazal · Answer 1 · 2024-02-05T23:58:57+0000

Final answer:

B) The best choice for a custom machine learning pipeline that requires frequent data access is to use a single region bucket with standard storage, due to its speed, efficiency, and cost-effectiveness.

Step-by-step explanation:

The correct answer is option (B). Retain the data in use in a single region bucket with standard storage. This storage option is optimal for data that is accessed frequently, such as the common source data for machine learning model development.

Standard storage provides low latency and high throughput, which are essential for data scientists who need quick access to data for iterative model training and testing. Single-region storage ensures data residency and compliance with regulations that may limit data storage to a certain geographical location.

Moreover, this option is usually more cost-effective than multi-region or dual-region solutions, which are better suited for serving global audiences or ensuring higher redundancy.

The correct answer is option C) Retain the data in use in a multi-region bucket.

When building a custom machine learning pipeline with multiple teams of data scientists, it is important to have a storage option that can scale and accommodate collaboration. A multi-region bucket in Cloud Storage would be the best choice in this scenario.

A multi-region bucket allows for global accessibility to the data, making it easy for different teams to access and share the common source data irrespective of their geographical locations. It also provides high availability and redundancy by storing data in multiple regions, ensuring that the data is accessible even if one region experiences an outage.

0 Comments

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Final answer:

Step-by-step explanation:

0 Comments

Please log in or register to add a comment.

Other Questions