diff --git a/doc/caf/source/_static/images/SCR-20230808-frw.png b/doc/caf/source/_static/images/SCR-20230808-frw.png new file mode 100644 index 0000000..4c3755a Binary files /dev/null and b/doc/caf/source/_static/images/SCR-20230808-frw.png differ diff --git a/doc/caf/source/_static/images/SCR-20230808-g42.png b/doc/caf/source/_static/images/SCR-20230808-g42.png new file mode 100644 index 0000000..e338345 Binary files /dev/null and b/doc/caf/source/_static/images/SCR-20230808-g42.png differ diff --git a/doc/caf/source/_static/images/image1.png b/doc/caf/source/_static/images/image1.png new file mode 100644 index 0000000..5912e9f Binary files /dev/null and b/doc/caf/source/_static/images/image1.png differ diff --git a/doc/caf/source/_static/images/image10.png b/doc/caf/source/_static/images/image10.png new file mode 100644 index 0000000..1b41ad6 Binary files /dev/null and b/doc/caf/source/_static/images/image10.png differ diff --git a/doc/caf/source/_static/images/image11.png b/doc/caf/source/_static/images/image11.png new file mode 100644 index 0000000..3535746 Binary files /dev/null and b/doc/caf/source/_static/images/image11.png differ diff --git a/doc/caf/source/_static/images/image11_1.png b/doc/caf/source/_static/images/image11_1.png new file mode 100644 index 0000000..a34b997 Binary files /dev/null and b/doc/caf/source/_static/images/image11_1.png differ diff --git a/doc/caf/source/_static/images/image11_2.png b/doc/caf/source/_static/images/image11_2.png new file mode 100644 index 0000000..eb8b7b3 Binary files /dev/null and b/doc/caf/source/_static/images/image11_2.png differ diff --git a/doc/caf/source/_static/images/image12.png b/doc/caf/source/_static/images/image12.png new file mode 100644 index 0000000..49c1f6d Binary files /dev/null and b/doc/caf/source/_static/images/image12.png differ diff --git a/doc/caf/source/_static/images/image12_1.png b/doc/caf/source/_static/images/image12_1.png new file mode 100644 index 0000000..45ae12e Binary files 
/dev/null and b/doc/caf/source/_static/images/image12_1.png differ diff --git a/doc/caf/source/_static/images/image12_2.png b/doc/caf/source/_static/images/image12_2.png new file mode 100644 index 0000000..45ae12e Binary files /dev/null and b/doc/caf/source/_static/images/image12_2.png differ diff --git a/doc/caf/source/_static/images/image13.png b/doc/caf/source/_static/images/image13.png new file mode 100644 index 0000000..d50ab7e Binary files /dev/null and b/doc/caf/source/_static/images/image13.png differ diff --git a/doc/caf/source/_static/images/image13_1.png b/doc/caf/source/_static/images/image13_1.png new file mode 100644 index 0000000..cc174d6 Binary files /dev/null and b/doc/caf/source/_static/images/image13_1.png differ diff --git a/doc/caf/source/_static/images/image14.png b/doc/caf/source/_static/images/image14.png new file mode 100644 index 0000000..817fa1f Binary files /dev/null and b/doc/caf/source/_static/images/image14.png differ diff --git a/doc/caf/source/_static/images/image15.png b/doc/caf/source/_static/images/image15.png new file mode 100644 index 0000000..4caac0b Binary files /dev/null and b/doc/caf/source/_static/images/image15.png differ diff --git a/doc/caf/source/_static/images/image16.png b/doc/caf/source/_static/images/image16.png new file mode 100644 index 0000000..7fc1e52 Binary files /dev/null and b/doc/caf/source/_static/images/image16.png differ diff --git a/doc/caf/source/_static/images/image17.png b/doc/caf/source/_static/images/image17.png new file mode 100644 index 0000000..d84bc44 Binary files /dev/null and b/doc/caf/source/_static/images/image17.png differ diff --git a/doc/caf/source/_static/images/image18.png b/doc/caf/source/_static/images/image18.png new file mode 100644 index 0000000..e48f9ab Binary files /dev/null and b/doc/caf/source/_static/images/image18.png differ diff --git a/doc/caf/source/_static/images/image19.png b/doc/caf/source/_static/images/image19.png new file mode 100644 index 0000000..f4cf609 Binary 
files /dev/null and b/doc/caf/source/_static/images/image19.png differ diff --git a/doc/caf/source/_static/images/image2.png b/doc/caf/source/_static/images/image2.png new file mode 100644 index 0000000..5ff6119 Binary files /dev/null and b/doc/caf/source/_static/images/image2.png differ diff --git a/doc/caf/source/_static/images/image20.png b/doc/caf/source/_static/images/image20.png new file mode 100644 index 0000000..f4d3326 Binary files /dev/null and b/doc/caf/source/_static/images/image20.png differ diff --git a/doc/caf/source/_static/images/image21.png b/doc/caf/source/_static/images/image21.png new file mode 100644 index 0000000..20dec07 Binary files /dev/null and b/doc/caf/source/_static/images/image21.png differ diff --git a/doc/caf/source/_static/images/image22.png b/doc/caf/source/_static/images/image22.png new file mode 100644 index 0000000..ba79e62 Binary files /dev/null and b/doc/caf/source/_static/images/image22.png differ diff --git a/doc/caf/source/_static/images/image23.png b/doc/caf/source/_static/images/image23.png new file mode 100644 index 0000000..c0560bb Binary files /dev/null and b/doc/caf/source/_static/images/image23.png differ diff --git a/doc/caf/source/_static/images/image24.png b/doc/caf/source/_static/images/image24.png new file mode 100644 index 0000000..981b29e Binary files /dev/null and b/doc/caf/source/_static/images/image24.png differ diff --git a/doc/caf/source/_static/images/image25.png b/doc/caf/source/_static/images/image25.png new file mode 100644 index 0000000..649c355 Binary files /dev/null and b/doc/caf/source/_static/images/image25.png differ diff --git a/doc/caf/source/_static/images/image26.png b/doc/caf/source/_static/images/image26.png new file mode 100644 index 0000000..74605bf Binary files /dev/null and b/doc/caf/source/_static/images/image26.png differ diff --git a/doc/caf/source/_static/images/image27.png b/doc/caf/source/_static/images/image27.png new file mode 100644 index 0000000..3e6be40 Binary files 
/dev/null and b/doc/caf/source/_static/images/image27.png differ diff --git a/doc/caf/source/_static/images/image28.png b/doc/caf/source/_static/images/image28.png new file mode 100644 index 0000000..e35c55b Binary files /dev/null and b/doc/caf/source/_static/images/image28.png differ diff --git a/doc/caf/source/_static/images/image29.png b/doc/caf/source/_static/images/image29.png new file mode 100644 index 0000000..5a24cc7 Binary files /dev/null and b/doc/caf/source/_static/images/image29.png differ diff --git a/doc/caf/source/_static/images/image3.png b/doc/caf/source/_static/images/image3.png new file mode 100644 index 0000000..8880b89 Binary files /dev/null and b/doc/caf/source/_static/images/image3.png differ diff --git a/doc/caf/source/_static/images/image30.png b/doc/caf/source/_static/images/image30.png new file mode 100644 index 0000000..917c683 Binary files /dev/null and b/doc/caf/source/_static/images/image30.png differ diff --git a/doc/caf/source/_static/images/image31.png b/doc/caf/source/_static/images/image31.png new file mode 100644 index 0000000..c066aa5 Binary files /dev/null and b/doc/caf/source/_static/images/image31.png differ diff --git a/doc/caf/source/_static/images/image32.png b/doc/caf/source/_static/images/image32.png new file mode 100644 index 0000000..94f2514 Binary files /dev/null and b/doc/caf/source/_static/images/image32.png differ diff --git a/doc/caf/source/_static/images/image33.png b/doc/caf/source/_static/images/image33.png new file mode 100644 index 0000000..b338773 Binary files /dev/null and b/doc/caf/source/_static/images/image33.png differ diff --git a/doc/caf/source/_static/images/image34.png b/doc/caf/source/_static/images/image34.png new file mode 100644 index 0000000..335de15 Binary files /dev/null and b/doc/caf/source/_static/images/image34.png differ diff --git a/doc/caf/source/_static/images/image35.png b/doc/caf/source/_static/images/image35.png new file mode 100644 index 0000000..64427e2 Binary files /dev/null and 
b/doc/caf/source/_static/images/image35.png differ diff --git a/doc/caf/source/_static/images/image35_1.png b/doc/caf/source/_static/images/image35_1.png new file mode 100644 index 0000000..9a2d637 Binary files /dev/null and b/doc/caf/source/_static/images/image35_1.png differ diff --git a/doc/caf/source/_static/images/image36.png b/doc/caf/source/_static/images/image36.png new file mode 100644 index 0000000..0aeb145 Binary files /dev/null and b/doc/caf/source/_static/images/image36.png differ diff --git a/doc/caf/source/_static/images/image37.png b/doc/caf/source/_static/images/image37.png new file mode 100644 index 0000000..eec9861 Binary files /dev/null and b/doc/caf/source/_static/images/image37.png differ diff --git a/doc/caf/source/_static/images/image38.png b/doc/caf/source/_static/images/image38.png new file mode 100644 index 0000000..1ca95a7 Binary files /dev/null and b/doc/caf/source/_static/images/image38.png differ diff --git a/doc/caf/source/_static/images/image39.png b/doc/caf/source/_static/images/image39.png new file mode 100644 index 0000000..19e5819 Binary files /dev/null and b/doc/caf/source/_static/images/image39.png differ diff --git a/doc/caf/source/_static/images/image4.png b/doc/caf/source/_static/images/image4.png new file mode 100644 index 0000000..a6ddb34 Binary files /dev/null and b/doc/caf/source/_static/images/image4.png differ diff --git a/doc/caf/source/_static/images/image40.png b/doc/caf/source/_static/images/image40.png new file mode 100644 index 0000000..c8e6859 Binary files /dev/null and b/doc/caf/source/_static/images/image40.png differ diff --git a/doc/caf/source/_static/images/image41.png b/doc/caf/source/_static/images/image41.png new file mode 100644 index 0000000..438fa4d Binary files /dev/null and b/doc/caf/source/_static/images/image41.png differ diff --git a/doc/caf/source/_static/images/image41_1.png b/doc/caf/source/_static/images/image41_1.png new file mode 100644 index 0000000..0e8c7f7 Binary files /dev/null and 
b/doc/caf/source/_static/images/image41_1.png differ diff --git a/doc/caf/source/_static/images/image42.png b/doc/caf/source/_static/images/image42.png new file mode 100644 index 0000000..722d7a9 Binary files /dev/null and b/doc/caf/source/_static/images/image42.png differ diff --git a/doc/caf/source/_static/images/image43.png b/doc/caf/source/_static/images/image43.png new file mode 100644 index 0000000..14bc360 Binary files /dev/null and b/doc/caf/source/_static/images/image43.png differ diff --git a/doc/caf/source/_static/images/image44.png b/doc/caf/source/_static/images/image44.png new file mode 100644 index 0000000..fbc28c0 Binary files /dev/null and b/doc/caf/source/_static/images/image44.png differ diff --git a/doc/caf/source/_static/images/image45.png b/doc/caf/source/_static/images/image45.png new file mode 100644 index 0000000..6e877a9 Binary files /dev/null and b/doc/caf/source/_static/images/image45.png differ diff --git a/doc/caf/source/_static/images/image46.png b/doc/caf/source/_static/images/image46.png new file mode 100644 index 0000000..d0c62e2 Binary files /dev/null and b/doc/caf/source/_static/images/image46.png differ diff --git a/doc/caf/source/_static/images/image47.png b/doc/caf/source/_static/images/image47.png new file mode 100644 index 0000000..abad62f Binary files /dev/null and b/doc/caf/source/_static/images/image47.png differ diff --git a/doc/caf/source/_static/images/image48.png b/doc/caf/source/_static/images/image48.png new file mode 100644 index 0000000..533f18a Binary files /dev/null and b/doc/caf/source/_static/images/image48.png differ diff --git a/doc/caf/source/_static/images/image49.png b/doc/caf/source/_static/images/image49.png new file mode 100644 index 0000000..5055125 Binary files /dev/null and b/doc/caf/source/_static/images/image49.png differ diff --git a/doc/caf/source/_static/images/image4_1.png b/doc/caf/source/_static/images/image4_1.png new file mode 100644 index 0000000..068d370 Binary files /dev/null and 
b/doc/caf/source/_static/images/image4_1.png differ diff --git a/doc/caf/source/_static/images/image5.png b/doc/caf/source/_static/images/image5.png new file mode 100644 index 0000000..2b419a5 Binary files /dev/null and b/doc/caf/source/_static/images/image5.png differ diff --git a/doc/caf/source/_static/images/image50.png b/doc/caf/source/_static/images/image50.png new file mode 100644 index 0000000..6ce9a4c Binary files /dev/null and b/doc/caf/source/_static/images/image50.png differ diff --git a/doc/caf/source/_static/images/image51.png b/doc/caf/source/_static/images/image51.png new file mode 100644 index 0000000..d20976f Binary files /dev/null and b/doc/caf/source/_static/images/image51.png differ diff --git a/doc/caf/source/_static/images/image52.png b/doc/caf/source/_static/images/image52.png new file mode 100644 index 0000000..79cf044 Binary files /dev/null and b/doc/caf/source/_static/images/image52.png differ diff --git a/doc/caf/source/_static/images/image53.png b/doc/caf/source/_static/images/image53.png new file mode 100644 index 0000000..a9af4b7 Binary files /dev/null and b/doc/caf/source/_static/images/image53.png differ diff --git a/doc/caf/source/_static/images/image54.png b/doc/caf/source/_static/images/image54.png new file mode 100644 index 0000000..7c7d984 Binary files /dev/null and b/doc/caf/source/_static/images/image54.png differ diff --git a/doc/caf/source/_static/images/image55.png b/doc/caf/source/_static/images/image55.png new file mode 100644 index 0000000..28012dc Binary files /dev/null and b/doc/caf/source/_static/images/image55.png differ diff --git a/doc/caf/source/_static/images/image56.png b/doc/caf/source/_static/images/image56.png new file mode 100644 index 0000000..d8b50c7 Binary files /dev/null and b/doc/caf/source/_static/images/image56.png differ diff --git a/doc/caf/source/_static/images/image57.png b/doc/caf/source/_static/images/image57.png new file mode 100644 index 0000000..4a72cd0 Binary files /dev/null and 
b/doc/caf/source/_static/images/image57.png differ diff --git a/doc/caf/source/_static/images/image58.png b/doc/caf/source/_static/images/image58.png new file mode 100644 index 0000000..a9771cb Binary files /dev/null and b/doc/caf/source/_static/images/image58.png differ diff --git a/doc/caf/source/_static/images/image59.png b/doc/caf/source/_static/images/image59.png new file mode 100644 index 0000000..229fbab Binary files /dev/null and b/doc/caf/source/_static/images/image59.png differ diff --git a/doc/caf/source/_static/images/image6.png b/doc/caf/source/_static/images/image6.png new file mode 100644 index 0000000..7e90c17 Binary files /dev/null and b/doc/caf/source/_static/images/image6.png differ diff --git a/doc/caf/source/_static/images/image60.png b/doc/caf/source/_static/images/image60.png new file mode 100644 index 0000000..2a19a0e Binary files /dev/null and b/doc/caf/source/_static/images/image60.png differ diff --git a/doc/caf/source/_static/images/image61.png b/doc/caf/source/_static/images/image61.png new file mode 100644 index 0000000..9a40aa2 Binary files /dev/null and b/doc/caf/source/_static/images/image61.png differ diff --git a/doc/caf/source/_static/images/image62.png b/doc/caf/source/_static/images/image62.png new file mode 100644 index 0000000..4f23cca Binary files /dev/null and b/doc/caf/source/_static/images/image62.png differ diff --git a/doc/caf/source/_static/images/image63.png b/doc/caf/source/_static/images/image63.png new file mode 100644 index 0000000..f5ff3cf Binary files /dev/null and b/doc/caf/source/_static/images/image63.png differ diff --git a/doc/caf/source/_static/images/image64.png b/doc/caf/source/_static/images/image64.png new file mode 100644 index 0000000..fb3a203 Binary files /dev/null and b/doc/caf/source/_static/images/image64.png differ diff --git a/doc/caf/source/_static/images/image65.png b/doc/caf/source/_static/images/image65.png new file mode 100644 index 0000000..163c26d Binary files /dev/null and 
b/doc/caf/source/_static/images/image65.png differ diff --git a/doc/caf/source/_static/images/image66.png b/doc/caf/source/_static/images/image66.png new file mode 100644 index 0000000..ad84d2f Binary files /dev/null and b/doc/caf/source/_static/images/image66.png differ diff --git a/doc/caf/source/_static/images/image67.png b/doc/caf/source/_static/images/image67.png new file mode 100644 index 0000000..525b4d1 Binary files /dev/null and b/doc/caf/source/_static/images/image67.png differ diff --git a/doc/caf/source/_static/images/image68.png b/doc/caf/source/_static/images/image68.png new file mode 100644 index 0000000..d4f3a38 Binary files /dev/null and b/doc/caf/source/_static/images/image68.png differ diff --git a/doc/caf/source/_static/images/image69.png b/doc/caf/source/_static/images/image69.png new file mode 100644 index 0000000..6e5b23e Binary files /dev/null and b/doc/caf/source/_static/images/image69.png differ diff --git a/doc/caf/source/_static/images/image6_1.png b/doc/caf/source/_static/images/image6_1.png new file mode 100644 index 0000000..68fc44b Binary files /dev/null and b/doc/caf/source/_static/images/image6_1.png differ diff --git a/doc/caf/source/_static/images/image7.png b/doc/caf/source/_static/images/image7.png new file mode 100644 index 0000000..e1e6dea Binary files /dev/null and b/doc/caf/source/_static/images/image7.png differ diff --git a/doc/caf/source/_static/images/image70.png b/doc/caf/source/_static/images/image70.png new file mode 100644 index 0000000..3702346 Binary files /dev/null and b/doc/caf/source/_static/images/image70.png differ diff --git a/doc/caf/source/_static/images/image71.png b/doc/caf/source/_static/images/image71.png new file mode 100644 index 0000000..e12d600 Binary files /dev/null and b/doc/caf/source/_static/images/image71.png differ diff --git a/doc/caf/source/_static/images/image72.png b/doc/caf/source/_static/images/image72.png new file mode 100644 index 0000000..d9a498d Binary files /dev/null and 
b/doc/caf/source/_static/images/image72.png differ diff --git a/doc/caf/source/_static/images/image73.png b/doc/caf/source/_static/images/image73.png new file mode 100644 index 0000000..3ebf45c Binary files /dev/null and b/doc/caf/source/_static/images/image73.png differ diff --git a/doc/caf/source/_static/images/image74.png b/doc/caf/source/_static/images/image74.png new file mode 100644 index 0000000..4221b9c Binary files /dev/null and b/doc/caf/source/_static/images/image74.png differ diff --git a/doc/caf/source/_static/images/image75.png b/doc/caf/source/_static/images/image75.png new file mode 100644 index 0000000..9d20af9 Binary files /dev/null and b/doc/caf/source/_static/images/image75.png differ diff --git a/doc/caf/source/_static/images/image76.png b/doc/caf/source/_static/images/image76.png new file mode 100644 index 0000000..88581aa Binary files /dev/null and b/doc/caf/source/_static/images/image76.png differ diff --git a/doc/caf/source/_static/images/image77.png b/doc/caf/source/_static/images/image77.png new file mode 100644 index 0000000..883c8b5 Binary files /dev/null and b/doc/caf/source/_static/images/image77.png differ diff --git a/doc/caf/source/_static/images/image78.png b/doc/caf/source/_static/images/image78.png new file mode 100644 index 0000000..2fa9efe Binary files /dev/null and b/doc/caf/source/_static/images/image78.png differ diff --git a/doc/caf/source/_static/images/image79.png b/doc/caf/source/_static/images/image79.png new file mode 100644 index 0000000..015162c Binary files /dev/null and b/doc/caf/source/_static/images/image79.png differ diff --git a/doc/caf/source/_static/images/image8.png b/doc/caf/source/_static/images/image8.png new file mode 100644 index 0000000..c3276af Binary files /dev/null and b/doc/caf/source/_static/images/image8.png differ diff --git a/doc/caf/source/_static/images/image9.png b/doc/caf/source/_static/images/image9.png new file mode 100644 index 0000000..2465f5f Binary files /dev/null and 
b/doc/caf/source/_static/images/image9.png differ diff --git a/doc/caf/source/adopt/application-migration.rst b/doc/caf/source/adopt/application-migration.rst new file mode 100644 index 0000000..3706d1a --- /dev/null +++ b/doc/caf/source/adopt/application-migration.rst @@ -0,0 +1,129 @@ +Application Migration +--------------------- + +Based on T-Systems' successful practices and experience in serving a large +number of customers, Open Telekom Cloud has identified four main phases in the +migration process. An understanding of these four phases makes it easier +to ensure a smooth migration. + +.. image:: ../_static/images/image42.png + +Phase 1: Application analysis +****************************** + +The services and functions provided +need to be clearly identified, along with the technology stacks, +deployment modes, SLAs, and dependencies. You need to know where the +applications come from and how the O&M is handled. + +The research includes, but is not limited to: + +- Application architectures +- Application modules, internal and external dependencies, and the + languages and frameworks used +- Application hosts, including host configurations, specifications, + operating systems, total data volume, NICs, HA deployment, as well as + DR and backup requirements +- Databases, including the types and versions, how much data needs to + be handled, and the performance and HA requirements +- Middleware, including the types (such as message middleware and cache + database), versions, as well as cluster scale and capacity + +Phase 2: Migration planning +*************************** + +Based on the information collected in +Phase 1, assess applications using the **6R model** (Rehost, Replatform, +Refactor/Rearchitect, Repurchase, Retire, and Retain) and following +the general principles described here. Analyze application readiness +and benefits and identify suitable migration paths. 
+ +- General principles for migrating applications to the cloud +- For third-party SaaS services, as long as they can keep up with + service development, keep them unchanged. + +- For purchased software deployed on hosts, rehost or replatform them + to migrate them to the cloud. + +- For applications that are not ready for the cloud, for example, + applications with incompatible host OSs, applications with + outdated, unavailable, or unsupported components, or applications + whose benefits from migration seem elusive, keep these + applications on premises. + +- For self-developed applications, rehost, replatform, or rearchitect + them, depending on the technology stack requirements of customers. + +- For components such as databases and middleware, if equivalent + cloud services are available, replatform them and use pay-per-use + billing for the best cost-effectiveness, so that you don't have to + worry about the O&M. + +- Principles for prioritizing applications for migration + +Prioritize applications to be migrated based on the expected benefits +and on their readiness. Migrate applications in the order shown in the +following figure. Start with those that are easy to migrate and can +benefit greatly from migration, and give low priority to applications that +are more difficult or that will benefit little. + +.. 
image:: ../_static/images/image43.png + +Business Benefits Considerations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Considerations for business benefits include but are not limited to: + +- Improving performance, increasing efficiency, and reducing costs +- Changing service requirements +- Enhancing user experience +- Serious architecture issues +- Auto scaling requirements +- Compliance + +Cloud Readiness Considerations +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Considerations for cloud readiness include but are not limited to: + +- Service complexity +- Maturity of business and IT design +- Dependencies +- Organizations and capabilities + +Phase 3: Migration planning of pilot applications +************************************************* + +Select pilot +applications based on the overall migration path planned for your +applications, plan the migration of the pilot applications, and +estimate the costs involved. Proper planning streamlines the +architecture and helps ensure a successful experience, which, in +turn, supports future migrations. + +Phase 4: Implementation and summary +*********************************** + +After developing the migration +plan and getting the budget approved, migrate the pilot applications to +identify what the company should focus on in terms of technology, +organization, process, talent, and cost. This phase is critical to +accumulating practical experience. It helps companies migrate +applications to the cloud more confidently and benefit more from +the migration. + +When using the 6R methodology, the migration paths are as follows: + +.. image:: ../_static/images/image44.png + + +The following sections cover the Rehost, Replatform, and Rearchitect +migration paths. + +.. 
toctree:: + :maxdepth: 1 + + rehost.rst + replatform.rst + rearchitect.rst + migration.rst \ No newline at end of file diff --git a/doc/caf/source/adopt/big-data-migration.rst b/doc/caf/source/adopt/big-data-migration.rst new file mode 100644 index 0000000..361d173 --- /dev/null +++ b/doc/caf/source/adopt/big-data-migration.rst @@ -0,0 +1,142 @@ +Big Data Migration +~~~~~~~~~~~~~~~~~~ + +Big data migration is a part of data migration and complies with the +cloud migration theory and project management logic. In terms of project +management, a big data migration plan contains **four phases**: + +- Business survey +- Migration solution design +- Migration implementation +- Migration assurance + +.. image:: ../_static/images/image60.png + +Migration Plan Design +^^^^^^^^^^^^^^^^^^^^^^^^^ + +In the **business survey phase**, the customer is the main party, and +T-Systems is the supporting party. They should work together to conduct the +survey and determine the customer's business status, including: + +- Customer's big data platform and services +- Physical deployment and data flows of the customer's big data + platform +- Big data assets, including resources, data, and permission + configurations + +In the **migration solution design phase**, T-Systems is the main party, +and the customer is the supporting party. They should work together to +design the migration solution and determine the migration content, +including: + +- The reconstruction and optimization needed to adapt to the + destination platform and how to ensure a smooth migration +- Whether to perform migration in different phases and the objectives + for each phase +- Cloud services involved in a platform migration +- Volume of the data to be migrated and the migration method to be used +- The customer's task scheduling system is the core for task migration +- The traffic switchover solution involves real-time data flows and + basic service capabilities such as Nginx and ELB. 
+- How to ensure a successful rollback + +In addition, required cloud resources should be planned and evaluated, +including: + +- Platform building resources +- Migration network bandwidth. This is closely related to the entire + cloud migration solution and involves cost estimation. The + networking on the cloud should also be planned, including how to + enable network connectivity and protect network security. + +In the **migration implementation phase**, the customer is the main +party in resource preparation and deployment, job system deployment and +process verification, data migration and verification, incremental data +migration and verification, parallel running, and performance tuning. + +The **migration acceptance phase** covers service cutover, verification, +and inspection; risk identification and handling; special training and +enablement; and formal handover. + +Example Migration Plan and Period +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The migration plan and period vary depending on the data volume, number +of tasks, components used, scheduling system, private line bandwidth, +and time period when data can be migrated. + +The following figure shows an example migration plan and period for a +medium-sized customer (data volume: x PB; computing resources: x +thousand cores). + +.. image:: ../_static/images/image61.png + +Big Data Migration Tools +^^^^^^^^^^^^^^^^^^^^^^^^ + +Open Telekom Cloud provides various migration tools for customers to choose +from, based on the scenario, data source, data volume, and requirements for +applications' responses to data. Customers can also choose open-source +or third-party migration tools. + +.. image:: ../_static/images/image62.png + +Open Telekom Cloud provides the following big data migration tools: + +CDM +*** + +CDM is an efficient and easy-to-use batch data migration service. 
It +provides easy-to-use migration capabilities and can integrate a wide +range of data sources into the data lake, reducing the data migration +and integration complexity and improving the efficiency. For details, +visit https://www.huaweicloud.com/intl/en-us/product/cdm.html. + +.. image:: ../_static/images/image63.png + +Kafka MirrorMaker +***************** + +It forwards streaming data to MRS-Kafka on Open Telekom Cloud in real time. It +is applicable to sequential messages as the consumer group has strict +requirements on the message sequence. + +Generally, the MirrorMaker process needs to be started on the +destination. The metadata of the source and destination Kafka clusters +must be the same and manually configured. + +DRS +*** + +DRS aims to enable database migration to the cloud without downtime. It +supports migration between homogeneous, heterogeneous, distributed, and +sharded databases. It also enables data integration and transmission +from databases to databases, data warehouses, and big data clusters in +seconds, laying a solid foundation for enterprise digital +transformation. For details, visit +https://www.huaweicloud.com/intl/en-us/product/drs.html. + +OMS +*** + +OMS helps you migrate data from the object storage on other clouds to +OBS on Open Telekom Cloud. + +OMS applies to the following scenarios: + +- Object migration: When you migrate typical web applications to Open Telekom + Cloud, OMS helps you easily migrate objects to OBS buckets on Open Telekom + Cloud. + +- Cloud disaster recovery: OMS allows you to replicate your objects to + OBS buckets on Open Telekom Cloud for disaster recovery and backup. + +- Object restoration: OMS allows you to use data backups from other + cloud service providers to quickly restore data on Open Telekom Cloud. + +For details, visit +https://www.huaweicloud.com/intl/en-us/product/oms.html. + +.. 
toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/adopt/data-management-and-analytics-platform.rst b/doc/caf/source/adopt/data-management-and-analytics-platform.rst new file mode 100644 index 0000000..2210322 --- /dev/null +++ b/doc/caf/source/adopt/data-management-and-analytics-platform.rst @@ -0,0 +1,150 @@ +Data Management and Analytics Platform +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Data Lakes +^^^^^^^^^^ + +A data lake is a new type of centralized repository that can store both +structured and unstructured data at any scale and does not require data +to be structured first. + +First-generation data lakes use the distributed architecture of the +open-source Apache Hadoop ecosystem. They use commodity hardware in local +data centers to allocate and process a large amount of raw data. The +Hadoop Distributed File System (HDFS) enables customers to store data in +its native form. Administrators of first-generation data lakes must keep +an eye on complex tasks such as capacity planning, resource allocation, +and performance optimization. Due to the complexity, slow realization of value, and +heavy system management workloads, many local data lake projects failed +to meet expectations. + +Next-Generation Data Lakes +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Next-generation data lakes are based on cloud object storage. The +cloud provides diverse high-performance, scalable, and reliable +analytics engines and huge economies of scale, making data lakes more +cost-effective and scalable. + +Open Telekom Cloud's next-generation data lakes are built on Object Storage +Service (OBS) and feature storage-compute decoupling. This means that +compute and storage resources can be scaled separately, preventing +unbalanced allocation of computing and storage resources on a single +node. + +A data lake is a big data platform that converges data sources in +various formats within an enterprise. It provides data and compute power +through strict data permissions and resource control. 
A data lake is big +with multiple small marts. The most notable characteristic of a data +lake is that one piece of data can be analyzed in multiple ways. + +The evolution of data lakes is divided into three phases: + +- Offline data lake: Data is imported to the data lake more than 15 + minutes after the data is generated. + +- Real-time data lake: Data is imported to the data lake in real time + (usually in less than one minute) or quasi real time (1 minute to 15 + minutes) after the data is generated. + +- Logical data lake: Data is integrated into a virtual data lake formed + by multiple physically isolated data platforms. + +.. image:: ../_static/images/image54.png + +Specialized data marts store data in specific formats for query and +analysis in specific scenarios. They are an important supplement to a +data lake. Customers may choose different data marts to meet their +varied data analysis requirements. For example, customers who have +ultra-high performance requirements may choose real-time OLAP or +in-memory databases, and those who prioritize their existing +applications' requirements may choose search databases. + +If customer data is used only for query and analysis of certain types, +specialized data marts do not depend on a data lake. However, such cases +are rare now, and specialized data marts are usually used together with +a data lake. + +Apart from the basic characteristics of data warehouses, data warehouse +marts have the following characteristics: + +- They are small and flexible, and can be organized in various ways, + such as by application, department, or region. +- Development is generally defined, designed, implemented, managed, and + maintained by business departments. +- They can be implemented quickly at a low cost. Investment can be paid + back in a short period of time. +- They integrate a wide range of tools. 
+ +To reduce costs, it is recommended that the source data and detail data +be stored in OBS and that the summary data be stored in GaussDB(DWS). + +Specialized data marts are classified into real-time and offline marts +based on application scenarios. Real-time marts are used together with +Kafka and Flink. A typical case in point is the mart for querying bank +transactions. + +Open Telekom Cloud FusionInsight Intelligent Data Lake +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This next-generation data lake takes full advantage of cloud-native +capabilities, such as fast deployment, auto scaling, almost infinite +scalability, cost-effective storage-compute decoupling, and Serverless +data analytics services. It aims to provide enterprises with a highly +scalable, available, and intelligent next-generation data lake +ecosystem, helping enterprises reduce O&M time and costs, and allowing +them to devote more resources to data analysis and business. + +.. image:: ../_static/images/image55.png + +FusionInsight provides extensive analytics services that adapt to all +types of data analysis scenarios and enable organizations of all sizes +and industries to reshape their business. Open Telekom Cloud provides +cost-effective and scalable dedicated services throughout data +collection, data management, data storage, data analysis, log analysis, +stream analysis, and machine learning (ML). + +If you want to use a big data platform for data +processing, you must integrate your data into the big data platform. +You can use different data integration tools based on the data type. +For example, you can use Data Ingestion Service (DIS) to import data +in real time, use Cloud Data Migration (CDM) to move massive amounts +of on-premises data to Open Telekom Cloud, and use Data Replication Service +(DRS) to migrate databases. + +DIS enables you to easily collect, process, and distribute real-time +streaming data so that you can quickly respond to new information. 
DIS +can be interconnected with a wide range of third-party data collection +tools and provides various cloud service connectors, agents, and SDKs. +DIS is applicable to scenarios such as device monitoring, real-time +recommendations, and log analysis in industries such as IoT, Internet, +and media. For details, visit +https://www.huaweicloud.com/intl/en-us/product/dis.html. + +For details about CDM and DRS, see section "4.2.4.4 Big Data Migration +Tools". + +.. tip:: + + It is recommended that you store the data migrated to the cloud in + OBS. If the data is small in size and needs to be processed in a + timely manner, you can also store it in HDFS. + +OBS is an object-based storage service that provides secure, reliable, +and low-cost data storage with an unlimited capacity. OBS provides +various storage types to meet customer requirements. For details, visit +https://www.huaweicloud.com/intl/en-us/product/obs.html. + +As for data computing, we provide different components for different +scenarios. You can use Data Ingestion Service (DIS) for stream +processing, MapReduce Service (MRS) or Data Lake Insight (DLI) for +offline batch processing, CloudTable for real-time query, +GaussDB(DWS) for interactive analysis or BI analysis, and Cloud +Search Service (CSS) for search. + +Big data analysis results can be used for enterprise management, +including report analysis, OLAP analysis, track mining, and user +tagging, helping enterprises make informed business decisions. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/adopt/data-migration.rst b/doc/caf/source/adopt/data-migration.rst new file mode 100644 index 0000000..ada655f --- /dev/null +++ b/doc/caf/source/adopt/data-migration.rst @@ -0,0 +1,38 @@ +Data Migration +-------------- + +With the expansion of the mobile Internet has come explosive growth in +data. Data forms and data processing requirements have also undergone +profound changes. 
In addition, application silos and data silos have +become the biggest obstacles to enterprises' digital transformation. The +main reasons for data silos include: + +- The information channels of different departments generate different + data storage formats. +- Departments define data based on their own business. As a result, + there is no standardized definition of data and the same data may be + given different meanings. + +In data governance, we may face challenges such as scattered resources, +data unavailability, and siloed applications. + +.. image:: ../_static/images/image53.png + +The following are the most urgent issues that enterprises need to +address: + +- Quickly integrating new and historical data to avoid information + silos +- Processing and analyzing various types of data with different value + densities in a cost-effective, efficient, and real-time manner to + meet business requirements +- Turning data into assets and paving the way for data-driven + innovation to stimulate business growth + +.. toctree:: + :maxdepth: 1 + + data-management-and-analytics-platform.rst + typical-data-lake.rst + big-data-migration.rst + diff --git a/doc/caf/source/adopt/index.rst b/doc/caf/source/adopt/index.rst new file mode 100644 index 0000000..c449c6c --- /dev/null +++ b/doc/caf/source/adopt/index.rst @@ -0,0 +1,8 @@ +Phase 3: Adopt +============== + +.. toctree:: + :maxdepth: 1 + + application-migration.rst + data-migration.rst \ No newline at end of file diff --git a/doc/caf/source/adopt/migration.rst b/doc/caf/source/adopt/migration.rst new file mode 100644 index 0000000..55893bd --- /dev/null +++ b/doc/caf/source/adopt/migration.rst @@ -0,0 +1,281 @@ +Migration Implementation +~~~~~~~~~~~~~~~~~~~~~~~~ + +Migration Team Establishment +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A large-scale migration project usually involves migration of a large +number of applications in a short period of time and complex +cross-application troubleshooting. 
The project management offices (PMOs) +will take the lead, and organize and arrange personnel from both parties +to carry out related work in an orderly and efficient manner based on +the project objectives. + +T-Systems and the customer each need to set up their own project teams based on +the experience of Open Telekom Cloud's Customer Success Engineers in cloud migrations. +The two teams will work together during the migration. + +The following figure shows the structure of the recommended migration team: + +.. image:: ../_static/images/image49.png + +- Project manager (PM): The PM sets up a joint project PMO to manage + project progress, identify and manage risks and other issues, and + promote the project within the organization. + +- Architecture and migration implementation team: The team designs + solutions for cloud migration and cutover, manages and implements + migration, and controls technical risks during the implementation. + +- Test team: The team tests solutions and performs function, + performance, and joint commissioning tests. + +- O&M team: The team creates, manages, and monitors cloud resources. + +- Migration and R&D support team (Open Telekom Cloud): The team processes + escalated technical issues. + +- Development team (customer): The team is responsible for application + development, deployment, and migration. + +Migration Assurance +^^^^^^^^^^^^^^^^^^^ + +- Project kick-off meeting: A formal project kick-off meeting is + initiated to specify the project scope, objectives, delivery period, + and responsibilities. + +- Project communication management: Regular communication is important + between project members and project stakeholders. Communication + exposes and helps address potential issues. + +- Project progress management: The execution of project activities and + key tasks is monitored to ensure that the project progresses as + planned. 
If there is a deviation from the schedule, corrective + measures will be taken and reported to project members and + stakeholders in the form of weekly or daily reports. + +- Issue and risk management: Project assumptions and risks are + continuously monitored to quantify risks. The risk management plan + may need to be updated from time to time and the implementation of risk + mitigation measures needs to be ensured. Issues and risks are + recorded and tracked, including details such as the issue owners and + the times when the issues were identified. These details then need to + be reported to project members and other stakeholders in the form of + weekly or daily reports. + +- Delivery matrix: A joint project delivery matrix covering Open Telekom + Cloud and the customer will be developed. + +Testing and Verification +^^^^^^^^^^^^^^^^^^^^^^^^ + +Before services are cut over to the cloud, functional and performance +tests will be performed to verify that the applications are stable in +the cloud environment. + +Service function testing +************************ + +The Open Telekom project personnel use the cloud resource list to make sure +all the required resources on Open Telekom Cloud are enabled. They initialize +environment configurations, deploy applications, and migrate some data +to perform joint commissioning tests. After the environment is deployed, +customers can use their own test cases to test applications and confirm +that services can run properly on the cloud. + +Performance testing +******************* + +Performance testing covers more than raw performance: it also includes load +testing, pressure testing, and stability testing. Performance testing +means continuously raising the access pressure on the system (in a +system test environment, the number of concurrent requests is constantly +increasing for the test program) to obtain system performance KPIs and +maximum load-bearing capacity. 
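As a minimal sketch of the ramp-up idea described above, the following Python snippet raises the number of concurrent requests step by step and reports the highest level that still meets a latency target. Everything in it is hypothetical and chosen only for illustration: the latency model, the assumed capacity of 400 concurrent requests, the 250 ms target, and all function names. In a real test, the measurements would come from a load-testing tool, not from a formula.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    concurrency: int
    latency_ms: float
    throughput_rps: float


def simulated_latency_ms(concurrency: int, capacity: int = 400) -> float:
    """Hypothetical stand-in for a measured response time.

    Latency starts near a 50 ms baseline and grows sharply as the number
    of concurrent requests approaches the assumed capacity.
    """
    if concurrency >= capacity:
        return float("inf")
    return 50.0 / (1.0 - concurrency / capacity)


def ramp_up(start: int, step: int, stop: int) -> list[StepResult]:
    """Raise the concurrency step by step and record KPIs for each level."""
    results = []
    for concurrency in range(start, stop + 1, step):
        latency = simulated_latency_ms(concurrency)
        # Little's law: throughput = concurrency / latency (in seconds).
        if latency == float("inf"):
            throughput = 0.0
        else:
            throughput = concurrency / (latency / 1000.0)
        results.append(StepResult(concurrency, latency, throughput))
    return results


def max_sustainable_load(results: list[StepResult], target_ms: float) -> int:
    """Return the highest tested concurrency whose latency meets the target."""
    passing = [r.concurrency for r in results if r.latency_ms <= target_ms]
    return max(passing) if passing else 0


if __name__ == "__main__":
    steps = ramp_up(start=50, step=50, stop=400)
    for r in steps:
        print(f"{r.concurrency:>4} concurrent: {r.latency_ms:8.1f} ms, "
              f"{r.throughput_rps:8.1f} req/s")
    print("max load within 250 ms:", max_sustainable_load(steps, target_ms=250.0))
```

The same loop structure applies when the latency figures are read from real test reports: the KPIs are recorded per load level, and the maximum load-bearing capacity is the last level that still meets the agreed target.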
+ +For performance testing, you can select JMeter or any other offering that +supports JMeter test projects or use third-party pressure test tools. + +.. tip:: + + JMeter (for simulating user traffic) and TCPCopy or GoReplay + (for recording or copying real user traffic) can be used to perform + pressure testing on all-link applications on the cloud to check whether + functions and performance meet requirements. + +During the test, test records and reports are generated based on the +monitoring metrics of Open Telekom Cloud monitoring systems (such as CES, AOM, +APM) and the customer's monitoring systems as well as application logs. + +After the pressure test is complete, the dirty data generated during the +test is deleted, and a full and incremental data migration is performed. +After the migration is complete, at an appropriate time, service traffic +is switched over to the application on the cloud. + +Service Cutover and Rollout +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Service cutover and rollout are the most critical steps in a cloud +migration. Any issues in this step may lead to major faults. To ensure +smooth service cutover and rollout, a detailed pre-cutover checklist +must be formulated. + +After the service cutover is complete, service running and data are +continuously monitored and observed until the services are stable on the +cloud. + +After the final data synchronization is complete and dirty data +generated during testing is deleted from the target environment on +Open Telekom Cloud, service cutovers are started in off-peak hours. Generally, +service cutover can be implemented all at once, or layer by layer. + +All at once +*********** + +For a simple or small-scale system, as long as a +full-service verification is performed first, the cutover can be +completed in a single operation. This kind of cutover is fast and has little +impact on ongoing services. + +The one-off cutover process is as follows: + +.. image:: ../_static/images/image50.png + +1. 
Stop the tests on the target and delete the test data. +2. Stop the services on the source. +3. Complete the incremental data synchronization. +4. (Optional) Configure reverse data synchronization to prepare for a rollback in case the cutover fails. +5. Modify the DNS configuration, switch the EIP, and switch traffic to the target. +6. Observe the service stability on the target. +7. Provide continuous assurance. + +Layer by layer +************** + +For complex services or large-scale systems, services +can be decoupled and cut over layer by layer. If there are any +problems, a rollback can be performed for an individual layer, which +reduces the risk of impacting services. However, a hierarchical +cutover like this requires multiple cutovers, which are +time-consuming and involve a heavy workload. + +Layer-by-layer migration consists of two steps. First, the application +layer is cut over to Open Telekom Cloud, but with the service data still being +read from and written to on the source. After the application layer +cutover is complete, a data migration or application dual-write is +performed to synchronize service data from the source to the target on +Open Telekom Cloud in real time. After the incremental data migration is +complete, the second step, the data layer cutover, will be executed. + +Layer-by-layer migration involves cross-cloud database access. The +network latency needs to be evaluated to see if it meets application +requirements. + +The layer-by-layer cutover process is as follows: + +.. image:: ../_static/images/image51.png + +Step 1: Application layer cutover ++++++++++++++++++++++++++++++++++ + +a. Stop the tests on the target and delete the test data. +b. Modify the configuration of the middleware layer on the target to + point to the data layer on the source (through Direct Connect or + VPN connections). +c. Modify the DNS configuration, switch the EIP, and switch traffic to + the target. +d. Observe the service stability on the target. 
+ +Step 2: Database layer cutover +++++++++++++++++++++++++++++++ + +a. Stop the services on the target. +b. Complete the incremental data synchronization. +c. (Optional) Configure reverse data synchronization to prepare for + rollback upon migration failure. +d. Modify the configuration of the middleware layer on the source to + point to the data layer on the target. +e. Start services on the target. +f. Observe the service stability on the target. +g. Provide continuous assurance. + +Cutover risks and service rollback +********************************** + +The rollback solution in the all-at-once cutover scenario is simple. +This section describes the rollback solution in the layer-by-layer +cutover scenario. + +.. image:: ../_static/images/image52.png + +Rollback solutions in the hierarchical cutover scenario are different +if: + +The data layer has not been cut over +++++++++++++++++++++++++++++++++++++ + +You can directly switch the DNS to switch traffic back to the source +system. + +The data layer has been cut over +++++++++++++++++++++++++++++++++ + +Step 1: Roll back the data layer +################################ + +a. Stop the services on the target. +b. Complete the reverse data synchronization. +c. Modify the configuration of the middleware layer on the target to + point to the data layer created after the reverse synchronization on + the source. +d. Start services on the target. +e. Observe the service stability. + +Step 2: Roll back the application layer +####################################### + +a. Modify the configuration of the middleware layer on the source to + point to the data layer created after the reverse synchronization on + the source. +b. Modify the DNS configuration, switch the EIP, and switch traffic to + the source. +c. Start services on the source. +d. Observe the service stability. 
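The rollback branches above can be written down as a small runbook helper. The following Python sketch is illustrative only: the function name `rollback_plan` and the wording of each step are assumptions that restate the lists above, and they are not part of any Open Telekom Cloud tooling.

```python
def rollback_plan(data_layer_cut_over: bool) -> list[str]:
    """Return the ordered rollback steps for a layer-by-layer cutover.

    Hypothetical helper: the steps paraphrase the procedure in this
    section and are meant to be carried out by operators, not by this code.
    """
    if not data_layer_cut_over:
        # Only the application layer has moved, so the data is still on the source.
        return ["Switch DNS so that traffic returns to the source system"]

    data_layer_steps = [
        "Stop the services on the target",
        "Complete the reverse data synchronization",
        "Point the target middleware at the source data layer",
        "Start services on the target",
        "Observe the service stability",
    ]
    application_layer_steps = [
        "Point the source middleware at the source data layer",
        "Modify DNS, switch the EIP, and switch traffic to the source",
        "Start services on the source",
        "Observe the service stability",
    ]
    # The data layer is rolled back first, then the application layer.
    return data_layer_steps + application_layer_steps


if __name__ == "__main__":
    for number, step in enumerate(rollback_plan(data_layer_cut_over=True), start=1):
        print(f"{number}. {step}")
```

Keeping the plan as data makes it easy to review the order of operations before the cutover window and to print it as a checklist during a rollback.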
+ +Assurance solution for seamless DNS switchover +********************************************** + +After a public domain name record is modified, it needs to be delivered +to all DNS servers around the world. Due to certain restrictions +inherent to international basic network infrastructure, the modification +takes effect within 2 hours in the Chinese mainland, but takes 48 hours +outside the Chinese mainland. During this time, the domain name may be +resolved to the original IP address, resulting in access exceptions. + +If the domain name is resolved to the original IP address because the +modification to the domain name resolution record has not taken effect, +you can deploy Nginx or iptables on the source and use the Nginx HTTP +proxy or the iptables NAT to forward traffic to Open Telekom Cloud, achieving +seamless switchover of domain name records. + +The following uses Nginx HTTP proxy as an example to describe the +procedure for configuring traffic forwarding: + +1. Before the switchover, deploy an Nginx reverse proxy server + behind the load balancer on the source to forward traffic to Open Telekom + Cloud ELB. +2. After the domain name is switched, bind the source domain name and IP + address to the load balancer. +3. The load balancer forwards the traffic to the backend Nginx server. + The backend Nginx server forwards the traffic to the EIP of the Open Telekom + Cloud ELB load balancer over the public network. +4. (Optional) Deploy traffic monitoring software (for example, ntop) on + the source Nginx server and view Nginx access logs. If no traffic is + generated and access logs are not updated, DNS resolution has been + switched over to Open Telekom Cloud. +5. Delete the load balancer and Nginx proxy server deployed on the + source. + +.. 
toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/adopt/rearchitect.rst b/doc/caf/source/adopt/rearchitect.rst new file mode 100644 index 0000000..ebfb625 --- /dev/null +++ b/doc/caf/source/adopt/rearchitect.rst @@ -0,0 +1,134 @@ +Rearchitect +~~~~~~~~~~~ + +Rearchitecting, also called **application refactoring**, involves +re-imagining how an application is designed and developed, typically +using cloud-native features, such as transitioning from monolith to +microservices. This is typically driven by a strong need to add +features, scale, or performance that would otherwise be difficult to +achieve in the application's existing environment. This strategy tends +to be the most expensive, but will better meet future expansion +requirements in the long run. Splitting monolithic applications into +microservices requires in-depth involvement of business personnel. + +Traditional Application Architecture Issues +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Common issues of traditional monolithic applications include but are not +limited to: + +- Low resource utilization, and complex deployment and O&M +- Coarse system granularity, complex scheduling, and poor scalability +- A lack of systematic application standards, snowflake servers + (servers that are fragile and hard to replicate due to environment or + component upgrades) +- A lack of unified or comprehensive application monitoring and O&M, + since O&M personnel are busy maintaining the underlying + infrastructure +- Complex application architecture +- Too many functional modules, an overly complex architecture +- Coupling between applications and states, which makes expansion + difficult +- Slow application iteration +- Complex application development where developers need to keep track + of all the details of an application architecture from service + governance (such as rate limiting, circuit breaker, and downgrade) + and data access, to message communication +- Command-based APIs that mean developers 
have to focus too much on + the small execution details +- Manual testing that decelerates application release +- Applications that iterate too slowly, so services cannot be developed + fast enough + +Cloud-Native Application Architecture +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As defined by Cloud Native Computing Foundation (CNCF), cloud native +technologies empower organizations to build and run scalable applications +in dynamic environments like public, private, and hybrid clouds. +Containers, service meshes, microservices, immutable infrastructures, +and declarative APIs exemplify this approach. + +Cloud-native application architectures involve: + +Microservices +************* + +In a microservice architecture, an application is broken down into +various small service components. This process simplifies the complex +application architecture. + +Containers +********** + +Containers facilitate the implementation of microservices and help +resolve the problem of low resource utilization. + +DevOps +****** + +Containers enhance the efficiency of software development and system +O&M, promote the maturity and development of the DevOps system, and help +resolve problems such as long application iteration period and complex +deployment and O&M. + +.. image:: ../_static/images/image47.png + +Application Reconstruction +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Application reconstruction involves microservices, containers, and +DevOps processes: + +- Microservice-based reconstruction includes, but is not limited to, + analyzing the current architecture, in-depth involvement of various + business departments, dividing microservices based on required + capabilities, defining interfaces between services, formulating + development specifications for microservice reconstruction, and + managing microservices. 
+- Container-based reconstruction includes but is not limited to + determining the reconstruction scope, identifying application + dependencies, creating container images, and orchestrating and + managing containers. +- DevOps-based reconstruction includes but is not limited to analyzing + R&D process, tools, and peripheral dependencies, identifying gaps, + selecting pilot applications, and conducting training and promotion + of agile principles. + +The following image shows a typical architecture where companies use VMs +to carry individual service applications and lack a unified service +governance tool and pipeline platform. As a result, resource utilization +is low, capacity expansion difficult, R&D inefficient, rollout slow, and +O&M expensive. + +.. image:: ../_static/images/image48.png + +Application reconstruction can bring the following benefits: + +- Microservice-based reconstruction: Monolithic applications can be + split into small, independent, light, and loosely coupled + microservice clusters based on the AKF Scalability Cube model, + front-end and back-end separation, and application and status + separation. Service mesh technologies, such as Istio, can be used to + manage microservices across programming languages. Applications, + databases, and middleware are deployed in multiple AZs. Traffic is + distributed through ELB. The underlying data and statuses are + synchronized, and applications are deployed in active-active HA mode. +- Container-based reconstruction: Applications are hosted in fast, + cost-effective containers that are decoupled from the underlying + operating systems. K8S-based CCE provides elastic scheduling for + automatic scheduling and self-healing and elastic capacity scaling on + demand, ensuring services run smoothly and resources are not wasted. +- DevOps-based reconstruction: Automatic pipelines and application + environment orchestration enable resource triggering within seconds + and environment deployment within hours. 
E2E O&M includes faster + logging, comprehensive monitoring, efficient alarming, and + comprehensive insights into the entire system. +- Self-organizing teams: Small teams complete service analysis, + development, testing, deployment, and O&M. They identify delivery + bottlenecks, and quickly verify and optimize services using + principles of lean production. + + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/adopt/rehost.rst b/doc/caf/source/adopt/rehost.rst new file mode 100644 index 0000000..2acc4ed --- /dev/null +++ b/doc/caf/source/adopt/rehost.rst @@ -0,0 +1,86 @@ +Rehost +~~~~~~ + +Rehost, also known as **lift and shift**, is the most common way to +migrate applications to the cloud without changing the running +environment of applications. It is usually used for Physical to Virtual +(P2V) and Virtual to Virtual (V2V) scenarios. It can help companies +quickly migrate applications such as SAP, ERP, and CRM to the cloud. + +Open Telekom Cloud provides three rehosting solutions: + +Application redeployment +************************ + +In this solution, applications can be redeployed on ECSs or BMSs. This +solution is ideal for stateless applications that do not involve data +migration. The OSs of cloud servers can be changed as needed, for +example, if an old OS is no longer supported. This solution is +recommended when a new OS is required, but this means the applications +will be offline for a time. + +Image import & export +********************* + +By exporting system images of source servers and then importing those +images to the cloud as private images, you can quickly create cloud +servers with the same OSs and other details as your legacy servers. This +solution is a good choice when you need to migrate on-premises servers +that do not have too much data on them. The servers will have the same +OSs before and after the migration, but there will be a fair bit of +downtime. 
+ +Server Migration Service (SMS) +****************************** + +SMS can migrate applications to the cloud and synchronize incremental +data to minimize the downtime. However, the OS cannot be upgraded during +the migration. + ++--------------+---------------------+---------------------------------+ +| Object | Migration Method | Pros and Cons | ++==============+=====================+=================================+ +| Virt | Redeployment | - Easy OS change | +| ual/physical | | | +| servers | | - Long downtime | ++--------------+---------------------+---------------------------------+ +| | Image import & | - OS consistency | +| | export | | +| | | - Long downtime | ++--------------+---------------------+---------------------------------+ +| | SMS | - OS consistency | +| | | | +| | | - Short downtime | ++--------------+---------------------+---------------------------------+ + +Take a typical three-layer application architecture as an example. The +following figure shows how the architecture is different before and +after the migration. + +.. image:: ../_static/images/image45.png + +Rehost has the following benefits: + +- The application architecture is consistent before and after the + migration, so you know the original technology stack still works. + Rehosting ensures the migration of your applications can go smoothly. +- If the databases were built using Open Telekom Cloud ECS, the database + licenses can be reused in commercial database scenarios to save + money. +- Applications are deployed across AZs, so you can configure DC-level + HA. +- With Open Telekom Cloud ELB and Auto Scaling, services can be flexibly + scaled to adapt to workload changes. +- ELB replaces traditional offline hardware load balancing devices and + the network ACLs replace traditional hardware firewalls, further + reducing the hardware investments required. +- The O&M is simpler. 
CES provides comprehensive O&M monitoring of + cloud infrastructure, and LTS provides quick collection and analysis + of application logs. +- The reliability is enhanced. CBR backs up cloud servers so that they + can be restored if server issues occur. +- The security is hardened. HSS protects cloud servers, WAF filters web + application traffic, and DBSS hardens cloud databases. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/adopt/replatform.rst b/doc/caf/source/adopt/replatform.rst new file mode 100644 index 0000000..16c48c7 --- /dev/null +++ b/doc/caf/source/adopt/replatform.rst @@ -0,0 +1,106 @@ +Replatform +~~~~~~~~~~ + +Replatforming involves upgrading an application from its existing +legacy platform to a more modern cloud platform. It means replacing +traditional application components (such as databases and middleware) +with Open Telekom Cloud services, but without changing the core architecture +of applications. + +For example, you can replace relational databases with +cloud database services from Open Telekom Cloud, replace self-built message +middleware with message queue services provided by Open Telekom Cloud, and +replace self-built cache databases with cache database services on +Open Telekom Cloud. This makes management less expensive and makes +applications more efficient and scalable. + +Open Telekom Cloud provides the following solutions to migrate customers' +self-built databases and middleware or those on third-party cloud +platforms: + ++------+------+-----+------------+----------------+-------------------+ +| Ob | Type | Sou | Target | Migration | Pros and Cons | +| ject | | rce | | Method | | ++======+======+=====+============+================+===================+ +| Data | SQL | S | OTC | Data | The target is | +| base | Se | elf | Cloud RDS | Replication | RDS, and the | +| | rver | -bu | for SQL | Service | downtime is a few | +| | | ilt | Server | (OTC Cloud) | minutes. 
| +| | | /DB | | | | +| | | aaS | | | | ++------+------+-----+------------+----------------+-------------------+ +| | M | | OTC | Data | The target is | +| | ySQL | | Cloud RDS | Replication | RDS, and the | +| | | | for MySQL | Service | downtime is a few | +| | | | | (OTC Cloud) | minutes. | ++------+------+-----+------------+----------------+-------------------+ +| | Po | | OTC | Data | The target is | +| | stgr | | Cloud RDS | Replication | RDS, and the | +| | eSQL | | for | Service | downtime is a few | +| | | | PostgreSQL | (OTC Cloud) | minutes. | ++------+------+-----+------------+----------------+-------------------+ +| | Mon | | OTC | Data | The target is | +| | goDB | | Cloud | Replication | RDS, and the | +| | | | Document | Service | downtime is a few | +| | | | Database | (OTC Cloud) | minutes. | +| | | | Service | | | ++------+------+-----+------------+----------------+-------------------+ +| Mi | R | S | OTC | DCS-Migration | The target is | +| ddle | edis | elf | Cloud | | DCS. | +| ware | | -bu | D | | | +| | | ilt | istributed | | | +| | | /Cl | Cache | | | +| | | oud | Service | | | +| | | se | (DCS) for | | | +| | | rvi | Redis | | | +| | | ces | | | | ++------+------+-----+------------+----------------+-------------------+ +| | | S | OTC | Redis-port | Offline export | +| | | elf | Cloud | | and import | +| | | -bu | D | | | +| | | ilt | istributed | | | +| | | /Cl | Cache | | | +| | | oud | Service | | | +| | | se | for Redis | | | +| | | rvi | | | | +| | | ces | | | | ++------+------+-----+------------+----------------+-------------------+ +| | K | S | OTC | MirrorMaker | Only data in | +| | afka | elf | Cloud | | Kafka clusters | +| | | -bu | D | | can be | +| | | ilt | istributed | | synchronized. | +| | | | Message | | Consumer groups | +| | | | Service | | or consumption | +| | | | (DMS) for | | progress cannot | +| | | | Kafka | | be synchronized. 
| ++------+------+-----+------------+----------------+-------------------+ + +Consider the following typical architecture: A company uses Kafka +message middleware to mask performance inconsistencies between front- +and back-end applications. This decouples the applications. They use a +Redis database cache for hot data and MySQL databases for core service +data. In a traditional IDC, the company needs to build its own +middleware and databases, implement HA deployment, backup and restore +solutions, and maintain corresponding components. Deployment can be +inefficient, O&M expensive, and capacity expansion difficult. + +.. image:: ../_static/images/image46.png + +Open Telekom Cloud provides cloud services that let companies deploy +middleware and database components on the cloud. These services simplify +middleware and database deployment and O&M. Companies can enjoy the +following benefits: + +- Instances can be provisioned in just minutes, so they can take + advantage of pay-per-use middleware and database services. +- Cloud services can be deployed in HA configurations. They can deploy + active and standby MySQL instances and Kafka/Redis clusters, and use + cross-AZ deployment for data center-level HA. +- There is pay-per-use cloud middleware, such as message middleware and + cache databases, and diverse instance specifications. Capacity + expansion is easy, so they can start small and grow big. +- They don't have to worry about O&M of middleware and databases, which + saves money on O&M. + +..
toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/adopt/typical-data-lake.rst b/doc/caf/source/adopt/typical-data-lake.rst new file mode 100644 index 0000000..f06f68f --- /dev/null +++ b/doc/caf/source/adopt/typical-data-lake.rst @@ -0,0 +1,142 @@ +Typical Data Lake Scenarios +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Data Warehouse and Report Analysis +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This is a traditional data warehouse mode in which a real-time data +warehouse is available and data mainly comes from databases. The data +warehouse aggregates data from different business systems (such as the +ERP, CRM, OA, and financial systems) and on-premises data centers, and +processes, governs, and gains insight into the data by layer. The data +warehouse eliminates data silos between departments, helps build a +decision-making and analytics system, and provides data for business +analysis and decision-making. + +.. image:: ../_static/images/image56.png + +This solution mainly uses GaussDB(DWS). It has the following +characteristics: + +- One-stop big data BI platform: a comprehensive and efficient data + collection, analysis, and BI platform that streamlines data from + multiple business systems and provides full-stack technical + capabilities. + +- Cost-effective data analytics foundation: Open Telekom Cloud + high-performance GaussDB(DWS) service and data synchronization + solution can be used to analyze massive amounts of data quickly to + unleash data value. + +- Efficient development and simple management: DGC provides visualized, + easy-to-use, flexible, and efficient development, management, and + scheduling of data ETL tasks. + +- Mature and reliable BI tools: Open Telekom Cloud cooperates with top BI + vendors to provide mature, reliable, flexible, and efficient + visualized BI tools, making operational analysis much easier and + accelerating value monetization. 
+ +Integrated Solution for Streaming and Batch Processing and Query +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This solution mainly applies to event tracking logs, which are commonly +used in user behavior analysis and content and product recommendations. +Event tracking logs are transferred to Flink through Kafka for real-time +processing. The dimension tables required by Flink are stored in +GaussDB(DWS) which allows millions of queries per second (QPS) for +Flink, stores the analysis results from Flink, and enables upper-layer +services to query the results. + +Spark is a batch processing engine. The report data processed by Spark +is stored in GaussDB(DWS) or OBS, and read using GaussDB(DWS) foreign +tables. + +.. image:: ../_static/images/image57.png + +This solution mainly uses GaussDB(DWS). It has the following +characteristics: + +- DLI-Flink supports stream processing and can efficiently process + event tracking logs. + +- GaussDB(DWS) supports efficient indexing and clustering and millions + of QPS for Flink dimension tables. + +- GaussDB(DWS) can read OBS data so that the data processed by Spark + can be queried using GaussDB(DWS) foreign tables, avoiding redundant + data storage. + +- GaussDB(DWS) enables query of fixed reports and self-service + analysis, and provides high-performance, high-concurrency, and + flexible report query capabilities. + +Real-Time Import of Data from Databases to a Data Lake +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Incremental data in databases can be imported to the data lake in real +time. + +DRS provides the CDC capability for TP/HTAP databases and can push +real-time incremental data to a standard Kafka cluster for consumption +in the data lake. + +.. image:: ../_static/images/image58.png + +1. The built-in Kafka cluster of Change Data Lake (CDL) can directly + connect to DRS and write data to Hudi, ClickHouse, and DLI in real + time. 
By default, DRS can dump data to CSS and OBS (low priority) in + real time for real-time search and AI training. CDL establishes a + link to DLI. DLI and MRS share CDL's capability to import data to the + data lake. CDL has embedded Spark scripts. You can import data directly to Hudi or + ClickHouse without writing SQL scripts. + +2. CDC imports data to the data lake through DRS and Kafka. + DIS/DMS provides a standard Kafka message pipe for big data services + such as MRS and DLI. DIS must support standard Kafka APIs. + +3. The database CDC imports data to GaussDB(DWS) in real time through + DRS and Kafka. + + - Direct import: This mode applies to scenarios where less than or + equal to 3,000 lines of data need to be synchronized per second. DRS + parses the real-time incremental data in the source database and + directly writes it to GaussDB(DWS). + + - Import through a buffer: This mode applies to scenarios where more + than 3,000 lines of data need to be synchronized per second. DRS + obtains real-time incremental data from the source database and + pushes the data to the backend Kafka message cluster. Then the + built-in GDS-kafka Connector of GaussDB(DWS) writes the data to + GaussDB(DWS) tables. + +Real-Time Import of Messages and Logs to a Data Lake +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +DIS allows you to build Serverless message clusters and collect and dump +data to OBS, DLI tables, and CloudTable. + +LTS provides application operations and O&M capabilities, such as log +collection, query, and analysis. In addition, LTS can dump log data to +data lake components such as OBS, DIS, and DLI tables for further +analysis. + +.. image:: ../_static/images/image59.png + +**Real-time import of streaming data such as messages to the data lake +through DIS** + +- DIS provides Serverless message clusters with message pipelines for + dumping messages to big data cloud services such as MRS and DLI. 
+ +- DIS can also collect and dump data to big data ecosystem services + such as OBS, DLI, and CloudTable at scheduled times. + +**Import of streaming data such as application logs to the data lake +through LTS** + +LTS provides log collection and O&M functions and can dump logs to +DIS/DMS, OBS, and DLI for further analysis. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/concluding-remarks.rst b/doc/caf/source/concluding-remarks.rst new file mode 100644 index 0000000..b440d21 --- /dev/null +++ b/doc/caf/source/concluding-remarks.rst @@ -0,0 +1,17 @@ +Concluding Remarks +================== + +The CAF white paper is a cloud migration strategy based on best +practices derived from Huawei Cloud customers' cloud migration cases and +on our own IT migration. It outlines four stages: migration plan, +cloud construction, application setup, and system governance and O&M. +CAF provides full lifecycle guidance for enterprises migrating services +to the cloud, including service plan, preparation, architecture, +organization, management, and O&M. It aims to help enterprises smoothly +migrate services to the cloud and ensure that services can run +efficiently on the cloud. In addition, the risks of migrating to and +using the cloud are reduced, while the value is increased. + +If you have any comments or suggestions while reading this white paper, +we welcome you to submit them through our official website. We will +keep working to improve. diff --git a/doc/caf/source/govern-and-manage/account-security-management.rst b/doc/caf/source/govern-and-manage/account-security-management.rst new file mode 100644 index 0000000..809af23 --- /dev/null +++ b/doc/caf/source/govern-and-manage/account-security-management.rst @@ -0,0 +1,160 @@ +Account Security Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This section describes common account types and management measures.
+ +Account types +************* + +- CBH accounts +- OS account +- Database accounts +- Cloud management platform account +- Business system account +- Tenant account provided by the cloud provider +- Other business-related accounts + +Roles and responsibilities +************************** + ++-------------------+--------------------------------------------------+ +| Role | Description | ++===================+==================================================+ +| Account applicant | The one who is applying for an account or | +| | permission (for themselves or others) | +| (Maintenance, | | +| development, or | | +| audit engineer) | | ++-------------------+--------------------------------------------------+ +| Account user | User of an account or permission | +| | | +| (Maintenance, | | +| development, or | | +| audit engineer) | | ++-------------------+--------------------------------------------------+ +| Account | Responsible for routine account and permissions | +| administrator | management, such as reviewing account creation | +| | and deregistration requests, managing passwords, | +| (System | and changing the owner of an account. | +| administrator) | | ++-------------------+--------------------------------------------------+ +| Account approver | Responsible for reviewing and approving account | +| | permissions applications. Their work includes: | +| (O&M manager) | | +| | - Checking whether the applications for | +| | accounts and permissions are authentic, | +| | necessary, and appropriate. | +| | | +| | - Regularly reviewing accounts and permissions. | +| | Revoke unnecessary accounts or permissions in | +| | a timely manner. | +| | | +| | - Ensuring the overall security of accounts and | +| | permissions. | ++-------------------+--------------------------------------------------+ + +Constraints +*********** + +- Employees can apply for permissions to access the production + environment only when necessary. 
+- For employees who have worked in your company for one month or + longer, but have not completed their probationary periods, their + supervisors shall assume responsibilities when they apply for system + permissions. +- Employees who have violated O&M regulations shall not be granted + long-term change permissions. Only one-time change permissions can be + granted. + +Validity period +*************** + +- The validity period of a permission depends on the role. If a + permission expires, you need to apply for it again. + ++--------------------+-------------------------------------------------+ +| Permission | Usage | +| Validity Period | | ++====================+=================================================+ +| 1 year | Long-term network O&M. Common applicants | +| | include: | +| | | +| | - O&M personnel | +| | | +| | - O&M manager | +| | | +| | - O&M platform developers responsible for | +| | long-term O&M tool development, system | +| | interconnection, and release | ++--------------------+-------------------------------------------------+ +| 3 months | Project development. Permissions shall be | +| | revoked after the project delivery. | ++--------------------+-------------------------------------------------+ +| 1 week | Short-term project delivery, acceptance, and | +| | issue tracking. | ++--------------------+-------------------------------------------------+ +| 1 day | Short-term access for incident/problem | +| | handling, issue escalation, and security check. | ++--------------------+-------------------------------------------------+ + +Application process +******************* + +1. The applicant fills out the application form, sends it to the account + approver, and CC's it to the account administrator. +2. The account approver checks whether the application is necessary and + appropriate, and then approves it or provides review comments. +3. 
If the application is approved, the account administrator creates an + account and sends it to the account user. +4. The account user changes the initial password before using the + account. + +.. note:: + To change the permissions of an account, you do not need the account + approver to approve the account again. Submit an online request or send + an email to obtain approval for the change. Then, send the proof of + approval to the account administrator. The account administrator will + assign permissions accordingly. + +Deregistration process +********************** + +1. The account applicant sends an email or submits an online application + for deregistration and explains the reason (such as resignation, job + transfer, or job responsibility change). +2. The account administrator deregisters the account and revokes its + permissions. +3. The account administrator records the deregistration results and + sends them to the account applicant, owner, and user. + +The account administrator regularly checks account validity periods, +reports expired accounts, and deletes accounts as needed after +conferring with related business managers and users. + +.. note:: + - Accounts that are valid for only one week or less will automatically + be deregistered upon expiry. + - For employees who have submitted resignation applications, ensure + their permissions are revoked before they leave the company. + +Precautions +*********** + +- Configure appropriate password complexity rules. A password must be + at least 8 characters long and must contain digits, letters, and + special characters. Do not allow common weak passwords or + easy-to-guess passwords. +- Do not allow users to lend their accounts to others. A user is + responsible for all the operations performed through their account. +- For accounts that have not been logged in to for 30 days, the account + administrator can deregister them after notifying the account users + by email.
+- If the account administrator finds that the password of an account + was leaked or compromised, they must immediately report the problem + to the account owner, reset the password and notify the account user, + and send a detailed report about this incident to the account owner. +- Passwords need to be changed regularly. A password cannot be used for + more than 90 consecutive days. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/backup-and-restoration.rst b/doc/caf/source/govern-and-manage/backup-and-restoration.rst new file mode 100644 index 0000000..add6cba --- /dev/null +++ b/doc/caf/source/govern-and-manage/backup-and-restoration.rst @@ -0,0 +1,127 @@ +Backup and Restoration +~~~~~~~~~~~~~~~~~~~~~~ + +Data security is a critical concern for most enterprises today, +especially when they migrate services to the cloud. +Huawei Cloud provides a comprehensive backup and restoration system to +guarantee data security. + ++------------------+---------------------------------------------------+ +| Object | Backup Policy | ++==================+===================================================+ +| ECS | Cloud Backup and Recovery (CBR) uses snapshots to | +| | back up entire ECSs, or you can configure it to | +| | back up the system disk and data disks separately | +| | if needed. You can configure a backup policy with | +| | whatever execution time and backup frequency you | +| | need. | ++------------------+---------------------------------------------------+ +| Database | - After the **automated backup policy** is | +| | enabled, a full backup is automatically | +| | triggered. Then, a full backup will be | +| | performed based on the time window and backup | +| | cycle specified in the backup policy.
| +| | | +| | - When an instance status is **backed up**, data | +| | is copied from the instance, compressed, and | +| | uploaded to an OBS bucket, and stored in OBS | +| | for as long as you specified in the backup | +| | policy. Amount of time required to complete | +| | the backup depends on the amount of data to be | +| | backed up. | +| | | +| | - After the automated backup policy is enabled, | +| | an **incremental backup** is automatically | +| | performed every 5 minutes to ensure data | +| | reliability. | ++------------------+---------------------------------------------------+ +| Key service data | You can set the backup cycle and retention period | +| (such as | for key service data based on the **file | +| configuration | importance** and **change frequency**. You are | +| files and key | advised to back up service data to OBS buckets. | +| documents) | | ++------------------+---------------------------------------------------+ + +Cloud Server Backup and Restoration +*********************************** + +The following figure shows the backup capabilities provided by Huawei +Cloud CBR for ECSs, BMSs, and EVS disks. + +.. image:: ../_static/images/image74.png + ++------------+----------------------------+---------------------------+ +| Category | Function | Feature | ++============+============================+===========================+ +| Backup | Cloud server backup | Backs up ECSs including | +| | | their EVS disks using the | +| | | snapshot technology. | ++------------+----------------------------+---------------------------+ +| | BMS backup | Backs up BMSs including | +| | | their EVS disks using the | +| | | snapshot technology. (The | +| | | BMSs must use EVS disks.) | ++------------+----------------------------+---------------------------+ +| | Cloud disk backup | Backs up cloud disks. 
| ++------------+----------------------------+---------------------------+ +| Recovery | Cloud disk restoration | Restores backup data to | +| | using backups | specified points in time. | ++------------+----------------------------+---------------------------+ +| | Cloud server restoration | Restores servers to | +| | using backups | specified points in time. | ++------------+----------------------------+---------------------------+ +| | Cloud disk creation using | Creates disks from | +| | backups | backups. | ++------------+----------------------------+---------------------------+ +| | Image creation using | Uses backups to create | +| | backups | images and then provision | +| | | ECSs. | ++------------+----------------------------+---------------------------+ + +Database Backup and Restoration +******************************* + +For service scenarios that demand HA, OBS (cold standby) or DRS (hot +standby) is deployed in different AZs in the same city to establish a +remote DR center and ensure geo-redundant data security. If both AZ1 and +AZ2 fail, the DR center can still ensure data security. + +.. image:: ../_static/images/image75.png + +If a database or table is maliciously or mistakenly deleted, the standby +database is also deleted and cannot be restored. Backups can protect you +from this sort of malicious or accidental operation. + +- Automated backup: Huawei Cloud database service creates automated + backups for the DB instance during the backup period and saves the + backups for a length of time determined by the configured backup + retention period. + +- Manual backup: Manual backups of DB instances are saved based on the + configured backup retention period. You can restore data to any point + in time within the backup retention period if needed. + +- Full backup: The system backs up all selected data even if no changes + were made to the data after the last backup. + +- Incremental backup: RDS automatically backs up data every 5 minutes.
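As a rough illustration of how the full and incremental backups described above can combine for a point-in-time restore, the following minimal sketch picks the most recent full backup taken at or before the target time and then the incremental backups that follow it. It is not the RDS or CBR API: the ``Backup`` class, the ``restore_chain`` function, and the ``"full"``/``"incremental"`` labels are hypothetical names chosen for this example, and the sketch assumes each incremental backup builds on the backups before it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Backup:
    """Hypothetical backup record; not a real cloud API object."""
    kind: str              # "full" or "incremental" (labels assumed for this sketch)
    finished_at: datetime  # time at which the backup completed


def restore_chain(backups, target):
    """Return the backups to apply, in order, to restore data as of `target`."""
    # Only backups completed at or before the target time can be used.
    usable = sorted(
        (b for b in backups if b.finished_at <= target),
        key=lambda b: b.finished_at,
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        raise ValueError("no full backup exists at or before the target time")
    base = fulls[-1]  # most recent usable full backup
    # Apply every incremental backup taken after the base, oldest first.
    increments = [
        b for b in usable
        if b.kind == "incremental" and b.finished_at > base.finished_at
    ]
    return [base] + increments


# Example: one full backup followed by incremental backups every 5 minutes.
t0 = datetime(2023, 8, 8, 2, 0)
backups = [Backup("full", t0)] + [
    Backup("incremental", t0 + timedelta(minutes=5 * i)) for i in range(1, 4)
]
chain = restore_chain(backups, t0 + timedelta(minutes=12))
```

In this example the chain contains the full backup and the two incremental backups that completed before the target time. With a 5-minute incremental interval, the data that cannot be recovered from backups alone is bounded by roughly that interval.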
+ +Backup data restoration is implemented by restoring the data backup +files stored in OBS to databases. You can use full and incremental +backups to restore data to a required point in time. In addition, the +new Huawei Cloud GaussDB-based database architecture (such as GaussDB +for MySQL), with decoupled storage and compute, provides backup and +restoration capabilities using snapshots. Backup and restoration are +significantly faster than with traditional logical or physical backups +(see the official website for more information). These databases can +ensure zero RPO for customer databases. + +Cloud Backup and Recovery (CBR) enables you to easily back up ECSs, +BMSs, EVS disks, SFS Turbo file systems, local files, and on-premises +VMware virtual environments. If there is a virus attack, accidental +deletion, or software or hardware fault, you can restore data to any +point in the past when the data was backed up. For details about CBR, +visit https://www.huaweicloud.com/intl/en-us/product/cbr.html. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/change-management.rst b/doc/caf/source/govern-and-manage/change-management.rst new file mode 100644 index 0000000..c13f28f --- /dev/null +++ b/doc/caf/source/govern-and-manage/change-management.rst @@ -0,0 +1,230 @@ +Change Management +~~~~~~~~~~~~~~~~~ + +Change management aims to standardize change activities (such as change +request, review, scheduling, implementation, and verification) in the +production environment. With clear requirements and thorough preparation, +it reduces the impact on services, ensures successful change activities and +a safe, stable production environment, maximizes system availability, and +helps meet SLAs.
+ +Roles and Responsibilities +^^^^^^^^^^^^^^^^^^^^^^^^^^ + ++------------------------+---------------------------------------------+ +| Role | Responsibility | ++========================+=============================================+ +| Change applicant | Submits change applications according to | +| | standards, checks the submitted | +| (Maintenance engineer) | information, verifies changes, and confirms | +| | results. | ++------------------------+---------------------------------------------+ +| Change manager | Reviews and assigns change applications, | +| | conducts communication, schedules | +| (O&M owner) | resources, and follows up with the change | +| | implementer to track the change progress | +| | and close the change applications. | ++------------------------+---------------------------------------------+ +| Change review team | Reviews and approves common change | +| | solutions. | +| (O&M owner and | | +| development owner) | | ++------------------------+---------------------------------------------+ +| Change implementer | Implements changes based on approved | +| | solutions. | +| (Maintenance engineer) | | ++------------------------+---------------------------------------------+ + +Change Classification +^^^^^^^^^^^^^^^^^^^^^ + +Emergency Changes ++++++++++++++++++ + +These changes are proposed when the production environment is, or is about +to be, unavailable. There is not enough time to evaluate or approve such +changes by the usual process. For example, an emergency change is +required when a defect in a new version directly affects user +operations. + +Major Changes ++++++++++++++ + +These changes are made during version upgrade and O&M of the service +system. If service interruption lasts longer than 30 minutes during the +change, notify affected users in advance. + +Normal Changes +++++++++++++++ + +These changes are made during version upgrade and O&M of the service +system.
If service interruption lasts less than 30 minutes during the +change, notify affected users in advance. + +Routine Changes ++++++++++++++++ + +These are production environment changes that are low-risk. The change +impact has been evaluated by O&M and development departments in advance, +procedures are standard, implementers have been determined, and change +effectiveness has been proven more than three times. This type of change +is performed according to standard operation guides and will not affect +services. Customers are unaware of such changes during upgrades. + +Change Process +^^^^^^^^^^^^^^ + +.. image:: ../_static/images/image76.png + +An applicant submits a change application ++++++++++++++++++++++++++++++++++++++++++ + +There are factors to determine before the change. If any are missing, +provide reasonable explanation. + +- Reason and purpose: Explain why the change is required and what the + change will do. If necessary, provide outline proof that the change + is feasible. + +- Risks: Describe the potential problems brought about by the change + and how to avoid them. Assess both current and related systems and + domains for risks. + +- Impact: Describe how the change will affect customers and other + domains. + +- Test report: If a test was previously conducted, attach its report to + the application. + +The change guide should detail the backup, implementation, rollback, and +verification schemes. If the change requires special implementation +times, the applicant should specify them in the application. + +The change manager reviews the application +++++++++++++++++++++++++++++++++++++++++++ + +This review includes the change time window, plan, and type. + +- Check that the change time fits the overall planning. If not, advise + the applicant to postpone the change or assist them in initiating the + emergency change process. + +- Check that the change type (emergency, major, or normal) is + reasonable and the change level is valid. 
+ +- Determine whether the implementation time is reasonable, and tell the + applicant to adjust it if necessary. + +The change review team reviews the change solution +++++++++++++++++++++++++++++++++++++++++++++++++++ + +- Check that the reason and purpose of the change are clear. + +- Check that the risks and measures are clear. + +- Check that the impact of the change is clear. For example, how users + and the production environment will be affected during a specific + period. + +- Check that the implementation process is valid, the procedure is + reasonable and clear, and the purpose is achievable. + +- Determine whether a test report is required for the change. If so, + check for an attached test report. + +- Check for any temporary backup, rollback, and verification before the + change. + +- **The change manager schedules the change.** + +- Check for any conflicts with other changes in time, content, and + components, and assist the applicant in checking whether the change + affects routine operations, shutdown, and backup. + +- Approve the change. + +The change implementer implements the change +++++++++++++++++++++++++++++++++++++++++++++ + +- Wait for approval before proceeding. + +- Use the approved solution for the change. + +- If an unexpected service degradation, interruption, or data loss + occurs during implementation, contact the change manager. Record and + handle the exception. + +- After implementing and verifying the change, report the actual + results and time taken. + +The change applicant verifies the change +++++++++++++++++++++++++++++++++++++++++ + +- Verify the change according to the plan and check whether the purpose + has been achieved. If it has not, report this to the change + implementer. + +- Complete the verification report. + +- If an unexpected service degradation, interruption, or data loss + occurs during verification, ask the change manager how to handle the + exception. Record the exception. 
+ +The change implementer rolls back the change +++++++++++++++++++++++++++++++++++++++++++++ + +- If the change result does not meet expectations, roll back the change + as planned. If services cannot be recovered within the expected time + window, escalate the problem by following the incident management + process. + +- If the change passes the verification or cannot be verified in the + time window, and needs a rollback, submit an application. + +The applicant reports the change results +++++++++++++++++++++++++++++++++++++++++ + +- Report the results promptly (ideally within two working days) after + the change is completed. + +- Close the work order if the change is successful. + +- If the change fails, analyze why and provide the causes and + improvement measures. + +KPIs +^^^^ + +Change success rate ++++++++++++++++++++ + +The percentage of changes in a month that were successful. Only changes +whose status is Successful when closed count. + +Formula: (Number of successful changes/Total number of changes) x 100% + +Service interruptions ++++++++++++++++++++++ + +The number of service interruptions caused by changes in a month. + +Formula: Number of service interruptions caused by all changes in a +month + +Service interruption change rate +++++++++++++++++++++++++++++++++ + +The percentage of changes in a month that interrupted services. + +Formula: (Number of service interruption changes/Total number of +changes) x 100% + +Emergency change rate ++++++++++++++++++++++ + +The percentage of changes in a month that were emergencies. + +Formula: (Number of emergency changes/Total number of changes) x 100% + +.. 
toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/cloud-based-om.rst b/doc/caf/source/govern-and-manage/cloud-based-om.rst new file mode 100644 index 0000000..dcb4339 --- /dev/null +++ b/doc/caf/source/govern-and-manage/cloud-based-om.rst @@ -0,0 +1,25 @@ +Cloud-based O&M +--------------- + +As services migrate to the cloud, traditional O&M also shifts to the +cloud. Cloud platforms provide abundant products, massive resources, +elastic scaling, E2E security, open APIs, and diversified billing. These +accelerate service development and reduce costs. How does cloud-based +O&M work, and how can enterprises select and maintain the right +resources? + +Cloud-based O&M does not mean simply transferring IDC capabilities to +the cloud. Industry surveys show that fewer than 20% of enterprises +fully utilize their cloud service capabilities. In addition to +maximizing their resources, they must maintain their cloud services, +make their data more secure, and quickly respond to changes and faults +to stay competitive in the digital landscape. + +.. toctree:: + :maxdepth: 1 + + trends-and-challenges.rst + multi-dimensional-om.rst + backup-and-restoration.rst + change-management.rst + emergency-handling.rst diff --git a/doc/caf/source/govern-and-manage/cost-center.rst b/doc/caf/source/govern-and-manage/cost-center.rst new file mode 100644 index 0000000..b9c86cf --- /dev/null +++ b/doc/caf/source/govern-and-manage/cost-center.rst @@ -0,0 +1,221 @@ +Cost Center +~~~~~~~~~~~ + +Cost Center is a cost management service provided by Huawei Cloud free +of charge. Cost Center can help collect information about Huawei Cloud +costs and usage, explore and analyze Huawei Cloud cost usage, and +monitor and track Huawei Cloud costs. 
+ +Cost Forecasting and Tracking +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Estimating and forecasting costs +++++++++++++++++++++++++++++++++ + +Business needs and cloud service expenditures change constantly, +which makes it difficult to precisely forecast costs. + +Before you use cloud services, you can use Huawei Cloud Price Calculator +to estimate their costs. You can view the costs and usage in **Cost +Analysis** of Huawei Cloud Cost Center and learn about the forecasted +costs in the following days (90 days at most) or months (12 months at +most). Forecasting is based on your historical costs and usage on Huawei +Cloud. + +Creating budgets to track cost and usage +++++++++++++++++++++++++++++++++++++++++ + +Budgets can be used to efficiently track costs. +After completing cost estimation and forecasting, you can create +different types of budgets on the **Budgets** page of Cost Center to +track your costs against the budgeted amount you specified and send +alerts to the recipients you configured if the thresholds you defined +are reached. You can also create budget reports, and Huawei Cloud will +periodically generate and send them to the recipients you configured on +a schedule you set. + +Cost Allocation +^^^^^^^^^^^^^^^ + +Accurate and effective cost allocation is conducive to cost +transparency and accountability in an enterprise. Cost transparency and +accountability are critical to accounting management of an enterprise. + +Determine how the costs are organized ++++++++++++++++++++++++++++++++++++++ + +Before performing accounting management, an enterprise needs to figure +out its cost allocation to ensure that the expenditures on Huawei Cloud +can be allocated across different organizations. + +The expenditures of an enterprise with multiple accounts can be +allocated across its member accounts. In addition, the enterprise can +use tags to attach organization information to resources, and the tag +information will be displayed in costs.
The tags can also be used to +identify the costs of different environments (such as production and +testing) or to identify different organizations, products, and owners. + +You can also activate the tags for enterprise accounts, including the +master account, in **Cost Tags** of Huawei Cloud Cost Center to help +analyze your Huawei Cloud costs and track your budgets. You are advised +to plan and activate cost tags as early as possible because they take +effect only for cost data generated after tag activation. + +Some costs cannot be directly grouped by tag, for example, the costs +for shared resources in an enterprise, cloud services that do not +support tags, and untagged resources. You can define rules to split +these costs across organizations in the enterprise evenly, +proportionally, or by a custom percentage. + +Original costs and amortized costs +++++++++++++++++++++++++++++++++++ + +There are two types of costs in Huawei Cloud Cost Center: + +- Original cost: reflects costs of cloud services purchased at the list + price with available discounts applied. + +- Amortized cost: reflects amounts (prepaid for yearly/monthly + services) amortized on a daily basis. For example, if you purchased a + one-year cloud service for ¥365 CNY, the amortized cost per day is ¥1 + CNY. + +Amortized costs are accrued expenses calculated based on accrual +accounting, and the costs need to be amortized across organizations of +the enterprise. + +To learn more about the rules for amortized costs, visit +https://support.huaweicloud.com/intl/en-us/usermanual-cost/costcenter_000002_01.html. + +Cost Analysis +^^^^^^^^^^^^^ + +Understanding the cost trends and cost drivers in an enterprise +is key to efficient cost management, control, and optimization.
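The daily amortization and cost-splitting rules described under Cost Allocation can be illustrated with a small Python sketch. It is not Cost Center's implementation or API: the function names `daily_amortized_cost` and `split_cost`, the organization names, and the weights are hypothetical, and the even-per-day spread is an assumption based only on the ¥365 example in the text. The authoritative rules are in the linked amortized-cost documentation.

```python
from datetime import date
from typing import Dict


def daily_amortized_cost(prepaid_amount: float, start: date, end: date) -> float:
    """Spread a prepaid amount evenly over the days in [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be later than start")
    return prepaid_amount / days


def split_cost(amount: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split a shared cost in proportion to the given weights.

    Equal weights give an even split; weights expressed as percentages
    give a custom-percentage split.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {org: amount * weight / total for org, weight in weights.items()}


if __name__ == "__main__":
    # The example from the text: a one-year service bought for 365 CNY.
    print(daily_amortized_cost(365.0, date(2023, 1, 1), date(2024, 1, 1)))
    # A shared cost split 30/70 between two hypothetical organizations.
    print(split_cost(100.0, {"dev": 30, "prod": 70}))
```

Using one proportional function for all three splitting rules keeps the sketch short: an even split is the special case where every organization has the same weight.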
+ +Analyzing the trends and distribution of costs and usage +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +**Cost Analysis** of Huawei Cloud Cost Center visualizes your original +costs or amortized costs for the last 18 months using various dimensions +and display filters so that you can analyze the trends and +drivers of your service usage and costs from a variety of perspectives +or within different defined scopes. An enterprise master account can +analyze the costs and usage of its member accounts. + +You can use Preconfigured Analysis Reports in Cost Center to analyze +your costs and usage. + ++-------------------------+--------------------------------------------+ +| Report Name | Description | ++=========================+============================================+ +| Monthly Costs by | Types of services with high original costs | +| Service Type | over the last six months | ++-------------------------+--------------------------------------------+ +| Monthly Costs by Linked | Linked accounts with high original costs | +| Account | over the last six months | ++-------------------------+--------------------------------------------+ +| Daily cost | Trend of daily original costs in the last | +| | three months and forecasted costs in the | +| | next month | ++-------------------------+--------------------------------------------+ +| Monthly Amortized Costs | Monthly amortized costs over the last six | +| | months | ++-------------------------+--------------------------------------------+ +| Pay-Per-Use ECS Monthly | Monthly original costs and usage of a | +| Costs and Usage | pay-per-use ECS over the last six months | ++-------------------------+--------------------------------------------+ + +If the preconfigured reports cannot meet your requirements, you can +create custom reports, in which you can choose a specific period of time, +the aspects measured, display filters, and cost types.
You can save your +analysis results in a custom report so that you can run the same +analysis again later if needed. + +You can export the preconfigured reports and custom reports as CSV +files. In addition, Huawei Cloud provides monthly or daily cost details +that carry tag information for in-depth analysis. + +Cost Optimization +^^^^^^^^^^^^^^^^^ + +The expenditures on cloud resources mainly depend on the billing rate +and resource usage. An enterprise can optimize its costs from these +aspects. + +Billing rate +++++++++++++ + +For the pay-per-use products you will use for a long term, you can view +the optimization option (changing the billing mode from pay-per-use to +yearly/monthly) in Cost Center to find opportunities to reduce costs. +You can analyze the usage of your pay-per-use ECS, EVS, and RDS +resources in Cost Center. Cost Center provides optimization options +based on these analyses, identifying places where you can save money by +changing the billing mode from pay-per-use to yearly/monthly. + +Cost Center allows you to learn about the actual utilization or coverage +of resource packages in a specific period to check whether your resource +packages are fully used or they are able to cover your pay-per-use +resource usage. You can adjust the purchase of resource packages in the +following period based on the analysis results. + +Resource usage +++++++++++++++ + +You can use CES to monitor the usage of cloud services to identify idle +resources or resources with low usage and release idle resources or +downgrade specifications of resources with low usage to reduce costs. +Note that you need to confirm with the business department to ensure +that releasing resources or downgrading specifications will not affect +services. + +Customers can optimize service technical solutions to improve resource +utilization, such as decoupled storage and compute and time-based +resource sharing, or use cost-effective instances. 
+ +Functions +^^^^^^^^^ + +Cost Center provides the following functions: + +- Cost Analysis: presents your cost data in graphs and tables. You can + view your Huawei Cloud costs and usage for different billing cycles + and drill down by service type, region, linked account, billing + mode, and cost tag. + +- Cost and Usage Forecasting: forecasts your future costs and usage + based on your historical costs and usage on Huawei Cloud. + +- Budget Management: allows you to configure budgets and stay informed + of how your costs and usage progress. + +- Report Management: allows you to save the cost analyses as reports so + that you can share them with the member accounts under your master + account. On the **Reports** > **Analysis Reports** page, you can use + some commonly used analysis reports preconfigured by Huawei Cloud. It + also allows you to track your budgeting on a regular basis after you + create budget reports on the **Reports** > **Budget Reports** page. + +- Cost Anomaly Detection: monitors the costs of your pay-per-use + resources to detect cost anomalies and reduce unnecessary + expenditures. + +Billing Modes +^^^^^^^^^^^^^ + +- Yearly/Monthly: Cost Center analyzes the usage of your pay-per-use + ECS, EVS, and RDS resources, and provides optimization options + (changing from pay-per-use to yearly/monthly) to find opportunities + to reduce costs. + +- Resource Packages: Cost Center allows you to view the analyses of + your resource packages to determine whether they are properly used + and whether costs can be reduced. + +- Cost Tag: identifies your Huawei Cloud resources. You can use the + cost tags to manage your resources and activate them to track your + costs. Cost tags can be used for cost analysis and budget management. + +..
toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/govern-and-manage/cost-management.rst b/doc/caf/source/govern-and-manage/cost-management.rst new file mode 100644 index 0000000..eaffb0f --- /dev/null +++ b/doc/caf/source/govern-and-manage/cost-management.rst @@ -0,0 +1,205 @@ +Cost Management +--------------- + +As their business expands, enterprises pay more attention to cost +management and resource utilization. This section mainly introduces +resource selection and the functions of Cost Center to help enterprises +optimize their costs. + +ECS cost optimization +********************* + +Selecting the right instance type and the right specifications for a +given application scenario and workload can help you reduce your ECS +costs. Selecting the right billing mode for a given service continuity +cycle can also help you save money. Specific practices include: + +Instance type selection ++++++++++++++++++++++++ + +Huawei Cloud provides a broad range of +instance types for different application scenarios. For example, +general-computing/memory-optimized instances are suitable for +scenarios such as websites, web applications, or medium and +light-load enterprise applications; high-performance +computing/disk-intensive/GPU instances are suitable for scenarios +such as high-performance computing, video encoding, and 3D +rendering. Selecting the right instance type for a given +application can help reduce costs. If you intend to use ECS in +e-commerce website operations, it is a best practice to use +general-computing or memory-optimized instances instead of +high-performance computing instances. They can be 40% to 50% less +expensive than high-performance computing instances of the same +specifications (for example, 8 vCPUs \| 16 GB). + +Instance specifications selection ++++++++++++++++++++++++++++++++++ + +Huawei Cloud provides dozens of +instance specifications for you to choose from.
You can choose +different specifications to optimize costs based on business +workloads. For example, if the traffic volume of an e-commerce +website is within 500,000 PV and the transaction volume is within +3,000 orders per day, it is a best practice to choose 4 vCPUs \| 8 +GB instead of 8 vCPUs \| 16 GB to save costs by 40% to 50%. + +Billing mode selection +++++++++++++++++++++++ + +Huawei Cloud offers three billing modes: +pay-per-use, monthly, and yearly subscriptions. The recommended +billing modes are as follows: + +- If the application duration is less than 20 days, such as short-term + testing and e-commerce promotions, it is a best practice to use + the pay-per-use billing. + +- If the application duration is more than 20 days but less than 10 + months, such as game online testing and operations, it is a best + practice to purchase monthly subscriptions. + +- If the application duration is more than 10 months, such as + enterprise website operations or government affairs and civil + information query and operations, it is a best practice to + purchase yearly subscriptions. + +EVS cost optimization +********************* + +EVS disk types +++++++++++++++ + +Huawei Cloud provides four types of EVS disks. +Extreme SSD is excellent for large-scale OLTP databases, NoSQL +databases, stream processing, and log processing. Ultra-high I/O is +designed for high-performance computing (HPC) and data warehouses. +General Purpose SSD is suitable for enterprise applications and +large- and medium-sized development and testing. High I/O is suitable +for office applications. For an e-commerce or enterprise website, +high I/O EVS disks are recommended as they are 65% less expensive +than ultra-high I/O EVS disks with the same capacity. + +EVS capacity +++++++++++++ + +Purchase the capacity you expect to use for the current +month, and leverage EVS capacity expansion capabilities to ensure +that the usage remains at 80%. 
Instead of purchasing the maximum +capacity you expect to use for the whole year (which may result in +average utilization of less than 50%), you can use monthly billing to +reduce your costs by 20% to 30%. To further reduce costs, +periodically check your account and delete any independent and +unnecessary EVS disks, which may have been created with ECSs that +have already been deleted. + +Billing modes ++++++++++++++ + +If your service has been running on Huawei Cloud for +some time and you have long-term EVS pay-per-use or monthly +subscriptions, you can reduce your costs by changing the billing mode +to yearly. The costs for EVS disks of a given capacity will be +reduced by at least 17%. In addition, migrating rarely used +non-critical or archived data to OBS can also greatly reduce your +costs. + +OBS cost optimization +********************* + +By OBS storage class +++++++++++++++++++++ + +Huawei Cloud provides three types of OBS storage classes, each of +different availability and durability levels: + +- Standard storage for frequently accessed data in scenarios such as + big data and hot videos + +- Infrequent Access storage for less frequently accessed data in + scenarios such as file synchronization and enterprise backups + +- Archive storage for rarely accessed data in scenarios such as data + archive and long-term backups + +Selecting the right object storage classes based on your business +needs can significantly reduce costs. For instance, you can store +enterprise data that needs to be backed up in the Infrequent Access +storage class, which is 45% less expensive than Standard storage. For +long-term backup, the Archive storage class will be a good choice, as +it is 78% less expensive than Standard storage. + +By billing mode ++++++++++++++++ + +Huawei Cloud offers resource packages with a range of capacity +specifications and durations for Standard storage. 
For data that will +be stored for a long time using Standard storage class, you can +purchase a yearly package based on the amount of data stored. Using a +package is 25% less expensive than pay-per-use pricing. + +EIP cost optimization +********************* + +The cost of bandwidth is an important factor to consider during cloud +service configuration, as it may account for up to 30% of the total +cloud resource costs. Huawei Cloud provides static and dynamic BGP +bandwidth. In most cases, static BGP bandwidth is enough. Financial or +gaming platforms, which tend to have demanding bandwidth requirements, +can use dynamic BGP bandwidth. The price of static BGP bandwidth is 20% +lower than that of dynamic BGP bandwidth. If your required bandwidth is +up to 5 Mbit/s, billing by fixed bandwidth monthly is more cost +effective than billing by traffic. If your required bandwidth is higher +than 5 Mbit/s, it is necessary to calculate which billing is more +cost-effective based on the bandwidth size and estimated bandwidth +usage. + +- If you need 10 Mbit/s of static BGP bandwidth billed on a pay-per-use + basis and the bandwidth usage is greater than 45%, it is more + cost-effective to be billed by bandwidth. + +- If you need 1 Mbit/s of static BGP bandwidth billed on a pay-per-use + basis and the bandwidth usage is greater than 18%, it is more + cost-effective to be billed by traffic. + +Using ELB to reduce bandwidth cost +********************************** + +Imagine a small game with a login and recharging zone, and two gaming +zones. Each of these three zones is deployed on different ECSs. Each ECS +has 10 Mbit/s of bandwidth, so they need a total of 30 Mbit/s of +bandwidth. Unfortunately, the bandwidth usages of the three ECSs differ +greatly from each other, so bandwidth utilization is low. In this case, +ELB can be used to balance traffic among the three ECSs. 
ELB could +maximize utilization to the point where 20 Mbit/s of bandwidth would be +enough, and their bandwidth costs could go down. + +Using CDN to reduce public network bandwidth usage and lower TCO +**************************************************************** + +If you are providing static content such as images, video files, and +other file downloads to Internet users through ECS or OBS, you can use +CDN to reduce traffic costs by 50% to 57%. Moreover, CDN offers a better +experience to your customers. If you use OBS and CDN together, you can +reduce costs by another 20%. + +Cost optimization with DES +************************** + +Consider a user who generates 35 TB of images every day that need to be +uploaded to OBS. If Direct Connect is used for data transmission, about +4 Gbit/s of bandwidth is required and the monthly cost is about ¥320,000 +CNY. DES can transmit 120 TB of data at a time, so newly generated images +can be transmitted every 3 days. The cost of 10 transmissions per month is +about ¥50,000 CNY, which is up to 80% less expensive than using Direct +Connect. + +Cost-effective big data storage-compute decoupling solution +*********************************************************** + +Compute and storage resources are decoupled and can be scaled +separately, avoiding unbalanced resource allocation on a single node. In +addition, Kunpeng computing power can be used, which can further reduce +computing costs. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/emergency-handling.rst b/doc/caf/source/govern-and-manage/emergency-handling.rst new file mode 100644 index 0000000..a66d2b2 --- /dev/null +++ b/doc/caf/source/govern-and-manage/emergency-handling.rst @@ -0,0 +1,160 @@ +Emergency Handling +~~~~~~~~~~~~~~~~~~ + +The process of receiving, handling, and escalating incidents must be +standardized to ensure that customer issues are handled at the promised +service level.
The incident handling responsibilities, time +requirements, and notification mechanism must be clearly defined. +Services must be quickly recovered to ensure the promised quality and +availability. + +Roles and responsibilities +************************** + ++--------------+-------------------------------------------------------+ +| Role | Responsibility | ++==============+=======================================================+ +| O&M engineer | - Receive customer incidents reported by hotline or | +| | email. | +| | | +| | - Record all information about received incidents, | +| | including contact methods of reporters, incident | +| | features, details, and time. | +| | | +| | - Diagnose and analyze incidents and provide | +| | solutions by referring to relevant documentation. | +| | For incidents that cannot be resolved, transfer | +| | them to developers or seek help from O&M leaders. | +| | | +| | - Hold first accountability to track incidents, | +| | record handling progress, and keep customers | +| | updated on the progress as required. | +| | | +| | - Demarcate incidents and provide solutions within a | +| | specified period. Transfer unresolvable incidents | +| | to developers within the specified time. | +| | | +| | - Close incidents that the reporters have confirmed | +| | resolved with the provided solutions. | +| | | +| | - Transfer reoccurring incidents and those with | +| | unknown causes or known defects to the issue | +| | management process. | +| | | +| | - Summarize lessons learned from typical and general | +| | incidents. | ++--------------+-------------------------------------------------------+ +| Developer | - Locate and analyze the causes of incidents | +| | transferred from O&M engineers, and resolve the | +| | incidents. | +| | | +| | - Follow up with incident owners to confirm | +| | resolution. | +| | | +| | - Locate causes of bugs in the production | +| | environment, and provide and implement | +| | comprehensive solutions. 
| ++--------------+-------------------------------------------------------+ +| Incident | - Coordinate and monitor the incident handling | +| manager | process. | +| | | +| (O&M leader) | - Coordinate resources for major incidents. | +| | | +| | - Review and approve solutions for major incidents. | +| | | +| | - Trace the output of major incident reports. | ++--------------+-------------------------------------------------------+ + +Incident severity and response +****************************** + +Level 1: incidents that have major impacts on services, such as serious +damage, data loss, service data or function errors (which cause multiple +customer complaints), and system faults recurring within a short period +of time. + +Level 2: incidents that have minor impacts on services, such as +unavailability of a few functions (service degradation), function +impairment on some users, data inconsistency (no financial loss), and +common system faults. + +Level 3: incidents that have no impact on services, such as data query +and consulting. Services are normal but experience is affected. + +Response and resolution time requirements vary by incident level. The +response timing is 24/7 and starts once an incident is reported. + ++----------------+----------------+-----------------+-----------------+ +| Incident Level | Response Time | Recovery Time | Resolution Time | +| | (minutes) | (hours) | (days) | ++================+================+=================+=================+ +| 1 | 10 | 2 | 7 | ++----------------+----------------+-----------------+-----------------+ +| 2 | 30 | 6 | 20 | ++----------------+----------------+-----------------+-----------------+ +| 3 | 60 | 24 | 60 | ++----------------+----------------+-----------------+-----------------+ + +Remarks: + +- The incident levels and response time above are only examples. Adjust + them as required. 
+ +- O&M engineers can transfer out incidents that they cannot resolve by + referring to relevant documentation within the specified time. + +- The response time is the maximum delay before an incident handler + starts handling an incident after receiving it. + +- The recovery time is the maximum duration needed to recover services + after an incident occurs. + +- The resolution time is the maximum duration taken by an O&M engineer + to resolve or transfer out an incident. + +Incident escalation and notification +************************************ + +The notification mechanism defined in the following table is for +incidents that are Level 1 and 2. + ++----------------------------+--------------+-------------------------+ +| Notification | Method | Recipients | ++============================+==============+=========================+ +| Initial notification | SMS and | (Send to) Service owner | +| | email | | +| (30 minutes) | | (CC) O&M team | ++----------------------------+--------------+-------------------------+ +| | Phone call | R&D leader | ++----------------------------+--------------+-------------------------+ +| Handling progress | SMS and | (Send to) Service owner | +| | email | | +| (1 hour) | | (CC) O&M team | ++----------------------------+--------------+-------------------------+ +| Fault rectification | SMS and | (Send to) Service owner | +| | email | | +| | | (CC) O&M team | ++----------------------------+--------------+-------------------------+ +| Escalation of overdue | SMS and | (Send to) Service owner | +| incidents | email | | +| | | (CC) O&M team, R&D | +| | | leader | ++----------------------------+--------------+-------------------------+ +| | Phone call | R&D leader | ++----------------------------+--------------+-------------------------+ + +If a Level-2 incident's impact on services and users worsens, escalate +it to Level 1 and then handle it with Level-1 standards. 
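The example time targets and the Level-2 escalation rule in this section can be expressed as a small lookup. The following Python sketch is illustrative only: the names `Sla`, `SLA_BY_LEVEL`, `overdue_targets`, `needs_notification`, and `escalate_if_worsened` are hypothetical and do not belong to any Huawei Cloud service, and the targets are the example values from the table, which should be adjusted as required.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Sla:
    response: timedelta
    recovery: timedelta
    resolution: timedelta


# Example targets taken from the incident level table; adjust as required.
SLA_BY_LEVEL = {
    1: Sla(timedelta(minutes=10), timedelta(hours=2), timedelta(days=7)),
    2: Sla(timedelta(minutes=30), timedelta(hours=6), timedelta(days=20)),
    3: Sla(timedelta(minutes=60), timedelta(hours=24), timedelta(days=60)),
}


def overdue_targets(
    level: int,
    reported_at: datetime,
    now: datetime,
    responded: bool = False,
    recovered: bool = False,
    resolved: bool = False,
) -> List[str]:
    """Return the names of the targets already missed at time `now`."""
    sla = SLA_BY_LEVEL[level]
    elapsed = now - reported_at  # timing starts once the incident is reported
    missed = []
    if not responded and elapsed > sla.response:
        missed.append("response")
    if not recovered and elapsed > sla.recovery:
        missed.append("recovery")
    if not resolved and elapsed > sla.resolution:
        missed.append("resolution")
    return missed


def needs_notification(level: int) -> bool:
    """The notification mechanism applies to Level 1 and Level 2 incidents."""
    return level in (1, 2)


def escalate_if_worsened(level: int, impact_worsened: bool) -> int:
    """A Level-2 incident whose impact worsens is handled as Level 1."""
    return 1 if level == 2 and impact_worsened else level
```

A monitoring job could call `overdue_targets` periodically and trigger the "Escalation of overdue incidents" notification whenever the returned list is not empty.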
+ +Precautions during incident management +************************************** + +- Record all incidents on the live network in the unified event + management system for analysis. + +- Respond to and resolve incidents within the specified time. + +- Analyze the handled incidents every month. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/index.rst b/doc/caf/source/govern-and-manage/index.rst new file mode 100644 index 0000000..42c47e3 --- /dev/null +++ b/doc/caf/source/govern-and-manage/index.rst @@ -0,0 +1,29 @@ +Phase 4: Govern & Manage +========================== + +This section is based on Huawei Cloud's industry experience +and practices. It is intended to enhance the availability of enterprises' +services on the cloud, reduce costs, and ensure the safety and reliability +of services, and aims to provide the following benefits for enterprises: + +- Professional capability construction: Provides guidance for + enterprises to deeply understand the cost management, security + compliance, and O&M governance of cloud services, and helps them + build professional management organizations and capabilities. +- Cost optimization: Optimizes the costs on the cloud through + reasonable resource selection and visualized cost management. +- Security compliance: Standardizes the security governance system by + referring to related security compliance and governance methodologies + to ensure secure running of services. +- Stability improvement: Identifies the potential risks, bottlenecks, + and availability problems for services based on the analysis and + governance methods for cloud O&M to continuously improve the + stability of the system. + +..
toctree:: + :maxdepth: 1 + + cost-management.rst + cost-center.rst + security-compliance-and-governance.rst + cloud-based-om.rst diff --git a/doc/caf/source/govern-and-manage/multi-dimensional-om.rst b/doc/caf/source/govern-and-manage/multi-dimensional-om.rst new file mode 100644 index 0000000..683b05a --- /dev/null +++ b/doc/caf/source/govern-and-manage/multi-dimensional-om.rst @@ -0,0 +1,245 @@ +Multi-Dimensional O&M +~~~~~~~~~~~~~~~~~~~~~ + +The O&M system is designed to provide high-quality IT services. It +monitors underlying resources, applications, user experience, as well as +overall system running. It enables personnel to quickly respond to +issues, ensuring stable service running. With the changes in service +formats, architectures, resource models, and call relationships, the +cloud-based O&M system must provide refined and large-scale management. +There are several typical O&M requirements. + +1. As service architecture evolves from monolithic and service-oriented + to microservice-based, the number of services increases + exponentially. Finer-granularity services are faster to iterate and + more refined to manage. However, many microservices also pose great + challenges to O&M in terms of management scale and timeliness. + +.. image:: ../_static/images/image66.png + +2. In a large cloud-based distributed application system, service call + relationships are complex. The call relationships, call quality, and + latency in each phase need to be visualized so that problems can be + detected and resolved in time. + +.. image:: ../_static/images/image67.png + +3. After applications undergo microservice reconstruction, underlying + resources are no longer hosted in physical machines and VMs; instead, + they are hosted in containers or even run in serverless mode. Rapidly + increasing service refinement and scale also mean exponential + resource growth. When different types of resources are combined in + services, O&M is even more complex. 
Personnel need refined and + large-scale management capabilities to monitor resource usage in + time. + +.. image:: ../_static/images/image68.png + +4. Currently, applications use layered architectures, typically + consisting of the web, service logic, data access, and database + layers. To obtain required data, a request may go through multiple + layers. O&M personnel need to be able to drill down links to monitor + inter-layer access quality and serial/parallel relationships during + troubleshooting. + +.. image:: ../_static/images/image69.png + +5. Complex cloud resources and services are complicated by many accounts + and operators, as well as frequent service changes. To ensure system + stability, O&M personnel must have strong resource management, + control, and audit capabilities to quickly detect abnormal operations + and locate causes. + +Huawei Cloud has built a multi-dimensional O&M system for cloud +applications. It integrates AOM, APM, log collection, and monitoring. It +monitors VMs, storage devices, networks, databases, and applications in +real time, and uses technologies such as application and resource alarm +correlation, log analysis, intelligent threshold, distributed tracing, +and mobile app exception analysis to quickly diagnose and rectify faults +within minutes, ensuring long-term and stable running of cloud +applications. + +.. image:: ../_static/images/image70.png + +Multi-dimensional O&M monitors the infrastructure, application, and user +experience layers, and also provides logs and auditing. + +CES +*** + +CES is a multi-dimensional resource monitoring service. Monitor +resources, set alarm rules, and improve resource utilization and +performance. + +AOM +*** + +AOM is a one-stop cloud operations management platform for problem +management, monitoring, security, and performance. Streamline cloud +operational processes and effectively manage cloud hardware, software, +services, and networks. 
+ +APM +*** + +APM consists of application and frontend monitoring. Manage the +performance of your distributed applications, container environments, +browsers, applets, and apps. With full-stack performance monitoring and +E2E full-link tracing and diagnosis, application management is easy and +efficient. + +APM has several features. + +Full-link Topology +++++++++++++++++++ + +Clear Call Relationships, Easy Error Identification, and Convenient Resource Drill-down + +The full-link topology displays the call relationships and dependencies +between applications, including application status, latency, errors, and +loads. View multiple open-source components, such as databases, caches, +message middleware, and NoSQL, filter information by time, service, +transaction, top SQL statistics, or other metrics, and drill down +resources to locate faults. + +.. image:: ../_static/images/image71.png + +Tracing ++++++++ + +Bottleneck Identification and Fault Locating in Minutes + +APM traces and records service calls, and displays the distributed +system's request execution tracks and statuses. When a service method is +called, the caller, detailed stack, and parameters of the method are +automatically captured for fast fault locating. + +.. image:: ../_static/images/image72.png + +Transaction Analysis +++++++++++++++++++++ + +APM analyzes service flows on the service side in real time and displays +key metrics such as throughput, error rate, and latency. Abnormal +transactions trigger alarm reporting. Application Performance Index +(Apdex) is used to evaluate user satisfaction with applications. You can +view and trace the topology of any transaction with poor user experience +to locate the cause. An e-commerce application is used as an example +here. + +.. image:: ../_static/images/image73.png + +LTS +*** + +LTS collects log data from hosts and cloud services. 
By processing +massive amounts of logs efficiently, securely, and in real time, LTS +provides useful insights for you to optimize the availability and +performance of your services and applications. It also helps you make +quick decisions, better manage and maintain devices, and analyze service +trends. + +The following table shows open-source O&M products and their +corresponding Huawei Cloud products. + ++------------+---------+------+---------------------------------------+ +| Category | Common | Hu | Function | +| | Tool | awei | | +| | | C | | +| | | loud | | +| | | Ser | | +| | | vice | | ++============+=========+======+=======================================+ +| Infr | Zabbix | ` | Monitors and generates alarms for | +| astructure | | CES | infrastructure and instances. | +| O&M | | `__ | | ++------------+---------+------+---------------------------------------+ +| | ELK | ` | Collects and analyzes logs, generates | +| | | LTS | alarms, and archives logs. | +| | | `__ | | ++------------+---------+------+---------------------------------------+ +| Applica | Prom | ` | Monitors applications and cloud | +| tion-layer | etheus+ | AOM | resources in real time, analyzes | +| O&M | Grafana | `__ | | ++------------+---------+------+---------------------------------------+ +| A | Zi | ` | Monitors and manages application | +| pplication | pkin/Pi | APM | performance and faults in real time, | +| p | npoint/ | `__ | | ++------------+---------+------+---------------------------------------+ +| Service | Service | ` | Monitors and analyzes behavior and | +| monitoring | system | LTS | logs on the service side. | +| | O&M | `__ | | ++------------+---------+------+---------------------------------------+ + +.. 
toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/personnel-security-management.rst b/doc/caf/source/govern-and-manage/personnel-security-management.rst new file mode 100644 index 0000000..2e3f0d8 --- /dev/null +++ b/doc/caf/source/govern-and-manage/personnel-security-management.rst @@ -0,0 +1,99 @@ +Personnel Security Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This section describes how to enhance security through personnel +management. + +Recruitment +*********** + +- Recruited personnel need a basic understanding of the technologies + involved and of security management. Arrange a test on this knowledge + before formal appointment. +- Check the identities, backgrounds, and qualifications of recruited + personnel. Archive related materials. +- Test recruited personnel's technical skills. +- Introduce the roles and responsibilities to recruited personnel and + arrange job training. +- Sign a confidentiality agreement with recruited personnel to forbid + information leakage. + +Staff position change or resignation +************************************ + +- When an employee leaves their position, revoke their system + permissions. +- When an employee resigns, they need to submit any passwords they have + been using for company devices. After their resignation, the + administrator shall change the passwords as soon as possible. +- Resigning employees shall follow a strict resignation procedure in + the HR department and sign a non-disclosure agreement. + +Staff appraisal +*************** + +- Regularly conduct training focused on improving security skills and + awareness. Managers, regular employees, and third-party managers and + users all participate if necessary. Ensure they can identify + information security threats and risks, and can comply with security + policies. +- Conduct strict, comprehensive security audits for personnel in key + positions. 
For system administrators, review their identities and how + they are fulfilling job responsibilities. Review their system + permissions and job responsibilities, and their compliance with + non-disclosure regulations. For other key personnel, check their + identities and evaluate their fulfillment of job responsibilities. +- Punish those who violate the security policies or regulations. For + minor violations, a warning and a requirement to write a formal + apology are appropriate. For major violations, relevant departments + need to investigate the person's legal responsibilities. If the + person responsible for a minor violation comes from a third party, + ask them to correct their mistakes and inform their company. + +Security awareness training +*************************** + +- Regularly conduct security training. Ask employees to learn about + network security and ensure they understand relevant policies and + regulations. Ensure employees understand that they will be held + responsible even if they meant no harm and even if they violated + security regulations unintentionally. +- Accurately define the security responsibilities of every position and + the punishments for different violations. Arrange meetings to explain + important or complex regulations if necessary. +- Arrange publicity activities to help employees learn about network + security community operations and common violations, for example, by + making short videos. +- Develop security training plans. Conduct training on basic + information security knowledge and job procedures. +- Record security training in detail and archive the records. +- Ask employees to sign the Information Security Commitment Letter and + promise to abide by the company's network security policies and + regulations. + +Security capability training +**************************** + +- Establish a network security training system based on the industry + best practices. 
Arrange security capability training during different + stages, for example, during new employee orientation, during their + regular work week, or before promotions. Ensure employees are capable + of delivering secure products, solutions, and services. +- Basic network security training: Develop training plans for different + roles and positions. For example, new employees must pass on-the-job + training and exams on network security and privacy protection before + they become regular employees. On-the-job employees need to take + courses as required by their positions. Managers must participate in + network security training and seminars. +- Targeted training: Identify typical security problems in the product + development process and those responsible for the problems. Recommend + security training programs (including cases, training courses, + exercises, etc.) to them. +- Drills: Adopt industry best practices, develop a platform for network + security drills, and arrange adversarial role-playing exercises. + Improve employees' security capabilities through practice. +- Incorporate network security requirements in the acceptance criteria + for jobs and promotion. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/security-compliance-and-governance.rst b/doc/caf/source/govern-and-manage/security-compliance-and-governance.rst new file mode 100644 index 0000000..5f41cbf --- /dev/null +++ b/doc/caf/source/govern-and-manage/security-compliance-and-governance.rst @@ -0,0 +1,55 @@ +Security Compliance and Governance +---------------------------------- + +An increasing number of security and compliance laws and regulations are +being enacted all over the world. Companies that fail to meet these +regulatory requirements may face various penalties and suffer +significant losses. 
+ +In accordance with the Cloud Service Cybersecurity & Compliance Standard +(3CS) of Huawei Cloud, the following measures shall be taken: + +- **Develop governance strategies,** including your organization's + security governance goals, roles and responsibilities, executives' + commitment, security governance priorities, and core KPIs. Ensure the + governance system can be efficiently and effectively implemented and + continuously improved. + +- **Incorporate security control measures into management + processes**, so that business departments can better understand and + implement these measures in their routine work. + +- **Use tools to facilitate security and compliance governance,** + because some measures involve massive workloads. For example, if the + responsibilities of a job are changed, all the related account + permissions must be modified within 24 hours. To defend against + threats in a timely manner, an organization must develop advanced + tools, or use the security products or solutions of cloud service + providers. Huawei Cloud provides 20+ proprietary security services + and 200+ partner security services for you to choose from. + +- **Set up a governance organization** and assign a director to + implement governance strategies for network security and privacy + protection. + +- **Enhance data security.** Focus on eliminating security risks + throughout the data lifecycle. Avoid just emphasizing data security + everywhere with no clear focus. + +- **Use metrics to evaluate information security.** Determine the + dimensions of your security evaluation, collect the required + information, and develop metrics based on the records and statistics + generated for the security management activities in your + organization. The metrics need to be calculable and reflect the key + points in your organizational security governance. + +The following sections describe security compliance and governance in +more detail: + +.. 
toctree:: + :maxdepth: 1 + + security-management-organization.rst + personnel-security-management.rst + security-pmi.rst + account-security-management.rst diff --git a/doc/caf/source/govern-and-manage/security-management-organization.rst b/doc/caf/source/govern-and-manage/security-management-organization.rst new file mode 100644 index 0000000..e557ad8 --- /dev/null +++ b/doc/caf/source/govern-and-manage/security-management-organization.rst @@ -0,0 +1,192 @@ +Security Management Organization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Set up a security management organization and define its working +processes and the roles and responsibilities of its members. + +A security management organization consists of an information security +committee and an information security team. The information security +team consists of a security administrator, a network administrator, an +application administrator, a data administrator, and the users. The +following figure shows a sample organization structure. + +.. image:: ../_static/images/image64.png + +Responsibilities of the information security committee +****************************************************** + +- Implement information security guidelines and policies, and review + and approve the information system security construction plan. +- Make decisions on major issues related to information system + security. +- Study and review systems, standards, and related policies in the + construction and management of information system security, and + coordinate relevant departments to supervise the implementation of + systems and policies. +- Arrange and guide activities to promote information security. + +Information security team responsibilities +****************************************** + +- Implement government requirements and regulations on information + security. +- Manage information security for business departments. 
+- Be responsible for the security management system of the information + system, including system construction, technical support, and + operational regulations. +- Handle the daily management of, and regularly improve, the disaster + recovery system. Formulate and regularly revise standards for + evaluation of a disaster recovery system. +- Monitor the status of the disaster recovery system, organize drills, + audit and evaluate the system, and provide suggestions for + improvement. +- If a major problem occurs in an information system, help related + departments identify the cause and take immediate measures, such as + initiating an emergency procedure. +- Raise awareness of information system security through training or + publicity. +- Manage, inspect, and report information system security. +- Prevent system-wide malicious code and manage network security. +- Work with external security agencies. Obtain support from external + security agencies in case of major security incidents. +- Develop and improve information system security plans. +- Manage, arrange, and supervise the security measures taken on + information systems. +- Build, operate, and manage the O&M system and technical support + platform. +- Arrange the formulation and implementation of the information system + security O&M regulations. +- Arrange responses to faults and emergencies. +- Manage data utilization, query, and visualization. +- Manage operations in the DR center. +- Follow the instructions of the information security committee. +- Develop information system security regulations, and technical + assurance and operation regulations. +- Regularly check information security status. If there are security + incidents, report them in a timely manner, and work with related + personnel in investigations and for troubleshooting. +- Conduct training to raise security awareness among internal + employees. 
+ +Security administrator responsibilities +*************************************** + +- Be responsible for the security-related affairs of information + systems, and assist in supervising the implementation and + modification of the security system. +- Be responsible for the implementation of the security regulations. +- Develop and improve information system security plans. +- Make a list of security equipment, systems, or other related assets. + The list should include the departments, categories, IDs, names, + importance, and locations of the assets. +- Standardize security equipment or system management. Record asset + owners and users, security equipment or system identifiers, and + actions taken on the assets, such as purchase, use, change, and + retirement. +- Develop O&M procedures and related documentation for security + equipment or systems, so that relevant personnel can operate the + system in a standardized manner, avoiding information security + incidents caused by misoperations. +- Manage O&M of security equipment or systems, including regular + inspections, maintenance, troubleshooting, and change management. +- Assist managers in security training. Develop annual security + training plans. Raise security awareness through training and + publicity. +- Develop a security inspection plan. Arrange inspections at least once + a quarter. The plan should describe inspector responsibilities, + inspection frequency, inspection scope and content, problem + rectification, and inspection report. Conduct security inspections + and report results. Collect, sort, and archive related materials. +- Report security incidents in time. Assist with troubleshooting and + rectification. +- When a major problem occurs in the operation of the information + system, assist the relevant departments in correctly determining the + cause, and immediately take safety measures to initiate relevant + processing procedures according to the instructions. 
+- Work with external security agencies. Obtain their support when there + are incidents. +- Summarize security documents and forms semi-annually, and report the + progress made in implementing security regulations. + +Network administrator responsibilities +************************************** + +- Be responsible for network-related O&M, review, and changes, and + archive related documents and data. +- Manage network security development. Check whether the owners are + fulfilling their responsibilities, keeping up with the plan, and + meeting task requirements and quality standards. +- Standardize the network management process. Record asset owners and + users, security equipment or system identifiers, and actions taken on + the assets, such as purchase, use, change, and retirement. +- Adopt technical and management measures to enhance the security + control over information systems. Continuously improve network + security and stability, and logically isolate the external network of + the information system from the Internet. Review and record network + access. Arrange training about information system security. +- Develop O&M procedures and related documentation for security + equipment or systems, so that relevant personnel can operate the + system in a standardized manner, avoiding information security + incidents caused by misoperations. +- Assist security administrators in deploying network security + products. Manage the O&M of security equipment or systems, including + regular inspections, maintenance, troubleshooting, and change + management. +- Implement multi-level approval regulations for network system + changes, important operations, and access. Review routine + applications. Report major changes to the Information Security + Department. +- Arrange emergency responses to network faults or incidents. 
+ +Application administrator responsibilities +****************************************** + +- Manage O&M, application review, and changes in the application + system. Archive the documents and data related to the application + system. +- Manage software development. Check whether the owners are fulfilling + their responsibilities, keeping up with the plan, and meeting task + requirements and quality standards. Set information security objectives during + project establishment and approval to enhance the security of the + system design and development process. Enhance code security. For + outsourced software development, sign a confidentiality agreement + with the service provider. During delivery acceptance, invite a + third-party security organization to evaluate product security. +- Standardize the management process of the information system. Record + asset owners and users, security equipment or system identifiers, and + actions taken on the assets, such as purchase, use, change, and + retirement. +- Develop O&M procedures and related documentation for important + applications, so that relevant personnel can operate the system in a + standardized manner, avoiding information security incidents caused + by misoperations. +- Standardize the asset management process of the application system. + Record asset owners and users, security equipment or system + identifiers, and actions taken on the assets, such as purchase, use, + change, and retirement. +- Manage routine security measures regarding password management, + authorization, and approvals in the application system, so that + employees can better comply with information security regulations. +- Implement multi-level approval regulations for application system + changes, important operations, and access. Review routine + applications. Report major changes to the Information Security + Department. +- Arrange emergency responses to network faults or incidents. 
+ +User and data administrator responsibilities +******************************************** + +- Provide technical support for large-scale application systems. Monitor + and analyze daily operational data. Identify opportunities and pain + points in business operations. +- Conduct in-depth analysis and mining of customer demand models and + data. Generate metrics-based warnings. +- Manage and monitor accounts. +- Ensure the database is running properly. +- Identify and eliminate background problems and risks in a timely + manner. +- Fine-tune the system. +- Plan and implement backup strategies. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/security-pmi.rst b/doc/caf/source/govern-and-manage/security-pmi.rst new file mode 100644 index 0000000..04fbbb0 --- /dev/null +++ b/doc/caf/source/govern-and-manage/security-pmi.rst @@ -0,0 +1,242 @@ + +Security PMI +~~~~~~~~~~~~ + +Develop Preventive Maintenance Inspection (PMI) standards. Define the +monitoring and alarm handling process for services or systems after they +are migrated to the cloud. Describe the key monitoring points, +remediation actions, and issue escalation process to minimize the impact +of security incidents on services. + +PMI Check Items +^^^^^^^^^^^^^^^ + +Network security PMI +******************** + +- Check for ports accessible from the Internet. For example, check the + configurations of ELB and NAT Gateway. Ensure there are no unsafe + ports accessible from the Internet, and no resources directly bound + to an EIP. Unsafe ports include internal management ports (for + example, 22 and 3389) and high-risk ports (135, 139, and 445 for TCP; + 137 and 138 for UDP; or other ports that can be exploited by + EternalBlue or other ransomware). +- Check intranet connections. Ensure the network ACL, security group, + or other network security configurations are clearly defined, only + necessary ports are open, and arbitrary access to ports is denied. 
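The port check in the list above can be partly automated. The following is a minimal sketch, assuming inbound rules have already been exported into a simple in-memory structure. The ``InboundRule`` class and the function names are hypothetical illustrations, not part of any cloud SDK; only the port numbers come from the checklist itself.

```python
from dataclasses import dataclass
from ipaddress import ip_network

# Unsafe ports named in the checklist, keyed by protocol.
UNSAFE_PORTS = {
    "tcp": {22, 3389, 135, 139, 445},
    "udp": {137, 138},
}


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, simplified inbound rule; not a real cloud API object."""
    protocol: str      # "tcp" or "udp"
    port_min: int
    port_max: int
    source_cidr: str   # for example "0.0.0.0/0"


def is_internet_source(cidr: str) -> bool:
    """Treat a zero-length prefix (0.0.0.0/0 or ::/0) as 'the Internet'."""
    return ip_network(cidr, strict=False).prefixlen == 0


def find_exposed_unsafe_ports(rules):
    """Return sorted (protocol, port) pairs that are unsafe and Internet-reachable."""
    exposed = set()
    for rule in rules:
        if not is_internet_source(rule.source_cidr):
            continue
        protocol = rule.protocol.lower()
        for port in UNSAFE_PORTS.get(protocol, ()):
            if rule.port_min <= port <= rule.port_max:
                exposed.add((protocol, port))
    return sorted(exposed)


if __name__ == "__main__":
    sample_rules = [
        InboundRule("tcp", 443, 443, "0.0.0.0/0"),     # HTTPS only: fine
        InboundRule("tcp", 20, 25, "0.0.0.0/0"),       # range includes SSH (22)
        InboundRule("tcp", 3389, 3389, "10.0.0.0/8"),  # RDP, but intranet only
        InboundRule("udp", 100, 200, "::/0"),          # range includes 137 and 138
    ]
    for protocol, port in find_exposed_unsafe_ports(sample_rules):
        print(f"{protocol}/{port} is reachable from the Internet")
```

A check like this only covers rules that are open to every address. Narrower but still overly broad source ranges, and resources bound directly to an EIP, still need manual review.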
+ +System security PMI +******************* + +- Vulnerability management: Scan for vulnerabilities, fix them in a + timely manner, and track the vulnerabilities until they are fixed. + Our vulnerability management functions can help you detect + vulnerabilities in Linux, Windows, and Web-CMS. +- Baseline checks: Detect unsafe settings based on security baselines, + including weak password policies, common weak passwords, and unsafe + configuration items. +- Asset management: Inventory all the assets on your servers, including + accounts, open ports, processes, web directories, software, and + auto-started items. +- Intrusion detection: Use HSS to check for malicious programs, + suspicious processes, web shells, abnormal shells, unsafe accounts, + and rootkits. Work with related personnel to eliminate risks in a + timely manner. + +Application security PMI +************************ + +- Defense against common attacks: Ensure your business systems have + configured web firewall policies to monitor and block common web + application attacks, such as SQL injections, XSS, file inclusion, + directory traversal, sensitive file access, command/code injection, + Trojans, and web shells. +- Service access control: Meet with business personnel and determine + the scope of allowed access sources. Minimize this scope by + configuring WAF, VPN, and CBH; blocking IP addresses in certain + regions; and creating an IP address blacklist and whitelist. +- Common vulnerability monitoring: Scan for and fix vulnerabilities in + a timely manner. +- Web page tamper-proofing: Configure static Web Tamper Protection + (WTP) in HSS to protect website root directories, preventing + attackers from uploading scripts and launching reverse shells. +- Penetration testing: Manually check for and fix logical + vulnerabilities in a timely manner. 
+- Account management: Work with business experts to review all the + application accounts and revoke unnecessary permissions, especially + those for authentication, authorization, account, and audit (4A) + systems. Enable multi-factor authentication (MFA) and delete zombie + accounts. + +Data security PMI +***************** + +- Data access permission control: Review all the database accounts and + permissions, delete test accounts and other unnecessary accounts, and + retain only the least required permissions for accounts. +- Database protection: Deploy database audit and protection equipment + to filter, monitor, and audit database operations in real time. +- O&M plane checks: Prevent insiders from using O&M accounts to access + the internal network. Ensure CBH, VPN, and other systems used for O&M + access require two-factor authentication (2FA) and strong passwords. + This can prevent account cracking, which may result in intrusions + into a large number of servers. +- Account security checks: Ensure that the length and complexity of + account passwords meet security requirements. Manage permissions at a + fine granularity. Allow administrators to access only the resources + related to their jobs. +- Backups: Configure backup mechanisms for important business systems. + Use data backups, server backups, and snapshots to restore services + if there is an emergency. + +PMI Report Template +^^^^^^^^^^^^^^^^^^^ + ++-----------+---------------+------------------------------------------+ +| Category | Check Item | Security Suggestion | ++===========+===============+==========================================+ +| Account | Password | Password should be 10 characters or | +| password | complexity | longer and contain at least one digit, | +| | | uppercase letter, lowercase letter, and | +| | | special character. Passwords cannot be | +| | | the same as the username or the username | +| | | spelled backwards. 
| ++-----------+---------------+------------------------------------------+ +| | Account | Account lockout time: 30 minutes | +| | lockout | | +| | policy | Account lockout threshold: 5 invalid | +| | | logins | +| | | | +| | | Reset account lockout counter after: 30 | +| | | minutes | ++-----------+---------------+------------------------------------------+ +| | Password | The password needs to be changed | +| | change policy | regularly, at least once every 90 days. | +| | | Notify users 7 days before their | +| | | passwords expire. | ++-----------+---------------+------------------------------------------+ +| | Account not | Delete any accounts not in use. | +| | in use | | ++-----------+---------------+------------------------------------------+ +| Network | Network ACL | After a system is built, the system | +| | | development team shall provide a | +| | | configuration list. The O&M personnel | +| | | shall check for security risks in the | +| | | ACL. | ++-----------+---------------+------------------------------------------+ +| | Security | After a system is built, the system | +| | group | development team shall provide a | +| | | configuration list. The O&M personnel | +| | | shall check for security risks in | +| | | security groups. | ++-----------+---------------+------------------------------------------+ +| | Port | After a system is built, the system | +| | | development team shall provide the list | +| | | of necessary ports. The O&M personnel | +| | | shall check for and disable unnecessary | +| | | ports accordingly. | ++-----------+---------------+------------------------------------------+ +| | Network | Check for malicious traffic and analyze | +| | traffic | their statistics. Ask cloud service | +| | | providers to assist the analysis if | +| | | necessary. | ++-----------+---------------+------------------------------------------+ +| Operation | Account | Delete or disable accounts not required | +| | | for business operations. 
| +| system | | | ++-----------+---------------+------------------------------------------+ +| | Account | Use strong passwords. | +| | password | | ++-----------+---------------+------------------------------------------+ +| | Session | Generally, set it to 15 minutes. | +| | timeout | | ++-----------+---------------+------------------------------------------+ +| | Default | Disable default shared folders and drive | +| | sharing | letters. | ++-----------+---------------+------------------------------------------+ +| | Process | After a system is built, the system | +| | | development team should provide the list | +| | | of necessary process. The O&M personnel | +| | | shall check for and shut down | +| | | unnecessary processes, such as | +| | | alsasound, cups, fbset, nfs, postfix, | +| | | rpcbind, smbfs, snmpd, splash, | +| | | splash_earl, etc. | ++-----------+---------------+------------------------------------------+ +| | System | Scan for and fix vulnerabilities. | +| | vulnerability | | ++-----------+---------------+------------------------------------------+ +| | System log | Check system logs and analyze suspicious | +| | | events and attacks (if any). | ++-----------+---------------+------------------------------------------+ +| | Messenger | Stop the service and disable it. Enable | +| | service | it only when necessary. | ++-----------+---------------+------------------------------------------+ +| | Remote | Stop the service and disable it. Enable | +| | registry | it only when necessary. | +| | service | | ++-----------+---------------+------------------------------------------+ +| | TCP/IP | Stop the service and disable it. Enable | +| | NetBIOS | it only when necessary. | +| | Helper | | +| | service | | ++-----------+---------------+------------------------------------------+ +| | Wireless | Stop the service and disable it. Enable | +| | configuration | it only when necessary. 
| +| | service | | ++-----------+---------------+------------------------------------------+ +| | Error | Stop the service and disable it. Enable | +| | reporting | it only when necessary. | +| | service | | ++-----------+---------------+------------------------------------------+ +| | Help and | Stop the service and disable it. Enable | +| | support | it only when necessary. | +| | service | | ++-----------+---------------+------------------------------------------+ +| | Telnet | Stop the service and disable it. Enable | +| | service | it only when necessary. | ++-----------+---------------+------------------------------------------+ +| | Print spooler | Stop the service and disable it. Enable | +| | service | it only when necessary. | ++-----------+---------------+------------------------------------------+ +| | Computer | Stop the service and disable it. Enable | +| | browser | it only when necessary. | +| | service | | ++-----------+---------------+------------------------------------------+ +| | Themes | Stop the service and disable it. Enable | +| | service | it only when necessary. | ++-----------+---------------+------------------------------------------+ +| | Telephony | Stop the service and disable it. Enable | +| | service | it only when necessary. | ++-----------+---------------+------------------------------------------+ +| M | System log | Back up system logs for auditing. | +| onitoring | backup | | +| Audit | | | ++-----------+---------------+------------------------------------------+ +| | Monitoring | A system shall provide monitoring | +| | information | capabilities, so that users can learn | +| | | the overall system status by checking | +| | | system monitoring items, database | +| | | monitoring items, business system | +| | | metrics, and network status. | ++-----------+---------------+------------------------------------------+ +| Database | Database | Delete default and unnecessary database | +| | account | accounts. 
| ++-----------+---------------+------------------------------------------+ +| | Database | Use strong passwords. | +| | password | | ++-----------+---------------+------------------------------------------+ +| | Database file | Minimize file and folder permissions. | +| | access | Allow data to be written only by | +| | permission | database administrators and the user | +| | | that runs the database process. | ++-----------+---------------+------------------------------------------+ +| Web | Web | Scan for and fix web vulnerabilities. | +| Service | vulnerability | | ++-----------+---------------+------------------------------------------+ +| | Website log | Check website logs and analyze | +| | | suspicious events and attacks (if any). | ++-----------+---------------+------------------------------------------+ + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/govern-and-manage/trends-and-challenges.rst b/doc/caf/source/govern-and-manage/trends-and-challenges.rst new file mode 100644 index 0000000..220beae --- /dev/null +++ b/doc/caf/source/govern-and-manage/trends-and-challenges.rst @@ -0,0 +1,84 @@ +Trends and Challenges +~~~~~~~~~~~~~~~~~~~~~ + +O&M Trends +^^^^^^^^^^ + +As cloud computing and AI become more popular, cloud-based O&M evolves +with them. Key trends include: + +1. Artificial Intelligence for IT Operations (AIOps). As data volume and + environment complexity grow rapidly, O&M is increasingly powered by + AI and big data. +2. Cloud-native, microservice, containerization, and distributed + technologies mean that labor-intensive O&M no longer meets enterprise + needs. O&M systems must be automated to better track and locate + faults. +3. Private cloud, public cloud, and multi-cloud are inevitable choices + for many enterprises, so they need a cross-cloud O&M assurance + system. +4. The scope of O&M increases in addition to maintaining systems. 
+ Enterprises also need overall planning, HA capability, security + system construction, and R&D and product enablement. + +Challenges of Cloud-based O&M +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Cloud-based O&M differs greatly from traditional O&M. + +Model +***** + +- Traditional O&M: Enterprises need to build most capabilities on their + own. They directly manage computing, network, storage, and other + resources. They bear full responsibility for, and have full control + of, their resources. However, capability construction is slow and difficult. +- Cloud-based O&M: Cloud services provide standard basic capabilities. + Based on these services, enterprises build required architectures and + O&M capabilities. They use software interfaces or open APIs to manage + abstract resources. Enterprises share responsibility with cloud + vendors and do not need full awareness of underlying resources. + As a result, capability construction is fast and flexible. + +Scope +***** + +- Traditional O&M focuses on maintaining IDC equipment rooms. O&M + environments are rarely changed and therefore familiar to + enterprises. +- Cloud-based O&M varies by IT development and industry requirement. + Environments include offline IDC, private cloud, public cloud, or + even multi-cloud. Increasingly complex O&M environments require + higher O&M control and skills. + +Overall capability +****************** + +- The cloud provides many more products and functions than traditional + IDCs. Although O&M personnel are not tasked with underlying + maintenance, they need to be familiar with the various cloud products + for better selection and usage. + +Security risks +************** + +- Cloud resources are mostly logically isolated, which is riskier than + physical isolation. Therefore, proper security planning is crucial. + For example, cloud storage must be encrypted to protect data. +- The cloud architecture solution is flexible. 
Most management is based + on open APIs, and resource access is by running single commands. + Therefore, security needs to be controlled during external access + design. In addition, strict audit is recommended for risky commands. + +Troubleshooting +*************** + +- Cloud service faults are more difficult to locate. Once microservices + are deployed, the relationship between services becomes more complex. + Therefore, cloud-based O&M is key to locating faults. +- Some faults need to be located with the cloud service support team. + Therefore, in-depth communication with the cloud service support team + is needed. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/index.rst b/doc/caf/source/index.rst index 771ec97..6f4d0e7 100644 --- a/doc/caf/source/index.rst +++ b/doc/caf/source/index.rst @@ -1,3 +1,71 @@ -=== -CAF -=== +Cloud Adoption Framework +======================== + +Digital technologies, the digital economy, and new forms of digital competition are key initiatives for governments and enterprises across the globe to improve their digital services and cloud adoption. +Before migrating workloads to the cloud, governments and enterprises need to identify and prioritize the workloads. +They also need to know the value being brought by the cloud and ensure organizational capabilities. + +Open Telekom Cloud Adoption Framework (OTCAF) aims to: + +* Help organizations that need to go to the cloud clearly define the plan, strategies, methods, and best practices for cloud adoption so that they can systematically prepare for cloud adoption, and govern and manage services on the cloud. +* Help IT, finance, and security teams determine cloud adoption methods and governance to establish required capabilities. +* Help various roles achieve business objectives, and support organizations in building digital competitiveness for business success. 
+ +The OTCAF is based on references of our experts, partners and customers collected during +their own digital transformation journey, combined with our industry best practices. +It is divided into *four* phases as follows: + +.. image:: _static/images/image4_1.png + +| + +Phase 1: Plan ++++++++++++++++ + +Clarify the motivation for cloud adoption at the management level. It aims to define a cloud adoption blueprint, a migration plan and implementation approach, and organizational assurances. The scope is to address the cloud adoption strategy and convert the goals into action plans. + +Phase 2: Ready +++++++++++++++ + +Build a Landing Zone and design the cloud architecture in preparation for the cloud adoption. +Landing Zone is used for unified management and governance of people, finances, resources, permissions, and security compliance of multiple business units. +The cloud architecture may include IaaS, PaaS and SaaS capabilities, cloud management, +and O&M capabilities, providing features such as high availability, robust scalability, security compliance, +and cost-effectiveness. + +Phase 3: Adopt +++++++++++++++ + +Migrate applications and data to the cloud in a proper order, and innovate services on the cloud. + +Phase 4: Govern & Manage +++++++++++++++++++++++++++ + +Perform cost management, security compliance and governance, and cloud operation and maintenance (O&M) to ensure cost-effective, efficient, secure, and stable services running on the cloud. + +Let's see these phases one by one: + +.. toctree:: + :maxdepth: 1 + + plan/index.rst + ready/index.rst + adopt/index.rst + govern-and-manage/index.rst + concluding-remarks.rst + +Intended Audiences +++++++++++++++++++ + +For any organisation, large or small, public or private, migrating workloads to the cloud is a very big +endeavor that requires the support and collaboration of various departments, people and skills.
The +Open Telekom Cloud Adoption Framework is here to guide these diverse stakeholders at every step +of the way. The audiences that can benefit from OTCAF include, but are not limited to, +the following: + +- Cloud architects +- Business and technology decision makers +- Product owners +- Legal and finance +- Information technology specialists in various fields (e.g. administration, networking, security, governance) + diff --git a/doc/caf/source/plan/cloud-adoption-blueprint.rst b/doc/caf/source/plan/cloud-adoption-blueprint.rst new file mode 100644 index 0000000..02b9576 --- /dev/null +++ b/doc/caf/source/plan/cloud-adoption-blueprint.rst @@ -0,0 +1,74 @@ +Cloud Adoption Blueprint +========================== + +The adoption blueprint specifies the cloud migration strategy, scope, +and objectives of key capability development: + +Cloud migration strategy +************************ + +- Determine the cloud deployment model: **public cloud** or **hybrid + cloud**. Public cloud is more suitable for agile services and optimized + costs. Hybrid cloud is a good choice for ensuring low service latency + and enhanced security. + +- Define a **single-cloud** or **multi-cloud** strategy. A single-cloud + strategy helps build leading capabilities while a multi-cloud + strategy allows you to leverage the best capabilities of different + cloud providers. + +High level of collaboration +*************************** + +- Vertical collaboration between central and local governments (states + or provinces), as well as between enterprise headquarters and + branches, including business, data, resource, and cloud-edge + collaboration. + +- Horizontal collaboration includes hybrid cloud collaboration and DR + across regions.
+ +Cloud migration architecture and key capabilities +************************************************* + +Complete the blueprint for cloud migration by layer and module in terms +of IaaS, PaaS, application enablement, application or SaaS, cloud +reliability, cloud security, cloud O&M, and cloud operations, including: + +- **Cloud infrastructure:** public cloud and hybrid cloud, multi-cloud + management, multi-architecture computing, big data storage and + computing, high-performance computing. + +- **Data lake:** data lake house, flexible import to the lake, data + governance, data security, real-time self-service analysis, and + intelligent data application. + +- **Enabling platforms:** AI, video, communications, IoT, blockchain, and + related technologies + +- **Application migration to the cloud:** cloud migration scope and + capability objectives + +- **Application innovation (optional):** cloud-native applications, + DevCloud, and industry cloud + +- **Cloud reliability:** DR, HA, and related capabilities + +- **Cloud security:** security system for network, host, application and + data, operations security, ecosystem security, and industry security + certification + +- **Cloud O&M:** resource management, cloud monitoring + +- **Cloud operations:** operations platform, organization, and processes + +A detailed cloud adoption blueprint is designed and planned based on the +current situation of every organisation. The blueprint should be focused on cloud applications, +cloud infrastructure, application enablement or AI enablement or data +enablement platform capabilities. In addition, collaboration with +device-side or edge-side intelligent awareness interaction and +cloud-network synergy must be considered to ensure comprehensive access +to data and applications. + +.. 
toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/plan/cloud-adoption-dimensions.rst b/doc/caf/source/plan/cloud-adoption-dimensions.rst new file mode 100644 index 0000000..6f5d3e1 --- /dev/null +++ b/doc/caf/source/plan/cloud-adoption-dimensions.rst @@ -0,0 +1,99 @@ +Cloud Adoption Dimensions +========================== + +As governments and enterprises have different strategic priorities and +development phases, the core motivations for them to go to the cloud may +differ. These core motivations should be determined based on actual +conditions. It is a good practice to focus on the following three +dimensions: + +IT transformation +***************** + +Currently, IT systems are facing the following challenges: + +- Low IT resource utilization, high costs, and complex maintenance and + lifecycle management. Service units and departments use different + data centers deployed with various types of servers and storage + devices, and applications are bound to servers. + +- Insufficient capabilities of disaster recovery, security, + scalability, and maintainability affect service stability and + ability to expand on demand. + +- They are too slow to introduce new technologies, such as containers, + cloud-native, and blockchain. Current IT systems are limited by + insufficient capabilities and by their organization members' lack + of experience. They urgently require new capabilities based on + mature products and extensive experience on the cloud, so they can + modernize and upgrade their IT systems, reduce costs and enhance + efficiency, and build IT support capabilities for future industry + competition. + +- IT organization transformation. + Currently, IT departments are often positioned in support positions, + passively supporting business development. 
However, as the digital + transformation of governments and enterprises deepens, IT and + digital capabilities have become part of planning, R&D, production, + sales, service and operation, directly affecting business results and + competitiveness. IT departments urgently need to change their roles. + They need to be integrated into the production chain. IT departments also + need to build an awareness and a culture focused on a service-oriented cloud platform and + on transforming their digital capabilities. + +Data intelligence and data security +*********************************** + +Data has become a new production factor together with traditional +factors such as land, labor, capital, and technology. + +- In the government sector, data-driven government services and + cross-department governance collaboration help with + government administration, policy formulation, and decision-making. + Data intelligence provides insights and responds quickly to social + and economic trends, enhancing public satisfaction and government + efficiency. + +- In the financial sector, data intelligence assists in customer + marketing, risk control, and product design. It supports the + expansion of key services such as supply chain finance and digital + currency. + +- In the enterprise domain, data intelligence enables design and + testing simulation, intelligent raw materials allocation, intelligent + production scheduling, supply chain risk management, and operational + visibility. It fully enhances efficiency and reduces risk. + +Over time, data platforms have had to provide increasingly robust +capabilities. The huge volumes of diverse types of data, the levels of +access concurrency, the constantly evolving data technologies and new +application scenarios demand high performance, efficiency, and +reliability. They demand platforms that can flexibly expand and rapidly +evolve. These requirements make a cloud service model the obvious choice.
+Cloud service providers provide platforms with these technical +capabilities. The industry has become focused on scenario-specific +capabilities related to its own data. + +Data security is increasingly tied to the security of both enterprises +and governments. Cloud service models facilitate the use of advanced security +technologies to centrally manage data security and provide maximum +security assurance. + +Service and business innovations +******************************** + +Digitalization drives service innovation. It has been driving advances +in smart production, services, and operations, as well as innovations +like the sharing economy and industry chain collaboration, and there is +capability spillover. Technologies are needed for innovations: + +- New technological capabilities such as AI, IoT and blockchain. + +- Rapid rollout and iteration are required for first-mover + opportunities. Cloud service models provide IaaS, PaaS, and SaaS + capabilities that are always industry-leading, lowering barriers to + entry for service innovation, reducing costs, and accelerating + innovation. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/plan/cloud-adoption-motivations.rst b/doc/caf/source/plan/cloud-adoption-motivations.rst new file mode 100644 index 0000000..5465d7c --- /dev/null +++ b/doc/caf/source/plan/cloud-adoption-motivations.rst @@ -0,0 +1,12 @@ +Cloud Adoption Motivations +========================== + +.. todo: + + fill in an overview + +.. 
toctree:: + :maxdepth: 1 + + cloud-adoption-reasons.rst + cloud-adoption-dimensions.rst diff --git a/doc/caf/source/plan/cloud-adoption-reasons.rst b/doc/caf/source/plan/cloud-adoption-reasons.rst new file mode 100644 index 0000000..6a86e52 --- /dev/null +++ b/doc/caf/source/plan/cloud-adoption-reasons.rst @@ -0,0 +1,92 @@ +Cloud Adoption Reasons +========================== + +The main reasons for cloud adoption are as follows: + +Addressing problems related to software and hardware lifecycles +*************************************************************** + +When data centers and servers reach the end of their lifecycles, or when +multiple data centers are integrated, instead of using traditional data +centers, this can be a good time to start using advanced cloud service +models. A cloud service model can also help governments and enterprises +eliminate the complex lifecycle management involved with the physical IT +hardware, middleware, and technology platforms used in a traditional +model. + +Enhancing service agility +************************* + +Cloud services enhance service agility in the following ways: + +- Infrastructure resources are obtained as required for timely launch + of new services. With physical hardware, you may have to wait weeks + or even months for new equipment to arrive. + +- Applications and resources can be added on demand to rapidly scale + services up or out as needed. + +- Technical platform services such as middleware, cloud-native, and + DevOps can be obtained on demand to accelerate service rollout and + help customers gain first-mover advantages. + +Reducing IT costs +***************** + +A cloud service model can adjust the amount of resources deployed based +on service requirements. This flexibility eliminates unnecessary +expenditures. It also lowers the capability requirements for O&M +personnel for the infrastructure and related technical platforms, which +reduces the cost of O&M. 
In a public cloud model, local data centers do +not need to be managed or maintained, and cloud resources and cloud +service capabilities can be obtained as required, greatly reducing IT +costs. In addition, the trial-and-error costs of trying out a new +service can be significantly reduced. + +Enhancing O&M efficiency +************************ + +Experienced O&M teams with hundreds or thousands of people from cloud +service vendors provide professional services, significantly enhancing +O&M quality and efficiency. + +Improving reliability and security compliance +********************************************* + +Cloud service models provide highly reliable, secure technical +capabilities fully compliant with industry regulations, based on a +robust library of best practices. It also provides assistance with +policy formulation, organization process development, and standard +certification. + +Supporting global deployment +**************************** + +The global resources, networks, and platforms deployed on the cloud help +enterprises quickly launch new multinational services and collaborate +with headquarters in terms of services, data, and management. + +Building data foundation and data assets +**************************************** + +The cloud service model helps organisations build data +foundations and data assets by using: + +- Advanced technologies such as big data, AI, data governance, and data + security. + +- Numerous industry best practices, including data platforms, + performance optimization, data governance, organization process for + data operation, and data intelligence application. + +- Continuously introducing new technologies to accelerate innovations. + +Cloud service models ensure the industry-leading technologies and best +practices are always available to accelerate service and business +innovation. For example, cloud-native enables more agile services. 
AI +enables more intelligent decision-making and unmanned or less-manned +production. IoT provides connectivity of everything and intelligent +sensing, and blockchain enables trusted smart contracts. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/plan/cloud-migration-plan-and-implementation.rst b/doc/caf/source/plan/cloud-migration-plan-and-implementation.rst new file mode 100644 index 0000000..a5af5d3 --- /dev/null +++ b/doc/caf/source/plan/cloud-migration-plan-and-implementation.rst @@ -0,0 +1,118 @@ +Cloud Migration Plan and Implementation +======================================= + +Cloud migration is planned and implemented based on the infrastructure, +technical platform and key applications on the live network, along with +the cloud migration scope and key capability objectives specified in the +migration blueprint. + +Our experts have summarized the following four steps based on their experience +in cloud migration: + +- **Step 1**: Apps improving external user experience + +- **Step 2**: Apps improving workplace productivity + +- **Step 3**: IT apps supporting enterprise core business processes + +- **Step 4**: Apps supporting innovation in business and product forms + +| + +.. image:: ../_static/images/image11_1.png + +| + +Cloud migration strategies +************************** + +- **Retire**: Back up and decommission application systems that are no + longer used to reduce resource waste. + +- **Retain**: Keep applications off the cloud when migration is deemed + unnecessary or unsuitable due to resource costs, application + lifecycles, or enterprise service policies. However, communication + and data integration with other applications on the cloud are often + still required. + +- **Rehost**: Directly migrate applications to the cloud without changing + the application running environment. The common migration operations + are Physical to Virtual (P2V) and Virtual to Virtual (V2V).
+ +- **Replatform**: Replace the platforms used by on-premises applications, + such as middleware and databases, with cloud-based PaaS services without + changing the core application architecture. This lowers the required + investment in platform technical resources, makes management less expensive, and + enhances efficiency. + +- **Rearchitect**: Change the application architecture and development mode + to build cloud-native capabilities. For example, monolithic + applications are broken down into microservices. This strategy is + used to support the long-term development of enterprise services on + the cloud when existing applications cannot support subsequent + function, performance, and scalability requirements. + +- **Repurchase**: Decommission legacy applications and replace them with + new ones, such as newly developed SaaS applications or third-party + SaaS services. + +| + +.. image:: ../_static/images/image12_2.png + +Cloud migration approach +************************ + +Before determining the cloud migration approach, prioritize applications +based on the following factors: + +- **Value and urgency**: For example, applications that have high + requirements on elastic scaling and agile iteration in Internet + services are of high value. If application servers are about to reach + End of Life (EOL) or require cloud-based DR capabilities, cloud + migration is urgent. + +- **Risk or service impact**: Low-latency applications (such as + manufacturing applications) and core financial systems require high + performance, availability and data security control. Cloud migration + has a great impact on these services. + +- **Difficulty or service complexity of technical solutions**: If + applications are not decoupled by layer and are complex with high + dependency on peripheral equipment, the migration solution is + difficult.
+ +The following principles, which have been identified based on Open Telekom +Cloud's robust experience with system migration and application +reconstruction, can help ensure a smooth migration: + +- **Determine the migration sequence**: Evaluate the complexity and + importance of applications. Preferentially migrate applications with + high value, low migration impact, and low complexity. + +- **Migrate from edge to core**: Separately migrate services of different + types, such as test services, office services, production services, + and billing services. Start your migration with supporting + applications and then move on to core applications. + +- **Independent applications first**: Migrate simple applications with few + peripheral dependencies first. + +- **Integrity of application migration**: Ensure the integrity of the + migration scope, objects, and process. Make sure that services are + running properly after migration. + +- **Minimal impacts**: Consider all possible impacts of cloud migration on + the running of the destination systems and take necessary measures to + minimize risks. + +Pilot project +************* + +For pilot projects, select applications that are easy to migrate to the cloud, +have strong drivers, and deliver great value, so that you can achieve quick +wins. During the pilot migration, build organizations, streamline +processes, build capabilities, and eliminate risks. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/plan/gap-analysis.rst b/doc/caf/source/plan/gap-analysis.rst new file mode 100644 index 0000000..5dd9b61 --- /dev/null +++ b/doc/caf/source/plan/gap-analysis.rst @@ -0,0 +1,65 @@ +Gap Analysis +============ + +Before formulating your cloud migration strategy: + +- **Review** the current accounts, special requirements, key issues and challenges of the current IT systems. +- **Identify** new possible requirements for future service evolution.
+- **Define** the key cloud adoption objectives as required. + +To get a complete analysis, refer to the following **Cloud Maturity Assessment Model**: + +.. image:: ../_static/images/image6_1.png + +Key gap analysis tasks include: + +Infrastructure analysis +*********************** + +- Infrastructure resources and configurations, including servers, + storage types, basic configurations, resource quantities and usages, + lifecycle configurations, virtualization technologies, + containerization, and application distribution. + +- Special requirements for performance, security, or operating system + or hardware dependencies. + +- Key challenges, such as low availability and complicated maintenance. + +Technical platform analysis +*************************** + +- Basic platform details, such as the amount, scale, and usage of + middleware, database, data warehouse, big data, and development and + test platforms. + +- Special requirements and key challenges. For example, middleware and + big data are developed based on open source software, which often has + weak performance, poor stability, or no DR capabilities for critical + services. + +Major application analysis +************************** + +Key applications need to be identified and analyzed to classify key +requirements for the implementation. This includes systems such as +channel systems, service systems, mission-critical systems, and data +related systems. Channel systems require flexible performance scaling +and fast iteration. Service systems need to be reliable and can expand +on demand. Mission-critical and data related systems require high +reliability, stability, and concurrency. + +Design for X (DFX) capability analysis +************************************** + +Review the challenges related to DR, security, performance, and O&M. 
+ +Evolution of innovation +*********************** + +New technologies such as AI, blockchain, and fast iteration based on +containers, microservices, and cloud-native can leverage the public +cloud for rapid reconstruction. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/plan/index.rst b/doc/caf/source/plan/index.rst new file mode 100644 index 0000000..d824b5a --- /dev/null +++ b/doc/caf/source/plan/index.rst @@ -0,0 +1,28 @@ +Phase 1: Plan +============= + +The purpose of the **Plan** stage is to determine the cloud adoption +motivations, IT status quo, gap analysis, and high-level design planning. +It includes cloud adoption strategy, cloud adoption blueprint, and cloud +migration roadmap. + +The cloud plan primarily supports grant budget, organizational +optimization, and talent training. It provides an assurance that the +people, finances, resources, permission, and security compliance needed +for cloud adoption are all in place. In addition, the baseline should be +released to relevant organizations after being approved by the +management to ensure the right strategy is selected for subsequent cloud +adoption. *The plan is usually focused on less business +critical applications with high value as pilot projects*. +Quick wins help build organizational capabilities and reduce risks. + +The plan needs to include the following *five* key tasks: + +.. 
toctree:: + :maxdepth: 1 + + cloud-adoption-motivations.rst + gap-analysis.rst + cloud-adoption-blueprint.rst + cloud-migration-plan-and-implementation.rst + organization-and-capability-assurance.rst \ No newline at end of file diff --git a/doc/caf/source/plan/organization-and-capability-assurance.rst b/doc/caf/source/plan/organization-and-capability-assurance.rst new file mode 100644 index 0000000..b83d145 --- /dev/null +++ b/doc/caf/source/plan/organization-and-capability-assurance.rst @@ -0,0 +1,134 @@ +Organization and Capability Assurance +===================================== + +It is a best practice to build organizations with appropriate skills to +ensure successful implementation of the cloud strategy and its +continuous evolution. The most efficient way is to build a +centralized governance team. + +.. image:: ../_static/images/image13_1.png + +Management +********** + +Service migration to the cloud involves changes in process, +organization, and culture. It requires a holistic approach and the +sponsorship of the top management. The core elements are as follows: + +- **Strategic clarity:** Identify the motivation for cloud migration + and design a cloud migration plan at the strategic level, perform a + risk/benefit analysis, and ensure that there is a clear consensus + among the management team and relevant organizations. + +- **Grant budget:** During cloud migration, changes in resource + procurement, resource usage, and internal settlement modes are + involved. Budgeting and financial management are required to support + smooth cloud migration. + +- **Cloud service provider selection:** To support continuous business + innovation, cloud service providers need to have a strong B2B DNA. + They need to be able to provide E2E services, share their + experiences, and support continuous leadership in cloud + technologies. Selecting the right cloud service providers sets the + stage for a successful cloud adoption.
+ +- **Organization optimization:** Set up a cloud migration organization + that streamlines business and IT departments, specify the + responsibilities and collaboration mechanism, and develop a + decision-making and monitoring mechanism. + +Cloud Center of Excellence +************************** + +Cloud Center of Excellence (CCoE) supports the development of cloud +adoption strategies, assumes architecture responsibilities, selects best +solutions, and enhances organizational skills. This includes defining +cloud strategies and policies, designing blueprints, supporting cloud +service provider selection, leading architecture design, guiding cloud +service selection, and O&M governance. CCoE can be a virtual team that +requires collaboration between business and technical departments. + +.. warning:: + + CCoE **does not** handle routine operations and is not a project management organization. + +CCoE includes, but is not limited to, the following roles: + +- **Cloud strategy expert**: formulating and breaking down cloud strategies +- **Cloud architecture expert**: designing cloud architecture and + formulating migration strategies +- **Security compliance expert**: planning security cloud services, as well + as building, verifying, and deploying security policies +- **Business, finance, legal affairs, quality and operation experts** + +Service team +************ + +The cloud Service team is responsible for the following stages: + +- **Cloud construction**: A cloud infrastructure needs to be built. This + includes a PaaS, service enablement platform, DR, and various + security capabilities. In addition to cloud management functions, + cloud management needs to be integrated into the overall enterprise + O&M management system to further build self-service resource + management and intelligent O&M capabilities. + +- **Cloud migration**: One or more *development teams* also need to get involved in + the cloud migration process.
They will be responsible for developing + and implementing the cloud migration solution. + +- **Cloud management**: Resources and permissions need to be managed, and + costs on the cloud need to be controlled and optimized. Configuration + and deployment policies need to be optimized for reliability, + security, cost, performance, and compliance. + +Data team +********* + +The Data team is responsible for the following tasks: + +- **Data platform setup**: A data platform needs to be set up based on + cloud service capabilities and it needs to meet industry requirements + for performance, scalability, cost, reliability, and security. + +- **Data management**: The data, business and cloud service teams need to + work together on data integration, data development, data governance, + data services, and data security. They need to support data + requirements, ensure data quality, control data risks, and handle or + escalate data issues and disputes as needed for final decisions. + +- **Data asset operations**: The team needs to conduct internal data + sharing and exchange, formulate data usage rules and processes, monitor + traffic, and run external data asset monetization operations to + promote trusted data circulation. + +.. note:: + + Depending on the organization's needs, the cloud Service team and Data + team can be deployed *together or separately*. In the government sector, + for example, the cloud Service team and the Data team can work together. + In some enterprises, however, the cloud service team and the data team + have to work separately. + +Personnel development plan +************************** + +After services are migrated to the cloud, traditional IT work, such as +O&M and security, will change significantly. Cloud service providers +offer infrastructure and technology platforms. IT departments of +governments and enterprises focus more on service requirements and +capability building.
Employees need to master cloud architecture +technologies for cloud solutions, cloud migration, cloud governance, and +data management and operation capabilities that integrate with their own +scenarios. They also need to always be learning the application +capabilities of new technologies to meet government and enterprise +service innovation requirements. They need to keep up with the latest +AI, IoT, blockchain, microservices, and DevSecOps concepts on the cloud. + +The personnel development plan needs to cultivate new capabilities that +match the cloud service model through the training and transformation of +original personnel, as well as introduction of talent to implement the +cloud migration strategy. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/ready/account-and-organizational-structure.rst b/doc/caf/source/ready/account-and-organizational-structure.rst new file mode 100644 index 0000000..c2fc3f5 --- /dev/null +++ b/doc/caf/source/ready/account-and-organizational-structure.rst @@ -0,0 +1,116 @@ + +Account and Organizational Structure +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The Landing Zone solution needs a secure, compliant, and scalable +multi-account environment on the cloud. + +.. important:: + It is a best practice to plan the account and organizational structure first. + +Open Telekom Cloud provides a reference structure. It is recommended that you +design organizational levels and accounts based on your business +architecture, geographic architecture, and IT functions. + +1. Different organizational levels and organizational units (OUs) are + defined on Open Telekom Cloud based on your service architecture. + Independent member accounts can be created for each service OU based + on service systems. Independent or shared accounts can be created + based on service scales and isolation requirements. + +2. Different organizational levels and OUs are defined on Open Telekom Cloud + based on the geographic architecture. 
Independent member accounts can + be created under geographic OUs by country or region. On-premises + customer relationship management systems and customer service systems + can be deployed in these OUs. + +3. For the central IT department of an enterprise, corresponding OUs are + created on Open Telekom Cloud and the member accounts described in the + following table are created based on different IT functions to + isolate responsibilities and permissions in the IT management domain + and to manage multiple member accounts. + +.. list-table:: + :widths: 25 25 25 25 + :header-rows: 1 + + * - Account + - IT Function + - Responsible Team + - Resource/Cloud Service + * - Network operations account + - Centrally deploy and manage enterprise network resources including network border + security resources to unify network resource management and networking between VPCs + under multiple accounts. + - Network management team and security management team + - NAT Gateway, EIP, VPC, Direct Connect, Cloud Connect, VPN, CFW, WAF, Anti-DDoS + * - Public service account + - Centrally deploy and manage enterprise public resources, services, and application systems, and share them with other member accounts. + - Public service management team + - NTP server, AD server, Self-built DNS, OBS bucket, SWR, Collaborative office system + * - Security operations account + - - Serve as the enterprise security operations center. + - Centrally control the security policies, security rules, and security resources of the entire enterprise. + - Set security configuration baselines for other accounts. + - Be responsible for the information security of the entire enterprise. + - Security management team + - Services that support cross-account security control, such as DEW, SCM, and VSS + * - O&M and monitoring account + - Implement unified monitoring and O&M of resources and applications under each member account, and identify potential issues and send pre-warnings in a timely manner. 
+ - O&M team + - CBH, Grafana, Prometheus, third-party O&M and monitoring systems + * - Log account + - Centrally store the run logs and audit logs of other accounts. + - Log analysis team and compliance audit team + - LTS, OBS bucket, SIEM system + * - Data platform account + - Centrally deploy the big data platform of the enterprise and collect service data of other accounts to the data platform for storage, processing, and analysis. + - Data processing team and business analysis team + - Data lake, Big data analysis platform, Data access service, Data governance platform + * - DevOps account + - Centrally manage CI/CD pipelines for the entire enterprise and deploy them across accounts. + - Software R&D team + - DevCloud, Self-built DevOps pipeline + * - Sandbox account + - Test functions and security policies of cloud services. + - Test team + - On-demand resources and services that need to be tested + +.. tip:: + In addition to these member accounts, you can create more IT functional + accounts, such as application integration accounts or collaborative + office accounts, as needed. + +By default, a master account is created under the root of the +organization. It is recommended that no cloud resources be deployed +under this master account. You can use this master account to do the +following: + +- **Centrally manage organizations and accounts**: Create and manage + organizational structures and OUs, create member accounts for OUs, or + invite other existing accounts to the organization as member + accounts. + +- **Centrally manage finances**: Collect and analyze statistics on what + the entire enterprise spends on Open Telekom Cloud; top up + accounts, apply for credit limits, activate coupons on Open Telekom Cloud + and allocate them to member accounts; regularly review the usage of + funds, credit limits, and coupons of member accounts and reclaim them + in a timely manner. 
+ +- **Centrally manage organizational policies**: Set organizational policies + for OUs and member accounts, and forcibly restrict user permissions (also + for account administrators) under member accounts to prevent security + risks caused by excessive permissions. If you apply an organizational + policy to a specific OU, the policy applies to all member accounts + and lower-level OUs in that OU. + +.. tip:: + You can use enterprise projects or tags to group resources at a fine + granularity under member accounts. For example, you can group + application subsystems or sub-products into an enterprise project or tag + them on Open Telekom Cloud. You can also perform cost allocation and + fine-grained permission control based on these groupings. + +.. toctree:: + :maxdepth: 1 diff --git a/doc/caf/source/ready/account-management.rst b/doc/caf/source/ready/account-management.rst new file mode 100644 index 0000000..8e14ca5 --- /dev/null +++ b/doc/caf/source/ready/account-management.rst @@ -0,0 +1,50 @@ +Financial Management +^^^^^^^^^^^^^^^^^^^^^ + +Open Telekom Cloud provides an accounting management solution for enterprises +with multiple accounts to help them implement unified management for +accounts, organizations, funds, invoices, bills, and costs. + +Unified Accounting Management for Multiple Accounts +''''''''''''''''''''''''''''''''''''''''''''''''''' + +Accounting management allows multiple Open Telekom Cloud accounts to be +associated with each other for accounting purposes. You can create a +hierarchical organization and a master account, add member accounts to +this organization and associate them with the master account, and use +the master account to perform accounting management of associated member +accounts. + +1. **Association between master and member accounts** + +A new Open Telekom Cloud account can be directly associated with a master +account or an existing Open Telekom Cloud account can be invited for +association. 
On Open Telekom Cloud, a master account can create organizations +that match your organizational structure and create new accounts for or +invite existing ones to the organizations. + +2. **Funds management** + +A master account can allocate its balance and cash coupons to its member +accounts for resource provisioning. A member account can use its own +balance for resource provisioning. + +3. **Commercial discount inheritance** + +After member accounts are associated with a master account, they can use +the commercial discounts of the master account in their expenditures. + +4. **Expenditure query** + +A master account and its associated member accounts can log in to Open Telekom +Cloud to view their expenditures. The master account can view the +expenditure data of its member accounts after being approved. + +5. **Invoices** + +A master account and its associated member accounts can separately +request Open Telekom Cloud to issue invoices for their expenditures. A master +account can request invoices for member accounts. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/cloud-architecture-design.rst b/doc/caf/source/ready/cloud-architecture-design.rst new file mode 100644 index 0000000..cae1395 --- /dev/null +++ b/doc/caf/source/ready/cloud-architecture-design.rst @@ -0,0 +1,26 @@ +Cloud Architecture Design +------------------------- + +The most important objective of architecture design is to ensure the +continuous availability of the system along with the development of +enterprise services. Architecture design mainly includes the design of +the application architecture and the technical architecture. The +application architecture design involves industry-specific features, +technology stacks, and enterprise development phases. Designing the +technical architecture is more general. 
In the following sections, we +will describe the five aspects of architecture design that affect +service continuity the most: high availability (HA), scalability, +performance, security, and cost. + +.. image:: ../_static/images/image35_1.png + +| + +.. toctree:: + :maxdepth: 1 + + high-availability.rst + scalability.rst + performance.rst + security.rst + cost.rst \ No newline at end of file diff --git a/doc/caf/source/ready/cost.rst b/doc/caf/source/ready/cost.rst new file mode 100644 index 0000000..69ac261 --- /dev/null +++ b/doc/caf/source/ready/cost.rst @@ -0,0 +1,12 @@ +Cost +~~~~ + +On-demand resource usage, usage-based billing, elastic scaling, and +resource utilization determine the costs on cloud. The following figure +shows the principles of cost optimization design. For details about the +design content, see section "Cost Management". + +.. image:: ../_static/images/image41_1.png + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/high-availability.rst b/doc/caf/source/ready/high-availability.rst new file mode 100644 index 0000000..ad4ba7b --- /dev/null +++ b/doc/caf/source/ready/high-availability.rst @@ -0,0 +1,216 @@ +High Availability +================= + +Availability Definition +----------------------- + +Availability refers to the ability of a product or service to perform a +specified function under specified conditions at a specified time or +within a specified period of time. It is a measure of the reliability +and maintainability of the product. + +Service availability is generally +measured by the SLA. Each type of cloud service provides services based +on their SLA commitment. 
The following table lists the downtimes +acceptable for different SLA commitments: + ++---------+-------------------+--------------------+------------------+ +| SLA | Weekly Downtime | Monthly Downtime | Yearly Downtime | ++=========+===================+====================+==================+ +| 99% | 1.68 hours | 7.2 hours | 3.65 days | ++---------+-------------------+--------------------+------------------+ +| 99.90% | 10.1 minutes | 43.2 minutes | 8.76 hours | ++---------+-------------------+--------------------+------------------+ +| 99.95% | 5 minutes | 21.6 minutes | 4.38 hours | ++---------+-------------------+--------------------+------------------+ +| 99.99% | 1.01 minutes | 4.32 minutes | 52.56 minutes | ++---------+-------------------+--------------------+------------------+ +| 99.999% | 6 seconds | 25.9 seconds | 5.26 minutes | ++---------+-------------------+--------------------+------------------+ + +High Availability Solutions +--------------------------- + +Most Open Telekom Cloud services have HA deployment options available. They +provide you with HA capabilities at multiple layers, including data +centers, hardware, data, and self-service. Open Telekom Cloud data centers are +deployed in Germany, the Netherlands, and Switzerland to meet resource requirements +for different regions. Each region is divided into multiple availability zones (AZs). +Each AZ has independent cooling, fire extinguishing, moisture-control, +and electrical facilities, and the failure of one AZ does not affect +other AZs. There are four types of HA deployments: + +- **Single-AZ HA**: For services that do not require high availability, + active/standby or cluster deployment models of cloud services can be + used to quickly recover services in the event of a single service + node failure. By using automatic fault detection and switchover of + nodes in a cluster, single points of failure (SPOFs) are eliminated + and service interruptions are prevented. 
+ +- **Dual-AZ (intra-city) HA**: For services that require high + availability, you can deploy services in multiple equipment rooms in + the same city. This way, service continuity is guaranteed even if the + network, physical device, or power supply of one equipment room + fails. Open Telekom Cloud users can deploy services across AZs. AZs are + isolated from each other, so if one AZ fails, you can switch services + to another AZ to quickly recover services. Most cloud service + products have corresponding capabilities. Select your desired + capabilities during purchase and complete deployment. + +- **Two-site three-center HA**: For some ultra-large or commercial + systems that demand extra robust protection, the dual-AZ (intra-city) + HA solution is still insufficient. It cannot guard against regional + disasters, such as an earthquake or a flood. A remote equipment room + is required for that. So in a two-site three-center HA solution, you + add a remote DR equipment room to the intra-city solution. That way, + you are protected even in the event of a regional disaster. + +- **Cross-cloud HA**: To meet the requirements for multi-cloud HA of + some enterprises, Open Telekom Cloud also supports multi-cloud DR + deployment. Enterprises can deploy production services on Open Telekom + Cloud and deploy the DR site on a cloud vendor platform. + +Single-AZ HA Solution Design +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Solution description and key points of the design: + +- **Layered service deployment**: web access layer, service layer, data + layer, and management zone +- **Service HA**: No single-node deployment, HA deployment (in cluster or + active/standby mode) +- **Cloud service HA**: The selected Open Telekom Cloud services are deployed in + the HA mode. + +From the perspective of service continuity and data availability, this +solution achieves cluster and active/standby HA. Instead of using single +nodes, applications are deployed in clusters or active/standby mode. 
+This may be a more expensive option, but it significantly improves +availability. + +Key points of the HA design: + ++-------------+-------------------------------------------------------------+ +| Item | HA Design Focuses | ++=============+=============================================================+ +| Service | Decoupled deployment: Different components are deployed on | +| HA | different ECSs. | +| | | +| | HA deployment: HA deployment is used for all nodes. If HA | +| | (active/standby or cluster) deployment is not supported, an | +| | emergency solution must be available, for example, cloud | +| | server backup (via CSBS) and an emergency environment. | +| | | +| | Layered deployment: Services are deployed separately on the | +| | web layer, service layer, and database layer. | +| | | +| | Auto scaling: The Auto Scaling service can be leveraged to | +| | adjust compute resources to deal with changes in service | +| | volume. | ++-------------+-------------------------------------------------------------+ +| Cloud | - Network access layer: | +| service | | +| HA | - Private line: Active-active or active/standby HA | +| | | +| | - VPN: private line (active) + VPN (standby) or VPN | +| | (active) + VPN (standby) | +| | | +| | - ELB: Multiple ECSs run at the ELB backend to ensure | +| | system availability and scalability. Health check is | +| | enabled. ELB is a potential fault point in the system | +| | and needs to be monitored by CES. | +| | | +| | - NAT gateway: If a large number of ECSs need to access | +| | the Internet, use SNAT to prevent too many ECSs from | +| | being exposed to the Internet. | +| | | +| | - Selection of cloud service types (RDS, DCS, and more): | +| | | +| | Production services must use active/standby or cluster | +| | deployment. For example, RDS DB instances must be deployed | +| | in primary/standby mode, and Redis must be deployed in | +| | clusters. 
Self-built services on ECSs must also meet this | +| | requirement, for example, deployed using Redis Cluster. | ++-------------+-------------------------------------------------------------+ +| Data | - Data backup and restoration solutions are available. For | +| reliability | example, VBS and CSBS are used to back up ECS data, and | +| | an RDS backup policy is enabled. Critical data is backed | +| | up to other regions or offline IDCs. | +| | | +| | - Data backup and recovery reliability, and emergency | +| | drills are periodically verified. | ++-------------+-------------------------------------------------------------+ + +.. image:: ../_static/images/image36.png + +Dual-AZ HA Solution Design +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Solution description and key points of the design: + +- **Service modules**: For services supporting cluster deployment, + resources are deployed in two AZs, with the loads balanced using ELB. + For single-node ECSs, SDRS is used for AZ-level DR. +- **Cloud service HA**: Primary and standby nodes are deployed in different + AZs. +- **Database synchronization**: RDS is used, and RDS DB instances are + deployed in primary/standby mode across AZs and the data is kept + synchronized. +- **DR switchover**: If an AZ fails, RDS database services automatically + switch to the standby databases. Application services can be taken + over by the DR servers automatically or in just a few clicks. +- **DR drills**: Users can perform DR drills with just a few clicks. + +.. image:: ../_static/images/image37.png + +Two-Site Three-Center (Cross-Region) HA Solution Design +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Solution description and key points of the design: + +- The production centers and DR center are deployed in two different + regions of Open Telekom Cloud. +- The production centers are deployed in two AZs, and the DR center is + deployed in one AZ. 
+- RDS DB instances are deployed in both the production centers and DR + center for 1:1:1 primary/standby replication. +- Configurations, logs, snapshots, and backups generated at the + production and DR centers are replicated across regions using OBS. +- If one production AZ fails, services are switched to the other AZ, + and a database switchover is performed. +- If both production centers fail, a database switchover is performed, + and by modifying DNS configuration, 100% of the user traffic is + directed to the DR center. +- After the production centers recover, a database switchover is + performed, and DNS directs 100% of the user traffic to the production + centers. +- To improve the utilization of the DR center, some read-only or + analysis services can be distributed to the DR center. + +.. image:: ../_static/images/image38.png + +.. note:: + - This solution provides the highest possible service continuity and + data availability. Data and services can be protected even in the + event of regional disasters. + + - The RPO is determined by the database replication interval. Servers + at the DR center are always running, so an RTO of almost zero can be + achieved. The time needed to complete a DR switchover depends on how + long the DNS cache takes to update. It usually takes a few minutes + and can be faster if GSLB is used. + +Establishing HA DR capabilities is a complex project that includes +ingress traffic control, service layer reconstruction, middleware and +database control, and the collaboration of numerous systems. It requires +professional skills to build HA DR systems. If customers lack related +experience and want to quickly build an HA DR system, the Multi-cloud +High Availability Service (MAS) is a good choice. This service is +derived from the multi-cloud application HA solution of Open Telekom Cloud consumer +services. 
It provides end-to-end service failover and DR drill +capabilities that cover everything from the traffic ingress and +application layers to the data layer. MAS ensures quick service recovery +and improved service continuity. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/identity-and-permissions-design.rst b/doc/caf/source/ready/identity-and-permissions-design.rst new file mode 100644 index 0000000..6311fba --- /dev/null +++ b/doc/caf/source/ready/identity-and-permissions-design.rst @@ -0,0 +1,80 @@ +Identity and Permissions Design +------------------------------- + +Based on a large number of successfully delivered projects, Open Telekom Cloud +has summarized the following user and permissions management principles: + +- Establish trust between the enterprise's identity management system + (such as AD) and Open Telekom Cloud IAM for federated identity + authentication so that enterprise employees can use single sign-on + (SSO) for the Open Telekom Cloud console. The enterprise's identity + management system can better control the permissions of employees as + they are recruited and can revoke permissions in a timely manner for + employees who have transferred to different departments or have + resigned. + +- Do not use Open Telekom Cloud IAM as the enterprise's user management + system. There is no need to create users or user groups on Open Telekom + Cloud IAM for enterprise employees who do not interact with Open Telekom + Cloud. + +- Do not share passwords with others. Instead, create an independent + user for each person who needs to manage or use Open Telekom Cloud + resources and assign permissions to that user. In this way, all operations + performed on Open Telekom Cloud can be tracked and audited. + +- Create user groups based on IT functions and add corresponding + employees to user groups that match their responsibilities. 
For + example, in terms of resource O&M and management, apply unified + management and O&M principles to improve efficiency. Under the O&M + and monitoring account, create user groups based on O&M + responsibilities. These user groups include a computing management + group, storage management group, network management group, and + database management group. + +- Follow the principle of least privilege (PoLP). Grant user groups only + the minimum level of access or permissions that they need. If the + responsibilities of a user group change, adjust the permissions of + that user group in a timely manner. To simplify operations, grant + permissions to user groups rather than individual users. + +- The IAM account administrator (with the same name as the IAM user) + has high permissions. As such, you should not use this account to + access Open Telekom Cloud directly. Instead, create an IAM user and grant + permissions based on the PoLP to perform routine + management, protecting the security of IAM accounts. + +Based on these principles, user groups are planned for accounts in +Landing Zone and corresponding cloud service access permissions are +assigned to these user groups based on the PoLP on Open Telekom +Cloud. User groups in the enterprise's identity management system are +mapped to the user groups on Open Telekom Cloud so that they have the +corresponding cloud service access permissions. + +.. image:: ../_static/images/image28.png + +| + +To achieve unified management and control, IT functional accounts of +Landing Zone need to access and manage cloud resources under other +accounts through cross-account delegation. For example, a security +operations account is designed to centrally manage security resources +and services (such as SA and HSS) across accounts through federated +authentication and cross-account delegation. 
The security administrator +logs in to the console of the security operations account through SSO, +switches the role to a service account, and then accesses and manages +security cloud services under that service account. + +.. image:: ../_static/images/image29.png + +| + +Another similar scenario is that the O&M and monitoring account can +access resources in other accounts through federated authentication and +cross-account delegation to monitor and manage resources across accounts +in an enterprise. + +.. image:: ../_static/images/image30.png + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/index.rst b/doc/caf/source/ready/index.rst new file mode 100644 index 0000000..76920fb --- /dev/null +++ b/doc/caf/source/ready/index.rst @@ -0,0 +1,19 @@ +Phase 2: Ready +============== + +To build a highly reliable and available system on the cloud, a +systematic top-level design framework and unified standards are +required. Unified planning at the macro level is required to remove +obstacles to cloud adoption. Based on the industry standards and +practices, here in Open Telekom Cloud we have developed a landing zone system +architecture. This architecture is intended to make it easier for +enterprises to build reliable and systematic capabilities from the +perspectives of the people, finances, resources, permissions, and +security compliance. In addition, Open Telekom Cloud has developed an HA cloud +architecture to ensure high availability of cloud services. + +.. 
toctree:: + :maxdepth: 1 + + landing-zone-construction.rst + cloud-architecture-design.rst diff --git a/doc/caf/source/ready/landing-zone-construction.rst b/doc/caf/source/ready/landing-zone-construction.rst new file mode 100644 index 0000000..e5b98ca --- /dev/null +++ b/doc/caf/source/ready/landing-zone-construction.rst @@ -0,0 +1,64 @@ +Landing Zone Construction +------------------------- + +The advantages of the public cloud in terms of security and stability, +service quality, execution efficiency, and cost-effectiveness are +becoming more widely recognized and accepted by enterprises. More and +more enterprises are gradually migrating their application systems to +the cloud and preferentially developing future-oriented, cloud-native +application systems. An era of cloudification is coming. However, in +practice, the following challenges are often encountered: + +1. How to isolate the security and faults of service units (such as BGs, + departments, and project teams) to ensure the isolation of cloud + resources, applications, and data between service units + +2. How to flexibly adjust cloud resources + +3. How to design a network architecture across multiple service units + and establish controlled network connection channels + +4. How to plan the production, development, and test environments + +5. How to share common resources among multiple service units + +6. How to centrally manage and control the budgets and costs of each + service unit and how to optimize cloud costs + +7. How to prevent service units from overusing cloud resources + +8. How to divide user groups and how to set permissions for user groups + +To address these challenges, Open Telekom Cloud has designed a Landing Zone +solution to effectively manage service units, personnel, permissions, +cloud resources, data, applications, costs, and security. A landing zone +is the area where an aircraft, like a helicopter or an airplane, can +land safely. 
Cloud vendors have borrowed this term to describe a place +where you can smoothly migrate enterprise service systems on the public +cloud. The Open Telekom Cloud Landing Zone solution helps enterprises build a +secure, compliant, and scalable multi-account environment on the cloud +where multiple accounts can share resources and there is unified +management of the people, finances, resources, permissions, and security +compliance. + +- **People**: Unified management of service units, accounts, users, user + groups, and roles for multiple accounts + +- **Finances**: Unified management of funds, budgets, costs, invoices, and + discounts for multiple accounts + +- **Resources**: Unified O&M, monitoring, and management of cloud resources + including computing, storage, network, data, and applications for + multiple accounts + +- **Permissions**: Unified management of permissions for cloud resources of + multiple accounts based on the principle of least privilege (PoLP) + +- **Security compliance**: Unified management of security compliance in + accordance with the security compliance requirements of countries, + industries, and enterprises + +.. toctree:: + :maxdepth: 1 + + landing-zone-reference-architecture.rst diff --git a/doc/caf/source/ready/landing-zone-reference-architecture.rst b/doc/caf/source/ready/landing-zone-reference-architecture.rst new file mode 100644 index 0000000..0044c7a --- /dev/null +++ b/doc/caf/source/ready/landing-zone-reference-architecture.rst @@ -0,0 +1,21 @@ +Reference Architecture +---------------------- + +The people, finances, resources, permissions, and security compliance +requirements are mapped to the account and organizational structures, +financial management, network planning, identity and permissions design, +security protection, and compliance audit. + +.. image:: ../_static/images/image14.png + +The following describes each module in detail. + + +.. 
toctree:: + :maxdepth: 1 + + account-and-organizational-structure.rst + account-management.rst + network-planning.rst + identity-and-permissions-design.rst + security-and-compliance.rst \ No newline at end of file diff --git a/doc/caf/source/ready/network-planning.rst b/doc/caf/source/ready/network-planning.rst new file mode 100644 index 0000000..47054d9 --- /dev/null +++ b/doc/caf/source/ready/network-planning.rst @@ -0,0 +1,360 @@ +Network Planning +^^^^^^^^^^^^^^^^ + +An enterprise migration to the cloud requires proper planning of the +network architecture. You need to review the current status of the data +center network and operational model and translate them into multi-account and network design plans on +the cloud (including VPCs, subnets, and communication between cloud and on-premises networks). + +On-Premises Network Review +'''''''''''''''''''''''''' + +Review the current status of the on-premises data center network to plan +for the cloud account and networking. + +- Functions of each data center and the services it hosts. + +- Mapping between on-premises data centers and regions and AZs on the + cloud. + +- Network architectures and partitions within and between data centers. + +- Application system distribution and data flow between services. + +Cloud Network Account Design +'''''''''''''''''''''''''''' + +Plan cloud accounts based on the network plan. There are three scenarios +to consider: + +- **One account and one VPC**: Account management and O&M are easy as there + is a small team but security is low. + +- **One account, multiple VPCs**: Account management is easy and there is + better security. The VPCs are designed for different functions and + have different access control rules. + +- **Multiple accounts or primary account and sub-accounts, multiple VPCs**: + Account management is more challenging as there is a large team with + multiple branches and complex services, and meeting security + requirements can be challenging too. 
If multiple VPCs are deployed in + the same region, an enterprise router should be used to connect them. If + multiple VPCs are deployed in different regions, you can use Cloud + Connect. + + +The first and second scenarios are *simple*. They mainly involve VPC and +subnet planning. For details, see section "Cloud Networking +Planning and Design". **The following section focuses on the third +scenario**. + +The following figure shows a multi-account, multi-VPC network +architecture. + +| + +.. image:: ../_static/images/image15.png + +| + +At the core of this architecture is the *network operations account*, +which serves as a hub to connect different accounts. Its enterprise +router allows communication between resources of different accounts. The +enterprise router lets you configure routes to determine which VPCs can +communicate with each other. + +A large service system usually has independent sub-accounts. You are +advised to create three VPCs: + +- a production VPC +- a development VPC +- a test VPC + +These VPCs are kept isolated from each other for the +service system. Each VPC must have *at least two subnets*: an +application subnet and a data subnet. These correspond to, +respectively, the application layer and data layer of the service +system. Network ACLs are then used for access control between +subnets. Resources such as ECSs, DCS instances, and RDS instances can +be added to security groups for instance-level access control based +on security group rules. Application clusters of service systems can +be deployed across AZs to ensure high availability at the application +layer. Active/standby database clusters and DCS clusters deployed across +AZs on Open Telekom Cloud are used to ensure high availability at the data +layer. + +.. image:: ../_static/images/image16.png + +.. 
note:: + It is a best practice to create an enterprise project for each + small-scale service system, associate the resources of each system in + the production, development, and test VPCs with the same enterprise + project or tag, and perform cost collection and fine-grained + authorization based on the enterprise project or tags. + +Cloud Networking Planning and Design +'''''''''''''''''''''''''''''''''''' + +.. image:: ../_static/images/image10.png + +VPCs and Subnets +**************** + +A Virtual Private Cloud (VPC) is a logically isolated environment in which all networking components +live, for example subnets, security groups, route tables, and gateways. + +VPCs +```` + +- Each VPC can have **up to** 5,000 IP addresses. +- VPCs are isolated by default. Enterprise routers can be used to allow communication between VPCs. +- There are O&M VPCs, DMZ VPCs, and service VPCs. Different VPCs are connected through enterprise routers. +- Each service VPC belongs to an independent service system or a department. Service systems or departments that do not need to access each other are deployed in different VPCs. +- VPCs can be deployed across AZs. Services can be deployed in such VPCs for high availability. +- By default, the production VPC is isolated from the development and test VPCs and no VPC peering connection is established. You are advised to use OBS for data transmission between them. + +Subnets +``````` + +- Enterprise subnets must be planned to prevent them from overlapping + with each other. +- Subnets in the same VPC cannot overlap. Subnets in VPCs that need to + communicate cannot overlap. +- Different service systems should be deployed in different subnets. + You can use network ACLs to control access between subnets. +- Different subnets should be used for application deployment and data + storage. By default, data subnets can only be accessed by application + subnets. Application subnets can be accessed by other subnets or from + the Internet if required.
+- O&M subnets allow access from other subnets if needed. + +Internet communication on the cloud +*********************************** + +**Solution 1:** With WAF for web service protection + +.. image:: ../_static/images/image19.png + +#. Enable Anti-DDoS for EIPs for traffic scrubbing. If the traffic is heavy, use Advanced Anti-DDoS. +#. Configure the CNAME in DNS to redirect domain names to WAF. +#. Use WAF to protect the web domain names and then redirect traffic to ELB. +#. Use an ELB whitelist to allow access only from the WAF IP addresses. +#. Use the security group of ECSs in the application subnet to allow access only from specific IP address ranges over specific ports. +#. Use the network ACL of the application subnet to allow communication with only WAF IP addresses and the database subnet. +#. Use the network ACL of the database subnet to allow access only from the application subnet to the database IP address and port. +#. Use the security group of the database to restrict the ports and IP address ranges that it can access. +#. Host security and database security are alternative services. + +**Solution 2:** Without WAF for web service protection + +.. image:: ../_static/images/image20.png + +#. Enable Anti-DDoS for EIPs for traffic scrubbing. If the traffic is heavy, use Advanced Anti-DDoS. +#. Configure the DNS to resolve the domain name to the EIP of the ELB. +#. (Optional) Use an ELB whitelist and blacklist to allow access only from specific IP address ranges. +#. Use the security group of ECSs in the application subnet to allow access only from specific IP address ranges over specific ports. +#. Use the network ACL of the application subnet to allow communication with only the database subnet. +#. Use the network ACL of the database subnet to allow access only from the application subnet to the database IP address and port. +#. Use the security group of the database to restrict the ports and IP address ranges that it can access. +#. 
Host security and database security are alternative services. + +Cloud O&M network +***************** + +.. image:: ../_static/images/image21.png + +#. Enterprise maintenance personnel can access the enterprise internal + network through the public O&M VPC and then perform O&M on cloud + resources through the CBH. +#. The public O&M VPC communicates with other VPCs through the + enterprise router. Network ACLs are configured to allow on-premises + subnets to access only the subnets that provide services externally + on the cloud. All instances are maintained only through the O&M subnet. +#. The network ACL and security group of the O&M subnet allow access only from specific on-premises subnets. +#. Each VPC uses a network ACL to allow communication with the O&M subnet on specific ports. +#. All maintenance operations must be performed on the CBH. The operation processes are recorded and can be audited. + +Cloud and on-premises network communication +******************************************* + +The following table compares the services that can be used to connect +cloud and on-premises networks. You can select one based on your +requirements. +
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+| Comparison   |                     | Direct Connect           | VPN                      | SD-WAN Network                     |
+| Item         |                     |                          |                          |                                    |
++==============+=====================+==========================+==========================+====================================+
+| Architecture | Network quality     | High, guaranteed SLA     | Low, no guaranteed SLA   | Medium                             |
+|              |                     |                          |                          |                                    |
+|              |                     |                          |                          | The Internet is connected to the   |
+|              |                     |                          |                          | nearest POP (active/standby).      |
+|              |                     |                          |                          | Direct Connect uses the backbone   |
+|              |                     |                          |                          | network and ensures the network    |
+|              |                     |                          |                          | SLA on OTC.                        |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+|              | Network flexibility | Low, billed annually,    | High, pay-per-use        | Medium, billed annually,           |
+|              |                     | bandwidth not easy to    | billing, can be adjusted | bandwidth can be flexibly adjusted |
+|              |                     | adjust                   | at any time              |                                    |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+|              | Hardware dependency | Universal router         | Universal VPN device     | Independently purchased SD-WAN     |
+|              |                     |                          |                          | devices                            |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+|              | Extended services   | None                     | None                     | Security and acceleration services |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+| Cost         | Line                | High, 10X                | Low, 1X                  | Medium, 3X                         |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+|              | Time required       | Months                   | Hours                    | Days (including equipment          |
+|              |                     |                          |                          | delivery)                          |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+|              | Maintenance         | Professional maintenance | Professional maintenance | No professional maintenance        |
+|              |                     | personnel, no unified    | personnel, no unified    | personnel required, easy           |
+|              |                     | management               | management               | on-premises configuration,         |
+|              |                     |                          |                          | unified cloud management, and      |
+|              |                     |                          |                          | guaranteed network quality         |
++--------------+---------------------+--------------------------+--------------------------+------------------------------------+
+ +Connect networks of on-premises data centers, campus, branches, and mobile offices to Open Telekom Cloud +******************************************************************************************************** + +.. 
image:: ../_static/images/image22.png + +On-premises data center/campus +`````````````````````````````` + +The on-premises data center or campus +is connected to Open Telekom Cloud securely and reliably, can access the +Internet through the resources on the cloud to reduce costs, and can +directly use public cloud services through the private network. + +**Solution 1:** Use Direct Connect (one or two private lines) to connect +to the Open Telekom Cloud public O&M VPC. + +**Solution 2:** Use IPsec VPN to connect to the Open Telekom Cloud public O&M VPC. + +Branches +```````` + +An enterprise with multiple branches needs to connect each +of its branch networks to Open Telekom Cloud in a secure and low-latency way. + +**Solution 1:** Use IPsec VPN to connect to the Open Telekom Cloud public O&M VPC. + +**Solution 2:** Use the VPN or the Direct Connect connection of the +on-premises data center or campus to connect to the Open Telekom Cloud public O&M +VPC. + +Mobile offices or stores +```````````````````````` + +An enterprise may have many offline stores +or branches. Business systems such as mobile POS systems need to +interact with cloud systems (such as ERP) in real time and require +robust encryption and low network latency. The security and stability of +a public network cannot be guaranteed, and Direct Connect is an +expensive solution. + +**Solution 1:** Use an SSL VPN to connect to the Open Telekom Cloud public O&M +VPC. + +**Solution 2:** Use the VPN or Direct Connect of the on-premises data +center or campus to connect to the Open Telekom Cloud public O&M VPC. + +Migration of on-premises servers to the cloud without IP address changes +```````````````````````````````````````````````````````````````````````` + +Sometimes, to ensure service continuity, the customer does +not want to change the IP addresses of their systems after the +systems are migrated to the cloud. + +**Solution:** T-Systems provides L2CG to allow cloud and on-premises +networks to communicate at Layer 2.
Migration can be performed on an IP +address basis and IP addresses can remain unchanged to ensure that +services are not affected during the migration. + +.. image:: ../_static/images/image23.png + +On-premises data center access through the private network +`````````````````````````````````````````````````````````` + +Instead of using an Internet connection, enterprise users access +cloud services (such as OBS, SWR, and API Gateway) directly over a +private network. + +**Solution:** After an on-premises data center connects to Open Telekom Cloud +through Direct Connect or VPN, the data center can access cloud services +through the private network using VPC endpoint, which is secure, +efficient, and cost-effective. + +.. image:: ../_static/images/image24.png + +On-premises data center accessing the Internet through Open Telekom Cloud +````````````````````````````````````````````````````````````````````````` + +If the public network used by the data center is of poor +quality and expensive, the data center can be connected to Open Telekom +Cloud through Direct Connect. Then it can use the resources on the +cloud to access the Internet, both reducing costs and improving the +quality of their public network access. + +**Solution 1:** With Direct Connect and DNAT, on-premises and cloud +networks share a public network egress. + +**Solution 2:** With Direct Connect and SNAT, campus offices share the +cloud public network egress for Internet access. + +.. image:: ../_static/images/image25.png + +Overlapping IP address ranges +````````````````````````````` + +Suppose the departments of an +enterprise use overlapping subnets. The enterprise has decided to +migrate to the cloud, but they want to keep the subnets unchanged, +and they need them to be able to communicate with each other after +they are migrated. + +**Solution:** Open Telekom Cloud provides private NAT gateways for private IP +address mapping. 
As shown in the following figure, you can create a +transit VPC and use a private NAT gateway to convert the IP address of a +service department. IP address 192.168.0.3, of the service department, +is mapped to 10.0.0.33, and 192.168.0.3, of the security department, is +mapped to 10.0.0.22. In this way, the two departments can communicate +with each other even though their subnets overlap. + +.. image:: ../_static/images/image26.png + +Cloud and on-premises load balancing +```````````````````````````````````` + +In this scenario, some +services are still in the on-premises equipment room and use ELB on +the cloud to provide services for external systems. Cloud resources +supplement the data center capacity to handle traffic peaks. + +**Solution:** Use ELB to distribute traffic requests for both cloud and +on-premises resources. + +.. image:: ../_static/images/image27.png + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/performance.rst b/doc/caf/source/ready/performance.rst new file mode 100644 index 0000000..d475613 --- /dev/null +++ b/doc/caf/source/ready/performance.rst @@ -0,0 +1,44 @@ +Performance +~~~~~~~~~~~ + +Performance is a key metric for any software system. It is also an +important part of cloud design. In performance design, scalability must +be considered as it is vital to high performance. In addition, solution +selection, performance measurement, performance monitoring, and +performance trade-off must also be considered. 
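Performance measurement starts with numbers that can be compared between runs. The sketch below times a workload and reports latency and throughput. It is a minimal illustration, not part of any Open Telekom Cloud tooling: ``measure`` and ``sample_workload`` are hypothetical names, and the workload (checksumming a 64 KiB buffer) is only a stand-in for a real request or query.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure(operation: Callable[[], int], runs: int = 50) -> Dict[str, float]:
    """Call `operation` `runs` times and summarize latency and throughput.

    `operation` is a hypothetical workload that returns the number of bytes
    it processed, so throughput can be reported in bytes per second.
    """
    latencies = []
    total_bytes = 0
    started = time.perf_counter()
    for _ in range(runs):
        t0 = time.perf_counter()
        total_bytes += operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started

    # Nearest-rank 95th percentile of the sorted per-call latencies.
    latencies.sort()
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": latencies[p95_index],
        "throughput_bytes_per_s": total_bytes / elapsed if elapsed > 0 else 0.0,
        "operations_per_s": runs / elapsed if elapsed > 0 else 0.0,
    }


def sample_workload() -> int:
    """Stand-in workload (an assumption): checksum a 64 KiB buffer."""
    payload = bytes(64 * 1024)
    sum(payload)
    return len(payload)


if __name__ == "__main__":
    for name, value in measure(sample_workload).items():
        print(f"{name}: {value:.6f}")
```

Comparing the mean with the 95th percentile is a quick way to see whether a few slow calls dominate what users experience.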
+ +Factors that affect the performance of cloud applications +********************************************************* + +- **Compute latency**: the wait time between operations and a direct reflection of cloud computing performance +- **Network throughput**: the rate at which data is transferred across the network +- **Transmission throughput (bytes/second or bits/second)**: the amount of data delivered per unit of time, a key measure of performance +- **Storage input/output operations per second (IOPS)**: the number of read and write operations that storage completes each second +- **Data concurrency**: the ability to handle multiple operations at the same time + +Solution selection +****************** + +- Select and combine the solutions that best suit your needs. +- Refine how you select solutions, and use measured data to optimize the choice of resources and configurations. + +Performance measurement +*********************** + +- Configure performance measurement and monitoring metrics. +- Trigger performance tests automatically after the fast-running tests complete. +- Use data visualization to identify performance issues, hotspots, wait states, or low utilization. + +Performance monitoring +********************** + +- Determine the monitoring scope, metrics, and thresholds. +- Create a full view from multiple dimensions. + +Performance tradeoffs +********************* + +- Strike a balance in the architecture for better performance, for example by using compression or caching techniques. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/scalability.rst b/doc/caf/source/ready/scalability.rst new file mode 100644 index 0000000..1832639 --- /dev/null +++ b/doc/caf/source/ready/scalability.rst @@ -0,0 +1,100 @@ +Scalability +~~~~~~~~~~~ + +Scalability on the Cloud +^^^^^^^^^^^^^^^^^^^^^^^^ + +Compared with traditional IDCs, the cloud has more abundant resources +and more powerful scalability. There are different types of scaling designed +for different service scenarios.
+ +- **Vertical scaling:** For monolithic applications, independent + applications, and stateful applications, hardware sometimes needs to + be rapidly upgraded to handle changes in demand as services develop. + For example, during promotions, these applications often require many + times more resources than normal. In this case, companies can use a + UI or open APIs to quickly upgrade resources by adding more vCPUs, + memory, bandwidth, and disk space, to handle increasing workloads. + After the promotion has ended, they can restore the resources to + the original specifications to keep costs down. + +- **Horizontal scaling:** For distributed applications, stateless + applications, and rapidly changing applications, resources allocated + in fixed ratios struggle to keep up with rapid changes in demand. + With horizontal scaling, applications can take advantage of the abundant + resources of the cloud to rapidly scale out or in based on + pre-configured scaling policies to handle traffic fluctuations both + during and after promotional activities. In addition, these + applications can keep obtaining more resources as + workloads keep rising. + +- **Extreme scaling:** To respond to unexpected traffic bursts, such as + those caused by major breaking news, a system needs to support extreme + expansion. When such events take place, it may need to + rapidly add thousands of compute cores. In this case, scaling on the + cloud is the best choice. + +There are several different ways to configure scaling: + +- **On a schedule:** You can create a scheduled task to scale resources + in or out at a specified time. + +- **Using metrics:** You can create an alarm-triggered task to monitor + resource performance metrics such as CPU usage and average network + traffic. When the monitored metric reaches the specified value, an + alarm is triggered and resource scaling is performed.
+ +- **Based on a configured range:** You can configure upper and lower + capacity limits. When the number of compute instances is below the + lower limit or above the upper limit, the system automatically adds + or removes instances so that the number of instances stays within the + configured range. + +- **Manually:** You can add or remove resources manually. + +Scalable Solution Design +^^^^^^^^^^^^^^^^^^^^^^^^ + +.. image:: ../_static/images/image39.png + +| + +Scalable capabilities can be designed by layer. The preceding figure +shows the scalability of Open Telekom Cloud services at different levels. The +scalable design of each layer is as follows: + +- **Application layer:** If this layer uses a microservice architecture and + container-based application deployment on Cloud Container Engine + (CCE) on Open Telekom Cloud, the auto scaling capability of CCE lets + applications scale out and in automatically on demand. When + auto scaling is triggered by alarms from AOM, service pods are + automatically added or removed in response to workload fluctuations. + If this layer is deployed using Open Telekom Cloud Elastic Cloud Server + (ECS), applications can automatically scale out or in based on + scaling policies configured on Auto Scaling. + +- **Message middleware layer:** Open Telekom Cloud Distributed Message Service + (DMS) for RabbitMQ Premium instances are deployed in clusters, and + can scale up or down as the message volume and workload change. + +- **Cache middleware layer:** The master/standby Distributed Cache Service + (DCS) for Redis instances from Open Telekom Cloud can scale up or down as + the hot data volume increases or decreases. + +- **Database middleware layer:** The distributed database middleware uses + Open Telekom Cloud Distributed Database Middleware (DDM), which is deployed + in a cluster.
With the increase of database services, the DDM cluster + specifications can be smoothly expanded to cope with more database + processing. +- **Database layer:** Open Telekom Cloud Relational Database Service (RDS) + supports smooth expansion of read-only database instances for + scenarios where much more data is read than written. By working with + DDM, multiple instances can be scaled out. Data in large tables is + horizontally split and evenly distributed to database instances, + improving database capacity and performance. In addition, GaussDB + databases use an architecture with decoupled storage and compute to + support minute-level horizontal expansion and reduce service + interruptions. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/security-and-compliance.rst b/doc/caf/source/ready/security-and-compliance.rst new file mode 100644 index 0000000..15000fb --- /dev/null +++ b/doc/caf/source/ready/security-and-compliance.rst @@ -0,0 +1,228 @@ +Security & Compliance +===================== + +Security +-------- + +The security of landing zone includes: + +- Virtual network security +- Host security +- Application security +- Data security +- Security management + +Virtual network security +************************ + +Virtual Private Cloud (VPC) allows you to build isolated, configurable, +and manageable virtual networks for Elastic Cloud Servers (ECSs), +improving the security of cloud resources and simplifying the network +deployment of service systems. + +.. image:: ../_static/images/image31.png + +| + +Related VPC functions: + +- **Subnet**: A subnet is a range of IP addresses in your VPC and provides + IP address management and DNS resolution functions for ECSs in it. By + default, ECSs in all subnets of the same VPC can communicate with one + another, but ECSs in different VPCs cannot. +- **Network ACL**: A network ACL allows you to create rules to control + traffic in and out of one or more subnets. 
+- **Security group**: A security group is a collection of access control + rules for ECSs that have the same security protection requirements + and that are mutually trusted within a VPC. You can define access + rules for ECSs in a security group and between security groups. +- **VPN**: A VPN is used to establish a secure and encrypted communication + channel between remote users and a VPC, allowing remote users to use + resources in the VPC. By default, ECSs in a VPC cannot communicate + with user data centers or private networks. To enable this + communication, you can use a VPN. +- **Direct Connect**: Direct Connect is a private line that connects + an on-premises data center to the cloud. Users can use Direct Connect to + connect their on-premises data centers, offices, or hosting areas to + the cloud, reducing network latency and obtaining a faster and more + secure network experience. + +Host security +************* + +.. warning:: + This doesn't exist in OTC. + +Host Security Service +````````````````````` + +Host Security Service (HSS) helps you identify and manage the assets +on your servers; manage programs, file integrity, security +operations, and vulnerabilities; check for unsafe settings; and +defend against risks, intrusions, and web page tampering in real +time. There are also advanced protection and security operations +functions available to help you easily detect and handle threats. HSS +functions include intrusion detection, file integrity management, +remote login monitoring, ransomware protection, unified asset +management, vulnerability management, baseline checks, web page +tamper-proofing, and user-defined policies. + +.. image:: ../_static/images/image32.png + +Container Guard Service +``````````````````````` + +Container Guard Service (CGS) scans for vulnerabilities in container +images, manages container security policies, and prevents container +escapes.
CGS can help you with vulnerability management, process +whitelisting, file protection, and runtime monitoring. + +Application security +******************** + +Web Application Firewall +```````````````````````` + +Web Application Firewall (WAF) helps you inspect and protect website +service traffic. WAF uses deep machine learning to identify malicious +requests and defend against unknown threats, blocking common attacks +such as SQL injections and cross-site scripting (XSS). WAF prevents +intrusions and attacks from affecting the availability and security +of web applications or consuming excessive resources, reducing the +risk of data tampering and theft. WAF enables HTTPS protection, IP +blacklist and whitelist, traffic blocking based on location +information, common web attack blocking, attack penalty, user-defined +access control, CC attack mitigation, zero-day vulnerability virtual +patching, dynamic anti-crawler, and alarm notification. + +.. image:: ../_static/images/image33.png + +Advanced Anti-DDoS +`````````````````` + +Advanced Anti-DDoS (AAD) is deployed at the network border of cloud +services. AAD can protect Internet servers (including off-cloud +servers), so that services will not be interrupted by heavy-traffic +DDoS attacks. You can divert attack traffic to a high-defense IP +address for cleaning, so your source servers will be stable and +reliable. AAD features include network defense, web application +defense, geographical location filtering, and traffic forwarding & +load balancing. + +Data security +************* + +Data Encryption Workshop +```````````````````````` + +.. warning:: + I see DEW everywhere in the documentation but I cannot find anything relevant in Console + +On the public cloud, Data Encryption Workshop (DEW) has been integrated +into multiple cloud services, such as EVS, OBS, and SFS. You can take +advantage of DEW APIs to develop your own encryption applications. 
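Key management integrations of this kind typically follow the envelope encryption pattern: a data key encrypts the payload, and a master key encrypts (wraps) the data key so that only the wrapped copy is stored next to the data. The sketch below illustrates that flow only. It does not call any DEW API, all function names are hypothetical, and the HMAC-based keystream is a toy stand-in chosen to keep the example dependency-free; a real application would use a vetted cipher and the key service itself.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode). Illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        stream.extend(block.digest())
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(a ^ b for a, b in zip(data, stream))


def envelope_encrypt(master_key: bytes, plaintext: bytes) -> dict:
    """Encrypt `plaintext` with a fresh data key, then wrap that data key."""
    data_key = secrets.token_bytes(32)
    data_nonce = secrets.token_bytes(16)
    key_nonce = secrets.token_bytes(16)
    return {
        "ciphertext": _keystream_xor(data_key, data_nonce, plaintext),
        "data_nonce": data_nonce,
        # Only the wrapped data key is stored; the plain data key is discarded.
        "wrapped_data_key": _keystream_xor(master_key, key_nonce, data_key),
        "key_nonce": key_nonce,
    }


def envelope_decrypt(master_key: bytes, envelope: dict) -> bytes:
    """Unwrap the data key with the master key, then decrypt the payload."""
    data_key = _keystream_xor(
        master_key, envelope["key_nonce"], envelope["wrapped_data_key"]
    )
    return _keystream_xor(data_key, envelope["data_nonce"], envelope["ciphertext"])


if __name__ == "__main__":
    master = secrets.token_bytes(32)  # stands in for a key held by the key service
    sealed = envelope_encrypt(master, b"example payload")
    print(envelope_decrypt(master, sealed))
```

The point of the pattern is that large payloads never travel to the key service; only the small data key is wrapped and unwrapped.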
+ +DEW encryption keys include data encryption keys (DEKs), customer +master keys (CMKs), and root keys. The following figure shows their +dependencies. + +.. image:: ../_static/images/image34.png + +DEKs are encrypted using CMKs, and CMKs are protected by root keys. A +root key is generated using a UKey when third-party hardware is +initialized and is available to users and cloud service providers. + +Database Security Service +````````````````````````` + +Database Security Service (DBSS) can detect SQL injection attacks, manage high-risk operations, and +audit databases. + +Security management +******************* + +Identity and Access Management +`````````````````````````````` + +Identity and Access Management (IAM) enables fine-grained +hierarchical authorization to control tenant operations and resource +usage under an enterprise account. You can configure password +policies, login policies, ACLs, and multi-factor authentication (MFA), +and manage permissions to prevent destructive operations from being +performed by individual users. + +Cloud Bastion Host +`````````````````` + +.. warning:: + CBH doesn't exist in OTC services, although there is one reference to it + in the documentation. Should we provide a solution in Terraform instead? + +Cloud Bastion Host (CBH) has various functional modules, such as +department, user, resource, policy, operation, and audit modules. It +integrates features such as single sign-on (SSO), unified asset +management, multi-terminal access protocols, file transfer, and +session collaboration. With the unified O&M login portal, +protocol-based forward proxy, and remote access isolation +technologies, CBH enables centralized, simplified, secure auditing +for cloud resources, such as servers, cloud hosts, databases, and +application systems.
+ +Compliance Audit +---------------- + +To ensure that the runtime environment of an enterprise meets the +security compliance requirements of countries, industries, and +enterprises after cloud migration, Landing Zone provides the following +security compliance measures: + +- **Separation of duties (SoD)**: A multi-account architecture is used for + SoD. Each account is an SoD unit. An enterprise can group accounts + based on service units, geographic units, and functional units. The + loss of any account does not affect the system as a whole, limiting + the impact radius. + +- **Operation audits**: Operation audits are enabled for each account so + that any access to a resource, by any entity, is + logged. All operations can be tracked. Audit logs of all accounts are + centrally stored and analyzed. + +- **Configuration change tracking**: Resource configuration recording is + enabled for each account to log any resource configuration changes. + All resource changes can be tracked. Change logs are centrally stored + and analyzed. + +- **Security guardrails**: There are two types of security guardrails: + security redlines and security baselines. Security redlines, also + called preventive security guardrails, forcibly restrict the + permissions of member accounts to avoid security risks caused by + excessive permissions. The fine-grained authorization mentioned + earlier can be used to set security redlines. Security baselines, + also called detective security guardrails, require that member + accounts meet basic security compliance requirements. For example, + MFA must be enabled for the root user and EVS disks must be + encrypted. + +.. 
note:: + + For details about the **complete security baseline check + items**, see the Open Telekom Cloud official website: https://support.huaweicloud.com/intl/en-us/usermanual-sa/sa_01_0021.html + +- **Unified identity and permissions management**: Users, user groups, and + permissions in a multi-account environment are centrally managed to + enable a user to access resources under multiple accounts. Unified + identity and permissions management reduces the workload of + permissions management and facilitates the development and + implementation of unified permissions standards within an enterprise, + reducing security risks caused by improper permissions assignment. + +- **Unified security control**: Security events and risks in a + multi-account environment are identified, processed, and analyzed, + and events are responded to and handled centrally. Unified security + control reduces the workload involved in security control and + facilitates the development and implementation of unified security + regulations within an enterprise. The implementation requires cloud + security services to support unified security management and control + of multiple accounts. + +.. toctree:: + :maxdepth: 1 \ No newline at end of file diff --git a/doc/caf/source/ready/security.rst b/doc/caf/source/ready/security.rst new file mode 100644 index 0000000..4b59afa --- /dev/null +++ b/doc/caf/source/ready/security.rst @@ -0,0 +1,126 @@ +Security +~~~~~~~~ + +Cloud customers need the following security capabilities: + +- Service continuity, such as network attack blocking, intrusion + prevention, and legal compliance. +- Data confidentiality, such as defense against unauthorized access + from external parties, insiders, and cloud service providers. +- Manageable O&M, such as security policies, risk identification and + handling, and operation audit and tracing. 
+ +The architecture is designed to enhance the following aspects of network +security: + +- Region boundaries + +- Boundary protection: controlled connections, illegal private + connection prevention, illegal external connection prevention, and + wireless access restriction + +- Access control policies + +- Intrusion prevention: known threat prevention, unknown threat + prevention, and intrusion audits + +- Malicious code prevention: malicious code detection and spam + filtering + +- System audits: user behavior audit, security event audit and analysis + +- Network communications + +- Network architecture: performance redundancy, link redundancy, device + redundancy, and partition isolation + +- Communication and transmission: The encryption technology is used to + ensure data confidentiality and integrity during transmission. + +- Computing environment + +- Identity identification: identity uniqueness and credential + complexity + +- Access control: user permissions management and redundant account + clearance + +- Security audit: user behavior audits and audit process protection + +- Intrusion prevention: intrusion detection, closing unused ports, and + vulnerability scans + +- Malicious code detection and blocking + +- Image integrity check and snapshot protection + +- Data integrity and confidentiality during transmission and storage. + +- Secure data destruction: When service application data is deleted, + all copies in the cloud storage need to be deleted too. + +- Management center + +- System management: identity authentication and system configuration + for system administrators + +- Audit management: permissions management and operation audits + +- Security management: permissions management and operation audits + +- Centralized management and control: independent secure partitions, + network monitoring, centralized log audit, and security event + awareness + +The following figure shows the security architecture on the cloud: + +.. 
image:: ../_static/images/image40.png + +| + +Abbreviations: + +- AAD: Advanced Anti-DDoS + +- Anti-DDoS: traffic scrubbing service + +- WAF: Web Application Firewall + +- ELB: Elastic Load Balance + +- OBS: Object Storage Service + +- EVS: Elastic Volume Service + +- SFS: Scalable File Service + +- SG: security group + +- NACL: network access control list + +- HSS: Host Security Service + +- CGS: Container Guard Service + +- DBSS: Database Security Service + +- DEW: Data Encryption Workshop + +- RDS: Relational Database Service + +- DCS: Distributed Cache Service + +- CBH: Cloud Bastion Host + +- CTS: Cloud Trace Service (used for auditing) + +- CES: Cloud Eye Service (used for monitoring) + +- IAM: Identity and Access Management (used for unified authentication) + +- SA: Situation Awareness + +- SCM: SSL Certificate Manager + +.. toctree:: + :maxdepth: 1 \ No newline at end of file