Application System Design Architecture Outline

Mindwatering Incorporated

Author: Tripp W Black

Created: 03/21 at 06:24 PM

 

Category:
General Web Tips
Other

Task:
Assemble a working outline or template from which design decisions can be made.
  • Core Summary (5 mins - hrs.)
    • High-level
    • What is it for? What need does it meet?
    • Where is it needed?
    • Why is it needed?
  • Feature Expectations: (5 mins - 1 hr)
    • Use cases
    • Use cases/scenarios not covered by this app
    • Who will use?
    • How many will use?
    • Usage Patterns (times, and methods)
    • Roles of types of users (readers/consumers, content creators, approvers, etc.)
  • Estimations (15 mins - hrs)
    • Throughput of NICs Receive/Send Ratio
      • Throughput by Queries/second (QPS)
      • Throughput by Size, Upload and Download, of those queries/second (MBs/sec or GBs/sec)
    • Throughput Read/Write Ratio
      • Write (QPS, volume of data)
      • Read (QPS, volume of data)
    • Latency
      • Expected latency for read/write queries
    • Storage Estimates
    • Memory Estimates
      • If a cache, how much cached in memory?
      • Is horizontal scaling (spreading the load across nodes) an option?
      • If a disk cache, why and how much to store?
      • Types of disks/SSDs needed based on answers above
    • CPU Estimates
      • Number of CPUs/node/VM
      • Number of CPUs total across nodes/VMs (horizontal scaling)
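The Estimations items above can be worked through as a quick back-of-envelope calculation. The sketch below is only an illustration: the `Workload` fields, the `estimate` helper, and every number in the example are assumptions, not figures from a real system.

```python
# Back-of-envelope capacity estimate. All names and inputs are hypothetical;
# replace them with measured or expected figures for the real application.
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    daily_active_users: int
    reads_per_user_per_day: int
    writes_per_user_per_day: int
    avg_read_bytes: int
    avg_write_bytes: int
    peak_factor: float   # peak traffic as a multiple of the daily average
    retention_days: int  # how long written data is kept


def estimate(w: Workload) -> dict:
    """Derive QPS, read/write ratio, NIC throughput, and storage from a workload."""
    read_qps = w.daily_active_users * w.reads_per_user_per_day / SECONDS_PER_DAY
    write_qps = w.daily_active_users * w.writes_per_user_per_day / SECONDS_PER_DAY
    bytes_written_per_day = write_qps * SECONDS_PER_DAY * w.avg_write_bytes
    return {
        "read_qps_avg": read_qps,
        "write_qps_avg": write_qps,
        "read_qps_peak": read_qps * w.peak_factor,
        "write_qps_peak": write_qps * w.peak_factor,
        "read_write_ratio": read_qps / write_qps if write_qps else float("inf"),
        # Send side of the NIC: data returned to readers at peak.
        "egress_mb_per_sec_peak": read_qps * w.peak_factor * w.avg_read_bytes / 1e6,
        # Receive side of the NIC: data uploaded by writers at peak.
        "ingress_mb_per_sec_peak": write_qps * w.peak_factor * w.avg_write_bytes / 1e6,
        # Raw storage only; replication and index overhead are not included.
        "storage_gb": bytes_written_per_day * w.retention_days / 1e9,
    }


example = Workload(
    daily_active_users=864_000,
    reads_per_user_per_day=100,
    writes_per_user_per_day=10,
    avg_read_bytes=50_000,
    avg_write_bytes=10_000,
    peak_factor=2.0,
    retention_days=365,
)
result = estimate(example)
```

With these assumed inputs the average load works out to 1,000 read QPS and 100 write QPS, a 10:1 read/write ratio. The memory, disk, and CPU questions above can then be sized against the peak figures rather than the averages.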
  • Design Performance Goals (15 mins)
    • Latency and Throughput minimum/maximum requirements for scaling
    • Consistency vs. Availability
      • Weak/strong --> eventual consistency
      • Failover/replication --> availability
  • High-Level Design (1 hr - hrs)
    • APIs for CRUD scenarios (read, create/write, update, delete) for main design elements (e.g. doc forms, views, reports, metrics, static content vs dynamic, etc.)
    • Containerization Platform Considerations
    • Database Schema
    • If performing Data Normalization/Machine Learning, design algorithms
      • Divide and Conquer (independent smaller subproblems) or Dynamic Programming (overlapping subproblems solved once and reused)
      • Streaming Algorithms when data too big to fit all in memory or data is a continuous feed
      • Hashing and Indexing Algorithms for large data lookups and insertions
    • High-level design (workflow process) for primary read scenario
    • High-level design (workflow process) for primary read/write scenario
    • High-level design (workflow process) for data management roles (e.g. approvers, data librarians, analysts performing searches for discoveries, etc.)
    • Data archiving policies and archive locations
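As one concrete instance of the "Streaming Algorithms" item above, the sketch below keeps a running mean and variance over a feed without holding the data in memory. It uses Welford's online algorithm, chosen here purely as an example; the `RunningStats` class name is hypothetical.

```python
# Streaming (single-pass) mean and variance using Welford's online algorithm.
# Memory use is constant no matter how many values arrive.
class RunningStats:
    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stand-in for a continuous feed
    stats.add(value)
```

The same shape (a small state object updated per record) applies to other streaming summaries such as counts, minimums, or sketches.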
  • Deeper-Dive Design (1 hr - hrs)
    • Scaling Code
      • Algorithms
      • App code base(s) ensuring they can scale horizontally/asymmetrically and vertically
    • Scaling App Components for Code
      • Availability, Consistency, for each App Component
        • Retries, Observability, and Reliability
      • Patterns of above, across all or sections of the App Components
    • App Components to Cover
      • DNS (internal and external)
      • CDN (Push vs. Pull)
      • Load Balancers (Active-Passive, Active-Active, Layer 4, Layer 7)
      • Reverse Proxy
      • Application Layer Scaling (Microservices, Service Discovery)
      • Database options:
        • RDBMS: ACID Properties, Primary-Secondary, Primary-Primary, Federation, Sharding, Denormalization, SQL Tuning - Postgres
          • Use-cases: Structured data with relationships
          • Index Scaling
        • NoSQL: Key-Value, Wide-Column, Document - Domino NSF, MongoDB, DynamoDB
          • Use-cases: Unstructured or semi-structured data
        • Graph: Neo4j, Amazon Neptune
          • Use-cases: Social networks, knowledge graphs, recommendation systems, and bioinformatics
        • NewSQL: SQL semantics and ACID Properties with NoSQL-style horizontal scaling - CockroachDB, Google Spanner, VoltDB
          • Use-cases: Transaction processing, real-time analytics and IoT device data
        • Time Series: Time-stamped data points - InfluxDB, TimescaleDB, Prometheus
          • Use-cases: IoT sensor data, financial market data, system metrics, and logs
        • High-dimensional Vector Data - Pinecone, Weaviate, KDB.AI
          • Use-cases: Machine learning, similarity search, and recommendation systems
        • Fast lookups:
          • RAM (Bounded size) => Redis, Memcached.
          • AP (Unbounded size) => Cassandra, Riak, Voldemort, DynamoDB (default mode)
          • CP (Unbounded size) => HBase, MongoDB, Couchbase, DynamoDB (consistent read setting)
      • Caches:
        • Cache Types:
          • Client caching
          • CDN caching
          • Web server caching
          • Database caching
          • Application caching
          • Query level caching
          • Object level caching
        • Cache Update Strategies:
          • Cache aside
          • Write through
          • Write behind
          • Refresh ahead
      • Asynchronism:
        • Message queues
        • Task queues
        • Back pressure
      • Network Communication:
        • TCP
        • UDP
      • Client to Server, Server to Server Communication Protocols:
        • TCP - HTTP/REST APIs
        • TCP - RPC
        • TCP - WebSockets
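The "Cache aside" item listed under Caches above can be sketched as follows: the application checks the cache first, falls back to the backing store on a miss, and then populates the cache itself. This is a minimal in-process illustration; the `CacheAside` class, its `load` callback, and the TTL default are assumptions, and a real deployment would more likely sit in front of Redis or Memcached.

```python
# Minimal cache-aside sketch: the application, not the cache, loads from the
# backing store on a miss and writes the result into the cache.
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    def __init__(
        self,
        load: Callable[[str], str],          # hypothetical backing-store lookup
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read the store, then populate the cache.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after writing to the backing store so readers do not see stale data."""
        self._entries.pop(key, None)
```

Usage: wrap the slow lookup, e.g. `cache = CacheAside(load=fetch_from_db)` where `fetch_from_db` is whatever function reads the database, then call `cache.get(key)` on reads and `cache.invalidate(key)` after writes. The hit and miss counters feed directly into the cache-sizing questions in the Estimations section.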
  • Justify (15 mins)
    • Throughput of Each Layer
    • Latency Caused Between Each Layer
    • Overall Latency Justification
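The Justify step above amounts to simple arithmetic: add up the latency each layer contributes and find the layer with the lowest throughput ceiling. The sketch below assumes every request passes through all layers in series; the `justify` helper, the layer names, and all numbers are made up for illustration.

```python
# Latency and throughput justification across layers traversed in series.
# Layer names and figures are illustrative assumptions only.
from typing import Dict, Tuple


def justify(
    layers: Dict[str, Tuple[float, float]],  # name -> (latency_ms, max_qps)
    target_latency_ms: float,
    target_qps: float,
) -> dict:
    total_latency = sum(latency for latency, _ in layers.values())
    # End-to-end throughput is capped by the slowest layer.
    bottleneck = min(layers, key=lambda name: layers[name][1])
    max_qps = layers[bottleneck][1]
    return {
        "total_latency_ms": total_latency,
        "meets_latency": total_latency <= target_latency_ms,
        "bottleneck_layer": bottleneck,
        "max_qps": max_qps,
        "meets_throughput": max_qps >= target_qps,
    }


layers = {
    "cdn": (5.0, 50_000.0),
    "load_balancer": (1.0, 20_000.0),
    "app": (30.0, 4_000.0),
    "cache": (2.0, 30_000.0),
    "database": (40.0, 1_500.0),
}
report = justify(layers, target_latency_ms=100.0, target_qps=2_000.0)
```

In this made-up example the latency target is met (78 ms against a 100 ms budget) but the database caps throughput below the target, which points back to the caching, sharding, or replication options in the Deeper-Dive section.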
  • Key Metrics to Measure (15 mins - 1 hr)
    • Identify Key Metrics relevant to your system's design:
      • Availability
      • Latency
      • Throttling
      • Request Patterns/Volume
      • Measure Customer Experience
      • App Component/Feature Specific Metrics
        • Search - Which keyword searches return empty results (a failure from the user/customer perspective)
    • Define metrics for infrastructure and tools/resources:
      • Grafana with Prometheus
      • AppDynamics
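Two of the key metrics above, availability and latency, can be computed from raw request records as sketched below. The function names are hypothetical, and the percentile uses the nearest-rank method; monitoring tools such as Prometheus may compute percentiles differently (for example from histogram buckets), so treat this only as a way to reason about the definitions.

```python
# Availability and percentile latency from raw samples (nearest-rank method).
import math
from typing import Sequence


def availability(successes: int, total: int) -> float:
    """Fraction of requests that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # stand-in for 100 measured request latencies
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
uptime = availability(successes=999, total=1000)
```

Tail percentiles (p95, p99) usually say more about customer experience than the average, since a small share of slow requests can hide behind a healthy mean.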
  • System Health Monitoring (15 mins - 1 hr)
    • Measure Apdex (Application Performance Index) and latency of microservices:
      • New Relic
      • AppDynamics
    • Monitoring health and performance:
      • Grafana with Prometheus
      • AppDynamics
    • Simulate Customer Experience:
      • Canaries - Pro-active detection of service degradation
  • Log Systems (15 mins - 1 hr)
    • Implement metrics gathering and visualization dashboards
    • Implement Log Collection and Analysis:
      • Elasticsearch, Logstash, Kibana (ELK)
      • Splunk
      • Logtail
  • Security (15 mins - 1 hr)
    • Firewall
    • TLS encryption of data in transit
    • Data encryption at rest
    • Authentication / Authorization
    • Limited Egress/Ingress Rules
    • Implementation of Least Privilege for Roles






