<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Well-Architected]]></title><description><![CDATA[Deep dives into building resilient, scalable and sustainable systems. Well-Architected explores the modern web and robust AWS infrastructure, focusing on best p]]></description><link>https://blog.hazya.dev</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1771109717685/b6bfb15b-a95a-4cc7-bb8a-724077e517aa.png</url><title>Well-Architected</title><link>https://blog.hazya.dev</link></image><generator>RSS for Node</generator><lastBuildDate>Mon, 20 Apr 2026 13:54:08 GMT</lastBuildDate><atom:link href="https://blog.hazya.dev/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Amazon S3 Files for Stateful Containers]]></title><description><![CDATA[Deploying stateless containers on AWS is a straightforward process, but deploying stateful applications on serverless containers has always been a bit of a "choose your own adventure" when it comes to]]></description><link>https://blog.hazya.dev/amazon-s3-files-for-stateful-containers</link><guid isPermaLink="true">https://blog.hazya.dev/amazon-s3-files-for-stateful-containers</guid><category><![CDATA[AWS]]></category><category><![CDATA[CloudComputing]]></category><category><![CDATA[serverless]]></category><category><![CDATA[WordPress]]></category><category><![CDATA[aws-cdk]]></category><dc:creator><![CDATA[Hasitha Wickramasinghe]]></dc:creator><pubDate>Sun, 19 Apr 2026 10:55:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/70e948f7-16ac-4fad-b67b-f16378adc1df.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Deploying stateless containers on AWS is a straightforward process, but deploying stateful applications on 
serverless containers has always been a bit of a "choose your own adventure" when it comes to storage. That's especially true for CMS applications like WordPress and Strapi, or any application that expects a writable shared filesystem. The database is only part of the story; user uploads, plugins, themes, and caches still need somewhere persistent to live.</p>
<p>Historically, that usually pushed you toward Amazon EFS, or toward application-specific workarounds like uploading media directly to S3 through a plugin. Both approaches can work, but they each come with tradeoffs.</p>
<p>Recently, AWS introduced <strong>Amazon S3 Files</strong>, a feature that allows you to mount S3 buckets as local file systems. In this post, we'll dive into a demo project that wires this up using the AWS CDK, using WordPress as an example workload.</p>
<blockquote>
<p><strong>Tip:</strong> While this demo uses WordPress, the strategy applies to any application requiring persistent, shared storage, from content management systems to data processing pipelines.</p>
</blockquote>
<h2>Why this Matters</h2>
<p>Containers on Fargate are ephemeral by design. If a task is replaced, anything written to the container's local filesystem disappears. That's fine for APIs and workers that keep all state in external services. It's not fine for platforms that write important data to disk.</p>
<p>WordPress is notoriously stateful. While the database handles your posts and metadata, the filesystem handles:</p>
<ul>
<li><p><strong>Media Uploads:</strong> Images and videos stored in <code>wp-content/uploads</code>.</p>
</li>
<li><p><strong>Customizations:</strong> Plugins and themes installed via the admin dashboard.</p>
</li>
</ul>
<p>In a traditional Fargate deployment, these files are ephemeral. If your container restarts or scales out, your data disappears.</p>
<p>The exact same problem shows up in self-hosted headless CMS platforms like Strapi, internal admin tools, ML workflows that generate artifacts, and legacy applications that assume a shared writable directory exists.</p>
<h2>What Amazon S3 Files Changes</h2>
<p><a href="https://aws.amazon.com/s3/features/files">Amazon S3 Files</a> provides a middle ground between the massive scale of S3 and the POSIX-compliant interface required by apps like WordPress.</p>
<p>The "secret sauce" here is that <strong>S3 Files is built using Amazon EFS</strong>. It acts as an intelligent cache layer that loads your active working set onto high-performance EFS storage for low latency, while the authoritative source of truth remains your S3 bucket.</p>
<p>When you read a file, it's lazily loaded from S3 into this cache. When you write, the data is saved to the high-performance layer and then synchronized back to S3. This means you get the performance of a real filesystem with the durability and ecosystem of S3.</p>
<h2>The Architecture in this Demo</h2>
<p>Instead of redesigning the application to speak directly to S3, we mount storage into the container and keep the application model unchanged. The Fargate tasks mount S3 Files at <code>/bitnami/wordpress</code>, which is where the Bitnami WordPress image expects its data.</p>
<p>The demo project provisions a robust infrastructure through five CDK stacks:</p>
<ul>
<li><p><code>VPC</code>: A VPC with public, private, and isolated subnets. It uses <strong>VPC Endpoints</strong> (S3, ECR, Secrets Manager) instead of NAT Gateways to keep service-to-service traffic entirely within the AWS network.</p>
</li>
<li><p><code>Database</code>: A MariaDB instance managed by RDS, tucked away in isolated subnets with credentials in Secrets Manager.</p>
</li>
<li><p><code>S3Files</code>: An S3 bucket backed by an S3 Files file system, access point, and mount targets.</p>
</li>
<li><p><code>ECR</code>: A private ECR repository seeded with the <a href="https://gallery.ecr.aws/bitnami/wordpress">Bitnami WordPress Image</a>.</p>
</li>
<li><p><code>App</code>: An <code>ApplicationLoadBalancedFargateService</code> running two WordPress tasks mounted to <code>S3Files</code>.</p>
</li>
</ul>
<p>Here is how the stacks and resources connect to each other:</p>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/66d69f2f-527d-497f-bc14-bba6803d56b4.png" alt="AWS ECS Fargate Architecture with Amazon S3 Files." style="display:block;margin:0 auto" />

<p>At the stack dependency level, <code>App</code> depends on <code>VPC</code>, <code>Database</code>, <code>S3Files</code>, and <code>ECR</code>. At runtime, the WordPress tasks pull the container image from ECR, read credentials from Secrets Manager, connect to MariaDB, and mount the S3 Files volume through the access point and mount targets.</p>
<p>The important part is the storage path:</p>
<ol>
<li><p>WordPress runs on ECS Fargate.</p>
</li>
<li><p>The task mounts an <code>S3 Files</code> volume.</p>
</li>
<li><p>That volume maps to an S3-backed file system.</p>
</li>
<li><p>The application writes to <code>/bitnami/wordpress</code> as if it were local persistent storage.</p>
</li>
</ol>
<p>This means uploads, plugins, themes, and similar filesystem state survive task replacement and can be shared across tasks.</p>
<blockquote>
<p><strong>Complete Source Code:</strong> You can find the full implementation, including all the CDK stacks, in the <a href="https://github.com/haZya/s3-files-demo-with-wp">GitHub Repository</a>.</p>
</blockquote>
<h2>Implementing with CDK</h2>
<p>Since S3 Files is a relatively new feature, the AWS CDK currently supports it primarily through <strong>L1 constructs</strong> (the low-level <code>Cfn</code> resources auto-generated from the CloudFormation schemas).</p>
<p>In our <code>S3FilesStack</code>, we define the <code>CfnFileSystem</code>, <code>CfnAccessPoint</code>, and <code>CfnMountTarget</code>. These are then wired into the ECS Task Definition in the <code>AppStack</code> using a bit of "escape hatch" patching to configure the <code>s3FilesVolumeConfiguration</code>.</p>
<p>The implementation breaks down into four practical steps.</p>
<h3>1. Create the Backing Bucket and S3 Files Resources</h3>
<p>In <code>lib/s3-files-stack.ts</code>, the stack creates a versioned S3 bucket and then wires that bucket into <code>aws-s3files</code> resources. We define the file system and the access point, which controls how users and groups interact with the files:</p>
<pre><code class="language-ts">// From lib/s3-files-stack.ts
this.s3FilesBucket = new Bucket(this, 'AppFilesBucket', {
  versioned: true,
  blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
});

this.s3FilesFileSystem = new CfnS3FileSystem(this, 'S3FileSystem', {
  bucket: this.s3FilesBucket.bucketArn,
  roleArn: s3FilesBucketRole.roleArn,
  acceptBucketWarning: true,
});

this.s3FilesAccessPoint = new CfnS3AccessPoint(this, 'S3FilesAccessPoint', {
  fileSystemId: this.s3FilesFileSystem.attrFileSystemId,
  posixUser: { gid: '1001', uid: '1001' },
  rootDirectory: {
    path: '/wordpress',
    creationPermissions: {
      ownerGid: '1001',
      ownerUid: '1001',
      permissions: '755',
    },
  },
});
</code></pre>
<p>This is the heart of the pattern:</p>
<ul>
<li><p>The bucket is the durable storage layer.</p>
</li>
<li><p><code>CfnS3FileSystem</code> exposes that storage as an S3 Files file system.</p>
</li>
<li><p><code>CfnS3AccessPoint</code> defines how the application will enter that file tree.</p>
</li>
</ul>
<p>The access point is configured with UID/GID <code>1001</code> and a root directory of <code>/wordpress</code>, which lines up with the filesystem expectations of the Bitnami WordPress container.</p>
<h3>2. Create Mount Targets Inside the VPC</h3>
<p>The file system still needs network reachability from the ECS tasks, so the stack creates mount targets in the <code>PRIVATE_WITH_EGRESS</code> subnets and opens NFS traffic on port <code>2049</code> from within the VPC:</p>
<pre><code class="language-ts">// From lib/s3-files-stack.ts
const mountTargetSecurityGroup = new SecurityGroup(this, 'S3FilesMountTargetSg', {
  description: 'S3 Files mount target SG',
  vpc,
});

mountTargetSecurityGroup.addIngressRule(
  Peer.ipv4(vpc.vpcCidrBlock),
  Port.tcp(2049),
  'Allow NFS from VPC',
);

const mountTargetSubnets = vpc
  .selectSubnets({ subnetType: SubnetType.PRIVATE_WITH_EGRESS, onePerAz: true })
  .subnets;

this.s3FilesMountTargets = mountTargetSubnets.map((subnet, index) =&gt; {
  return new CfnS3MountTarget(this, `S3FilesMountTarget${index + 1}`, {
    fileSystemId: this.s3FilesFileSystem.attrFileSystemId,
    subnetId: subnet.subnetId,
    securityGroups: [mountTargetSecurityGroup.securityGroupId],
  });
});
</code></pre>
<p>That is an important detail because this is not just an S3 bucket reference in ECS. The workload needs the network path to the mounted file system, so the mount target configuration matters.</p>
<h3>3. Give the Service and Tasks the Right Permissions (IAM)</h3>
<p>There are two sides to the permission model here:</p>
<p>First, <code>S3 Files</code> itself needs a specific service-linked role that lets the service synchronize data between the file system and the S3 bucket. In <code>lib/s3-files-stack.ts</code>, the role is assumed by <code>elasticfilesystem.amazonaws.com</code> and gets bucket, object, and EventBridge permissions:</p>
<pre><code class="language-ts">// From lib/s3-files-stack.ts
const s3FilesBucketRole = new Role(this, 'S3FilesBucketRole', {
  assumedBy: new ServicePrincipal('elasticfilesystem.amazonaws.com', {
    conditions: {
      StringEquals: { 'aws:SourceAccount': this.account },
      ArnLike: { 'aws:SourceArn': `arn:aws:s3files:${this.region}:${this.account}:file-system/*` },
    },
  }),
});

s3FilesBucketRole.addToPolicy(new PolicyStatement({
  actions: ['s3:ListBucket', 's3:GetObject*', 's3:PutObject*', 's3:DeleteObject*'],
  resources: [bucket.bucketArn, `${bucket.bucketArn}/*`],
}));
</code></pre>
<p>Second, the ECS task role needs permission to actually use the mounted file system. In <code>lib/app-stack.ts</code>, the task gets the AWS managed policy <code>AmazonS3FilesClientReadWriteAccess</code> plus explicit S3 read/list access to the bucket:</p>
<pre><code class="language-ts">// From lib/app-stack.ts
fargateService.taskDefinition.taskRole.addManagedPolicy(
  ManagedPolicy.fromAwsManagedPolicyName('AmazonS3FilesClientReadWriteAccess'),
);

fargateService.taskDefinition.taskRole.addToPrincipalPolicy(new PolicyStatement({
  sid: 'S3ObjectReadAccess',
  actions: ['s3:GetObject', 's3:GetObjectVersion'],
  resources: [`${bucketArn}/*`],
}));

fargateService.taskDefinition.taskRole.addToPrincipalPolicy(new PolicyStatement({
  sid: 'S3BucketListAccess',
  actions: ['s3:ListBucket'],
  resources: [bucketArn],
}));
</code></pre>
<p>This split is easy to miss when you first look at the feature. You need permissions for the service-side bucket integration and for the task-side runtime access.</p>
<h3>4. Patch the ECS Task Definition with <code>s3FilesVolumeConfiguration</code></h3>
<p>The ECS service itself is created with the familiar <code>ApplicationLoadBalancedFargateService</code> construct:</p>
<pre><code class="language-ts">// From lib/app-stack.ts
const fargateService = new ApplicationLoadBalancedFargateService(this, 'AppService', {
  serviceName: 'WordpressDemoService',
  cluster,
  cpu: 256,
  memoryLimitMiB: 512,
  desiredCount: 2,
  publicLoadBalancer: true,
  taskImageOptions: {
    containerName: 'WordpressDemoContainer',
    family: 'WordpressDemoTask',
    image: ContainerImage.fromEcrRepository(ecrRepo, 'latest'),
    containerPort: 8080,
  },
  minHealthyPercent: 100,
});
</code></pre>
<p>Then the default container gets a mount point:</p>
<pre><code class="language-ts">// From lib/app-stack.ts
const volumeName = 's3files';
const mountPath = '/bitnami/wordpress';  // App's data path

fargateService.taskDefinition.defaultContainer?.addMountPoints({
  containerPath: mountPath,
  sourceVolume: volumeName,
  readOnly: false,
});
</code></pre>
<p>And finally, since high-level L2 construct support is still evolving for this feature, we use a CDK <strong>Escape Hatch</strong>. This allows us to "drop down" to the underlying CloudFormation resource (<code>CfnTaskDefinition</code>) and manually configure the <code>s3FilesVolumeConfiguration</code> property:</p>
<pre><code class="language-ts">// From lib/app-stack.ts
const cfnTaskDefinition = fargateService.taskDefinition.node.defaultChild as CfnTaskDefinition;
const existingVolumes = Array.isArray(cfnTaskDefinition.volumes)
  ? cfnTaskDefinition.volumes
  : [];

cfnTaskDefinition.volumes = [
  ...existingVolumes,
  {
    name: volumeName,
    s3FilesVolumeConfiguration: {
      accessPointArn: s3FilesAccessPoint.attrAccessPointArn,
      fileSystemArn: s3FilesFileSystem.attrFileSystemArn,
      rootDirectory: '/',
    },
  },
];
</code></pre>
<blockquote>
<p><strong>Note:</strong> Using Escape Hatches is a standard practice in CDK when you need to use a new AWS feature before the high-level constructs have been updated to support it.</p>
</blockquote>
<h2>End-to-End Request Flow</h2>
<p>Once deployed, the runtime model is straightforward:</p>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/532a58b8-70d6-47c3-b235-4170113aefd5.png" alt="End-to-end request flow for WordPress App on AWS ECS Fargate with Amazon S3 Files." style="display:block;margin:0 auto" />

<ol>
<li><p>A request hits the Application Load Balancer.</p>
</li>
<li><p>One of the Fargate WordPress tasks handles it.</p>
</li>
<li><p>WordPress reads or writes persistent content under <code>/bitnami/wordpress</code>.</p>
</li>
<li><p>That path is backed by the S3 Files volume.</p>
</li>
<li><p>The durable backing store for that content is the S3 bucket from the <code>S3Files</code> stack.</p>
</li>
</ol>
<p>That means if a task is replaced or if the service scales horizontally, the content directory is still shared and persistent.</p>
<h2>Why the Mount Path Matters</h2>
<p>One subtle but important detail in this demo is that the S3 Files volume is mounted at <code>/bitnami/wordpress</code>, not at some arbitrary side directory.</p>
<p>That is what keeps the application model simple. The container image already expects its writable application data there, so the infrastructure is adapting to the app rather than forcing the app to adapt to the infrastructure.</p>
<p>That is a big part of why this same pattern can apply to other stateful applications. For example, in the case of Strapi, the mount path would be <code>/public/uploads</code>. If you can identify the directory that the application treats as its durable shared state, you can often mount <code>S3 Files</code> there and avoid more invasive application changes.</p>
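<p>As a hedged sketch, adapting the demo to a Strapi-style container would only change the mount path handed to the task definition; the volume wiring from the <code>S3Files</code> stack stays the same:</p>
<pre><code class="language-ts">// Hypothetical variant: same volume, different mount path for a Strapi container.
const volumeName = 's3files';
const mountPath = '/public/uploads'; // Strapi's writable uploads directory

fargateService.taskDefinition.defaultContainer?.addMountPoints({
  containerPath: mountPath,
  sourceVolume: volumeName,
  readOnly: false,
});
</code></pre>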
<h2>Why this is Attractive for Stateful Applications</h2>
<p>What I like about this approach is that it keeps the application architecture simple.</p>
<p>You do not need to teach WordPress how to store media in S3. You do not need to redesign the app around object storage APIs. And you do not need to split "database state" from "filesystem state" in an application-specific way just to make containers viable.</p>
<p>Instead, you can preserve the existing runtime assumptions:</p>
<ul>
<li><p>The app writes files.</p>
</li>
<li><p>Multiple tasks can share those files.</p>
</li>
<li><p>The data survives task churn.</p>
</li>
<li><p>The durable backing store is S3.</p>
</li>
</ul>
<p>That last point is especially appealing because S3 has operational advantages people already know how to use: lifecycle policies, replication strategies, inventory, access controls, and long-term storage economics.</p>
<blockquote>
<p><strong>Practical Note:</strong> Because S3 Files uses an EFS-backed cache, writes are available to other tasks almost instantly. However, the background synchronization to the S3 bucket itself can take 30 to 60 seconds. If you're checking the S3 console for your files immediately after an upload, don't panic if they don't show up right away!</p>
</blockquote>
<h2>How Does It Compare?</h2>
<p>When deciding how to handle WordPress storage, you generally have three paths:</p>
<h3>1. Amazon S3 Files (The New Way)</h3>
<ul>
<li><p><strong>Pros:</strong> The near-infinite scale of S3, lower cost than high-performance EFS tiers, and easier data management (it's just a bucket!).</p>
</li>
<li><p><strong>Cons:</strong> Currently requires lower-level configuration (L1 constructs) in CDK.</p>
</li>
</ul>
<h3>2. Amazon EFS (The Traditional Way)</h3>
<ul>
<li><p><strong>Pros:</strong> Mature, fully POSIX-compliant, and has excellent L2 construct support in CDK.</p>
</li>
<li><p><strong>Cons:</strong> More expensive than S3 for large amounts of "cold" media files, and managing EFS lifecycle policies can be more complex than S3.</p>
</li>
</ul>
<h3>3. "Offload Media" Plugins</h3>
<ul>
<li><p><strong>Pros:</strong> No complex infrastructure needed; plugins like <em>WP Offload Media</em> sync your <code>uploads</code> folder directly to S3 and rewrite URLs.</p>
</li>
<li><p><strong>Cons:</strong> Usually only handles media. Plugins and themes still need a persistent home, and these plugins often have a premium cost or require extra configuration within WordPress itself.</p>
</li>
</ul>
<h3>The "S3 Files" Advantage</h3>
<p>S3 Files is particularly powerful because it treats your bucket as the source of truth. You can use standard S3 features like <strong>Lifecycle Policies</strong>, <strong>Replication</strong>, and <strong>Inventory</strong> while your application thinks it's just writing to a local disk.</p>
<p>This makes it an ideal fit for:</p>
<ul>
<li><p><strong>Legacy Migrations:</strong> Apps that expect a filesystem, but you want to store data in S3.</p>
</li>
<li><p><strong>Shared Assets:</strong> Multiple containers needing access to a common set of images, logs, or configurations.</p>
</li>
<li><p><strong>Cost Management:</strong> Leveraging S3's low-cost storage classes for large datasets.</p>
</li>
</ul>
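<p>The cost-management point is concrete because the backing store is an ordinary bucket. As a hedged sketch, a lifecycle rule could be added to the demo's <code>AppFilesBucket</code> to tier older objects down; the 30-day threshold is illustrative, and it is worth verifying how S3 Files behaves with objects in non-standard storage classes before relying on this:</p>
<pre><code class="language-ts">// Illustrative only: tier objects to Infrequent Access after 30 days.
import { Duration, RemovalPolicy } from 'aws-cdk-lib';
import { BlockPublicAccess, Bucket, StorageClass } from 'aws-cdk-lib/aws-s3';

new Bucket(this, 'AppFilesBucket', {
  versioned: true,
  blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
  removalPolicy: RemovalPolicy.DESTROY,
  lifecycleRules: [
    {
      transitions: [
        {
          storageClass: StorageClass.INFREQUENT_ACCESS,
          transitionAfter: Duration.days(30),
        },
      ],
    },
  ],
});
</code></pre>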
<h2>The Future</h2>
<p>Using L1 constructs today feels a bit like "coding close to the metal," but it gives us access to this powerful feature right now. As the feature matures, we can expect the AWS CDK team to release <strong>L2 constructs</strong> that will make this integration much simpler.</p>
<h2>Taking it to Production</h2>
<p>While this demo gives you a solid foundation, there are a few "Day 2" enhancements you'll want to consider before going live:</p>
<ul>
<li><p><strong>CloudFront &amp; WAF:</strong> Place a CloudFront distribution in front of your ALB to cache static assets (like images from S3) at edge locations. This reduces the load on your Fargate tasks and saves you money on data transfer. Don't forget to attach <strong>AWS WAF</strong> to block SQL injection and cross-site scripting (XSS) attacks.</p>
</li>
<li><p><strong>Database High Availability:</strong> In this demo, we use a single MariaDB instance. For production, you should enable <strong>Multi-AZ</strong> for RDS to ensure your database can failover automatically if an Availability Zone goes down.</p>
</li>
<li><p><strong>Backup &amp; Recovery:</strong> While S3 is highly durable, you should still use <strong>AWS Backup</strong> or S3 Versioning to protect against accidental deletions or application-level corruption.</p>
</li>
<li><p><strong>Auto-scaling:</strong> Configure your Fargate service to scale the number of tasks automatically based on CPU or memory usage. This ensures your site stays responsive during traffic spikes without over-provisioning.</p>
</li>
<li><p><strong>Monitoring:</strong> Set up <strong>CloudWatch Alarms</strong> for your ALB's 5XX errors and RDS CPU utilization, so you're the first to know if something goes wrong.</p>
</li>
</ul>
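<p>The auto-scaling item maps directly onto the service construct already defined in <code>AppStack</code>. A minimal sketch, assuming the <code>fargateService</code> from earlier; the capacity and cooldown numbers are illustrative:</p>
<pre><code class="language-ts">// Illustrative target-tracking scaling for the existing Fargate service.
const scaling = fargateService.service.autoScaleTaskCount({
  minCapacity: 2,
  maxCapacity: 6,
});

scaling.scaleOnCpuUtilization('CpuScaling', {
  targetUtilizationPercent: 70,
  scaleInCooldown: Duration.seconds(120),
  scaleOutCooldown: Duration.seconds(60),
});
</code></pre>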
<h2>Conclusion</h2>
<p>Amazon S3 Files represents a significant step forward in simplifying stateful container architecture. By combining the infinite scale of S3 with the accessibility of a file system, it provides a flexible, cost-effective solution for stateful workloads across AWS compute services, including Lambda, EC2, ECS, EKS, Fargate, and Batch.</p>
<p>If you are working with a CMS, an internal platform, or any application that expects shared writable files, this is one of the most promising new AWS features to experiment with.</p>
<p>Check out the full source code for this demo project <a href="https://github.com/haZya/s3-files-demo-with-wp">here</a> and start modernizing your stateful apps today!</p>
]]></content:encoded></item><item><title><![CDATA[Deep Dive: Building a Secure, Event-Driven File Processing Pipeline with AWS CDK]]></title><description><![CDATA[When people say "file upload," they often mean a simple PUT to S3 and a database row. In a hobby project, that's fine. In production, it’s a liability.
Production-grade file processing requires answer]]></description><link>https://blog.hazya.dev/deep-dive-building-a-secure-event-driven-file-processing-pipeline-with-aws-cdk</link><guid isPermaLink="true">https://blog.hazya.dev/deep-dive-building-a-secure-event-driven-file-processing-pipeline-with-aws-cdk</guid><category><![CDATA[aws-cdk]]></category><category><![CDATA[cloud architecture]]></category><category><![CDATA[stepfunction]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[event-driven-architecture]]></category><dc:creator><![CDATA[Hasitha Wickramasinghe]]></dc:creator><pubDate>Fri, 03 Apr 2026 14:10:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/752f7f7b-7006-446e-8457-c0e571ac9c21.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When people say "file upload," they often mean a simple <code>PUT</code> to S3 and a database row. In a hobby project, that's fine. In production, it’s a liability.</p>
<p>Production-grade file processing requires answering a much harder set of questions:</p>
<ul>
<li><p>How do I isolate untrusted files from my clean assets?</p>
</li>
<li><p>Where does malware scanning fit without blocking the user?</p>
</li>
<li><p>How do I validate file types beyond the easily spoofed browser MIME type?</p>
</li>
<li><p>How do I fan out status updates to users and other services?</p>
</li>
<li><p>How do I replace old files safely without leaking orphaned objects?</p>
</li>
<li><p>How do I keep the whole system event-driven instead of building a tightly coupled "upload monolith"?</p>
</li>
</ul>
<p>In this article, we’ll walk through a self-contained file-processing microservice built with <strong>AWS CDK and TypeScript</strong>. We leverage S3, EventBridge, GuardDuty Malware Protection, Step Functions, and API Gateway WebSockets to build a pipeline that handles everything from ingestion to real-time status delivery.</p>
<blockquote>
<p><strong>Complete Source Code:</strong> You can find the full implementation, including all Lambda handlers and CDK stacks, in the <a href="https://github.com/haZya/upload-file-processing-pipeline">GitHub Repository</a>.</p>
</blockquote>
<h2>What We Are Building</h2>
<p>At a high level, the service follows a strict security-first workflow:</p>
<ol>
<li><p><strong>Ingestion:</strong> A client requests a presigned POST and uploads a file to a <strong>Staging Bucket</strong>.</p>
</li>
<li><p><strong>Registration:</strong> An S3 event triggers a Lambda to record the upload as <code>PENDING_SCAN</code> in DynamoDB.</p>
</li>
<li><p><strong>Security Gate:</strong> <strong>AWS GuardDuty Malware Protection</strong> scans the object asynchronously.</p>
</li>
<li><p><strong>Orchestration:</strong> A GuardDuty scan result event triggers an <strong>AWS Step Functions Express Workflow</strong>.</p>
</li>
<li><p><strong>Processing:</strong> The workflow validates the file signature, moves it to the <strong>Clean Bucket</strong>, transforms images (using <code>sharp</code>), updates metadata, and cleans up old files.</p>
</li>
<li><p><strong>Real-time Notify:</strong> Status changes are fanned out via <strong>EventBridge</strong> to a <strong>WebSocket API</strong> for immediate client feedback.</p>
</li>
</ol>
<p>This design ensures a clean separation between ingestion, security, orchestration, and real-time communication.</p>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/a3dfdac1-f4ec-42d2-8e5b-6467a335fd96.png" alt="Step Functions Graph." style="display:block;margin:0 auto" />
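<p>Step 4, routing the scan result into the workflow, can be sketched as an EventBridge rule. This is a hedged sketch rather than the repository's exact wiring: <code>stateMachine</code> is assumed to be the Express workflow from <code>lib/upload-processing-stack.ts</code>, and the detail-type string should be verified against the GuardDuty event documentation:</p>
<pre><code class="language-ts">// Hedged sketch: trigger the Step Functions workflow from GuardDuty scan results.
import { Rule } from "aws-cdk-lib/aws-events";
import { SfnStateMachine } from "aws-cdk-lib/aws-events-targets";

new Rule(this, "MalwareScanResultRule", {
  eventPattern: {
    source: ["aws.guardduty"],
    detailType: ["GuardDuty Malware Protection Object Scan Result"],
  },
  targets: [new SfnStateMachine(stateMachine)],
});
</code></pre>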

<h2>Architecture Overview</h2>
<p>The repository is organized into five modular CDK stacks, keeping domain concerns isolated:</p>
<ul>
<li><p><code>lib/storage-stack.ts</code> - Private S3 buckets + CloudFront distribution for serving assets.</p>
</li>
<li><p><code>lib/database-stack.ts</code> - DynamoDB tables for uploads and relation tracking.</p>
</li>
<li><p><code>lib/guard-duty-stack.ts</code> - GuardDuty Malware Protection for the staging bucket.</p>
</li>
<li><p><code>lib/upload-processing-stack.ts</code> - The orchestration core: EventBridge, Step Functions, and Lambdas.</p>
</li>
<li><p><code>lib/websocket-stack.ts</code> - API Gateway WebSocket API and connection tracking.</p>
</li>
</ul>
<p>The app wiring in <code>bin/app.ts</code> is intentionally simple, demonstrating the power of stack composition:</p>
<pre><code class="language-ts">const storageStack = new StorageStack(app, "Storage");
const databaseStack = new DatabaseStack(app, "Database");
const guardDutyStack = new GuardDutyStack(app, "GuardDuty", {
  stagingUploadBucket: storageStack.stagingUploadBucket,
});
const webSocketStack = new WebSocketStack(app, "WebSocket", {});

const uploadProcessingStack = new UploadProcessingStack(app, "UploadProcessing", {
  stagingUploadBucket: storageStack.stagingUploadBucket,
  uploadBucket: storageStack.uploadBucket,
  uploadsTable: databaseStack.uploadsTable,
  uploadRelationsTable: databaseStack.uploadRelationsTable,
  webSocket: webSocketStack,
});
</code></pre>
<h2>Why the Dual-bucket Strategy Matters</h2>
<p>The first important design choice is the storage layer. Instead of uploading directly into your final bucket, this service uses two buckets: a <strong>staging bucket</strong> for untrusted files and an <strong>upload bucket</strong> for processed, clean assets. This "DMZ" approach prevents your final storage from ever becoming a dumping ground for unscanned content.</p>
<p>Here is the essence of <code>lib/storage-stack.ts</code>:</p>
<pre><code class="language-ts">this.stagingUploadBucket = new Bucket(this, "StagingUploadBucket", {
  blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
  enforceSSL: true,
  minimumTLSVersion: 1.2,
  eventBridgeEnabled: true,
  removalPolicy: RemovalPolicy.DESTROY,
  autoDeleteObjects: true,
  lifecycleRules: [
    {
      abortIncompleteMultipartUploadAfter: Duration.days(1),
      expiration: Duration.days(7),
    },
  ],
});

this.uploadBucket = new Bucket(this, "UploadBucket", {
  blockPublicAccess: BlockPublicAccess.BLOCK_ALL,
  enforceSSL: true,
  minimumTLSVersion: 1.2,
});

const cachePolicy = new CachePolicy(this, "UploadBucketCachePolicy", {
    defaultTtl: Duration.days(7),
    minTtl: Duration.seconds(0),
    maxTtl: Duration.days(30),
});

new Distribution(this, "UploadBucketDistribution", {
  defaultBehavior: {
    origin: S3BucketOrigin.withOriginAccessControl(this.uploadBucket),
    viewerProtocolPolicy: ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
    cachePolicy,
  },
  minimumProtocolVersion: SecurityPolicyProtocol.TLS_V1_3_2025,
});
</code></pre>
<p>There are a few good ideas packed into this:</p>
<ul>
<li><p>The staging bucket is private and event-enabled.</p>
</li>
<li><p>Incomplete multipart uploads are cleaned up automatically.</p>
</li>
<li><p>The clean upload bucket is also private.</p>
</li>
<li><p>CloudFront uses Origin Access Control, so S3 stays off the public internet path.</p>
</li>
</ul>
<blockquote>
<p><strong>Pro Tip:</strong> While this setup enforces TLS and private access, you should consider adding <code>BucketEncryption.KMS</code> for defense-in-depth and granular auditing in production environments.</p>
</blockquote>
<p>A production-oriented version with a KMS key could look more like this:</p>
<pre><code class="language-ts">const key = new kms.Key(this, "UploadsKey", {
  enableKeyRotation: true,
});

const uploadBucket = new Bucket(this, "UploadBucket", {
  encryption: BucketEncryption.KMS,
  encryptionKey: key,
  bucketKeyEnabled: true,
  // ...Other options
});
</code></pre>
<h2>Modeling Uploads in DynamoDB</h2>
<p>The database layer splits responsibilities across two tables to handle different access patterns:</p>
<ul>
<li><p><code>UploadsTable</code> stores every upload event (the historical record).</p>
</li>
<li><p><code>UploadRelationsTable</code> tracks the current file for a specific business entity (e.g., "user:123:avatar").</p>
</li>
</ul>
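<p>The relation key itself is just a composite string. A tiny hypothetical helper illustrates the <code>entity:id:field</code> convention assumed above:</p>
<pre><code class="language-ts">// Hypothetical helper showing the "entity:id:field" relation-key convention.
function relationKey(entity: string, id: string, field: string): string {
  return `${entity}:${id}:${field}`;
}

console.log(relationKey("user", "123", "avatar")); // "user:123:avatar"
</code></pre>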
<p><code>lib/database-stack.ts</code> defines two useful GSIs on <code>UploadsTable</code>:</p>
<ul>
<li><p><code>ByRelation</code> for querying uploads by logical owner or relation.</p>
</li>
<li><p><code>ByStagingKey</code> for resolving the latest record tied to a staging object key.</p>
</li>
</ul>
<pre><code class="language-ts">this.uploadsTable = new TableV2(this, "UploadsTable", {
  partitionKey: { name: "uploadId", type: AttributeType.STRING },
  pointInTimeRecoverySpecification: {
    pointInTimeRecoveryEnabled: true,
  },
  globalSecondaryIndexes: [
    {
      indexName: "ByRelation",
      partitionKey: { name: "relationKey", type: AttributeType.STRING },
      sortKey: { name: "createdAt", type: AttributeType.STRING },
    },
    {
      indexName: "ByStagingKey",
      partitionKey: { name: "stagingKey", type: AttributeType.STRING },
      sortKey: { name: "createdAt", type: AttributeType.STRING },
    },
  ],
});

this.uploadRelationsTable = new TableV2(this, "UploadRelationsTable", {
    partitionKey: { name: "relationKey", type: AttributeType.STRING },
    removalPolicy: RemovalPolicy.DESTROY,
    pointInTimeRecoverySpecification: {
        pointInTimeRecoveryEnabled: true,
    },
});
</code></pre>
<p>This separation allows you to query "what happened to upload X" while simultaneously allowing the UI to instantly find the "current hero image" for a product.</p>
<p>The <code>lambda/upload/update-status.ts</code> handler uses that second table to atomically move a relation to the newest successful upload while keeping a reference to the previous one for cleanup.</p>
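<p>The handler itself isn't reproduced here, but the core of that swap can be sketched as a single <code>UpdateItem</code> whose input is built like this (table and attribute names are assumptions for illustration, not the repo's exact code):</p>
<pre><code class="language-ts">// Hypothetical sketch: build the UpdateItem input that points a relation
// at the newest successful upload. ReturnValues: "ALL_OLD" hands back the
// previous item in the same request, which is how the workflow learns the
// previousUploadId it should clean up later.
export function buildRelationSwap(relationKey: string, uploadId: string, now: string) {
  return {
    TableName: "UploadRelationsTable",
    Key: { relationKey },
    UpdateExpression: "SET uploadId = :u, updatedAt = :t",
    ExpressionAttributeValues: { ":u": uploadId, ":t": now },
    ReturnValues: "ALL_OLD" as const,
  };
}
</code></pre>
<p>In the real handler this input would be passed to a DynamoDB <code>UpdateCommand</code>; because the old item comes back atomically with the write, no separate read is needed to discover the replaced upload.</p>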
<h2>Ingestion: Generating the Presigned Upload</h2>
<p>The upload path starts in <code>lambda/upload/generate-presigned-post.ts</code>. A key detail here is binding metadata directly into the presigned POST conditions. By forcing <code>x-amz-meta-author-id</code> and <code>x-amz-meta-relation-key</code> into the upload itself, we ensure downstream processors never have to "guess" the context.</p>
<pre><code class="language-ts">const { url, fields } = await createPresignedPost(s3, {
  Bucket: stagingBucket,
  Key: key,
  Conditions: [
    ["content-length-range", 0, 100 * 1024 * 1024],
    ["starts-with", "$Content-Type", contentType ?? ""],
    ["eq", "$x-amz-meta-relation-key", relationKey],
    ["eq", "$x-amz-meta-author-id", userId],
  ],
  Fields: {
    ...(contentType ? { "Content-Type": contentType } : {}),
    "x-amz-meta-relation-key": relationKey,
    "x-amz-meta-author-id": userId,
  },
  Expires: 600,
});
</code></pre>
<h2>Security First: GuardDuty Malware Protection for S3</h2>
<p>We use <strong>GuardDuty Malware Protection for S3</strong> as our security gate. It scans objects asynchronously and emits EventBridge events, which is far more efficient and scalable than writing custom antivirus logic in Lambda.</p>
<p>From <code>lib/guard-duty-stack.ts</code>:</p>
<pre><code class="language-ts">const plan = new CfnMalwareProtectionPlan(this, "S3MalwareProtectionPlan", {
  role: role.roleArn,
  protectedResource: {
    s3Bucket: {
      bucketName: stagingUploadBucket.bucketName,
    },
  },
  actions: {
    tagging: { status: "ENABLED" },
  },
});
</code></pre>
<p>The stack also grants the GuardDuty service role the minimum S3, EventBridge, and optional KMS permissions it needs.</p>
<p>Why this design works well:</p>
<ul>
<li><p>S3 remains the system of record for raw files.</p>
</li>
<li><p>GuardDuty performs the malware check asynchronously.</p>
</li>
<li><p>The result shows up as an EventBridge event.</p>
</li>
<li><p>Object tagging can record scan outcomes such as <code>NO_THREATS_FOUND</code> and <code>THREATS_FOUND</code>.</p>
</li>
</ul>
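<p>The workflow trigger can then match on those scan-result events. As a hedged sketch of the EventBridge pattern (the exact detail shape should be verified against the events in your account; the bucket name is a placeholder):</p>
<pre><code class="language-ts">// Assumed shape of the pattern for GuardDuty Malware Protection for S3
// scan results. scanResultStatus is what the ScanResultOK choice state
// later branches on.
const scanResultPattern = {
  source: ["aws.guardduty"],
  "detail-type": ["GuardDuty Malware Protection Object Scan Result"],
  detail: {
    s3ObjectDetails: { bucketName: ["staging-upload-bucket"] },
    scanResultDetails: { scanResultStatus: ["NO_THREATS_FOUND", "THREATS_FOUND"] },
  },
};
</code></pre>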
<h2>EventBridge as the Backbone</h2>
<p>The service uses EventBridge in two different ways.</p>
<p>First, AWS service events kick off work:</p>
<ul>
<li><p>S3 <code>Object Created</code> events register the upload as pending.</p>
</li>
<li><p>GuardDuty malware scan result events start the Step Functions workflow.</p>
</li>
</ul>
<p>Second, the service emits its own internal domain events on a dedicated bus: <code>UploadStatusChanged</code>.</p>
<p>That gives the system a very clean shape: inbound events drive orchestration, and outbound events describe state changes for consumers.</p>
<pre><code class="language-ts">new Rule(this, "S3ObjectCreatedRule", {
  eventPattern: {
    source: ["aws.s3"],
    detailType: ["Object Created"],
    detail: {
      bucket: { name: [stagingUploadBucket.bucketName] },
    },
  },
  targets: [
    new LambdaFunction(registerUploadLambda, {
      event: RuleTargetInput.fromObject({
        bucket: EventField.fromPath("$.detail.bucket.name"),
        key: EventField.fromPath("$.detail.object.key"),
      }),
      retryAttempts: 4,
      deadLetterQueue: s3ObjectCreatedDeliveryDlq,
    }),
  ],
});
</code></pre>
<p>The <code>lambda/upload/register-upload.ts</code> handler calls <code>HeadObject</code>, reads metadata, and creates a DynamoDB row with a <code>PENDING_SCAN</code> status.</p>
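<p>A minimal sketch of the record that handler writes could look like this (field names are assumptions, modeled on the metadata bound into the presigned POST):</p>
<pre><code class="language-ts">import { randomUUID } from "crypto";

// Hypothetical shape of the row written after HeadObject succeeds: every
// upload starts life as PENDING_SCAN until GuardDuty reports a result.
export function buildPendingUpload(
  bucket: string,
  key: string,
  metadata: { [name: string]: string },
) {
  return {
    uploadId: randomUUID(),
    stagingBucket: bucket,
    stagingKey: key,
    relationKey: metadata["relation-key"],
    authorId: metadata["author-id"],
    status: "PENDING_SCAN",
    createdAt: new Date().toISOString(),
  };
}
</code></pre>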
<h2>Orchestration: Step Functions Express Workflow</h2>
<p>This is the "brain" of the system and the architectural center of the repo: it manages the complex branching logic of the pipeline.</p>
<ul>
<li><p><strong>Security Check:</strong> If GuardDuty finds a threat, delete the file and fail.</p>
</li>
<li><p><strong>Deep Validation:</strong> Use <code>file-type</code> to inspect the "magic numbers" of the file, ignoring spoofed MIME headers.</p>
</li>
<li><p><strong>Parallel Processing:</strong> Transform images (using <code>sharp</code>) and delete the staging file simultaneously.</p>
</li>
</ul>
<p>The workflow is triggered by GuardDuty scan results. If the object is clean, the pipeline validates the file, resolves the final key, copies the object to the clean bucket, optionally generates image variants, stores metadata, updates the upload record, emits a status event, and cleans up a previously replaced upload if needed.</p>
<p>If the object is malicious or invalid, it deletes the staging object and marks the upload as failed.</p>
<h3>Why Express Workflow?</h3>
<p>We use Express Workflows with <code>StateMachineType.EXPRESS</code> for this orchestration because file processing is a high-volume, short-lived task where latency and cost-efficiency are paramount. That said, Express Workflows do not support <a href="https://docs.aws.amazon.com/step-functions/latest/dg/connect-to-resource.html">long-running callback patterns</a> like <code>.waitForTaskToken</code> or <code>.sync</code>. If you need to wait for external human approval (HITL) or a long asynchronous job, switch to a Standard workflow.</p>
<h3>The Key Tasks in the Workflow</h3>
<p><code>lib/upload-processing-stack.ts</code> wires several Lambdas into the state machine:</p>
<ul>
<li><p><code>validate.ts</code> - checks file signature using <code>file-type</code>.</p>
</li>
<li><p><code>resolve-final-key.ts</code> - generates the durable object key.</p>
</li>
<li><p><code>transform-image.ts</code> - creates WebP derivatives with <code>sharp</code>.</p>
</li>
<li><p><code>add-metadata.ts</code> - persists final key, dimensions, and formats.</p>
</li>
<li><p><code>update-status.ts</code> - marks the upload successful or failed and updates the relation state.</p>
</li>
<li><p><code>cleanup-replaced-upload.ts</code> - deletes the previous file version if the relation now points elsewhere.</p>
</li>
</ul>
<p>It also uses direct AWS SDK integrations for S3 copy and delete operations, which keeps the workflow explicit and avoids extra Lambda glue.</p>
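<p>For reference, the direct-SDK copy step can be sketched in CDK roughly like this (construct IDs and state paths are assumptions, not the repo's exact definition):</p>
<pre><code class="language-ts">// Hedged sketch of a CallAwsService task that copies the scanned object
// from the staging bucket to the clean bucket without a Lambda in between.
const copyToUploadBucketTask = new CallAwsService(this, "CopyToUploadBucket", {
  service: "s3",
  action: "copyObject",
  parameters: {
    Bucket: uploadBucket.bucketName,
    CopySource: JsonPath.format(
      "{}/{}",
      JsonPath.stringAt("$.stagingBucket"),
      JsonPath.stringAt("$.stagingKey"),
    ),
    Key: JsonPath.stringAt("$.finalKey"),
  },
  iamResources: [
    uploadBucket.arnForObjects("*"),
    stagingUploadBucket.arnForObjects("*"),
  ],
  resultPath: JsonPath.DISCARD,
});
</code></pre>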
<h3>The Workflow Definition</h3>
<p>This is the part many teams hand-wave. This demo does not. The state machine is modeled directly in CDK:</p>
<pre><code class="language-ts">const coreWorkflow = new Choice(this, "ScanResultOK")
  .when(
    Condition.stringEquals("$.scanResultStatus", "NO_THREATS_FOUND"),
    validateFileTask.next(
      new Choice(this, "IsFileValid")
        .when(
          Condition.booleanEquals("$.isValid", true),
          resolveFinalKeyTask
            .next(copyToUploadBucketTask)
            .next(transformAndDelete),
        )
        .otherwise(deleteInvalidStagingObjectTask.next(markValidationFailedTask)),
    ),
  )
  .otherwise(deleteThreatStagingObjectTask.next(markThreatDetectedTask));

const definition = new Parallel(this, "MainWorkflowGroup", {
  outputPath: "$.[0]",
})
  .branch(coreWorkflow)
  .addCatch(
    updateUploadStatusFailureTask
      .next(uploadStatusEmitFailureOnCatch)
      .next(workflowFailState),
    {
      errors: [Errors.ALL],
      resultPath: "$.error",
    },
  )
  .next(postProcessingFlow);

const stateMachine = new StateMachine(this, "FileUploadStateMachine", {
  definitionBody: DefinitionBody.fromChainable(definition),
  timeout: Duration.minutes(5),
  stateMachineType: StateMachineType.EXPRESS,
  logs: {
    destination: logGroup,
    level: LogLevel.ALL,
    includeExecutionData: true,
  },
});
</code></pre>
<p>There are a few design choices here worth highlighting.</p>
<p>First, the workflow branches early on the security outcome. That means a malware-positive file never reaches the rest of the processing path.</p>
<p>Second, image transformation and staging-object deletion run inside a <code>Parallel</code> state:</p>
<pre><code class="language-ts">const transformAndDelete = new Parallel(this, "TransformAndDeleteStagingFile", {
  outputPath: "$.[0]",
});

transformAndDelete.branch(
  new Choice(this, "IsImage")
    .when(Condition.stringMatches("$.mime", "image/*"), transformImageTask)
    .otherwise(new Pass(this, "SkipTransformForNonImage"))
    .afterwards()
    .next(addMetadataTask),
);

transformAndDelete.branch(deleteStagingObjectOnCopySuccessTask);
</code></pre>
<p>That is a nice example of Step Functions as a real orchestration engine, not just a fancy switch statement.</p>
<p>Third, the stack centralizes retry policies for Lambda, S3, and EventBridge tasks:</p>
<pre><code class="language-ts">private addServiceRetry(task: CallAwsService | LambdaInvoke | EventBridgePutEvents, errors: string[]) {
  task.addRetry({
    errors,
    interval: Duration.seconds(2),
    backoffRate: 2,
    maxAttempts: 3,
  });
}
</code></pre>
<p>That small helper keeps resilience consistent across the workflow.</p>
<h2>File Validation Beyond the Content-Type Header</h2>
<p>One of the easiest upload mistakes is trusting the browser-provided MIME type. This demo does better. The <code>lambda/upload/validate.ts</code> handler reads the first few kilobytes of the object and uses <code>file-type</code> to inspect the file signature:</p>
<pre><code class="language-ts">const object = await s3.send(new GetObjectCommand({
  Bucket: bucket,
  Key: key,
  Range: "bytes=0-4095",
}));

const buffer = await object.Body.transformToByteArray();
const detected = await fileTypeFromBuffer(buffer);
const detectedMime = detected?.mime ?? null;
const mime = detectedMime ?? head.ContentType ?? "application/octet-stream";
const isValid = !(detectedMime &amp;&amp; head.ContentType &amp;&amp; detectedMime !== head.ContentType);
</code></pre>
<p>That is still lightweight, but it already closes a very common trust gap. If the content is invalid, the workflow deletes the staging object and marks the upload as <code>VALIDATION_FAILED</code>.</p>
<h2>Image Transformation and Derivative Generation</h2>
<p>If the file is an image, the workflow creates optimized WebP variants with <code>sharp</code>.</p>
<p>From <code>lambda/upload/transform-image.ts</code>:</p>
<pre><code class="language-ts">async function createVariant(width: number, prefix: string) {
  const transformed = await sharp(buffer)
    .resize({ width, withoutEnlargement: true })
    .webp({ quality: 75 })
    .toBuffer({ resolveWithObject: true });

  const parts = event.finalKey.split("/");
  const fileName = parts.pop()!;
  const newKey = [...parts, `${prefix}-${fileName}`].join("/");

  await s3.send(new PutObjectCommand({
    Bucket: uploadBucket,
    Key: newKey,
    Body: transformed.data,
    ContentType: "image/webp",
  }));

  return {
    key: newKey,
    width: transformed.info.width,
    height: transformed.info.height,
    size: transformed.info.size,
    mime: "image/webp",
  };
}
</code></pre>
<p>The handler returns the original dimensions plus a <code>formats</code> array, and <code>add-metadata.ts</code> persists that back into DynamoDB. That means the final upload record is not just "file uploaded"; it is an asset descriptor that a frontend can actually use.</p>
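<p>The persistence step can be sketched as a pure builder for that DynamoDB update (table and attribute names are assumptions):</p>
<pre><code class="language-ts">// Hypothetical shape of a generated variant, mirroring what createVariant
// returns above.
interface VariantFormat {
  key: string;
  width: number;
  height: number;
  size: number;
  mime: string;
}

// Build the UpdateItem input add-metadata.ts could send to enrich the
// upload record with its final key and derivative formats.
export function buildMetadataUpdate(
  uploadId: string,
  finalKey: string,
  formats: VariantFormat[],
) {
  return {
    TableName: "UploadsTable",
    Key: { uploadId },
    UpdateExpression: "SET finalKey = :k, formats = :f",
    ExpressionAttributeValues: { ":k": finalKey, ":f": formats },
  };
}
</code></pre>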
<h3>The <code>sharp</code> Challenge</h3>
<p>Bundling native modules like <code>sharp</code> for Lambda can be tricky. We use <code>NodejsFunction</code> with Docker-based bundling to ensure the binary is compiled for the Lambda Linux environment:</p>
<pre><code class="language-ts">bundling: {
  nodeModules: ["sharp"],
  forceDockerBundling: true,
  environment: {
    NPM_CONFIG_BIN_LINKS: "false",
  },
}
</code></pre>
<blockquote>
<p><strong>Note:</strong> If you're on Windows, you may have to set the <code>NPM_CONFIG_BIN_LINKS</code> environment variable to <code>false</code> to disable symlink creation.</p>
</blockquote>
<h2>Relation-aware Status Tracking and Atomic Replacement</h2>
<p>One of the more sophisticated patterns in this architecture is the decoupling of <strong>Uploads</strong> from <strong>Relations</strong>.</p>
<ul>
<li><p><strong>An Upload</strong> is an immutable historical record. It tracks the journey of a specific file from staging to final storage, including its metadata and processing results.</p>
</li>
<li><p><strong>A Relation</strong> is a logical pointer within your application domain (e.g., <code>user:42:avatar</code> or <code>product:99:hero-image</code>) that resolves to a specific, successful upload.</p>
</li>
</ul>
<h3>Why This Decoupling Matters</h3>
<p>In many systems, updating a file means overwriting the existing object or manually deleting the old one before uploading the new one. This often leads to race conditions, orphaned files, or broken UI states.</p>
<p>By using a dedicated <code>UploadRelationsTable</code>, we implement a <strong>fail-safe replacement strategy</strong>:</p>
<ol>
<li><p><strong>Late Binding:</strong> The application relation only updates <em>after</em> the entire processing pipeline succeeds.</p>
</li>
<li><p><strong>Reference Tracking:</strong> When <code>lambda/upload/update-status.ts</code> marks a new upload as complete, it updates the relation to point to the new asset while capturing the <code>previousUploadId</code>.</p>
</li>
<li><p><strong>Automatic Cleanup:</strong> This hand-off allows the <code>lambda/upload/cleanup-replaced-upload.ts</code> task to safely delete the old file and all its generated variants (WebP, thumbnails, etc.) without impacting the live application.</p>
</li>
</ol>
<p>This approach ensures that your application always points to a valid, scanned asset, and your storage never becomes a graveyard of abandoned "Version 1" files. It provides the benefits of object versioning without the complexity of S3 Bucket Versioning for your application logic.</p>
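<p>Because variant keys are derived deterministically from the original key, the replaced upload's objects can be reconstructed from its <code>finalKey</code> alone. A sketch of one way that derivation could look (prefixes are assumptions; in practice the stored <code>formats</code> array already records the exact variant keys):</p>
<pre><code class="language-ts">// Given a final key, list the original object plus its derived variants,
// mirroring the "prefix-fileName" naming used by the image transformer.
export function keysToDelete(finalKey: string, prefixes: string[]) {
  const parts = finalKey.split("/");
  const fileName = parts.pop()!;
  const variants = prefixes.map(function (p) {
    return parts.concat(p + "-" + fileName).join("/");
  });
  return [finalKey].concat(variants);
}
</code></pre>
<p>These keys would then be removed with a single S3 <code>DeleteObjects</code> call.</p>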
<h2>Real-time Communication with WebSockets</h2>
<p>The last mile is status delivery. Because the pipeline is asynchronous, the user needs immediate feedback. The service emits <code>UploadStatusChanged</code> events to an internal EventBridge bus. The <code>lambda/upload/emit-upload-status.ts</code> subscriber Lambda then looks up active connections in DynamoDB and pushes the update via <strong>API Gateway WebSockets</strong>.</p>
<pre><code class="language-ts">private createUploadStatusEmitTask(id: string, uploadStatusBus: EventBus) {
  return new EventBridgePutEvents(this, id, {
    entries: [
      {
        eventBus: uploadStatusBus,
        detailType: "UploadStatusChanged",
        detail: TaskInput.fromJsonPathAt("$"),
        source: "com.file-processing.upload",
      },
    ],
    resultPath: "$.eventBridgeResult",
  });
}
</code></pre>
<p>The WebSocket stack stores connections in DynamoDB with a TTL so stale sessions age out automatically.</p>
<p>From <code>lib/websocket-stack.ts</code>:</p>
<pre><code class="language-ts">const connectionsTable = new TableV2(this, "WebSocketConnectionsTable", {
  partitionKey: { name: "PK", type: AttributeType.STRING },
  sortKey: { name: "SK", type: AttributeType.STRING },
  timeToLiveAttribute: "ttl",
  pointInTimeRecoverySpecification: {
    pointInTimeRecoveryEnabled: true,
  },
  dynamoStream: StreamViewType.NEW_AND_OLD_IMAGES,
});
</code></pre>
<p>And the connection handler refreshes TTL on inbound messages to keep live sockets active.</p>
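<p>Pushing the update itself is a per-connection call. A hedged sketch of the payload construction (the message shape is an assumption):</p>
<pre><code class="language-ts">// Build the PostToConnection input the notifier Lambda could send for each
// live connection; Data must be bytes, so the JSON payload is encoded here.
export function buildStatusPush(connectionId: string, detail: object) {
  return {
    ConnectionId: connectionId,
    Data: Buffer.from(JSON.stringify({ type: "UploadStatusChanged", detail })),
  };
}
</code></pre>
<p>Each input would be sent with the <code>ApiGatewayManagementApi</code> client's <code>PostToConnectionCommand</code>; a <code>410 Gone</code> response is the cue to delete that connection row immediately rather than waiting for the TTL.</p>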
<blockquote>
<p><strong>Note:</strong> The <code>lambda/websocket/authorizer.ts</code> in the demo is explicitly mocked to return a hard-coded demo subject after "verification". For a real deployment, replace that with actual JWT verification against Cognito or your identity provider.</p>
</blockquote>
<h2>End-to-End Flow</h2>
<p>Here is the mental model I would use when explaining the pipeline:</p>
<pre><code class="language-text">Client
  -&gt; Request presigned POST
  -&gt; Upload file to staging bucket

S3 Object Created
  -&gt; EventBridge rule
  -&gt; Register-upload Lambda
  -&gt; DynamoDB status = PENDING_SCAN

GuardDuty scans staging object
  -&gt; EventBridge scan result
  -&gt; Step Functions Express workflow

If clean
  -&gt; Validate file signature
  -&gt; Resolve final key
  -&gt; Copy to clean bucket
  -&gt; Optionally transform image
  -&gt; Save metadata
  -&gt; Mark upload complete
  -&gt; Clean previous version
  -&gt; Emit UploadStatusChanged

If invalid or malicious
  -&gt; Delete staging object
  -&gt; Mark failed status
  -&gt; Emit UploadStatusChanged

Internal event bus
  -&gt; WebSocket notifier Lambda
  -&gt; Connected clients receive status update
</code></pre>
<h3>Step Functions Graph for a Successful File Upload</h3>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/9fd624b7-8f1a-4058-81f0-9a3703420688.png" alt="Step Functions success graph." style="display:block;margin:0 auto" />

<h3>WebSocket Notification for a Successful File Upload</h3>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/777da2fb-6f3e-4545-8a1f-166b510b7203.png" alt="WebSocket notification for success." style="display:block;margin:0 auto" />

<h2>Architectural Benefits</h2>
<p>The strength of this architecture lies in its strict adherence to the <strong>Single Responsibility Principle (SRP)</strong> at the infrastructure level. Rather than building a monolithic "Upload Lambda," we've distributed concerns across managed services that are natively optimized for their respective tasks.</p>
<ul>
<li><p><strong>Decoupled Storage (S3):</strong> S3 is used strictly for what it does best—durable, scalable object storage—rather than being cluttered with orchestration logic.</p>
</li>
<li><p><strong>Security as a Service (GuardDuty):</strong> By offloading malware scanning to GuardDuty, we eliminate the operational burden and scaling limits of maintaining custom antivirus signatures in Lambda.</p>
</li>
<li><p><strong>Explicit Orchestration (Step Functions):</strong> Complex branching, retries, and parallel execution are managed in a visual state machine, making the workflow observable and easy to modify without touching code.</p>
</li>
<li><p><strong>State vs. Event (DynamoDB &amp; EventBridge):</strong> DynamoDB maintains the source of truth for asset metadata, while EventBridge acts as the nervous system, routing state changes to downstream consumers without tight coupling.</p>
</li>
<li><p><strong>Asynchronous Communication (WebSockets):</strong> Real-time feedback is treated as a reactive side-effect of domain events, not a synchronous dependency of the processing pipeline.</p>
</li>
</ul>
<p>This modularity ensures that the service is not just easy to build, but easy to evolve. You can swap the image processor, add new security gates, or change how notifications are delivered without ever disrupting the core ingestion path.</p>
<h2>Scaling to Production: Roadmap and Optimizations</h2>
<p>While this architecture provides a robust foundation, transitioning to a high-scale production environment often requires additional optimizations for efficiency, cost, and observability.</p>
<h3>1. Extending the Processing Pipeline</h3>
<p>The Step Functions backbone makes it trivial to insert new processing stages. As your application grows, you can easily integrate:</p>
<ul>
<li><p><strong>AI/ML Insights:</strong> Add Amazon Rekognition for moderation or Textract for document analysis.</p>
</li>
<li><p><strong>Frontend Optimization:</strong> Generate <strong>Blurhash</strong> values for instant placeholders or low-resolution image previews.</p>
</li>
<li><p><strong>Rich Metadata:</strong> Extract EXIF data, strip sensitive location tags, or generate multi-format derivatives (AVIF, WebP, PDF).</p>
</li>
<li><p><strong>Compliance:</strong> Implement automated PII (Personally Identifiable Information) detection before files reach the clean bucket.</p>
</li>
</ul>
<h3>2. Implementing Content De-duplication</h3>
<p>To optimize storage costs and minimize redundant processing (like malware scanning or image transformation), implement <strong>content-addressable storage</strong>:</p>
<ol>
<li><p><strong>Client-side Hashing:</strong> Compute a SHA-256 digest on the client before upload.</p>
</li>
<li><p><strong>Dedupe Check:</strong> Query DynamoDB to see if the digest already exists in the <code>Clean Bucket</code>.</p>
</li>
<li><p><strong>Logical Mapping:</strong> If a match is found, create a new <code>Relation</code> record pointing to the existing <code>finalKey</code> instead of initiating a new upload.</p>
</li>
</ol>
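<p>The digest itself is straightforward; the interesting part is treating it as the lookup key. A minimal sketch:</p>
<pre><code class="language-ts">import { createHash } from "crypto";

// Content-addressable identity: two uploads with the same bytes produce
// the same digest, so the second can be short-circuited into a new
// Relation record pointing at the existing finalKey.
export function contentDigest(data: Buffer): string {
  return createHash("sha256").update(data).digest("hex");
}
</code></pre>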
<h3>3. Proactive Observability and Alarming</h3>
<p>Relying on DLQs is the first step, but production systems require active monitoring. Implement CloudWatch Alarms for:</p>
<ul>
<li><p><strong>Pipeline Failures:</strong> Monitor Step Functions <code>ExecutionsFailed</code> and <code>ExecutionsTimedOut</code>.</p>
</li>
<li><p><strong>Infrastructure Health:</strong> Track Lambda <code>Errors</code> and <code>Throttles</code>, specifically for the image processing and metadata tasks.</p>
</li>
<li><p><strong>Queue Backlog:</strong> Alarm when DLQ message counts exceed zero to ensure human intervention for unhandled edge cases.</p>
</li>
</ul>
<h3>4. Explicit Business-level Retries</h3>
<p>Beyond infrastructure retries, your workflow should handle transient business failures (e.g., a third-party moderation API being temporarily unavailable). Use Step Functions' <code>Retry</code> logic with custom error codes like <code>TransientServiceError</code> to implement sophisticated backoff strategies without cluttering your Lambda code.</p>
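<p>In CDK that amounts to one more retry policy on the relevant task; <code>moderationTask</code> here is a hypothetical example, and the error name is whatever your Lambda throws:</p>
<pre><code class="language-ts">// Retry only the transient business failure, with exponential backoff;
// any other error still falls through to the workflow's catch handler.
moderationTask.addRetry({
  errors: ["TransientServiceError"],
  interval: Duration.seconds(5),
  backoffRate: 2,
  maxAttempts: 4,
});
</code></pre>
<p>On the Lambda side, throwing an <code>Error</code> whose <code>name</code> is set to <code>TransientServiceError</code> is enough for Step Functions to match it.</p>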
<h3>5. Decoupling Notification Infrastructure</h3>
<p>In a multi-service architecture, WebSocket management should often be extracted into a dedicated <strong>Notification Microservice</strong>. The file-processing service would remain a pure producer, publishing <code>UploadStatusChanged</code> events, while the notification service handles delivery across multiple channels (WebSockets, Push Notifications, Email).</p>
<h3>6. Cross-Account Event Distribution</h3>
<p>As your organization grows, consumers of your file-processing events may live in different AWS accounts. Leverage <strong>EventBridge Bus-to-Bus routing</strong> to forward domain events to a central integration bus or directly to consumer accounts, maintaining a clean event-driven contract across team boundaries.</p>
<h2>Production Checklist</h2>
<ul>
<li><p>Replace mock authorizers with robust JWT/OIDC verification.</p>
</li>
<li><p>Implement customer-managed KMS keys for encryption at rest (S3 &amp; DynamoDB).</p>
</li>
<li><p>Protect the CloudFront distribution by associating it with a web ACL that includes AWS WAF managed rules and IP-based rate limiting.</p>
</li>
<li><p>Define granular IAM boundaries and S3 Bucket Policies (Least Privilege).</p>
</li>
<li><p>Add structured logging and X-Ray tracing for end-to-end correlation.</p>
</li>
<li><p>Enforce S3 Object Lock or Versioning for compliance-heavy domains.</p>
</li>
</ul>
<h2>Final Thoughts</h2>
<p>In modern web applications, handling file uploads is a multi-stage challenge. You need to ensure security, perform transformations, and keep the user informed, all while maintaining a serverless, cost-effective footprint.</p>
<p>The beauty of this architecture lies in <strong>AWS Service Alignment</strong>. We aren't fighting the platform; we're using each service for what it's naturally good at:</p>
<ul>
<li><p><strong>S3</strong> for durable storage.</p>
</li>
<li><p><strong>GuardDuty</strong> for specialized security.</p>
</li>
<li><p><strong>Step Functions</strong> for stateful orchestration.</p>
</li>
<li><p><strong>EventBridge</strong> for decoupled, event-driven communication.</p>
</li>
</ul>
<p>By separating ingestion from processing and using security as a gatekeeper, you create a system that is secure by default, observable, and ready to scale.</p>
<p>👉 <a href="https://github.com/haZya/upload-file-processing-pipeline"><strong>View the full project on GitHub</strong></a></p>
]]></content:encoded></item><item><title><![CDATA[Mastering Cross-Account SNS to SQS Subscriptions with AWS CDK]]></title><description><![CDATA[In a modern microservices architecture, decoupled communication is king. AWS SNS (Simple Notification Service) and SQS (Simple Queue Service) are the bedrock of this decoupling. But as your organizati]]></description><link>https://blog.hazya.dev/mastering-cross-account-sns-to-sqs-subscriptions-with-aws-cdk</link><guid isPermaLink="true">https://blog.hazya.dev/mastering-cross-account-sns-to-sqs-subscriptions-with-aws-cdk</guid><category><![CDATA[AWS]]></category><category><![CDATA[aws-cdk]]></category><category><![CDATA[cloud architecture]]></category><category><![CDATA[Microservices]]></category><dc:creator><![CDATA[Hasitha Wickramasinghe]]></dc:creator><pubDate>Sun, 22 Mar 2026 00:33:15 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/2481c863-7673-434a-8b9c-47e9b13736ef.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a modern microservices architecture, decoupled communication is king. AWS SNS (Simple Notification Service) and SQS (Simple Queue Service) are the bedrock of this decoupling. But as your organization grows and you move toward a multi-account strategy, you'll inevitably hit a common hurdle: <strong>How do you subscribe an SQS queue in Account B to an SNS topic in Account A?</strong></p>
<p>Cross-account subscriptions can be tricky because of permission boundaries and the "Pending Confirmation" dance. In this post, we’ll explore two ways to handle this using AWS CDK, and why one is clearly superior.</p>
<h2>The Scenario</h2>
<p>Imagine a <strong>Core Service</strong> in Account A that publishes "Task Status" events to an SNS Topic. A <strong>Consumer Service</strong> in Account B needs to process these events via an SQS Queue.</p>
<blockquote>
<p><strong>Important:</strong> For native SNS-to-SQS subscriptions, make sure the Topic and Queue are in the <strong>same AWS Region</strong>.</p>
</blockquote>
<hr />
<h2>Method 1: Subscribe as the Queue Owner (Recommended)</h2>
<p>In this approach, the owner of the SNS Topic (Account A) "opens the door" by granting permission, and the owner of the SQS Queue (Account B) "walks in" by creating the subscription.</p>
<h3>Granting Consumer Account the Permission to Subscribe</h3>
<p>Instead of managing every single subscription, the topic owner grants <code>sns:Subscribe</code> permissions to the consumer account.</p>
<p><strong>CDK Implementation (Account A):</strong></p>
<pre><code class="language-typescript">const topic = new sns.Topic(this, "StatusTopic", {
  fifo: true,
  contentBasedDeduplication: true,
});

// Grant a specific account permission to subscribe
topic.grantSubscribe(new iam.AccountPrincipal("123456789012"));

// OR: If you are in an AWS Organization, grant access to the entire Org
topic.addToResourcePolicy(new iam.PolicyStatement({
  actions: ["sns:Subscribe"],
  resources: [topic.topicArn],
  principals: [new iam.AnyPrincipal()],
  conditions: {
    StringEquals: { "aws:PrincipalOrgID": "o-xxxxxxxxxx" }
  }
}));
</code></pre>
<h3>Subscribing SQS Queue to the Topic</h3>
<p>Now, the consumer can subscribe their queue to the topic using the Topic ARN. Since they have permission on the topic and own the queue, the subscription is confirmed <strong>instantly</strong>.</p>
<p><strong>CDK Implementation (Account B):</strong></p>
<pre><code class="language-typescript">const queue = new sqs.Queue(this, "ConsumerQueue", { fifo: true });
const topicArn = "arn:aws:sns:us-east-1:111122223333:StatusTopic.fifo";

const topic = sns.Topic.fromTopicArn(this, "ImportedTopic", topicArn);

topic.addSubscription(new subs.SqsSubscription(queue, {
  rawMessageDelivery: true,
  filterPolicy: {
    status: sns.SubscriptionFilter.stringFilter({ allowlist: ["CREATED"] }),
  },
}));
</code></pre>
<blockquote>
<p><strong>Deployment tip:</strong> Deploy Account A (Topic policy/grants) before Account B creates the subscription.</p>
</blockquote>
<h3>The "Self-Service" Advantage</h3>
<p>This is the <strong>Gold Standard</strong> for microservices. Account A provides the "Event Bus" (the Topic), and consumers can:</p>
<ul>
<li><p>Create as many queues as they need.</p>
</li>
<li><p>Define their own Filter Policies without bothering the producer.</p>
</li>
<li><p>Manage their own DLQs and retry logic.</p>
</li>
<li><p>Scale independently without any manual "confirmation" steps.</p>
</li>
</ul>
<hr />
<h2>Method 2: Subscribe as the Topic Owner</h2>
<p>Here, the producer in Account A explicitly creates the subscription to the remote queue in Account B.</p>
<h3>Creating a Subscription for the Remote SQS Queue</h3>
<p>Instead of allowing the consumers to initiate the subscriptions, the topic owner creates and manages the subscriptions for the consumers.</p>
<p><strong>CDK Implementation (Account A):</strong></p>
<pre><code class="language-typescript">const queue = sqs.Queue.fromQueueArn(this, "RemoteQueue", "arn:aws:sqs:us-east-1:444455556666:ConsumerQueue.fifo");

topic.addSubscription(new subs.SqsSubscription(queue));
</code></pre>
<h3>The "Symmetric Permission" Requirement</h3>
<p>For this to work, you must also update the <strong>SQS Queue Policy</strong> in Account B to allow the SNS Topic to send messages.</p>
<p><strong>CDK Implementation (Account B):</strong></p>
<pre><code class="language-typescript">// In Account B's CDK code:
queue.addToResourcePolicy(new iam.PolicyStatement({
  actions: ["sqs:SendMessage"],
  resources: [queue.queueArn],
  principals: [new iam.ServicePrincipal("sns.amazonaws.com")],
  conditions: {
    ArnEquals: { "aws:SourceArn": "arn:aws:sns:us-east-1:111122223333:StatusTopic.fifo" },
    StringEquals: { "aws:SourceAccount": "111122223333" }
  }
}));
</code></pre>
<p><strong>The Catch:</strong> When the producer creates the subscription cross-account, it enters a <code>Pending Confirmation</code> state. AWS sends a <code>SubscriptionConfirmation</code> message to the SQS queue. The consumer then needs a confirmation step (manual or automated, e.g., a Lambda that calls <code>ConfirmSubscription</code>). This introduces extra coupling and operational complexity versus Method 1.</p>
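<p>If you automate it, the confirmation Lambda's first job is just parsing the notification out of the SQS record body. A hedged sketch of that step (the field names follow the documented SNS <code>SubscriptionConfirmation</code> message format):</p>
<pre><code class="language-typescript">// Pull the topic ARN and confirmation token out of an SQS record body.
// Returns null for ordinary notifications so they flow through untouched.
export function extractConfirmation(body: string) {
  const msg = JSON.parse(body);
  if (msg.Type !== "SubscriptionConfirmation") {
    return null;
  }
  return { topicArn: msg.TopicArn, token: msg.Token };
}
</code></pre>
<p>The extracted pair would then be passed to the SNS <code>ConfirmSubscription</code> API to move the subscription out of <code>Pending Confirmation</code>.</p>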
<h3>Subscription confirmation</h3>
<blockquote>
<p><strong>Step 1:</strong> In Account B, poll the queue to find the subscription confirmation message.</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/29fdb864-c211-478d-aeba-dbdbd0e489b4.png" alt="SQS message with subscription confirmation URL." style="display:block;margin:0 auto" />

<blockquote>
<p><strong>Step 2:</strong> Extract the <code>SubscribeURL</code> from the message and enter it in a browser to confirm.</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/928f5931-d513-4831-82c9-08367bb5c6be.png" alt="Subscription confirmation result." style="display:block;margin:0 auto" />

<blockquote>
<p><strong>Step 3:</strong> Now in Account A, verify that the subscription is confirmed.</p>
</blockquote>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/54356be0-642b-4199-ad61-f440c2328134.png" alt="Verify subscription confirmation." style="display:block;margin:0 auto" />

<hr />
<h2>Essential Best Practices</h2>
<h3>1. Use Filter Policies</h3>
<p>Don't make your consumers process every single message. Use <code>SubscriptionFilter</code> to ensure they only get what they need.</p>
<pre><code class="language-typescript">topic.addSubscription(new subs.SqsSubscription(queue, {
  filterPolicy: {
    status: sns.SubscriptionFilter.stringFilter({
      allowlist: ["CREATED", "COMPLETED"],
    }),
  },
}));
</code></pre>
<h3>2. Dead-Letter Queues (DLQs)</h3>
<p>Always attach a DLQ to your SQS queue. If a message fails to process after several retries, it moves to the DLQ instead of blocking the queue.</p>
<pre><code class="language-typescript">const dlq = new sqs.Queue(this, "DeadLetterQueue", {
  fifo: true,
  retentionPeriod: cdk.Duration.days(14),
});

const queue = new sqs.Queue(this, "ConsumerQueue", {
  fifo: true,
  deadLetterQueue: {
    queue: dlq,
    maxReceiveCount: 5, // Move to DLQ after 5 failed attempts
  },
});
</code></pre>
<h3>3. Encryption with KMS</h3>
<p>When using Customer Managed Keys (CMKs) across accounts, KMS permissions depend on the exact service interactions:</p>
<ul>
<li><p><strong>SNS-Encrypted Topics:</strong> Your Publishers (IAM roles or AWS services) must have <code>kms:GenerateDataKey*</code> and <code>kms:Decrypt</code> on the Topic's key to encrypt payloads. The <code>sns.amazonaws.com</code> principal does not need permissions on this key to fan out messages.</p>
</li>
<li><p><strong>SQS-Encrypted Queues:</strong> Because SNS writes to the queue, you must grant the <code>sns.amazonaws.com</code> service principal <code>kms:GenerateDataKey*</code> and <code>kms:Decrypt</code> on the Queue's key. Downstream Consumers also require <code>kms:Decrypt</code> on this key to read the messages.</p>
</li>
<li><p><strong>Cross-Account Setup:</strong> Access requires two-way trust. You must explicitly allow the cross-account actions in both the KMS Key Policy (in the key-owning account) and the IAM Policy (in the calling account). If either is missing, access fails.</p>
</li>
</ul>
<p><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-least-privilege-policy.html#sqs-use-case-publish-messages-permissions">Grant Amazon SNS KMS permissions to Amazon SNS to publish messages to the queue</a></p>
<pre><code class="language-typescript">// Create the Customer Managed Key (CMK) for the SQS Queue
const queueKey = new kms.Key(this, "QueueKey", {
  enableKeyRotation: true,
  alias: "alias/my-queue-key",
});

// Allow SNS Service Principal to use the Queue's KMS Key
queueKey.addToResourcePolicy(new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  principals: [new iam.ServicePrincipal("sns.amazonaws.com")],
  actions: ["kms:GenerateDataKey*", "kms:Decrypt"],
  resources: ["*"], // Evaluates to this specific key
  conditions: {
    // Security Best Practice: Restrict to the specific cross-account topic
    ArnEquals: { "aws:SourceArn": crossAccountTopicArn }
  }
}));
</code></pre>
<p><a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-least-privilege-policy.html#sqs-restrict-transmission-to-topic">Restrict message transmission to a specific Amazon SNS topic</a></p>
<pre><code class="language-typescript">// Create the Encrypted SQS Queue
const queue = new sqs.Queue(this, "MyEncryptedQueue", {
  encryption: sqs.QueueEncryption.KMS,
  encryptionMasterKey: queueKey,
});

// Allow the cross-account SNS topic to publish messages to the Queue
queue.addToResourcePolicy(new iam.PolicyStatement({
  effect: iam.Effect.ALLOW,
  principals: [new iam.ServicePrincipal("sns.amazonaws.com")],
  actions: ["sqs:SendMessage"],
  resources: [queue.queueArn],
  conditions: {
    ArnEquals: { "aws:SourceArn": crossAccountTopicArn }
  }
}));
</code></pre>
<blockquote>
<p><strong>Tip:</strong> The <code>ArnEquals: { 'aws:SourceArn': crossAccountTopicArn }</code> condition is crucial to prevent the <a href="https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-least-privilege-policy.html#sqs-confused-deputy-prevention">confused deputy problem</a>, ensuring only your specific topic can use this key and queue.</p>
</blockquote>
<h3>4. FIFO Symmetry</h3>
<p>If you're using FIFO (as in the examples), keep the Topic and the Queue symmetric. A standard Topic cannot deliver to a FIFO Queue, and although SNS now allows a FIFO Topic to fan out to a standard Queue, the ordering and deduplication guarantees are lost on that path. Also, make sure your publishers provide a <code>MessageGroupId</code>, which is required for every message sent to a FIFO Topic.</p>
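<p>Since FIFO topics and queues are identified by the <code>.fifo</code> name suffix that AWS requires, a small guard (illustrative only, not part of the CDK) can catch mismatched pairs before deployment:</p>

```typescript
// Illustrative helper: both names must agree on the ".fifo" suffix.
function assertFifoSymmetry(topicName: string, queueName: string) {
  const topicIsFifo = topicName.endsWith(".fifo");
  const queueIsFifo = queueName.endsWith(".fifo");
  if (topicIsFifo !== queueIsFifo) {
    throw new Error(
      `FIFO mismatch: "${topicName}" and "${queueName}" must both be FIFO or both be standard`
    );
  }
}
```
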
<hr />
<h2>Summary</h2>
<p>Cross-account messaging is a powerful pattern for scaling. By <strong>granting subscription permissions at the Topic level (Method 1)</strong>, you enable a self-service model that empowers consumers and keeps your infrastructure code clean and automated.</p>
<h2>References</h2>
<ul>
<li><a href="https://docs.aws.amazon.com/sns/latest/dg/sns-send-message-to-sqs-cross-account.html">AWS Docs: Sending SNS messages to an SQS queue in a different account</a></li>
</ul>
<p><a class="embed-card" href="https://www.youtube.com/watch?v=xIcNTObKIuc">https://www.youtube.com/watch?v=xIcNTObKIuc</a></p>
]]></content:encoded></item><item><title><![CDATA[Protecting Your Users with NSFWJS]]></title><description><![CDATA[In the modern web, user-generated content (UGC) is everywhere. While it drives engagement, it also brings a major challenge: content moderation. How do you prevent inappropriate images from being uplo]]></description><link>https://blog.hazya.dev/protecting-your-users-with-nsfwjs</link><guid isPermaLink="true">https://blog.hazya.dev/protecting-your-users-with-nsfwjs</guid><category><![CDATA[explicit-content-filtering]]></category><category><![CDATA[indecent-content-filtering]]></category><category><![CDATA[# content moderation]]></category><dc:creator><![CDATA[Hasitha Wickramasinghe]]></dc:creator><pubDate>Thu, 19 Mar 2026 22:04:32 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/b47cddeb-f4f2-49cc-b31d-8d69ed479b3d.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the modern web, user-generated content (UGC) is everywhere. While it drives engagement, it also brings a major challenge: <strong>content moderation</strong>. How do you prevent inappropriate images from being uploaded or displayed without building a massive, expensive backend infrastructure?</p>
<p>Enter <a href="https://www.npmjs.com/package/nsfwjs"><strong>NSFWJS</strong></a>, a powerful, open-source JavaScript library that brings indecent content checking directly to the client's browser.</p>
<p><a href="https://nsfwjs.com/"><strong>Try the live demo</strong></a></p>
<img src="https://cdn.hashnode.com/uploads/covers/698fb88f7d702cdd2e99a524/8da5f129-781e-4df0-aea9-0c28133dec88.gif" alt="Preview of the NSFWJS demo live classification." style="display:block;margin:0 auto" />

<h2>Why Client-Side Moderation?</h2>
<p>Traditionally, moderation happens on the server. While powerful services like <a href="https://aws.amazon.com/rekognition/">AWS Rekognition</a> or <a href="https://cloud.google.com/vision">Google Cloud Vision</a> are industry standards, they come with trade-offs. You have to upload every user image to their cloud, which introduces latency, privacy concerns, and recurring per-image costs.</p>
<p><strong>NSFWJS flips the script.</strong> By leveraging <a href="https://js.tensorflow.org/">TensorFlow.js</a>, the moderation happens on the user's device. This offers three massive advantages:</p>
<ol>
<li><p><strong>Privacy</strong>: Unlike cloud APIs, the image never has to leave the user's device. This is a huge win for user trust and data privacy compliance.</p>
</li>
<li><p><strong>Scalability</strong>: You offload the heavy lifting (machine learning inference) to the client's hardware. Your servers stay lean, and you avoid the "bill-per-image" model.</p>
</li>
<li><p><strong>Immediate Feedback</strong>: Users get instant results without waiting for a round-trip to a remote server. This enables unique use cases like <strong>real-time video feed or live webcam analysis</strong>, where you can classify several frames per second to provide continuous moderation.</p>
</li>
</ol>
<h2>Categorizing the Web</h2>
<p>NSFWJS doesn't just give you a "Yes/No" answer. It provides probabilities across five distinct classes, so you can choose the level and strictness of moderation you want.</p>
<ul>
<li><p>😐 <strong>Neutral</strong>: Everyday, safe-for-work images.</p>
</li>
<li><p>🎨 <strong>Drawing</strong>: Safe-for-work drawings and anime.</p>
</li>
<li><p>🔥 <strong>Sexy</strong>: Sexually explicit images, but not quite pornography.</p>
</li>
<li><p>🔞 <strong>Hentai</strong>: Hentai and pornographic drawings.</p>
</li>
<li><p>🔞 <strong>Porn</strong>: Pornographic images and sexual acts.</p>
</li>
</ul>
<h2>Getting Started in 30 Seconds</h2>
<p>Integrating NSFWJS is incredibly straightforward. Here’s how you can classify an image element in your web app:</p>
<pre><code class="language-javascript">import * as nsfwjs from "nsfwjs";

// Load the model (MobileNetV2 is the default)
const model = await nsfwjs.load();

// Classify an image element, video, or canvas
const img = document.getElementById("user-upload-preview");
const predictions = await model.classify(img);

console.log("Predictions:", predictions);
</code></pre>
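<p>Each prediction is an object with a <code>className</code> and a <code>probability</code>. A minimal gate on top of that output might look like this (the 0.7 threshold and the choice of unsafe classes are arbitrary starting points; tune them for your audience):</p>

```typescript
type Prediction = { className: string; probability: number };

// Classes this sketch treats as unsafe; adjust to your moderation policy.
const UNSAFE = new Set(["Porn", "Hentai", "Sexy"]);

// Block the upload when any unsafe class crosses the threshold.
function isUnsafe(predictions: Prediction[], threshold = 0.7): boolean {
  return predictions.some(
    (p) => UNSAFE.has(p.className) && p.probability >= threshold
  );
}
```
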
<h3>Advanced: Optimizing for Production (Tree-Shaking)</h3>
<p>If you're building a production app, you might want to keep your bundle as small as possible. By using the <code>nsfwjs/core</code> entry point, you can manually register only the models you need:</p>
<pre><code class="language-javascript">import { load } from "nsfwjs/core";
import { MobileNetV2Model } from "nsfwjs/models/mobilenet_v2";

// Only bundles and loads the specific model you want
const model = await load("MobileNetV2", {
  modelDefinitions: [MobileNetV2Model],
});

const predictions = await model.classify(img);
</code></pre>
<p><em>For a comprehensive, real-world React implementation using Web Workers and local caching in the browser, check out the</em> <a href="https://nsfwjs.com/"><em>Demo</em></a> <em>in the</em> <a href="https://github.com/infinitered/nsfwjs/tree/master/examples/nsfw_demo"><em>GitHub repository</em></a><em>.</em></p>
<h2>Shared Responsibility: Frontend + Backend</h2>
<p>While client-side moderation is a powerful first line of defense, it should <strong>never be your only line of defense</strong>. A savvy user can always bypass frontend code. For a robust system, moderation is a shared responsibility:</p>
<ul>
<li><p><strong>Frontend</strong>: Provides instant feedback to the user, reduces server load, and acts as a filter for the majority of uploads.</p>
</li>
<li><p><strong>Backend</strong>: Acts as the ultimate source of truth. You should always implement a middleware or a server-side hook (using <code>@tensorflow/tfjs-node</code> with <code>NSFWJS</code>) to verify images before they are permanently stored or served to other users.</p>
</li>
</ul>
<h3>Backend Verification (Node.js)</h3>
<p>To ensure your moderation is tamper-proof, you should also verify the content on your server. Here is a quick example of how to classify an image in a Node.js environment:</p>
<pre><code class="language-javascript">const tf = require("@tensorflow/tfjs-node");
const nsfw = require("nsfwjs");

// Load the model once and reuse it across requests (ideally from a local file)
let modelPromise;
const getModel = () => (modelPromise ??= nsfw.load());

async function verifyImage(imageBuffer) {
  const model = await getModel();

  // Convert buffer to a 3D Tensor
  const image = await tf.node.decodeImage(imageBuffer, 3);

  // Classify and dispose of the tensor to prevent memory leaks
  const predictions = await model.classify(image);
  image.dispose();

  return predictions;
}
</code></pre>
<h2>Performance &amp; Flexibility</h2>
<p>Whether you're building a lightweight mobile site or a heavy-duty web app, NSFWJS has you covered:</p>
<ul>
<li><p><strong>Multiple Models</strong>: Choose between <code>MobileNetV2</code> (fast &amp; small), <code>MobileNetV2Mid</code> (balanced), or <code>InceptionV3</code> (most accurate but also the heaviest).</p>
</li>
<li><p><strong>Tree-Shaking</strong>: Using the <code>nsfwjs/core</code> entry point allows you to bundle only the models you need, keeping your JS payload tiny.</p>
</li>
<li><p><strong>Host Your Own Models</strong>: While models can be bundled into your JS, for optimal performance, <a href="https://github.com/infinitered/nsfwjs?tab=readme-ov-file#host-your-own-model">host the model files</a> directly on your CDN. This allows you to load model binaries directly and enables browsers to cache them separately, reducing your initial JS bundle size.</p>
</li>
<li><p><strong>Node.js Support</strong>: Need to do some server-side checks, too? NSFWJS works perfectly with <code>@tensorflow/tfjs-node</code>.</p>
</li>
<li><p><strong>Backend Selection</strong>: It supports <code>WebGL</code>, <code>WASM</code>, and even the new <code>WebGPU</code> for blazing-fast performance.</p>
</li>
</ul>
<h2>Conclusion</h2>
<p>As developers, we have a responsibility to create safe digital spaces. NSFWJS makes it easy to add a layer of protection to your applications without sacrificing user privacy or breaking the bank on server costs.</p>
<p><strong>Note on Accuracy:</strong> Machine learning is never 100% perfect. NSFWJS is highly accurate (~90-93%), but false positives can happen. If you encounter an image that is misclassified, consider <a href="https://github.com/infinitered/nsfwjs/issues"><strong>reporting it on GitHub</strong></a> so that it can be used to continue improving the model for everyone.</p>
<p>Give it a star on <a href="https://github.com/infinitered/nsfwjs"><strong>GitHub</strong></a> and join the community of contributors making the web a safer place!</p>
]]></content:encoded></item><item><title><![CDATA[Elevate Your UI with Dynamic Text Shadows in React with ShineJS]]></title><description><![CDATA[As developers, we're always looking for that extra "pop" to make our interfaces stand out. Whether you're a fan of neumorphism or just want to add a bit of tactile depth to your typography, static shadows often fall flat. They don't react to the envi...]]></description><link>https://blog.hazya.dev/elevate-your-ui-with-dynamic-text-shadows-in-react-with-shinejs</link><guid isPermaLink="true">https://blog.hazya.dev/elevate-your-ui-with-dynamic-text-shadows-in-react-with-shinejs</guid><category><![CDATA[ShineJS]]></category><category><![CDATA[text-shadow]]></category><category><![CDATA[React]]></category><category><![CDATA[neumorphism]]></category><dc:creator><![CDATA[Hasitha Wickramasinghe]]></dc:creator><pubDate>Mon, 16 Feb 2026 02:29:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1771207393967/752f45cb-1dca-4ff3-88f5-ead2df62058f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As developers, we're always looking for that extra "pop" to make our interfaces stand out. Whether you're a fan of <strong>neumorphism</strong> or just want to add a bit of tactile depth to your typography, static shadows often fall flat. They don't react to the environment, and they certainly don't feel "alive."</p>
<p>That’s why I built <a target="_blank" href="https://github.com/haZya/shinejs"><strong>ShineJS</strong></a>, a modern, lightweight library designed to bring dynamic, light-reactive shadows to your React and Next.js projects.</p>
<p>In this article, I’ll show you how to get started with the core component and provide an interactive playground where you can experiment with <strong>neumorphic-text</strong> effects in real-time.</p>
<h2 id="heading-what-is-shinejs">What is ShineJS?</h2>
<p><a target="_blank" href="https://github.com/haZya/shinejs">ShineJS</a> is an ESM-only TypeScript library based on the initial work from <a target="_blank" href="https://github.com/bigspaceship/shine.js">bigspaceship/shine.js</a> that calculates and injects multi-layered shadows based on a virtual light source. By tracking the mouse position (or a fixed point), it creates a sense of physical presence for your text and UI elements.</p>
<p>It's particularly effective for:</p>
<ul>
<li><p><strong>Neumorphism</strong> aesthetics where soft, directional shadows define the UI.</p>
</li>
<li><p>Hero sections that need a premium, interactive feel.</p>
</li>
<li><p>Visualizing depth in typography-heavy designs.</p>
</li>
</ul>
<h2 id="heading-quick-start-the-component">Quick Start: The <code>&lt;Shine /&gt;</code> Component</h2>
<p>The easiest way to add a shine effect is to use the high-level React component. It handles the ref management and updates automatically.</p>
<h3 id="heading-installation">Installation</h3>
<pre><code class="lang-bash">npm install @hazya/shinejs
</code></pre>
<h3 id="heading-basic-usage">Basic Usage</h3>
<p>Here is a simple example of a heading that follows your mouse:</p>
<pre><code class="lang-javascript">import { Shine } from "@hazya/shinejs/react";

export function HeroHeading() {
  return (
    &lt;div className="bg-slate-100 p-20 flex justify-center"&gt;
      &lt;Shine
        as="h1"
        className="text-6xl font-black text-slate-200 uppercase tracking-tighter"
        options={{
          light: {
            position: "followMouse",
            intensity: 1.2,
          },
          config: {
            blur: 30,
            opacity: 0.2,
            offset: 0.15,
            shadowRGB: { r: 15, g: 23, b: 42 }, // Slate-900
          },
        }}
      &gt;
        Dynamic Depth
      &lt;/Shine&gt;
    &lt;/div&gt;
  );
}
</code></pre>
<iframe src="https://stackblitz.com/github/haZya/shinejs-examples/tree/main/shinejs-simple-demo-react?embed=1&amp;file=src%2FApp.tsx&amp;hideExplorer=1&amp;hideNavigation=1&amp;view=preview" style="width:100%;height:400px;border:0"></iframe>

<h2 id="heading-interactive-playground">Interactive Playground</h2>
<p>I believe the best way to understand a tool is to break it. Below is a live playground that showcases the ShineJS options.</p>
<p>You can tweak the <strong>intensity</strong>, <strong>blur</strong>, <strong>offset</strong>, and more to see how the shadow layers interact. This playground uses the <code>useShine</code> hook under the hood to give you full control over the rendering logic.</p>
<iframe src="https://stackblitz.com/github/haZya/shinejs-examples/tree/main/shinejs-playground-demo-react?embed=1&amp;file=src%2FApp.tsx&amp;hideExplorer=1&amp;hideNavigation=1&amp;view=preview" style="width:100%;height:950px;border:0"></iframe>

<pre><code class="lang-javascript">"use client";

import { Shine } from "@hazya/shinejs/react";

export function PlaygroundStarter() {
  return (
    &lt;Shine
      as="h1"
      options={{
        light: {
          position: "followMouse",
          intensity: 1,
        },
        config: {
          numSteps: 5,
          opacity: 0.15,
          opacityPow: 1.2,
          offset: 0.15,
          offsetPow: 1.8,
          blur: 40,
          blurPow: 1,
          shadowRGB: { r: 0, g: 0, b: 0 },
        },
      }}
    &gt;
      Shine Playground
    &lt;/Shine&gt;
  );
}
</code></pre>
<h3 id="heading-things-to-try">Things to Try:</h3>
<ul>
<li><p><strong>Shadow Color:</strong> Change the <code>shadowRGB</code> to match your background for a true neumorphic look.</p>
</li>
<li><p><strong>Opacity Power:</strong> Crank up the <code>opacityPow</code> to see how it affects the falloff of the shadow layers.</p>
</li>
<li><p><strong>Light Position:</strong> Switch from <code>followMouse</code> to a <code>fixed</code> point to simulate a static light source in your UI.</p>
</li>
</ul>
<h2 id="heading-typography-and-localization">Typography and Localization</h2>
<p>One of the strengths of ShineJS is its adaptability. Because it works with standard CSS text-shadows and box-shadows, it respects your typography choices, including font-family and font-weight.</p>
<p>In this next example, you can see how the effect maintains its integrity across different font families and languages. Whether it's bold Sans-Serif or elegant Serif, the shadows wrap perfectly around every glyph.</p>
<iframe src="https://stackblitz.com/github/haZya/shinejs-examples/tree/main/shinejs-typography-demo-react?embed=1&amp;file=src%2FApp.tsx&amp;hideExplorer=1&amp;hideNavigation=1&amp;view=preview" style="width:100%;height:740px;border:0"></iframe>

<pre><code class="lang-javascript">"use client";

import { Shine } from "@hazya/shinejs/react";

export function TypographyExample() {
  return (
    &lt;Shine
      as="h1"
      style={{
        fontFamily: "Georgia, 'Times New Roman', serif",
        fontWeight: 700,
        fontStyle: "italic",
      }}
      options={{
        light: { position: "followMouse", intensity: 1.15 },
        config: {
          blur: 38,
          offset: 0.12,
          opacity: 0.32,
          shadowRGB: { r: 17, g: 24, b: 39 },
        },
      }}
    &gt;
      Typography Shine
    &lt;/Shine&gt;
  );
}
</code></pre>
<h3 id="heading-why-this-matters-for-neumorphism">Why this matters for Neumorphism</h3>
<p>Neumorphic design relies heavily on the relationship between the light source and the "material" of the UI. By adjusting the typography options alongside ShineJS parameters, you can ensure that your <strong>neumorphic text</strong> remains readable and visually consistent across all device types and screen resolutions.</p>
<h2 id="heading-built-with-accessibility-in-mind">Built with Accessibility in Mind</h2>
<p>Dynamic text effects can sometimes be a nightmare for screen readers if not handled correctly. When ShineJS splits your text into individual characters to apply granular shadows, it automatically manages the ARIA attributes for you.</p>
<p>The original element receives an <code>aria-label</code> containing the full text, while the internal split-text structure is marked with <code>aria-hidden="true"</code>. This ensures that screen readers see a single, clean string of text rather than a fragmented series of letters, keeping your <strong>neumorphic text</strong> accessible to everyone.</p>
<h2 id="heading-wrapping-up">Wrapping Up</h2>
<p>ShineJS is designed to be unobtrusive yet impactful. It doesn't require heavy canvas rendering or complex WebGL setups. It uses smart CSS injection that follows the physics of light.</p>
<p>If you're building a landing page or a creative portfolio, give <code>@hazya/shinejs</code> a try.</p>
<ul>
<li><p><strong>Documentation:</strong> <a target="_blank" href="http://shinejs.vercel.app/docs">shinejs.vercel.app/docs</a></p>
</li>
<li><p><strong>GitHub:</strong> <a target="_blank" href="https://github.com/haZya/shinejs">haZya/shinejs</a></p>
</li>
</ul>
<p>Happy coding!</p>
]]></content:encoded></item></channel></rss>