Welcome to the final chapter of our comprehensive Node.js project guide! Throughout this series, we’ve built a robust, secure, and scalable Fastify application, containerized it with Docker, and deployed it to AWS ECS. In this pivotal chapter, we shift our focus to observability, a critical aspect of any production-grade application. Observability isn’t just about collecting data; it’s about understanding the internal state of your system from external outputs, enabling you to debug, optimize, and ensure reliability.
In this chapter, we will enhance our application’s logging capabilities using the high-performance pino logger, implement essential health check endpoints for robust service management, and integrate these outputs with AWS CloudWatch for centralized monitoring and alerting. We’ll explore why structured logging is superior for production environments and how health checks are vital for automated deployments and load balancing. By the end of this chapter, your application will not only be deployed but also equipped with the tools necessary for proactive issue detection and efficient operational management, marking the culmination of our journey from scratch to deployment.
Planning & Design
For a production application, knowing what’s happening inside is paramount. Our observability strategy will involve:
- Structured Logging: Utilizing pino for efficient, machine-readable logs that can be easily parsed and analyzed by monitoring tools.
- Health Checks: Implementing dedicated endpoints to report the application’s operational status, crucial for load balancers and container orchestration platforms like AWS ECS.
- Basic Metrics (via logs/health): While a full-fledged metrics system like Prometheus/Grafana is beyond this chapter’s scope, we’ll ensure our logs and health checks provide enough data points for basic monitoring in CloudWatch.
Observability Architecture
The following diagram illustrates how our Fastify application will integrate logging and health checks with AWS services for enhanced observability.
File Structure Modifications
We’ll primarily modify existing files and introduce some configuration:
src/
├── config/
│ └── index.ts # Add logging configuration
├── plugins/
│ └── logger.ts # New: Pino logger plugin for Fastify
├── routes/
│ └── health.routes.ts # New: Health check route
├── server.ts # Integrate logger and health route
└── ... (existing files)
Step-by-Step Implementation
1. Enhanced Logging with Pino
Pino is a highly performant Node.js logger designed for production. It outputs JSON logs, which are ideal for consumption by log aggregation services like AWS CloudWatch Logs.
a) Setup/Configuration
First, let’s install pino and pino-pretty (for development-friendly log output). Pino ships with its own TypeScript type definitions, so no separate @types package is needed.
npm install pino pino-pretty
Next, let’s create a Fastify plugin for Pino to ensure it’s properly integrated and configured across our application.
File: src/plugins/logger.ts
import fp from 'fastify-plugin';
import { FastifyInstance, FastifyPluginOptions } from 'fastify';
import pino, { LoggerOptions } from 'pino';
// Define the shape of our logger configuration
interface LoggerConfig {
  level: string;
  prettyPrint: boolean;
}

declare module 'fastify' {
  interface FastifyInstance {
    // `config` is decorated in server.ts; declaring it here gives us type safety.
    config: { LOGGER: LoggerConfig } & Record<string, unknown>;
  }
}

export default fp(async (fastify: FastifyInstance, opts: FastifyPluginOptions) => {
  const loggerConfig: LoggerConfig = fastify.config.LOGGER;
  const pinoOptions: LoggerOptions = {
    level: loggerConfig.level || 'info',
    // In production, we want JSON logs for CloudWatch.
    // In development, pino-pretty makes logs readable.
    transport: loggerConfig.prettyPrint
      ? {
          target: 'pino-pretty',
          options: {
            colorize: true,
            translateTime: 'SYS:HH:MM:ss Z',
            ignore: 'pid,hostname',
          },
        }
      : undefined, // undefined means default JSON output
  };
  const logger = pino(pinoOptions);

  // Note: Fastify already defines a `log` decorator, so calling
  // fastify.decorate('log', logger) would throw FST_ERR_DEC_ALREADY_PRESENT.
  // Assign the property directly to replace the (disabled) default logger.
  fastify.log = logger;

  fastify.addHook('onRequest', (request, reply, done) => {
    request.log = logger.child({ reqId: request.id, method: request.method, url: request.url });
    done();
  });

  fastify.addHook('onResponse', (request, reply, done) => {
    request.log.info({
      statusCode: reply.statusCode,
      responseTime: reply.getResponseTime(),
    }, 'Request completed');
    done();
  });

  fastify.log.info(`Logger initialized with level: ${loggerConfig.level}, prettyPrint: ${loggerConfig.prettyPrint}`);
});
Explanation:
- We use fastify-plugin to make our logger available globally across our Fastify instance.
- The pinoOptions are set dynamically from fastify.config.LOGGER. In development, pino-pretty provides human-readable output; in production, pino defaults to JSON for easy parsing by CloudWatch.
- The pino logger instance is exposed as fastify.log, replacing the disabled default instance logger.
- The onRequest and onResponse hooks automatically log request details (with a unique reqId) and response times, providing valuable context for each transaction.
- We augment the FastifyInstance interface so TypeScript knows about our custom decorations.
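To build intuition for what logger.child() is doing in the onRequest hook, here is a dependency-free sketch of the idea: a child logger merges fixed bindings (like reqId) into every entry it emits. The names below are illustrative, not part of pino — pino implements this far more efficiently.

```typescript
// Conceptual model of a child logger: fixed bindings merged into every entry.
type Bindings = Record<string, unknown>;

function makeChild(base: Bindings, extra: Bindings): (msg: string) => Bindings {
  const bound = { ...base, ...extra }; // bindings are fixed at child creation
  return (msg: string) => ({ ...bound, msg });
}

const requestLog = makeChild({ app: 'demo' }, { reqId: 'r1' });
console.log(requestLog('hello'));
```

This is why every line logged via request.log automatically carries the request ID without you passing it explicitly.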
Now, let’s update our configuration to include logger settings.
File: src/config/index.ts
// ... (previous imports and types)
interface AppConfig {
  port: number;
  host: string;
  // ... existing configs
  LOGGER: {
    level: string;
    prettyPrint: boolean;
  };
}

const config: AppConfig = {
  port: parseInt(process.env.PORT || '3000', 10),
  host: process.env.HOST || '0.0.0.0',
  // ... existing configs
  LOGGER: {
    level: process.env.LOG_LEVEL || 'info',
    prettyPrint: process.env.NODE_ENV !== 'production', // Only pretty-print in non-production environments
  },
};

export default config;
Explanation:
- We’ve added a LOGGER object to our AppConfig interface and config object.
- LOG_LEVEL can be set via environment variables (e.g., debug, info, warn, error).
- prettyPrint defaults to true unless NODE_ENV is production, ensuring JSON logs in production.
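The environment-to-config mapping above boils down to a small pure function. This stand-alone sketch (function and type names are hypothetical, chosen for illustration) mirrors the logic and is easy to unit-test without touching process.env:

```typescript
// Derive logger settings from environment-style input (illustrative helper).
interface LoggerSettings {
  level: string;
  prettyPrint: boolean;
}

function resolveLoggerSettings(env: Record<string, string | undefined>): LoggerSettings {
  return {
    level: env.LOG_LEVEL || 'info',
    // Pretty-print everywhere except production, where JSON is preferred.
    prettyPrint: env.NODE_ENV !== 'production',
  };
}

console.log(resolveLoggerSettings({ NODE_ENV: 'production', LOG_LEVEL: 'warn' }));
```

Passing a plain object instead of reading process.env directly makes the behavior trivially testable for each environment combination.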
b) Core Implementation
Now, register the logger plugin in your server.ts and use it.
File: src/server.ts
import Fastify from 'fastify';
import { randomUUID } from 'node:crypto'; // For collision-safe request IDs
import config from './config';
import { connectDB } from './utils/database';
// ... other plugin imports
import authRoutes from './routes/auth.routes';
import userRoutes from './routes/user.routes';
import loggerPlugin from './plugins/logger'; // Import our logger plugin
import healthRoutes from './routes/health.routes'; // Import health routes

const fastify = Fastify({
  logger: false, // Disable Fastify's default logger as we're using Pino
  genReqId: () => randomUUID(), // Unique request ID
});

// Decorate fastify with config before other plugins might need it
fastify.decorate('config', config);

// Register Pino logger plugin first
fastify.register(loggerPlugin);

// ... existing plugin registrations (e.g., helmet, cors, jwt, etc.)
// Example:
// fastify.register(helmet);
// fastify.register(cors);
// fastify.register(jwtPlugin);
// fastify.register(authMiddleware);

// Register routes
fastify.register(authRoutes, { prefix: '/api/auth' });
fastify.register(userRoutes, { prefix: '/api/users' });
fastify.register(healthRoutes, { prefix: '/api' }); // Register health routes

// Centralized error handling (from previous chapter)
fastify.setErrorHandler((error, request, reply) => {
  // reply.statusCode may still be 200 at this point; prefer the error's own status code.
  const statusCode = error.statusCode ?? 500;
  request.log.error({ error: error.message, stack: error.stack, statusCode }, 'Unhandled error caught by error handler');
  reply.status(statusCode).send({
    statusCode,
    error: error.name,
    message: error.message || 'Internal Server Error',
  });
});

const start = async () => {
  try {
    await connectDB(fastify.log); // Pass logger to DB connection
    await fastify.listen({ port: config.port, host: config.host });
    fastify.log.info(`Server listening on ${config.host}:${config.port}`);
  } catch (err) {
    fastify.log.error(err, 'Server failed to start');
    process.exit(1);
  }
};

process.on('unhandledRejection', (reason, promise) => {
  fastify.log.error({ reason, promise }, 'Unhandled Rejection at: Promise');
});

process.on('uncaughtException', (error) => {
  fastify.log.error({ error }, 'Uncaught Exception thrown');
  process.exit(1); // Exit the process after an uncaught exception
});

start();
Explanation:
- We disable Fastify’s built-in logger (logger: false) to avoid duplicate logs.
- Our loggerPlugin is registered early.
- The connectDB function (from previous chapters) now receives the fastify.log instance, allowing it to log database connection status.
- Error handling and process exit handlers now use fastify.log.error for consistent logging.
Now, you can use request.log or fastify.log in any part of your application (routes, services, plugins).
Example usage in a route (e.g., src/routes/auth.routes.ts):
// ... existing imports
import { FastifyInstance } from 'fastify';
import { registerUser, loginUser } from '../services/auth.service';
import { registerSchema, loginSchema } from '../schemas/auth.schema'; // Assuming validation schemas
export default async function authRoutes(fastify: FastifyInstance) {
  fastify.post('/register', { schema: registerSchema }, async (request, reply) => {
    request.log.info('Received registration request'); // Using request.log
    try {
      const user = await registerUser(request.body as any);
      request.log.info({ userId: user.id }, 'User registered successfully');
      reply.status(201).send({ message: 'User registered successfully', userId: user.id });
    } catch (error: any) {
      request.log.error({ error: error.message }, 'Registration failed');
      reply.status(400).send({ message: error.message });
    }
  });

  fastify.post('/login', { schema: loginSchema }, async (request, reply) => {
    request.log.debug('Attempting login for user'); // Using request.log
    try {
      const { token, user } = await loginUser(request.body as any);
      request.log.info({ userId: user.id }, 'User logged in successfully');
      reply.send({ token });
    } catch (error: any) {
      request.log.warn({ error: error.message }, 'Login failed'); // Using warn for failed login attempts
      reply.status(401).send({ message: error.message });
    }
  });
}
c) Testing This Component
- Start your application locally: npm run dev
- Observe console output: You should see pretty-printed logs with timestamps, log levels, and messages.
- Make some requests: Use curl or Postman to hit your API endpoints (e.g., /api/auth/register, /api/auth/login).
- Verify logs: Confirm that info, debug, and error logs appear as expected, including the request ID and response time logs.
If you set NODE_ENV=production locally, you would see JSON output:
NODE_ENV=production npm run dev
You’d then see output like:
{"level":30,"time":1673116800000,"pid":12345,"hostname":"my-host","msg":"Server listening on 0.0.0.0:3000"}
{"level":30,"time":1673116801000,"pid":12345,"hostname":"my-host","reqId":"abcdefg","method":"POST","url":"/api/auth/register","msg":"Received registration request"}
d) Production Considerations
- Log Level: In production, set LOG_LEVEL to info or warn to avoid excessive debug logs impacting performance and storage costs.
- Structured Logs: The JSON output from pino is automatically consumed by AWS CloudWatch Logs when deployed to ECS with appropriate IAM roles. CloudWatch can then parse, filter, and alert on these logs.
- Sensitive Data: NEVER log sensitive information such as passwords, API keys, or personally identifiable information (PII). Implement log redaction if necessary; pino supports redaction natively, but preventing sensitive data from reaching the logger in the first place is best.
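pino's built-in redact option is the right tool in practice; purely as an illustration of the idea, here is a dependency-free sketch that masks a hypothetical set of sensitive keys before an object would reach a logger:

```typescript
// Illustrative stand-alone redaction sketch (pino's `redact` option does this for real).
const SENSITIVE_KEYS = new Set(['password', 'token', 'apiKey']);

function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(value as Record<string, unknown>)) {
      // Mask sensitive keys; recurse into nested objects otherwise.
      out[k] = SENSITIVE_KEYS.has(k) ? '[REDACTED]' : redact(v);
    }
    return out;
  }
  return value;
}

console.log(redact({ user: 'ada', password: 'hunter2', meta: { token: 'abc' } }));
```

The key set here is an assumption — audit your own payloads to decide which fields must never appear in logs.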
2. Health Checks
Health checks are simple endpoints that indicate the operational status of your application. Load balancers use them to determine if an instance is healthy enough to receive traffic, and container orchestrators use them for liveness and readiness probes.
a) Setup/Configuration
Create a new file for health check routes.
File: src/routes/health.routes.ts
import { FastifyInstance } from 'fastify';
import { getDbConnectionStatus } from '../utils/database'; // Assume this function exists
export default async function healthRoutes(fastify: FastifyInstance) {
  fastify.get('/health', async (request, reply) => {
    request.log.debug('Health check requested');
    try {
      const dbStatus = await getDbConnectionStatus(); // Check database connection
      if (dbStatus.connected) {
        reply.status(200).send({
          status: 'UP',
          timestamp: new Date().toISOString(),
          database: dbStatus.message,
        });
      } else {
        request.log.error({ dbError: dbStatus.error }, 'Health check failed: Database not connected');
        reply.status(503).send({
          status: 'DOWN',
          timestamp: new Date().toISOString(),
          database: dbStatus.message,
          error: dbStatus.error,
        });
      }
    } catch (error: any) {
      request.log.error({ error: error.message }, 'Health check encountered an error');
      reply.status(500).send({
        status: 'DOWN',
        timestamp: new Date().toISOString(),
        error: error.message,
      });
    }
  });

  // Optional: a simpler liveness check for quick response
  fastify.get('/liveness', async (request, reply) => {
    request.log.debug('Liveness check requested');
    reply.status(200).send({
      status: 'UP',
      timestamp: new Date().toISOString(),
      message: 'Application is running',
    });
  });
}
Explanation:
- /health: This is a more comprehensive readiness probe. It checks whether the application is running and whether critical dependencies (like the database) are accessible. If the DB is down, it returns 503 Service Unavailable.
- /liveness: A simpler endpoint that just checks whether the HTTP server is responding. This is useful for Kubernetes liveness probes or quick checks, indicating that the process is alive.
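The readiness logic generalizes naturally as you add dependencies (cache, message queue, downstream APIs). A minimal, dependency-free sketch of aggregating individual checks into a single UP/DOWN verdict — type and function names are illustrative:

```typescript
// Aggregate per-dependency check results into one readiness verdict.
type CheckResult = { name: string; ok: boolean };

function overallStatus(checks: CheckResult[]): { status: 'UP' | 'DOWN'; failing: string[] } {
  const failing = checks.filter((c) => !c.ok).map((c) => c.name);
  // Any failing dependency makes the whole service not-ready.
  return { status: failing.length === 0 ? 'UP' : 'DOWN', failing };
}

console.log(overallStatus([{ name: 'db', ok: true }, { name: 'cache', ok: false }]));
```

Reporting which dependency failed (the failing array) makes a 503 far easier to debug than a bare DOWN.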
b) Core Implementation
We need to add getDbConnectionStatus to our database utility.
File: src/utils/database.ts (modification)
import { FastifyBaseLogger } from 'fastify';
import mongoose from 'mongoose';
import config from '../config';

let loggerInstance: FastifyBaseLogger; // Store logger instance

export const connectDB = async (logger: FastifyBaseLogger) => {
  loggerInstance = logger; // Assign logger
  try {
    await mongoose.connect(config.DATABASE_URL);
    logger.info('MongoDB connected successfully');
  } catch (error) {
    logger.error({ error }, 'MongoDB connection error');
    // Re-throw so the caller can decide whether to exit on startup failure;
    // at runtime, the health check will report a broken connection.
    throw error;
  }
};

export const getDbConnectionStatus = async () => {
  if (mongoose.connection.readyState === 1) { // 1 = connected
    return { connected: true, message: 'Database is connected' };
  } else if (mongoose.connection.readyState === 2) { // 2 = connecting
    return { connected: false, message: 'Database is connecting', error: 'Connecting to database' };
  } else { // 0 = disconnected, 3 = disconnecting
    // Attempt to reconnect if disconnected, or report the failure
    try {
      if (loggerInstance) {
        loggerInstance.warn('Database not connected, attempting to reconnect for health check.');
      }
      await mongoose.connect(config.DATABASE_URL);
      return { connected: true, message: 'Database reconnected' };
    } catch (error: any) {
      if (loggerInstance) {
        loggerInstance.error({ error: error.message }, 'Failed to reconnect database during health check');
      }
      return { connected: false, message: 'Database is disconnected', error: error.message };
    }
  }
};
Explanation:
- The getDbConnectionStatus function checks mongoose.connection.readyState (0 = disconnected, 1 = connected, 2 = connecting, 3 = disconnecting).
- If the connection is down, it attempts a reconnect. A momentary disconnect shouldn’t immediately mark the service as unhealthy if it can self-recover — just make sure the reconnect attempt uses a short timeout so the health check itself stays fast.
- We’ve added loggerInstance to make the logger available within getDbConnectionStatus.
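For reference, the readyState integers map to connection states as follows. This small helper is hypothetical (not part of mongoose) and simply makes the mapping explicit:

```typescript
// mongoose.connection.readyState values, made explicit.
const READY_STATE_LABELS: Record<number, string> = {
  0: 'disconnected',
  1: 'connected',
  2: 'connecting',
  3: 'disconnecting',
};

function describeReadyState(state: number): string {
  return READY_STATE_LABELS[state] ?? 'unknown';
}

console.log(describeReadyState(1));
```

Including the label (rather than the raw integer) in health-check responses and logs saves a lookup when debugging.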
Remember to register healthRoutes in src/server.ts as shown in the previous server.ts update.
c) Testing This Component
- Start your application locally:
npm run dev - Test the health endpoint:Expected output (if DB is connected):
curl http://localhost:3000/api/health{"status":"UP","timestamp":"2026-01-08T12:00:00.000Z","database":"Database is connected"} - Test the liveness endpoint:Expected output:
curl http://localhost:3000/api/liveness{"status":"UP","timestamp":"2026-01-08T12:00:00.000Z","message":"Application is running"} - Simulate a database outage: Stop your local MongoDB instance.
- Retest
/api/health:Expected output (status code 503):curl http://localhost:3000/api/healthAnd you should see corresponding errors in your application logs.{"status":"DOWN","timestamp":"2026-01-08T12:00:00.000Z","database":"Database is disconnected","error":"...connection error details..."}
d) Production Considerations
- Load Balancers (AWS ALB): Configure your Application Load Balancer target group health checks to point to /api/health with 200 as the success code; a 503 response marks the target unhealthy. This ensures unhealthy instances are removed from traffic rotation.
- ECS Service: ECS task definitions support container-level health checks. Configure a health check command (e.g., curl -f http://localhost:3000/api/health — the -f flag makes curl exit non-zero on a 503) so ECS replaces unhealthy tasks.
- Response Time: Health check endpoints should be very fast. Avoid complex logic that could introduce latency.
- Authentication: Health check endpoints are typically unauthenticated, but ensure they don’t expose sensitive information.
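One way to honor the response-time advice above is to cap each dependency check with a timeout, so a hung dependency can't stall the whole health endpoint. A generic sketch (the helper name and fallback strategy are assumptions, not an existing API):

```typescript
// Race a dependency check against a timeout; resolve with a fallback if it loses.
function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  return Promise.race([
    p,
    new Promise<T>((resolve) => setTimeout(() => resolve(fallback), ms)),
  ]);
}

// Usage idea: a slow DB ping resolves as "not ok" after 500 ms instead of hanging.
withTimeout(Promise.resolve(true), 500, false).then((ok) => console.log('db ok:', ok));
```

Choose a conservative fallback (treat a timeout as a failed check) so a slow dependency surfaces as a 503 rather than a hung request.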
3. Basic Application Metrics via Logs and CloudWatch
While dedicated metrics systems like Prometheus offer rich capabilities, for many applications, a combination of structured logs and CloudWatch’s ability to extract metrics from logs can suffice.
a) Core Implementation (Conceptual)
Our structured logs already provide data points like responseTime, statusCode, reqId, etc. CloudWatch can use these.
No direct code changes are needed here, but it’s important to understand how existing logs become metrics.
b) Production Considerations (AWS CloudWatch)
When your application is deployed to AWS ECS, and your tasks have the correct IAM role to publish logs to CloudWatch, the following happens:
- Log Streams: Each task instance will send its pino JSON logs to a dedicated log stream within a specified CloudWatch Log Group (e.g., /ecs/my-fastify-app).
- Metric Filters: In CloudWatch, you can create “Metric Filters” on your log group.
  - Example 1: Error Count: Create a filter that matches { $.level = 50 } (Pino’s error level). This filter can then increment a custom metric like MyFastifyApp/Errors.
  - Example 2: Request Latency: Create a filter that matches { $.responseTime = * } and extracts the responseTime value. You can then publish a metric MyFastifyApp/RequestLatency and aggregate by average, sum, min, or max.
  - Example 3: HTTP 5xx Count: Filter for { $.statusCode >= 500 } to track server errors.
- Alarms: Once you have metrics, you can create CloudWatch Alarms. For instance, an alarm could trigger if MyFastifyApp/Errors exceeds 10 in a 5-minute period, sending a notification via SNS to your team.
- Dashboards: Create CloudWatch Dashboards to visualize these metrics alongside other AWS service metrics (e.g., CPU utilization of ECS tasks, database connections).
This approach leverages your existing robust logging setup to derive valuable monitoring insights without adding another complex metrics agent to your application.
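The metric-filter examples rely on pino's numeric levels (trace = 10 through fatal = 60). This small sketch shows how a raw JSON log line maps to an "error" classification — essentially what a CloudWatch metric filter on level does; the helper itself is illustrative, not something you would deploy:

```typescript
// Pino's standard numeric level mapping.
const PINO_LEVELS: Record<number, string> = {
  10: 'trace', 20: 'debug', 30: 'info', 40: 'warn', 50: 'error', 60: 'fatal',
};

// Classify a single JSON log line the way a `{ $.level = 50 }`-style filter would.
function isErrorLine(jsonLine: string): boolean {
  try {
    const entry = JSON.parse(jsonLine);
    return typeof entry.level === 'number' && entry.level >= 50;
  } catch {
    return false; // non-JSON lines never match
  }
}

console.log(isErrorLine('{"level":50,"msg":"boom"}'));
```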
Production Considerations (Chapter Summary)
- Centralized Logging: Ensure all application logs are sent to a centralized logging system (AWS CloudWatch Logs). This is crucial for debugging, auditing, and compliance in a distributed environment.
- Alerting: Set up CloudWatch Alarms on critical metrics (error rates, high latency, health check failures, CPU/memory usage) to get notified proactively.
- Log Retention: Configure appropriate log retention policies in CloudWatch to manage costs and compliance.
- Security: Regularly review logs for suspicious activity. Ensure logs themselves are secured (e.g., access control for CloudWatch Logs).
- Performance: High-volume logging can impact performance. Use appropriate log levels for different environments.
- Cost Management: Be mindful of the costs associated with log ingestion and storage in CloudWatch. Optimize log levels and retention.
Code Review Checkpoint
At this point, your application has significantly improved observability features:
- src/plugins/logger.ts: Implements a Fastify plugin for pino, providing structured, high-performance logging.
- src/config/index.ts: Updated to include LOGGER configuration, allowing dynamic log level and pretty-print settings based on the environment.
- src/server.ts: Integrates the loggerPlugin and healthRoutes, and ensures all critical application events (startup, errors, unhandled rejections) are logged using the new pino instance.
- src/routes/health.routes.ts: Introduces /api/health and /api/liveness endpoints for robust application status checks, including database connectivity.
- src/utils/database.ts: Adds getDbConnectionStatus to provide real-time database health, and integrates pino for database-related logs.
These changes are fundamental for operating your application reliably in a production environment.
Common Issues & Solutions
Issue: Logs are not appearing in CloudWatch after deployment.
- Debugging:
  - Check ECS Task Execution Role: Ensure your ECS task definition has an IAM role attached with permissions for logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents.
  - Verify Log Driver: In your ECS task definition, confirm the logConfiguration for your container specifies the awslogs driver and the correct awslogs-group, awslogs-region, and awslogs-stream-prefix.
  - Check Application Logs Locally: Does pino produce logs locally? Are there any errors on application startup related to logging?
  - Check CloudWatch Log Group: Does the specified log group exist in CloudWatch?
- Solution: Update your ECS task execution role and task definition’s log configuration.
Issue: Health checks are failing even when the application seems to be running.
- Debugging:
  - Access the endpoint manually: Can you curl the /api/health endpoint from within the ECS container or from your local machine (if the port is exposed)?
  - Check application logs: What does your application’s pino log say when the health check is hit? Is it reporting a database connection issue or another dependency failure?
  - Database Connectivity: If the health check checks the database, is the database actually accessible from the ECS task? Check security groups and VPC settings.
- Solution: Address the underlying dependency issue (e.g., database connectivity, external service unavailability), or refine the health check logic to be more resilient or to provide clearer error messages.
Issue: Excessive debug logs in production, incurring high CloudWatch costs.
- Debugging: Check the LOG_LEVEL environment variable in your ECS task definition.
- Solution: Ensure LOG_LEVEL is set to info, warn, or error in your production environment configuration. This will filter out debug logs, reducing log volume.
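Level filtering itself is just a numeric comparison over pino's level values. This sketch (a simplification of what the logger does internally) mirrors how a configured LOG_LEVEL of info suppresses debug output:

```typescript
// Numeric severities behind LOG_LEVEL filtering (pino's standard values).
const LEVEL_VALUES: Record<string, number> = {
  trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60,
};

// An entry is emitted only if its severity meets the configured threshold.
function shouldLog(configured: string, entry: string): boolean {
  return (LEVEL_VALUES[entry] ?? 30) >= (LEVEL_VALUES[configured] ?? 30);
}

console.log(shouldLog('info', 'debug'), shouldLog('info', 'error'));
```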
Testing & Verification
To thoroughly test and verify the observability features:
Local Testing:
- Run npm run dev.
- Confirm pino-pretty logs appear in your console.
- Hit various API endpoints (/api/auth/register, /api/users, etc.) and observe corresponding info, debug, and error logs.
- Test the /api/health and /api/liveness endpoints.
- Intentionally break your database connection and re-test /api/health to confirm it reports DOWN and logs errors.
Deployment to AWS ECS:
- Rebuild and Push Docker Image:
  docker build -t <your-repo-uri>/fastify-app:latest .
  docker push <your-repo-uri>/fastify-app:latest
- Update ECS Service: Update your ECS service to use the new Docker image.
- Verify Health Checks:
  - In the AWS Console, navigate to your ECS service. Check the “Events” and “Tasks” tabs. You should see tasks starting successfully, with a health status of “Healthy.”
  - Go to your Application Load Balancer target group and verify the registered targets are “Healthy.”
- Verify CloudWatch Logs:
  - Navigate to CloudWatch -> Log Groups. Find your application’s log group (e.g., /ecs/my-fastify-app).
  - Open the log group and select a log stream from one of your running tasks.
  - You should see your structured JSON logs flowing in real time. Make some requests to your deployed application and observe new logs appearing.
  - Look for info messages from your application startup, and request/response logs.
  - (Optional) Simulate an error (e.g., try to register with an existing email) and verify an error log appears.
- Verify CloudWatch Metrics & Alarms (if configured):
  - If you set up metric filters and alarms, verify that metrics are being generated and that alarms are in an OK state (unless an error condition is met).
Summary & Next Steps
Congratulations! You have successfully completed the journey of building a production-ready Node.js Fastify application from scratch to deployment. In this final chapter, we equipped our application with essential observability features:
- Structured Logging: Implemented pino for efficient, machine-readable logs.
- Health Checks: Created robust /api/health and /api/liveness endpoints for service management and load balancing.
- AWS Integration: Discussed how these features integrate seamlessly with AWS CloudWatch Logs for centralized monitoring and alerting.
This final layer of observability is crucial for maintaining the health, performance, and reliability of your application in a real-world production environment. You now have a solid foundation for any future Node.js projects, encompassing best practices from development to deployment and operations.
What’s Next? (Continuous Improvement)
While this guide concludes here, the journey of a production application is continuous. Consider these areas for further exploration and enhancement:
- Advanced Monitoring: Integrate dedicated monitoring solutions like Prometheus & Grafana for more granular metrics, custom dashboards, and advanced alerting.
- Distributed Tracing: Implement distributed tracing (e.g., OpenTelemetry, AWS X-Ray) to visualize request flows across multiple services, invaluable for microservices architectures.
- Cost Optimization: Regularly review AWS resource usage and look for opportunities to optimize costs (e.g., right-sizing ECS tasks, optimizing database instances, managing log retention).
- Disaster Recovery: Develop and test a disaster recovery plan, including regular backups, multi-AZ deployments, and failover strategies.
- Performance Tuning: Continuously profile your application to identify bottlenecks and optimize code for better performance and scalability.
- Security Audits: Conduct regular security audits, penetration testing, and vulnerability scanning. Stay updated on Node.js and dependency security advisories.
- Automation: Automate more aspects of your operations, such as infrastructure provisioning (Infrastructure as Code with Terraform/CloudFormation) and advanced CI/CD pipelines.
Thank you for following along this comprehensive guide. May your Node.js applications be fast, reliable, and observable!