Analytics from Scratch
Privacy-respecting analytics with session tracking, Web Vitals, and Grafana dashboards.
Part 7 of the "Building a Modern Portfolio Without a Meta-Framework" series
Every developer has a Google Analytics horror story. Mine was realizing ad blockers had silently hidden 40% of my users from GA4 while it kept funneling everyone else's data into Google's advertising infrastructure.
I wanted analytics that tell me what I actually need to know—which pages people visit, how fast they load, whether the chat feature gets used—without the privacy baggage. So I built my own.
Why Custom Analytics?
The alternatives exist. Good ones, even. But each has trade-offs:
Google Analytics: Free, powerful, and sends every visitor's behavior to Google's advertising network. Blocked by ~40% of users with ad blockers. The GA4 interface is bewildering, and simple questions require complex event configurations.
Plausible/Fathom: Privacy-focused, clean dashboards, reasonable pricing. But $9-19/month for a portfolio that might get 1,000 visitors feels wrong. And I still don't own the data.
Umami: Self-hosted, open source, privacy-first. The best alternative. But I'd need to run a server somewhere, manage a database, handle backups. More infrastructure than my actual portfolio.
PostHog: Feature flags, session replay, product analytics. Impressive. Also massive overkill for "how many people read my blog posts?"
Building custom analytics took about 300 lines of TypeScript. It runs on infrastructure I already have (Cloudflare Workers + D1). Zero additional cost. Complete control over what gets tracked. And I learned how analytics actually work.
The implementation uses React context and hooks for clean integration—no global singletons or manual initialization. Just wrap your app in a provider and use hooks wherever you need tracking.
The Privacy Model
Traditional analytics track users across sessions, sites, and devices. They use cookies, fingerprinting, and cross-site identifiers to build persistent profiles.
I don't need any of that. I need to know:
- How many people visited today?
- Which pages are popular?
- How fast do pages load?
- Does the AI chat get used?
All answerable with session-based tracking. No cookies. No personal data. No cross-site anything.
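A session is nothing more than a random ID kept in sessionStorage, created lazily the first time something needs it: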
// Inside AnalyticsProvider; sessionIdRef is a useRef<string | null>(null)
const getSessionId = useCallback(() => {
  if (sessionIdRef.current) return sessionIdRef.current;

  const stored = sessionStorage.getItem('analytics_session_id');
  if (stored) {
    sessionIdRef.current = stored;
    return stored;
  }

  const newId = crypto.randomUUID();
  sessionStorage.setItem('analytics_session_id', newId);
  sessionIdRef.current = newId;
  return newId;
}, []);
sessionStorage persists only for the browser tab's lifetime. Close the tab, the session dies. Open a new tab to the same site, new session. This means I can't track "returning visitors" in the traditional sense—but I don't need to. I care about page views and performance, not building user profiles.
What Gets Tracked
Four event types cover everything:
Sessions
When someone arrives, capture contextual information. This happens automatically in the AnalyticsProvider's initialization effect:
// Inside AnalyticsProvider's useEffect
const sessionId = getSessionId();
const ua = parseUserAgent();
const utm = parseUtmParams();

enqueue({
  type: 'session',
  id: sessionId,
  deviceType: ua.deviceType, // mobile, tablet, desktop
  browser: ua.browser,
  browserVersion: ua.browserVersion,
  os: ua.os,
  screenWidth: window.screen.width,
  screenHeight: window.screen.height,
  referrer: getExternalReferrer(),
  utmSource: utm.utmSource,
  utmMedium: utm.utmMedium,
  utmCampaign: utm.utmCampaign,
});
The user agent parsing is intentionally simple—I don't need exact browser versions, just broad categories:
export function parseUserAgent() {
  const ua = navigator.userAgent;

  let deviceType = 'desktop';
  if (/Mobi|Android/i.test(ua)) {
    deviceType = /Tablet|iPad/i.test(ua) ? 'tablet' : 'mobile';
  }

  let browser = 'Unknown';
  if (ua.includes('Firefox/')) browser = 'Firefox';
  else if (ua.includes('Chrome/') && !ua.includes('Edg/')) browser = 'Chrome';
  else if (ua.includes('Safari/') && !ua.includes('Chrome/')) browser = 'Safari';
  else if (ua.includes('Edg/')) browser = 'Edge';

  // Major version only: the first number after a browser token
  // (a sketch of the elided version/OS detection)
  const versionMatch = ua.match(/(?:Firefox|Edg|Chrome|Version)\/(\d+)/);
  const browserVersion = versionMatch ? versionMatch[1] : '';

  let os = 'Unknown';
  if (ua.includes('Windows')) os = 'Windows';
  else if (/iPhone|iPad/.test(ua)) os = 'iOS';
  else if (ua.includes('Mac OS X')) os = 'macOS';
  else if (ua.includes('Android')) os = 'Android';
  else if (ua.includes('Linux')) os = 'Linux';

  return { deviceType, browser, browserVersion, os };
}
External referrers only—internal navigation doesn't count:
export function getExternalReferrer(): string | null {
  const referrer = document.referrer;
  if (!referrer) return null;

  try {
    const referrerUrl = new URL(referrer);
    if (referrerUrl.hostname === window.location.hostname) {
      return null;
    }
  } catch {
    return null; // malformed referrer: treat as none
  }
  return referrer;
}
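parseUtmParams rounds out the helpers; a minimal sketch that just reads the standard UTM query parameters:

export function parseUtmParams() {
  // Read the standard UTM parameters from the current URL
  const params = new URLSearchParams(window.location.search);
  return {
    utmSource: params.get('utm_source'),
    utmMedium: params.get('utm_medium'),
    utmCampaign: params.get('utm_campaign'),
  };
}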
Page Views
Track which pages get visited and how long people spend. The trackNavigation callback uses refs to track time spent on each page:
const trackNavigation = useCallback(
  (newPath: string) => {
    const previousPath = currentPathRef.current;
    const duration = Date.now() - pageEnteredAtRef.current;

    if (previousPath && previousPath !== newPath) {
      // Send duration for the page they're leaving
      enqueue({
        type: 'pageview',
        sessionId: getSessionId(),
        path: previousPath,
        timestamp: pageEnteredAtRef.current,
        duration,
      });
    }

    currentPathRef.current = newPath;
    pageEnteredAtRef.current = Date.now();
    trackPageView(newPath, previousPath || undefined);
  },
  [enqueue, getSessionId, trackPageView],
);
Duration tracking is tricky in SPAs. You can't rely on page unload events—they're unreliable. Instead, track navigation: when someone navigates away, calculate how long they spent on the previous page.
The router integrates with analytics through context—no direct imports needed:
// In RouterProvider
const analytics = useAnalytics();

const navigate = useCallback(
  (to: string) => {
    // ... parse URL
    analytics.trackNavigation(newPath);
    // ... update state
  },
  [analytics],
);
Web Vitals
Google's Core Web Vitals measure real-world performance. The web-vitals library makes collection trivial. This is set up in the provider's initialization effect:
import { onCLS, onINP, onLCP, onFCP, onTTFB, type Metric } from 'web-vitals';

// Inside AnalyticsProvider's useEffect
const reportMetric = (metric: Metric) => {
  enqueue({
    type: 'webvital',
    sessionId,
    path: currentPathRef.current,
    timestamp: Date.now(),
    name: metric.name,
    value: metric.value,
    rating: metric.rating, // 'good', 'needs-improvement', 'poor'
    id: metric.id,
    navigationType: metric.navigationType,
  });
};

onCLS(reportMetric); // Cumulative Layout Shift
onINP(reportMetric); // Interaction to Next Paint
onLCP(reportMetric); // Largest Contentful Paint
onFCP(reportMetric); // First Contentful Paint
onTTFB(reportMetric); // Time to First Byte
The metrics that matter most:
- LCP (Largest Contentful Paint): When did the main content appear? Under 2.5s is good.
- INP (Interaction to Next Paint): How responsive is the page to clicks? Under 200ms is good.
- CLS (Cumulative Layout Shift): Does the page jump around while loading? Under 0.1 is good.
These run automatically. The web-vitals library handles the complex timing APIs and reports when values are final.
Custom Events
For anything else—chat interactions, button clicks, feature usage. The context exposes a trackEvent function:
const trackEvent = useCallback(
  (
    eventType: string,
    eventCategory: string,
    eventData?: Record<string, unknown>,
  ) => {
    enqueue({
      type: 'event',
      sessionId: getSessionId(),
      timestamp: Date.now(),
      eventType,
      eventCategory,
      eventData,
    });
  },
  [enqueue, getSessionId],
);
Components use the useTrackEvent hook:
import { useTrackEvent } from '@/lib/analytics';

function ChatWidget({ projectSlug }: { projectSlug: string }) {
  const trackEvent = useTrackEvent();

  const handleOpen = () => {
    trackEvent('chat_opened', 'chat', { projectSlug });
  };

  const handleComplete = (responseLength: number, duration: number) => {
    trackEvent('stream_completed', 'chat', {
      projectSlug,
      responseLength,
      streamDurationMs: duration,
    });
  };

  // ...
}
Flexible enough to track anything without schema changes. For chat-specific tracking, there's also a useTrackChat hook that pre-fills the category.
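Both hooks are thin selectors over the analytics context shown later; a plausible sketch:

// Select just the tracking functions from the analytics context
export function useTrackEvent() {
  return useAnalytics().trackEvent;
}

export function useTrackChat() {
  // trackChat pre-fills the 'chat' category inside the provider
  return useAnalytics().trackChat;
}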
Client-Side Batching
Sending an HTTP request for every event is wasteful. Network requests have overhead—DNS lookup, TCP handshake, TLS negotiation. Batching amortizes this cost.
The provider uses refs to maintain the queue across renders:
function AnalyticsProvider({ children, config, enabled = true }) {
  const queueRef = useRef<AnalyticsEvent[]>([]);
  const flushTimerRef = useRef<number | null>(null);

  const enqueue = useCallback(
    (event: AnalyticsEvent) => {
      if (!enabled) return;
      queueRef.current.push(event);

      // Flush when batch is full
      if (queueRef.current.length >= mergedConfig.batchSize) {
        flush();
      }
    },
    [enabled, mergedConfig.batchSize, flush],
  );

  // Start timer in initialization effect
  useEffect(() => {
    flushTimerRef.current = window.setInterval(
      flush,
      mergedConfig.flushInterval, // Default: 5 seconds
    );
    return () => {
      if (flushTimerRef.current) {
        clearInterval(flushTimerRef.current);
      }
    };
  }, [flush, mergedConfig.flushInterval]);

  // ... flush, trackers, and the provider JSX follow
}
Two triggers for sending data:
- Batch size reached: Default 10 events. Ensures high-traffic pages don't queue indefinitely.
- Timer interval: Default 5 seconds. Ensures low-traffic pages still report.
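The mergedConfig referenced above merges caller-supplied config with defaults along these lines (a sketch; the names are assumptions, though the endpoint matches the Worker route shown later):

const DEFAULT_CONFIG = {
  endpoint: '/api/analytics', // the Worker route that receives batches
  batchSize: 10, // flush once this many events are queued
  flushInterval: 5000, // flush every 5 seconds regardless
};

// Inside AnalyticsProvider
const mergedConfig = { ...DEFAULT_CONFIG, ...config };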
Reliable Delivery
The tricky part: what if the user closes the tab? Regular fetch requests get cancelled. Enter sendBeacon:
const flush = useCallback(() => {
  if (queueRef.current.length === 0) return;

  const batch = [...queueRef.current];
  queueRef.current = [];

  const blob = new Blob([JSON.stringify(batch)], {
    type: 'application/json',
  });
  const sent = navigator.sendBeacon(mergedConfig.endpoint, blob);

  // Fallback if sendBeacon fails
  if (!sent) {
    fetch(mergedConfig.endpoint, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(batch),
      keepalive: true,
    }).catch(() => {
      // Re-queue on failure (with limit to prevent memory issues)
      if (queueRef.current.length < 100) {
        queueRef.current.unshift(...batch);
      }
    });
  }
}, [mergedConfig.endpoint]);
sendBeacon is designed for this exact use case—sending data reliably even during page unload. The browser queues the request and sends it after the page is gone.
Flush on visibility change and page hide—handled in the provider's cleanup effect:
useEffect(
  () => {
    // ... initialization code

    const handleVisibilityChange = () => {
      if (document.visibilityState === 'hidden') {
        flush();
      }
    };
    document.addEventListener('visibilitychange', handleVisibilityChange);
    window.addEventListener('pagehide', flush);

    return () => {
      if (flushTimerRef.current) {
        clearInterval(flushTimerRef.current);
      }
      document.removeEventListener('visibilitychange', handleVisibilityChange);
      window.removeEventListener('pagehide', flush);
      flush(); // Final flush on unmount
    };
  },
  [
    /* dependencies */
  ],
);
When the user switches tabs or closes the page, any queued events get sent immediately. The cleanup function ensures we don't leak event listeners and sends any remaining events when the provider unmounts.
The Database Schema
All events go into a single unified table:
CREATE TABLE IF NOT EXISTS events (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  session_id TEXT,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_category TEXT NOT NULL,
  event_data TEXT
);

CREATE INDEX IF NOT EXISTS idx_events_timestamp ON events(timestamp);
CREATE INDEX IF NOT EXISTS idx_events_session ON events(session_id);
CREATE INDEX IF NOT EXISTS idx_events_type ON events(event_type);
CREATE INDEX IF NOT EXISTS idx_events_category ON events(event_category);
Sessions get their own table for the metadata:
CREATE TABLE IF NOT EXISTS sessions (
  id TEXT PRIMARY KEY,
  started_at INTEGER NOT NULL,
  ended_at INTEGER,
  country TEXT,
  region TEXT,
  device_type TEXT,
  browser TEXT,
  browser_version TEXT,
  os TEXT,
  screen_width INTEGER,
  screen_height INTEGER,
  referrer TEXT,
  utm_source TEXT,
  utm_medium TEXT,
  utm_campaign TEXT
);
Geographic data comes from Cloudflare: the country from the cf-ipcountry header it adds to every request, the region from the request's cf object (both read in the endpoint code below).
The event_data column stores JSON. This means I don't need schema migrations when adding new event properties. Query with SQLite's JSON functions:
SELECT
  json_extract(event_data, '$.path') as path,
  COUNT(*) as views
FROM events
WHERE event_type = 'pageview'
GROUP BY path
ORDER BY views DESC;
The Server Endpoint
The Worker receives batched events and writes to D1:
app.post('/api/analytics', async (c) => {
  const db = c.env.DB;
  const batch: AnalyticsPayload[] = await c.req.json();

  // Cloudflare provides geo data
  const country = c.req.header('cf-ipcountry') || null;
  const region = c.req.raw.cf?.region || null;

  const statements: D1PreparedStatement[] = [];

  for (const event of batch) {
    switch (event.type) {
      case 'session':
        statements.push(
          db
            .prepare(
              `INSERT INTO sessions (...) VALUES (?, ?, ...)
               ON CONFLICT(id) DO NOTHING`,
            )
            .bind(event.id, Date.now(), country, region /* ... */),
        );
        break;
      case 'pageview':
        statements.push(
          db.prepare(`INSERT INTO events (...) VALUES (?, ?, ?, ?, ?)`).bind(
            event.sessionId,
            event.timestamp,
            'pageview',
            'navigation',
            JSON.stringify({
              path: event.path,
              duration: event.duration,
            }),
          ),
        );
        break;
      // ... webvital, event cases
    }
  }

  if (statements.length > 0) {
    await db.batch(statements);
  }

  return c.json({ success: true });
});
D1's batch() method executes all statements in a single round-trip. Important for latency when a batch might contain 10+ events.
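The elided webvital case mirrors pageview; the queries below filter on event_type = 'LCP', which implies the metric name is stored as event_type. A sketch under that reading:

case 'webvital':
  statements.push(
    db.prepare(`INSERT INTO events (...) VALUES (?, ?, ?, ?, ?)`).bind(
      event.sessionId,
      event.timestamp,
      event.name, // 'LCP', 'CLS', ... becomes event_type
      'webvital',
      JSON.stringify({
        path: event.path,
        value: event.value,
        rating: event.rating,
      }),
    ),
  );
  break;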
Querying the Data
With everything in D1, querying is just SQL. Page views by day:
SELECT
  date(timestamp / 1000, 'unixepoch') as day,
  COUNT(*) as views
FROM events
WHERE event_type = 'pageview'
GROUP BY day
ORDER BY day DESC;
Average LCP by page:
SELECT
  json_extract(event_data, '$.path') as path,
  AVG(json_extract(event_data, '$.value')) as avg_lcp
FROM events
WHERE event_type = 'LCP'
GROUP BY path;
Chat engagement:
SELECT
  json_extract(event_data, '$.projectSlug') as project,
  COUNT(*) as chats
FROM events
WHERE event_type = 'chat_opened'
GROUP BY project
ORDER BY chats DESC;
Visualizing with Grafana
Raw SQL queries work, but staring at tables gets old. Grafana turns the data into actual dashboards.
Cloudflare D1 doesn't have a native Grafana datasource, but there's a workaround: the Infinity datasource can query any JSON API, and D1 already exposes a REST API.
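D1's query endpoint takes a SQL string and returns rows as JSON; roughly what the Infinity datasource does under the hood (the account ID, database ID, and token are placeholders you'd fill in):

// A sketch of the request the Infinity datasource makes
const res = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/d1/database/${DATABASE_ID}/query`,
  {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${API_TOKEN}`, // a D1-scoped API token
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ sql: 'SELECT ...' }),
  },
);
const { result } = await res.json(); // rows live in result[0].results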
In Grafana, configure Infinity to POST to this endpoint with a SQL query in the body. Now I can build panels:
Page Views Over Time: A time series showing daily traffic patterns. Helps spot if a blog post got traction or if something broke.
Web Vitals Distribution: Gauge panels showing the percentage of "good" vs "needs improvement" vs "poor" ratings for LCP, CLS, and INP. At a glance, I know if performance is acceptable.
Top Pages: A table ranking pages by view count. Shows what content resonates.
Geographic Distribution: A world map colored by session count per country. Mostly vanity, but interesting to see where visitors come from.
AI Chat Metrics: Token usage, response latency, and chat engagement per project. Tells me if the feature is worth the Cloudflare AI costs.
The beauty of owning your data: any question is just a SQL query away. Want to know the p95 LCP for mobile users on blog posts? Write the query (sketched below), add a panel. No waiting for GA4 to add a feature or Plausible to support a new dimension.
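SQLite has no built-in percentile function, so a rough p95 sorts the values and offsets into them. A sketch, assuming blog posts live under a /blog/ prefix:

-- Approximate p95 LCP for mobile sessions on blog posts
-- (the '/blog/' prefix is an assumption about the URL scheme)
SELECT json_extract(e.event_data, '$.value') as p95_lcp
FROM events e
JOIN sessions s ON s.id = e.session_id
WHERE e.event_type = 'LCP'
  AND s.device_type = 'mobile'
  AND json_extract(e.event_data, '$.path') LIKE '/blog/%'
ORDER BY json_extract(e.event_data, '$.value')
LIMIT 1
OFFSET (
  SELECT CAST(COUNT(*) * 0.95 AS INTEGER)
  FROM events e2
  JOIN sessions s2 ON s2.id = e2.session_id
  WHERE e2.event_type = 'LCP'
    AND s2.device_type = 'mobile'
    AND json_extract(e2.event_data, '$.path') LIKE '/blog/%'
);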
React Integration
The whole system comes together with a simple provider pattern. Wrap your app once:
// App.tsx
import { AnalyticsProvider } from '@/lib/analytics';

export function App({ initialPath }: AppProps) {
  return (
    <AnalyticsProvider>
      <RouterProvider initialPath={initialPath} routes={routes}>
        <Router />
      </RouterProvider>
    </AnalyticsProvider>
  );
}
No manual initialization. No global singletons. The provider handles everything—session creation, web vitals, batching, cleanup.
The analytics context exports everything you need:
// What the provider exposes
interface AnalyticsContextValue {
  trackPageView: (path: string, referrerPath?: string) => void;
  trackNavigation: (newPath: string) => void;
  trackEvent: (
    eventType: string,
    eventCategory: string,
    eventData?: Record<string, unknown>,
  ) => void;
  trackChat: (
    eventType: ChatEventType,
    eventData?: Record<string, unknown>,
  ) => void;
  sessionId: string;
}
Use the hooks in any component:
// Full access
const { trackEvent, sessionId } = useAnalytics();
// Just event tracking
const trackEvent = useTrackEvent();
// Chat-specific (pre-fills category)
const trackChat = useTrackChat();
The enabled prop lets you disable analytics in development or for users who opt out:
<AnalyticsProvider enabled={process.env.NODE_ENV === 'production'}>
What's Not Tracked
Deliberately excluded:
- IP addresses: Never stored. Country comes from Cloudflare's cf-ipcountry header, so the IP itself is never persisted.
- User IDs: No accounts, no persistent identifiers.
- Precise timing: Events carry Unix-millisecond timestamps for ordering, but nothing relies on sub-second precision.
- Scroll depth: Could add it, but doesn't answer questions I have.
- Mouse movements: Creepy and useless for a portfolio.
The goal is minimal viable analytics. Track what informs decisions, ignore everything else.
What I'd Add for Production
For a larger site:
- Sampling: At high traffic, sample 10% of sessions instead of tracking everything.
- Rate limiting: The endpoint is open. Could be abused.
- Data retention: Auto-delete events older than 90 days (see the sketch after this list).
- Dashboard: A simple page showing key metrics instead of raw SQL.
- Anomaly detection: Alert if traffic spikes or Web Vitals degrade.
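Retention, for instance, fits naturally into a Workers cron trigger. A sketch, assuming the tables above and a scheduled handler on the same Worker:

// Runs on a cron trigger (configured in wrangler.toml); deletes rows
// older than 90 days from both tables in one D1 batch.
export default {
  async scheduled(_controller: ScheduledController, env: { DB: D1Database }) {
    const cutoff = Date.now() - 90 * 24 * 60 * 60 * 1000;
    await env.DB.batch([
      env.DB.prepare('DELETE FROM events WHERE timestamp < ?').bind(cutoff),
      env.DB.prepare('DELETE FROM sessions WHERE started_at < ?').bind(cutoff),
    ]);
  },
};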
Alternatives I'd Consider Now
If I were starting over and didn't want to build custom:
Umami remains the best self-hosted option. If you already have a server running, it's the obvious choice. Clean UI, good defaults, actively maintained.
Plausible Cloud if you're willing to pay. At $9/month, you get a polished product, no maintenance, and real privacy compliance. Worth it for a business site.
Vercel Analytics if you're on Vercel. Built-in Web Vitals, no setup, reasonable pricing. Vendor lock-in, but sometimes that's fine.
Nothing is also valid. For a portfolio, you might not need analytics at all. Build it, ship it, move on.