REST API Over SQL Database: The Complete Implementation Guide

Building a REST API over a SQL database is one of the most common tasks in backend engineering, yet the gap between a working prototype and a production-grade implementation is substantial. The REST architectural style maps naturally onto relational data: tables become collections, rows become resources, and HTTP verbs correspond to CRUD operations. What makes the difference between a fragile demo and a reliable system is the set of decisions around schema introspection, endpoint design, relationship traversal, authentication, pagination, and filtering. This guide covers each of those decisions in depth, providing the technical detail needed to build a REST API that performs under load, evolves with the schema, and remains secure from the first deployment onward.

Key Takeaways

  • SQL tables map directly to REST collections, with primary keys serving as resource identifiers and foreign keys defining traversal paths between related endpoints.
  • Endpoint design should follow strict HTTP verb semantics: GET for reads, POST for creation, PUT for full replacement, PATCH for partial updates, and DELETE for removal.
  • Cursor-based (keyset) pagination outperforms OFFSET/LIMIT on large tables because it uses index seeks instead of scanning and discarding all rows before the offset.
  • Authentication at the API layer (JWT, OAuth2, or API keys) must be decoupled from database-level credentials to avoid exposing connection strings or granting direct SQL access to clients.
  • Filtering and sorting should be expressed through query parameters with explicit whitelisting to prevent SQL injection and uncontrolled query plans.
  • Stored procedures and views deserve dedicated endpoints that do not follow the standard resource CRUD pattern.

Reading Database Schema to Define API Resources

The foundation of any SQL-backed REST API is the mapping between the relational schema and the API’s resource model. Every table in the database is a candidate for a collection endpoint, every row is a resource, and every primary key becomes the resource identifier embedded in the URI path. The schema itself contains enough metadata to generate a substantial portion of the API surface: column names become JSON field names, data types inform serialization rules, NOT NULL constraints indicate required fields, and UNIQUE constraints suggest natural keys that may serve as alternative lookup paths.

Schema introspection is the process of reading this metadata programmatically. In PostgreSQL, the information_schema.columns and information_schema.table_constraints views expose column types, nullability, defaults, and constraint definitions. MySQL provides equivalent information through INFORMATION_SCHEMA.COLUMNS and INFORMATION_SCHEMA.KEY_COLUMN_USAGE. SQL Server surfaces the same data via sys.columns, sys.types, and sys.foreign_keys. The specifics differ by engine, but the principle is the same: the database already knows the shape of the data, and the API layer should derive its resource definitions from that shape rather than duplicating it in application code. For a deeper discussion of how schema-driven approaches work, see the guide on auto-generating APIs from database schema.
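As a minimal sketch of the idea, the snippet below introspects a table in SQLite, whose PRAGMA table_info serves here as a stdlib-friendly stand-in for the information_schema views named above, and derives a resource definition (field names, required fields, primary key) directly from the metadata. The `introspect` function and the returned shape are illustrative, not a standard API:

```python
import sqlite3

def introspect(conn, table):
    """Derive a resource definition from table metadata. PRAGMA table_info
    yields one row per column: (cid, name, type, notnull, dflt_value, pk)."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {
        "fields": [name for _, name, *_ in cols],
        # NOT NULL with no default => the client must supply the field
        "required": [c[1] for c in cols if c[3] and c[4] is None],
        "primary_key": [c[1] for c in cols if c[5]],
    }

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    note TEXT)""")

resource = introspect(conn, "orders")
# "status" is NOT NULL but has a default, so only customer_id is required
```

On PostgreSQL or MySQL the same result comes from a SELECT against information_schema.columns; the derivation logic stays the same.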

A critical decision at this stage is whether to expose every table or only a curated subset. System tables, migration tracking tables, and internal audit logs rarely belong in a public API. A common pattern is to maintain an allow-list of tables and views that the API should expose, with the schema introspection layer filtering out everything else. Views are particularly useful here because they let you reshape the underlying data (renaming columns, joining tables, computing derived fields) without changing the physical schema.

Endpoint Design Conventions for SQL-Backed REST APIs

REST endpoint design for SQL databases follows a predictable grammar. A table named orders produces endpoints at /orders (the collection) and /orders/{id} (a single resource). The HTTP methods map to SQL operations: GET to SELECT, POST to INSERT, PUT to a full UPDATE (all columns), PATCH to a partial UPDATE (only the columns present in the request body), and DELETE to DELETE. This mapping is not coincidental; it is the reason REST and SQL work well together.
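The verb-to-statement mapping can be expressed as a small table of parameterized SQL templates. The sketch below is illustrative (the `to_sql` helper is hypothetical); it assumes the table and column names have already been validated against a schema whitelist, so only values travel as bind parameters:

```python
# Hypothetical translation layer: HTTP method + whitelisted table -> SQL template.
SQL_FOR_METHOD = {
    "GET":    "SELECT * FROM {table} WHERE id = ?",
    "POST":   "INSERT INTO {table} ({cols}) VALUES ({marks})",
    "PUT":    "UPDATE {table} SET {assignments} WHERE id = ?",
    "PATCH":  "UPDATE {table} SET {assignments} WHERE id = ?",
    "DELETE": "DELETE FROM {table} WHERE id = ?",
}

def to_sql(method, table, body=None):
    # Columns are sorted only to make the output deterministic in this sketch.
    cols = sorted(body) if body else []
    return SQL_FOR_METHOD[method].format(
        table=table,
        cols=", ".join(cols),
        marks=", ".join("?" for _ in cols),
        assignments=", ".join(f"{c} = ?" for c in cols),
    )

sql = to_sql("POST", "orders", {"status": "shipped", "customer_id": 2})
# -> "INSERT INTO orders (customer_id, status) VALUES (?, ?)"
```

PUT and PATCH share a template here; the difference is upstream, in whether the full column set or only the supplied columns end up in the body.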

GET: Retrieving Resources

A GET request to /orders should execute a SELECT against the orders table with whatever filtering, sorting, and pagination parameters the query string provides. A GET to /orders/417 should return exactly one row where the primary key equals 417, or a 404 if no such row exists. The response content type is almost universally application/json, and the JSON keys should match the column names (or their aliased equivalents if you are mapping snake_case to camelCase at the serialization layer).

Idempotency is inherent to GET. The same request issued twice produces the same result, assuming no intervening writes. This property makes GET responses safe to cache, and you should set Cache-Control headers appropriately for read-heavy endpoints. ETags derived from row version columns or last-modified timestamps enable conditional requests that reduce bandwidth and database load.
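A conditional GET built on a row-version column can be sketched as follows; the function names are hypothetical and a real handler would set the ETag response header rather than return a tuple:

```python
import hashlib

def etag_for(row):
    """Derive a weak ETag from the row's version column (updated_at here)."""
    digest = hashlib.sha256(str(row["updated_at"]).encode()).hexdigest()[:16]
    return f'W/"{digest}"'

def conditional_get(row, if_none_match=None):
    """Return (status, body): 304 with no body when the client's ETag matches."""
    tag = etag_for(row)
    if if_none_match == tag:
        return 304, None        # client's cached copy is still current
    return 200, row

row = {"id": 417, "status": "shipped", "updated_at": "2024-05-01T12:00:00Z"}
status, body = conditional_get(row, if_none_match=etag_for(row))
# status == 304, and no row data crosses the wire
```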

POST: Creating Resources

POST to /orders inserts a new row. The request body contains the column values, and the response should return the created resource (including any server-generated fields like auto-increment IDs, default timestamps, or computed columns) with a 201 status code and a Location header pointing to the new resource’s URI. If the table has a composite primary key, the Location header must encode all key components.

Validation at the API layer should mirror the database constraints. Sending a request that violates a NOT NULL constraint will produce a database error, but the API should catch this before it reaches the database and return a 422 with a structured error body that identifies which fields failed validation and why. Relying on raw database errors for client-facing messages leaks implementation details and produces inconsistent error formats.
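A validation pass that mirrors the database constraints might look like the following sketch, where the required-field and column sets are assumed to come from the schema introspection step described earlier:

```python
def validate(body, required, columns):
    """Check the request body against introspected constraints before any SQL runs."""
    errors = []
    for field in required:
        if body.get(field) is None:          # missing or explicit null
            errors.append({"field": field, "error": "is required"})
    for field in body:
        if field not in columns:
            errors.append({"field": field, "error": "unknown column"})
    # 422 with a structured body, rather than a leaked database error string
    return (422, {"errors": errors}) if errors else (None, None)

status, err = validate(
    {"note": "rush"},
    required=["customer_id"],
    columns={"customer_id", "status", "note"},
)
# status == 422; err names customer_id as the missing field
```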

PUT, PATCH, and DELETE

PUT replaces the entire resource. Every column that is not server-managed (auto-increment IDs, audit timestamps) must be present in the request body. Missing columns are set to their default values or NULL, which can cause unintended data loss if the client omits a field by accident. PATCH, by contrast, updates only the columns present in the request body. The JSON Merge Patch format (RFC 7396) is the simplest approach: a key set to a value updates that column, a key set to null sets the column to NULL, and an absent key leaves the column unchanged.
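The three-way distinction in JSON Merge Patch (value, null, absent) maps cleanly onto an UPDATE statement. A minimal sketch, assuming column names have already been validated against the schema whitelist:

```python
def merge_patch_to_update(table, key, patch):
    """RFC 7396 semantics: a present key updates the column, a JSON null
    (Python None) sets it to SQL NULL, and an absent key is left untouched."""
    assignments = ", ".join(f"{col} = ?" for col in patch)
    params = [*patch.values(), key]          # None binds as SQL NULL
    return f"UPDATE {table} SET {assignments} WHERE id = ?", params

# {"status": "shipped", "note": null} parsed from the request body:
sql, params = merge_patch_to_update("orders", 417, {"status": "shipped", "note": None})
# sql    == "UPDATE orders SET status = ?, note = ? WHERE id = ?"
# params == ["shipped", None, 417]
```

Columns the client omitted never appear in the SET clause, which is exactly the data-loss protection PATCH offers over PUT.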

DELETE on /orders/417 issues a DELETE FROM orders WHERE id = 417. The response is typically 204 No Content. Soft deletes (setting a deleted_at timestamp instead of removing the row) are a schema-level decision that the API layer respects by filtering out soft-deleted rows from GET queries unless an explicit include_deleted=true parameter is provided.

Handling Relationships: Foreign Keys as Linked Resources

Foreign keys are the relational model’s mechanism for expressing relationships, and the REST API must surface these relationships in a way that clients can traverse. A foreign key from orders.customer_id to customers.id creates two navigable paths: from a customer to their orders (/customers/42/orders) and from an order to its customer (embedding customer data in the order response or providing a link).

Nested collection endpoints like /customers/42/orders are the most intuitive representation of one-to-many relationships. The API translates this into SELECT * FROM orders WHERE customer_id = 42 with the same pagination and filtering support available on the top-level /orders endpoint. Many-to-many relationships mediated by a junction table (e.g., order_items linking orders and products) can be exposed as nested resources on either side or as a top-level collection with filter parameters for both foreign keys.

The depth of nesting matters. URIs like /customers/42/orders/17/items/3/product are legal but painful to maintain and route. A flat URL structure with query parameter filtering (/order-items?order_id=17) is often more practical for deeply nested relationships. The choice between nesting and flat filtering is covered extensively in the comparison of GraphQL and REST for database APIs, where the trade-offs around over-fetching and under-fetching directly influence which pattern you choose.

Authentication at the API Layer

Authentication for a REST API sitting on a SQL database must happen at the API layer, not at the database connection level. The API server connects to the database using a service account with a fixed set of privileges. Individual API consumers authenticate to the API through one of three standard mechanisms: API keys, JSON Web Tokens (JWT), or OAuth2 flows.

API keys are the simplest option. The client sends a key in an Authorization header or a custom X-API-Key header, and the API server validates it against a lookup table. API keys work well for server-to-server communication where the calling service can store the key securely, but they are inappropriate for browser-based clients: a key shipped to the browser is visible to anyone who inspects the page, and it cannot be refreshed or scoped to short-lived sessions.

JWT-based authentication is the standard for applications with human users. The client authenticates once (via username/password, SSO, or an external identity provider), receives a signed token with an expiration time and embedded claims, and presents that token on subsequent requests. The API server validates the token signature without hitting the database on every request, which is a significant performance advantage. Claims within the token (user ID, role, tenant ID) drive authorization decisions, including which rows the user can see. For example, a tenant_id claim can be injected into every SQL WHERE clause to enforce row-level data isolation.
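To make the mechanics concrete, here is a dependency-free sketch of HS256 signing and verification with a claim-driven WHERE clause. This is illustrative only: production code should use a vetted library such as PyJWT, validate exp and the other registered claims, and load the key from a secrets store rather than a constant:

```python
import base64, hashlib, hmac, json

SECRET = b"demo-secret"  # assumption for the sketch; never hard-code a real key

def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign(claims: dict) -> str:
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(sig)}"

def verify(token: str) -> dict:
    header, payload, sig = token.split(".")
    expected = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected), sig):
        raise ValueError("bad signature")
    pad = "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload + pad))

claims = verify(sign({"sub": "u1", "tenant_id": 42}))
# The tenant claim scopes every query, enforcing row-level isolation:
scoped_sql = "SELECT * FROM orders WHERE tenant_id = ?"
scoped_params = [claims["tenant_id"]]
```

Note that verification never touches the database: the signature check alone establishes that the claims are trustworthy.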

OAuth2 is the appropriate choice when third-party applications need delegated access to the API. The authorization code flow with PKCE is the current recommendation for both server-side and single-page applications. The token endpoint returns an access token (and optionally a refresh token) that the client includes in the Authorization header using the Bearer scheme. A more comprehensive treatment of these patterns is available in the article on database API security.

Pagination Patterns for Large SQL Result Sets

An unqualified SELECT against a table with millions of rows is a denial-of-service attack on your own database. Every collection endpoint must enforce pagination, and the choice of pagination strategy has measurable performance consequences.

Offset-Based Pagination

Offset pagination uses LIMIT and OFFSET SQL clauses, exposed to the client as ?page=3&per_page=50. The API translates this to LIMIT 50 OFFSET 100. The advantage is simplicity: clients can jump to any page, and the total number of pages can be calculated from a COUNT query. The disadvantage is that performance degrades linearly with the offset. At OFFSET 1,000,000, the database must read and discard one million rows before returning the requested fifty. For small tables or internal tools where deep pagination is rare, offset pagination is adequate. For anything facing the public internet or processing large datasets, it is not.

Cursor-Based (Keyset) Pagination

Cursor-based pagination eliminates the performance cliff by using a WHERE clause instead of an OFFSET. After returning a page of results sorted by id, the API includes a cursor (the id of the last row) in the response metadata. The next request passes ?cursor=417&per_page=50, and the API translates this to WHERE id > 417 ORDER BY id LIMIT 50. The database satisfies this with an index seek, making the performance constant regardless of how deep the client has paginated.

The cursor can be any indexed column or combination of columns. For multi-column sort orders, the cursor encodes the values of all sort columns, and the WHERE clause uses a row-value comparison or an equivalent set of AND/OR conditions. Encoding the cursor as an opaque base64 string (rather than exposing raw column values) gives you flexibility to change the underlying sort columns without breaking clients.
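A compact sketch of the single-column case, using SQLite for a self-contained demo (the `page` helper and cursor shape are illustrative, not a standard API):

```python
import base64, json, sqlite3

def encode_cursor(values: dict) -> str:
    """Opaque cursor: clients never see or depend on the raw sort columns."""
    return base64.urlsafe_b64encode(json.dumps(values).encode()).decode()

def decode_cursor(cursor: str) -> dict:
    return json.loads(base64.urlsafe_b64decode(cursor))

def page(conn, cursor=None, per_page=2):
    last_id = decode_cursor(cursor)["id"] if cursor else 0
    rows = conn.execute(
        # Index seek: cost is constant no matter how deep the client is.
        "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, per_page),
    ).fetchall()
    next_cursor = encode_cursor({"id": rows[-1][0]}) if rows else None
    return [r[0] for r in rows], next_cursor

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 6)])

ids, cur = page(conn)             # first page
ids2, _ = page(conn, cursor=cur)  # client echoes the cursor back
```

Extending this to multi-column sort orders means encoding all sort-key values in the cursor and comparing with a row-value predicate such as `(created_at, id) > (?, ?)`.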

Communicating Pagination Metadata

The response should include pagination metadata in a consistent location. A top-level meta or pagination object containing next_cursor, has_more, and optionally total_count is the most common pattern. The Link header (RFC 8288) with rel="next" and rel="prev" is an alternative that keeps pagination out of the response body, but many client libraries ignore it.

Filtering and Sorting via Query Parameters

Filtering translates query parameters into SQL WHERE clauses. A request to /orders?status=shipped&total_gte=100 should produce WHERE status = 'shipped' AND total >= 100. The parameter naming convention varies: some APIs use field[operator] syntax (total[gte]=100), others use dot notation (total.gte=100), and others define a filter query language (?filter=total ge 100 in OData style).

Regardless of syntax, the API must whitelist which columns can be filtered and which operators are valid for each column type. Allowing arbitrary filter expressions opens the door to SQL injection if parameterized queries are not used, and to catastrophic query plans if the client filters on unindexed columns. A strict schema-driven whitelist, where the set of filterable columns is derived from the database indexes, prevents both problems.

Sorting follows a similar pattern. A ?sort=created_at parameter maps to ORDER BY created_at ASC, and ?sort=-created_at (with a leading hyphen) maps to ORDER BY created_at DESC. Multiple sort keys (?sort=-created_at,id) produce ORDER BY created_at DESC, id ASC. As with filtering, only indexed columns should be sortable, and the API should reject sort requests on non-whitelisted fields with a 400 response.
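The filtering and sorting conventions above can be sketched as a single query builder. The whitelists and the `_gte`-style suffix grammar are assumptions for illustration; in practice they would be derived from the schema and its indexes:

```python
# Assumed whitelists, derived in practice from schema introspection and indexes.
FILTER_OPS = {"eq": "=", "gte": ">=", "lte": "<=", "ne": "<>"}
FILTERABLE = {"status", "total", "created_at"}
SORTABLE = {"created_at", "id"}

def build_query(params, sort=None):
    where, args = [], []
    for key, value in params.items():
        col, _, op = key.rpartition("_")
        # "status=shipped" means equality; "total_gte=100" selects an operator
        if op not in FILTER_OPS:
            col, op = key, "eq"
        if col not in FILTERABLE:
            raise ValueError(f"cannot filter on {col!r}")   # surfaced as a 400
        where.append(f"{col} {FILTER_OPS[op]} ?")
        args.append(value)                                   # bound, never inlined
    order = []
    for key in (sort or "").split(","):
        if not key:
            continue
        col, direction = (key[1:], "DESC") if key.startswith("-") else (key, "ASC")
        if col not in SORTABLE:
            raise ValueError(f"cannot sort on {col!r}")
        order.append(f"{col} {direction}")
    sql = "SELECT * FROM orders"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if order:
        sql += " ORDER BY " + ", ".join(order)
    return sql, args

sql, args = build_query({"status": "shipped", "total_gte": "100"},
                        sort="-created_at,id")
```

Because column names only ever come from the whitelist and values only ever travel as bind parameters, neither injection nor an unindexed scan can be triggered from the query string.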

Understanding the full scope of what constitutes a database API helps contextualize why these filtering and sorting conventions matter: the API is not merely a passthrough to SQL but a contract that must remain stable as the underlying schema evolves.

Custom Endpoints for Stored Procedures and Views

Not all SQL operations fit the resource CRUD model. Stored procedures that encapsulate business logic, batch operations, or multi-step calculations need dedicated endpoints. The convention is to use an action-oriented URI (POST /reports/generate or POST /orders/batch-cancel) rather than forcing the operation into a resource-shaped endpoint. The request body contains the procedure’s input parameters, and the response contains the result set or a status indicator.

Database views can be exposed as read-only collection endpoints. A view named v_order_summary maps to GET /order-summaries with filtering and pagination support. Because views cannot accept INSERT, UPDATE, or DELETE (with some engine-specific exceptions for simple views), the API should return 405 Method Not Allowed for any write operation on a view-backed endpoint.

Implementation Approaches

The implementation path for a REST API over SQL depends on the project’s constraints. Hand-coded frameworks (Express with Knex, Flask with SQLAlchemy, Spring Boot with JPA) give full control over every endpoint, query, and response format. This is the right choice when the API has complex business logic interleaved with data access. Code generation tools read the database schema and produce boilerplate CRUD endpoints, reducing development time for straightforward data APIs. DreamFactory, a platform that auto-generates REST endpoints directly from SQL schema introspection, produces OpenAPI documentation without requiring endpoint-by-endpoint development. This approach is relevant for organizations that need to move quickly or manage many database sources. Hybrid approaches are also common: generate the standard CRUD surface automatically and hand-code the endpoints that require custom logic.

Whichever path you choose, the API must produce an OpenAPI (Swagger) specification that documents every endpoint, parameter, request body schema, and response format. This specification drives client SDK generation, integration testing, and developer portal documentation.

Deployment and Operational Considerations

A REST API over SQL introduces a network hop between the client and the database, and that hop needs to be fast, observable, and resilient. Connection pooling (via PgBouncer, ProxySQL, or the application framework’s built-in pool) prevents the API from exhausting database connections under load. Read replicas can serve GET requests while the primary handles writes, provided the API tolerates replication lag for read-after-write scenarios.

Rate limiting at the API gateway protects the database from abusive clients. A per-key or per-token limit of N requests per second, enforced by a sliding window counter in Redis, is the standard approach. Structured logging of every request (method, path, status code, latency, query count) enables debugging and capacity planning. Distributed tracing (OpenTelemetry) connects an API request to the specific SQL queries it executed, which is indispensable when diagnosing slow endpoints.
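As a sketch of the sliding-window idea, here is an in-memory stand-in for the Redis counter (strictly a sliding-window log variant, which trades memory for exactness; class and method names are hypothetical):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """In-memory stand-in for the per-key Redis counter described above."""
    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.hits = {}  # api key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        while q and now - q[0] >= self.window:
            q.popleft()                  # drop hits that left the window
        if len(q) >= self.limit:
            return False                 # caller answers 429 Too Many Requests
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=2, window_seconds=1.0)
# Two requests pass, the third inside the same second is rejected,
# and a request after the window slides past is admitted again.
results = [limiter.allow("key-1", now=t) for t in (0.0, 0.1, 0.2, 1.5)]
```

Redis implementations typically use a sorted set keyed by timestamp (ZADD plus ZREMRANGEBYSCORE) to get the same behavior across multiple API server instances.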

Health check endpoints (GET /health) should verify that the API can connect to the database and execute a trivial query (SELECT 1). Kubernetes liveness and readiness probes depend on this endpoint to restart unhealthy pods and route traffic away from pods that have lost their database connection.

Frequently Asked Questions

How do I configure CORS for a REST API backed by a SQL database?

CORS (Cross-Origin Resource Sharing) is configured at the API server layer, not at the database level. Your API framework needs to send the appropriate Access-Control-Allow-Origin, Access-Control-Allow-Methods, and Access-Control-Allow-Headers response headers. For development, a wildcard origin (*) is common, but production APIs should whitelist specific domains. Preflight OPTIONS requests must return the correct headers before the browser will issue the actual GET, POST, PUT, or DELETE request. Most frameworks (Express, Flask, Django REST Framework, ASP.NET Core) provide CORS middleware that handles both the preflight negotiation and the injection of headers on standard responses. Be aware that credentials (cookies, Authorization headers) require Access-Control-Allow-Credentials: true and a non-wildcard origin.

How should a REST API represent SQL NULL values in JSON responses?

JSON has a native null type that maps directly to SQL NULL. When a column value is NULL, serialize it as the JSON literal null rather than omitting the key or returning an empty string. Omitting the key creates ambiguity: the client cannot distinguish between a field that does not exist on the resource and a field that exists but has no value. Returning an empty string is worse because it conflates the absence of data with a zero-length string, which are semantically different in SQL. For PATCH operations, the convention is that a key set to null means “set this column to NULL,” while an omitted key means “do not modify this column.” Documenting this behavior in your OpenAPI specification eliminates client-side guessing.

What is the best way to version a REST API that sits on top of a SQL database?

URI path versioning (e.g., /v1/orders, /v2/orders) is the most widely adopted approach because it is explicit, cacheable, and easy to route at the load balancer or gateway level. Header-based versioning using a custom header like Api-Version or the Accept header with a vendor media type (application/vnd.myapi.v2+json) keeps URIs clean but is harder to test in a browser and complicates CDN caching. When the underlying SQL schema changes, you can maintain backward compatibility by creating database views that present the old shape to the v1 layer while the v2 layer reads the new schema directly. This lets you evolve the database without breaking existing consumers. Deprecation headers (Sunset, Deprecation) should accompany old versions so clients have a migration window.

Should related SQL tables be exposed as nested resources or as separate endpoints?

Both patterns are valid, and the choice depends on query frequency and payload size. Nested resources (e.g., /customers/42/orders) are appropriate when the child resource is almost always accessed in the context of the parent and the typical result set is small. Separate endpoints with hypermedia links (/customers/42 returns a link to /orders?customer_id=42) are better when the child collection is large, independently queryable, or shared across multiple parents. A pragmatic middle ground is to support an include or expand query parameter that lets the client request embedded sub-resources on demand, similar to JSON:API’s include parameter. This avoids over-fetching for clients that do not need the nested data while saving an extra round trip for those that do.

How do I handle SQL transactions across multiple REST API calls?

REST is stateless by design, so long-running, multi-request transactions do not map naturally onto the protocol. The preferred pattern is to design your endpoints so that each request is a self-contained unit of work that executes within a single database transaction. For operations that must atomically modify multiple resources, expose a composite endpoint (e.g., POST /transfers that debits one account and credits another in one transaction) rather than requiring the client to issue two separate calls. If you truly need multi-step workflows, consider the Saga pattern with compensating transactions or a server-side reservation model where a draft resource is created, modified, and then committed or discarded through explicit state transitions. Attempting to hold a SQL transaction open across HTTP requests leads to connection pool exhaustion and lock contention under any meaningful load.
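The composite-endpoint pattern can be sketched with SQLite's transaction support; the `transfer` function stands in for a hypothetical POST /transfers handler, and both UPDATEs commit together or roll back together:

```python
import sqlite3

def transfer(conn, from_id, to_id, amount):
    """One request, one transaction: both rows change or neither does."""
    try:
        with conn:  # sqlite3 connection context: commit on success, rollback on error
            cur = conn.execute(
                "UPDATE accounts SET balance = balance - ? "
                "WHERE id = ? AND balance >= ?", (amount, from_id, amount))
            if cur.rowcount != 1:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, to_id))
        return 201
    except ValueError:
        return 422

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

first = transfer(conn, 1, 2, 60)    # succeeds: balances become 40 / 60
second = transfer(conn, 1, 2, 60)   # fails atomically: 422, nothing changes
```

The guard in the first UPDATE (balance >= amount) keeps the check and the debit in one statement, so there is no window between them even under concurrency.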

What pagination strategy works best for SQL-backed REST APIs with millions of rows?

Cursor-based pagination (also called keyset pagination) is the most performant strategy for large SQL tables. Offset-based pagination (LIMIT/OFFSET) degrades as the offset grows because the database must scan and discard all preceding rows. Keyset pagination uses a WHERE clause against an indexed column (e.g., WHERE id > :last_seen_id ORDER BY id LIMIT 100), which the database can satisfy with an index seek regardless of how deep into the result set the client has paginated. The API returns an opaque cursor (often a base64-encoded value of the sort key) along with each page, and the client passes it back to request the next page. The trade-off is that keyset pagination does not support jumping to an arbitrary page number. For use cases that genuinely require random page access, offset pagination with a reasonable maximum offset and a total count estimate from pg_class or equivalent catalog stats is a practical compromise.