The Art of Web API Design

**The Challenges of Building External APIs**

Building external APIs is often touted as a way to make data more accessible and usable by developers outside of an organization's walls. However, in practice, it can be a complex and time-consuming process that requires careful consideration of many factors.

One of the biggest challenges when building external APIs is dealing with existing problems within an organization's systems. For example, if an internal API exposes sensitive data that should not be accessible to outsiders, developers may need to limit what is exposed or add additional security measures to protect it. This can require significant work and modifications to existing codebases.
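One way to limit what is exposed is an explicit allowlist applied at the boundary of the external API. The sketch below is only an illustration: the record, its field names, and the `to_external` helper are hypothetical and not taken from any real system.

```python
# Hypothetical internal record; every field name here is illustrative.
INTERNAL_BOOKING = {
    "id": "b-1029",
    "status": "confirmed",
    "price": 129.0,
    "customer_email": "anna@example.com",  # personal data
    "internal_margin": 0.18,               # sensitive business data
    "db_shard": "eu-3",                    # technical detail
}

# Explicit allowlist: a field is exposed only if it is named here.
EXTERNAL_FIELDS = ("id", "status", "price")


def to_external(record: dict) -> dict:
    """Return a copy of the record containing only allowlisted fields."""
    return {name: record[name] for name in EXTERNAL_FIELDS if name in record}


print(to_external(INTERNAL_BOOKING))
```

An allowlist fails closed: a field added to the internal model later stays hidden until someone deliberately exposes it, which a denylist would not guarantee.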

Another challenge is that an existing business flow may not be prepared for B2B interactions. For instance, a system may accept credit cards or PayPal but have no way to issue an invoice, and that option has to be added before partners can use the API. Work like this often means small modifications in many parts of the system, such as adding a partner ID field in many places.

When it comes to designing an external API, there are many factors to consider. One key principle is to keep the API simple and easy to use for developers who will be using it. However, this can sometimes mean making compromises on how data is represented or accessed.

In our experience, one of the most common mistakes when building an external API is to underestimate the amount of work required. This can lead to delays and frustration for all parties involved. On the other hand, with careful planning and execution, an external API can be a powerful tool for making data more accessible and usable by developers outside of an organization's walls.

**The Importance of Consensus on Naming Conventions**

One of the key challenges when building an external API is achieving consensus on naming conventions throughout the system. In our experience, this can be a major issue when building APIs for different parts of a business flow that do not communicate well with each other.

For example, when we built an API for travel, we encountered significant issues with inconsistent naming conventions between different microservices. The solution was to unify these names and ensure they made sense from a business perspective. This required careful consideration and communication among developers and stakeholders.
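As a rough illustration of unifying names at the API boundary, the sketch below maps service-specific field names onto one agreed vocabulary. The alias table and the `unify_names` helper are hypothetical; the source does not describe how the unification was actually implemented.

```python
# Hypothetical aliases that different internal services might use
# for the same business concept (illustrative names only).
ALIASES = {
    "dep_date": "departureDate",
    "departure": "departureDate",
    "arr_date": "arrivalDate",
    "pax_count": "passengerCount",
}


def unify_names(payload: dict) -> dict:
    """Rename known aliases to the agreed external name; keep other keys."""
    unified = {}
    for key, value in payload.items():
        name = ALIASES.get(key, key)
        if name in unified:
            # Two source fields collapsing into one is a modeling problem,
            # so fail loudly instead of silently overwriting a value.
            raise ValueError(f"two fields map to {name!r}")
        unified[name] = value
    return unified


print(unify_names({"dep_date": "2024-05-01", "pax_count": 2, "status": "open"}))
```

A table like this only records the outcome. Agreeing on which name makes sense from a business perspective is still the part that needs discussion between developers and stakeholders.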

In general, achieving consensus on naming conventions is crucial when building external APIs. Without it, developers may struggle to understand how the API works or what data is being returned, leading to frustration and delays.

**The Role of HTTP Semantics in Building External APIs**

When building external APIs, it's essential to take advantage of HTTP semantics: methods, standard headers, and status codes. These conventions are already familiar to anyone who works with the web, so an API that follows them is more consistent and easier to use.
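A minimal sketch of what leaning on HTTP semantics can look like: the method selects the operation and the status code reports the outcome. The `/cars` resource, the in-memory `store`, and the `handle` function are assumptions made for illustration, not a real framework API.

```python
import uuid
from http import HTTPStatus
from typing import Optional, Tuple


def handle(store: dict, method: str, path: str,
           body: Optional[dict] = None) -> Tuple[HTTPStatus, object]:
    """Return (status, payload) for a tiny in-memory /cars resource."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] != "cars" or len(parts) > 2:
        return HTTPStatus.NOT_FOUND, None

    if len(parts) == 1:  # the collection: /cars
        if method == "GET":
            return HTTPStatus.OK, list(store.values())
        if method == "POST":
            # The server picks a random, hard-to-guess ID for the new item.
            car_id = uuid.uuid4().hex
            store[car_id] = {**(body or {}), "id": car_id}
            return HTTPStatus.CREATED, store[car_id]
        return HTTPStatus.METHOD_NOT_ALLOWED, None

    car_id = parts[1]  # a single item: /cars/{id}
    if car_id not in store:
        return HTTPStatus.NOT_FOUND, None
    if method == "GET":
        return HTTPStatus.OK, store[car_id]
    if method == "DELETE":
        del store[car_id]
        return HTTPStatus.NO_CONTENT, None
    return HTTPStatus.METHOD_NOT_ALLOWED, None


cars: dict = {}
status, created = handle(cars, "POST", "/cars", {"model": "Polonez"})
print(int(status), created["model"])
```

Which methods a collection or an item accepts is a design decision. This sketch simply answers 405 for anything it does not support, so the client learns that the resource exists but the method does not apply.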

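However, this doesn't mean following these principles blindly. In some cases, deviating from the norm may be necessary to achieve the desired outcome. For example, an action such as cancelling a ticket does not map neatly onto a noun-centric resource model. It can be modeled as a PATCH that changes the ticket's state, or as the creation of a separate cancellation resource that carries additional data. When neither fits, bending the rules is acceptable as long as the result makes sense.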

In our experience, one common mistake when building external APIs is not considering the impact of HTTP semantics on API design. This can lead to APIs that are difficult to use or understand, leading to frustration for developers who rely on them.

**The Limitations of REST and Alternatives**

When building external APIs, REST (Representational State Transfer) is often the go-to choice. However, it's not the only option, and GraphQL is becoming increasingly popular as an alternative.

GraphQL offers a more flexible and powerful way to query data than traditional REST APIs. With GraphQL, developers can specify exactly what data they need, reducing the amount of unnecessary data that needs to be returned in the response.
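The idea of asking only for the fields you need can be illustrated without a GraphQL server. The sketch below is not GraphQL itself: `project` is a hypothetical helper that applies a nested selection to plain Python data, and the author record is made up for the example.

```python
def project(data, selection: dict):
    """Keep only the keys named in `selection`.

    A nested dict in `selection` picks sub-fields of a nested object
    or of every object in a nested list; any other value means
    "return this field as it is".
    """
    if isinstance(data, list):
        return [project(item, selection) for item in data]
    result = {}
    for key, sub in selection.items():
        if key not in data:
            continue
        result[key] = project(data[key], sub) if isinstance(sub, dict) else data[key]
    return result


# Illustrative record with one large field the client does not need.
author = {
    "name": "Jan Kowalski",
    "biography": "A very long text ...",
    "books": [{"title": "First", "isbn": "111"}, {"title": "Second", "isbn": "222"}],
}

print(project(author, {"name": True, "books": {"title": True}}))
```

In a real GraphQL API the selection comes from the query document and is resolved by the server. The point here is only that the response shape follows the request, so the large `biography` field never leaves the server.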

However, GraphQL also has its own set of challenges and complexities, particularly when it comes to performance and scalability. In our experience, GraphQL can be a great choice for building external APIs, but careful planning and consideration are required to ensure its success.

**Conclusion**

Building an external API is a complex process that requires careful consideration of many factors. From dealing with existing problems in the system to ensuring consistency in naming conventions, there are many challenges to overcome. However, with careful planning and execution, an external API can be a powerful tool for making data more accessible and usable by developers outside of an organization's walls.

As we continue to evolve and improve our understanding of what makes a great API, it's essential to stay informed about the latest developments in the field. Whether you're building an external API or simply looking to learn more about how they work, there's always something new to discover.

"WEBVTTKind: captionsLanguage: eneveryone my name is pavel and welcome to the talk about how to design a good api what to look into specific so few words about me i'm a programmer since more or less nine years commercially i was working at well first telecom then logistics industry then i think banking car factory and currently in a polish software house previous software and indirectly say in a travel industry for a client so the agenda of this talk will look more or less like an rpg party who of you likes roleplaying games there are some people okay cool so first we'll look at things from let's say distant perspective to looks look what's what's going on in general and then we will be looking from the perspective of three wizards from the ivory tower so just a little bit theory to to give you some insights and some bases then we'll get to business so there will be this this dwarf who is hacking orcs and goblins and doing the action is doing the actual business job uh what we get money from and behind this dwarf there is usually the cleric who is buffing him and hitting him and there will be talking about things that are that are kind kind of orthogonal to this business perspective but nevertheless important so let's begin there was a guy in the marketing by the name of simon sinek and he wrote a book start with why and indeed when we are going on some kind of business or otherwise adventure we need to think about three questions why are we doing something what are we actually doing and how to do this so the most important question is why so let's look into that why the answer is of course simple for money so basically how how do we earn money on web applications there was usually in the past there was some kind of front end business logic probably database users were clicking doing some stuff they need to and we earn on that as time passes user needs of users grow so they require additional features and it didn't really make sense to do all those features in our 
system for example if you are it's a travel company and you want to display a map it's not the best idea to just jump into the car start driving around the city and doing pictures of of stuff so that people can see this on our website it's better idea to connect to some api for example google maps so as time passes further our system grew larger more valuable especially our data were more precious so it's good idea to open an additional revenue stream and create an external api so that someone else can connect to us use our data and pay us for that so that's what we'll be more or less talking about today so an external web api what is this well it's kind of simple it's just a set of end points with which we can talk using a specified set of messages requests and responses and how to do this well that would be the objective of the talk but uh in general there is an observation that although we are talking about machine talking to the machine there are people behind those machines of developers programmers and web api is kind of like graphical user interface to the programmer it's like user interface to the programmer similarly as graphical user interface is to regular people who just click stuff on the web so there was a lot of things were said uh in the realm of how to design a good graphical user interface and we can actually leverage some of that knowledge in designing web api so the two important traits of people is let's say laziness and intuition so we are all lazy we want to do stuff as easy as possible and as fast as possible and we have intuition so we have some idea how stuff works some standards so experiences and if we confront to that well it's it will be good for us in general so that was a quick introduction then free wizards so just a little bit of theory uh who recognizes peter morviel nobody oh there is one person cool so the guy from design basically and he wrote an article 10 years ago or so maybe even a bit more where he pointed out the six 
important traits of good product design so those traits are in general that product should be useful so we should be able to do the the business stuff that we want to to do the reason for for which we bought the product in the first place it should be usable so it should be easy to do this it should be intuitive it should be desirable so you are happy with the product for example if you are developers acting on an api the api has a nice documentation and always works and well we desire this let's say it should be findable so it should be easy to find the stuff that we want to do for example api is not buried under some a lot of slashes and long urls it's somewhere close to the top domain of of the of the company its own slot uh it should be accessible originally in the article it was about the products used by people with some kind of disability for example let's say blind people it's nice is a product then talks to them or it's possible to touch the product and for example read braille characters uh in case of api an example would be we are disabled because we are sitting behind a proxy that only grants only oppresses the get requests and we want to do a post request or put request and how to deal with that i will be talking about this later and and it should be credible meaning well every product should be credible it shouldn't break but if it breaks the the producer of the product shouldn't leave us alone with that so in case of api you know of course sometimes stuff breaks we get 500 from the server but it's also nice if if the api provider gives us more information about how to deal with this problem and that would be another chapter in our story the second wizard roy fielding anyone familiar with roy fielding okay there are some people anyone actually read the fielding phd dissertation okay oh nice uh so refuelling this is a guy who's who's involved in the i'd say standardization of http and uri and in 2000 he wrote a ph.d 13 defended phd dissertations where 
he defined the rest architecture so basically if someone asks you what is actually rest because we heard this a lot of time but if we think a bit deeper about what it is so it's not a protocol it's not technology it's not not really a standard it's an architectural style and it was basically the thesis is about properties how we can assess web architectures there are several web architectures in the assessment and then there is this rest so representational state transfer and what's the idea there are those six principles first client server so we have separate client and server originally it was more of a front end and back end but it's not really the case anymore so we can develop them independently we have statelessness so basically each request has enough information to be processed without storing some kind of inside state because if the machine dies then we lose the state we can send the request to another machine and if we have everything with the request that we need to do we are good but there comes a problem we need to send more data so there is a principle of cachability so every response and from the server basically tell us on what condition we can store it and to use later so then we can save a bit on bandwidth and computing power on both sides basically uh layered system is about that we can in let's say inject some kind of layer between client and server and it should be basically transparent to the client inform interfaces more or less about that our addresses our naming schemes should be consistent and we should use http semantics so methods status codes and standard headers and code of demand is that's an optional requirement but the idea is that we can from the server we can send some executable code that can be invoked by the client and those extend its capabilities so that's briefly about rest learner t-shirts on anyone yep so just quickly oh no richardson i created i think 20 or 10 years ago or so on the qcon talk he defined the levels of how 
restful the architecture is and just the results of model you might have heard and the idea is that well at the level zero we have this swamp of plain old xml so we have and one endpoint to which we sent on xml file which basically says what has to be done then the level two is resources so entities in our system have different endpoints and if we have people cars groups or whatever they have separate endpoints and we can work on that the next level is usage of http verbs so no longer we have to wonder should i invoke delete cars or maybe remove cars i don't know we just go to cars and use the delete method and the last level is hyper media control also known as haitos this beautiful acronym in our i.t so basically we add links to our responses that can lead us to related resources okay those were those those was a bit of theory now practice so the business stuff about we talk about resources naming relations between resources the behavior of particular http methods depending whether we have collection or single objects about functions because you know rest is about nouns so how to deal with functions then how to parameterize requests so that we can operate efficiently on collections and how to handle status and as it sometimes happens errors so quickly resources the centric focus point of rest nouns plural forms so that we have cars which is a collection of cars and we can have cars slash id which is a particular car uh it's good to remember about not sharing technical details in url so for example if you have apache server that's great but do not put it in the url just keep it clean and simple as possible about naming there was this ancient war between camel case and snake case and actually now we have a kind of a bit popular new style hyphen case also known as kebab case because you have like this piece of meat on the stick then you have wars with high pens and basically the old war was between commerce and snake you know camel people from javascript say hey we 
have json we like json so json is javascript so camel case makes sense on the other hand hand snake case well they say it's a bit more readable because words are a bit far apart from each other looks better hyphen case actually there's one problem with hyphen case because if you run this through some automatic automating frameworks that matches the name of property with the objects or the the object itself then you can run to the problem that there is a minus sign basically and that's a operator in programming language so it looks cool in a content that is consumable by people like you have a blog you have new article there's a few words in the title so your wordpress generates a nice link with hyphens in between and it looks good but in api maybe not the best idea personally i think i prefer camo case because i'm from java but yeah anyway whichever you choose just try to stay consistent not like some standard libraries for some languages that makes cases not the best idea so we have resources now we have relations between those resources so how to model that basically if the resource resources are independent they can exist independently for example we have people and we have groups well people can live without groups groups can formally be without people then group membership is another resource so basically it can be created deleted or manipulated on the other hand if we have resource that depends on another say we have a building and room in a building if the building collapses then the room is probably not too good either in this case we can model that in this character that first is building an id of the building then there's rooms and id of the room behavior of http methods so what to do when we get http verb depending on whether that's a collection or object so basically we've got it's kind of simple you just return the collection or the object of course if the collection is big we need to somehow limit that with post it's kind of tricky if you post on a 
collection that means hey there's an object i want to put it in the collection i don't know the idea of that so just figure it out and let me know uh if you do a post on an object it used to be a partially modification of the objects but then we have this new method not that new anymore patch in http so it's basically no longer used i think in that context we've put logic would dictate that if you put a collection you just replace the existing collection but it's again not usually the case we want we probably want to just append the collection it's good to market documentation somehow and if we do a put a single object we are replacing it and here's the question what to do if the object doesn't exist yet well we can allow to put new objects with a given id so doing put on an object it doesn't exist say says hey there's an object please put it under this id i'm providing you it's a bit tricky because if you let somebody outside of your system create ids for your entities you might be in trouble unless you have some kind of good validation or just allow this to be done from internal services so we have to be careful with that and delete pretty simple these entire collection of course that is risky so should be controlled in appropriate manner and deleting single item this is the item and like the newest method patch patch eye collection i think doesn't really makes a lot of sense but part on a single object changes a small say part of it okay we have behavior now let's move to function so we have this noun centric system and what to do if something a piece appears to be a function well first we can treat it as patching some other resource for example there is let's say we bought a ticket for a train and now we want to cancel it how to do this where we can patch the state of the ticket and change its state to cancel or something but it might not be always the best idea uh sometimes invoking a function is kind of related with creating a resource so we can here think 
okay i've created a constellation object which also can have some additional data to it otherwise or if you don't have really good idea how to do this we can just bend the rules and if it makes sense go with that don't be too dogmatic all the time okay parameters so just quickly reminder on how to how can you parameterize requests you can put something in a path it's not really exactly a parameter because if we say cars slash id it's not a parameter it's just a name of this particular car if we operate on collection as well usually we go with queries of everything after the question mark that is optional if you want to separate ourselves from the let's say business services api we can go with headers or even deepen with custom headers and sometimes it makes sense to send data in just in the body of the request one quick note i encountered the idea that if you if you use some parameters that are kind of meta or platform related or very general they might be named with underscore at the beginning and it says be careful with that so that's something that we can also use so collections search in a collection the simplest idea is just use equal operators so you want people that are of age 27 so h equals 27 simple what about if you want someone older or younger well we can have operator for that so for example less than greater than than some kind of delimiter and the value we are comparing against we can combine the property we are comparing against with with the operator in one single name so if we are looking for an arnold we can use name like sword for example and okay we found objects so now how to look into those objects maybe we are interesting only in a subset of fields so we can list the subset of fields on the other hand maybe object has just one big fit we are not yet interesting in maybe there is an outer with large biography and just few other short fields then we can just exclude some things and we can define several styles of object like small medium full 
something like that and let users deal with that we found something now it's a good idea to sort what we found so we can just specify one property by which we are sorting with some default order we can add a prefix or suffix that determines the order of the searching of the sorting then we can list properties then list properties that should be ascending and those that should be descending or we can just for each property we have a mirror property that selects a direction of the sort personally i like this short version with just simple one character to determine the order of the sword next operation on collections pagination so normally we use offset and limit for that number of the page but there's a problem if someone inserts something or deletes something while we are iterating on some parts of the collection so we can define cursor that we just point to to one item and work on that it's not perfect but it's somehow protects us from from getting item twice or not at all in case of concurrent changes there's something between pagination and sorting that is defining of some sort parameters uh search parameter for example let's say we have ticket system uh we can define that there is a page recently clause which has some kind of sorting in that searching in that already another idea besides parameters is using headers if you want to again separate yourself from the business surface of the api and it's always nice to to include some links to the next page in the collection previous page last or the first okay status so what happened with with our request hopefully it was okay but if not we can do something with that so first http codes generally whatever happening happens on server whether it was successful or not it should return the status the question is how many statuses should i use it's a difficult question i believe in http 1.1 there's around 70 statuses or so usually in apis i found between 1 and 20 that's maybe there is a reasonable maximum for example 
facebook always returns status 200 and then in the body of the response actually informs what happened and so do national railways of switzerland but they are not the best example of api design anyway aside from that i think that more or less 15 codes is to 15 is the reasonable number of codes we should use there what else if something goes wrong then it's a good idea to say exactly what happened so first some kind of enum that says which situation took place it could be either some name like payment fraud or something that says something or i found an idea of extending http code so we are adding an additional digit or two to http code uh second thing message so just say what happens for example if you have if your clients are developers from china or some other country maybe then they don't like english very much you can for example i can display this message in different language forgot the word and the last but not least what you can include in such objects is a link to some kind of faq or documentation chapter which explains exactly what happened and again don't expose internals unless you are open source of course but returning stack traces in responses is not exactly a good idea usually so briefly on http codes 200 yeah success 201 is used when we create an object at 202 when we start some kind of asynchronous processing and the response is not ready but we notified that okay we got the requests we are on it 204 when it when we delete an object then we return this and when we return just a part of the response and there will be more 206 okay 300 directions kind of tricky 300 is not that not that often used because there is no really a good standard on how to handle that it says that there are multiple choices 301 something was moved per numently so don't try again on this address 302 it's kind of deprecated because it was meant as 307 but instead implemented in the early days as 303 so what are those 307 temporary direct means the others change temporary and 
please use the same method that you did before so i do a i post on the server and if i got 307 it means redirect and use post again i did nothing for you on the other hand 300 or free means go to another place but use get to retrieve results so basically i got the request i got some processing and i have results for you and they are somewhere else but use get to retrieve that 400 i think the most calls are in this category unauthorized really means unauthenticated i don't know who you are forbidden me three authorized but the name was already taken it means i know who you are but you can do this uh 404 yeah we know that one row five means the object is there but the method is not allowed because maybe it's immutable or for some other reasons conflict well the thing that you are trying to do in the request is in a conflict state with the server so for example aws s3 would return that if you try to delete a bucket that has still some files in it there's a conflict in their definition gone means that there was a resource but it is not anymore in in its permanent situation and 422 and possible entity is basically kind of bad request but on a higher level of abstraction so you would return but request when the json is for example my form it lacks one parenthesis but if you do a request to book a ticket and say arrival date is before the departure date that's a good situation to return 422 so it says that as a business situation and we can process it although we understood the request and server errors so 500 generic error we don't know what happened we don't like that 501 is kind of a way saying of saying that it's kind of under construction it will be there try again maybe later in like in two weeks or something um but gateway and gate gateway timeout are somehow similar and it's saying that when we were acting as a gateway something bad happened with someone else so either another service screwed screwed the response or didn't respond at all sorry there is nothing i 
could do and it's okay if we are talking internal apis and microservices then we just go to another team and ask hey what's going on but on the other hand if if i'm an external client i don't really care which one service screw up something i'm interested i'm interested in what happened basically so it's not very recommended to to use as status code in external api i think and service on available is a way of saying that hey we are down but there is some maintenance everything will be okay instead of just 500 it's not working so that's the main let's say business part now let's go to this quickly to the supporting thing uh so a few words about security about versioning cash and performance throttling haters and then some less technical things but also some miscellaneous if you don't know where to put something you can always put it in miscellaneous category security well 2019 you should use tsl whenever possible and https if someone goes to our endpoint without https we shouldn't redirect them to https just return errors so that situation is clear cryptography is difficult so even if you have phd in cryptography and 10 years of experience is probably not the best idea to invent our own ciphers and cryptographic schemes especially security by obscurity is a bad idea instead of passwords and username is good to use api cases keys as they have they are more secure it's difficult to guess them and there is probably everyone or almost everyone knows the old wsap opponent application security projects but if you don't just check this out dot org it's a list of 10 most important threats in web application and it's always always good to see what is currently on top there okay versioning so our api changes and how to deal with that the most popular and usual solution is just to embed version in the url so we say photo is api version two of our one or without anything is version one on the other hand we can go a bit deeper and use parameter for that so that way for example 
if someone does not include this parameter so it's optional you can assume that we are returning the newest version of the api again if we want to go away from the business surface of the api we can use headers either accept header for that or we can use our own headers and api versioning and resource versioning are two different things so we can have new api all the api and return new version of resource and old version of resource you shouldn't be confused cache control cache is difficult but in general what we can do about this we have this values of cache control headers basically public and private is about whether the information is meant for single use just single users or is publicly available so if there is a logo of the web page it can be marked as public if there is some particular data for the user it probably should be private those should be public and this will be private um if we can allow resource to be stale we don't care about it's to be very fresh then we specify some kind of max save so how stale it should be no cache doesn't really mean that we don't have any benefits from caching it just means that the resource should be always fresh so whatever the client checks for the resource in proxy we need to check on the server and if the results didn't change well we are in luck we can just return the value from the proxy if not we need to refresh that on the other hand no store says that we can't use caching at all basically it's about sensitive data like medical records or anything related with money usually and we don't want to store even of on the device of the user or maybe especially on the device of the user so how to uh how to deal with all those values there are actually like 15 or so more basically we need to think if the data is private or public if it can be if it can be stale or not and if it is sensitive and then we can set some effective caching strategy so that we save both on bandwidth and performance and computing power of our 
systems and clients throttling so how to deal with many requests basically you can just return 500 server kaput but we can do better for example there are three headers not standard but commonly used that say that there's a limit in time window how many requests are left in current time window and when the time new time window starts and if the client is in no luck and there is no more requests in the time window then you can return 400 429 to my request and the information that let's say hey there will be new opening in seven seconds so just try again then and not try every second because it won't work anyway and again we save on both on our site and client side hey toast who of you did play diablo 2 yeah lots of guys so it's lord of hate mephisto it kind of went well with this slide haters so hyper media as the engine of application states the most beautiful acronym for in our industry the idea is that we with the response we return links to related resources and what we can do about this resource around this resource so that way a client doesn't have to remember all those links and follow the versioning you can just take the link that we return and and follow that and there are several there's no really one standard of how to do this there are several personally i used hull it worked pretty well for me there is also collection plus json which is kind of originating in collections but works pretty well with links too there is json link document which is kind of good if you don't want to break backward compatibility so like encapsulate the requests in another object and there is siren which is i think quite powerful but not that popular documentation so moving a bit away from the technical side um documentation should be easy to find and ideally public so i don't have to mail the support and wait two weeks to get the documentation there are three tiers of kinds of documentation there is like this address book which is exhaustive reference everything we can do with 
the API, but really boring. Then there is the engaging tutorial, which is kind of a business story: we start with something, we do some other requests, we complete something, and we can follow that. And there is the quickstart, something that we put on the front page of our product so the developer can just take it, start from that and move efficiently.

Miscellaneous. As I said about those six principles of good product design and people with disabilities: if I'm a person with a disability and I can't do a POST request because of my proxy, then I can use a header for that, X-HTTP-Method-Override, and just specify the value of the method I really want to use while I'm using the GET method.

Pretty printing is maybe kind of obsolete, because most modern web browsers do this, but we can add a parameter to pretty-print JSON, so indentation and new lines are inserted automatically and it's more readable in browsers that do not support this.

For encoding we have a standard header, so let's not invent our own solutions.

UUIDs. When we are creating new resources, it's not the best idea to just go with consecutive numbers, one, two, three and four, because then somebody can just ask for all the numbers and get all the data from our database. And aside from losing data, we are losing quite a lot of performance here. So it's good to use some longer and random IDs that are difficult to guess.

Somewhat similar are request UUIDs, sometimes called correlation IDs. If we start, let's say, a processing flow on the boundary of our system and then some calls are fired to microservices, maybe quite a lot of them, we can add a parameter with this ID of the request, and then it's easy to find it in the logs and debug the situation.

REST clients and browsers. There are some API vendors who have an idea that if somebody accesses the API from a web browser, it means they want HTML returned instead of JSON. It's generally a bad idea; we have content negotiation for that. We have
standards for time, so let's not invent our own standards for that.

Health endpoints. It's more about internal APIs than external ones, but aside from the health endpoint that we need for, for example, our Kubernetes or some other automation to see if our service is okay, it's a good idea to add more information for the developer, for example the Git version from which the service was built, or the values of some important properties, or some other stuff. So it can be a bit enriched.

And last but not least, external APIs. There are often projects in a company that has this B2C, business-to-client, model and many front ends, to add an external API and connect with big business. It usually sounds quite good and easy: you just make some kind of gateway, a separate service, do some configuration and security and a few other things, and it will be nice. But in practice it turns out that those projects are just a bit about new technology and in quite large part about dealing with existing problems in our systems. Meaning, if we want to create an external API, we have to think about what should be exposed there, because in internal APIs we often expose a lot of things that are not very secure to expose, so now we have to limit that.

Often there is a situation where a business flow of our system is not really prepared for B2B. For example, you have flows to pay with credit cards, or Braintree, PayPal or whatever, but you don't have an option to issue an invoice, and you have to add that to make it work. On the other hand, it's common that you have to do small modifications in many parts of the system. For example, I remember the last time I was taking part in a similar project; I think it was the moment when I checked out the largest number of Git repositories in that company, because we had to add a small field, partner ID, in very many places. So when you hear that there is an external API under construction, beware: it might mean that there is a lot of dirty work
to be done. But on the other hand, that's an opportunity to get to know the system better, because you work in many places and get this good overview of how the client would see it from the outside.

Another aspect is that often there is no good consensus on the naming of parts of the system. For example, when we did this API for travel, it turned out that one microservice returns something by the name of segment, another service calls it itinerary, another service calls it leg, and now we suddenly have to somehow unify that, and it should make sense from the business point of view.

To sum up: a good API takes a lot of effort. There are a lot of things we need to think about, and some of them are not that visible from the business side. An API is per se a product for developers. Thinking as a developer might be kind of tricky for business people, but an API is to a developer what a graphical user interface is to a regular user, so we need to keep that in mind. If you are going with REST, because it's not the only way; for example you can choose GraphQL, yesterday there was a nice talk about GraphQL [unclear], and there are many, many alternatives, there are binary protocols. But if we are talking about REST: nouns are the center of REST, and we need to take advantage of HTTP semantics, so methods, headers, statuses, all that good stuff. Look at those principles, but do not follow them blindly, because sometimes something doesn't really fit into the REST picture but still makes sense, and we can just bend the rules if we know what we are doing.

Okay, if you fell asleep or just came late to this lecture, most of what I said, and quite a lot more, is in, I think, six or seven articles currently on my blog. It's How to Train Your Java, just like the movie but with Java instead of Dragon. So you are welcome to drop by.

Okay, questions? Yes?

I'm not sure actually, but I think this is one of the common functionalities
that are actually implemented by API management platforms like Apigee, or however it is called now, because it was bought by Google, WSO2, or AWS API Gateway. I think in all of them I saw a point about throttling, so I suppose they do. Any other questions? Yeah?

You didn't mention anything about parameters inside the body [unclear] in an external API. I'm often seeing that people use "request" and "response" as a part of naming, and to me it's...

Well, I saw the problem that somebody needed to request a long list of... The question is basically about parameters in the request body, whether they are good or not. The problem was that somebody had a situation where he wanted to request a very, very long list of resources, and the list was so long that it might not fit in the length of the URL, and he was wondering whether it is then a good idea to put it in the body.

Actually, I mean, when I'm looking at Swagger, I often see descriptions where people describe elements, for example people: they have a people request, which defines the body of the request, and a people response, which defines the body of the response. It doesn't look like resource-oriented style, but very often I see this style of defining documentation, and I didn't find any good place describing how this should look and how we should describe this request and response.

I'm not very sure how to answer that; I've got to think about it. Any other questions? I don't want to stand between you and beer for too long, so if there are other questions, just catch me later today or tomorrow. And that will be all. Thank you.

Everyone, my name is Pavel, and welcome to the talk about how to design a good API and what to look into specifically. A few words about me: I have been a programmer commercially for more or less nine years. I worked first in telecom, then in the logistics industry, then I think banking, a car factory, and currently in a Polish software house, previous
software, and indirectly, let's say, in the travel industry for a client.

So the agenda of this talk will look more or less like an RPG party. Who of you likes role-playing games? There are some people, okay, cool. First we'll look at things from, let's say, a distant perspective, to see what's going on in general. Then we will be looking from the perspective of three wizards from the ivory tower, so just a little bit of theory to give you some insights and some basis. Then we'll get to business: there will be this dwarf who is hacking orcs and goblins and doing the actual business job, what we get money from. And behind this dwarf there is usually the cleric who is buffing him and healing him, and there we will be talking about things that are kind of orthogonal to this business perspective but nevertheless important.

So let's begin. There was a guy in marketing by the name of Simon Sinek, and he wrote a book, Start with Why. And indeed, when we are going on some kind of business or other adventure, we need to think about three questions: why are we doing something, what are we actually doing, and how to do it. The most important question is why, so let's look into that.

Why? The answer is of course simple: for money. So basically, how do we earn money on web applications? In the past there was usually some kind of front end, business logic, probably a database; users were clicking, doing the stuff they needed to, and we earned on that. As time passed, the needs of users grew, so they required additional features, and it didn't really make sense to build all those features in our system. For example, if you are a travel company and you want to display a map, it's not the best idea to just jump into a car and start driving around the city taking pictures of stuff so that people can see it on our website. It's a better idea to connect to some API, for example Google Maps. As time passed further, our system grew larger and more valuable,
especially our data became more precious, so it's a good idea to open an additional revenue stream and create an external API, so that someone else can connect to us, use our data and pay us for that. That's more or less what we'll be talking about today.

So, an external web API: what is this? Well, it's kind of simple: it's just a set of endpoints with which we can talk using a specified set of messages, requests and responses. And how to do this? Well, that is the subject of the talk. But in general there is an observation that although we are talking about a machine talking to a machine, there are people behind those machines, developers, programmers, and a web API is a user interface to the programmer, similarly as a graphical user interface is to regular people who just click stuff on the web. A lot of things were said in the realm of how to design a good graphical user interface, and we can actually leverage some of that knowledge in designing a web API.

The two important traits of people are, let's say, laziness and intuition. We are all lazy: we want to do stuff as easily and as fast as possible. And we have intuition: we have some idea how stuff works, some standards, some experiences, and if we conform to that, it will be good for us in general. So that was a quick introduction.

Then, three wizards, so just a little bit of theory. Who recognizes Peter Morville? Nobody? Oh, there is one person, cool. So he is a guy from design, basically, and he wrote an article ten years ago or so, maybe even a bit more, where he pointed out the six important traits of good product design. Those traits are, in general: the product should be useful, so we should be able to do the business stuff that we want to do, the reason for which we bought the product in the first place. It should be usable, so it should be easy to do this, it should be intuitive. It should be desirable, so you are happy with the
product. For example, if you are a developer working with an API, and the API has nice documentation and always works, well, we desire this, let's say. It should be findable, so it should be easy to find the stuff that we want to do: for example, the API is not buried under a lot of slashes and long URLs, it's somewhere close to the top domain of the company, in its own slot. It should be accessible. Originally in the article it was about products used by people with some kind of disability, for example, let's say, blind people: it's nice if a product then talks to them, or it's possible to touch the product and, for example, read Braille characters. In the case of an API, an example would be that we are disabled because we are sitting behind a proxy that only passes GET requests, and we want to do a POST request or a PUT request. How to deal with that, I will be talking about later. And it should be credible, meaning, well, every product should be credible. It shouldn't break, but if it breaks, the producer of the product shouldn't leave us alone with that. So in the case of an API, of course sometimes stuff breaks and we get a 500 from the server, but it's also nice if the API provider gives us more information about how to deal with this problem. And that will be another chapter in our story.

The second wizard: Roy Fielding. Anyone familiar with Roy Fielding? Okay, there are some people. Anyone who actually read Fielding's PhD dissertation? Okay, oh, nice. So Roy Fielding is a guy who was involved in, I'd say, the standardization of HTTP and URI, and in 2000 he defended a PhD dissertation where he defined the REST architecture. So basically, if someone asks you what REST actually is, because we have heard this a lot of times, but if we think a bit deeper about what it is: it's not a protocol, it's not a technology, it's not really a standard. It's an architectural style. And basically the thesis is about properties, how we can assess web architectures.
There are several web architectures in the assessment, and then there is this REST, so Representational State Transfer. And what's the idea? There are those six principles.

First, client-server: we have a separate client and server. Originally it was more of a front end and back end, but it's not really the case anymore. So we can develop them independently.

We have statelessness: basically each request has enough information to be processed without storing some kind of internal state, because if the machine dies, then we lose the state. We can send the request to another machine, and if we have everything we need in the request, we are good.

But there comes a problem: we need to send more data. So there is the principle of cacheability: every response from the server basically tells us under what conditions we can store it and use it later, so we can save a bit on bandwidth and computing power, on both sides basically.

Layered system is about the fact that we can, let's say, inject some kind of layer between the client and the server, and it should be basically transparent to the client.

Uniform interface is more or less about the fact that our addresses and our naming schemes should be consistent, and we should use HTTP semantics, so methods, status codes and standard headers.

And code on demand: that's an optional requirement, but the idea is that from the server we can send some executable code that can be invoked by the client and so extend its capabilities. So that's REST, briefly.

Leonard Richardson, anyone? Yep. So just quickly: Leonard Richardson, I think 10 or 20 years ago or so, in a QCon talk, defined levels of how RESTful an architecture is. It's the Richardson Maturity Model; you might have heard of it. The idea is that at level zero we have this swamp of plain old XML, so we have one endpoint to which we send an XML file which basically says what has to be done. Then level one is resources, so entities in our system have different endpoints, and if we have
people, cars, groups or whatever, they have separate endpoints and we can work on that. The next level is the usage of HTTP verbs, so we no longer have to wonder "should I invoke delete cars or maybe remove cars?"; we just go to cars and use the DELETE method. And the last level is hypermedia controls, also known as HATEOAS, this beautiful acronym in our IT. So basically we add links to our responses that can lead us to related resources.

Okay, that was a bit of theory, now practice. So, the business stuff. We'll talk about resources, naming, relations between resources, the behavior of particular HTTP methods depending on whether we have a collection or a single object, about functions, because, you know, REST is about nouns, so how to deal with functions, then how to parameterize requests so that we can operate efficiently on collections, and how to handle statuses and, as it sometimes happens, errors.

So quickly, resources: the central focus point of REST. Nouns, plural forms, so that we have cars, which is a collection of cars, and we can have cars slash ID, which is a particular car. It's good to remember not to share technical details in the URL. For example, if you have an Apache server, that's great, but do not put it in the URL; just keep it as clean and simple as possible.

About naming: there was this ancient war between camel case and snake case, and now we actually have a somewhat popular new style, hyphen case, also known as kebab case, because you have this piece of meat on a stick, and then you have words with hyphens. Basically the old war was between camel and snake. Camel people, from JavaScript, say: hey, we have JSON, we like JSON, JSON is JavaScript, so camel case makes sense. On the other hand, the snake case people say it's a bit more readable, because words are a bit further apart from each other, so it looks better. With hyphen case there's actually one problem: if you run this through some automatic mapping frameworks that match the name
of a property with the object, then you can run into the problem that there is basically a minus sign, and that's an operator in a programming language. So it looks cool in content that is consumed by people: you have a blog, you have a new article, there are a few words in the title, so your WordPress generates a nice link with hyphens in between and it looks good. But in an API, maybe not the best idea. Personally I think I prefer camel case because I'm from Java, but anyway, whichever you choose, just try to stay consistent, not like some standard libraries for some languages that mix cases. Not the best idea.

So we have resources; now we have relations between those resources. How to model that? Basically, if the resources are independent, they can exist independently. For example, we have people and we have groups: people can live without groups, and groups can formally exist without people. Then group membership is another resource, so it can be created, deleted or manipulated. On the other hand, if we have a resource that depends on another, say we have a building and a room in a building, then if the building collapses the room is probably not too good either. In this case we can model it hierarchically: first there is buildings and the ID of the building, then there is rooms and the ID of the room.

Behavior of HTTP methods: what to do when we get an HTTP verb, depending on whether that's a collection or an object. With GET it's kind of simple: you just return the collection or the object. Of course, if the collection is big, we need to somehow limit that. With POST it's kind of tricky. If you POST on a collection, that means: hey, there's an object, I want to put it in the collection, I don't know its ID, so just figure it out and let me know. If you do a POST on an object, it used to be a partial modification of the object, but then we have this new method, not that new anymore, PATCH, in HTTP, so I think POST is basically no longer used in that context.
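As a rough illustration of the conventions described so far (a plural collection name, an item addressed as collection plus ID, GET returning the collection or the object, POST on a collection letting the server pick the ID), here is a minimal in-memory sketch. It is not from the talk: the `handle` function, the `STORE` dictionary, the use of random UUIDs, and the choice to answer a successful POST with 201 and a `Location` header are all assumptions made for the example.

```python
import uuid

# Hypothetical in-memory store: one plural collection name mapped to its items.
STORE = {"cars": {}}


def handle(method, path, body=None):
    """Dispatch a request to a collection (/cars) or a single object (/cars/<id>).

    Returns a (status, headers, payload) tuple.
    """
    parts = [p for p in path.split("/") if p]
    if not parts or len(parts) > 2 or parts[0] not in STORE:
        return 404, {}, None
    collection = STORE[parts[0]]

    if len(parts) == 1:  # the collection itself
        if method == "GET":
            return 200, {}, list(collection.values())
        if method == "POST":
            # The client does not know the ID; the server picks one and reports it.
            new_id = str(uuid.uuid4())
            item = dict(body or {}, id=new_id)
            collection[new_id] = item
            return 201, {"Location": f"/{parts[0]}/{new_id}"}, item
        return 405, {}, None

    # A single object inside the collection.
    item = collection.get(parts[1])
    if item is None:
        return 404, {}, None
    if method == "GET":
        return 200, {}, item
    return 405, {}, None
```

Calling `handle("POST", "/cars", {"make": "Fiat"})` and then `handle("GET", ...)` on the returned `Location` path gives back the same object, which is the "figure the ID out and let me know" contract in miniature.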
With PUT, logic would dictate that if you PUT a collection, you just replace the existing collection. But again it's usually not the case; we probably want to just append to the collection. It's good to mark that in the documentation somehow. And if we do a PUT on a single object, we are replacing it. And here's the question: what to do if the object doesn't exist yet? Well, we can allow putting new objects with a given ID, so doing a PUT on an object that doesn't exist says: hey, there's an object, please put it under this ID I'm providing you. It's a bit tricky, because if you let somebody outside of your system create IDs for your entities, you might be in trouble, unless you have some kind of good validation or just allow this to be done from internal services. So we have to be careful with that.

DELETE is pretty simple: on a collection it deletes the entire collection. Of course that is risky, so it should be controlled in an appropriate manner. Deleting a single item deletes the item. And the newest method, PATCH: PATCH on a collection, I think, doesn't really make a lot of sense, but PATCH on a single object changes a small, say, part of it.

Okay, we have behavior; now let's move to functions. We have this noun-centric system, so what to do if something appears to be a function? Well, first, we can treat it as patching some other resource. For example, let's say we bought a ticket for a train and now we want to cancel it. How to do this? Well, we can patch the state of the ticket and change its state to cancelled or something. But it might not always be the best idea. Sometimes invoking a function is kind of related to creating a resource, so here we can think: okay, I've created a cancellation object, which can also have some additional data attached to it. Otherwise, if you don't really have a good idea how to do this, we can just bend the rules and, if it makes sense, go with that. Don't be too dogmatic all the time.

Okay, parameters. So just a quick reminder on how you can parameterize requests. You can put something in a
path. It's not really exactly a parameter, because if we say cars slash ID, it's not a parameter, it's just the name of this particular car. If we operate on a collection, we usually go with the query, so everything after the question mark, and that is optional. If we want to separate ourselves from the, let's say, business surface of the API, we can go with headers, or go even deeper with custom headers. And sometimes it makes sense to send data just in the body of the request. One quick note: I encountered the idea that if you use some parameters that are kind of meta, or platform-related, or very general, they might be named with an underscore at the beginning, which says "be careful with that". So that's something that we can also use.

So, collections. Searching in a collection: the simplest idea is to just use the equals operator. You want people that are of age 27, so age equals 27. Simple. What if you want someone older or younger? Well, we can have an operator for that, so for example less-than or greater-than, then some kind of delimiter, and the value we are comparing against. Or we can combine the property we are comparing against with the operator in one single name, so if we are looking for an Arnold, we can use name-like with Schwarz, for example.

Okay, we found objects, so now how do we look into those objects? Maybe we are interested only in a subset of fields, so we can list that subset of fields. On the other hand, maybe the object has just one big field we are not interested in yet; maybe there is an author with a large biography and just a few other short fields. Then we can just exclude some things. And we can define several styles of the object, like small, medium, full, something like that, and let users deal with that.

We found something; now it's a good idea to sort what we found. We can just specify one property by which we are sorting, with some default order. We can add a prefix or suffix that determines the order of the sorting. Then we can list properties that should be ascending and
those that should be descending. Or for each property we can have a mirror property that selects the direction of the sort. Personally I like the short version, with just one simple character to determine the order of the sort.

The next operation on collections: pagination. Normally we use offset and limit for that, or the number of the page. But there's a problem if someone inserts something or deletes something while we are iterating over some part of the collection. So we can define a cursor, which just points to one item, and work on that. It's not perfect, but it somehow protects us from getting an item twice or not at all in case of concurrent changes.

There's something between pagination and sorting, and that is defining some sort or search parameters up front. For example, let's say we have a ticket system: we can define that there is a page "recently closed", which already has some kind of sorting and searching in it. Another idea besides parameters is using headers, if you again want to separate yourself from the business surface of the API. And it's always nice to include some links to the next page in the collection, the previous page, the last or the first.

Okay, status. So what happened with our request? Hopefully it was okay, but if not, we can do something with that. So first, HTTP codes. Generally, whatever happens on the server, whether it was successful or not, it should return a status. The question is: how many statuses should I use? It's a difficult question. I believe in HTTP 1.1 there are around 70 statuses or so. Usually in APIs I found between 1 and 20; maybe that is a reasonable maximum. For example, Facebook always returns status 200 and then informs what actually happened in the body of the response, and so do the national railways of Switzerland, but they are not the best example of API design anyway. Aside from that, I think that more or less 15 is a reasonable number of codes we should use there.

What else? If something goes wrong, then it's a
good idea to say exactly what happened. So first, some kind of enum that says which situation took place. It could be either some name, like payment fraud or something that says something, or, I found an idea of extending the HTTP code, so we add an additional digit or two to the HTTP code. Second thing, a message: just say what happened. For example, if your clients are developers from China or some other country, maybe they don't like English very much, so you can display this message in a different language; I forgot the word. And last but not least, what you can include in such objects is a link to some kind of FAQ or documentation chapter which explains exactly what happened. And again, don't expose internals, unless you are open source of course; returning stack traces in responses is usually not exactly a good idea.

So briefly on HTTP codes. 200, yeah, success. 201 is used when we create an object. 202 when we start some kind of asynchronous processing and the response is not ready, but we notify that, okay, we got the request and we are on it. 204 we return when we delete an object. And when we return just a part of the response and there will be more, 206.

Okay, 3xx, redirections, kind of tricky. 300 is not that often used, because there is no really good standard on how to handle it; it says that there are multiple choices. 301: something was moved permanently, so don't try again at this address. 302 is kind of deprecated, because it was meant as 307 but was implemented in the early days as 303. So what are those? 307 Temporary Redirect means the address changed temporarily, and please use the same method that you used before. So I do a POST on the server, and if I get 307, it means: redirect and use POST again, I did nothing for you. On the other hand, 303 means: go to another place, but use GET to retrieve the results. So basically: I got the request, I did some processing, and I have results for you; they are somewhere else, but use GET
to retrieve them.

4xx: I think most codes are in this category. Unauthorized really means unauthenticated: I don't know who you are. Forbidden means unauthorized, but the name was already taken; it means I know who you are, but you can't do this. 404, yeah, we know that one. 405 means the object is there but the method is not allowed, maybe because it's immutable or for some other reason. Conflict: well, the thing that you are trying to do in the request is in conflict with the state of the server. For example, AWS S3 would return that if you try to delete a bucket that still has some files in it; that's a conflict in their definition. Gone means that there was a resource but it is not there anymore, and it's a permanent situation. And 422 Unprocessable Entity is basically kind of a Bad Request but on a higher level of abstraction. So you would return Bad Request when the JSON is, for example, malformed, it lacks one parenthesis. But if you do a request to book a ticket and say that the arrival date is before the departure date, that's a good situation to return 422. It says that this is a business situation and we can't process it although we understood the request.

And server errors. 500 is the generic error: we don't know what happened, and we don't like that. 501 is kind of a way of saying that it's under construction: it will be there, try again maybe later, in two weeks or something. Bad Gateway and Gateway Timeout are somewhat similar, and they say that when we were acting as a gateway, something bad happened with someone else: either another service screwed up the response or didn't respond at all, sorry, there is nothing I could do. And it's okay if we are talking about internal APIs and microservices; then we just go to another team and ask, hey, what's going on? But on the other hand, if I'm an external client, I don't really care which service screwed something up; I'm interested in what happened, basically. So it's not very recommended to use these as status codes in an external
api i think and service on available is a way of saying that hey we are down but there is some maintenance everything will be okay instead of just 500 it's not working so that's the main let's say business part now let's go to this quickly to the supporting thing uh so a few words about security about versioning cash and performance throttling haters and then some less technical things but also some miscellaneous if you don't know where to put something you can always put it in miscellaneous category security well 2019 you should use tsl whenever possible and https if someone goes to our endpoint without https we shouldn't redirect them to https just return errors so that situation is clear cryptography is difficult so even if you have phd in cryptography and 10 years of experience is probably not the best idea to invent our own ciphers and cryptographic schemes especially security by obscurity is a bad idea instead of passwords and username is good to use api cases keys as they have they are more secure it's difficult to guess them and there is probably everyone or almost everyone knows the old wsap opponent application security projects but if you don't just check this out dot org it's a list of 10 most important threats in web application and it's always always good to see what is currently on top there okay versioning so our api changes and how to deal with that the most popular and usual solution is just to embed version in the url so we say photo is api version two of our one or without anything is version one on the other hand we can go a bit deeper and use parameter for that so that way for example if someone does not include this parameter so it's optional you can assume that we are returning the newest version of the api again if we want to go away from the business surface of the api we can use headers either accept header for that or we can use our own headers and api versioning and resource versioning are two different things so we can have new api all 
the api and return new version of resource and old version of resource you shouldn't be confused cache control cache is difficult but in general what we can do about this we have this values of cache control headers basically public and private is about whether the information is meant for single use just single users or is publicly available so if there is a logo of the web page it can be marked as public if there is some particular data for the user it probably should be private those should be public and this will be private um if we can allow resource to be stale we don't care about it's to be very fresh then we specify some kind of max save so how stale it should be no cache doesn't really mean that we don't have any benefits from caching it just means that the resource should be always fresh so whatever the client checks for the resource in proxy we need to check on the server and if the results didn't change well we are in luck we can just return the value from the proxy if not we need to refresh that on the other hand no store says that we can't use caching at all basically it's about sensitive data like medical records or anything related with money usually and we don't want to store even of on the device of the user or maybe especially on the device of the user so how to uh how to deal with all those values there are actually like 15 or so more basically we need to think if the data is private or public if it can be if it can be stale or not and if it is sensitive and then we can set some effective caching strategy so that we save both on bandwidth and performance and computing power of our systems and clients throttling so how to deal with many requests basically you can just return 500 server kaput but we can do better for example there are three headers not standard but commonly used that say that there's a limit in time window how many requests are left in current time window and when the time new time window starts and if the client is in no luck and 
there are no more requests left in the time window, then you can return 429 Too Many Requests and the information that, let's say, hey, there will be a new opening in seven seconds, so just try again then and don't try every second, because it won't work anyway. And again we save both on our side and on the client side.

HATEOAS. Who of you did play Diablo 2? Yeah, lots of guys. So it's the Lord of Hatred, Mephisto; it kind of went well with this slide. HATEOAS, so Hypermedia as the Engine of Application State, the most beautiful acronym in our industry. The idea is that with the response we return links to related resources and to what we can do with this resource, around this resource. That way a client doesn't have to remember all those links and follow the versioning; it can just take the link that we return and follow that. There's no really one standard for how to do this; there are several. Personally I used HAL, it worked pretty well for me. There is also Collection+JSON, which kind of originates in collections but works pretty well with links too. There is JSON-LD, which is kind of good if you don't want to break backward compatibility, so you like encapsulate the requests in another object. And there is Siren, which is, I think, quite powerful but not that popular.

Documentation. So, moving a bit away from the technical side: documentation should be easy to find and ideally public, so I don't have to mail the support and wait two weeks to get the documentation. There are three kinds of documentation. There is like this address book, which is an exhaustive reference, everything we can do with the API, but really boring. Then there is the engaging tutorial, which is kind of a business story: we start with something, we do some other requests, we complete something, and we can follow that. And there is an engaging... that one was engaging, so, a console quickstart. That's something that we put on the front page of our product so the developer can just take it and start
from that and move on efficiently.

Miscellaneous. So, as I said about those six principles of good product design and people with disabilities: if I'm a person with a disability and I can't do a POST request because of my proxy, then I can use a header for that, X-HTTP-Method-Override, and just specify the value of the method I really want to use while I'm using the GET method. It's kind of maybe obsolete, because most modern web browsers do this, but we can add a parameter to pretty-print JSON, so automatically insert indentation and new lines, so it's more readable in browsers that do not support this. For encoding we have a standard header, so let's not invent our own solutions.

UUIDs. When we are creating new resources, it's not the best idea to just go with consecutive numbers, one, two, three, and four, because then somebody can just ask for all the numbers and get all the data from our database, and aside from losing data we are losing quite a lot of performance here. So it's good to use some longer and random IDs, so it's difficult to guess them. And somewhat similar, request UUIDs, sometimes called correlation IDs: if you start, let's say, a processing flow on the boundary of our system, then some calls are fired to microservices, maybe there are quite a lot of them. We can add a parameter with this ID of the request, and then it's easy to find this in the logs and debug the situation.

REST clients and browsers. There are some API vendors who have the idea that if somebody accesses the API from a web browser, it means we want to return HTML instead of JSON. It's generally a bad idea; we have content negotiation for that. We have standards for time, so let's not invent our own standards for that.

Health endpoints. It's more about internal APIs, not external, but aside from the health endpoint that we need for, for example, our Kubernetes or some other automation to see if our service is okay, it's a good idea to add more information for the developer, for example the Git version from which the service was
built, or the values of some important properties, or some other stuff, so it can be a bit enriched.

And last but not least, external APIs. There are often projects in a company which has this B2C, business-to-client, model and many front ends, to add an external API and connect with big business. And it usually sounds quite good and easy: you just make some kind of gateway, a separate service, do some configuration and security, a few other things, and it will be nice. But in practice it turns out that those projects are actually just a bit about new technology and in quite large part about dealing with existing problems in our systems. Meaning, if you want to create an external API, you have to think about what should be exposed in it, because often in internal APIs we expose a lot of things that are not very secure to expose, so now we have to limit that. Often there is a situation where a business flow of our system is not really prepared for B2B. For example, you have flows to pay with credit cards, or Braintree, PayPal, or whatever, but you don't have an option to issue an invoice, and you have to add that to make it work. Or, on the other hand, it's common that you have to do small modifications in many parts of the system. For example, I remember the last time I was taking part in a similar project, I think it was the moment when I checked out the largest number of Git repositories in this company, because we had to add a small field, partner ID, in very many places. So when you hear that there is an external API under construction, beware: it might mean that there is a lot of dirty work to be done. But on the other hand, that's an opportunity to get to know the system better, because you work in many places and get this good overview of how the client would see it from the outside. And another aspect is that often there is no good consensus on the naming of parts of the system. For example, when we did this API for travel, it turned out that one microservice returns something by the name of segment, another service calls this itinerary, another service calls this leg, and now we suddenly have to somehow unify that, and it should make sense from the business point of view.

To sum up: a good API takes a lot of effort. There are a lot of things we need to think about, especially since some of them are not that visible from the business side. An API is per se a product for developers. Thinking as a developer might be kind of tricky for business people, but an API is to a developer more like what a graphical user interface is to a regular user, so we need to keep that in mind. If you are going with REST, because it's not the only way, for example you can choose GraphQL, yesterday there was a nice talk about GraphQL, literally one attempt, maybe, yeah. There are many, many alternatives, there are binary protocols. But if you are talking about REST: verbs are the center of REST, we need to take advantage of HTTP semantics, so methods, headers, statuses, all that good stuff. And look at those principles, but do not follow them blindly, because sometimes something doesn't really fit into the picture, I mean doesn't fit into the REST picture, but still makes sense, and we can just bend the rules if we know what we are doing.

Okay, if you fell asleep or just came late to this lecture, most of what I said and quite a lot more is in, I think, six or seven currently, articles on my blog. It's How to Train Your Java, just like the movie but with Java instead of dragon. So you're welcome to drop by.

Okay, questions? Yes?

I'm not sure actually, but I think this is one of the common functionalities that is actually implemented by API management platforms like Apigee, or however it is now called, because it was bought by Google, WSO2, or AWS gateway. I think in all of them I saw the point about throttling, so I suppose they do.

Any other questions? Yeah?

You didn't mention anything about parameters inside the body that we are using, that we are actually picking in
external API. I'm often seeing that people use, or try to use, request and response as a part of naming, and to me it's...

Well, I saw the problem that somebody needed to request a long list of... And the question is basically about parameters in the request body, whether they are good or not? The problem was that somebody had a situation where he wanted to request a very, very, very long list of resources, and the list was so long that it might not fit the length of the URL, and he was wondering whether then it is a good idea to put it in a body.

Actually, I mean, when I'm looking at Swagger, yes, I'm often seeing descriptions of POSTs where people describe elements, for example people, yes, and they call the request PeopleRequest and define the body, and the response PeopleResponse and define the body of the response. It's not looking like resource-oriented style, but very often I see this style of defining documentation, and I didn't find any good place describing how this should look and how we should describe this request and response.

I'm not very sure how to answer that. I've got to think about it. Any other questions? I don't want to stand between you and beer for too long, so maybe if there are other questions, just catch me later today or tomorrow. And that will be all. Thank you.
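
As a rough illustration of the throttling idea from the talk (three informational headers plus a 429 response that tells the client when to try again), here is a minimal sketch in Python. The talk does not name the headers, so `X-RateLimit-Limit`, `X-RateLimit-Remaining` and `X-RateLimit-Reset` are assumed here as one commonly seen, non-standard convention; `Retry-After` is a standard HTTP header. The class name `FixedWindowLimiter` and the fixed-window counting strategy are hypothetical choices made for this example, not something the speaker described.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class FixedWindowLimiter:
    """Fixed-window request counter for a single client (illustrative only)."""
    limit: int                # requests allowed per time window
    window_seconds: int       # length of the time window in seconds
    window_start: float = 0.0
    used: int = 0

    def check(self, now: float) -> Tuple[int, Dict[str, str]]:
        """Return (status_code, headers) for a request arriving at time `now`."""
        # Open a new window once the current one has expired.
        if now >= self.window_start + self.window_seconds:
            self.window_start = now
            self.used = 0

        reset_at = self.window_start + self.window_seconds
        headers = {
            # Assumed header names; a common convention, not a standard.
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Reset": str(math.ceil(reset_at)),
        }
        if self.used >= self.limit:
            # Out of quota: reject and say how long to wait.
            headers["X-RateLimit-Remaining"] = "0"
            headers["Retry-After"] = str(math.ceil(reset_at - now))
            return 429, headers

        self.used += 1
        headers["X-RateLimit-Remaining"] = str(self.limit - self.used)
        return 200, headers


limiter = FixedWindowLimiter(limit=2, window_seconds=10)
for t in (100.0, 101.0, 103.0):
    status, headers = limiter.check(t)
    print(status, headers)
```

With a limit of two requests per ten-second window, the third call in this usage example is rejected with status 429 and `Retry-After: 7`, so a well-behaved client can wait seven seconds instead of retrying every second.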