Ever since I have started working, applications have had to run an increasing number of threads. This is mostly due to the adoption of web applications in favor of desktop applications. Due to this change; One the more difficult work that has come up is looking at the logs and figuring out what happened.
For a significantly busy server, looking at the logs is not as straightforward as I would expect.
- In the logs, each http request is represented as its own thread, so you can identify which logs belongs to which http request since the thread name is included in the log. However, it would still be hard to hard to follow because multiple threads be adding new lines to the log in a mixed fashion because these threads are running simultaneously.
- Much more difficult is when each http request would also spawn new background threads where each has their own name which you cannot reliably associate with the parent http request. For example a spawn background thread called acme_Quartz_Worker-9 . With the logs being appended by many threads on-demand.. I think it is impossible to identify which worker thread belong to which http request thread.
There are basic I have learned over the years that are must-have things to keep the logs as helpful as possible:
- In Java, always make sure to always include the exception when you call logger.error(). Failing to include it may hide valuable clue since most exceptions have a root cause exception that would usually be the most important clue.
- If available, always add contextual information in the log message. Usually this would be the id of the object being processed (like the UUID of a particular entity). Human readable information is also welcome.
- Have a well designed exception system where you ensure which layers do log an error and which layer does the logging of errors. For instance, logging an error in a class called StringUtils is a very bad idea since different callers of this utility class may want to handle errors in different ways. Also, it is hard to determine which process had an error with the StringUtils. In addition, you usually do not (and you should not) pass contextual information to a StringUtils method - which is recommended to include in #2.
Further thoughts: I found this great blog post reflecting on how our logs are not human readable yet it is also not computer-readable (since the software stack like ELK is popular).
No comments:
Post a Comment