Case Study - The FAILURES Project

Jira is an issue tracking system. You can track issues in a multitude of ways and it will hum along, doing its job. Its issue model and search functionality are robust for that purpose, and many add-ons give Jira a wide range of other functionality. However, it’s important to remember that Jira should remain true to its core purpose.

Let’s explore a use case. If someone is using Jira to store data that is not really being tracked as issues, that data should not be stored in Jira. Deciding what belongs in Jira can get tricky, because many end users will construct a use case that appears to be an appropriate use of Jira but is not.

I will explain by example here:

I encountered a Jira instance that was used for software development. The engineers had created an automated testing system for their software. They would run a test harness to see whether any recent changes to their code broke the same series of tests, and a Jira issue was generated automatically whenever a test failed. Failures would arrive like an avalanche rather than being digested as a single event. The project they created in Jira was called FAILURES. This sounds like a good idea, but guess what happened? It became a monster.

The FAILURES project was being flooded with issues because the automated tests were run at random times and, at certain points, the code was not building correctly. Each issue was created with stack trace information stored on it. If you haven’t seen a stack trace, it is a large printout from a failure in code that describes the error seen in the system. It is like a traffic accident: it can contain a great deal of information. On some FAILURES issues, multiple comments were also written to the same issue for different types of failure events, and many issues had more than 2,000 comments. End users had also created JQL filters with complex searches to dig the information they required out of these issues. This one project held over 80% of the total issue count of the Jira instance and accounted for a huge part of the load on its systems.

It gets worse. Many of the developers who were running automated tests had their own testing code bases. Many of these code bases were no longer maintained but were still running, on a wide variety of computers. So there was no control over what was being run, who was maintaining the testing code, or who was mining the results. The Jira REST API was also being used to run these queries. Often a user account was created for the sole purpose of running this code. The developers would run the code and check it in, and the other developers could see the username and password. Code was frequently copied and reworked for other testing purposes. Some of these accounts had administrator access. Yup… admin credentials stored in a code repository and shared by the development team. The REST API calls would often fail and flood Jira as well.
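One way to keep credentials out of a shared repository is to read them from the environment at run time. The following is a minimal sketch, not the team's actual code: the variable name `JIRA_API_TOKEN`, both function names, and the bearer-style header are assumptions made for illustration, and the right authentication scheme depends on how a given Jira instance is configured.

```python
import os


def load_jira_token(env=os.environ, var_name="JIRA_API_TOKEN"):
    """Return the API token from the environment, or raise if it is missing.

    JIRA_API_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; refusing to run without a token")
    return token


def build_auth_headers(token):
    """Build request headers for a bearer-style token (assumed auth scheme)."""
    return {"Authorization": f"Bearer {token}", "Accept": "application/json"}
```

With this shape, the script that gets checked in contains no secret at all. Each person or build machine supplies its own token, which can be scoped to a non-admin account and revoked individually.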

What was this FAILURES project? It was a data mining attempt. Jira, in this case, was being used as a database. It was also being used as a mining tool, with JQL mining the issues created from automated testing. So, essentially, Jira was being used, abused, and turned into a data mining tool. People were not working these tickets directly. They were not transitioning them or commenting on them. The tickets were mined and then pushed into another project for processing.

What was the result of this?

This put a huge load on the Jira application
- Issue count. In this case over 1 million issues were created on their instance.
- Issue size. Large stack traces were stored in Jira text fields, and the admins had raised the upper size limit of those fields to a very high value.
- Comment count. Repeated comments, comments in the thousands on a single issue, … Both the number and the size of comments flooded the system.
- JQL Search. Large, complex searches were created using the ScriptRunner JQL search capabilities. Some of these searches could take down a Jira node if run multiple times.

Indexing failures
- Each Jira node has an index that allows searching to work correctly, and the nodes’ indexes are meant to stay in sync with each other. The very large indexes produced by very large issues and issue counts were constantly out of sync. End users would not always get correct results; it depended on which node they landed on when they ran a search.

Maintenance nightmare
- Troubleshooting the index took constant maintenance. Indexes on individual nodes became huge and would not synchronize correctly with other nodes. This affected the JQL search filters: many results were missing data, and reports, dashboards, and boards would randomly be missing some issues.

Maintenance and scaling problems
- Jira had to hold a very large number of issues, which meant creating more nodes and troubleshooting indexes.
- Maintenance to troubleshoot load-related failures.
- Maintenance of the code used to populate Jira.
- Maintenance of the security mechanisms used to run this code.

What happened?

Well, Jira was hijacked. Someone was using it for something beyond what Jira was designed for. Here, with the FAILURES example, they were using it as a data mining tool and storage area. The FAILURES data really belonged outside of Jira, in a database somewhere, and the searches should have been done with SQL rather than Jira’s JQL. Jira is not a data mining tool.

There are other scenarios in which Jira can be hijacked; this isn’t the only one. Jira has add-ons that do interesting things and provide a lot of functionality, and many end users want to use that functionality for things beyond Jira’s purpose. What is tricky is that, yes, Jira can do these things. You can create issues for the data, search those issues, and transition some while others get closed. The question is not whether Jira can do this but whether Jira should be used to do this. Here the answer is clearly no. If anyone is looking to store additional data in Jira, in particular a large amount of data, this should be a strong red flag that the data belongs outside of Jira. They could pump this data in and flood Jira, or lean on some other functionality Jira has, but in reality they shouldn’t. Jira is a ticketing system, and it is meant for people to interact with. End users should be interacting with the data being put into Jira. If they aren’t, or if they are automating this data in some way, be careful what data you allow to be pumped into Jira.

What should they have done?

They should have stored their code failures in a database and used SQL statements to mine that database for errors. Those errors could then be used to create issues in Jira.
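A minimal sketch of that approach, using Python's built-in `sqlite3` module as a stand-in for whatever database a team would actually choose. The table layout, the function names, the fingerprinting scheme, and the threshold of three occurrences are all assumptions for illustration. The idea is that repeated failures collapse into one row with a counter, and only the rows that cross a threshold would ever become Jira issues.

```python
import hashlib
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the failure store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS failures (
               fingerprint TEXT PRIMARY KEY,
               test_name   TEXT NOT NULL,
               stack_trace TEXT NOT NULL,
               occurrences INTEGER NOT NULL DEFAULT 1
           )"""
    )
    return conn


def record_failure(conn, test_name, stack_trace):
    """Store one failure; identical failures only bump a counter."""
    raw = f"{test_name}\n{stack_trace}".encode("utf-8")
    fingerprint = hashlib.sha256(raw).hexdigest()
    cur = conn.execute(
        "UPDATE failures SET occurrences = occurrences + 1 WHERE fingerprint = ?",
        (fingerprint,),
    )
    if cur.rowcount == 0:  # first time this exact failure has been seen
        conn.execute(
            "INSERT INTO failures (fingerprint, test_name, stack_trace) VALUES (?, ?, ?)",
            (fingerprint, test_name, stack_trace),
        )
    conn.commit()
    return fingerprint


def failures_worth_a_ticket(conn, min_occurrences=3):
    """Mine the store with SQL for failures that recur often enough to track."""
    rows = conn.execute(
        "SELECT test_name, occurrences FROM failures "
        "WHERE occurrences >= ? ORDER BY occurrences DESC, test_name",
        (min_occurrences,),
    )
    return rows.fetchall()
```

An avalanche of identical failures then costs one row instead of thousands of issues or comments, and the stack traces never touch the Jira index. Only the short list returned by `failures_worth_a_ticket` needs a human, so that list is what would be turned into issues.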

How do we prevent this type of thing from happening?

We need to identify when someone is storing data in Jira that really does not belong in Jira. Remember, Jira is a ticketing system. It is designed to have a flow of tickets (or issues) running through it. If someone turns on the hose and floods Jira with data that simply does not belong there, you should make every effort to stop this before it happens. Some teams will push back and insist that you do it their way. They will also sidestep your reasoning and frame their usage as a ticketing use case. The FAILURES project, for example, was said to be used in just such a way.

Such usage of Jira should be detected and identified early. Tell your team when you see data flooding into Jira. Note that you might not be able to get the end users to change their behaviour. This was the case with the FAILURES project. I was able to get them to reduce their load by archiving, removing code that was no longer owned, using a token app for access control on the REST API, … all good things, but they could still store failures to a project, and this would flood the project. Once a team has something like this in Jira, they can be reluctant to remove it.

If you see something like this and you are not sure, ask the solution architects to take a look before it is implemented on the live system. Talk with your team about it and about whether it can be stopped before it happens.
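One simple early-warning signal is a single project holding an outsized share of all issues on the instance. The sketch below assumes you already have per-project issue counts from some report or export; how you obtain them is outside its scope. The function name, the 50% threshold, and the numbers in the usage example are made up for illustration.

```python
def flag_flooded_projects(issue_counts, max_share=0.5):
    """Return (project_key, share) pairs for projects above max_share.

    issue_counts maps a project key to its issue count. The 0.5 default
    is an arbitrary illustrative threshold, not a recommended value.
    """
    total = sum(issue_counts.values())
    if total == 0:
        return []
    flagged = [
        (key, count / total)
        for key, count in issue_counts.items()
        if count / total > max_share
    ]
    # Largest share first, so the worst offender heads the list.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Usage with invented numbers:
counts = {"FAILURES": 850_000, "DEV": 100_000, "OPS": 50_000}
flagged = flag_flooded_projects(counts)
```

A check like this says nothing about why a project is large, so treat a flagged project as a prompt for a conversation with its owners, not as proof of misuse.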
