Hosting a code coverage report for a project targeting multiple Python versions with Travis CI and Github Pages

In the past few days, I was looking for a solution for:

  1. automatically running tests with code coverage for a project targeting multiple Python versions
  2. a public and easily accessible coverage report

To meet the requirement of point 1, we can run pytest and pytest-cov (which integrates coverage.py) on Travis CI.

And thanks to the powerful coverage.py, it is easy to generate a summarized code coverage report for multiple Python versions with the following steps:

# Run this command once in each virtual environment (one per Python version)
$ COVERAGE_FILE=[name_of_coverage_file] python -m pytest --cov=[module_name] ./tests

# Then, use this command to combine the data files.
# Note that the name of each data file should be in the format `.coverage.[suffix]`
# ref: https://coverage.readthedocs.io/en/coverage-5.0.3/cmd.html#combining-data-files
$ python -m coverage combine
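
Since the end goal is a publicly hosted report, the combined data file can then be rendered as static HTML with coverage.py as well; as a quick sketch (by default the output goes to an htmlcov/ directory):

# Render an HTML report from the combined data file; the result is static
# and can be published, e.g. on Github Pages
$ python -m coverage html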

Use third-party services or play with complex settings?

As for point 2, it took me a while to make a decision. Though there are several awesome existing services like Codecov and Coveralls, I wondered whether I could achieve it without those services.

To do that, I had to figure out a way to access the code coverage data files generated in each single CI job (environment). However, unlike Jenkins and CircleCI, Travis CI does not let you access or share artifacts generated by jobs simply by adding some operations to your .travis.yml. This is a general property of how CI jobs work, and it is also stated in the official documentation:

It is important to note that jobs do not share storage, as each job runs in a fresh VM or container. If your jobs need to share files (e.g., using build artifacts from the “Test” stage for deployment in the subsequent “Deploy” stage), you need to use an external storage mechanism such as S3 and a remote scp server.

Oh… is a third-party service still required?

After rethinking this problem, Github Pages seemed like a good candidate for a solution. However, it requires some interaction between Travis CI and Github. Therefore, I started googling with keywords like travis ci push back to github.

Then, yeah, a good article emerged from the sea: How to set up TravisCI for projects that push back to github

It solved a part of the problem. The remaining one is: “How can I access those code coverage data files after each CI job is done?”
It might seem that we are getting back to the starting point again. But we aren’t.

The complicated but not difficult part

Since we know that we can authorize Travis CI to push files back to our own Github repository, we can also play with branches. Here is the strategy:

  1. Define a test stage containing jobs that run the tests on different Python versions.
  2. After a job is done, push its code coverage data file back to a specific branch.
  3. Once all tests are done, we enter the next stage to combine the data and generate the report. In this stage, we check out the data files from each of those storage branches (see the sketch below).
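
As a rough sketch of what a single test job could do under this strategy (the branch name coverage_py36 and the file suffix are placeholders of my choosing; the helper functions are the ones described further below, and the real scripts may differ):

# Sketch of one test job, e.g. the Python 3.6 one
COVERAGE_FILE=.coverage.py36 python -m pytest --cov=[module_name] ./tests

pull_branch_for_coverage coverage_py36    # fetch the storage branch (see below)
mv .coverage.py36 /tmp/                   # keep the fresh data file out of the way
git checkout -q coverage_py36             # switch to the storage branch
mv /tmp/.coverage.py36 .                  # restore the data file on that branch
git add .coverage.py36
commit_artifacts                          # commit it (possibly empty, see below)
git push -q origin coverage_py36          # push the data back to Github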

And I implemented it in this repository. All operations related to the strategy mentioned above are written in the file .travis/utils.sh, and the workflows for running tests and generating the code coverage report are written in .travis/runtests.sh and .travis/gen_report.sh respectively.
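
And here is a similarly hedged sketch of the report stage (the loop, branch names, and suffixes below are placeholders, not necessarily what .travis/gen_report.sh actually does):

# Collect the latest data file from each storage branch into the working tree
for ver in py35 py36 py37; do
    pull_branch_for_coverage "coverage_${ver}"
    git checkout -q "coverage_${ver}" -- ".coverage.${ver}"
done

# Combine them and render the HTML report that gets pushed to Github Pages
python -m coverage combine
python -m coverage html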

There are a few things worth noting:

  • create_branch_for_coverage
    We create orphan branches that track coverage files only. The --orphan argument makes these branches independent of the history of the main branch (master) of our project. It’s a cool technique for this purpose, and there are more cool use cases here. (A sketch of this helper is shown after this list.)

  • pull_branch_for_coverage
    By default, Travis CI pulls only one branch into the Docker container of a job (if the build is triggered by the master branch, then only that branch is pulled).
    Therefore, to commit code coverage data to the corresponding branch, we have to fetch that branch locally first. That’s why this function is called in both .travis/runtests.sh and .travis/gen_report.sh. (A sketch is also shown after this list.)

  • commit_artifacts
    The --allow-empty argument to git commit is necessary for this workflow, because there might be no change in code coverage, e.g. for commits that only fix typos or documentation.

    commit_artifacts() {
        # ...
        git commit -q --allow-empty -m "Travis build: $TRAVIS_BUILD_NUMBER"
        # ...
    }
  • Use the -q argument to silence output messages
    This is done for security reasons, as pointed out in this comment on the post by willprice.
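
For reference, here is a minimal sketch of what the first two helpers might look like, assuming the branch name is passed as an argument (the actual implementation in .travis/utils.sh may differ):

create_branch_for_coverage() {
    # Create a history-independent (orphan) branch that tracks coverage files only
    git checkout -q --orphan "$1"
    git rm -r -f -q .             # start from an empty tree
    git commit -q --allow-empty -m "Init coverage branch: $1"
    git push -q origin "$1"
}

pull_branch_for_coverage() {
    # Travis clones only the triggering branch, so fetch the storage branch explicitly
    git fetch -q origin "$1":"$1"
}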

After setting all these things up and pushing them to Github, Travis should be ready and working normally. You can then check the coverage report hosted on the Github Pages site of your repository (which should be at https://[your_github_user_name].github.io/[repo_name]).

Postscript

Here is another concern I weighed before making the decision mentioned above:

Instead of creating multiple branches for coverage files, would it be better to push all those files to a single branch?

In my opinion,

  1. Keeping one branch for each single Python version is not a bad thing; it lets us focus on what should be done in a single CI job.
    Besides, CI jobs are executed in parallel. With this strategy, we can treat each branch as a temporary workspace for the corresponding job, which reduces interference among jobs.

  2. We can also merge those branches into a single branch after the whole pipeline is done. Deleting those branches afterwards is an optional post-action as well.

  3. It is easier to check out files from multiple branches, and the result does not depend on the order in which the jobs of a single CI pipeline finish.

                           timeline
    branch \ commit    [job_1]    [job_2]    [job_3]
    branch_py35        j1_py35 <- j2_py35 <- j3_py35
    branch_py36        j1_py36 <- j2_py36 <- j3_py36
    branch_py37        j1_py37 <- j2_py37 <- j3_py37

    ----------
    # Checkout files from each branch
    combination        j1_all  <- j2_all  <- j3_all

    If we push every coverage file of the different versions to a single branch, the commit history becomes the following graph:

                   [           job_1           ]    [           job_2           ]
    branch         j1_py35 <- j1_py36 <- j1_py37 <- j2_py36 <- j2_py35 <- j2_py37

    ----------
    # Note that the order of commits is not guaranteed to be sorted
    # Use `git rebase` to combine multiple commits
    combination    j1_all <- j2_all

    We can get the same result with a different approach. However, once a failure occurs in a job, we need a few more steps to handle it.

                   [job_1]    [job_2]
    branch_py35    j1_py35 <- j2_py35
    branch_py36    (NA*)   <- j2_py36
    branch_py37    j1_py37 <- j2_py37

    * The job failed, so there is no commit.

    ----------
    combination    NA <- j2_all

    Since there is a failure on branch_py36 in job_1, the CI pipeline is terminated. Hence there is no combined coverage file on branch combination.
    After pushing a hotfix to start the pipeline again, we just need to check out the latest commit from each branch. This means that the next run of the CI pipeline won’t be affected by the previous failure.

                   [          job_1          ]    [           job_2           ]
    branch         j1_py35 <- (NA*) <- j1_py37 <- j2_py36 <- j2_py35 <- j2_py37

    ----------
    combination    NA <- j2_all

    Due to the parallel execution of CI jobs, j1_py37 is still pushed to the branch. In this case, there are only 2 commits from job_1.
    Though it doesn’t affect the execution of job_2, the missing commit makes it harder to trace back the history on that branch.