Software Analytics

Led by Diomidis Spinellis

How can we harness the massive amounts of data that modern development and deployment processes generate, together with Big Code, to increase development productivity and operational efficiency?


Modern software projects are more than just the code that comprises them: teams follow specific development processes; the code runs on servers or mobile phones and produces runtime logs; users talk about the software in forums like Stack Overflow and GitHub and rate the product in app stores. The software is part of a collection of similar applications and depends on external code or service APIs to deliver its functionality. Modern software teams need data to make informed decisions that enable continuous, feedback-driven improvement.

At the Software Analytics lab, we work to make software analytics a core asset for software development teams. Our research touches topics such as computer-supported collaborative work (CSCW), big data systems, software engineering processes, software reliability, software analysis, machine learning, and data science.

Currently, we focus on the following two research lines, although we are always open to new ideas:

  • Engineering for (software) analytics: creating platforms for data ingestion, integration and querying in a streaming fashion. Related projects:

    • AI4Fintech: Making large software-based organizations more efficient.
    • Codefeedr: A platform to ingest and process software analytics data in a streaming fashion.
    • GHTorrent: Collects all data from the GitHub event API.
  • Software ecosystems: We build ecosystem-wide, versioned call graphs out of package networks to enable studies such as precise security vulnerability tracking, software license tracking, and data-based API evolution.

    • FASTEN: A platform for analyzing dependency management services at call-graph granularity.


Responsible Researchers

Diomidis Spinellis
Full Professor (0.2 FTE)



Fine-grained analysis of software ecosystems (FASTEN)
Mining software engineering data from GitHub
Next generation software analytics