Class Notes: Power of Text-mining in BPM


In this note we investigate text-mining’s potential to foster BPM capabilities. We provide a brief introduction to text-mining, examples of text-mining by probabilistic topic modeling, and a tool that makes text-mining accessible for BPM practice. Then we present a framework with which to plan for text-mining projects in BPM and give examples for each of the potential application areas.

We transfer the body of knowledge in text-mining to the field of BPM and show that text-mining offers significant potential for building BPM capabilities in both exploitation and exploration. Our examples cover capability areas that include strategic alignment, governance, methods, IT, people, and culture. We identify the considerable potential of applying text-mining in BPM and conclude with a call for more discussion and contributions to this promising new lens through which to build BPM capabilities.


During the past two years, humankind has captured more data than in all of its previous history. It is estimated that up to 80 percent of this data is stored in unstructured form and expressed in rich and ambiguous natural language (IDC, 2011, 2014). Such data offers an ocean of opportunities for business analytics, and big data analytics is a topic of considerable interest today (e.g., Müller et al., 2016).

For example, Michel et al. (2011), a study that has received a good deal of public and scholarly attention, investigated cultural trends by computing the yearly frequency of words that appear in the five million digitized books in Google Books. This simple statistical analysis found, for instance, that the diffusion of innovations, measured by word frequencies corresponding to certain technologies (e.g., radio, telephone) over time, is accelerating at an increasing rate. While it took an average of sixty-six years from invention to widespread adoption of a technology at the beginning of the nineteenth century, the average time to adoption dropped to twenty-seven years at the beginning of the twentieth.

In this note, we investigate how Business Process Management (BPM) can benefit from text-mining methods and tools. vom Brocke and Schmiedel (2015) observed that BPM must be more open to digital technologies, and Rosemann (2015) emphasized that combining exploitation and exploration is important. This note therefore presents some practical ideas on how BPM can leverage text-mining to do both. We first provide a brief introduction to text-mining methods and tools, building on Debortoli et al. (2016). We describe Probabilistic Topic Modeling, which can be useful for BPM, and present a tool, MineMyText, that provides easy-to-use cloud-based text-mining services, including Probabilistic Topic Modeling, to show the accessibility and practicability of text-mining for BPM experts. Then we discuss opportunities to apply text-mining in BPM, presenting a systematic framework with which to identify application areas and examples of how to increase BPM capabilities through text-mining for both exploitation and exploration. We conclude with a brief summary and a call for more text-mining applications in BPM.

Text-mining Tools and Data


Text-mining’s potential is characterized by both the data that is available and the methods and tools that are ready to use to analyze the data. Let’s talk about data first and then describe the methods and tools.

As we write, Amazon alone offers more than 140 million customer reviews of more than 9 million products, written by millions of Amazon users and spanning almost twenty years (McAuley, Pandey, & Leskovec, 2015; McAuley, Targett, Shi, & van den Hengel, 2015). At the same time, the more than 300 million active Twitter users generate an average of 500 million tweets per day (Twitter, 2015). Such digital trace data is “naturally occurring data.” Naturally occurring data, such as the data found on social networks (Richter et al., 2009), is of value for three reasons: it is available in large amounts; it documents what is “real,” in that the data is not provoked by analysts but generated by people as they use and interact with technology; and it provides rich descriptions in natural language that allow for many kinds of analysis.

The analysis of large amounts of text data would not be possible (or efficient) without tools that automate text analysis. Text-mining techniques allow the researcher to automatically extract implicit, previously unknown, and potentially useful knowledge from large amounts of unstructured textual data in a scalable and repeatable way (Fan, Wallace, Rich, & Zhang, 2006). Powerful IT infrastructures, such as in-memory technology (vom Brocke et al., 2014), provide information processing capabilities that further support efficient analysis of large volumes of data. Of course, the automated computational analysis of text only scratches the surface of a natural language’s semantics, but when applied to sufficiently large data sets it produces surprisingly accurate results (Halevy, Norvig, & Pereira, 2009).

Against this background, text-mining offers a complementary data collection and analysis method for BPM. In particular, automated text-mining allows BPM researchers and practitioners to generate new insights from data that can help to improve and innovate processes, products, services, and business models (vom Brocke, 2007; vom Brocke & Lindner, 2004).

Debortoli et al. (2015) provide an overview of techniques with which to analyze large text corpora. In the following, we describe unsupervised machine-learning methods for text categorization that are able to find hidden patterns in texts. These methods have some advantages over manual coding of data, as they require little human intervention, generate reproducible results, and can cope with volumes of text that would be otherwise impossible to analyze.

Example: Probabilistic Topic Modeling

To illustrate how text-mining works, we describe probabilistic topic modeling using Latent Dirichlet Allocation (LDA) (Blei et al., 2003; Blei, 2012), a method for extracting semantically meaningful topics from texts and categorizing text according to these topics. LDA is widely used in both research and practice, and free and open-source LDA software libraries are available for most statistical programming languages (including R, Python, and Java).

The core idea behind LDA, first proposed by Blei et al. (2003), is that authors compose documents by first choosing a mix of topics to write about and then using words that are typical for each topic (Figure 1). Each document exhibits topics in its own proportions, possibly ranging from 0 percent (if a document does not talk about a topic at all) to 100 percent (if a document talks about one topic exclusively).
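
The generative story described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a full LDA implementation: the three toy topics, their word probabilities, and the Dirichlet parameter `alpha` are made-up values, and the function names are our own.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw topic proportions from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Compose a document: first choose a topic mix, then draw each word from a topic."""
    topic_mix = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Pick the topic for this word position according to the document's mix.
        k = rng.choices(range(len(topics)), weights=topic_mix)[0]
        # Pick a word that is typical for the chosen topic.
        vocabulary, probabilities = zip(*topics[k].items())
        words.append(rng.choices(vocabulary, weights=probabilities)[0])
    return topic_mix, words

# Hypothetical toy topics (word -> probability within the topic).
TOPICS = [
    {"weight": 0.5, "loss": 0.3, "pounds": 0.2},
    {"gift": 0.5, "love": 0.3, "christmas": 0.2},
    {"app": 0.6, "iphone": 0.25, "sync": 0.15},
]

rng = random.Random(42)
topic_mix, words = generate_document(TOPICS, alpha=[0.5, 0.5, 0.5], length=12, rng=rng)
print([round(share, 2) for share in topic_mix])
print(" ".join(words))
```

Fitting LDA runs this story in reverse: given only the words of many documents, the algorithm estimates the topic mixes and the word distributions that most plausibly produced them.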

Fig. 1: Schematic Overview of LDA (Debortoli et al. 2016)

Figure 2 uses an exemplary online customer review of a “Fitbit Flex” device and its topic distribution, as well as six topics and their word distributions, to illustrate the basic idea behind LDA. We use this case, taken from Debortoli et al. (2016), as a running example in this note.

The review covers three topics to varying degrees: Topic 3 (55%), Topic 2 (35%), and Topic 6 (10%). Each topic is represented by a distribution of words. For example, Topic 3 contains words like “weight” (8%), “loss” (5%), and “pounds” (4%). Topic 2, on the other hand, contains words like “gift” (10%), “love” (7%), and “Christmas” (7%). Finally, the most probable words for Topic 6 are “app” (12%), “iPhone” (8%), and “sync” (3%). Taken together, one can infer from these distributions that the review talks about losing weight as the result of using a Fitbit device that the person has received as a gift; in addition, the reviewer reports that the synchronization of the device with the mobile phone works well.
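
To make the arithmetic behind these distributions concrete, the following sketch combines the review's topic proportions with the word probabilities quoted above. Under LDA, the probability of a word in a document is the sum, over topics, of the topic's share in the document times the word's probability in that topic. As a simplifying assumption, words that are not quoted for a topic are treated as having probability zero in that topic; the function name is our own.

```python
# Topic proportions of the example review (values from the text).
review_topic_mix = {"Topic 3": 0.55, "Topic 2": 0.35, "Topic 6": 0.10}

# Most probable words per topic (values from the text; all other words omitted).
topic_words = {
    "Topic 3": {"weight": 0.08, "loss": 0.05, "pounds": 0.04},
    "Topic 2": {"gift": 0.10, "love": 0.07, "christmas": 0.07},
    "Topic 6": {"app": 0.12, "iphone": 0.08, "sync": 0.03},
}

def word_probability(word, topic_mix, topic_words):
    """P(word | document) = sum over topics of P(topic | document) * P(word | topic)."""
    return sum(
        share * topic_words[topic].get(word, 0.0)  # unlisted words count as 0 (assumption)
        for topic, share in topic_mix.items()
    )

for word in ("weight", "gift", "app"):
    print(word, round(word_probability(word, review_topic_mix, topic_words), 3))
```

With these numbers, “weight” is the most likely of the three words in this review (0.55 × 0.08 = 0.044), ahead of “gift” (0.035) and “app” (0.012).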

Fig. 2: Illustration of LDA Analysis of Customer Reviews of Fitbit Devices (Debortoli et al. 2016)

Tool Support

To apply text-mining in practice, methods like LDA must be implemented in software, and a number of libraries must be integrated in order to, for example, clean and analyze the data. In an effort to make text-mining accessible to people in research and practice, a group of researchers developed the MineMyText cloud service (MineMyText, 2016). This section demonstrates the accessibility and practicability of text-mining for BPM professionals by illustrating how to work with the platform.

The MineMyText tool provides easy-to-use text-mining services in the cloud, following a three-step approach: (1) upload data, (2) set parameters, (3) get results. We discuss each step using customer reviews of the Fitbit Flex Wireless Activity & Sleep Wristband, one of the early wearable technologies used to track and analyze personal health and fitness data.

Step 1 – Upload data

We collected all 12,910 reviews of the Fitbit product between April 16 and May 12, 2013, using a crawler we developed (Debortoli et al., 2015). The reviews contained 457,239 words and amounted to 25.34 MB. We automatically formatted the data as a list of valid JavaScript Object Notation (JSON) objects, which is one of the formats the tool supports as input data.
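
As a rough illustration of this formatting step, the sketch below serializes a handful of review records as a list of JSON objects. The field names (`date`, `rating`, `text`) and the two sample reviews are hypothetical; the schema the tool actually expects is not specified in this note.

```python
import json

# Hypothetical review records; field names and contents are assumptions for illustration.
reviews = [
    {"date": "2013-04-16", "rating": 5, "text": "Lost ten pounds since I started wearing it."},
    {"date": "2013-05-12", "rating": 2, "text": "Battery needs charging far too often."},
]

def to_json_list(records):
    """Serialize the records as one JSON array of objects, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)

payload = to_json_list(reviews)

# Simple corpus statistics of the kind reported above (word count and size).
word_count = sum(len(record["text"].split()) for record in reviews)
size_in_bytes = len(payload.encode("utf-8"))
print(word_count, "words,", size_in_bytes, "bytes")
```

Parsing the payload again with `json.loads` is a quick way to confirm that the file is valid JSON before uploading it.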

Step 2 – Set parameters

The only parameter to set was the number of topics we wanted the service to identify. We experimented with different numbers of topics and finally chose fifty, which we felt provided a good balance between achieving a fine-grained overview of topics and keeping the number of topics manageable. Since executing the algorithm for identifying topics takes only a few minutes, depending on the data volume and the number of topics chosen, we were able to experiment with different numbers until we felt that we had a meaningful set of topics.
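
The experimental loop over different numbers of topics can be sketched with a minimal LDA fitted by collapsed Gibbs sampling. This is a teaching sketch under our own assumptions, not the implementation used by the cloud service: the four toy documents, the hyperparameters `alpha` and `beta`, and the number of iterations are illustrative choices, and a production library would be far faster.

```python
import random

def fit_lda(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]   # tokens per (document, topic)
    topic_word = [{} for _ in range(num_topics)]   # tokens per (topic, word)
    topic_total = [0] * num_topics                 # tokens per topic
    assignments = []

    def update(d, w, k, delta):
        doc_topic[d][k] += delta
        topic_word[k][w] = topic_word[k].get(w, 0) + delta
        topic_total[k] += delta

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        assignments.append([rng.randrange(num_topics) for _ in doc])
        for w, k in zip(doc, assignments[d]):
            update(d, w, k, +1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assignments[d][i], -1)  # remove the token's current assignment
                # Conditional probability of each topic for this token (up to a constant).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k].get(w, 0) + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k_new
                update(d, w, k_new, +1)

    # Smoothed topic proportions per document.
    shares = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return shares, topic_word

def top_words(topic_word, n=3):
    """Most frequent words per topic, ignoring words no longer assigned to the topic."""
    return [
        sorted((w for w in counts if counts[w] > 0), key=lambda w: (-counts[w], w))[:n]
        for counts in topic_word
    ]

# Hypothetical toy corpus; real input would be the tokenized reviews.
docs = [
    "lost weight pounds week".split(),
    "weight loss pounds month".split(),
    "gift christmas love wife".split(),
    "christmas gift love husband".split(),
]

# Try several numbers of topics and inspect the result by eye, as described above.
for num_topics in (2, 3):
    shares, topic_word = fit_lda(docs, num_topics)
    print(num_topics, top_words(topic_word))
```

Judging which number of topics is “meaningful” remains a human decision: the loop only produces candidate topic sets for inspection.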

Step 3 – Get results

The text-mining tool automatically identified the fifty most significant topics about which people were writing in the 12,910 reviews. The tool provides an overview of the topics found (figure 3), and further analysis can be done for each topic.

Fig. 3: Overview of Topics Identified in the Reviews (1-20 out of 50)

Each topic is described by a sequence of words that characterizes it. Clicking on a topic retrieves a more detailed overview of the distribution of words (figure 4).

Fig. 4: Detailing Word Distribution and Topic Timeline (Topic 3)

The word cloud in figure 4 shows that topic 3 deals with “weight loss,” so customers who reviewed the Fitbit product appear to be concerned with “weight” and “pounds” per “week,” “month,” and “year.” These findings indicate that customers report that they have lost weight through using the Fitbit product. Other topics indicate that the Fitbit product can be a gift (topic 9), that the battery life and the charging process concern users (topic 12), and that users recognize the value of the sleep-tracking function (topic 14). Such observations reveal useful insights for the product’s management on how to promote and develop additional products and services.

The topic timeline in figure 4 shows how often the topic is referred to across all reviews, that is, the probability that a review will relate to a particular topic. The figure shows that, after the product was introduced, the frequency of the “weight loss” topic (topic 3) rose and remains at 3 percent of all reviews. Other topics lose importance over time (e.g., topic 12, possibly because battery life has been improved), gain importance over time (e.g., topic 14), or peak with seasonal influences (e.g., topic 9, which peaks around Christmas). Monitoring and mining customer reviews in real time allows managers to sense customer concerns the moment they come up and to take corrective action with minimal latency (vom Brocke et al., 2013).
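
One plausible way to compute such a timeline is to average a topic's share over all reviews written in the same month. The sketch below assumes exactly that; how the tool aggregates its timeline is not documented in this note, and the months and topic shares are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-review output of a topic model: (month, topic shares).
reviews = [
    ("2013-11", {"weight loss": 0.10, "gift": 0.20}),
    ("2013-11", {"weight loss": 0.00, "gift": 0.40}),
    ("2013-12", {"weight loss": 0.05, "gift": 0.70}),
    ("2013-12", {"weight loss": 0.05, "gift": 0.50}),
]

def topic_timeline(reviews, topic):
    """Average share of `topic` across all reviews in each month, in chronological order."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, shares in reviews:
        totals[month] += shares.get(topic, 0.0)
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}

print(topic_timeline(reviews, "gift"))
print(topic_timeline(reviews, "weight loss"))
```

In this toy data the “gift” topic doubles from November to December, which is the kind of seasonal peak described above.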

Topic modeling reveals which issues concern customers, but it does not necessarily show whether these concerns are positively or negatively connoted. An additional sentiment analysis on the data provided by the tool can evaluate the degree of reviews’ positivity or negativity on a scale from +5 = very positive to -5 = very negative. The results for topic 3 are presented in figure 5.

Fig. 5: Detailing Sentiments Associated with Reviews in Topic 3

The results shown in figure 5 demonstrate that customers have posted both positive and negative reviews on the topic of “weight loss” in relation to the Fitbit product. The results suggest that some customers have been happy with the results of using Fitbit for weight loss, while others have not. The analysis also shows that most of the customer reviews relating to topic 3 were positive (value of +3 in the possible range of -5 to +5).
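
A score on a scale from -5 to +5 can be produced in several ways. The sketch below uses a simple word-list (lexicon) approach and averages the scores of the sentiment-bearing words in a review. This is a swapped-in illustration, not the tool's method, which is not described here; the mini-lexicon and its scores are hypothetical.

```python
import re

# Hypothetical mini-lexicon: word -> score between -5 (very negative) and +5 (very positive).
LEXICON = {
    "love": 3, "great": 3, "happy": 3, "good": 2,
    "disappointed": -2, "bad": -3, "useless": -3, "terrible": -4,
}

def sentiment_score(text, lexicon=LEXICON):
    """Mean score of the lexicon words found in the text; 0.0 if none are found."""
    tokens = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment_score("I love it, great for weight loss"))
print(sentiment_score("Terrible battery, useless."))
print(sentiment_score("It arrived on Monday"))
```

Because the result is a mean of word scores, it always stays within the -5 to +5 range of the lexicon. A word list like this ignores negation and irony, which is one reason why reading individual reviews remains important.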

In order to make additional sense out of the data, such as by finding root causes for negative sentiments, the tool can drill down to a single review. Our experience shows that looking at specific reviews that are associated with a topic and a sentiment builds trust in the analysis, since one can easily confirm the tool’s results by looking at single reviews. Root causes for specific sentiments related to a topic can also be identified. The results from drilling down to specific customer reviews associated with a positive sentiment regarding the topic of weight loss are presented in figure 6.
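
Conceptually, such a drill-down is a filter over reviews that combines a topic condition with a sentiment condition. The following sketch assumes that each review already carries topic shares and a sentiment score; the record layout, the threshold `min_share`, and the sample reviews are hypothetical.

```python
# Hypothetical reviews annotated with topic shares and a sentiment score (-5 to +5).
reviews = [
    {"id": 1, "text": "Lost eight pounds in two months.",
     "topic_shares": {"weight loss": 0.6}, "sentiment": 3.0},
    {"id": 2, "text": "No weight change at all.",
     "topic_shares": {"weight loss": 0.5}, "sentiment": -2.0},
    {"id": 3, "text": "A lovely Christmas gift.",
     "topic_shares": {"gift": 0.7, "weight loss": 0.05}, "sentiment": 3.0},
    {"id": 4, "text": "Down ten pounds and feeling great.",
     "topic_shares": {"weight loss": 0.8}, "sentiment": 4.0},
]

def drill_down(reviews, topic, positive=True, min_share=0.3):
    """Return reviews that are mainly about `topic` and have the requested sentiment polarity."""
    def matches(review):
        on_topic = review["topic_shares"].get(topic, 0.0) >= min_share
        right_tone = review["sentiment"] > 0 if positive else review["sentiment"] < 0
        return on_topic and right_tone

    selected = [review for review in reviews if matches(review)]
    # Show the reviews that are most strongly about the topic first.
    return sorted(selected, key=lambda r: r["topic_shares"].get(topic, 0.0), reverse=True)

for review in drill_down(reviews, "weight loss", positive=True):
    print(review["id"], review["text"])
```

Switching to `positive=False` lists the negative reviews on the same topic, which is where a search for root causes would typically start.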

Fig. 6: Drill-down to Individual Reviews on Topic 3

Clearly, a great deal can be extracted from the data, since naturally occurring data is available in numerous forms and can be analyzed effectively and efficiently with modern tools. The next section addresses how this potential might be used in BPM and in fostering BPM capabilities in particular.

Leveraging Text-mining in Business Process Management

BPM has been characterized as a comprehensive management discipline that builds organizational capabilities to improve and innovate organizational processes. More specifically, six capability areas have been identified (Rosemann & de Bruin, 2005), which are considered core elements of contemporary BPM (Rosemann & vom Brocke, 2015): strategic alignment, governance, methods, IT, people, and culture.

While contributions to BPM have often focused on aspects of process improvement, such as methods like Six Sigma (Conger, 2015), BPM must now contribute more to building organizations’ innovation capabilities (vom Brocke & Schmiedel, 2015) and, eventually, support both the exploitation and exploration of business opportunities. Rosemann (2015) refers to balancing these two objectives as ambidextrous BPM.

We developed a grid in which we contrast the two dimensions of BPM elements and BPM objectives in order to explore the potential of text-mining for BPM. For each of the six core elements of BPM, we provide examples of how to use text-mining for exploitation and exploration. The grid is presented in figure 7.

Fig. 7: Framework with Which to Identify Scenarios Where Text-mining Can Build BPM Capabilities

The twelve examples in figure 7 show that topic modeling (as conducted based on LDA) can deliver valuable contributions to both exploitive and explorative BPM. Both internal and external data can be analyzed to reveal topics that matter to customers or employees. By choosing the data wisely and interpreting the results, managers can revisit strategy and governance structures, develop and innovate information technology and methods, and measure skill sets and cultural values through a new lens.

The examples in figure 7 also show that results do not necessarily come immediately and that investigations usually require an experimental approach. Results largely depend on the availability, quantity, and quality of data, so researchers must try specific analyses, see how far the results might get them, and then refine and extend the results toward further analysis. Human interpretation is an important part of this process, as domain knowledge and creative thinking are required.

Studies have demonstrated that insights can be drawn from such an iterative text-mining process. For instance, topic modeling has been successfully applied to job postings available on global job sites to develop detailed skill sets for BPM professionals (Müller et al., 2014) and data scientists (Debortoli et al., 2014). Similarly, organizations may use topic modeling to analyze external media sources to identify trends and emerging topics related to their business. Such information might fuel skill mining regarding such emerging topics, and the results can inform skill development in any organization.

Another example of the use of text-mining is compliance management. Processes can be automatically evaluated regarding the likelihood that they relate to specific topics that are relevant to compliance. For this purpose, mining texts that describe specific compliance regulations may reveal topics that can be compared to the topics found in the organization’s process data. Natural language processing algorithms can calculate the probability that a process description relates to a topic that is relevant to compliance. Considering process flow, documents created in a process can also be evaluated automatically for their relevance to compliance regulations, and additional operations may be triggered to handle the document according to certain regulations.
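
As a simple stand-in for such a probability, the sketch below scores each process description by its cosine similarity to a compliance-related topic and ranks the processes accordingly. Cosine similarity is our simplification, not a calibrated probability, and the topic's word weights and the two process descriptions are hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercased word frequencies of a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two word-weight vectors (0 = unrelated, 1 = identical direction)."""
    dot = sum(weight * b.get(word, 0) for word, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical topic mined from data-protection regulations (word -> weight).
compliance_topic = {"personal": 0.3, "data": 0.3, "consent": 0.2, "retention": 0.2}

# Hypothetical process descriptions.
processes = {
    "customer onboarding": "Collect personal data from the customer and record consent",
    "invoice approval": "Approve the invoice and release the payment",
}

scores = {
    name: cosine_similarity(compliance_topic, word_counts(description))
    for name, description in processes.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(name, round(scores[name], 2))
```

Processes near the top of the ranking are candidates for a closer compliance review; the same comparison could be run against topics that reflect an organization's strategy.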

Similarly, in order to evaluate an organization’s level of strategic alignment, text-mining might be applied to process data, such as process descriptions, process-related documents, and communication logs. Topic modeling can reveal the topics to be found in such data, and process analysts can compare these to strategically relevant topics, such as those that reflect an organization’s overall strategy. For instance, if agility is part of the organization’s strategic value system, text-mining can help to reveal to what extent process knowledge, as found in process descriptions or process-related communications, relates to the topic of agility.

Regarding information technology, text-mining can analyze service tickets to reveal typical issues users raise. Service tickets can be automatically classified by type of incident, and solutions that have been applied in the past can be presented, improving IT service management’s efficiency and reducing reaction time to incidents. Similarly, user feedback on recurrent concerns can be analyzed to develop information systems that improve support processes. Analyzing such data might even reveal user needs that suggest new information systems that would support BPM.
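
Automatic classification of tickets by incident type can be illustrated with a small multinomial Naive Bayes classifier. Naive Bayes is a supervised method that we use here purely as an example of the idea; it is not the topic-modeling approach discussed above, and the ticket texts and incident types are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_tickets):
    """Count words per incident type and tickets per incident type."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_tickets:
        word_counts[label].update(tokenize(text))
        label_counts[label] += 1
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the incident type with the highest log-probability (Laplace smoothing)."""
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_tickets = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        counts = word_counts[label]
        denominator = sum(counts.values()) + len(vocabulary)
        score = math.log(label_counts[label] / total_tickets)  # prior of the incident type
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / denominator)  # smoothed word likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical historical tickets with known incident types.
HISTORY = [
    ("Cannot log in, password expired", "access"),
    ("Account locked after wrong password", "access"),
    ("Printer shows paper jam", "hardware"),
    ("Laptop screen stays black", "hardware"),
]

word_counts, label_counts = train(HISTORY)
print(classify("password reset needed", word_counts, label_counts))
print(classify("paper jam in printer", word_counts, label_counts))
```

Once a ticket's incident type is known, solutions recorded for earlier tickets of the same type can be looked up and suggested to the service desk.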

Even new insights into how to develop the organizational culture (as well as new organizational values) can be derived from text-mining. For example, Schmiedel et al.’s (2016) pioneering study used the text-mining cloud service to analyze data from a company review site to reveal organizational values hidden in customer reviews. This tool can be used to analyze the as-is state of an organization’s culture based on such reviews or to analyze the organizational values of a reference organization (e.g., one regarded as a reference for performance or innovativeness) as a source of inspiration in developing a value system.

The examples show that effective interpretation of the mining results is needed, so one cannot rely on text-mining data alone. Still, results from text-mining appear to be a promising source of knowledge in building BPM capabilities by, in particular, allowing us to measure “what is really going on” in executing and developing processes. In this regard, text-mining has an impressive truth value since it is not based on one or more opinions but on the digital trace data that is already available inside and outside the organization.

Summary and Outlook

This note explores the potential of using text-mining methods and tools in BPM. It explains the foundations of text-mining for those who are not yet familiar with related approaches, introduces methods, provides examples, and presents a tool, MineMyText, that makes text-mining accessible for everyday process-related work.

The note also details how BPM can benefit from text-mining. It presents a framework with which to structure application areas, differentiating the six core elements of BPM (strategic alignment, governance, methods, IT, people, and culture) and the two objectives of BPM: to improve and to innovate (or to exploit and to explore, respectively). The note presents ideas for each field of the 6 × 2 matrix that can stimulate thinking. We are convinced that there is a plethora of opportunities, which we should all pursue.

We hope that the note demonstrates adequately both the potential of applying text-mining to advance BPM capabilities for exploitive and explorative BPM and the ease of applying text-mining based on openly available data and easy-to-use tools.

Our experience shows that it takes good tools to analyze data efficiently. Tools make life easier, particularly if they are easy to use. However, smart people must ask the right questions, find the right data sources, guide evolutionary and experimental analytical journeys, and interpret results in the most meaningful way.

A Call to Get Involved

We invite readers to engage in text-mining for BPM. The sample case presented in this note is publicly available online, where readers can do further analysis based on the data. A free version of the tool is also available, to which readers can upload their own data for analysis.


I teamed with Oliver Müller from IT University Copenhagen, who is a visiting researcher in our BPM group in Liechtenstein, and with Stefan Debortoli, a co-founder of the cloud service, to write this note.

Many people and institutions contributed to the research presented in this note. In particular, we thank Iris Junglas of Florida State University, USA, and Michael Gau of the University of Liechtenstein for their important contributions to this emerging and promising field of research. We also thank the Hilti Corporation and Inventx for the many projects that led to the findings in this note.


Blei, D. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77–84.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(1), 993–1022.
Conger, S. (2015). Six Sigma and Business Process Management. In J. vom Brocke & M. Rosemann (Eds.), Handbook on Business Process Management: Introduction, Methods, and Information Systems (International Handbooks on Information Systems) (2 ed., Vol. 1, pp. 105-122). Berlin et al.: Springer.
Debortoli, S., Junglas, I., Müller, O., & vom Brocke, J. (2016). Text-mining For Information Systems Researchers: An Annotated Topic Modeling Tutorial. Communications of the Association for Information Systems (CAIS).
Debortoli, S., Müller, O., & vom Brocke, J. (2014). Comparing Business Intelligence and Big Data Skills – A Text-mining Study Using Job Advertisements. Business & Information Systems Engineering, 6(5), 289-300.
Fan, W., Wallace, L., Rich, S., & Zhang, Z. (2006). Tapping the power of text-mining. Communications of the ACM, 49(9), 76–82.
Halevy, A., Norvig, P., & Pereira, F. (2009). The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2), 8–12.
IDC. (2011). The 2011 Digital Universe Study: Extracting Value from Chaos. Retrieved from
IDC. (2014). The 2014 Digital Universe Study: Rich Data and the Increasing Value of the Internet of Things. Retrieved from
McAuley, J., Pandey, R., & Leskovec, J. (2015). Inferring Networks of Substitutable and Complementary Products. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Sydney.
McAuley, J., Targett, C., Shi, Q., & van den Hengel, A. (2015). Image-based Recommendations on Styles and Substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. Santiago.
Michel, J.-B., Shen, Y. K., Aiden, A. P., Veres, A., Gray, M. K., Pickett, J. P., … Aiden, E. L. (2011). Quantitative analysis of culture using millions of digitized books. Science, 331(6014), 176–182.
MineMyText (2016). Text-Mining in the Cloud. Retrieved June 30, 2016, from
Müller, O., Junglas, I., vom Brocke, J., & Debortoli, S. (2016). Utilizing Big Data Analytics for Information Systems Research: Challenges, Promises and Guidelines. European Journal of Information Systems, forthcoming.
Müller, O., Schmiedel, T., Gorbacheva, E., & vom Brocke, J. (2014). Toward a Typology of Business Process Management Professionals: Identifying Patterns of Competence through Latent Semantic Analysis. Enterprise Information Systems, 10(1), 50-80.
Quinn, K. M., Monroe, B. L., Colaresi, M., Crespin, M. H., & Radev, D. R. (2010). How to analyze political attention with minimal assumptions and costs. American Journal of Political Science, 54(1), 209–228.
Richter, D., Riemer, K., vom Brocke, J., & Grosse Böckmann, S. (2009). Internet Social Networking-Distinguishing Phenomenon and Practical Manifestation. In Proceedings of the 17th European Conference on Information Systems. Verona, Italy.
Rosemann, M. (2015). Foreword, in: vom Brocke, J., Schmiedel, T. (Eds.). BPM – Driving Innovation in a Digital World. Heidelberg: Springer.
Rosemann, M., & de Bruin, T. (2005). Towards a Business Process Management Maturity Model. In Proceedings of the European Conference on Information Systems (ECIS). Regensburg, Germany.
Rosemann, M., & vom Brocke, J. (2015). Six Core Elements of Business Process Management. In J. vom Brocke & M. Rosemann (Eds.), Handbook on Business Process Management: Introduction, Methods, and Information Systems (2 ed., Vol. 1, pp. 105-122). Berlin et al.: Springer.
Schmiedel, T., Müller, O., Debortoli, S., & vom Brocke, J. (2016). Identifying and quantifying cultural factors that matter to the IT workforce: An approach based on automated content analysis. Paper presented at the 24th European Conference on Information Systems (ECIS), Istanbul, Turkey.
Twitter. (2015). Twitter Usage and Company Facts. Retrieved June 30, 2016, from
vom Brocke, J., Debortoli, S., Müller, O., & Reuter, N. (2014). How in-memory technology can create business value: Insights from the Hilti case. Communications of the Association for Information Systems, 34(1), 151–167.
vom Brocke, J. (2007). Service portfolio measurement: Evaluating financial performance of service-oriented business processes. International Journal of Web Services Research (IJWSR), 4(2), 1–32.
vom Brocke, J. (2013). In-Memory Value Creation, or now that we found love, what are we gonna do with it? BPTrends, 10, 1-8.
vom Brocke, J., & Lindner, M. A. (2004). Service portfolio measurement: A framework for evaluating the financial consequences of out-tasking decisions. In Proceedings of the 2nd International Conference on Service Oriented Computing (pp. 203–211).
vom Brocke, J., & Schmiedel, T. (Eds.). (2015). BPM – Driving Innovation in a Digital World. Heidelberg: Springer.

Jan vom Brocke, Oliver Müller, and Stefan Debortoli

Jan vom Brocke is head of the BPM group in Liechtenstein. He is Professor of Information Systems, the Hilti Chair of Business Process Management, and Director of the Institute of Information Systems. He is Founder and Co-Director of the award-winning Master Program in Information Systems with Majors in Business Process Management and Data Science and Director of the PhD Program in Information and Process Management at the University of Liechtenstein. Since 2012 he has served as Vice-President of the University of Liechtenstein responsible for research and innovation, and he was re-elected in 2015. Jan has over 15 years of experience in IT and BPM projects, and he has published more than 300 papers in renowned outlets, including MIS Quarterly (MISQ), the Journal of Management Information Systems (JMIS), European Journal of Information Systems (EJIS), and the Business Process Management Journal (BPMJ). He has authored and edited 29 books, including Business Process Management – Driving Innovation in a Digital World, Green BPM – Towards the Sustainable Enterprise, and the International Handbook on Business Process Management. Jan is an invited speaker and trusted advisor on BPM serving many organizations around the world. You can contact and follow Jan via his website.

Oliver Müller is an Associate Professor at the IT University of Copenhagen. He holds a BSc and MSc in Information Systems and a PhD in Business Economics from the University of Münster, Germany. The goal of Oliver’s research is to help organizations and individuals to create value through (big) data and analytics. In particular, he focuses on extracting knowledge from large amounts of unstructured text data, from both the Internet and enterprise-internal sources. His research has been published in the European Journal of Information Systems, Journal of the Association for Information Systems, Communications of the Association for Information Systems, IEEE Transactions on Engineering Management, and others.

Stefan Debortoli is an associated researcher at the Institute of Information Systems at the University of Liechtenstein. He holds a BSc and MSc in Information Systems and a PhD in Business Economics from the University of Liechtenstein. His doctoral studies focused on applying big data analytics as a new strategy of inquiry in Information Systems Research. In the field of big data analytics, he focused on applying text-mining techniques for research purposes. His work has been published in the European Journal of Information Systems, Communications of the Association for Information Systems, and Business & Information Systems Engineering.
