The annotation protocol defines the classes or a taxonomy of classes, their corresponding labels, categories, characteristics, and their relationships. In practice, however, and this is a crucial point, different classes and concepts are often ambiguous and are assessed subjectively by annotators owing to cultural influences, different background knowledge and differing interpretations. Ambiguities in interpretation arise especially for concepts with a certain level of semantic complexity. An example stemming from previous research in the DH is scenes in a film. Furthermore, even for much simpler concepts with low semantic complexity, such as shot cuts, ambiguities have been reported when it comes to concrete annotations. In addition, experience has shown that it is beneficial for the process of ground truth generation to allow annotators to highlight ambiguous cases with a predetermined label instead of forcing them to decide on one of the pre-defined categories.
This approach helps to identify unclear cases, ambiguities and inaccurate definitions of categories early in the process and thereby fosters knowledge gain. While still most computational tools used in the DH operate on a low-level e. On this level, relations of objects and more complex actions of these objects need to be interpreted automatically. The human power of imagination generates an unlimited array of possible tasks to recognise automatically, such as the automated recognition of same-sex couples and their representation in films, the analysis of gait velocity in order to measure the historic development of acceleration in TV series, or the recognition of kissing and killing and their relation in Hollywood blockbusters.
Nevertheless, there might be limits to automating these tasks. On the one hand, these limits relate to the computational script. The computational script refers to the question of what can be realised with specific computer vision solutions and how technology influences, restricts and prescribes the creation of a ground truth. On the other hand, these limits connect to the narrow universalisation of specific domains. Every algorithm emerges in a specific context, at a specific place and at a specific point in time. In addition, many IPAs, and therefore many ground truths, are custom-made for specific tasks and data.
A simple transfer to other contexts is often not possible without loss of scope or depth, and such a transfer might even create forms of bias.
A simple process of universalisation is at least questionable. To demonstrate this and its possible implications, we bring in the example of creating a ground truth for the task of automatically recognising people falling, which was observed during ethnographic fieldwork by the first author. Both scenarios make use of visual content for a similar type of action i. In the case of automated fall detection of elderly people in private homes, a challenge for computer vision is to differentiate between critical falls, which need emergency assistance, and other uncritical actions, such as bending forward or lying down on a bed or couch to rest.
This means, while the physical process of falling might look similar in many different cases, there is a wide array of what this physical process really means e. In the AAL case of elderly people falling in their homes, ground truth was constructed within the framework of the computational script of a Microsoft Xbox Kinect sensor as well as through the ingenuity and creativity of the researcher in de-scripting 42 , rewriting and making use of this specific hardware-software-data assemblage.
The researchers used a Kinect sensor because at that time it was considered keenly priced and affordable hardware, and in order to keep down the future price of an imagined commercial system. Besides its relatively low price, the Kinect sensor also had the advantage of delivering 3D information through a depth sensor, which made it possible to estimate distances.
It was also privacy-enhancing, since individual persons can hardly be recognised in depth images, where only the body shape is outlined. Everything that followed in the research process thus took place within this specific script of the Kinect sensor. This strong dependency on specific hardware and a specific type of data (depth images) makes the approach specific to the problem at hand but hinders its application to other fall detection problems, for example fall detection in films.
Even if there is a solution to the problem of transforming Kinect 3D input data into 2D video data, the respective ground truth would hardly be transferable to the other problem because the nature of the falls, their visual appearance and their contextual embedding are different. The adaptation of an algorithm to a certain problem always introduces bias with respect to the data and the targeted task. As there were no training sequences showing real falls of elderly people, the training sequences had to be created synthetically: the young computer scientists filmed themselves simulating falls in order to train the algorithm.
The ground truth was defined within this framework and a mathematical equation was formulated for the detection of falls. This equation was based on the relationship between two central visualised geometric elements: first, the orientation of the floor plane and, second, a straight line visually representing the medial axis of the human body in the scene. The assumption was that the more similar the orientation of these two elements was, the more likely it was that a fall had occurred.
Once a specific critical threshold of this relation was reached, a fall was declared to be detected. From this observation it can be concluded that ground truth, and with it the specific decision rules, should be generated for the specific material in order to avoid ambiguities as far as possible and to increase the likelihood of a correct detection.
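The decision rule described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the equation used in the observed project: the function names, the representation of the floor plane by its normal vector, and the 30-degree threshold are assumptions made for this example.

```python
import math


def angle_to_floor_deg(body_axis, floor_normal):
    """Angle in degrees between the body's medial axis and the floor plane.

    0 means the axis lies parallel to the floor, 90 means it stands upright.
    Both arguments are 3D vectors; the floor plane is given by its normal
    (an assumption of this sketch).
    """
    dot = sum(a * n for a, n in zip(body_axis, floor_normal))
    norm_axis = math.sqrt(sum(a * a for a in body_axis))
    norm_normal = math.sqrt(sum(n * n for n in floor_normal))
    if norm_axis == 0 or norm_normal == 0:
        raise ValueError("vectors must be non-zero")
    # The axis is a line without direction, hence abs(). The angle between a
    # line and a plane is the arcsine of the cosine to the plane's normal.
    sin_angle = min(1.0, abs(dot) / (norm_axis * norm_normal))
    return math.degrees(math.asin(sin_angle))


def is_fall(body_axis, floor_normal, threshold_deg=30.0):
    """Declare a fall once the axis is nearly parallel to the floor.

    The threshold value is hypothetical and would have to be tuned on the
    specific material.
    """
    return angle_to_floor_deg(body_axis, floor_normal) <= threshold_deg


floor = (0.0, 1.0, 0.0)                  # floor normal pointing upwards
print(is_fall((0.0, 1.0, 0.0), floor))   # standing upright
print(is_fall((1.0, 0.1, 0.0), floor))   # lying almost flat
```

Note that a person lying down on a couch to rest would satisfy the same rule, which is exactly the ambiguity between critical falls and uncritical actions mentioned earlier.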
Similarly, for detecting people in films being killed and hence falling, a specific training dataset and ground truth need to be established to enable robust detection. To sum up, the data-centric nature of IPAs makes them less generalisable to other types of material or to the same actions in different contexts.
For the use of computer vision in the DH, this means that the more specific a given research topic or research question is, the more important it becomes to participate actively in the generation of ground truth in order to improve the adaptation of the tools to the actual problem. In what follows, problems of ground truth construction and IPAs, such as false negative and false positive detections and systematic biases, are discussed.
What is at stake becomes apparent when talking about false negative and false positive detections. False negatives are relevant cases e. False positives are irrelevant cases e. Usually the performance of algorithms is measured by computing false negative rates and false positive rates. Here it has to be noted that the concept of gathering false negative and false positive rates always implies that there is one universal ground truth against which any domain under scrutiny is examined to evaluate accuracy.
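To make the dependence on the ground truth concrete, the two rates can be computed as in the following minimal sketch. The function name and the toy labels are assumptions made for illustration; the definitions used are the standard ones (false negative rate = FN / (FN + TP), false positive rate = FP / (FP + TN)).

```python
def error_rates(ground_truth, predictions):
    """Return (false_negative_rate, false_positive_rate).

    Both arguments are equally long sequences of booleans, where True marks
    a relevant case. A rate is NaN if its denominator is zero.
    """
    if len(ground_truth) != len(predictions):
        raise ValueError("sequences must have the same length")
    tp = fp = tn = fn = 0
    for truth, predicted in zip(ground_truth, predictions):
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and predicted:
            fp += 1
        else:
            tn += 1
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return fnr, fpr


# Hypothetical detector output, scored against two annotators' ground truths.
detections  = [True, True, False, True, False, False, False, False]
annotator_a = [True, True, True, False, False, False, False, False]
annotator_b = [True, True, False, True, False, False, False, False]
print(error_rates(annotator_a, detections))
print(error_rates(annotator_b, detections))
```

The same detections yield different error rates depending on whose annotations are taken as the ground truth, so the reported accuracy is only meaningful relative to that choice.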
If, in a specific case or context, the detection rate of a certain class or object is rather low, this does not automatically mean that the specific class or object does not appear. The object may well be present but go undetected because the algorithm has a systematic bias and is not able to detect it in the given context. To give an example: if a ground truth of people being killed in films was generated solely from training images of car accident scenes, sequences of people being killed in a shoot-out may not be recognised correctly.
The goal in the development of IPAs is always to achieve a large generalisation ability, i. At the same time the number of false detections should be minimised, which usually conflicts with the goal of generalisability. Thus, a trade-off has to be found for each concrete task and application.
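One common place where this trade-off becomes visible is the decision threshold applied to a detector's confidence scores. The sketch below assumes a detector that outputs a score between 0 and 1 per item; the scores, labels and threshold values are invented for illustration.

```python
def sweep_thresholds(scores, labels, thresholds):
    """Count false negatives and false positives for each threshold.

    An item is reported as a detection when its score is >= the threshold.
    Returns a list of (threshold, false_negatives, false_positives).
    """
    results = []
    for t in thresholds:
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        false_neg = sum(1 for s, y in zip(scores, labels) if s < t and y)
        results.append((t, false_neg, false_pos))
    return results


scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.10]   # hypothetical detector scores
labels = [True, True, False, True, False, True, False]  # hypothetical ground truth
for t, fn, fp in sweep_thresholds(scores, labels, [0.2, 0.5, 0.9]):
    print(f"threshold={t}: false negatives={fn}, false positives={fp}")
```

A permissive threshold misses nothing in this toy data but produces false positives, while a strict threshold removes the false positives at the cost of missed cases.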
Furthermore, there is the risk of generating algorithmic bias. For example, Introna and Wood analysed the politics and implications of face recognition technologies. One of their central results was that facial recognition algorithms have a systemic bias: men, Asian and Afro-American populations as well as older people are more likely to be recognised than women, white populations and younger people. Bias in gender and race was also discovered in a more recent evaluation of three commercial gender classification systems. This bias originated from the fact that it was mainly men who developed those systems at this time and they used their own voices to record their training data (speech sequences) due to a lack of large training corpora.
While there will always be forms of algorithmic and human bias, and we understand that an unbiased algorithm or world might not exist, it seems crucial to reduce bias through transparency and possibilities for algorithmic explanation, as more and more political decisions are grounded in algorithms. Another example of a scholar researching and fighting bias in machine learning is Joy Buolamwini. What are the consequences and implications for the Digital Humanities and for the use of automated visual analysis tools in this domain?
While there are promising results in extracting visual information at a low level of semantic complexity, the higher-level interpretation of moving images is highly dependent on situation and context. Drawing meaningful conclusions from situation and context is a major challenge for computer vision algorithms because it requires a high level of understanding of objects, relations, diversity, ambiguity, situated actions 49 and cultural local particularities e.
Therefore, we argue in support of an integrated and critical approach to the use of computer vision tools, whereby we attach particular importance to transparency and the involvement of expert users e. We argue that a major limitation for the application of existing pre-trained computer vision approaches in DH is their restriction to a specific, previously defined ground truth. Research questions addressed in the DH often relate to specific high-level concepts which are not covered by existing ground truths, making the training these approaches received insufficient. Since there is no unique and all-encompassing ground truth, we propose to move away from rigidly pre-defined ground truths to more flexible ground truths that adapt to the specific requirements of the actual expert users.
We believe that this is a useful strategy, in particular in DH-related research where highly complex and hitherto unanalysed concepts are the subject of investigation. Furthermore, the ground truth is often not known a priori, and ground truth concepts may evolve or be discovered during the analysis of the visual content under consideration. Computer vision tools should provide such flexibility to support exploration and the establishment of hypotheses. In active learning (AL), the algorithm is not trained offline from a pre-existing ground truth, as in most existing approaches today.
Instead, the algorithm is trained in an online fashion by taking input from the user into account. This is best explained with an example. Imagine that a DH scholar wants to find political symbols e. With a very high probability, no algorithm exists which has been trained for detecting such specific symbols.
AL can circumvent this problem. Starting with these examples, the algorithm tries to train an initial detector for the desired symbols. For symbols that the classifier is uncertain about, it can query the user, who can then assign the symbol to a specific category. The algorithm uses this queried feedback to improve its detection capabilities iteratively.
Additionally, the user may provide feedback on the relevance of detection results and communicate them back to the algorithm, i. More recent developments further extend the idea of AL to interactive data exploration methodology i. The different strategies for incorporating user feedback into the training process make the algorithm more adaptive to the actual data and thereby enable it to better fit the actual research questions investigated by the user. AL further circumvents the explicit and a priori definition of a ground truth. AL can lead to a higher level of transparency and understanding of the algorithms by making the training data explicit to the user.
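The query-and-retrain cycle described above can be illustrated with a small, self-contained sketch. It is not the method of any particular tool: the nearest-centroid classifier, the margin-based uncertainty measure, the two-dimensional feature vectors and the `ask_user` callback (which stands in for the expert user) are all assumptions chosen to keep the example short.

```python
import math


def centroids(labelled):
    """Mean feature vector per class, from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in labelled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def margin(x, cents):
    """Gap between the two nearest class centroids; small means uncertain.

    Requires at least two classes among the labelled examples.
    """
    distances = sorted(math.dist(x, c) for c in cents.values())
    return distances[1] - distances[0]


def predict(x, cents):
    """Label of the nearest class centroid."""
    return min(cents, key=lambda y: math.dist(x, cents[y]))


def active_learning(labelled, unlabelled, ask_user, n_queries):
    """Repeatedly retrain, query the most uncertain sample, add the answer."""
    labelled, unlabelled = list(labelled), list(unlabelled)
    for _ in range(min(n_queries, len(unlabelled))):
        cents = centroids(labelled)                      # retrain
        query = min(unlabelled, key=lambda x: margin(x, cents))
        unlabelled.remove(query)
        labelled.append((query, ask_user(query)))        # user feedback
    return centroids(labelled), labelled


# Two initial examples provided by the user, plus an unlabelled pool.
seed = [((0.0, 0.0), "other"), ((10.0, 10.0), "symbol")]
pool = [(1.0, 1.0), (4.9, 5.0), (9.0, 9.0), (6.0, 6.5)]


def ask_user(x):
    """Stand-in for the expert; a real tool would show the image region."""
    return "symbol" if x[0] + x[1] >= 10 else "other"


model, history = active_learning(seed, pool, ask_user, n_queries=2)
print(history[2:])                 # the samples the algorithm chose to ask about
print(predict((9.0, 9.0), model))
```

In this toy run the algorithm asks only about the two borderline samples and leaves the clear-cut ones to the classifier, which is the intended saving in annotation effort.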
In this regard, the methodology connects well to the research on explainable artificial intelligence (XAI). This demo shows what kind of visual patterns are learned in the individual network layers. Here the user can generate input data themselves and trace how the decision of the classifier is made. We argue that the combination of XAI techniques with AL is especially promising, as the two types of approaches are likely to benefit from each other. This in turn fosters the generation of more useful and targeted algorithms and tools for expert users. Especially in the DH, where the research questions are often very specific and semantically complex, we believe that such a user-guided approach is a promising solution.
The combination with XAI approaches will be essential in the future, not only to verify decisions made but also to discover biases learned from the data and to better understand false detections (false positives and false negatives).
Berry, ed, Understanding Digital Humanities , , p. IEEE Access , 6, , pp. Understanding the negotiation and implementation of image processing algorithms, PhD dissertation, University of Vienna, Boczkowski, and Kirsten A. Foot, eds, Media Technologies.
Essays on Communication, Materiality, and Society, , p. Essays on Communication, Materiality, and Society , , p.
Big Data, Big Brother? The Construction of Scientific Facts. Sage Publications, Pergamon, Plans and Situated Actions.
Third Edition. The MIT Press, , pp. Collins, Tacit and Explicit Knowledge. University of Chicago Press, , p. The MIT Press, , p. University of Chicago Press, Studies in Sociotechnical Change. Musik, C. Musik C, Zeppelzauer M. Musik, Christoph, and Matthias Zeppelzauer. Matthias Zeppelzauer St.

Abstract: Automated computer vision methods and tools offer new ways of analysing audio-visual material in the realm of the Digital Humanities (DH).
While there are some promising results where these tools can be applied, there are basic challenges, such as algorithmic bias and the lack of sufficient transparency, that one needs to address in order to use these tools in a productive and responsible way. This article aims to provide scholars in the DH with knowledge about how automated tools for image analysis work and how they are constructed.
How to Cite: Musik, C. Published on 31 Dec