#protestoes: image crawling and visualization methods (English translation)


(To read this article in Portuguese, click here)

The Visagem project used the hashtag "#protestoes" on Facebook and Instagram to identify patterns and resonances between the images shared on these two platforms. The analysis revealed structural differences between the platforms and between the photographic production processes behind images shared online. It also revealed the visual character of this social movement.

Our time was limited, and the images had to be analyzed while the June 2013 protests in Brazil were still unfolding. For these reasons we extracted the images manually, downloading each one and copying its metadata, such as the user who shared it. This manual extraction also gave us firsthand experience of the differences between these social network sites.

Facebook organizes information according to each user's interests and preferences, drawing on their viewing, commenting and liking history. As a result, the images returned by the hashtag search were ordered neither chronologically nor by number of accesses. This suggests that the same search run by another user would return the results in a different order. In addition, Facebook images have no standard size, unlike Instagram images, which at the time were formatted as squares (1:1, similar to the Polaroid format).

To extract images from Instagram, we used the website Webstagram because it can display a large number of images on a single page and allows fast downloads. Instagram's data follows a "time passing" logic, showing how long ago an image was posted rather than its exact date and time. Because we could not determine exactly when each image was posted, we organized the images into two groups: week 1 (17/06 – 22/06) and week 2 (23/06 – 24/06).
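Grouping by approximate date can be sketched as follows. This is our illustration, not the procedure the project used. It assumes that each image's relative age was read as a whole number of days and that the collection date is known. The helper names `approx_post_date` and `week_group` are hypothetical, and so is the example input.

```python
from datetime import date, timedelta
from typing import Optional

# Collection windows as described in the text (day/month, June 2013).
WEEK_1 = (date(2013, 6, 17), date(2013, 6, 22))
WEEK_2 = (date(2013, 6, 23), date(2013, 6, 24))


def approx_post_date(collected_on: date, days_ago: int) -> date:
    """Estimate a posting date from a relative age given in whole days (assumption)."""
    return collected_on - timedelta(days=days_ago)


def week_group(posted: date) -> Optional[str]:
    """Return 'week 1', 'week 2', or None if the date falls outside both windows."""
    if WEEK_1[0] <= posted <= WEEK_1[1]:
        return "week 1"
    if WEEK_2[0] <= posted <= WEEK_2[1]:
        return "week 2"
    return None


# Hypothetical example: an image collected on 24/06 that was posted 3 days earlier.
print(week_group(approx_post_date(date(2013, 6, 24), 3)))  # week 1
```

Because the ages are only approximate, an image posted near the 22/06 – 23/06 boundary could land in either group. This is one reason the two groups are best read as rough periods rather than exact ones.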

The Facebook (500 images) and Instagram (492 images) visualizations of posts containing the hashtag "#protestoes" were created with the software ImageJ and with ImagePlot¹, a macro developed specifically for that program.

ImageJ is public-domain, Java-based software for image processing and analysis. The ImagePlot macro, which runs inside ImageJ, was developed by the Software Studies Initiative (http://lab.softwarestudies.com/) in New York, coordinated by Lev Manovich. Both are available for download from the Software Studies Initiative, and the installation package includes sample datasets for getting started with the software.

In the Visagem project, the 500 Instagram images and the 492 Facebook images were organized into a dataset with the columns "File Name", "ID", "Brightness median", "Saturation median", "Hue median", "Caption" and "User". We built the dataset in Excel, with each of these fields as a column. We then exported it as a tab-delimited text file (.txt) so that the ImagePlot macro could read it. The brightness, saturation and hue values for each image came from another macro, called "measurements", which is available with ImageJ.

The "File Name" column identifies each image on the computer, and the software uses it to link each row of the dataset to a specific image. Classifying the images was easy, but their sequential numbering (1.jpg, 2.jpg, 3.jpg and so on) caused a problem. When running the "measurements" macro, ImageJ sorts the files alphabetically rather than numerically, so every name beginning with 1 comes first and the results appear in a confusing order (1.jpg, 10.jpg, 11.jpg, (…) 100.jpg). To avoid this, we padded the names with leading zeros so that every name has as many digits as the largest number in the dataset (for example, 001.jpg instead of 1.jpg and 099.jpg instead of 99.jpg).
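The zero-padding fix can be automated. The sketch below is our illustration and the helper names are hypothetical. It pads every numbered file name to the width of the largest number, so alphabetical and numerical order agree. The optional `rename_in_folder` applies the mapping to `.jpg` files on disk with `pathlib.Path.rename`.

```python
import re
from pathlib import Path


def zero_pad_names(names):
    """Map numbered file names ('7.jpg') to zero-padded ones ('007.jpg'),
    padding every number to the width of the largest one."""
    parsed = []
    for name in names:
        match = re.fullmatch(r"(\d+)(\.\w+)", name)
        if match is None:
            raise ValueError(f"not a numbered file name: {name!r}")
        parsed.append((name, int(match.group(1)), match.group(2)))
    width = max((len(str(number)) for _, number, _ in parsed), default=1)
    return {name: f"{number:0{width}d}{ext}" for name, number, ext in parsed}


def rename_in_folder(folder):
    """Rename every numbered .jpg in `folder` in place (hypothetical helper)."""
    folder = Path(folder)
    names = [p.name for p in folder.iterdir() if p.suffix.lower() == ".jpg"]
    for old, new in zero_pad_names(names).items():
        if old != new:
            (folder / old).rename(folder / new)


names = ["1.jpg", "2.jpg", "10.jpg"]
print(sorted(names))                          # ['1.jpg', '10.jpg', '2.jpg']
print(sorted(zero_pad_names(names).values())) # ['01.jpg', '02.jpg', '10.jpg']
```

For a dataset numbered up to 500, the width is three digits, so names run from 001.jpg to 500.jpg.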

The "Caption" and "User" columns are linked. The users are the profiles (or pages, in Facebook's case) that posted or shared the images containing the hashtag "#protestoes". Text could not be used as a parameter of the graph itself, so the "Caption" column was how we brought the users into the visualizations as data.
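This also explains why text cannot drive the graph. A plot of this kind maps two numeric columns, such as "Brightness median" and "Hue median", to canvas coordinates and draws each image's thumbnail at that point. The sketch below is our simplified illustration of that idea, not ImagePlot's actual macro code. The function name, the min–max scaling and the `margin` parameter are assumptions.

```python
def plot_positions(rows, x_col, y_col, width, height, margin=0):
    """Map two numeric columns to (x, y) canvas positions, one per file.

    Values are min-max scaled to the drawable area; y is flipped because
    image coordinates grow downward.
    """
    xs = [r[x_col] for r in rows]
    ys = [r[y_col] for r in rows]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def scale(value, lo, hi, span):
        # Avoid dividing by zero when every value in a column is equal.
        return 0.0 if hi == lo else (value - lo) / (hi - lo) * span

    positions = {}
    for r in rows:
        x = margin + scale(r[x_col], x_lo, x_hi, width - 2 * margin)
        y = height - margin - scale(r[y_col], y_lo, y_hi, height - 2 * margin)
        positions[r["File Name"]] = (x, y)
    return positions


rows = [
    {"File Name": "001.jpg", "Brightness median": 0, "Hue median": 0},
    {"File Name": "002.jpg", "Brightness median": 255, "Hue median": 255},
]
print(plot_positions(rows, "Brightness median", "Hue median", 1000, 800, margin=50))
```

A "User" value such as a profile name has no position on such a scale. That is consistent with the text's point that users entered the visualizations through the "Caption" column rather than as an axis.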

Some problems became evident while we were creating the visualizations. In general, they had to be generated on more powerful computers because of their high memory requirements and heavier image processing. We also noticed that some black-and-white Instagram filters changed the image type, turning the image into grayscale displayed through a lookup table (LUT). The solution was to convert these images to RGB, which preserves their basic characteristics while allowing the graphics to render.
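In ImageJ, an 8-bit image stores one value per pixel and displays it through a 256-entry lookup table. Converting it to RGB Color (Image > Type > RGB Color) writes each looked-up color directly into the pixel. The pure-Python sketch below shows that expansion on a list of pixel values. It is our illustration of the idea, not ImageJ's code. `lut_to_rgb` and `GRAY_LUT` are hypothetical names.

```python
def lut_to_rgb(indices, lut):
    """Expand 8-bit indexed pixels into (r, g, b) triples via a lookup table."""
    if len(lut) != 256:
        raise ValueError("expected a 256-entry LUT")
    return [lut[i] for i in indices]


# A plain grayscale LUT maps each value i to the color (i, i, i).
GRAY_LUT = [(i, i, i) for i in range(256)]

print(lut_to_rgb([0, 128, 255], GRAY_LUT))
```

With a grayscale LUT the converted image looks the same. The difference is that every pixel now has explicit red, green and blue values that the measurement and plotting steps can read.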

¹ ImagePlot was developed by the Software Studies Initiative (softwarestudies.com) with support from the National Endowment for the Humanities (NEH), the California Institute for Telecommunications and Information Technology (Calit2), and the Center for Research in Computing and the Arts (CRCA).
