
How the Co-occurrence Table Explorer (CTE) Works

Tutorial based on an email from Thomas Muhr (lead developer of ATLAS.ti) to the user mailing list on 19-10-2009

The CTE is a very useful tool, but it has some peculiarities that need to be explained in detail.

The CTE shows, for each pair of codes, the count of their co-occurrences in all current documents.

  1. If a primary document family filter is active, the table covers only that part of the data.

  2. To create a primary document family, go to Documents | Edit families | Open family browser.

  3. Once the family exists, activate its filter by double-clicking it (the family is then shown in bold), or via Documents | Filter | Families, selecting the family you want.



Each cell, which represents a code pair, also displays a normalized coefficient next to the count. The coefficient should vary between 0 (the codes do not co-occur) and 1 (the codes co-occur wherever they are used). This co-occurrence index (C-index, see Garcia, 2006) takes the occurrence count of each code into account:
$$c = \frac{n_{12}}{n_1 + n_2 - n_{12}}$$
where $n_{12}$ is the co-occurrence frequency of two codes c1 and c2, and $n_1$ and $n_2$ are their occurrence frequencies.
The coefficient is displayed unless you have disabled this option.
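Because the formula is easy to misread, here is a minimal Python sketch of the C-index in the form c = n12 / (n1 + n2 − n12), which reproduces every worked value in this tutorial. The function name `c_index` is hypothetical and not part of ATLAS.ti; returning 0.0 when neither code is used is an assumed convention.

```python
def c_index(n12: int, n1: int, n2: int) -> float:
    """C-index of two codes, c = n12 / (n1 + n2 - n12).

    n12: co-occurrence frequency of codes c1 and c2
    n1, n2: occurrence frequencies of c1 and c2
    """
    if n1 == 0 and n2 == 0:
        # Neither code is used; returning 0.0 here is an assumed convention.
        return 0.0
    return n12 / (n1 + n2 - n12)


print(c_index(1, 1, 1))               # 1.0  (Case 1 below)
print(c_index(2, 1, 2))               # 2.0  (Case 2 below: out of range)
print(round(c_index(2, 2, 3), 2))     # 0.67 (after removing the overlap)
print(round(c_index(5, 100, 10), 3))  # 0.048 ("depression" and "mother")
```

The denominator n1 + n2 − n12 counts the occurrences of either code. On non-overlapping data n12 can never exceed min(n1, n2), so the denominator is at least max(n1, n2) and c stays within 0..1; overlapping quotations break that guarantee.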

You may observe the following:
1. Mismatch. The number of quotations in the cell's drop-down list does not always match the cell's frequency count; the list can contain more quotations than the count.
2. Out of range. The C-index exceeds the 0..1 range it is supposed to stay within.
3. Funny circles. Cells can show additional visual cues, e.g., a red, yellow, or orange circle.

1. Mismatch
-------------
The co-occurrence frequency does not count single quotations; it counts co-occurrence "events". If a single quotation is coded with both codes, this counts as one co-occurrence. Complications arise with overlapping quotations: when two quotations overlap and each is coded with one of the two codes, this also counts as one co-occurrence. The cell's drop-down list, however, shows both quotations. In fact, there is currently no way to distinguish a single quotation's "strong" co-occurrence from the "weak" case of two quotations in close proximity. The drop-down list displays an ordered list of all quotations involved in all co-occurrence events for the pair of codes. We may need to improve this by displaying single quotations and quotation pairs as groups.
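To make the difference between events and listed quotations concrete, here is a minimal Python sketch. It models quotations as hypothetical character ranges carrying a set of codes and applies the two counting rules stated above. It illustrates those rules only and is not ATLAS.ti's implementation; names such as `Quotation`, `cooccurrence_events`, and `dropdown_quotations` are made up for this example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Quotation:
    # Hypothetical model: a text range [start, end) and the codes applied to it.
    name: str
    start: int
    end: int
    codes: frozenset


def overlaps(q: Quotation, r: Quotation) -> bool:
    # Half-open ranges overlap when each one starts before the other ends.
    return q.start < r.end and r.start < q.end


def cooccurrence_events(quotes, a, b):
    """List co-occurrence events of codes a and b under the rules described above."""
    events = []
    # "Strong" case: one quotation carries both codes.
    for q in quotes:
        if a in q.codes and b in q.codes:
            events.append((q,))
    # "Weak" case: two overlapping quotations, one coded with a and the other with b.
    for q, r in combinations(quotes, 2):
        crossed = (a in q.codes and b in r.codes) or (b in q.codes and a in r.codes)
        if crossed and overlaps(q, r):
            events.append((q, r))
    return events


def dropdown_quotations(events):
    # Every quotation taking part in any event, ordered by position in the text.
    return sorted({q for event in events for q in event}, key=lambda q: (q.start, q.name))


q1 = Quotation("q1", 0, 50, frozenset({"a"}))
q2 = Quotation("q2", 30, 80, frozenset({"b"}))
events = cooccurrence_events([q1, q2], "a", "b")
print(len(events))                                    # 1
print([q.name for q in dropdown_quotations(events)])  # ['q1', 'q2']
```

One event, two listed quotations: this is exactly the mismatch between the cell count and the drop-down list.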

2. Out of range
----------------
The C-index (structurally resembling the Tanimoto and Jaccard coefficients, which are similarity measures) assumes separate, non-overlapping text entities. Only then can we expect values in the correct range.
However, ATLAS.ti's quotations may overlap to any degree. Overlaps would pose no problem only if there were no "coding redundancy" (the kind you can eliminate using the Coding Analyzer). Let's look at a few scenarios.

Case 1: two differently coded quotations overlap; we assume there are no other quotations. Let P1 be a text document, q1 and q2 quotations, and a and b codes. q1 is coded with a, q2 with b.

Using $c = \frac{n_{ab}}{n_a + n_b - n_{ab}}$ (variables renamed to match our code notation), we get:
n_ab = 1: one co-occurrence of a and b
n_a = 1, n_b = 1: a and b each code exactly one quotation
$c = \frac{1}{1 + 1 - 1} = 1$. Wow, maximum co-occurrence!

Case 2: q1 is coded with both codes a and b; the overlapping quotation q2 is coded with b.

n_ab = 2: q1 alone counts as one co-occurrence event, and the overlap q1*q2 as another.
n_a = 1, n_b = 2
$c = \frac{2}{1 + 2 - 2} = 2$. Bad! This value is twice the allowed maximum.
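Both scenarios can be reproduced with a small Python sketch that derives the counts from overlapping ranges using the event rules described under Mismatch. The character offsets (0 to 50 for q1, 30 to 80 for q2) are invented for illustration, and the counting is an assumed model of the CTE's behavior, not its source code.

```python
from itertools import combinations


def overlaps(q, r):
    # q and r are (start, end, codes) tuples describing half-open text ranges.
    return q[0] < r[1] and r[0] < q[1]


def counts(quotes, a, b):
    """Return (n_ab, n_a, n_b) under the assumed event rules."""
    n_a = sum(a in q[2] for q in quotes)
    n_b = sum(b in q[2] for q in quotes)
    # Each quotation coded with both codes is one event.
    n_ab = sum(a in q[2] and b in q[2] for q in quotes)
    # Each overlapping pair where one quotation has a and the other has b is one event.
    n_ab += sum(
        overlaps(q, r) and ((a in q[2] and b in r[2]) or (b in q[2] and a in r[2]))
        for q, r in combinations(quotes, 2)
    )
    return n_ab, n_a, n_b


def c_index(n_ab, n_a, n_b):
    return n_ab / (n_a + n_b - n_ab)


case1 = [(0, 50, {"a"}), (30, 80, {"b"})]       # q1 coded a, q2 coded b
case2 = [(0, 50, {"a", "b"}), (30, 80, {"b"})]  # q1 coded a and b, q2 coded b

for name, quotes in [("Case 1", case1), ("Case 2", case2)]:
    n_ab, n_a, n_b = counts(quotes, "a", "b")
    print(name, (n_ab, n_a, n_b), c_index(n_ab, n_a, n_b))
# Case 1 (1, 1, 1) 1.0
# Case 2 (2, 1, 2) 2.0
```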

Conclusion: the C-index cannot correctly represent co-occurrence when quotations overlap. We either need a formula that can, or we need to "normalize" our quotations, that is, eliminate overlaps before calculating the index.

After eliminating the overlap between q1 and q2, we get three quotations: q1' (the part of q1 outside the overlap) coded with a and b, q1*q2 (the overlap itself) coded with a and b, and q2' (the part of q2 outside the overlap) coded with b:

n_ab = 2, n_a = 2, n_b = 3
$c = \frac{2}{2 + 3 - 2} = \frac{2}{3} \approx 0.67$, which looks much better. It is within the allowed range, and it correctly reflects that only two of the three possible co-occurrence events apply.
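The normalization step can be sketched in Python by splitting the quotations at every start and end offset, so that each elementary segment carries the union of the codes of all quotations covering it. This is one possible partitioning, assumed for illustration; the tutorial lists such partitioning as a needed improvement, so it is not an existing ATLAS.ti feature, and the offsets are invented.

```python
def partition(quotes):
    """Split overlapping (start, end, codes) quotations into non-overlapping segments.

    Each segment between consecutive boundaries inherits the codes of every
    quotation that covers it; uncovered gaps are dropped.
    """
    bounds = sorted({x for start, end, _ in quotes for x in (start, end)})
    segments = []
    for lo, hi in zip(bounds, bounds[1:]):
        codes = set()
        for start, end, qcodes in quotes:
            if start <= lo and hi <= end:
                codes |= qcodes
        if codes:
            segments.append((lo, hi, codes))
    return segments


def c_index_segments(segments, a, b):
    # With no overlaps left, only segments carrying both codes count as co-occurrences.
    n_a = sum(a in codes for _, _, codes in segments)
    n_b = sum(b in codes for _, _, codes in segments)
    n_ab = sum(a in codes and b in codes for _, _, codes in segments)
    return n_ab, n_a, n_b, n_ab / (n_a + n_b - n_ab)


case2 = [(0, 50, {"a", "b"}), (30, 80, {"b"})]  # q1 coded a and b, q2 coded b
segments = partition(case2)
print([(lo, hi, sorted(codes)) for lo, hi, codes in segments])
# [(0, 30, ['a', 'b']), (30, 50, ['a', 'b']), (50, 80, ['b'])]
print(c_index_segments(segments, "a", "b"))
# (2, 2, 3, 0.6666666666666666)
```

The three segments correspond to q1', q1*q2, and q2', and the resulting counts match the hand calculation.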

3. Circles
-----------
Circles with different colors are painted into a cell's upper right corner when certain conditions apply.


The red circle: the C-index exceeds 1.
The yellow circle: an inherent issue with the C-index and similar measures is that they are distorted when code frequencies differ too much. In such cases the coefficient tends to be much smaller than the semantic significance of the co-occurrence warrants. For instance, suppose you coded 100 quotations with code "depression" and 10 with "mother", and there are 5 co-occurrences:
n_dep = 100, n_mother = 10, n_dep-mother = 5
$c = \frac{5}{100 + 10 - 5} = \frac{5}{105} \approx 0.048$
A C-index of only 0.048 can easily escape your attention, although code "mother" co-occurs with code "depression" in 50% of its applications. Seen from code "depression", only 5% of its applications co-occur with code "mother".
If the ratio between the two codes' frequencies exceeds a certain threshold (currently 5, but it will become user-definable), the yellow light goes on in the cell. So whenever a cell shows the yellow marker, it invites you to look into that cell's co-occurrences despite a low C-index.
Note: when the mouse rests over a cell with a yellow mark, a pop-up displays the frequency ratio of the two codes.


The orange circle simply combines the two conditions above: it appears when both apply.
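The circle rules can be summarized in a short Python sketch. The frequency ratio is assumed to be the larger frequency divided by the smaller one, compared against the current threshold of 5; the function names are hypothetical, and the sketch mirrors the rules as described rather than ATLAS.ti's code.

```python
def c_index(n12, n1, n2):
    return n12 / (n1 + n2 - n12)


def circle(n12, n1, n2, ratio_threshold=5.0):
    """Return the circle color a cell would get under the rules above, or None.

    red    -- the C-index exceeds 1
    yellow -- larger code frequency / smaller code frequency exceeds ratio_threshold (assumed)
    orange -- both conditions hold
    """
    red = c_index(n12, n1, n2) > 1
    # Guard against an unused code (assumption: no yellow marker in that case).
    yellow = min(n1, n2) > 0 and max(n1, n2) / min(n1, n2) > ratio_threshold
    if red and yellow:
        return "orange"
    if red:
        return "red"
    if yellow:
        return "yellow"
    return None


# "depression" (100 quotations) vs "mother" (10) with 5 co-occurrences:
n_dep, n_mother, n_both = 100, 10, 5
print(round(c_index(n_both, n_dep, n_mother), 3))  # 0.048
print(circle(n_both, n_dep, n_mother))             # yellow (ratio 10 > 5)
print(n_both / n_mother, n_both / n_dep)           # 0.5 0.05
```

The last line shows the two conditional shares from the example: half of the "mother" quotations co-occur with "depression", but only 5% of the "depression" quotations co-occur with "mother".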

Conclusions for our users and for us: despite the deficiencies described above (the chosen normalization method, the C-index, does not suit overlapping data entities and is distorted by unequal coding frequencies), the main purpose of the Co-occurrence Explorer is still met: navigation and exploration. The co-occurrence count and the C-index, combined with the colored hints, remain helpful.
For precise quantitative hypothesis testing, some issues still need to be improved, e.g., partitioning quotations into non-overlapping segments. Navigation can be improved by grouping co-occurrence events. In any case, co-occurrence measures need to be clearly understood, not only because of the mechanical problems above but also because of the semantic issues involved in interpreting them meaningfully (e.g., mixing codes at different levels, such as broader terms and subterms). Furthermore, you need to be aware of the artifacts imposed by the table approach, such as the restriction to pairwise comparisons. Higher-order co-occurrences, which take more than two codes into account, require more elaborate methods (clustering).

Garcia (2004) http://www.miislita.com/semantics/c-index-1.html



Monday, 22 February 2010