Advanced Features

This notebook demonstrates advanced features including custom orderings, working with DataFrames, and pairwise matrix comparisons.

In [1]:

import numpy as np
import pandas as pd

import rthor

Custom Orderings

While rthor provides preset orderings like "circular6" and "circular8", you can specify custom hypothesized orderings for any number of variables.

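The ordering is specified as a vector with one entry per pair of variables. Each entry is a rank for that pair's correlation: in the example below, a pair with a lower rank is hypothesized to correlate more strongly than a pair with a higher rank.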
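As a rough illustration of what such a vector encodes, the sketch below ranks every pair of variables by their separation around a circle and counts the order predictions those ranks imply. The helpers `circular_ranks` and `count_predictions` are hypothetical names, not part of rthor, and the pair order (1,2), (1,3), ... is an assumption. Under that assumption, six circular variables yield the 72 predictions that rthor reports for "circular6" later in this notebook.

```python
from itertools import combinations


def circular_ranks(n_variables):
    """Rank each variable pair by its separation around a circle (hypothetical helper)."""
    ranks = []
    for i, j in combinations(range(n_variables), 2):
        step = abs(i - j)
        # Going the short way around the circle: neighbours get rank 1.
        ranks.append(min(step, n_variables - step))
    return ranks


def count_predictions(ranks):
    """Count pairs of entries with different ranks; each one is an order prediction."""
    return sum(1 for a, b in combinations(ranks, 2) if a != b)


ranks6 = circular_ranks(6)
print(ranks6)                     # [1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 2, 3, 1, 2, 1]
print(count_predictions(ranks6))  # 72
```

Entries that share a rank make no prediction about each other, which is why 15 pairs produce 72 predictions rather than 105.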

Example: 4-Variable Linear Ordering

Let's create a custom ordering for 4 variables arranged along a line: Variable 1 < Variable 2 < Variable 3 < Variable 4.

Four variables give 4×(4-1)/2 = 6 variable pairs, so the ordering vector needs 6 entries.

In [2]:

# Create a correlation matrix with linear structure
corr_linear = np.array(
    [
        [1.00, 0.80, 0.60, 0.40],
        [0.80, 1.00, 0.75, 0.55],
        [0.60, 0.75, 1.00, 0.70],
        [0.40, 0.55, 0.70, 1.00],
    ]
)

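# Custom ordering: one rank per variable pair. The result printed below is
# consistent with pairs listed as (1,2), (1,3), (1,4), (2,3), (2,4), (3,4):
# - Pair (1,2) has rank 1: expected to be the strongest correlation
# - Pairs (1,3) and (2,3) have rank 2
# - Pairs (1,4), (2,4) and (3,4) have rank 3: expected to be the weakest
# A pair with a lower rank is predicted to correlate more strongly
# than a pair with a higher rank; equal ranks make no prediction.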
custom_order = [1, 2, 3, 2, 3, 3]

result_custom = rthor.test(corr_linear, order=custom_order, print_results=True)

                                  RTHOR Test Results                                   
               1 matrix • 4 variables • 11 predictions • 24 permutations               
╭──────────────┬────┬───────┬────────────────┬──────────────┬─────────────┬───────────╮
│ Matrix       │    │    CI │ Interpretation │ Significance │   Satisfied │  Violated │
├──────────────┼────┼───────┼────────────────┼──────────────┼─────────────┼───────────┤
│ [1] Matrix 1 │ ✓  │ 0.818 │ Excellent fit  │   p=0.125    │ 10/11 (91%) │ 1/11 (9%) │
╰──────────────┴────┴───────┴────────────────┴──────────────┴─────────────┴───────────╯
               ℹ️  Higher CI values indicate better fit (range: -1 to +1)
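The table above can be checked by hand. The sketch below assumes that each pair of entries with different ranks forms one prediction, that the lower-ranked pair is predicted to have the larger correlation, and that CI = (satisfied - violated) / predictions. These assumptions are inferred from the printed output rather than taken from rthor's source.

```python
from itertools import combinations

corr_linear = [
    [1.00, 0.80, 0.60, 0.40],
    [0.80, 1.00, 0.75, 0.55],
    [0.60, 0.75, 1.00, 0.70],
    [0.40, 0.55, 0.70, 1.00],
]
custom_order = [1, 2, 3, 2, 3, 3]

# Assumed pair order: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs = list(combinations(range(4), 2))
correlations = [corr_linear[i][j] for i, j in pairs]
ranked = list(zip(custom_order, correlations))

satisfied = violated = predictions = 0
for (rank_a, corr_a), (rank_b, corr_b) in combinations(ranked, 2):
    if rank_a == rank_b:
        continue  # equal ranks make no prediction
    predictions += 1
    # The pair with the lower rank is predicted to correlate more strongly.
    stronger, weaker = (corr_a, corr_b) if rank_a < rank_b else (corr_b, corr_a)
    if stronger > weaker:
        satisfied += 1
    elif stronger < weaker:
        violated += 1

ci = (satisfied - violated) / predictions
print(satisfied, violated, predictions, round(ci, 3))  # 10 1 11 0.818
```

The single violation is the prediction that corr(1,3) = 0.60 exceeds corr(3,4) = 0.70.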

Working with DataFrames

rthor can work directly with pandas DataFrames containing raw data. It will compute the correlation matrices automatically.

In [3]:

# Create sample datasets with varying degrees of circular structure
np.random.seed(42)

n_samples = 200
# Variable positions around the circle (6 positions, 60 degrees apart)
angles_vars = np.linspace(0, 2 * np.pi, 6, endpoint=False)

# Dataset 1: Excellent fit - strong circular structure
# Each observation has a circular position, variables measure proximity to that position
person_angles1 = np.random.uniform(0, 2 * np.pi, n_samples)
data1 = pd.DataFrame(
    {
        f"var{i + 1}": np.cos(person_angles1 - angles_vars[i])
        + np.random.normal(0, 0.3, n_samples)
        for i in range(6)
    }
)

# Dataset 2: Good fit - circular structure with substantial noise
person_angles2 = np.random.uniform(0, 2 * np.pi, n_samples)
data2 = pd.DataFrame(
    {
        f"var{i + 1}": np.cos(person_angles2 - angles_vars[i])
        + np.random.normal(0, 2.5, n_samples)
        for i in range(6)
    }
)

# Dataset 3: Minimal fit - no circular structure (random data)
data3 = pd.DataFrame(
    {f"var{i + 1}": np.random.normal(0, 1, n_samples) for i in range(6)}
)

data1.head()

Out[3]:

	var1	var2	var3	var4	var5	var6
0	-0.909068	0.518907	1.317004	0.412299	-0.637071	-1.158358
1	1.022110	0.164318	-0.663808	-0.636341	0.220830	0.343207
2	-0.024972	-0.922641	-0.702761	-0.171925	0.892291	1.296646
3	-1.028248	-1.210843	-0.219705	1.603657	1.245273	0.399087
4	1.116511	0.992209	0.294703	-0.408783	-0.894945	-0.647430

In [4]:

# Test DataFrames
result_dfs = rthor.test(
    [data1, data2, data3],
    order="circular6",
    labels=["Excellent Fit", "Good Fit", "Minimal Fit"],
    print_results=True,
)

                                      RTHOR Test Results                                       
                 3 matrices • 6 variables • 72 predictions • 720 permutations                  
╭───────────────────┬────┬───────┬────────────────┬──────────────┬──────────────┬─────────────╮
│ Matrix            │    │    CI │ Interpretation │ Significance │    Satisfied │    Violated │
├───────────────────┼────┼───────┼────────────────┼──────────────┼──────────────┼─────────────┤
│ [1] Excellent Fit │ ✓  │ 1.000 │ Excellent fit  │   p<.05 *    │ 72/72 (100%) │   0/72 (0%) │
│ [2] Good Fit      │ ↗  │ 0.583 │ Good fit       │   p<.05 *    │  57/72 (79%) │ 15/72 (21%) │
│ [3] Minimal Fit   │ ⚠  │ 0.056 │ Minimal fit    │   p=0.433    │  38/72 (53%) │ 34/72 (47%) │
╰───────────────────┴────┴───────┴────────────────┴──────────────┴──────────────┴─────────────╯
                   ℹ️  Higher CI values indicate better fit (range: -1 to +1)

The CI values and p-values track how closely each dataset follows the circular pattern. The first dataset fits perfectly (CI = 1.000), the second still fits well despite substantial noise (CI = 0.583), and the third, which contains only random data, shows minimal fit (CI = 0.056, p = 0.433).
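The difference between the first two datasets follows from how they were simulated. For a person angle drawn uniformly from the circle, cos(angle - a) and cos(angle - b) have covariance cos(a - b) / 2 and variance 1/2, so adding independent noise with standard deviation sigma gives an expected correlation of cos(a - b) / (1 + 2·sigma²). The sketch below evaluates that expression; `expected_correlation` is a hypothetical helper, and the values are population expectations, not the sample correlations rthor computed.

```python
import math


def expected_correlation(separation_degrees, noise_sd):
    """Expected correlation of two simulated variables separated by the given angle."""
    return math.cos(math.radians(separation_degrees)) / (1 + 2 * noise_sd**2)


for label, noise_sd in [("Dataset 1", 0.3), ("Dataset 2", 2.5)]:
    values = [round(expected_correlation(deg, noise_sd), 3) for deg in (60, 120, 180)]
    print(label, values)
# Dataset 1 [0.424, -0.424, -0.847]
# Dataset 2 [0.037, -0.037, -0.074]
```

With a noise standard deviation of 2.5 the expected correlations are close to zero, so sampling error in 200 observations can easily reorder them, which is consistent with the 15 violated predictions reported for the second dataset.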

Pairwise Matrix Comparisons

The compare() function performs two analyses:

Individual RTHOR tests for each matrix
Pairwise comparisons to determine which matrix fits better

In [5]:

# Compare matrices pairwise
individual, pairwise = rthor.compare(
    [data1, data2, data3], order="circular6", print_results=True
)

                                RTHOR Matrix Comparison                                
             3 matrices • 6 variables • 72 predictions • 720 permutations              
╭────────────┬────┬────────┬─────────────────┬──────────────┬───────┬────────┬────────╮
│ Comparison │    │     CI │ Result          │ Significance │  Both │ Only 1 │ Only 2 │
├────────────┼────┼────────┼─────────────────┼──────────────┼───────┼────────┼────────┤
│ Matrix 1   │ ✓  │  1.000 │ Excellent fit   │   p<.05 *    │ 72/72 │      — │      — │
│ Matrix 2   │ ↗  │  0.583 │ Good fit        │   p<.05 *    │ 57/72 │      — │      — │
│ Matrix 3   │ ⚠  │  0.056 │ Minimal fit     │   p=0.433    │ 38/72 │      — │      — │
├────────────┼────┼────────┼─────────────────┼──────────────┼───────┼────────┼────────┤
│ 1 vs 2     │ ↓  │ -0.208 │ Matrix 1 better │   p=0.933    │    57 │     15 │      0 │
│ 1 vs 3     │ ↓  │ -0.472 │ Matrix 1 better │   p=0.983    │    38 │     34 │      0 │
│ 2 vs 3     │ ↓  │ -0.264 │ Matrix 2 better │   p=0.967    │    37 │     20 │      1 │
╰────────────┴────┴────────┴─────────────────┴──────────────┴───────┴────────┴────────╯
   Info: Positive CI means matrix 2 fits better, negative means matrix 1 fits better

Individual Results

First, let's look at how each matrix performed individually:

In [6]:

individual.round(3)

Out[6]:

	matrix	predictions	agreements	ci	p_value	n_permutations	n_variables
0	1	72	72	1.000	0.017	720	6
1	2	72	57	0.583	0.033	720	6
2	3	72	38	0.056	0.433	720	6
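One way to read the p-value of 0.017 for the first matrix: if the test enumerates all 720 relabelings of the six variables and reports the share that fit at least as well as the observed labeling, then a perfect fit cannot score below the share of relabelings that leave the circular pattern unchanged (rotations and reflections). The sketch below counts them. It assumes the circular ordering ranks pairs by circular distance and is not taken from rthor's implementation.

```python
from itertools import combinations, permutations

N = 6


def rank_pattern(seats):
    """Circular-distance rank of every variable pair, given each variable's seat."""
    return [
        min(abs(seats[i] - seats[j]), N - abs(seats[i] - seats[j]))
        for i, j in combinations(range(N), 2)
    ]


reference = rank_pattern(tuple(range(N)))
all_seatings = list(permutations(range(N)))
equivalent = sum(1 for seats in all_seatings if rank_pattern(seats) == reference)

p_floor = equivalent / len(all_seatings)
print(equivalent, len(all_seatings), round(p_floor, 3))  # 12 720 0.017
```

Under this reading, 0.017 is the smallest p-value a six-variable circular test with 720 permutations can produce.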

Pairwise Comparisons

Now let's see the pairwise comparisons:

both_agree: Predictions satisfied by both matrices
only1: Predictions satisfied only by the first matrix of the pair (column matrix1)
only2: Predictions satisfied only by the second matrix of the pair (column matrix2)
neither: Predictions satisfied by neither matrix
ci: Comparison CI (positive means the second matrix fits better, negative means the first)
p_value: Significance of the difference

In [7]:

pairwise.round(3)

Out[7]:

	matrix1	matrix2	both_agree	only1	only2	neither	ci	p_value	n_permutations	n_variables
0	1	2	57	15	0	0	-0.208	0.933	720	6
1	1	3	38	34	0	0	-0.472	0.983	720	6
2	2	3	37	20	1	14	-0.264	0.967	720	6
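The comparison CI values in this table are consistent with (only2 - only1) / predictions, where predictions is the 72 shared by both matrices. The sketch below recomputes them from the counts shown above. The formula is inferred from the output, so treat it as a reading aid and not as rthor's documented definition.

```python
PREDICTIONS = 72

# Counts copied from the pairwise table above.
comparisons = [
    {"pair": "1 vs 2", "only1": 15, "only2": 0},
    {"pair": "1 vs 3", "only1": 34, "only2": 0},
    {"pair": "2 vs 3", "only1": 20, "only2": 1},
]

comparison_ci = {
    row["pair"]: round((row["only2"] - row["only1"]) / PREDICTIONS, 3)
    for row in comparisons
}
print(comparison_ci)  # {'1 vs 2': -0.208, '1 vs 3': -0.472, '2 vs 3': -0.264}
```

Predictions that both matrices satisfy, or that neither satisfies, do not move the comparison CI in either direction.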

Reading from Files

For large-scale analyses, you can read correlation matrices from text files:

result = rthor.test(
    "correlations.txt",
    n_matrices=10,
    n_variables=6,
    order="circular6"
)

The file should contain lower triangular matrices (including diagonal) with values separated by whitespace.
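A minimal sketch of producing such text from a full correlation matrix follows. `format_lower_triangle` is a hypothetical helper, and writing one matrix row per line with two decimals is an assumption; the Input Formats guide is the place to confirm the exact layout rthor expects.

```python
def format_lower_triangle(matrix, decimals=2):
    """Render the lower triangle (diagonal included) as whitespace-separated text."""
    lines = []
    for row_index, row in enumerate(matrix):
        # Keep entries up to and including the diagonal of this row.
        kept = row[: row_index + 1]
        lines.append(" ".join(f"{value:.{decimals}f}" for value in kept))
    return "\n".join(lines)


corr_linear = [
    [1.00, 0.80, 0.60, 0.40],
    [0.80, 1.00, 0.75, 0.55],
    [0.60, 0.75, 1.00, 0.70],
    [0.40, 0.55, 0.70, 1.00],
]

text = format_lower_triangle(corr_linear)
print(text)
# 1.00
# 0.80 1.00
# 0.60 0.75 1.00
# 0.40 0.55 0.70 1.00
```

Several matrices can be stacked by joining their blocks with newlines before writing the string to a file.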

Export Results

Results can be easily exported for further analysis:

# To CSV
result.results.to_csv("rthor_results.csv", index=False)

# To dictionary (for JSON)
result_dict = result.to_dict()

# Get specific statistics
significant_matrices = result.results[result.results['p_value'] < 0.05]

Next Steps

Explore the API Reference for complete function documentation
Read the User Guide for theoretical background
Check the Input Formats guide for data preparation
