Advanced Features
This notebook demonstrates advanced features including custom orderings, working with DataFrames, and pairwise matrix comparisons.
import numpy as np
import pandas as pd
import rthor
Custom Orderings
While rthor provides preset orderings like "circular6" and "circular8", you can specify custom hypothesized orderings for any number of variables.
The ordering is a vector with one entry per pair of variables. Each entry is a rank for that pair's hypothesized correlation; in the example below, a smaller rank marks a pair expected to correlate more strongly.
Example: 4-Variable Linear Ordering
Let's create a custom ordering for 4 variables arranged linearly: Variable 1 < Variable 2 < Variable 3 < Variable 4
For 4 variables, there are 4×(4-1)/2 = 6 variable pairs, so the ordering vector has 6 entries.
# Create a correlation matrix with linear structure
corr_linear = np.array(
[
[1.00, 0.80, 0.60, 0.40],
[0.80, 1.00, 0.75, 0.55],
[0.60, 0.75, 1.00, 0.70],
[0.40, 0.55, 0.70, 1.00],
]
)
# Custom ordering: one rank per variable pair, in the order
# (1,2), (1,3), (1,4), (2,3), (2,4), (3,4). A smaller rank means
# the pair is expected to correlate more strongly:
# - Pair (1,2): rank 1, so expect corr(1,2) > corr(1,3)
# - Pair (1,3): rank 2, so expect corr(1,3) > corr(1,4)
# - Pair (1,4): rank 3, and so on for (2,3), (2,4), (3,4)
custom_order = [1, 2, 3, 2, 3, 3]
result_custom = rthor.test(corr_linear, order=custom_order, print_results=True)
RTHOR Test Results
1 matrix • 4 variables • 11 predictions • 24 permutations
╭──────────────┬────┬───────┬────────────────┬──────────────┬─────────────┬───────────╮
│ Matrix       │    │ CI    │ Interpretation │ Significance │ Satisfied   │ Violated  │
├──────────────┼────┼───────┼────────────────┼──────────────┼─────────────┼───────────┤
│ [1] Matrix 1 │ ✓  │ 0.818 │ Excellent fit  │ p=0.125      │ 10/11 (91%) │ 1/11 (9%) │
╰──────────────┴────┴───────┴────────────────┴──────────────┴─────────────┴───────────╯
ℹ️ Higher CI values indicate better fit (range: -1 to +1)
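The counts in this output can be reproduced by hand. The following is a minimal sketch, not rthor's implementation: it assumes the order vector lists pairs row by row from the upper triangle, that a smaller rank predicts a larger correlation, and that CI is (satisfied - violated) / predictions. These assumptions are inferred from the printed results, and all variable names are hypothetical.

```python
from itertools import combinations

corr = [
    [1.00, 0.80, 0.60, 0.40],
    [0.80, 1.00, 0.75, 0.55],
    [0.60, 0.75, 1.00, 0.70],
    [0.40, 0.55, 0.70, 1.00],
]
order = [1, 2, 3, 2, 3, 3]

# Assumed pair order: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4), zero-based here.
n = len(corr)
pair_corrs = [corr[i][j] for i in range(n) for j in range(i + 1, n)]

satisfied = violated = ties = 0
for a, b in combinations(range(len(order)), 2):
    if order[a] == order[b]:
        continue  # equal ranks make no prediction about these two pairs
    # Assumption: the pair with the smaller rank should have the larger correlation.
    stronger, weaker = (a, b) if order[a] < order[b] else (b, a)
    if pair_corrs[stronger] > pair_corrs[weaker]:
        satisfied += 1
    elif pair_corrs[stronger] < pair_corrs[weaker]:
        violated += 1
    else:
        ties += 1

n_predictions = satisfied + violated + ties
ci = (satisfied - violated) / n_predictions
print(n_predictions, satisfied, violated, round(ci, 3))
```

Under these assumptions the sketch gives 11 predictions with 10 satisfied and 1 violated, which matches the table. The single violation is corr(1,3) = 0.60 (rank 2) falling below corr(3,4) = 0.70 (rank 3).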
Working with DataFrames
rthor can work directly with pandas DataFrames containing raw data. It will compute the correlation matrices automatically.
# Create sample datasets with varying degrees of circular structure
np.random.seed(42)
n_samples = 200
# Variable positions around the circle (6 positions, 60 degrees apart)
angles_vars = np.linspace(0, 2 * np.pi, 6, endpoint=False)
# Dataset 1: Excellent fit - strong circular structure
# Each observation has a circular position, variables measure proximity to that position
person_angles1 = np.random.uniform(0, 2 * np.pi, n_samples)
data1 = pd.DataFrame(
{
f"var{i + 1}": np.cos(person_angles1 - angles_vars[i])
+ np.random.normal(0, 0.3, n_samples)
for i in range(6)
}
)
# Dataset 2: Good fit - circular structure with substantial noise
person_angles2 = np.random.uniform(0, 2 * np.pi, n_samples)
data2 = pd.DataFrame(
{
f"var{i + 1}": np.cos(person_angles2 - angles_vars[i])
+ np.random.normal(0, 2.5, n_samples)
for i in range(6)
}
)
# Dataset 3: Minimal fit - no circular structure (random data)
data3 = pd.DataFrame(
{f"var{i + 1}": np.random.normal(0, 1, n_samples) for i in range(6)}
)
data1.head()
| | var1 | var2 | var3 | var4 | var5 | var6 |
|---|---|---|---|---|---|---|
| 0 | -0.909068 | 0.518907 | 1.317004 | 0.412299 | -0.637071 | -1.158358 |
| 1 | 1.022110 | 0.164318 | -0.663808 | -0.636341 | 0.220830 | 0.343207 |
| 2 | -0.024972 | -0.922641 | -0.702761 | -0.171925 | 0.892291 | 1.296646 |
| 3 | -1.028248 | -1.210843 | -0.219705 | 1.603657 | 1.245273 | 0.399087 |
| 4 | 1.116511 | 0.992209 | 0.294703 | -0.408783 | -0.894945 | -0.647430 |
# Test DataFrames
result_dfs = rthor.test(
[data1, data2, data3],
order="circular6",
labels=["Excellent Fit", "Good Fit", "Minimal Fit"],
print_results=True,
)
RTHOR Test Results
3 matrices • 6 variables • 72 predictions • 720 permutations
╭───────────────────┬────┬───────┬────────────────┬──────────────┬──────────────┬─────────────╮
│ Matrix            │    │ CI    │ Interpretation │ Significance │ Satisfied    │ Violated    │
├───────────────────┼────┼───────┼────────────────┼──────────────┼──────────────┼─────────────┤
│ [1] Excellent Fit │ ✓  │ 1.000 │ Excellent fit  │ p<.05 *      │ 72/72 (100%) │ 0/72 (0%)   │
│ [2] Good Fit      │ ↗  │ 0.583 │ Good fit       │ p<.05 *      │ 57/72 (79%)  │ 15/72 (21%) │
│ [3] Minimal Fit   │ ⚠  │ 0.056 │ Minimal fit    │ p=0.433      │ 38/72 (53%)  │ 34/72 (47%) │
╰───────────────────┴────┴───────┴────────────────┴──────────────┴──────────────┴─────────────╯
ℹ️ Higher CI values indicate better fit (range: -1 to +1)
Notice how the CI values and p-values reflect the degree of fit to the circular pattern. The first dataset shows excellent fit with strong circular structure, the second shows good fit despite substantial noise, and the third shows minimal fit as it contains only random data.
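The 72 predictions and 720 permutations reported for six variables can be accounted for with a short calculation. This sketch assumes that "circular6" ranks each variable pair by its distance around the circle (adjacent pairs rank 1, opposite pairs rank 3) and that one prediction is made for every two pairs with different ranks. That reading is an assumption consistent with the printed counts, not a statement of how rthor defines the preset, and `circular_distance` is a hypothetical helper.

```python
from itertools import combinations
from math import factorial

n_variables = 6


def circular_distance(i: int, j: int, n: int) -> int:
    """Number of steps between positions i and j on a circle of n points."""
    d = abs(i - j)
    return min(d, n - d)


# One rank per variable pair, pairs listed row by row from the upper triangle.
pairs = list(combinations(range(n_variables), 2))
ranks = [circular_distance(i, j, n_variables) for i, j in pairs]

# A prediction exists for every two pairs whose ranks differ.
n_predictions = sum(1 for a, b in combinations(ranks, 2) if a != b)

# Every reordering of the variables is one permutation.
n_permutations = factorial(n_variables)

print(len(pairs), n_predictions, n_permutations)
```

With 6 pairs at distance 1, 6 at distance 2, and 3 at distance 3, the count is 6×6 + 6×3 + 6×3 = 72, and 6! = 720.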
Pairwise Matrix Comparisons
The compare() function performs two analyses:
- Individual RTHOR tests for each matrix
- Pairwise comparisons to determine which matrix fits better
# Compare matrices pairwise
individual, pairwise = rthor.compare(
[data1, data2, data3], order="circular6", print_results=True
)
RTHOR Matrix Comparison
3 matrices • 6 variables • 72 predictions • 720 permutations
╭────────────┬────┬────────┬─────────────────┬──────────────┬───────┬────────┬────────╮
│ Comparison │    │ CI     │ Result          │ Significance │ Both  │ Only 1 │ Only 2 │
├────────────┼────┼────────┼─────────────────┼──────────────┼───────┼────────┼────────┤
│ Matrix 1   │ ✓  │ 1.000  │ Excellent fit   │ p<.05 *      │ 72/72 │ —      │ —      │
│ Matrix 2   │ ↗  │ 0.583  │ Good fit        │ p<.05 *      │ 57/72 │ —      │ —      │
│ Matrix 3   │ ⚠  │ 0.056  │ Minimal fit     │ p=0.433      │ 38/72 │ —      │ —      │
├────────────┼────┼────────┼─────────────────┼──────────────┼───────┼────────┼────────┤
│ 1 vs 2     │ ↓  │ -0.208 │ Matrix 1 better │ p=0.933      │ 57    │ 15     │ 0      │
│ 1 vs 3     │ ↓  │ -0.472 │ Matrix 1 better │ p=0.983      │ 38    │ 34     │ 0      │
│ 2 vs 3     │ ↓  │ -0.264 │ Matrix 2 better │ p=0.967      │ 37    │ 20     │ 1      │
╰────────────┴────┴────────┴─────────────────┴──────────────┴───────┴────────┴────────╯
Info: Positive CI means matrix 2 fits better, negative means matrix 1 fits better
Individual Results
First, let's look at how each matrix performed individually:
individual.round(3)
| | matrix | predictions | agreements | ties | ci | p_value | label | n_permutations | n_variables |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 72 | 72 | 0 | 1.000 | 0.017 | | 720 | 6 |
| 1 | 2 | 72 | 57 | 0 | 0.583 | 0.033 | | 720 | 6 |
| 2 | 3 | 72 | 38 | 0 | 0.056 | 0.433 | | 720 | 6 |
Pairwise Comparisons
Now let's look at the pairwise comparisons. The table has the following columns:
- both_agree: Predictions satisfied by both matrices
- only1: Predictions satisfied only by matrix 1
- only2: Predictions satisfied only by matrix 2
- neither: Predictions satisfied by neither
- ci: Comparison CI (positive means matrix 2 fits better)
- p_value: Significance of the difference
pairwise.round(3)
| | matrix1 | matrix2 | both_agree | only1 | only2 | neither | ci | p_value | n_permutations | n_variables |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 57 | 15 | 0 | 0 | -0.208 | 0.933 | 720 | 6 |
| 1 | 1 | 3 | 38 | 34 | 0 | 0 | -0.472 | 0.983 | 720 | 6 |
| 2 | 2 | 3 | 37 | 20 | 1 | 14 | -0.264 | 0.967 | 720 | 6 |
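The ci column in this table is consistent with the difference between the two "only" counts divided by the total number of predictions. The sketch below checks that reading against the three rows shown; the formula is inferred from the printed numbers and is an assumption, not a quotation of rthor's source, and `comparison_ci` is a hypothetical helper.

```python
def comparison_ci(both_agree: int, only1: int, only2: int, neither: int) -> float:
    """Assumed comparison index: (only2 - only1) / total predictions."""
    total = both_agree + only1 + only2 + neither
    return (only2 - only1) / total


# (both_agree, only1, only2, neither) copied from the pairwise table above.
rows = {
    "1 vs 2": (57, 15, 0, 0),
    "1 vs 3": (38, 34, 0, 0),
    "2 vs 3": (37, 20, 1, 14),
}

computed = {name: round(comparison_ci(*counts), 3) for name, counts in rows.items()}
for name, value in computed.items():
    print(name, value)
```

A negative value means the first matrix of the pair satisfies more predictions on its own than the second does, which agrees with the "positive means matrix 2 fits better" convention.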
Reading from Files
For large-scale analyses, you can read correlation matrices from text files:
result = rthor.test(
"correlations.txt",
n_matrices=10,
n_variables=6,
order="circular6"
)
The file should contain lower triangular matrices (including diagonal) with values separated by whitespace.
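As an illustration of that layout, the sketch below formats one matrix as a lower triangle with the diagonal included and values separated by spaces. Placing one matrix row per line and using two decimals are assumptions made here for readability; the description above only requires whitespace separation. `format_lower_triangle` is a hypothetical helper, not part of rthor, and the 4-variable matrix is reused from the earlier example.

```python
def format_lower_triangle(matrix: list[list[float]]) -> str:
    """Return the lower triangle (diagonal included) as whitespace-separated text."""
    lines = []
    for i, row in enumerate(matrix):
        # Keep columns 0..i of row i, which covers the diagonal entry.
        lines.append(" ".join(f"{value:.2f}" for value in row[: i + 1]))
    return "\n".join(lines) + "\n"


corr_linear = [
    [1.00, 0.80, 0.60, 0.40],
    [0.80, 1.00, 0.75, 0.55],
    [0.60, 0.75, 1.00, 0.70],
    [0.40, 0.55, 0.70, 1.00],
]

text = format_lower_triangle(corr_linear)
print(text, end="")
```

For several matrices, the same text blocks could be concatenated and written to a single file before calling rthor.test with the matching n_matrices and n_variables.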
Export Results
Results can be easily exported for further analysis:
# To CSV
result.results.to_csv("rthor_results.csv", index=False)
# To dictionary (for JSON)
result_dict = result.to_dict()
# Get specific statistics
significant_matrices = result.results[result.results['p_value'] < 0.05]
Next Steps
- Explore the API Reference for complete function documentation
- Read the User Guide for theoretical background
- Check the Input Formats guide for data preparation