Accessing a user whose home server is down always fails on the first attempt

naskya commented

2025-02-05 20:52:14 +09:00

Owner

What type of issue is this?

label: Server

label: Bug

What happened?

When accessing a remote user whose home server is down, the first attempt always fails.

What did you expect to happen?

Firefish should not show an error message, but display the local content directly.

Steps to reproduce the issue

1. Find an account that has a locally cached user object but whose upstream server is failing, such as @WordlessEcho@lolic.at
2. Access it on Firefish, for example at https://dvd.chat/@WordlessEcho@lolic.at
3. The first visit of the day shows an error; subsequent visits work normally.

Reproduces how often

Once per account per instance per day.

What did you try to solve the issue / Do you have any insights

I suspect that the code highlighted in the attached screenshot is not fault-tolerant, but further verification is needed.

Version

v20240725

Instance

dvd.chat

What browser are you using? (client-side issues only)

What operating system are you using? (client-side issues only)

How do you deploy Firefish on your server? (server-side issues only)

What operating system are you using? (Server-side issues only)

Relevant log output

Contribution Guidelines

By submitting this issue, you agree to follow our Contribution Guidelines

I agree to follow this project's Contribution Guidelines
I have searched the issue tracker for similar issues, and this is not a duplicate.

Are you willing to fix this bug? (optional)

Yes, I will open a merge request that closes this ticket.


naskya commented

2025-02-05 20:52:14 +09:00

Author

Owner

Author: laozhoubuluo

1. Thank you very much for explaining how to clear the cache. Strictly speaking, `lastFetchedAt` is a user attribute that has to be modified in Postgres, and I would rather not edit the database by hand.
2. However, the code change I proposed earlier can solve this problem. I will submit an MR later.
3. The resolveUser interface can support a timeout, but Zotan only implemented the 1500ms option. Do you think this value is appropriate, or should a longer timeout be implemented?

Aug 01 00:11:32 FirefishDev firefish[8755]:  INFO 1        [remote resolve-user]        try resync: laozhoubuluo@firefish-pre.nglab.bid
Aug 01 00:11:32 FirefishDev firefish[8755]:  INFO 1        [remote resolve-user]        WebFinger for laozhoubuluo@firefish-pre.nglab.bid
Aug 01 00:11:33 FirefishDev firefish[8755]: ERROR 1        [remote resolve-user]        Failed to WebFinger for laozhoubuluo@firefish-pre.nglab.bid: 502
Aug 01 00:11:33 FirefishDev firefish[8755]: ERROR 1        [remote resolve-user]        error resolving remote user WebFinger: Error: Failed to WebFinger for laozhoubuluo@firefish-pre.nglab.bid: 502
Aug 01 00:11:33 FirefishDev firefish[8755]:  INFO 1        [remote resolve-user]        return existing remote user: laozhoubuluo@firefish-pre.nglab.bid


naskya commented

2025-02-05 20:52:15 +09:00

Author

Owner

Author: naskya

> wait for the cache to time out

You can manually clear caches by deleting Redis keys:

# delete a specific cache
redis-cli DEL cache_key_name

# delete all caches
redis-cli --scan | xargs -L 100 redis-cli DEL

If you’re using the “db-container” setup (https://firefish.dev/firefish/firefish/-/blob/86d3f8f5b5f483d647f1f062d8363263ab47770f/dev/docs/db-container.md), you can run `make redis-cli` to enter the Redis CLI.

> I wonder if anyone has encountered this.

Personally, I don’t think this is good, but the backend behaves differently depending on NODE_ENV, so that may be related. Slow responses should be timed out, since they can be abused for a DoS attack.

$ grep -r 'production' packages/backend/src
packages/backend/src/services/logger.ts:                        process.env.NODE_ENV !== "production"
packages/backend/src/services/drive/upload-from-url.ts:         process.env.NODE_ENV === "production" &&
packages/backend/src/boot/master.ts:    if (env !== "production") {
packages/backend/src/boot/master.ts:            logger.warn("The environment is not in production mode.");
packages/backend/src/server/api/api-handler.ts:                                         ...(y!.info && process.env.NODE_ENV !== "production"
packages/backend/src/server/index.ts:if (!["production", "test"].includes(process.env.NODE_ENV || "")) {
packages/backend/src/db/postgre.ts:const log = process.env.NODE_ENV !== "production";
packages/backend/src/misc/download-url.ts:                              (process.env.NODE_ENV === "production" ||


naskya commented

2025-02-05 20:52:15 +09:00

Author

Owner

Author: laozhoubuluo

Simulating the server outage requires waiting for the cache to time out, so I will report the test results later.

By the way, api/users/show calls resolveUser without any timeout mechanism. In my local test environment, which has a slow network (traffic goes through a proxy), Nginx times out before the backend has finished the request.

In this example, the frontend gave up after 1.5 minutes, while the backend completed the request after 6.5 minutes. I don't expect this to be a problem in a production environment, but I wonder if anyone has encountered it.
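To illustrate the timeout idea discussed above, here is a minimal sketch of a generic promise wrapper. The name `withTimeout` and its signature are assumptions made for illustration; this is not the actual Firefish code, and it only stops waiting for the result rather than cancelling the underlying request.

```typescript
// Hypothetical helper (not part of Firefish): settle with the task's result,
// or reject once `ms` milliseconds have passed, whichever happens first.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<T> {
	return new Promise<T>((resolve, reject) => {
		const timer = setTimeout(
			() => reject(new Error(`timed out after ${ms} ms`)),
			ms,
		);
		task.then(
			(value) => {
				clearTimeout(timer); // task finished in time
				resolve(value);
			},
			(error) => {
				clearTimeout(timer); // task failed in time
				reject(error);
			},
		);
	});
}
```

A caller could wrap the remote lookup in such a helper with, for example, the 1500 ms value mentioned elsewhere in this thread, and treat the rejection like any other resolution failure. Actually aborting the HTTP request would additionally need something like an `AbortSignal` passed down to the fetch call.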


naskya commented

2025-02-05 20:52:15 +09:00

Author

Owner

Author: laozhoubuluo

Jul 30 22:19:45 Firefish firefish[709]:  INFO 1        [remote resolve-user]        try resync: wordlessecho@lolic.at
Jul 30 22:19:45 Firefish firefish[709]:  INFO 1        [remote resolve-user]        WebFinger for wordlessecho@lolic.at
Jul 30 22:20:45 Firefish firefish[709]: ERROR 1        [remote resolve-user]        Failed to WebFinger for wordlessecho@lolic.at: The operation was aborted.
Jul 30 22:21:10 Firefish firefish[709]:  INFO 1        [remote resolve-user]        return existing remote user: wordlessecho@lolic.at


naskya commented

2025-02-05 20:52:15 +09:00

Author

Owner

Author: laozhoubuluo

I reproduced the problem in my local test environment and found that the error message was still `Failed to WebFinger for wordlessecho@lolic.at: The operation was aborted.` After carefully checking the screenshots and the code, I realized that my local Git checkout had not been updated to v20240725. After updating, I found that resolveUserWebFinger can also throw this error.

I still need to test whether adding a catch here fixes the problem.
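The "add a catch" idea can be sketched roughly as follows. Everything here is hypothetical: `CachedUser`, `needsResync`, `resolveWithFallback` and the one-day threshold are illustrative names and values inferred from the "once per day" behavior reported in this issue, not the real `resolve-user.ts` implementation.

```typescript
// Hypothetical shape of a locally cached remote user.
interface CachedUser {
	username: string;
	host: string;
	lastFetchedAt: Date | null;
}

// Assumed refresh interval; the real threshold in Firefish may differ.
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// A cached user is considered stale if it was never fetched
// or was last fetched more than a day ago.
function needsResync(user: CachedUser, now: Date): boolean {
	if (user.lastFetchedAt === null) return true;
	return now.getTime() - user.lastFetchedAt.getTime() > ONE_DAY_MS;
}

// Try to refresh a stale user, but fall back to the cached copy
// instead of surfacing the error when the remote server is unreachable.
async function resolveWithFallback(
	cached: CachedUser,
	refetch: (user: CachedUser) => Promise<CachedUser>,
	now: Date = new Date(),
): Promise<CachedUser> {
	if (!needsResync(cached, now)) return cached;
	try {
		return await refetch(cached);
	} catch {
		// WebFinger or the actor fetch failed: serve the existing user.
		return cached;
	}
}
```

With this shape, a 502 or an aborted WebFinger request ends in the "return existing remote user" path seen in the logs rather than in an error shown to the visitor.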


naskya commented

2025-02-05 20:52:16 +09:00

Author

Owner

Author: naskya

Actually, we don’t need to use the resolveSelf function. packages/backend/src/remote/resolve-user.ts was updated by the huge commit f282549900780a3413373dab444968d19db38102, and Iceshrimp’s code should handle this better (but we may be using it wrong).


naskya commented

2025-02-05 20:52:16 +09:00

Author

Owner

Author: naskya

Thanks for your insights! If you can fix the problem, please feel free to open a merge request.

side note:

I believe the function name resolveSelf is taken from the WebFinger spec (https://docs.joinmastodon.org/spec/webfinger/#example); see https://info.firefish.dev/.well-known/webfinger?resource=acct:firefish@info.firefish.dev for an example.

I personally don’t think we need to stick to the word self. Perhaps const webfingerLink = ... would be a better variable name?
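For context on where the word self comes from, here is a small sketch of picking the `rel: "self"` entry out of a WebFinger response. The interfaces and the function name `findWebfingerLink` are assumptions for illustration and are not taken from the Firefish source.

```typescript
// Minimal shape of a WebFinger (JRD) response; only the fields used here.
interface WebFingerLink {
	rel: string;
	type?: string;
	href?: string;
}

interface WebFingerResponse {
	subject: string;
	links: WebFingerLink[];
}

// Hypothetical helper: return the href of the first link whose rel is "self",
// which is the link that points at the account's ActivityPub actor document.
function findWebfingerLink(response: WebFingerResponse): string | undefined {
	const link = response.links.find(
		(candidate) => candidate.rel === "self" && candidate.href !== undefined,
	);
	return link?.href;
}

// Usage with a made-up response for a made-up account.
const webfingerLink = findWebfingerLink({
	subject: "acct:alice@example.com",
	links: [
		{ rel: "http://webfinger.net/rel/profile-page", type: "text/html", href: "https://example.com/@alice" },
		{ rel: "self", type: "application/activity+json", href: "https://example.com/users/alice" },
	],
});
console.log(webfingerLink);
```

A stricter variant could also check the link's `type`, since servers may advertise more than one representation of the actor.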


naskya commented

2025-02-05 20:52:16 +09:00

Author

Owner

Author: laozhoubuluo

changed the description

