This page collects answers to requests received by the HPC Helpdesk.
Please check here before sending a specific request.


General:

  • How can I add a collaborator to my project?

    Project group leaders can manage their users' membership on their UserDB page.

  • I still haven't received the username and password for system access

    You have to complete the registration on the UserDB page and be associated with a project (the PI has to add you). Once you have entered all the necessary information and you are associated with a project, a new access button will appear: just click on it and you will receive the username and the password in two separate emails.

  • Backup Policy 

    Only the $HOME filesystem is covered by daily backups. The backup procedure runs daily and we preserve a maximum of three different copies of the same file. Older versions are kept for 1 month. The last version of deleted files is kept for 2 months, then permanently removed from the backup archive.

  • Where can I find information about my project on CINECA clusters (end date, total and monthly amount of hours, usage)?

    You can list all the Accounts attached to your username on the current cluster, together with the "budget" and the consumed resources, with the command:

    > saldo -b 

    More information in our documentation.

  • I have exhausted my budget but my project is still active. What can I do?

    Non-expired projects with exhausted budgets may be allowed to keep using the computational resources at the cost of minimal priority. Write to superc@cineca.it explaining the motivation for your request and, in case of a positive evaluation, you will be enabled to use the qos_lowprio QOS.

  • Which filesystems do I have available? Which usage is intended?

    • $HOME: to store programs and small, light results. This is a permanent, backed-up, user-specific, local area.
    • $CINECA_SCRATCH: where you can execute your programs. This is a large disk for the storage of run-time data and files. It is a temporary area.
    • $WORK: an area visible to all the collaborators of the project. This is a safe storage area to keep run-time data for the whole life of the project and six months after the end of the project.
    • $DRES: an additional area to store your results if they are heavy. This space is not assigned automatically. You need to request it by writing to superc@cineca.it.

     More detailed information can be found here.

  • How can I check how much free disk space I have available?

    You can check your occupancy with the cindata command. The "-h" option shows the help for this command, and in our documentation you can find the description of the output.


Connection/login:

  • I haven't logged in for a while, and now I can't log in: I get the message "access denied"

    If you have forgotten your password, just write to the CINECA Help Desk (superc@cineca.it) to have it reset.

  • How to change my password?

    You can change your current password on the front-end system using the passwd command. Please look at our password policy.

  • RCM does not connect on the new infrastructure GALILEO100

    Please check that you are using the latest version of RCM. You can download the application compatible with your operating system from the download page. If you still experience the issue, please delete .rcm from your home directory on Galileo100 (it was copied from your home on the old Galileo infrastructure).

  • I receive the error message "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" when trying to login

The problem may occur because we have reinstalled the login node, which changes its fingerprint. In that case we should have informed you through HPC-news. If so, you can remove the old fingerprint from your known_hosts file with the command

ssh-keygen -f "$HOME/.ssh/known_hosts" -R "login.<cluster_name>.cineca.it"

  • I keep receiving the error message "WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!" even after modifying the known_hosts file

The issue may be related to version 8.6 of OpenSSH installed on your PC, which mishandles aliases such as login.<cluster_name>.cineca.it.

The workaround we have found so far is the following:

1) remove from the known_hosts file in ~/.ssh all the lines associated with the cluster you would like to log in to
2) log in directly to each login node of the cluster in the following way (for example on Marconi)

ssh <username>@login01.marconi.cineca.it
ssh <username>@login02.marconi.cineca.it
ssh <username>@login03.marconi.cineca.it
ssh <username>@login06.marconi.cineca.it

Each of the logins above adds a new line to the known_hosts file with the fingerprint of that specific login node.
Please check the cluster-specific guide for the names of all the login nodes of the cluster you would like to log in to.

3) edit the known_hosts file, replacing 'login01.<cluster_name>.cineca.it' with 'login*.<cluster_name>.cineca.it',
and repeat the same for the other login nodes.
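
Steps 1 and 3 of the procedure above can be sketched with sed. This is only a sketch under some assumptions: the function names are hypothetical, the login nodes are assumed to be named loginNN.<cluster_name>.cineca.it as in the Marconi example, and the known_hosts entries are assumed to be stored in clear text (if HashKnownHosts is enabled the host names are not visible and this approach does not apply).

```shell
#!/bin/sh
# Hypothetical helpers, not part of any CINECA tool.
# Both edit the file in place and leave a copy in <file>.bak
# (each call overwrites the previous .bak).

# Step 1: delete every line that mentions the given cluster.
remove_cluster_hosts() {
    cluster="$1"
    file="${2:-$HOME/.ssh/known_hosts}"
    sed -i.bak -E "/\.${cluster}\.cineca\.it/d" "$file"
}

# Step 3: rewrite loginNN.<cluster>.cineca.it at the start of a line
# into the pattern login*.<cluster>.cineca.it, for all login nodes at once.
wildcard_login_hosts() {
    cluster="$1"
    file="${2:-$HOME/.ssh/known_hosts}"
    sed -i.bak -E "s/^login[0-9]+\.${cluster}\.cineca\.it/login*.${cluster}.cineca.it/" "$file"
}

# Example for Marconi:
#   remove_cluster_hosts marconi      # step 1
#   ...log in once to each login node (step 2)...
#   wildcard_login_hosts marconi      # step 3
```

Keeping the .bak copy makes it easy to restore the original file if the result is not what you expected.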


As an alternative, you may create a config file, located in $HOME/.ssh/config on Linux/macOS and in C:\Users\username\.ssh\config on Windows, with the following content:

Host login.<cluster_name>.cineca.it
        HostName login.<cluster_name>.cineca.it
        HostKeyAlgorithms ecdsa-sha2-nistp256


Now the problem should not appear again.

  • Windows WSL issue DNS resolution failing 

    If DNS resolution fails with "Temporary failure in name resolution", or the resolution times out, /etc/resolv.conf has been changed automatically. You can fix it manually by replacing the nameserver value with 8.8.8.8. This file is generated automatically by WSL; to stop the automatic generation, add the following entry to /etc/wsl.conf:

    [network]
    generateResolvConf = false

    Then add the following commands to your .bashrc, so that the nameserver entry in resolv.conf is created automatically when the file is missing:

    > if [ ! -f /etc/resolv.conf ]; then
    >     echo "nameserver 8.8.8.8" > /etc/resolv.conf
    > fi

2FA:

  • Windows PowerShell: verify smallstep error

If, when running the step command to verify the installation of smallstep, you get the following error:

PS C:\Users\user > step
step : The term 'step' is not recognized as the name of a cmdlet,
function, script file, or operable program. Check
the spelling of the name, or if a path was included, verify that the path
is correct and try again.
At line:1 char:1
+ step
+ ~~~~
+ CategoryInfo : ObjectNotFound: (step:String) [],
ParentContainsErrorRecordException
+ FullyQualifiedErrorId : CommandNotFoundException

check whether the executable step.exe is present in the folder:

C:\Users\user\scoop\shims

The installation command should have placed it there. If you don't find it, run the following command in PowerShell:

scoop install smallstep/step



Executions/scheduler:

  • I was copying data or executing something on the login node and the process was killed. Why?

    The login nodes in our facilities have a 10-minute CPU time limit. This means that any process requiring more than that is automatically killed. You can avoid this restriction by submitting a batch script to a SLURM partition or by using an interactive run. The partition and the resources depend on the machine you are considering, so please see the "UG3.0 System specific guide" page. Important details and suggestions on how to transfer your data can be found on the "Data management" page.
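
    As an illustration of the batch approach, the helper below prints a minimal SLURM job script that copies a directory with rsync. It is a sketch under assumptions: the function name, the job name, the one-hour time limit and the use of rsync are illustrative choices, not CINECA recommendations, and the account, partition and paths are placeholders you must replace with the values valid on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal SLURM job script for a data copy.
# Arguments: account, partition, source directory, destination directory.
make_copy_job() {
    account="$1"
    partition="$2"
    src="$3"
    dst="$4"
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=copy_data
#SBATCH --account=${account}
#SBATCH --partition=${partition}
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

rsync -a "${src}" "${dst}"
EOF
}

# Example (placeholder account and partition names):
#   make_copy_job my_account my_partition "$WORK/data/" "$CINECA_SCRATCH/data/" > copy_job.sh
#   sbatch copy_job.sh
```

    Check the system-specific guide for the partitions that are suitable for data movement before submitting a script like this.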

  • My job has been waiting for a long time. 

The priority of a job in the queue is composed of several factors, and its value may change depending on the other jobs present, the resources requested, and your own priority. You can see why your job is waiting in the queue with the squeue command. If the state is PD, the job is pending. Some of the reasons that may be displayed for the pending state:

    • Priority = One or more higher-priority jobs are ahead of yours, so the job is waiting for resources to become free.
    • Dependency = The job depends on the completion of another job.
    • QOSMaxJobsPerUserLimit = You have reached the maximum number of jobs a user can have running at a given time.

More information about the reason meanings can be found on the SLURM resource limits page.
You can also check the estimated start time of the job with the SLURM command scontrol:

 > scontrol show job <JOBID>

or you can see the priority of your job with the SLURM command sprio:

 > sprio -j <JOBID>
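
If you have many pending jobs, a quick summary of the pending reasons can help. The sketch below counts how many of your pending jobs report each reason; the function name is hypothetical, and it only post-processes the output of the standard squeue options -h (no header), -u (user), -t (state) and -o (output format, where %r is the reason).

```shell
#!/bin/sh
# Hypothetical helper: read one pending reason per line on standard input
# and print "<count> <reason>" lines, most frequent reason first.
summarize_pending_reasons() {
    awk '{ count[$0]++ } END { for (r in count) printf "%d %s\n", count[r], r }' | sort -rn
}

# Example:
#   squeue -h -u "$USER" -t PD -o "%r" | summarize_pending_reasons
```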

Here you can find more info on Budget linearization.

  • Troubles running a dynamically linked executable

...