S11.4 x86 system panic when large file copied.

I have an "image" iso, that was copied OK to my server, but now when I
try to either scp it to another server or copy it locally my server
crashes/panics.

S11.4
Supermicro H/W (X11SSSZ-F), I5 7400
48Gb

File size is 16Gb
*always* panics at 9.1Gb copied

Not a prod box.
runs a number of zones (including a KZ)

Just wondering if a known issue, or one that needs chasing.

I'll try and test with vbox, when I have created another large image
file :-)

And will also test my T4

--
Bruce Porter
"The internet is a huge and diverse community but mainly friendly"
http://ytc1.blogspot.co.uk/
There *is* an alternative! http://www.openoffice.org/

On 10/12/2021 16:33, YTC#1 wrote:
> I'll try and test with vbox, when I have created another large image
> file :-)

I can confirm there are no issues with S11.4 in VBox copying a large
(20Gb) file

--
Bruce Porter

Re: S11.4 x86 system panic when large file copied.

by: Doug McIntyre - Fri, 10 Dec 2021 18:03 UTC

YTC#1 <bdp@ytc1-spambin.co.uk> writes:
>I have an "image" iso, that was copied OK to my server, but now when I
>try to either scp it to another server or copy it locally my server
>crashes/panics.

Most likely it is bad hardware. I've certainly dealt with many files
(including ISOs) larger than 16GB in size on Solaris boxes, as have
a bazillion other people.

If you are running ZFS, what does 'zpool status' show? I'm guessing
you'd see errors there. On a healthy pool you should see a bunch of zeros.

--
Doug McIntyre
doug@themcintyres.us

On 10/12/2021 18:03, Doug McIntyre wrote:
> If you are running ZFS, what does 'zpool status' show? I'm guessing
> you'd see errors here. You should see a bunch of zeros.
>

That was my first port of call.
---8<
  pool: rpool
    id: 7278334453663277700
 state: ONLINE
  scan: scrub repaired 0 in 3h58m with 0 errors on Tue Nov 23 13:32:11 2021
config:

        NAME        STATE      READ WRITE CKSUM
        rpool       ONLINE        0     0     0
          mirror-0  ONLINE        0     0     0
            c1t0d0  ONLINE        0     0     0
            c4t2d0  ONLINE        0     0     0

---8<
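
The "bunch of zeros" check can be automated. What follows is only a minimal sketch, assuming the `zpool status` text has already been captured to a string and that the counters are plain integers (real output may abbreviate large counts, which this does not handle). The function names `vdev_errors` and `suspect_vdevs` are made up for illustration and are not part of any Solaris tool.

```python
# Sample text taken from the `zpool status` output quoted in this thread.
SAMPLE = """\
  pool: rpool
 state: ONLINE
config:

        NAME        STATE      READ WRITE CKSUM
        rpool       ONLINE        0     0     0
          mirror-0  ONLINE        0     0     0
            c1t0d0  ONLINE        0     0     0
            c4t2d0  ONLINE        0     0     0
"""


def vdev_errors(status_text):
    """Return {name: (state, read, write, cksum)} from the config table.

    Hypothetical helper: with several pools in one listing, names such as
    mirror-0 can repeat and later rows would overwrite earlier ones.
    """
    rows = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
        elif in_table and not fields:
            in_table = False  # a blank line ends the table
        elif in_table and len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            rows[fields[0]] = (fields[1], int(fields[2]), int(fields[3]), int(fields[4]))
    return rows


def suspect_vdevs(rows):
    """Names of vdevs that are not ONLINE or have a non-zero error counter."""
    return [name for name, (state, rd, wr, ck) in rows.items()
            if state != "ONLINE" or rd or wr or ck]


if __name__ == "__main__":
    print(suspect_vdevs(vdev_errors(SAMPLE)))
```

For the pool shown here the list of suspects is empty. As the rest of the thread shows, clean counters do not by themselves rule out a failing disk.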

I've just started another scrub, as I have realised the file was copied
to the system after Sunday.

FMA is not being particularly helpful on this one :-)
---8<
Dec 08 16:42:42 5755d888-8fae-4575-8f0f-8290b018c178 SUNOS-8000-KL Major

Suspect 1 of 1 :
   Problem class : defect.sunos.kernel.panic
   Certainty     : 100%
   Affects       : sw:///:path=/var/crash/data/5755d888-8fae-4575-8f0f-8290b018c178
   Status        : faulted but still in service

   Resource
     FMRI        : "sw:///:path=/var/crash/data/5755d888-8fae-4575-8f0f-8290b018c178"
     Status      : faulty

Description : The system has rebooted after a kernel panic. The following are
              potential bugs.
                stack[0] - 27096339
---8<

--
Bruce Porter

On 11/12/2021 09:50, YTC#1 wrote:
> I've just another scrub, as I have realsied the file was copied to the
> system after Sunday.
>

Well, that broke it. Good style.

Fails to boot: it gets no further than device probing and hangs at
pci@0,0/pci15d9,888@14/storage,c/esi@0,1 (ses 0) unknown
(nah, I don't know what the esi is either :-) )

It appears to have seen all (4) disks.

Looks like I need to go into debug mode tomorrow, probably try single
disk (no mirror) boots (after I bring up an inspect via PXE)

But of course my PXE boot is a zone on the server :-) (I'll have to use
my spare on my Mac :-) ).

Looks like I will have to test my backup/DR procedures then .....

--
Bruce Porter

On 11/12/2021 18:27, YTC#1 wrote:
> Fails to boot, beyond devices, hangs at
> pci@0,0/pci15d9,888@14/storage,c/esi@0,1 (ses 0) unknown
>
After letting it "rest" and having a mull over it, I concluded it is
possibly a SATA controller issue. I have 2 controllers in the server
(built in and a PCIE card).

I disconnected all drives except a single rpool disk, connected to the
on-board SATA.
System booted.

I added a single data pool to the on-board SATA.
System booted.

I added all disks to the on-board SATA only.
System booted.

Next test (tomorrow) will be to copy the large file again.

If it is the 2nd SATA controller, that will annoy me, as I would have
expected to just lose 1/2 my disks if that failed, not the entire system.

--
Bruce Porter

Re: S11.4 x86 system panic when large file copied.

by: YTC#1 - Tue, 14 Dec 2021 10:38 UTC

On 13/12/2021 09:14, YTC#1 wrote:
> Next test (tomorrow) will be to copy the large file again.

During the copy (scp to desktop), the following appeared in
/var/adm/messages at approx 9.1Gb copied. The copy stalled and then continued.
---8<
Dec 14 10:12:47 ytc1 genunix: [ID 408114 kern.notice] /pci@0,0/pci15d9,888@14/storage@7 (scsa2usb2) removed
Dec 14 10:20:14 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:14 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:14 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:14 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:14 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:20 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:20 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:20 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:20 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:20 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:25 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:25 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:25 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:25 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:25 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:30 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:30 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:30 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:30 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:30 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:35 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:35 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:35 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:35 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:36 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:41 ytc1 ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 2 has task file error
Dec 14 10:20:41 ytc1 ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 2 is trying to do error recovery
Dec 14 10:20:41 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:41 ytc1 ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 2 succeed
Dec 14 10:20:41 ytc1 ahci: [ID 811322 kern.notice] NOTICE: ahci0: ahci_tran_reset_dport port 2 reset device
Dec 14 10:20:41 ytc1 scsi: [ID 583609 kern.warning] WARNING: /pci@0,0/pci15d9,888@17/disk@2,0 (sd8): disk not responding to selection
---8<
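
The repeated `task_file_status = 0x4041` value can be picked apart. This is a rough sketch only, assuming the driver logs the raw AHCI port task file data register, with the ATA status register in the low byte and the ATA error register in the next byte. The helper names `decode_tfd` and `summarize` are invented for this example, and only the commonly documented bits are named.

```python
import re
from collections import Counter

# Two entries copied from the /var/adm/messages extract in this thread.
SAMPLE = """\
Dec 14 10:20:14 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
Dec 14 10:20:20 ytc1 ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 2 task_file_status = 0x4041
"""

# Assumed layout: bits 7:0 = ATA status register, bits 15:8 = ATA error register.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 3: "DRQ", 0: "ERR"}
ERROR_BITS = {7: "ICRC", 6: "UNC", 4: "IDNF", 2: "ABRT"}

# \s+ between words so the pattern also matches lines wrapped by a mail client.
PATTERN = re.compile(
    r"ahci(\d+):\s+ahci\s+port\s+(\d+)\s+task_file_status\s*=\s*(0x[0-9a-fA-F]+)"
)


def decode_tfd(value):
    """Split a task file value into named status bits and named error bits."""
    status = value & 0xFF
    error = (value >> 8) & 0xFF
    status_names = [n for bit, n in STATUS_BITS.items() if status & (1 << bit)]
    error_names = [n for bit, n in ERROR_BITS.items() if error & (1 << bit)]
    return status_names, error_names


def summarize(log_text):
    """Count occurrences of each (controller, port, task file value)."""
    counts = Counter()
    for m in PATTERN.finditer(log_text):
        counts[("ahci" + m.group(1), int(m.group(2)), int(m.group(3), 16))] += 1
    return counts


if __name__ == "__main__":
    for (ctrl, port, value), n in summarize(SAMPLE).items():
        print(ctrl, port, hex(value), n, decode_tfd(value))
```

Under that assumption, 0x4041 decodes to DRDY and ERR in the status byte and UNC (uncorrectable data error) in the error byte. That would point at unreadable sectors on the disk on port 2 rather than at the controller.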

sd8 is the rpool mirror disk, which had been attached to the PCIe SATA
controller.

No issue now when copying from internal disk to internal disk (rpool to
data pool).

Recopied from server to desktop (scp); the messages did not reappear.

So, I guess I am looking at a HDD issue, time to buy a new one. Or maybe
upgrade to SSD :-)

--
Bruce Porter

Re: S11.4 x86 system panic when large file copied.

by: YTC#1 - Sat, 18 Dec 2021 11:09 UTC

On 14/12/2021 10:38, YTC#1 wrote:
> So, I guess I am looking at a HDD issue, time to buy a new one. Or maybe
> upgrade to SSD :-)

Replaced both HDDs in rpool (2Tb WD, circa 2015) with newer 2Tb HDDs.

ZFS may have let me down by not trapping a disk issue, but splitting and
replacing rpool is a doddle :-)

--
Bruce Porter
