Thursday, July 13, 2017

Connection Testing cannot be enabled for a pool when the ManagedConnectionFactory does not implement ValidatingManagedConnectionFactory




Applies To: Oracle SOA Suite / WebLogic 10g, 11g, 12c

I was trying to set the JCA adapter connection pool properties below. The values I was setting them to are shown here:

Test Frequency Seconds: 300 (Default 0)
Test Connections On Reserve: True (Default False)

I was setting these for an Oracle MQ Series adapter connection pool. The customer had reported that the MQ adapter stopped picking messages from WebSphere MQ once the MQ system went down for maintenance (for example, the queue manager was unavailable). Even after the MQ system was confirmed to be back up, the connecting SOA interface/composite had to be restarted before messages were picked up again. From my past experience with data source connection pools, I thought I could fix this by forcing the adapter to validate its connections to the MQ system periodically, which is exactly what the two pool parameters above are meant to do. Ideally, these settings mean the MQ connection pool checks that the MQ system is reachable every 300 seconds and hands only valid connections to the application when it reserves a connection. In theory I had it all figured out, until I applied the settings and the logs reported the issue below:


weblogic.management.DeploymentException: Connection Testing cannot be enabled for a pool when the ManagedConnectionFactory does not implement ValidatingManagedConnectionFactory. The following invalid settings have been detected:test-frequency-seconds is set to non-zero
[Connector:199167]test-connections-on-reserve is set to true


The adapter would not come up until I reverted my settings. So what happened?

On further research (Googling, of course) I found that the application server can test the validity of existing connections only if the resource adapter's ManagedConnectionFactory implements the ValidatingManagedConnectionFactory interface (see the "Testing Connections" section of the WebLogic resource adapter documentation).

As per Oracle Metalink Doc ID 957853.1, the Oracle MQ adapter does not implement this interface, so, exactly as the log says, connection validity cannot be tested. Enabling any of the connection test settings below for the MQ Series adapter therefore will not work as expected, and you will get the error shown above (you can also grep for these element names in the adapter's connector descriptor or deployment plan; see the sketch after this list).

Test Connections On Create:
Test Connections On Reserve:
Test Connections On Release:
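For reference, these console settings correspond to the test-connections-on-create/reserve/release and test-frequency-seconds elements of the adapter's WebLogic connector descriptor, the same element names that appear in the error above. A rough way to check what is currently set is sketched below; the plan path is only a hypothetical example, so substitute the Plan.xml (or weblogic-ra.xml) your MqAdapter deployment actually uses:

# Hypothetical path below: point it at the deployment plan or weblogic-ra.xml of your MqAdapter
$ grep -E 'test-frequency-seconds|test-connections-on-(create|reserve|release)' \
      $MW_HOME/Oracle_SOA1/soa/connectors/Plan.xml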

So now the question remains: is there a way to work around this known limitation? I think there is. Oracle Metalink Doc ID 957853.1 tracks this issue as Bug 8913481 and Bug 8918056, and prescribes applying patch 8918056 (an MQ Adapter fix). However, that patch applies to an older version of BPEL (10.1.3.4).

I would encourage you to present this analysis to Oracle and request a patch for the bugs above for newer versions of Oracle SOA Suite/Middleware such as 11g/12c. I hit this issue on Oracle SOA Suite 11.1.1.7.8 and will update this article if I am able to get a fix from Oracle for the newer versions.

On a side note, this issue is also relevant for other Oracle adapters shipped with SOA Suite. As per Oracle Metalink Doc ID 1282064.1, testing the Apps adapter and DB adapter connection factories via the WebLogic console is not supported, since the ManagedConnectionFactory of both adapters does not implement ValidatingManagedConnectionFactory, both for performance reasons at runtime and because the DB/Apps JCA connection is just a shallow wrapper around a data source connection handle.

Soumya Mishra



Thursday, June 29, 2017

JMS Store declared unhealthy and unavailable: start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable

So what happened? 

The SOA JMS store (handled as an XA resource by WLS) went unavailable (declared unhealthy) for 30 minutes. During this time the logs were full of the error below.

start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable

After 30 minutes the persistent store became available again. Once it did, the JVM was flooded with the backlog of pending messages, which resulted in full GCs. Full GCs are stop-the-world pauses and rendered the JVM inaccessible for application use, so the SOA server had to be restarted to make it available for application work again.

The error/issue above applies to Oracle SOA Suite 11g/12c.

What were the errors in the logs?

The JTA health state has changed from HEALTH_OK to HEALTH_WARN with reason codes: Resource WLStore_XXX_base_domain_SOAJMSFileStore declared unhealthy

start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable

Exception occured when binding was invoked.
Exception occured during invocation of JCA binding: "JCA Binding execute of Reference operation 'Produce_Message' failed due to: ERRJMS_PROVIDER_ERR.
ERRJMS_PROVIDER_ERR.
Unable to produce message due to JMS provider internal error.
Please examine the log file to determine the problem.
".
The invoked JCA adapter raised a resource exception.
Please examine the above error message carefully to determine a resolution.
" . Root cause :
javax.transaction.SystemException: start() failed on resource 'WLStore_XXX_base_domain_SOAJMSFileStore': XAER_RMFAIL : Resource manager is unavailable
javax.transaction.xa.XAException: Internal error: XAResource 'WLStore_XXX_base_domain_SOAJMSFileStore' is unavailable


So why was the persistent store declared unhealthy?

By default, if an XA resource that is participating in a global transaction fails to respond to an XA call from the WebLogic Server transaction manager, WebLogic Server flags the resource as unhealthy and unavailable, and blocks any further calls to the resource in an effort to preserve resource threads. The failure can be caused by either an unhealthy transaction or an unhealthy resource—there is no distinction between the two causes. In both cases, the resource is marked as unhealthy (Doc ID 1484996.1)

Here the JMS store (an XA resource) did not respond to a request from the WebLogic transaction manager within 120 seconds (MaxXACallMillis). When this happened, the WLS transaction manager marked the XA resource as unhealthy and stopped all further communication with it until MaxResourceUnavailableMillis had passed, which is 30 minutes in a default install.

Q. Why did the persistent store remain inaccessible for 30 minutes?

A. MaxResourceUnavailableMillis defines the maximum duration (in milliseconds) that an XA resource stays marked as unhealthy. By default it is set to 30 minutes. After this duration, the XA resource is declared available again.

Q. Why did the JMS store not respond to the transaction manager in time?

A. There could be various reasons, for example:

1. As per Oracle Note 1358303.1, which documents the same error code we faced, the file store itself had an issue: it had grown very big, so it was showing as unhealthy and compromising JTA health, since it is a participating resource in the overall transaction.

2. A transient network issue could have made the JMS store, which resides on disk, temporarily inaccessible from the server. So far I could not see any network connectivity errors in the logs (a quick way to scan the server log is sketched after this list).

3. The JMS store could be busy processing other transactions and may need more time to respond than MaxXACallMillis allows. Talk to the developers, understand the code design, and check how busy the JMS queues/topics are.
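As a quick first check for point 2, you can scan the managed server log around the incident window for the health transitions and the XA error quoted above. This is only a rough sketch; the log path assumes the default WebLogic layout and a managed server named soa_server1, so adjust both to your environment:

# Adjust DOMAIN_HOME and the server name to your environment
$ grep -E 'declared unhealthy|XAER_RMFAIL|HEALTH_WARN|HEALTH_OK' \
      $DOMAIN_HOME/servers/soa_server1/logs/soa_server1.log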

Q. What are the tuning recommendations to prevent this error/issue in the future?

1. Set the WLS domain parameter MaxResourceUnavailableMillis to fewer minutes than the existing 30; I would start with 10 (this recommendation is as per Metalink Note 1320141.1). The resource will then be retried for availability after 10 minutes instead of the current 30, keeping downtime to a minimum. It also means fewer messages queue up for processing once the store comes back after a similar failure, and a smaller backlog keeps the server out of the long GC pauses we saw above. (A WLST sketch covering this and the next recommendation follows this list.)

2. If you see this issue reappear and anticipate a busy store, increase MaxXACallMillis to 3 minutes and observe. This gives the store more time to respond before being declared unhealthy. Keep tuning this parameter until you see acceptable behavior for your application design and load; no one size fits all, so arrive at a number that works for your environment/application.

3. Compacting the file store helps reclaim and defragment the space it occupies. The compact command does not delete current data and only works when the WebLogic Server that hosts the store is offline. Make sure you back up the old store file before you run the compact command. Refer to the WebLogic persistent store documentation for how to run the compaction commands.

4. In most situations, file stores do not grow too large: after a message is consumed it is deleted from the file store and the space it used is made available for other messages. However, if so many messages pile up that the store repeatedly grows too large, set lower quotas so that producers are blocked from sending more messages to the destination until consumers have consumed and deleted the existing ones. Note that it is recommended to configure quotas on each JMS server; the quota can be set based on application requirements. I will try to discuss this at length in another post.
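To make recommendations 1 and 2 concrete, below is a minimal WLST sketch (run through wlst.sh) that sets both JTA parameters at the domain level. Treat it as a sketch only: the credentials, admin URL, and domain name (base_domain) are placeholders, and you should confirm the values suit your environment before activating the change.

$ cat > jta_tuning.py <<'EOF'
# Placeholders below: replace the credentials, admin URL, and domain name with your own
connect('weblogic', 'welcome1', 't3://adminhost:7001')
edit()
startEdit()
cd('/JTA/base_domain')
cmo.setMaxResourceUnavailableMillis(600000)   # 10 minutes instead of the default 30
cmo.setMaxXACallMillis(180000)                # 3 minutes instead of the default 2
save()
activate()
disconnect()
EOF
# wlst.sh location assumes a standard middleware home; adjust as needed
$ $MW_HOME/oracle_common/common/bin/wlst.sh jta_tuning.py

A restart of the affected managed servers may still be needed for the new values to take effect everywhere.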


Please let me know in the comments section if the tuning above helped you. I will be glad to hear your stories and experiences. Happy learning!

Soumya Mishra 


Tuesday, August 23, 2016

Faster Weblogic Server Startup on Linux VM


Problem


A WebLogic server running on a Linux virtual machine hangs during startup. For instance, an Oracle SOA 12.2.1 WebLogic managed server may take up to 12 minutes to start on a Linux VM, and a bare WebLogic 12c managed server running nothing may take up to 4 minutes, even though the virtual machines have plenty of CPU and memory.


Applies To 


WebLogic 12.1.x, 12.2.x
RHEL x86 / x86_64 virtual machines


Cause


Linux has two devices that provide random data at any time: /dev/random and /dev/urandom. Both should be secure enough for generating PGP keys, SSH challenges, and other applications where secure random numbers are required. Starting with kernel 2.6, the default entropy pool size is 4096 bits, and the problem arises when the entropy available on the system is very low (around 100 bits or less): /dev/random blocks until more entropy is gathered, and anything seeding its random numbers from it, such as the JVM's default SecureRandom source, stalls along with it.


How to verify if you are encountering this issue?


1. Check the default entropy pool size.

$ cat /proc/sys/kernel/random/poolsize 
4096

2. Check the available entropy.

$ cat /proc/sys/kernel/random/entropy_avail 
160

3. In the example above, the available entropy is far too low.

Monitor the current entropy of the system by using the following command:


$ for i in $(seq 500); do cat /proc/sys/kernel/random/entropy_avail ; sleep 5; done

4. Start a WebLogic server instance. You should see the available entropy decrease or stall (use the script in step 3).

Solution


1. Temporary Solution (use for testing purposes)

Start the WLS server with the startup argument below.

-Djava.security.egd=file:/dev/./urandom

Override the JAVA_OPTIONS environment variable before starting WebLogic Server via shell scripts.

export JAVA_OPTIONS="${JAVA_OPTIONS} -Djava.security.egd=file:/dev/./urandom"

Start the WebLogic server and note the timings!


2. Permanent Solution (Use if Step 1 works)

If the above works, it is time to set up the fix permanently in the environment. The fix is applied in the JAVA_HOME the WebLogic server refers to.


i.   Edit the Java Security Properties file ($JAVA_HOME/jre/lib/security/java.security)

ii.  The securerandom.source property specifies the source of seed data for SecureRandom. Change it as shown:

Change
securerandom.source=file:/dev/random

To

securerandom.source=file:/dev/urandom

iii.  Save the changes and restart the WebLogic Server instances (a shell sketch of this edit follows below).
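For reference, here is a minimal shell sketch of steps i to iii, assuming $JAVA_HOME points at the JDK your WebLogic servers use:

# Back up the file, switch the SecureRandom seed source, then confirm the change
$ cp $JAVA_HOME/jre/lib/security/java.security $JAVA_HOME/jre/lib/security/java.security.bak
$ sed -i 's|^securerandom.source=file:/dev/random$|securerandom.source=file:/dev/urandom|' \
      $JAVA_HOME/jre/lib/security/java.security
$ grep '^securerandom.source' $JAVA_HOME/jre/lib/security/java.security
securerandom.source=file:/dev/urandom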

Observation


We could see that the startup timings for the WebLogic servers improved dramatically. For instance, the SOA managed server now took less than 3 minutes, as against 12 minutes before the fix was applied, and the bare WebLogic managed servers took less than 20 seconds to start!



References


How to Diagnose a Linux Entropy Issue on WebLogic Server Instances (Doc ID 1574979.1)

Monday, August 15, 2016

Starting OHS 12c via startComponent takes several minutes on a Linux Virtual Machine - SOLVED


I recently started working with Oracle HTTP Server on a Linux VM. On trying to start OHS using the startComponent script, I was surprised to see the startup take over 8 minutes; the same took seconds in the 11g version of the software. After some research on Metalink I was able to solve the issue, and I am providing the solution below.

Issue Applies To

The issue described and solution offered applied to below Oracle FMW Components.

Oracle HTTP Server 12.2.1 installed on a RHEL 6 virtual machine. The same issue/solution applies to OHS 12.1.x and later.

Issue Description

Starting with OHS 12c, the OHS instances are monitored/managed by Node Manager. Once Node Manager is started, the OHS instance is started using the startComponent script. While doing so, the script gets stuck for minutes (8 minutes in my case!):

$DOMAIN_HOME/bin/startComponent.sh ohs1

    Starting system Component ohs1 ...

    Initializing WebLogic Scripting Tool (WLST) ...

    Welcome to WebLogic Server Administration Scripting Shell

    Type help() for help on available commands

    Reading domain from
     Here it just sits for several minutes 
     
    Connecting to Node Manager ...
    Successfully Connected to Node Manager.
    Starting server ohs1 ...

Cause 

The problem is due to random number generation (entropy) on the Linux VM; more on entropy issues can be found in Metalink note 1574979.1. The VM was running out of entropy. After changing where Java gets its random numbers from, the startup time came down manifold!

Solution 

1) Stop OHS
stopComponent.sh ohs1

2) Back up and edit java.security (a shell sketch of steps 2 and 3 follows below)
$ORACLE_HOME/oracle_common/jdk/jre/lib/security/java.security

3) Change securerandom.source
From:
securerandom.source=file:/dev/urandom
(In 12.2.1 this is securerandom.source=file:/dev/random)
To:
securerandom.source=file:/dev/./urandom

4) Start OHS
startComponent.sh ohs1
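The same change as a shell sketch, using the java.security path from step 2; whatever the current value is, it gets pointed at file:/dev/./urandom:

# Back up, repoint securerandom.source at /dev/./urandom, then confirm
$ JSEC=$ORACLE_HOME/oracle_common/jdk/jre/lib/security/java.security
$ cp $JSEC $JSEC.bak
$ sed -i 's|^securerandom.source=file:.*$|securerandom.source=file:/dev/./urandom|' $JSEC
$ grep '^securerandom.source' $JSEC
securerandom.source=file:/dev/./urandom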

Observation

Hurray! OHS now starts in close to 25 seconds, compared to 8 minutes earlier!

References

NOTE:2006106.1 - Starting OHS 12c via startComponent takes several minutes on a Virtual Machine

NOTE:1574979.1 - How to Diagnose a Linux Entropy Issue on WebLogic Server Instances


Saturday, September 6, 2014

URL Redirection on Oracle HTTP Server

An application is being decommissioned and a new application is replacing it. The new application has a different URL, but the business wants the old application's users to face as little disruption as possible: anyone hitting the old application URL should be redirected to the new application URL automatically.


How can this be done? It involves a small change at the OHS level. Read on:

Old Application URL:
https://learning.oraclefusionfacts.com/welcome/name.jspx

Redirected to New Application URL:
https://focusonfusion.oraclefusionfacts.com/app/name.jspx

On the OHS server that holds the configuration for our old application, the virtual host entry in the httpd.conf file looks like this:

# Start Virtual Server Settings

NameVirtualHost *:7777

<VirtualHost *:7777>
    ServerName https://learning.oraclefusionfacts.com:443
    ServerAdmin siddharth.mishra@oraclefusionfacts.in
    RewriteEngine On
    RewriteOptions inherit

    <Location />
        SetHandler weblogic-handler
        WebLogicCluster app.oraclefusionfacts.in:8001
    </Location>
</VirtualHost>

# End Virtual Server Settings

In order to redirect requests coming to the old URL to our new URL, add a Redirect directive to the virtual host entry in the httpd.conf file as shown below:

# Start Virtual Server Settings

NameVirtualHost *:7777

<VirtualHost *:7777>
    ServerName https://learning.oraclefusionfacts.com:443
    Redirect 301 /welcome https://focusonfusion.oraclefusionfacts.com/app/name.jspx
    ServerAdmin siddharth.mishra@oraclefusionfacts.in
    RewriteEngine On
    RewriteOptions inherit

    <Location />
        SetHandler weblogic-handler
        WebLogicCluster app.oraclefusionfacts.in:8001
    </Location>
</VirtualHost>

# End Virtual Server Settings

After making this change, make sure you restart the OHS server using the opmnctl utility!

Now open a new browser session and make sure you clear the browser cache; if you do not, the old URL may not redirect to the new one. Then key in the old URL and hit Enter. It should automatically redirect to the new URL.
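If you prefer to verify from the command line instead of a browser, a quick curl check (using the example hostnames above; add -k if the certificate is self-signed) should show the 301 and the new location:

$ curl -sI https://learning.oraclefusionfacts.com/welcome | grep -Ei '^(HTTP|Location)'
HTTP/1.1 301 Moved Permanently
Location: https://focusonfusion.oraclefusionfacts.com/app/name.jspx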

So go ahead and make your Boss proud !!! Do not forget to mention in comments if this helped you :)

Commands to find out Allocated CPU/Memory on a Solaris Zone

With changing times, servers have changed too. With the advent of virtualization, bigger servers are carved up into smaller ones and handed to admins for use. You may have heard of Solaris T2/T3/T4 machines, or newer ones like the T4-4; they have found wide acceptance in organizations. These are mostly CMT servers that offer huge resources in terms of memory and CPU. UNIX admins generally slice and dice them into smaller servers known as zones and allocate a certain amount of memory and CPU to each zone carved out of the master box.

For admins it can be a bit tricky to find out the memory/CPU allocated to such zones. Below are commands that can help you find the resources assigned to a Solaris zone.

To find the CPU shares assigned, use the command below:

bash-3.00$ prctl -n zone.cpu-shares $$
process: 4352: bash
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.cpu-shares
        privileged         20       -   none                                 -
        system          65.5K     max   none                                 -


To find the maximum shared memory assigned to the Solaris zone, use:

bash-3.00$
bash-3.00$ prctl -n zone.max-shm-memory $$
process: 4352: bash
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
zone.max-shm-memory
        privileged      2.73GB      -   deny                                 -
        system          16.0EB    max   deny                                 -

If you want a fuller view of the resource controls assigned to your zone, use the command below:

prctl $$

Now if this post helped you please say a Hi on the Comments Section Folks !!! Hope it helped :)

Sunday, March 11, 2012

ORA-01438: value larger than specified precision allowed error in SOA 11g logs

Issue:
SQLDataException: ORA-01438 errors appear frequently in the SOA log files (the SOA managed server .out and diagnostic log files).


Applies to:
Oracle SOA Suite 11.1.1.3.0 and later (11g R1)


What you see in logs:

Error while invoking bean "cube delivery": Exception not handled by the Collaxa Cube system.[[
an unhandled exception has been thrown in the Collaxa Cube systemr; exception reported is: "ORABPEL-00000

Exception not handled by the Collaxa Cube system.
an unhandled exception has been thrown in the Collaxa Cube systemr; exception reported is: "java.sql.SQLDataException: ORA-01438: value larger than specified precision allowed for this column

at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:83)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:135)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:210)

Effects:

Degraded performance of SOA composites. The server spends additional processing power on logging the errors, so server performance is impacted, and a lot of log files are generated, which can fill your disk fast.

Cause:

This is a known issue identified by Oracle as Bug 12621337: ORA-01438 ON LIVE_INSTANCES COLUMN.

The problem is the precision of the column COMPOSITE_INSTANCE.LIVE_INSTANCES, currently defined as NUMBER(3), which can hold a value of at most 999. As there is no reason for an explicit limit on this column, the correct way to handle the error is to increase the column's precision so it can accommodate more live instances.


Solution:
The workaround is to modify the COMPOSITE_INSTANCE.LIVE_INSTANCES column's precision to be NUMBER(38) in the SOAINFRA schema.
To apply this workaround, follow the steps below:

1. Stop the SOA domain
2. Log in to the SOA infra repository database as SYSDBA
3. Modify the COMPOSITE_INSTANCE table as shown below

ALTER TABLE soa11g_soainfra.composite_instance
  MODIFY (live_instances NUMBER(38));


In the above SQL statement I have assumed the dehydration store prefix is soa11g; please modify the statement according to your prefix. Restart the SOA domain and check the logs.
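Once the ALTER completes, a quick way to confirm the new precision (again assuming the soa11g prefix) is a check like the sketch below, run as SYSDBA or the schema owner:

$ sqlplus -S / as sysdba <<'EOF'
SELECT data_precision
  FROM dba_tab_columns
 WHERE owner = 'SOA11G_SOAINFRA'
   AND table_name = 'COMPOSITE_INSTANCE'
   AND column_name = 'LIVE_INSTANCES';
EOF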